Taro Logo

Design File System #6 Most Asked

Medium
19 views
Topics:
ArraysStrings

You are asked to design a file system which provides two functions:

  • createPath(path, value): Creates a new path and associates a value to it.
  • get(path): Returns the value associated with a path or returns -1 if the path doesn't exist.

The path is the absolute path represented as a string following this format: /one/two/three. For example, /leetcode and /leetcode/problems are valid paths while an empty string "", /, // and /one//two are not.

Implement the FileSystem class:

  • FileSystem() Initializes the object of the system.
  • bool createPath(string path, int value) Creates a new path with the given value. Returns true if the path was not existent and was successfully created, or false if the path already exists.
  • int get(string path) Returns the value associated with the path or returns -1 if the path doesn't exist.

Example 1:

Input:
["FileSystem","createPath","get","createPath","get"]
[[],["/a",1],["/a"],["/a/b",2],["/b"]]
Output:
[null,true,1,true,-1]
Explanation:
FileSystem fileSystem = new FileSystem();
fileSystem.createPath("/a", 1); // return true
fileSystem.get("/a"); // return 1
fileSystem.createPath("/a/b", 2); // return true
fileSystem.get("/b"); // return -1. "/b" does not exist.

Example 2:

Input:
["FileSystem","createPath","createPath","get","createPath","get"]
[[],["/leet",1],["/leet/code",10],["/leet/code"],["/c/d",1],["/c"]]
Output:
[null,true,true,10,true,-1]

Constraints:

  • 1 <= path.length <= 100
  • 1 <= value <= 109
  • Each operation will be called at most 1000 times.
  • All paths are absolute paths in the form of /one/two/three.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. When calling `createPath` for a path like `/a/b/c`, must the parent path `/a/b` already exist in the system for the creation to be successful?
  2. Is it possible for a path to function as both a 'file' with an associated value and a 'directory' with subpaths? For example, if `createPath("/a", 1)` is called, is a subsequent call to `createPath("/a/b", 2)` considered valid?
  3. If a path like `/a/b` is created successfully, what should a call to `get("/a")` return, assuming `/a` was not explicitly created with its own value?
  4. The problem description gives examples of invalid paths. Should my implementation assume all input path strings are well-formed, or do I need to add validation to handle cases like trailing slashes or empty components, for example `/a//b`?
  5. For the `createPath` function, what is the precise definition of a path 'already existing'? Does it mean the exact path string has a value, or does its existence as an intermediate directory for another path also count?

Brute Force Solution

Approach

The simplest approach is to maintain a single master list of all the file paths and their corresponding values. For any action, like creating a new file or finding an existing one, we just search through this entire list from top to bottom every single time.

Here's how the algorithm would work step-by-step:

  1. To store our file system, we'll just use a basic list where each item is a file path paired with its value.
  2. When asked to create a new file, we first have to make sure its parent folder already exists.
  3. To do this, we figure out the parent's path and then search through our entire list, checking every single entry to see if we can find it.
  4. If we can't find the parent after checking everything (unless it's the main root folder), the creation fails.
  5. We also have to check that the new file path doesn't already exist by searching the whole list for it again. If it does, the creation fails.
  6. If the parent exists and the new path is unique, we can finally add the new file path and its value to our list.
  7. When asked to retrieve a file's value, we again search the entire list from the beginning.
  8. We compare the requested path with every path in our list until we find an exact match, then we return its value. If we get to the end without a match, the file isn't there.

Code Implementation

class FileSystem:
    def __init__(self):
        self.paths_and_values = []

    def createPath(self, path, value):
        if not path or path == "/" or not path.startswith("/"):
            return False

        # We must check our whole collection to make sure the path we want to create doesn't already exist.

        for existing_path, _ in self.paths_and_values:
            if existing_path == path:
                return False
        
        last_slash_index = path.rfind("/")
        if last_slash_index == 0:
            parent_path = "/"
        else:
            parent_path = path[:last_slash_index]
        
        # A new path is only valid if its parent exists, so we must exhaustively search for it.

        parent_path_exists = False
        if parent_path == "/":
            parent_path_exists = True
        else:
            for existing_path, _ in self.paths_and_values:
                if existing_path == parent_path:
                    parent_path_exists = True
                    break
        
        if parent_path_exists:
            self.paths_and_values.append((path, value))
            return True
        
        return False

    def get(self, path):
        # To get a value, we search the entire collection from the start until we find an exact match.

        for stored_path, stored_value in self.paths_and_values:
            if stored_path == path:
                return stored_value
        
        return -1

Big(O) Analysis

Time Complexity
O(n²)Let n be the number of file paths created. Each `create` operation must scan the entire list of existing paths, once to validate the parent and again to check for duplicates. As we add more paths, the list grows, making each subsequent scan progressively more expensive. Creating the nth path requires scanning a list of size n-1. The total work for n creations is the sum of an arithmetic series 1 + 2 + ... + (n-1), leading to roughly n * n / 2 operations, which simplifies to a time complexity of O(n²).
Space Complexity
O(N * L)The dominant space usage is the single master list storing all file paths and their values. If N represents the total number of files ever created in the system, this list will grow to contain N entries. Since each entry consists of a path string and an integer value, the total memory is proportional to the number of files multiplied by their average length. Therefore, if L is the average length of a file path, the space complexity is O(N * L).

Optimal Solution

Approach

The core idea is to model the file system like a branching tree of folders. We use a nested, map-like structure where each name points to another map representing its contents, allowing us to easily follow a path from the root down to its destination.

Here's how the algorithm would work step-by-step:

  1. Imagine the entire file system starts as a single, main folder, which we can call the root.
  2. Inside this root folder, and inside every subsequent folder, we keep a record of its contents. Each item in the record has a name and can be another folder itself.
  3. We also attach a special spot to every single item to hold a value, in case a value needs to be associated with that specific path.
  4. When we get a request involving a path like '/level1/level2/file', we first break it down into its separate parts: 'level1', 'level2', and 'file'.
  5. To create a new path, we start at the root and travel down, one part at a time. We must confirm that the parent folder already exists before we can add a new folder or file inside it.
  6. If we attempt to create a path but its parent folder is missing, or if the path we're trying to create already exists, the operation must fail.
  7. To get the value at a specific path, we perform the same journey. We start at the root and follow the path's parts. If we can't find a part along the way, the path doesn't exist. If we reach the end successfully, we just read the value stored there.

Code Implementation

class FileSystem:
    def __init__(self):
        self.file_system_root = {}

    def createPath(self, path, value):
        path_components = [component for component in path.split('/') if component]

        if not path_components:
            return False

        parent_path_components = path_components[:-1]
        final_item_name = path_components[-1]

        current_directory = self.file_system_root

        for directory_name in parent_path_components:
            if directory_name not in current_directory:
                return False
            
            current_directory = current_directory[directory_name]

            # A path component must be a directory (a dict), not a file, to navigate into it.

            if not isinstance(current_directory, dict):
                return False

        # To prevent overwrites, the new item cannot already exist in the parent directory.

        if final_item_name in current_directory:
            return False

        current_directory[final_item_name] = value
        return True

    def get(self, path):
        path_components = [component for component in path.split('/') if component]

        current_node = self.file_system_root

        for component_name in path_components:
            # The traversal fails if a directory in the path does not exist or is actually a file.

            if not isinstance(current_node, dict) or component_name not in current_node:
                return -1
            
            current_node = current_node[component_name]

        # A value can only be retrieved from a file (an int), not from a directory (a dict).

        if isinstance(current_node, dict):
            return -1

        return current_node

Big(O) Analysis

Time Complexity
O(n)Let n be the length of the input path string. The dominant cost comes from processing this path. First, the path is split into its constituent parts, an operation that takes time proportional to n. Subsequently, the algorithm traverses the file system tree by iterating through these parts, performing a hash map lookup for each component. Since the total time for all lookups is also proportional to the path length, the total operations are directly proportional to n, simplifying to O(n).
Space Complexity
O(N)The auxiliary space is determined by the nested map-like data structure used to model the entire file system. This single structure stores all the created folders, files, and their associated values from the root. Let N be the cumulative number of characters in all the unique path components (folder and file names) created over time. The memory usage grows linearly with N, as each component's name must be stored as a key in the map structure.

Edge Cases

Creating a path where its parent path does not exist, e.g., createPath("/a/b", 1) when "/a" hasn't been created.
How to Handle:
The operation must fail and return false, as the parent path must exist before a child path can be created.
Attempting to create a path that already exists, e.g., createPath("/a", 2) after createPath("/a", 1) was successful.
How to Handle:
This call should return false, and the original value associated with the path must remain unchanged.
Input paths that are invalid according to the problem description, such as an empty string "", the root "/", or paths with consecutive slashes like "/a//b".
How to Handle:
These malformed paths should be rejected immediately, with createPath returning false and get returning -1.
Creating a path that is a direct child of the root, like "/a".
How to Handle:
The parent-finding logic must correctly identify the root "/", which is considered to always exist, allowing the creation.
Requesting the value of a path that serves as a parent but was never explicitly created with a value, e.g., get("/a") after creating "/a/b".
How to Handle:
This should return -1 because the existence of a child path does not implicitly create a value for its parent.
Path strings that include a trailing slash, such as "/a/".
How to Handle:
The system should normalize paths by removing any trailing slashes to treat "/a" and "/a/" as identical.
The shortest possible valid path, consisting of a single component like "/a".
How to Handle:
The logic must correctly handle this minimal case, identifying its parent as the root path "/".
Operations performed directly on the root path, i.e., createPath("/", 1) or get("/").
How to Handle:
The root path itself cannot have an associated value, so these operations should consistently fail.
0/0 completed