Taro Logo

Design In-Memory File System

Hard
Meta logo
Meta
9 views
Topics:
Trees

Design an in-memory file system with the following operations:

  1. ls(path): Given a path, return the list of file and directory names in the directory (not including subdirectories). The order should be lexicographical.
  2. mkdir(path): Create a directory at the given path. If the parent directory does not exist, create it as well.
  3. addContentToFile(filePath, content): Append the given content to the file. If the file does not exist, create it. If the parent directory does not exist, create it as well.
  4. readContentFromFile(filePath): Return the content of the file.

Assume that:

  • The path always starts with /.
  • Duplicate file names are not allowed in the same directory.
  • You can assume all inputs are valid.

Example:

FileSystem fs = new FileSystem();
fs.ls("/");           // Returns []
fs.mkdir("/a/b/c");
fs.addContentToFile("/a/b/c/d", "hello");
fs.readContentFromFile("/a/b/c/d"); // Returns "hello"
fs.addContentToFile("/a/b/c/d", " world");
fs.readContentFromFile("/a/b/c/d"); // Returns "hello world"
fs.ls("/");           // Returns ["a"]
fs.ls("/a");          // Returns ["b"]
fs.ls("/a/b");         // Returns ["c"]
fs.ls("/a/b/c");       // Returns ["d"]

How would you implement this file system, keeping in mind efficiency and scalability?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum size of the in-memory file system, specifically the total storage capacity we need to support?
  2. Are there any restrictions on the types of files and directories we can store (e.g., size limits, character set for names)?
  3. How should we handle errors, such as trying to create a file with the same name as an existing directory, or accessing a non-existent path?
  4. What operations should we prioritize optimizing for (e.g., creating files, reading files, listing directories)?
  5. Are we concerned with concurrency, or can we assume that only one operation will be performed at a time?

Brute Force Solution

Approach

Let's create a very simple file system in our computer's memory. A brute force method would involve directly searching and creating the files and folders whenever needed, without any clever organization.

Here's how the algorithm would work step-by-step:

  1. To create a file, first check if the specified path to the file already exists. If so, creating a new file with the same path is not allowed, so the operation fails.
  2. If the path does not exist, create the file by storing the file content directly. Also make sure the folder in the path exists, creating any missing folders if necessary.
  3. To create a directory, similar to creating files, check if the directory path exists. Creating a new directory with the same path isn't allowed, so the operation fails.
  4. If the path doesn't exist, create the folder. There's no content, just a name and a location.
  5. To read a file, start at the 'root' folder and trace the path given. If any folder in the path doesn't exist or if the path leads to a folder instead of a file, the reading operation fails.
  6. If the path is valid and leads to a file, then you're done. Output the contents that were saved when creating the file.
  7. To list contents of a directory, start at the 'root' folder and trace the path given. If any folder in the path doesn't exist, the listing operation fails.
  8. If the path is valid and leads to a folder, look at all the files and folders directly inside that folder and display them.

Code Implementation

class FileSystem:

    def __init__(self):
        self.root = {}

    def create_file(self, file_path, content):
        path_components = file_path.split('/')
        current_directory = self.root
        for i in range(1, len(path_components) - 1):
            component = path_components[i]
            if component not in current_directory:
                return False
            current_directory = current_directory[component]

        file_name = path_components[-1]

        # Check if file exists already
        if file_name in current_directory:
            return False

        current_directory[file_name] = content
        return True

    def create_directory(self, directory_path):
        path_components = directory_path.split('/')
        current_directory = self.root
        for i in range(1, len(path_components)):
            component = path_components[i]

            # Directory exists, creation fails
            if component in current_directory:
                return False
            current_directory[component] = {}
            current_directory = current_directory[component]
        return True

    def read_file(self, file_path):
        path_components = file_path.split('/')
        current_directory = self.root
        for i in range(1, len(path_components) - 1):
            component = path_components[i]
            if component not in current_directory:
                return None
            current_directory = current_directory[component]

        file_name = path_components[-1]
        if file_name not in current_directory:
            return None

        # Return content if file is found
        return current_directory[file_name]

    def list_contents(self, directory_path):
        path_components = directory_path.split('/')
        current_directory = self.root
        for i in range(1, len(path_components)):
            component = path_components[i]
            if component not in current_directory:
                return None
            current_directory = current_directory[component]

        # Returns all immediate children
        return list(current_directory.keys())

Big(O) Analysis

Time Complexity
O(n)The time complexity is determined by the length of the file or directory path, which we denote as n. Creating a file or directory, reading a file, or listing directory contents all involve traversing the path from the root, potentially creating directories along the way. Each step in the path requires a constant amount of work (checking existence, creating a node). Therefore, the operations scale linearly with the path length, leading to a time complexity of O(n).
Space Complexity
O(N)The space complexity is primarily determined by the storage of the file contents and the directory structure. Creating files involves storing the file content, which in the worst case, could be proportional to the total size of all file contents (N). Similarly, the directory structure, represented by nested folders, can grow proportionally to the number of files and directories, also contributing to the overall space. Therefore, the auxiliary space used to store the file system structure and content grows linearly with the input size N, where N is the total size of file contents and the number of directories.

Optimal Solution

Approach

We'll simulate a simplified file system in memory. The core idea is to represent the file system structure using a tree-like organization, where each node can be either a directory or a file.

Here's how the algorithm would work step-by-step:

  1. First, create a basic structure that represents files and folders. A file holds its content, and a folder holds a list of its children (files and other folders).
  2. Start with a single root folder, which is the entry point of the entire file system.
  3. To create a new directory or file, follow the path provided. If a folder in the path doesn't exist, create it along the way.
  4. When you want to add a file, find the correct folder from the path, and then store the file with its content there.
  5. To read a file, navigate to the correct folder, find the file by its name, and then give back its content.
  6. To list the contents of a folder, go to that folder, and provide a list of the names of its files and subfolders.
  7. By organizing everything like a tree, we can quickly navigate the file system using paths, create new things, read content, and list folder contents, all without using a real hard drive.

Code Implementation

class File:
    def __init__(self, name, content):
        self.name = name
        self.content = content

class Directory:
    def __init__(self, name):
        self.name = name
        self.children = {}

class FileSystem:
    def __init__(self):
        # Initialize the root directory for the file system.
        self.root = Directory("/")

    def create_directory(self, path):
        path_parts = path.split("/")
        current_directory = self.root

        for part in path_parts[1:]:
            if not part:
                continue
            if part not in current_directory.children:
                current_directory.children[part] = Directory(part)

            current_directory = current_directory.children[part]

    def create_file(self, path, content):
        path_parts = path.split("/")
        file_name = path_parts[-1]
        current_directory = self.root

        for part in path_parts[1:-1]:
            if not part:
                continue
            if part not in current_directory.children:
                # Create intermediate directories if they don't exist.
                current_directory.children[part] = Directory(part)

            current_directory = current_directory.children[part]

        current_directory.children[file_name] = File(file_name, content)

    def read_file(self, path):
        path_parts = path.split("/")
        file_name = path_parts[-1]
        current_directory = self.root

        for part in path_parts[1:-1]:
            if not part:
                continue

            current_directory = current_directory.children[part]

        if file_name in current_directory.children:
            return current_directory.children[file_name].content
        else:
            return None

    def list_directory(self, path):
        path_parts = path.split("/")
        current_directory = self.root

        for part in path_parts[1:]:
            if not part:
                continue

            # Navigate to the target directory.
            if part in current_directory.children:
                current_directory = current_directory.children[part]
            else:
                return []

        # Extract the names of all children (files and directories).
        return list(current_directory.children.keys())

Big(O) Analysis

Time Complexity
O(n)The dominant operation is traversing the file system path, which can have a maximum length of n, where n is the number of directories in the path. Creating directories along the path also takes O(n) in the worst case if each directory in the path does not exist and needs to be created. Adding or reading a file involves finding the target directory (O(n)) and then a constant time lookup within that directory (assuming a hashmap or similar efficient data structure). Listing contents requires traversing the directory's children, taking O(m) where m is the number of children, but m is bounded by n (the path length to the deepest file). Therefore, the overall time complexity is O(n).
Space Complexity
O(N)The space complexity is determined by the number of directories and files created in the file system, which is proportional to the input data. In the worst case, each operation in creating directories or files might add a new node to the tree structure. Therefore, auxiliary space is used to store the file system's tree structure, where N is the total number of files and directories created, leading to a space complexity of O(N).

Edge Cases

CaseHow to Handle
Null or empty pathReturn error or handle gracefully if the path is null or empty, depending on the operation (create, delete, read).
Path contains consecutive slashes '//'Normalize the path by collapsing consecutive slashes into a single slash.
Path is too long exceeding the maximum file system path lengthReturn an error indicating the path is too long.
Creating a file or directory with the same name as an existing one.Return an error if attempting to create a file/directory with the same name as an existing one, or overwrite depending on specification.
Deleting a non-empty directory.Implement recursive delete if necessary, or return error if non-empty directories cannot be deleted directly.
Reading from a file that does not exist or a directoryReturn an error indicating that the file does not exist or it is a directory.
Writing to a file exceeding maximum file sizeReturn an error or handle file truncation depending on specification.
Insufficient memory to create a new file or directory.Return an error indicating insufficient memory.