Taro Logo

Remove Sub-Folders from the Filesystem

Medium
Verkada logo
Verkada
3 views
Topics:
ArraysStrings

Given a list of folders folder, return the folders after removing all sub-folders in those folders. You may return the answer in any order.

If a folder[i] is located within another folder[j], it is called a sub-folder of it. A sub-folder of folder[j] must start with folder[j], followed by a "/". For example, "/a/b" is a sub-folder of "/a", but "/b" is not a sub-folder of "/a/b/c".

The format of a path is one or more concatenated strings of the form: '/' followed by one or more lowercase English letters.

  • For example, "/leetcode" and "/leetcode/problems" are valid paths while an empty string and "/" are not.

Example 1:

Input: folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"]
Output: ["/a","/c/d","/c/f"]
Explanation: Folders "/a/b" is a subfolder of "/a" and "/c/d/e" is inside of folder "/c/d" in our filesystem.

Example 2:

Input: folder = ["/a","/a/b/c","/a/b/d"]
Output: ["/a"]
Explanation: Folders "/a/b/c" and "/a/b/d" will be removed because they are subfolders of "/a".

Example 3:

Input: folder = ["/a/b/c","/a/b/ca","/a/b/d"]
Output: ["/a/b/c","/a/b/ca","/a/b/d"]

Constraints:

  • 1 <= folder.length <= 4 * 104
  • 2 <= folder[i].length <= 100
  • folder[i] contains only lowercase letters and '/'.
  • folder[i] always starts with the character '/'.
  • Each folder name is unique.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input `folder` array be empty or contain null/empty strings?
  2. Is the file path guaranteed to be well-formed (e.g., start with '/', not contain consecutive '//')?
  3. If multiple folders share the same prefix but neither is a subfolder of the other, should both be included in the result?
  4. Is the order of the output important, or can I return the list of root folders in any order?
  5. Could the input contain file paths that are identical, and if so, how should I handle them?

Brute Force Solution

Approach

The brute force method is like checking every possible combination to find the answer. For this problem, it involves comparing each folder to every other folder to see if one is contained inside the other.

Here's how the algorithm would work step-by-step:

  1. Start by taking the first folder on the list.
  2. Compare it to every other folder on the list.
  3. If you find another folder that starts with the first folder's name, followed by a special separator (like a slash), then the first folder is a parent folder and shouldn't be kept.
  4. If the first folder is a parent folder, move on to the next folder on the list.
  5. If you compare the first folder to every other folder and don't find any that start with its name (followed by the separator), then keep it.
  6. Repeat these steps for every folder on the original list.
  7. At the end, you'll have a new list of folders that are not sub-folders of any other folder on the list.

Code Implementation

def remove_subfolders_brute_force(folder_list):
    result = []

    for current_folder in folder_list:
        is_subfolder = False

        for other_folder in folder_list:
            # Avoid comparing folder to itself
            if current_folder == other_folder:
                continue

            # Check if current folder is a subfolder
            if other_folder.startswith(current_folder + "/"):
                is_subfolder = True

                # If current folder is subfolder, break early
                break

        # Add the current folder to result list
        if not is_subfolder:
            result.append(current_folder)

    return result

Big(O) Analysis

Time Complexity
O(n²)The algorithm iterates through each of the n folders in the input list. For each folder, it compares it against every other folder in the list to check for subfolder relationships. This results in nested loops where the outer loop iterates n times and the inner loop iterates approximately n times in the worst case. Therefore, the number of comparisons grows proportionally to n * n which simplifies to O(n²).
Space Complexity
O(N)The algorithm creates a new list to store the results, which in the worst case, might contain all the original folders if none are subfolders of each other. Therefore, the space required to store the result list grows linearly with the number of input folders. N represents the number of folders in the original input list. Thus, the auxiliary space complexity is O(N).

Optimal Solution

Approach

The key idea is to treat the folder paths like words in a dictionary. We want to quickly identify and remove the 'child' folders, which are simply longer paths that start with a shorter 'parent' path. Sorting the paths makes this process very efficient.

Here's how the algorithm would work step-by-step:

  1. First, put all the folder paths in order, just like you'd find them in a dictionary. This will group related folders together.
  2. Go through the sorted list one by one.
  3. For each folder path, check if it starts with the path of the previous 'parent' folder that you kept.
  4. If it does NOT start with the previous 'parent' folder, then you know it's a new, independent parent folder, so keep it.
  5. If it DOES start with the previous 'parent' folder, it's a sub-folder (child) of the previous parent, so you can ignore and remove it.
  6. In the end, the folders you kept are the ones you want to return.

Code Implementation

def remove_subfolders(folder): 
    folder.sort()
    result = []
    previous_parent = ""

    for current_folder in folder:
        # Check if the current folder is a subfolder of the previous parent.
        if not current_folder.startswith(previous_parent):

            result.append(current_folder)

            # Update the previous parent to the current folder.
            previous_parent = current_folder + "/"

    return result

Big(O) Analysis

Time Complexity
O(n log n)Sorting the input folder paths, which contains n elements, takes O(n log n) time. Iterating through the sorted list of n folders takes O(n) time. Within the loop, the 'startsWith' operation on strings can take up to O(k) time in the worst case, where k is the length of the folder path. However, since the folders are already sorted, checking if a folder is a subfolder of the previous kept folder takes O(k) on average but we only perform a startsWith comparison once per element. Since sorting dominates the linear scan and the startsWith checks, the overall time complexity is O(n log n).
Space Complexity
O(N)The algorithm sorts the input list of folder paths. Python's `sort()` method typically uses Timsort, which has a space complexity of O(N) in the worst case, where N is the number of folder paths. Additionally, the algorithm constructs a list to store the results (the kept folder paths). In the worst-case scenario, all folder paths are unique parent folders, and thus the result list can also grow to a size of N. The space used by the sorted list and the output list dominates the space complexity. Therefore, the overall auxiliary space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty input listReturn an empty list as there are no folders to process.
List with a single folderReturn the list as is, since a single folder cannot be a sub-folder of another.
List with identical folder pathsThe solution should return the duplicate paths only once.
Folder paths with trailing slashesThe solution should normalize the paths by removing trailing slashes before comparison.
Folder paths that are prefixes of each other but not direct subfolders (e.g., 'a/b' and 'a/bc')The algorithm must only identify as sub-folders those paths that begin with another folder's path + '/'.
Very long folder path stringsThe solution should handle potentially long strings without causing performance issues or memory overflow, using efficient string comparison methods.
Folder paths containing special characters or unicode charactersThe algorithm should correctly handle special characters and Unicode characters within the folder paths by using appropriate string comparison functions.
A folder being a subfolder of multiple other folders.The subfolder should be removed regardless of how many root folders include it.