Taro Logo

Number of Distinct Islands

Medium
Meta logo
Meta
1 view
Topics:
GraphsArraysRecursion

You are given a 2D grid represented by a matrix where 1 represents land and 0 represents water. An island is a group of connected 1s surrounded by water. Two islands are considered the same if and only if one can be translated (shifted without rotation or reflection) to equal the other. Your task is to find the number of distinct islands.

For example:

Example 1:

Input:
grid = [
  [1,1,0,0,0],
  [1,1,0,0,0],
  [0,0,0,1,1],
  [0,0,0,1,1]
]
Output: 1

The two islands are considered the same because the second island is just a translation of the first.

Example 2:

Input:
grid = [
  [1,1,0,1,1],
  [1,0,0,0,0],
  [0,0,0,0,1],
  [1,1,0,1,1]
]
Output: 3

Explain how you would approach solving this problem efficiently. What data structures and algorithms would you use? How would you handle edge cases such as an empty grid, a grid with no islands, or islands touching only diagonally? Provide a code example illustrating your approach. What is the time and space complexity of your solution?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the dimensions of the grid, and what's the maximum size of the grid I should expect?
  2. Is the grid guaranteed to only contain 0s and 1s, or could there be other values?
  3. What defines two islands as 'distinct'? Is it their shape or their position in the grid?
  4. If the grid is empty or contains no islands, what should I return?
  5. Are the islands guaranteed to be fully connected (i.e., no diagonal connections are allowed)?

Brute Force Solution

Approach

We want to find the number of different island shapes in a grid. The brute force way is to explore each piece of land on the grid, recording the shape of the island it belongs to, and then comparing all the island shapes to see how many are truly distinct.

Here's how the algorithm would work step-by-step:

  1. Go through the entire grid, one square at a time.
  2. If you find a piece of land, explore the entire island it's connected to.
  3. As you explore the island, record its shape by noting the path you take to visit each piece of land from the starting point.
  4. Once you've explored the entire island, you have a 'shape' represented by the path you took.
  5. Store this island shape in a list.
  6. Repeat steps 2-5 for every piece of land you find in the grid.
  7. Finally, compare each stored island shape to every other shape in the list to see if they are the same. If two shapes are the same, they're the same island shape, even if they're in different locations.
  8. Count how many unique island shapes you found. This is the total number of distinct islands.

Code Implementation

def number_of_distinct_islands(grid):
    number_of_rows = len(grid)
    number_of_columns = len(grid[0])
    island_shapes = set()

    def explore_island(row, column, path, start_row, start_column):
        if row < 0 or row >= number_of_rows or column < 0 or column >= number_of_columns or grid[row][column] == 0:
            return

        grid[row][column] = 0

        # Record the path relative to the starting point
        path.append((row - start_row, column - start_column))

        explore_island(row + 1, column, path, start_row, start_column)
        explore_island(row - 1, column, path, start_row, start_column)
        explore_island(row, column + 1, path, start_row, start_column)
        explore_island(row, column - 1, path, start_row, start_column)

    for row in range(number_of_rows):
        for column in range(number_of_columns):
            if grid[row][column] == 1:
                island_path = []

                # We need to capture the shape of this specific island.
                explore_island(row, column, island_path, row, column)

                island_shapes.add(tuple(island_path))

    return len(island_shapes)

Big(O) Analysis

Time Complexity
O(M * N * log(I))Let M be the number of rows and N be the number of columns in the grid. The outer loops iterate through each cell of the grid, which takes O(M * N) time. For each land cell, we explore the entire island using Depth-First Search (DFS) to record its shape. In the worst case, the entire grid is a single island, so the DFS could take O(M * N) time. Furthermore, we need to compare the shape of each discovered island with all previously discovered islands to check for uniqueness. In the worst case, we might have I distinct islands (where I <= M*N), and each shape comparison can take up to O(M * N) (or less if comparing based on hashes, approximately log(I) comparing sets if we assume the islands are sets). Therefore, finding distinct islands will be O(I * M * N) if linearly comparing. Assuming the islands are stored as sets and checking these sets for equality, or if we hash these shapes, the island comparison step can be improved to approximately log(I), so the overall complexity becomes O(M * N * log(I)).
Space Complexity
O(M*N)The algorithm uses extra space to store the island shape as a path and a set to keep track of the distinct island shapes. The path, in the worst-case scenario, could visit every cell in the grid (M rows and N columns) if the entire grid is one island, leading to a path of length M*N. The set to store distinct island shapes, in the worst case, might also need to store up to M*N distinct island shapes if each land cell constitutes a distinct island. Thus, the auxiliary space complexity is O(M*N). Note that recursion depth for DFS is also bounded by O(M*N) in worst case, which contributes to the overall space complexity.

Optimal Solution

Approach

The goal is to identify unique island shapes within a grid. We can find all connected land pieces and represent the shape of each island as a unique string representing movements to explore it. We use this string as a way of 'fingerprinting' the island, and count how many unique fingerprints we find.

Here's how the algorithm would work step-by-step:

  1. Look at each piece of land on the map.
  2. When you find a new piece of land, explore the entire island it belongs to.
  3. While exploring an island, record your path as a series of movements: up, down, left, right.
  4. Combine those movements into a single string to represent the unique shape of the island.
  5. Store that shape in a collection of island shapes you've already found.
  6. If the shape is new, add it to the collection; if you have seen it before, ignore it.
  7. Repeat this process for every piece of land on the map.
  8. The number of unique shapes in your collection is the number of distinct islands.

Code Implementation

def number_of_distinct_islands(grid):
    if not grid:
        return 0

    number_of_rows = len(grid)
    number_of_columns = len(grid[0])
    distinct_islands = set()

    def explore_island(row, column, path):
        if row < 0 or row >= number_of_rows or column < 0 or column >= number_of_columns or grid[row][column] == 0:
            return

        grid[row][column] = 0  # Mark as visited to avoid cycles.
        
        path.append('D')
        explore_island(row + 1, column, path)
        
        path.append('U')
        explore_island(row - 1, column, path)
        
        path.append('R')
        explore_island(row, column + 1, path)
        
        path.append('L')
        explore_island(row, column - 1, path)
        
        path.append('B') #Backtrack

    for row in range(number_of_rows):
        for column in range(number_of_columns):
            if grid[row][column] == 1:
                # Found a new island, explore it.
                island_path = []
                explore_island(row, column, island_path)

                # Generate a unique fingerprint for the island.
                island_string = ''.join(island_path)
                
                #Only add the island if it is unique
                if island_string not in distinct_islands:
                    distinct_islands.add(island_string)

    #The total count of the distinct islands are returned
    return len(distinct_islands)

Big(O) Analysis

Time Complexity
O(m*n)We iterate through each cell of the grid, which has dimensions m x n. For each land cell (value of 1), we perform a Depth First Search (DFS) to explore the entire island. In the worst case, the entire grid is a single island, so the DFS could potentially visit every cell in the grid once. Therefore, the time complexity is driven by visiting each of the m*n cells possibly multiple times. Thus the overall time complexity is O(m*n).
Space Complexity
O(M*N)The space complexity is determined by the space used by the recursion stack during the island exploration and the space used to store the island shapes. In the worst-case scenario, the entire grid is a single island, and the depth-first search (DFS) could traverse all M*N cells, leading to a recursion stack of size O(M*N). Additionally, we store the island shapes as strings in a set. In the worst case, each distinct island could also have a size of O(M*N), and we might need to store a number of such islands. However, the recursion stack during DFS traversal dominates the space complexity. Therefore, the overall auxiliary space complexity is O(M*N), where M is the number of rows and N is the number of columns in the grid.

Edge Cases

CaseHow to Handle
Null or empty gridReturn 0 as there are no islands in an empty grid
Grid with all water (all 0s)Return 0 as there are no islands to count
Grid with all land (all 1s)Return 1 if all cells are connected forming only one island shape
Grid with a single '1' surrounded by '0'sThe algorithm correctly identifies this as a single island
Large grid size, potential stack overflow with recursionUse iterative DFS (using a stack data structure) instead of recursive DFS to prevent stack overflow errors
Grid containing disconnected islands with identical shapesThe shape comparison logic will correctly identify these as duplicates.
Grid with islands touching at corners only (not considered connected)Ensure the DFS only considers adjacent (up, down, left, right) cells, not diagonal ones.
Integer overflow in grid dimensions (rows * cols)Check dimensions for reasonableness to avoid potential integer overflow when used in calculations