Alien Dictionary

Hard

Google

1 view

Topics:

GraphsStrings

You are given a sorted dictionary (a list of words) of an alien language. In this language, the ordering among letters is unknown. You need to determine the order of the letters in the alien language based on the given dictionary. The dictionary is sorted lexicographically according to the rules of the alien language.

For example:

Given the following words:

[
  "wrt",
  "wrf",
  "er",
  "ett",
  "rftt"
]

The correct order is "wertf".

Another example:

[
  "z",
  "x"
]

The correct order is "zx".

Constraints:

The input is a list of lowercase English words.
If the order is invalid (i.e., there is a cycle), return an empty string.
The characters must be uniquely ordered.

Write a function alienOrder(words) that takes a list of strings words as input and returns a string representing the order of the letters in the alien language. If no valid order exists, return an empty string.

Alien Dictionary

Hard

Google

1 view

Topics:

GraphsStrings

For example:

Given the following words:

[
  "wrt",
  "wrf",
  "er",
  "ett",
  "rftt"
]

The correct order is "wertf".

Another example:

[
  "z",
  "x"
]

The correct order is "zx".

Constraints:

The input is a list of lowercase English words.
If the order is invalid (i.e., there is a cycle), return an empty string.
The characters must be uniquely ordered.

Solution

Let's tackle the "Alien Dictionary" problem. We're given a list of words from an alien language, and our goal is to deduce the order of the letters in this language. A naive approach would involve comparing all pairs of words to find letter orderings, but this can be quite inefficient.

Naive Approach

Compare all word pairs: Iterate through all possible pairs of words in the input list.
Find the first differing character: For each pair, compare characters at the same index until the first difference is found. This reveals an ordering (e.g., if "abc" and "abd" are compared, we learn 'c' comes before 'd').
Build a graph: Use the discovered orderings to build a graph where nodes are letters and edges represent precedence.
Detect Cycles: After constructing the graph, we have to perform cycle detection because cyclic dependencies will make it impossible to return the topological sort. We can use Depth-First Search(DFS) to detect cycles.
Topological Sort: Perform a topological sort on the graph. If a cycle exists, return an empty string; otherwise, return the sorted letters. If there is cycle, it's not a valid dictionary.

This approach is inefficient because it involves redundant comparisons and can have a high time complexity.

Optimal Solution

A more efficient solution involves using a graph and topological sorting. This approach focuses on extracting character orderings directly from adjacent words in the input list and utilizes topological sorting to create the final alphabet order.

Build the Graph: Construct a directed graph where nodes represent unique characters and edges represent precedence. We get precedence by comparing adjacent words. For example, if we have words = ["wrt", "wrf", "er", "ett", "rftt"], comparing "wrt" and "wrf" tells us 't' comes before 'f'.
Calculate In-Degrees: Calculate in-degrees of all nodes (characters). In-degree of a node is the number of incoming edges to that node.
Topological Sort: Use a queue to perform a topological sort. Start by adding all nodes with an in-degree of 0 to the queue.
Process Queue: While the queue is not empty:
- Dequeue a node and append it to the result.
- For each neighbor of the dequeued node, decrease its in-degree by 1. If the in-degree becomes 0, enqueue the neighbor.
Cycle Detection: If the length of the result is not equal to the number of unique characters, it means there's a cycle in the graph, and no valid order exists. Return an empty string.

Here's the Python code:

from collections import defaultdict, deque

def alienOrder(words):
    graph = defaultdict(list)
    in_degree = {}
    unique_chars = set()
    
    for word in words:
        for char in word:
            in_degree[char] = 0
            unique_chars.add(char)
            
    for i in range(len(words) - 1):
        word1, word2 = words[i], words[i+1]
        min_len = min(len(word1), len(word2))
        for j in range(min_len):
            if word1[j] != word2[j]:
                if word2[j] not in graph[word1[j]]:
                    graph[word1[j]].append(word2[j])
                    in_degree[word2[j]] += 1
                break
        else:
            if len(word1) > len(word2):
                return "" # Invalid order
    
    queue = deque([char for char in in_degree if in_degree[char] == 0])
    result = []
    
    while queue:
        char = queue.popleft()
        result.append(char)
        for neighbor in graph[char]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    
    if len(result) != len(unique_chars):
        return "" # Cycle detected
    
    return "".join(result)

Complexity Analysis

Time Complexity: O(V + E), where V is the number of unique characters and E is the number of orderings (edges in the graph). Building the graph and performing topological sort both take O(V + E) time.
Space Complexity: O(V + E) to store the graph and in-degree counts.

Edge Cases

Empty input: If the input list of words is empty, the output should be an empty string or handle as defined by the problem description. In most cases, returning an empty string is reasonable.
Single word: If the input contains only one word, the output should be the unique characters in that word in their order of appearance.
Inconsistent order: If the input words define an impossible order (e.g., a cycle exists in the character dependencies), the algorithm should return an empty string.
Prefix word: If one word is a prefix of another and appears later in the input list (e.g. ["abc", "ab"]), it represents an invalid ordering, and the algorithm should return an empty string.
Duplicate Characters: The code efficiently handles duplicate characters by using sets and dictionaries to keep track of unique characters and dependencies.

Solution

Naive Approach

Compare all word pairs: Iterate through all possible pairs of words in the input list.
Find the first differing character: For each pair, compare characters at the same index until the first difference is found. This reveals an ordering (e.g., if "abc" and "abd" are compared, we learn 'c' comes before 'd').
Build a graph: Use the discovered orderings to build a graph where nodes are letters and edges represent precedence.
Detect Cycles: After constructing the graph, we have to perform cycle detection because cyclic dependencies will make it impossible to return the topological sort. We can use Depth-First Search(DFS) to detect cycles.
Topological Sort: Perform a topological sort on the graph. If a cycle exists, return an empty string; otherwise, return the sorted letters. If there is cycle, it's not a valid dictionary.

This approach is inefficient because it involves redundant comparisons and can have a high time complexity.

Optimal Solution

Build the Graph: Construct a directed graph where nodes represent unique characters and edges represent precedence. We get precedence by comparing adjacent words. For example, if we have words = ["wrt", "wrf", "er", "ett", "rftt"], comparing "wrt" and "wrf" tells us 't' comes before 'f'.
Calculate In-Degrees: Calculate in-degrees of all nodes (characters). In-degree of a node is the number of incoming edges to that node.
Topological Sort: Use a queue to perform a topological sort. Start by adding all nodes with an in-degree of 0 to the queue.
Process Queue: While the queue is not empty:
- Dequeue a node and append it to the result.
- For each neighbor of the dequeued node, decrease its in-degree by 1. If the in-degree becomes 0, enqueue the neighbor.
Cycle Detection: If the length of the result is not equal to the number of unique characters, it means there's a cycle in the graph, and no valid order exists. Return an empty string.

Here's the Python code:

from collections import defaultdict, deque

def alienOrder(words):
    graph = defaultdict(list)
    in_degree = {}
    unique_chars = set()
    
    for word in words:
        for char in word:
            in_degree[char] = 0
            unique_chars.add(char)
            
    for i in range(len(words) - 1):
        word1, word2 = words[i], words[i+1]
        min_len = min(len(word1), len(word2))
        for j in range(min_len):
            if word1[j] != word2[j]:
                if word2[j] not in graph[word1[j]]:
                    graph[word1[j]].append(word2[j])
                    in_degree[word2[j]] += 1
                break
        else:
            if len(word1) > len(word2):
                return "" # Invalid order
    
    queue = deque([char for char in in_degree if in_degree[char] == 0])
    result = []
    
    while queue:
        char = queue.popleft()
        result.append(char)
        for neighbor in graph[char]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    
    if len(result) != len(unique_chars):
        return "" # Cycle detected
    
    return "".join(result)

Complexity Analysis

Time Complexity: O(V + E), where V is the number of unique characters and E is the number of orderings (edges in the graph). Building the graph and performing topological sort both take O(V + E) time.
Space Complexity: O(V + E) to store the graph and in-degree counts.

Edge Cases

Empty input: If the input list of words is empty, the output should be an empty string or handle as defined by the problem description. In most cases, returning an empty string is reasonable.
Single word: If the input contains only one word, the output should be the unique characters in that word in their order of appearance.
Inconsistent order: If the input words define an impossible order (e.g., a cycle exists in the character dependencies), the algorithm should return an empty string.
Prefix word: If one word is a prefix of another and appears later in the input list (e.g. ["abc", "ab"]), it represents an invalid ordering, and the algorithm should return an empty string.
Duplicate Characters: The code efficiently handles duplicate characters by using sets and dictionaries to keep track of unique characters and dependencies.