Taro Logo

Alien Dictionary

Hard
Amazon logo
Amazon
5 views
Topics:
GraphsStrings

You are given a dictionary of words where each word is a valid word in some alien language. The alphabet used in the alien language is unknown, but the order of characters within each word is valid according to the alien language's rules. The words in the dictionary are sorted lexicographically according to the rules of the alien language.

Your Task:

Determine the order of the characters in the alien alphabet based on the given dictionary of words. If the given words result in an invalid order (e.g., due to contradictions or cycles), return an empty string.

Example 1:

Input: words = ["wrt", "wrf", "er", "ett", "rftt"]

Output: "wertf"

Explanation:

From "wrt" and "wrf", we can determine that t comes before f. From "wrt" and "er", we can determine that w comes before e. From "er" and "ett", we can determine that r comes before t. Thus, one possible order is "wertf".

Example 2:

Input: words = ["z", "x"]

Output: "zx"

Example 3:

Input: words = ["z", "x", "z"]

Output: ""

Explanation: The order "z" then "x" then "z" implies there is no valid ordering.

Constraints:

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 100
  • words[i] consists of lowercase English letters.

Solution


Alien Dictionary

Naive Approach

A brute-force approach would involve generating all possible permutations of the characters present in the given list of words and then checking if any of these permutations satisfy the given ordering of the words. This is highly inefficient due to the factorial time complexity of generating permutations. Moreover, validating each permutation against the given word list also takes time. Therefore, this approach is not practical.

Optimal Solution

The problem can be modeled as a directed graph where each character is a node, and an edge from character a to character b exists if a comes before b in the alien alphabet. The ordering can be determined by performing a topological sort on this graph. The topological sort provides a linear ordering of the vertices such that for every directed edge u -> v, vertex u comes before vertex v in the ordering.

Here's the algorithm:

  1. Build the Graph: Iterate through the given list of words and compare adjacent words to determine the order of characters. If the first differing character in two adjacent words is a and b, then add an edge from a to b in the graph.
  2. Calculate In-degrees: Calculate the in-degree of each node (character), which is the number of incoming edges.
  3. Topological Sort: Use a queue to perform a topological sort. Add all nodes with an in-degree of 0 to the queue. While the queue is not empty, remove a node from the queue, add it to the result, and decrement the in-degree of all its neighbors. If any neighbor's in-degree becomes 0, add it to the queue.
  4. Cycle Detection: If the length of the result is not equal to the number of unique characters, then there is a cycle in the graph, and no valid ordering exists.

Edge Cases

  • Empty Input: If the input list of words is empty, return an empty string.
  • Single Word: If the input list contains only one word, the alien alphabet consists of the unique characters in that word, and their order doesn't matter, so return any permutation of those characters. Ordering them lexicographically is an acceptable solution.
  • Inconsistent Ordering: If the input words provide an inconsistent ordering (i.e., there is a cycle in the character dependencies), return an empty string.
  • Prefix Words: Handle the case where one word is a prefix of another. If a shorter word is a prefix of a longer word, the ordering is valid. If the longer word is a prefix of the shorter word, the ordering is invalid.

Code Implementation (Python)

from collections import defaultdict, deque

def alienOrder(words):
    graph = defaultdict(list)
    in_degree = {}
    unique_chars = set()

    # Initialize in_degree and unique characters
    for word in words:
        for char in word:
            in_degree[char] = 0
            unique_chars.add(char)

    # Build the graph and update in_degree
    for i in range(len(words) - 1):
        word1, word2 = words[i], words[i + 1]
        min_len = min(len(word1), len(word2))
        for j in range(min_len):
            if word1[j] != word2[j]:
                if word2[j] not in graph[word1[j]]:
                    graph[word1[j]].append(word2[j])
                    in_degree[word2[j]] += 1
                break # Important: Only the first diff matters
        else:
            # Check for invalid order (e.g., "abc", "ab")
            if len(word1) > len(word2):
                return ""

    # Topological Sort
    queue = deque([char for char in in_degree if in_degree[char] == 0])
    result = []

    while queue:
        char = queue.popleft()
        result.append(char)

        for neighbor in graph[char]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    # Cycle detection
    if len(result) != len(in_degree):
        return ""

    return "".join(result)

Time and Space Complexity

  • Time Complexity: O(V + E), where V is the number of unique characters (vertices) and E is the number of character ordering pairs (edges). Specifically, building the graph takes O(N * L) in the worst case where N is the number of words and L is the maximum length of word in words. Topological sort takes O(V+E). Thus O(NL + V + E). In most practical situations N*L > V and N*L > E so the complexity is considered O(NL).
  • Space Complexity: O(V + E), where V is the number of unique characters, and E is the number of edges, to store the graph and in-degree counts. Specifically, the hash map in_degree and the adjacency list graph will take O(V + E).