Taro Logo

Alien Dictionary

Hard
Microsoft logo
Microsoft
0 views
Topics:
GraphsStringsArrays

You are given a sorted dictionary of an alien language. The dictionary is represented as an array of strings, where each string is a word in the alien language. The words in the dictionary are sorted lexicographically according to the rules of the alien language. Your task is to construct the order of the alphabet in the alien language, returning a string representing that order.

For example:

words = ["wrt", "wrf", "er", "ett", "rftt"]

In this example, the correct order should be "wertf". Based on the given words, you can derive the following order:

  • 'w' comes before 'e'
  • 'e' comes before 'r'
  • 'r' comes before 't'
  • 't' comes before 'f'

Therefore, the lexicographical order is "wertf".

Another example:

words = ["z", "x"]

In this case, the order is "zx".

Consider these constraints:

  • The length of the words array is between 1 and 100.
  • The length of each word is between 1 and 20.
  • The words consist of only lowercase English letters.

Develop an algorithm to determine the order of the alien alphabet. If no valid order exists (e.g., due to contradictory information or cycles), return an empty string. Make sure to consider edge cases like an empty input list, or cases where the order cannot be determined due to insufficient information.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What characters are allowed in the words? Is it only lowercase English letters, or can there be other characters?
  2. If the input words do not define a valid alien alphabet (e.g., contain contradictory orderings), what should I return? Should I return an empty string or throw an error?
  3. Is it guaranteed that the input list of words is non-empty? If the input is empty, what should be the expected output?
  4. If there are multiple valid alien alphabets, is any valid ordering acceptable, or is there a specific ordering that should be returned?
  5. Can a character appear in the alphabet without appearing in the input words, and should these unmentioned letters be included in the output alphabet?

Brute Force Solution

Approach

The alien dictionary problem asks us to figure out the order of letters in an alien language based on a list of words. A brute force approach involves trying out every possible order of letters and checking if that order is consistent with the given list of words.

Here's how the algorithm would work step-by-step:

  1. First, list all the unique letters that appear in the alien words.
  2. Then, imagine creating every possible ordering of these letters. This is like shuffling a deck of cards in every conceivable sequence.
  3. For each of these possible letter orderings, go through the given list of alien words and check if they are in the correct order according to your current letter ordering.
  4. To check the order, compare adjacent words in the list. The first different letter between the two words tells you the order of those letters in the alien alphabet. See if this agrees with the current letter ordering you are testing.
  5. If any pair of words is out of order according to your tested letter ordering, reject that letter ordering and move on to the next possible ordering.
  6. If you find a letter ordering that makes all the words appear in the correct order, you've found a valid alien alphabet order.
  7. If you go through all possible letter orderings and none of them work, it means there's no valid alien alphabet based on the input.

Code Implementation

import itertools

def alien_dictionary_brute_force(words):
    unique_characters = set(''.join(words))
    possible_orders = list(itertools.permutations(unique_characters))

    for alphabet_order in possible_orders:
        alphabet_map = {character: index for index, character in enumerate(alphabet_order)}
        is_valid_order = True

        # Compare adjacent words to determine validity
        for i in range(len(words) - 1):
            word1 = words[i]
            word2 = words[i + 1]
            min_length = min(len(word1), len(word2))

            for j in range(min_length):
                if word1[j] != word2[j]:
                    # Found a difference, check alphabet order
                    if alphabet_map[word1[j]] > alphabet_map[word2[j]]:
                        is_valid_order = False
                        break
                    else:
                        break
            else:
                # If word1 is longer than word2 and all preceding chars are same, invalid
                if len(word1) > len(word2):
                    is_valid_order = False

            if not is_valid_order:
                break

        # If a valid order is found, return letters in the order
        if is_valid_order:
            return ''.join(alphabet_order)

    return ""

Big(O) Analysis

Time Complexity
O(N! * M * L)The brute force approach first identifies all unique characters, say K. It then generates all possible permutations of these characters, which takes O(K!) time. For each permutation, it iterates through the list of N words. For each adjacent pair of words, it compares them up to length L (the maximum length of a word), to determine if the current permutation is a valid ordering. Thus, each validation takes O(N * L) time. Therefore, the overall time complexity is O(K! * N * L). Since K can be at most the number of words N, and assuming K is closer to N than a constant, we can write the overall time complexity as O(N! * N * L). We can simplify this to O(N! * M * L), where M is the number of words.
Space Complexity
O(N!)The brute force approach described involves generating every possible ordering of the unique letters. The number of these orderings is N! (N factorial), where N is the number of unique letters. Since each ordering needs to be stored (implicitly or explicitly) before being tested, the space required scales with the number of permutations that needs to be stored for example in a list. Therefore, the space complexity is O(N!).

Optimal Solution

Approach

The alien dictionary problem is about finding the order of letters in an alien language, given a sorted list of words in that language. The optimal strategy involves building a graph representing letter dependencies and then using that graph to determine the order of letters.

Here's how the algorithm would work step-by-step:

  1. First, look at pairs of adjacent words to find the order of letters. If two words share a prefix but one is shorter, that tells us something is wrong, so handle that edge case.
  2. Compare the first differing letters of each pair of adjacent words. The letter in the first word comes before the letter in the second word in the alien alphabet.
  3. Represent these relationships as a graph. Letters are nodes, and the 'comes before' relationships are directed edges.
  4. Count how many edges point to each letter, which gives you the 'in-degree' of each node. This tells you how many letters must come before a given letter.
  5. Start with the letters that have no incoming edges (in-degree of zero). These are the starting points of the alphabet.
  6. Pick any letter with no incoming edges, add it to your ordered alphabet, then remove all its outgoing edges.
  7. Removing edges might create new letters with no incoming edges. Keep picking letters with no incoming edges and adding them to the alphabet until all letters are processed.
  8. If you can't process all letters (meaning there's a cycle in the graph), it means the input is invalid and there's no valid alphabet.
  9. The final order of the letters you picked is the order of the alien alphabet.

Code Implementation

def alien_order(words):
    # Build the graph of letter dependencies.
    adjacency_list = {character: [] for word in words for character in word}
    in_degree = {character: 0 for character in adjacency_list}

    for i in range(len(words) - 1):
        word1 = words[i]
        word2 = words[i + 1]
        min_length = min(len(word1), len(word2))
        if len(word1) > len(word2) and word1[:min_length] == word2:
            return ""

        for j in range(min_length):
            if word1[j] != word2[j]:
                if word2[j] not in adjacency_list[word1[j]]:
                    adjacency_list[word1[j]].append(word2[j])
                    in_degree[word2[j]] += 1
                break

    # Add nodes with in_degree of zero to the queue
    queue = [character for character in in_degree if in_degree[character] == 0]
    result = []

    while queue:
        character = queue.pop(0)
        result.append(character)

        # Iterate through neighbors
        for neighbor in adjacency_list[character]:
            in_degree[neighbor] -= 1

            # If in-degree becomes 0, add to queue
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    # Check for cycles in the graph.
    if len(result) != len(in_degree):
        return ""
    
    return "".join(result)

Big(O) Analysis

Time Complexity
O(C + W)The time complexity is determined by two main components: building the graph and performing the topological sort. Building the graph involves iterating through the list of words (W words) and comparing adjacent words, potentially character by character (C total unique characters/letters). The topological sort visits each node (character) and edge once. The number of nodes is bounded by C (unique characters), and the number of edges is related to the comparisons made during graph construction, also bounded by comparisons of characters in words W. Therefore, the overall time complexity is O(C + W), where C represents the total number of unique characters and W is the total number of words. In the worst case, if all words contain unique characters, the complexity approaches O(W) or O(C), whichever is greater.
Space Complexity
O(1)The auxiliary space is primarily determined by the graph representation and the in-degree counts. The graph, at most, can contain all unique characters from the input words as nodes and edges representing the order between them. While the number of unique characters is technically dependent on the input words, it is bounded by a constant value representing the size of the alphabet. The in-degree count also needs to keep track of the in-degrees of the alphabets and this is also bounded. Therefore, both the graph and the in-degree counts' space requirement remains constant irrespective of the input size of words.

Edge Cases

CaseHow to Handle
Empty input list of wordsReturn an empty string since no order can be determined.
List with a single wordReturn the unique characters in the single word, preserving order.
Words with no discernible order (e.g., ['abc', 'bac'])The algorithm should detect a cycle in the graph and return an empty string.
Conflicting order (e.g., ['ab', 'ba']) leading to a cycleThe algorithm should detect a cycle in the graph and return an empty string.
Words contain characters outside the lowercase English alphabetThrow an exception or filter out invalid characters to prevent errors and ensure correct ordering is determined from only valid alphabet characters.
Maximum length of the input list leading to high memory usage for adjacency listEnsure that the adjacency list implementation scales efficiently to avoid exceeding memory limits by considering using appropriate data structures and optimize memory allocation.
A character only appears at the end of the words and does not have dependencyMake sure topological sort still includes character without any dependencies.
Inconsistent ordering: 'abc', 'ab'This implies 'c' < '', which is invalid and constitutes a cycle, so return an empty string.