Taro Logo

Is Subsequence

Easy
Apple logo
Apple
1 view
Topics:
StringsTwo PointersBinary Search

Given two strings s and t, return true if s is a subsequence of t, or false otherwise.

A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ace" is a subsequence of "abcde" while "aec" is not).

Example 1: Input: s = "abc", t = "ahbgdc" Output: true

Example 2: Input: s = "axc", t = "ahbgdc" Output: false

Follow up: Suppose there are lots of incoming s, say s1, s2, ..., sk where k >= 10^9, and you want to check one by one to see if t has its subsequence. In this scenario, how would you change your code?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. Can the input strings s and t be empty or null?
  2. Are s and t case-sensitive, or should I perform a case-insensitive comparison?
  3. What are the maximum lengths of strings s and t?
  4. If s is an empty string, should I return true or false?
  5. Is the problem only about determining existence of a subsequence, or are there other potential tasks or requirements to consider?

Brute Force Solution

Approach

The brute force method for determining if one string is a subsequence of another involves checking every possible way the first string could appear within the second. Essentially, we're trying all combinations to see if we can find the characters of the first string in the correct order within the second string.

Here's how the algorithm would work step-by-step:

  1. Begin by comparing the first character of the first string with the first character of the second string.
  2. If they match, move on to the second character of the first string and look for it in the remaining characters of the second string.
  3. If they don't match, skip the first character of the second string and compare the first character of the first string to the second character of the second string instead.
  4. Continue doing this, skipping characters in the second string until either a match is found or the end of the second string is reached.
  5. If, at any point, you reach the end of the second string but haven't found all the characters of the first string, then the first string is not a subsequence of the second.
  6. If you successfully find all characters of the first string in the correct order within the second string, then the first string is a subsequence of the second.

Code Implementation

def is_subsequence_brute_force(subsequence, main_string):
    subsequence_index = 0
    main_string_index = 0

    while subsequence_index < len(subsequence) and main_string_index < len(main_string):
        # If characters match, move to next char in subsequence
        if subsequence[subsequence_index] == main_string[main_string_index]:
            subsequence_index += 1

        # Always move to the next char in the main string
        main_string_index += 1

    # Check if we've reached the end of the subsequence
    if subsequence_index == len(subsequence):
        return True
    else:
        return False

Big(O) Analysis

Time Complexity
O(m*n)Let 's' be the length of the first string and 't' be the length of the second string. In the worst-case scenario, for each character in 's', we might have to iterate through almost all characters in 't' to find a match. This occurs when the characters in 's' are sparsely distributed in 't' or absent until the end. Therefore, the time complexity is proportional to the product of the lengths of the two strings, resulting in O(m*n) where m is the length of the first string, 's', and n is the length of the second string, 't'.
Space Complexity
O(1)The algorithm described in the plain English explanation primarily uses index manipulation for string traversal. No auxiliary data structures like lists, dictionaries, or arrays are created to store intermediate results. Therefore, only a constant amount of extra memory is used, irrespective of the lengths of the input strings, denoted as N. This constant space usage leads to a space complexity of O(1).

Optimal Solution

Approach

We want to check if one string appears in order within another. The best way to do this is to walk through both strings at the same time, looking for matching letters and advancing through both strings, or only the second string, as needed.

Here's how the algorithm would work step-by-step:

  1. Start at the beginning of both strings.
  2. Check if the current letter in the first string matches the current letter in the second string.
  3. If they match, move to the next letter in both strings. This means we found one letter of the first string in the correct order within the second string.
  4. If they don't match, only move to the next letter in the second string. We are looking for the current letter in the first string, so we check the next letter in the second string.
  5. Keep doing this until you reach the end of either string.
  6. If you reach the end of the first string, it means you've found all its letters in the second string in the correct order, so the first string is a subsequence of the second string.
  7. If you reach the end of the second string before reaching the end of the first string, it means you didn't find all the letters of the first string, so it's not a subsequence.

Code Implementation

def is_subsequence(subsequence_string, main_string):
    subsequence_pointer = 0
    main_string_pointer = 0

    while subsequence_pointer < len(subsequence_string) and \
            main_string_pointer < len(main_string):

        # If chars match, advance both pointers
        if subsequence_string[subsequence_pointer] == main_string[main_string_pointer]:

            subsequence_pointer += 1

        # Always advance main string pointer
        main_string_pointer += 1

    # If we reached the end of subsequence, it's a subsequence
    if subsequence_pointer == len(subsequence_string):

        return True

    return False

Big(O) Analysis

Time Complexity
O(n)We iterate through both strings, 's' and 't', using two pointers. In the worst case, we iterate through the entire length of 't' (let its length be n) trying to find the characters of 's'. The number of operations is bounded by the length of 't' since we only increment the pointer for 's' when a match is found. Therefore, the time complexity is directly proportional to the length of 't', which is O(n).
Space Complexity
O(1)The algorithm iterates through the strings using index variables for both strings (let's call them s_index and t_index). These index variables are the only extra memory used, and their number doesn't depend on the length of either string, which we can consider the input size 'N' (where N could be the combined length of both strings). Therefore, the space complexity is constant. The amount of auxiliary memory does not scale with the input.

Edge Cases

CaseHow to Handle
s is an empty stringAn empty string is always a subsequence of any other string, so return true.
t is an empty string but s is notIf s is not empty and t is empty, s cannot be a subsequence of t, so return false.
Both s and t are empty stringsAn empty string is a subsequence of another empty string, return true.
s is longer than tIf s is longer than t, s cannot be a subsequence of t, so return false.
s and t are identical stringsIdentical strings satisfy the subsequence condition, return true.
t is a very long string, but s is a short string at the very end of tThe iterative approach handles this by scanning t until the matching characters of s are found or t is exhausted.
s contains duplicate characters.The iterative approach handles this by searching through t for each character in s independently.
s contains characters not found in t.The algorithm will eventually fail to find the character in t, returning false.