Taro Logo

Repeated String Match

Medium
Google logo
Google
1 view
Topics:
Strings

Given two strings a and b, return the minimum number of times you should repeat string a so that string b is a substring of it. If it is impossible for b to be a substring of a after repeating it, return -1. For example:

  • If a = "abcd", and b = "cdabcdab", the answer is 3 because repeating a three times gives "abcdabcdabcd", and b is a substring of this.
  • If a = "a", and b = "aa", the answer is 2.

Here are some constraints:

  • The lengths of a and b are between 1 and 10,000.
  • a and b consist of lowercase English letters.

Describe an algorithm to solve this problem efficiently. What is the time and space complexity of your solution?

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What are the maximum lengths of strings `a` and `b`? Are they guaranteed to be non-empty?
  2. If `b` cannot be found in any repetition of `a`, what value should I return? (e.g., -1, 0, null)?
  3. Is the matching of `b` in the repetition of `a` required to be a contiguous substring, or can it be composed of disjointed parts of `a`?
  4. Are `a` and `b` case-sensitive? (e.g., should "abc" match "ABC"?)
  5. Can I assume that `a` and `b` will only contain alphanumeric characters or might they contain other special characters or unicode characters?

Brute Force Solution

Approach

We're given two strings, call them string A and string B. The brute force approach involves repeatedly writing string A next to itself and then checking if string B appears anywhere within the combined string.

Here's how the algorithm would work step-by-step:

  1. Start by writing string A once.
  2. Check if string B is inside this single copy of string A.
  3. If not, write string A again, so now you have two copies of string A joined together.
  4. Check again if string B is inside this new, longer string.
  5. Keep repeating the process of adding another copy of string A and checking if string B is present.
  6. Stop when you either find string B inside the repeated string A, or when the repeated string A becomes too long, meaning it's impossible for string B to ever fit inside.

Code Implementation

def repeated_string_match_brute_force(string_a, string_b):
    repeated_string = ""
    repetitions = 0

    #Keep appending string_a until string_b is found or it's too large
    while len(repeated_string) < len(string_b) + 2 * len(string_a):
        repeated_string += string_a
        repetitions += 1

        #Check if string_b is a substring of the current repeated_string
        if string_b in repeated_string:
            return repetitions

    #If string_b is not found within a reasonable number of repetitions
    return -1

Big(O) Analysis

Time Complexity
O(m * n)The algorithm repeatedly concatenates string A to itself. In the worst case, we concatenate A multiple times until its length exceeds the length of B significantly. Let n be the length of string A and m be the length of string B. The contains check (is B a substring of the repeated A) typically takes O(m * k) time, where k is the length of the repeated A string. Since we repeat A until k is proportional to m+n in the worst case, and we perform the contains operation in each iteration of the loop, the worst-case time complexity becomes O(m * n) due to the string contains operation being performed in the while loop.
Space Complexity
O(N)The algorithm constructs a string by repeatedly concatenating string A. In the worst case, the repeated string A could grow to be larger than string B before either containing string B or determining that string B will never be present. The size of this repeated string is at most proportional to the length of string B's length (to ensure it's long enough to contain B), so if we consider the combined length of A and B to be N, then the repeated string has a maximum length proportional to N. Therefore, the auxiliary space is O(N) due to the growing string.

Optimal Solution

Approach

The goal is to determine how many times a given string A needs to be repeated so that another string B becomes a substring of the repeated A. We can avoid unnecessary repetitions by first figuring out the absolute minimum number of repetitions needed and then checking only a small range around that minimum.

Here's how the algorithm would work step-by-step:

  1. Figure out the minimum number of times you absolutely need to repeat string A to make it at least as long as string B. This is a lower bound.
  2. Check if string B is contained within this minimum repeated string A.
  3. If string B is not contained, repeat string A one more time and check again. This is because B might span across the boundary between two repetitions of A.
  4. If string B is still not contained after the second check, repeat string A yet one more time and check if string B is a substring. This is the maximum repetition number we need to check.
  5. If string B is still not contained, then it is impossible to make string B a substring by repeating A. Return -1.
  6. If string B is found at any step, return the corresponding number of repetitions.

Code Implementation

def repeatedStringMatch(string_a, string_b):
    minimum_repetitions = (len(string_b) + len(string_a) - 1) // len(string_a)

    repeated_string = string_a * minimum_repetitions

    #Check if B is a substring after repeating A
    if string_b in repeated_string:
        return minimum_repetitions

    repeated_string += string_a

    #Check if B is a substring after repeating A one more time
    if string_b in repeated_string:
        return minimum_repetitions + 1

    repeated_string += string_a
    # B might span across the boundary between repetitions of A
    if string_b in repeated_string:
        return minimum_repetitions + 2
    
    # If B is still not contained after the second check
    return -1

Big(O) Analysis

Time Complexity
O(m*n)The algorithm repeats string A a maximum of three times, creating a string whose length is roughly proportional to n (length of string A). The core operation is the substring check, performed by `contains()`, which in the worst case can take O(m*n) time where m is the length of string B and n is the length of the repeated string A. Since we do this at most three times, the overall time complexity remains O(m*n).
Space Complexity
O(N)The primary space consumption arises from constructing the repeated string A. In the worst-case scenario, A might be repeated up to ceil(len(B) / len(A)) + 2 times to check if B is a substring, resulting in a temporary string potentially three times the length of B. Thus the auxiliary space scales linearly with the length of string B, where N represents the length of string B. The space complexity is O(N).

Edge Cases

CaseHow to Handle
Empty string A or BIf either A or B is empty, return -1 since no repetition of A can contain B.
String B is much longer than string AThe solution needs to intelligently determine the maximum number of repetitions of A necessary to potentially contain B.
String A is a very long stringThe solution should avoid unnecessary string concatenations which could lead to memory issues or performance degradation.
String B cannot be formed by repeating A, regardless of how many times A is repeatedThe algorithm should detect this case and return -1 efficiently, without entering an infinite loop.
String A contains string B as a substringHandle the case where B is already a substring of A to immediately return 1.
Integer overflow when calculating the number of repetitionsCarefully consider the upper bounds of the number of repetitions to avoid integer overflow issues when calculating the required repetitions of A.
String A and B are identicalReturn 1 since B is already present in A after just one repetition.
String B consists of characters not present in String AIf B has characters not in A, return -1 after checking character set inclusion