Taro Logo

Longest Subsequence Repeated k Times

Hard
Meta logo
Meta
Topics:
StringsDynamic ProgrammingGreedy Algorithms

You are given a string s of length n, and an integer k. You are tasked to find the longest subsequence repeated k times in string s.

A subsequence is a string that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.

A subsequence seq is repeated k times in the string s if seq * k is a subsequence of s, where seq * k represents a string constructed by concatenating seq k times.

  • For example, "bba" is repeated 2 times in the string "bababcba", because the string "bbabba", constructed by concatenating "bba" 2 times, is a subsequence of the string "**b**a**bab**c**ba**".

Return the longest subsequence repeated k times in string s. If multiple such subsequences are found, return the lexicographically largest one. If there is no such subsequence, return an empty string.

For example:

s = "letsleetcode", k = 2
Output: "let"
Explanation: There are two longest subsequences repeated 2 times: "let" and "ete".
"let" is the lexicographically largest one.
s = "bb", k = 2
Output: "b"
Explanation: The longest subsequence repeated 2 times is "b".
s = "ab", k = 2
Output: ""
Explanation: There is no subsequence repeated 2 times. Empty string is returned.

Can you provide an efficient algorithm and code implementation to solve this problem? What is the time and space complexity of your solution?

Solution


Naive Approach

A brute-force approach would involve generating all possible subsequences of the input string s and then checking if each subsequence, when repeated k times, is also a subsequence of s. The longest lexicographically largest subsequence that meets this condition would be the answer.

Algorithm

  1. Generate all subsequences of s.
  2. For each subsequence seq, create a string repeated_seq by concatenating seq k times.
  3. Check if repeated_seq is a subsequence of s.
  4. Keep track of the longest subsequence that satisfies the condition and is lexicographically largest.

Code (Python)

def is_subsequence(s, t):
    i = 0
    j = 0
    while i < len(s) and j < len(t):
        if s[i] == t[j]:
            i += 1
        j += 1
    return i == len(s)

def longest_repeated_subsequence_naive(s, k):
    n = len(s)
    longest_subsequence = ""
    
    for i in range(2**n):
        subsequence = ""
        for j in range(n):
            if (i >> j) & 1:
                subsequence += s[j]
        
        repeated_subsequence = subsequence * k
        
        if is_subsequence(repeated_subsequence, s):
            if len(subsequence) > len(longest_subsequence) or \
               (len(subsequence) == len(longest_subsequence) and subsequence > longest_subsequence):
                longest_subsequence = subsequence
                
    return longest_subsequence

Time Complexity

  • Generating all subsequences takes O(2n) time.
  • Checking if a subsequence is repeated k times and is a subsequence of s takes O(k * len(subsequence) + n) in the worst case for each subsequence.
  • Overall, the time complexity is O(2n * (k * n + n)), which simplifies to O(2n * k * n).

Space Complexity

  • O(n) to store the subsequence and the repeated subsequence.

Optimal Approach: Dynamic Programming + Greedy

The optimal approach combines dynamic programming to precompute character positions and a greedy strategy to build the longest repeated subsequence.

Algorithm

  1. Precompute Character Positions: Create a list of lists pos where pos[c] stores the indices of character c in the string s. This allows for quick lookups of character positions during subsequence construction.
  2. Greedy Construction: Start with an empty subsequence. Iterate through characters 'z' to 'a' (lexicographically largest first). For each character c, try to append it to the current subsequence. Check if the new subsequence (repeated k times) is a subsequence of s. If it is, append c to the subsequence and continue. If not, move to the next character.
  3. Subsequence Check: To efficiently check if a subsequence seq * k is a subsequence of s, maintain an index pointer into s. For each character in seq * k, find its next occurrence in s starting from the current index. If any character cannot be found, the subsequence is invalid.

Code (Python)

def longest_repeated_subsequence(s, k):
    pos = {c: [] for c in 'abcdefghijklmnopqrstuvwxyz'}
    for i, c in enumerate(s):
        pos[c].append(i)

    res = ""
    idx = -1

    for char in sorted(pos.keys(), reverse=True):
        new_res = res + char
        cur_idx = -1
        possible = True

        for _ in range(k):
            for c in new_res:
                indices = pos[c]
                next_idx = next((i for i in indices if i > cur_idx), None)

                if next_idx is None:
                    possible = False
                    break
                cur_idx = next_idx
            if not possible:
                break

        if possible:
            res = new_res

    return res

Time Complexity

  • O(n) to create the pos dictionary.
  • O(26 * k * len(res) * n) in the worst case, where len(res) is the length of the result which is at most n. In the worst case, len(res) can be n.
  • Overall, the time complexity is approximately O(n * k * n) which simplifies to O(kn2).

Space Complexity

  • O(26 * n) to store the pos dictionary (at most 26 characters, each with at most n positions).
  • O(n) to store the resulting subsequence.
  • Overall, the space complexity is O(n).

Edge Cases

  1. Empty String: If the input string s is empty, the longest repeated subsequence is an empty string.
  2. No Repeated Subsequence: If there is no subsequence that can be repeated k times, the function should return an empty string. Example: `s =