Taro Logo

Number of Good Ways to Split a String

Medium
Meta logo
Meta
3 views
Topics:
StringsArraysTwo Pointers

You are given a string s.

A split is called good if you can split s into two non-empty strings s_left and s_right where their concatenation is equal to s (i.e., s_left + s_right = s) and the number of distinct letters in s_left and s_right is the same.

Return the number of good splits you can make in s.

For example:

s = "aacaba"

Output: 2

Explanation: There are 5 ways to split "aacaba" and 2 of them are good.

  • ("a", "acaba") Left string and right string contains 1 and 3 different letters respectively.
  • ("aa", "caba") Left string and right string contains 1 and 3 different letters respectively.
  • ("aac", "aba") Left string and right string contains 2 and 2 different letters respectively (good split).
  • ("aaca", "ba") Left string and right string contains 2 and 2 different letters respectively (good split).
  • ("aacab", "a") Left string and right string contains 3 and 1 different letters respectively.

As another example:

s = "abcd"

Output: 1

Explanation: Split the string as follows ("ab", "cd").

Write a function to determine the number of good splits in a given string s.

Solution


Good Splits in a String

Problem Description

You are given a string s. A split is called good if you can split s into two non-empty strings s_left and s_right where their concatenation is equal to s (i.e., s_left + s_right = s) and the number of distinct letters in s_left and s_right is the same. Return the number of good splits you can make in s.

Naive Solution

A straightforward, brute-force approach would involve iterating through all possible split points in the string s. For each split, we determine the number of distinct characters in both the left and right substrings and then compare those counts.

Code

def count_good_splits_naive(s):
    n = len(s)
    count = 0
    for i in range(1, n):
        left = s[:i]
        right = s[i:]
        
        distinct_left = len(set(left))
        distinct_right = len(set(right))
        
        if distinct_left == distinct_right:
            count += 1
    return count

Time Complexity

  • O(n^2): The outer loop runs n-1 times. In each iteration, constructing the substrings left and right takes O(n) time, and creating sets takes O(n) as well.

Space Complexity

  • O(n): Space is used for the left and right substring.

Optimal Solution

To improve efficiency, we can precompute the number of distinct characters for all prefixes and suffixes of the string. We can use two arrays, left_distinct and right_distinct, to store these values. Then, we can iterate through the string once to count the number of good splits based on these precomputed values.

Code

def count_good_splits_optimal(s):
    n = len(s)
    left_distinct = [0] * n
    right_distinct = [0] * n
    
    left_count = set()
    for i in range(n):
        left_count.add(s[i])
        left_distinct[i] = len(left_count)
        
    right_count = set()
    for i in range(n - 1, -1, -1):
        right_count.add(s[i])
        right_distinct[i] = len(right_count)
        
    count = 0
    for i in range(n - 1):
        if left_distinct[i] == right_distinct[i + 1]:
            count += 1
            
    return count

Time Complexity

  • O(n): We iterate through the string three times (twice to precompute distinct character counts and once to count good splits), making the overall time complexity linear.

Space Complexity

  • O(n): We use two arrays of size n to store the distinct character counts for prefixes and suffixes.

Edge Cases

  • Empty string: Although the problem states the string length is always greater than or equal to 1, it's good practice to consider such cases in an interview. You can either return 0 directly or raise an exception.
  • String with only one character: There are no possible splits.
  • String with all same characters: Number of good splits would depend on distinct characters.

Summary

The optimal solution provides a significant improvement in time complexity by precomputing distinct character counts, reducing the time complexity from quadratic to linear. This approach also makes the code more readable and maintainable.