Taro Logo

Number of Same-End Substrings

Medium
Asked by:
Profile picture
3 views
Topics:
Strings

Given a string s, find the number of substrings that start and end with the same character.

Example 1:

Input: s = "abcba"
Output: 7
Explanation: 
The substrings that start and end with the same character are: "a", "b", "c", "b", "a", "bcb", "abcba".

Example 2:

Input: s = "aba"
Output: 4
Explanation: 
The substrings that start and end with the same character are: "a", "b", "a", "aba".

Example 3:

Input: s = "aaaaa"
Output: 15

Constraints:

  • 1 <= s.length <= 105
  • s consists of lowercase English letters.

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What is the maximum length of the input string, and are there any memory constraints I should be aware of?
  2. Is the input string case-sensitive? Should 'a' be considered the same as 'A'?
  3. What should I return if the input string is null or empty?
  4. By 'same-end substrings,' do you mean substrings that start and end with the same character?
  5. Are overlapping substrings considered distinct? For example, in 'abababa', are 'a', 'aba', and 'ababa' all valid 'same-end substrings' starting at index 0?

Brute Force Solution

Approach

To solve this problem using brute force, we will check every single possible substring within the main string. We'll then count only those substrings that start and end with the same character. It's like checking every nook and cranny to find the ones that match our criteria.

Here's how the algorithm would work step-by-step:

  1. Look at the first character of the string and consider it as a substring.
  2. Then, consider the first two characters as a substring, and then the first three, and so on, until you reach the end of the string.
  3. For each of these substrings, check if the first and last characters are the same.
  4. If they are the same, then count this substring as a 'same-end' substring.
  5. Next, start at the second character of the original string and repeat the process of making substrings of increasing length until the end of the string.
  6. Continue this process, moving the starting point one character at a time towards the end of the original string. Each time, generate all possible substrings starting from that point, and check the first and last character.
  7. Add up all the 'same-end' substrings you've found during this exhaustive search. The final sum is the answer.

Code Implementation

def number_of_same_end_substrings_brute_force(input_string):
    number_of_same_end_substrings = 0

    string_length = len(input_string)

    for start_index in range(string_length):
        # Iterate through all possible starting positions

        for end_index in range(start_index, string_length):
            substring = input_string[start_index:end_index+1]

            # We need to check if the substring is at least one character long
            if len(substring) > 0:
                if substring[0] == substring[-1]:

                    # Count the substrings that have the same start and end character
                    number_of_same_end_substrings += 1

    return number_of_same_end_substrings

Big(O) Analysis

Time Complexity
O(n²)The brute force approach iterates through all possible substrings of the input string. The outer loop determines the starting position of the substring, running from the first to the last character of the string of length n. The inner loop determines the length of the substring, also iterating up to n. For each substring, a constant time comparison is performed. Therefore, the total number of operations is proportional to the sum of integers from 1 to n, which approximates to n * (n+1) / 2. This simplifies to O(n²).
Space Complexity
O(1)The provided brute force approach only uses a few constant space variables for iterating through the string and storing the count of same-end substrings. No auxiliary data structures like lists or hash maps are created to store substrings or intermediate results. Therefore, the space required does not scale with the input string size N, and the space complexity is constant.

Optimal Solution

Approach

The most efficient way to find these substrings is to count how many times each character appears in the string. Then, we can use a simple formula to calculate the number of substrings that start and end with the same character. This avoids checking every possible substring.

Here's how the algorithm would work step-by-step:

  1. Go through the string and count how many times each different character appears.
  2. For each character, take its count and use it to figure out how many substrings start and end with that character. The number of substrings for a given character is the count of that character times (count of that character plus one), all divided by two.
  3. Add up the number of substrings you found for each different character. The total is the answer.

Code Implementation

def number_of_same_end_substrings(input_string):
    character_counts = {}
    
    for character in input_string:
        if character in character_counts:
            character_counts[character] += 1
        else:
            character_counts[character] = 1
    
    total_substrings = 0
    
    # Iterate through counts of each character
    for character in character_counts:

        count_of_character = character_counts[character]

        # Use formula n * (n + 1) / 2 to calc substrings
        substrings_with_char =
        count_of_character * (count_of_character + 1) // 2

        total_substrings += substrings_with_char

    # The total number of same-end substrings
    return total_substrings

Big(O) Analysis

Time Complexity
O(n)The algorithm iterates through the string once to count the occurrences of each character. This loop's runtime is directly proportional to the length of the string, which we represent as 'n'. After counting, the algorithm iterates through the unique characters (at most the size of the alphabet, considered constant) to calculate the number of same-end substrings. The calculations for each character's contribution are constant time operations. Since the loop over characters (alphabet size) is independent of the input size 'n' and considered constant, the dominant factor is the initial string traversal. Therefore, the overall time complexity is O(n).
Space Complexity
O(1)The provided solution counts the occurrences of each character in the string. This requires storing character counts, which can be done using a hash map or a fixed-size array (of size equal to the number of possible characters, like 26 for lowercase English letters). The space used for the character counts is therefore constant, independent of the input string's length N. No other data structures that scale with N are used, so the overall auxiliary space complexity is O(1).

Edge Cases

CaseHow to Handle
Null or empty input stringReturn 0, as there are no substrings in an empty string.
String with only one characterReturn 1, since the entire string itself is a same-end substring.
String with all identical characters (e.g., 'aaaa')Handle efficiently by counting all substrings; the number of same-end substrings is N*(N+1)/2 where N is the length of the string.
String with no matching same-end charactersThe algorithm should return 0, signifying that there are no substrings that satisfy the condition (other than the whole string, if applicable).
Very long string to test efficiency.Ensure the solution avoids excessive memory allocation or quadratic time complexity for long strings.
String with non-alphanumeric characters.The algorithm should handle these characters correctly without causing errors, by using appropriate string comparison methods.
String with unicode characters.The algorithm should handle unicode characters correctly ensuring accurate indexing and substring comparisons.
Case-sensitivity of the characters (e.g., 'a' vs. 'A').Specify in the problem whether matching characters are case-sensitive or case-insensitive, and implement the appropriate comparison accordingly.