Taro Logo

Differences Between Two Objects

Medium
Asked by:
Profile picture
6 views

Given two objects, obj1 and obj2, return a new object that represents the differences between the two objects. The returned object should only contain keys where the values are different between obj1 and obj2. For each such key, the value in the returned object should be an array containing the values from obj1 and obj2, respectively. If a key only exists in one object, the value should be null in the corresponding array position.

Example 1:

Input: obj1 = {"a": 1, "b": 2, "c": 3}, obj2 = {"b": 2, "c": 4, "d": 5}
Output: {"a": [1, null], "c": [3, 4], "d": [null, 5]}
Explanation:
Key "a" exists in obj1 but not in obj2. Value in obj1 is 1, obj2 doesn't have this key so it is null.
Key "b" is the same in both objects, so it is not included in the result.
Key "c" exists in both objects but their values are different, so they are included in the result array. Value in obj1 is 3, value in obj2 is 4.
Key "d" exists in obj2 but not in obj1. Value in obj2 is 5, obj1 doesn't have this key, so it is null.

Example 2:

Input: obj1 = {"a": 1, "b": 2}, obj2 = {"a": 1, "b": 2}
Output: {}

Example 3:

Input: obj1 = {"a": 1, "b": 2, "c": 3}, obj2 = {"a": 1, "b": 2, "c": 3, "d": 4}
Output: {"d": [null, 4]}

Constraints:

  • obj1 and obj2 are valid JSON objects.
  • 2 <= JSON.stringify(obj1).length <= 105
  • 2 <= JSON.stringify(obj2).length <= 105

Solution


Clarifying Questions

When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:

  1. What data types will the objects contain, and can they contain nested objects or arrays?
  2. What should the output format be? For instance, should I return a list of keys that differ, a dictionary with the differences, or something else?
  3. How should I handle different data types for the same key across the two objects? Should I flag it as a difference, or attempt a conversion?
  4. Are null, undefined, or missing keys considered differences, and how should they be handled?
  5. Is the order of elements within arrays significant when comparing them? If so, should I consider [1, 2, 3] and [3, 2, 1] as different?

Brute Force Solution

Approach

The brute force method for finding differences between two objects means we'll compare every single part of one object to the other. We meticulously check each attribute to identify any discrepancies, leaving no stone unturned.

Here's how the algorithm would work step-by-step:

  1. Take the first object and get a list of all its properties or characteristics.
  2. Take the second object and do the same: list all of its properties or characteristics.
  3. Now, for each property in the first object, look for the exact same property in the second object.
  4. If you find a matching property, compare the value of that property in both objects. If they are different, mark it as a difference.
  5. If a property exists in the first object but is missing in the second object, also mark it as a difference.
  6. Repeat the process, but this time start with the properties of the second object and compare them to the first object. This will catch any properties unique to the second object.
  7. Finally, gather all the differences you found and present them as the changes between the two objects.

Code Implementation

def find_differences_brute_force(first_object, second_object):
    differences = []
    first_object_properties = dir(first_object)
    second_object_properties = dir(second_object)

    # Iterate through properties of the first object
    for first_object_property in first_object_properties:
        if first_object_property.startswith('__'):
            continue

        if hasattr(second_object, first_object_property):
            # Compare values if property exists in both objects
            first_object_value = getattr(first_object, first_object_property)
            second_object_value = getattr(second_object, first_object_property)

            if first_object_value != second_object_value:
                differences.append(f'Property {first_object_property} differs:'\
                                     f' {first_object_value} != {second_object_value}')
        else:
            # Report if property is missing in the second object
            differences.append(f'Property {first_object_property} missing in second object')

    # Iterate through properties of the second object to find unique ones
    for second_object_property in second_object_properties:
        if second_object_property.startswith('__'):
            continue

        # Check existence in first object to find unique properties
        if not hasattr(first_object, second_object_property):
            differences.append(f'Property {second_object_property} missing in first object')

    return differences

Big(O) Analysis

Time Complexity
O(n*m)Let n be the number of properties in the first object and m be the number of properties in the second object. The algorithm iterates through each property of the first object (n properties) and searches for the same property in the second object. The search for a matching property in the second object takes O(1) time if we assume the properties can be looked up in constant time (e.g., using a hash map or dictionary after the initial property listing). Then it iterates through the properties of the second object (m properties) and compares them to the first, checking if the value is different in the first object (also O(1) lookup). Therefore, the total time complexity is O(n*1 + m*1) = O(n+m). However, if we let n be the maximum of the two object sizes, the complexity becomes O(n*m) where m and n are the number of keys in each of the two objects.
Space Complexity
O(N)The algorithm creates lists of properties for both objects. In the worst case, each object could have a number of properties proportional to the size of the input data, which we denote as N, where N represents the combined number of properties across both objects. The algorithm also needs to store the differences identified. These differences, in the worst case, could also be proportional to N. Therefore, the auxiliary space used is proportional to N, giving us O(N) space complexity.

Optimal Solution

Approach

To efficiently find differences between two objects, we break down the problem into smaller, manageable parts. The key is to understand that comparing everything at once can be inefficient, so we use a divide-and-conquer strategy.

Here's how the algorithm would work step-by-step:

  1. First, identify what kind of objects you are working with. Are they simple data structures, or complex structures with nested objects?
  2. If the objects are simple (e.g., containing just basic data), compare them field by field. If a field is different, record that difference.
  3. If the objects contain nested objects, you need to apply the same comparison strategy recursively. That means, go inside each nested object and repeat the field-by-field comparison.
  4. Use a smart way to keep track of where the differences are found. For example, record the 'path' to each different field within the object structure.
  5. As you find differences, store them in a clear and structured way, like a list of changes with the location and the values that are different.
  6. When you have checked every part of both objects, you will have a complete list of all the differences.
  7. By breaking the problem into smaller comparisons and being organized about tracking the differences, you avoid inefficiently checking the same things multiple times or getting lost in a sea of data.

Code Implementation

def find_object_differences(object_one, object_two, path=None):
    if path is None:
        path = []
    differences = []

    if type(object_one) != type(object_two):
        differences.append({'path': path, 'object_one': object_one, 'object_two': object_two})
        return differences

    if isinstance(object_one, dict):
        # Handle dictionaries recursively
        all_keys = set(object_one.keys()) | set(object_two.keys())
        for key in all_keys:
            if key not in object_one:
                differences.append({'path': path + [key], 'object_one': None, 'object_two': object_two[key]})
            elif key not in object_two:
                differences.append({'path': path + [key], 'object_one': object_one[key], 'object_two': None})
            else:
                differences.extend(
                    find_object_differences(object_one[key], object_two[key], path + [key])
                )
    elif isinstance(object_one, list):
        # Handle lists by iterating and comparing elements
        max_length = max(len(object_one), len(object_two))
        for index in range(max_length):
            if index >= len(object_one):
                differences.append({'path': path + [index], 'object_one': None, 'object_two': object_two[index]})
            elif index >= len(object_two):
                differences.append({'path': path + [index], 'object_one': object_one[index], 'object_two': None})
            else:
                differences.extend(
                    find_object_differences(object_one[index], object_two[index], path + [index])
                )
    else:
        # Compare simple values. Crucial step to find actual differences.
        if object_one != object_two:
            differences.append({'path': path, 'object_one': object_one, 'object_two': object_two})

    return differences

def main():
    object_one = {
        'name': 'Alice',
        'age': 30,
        'address': {
            'street': '123 Main St',
            'city': 'Anytown'
        },
        'hobbies': ['reading', 'hiking']
    }

    object_two = {
        'name': 'Bob',
        'age': 35,
        'address': {
            'street': '456 Oak Ave',
            'city': 'Someville',
            'zip': '54321'
        },
        'hobbies': ['reading', 'coding', 'swimming']
    }

    # Invoke comparison function
    all_differences = find_object_differences(object_one, object_two)

    # Print the differences
    for difference in all_differences:
        print(difference)

if __name__ == "__main__":
    main()

Big(O) Analysis

Time Complexity
O(n)The time complexity is determined by the need to potentially traverse all fields in both objects. In the worst-case scenario, where the objects have 'n' fields in total (considering nested objects as well), we need to compare each field at least once. The recursive calls contribute to examining all nested fields, but each field is visited only a constant number of times. Therefore, the time complexity is directly proportional to the number of fields 'n', resulting in O(n).
Space Complexity
O(D)The primary driver of auxiliary space is the recursion depth when comparing nested objects. In the worst-case scenario, where objects are deeply nested, the call stack can grow proportionally to the maximum depth of nesting, which we'll denote as D. Additionally, the list to store the differences contributes to the space complexity, potentially holding up to N differences (where N is the total number of fields across both objects), though in cases where D < N, the recursion depth dominates the space usage. Therefore, the overall space complexity is approximated by O(D) where D represents the maximum depth of nesting within the compared objects.

Edge Cases

Both objects are null or undefined
How to Handle:
Return an empty difference list indicating no differences or throw an IllegalArgumentException based on requirements.
One object is null/undefined, the other is not
How to Handle:
Treat the non-null/undefined object as the sole source of differences, listing all its keys/values.
Objects contain circular references
How to Handle:
Implement cycle detection using a Set to track visited objects to prevent infinite recursion.
Objects have properties with different data types (e.g., string vs. number)
How to Handle:
Consider data type differences as differences and report them, possibly including type information.
Objects have properties that are functions
How to Handle:
Decide whether to compare function references (identity) or attempt to serialize and compare their code (string representation) or ignore them based on requirements.
Objects have properties that are Date objects
How to Handle:
Compare the Date objects by their getTime() value (milliseconds since epoch) for accurate comparison.
Objects contain nested objects/arrays with differing levels of nesting
How to Handle:
Ensure the recursive comparison logic correctly handles varying depths of nested structures.
Objects contain NaN (Not a Number) values
How to Handle:
Handle NaN values by checking for `isNaN(value)` and treating them as equal or unequal based on problem requirements since `NaN !== NaN`.