Valid Phone Numbers Interview Question for Google

Given a text file file.txt that contains a list of phone numbers (one per line), write a one-liner bash script to print all valid phone numbers.

You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x means a digit)

You may also assume that each line in the text file does not contain leading or trailing whitespace.

For example, assume that file.txt has the following content:

987-123-4567
123 456 7890
(123) 456-7890

Your script should output the following valid phone numbers:

987-123-4567
(123) 456-7890

Can you provide a bash one-liner to solve this problem? Consider edge cases like empty files or lines with invalid characters.

Naive Solution

The simplest approach is to use grep with a regular expression that matches either of the specified phone number formats. This involves constructing a regex that accounts for both (xxx) xxx-xxxx and xxx-xxx-xxxx patterns.

Optimal Solution

The optimal solution is essentially the same as the naive one, as grep is already highly optimized for pattern matching. The key is crafting an efficient regular expression. We can combine the two patterns into a single regex.

Bash Script

grep -E '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$' file.txt

Explanation

grep -E: Uses extended regular expression syntax.
^: Matches the beginning of the line.
\([0-9]{3}\) followed by a space: Matches a literal opening parenthesis, three digits, a literal closing parenthesis, and a space. The parentheses are escaped because unescaped ( and ) are grouping operators in extended regular expressions.
[0-9]{3}-: Matches three digits followed by a hyphen.
[0-9]{4}$: Matches four digits at the end of the line.
|: Acts as an "or", allowing either pattern to match.
^[0-9]{3}-[0-9]{3}-[0-9]{4}$: The second alternative, for the xxx-xxx-xxxx format. Matches three digits, a hyphen, three digits, a hyphen, and four digits, anchored to the whole line.
$: Matches the end of the line.
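
Because both formats end in the same xxx-xxxx tail, the two alternatives can also be factored so that only the prefix differs. The following is a sketch of that variant, not a replacement for the script above; the function name valid_phones is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper (not part of the original problem): wraps the factored
# pattern so it can be applied to a file argument or to standard input.
valid_phones() {
  # The group matches either "(xxx) " or "xxx-"; the shared "xxx-xxxx" tail follows.
  grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$' "$@"
}

# Sample input from the problem statement, piped in instead of read from file.txt.
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' | valid_phones
```

Calling valid_phones file.txt should behave the same as the one-liner above. The factored form is shorter, while the two-alternative form keeps each format readable on its own.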

Big O Runtime

The runtime complexity is O(n*m), where n is the number of lines in the file and m is the average length of a line. In other words, it is linear in the total size of the file: in the worst case grep examines each character of each line to decide whether the regex matches.

Big O Space

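The space complexity is O(1) with respect to the number of lines, because grep processes the input line by line and does not hold the entire file in memory. It needs only a buffer large enough for the current line.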

Edge Cases

Empty File: If file.txt is empty, no output will be produced (which is correct).
Lines with extra characters: The ^ and $ anchors ensure that only lines exactly matching the formats are printed.
Invalid characters: The [0-9] character class ensures that only digits are matched in the number portions.
Mixed formats: The | in the regex handles both allowed formats correctly.
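
The edge cases above can be checked with a few lines of shell. This is a minimal sketch: the sample lines and the pattern variable are made up for illustration, and the input is piped in with printf rather than read from file.txt.

```shell
# Same regex as the one-liner, stored in a variable (an illustrative name) for reuse.
pattern='^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$'

# Empty input: grep prints nothing and exits with status 1 (no line matched),
# so the fallback message is printed instead.
printf '' | grep -E "$pattern" || echo "no matches"

# Trailing text, a leading extra character, a letter among the digits,
# and one line in each valid format.
printf '%s\n' \
  '123-456-7890 ext 5' \
  '0(123) 456-7890' \
  '12a-456-7890' \
  '(123) 456-7890' \
  '987-123-4567' | grep -E "$pattern"
```

Only the last two sample lines should be printed, after the "no matches" message for the empty input.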

Example

If file.txt contains:

987-123-4567
123 456 7890
(123) 456-7890
(111)222-3333
111-222-333

The script will output:

987-123-4567
(123) 456-7890