Given a text file file.txt that contains a list of phone numbers (one per line), write a one-liner bash script to print all valid phone numbers.
You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx (x means a digit).
You may also assume each line in the text file must not contain leading or trailing white spaces.
For example, assume that file.txt has the following content:
987-123-4567
123 456 7890
(123) 456-7890
Your script should output the following valid phone numbers:
987-123-4567
(123) 456-7890
Can you provide a bash one-liner to solve this problem? Consider edge cases like empty files or lines with invalid characters.
The simplest approach is to use grep with a regular expression that matches either of the specified phone number formats. This involves constructing a regex that accounts for both the (xxx) xxx-xxxx and xxx-xxx-xxxx patterns.
The optimal solution is essentially the same as the simple one, as grep is already highly optimized for pattern matching. The key is crafting an efficient regular expression. We can combine the two patterns into a single regex.
grep -E '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$' file.txt
grep -E: Uses extended regular expression syntax.
^: Matches the beginning of the line.
\([0-9]{3}\) (with a trailing space): Matches an opening parenthesis, three digits, a closing parenthesis, and a space. The ( and ) are escaped because unescaped parentheses are grouping operators in extended regex syntax.
[0-9]{3}-: Matches three digits followed by a hyphen.
[0-9]{4}$: Matches four digits at the end of the line.
|: Acts as an "or", allowing either pattern to match.
^[0-9]{3}-[0-9]{3}-[0-9]{4}$: Matches the xxx-xxx-xxxx format: three digits, a hyphen, three digits, a hyphen, and four digits, spanning the whole line.
$: Matches the end of the line.
The runtime complexity is O(n*m), where n is the number of lines in the file and m is the average length of a line. In the worst case, grep needs to examine each character in each line to see if the regex matches.
The space complexity is O(1) with respect to the number of lines, because grep operates line by line and doesn't store the entire file in memory; it only needs to buffer the line currently being examined.
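As the breakdown above shows, both alternatives end in the same xxx-xxxx tail and differ only in their prefix. The following is a minimal sketch of an equivalent pattern that factors out the shared tail and uses a single pair of anchors. The helper name valid_phones and the temporary sample file are assumptions made for illustration and are not part of the original solution.

```shell
# Sketch: same matching rule as the one-liner, written with one pair of anchors.
# valid_phones is a hypothetical helper name used only for this example.
valid_phones() {
  # The prefix is either "(xxx) " or "xxx-"; the "xxx-xxxx" tail is shared.
  grep -E '^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$' "$1"
}

# Build the sample input from the problem statement in a temporary file.
sample=$(mktemp)
printf '%s\n' '987-123-4567' '123 456 7890' '(123) 456-7890' > "$sample"

# Prints:
#   987-123-4567
#   (123) 456-7890
valid_phones "$sample"

rm -f "$sample"
```

Whether the factored form is easier to read than the two fully spelled-out alternatives is a matter of taste; both select the same lines.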
If file.txt is empty, no output will be produced (which is correct).
The ^ and $ anchors ensure that only lines exactly matching the formats are printed.
The [0-9] character class ensures that only digits are matched in the number portions.
The | in the regex handles both allowed formats correctly.
If file.txt contains:
987-123-4567
123 456 7890
(123) 456-7890
(111)222-3333
111-222-333
The script will output:
987-123-4567
(123) 456-7890
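The walkthrough and the empty-file case can be reproduced with a short script. This is a sketch under stated assumptions: the temporary directory, the empty.txt file name, and the variable names are chosen for the demo only. It also shows that grep exits with status 1 when no line is selected, which matters if the one-liner runs inside a script that treats non-zero exit codes as failures.

```shell
# Sketch: reproduce the walkthrough and the empty-file edge case.
# The temporary directory and file names inside it are assumptions for this demo.
workdir=$(mktemp -d)

# Sample input from the walkthrough, one entry per line.
printf '%s\n' \
  '987-123-4567' \
  '123 456 7890' \
  '(123) 456-7890' \
  '(111)222-3333' \
  '111-222-333' > "$workdir/file.txt"

# The one-liner from the solution, pointed at the sample file.
matches=$(grep -E '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$' "$workdir/file.txt")
printf '%s\n' "$matches"

# Empty file: grep prints nothing and reports "no lines selected" via exit status 1.
: > "$workdir/empty.txt"
empty_status=0
grep -E '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$|^[0-9]{3}-[0-9]{3}-[0-9]{4}$' "$workdir/empty.txt" || empty_status=$?
echo "exit status for empty file: $empty_status"

rm -rf "$workdir"
```

The first command prints the two valid numbers, 987-123-4567 and (123) 456-7890; the second prints nothing, and the reported exit status is 1.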