[Pdx-pm] regex & phone numbers...
Tkil
tkil at scrye.com
Thu Jul 25 14:28:49 CDT 2002
>>>>> "Kari" == Kari Chisholm <karic at lclark.edu> writes:
Kari> I want to convert the actual seven- or ten-digit phone number part to
Kari> just xxx-xxx-xxxx. I also want to leave alone anything that comes
Kari> after that - which is obviously the tough part. The logic should be
Kari> basically this: just process through the number left to right,
Kari> grabbing the first seven or ten numbers, then reformat those and tack
Kari> on whatever's left. The challenge is figuring out when it's a
Kari> seven-digit or a ten-digit number.
Kari> I've conceptualized any number of highly complex and idiotic
Kari> ways of doing this. I'm just wondering if there's a simpler
Kari> regex approach to this... Any ideas?
Get a list of all the formats you think you need to worry about, write
a set of regexps that can handle all of them, then return an error when
an input matches none of them. The list is the important part: it keeps
you from making assumptions that might trip you up.
[I unintentionally did this to GBARR's Date::Parse::str2time function.
Going through a few 100k mail messages, I found about 0.1% that had
bogus date strings that it couldn't parse gracefully. Bit of a stress
test there. And a sign of over-zealous error checking: str2time
rejected New Zealand Daylight Saving Time, because UTC+1300 is
"obviously" a bogus time zone...]
A straightforward version might be:
| #!/usr/bin/perl -w
|
| use strict;
|
| sub normalize_phone_number ( $ $ )
| {
|     my ($in, $default_ac) = @_;
|
|     # abbreviations
|     my $d3 = '(\d{3})';
|     my $d4 = '(\d{4})';
|
|     # is it already sane?
|     $in =~ /^ $d3 [\.\-\s] $d3 [\.\-\s] $d4 \s* (.*)/x
|         and return "$1-$2-$3 $4";
|
|     # area code in parens
|     $in =~ /^ \( $d3 \) [\-\s] $d3 [\-\s] $d4 \s* (.*)/x
|         and return "$1-$2-$3 $4";
|
|     # missing area code
|     $in =~ /^ $d3 [\.\-\s] $d4 \s* (.*)/x
|         and return "$default_ac-$1-$2 $3";
|
|     return;
| }
|
| while (my $in = <DATA>)
| {
|     chomp $in;
|     if (my $out = normalize_phone_number $in, '503')
|     {
|         printf "%-30s => %s\n", $in, $out;
|     }
|     else
|     {
|         print "$in: couldn't parse!\n";
|     }
| }
|
| __END__
| (503) 123-4567
| 503.123.4567
| (503)-123-4567
| 123-4567
| 503 123 4567
| 503-123-4567 ext. 89
| 123-4567 ext. 89
| 503-123-4567-mom's house
| 858-123-4239 x23
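
The "list of formats" can also be kept as an explicit table, so that
supporting a newly seen format means adding one row. The following is
only a sketch under stated assumptions, not a tested replacement for
the script above: the names @FORMATS and normalize_with_table are
hypothetical, the three patterns simply mirror the ones already shown,
and the one deliberate difference is that it drops the trailing space
when nothing follows the number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names throughout: @FORMATS and normalize_with_table are
# illustrative only.
my $sep = qr/[.\-\s]/;    # separators accepted between digit groups

# One row per known format: [ label, pattern, captures an area code? ].
# The last capture group in every pattern is the trailing text.
my @FORMATS = (
    [ 'ten digits',   qr/^ (\d{3}) $sep (\d{3}) $sep (\d{4}) (.*)/x,           1 ],
    [ 'parens',       qr/^ \( (\d{3}) \) [\-\s] (\d{3}) [\-\s] (\d{4}) (.*)/x, 1 ],
    [ 'seven digits', qr/^ (\d{3}) $sep (\d{4}) (.*)/x,                        0 ],
);

sub normalize_with_table {
    my ($in, $default_ac) = @_;

    for my $format (@FORMATS) {
        my ($label, $re, $has_ac) = @$format;

        # In list context a successful match returns its captures;
        # a failed match returns the empty list, so we try the next row.
        my @parts = $in =~ $re or next;

        my $rest = pop @parts;                         # trailing text, maybe empty
        unshift @parts, $default_ac unless $has_ac;    # supply missing area code
        $rest =~ s/^\s+//;

        my $number = join '-', @parts;
        return length($rest) ? "$number $rest" : $number;
    }

    return;    # no known format matched: let the caller report it
}

for my $in ('(503) 123-4567', '123-4567 ext. 89', '5031234567') {
    my $out = normalize_with_table($in, '503');
    printf "%-20s => %s\n", $in, defined($out) ? $out : "couldn't parse!";
}
```

Row order matters here: the seven-digit row goes last so that a
ten-digit number is never mistaken for a seven-digit one followed by
trailing text.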
t.