[Pdx-pm] regex & phone numbers...

Joshua Keroes jkeroes at eli.net
Thu Jul 25 14:02:03 CDT 2002


On (Thu, Jul 25 10:23), Kari Chisholm wrote:
> I want to clean up phone numbers that people submit in a form.  They
> could come in all kinds of ways, like...
> 
> (503) 123-4567
> 503.123.4567
> (503)-123-4567
> 123-4567
> 503 123 4567
> 503-123-4567 ext. 89
> 123-4567 ext. 89
> 503-123-4567-mom's house
> 
> I want to convert the actual seven- or ten-digit phone number part
> to just xxx-xxx-xxxx. [snip] I've conceptualized any number of
> highly complex and idiotic ways of doing this.  I'm just wondering
> if there's a simpler regex approach to this...  Any ideas?

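Try the script below. Just make sure you trim the PDX-pm email footer off
the __DATA__ section before running it, since every line after __DATA__ is
fed to format_phone() as if it were a phone number.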

-Joshua


#!/usr/local/bin/perl -w

use strict;

our $AREACODE  = 503; # Output default
our $DELIM     = '-'; # Output default

my $areacode_re	 = qr/\(? ( \d{3} )? \)?/x;
my $delim_re	 = qr/[-. ]/;
my $lastseven_re = qr/( \d{3} ) $delim_re ( \d{4} )/x;
my $ext_delim_re = qr/(?: ext\.? | x )/xi; # "ext", "ext." or "x" (spaces are ignored under /x)
my $ext_re	 = qr/( \d+ )/x;

my $phone_re	 = qr/
    $areacode_re   \s*
    $delim_re?	   \s*
    $lastseven_re  \s*
    $ext_delim_re? \s*
    $ext_re?
/x;

while (<DATA>) {
    chomp;
    next unless /\S/; # skip blank lines, which format_phone() would die on
    my $nice = format_phone($_) || '?';
    printf "%25s => %s\n", $_, $nice;
}

exit;

# subs
sub format_phone {
    my $ugly = shift or die "Didn't get a phone number. Aborting";
    my ($areacode, $mid3, $last4, $ext) = $ugly =~ $phone_re;

    unless ($mid3 && $last4) {
        warn "Unable to parse phone number: '$ugly'";
        return;
    }

    $areacode ||= $AREACODE;

    my $nice = join $DELIM, ($areacode, $mid3, $last4);
    $nice .= " x$ext" if defined $ext; # defined() so an extension of "0" is kept

    return $nice;
}

__DATA__
(503) 123-4567
503.123.4567
(503)-123-4567
123-4567
503 123 4567
503-123-4567 ext. 89
123-4567 ext. 89
503-123-4567-mom's house
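
(The following is not part of the script above.) The script builds one
composed regex out of smaller pieces. A different and arguably simpler
technique is to peel off any trailing extension first and then throw away
every non-digit character. The sketch below assumes North American
numbers, a default area code of 503, and that the only digits in the input
belong to the phone number or to a trailing "ext"/"x" extension. The name
normalize_phone() is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# normalize_phone() is a hypothetical helper, not part of the original script.
# Assumption: every digit in the input belongs either to the phone number
# or to a trailing extension introduced by "ext", "ext." or "x".
sub normalize_phone {
    my ($raw, $default_area) = @_;
    return unless defined $raw;
    $default_area = '503' unless defined $default_area;

    # Peel off a trailing extension such as "ext. 89", "ext 89" or "x89".
    my $ext;
    $ext = $1 if $raw =~ s/\b(?:ext\.?|x)\s*(\d+)\s*\z//i;

    # Delete every character that is not a digit.
    (my $digits = $raw) =~ tr/0-9//cd;

    # Seven digits means the area code was left out.
    $digits = $default_area . $digits if length($digits) == 7;
    return unless length($digits) == 10;

    # Split into 3-3-4 and rejoin with dashes.
    my $nice = join '-', unpack('A3 A3 A4', $digits);
    $nice .= " x$ext" if defined $ext;
    return $nice;
}

for my $input ('(503) 123-4567', '123-4567 ext. 89', "503-123-4567-mom's house") {
    my $nice = normalize_phone($input);
    printf "%25s => %s\n", $input, defined $nice ? $nice : '?';
}
```

The trade-off is that trailing text containing digits (say, "room 12")
would push the digit count past ten and the input would be rejected,
whereas the composed regex simply ignores whatever follows the number.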



