[Chicago-talk] Help with Regex

Alan Mead amead at alanmead.org
Sat Dec 24 10:38:42 PST 2022


Richard,

You need a good regex debugger. My favorite is:

https://regex101.com/

Your two new samples fail because there's a comma after the state and your 
regex doesn't allow that (you can add ,* after ([A-Z]{2}) to permit it).

I have no idea how regular your data are, but if these are arbitrary 
addresses, you will run into all kinds of problems. For example, each of 
the two "\s" below requires exactly one whitespace character (no more, no 
less).

^
([^,]+)
,\s                    # exactly one whitespace character
([A-Z]{2})
\s                     # exactly one whitespace character
(\d{5}(?:-?\d{4})?)
$
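
Putting both points together, here is a minimal sketch (checked only 
against the samples in this thread, not against real address data) that 
accepts an optional comma after the state and any amount of whitespace 
around the separators. The name parse_csz and the sample 
"Tampa,FL  33592-2787" are made up for illustration; the city and ZIP 
parts are the same as in the pattern above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (city, state, zip) on a match,
# or an empty list when the line does not look like "City, ST ZIP".
sub parse_csz {
    my ($address) = @_;
    my $regex = qr/^
        \s* ([^,]+?) \s*        # city: everything up to the first comma
        , \s*
        ([A-Z]{2})              # two-letter state code
        (?: \s*,\s* | \s+ )     # a comma (with optional spaces) or just whitespace
        (\d{5}(?:-?\d{4})?)     # ZIP or ZIP+4, hyphen optional
        \s* $/x;
    return $address =~ $regex ? ($1, $2, $3) : ();
}

for my $sample ("Ogden, UT, 84415", "Dallas, TX, 75234",
                "Salt Lake City, UT 89159", "Tampa,FL  33592-2787") {
    my @parts = parse_csz($sample);
    print @parts ? join(" | ", @parts) . "\n" : "No match: $sample\n";
}
```

This is still a guess at what your data look like. Lowercase state codes, 
spelled-out state names, or a comma inside the city name would all need 
further changes.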

-Alan


> Thank you very much for your reply. Your level of coding is way superior to mine. This was very helpful; however, for some reason
> certain inputs, like the examples below, do not produce matches.
>
> r("Ogden, UT, 84415");
> r("Dallas, TX, 75234");
>
> Can't figure out why.
>
> On Sat, 24 Dec 2022 11:50:14 -0500, James E Keenan <jkeenan at pobox.com> wrote:
>
> On 12/24/22 10:59, Richard Reina wrote:
>> Happy Holidays Perl Family,
>>
>> I am trying to use the regex below to find "City, ST Zip" lines in a file. While it does work for instances like Chicago, IL 60614 or Dallas, TX 75234,
>> it does not work for multi-word cities like Salt Lake City, UT 89159 or for nine-digit zip codes like Tampa, FL 33592-2787.
>> Any help in getting my regex to work would be greatly appreciated.
>>
>>
>>
>>   if ($row =~ /^([^,]+),\s([A-Z]{2})(?:\s(\d{5}-?\d{4}?))?$/) {
>>        print "I think I found a city state zip:\n";
>>        print "$row\n";
>>        chomp (my $ff = <STDIN>);
>>   }
>>
> I found that, as written, your regex does precisely the opposite of what
> you claimed it did.
>
> #####
> sub p {
>     my $address = shift;
>     my $regex = qr/^([^,]+),\s([A-Z]{2})(?:\s(\d{5}-?\d{4}?))?$/;
>     if ($address =~ m/$regex/) {
>         my ($city, $state, $zip) = ($1,$2,$3);
>         print "$city, $state $zip\n";
>     }
>     else {
>         print "No match\n";
>     }
> }
>
> p("Chicago, IL 60614");
> p("Chicago, IL 60614-0000");
> p("Chicago, IL 606140001");
> p("Dallas, TX 75234");
> p("Salt Lake City, UT 89159");
> p("Tampa, FL 33592-2787");
>
> No match
> Chicago, IL 60614-0000
> Chicago, IL 606140001
> No match
> No match
> Tampa, FL 33592-2787
> #####
>
> In this portion of your pattern ...
>
> #####
> (?:\s(\d{5}-?\d{4}?))?
> #####
>
> ... the '?:' at the beginning means "cluster, but don't capture". (See
> 'perldoc perlre'.)
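
One more detail in that fragment may explain the "No match" results for 
five-digit zips. In Perl, a "?" right after "{4}" makes that quantifier 
non-greedy; it does not make the four digits optional, so \d{5}-?\d{4}? 
still needs nine digits. A minimal sketch comparing the two spellings 
(the variable names $lazy and $optional are only for illustration):

```perl
use strict;
use warnings;

# {4}? means "exactly four digits, matched lazily": still four digits.
my $lazy     = qr/^\d{5}-?\d{4}?$/;
# Here the whole "+4" part, hyphen included, is an optional group.
my $optional = qr/^\d{5}(?:-?\d{4})?$/;

for my $zip ("60614", "60614-0000", "606140001") {
    printf "%-11s lazy: %-9s optional: %s\n", $zip,
        ($zip =~ $lazy     ? "match" : "no match"),
        ($zip =~ $optional ? "match" : "no match");
}
```

Only the second form accepts a plain five-digit zip, and that is the form 
used in the pattern quoted next.
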
>
> The following worked for me.
>
> #####
> sub r {
>     my $address = shift;
>     my $regex = qr/^
>         ([^,]+)
>         ,\s
>         ([A-Z]{2})
>         \s
>         (\d{5}(?:-?\d{4})?)
>     $/x;
>     if ($address =~ m/$regex/) {
>         my ($city, $state, $zip) = ($1,$2,$3);
>         print "$city, $state $zip\n";
>     }
>     else {
>         print "No match\n";
>     }
> }
>
> r("Chicago, IL 60614");
> r("Chicago, IL 60614-0000");
> r("Chicago, IL 606140001");
> r("Dallas, TX 75234");
> r("Salt Lake City, UT 89159");
> r("Tampa, FL 33592-2787");
>
> Chicago, IL 60614
> Chicago, IL 60614-0000
> Chicago, IL 606140001
> Dallas, TX 75234
> Salt Lake City, UT 89159
> Tampa, FL 33592-2787
> #####
>

-- 

Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

https://talalg.com


Take care to get what you like or you will be forced to like what
you get. Where there is no ventilation fresh air is declared
unwholesome. Where there is no religion hypocrisy becomes good
taste. Where there is no knowledge ignorance calls itself science.

-- Shaw, from "Maxims for Revolutionists"
