[tpm] Regex assistance

Christopher Jones cj at enersave.ca
Fri Aug 12 03:06:35 PDT 2016


Thanks all for your kind assistance.

You are correct: I do end up with badly formatted CSV strings to analyze, but that was my doing.

The document starts as a badly formatted PDF, which I converted to Excel; the conversion doesn't recognize all the columns correctly. My idea was to loop through the rows of the Excel tables, build a CSV string for each row, and then search those strings for the patterns I need to extract.
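A minimal Perl sketch of that row-to-string step. The rows below are hypothetical stand-ins for what a spreadsheet reader might return (one array ref per row, undef for an empty cell); no particular Excel module is assumed, and `row_to_csv` is an illustrative name, not an existing function:

```perl
use strict;
use warnings;

# Hypothetical rows standing in for whatever a spreadsheet reader returns:
# one array ref per row, one element per cell, undef for an empty cell.
my @rows = (
    [ '01.03.16', undef, 'Studio one', ' Space 22', '1         500', '500', '01.051', undef ],
    [ 'Header', 'not a data row' ],
);

# Join a row's cells into a CSV-style string; empty cells become empty fields.
# No quoting is done, so a comma inside a cell looks like a field separator.
sub row_to_csv {
    my ($row) = @_;
    return join ',', map { defined $_ ? $_ : '' } @$row;
}

for my $row (@rows) {
    my $csv = row_to_csv($row);
    # First xx.xx.xx in the string, then (via the greedy .*) the last ,yy.yyy
    if ($csv =~ /(\d{2}\.\d{2}\.\d{2}).*,(\d{2}\.\d{3})/) {
        print "$1 $2\n";
    }
}
```

Only the first row matches; the header-like row is skipped because it contains neither value.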

Perhaps there is a PDF Perl module that would be useful for this kind of task?

Christopher Jones
14 Oneida Avenue
Toronto, ON
416-697-0056

From: legrady [mailto:legrady at gmail.com] 
Sent: Thursday, August 11, 2016 1:22 PM
To: Rob Janes; Chris Jones
Cc: toronto-pm at pm.org
Subject: Re: [tpm] Regex assistance

How about if you split() on commas, and then just check the elements of the array for the values you want?
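A minimal sketch of the split() suggestion. It assumes the first and last fields themselves never contain commas; commas inside the middle fields (such as "Studio one, Space 22") do no harm, since only the first and last non-blank fields are kept. `first_and_last` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Split a line on commas, drop blank fields, and return the first and last
# remaining fields if they have the expected xx.xx.xx and yy.yyy shapes.
sub first_and_last {
    my ($line) = @_;
    # split drops trailing empty fields; grep removes empty or all-space ones
    my @fields = grep { /\S/ } split /,/, $line;
    return unless @fields;
    my ($first, $last) = ($fields[0], $fields[-1]);
    return unless $first =~ /^\d{2}\.\d{2}\.\d{2}$/ && $last =~ /^\d{2}\.\d{3}$/;
    return ($first, $last);
}

for my $line (
    "01.03.16,,Studio one, Space 22,1         500,500,01.051,,",
    ",01.03.16,,Studio one, Space 22, ,01.051,,",
) {
    my ($date, $code) = first_and_last($line);
    print "$date $code\n" if defined $date;
}
```

Checking the shapes after the split guards against rows where the conversion put something else in the first or last column.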

Your pattern captures each piece into a separate variable: two digits, a dot, two digits, a dot, another two digits. From your description you want the whole thing:

      /(\d{2}\.\d{2}\.\d{2})/
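To make the difference concrete, a small Perl check run against one of the sample lines from the original question:

```perl
use strict;
use warnings;

my $line = ",01.03.16,,Studio one, Space 22, ,01.051,,";

# The original pattern has five capture groups, so the date comes back
# in five pieces ($1 through $5).
my @pieces = $line =~ /([0-9]{2})([\.])([0-9]{2})([\.])([0-9]{2})/;
print scalar(@pieces), " pieces: @pieces\n";

# One group around the whole pattern returns the date as a single string ($1).
my ($date) = $line =~ /(\d{2}\.\d{2}\.\d{2})/;
print "date: $date\n";
```

It prints `5 pieces: 01 . 03 . 16` and then `date: 01.03.16`.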

Or, a bit sloppily but valid:

     /([0-9.]{8})/

It also matches invalid strings, but you say those won't happen. Alternatively, with the /x flag you can add comments to your regex:

    / (                 # capture
        [0-9.]{8}       # 8 chars, each a digit or a dot
      ) /x

Probably more useful in more complex instances.
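A sketch comparing the loose pattern with a stricter, commented /x variant. The strict version and the sample strings are illustrations added here, not part of the original reply:

```perl
use strict;
use warnings;

# The loose pattern from above: any 8 characters that are digits or dots.
sub loose_match {
    my ($s) = @_;
    return $s =~ / ( [0-9.]{8} ) /x;   # /x: spaces inside the pattern are ignored
}

# A stricter /x variant (illustrative): spell out xx.xx.xx and anchor it
# so nothing else may appear in the string.
sub strict_match {
    my ($s) = @_;
    return $s =~ /
        \A (                          # start of string, begin capture
            \d{2} \. \d{2} \. \d{2}   # two digits, dot, two digits, dot, two digits
        ) \z                          # end capture, end of string
    /x;
}

for my $s ("01.03.16", "........", "1.2.3.45") {
    printf "%s loose=%s strict=%s\n", $s,
        (loose_match($s)  ? "yes" : "no"),
        (strict_match($s) ? "yes" : "no");
}
```

All three samples pass the loose pattern; only 01.03.16 passes the strict one.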

-------- Original message --------
From: Rob Janes <janes.rob at gmail.com>
Date: 2016-08-11 11:29 (GMT-05:00)
To: Chris Jones <cj at enersave.ca>
Cc: toronto-pm at pm.org
Subject: Re: [tpm] Regex assistance

^,* in front, then |(\d\d\.\d\d\d),*$ after?

OK, replace the bar (|) with ,.*,
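Putting Rob's pieces together gives the regex below. This sketch assumes the date is written as a single capture group rather than the original five, so $1 is the date and $2 the yy.yyy value:

```perl
use strict;
use warnings;

# Leading commas, the date, anything in the middle, the yy.yyy value,
# then only commas up to the end of the line.
my $re = qr/^,*(\d\d\.\d\d\.\d\d),.*,(\d\d\.\d\d\d),*$/;

my @lines = (
    "01.03.16,,Studio one, Space 22,1         500,500,01.051,,",
    ",01.03.16,,Studio one, Space 22,1         500,500,01.051,",
    ",01.03.16,,Studio one, Space 22,1         500,500,01.051,,",
    ",01.03.16,,Studio one, Space 22, ,01.051,,",
);

for my $line (@lines) {
    if (my ($first, $last) = $line =~ $re) {
        print "$first $last\n";
    }
}
```

Because the yy.yyy group may be followed only by commas up to the end of the line, it can only match the last non-blank field.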

On Aug 11, 2016 10:38 AM, "Chris Jones" <cj at enersave.ca> wrote:


Hello Perl Mongers,

I am looking for assistance with a regex. I have a bunch of strings in the form:

"01.03.16,,Studio one, Space 22,1         500,500,01.051,,"
or
",01.03.16,,Studio one, Space 22,1         500,500,01.051,"
or
",01.03.16,,Studio one, Space 22,1         500,500,01.051,,"
or
",01.03.16,,Studio one, Space 22, ,01.051,,"

So the middle section can be one or more comma-separated strings.

I am trying to match and return the first non-blank value and the last non-blank value (01.03.16 and 01.051). Their formats are always the same: xx.xx.xx and yy.yyy.

So far I have a regex that matches the first pattern:

"([0-9]{2})([\.])([0-9]{2})([\.])([0-9]{2})"

in any of the examples above.

I am stuck after that.
Any insights appreciated!



-- 

Chris Jones
14 Oneida Avenue
Toronto, ON M5J 2E3
416-697-0056

_______________________________________________
toronto-pm mailing list
toronto-pm at pm.org
http://mail.pm.org/mailman/listinfo/toronto-pm
