SUMMARY: Odd match?

Robert L. Harris Robert.L.Harris at rdlg.net
Wed May 29 14:42:20 CDT 2002



I'm posting a summary as I received a few "please tell me what you find"
type requests.

I tried both of the approaches quoted below (the RegEx match and tr).
At first I went with the RegEx match.  This works great as I have a
known set of valid characters and a lot of unknowns.
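
A minimal sketch of how the RegEx filter might sit in a read loop.  The
sub name and the sample lines are made up for illustration; the real
script reads from a file and does other work per line.  It assumes each
line has already been chomped, so \A and \z anchor the whole string.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the line holds only
# A-Z, a-z, 0-9, underscore, parentheses, apostrophe and backtick.
sub is_clean_regex {
    my ($line) = @_;
    return $line =~ /\A[A-Za-z0-9_()'`]+\z/ ? 1 : 0;
}

# Made-up sample data standing in for the real input file.
my @lines = ("good_line(1)", "bad\x01line", "it's`ok", "has space");

my @kept;
LINE: for my $line (@lines) {
    # Skip anything containing a character outside the valid set.
    next LINE unless is_clean_regex($line);
    push @kept, $line;
}

print "$_\n" for @kept;
```

Inside a character class none of ( ) ' ` _ needs a backslash, so the
escapes in the quoted one-liner are harmless but optional.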

In addition I did 2 timed runs to see the difference.  This is against a
74Meg text file with miscellaneous garbage lines strewn about.  Yes, it
could be faster; however, this is actually doing a number of other things
with the data at the same time, so it's not just a read and throw out.

RegEx:
#Loading Initial Data
#Done
#   93.02s real    92.05s user     1.01s system


TR:
#Loading Initial Data
#Done
#   92.67s real    91.80s user     0.87s system

tr is faster, but not significantly for this file (92.67s vs. 93.02s real).

I should have a multi-gig file coming my way soon so it might make a
difference worth remembering.
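
To isolate just the match cost from the rest of the processing, a rough
sketch using the core Benchmark module could look like the following.
The sub names and sample lines are made up for illustration, and the
tr here uses /c without /d so it only counts invalid characters and
leaves the line untouched (a variation on the quoted tr, not the exact
code that was timed above).  Results will depend on the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: true when the line holds only valid characters.
sub is_clean_regex {
    my ($line) = @_;
    return $line =~ /\A[A-Za-z0-9_()'`]+\z/ ? 1 : 0;
}

# Hypothetical helper: number of characters outside the valid set.
# With /c and an empty replacement list (no /d), tr only counts.
sub count_invalid_tr {
    my ($line) = @_;
    return ($line =~ tr/A-Za-z0-9_()'`//c);
}

# Made-up sample data; substitute lines from the real file.
my @lines = (
    "good_line(1)",
    "bad\x01line",
    "it's`ok",
    "x" x 200,
    ("y" x 100) . "\x02" . ("z" x 100),
);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { my $n = grep { is_clean_regex($_) } @lines },
    tr    => sub { my $n = grep { !count_invalid_tr($_) } @lines },
});
```

Note that the RegEx can stop at the first invalid character while tr
always scans the whole line, so the mix of good and bad lines matters.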


Many thanks.






Thus spake David R. Waddell (dave.waddell at wcom.com):

> Date: Wed, 29 May 2002 12:03:00 -0600
> From: "David R. Waddell" <dave.waddell at wcom.com>
> Subject: Re: Odd match?
> To: "Robert L. Harris" <Robert.L.Harris at rdlg.net>
> X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32)
> 
> I've heard that tr/// is faster than regular expression matching. However,
> I'm not sure which of these is more efficient in this case since 
> you would think the deletion step would add to the processing time
> of the tr and it is going to process the entire line even when
> it has already encountered a single non-valid character. Which
> is faster might depend on the data.
> 
> regular expression:
> unless ( m/^[A-Za-z0-9\_\(\)\'\`]+$/){next LINE}
> 
> tr:
> if(tr/A-Za-z0-9\_\(\)\'\`//dc){next LINE}
> 
> tr will return the number of characters deleted; c takes the complement
> of the character set.
> At 11:32 AM 5/29/02 -0600, you wrote:
> >
> >
> >I'm trying to rip through some data.  Unfortunately there are some
> >corrupt lines that contain some odd control characters which screw up
> >the output.  They can be simply thrown out without consequence.  What's
> >the best way to do a next on any line that contains something other
> >than:
> >
> >A-Z, a-z, 0-9, ()`'_
> >
> >The last set of 4 throws out using \w or \W.
> >
> >?
> >
> >
> >:wq!
> >---------------------------------------------------------------------------
> >Robert L. Harris                |  Micros~1 :
> >Senior System Engineer          |    For when quality, reliability
> >  at RnD Consulting             |      and security just aren't
> >                                \_       that important!
> >DISCLAIMER:
> >      These are MY OPINIONS ALONE.  I speak for no-one else.
> >FYI:
> > perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
> >
> >







More information about the Pikes-peak-pm mailing list