SUMMARY: Odd match?
Robert L. Harris
Robert.L.Harris at rdlg.net
Wed May 29 14:42:20 CDT 2002
I'm posting a summary as I received a few "please tell me what you find"
type requests.
I tried both of these. At first I went with the RegEx match. This
works great as I have a known number of valid characters and a lot of
unknowns.
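For anyone following along, here is a minimal sketch of the two filters as standalone subs. The sub names and sample lines are made up for illustration; the character set is the one from the original question. It assumes each line has already been chomped, since tr would otherwise count the trailing newline as an invalid character. It also uses tr with only the c modifier: with an empty replacement list, that counts the characters outside the set without changing the line, so no deletion step is involved.

```perl
use strict;
use warnings;

# Hypothetical helper names; the character set comes from the thread.
# Regex version: true when the line consists only of valid characters.
sub valid_by_regex {
    my ($line) = @_;
    return $line =~ /\A[A-Za-z0-9_()'`]+\z/ ? 1 : 0;
}

# tr version: with /c and an empty replacement list, tr returns the number
# of characters NOT in the set and leaves the string unchanged.
sub valid_by_tr {
    my ($line) = @_;
    return 0 if $line eq '';    # mirror the regex's "+", which rejects empty lines
    my $bad = ($line =~ tr/A-Za-z0-9_()'`//c);
    return $bad ? 0 : 1;
}

# Made-up sample lines: two clean, one with a control character, one with a space.
my @sample = ("foo_bar(1)", "it's`ok", "bad\x01line", "has space");
for my $line (@sample) {
    (my $shown = $line) =~ s/[^[:print:]]/?/g;    # keep control characters off the terminal
    printf "%-12s regex=%d tr=%d\n", $shown,
        valid_by_regex($line), valid_by_tr($line);
}
```

In a read loop this would look something like `next LINE unless valid_by_tr($_);` after a chomp. Note that without the empty-string check the tr form would accept blank lines, while the regex with `+` would not.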
In addition I did two timed runs to see the difference. This is against a
74 MB text file with miscellaneous garbage lines strewn about. Yes, it
could be faster; however, this is actually doing a number of other things
with the data at the same time, so it's not just a read and throw out.
RegEx:
#Loading Initial Data
#Done
# 93.02s real 92.05s user 1.01s system
TR:
#Loading Initial Data
#Done
# 92.67s real 91.80s user 0.87s system
tr is faster, but not significantly for this file (roughly 0.35s real out of about 93s).
I should have a multi-gig file coming my way soon so it might make a
difference worth remembering.
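To compare just the two tests in isolation, without the rest of the processing, something like the following sketch with the core Benchmark module could be used. The sample data and the sub names are invented for illustration, and the results will depend on the data and the perl version, so treat it as a starting point rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data: mostly clean lines plus a few with control characters.
my @lines = (("abc_DEF(123)'`") x 1000, ("junk\x01\x02line") x 50);

# Count the lines that pass the regex test.
sub count_regex {
    my ($lines) = @_;
    my $kept = 0;
    for my $line (@$lines) {
        $kept++ if $line =~ /\A[A-Za-z0-9_()'`]+\z/;
    }
    return $kept;
}

# Count the lines that pass the tr test (count-only form, nothing is deleted).
sub count_tr {
    my ($lines) = @_;
    my $kept = 0;
    for my $line (@$lines) {
        $kept++ unless $line =~ tr/A-Za-z0-9_()'`//c;
    }
    return $kept;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'regex'    => sub { count_regex(\@lines) },
    'tr_count' => sub { count_tr(\@lines) },
});
```

Both subs should keep the same lines on this data, so any difference in the table comes from the matching itself rather than from the surrounding work.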
Many thanks.
Thus spake David R. Waddell (dave.waddell at wcom.com):
> Date: Wed, 29 May 2002 12:03:00 -0600
> From: "David R. Waddell" <dave.waddell at wcom.com>
> Subject: Re: Odd match?
> To: "Robert L. Harris" <Robert.L.Harris at rdlg.net>
> X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32)
>
> I've heard that tr/// is faster than regular expression matching. However,
> I'm not sure which of these is more efficient in this case since
> you would think the deletion step would add to the processing time
> of the tr and it is going to process the entire line even when
> it has already encountered a single non-valid character. Which
> is faster might depend on the data.
>
> regular expression:
> unless ( m/^[A-Za-z0-9\_\(\)\'\`]+$/){next LINE}
>
> tr:
> if(tr/A-Za-z0-9\_\(\)\'\`//dc){next LINE}
>
> tr will return the number of characters deleted (c takes the complement
> of the character set).
> At 11:32 AM 5/29/02 -0600, you wrote:
> >
> >
> >I'm trying to rip through some data. Unfortunately there are some
> >corrupt lines that contain some odd control characters which screw up
> >the output. They can be simply thrown out without consequence. What's
> >the best way to do a next on any line that contains something other
> >than:
> >
> >A-Z, a-z, 0-9, ()`'_
> >
> >The last set of 4 throws out using \w or \W.
> >
> >?
> >
> >
> >:wq!
> >---------------------------------------------------------------------------
> >Robert L. Harris | Micros~1 :
> >Senior System Engineer | For when quality, reliability
> > at RnD Consulting | and security just aren't
> > \_ that important!
> >DISCLAIMER:
> > These are MY OPINIONS ALONE. I speak for no-one else.
> >FYI:
> > perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
> >
> >
More information about the Pikes-peak-pm mailing list