[Brisbane-pm] To Chomp or not to Chomp

Tue Feb 27 00:47:59 PST 2007

On Tuesday 27 February 2007 17:32, Martin Jacobs wrote:
> Hi folks,
>
> Here's another one of those 'it should be easy' scenarios. Again, I
> would appreciate your help.

See the attached. It is untested and will likely eat your hard drive, so
treat it as an example only: don't run it as is, and don't accept it as gospel.

Probably you want to process your file line by line.  

The fact that you called your variable "$file" indicates that maybe you 
thought you were slurping in the whole file.  (See "perldoc perlvar" and 
search for "slurp" to see how to slurp a whole file into a scalar).

Slurping is OK if you know for sure your file is small, but line by line often
works best.

When the code did "$file = <FOO>;" it only got the first line of the file, not
the whole file.

You'll also see "@file = <FOO>;" which does get the whole file, but again,
what if the file is very big?  Why not just process it line by line, as in the
attached?
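
To make the difference concrete, here is a rough sketch (not the attached script). The three-line sample data and the variable names are invented for illustration; the in-memory filehandle just saves needing a file on disk:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real file.
my $data = "first line\nsecond line\nthird line\n";

# Scalar context: a single read returns a single line.
open my $peek, '<', \$data or die "Cannot open in-memory file: $!";
my $first = <$peek>;
close $peek;

# Line by line: only one line is held in memory at a time.
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my $line_count = 0;
my $longest    = 0;
while ( my $line = <$in> ) {
    chomp $line;                      # drop the trailing newline
    $line_count++;
    $longest = length $line if length $line > $longest;
}
close $in;

print "read $line_count lines, longest is $longest characters\n";
```

The while loop never builds an array of lines, so it behaves the same whether the file has three lines or seven million.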

> This time, I've got a borrowed 'chomp' regex (chomp behaves in
> exactly the same way). 

Why exactly?  Why not use chomp?

"perldoc -f chomp"

Or if you need to extract information from each line, why bother chomping?
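
A small sketch of both points, using a made-up line of input. Note that chomp changes its argument in place and returns the number of characters removed, not the trimmed string:

```perl
use strict;
use warnings;

# Hypothetical input line, newline still attached.
my $line = "Job Number:   J2700\n";

# Copy the line, chomp the copy, and keep chomp's return value.
my $removed = chomp( my $trimmed = $line );
print "chomp removed $removed character(s)\n";

# No chomp needed to extract fields: $ matches just before a trailing
# newline, so the captures come out clean from the unchomped $line.
my ( $field, $value ) = $line =~ /^([^:]+):\s+(.*?)$/;
print "field is '$field', value is '$value'\n";
```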

> But, instead of chopping off the last 
> character in the string, it returns the value '1'.

The s/// is the substitution operator.

Look at "perldoc -q space" for how to remove spaces from a line like this 
using regex.
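
From memory, the FAQ answer boils down to two substitutions, something like the sketch below. The sample string is invented:

```perl
use strict;
use warnings;

# Hypothetical line with leading and trailing whitespace.
my $text = "   Job Number:   J2700  \n";

$text =~ s/^\s+//;    # strip leading whitespace
$text =~ s/\s+$//;    # strip trailing whitespace, including the newline

print "[$text]\n";
```

Whitespace in the middle of the string is left alone; only the two ends are trimmed.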

Below is the "why" for the "1".

>
> I can only guess that this is because Perl is working in 'list mode',
> not in 'paragraph mode' (see here), which makes sense because the
> file it is reading from is a list. And its chopping off 1 character
> (the new line character).

The bind operator =~ causes the regex to operate on what you bind it to.  Here
you bind s///, which modifies whatever it's bound to.

When you assign the *result of a bind* in *list context* you get back a
list of all the match capture values:

# $line (and $line_too) contains "Job Number:   J2700"

# some regex
my $field_regex = qr/^([^:]+):\s+(.*?)$/ ;

# assign the result of the bind to a list
my ( $field, $value ) = $line =~ m/$field_regex/ ;

# so now $field and $value are equal to the capture values (in round brackets)
# from the above regex, eg "Job Number" and "J2700"

# assign to a scalar and m// just tells you whether it matched:
# 1 for a match, empty string for no match (it is not a count of captures)
my $matched = $line_too =~ m/$field_regex/ ;

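In scalar context s/// returns the number of substitutions it made, which is
where your "1" comes from.  In your code you actually altered $file and then
threw the alteration away: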

   $num_of_matches = $file =~ s/[\r\n]+$//;   # not recommended, use chomp

   # $file is now altered by the s/// substitution operator

   $file = $num_of_matches;  # you overwrote your alteration
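
If you do want to stay with s///, here is a sketch of how to keep the altered string rather than the count. The sample lines and variable names are made up:

```perl
use strict;
use warnings;

# Hypothetical line with a Windows-style line ending.
my $line = "Job Number:   J2700\r\n";

# s/// edits $line in place; what it returns is the number of
# substitutions made, so keep that separate from the string itself.
my $count = $line =~ s/[\r\n]+$//;
print "made $count substitution(s), line is now '$line'\n";

# To leave the original alone, copy first and bind s/// to the copy.
my $original = "Client:   Example\n";
( my $copy = $original ) =~ s/[\r\n]+$//;
print "copy is '$copy'\n";
```

The parentheses around the assignment matter: they make s/// operate on $copy instead of assigning the return value of s/// to it.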

> Please tell me how to get Perl doing the right thing here. I am going
> to need to get it right for my 7 million line long arrays in future.
> I could resort to substr, but that's not as efficient (apparently),
> and I've got to get to grips with regexes anyway.

Can I suggest getting hold of a copy of the "Perl Cookbook" which has lots of 
good recipes for doing stuff, and also the context of how to use them.

The attached might get you started.
>
> I enclose the relevant code. PERRMOSS_Control_File.txt needs to be in
> the same directory as Input_file_chomp.pl.
>
> Regards,
> Martin
> Visit my website...
> http://web.mac.com/martin_jacobs1
>
> 

-- 
Sarah Smith BSc MACS
Senior Software Engineer
Ph +61 7 321 999 06 x109
Trolltech (Australia) Pty Ltd
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Input_file_chomp2.pl
Type: application/x-perl
Size: 554 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/brisbane-pm/attachments/20070227/7e2a7089/attachment.bin