[Brisbane-pm] To Chomp or not to Chomp

Tue Feb 27 01:24:09 PST 2007

G'day Martin,

I'm glad you're persevering with this; it's nice to see some life on this list.

Martin Jacobs wrote:

> This time, I've got a borrowed 'chomp' regex (chomp behaves in exactly
> the same way). But, instead of chopping off the last character in the
> string, it returns the value '1'.

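chomp does not behave in exactly the same way at all.  chomp is much better
behaved.  chomp removes whatever is in $/ (the input record separator) from the
end of your string, and nothing else.  By default that is "\n", which by the
time your program sees it stands for your platform's own line ending (LF on
*nix, CR-LF on Win32, CR on Mac OS 9).  Further, you can change $/ (or
$INPUT_RECORD_SEPARATOR if you "use English") and have chomp adjust itself
appropriately.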
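
If it helps to see chomp following $/ around, here is a minimal sketch.  The
strings are made up for illustration (borrowed from your control file), and the
";" separator is just an assumption to show the idea:

```perl
use strict;
use warnings;

# By default $/ is "\n", so chomp removes one trailing newline.
my $line = "Capalaba\n";
chomp $line;                 # $line is now "Capalaba"

# With a different $/, chomp removes that string instead.
my $record = "J2700;";
{
    local $/ = ";";          # restored at the end of this block
    chomp $record;           # $record is now "J2700"
}

# chomp only removes what $/ says, so a stray "\r" survives.
my $dos_line = "Scenario 1\r\n";
chomp $dos_line;             # still ends in "\r"

print "[$line] [$record]\n";
```

The last case is worth remembering if you ever read a Windows-written file on
a *nix box: chomp takes the "\n" and leaves the "\r" behind.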

> #!/usr/local/bin/perl
> use warnings;
> use strict;
> 
> open (PCF, "PERRMOSS_Control_File.txt") or die "couldn't open the file!";
> my $file = <PCF>;
> print 'Testprint line 07 $file = '."$file\n";
> $file = $file =~ s/[\r\n]+$//;
> print 'Testprint line 09 $file = '."$file\n";

As Sarah has said, this only reads a single line in from your file.  It then
removes one or more carriage returns and line feeds (in any order) at the end of
the line.  Once it's done that, s/// returns the number of substitutions made (1),
and because =~ binds more tightly than =, that 1 is what gets assigned to $file.
Oops!

To fix this all you need to do is:

	$file =~ s/[\r\n]+$//;

without the assignment.  Alternatively

	chomp $file;

is easy-to-maintain and known to be correct.
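
To see the difference between what s/// changes and what it returns, here is a
small sketch.  The $count variable is only there for illustration, and the
string stands in for a line read from your file:

```perl
use strict;
use warnings;

my $file = "PERRMOSS_Results_Summary.txt\r\n";

# s/// changes $file in place and returns the number of substitutions.
my $count = ($file =~ s/[\r\n]+$//);

print "count = $count\n";    # count = 1
print "file  = $file\n";     # file  = PERRMOSS_Results_Summary.txt
```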

If you plan to read the whole file into memory and chomp it all, then you can write:

	#!/usr/local/bin/perl
	use warnings;
	use strict;

	open (PCF, "<", "PERRMOSS_Control_File.txt")
		or die "couldn't open the file: $!";
	my @file = <PCF>;	# list context: one line per element
	print 'Testprint line 07 @file = '."@file\n";
	chomp @file;		# removes the newline from every element
	print 'Testprint line 09 @file = '."@file\n";

This will read the whole file into memory (one line per array element) and chomp
each line.  The return value of chomp(), if you cared, would be the total number
of characters removed, which with the default $/ is the number of newlines
removed.
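
A quick sketch of that return value, using a made-up three-line list in place
of your file (the last element deliberately has no newline):

```perl
use strict;
use warnings;

my @file = ("Input\n", "Results\n", "360");   # last line has no newline

# chomp on a list returns the total number of characters removed.
my $removed = chomp @file;

print "removed = $removed\n";    # removed = 2
print "lines   = @file\n";       # lines   = Input Results 360
```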

If eventually you find that your files contain multiple jobs, separated by some
easily identifiable string, Perl can make that easy too.  For example, imagine
your text looks like:

... parts from previous record
Catchment02.txt
Catchment03.txt
----
Job Number:  J2700
Job Name:    Capalaba
Scenario:    Scenario 1
Author:      Rob Ot
Input
Results
PERRMOSS_Results_Summary.txt
24/02/1990 23:54:00
25/02/1990 01:54:30
360
1
Catchment11.txt
Catchment12.txt
Catchment13.txt
----

such that each record is separated by "\n----\n" then you can do the following:

	open(PCF, "<", "PERRMOSS_Control_File.txt")
		or die "couldn't open the file!";

	# Change our record separator to represent end of record
	local $/ = "\n----\n";

	while( my $record = <PCF> ) {
		chomp $record; # removes \n----\n

		# process record
	}
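
If you'd like to try this without touching your real control file, here is a
minimal sketch that reads from a string through an in-memory filehandle.  The
second job (J2701, Catchment21.txt) is invented for the example, and splitting
each record on "\n" is only one assumption about what "process record" might
mean:

```perl
use strict;
use warnings;

my $text = "Job Number:  J2700\nCatchment11.txt\n----\n"
         . "Job Number:  J2701\nCatchment21.txt\n----\n";

# Open a read-only filehandle on the string instead of a real file.
open(my $fh, "<", \$text) or die "couldn't open in-memory file: $!";

my @records;
{
    local $/ = "\n----\n";           # end-of-record marker
    while (my $record = <$fh>) {
        chomp $record;               # strips the trailing "\n----\n"
        push @records, [ split /\n/, $record ];
    }
}
close $fh;

print scalar(@records), " records\n";        # 2 records
print "first job line: $records[0][0]\n";    # first job line: Job Number:  J2700
```

Putting the "local $/" inside its own block means the separator goes back to
"\n" as soon as the loop is finished.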

Finally, please, pretty please, specify a mode when opening your files.  I know
that naked two argument open works, but it's a bad habit to get into: Perl looks
for the mode inside the filename string, so a filename starting with ">" or
ending with "|" does something very different from reading a file.  Please either
use non-naked 2-arg:
	open (PCF, "< PERRMOSS_Control_File.txt")
                    ^
or 3-arg
	open(PCF, "<", "PERRMOSS_Control_File.txt")
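
For what it's worth, here is a sketch of how I might wrap the 3-arg form up.
The read_lines helper and the temporary file are my own inventions for the
example, not anything from your script, and the lexical filehandle ($fh) is
just one common style:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the original script): returns the
# chomped lines of the named file.
sub read_lines {
    my ($path) = @_;

    # 3-arg open: the mode is separate, so $path is only ever a filename.
    open(my $fh, "<", $path) or die "couldn't open '$path': $!";
    my @lines = <$fh>;
    close $fh;

    chomp @lines;
    return @lines;
}

# Demonstration on a throwaway file so the sketch runs anywhere.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "Input\nResults\n";
close $tmp_fh;

my @lines = read_lines($tmp_name);
print scalar(@lines), " lines read\n";    # 2 lines read
```

Including $! in the die message tells you why the open failed (missing file,
permissions and so on), which saves a lot of guessing.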

If you want reference material on any of the above, please let me know.

All the best,

	Jacinta

-- 
   ("`-''-/").___..--''"`-._          |  Jacinta Richardson         |
    `6_ 6  )   `-.  (     ).`-.__.`)  |  Perl Training Australia    |
    (_Y_.)'  ._   )  `._ `. ``-..-'   |      +61 3 9354 6001        |
  _..`--'_..-_/  /--'_.' ,'           | contact at perltraining.com.au |
 (il),-''  (li),'  ((!.-'             |   www.perltraining.com.au   |