[Pdx-pm] data munging: line-endings

Colin Kuskie ckuskie at dalsemi.com
Tue Oct 14 14:13:45 CDT 2003


On Tue, Oct 14, 2003 at 11:48:09AM -0700, Thomas Keller wrote:
> 
> my %conv = ( Mac  => "\cM",       ## Mac used CR
>              Unix => "\cJ",       ## Unix uses LF
>              Win  => "\cM\cJ" );  ## Windows uses both

I understand.  Because of the Mac requirement, you can't rely on
\n always being there.

In that case, you could build an automatic line ending detector
like this:

1) Slurp the entire file in by setting $/ to undef.
2) Starting at the top of the file, go through character by
   character until you reach something that looks like a
   line ending.
   if char eq "\cM" then
     if next char eq "\cJ" then
       line_ending = DOS; done
     else
       line_ending = MAC; done
   if char eq "\cJ" then
     line_ending = UNIX; done
3) Using split, create an array of lines based on line_ending
4) Iterate over the array of lines 

For large files (think megabytes here), this will be memory intensive,
since you hold two copies of the file (slurped and split).  If that
is a problem, use sysread to read just enough of the file to determine
the line ending.  Once you know which kind of file you have, seek back
to the top of the file, set $/ to the appropriate line_ending, and keep
using good old while(<>) to iterate through the file.
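
A rough sketch of that lower-memory variant follows.  One swap to note:
it uses buffered read() with seek() rather than sysread, because
perlfunc cautions against mixing sysread with buffered reads on the
same handle.  The sub names (sniff_line_ending, each_line), the 4096
byte chunk size, and the fallback to "\cJ" when no ending is found are
all assumptions made for illustration:

```perl
use strict;
use warnings;

# Read the file a chunk at a time until a CR or LF shows up, then
# rewind.  Returns "\cM\cJ", "\cM", "\cJ", or undef if none is found.
sub sniff_line_ending {
    my ($fh) = @_;
    my $eol;
    while (!defined $eol) {
        my $n = read($fh, my $buf, 4096);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                     # end of file, nothing found
        next unless $buf =~ /([\cM\cJ])/;    # no candidate in this chunk
        if ($1 eq "\cJ") { $eol = "\cJ"; last }
        my $next = substr($buf, $+[0], 1);   # character after the CR
        if ($next eq '') {
            # The CR was the last byte of the chunk; peek one byte more.
            my $m = read($fh, $next, 1);
            die "read failed: $!" unless defined $m;
        }
        $eol = $next eq "\cJ" ? "\cM\cJ" : "\cM";
    }
    seek($fh, 0, 0) or die "Cannot rewind: $!";
    return $eol;
}

# Call $callback once per line, with the line ending already removed.
sub each_line {
    my ($path, $callback) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $eol = sniff_line_ending($fh);
    local $/ = defined $eol ? $eol : "\cJ";
    while (my $line = <$fh>) {
        chomp $line;                         # chomp strips whatever $/ holds
        $callback->($line);
    }
    close $fh;
}
```

For example, "each_line($file, sub { print length($_[0]), "\n" })" prints
the length of every line, whichever of the three endings the file uses.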

Colin


