[Pdx-pm] data munging: line-endings
Colin Kuskie
ckuskie at dalsemi.com
Tue Oct 14 14:13:45 CDT 2003
On Tue, Oct 14, 2003 at 11:48:09AM -0700, Thomas Keller wrote:
>
> my %conv = ( Mac  => "\cM",       ## Mac used CR
>              Unix => "\cJ",       ## Unix uses LF
>              Win  => "\cM\cJ" );  ## Windows uses both
I understand. Because of the Mac requirement, you can't rely on
\n always being there.
In that case, you could build an automatic line ending detector
like this:
1) Slurp the entire file in by setting $/ to undef.
2) Starting at the top of the file, go through character by
character until you reach something that looks like a
line ending.
     if char eq "\cM" then
         if next char eq "\cJ" then
             line_ending = DOS; done
         else
             line_ending = MAC; done
     if char eq "\cJ" then
         line_ending = UNIX; done
3) Using split, create an array of lines based on line_ending
4) Iterate over the array of lines
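Here is a minimal sketch of steps 1 through 3, untested against your data.
The sub names detect_line_ending and read_lines are made up for this
example. Instead of walking the string one character at a time, it lets
the regex engine find the first CR or LF; the alternation tries "\cM\cJ"
before a bare "\cM", which gives the same DOS/Mac decision as the
pseudocode.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line ending found in $text
# ("\cM\cJ", "\cM" or "\cJ"), or undef if the text contains none.
sub detect_line_ending {
    my ($text) = @_;
    return $1 if $text =~ /(\cM\cJ|\cM|\cJ)/;
    return;
}

# Hypothetical helper: slurp $path and return its lines with the
# detected ending removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                              # leave CR and LF bytes alone
    my $text = do { local $/; <$fh> };        # step 1: slurp the whole file
    close $fh;
    return () unless defined $text && length $text;

    my $ending = detect_line_ending($text);   # step 2: find the ending
    return ($text) unless defined $ending;    # one line, no ending at all
    return split /\Q$ending\E/, $text;        # step 3: split into lines
}
```

Step 4 is then just "for my $line (read_lines($file)) { ... }". Note that
split drops trailing empty fields, so blank lines at the very end of the
file are not returned.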
For large files (think megabytes here), this will be memory intensive
since you hold two copies of the file (slurped and split). If that
is a problem, use sysread to do the line-ending detection. Once you've
determined which kind of file you have, seek back to the top of the
file, set $/ to the appropriate line_ending, and keep using good old
while(<>) to iterate through the file.
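A rough sketch of that lower-memory variant follows. The names
sniff_line_ending and each_line and the 4096-byte chunk size are my own
choices for illustration. It rewinds with sysseek, since that is the
counterpart perlfunc recommends when the reads were done with sysread.
The only fiddly part is a CR that lands on the last byte of a chunk: you
have to look at the next chunk before deciding between DOS and Mac.

```perl
use strict;
use warnings;

# Hypothetical helper: read chunks until a line ending shows up, then
# rewind the handle. Returns the ending, or undef if there is none.
sub sniff_line_ending {
    my ($fh) = @_;
    my ($ending, $prev_cr);
    while (1) {
        my $n = sysread($fh, my $buf, 4096);
        die "Read failed: $!" unless defined $n;
        last if $n == 0;                       # end of file
        if ($prev_cr) {                        # last chunk ended on a CR
            $ending = $buf =~ /\A\cJ/ ? "\cM\cJ" : "\cM";
            last;
        }
        # A CR on the final byte is ambiguous, so it is not matched here.
        if ($buf =~ /(\cM\cJ|\cM(?!\z)|\cJ)/) {
            $ending = $1;
            last;
        }
        $prev_cr = $buf =~ /\cM\z/;
    }
    $ending = "\cM" if !defined $ending && $prev_cr;  # file ends in lone CR
    sysseek($fh, 0, 0) or die "Cannot rewind: $!";
    return $ending;
}

# Hypothetical helper: call $callback once per line of $path.
sub each_line {
    my ($path, $callback) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $ending = sniff_line_ending($fh);
    local $/ = defined $ending ? $ending : "\n";
    while (my $line = <$fh>) {
        chomp $line;                           # chomp strips the current $/
        $callback->($line);
    }
    close $fh;
}
```

Something like each_line($file, sub { print "$_[0]\n" }) would then
print every line with the native ending, holding only one line in memory
at a time.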
Colin