SPUG:Split question

Michael R. Wolf MichaelRunningWolf at att.net
Sun Mar 23 13:31:18 CST 2003


"Michael R. Wolf" <MichaelRunningWolf at att.net> writes:

> Better illustration of trailing empty fields.....

But also illustrating a bug...  

IMO, the delimiter pattern should not be free to match any member of
the character class at every field boundary.  It should have that
freedom at the first boundary only; after that, it should stick with
whichever delimiter it matched first.

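while (my $line = <DATA>) {
    chomp $line;
    # The capturing parens keep the delimiters in the result;
    # the -1 limit keeps trailing empty fields.
    my @fields = split /([ ,:])/, $line, -1;

    # Quote the fields (even indices); leave the delimiters (odd) alone.
    foreach my $field (@fields[grep {$_%2 == 0} 0..$#fields]) {
        $field = qq("$field");
    }

    print join("|", @fields), "\n";
}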
    
__DATA__
comma,separated,line
comma,separated,line,,empty,fields,,,,
colon:separated:line
colon:separated:line:But, as promised, with a bug
space separated line

Output:
"comma"|,|"separated"|,|"line"
"comma"|,|"separated"|,|"line"|,|""|,|"empty"|,|"fields"|,|""|,|""|,|""|,|""
"colon"|:|"separated"|:|"line"
"colon"|:|"separated"|:|"line"|:|"But"|,|""| |"as"| |"promised"|,|""| |"with"| |"a"| |"bug"
"space"| |"separated"| |"line"
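
For anyone less familiar with the two split features the code above
leans on, here is a minimal sketch of each in isolation.  The helper
name show() and the sample line are just for illustration; they are
not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: join a list with "|", quoting every element so
# that empty strings stay visible.
sub show { return join("|", map { qq("$_") } @_) }

my $line = "a,b,,";

# No limit: trailing empty fields are silently dropped.
print show(split /,/, $line), "\n";          # "a"|"b"

# Negative limit: trailing empty fields are kept.
print show(split /,/, $line, -1), "\n";      # "a"|"b"|""|""

# Capturing parens in the pattern: the delimiters come back too,
# interleaved with the fields (fields at even indices).
print show(split /(,)/, $line, -1), "\n";    # "a"|","|"b"|","|""|","|""
```

That interleaving is why the loop above only quotes the elements at
even indices.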

================================================================

Here's my solution.  Each line chooses its own delimiter (the first one
in a fixed preference order that actually splits the line), and that
delimiter is then enforced for all of the line's fields.  It would be
fairly easy to modify so that the first delimiter on the first line
sets the delimiter for the remainder of the file.

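while (my $line = <DATA>) {
    chomp $line;

    # was --- @fields = split /([ ,:])/, $line, -1;
    # Try each delimiter in preference order and keep the first one
    # that actually splits the line.  \Q...\E guards against regex
    # metacharacters if the list is ever extended.
    my @delim_pref_order = (":", ",", " ");
    my @fields;
    foreach my $delim (@delim_pref_order) {
        @fields = split /(\Q$delim\E)/, $line, -1;
        last if @fields > 2;    # The multi-field case.
    }
    # A line with no delimiter is already a single field at this
    # point; split returns an empty list only for an empty line.
    @fields = ($line) unless @fields;

    # Quote the fields (even indices); leave the delimiters (odd) alone.
    foreach my $field (@fields[grep {$_%2 == 0} 0..$#fields]) {
        $field = qq("$field");
    }

    print join("|", @fields), "\n";
}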
    
__DATA__
comma,separated,line
comma,separated,line,,empty,fields,,,,
colon:separated:line
colon:separated:line:But, as promised, with a bug
space separated line
solitary-field

Output:
"comma"|,|"separated"|,|"line"
"comma"|,|"separated"|,|"line"|,|""|,|"empty"|,|"fields"|,|""|,|""|,|""|,|""
"colon"|:|"separated"|:|"line"
"colon"|:|"separated"|:|"line"|:|"But, as promised, with a bug"
"space"| |"separated"| |"line"
"solitary-field"
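
The file-wide variation mentioned above might look roughly like the
sketch below.  Two assumptions of mine, not part of the script above:
the delimiter is taken to be whichever of the three characters appears
first in the line (rather than going by the preference list), and if
the first line contains no delimiter at all, the choice is deferred to
the first line that does.  The names first_delim() and
split_file_wide() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first delimiter character that
# appears in $line, or undef if the line contains none of them.
sub first_delim {
    my ($line) = @_;
    return $line =~ /([ ,:])/ ? $1 : undef;
}

# Hypothetical helper: split every line on the delimiter chosen by the
# first line that contains one.  Returns a list of array refs, each
# holding fields and delimiters interleaved as in the script above.
sub split_file_wide {
    my @lines = @_;
    my $delim;
    my @rows;
    foreach my $line (@lines) {
        $delim = first_delim($line) unless defined($delim);
        my @fields = defined($delim)
            ? split(/(\Q$delim\E)/, $line, -1)
            : ();
        @fields = ($line) unless @fields;    # no delimiter yet, or empty line
        push @rows, \@fields;
    }
    return @rows;
}

my @rows = split_file_wide(
    "comma,separated,line",
    "colon:separated:line:But, as promised, with a bug",
    "solitary-field",
);
print join("|", @$_), "\n" for @rows;
```

With that sample input the comma wins on the first line, so the second
line is split on its commas only and its colons stay inside the first
field.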


And Yitzchak, I agree, it's interesting how many ways this went.  I,
too, thought of the look-ahead, look-behind "root of this equation"
upon first read of the problem, but pursued the thread as it was
further refined.

-- 
Michael R. Wolf
    All mammals learn by playing!
        MichaelRunningWolf at att.net



