[ABE.pm] perl regex Q
Jim Eshleman
jce0 at Lehigh.EDU
Wed Dec 15 00:35:49 CST 2004
>>>>This:
>>>>
>>>>perl -ne 'chomp;print join(",",map {$_ or "\\N"} split ",",$_,-1), "\n"'
>>>>
>>>>should work and handle the corner cases of leading and/or trailing
>>>>commas. Likely not as fast as a regex but *maybe* easier to understand.
>>>
>>>
>>>I think you mean:
>>>perl -ne 'chomp;print join(",",map {defined $_ ? $_ : "\\N"} split
>>>",",$_,-1), "\n"'
>>>
>>>Otherwise, a 0 will become a \N :)
>>
>>Actually, that doesn't work at all (just reproduces the input string)
>>because when there's no value between delimiters split returns the empty
>>string ('') and that *is* defined.
>>
>> What we really meant is:
>>
>>perl -ne 'chomp; print join(",", map { $_ eq "" ? "\\N" : $_ } split
>>",", $_, -1), "\n"'
>>
>> Now we can process zeros (0) and zeros (0.0) :)
>
>
> Either of you boys wanna explain this to us mere mortals?
Ok, gotta read from right to left. First the split:
@valuesin = split ",", $_, -1;
This returns a list (to be used by map) that contains the comma
delimited fields. The delimiters are thrown away. Where the field is
null (between two consecutive commas, or between the beginning of the
string and a leading comma, or between a trailing comma and the end of
the string) the list element will be the empty string (''), or null.
Note that by default split doesn't return trailing nulls (as when your
input string ends in one or more consecutive commas), so you'll need to
specify a negative LIMIT, -1 in this case. So using your example the
list returned by split is:
(1.2, 3.4, '', 5.67, 78.9, '', '', '', '', 0.01)
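To see what the negative LIMIT buys you, here is a small sketch. The input string is a made-up one (not from the original question), chosen so it has one leading and two trailing empty fields:

```perl
use strict;
use warnings;

# Hypothetical input: one leading empty field, two trailing empty fields.
my $line = ",a,b,,";

# Default split: trailing empty fields are dropped, the leading one is kept.
my @default = split ",", $line;

# Negative LIMIT: every field is returned, including trailing empties.
my @all = split ",", $line, -1;

printf "default:  %d fields\n", scalar @default;
printf "limit -1: %d fields\n", scalar @all;
```

The default call should report 3 fields and the -1 call should report 5.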
Now we feed that list to map:
@valuesout = map { $_ eq "" ? "\\N" : $_ } @valuesin;
The map function iterates over @valuesin, internally setting $_ to each
element and evaluating the code block. The output of map is a list of
the result of each of these evaluations. The code block evaluates to
"\\N" if the list element is null, otherwise it evaluates to the list
element itself. So the list returned by map is:
(1.2, 3.4, '\N', 5.67, 78.9, '\N', '\N', '\N', '\N', 0.01)
Now we join the list elements returned by map together, delimited by commas:
join ",", @valuesout;
which returns a string:
'1.2,3.4,\N,5.67,78.9,\N,\N,\N,\N,0.01'
Just add a newline and you're done.
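Putting the three steps together as a script rather than a one-liner might look like the sketch below. The sub name null_fill is made up for illustration, and the input is the string implied by the example lists above:

```perl
use strict;
use warnings;

# null_fill is a hypothetical helper name, not part of the original one-liner.
# It replaces every empty comma-delimited field with the two characters \N.
sub null_fill {
    my ($line) = @_;
    my @valuesin  = split ",", $line, -1;                      # keep trailing empties
    my @valuesout = map { $_ eq "" ? "\\N" : $_ } @valuesin;   # '' becomes \N
    return join ",", @valuesout;
}

print null_fill("1.2,3.4,,5.67,78.9,,,,,0.01"), "\n";
# prints 1.2,3.4,\N,5.67,78.9,\N,\N,\N,\N,0.01
```

One caveat: splitting a completely empty string returns an empty list, so an empty input line comes back as an empty line rather than as \N.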
My first attempt used the map code block:
{$_ or "\\N"}
which would evaluate to $_ if $_ was true (not null), else '\N'. The
problem pointed out by Ric was that if the list element was zero (0) it
evaluates to '\N', because zero is false. Interestingly a different
zero ('0.0') here is true, since split returns strings and the string
'0.0' is neither '' nor '0' :-)
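A quick way to check the truthiness behaviour is to run the first-attempt block over a few hand-picked strings. The sample values here are my own, not from the original data:

```perl
use strict;
use warnings;

# Hypothetical sample fields, as strings, the way split would return them.
my @in = ("1.2", "0", "0.0", "");

# The first-attempt block: anything false becomes \N.
my @out = map { $_ or "\\N" } @in;

print join(",", @out), "\n";
# prints 1.2,\N,0.0,\N
```

The string "0" is false and gets clobbered, while the string "0.0" is true and survives.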
Ric suggested:
{defined $_ ? $_ : "\\N"}
which evaluates to $_ if $_ is defined, else '\N'. However split
returns a list element of '' for a null, not the value undef. With this
block map just returns the same list it was fed by split.
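The sketch below checks that claim on a short made-up input: every field split returns is defined, so the defined test never fires and the string comes back unchanged.

```perl
use strict;
use warnings;

# Hypothetical input with one empty field in the middle.
my @fields = split ",", "a,,b", -1;

# Count how many fields are undef; split gives '' for empties, never undef.
my $undef_count = grep { !defined $_ } @fields;

# The defined-based block therefore changes nothing.
my $rejoined = join ",", map { defined $_ ? $_ : "\\N" } @fields;

print "undef fields: $undef_count\n";
print "$rejoined\n";
```

This should report zero undef fields and print the input string a,,b as it was.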
Finally, this block should work as intended:
{ $_ eq "" ? "\\N" : $_ }
as it evaluates to '\N' if the list element is null, else $_.
Anyway, this approach might be needed, rather than the regex, if you
need to do some processing on those comma delimited values. For example,
to round them to one digit after the decimal you could use this code
block for the map:
{ $_ eq "" ? "\\N" : sprintf "%.1f", $_ }
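As a rough sketch of that rounding variant in script form (round_fields is a made-up name, and I'm assuming every non-empty field is numeric, otherwise sprintf will warn under warnings):

```perl
use strict;
use warnings;

# round_fields is a hypothetical helper name for illustration.
# Empty fields become \N; all other fields are assumed numeric
# and are formatted with one digit after the decimal point.
sub round_fields {
    my ($line) = @_;
    return join ",",
        map { $_ eq "" ? "\\N" : sprintf "%.1f", $_ }
        split ",", $line, -1;
}

print round_fields("1.2,3.4,,5.67,78.9,,,,,0.01"), "\n";
# prints 1.2,3.4,\N,5.7,78.9,\N,\N,\N,\N,0.0
```

Note that a bare 0 comes out as 0.0 with this block, which may or may not be what you want in the output file.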