[ABE.pm] perl regex Q

Jim Eshleman jce0 at Lehigh.EDU
Wed Dec 15 00:35:49 CST 2004


>>>>This:
>>>>
>>>>perl -ne 'chomp;print join(",",map {$_ or "\\N"} split ",",$_,-1), "\n"'
>>>>
>>>>should work and handle the corner cases of leading and/or trailing 
>>>>commas.  Likely not as fast as a regex but *maybe* easier to understand.
>>>
>>>
>>>I think you mean: 
>>>perl -ne 'chomp;print join(",",map {defined $_ ? $_ : "\\N"} split ",",$_,-1), "\n"'
>>>
>>>Otherwise, a 0 will become a \N    :)
>>
>>Actually, that doesn't work at all (just reproduces the input string) 
>>because when there's no value between delimiters split returns the empty 
>>string ('') and that *is* defined.
>>
>>  What we really meant is:
>>
>>perl -ne 'chomp; print join(",", map { $_ eq "" ? "\\N" : $_ } split ",", $_, -1), "\n"'
>>
>>  Now we can process zeros (0) and zeros (0.0)  :)
> 
> 
> Either of you boys wanna explain this to us mere mortals?

Ok, gotta read from right to left.  First the split:

   @valuesin = split ",", $_, -1;

This returns a list (to be used by map) that contains the comma-delimited 
fields.  The delimiters are thrown away.  Where a field is null (between 
two consecutive commas, between the beginning of the string and a leading 
comma, or between a trailing comma and the end of the string) the list 
element will be the empty string (''), or null.  Note that by default 
split doesn't return trailing nulls (as when your input string ends in 
one or more consecutive commas), so you'll need to specify a negative 
LIMIT, -1 in this case.  So using your example, the list returned by 
split is:

   (1.2, 3.4, '', 5.67, 78.9, '', '', '', '', 0.01)
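If you want to see the LIMIT effect for yourself, here's a small script.  
The input string is an assumption (reconstructed from the list above, with 
two trailing commas tacked on so there's something for split to drop).

```perl
use strict;
use warnings;

# Assumed input: the example fields plus two trailing commas.
my $line = "1.2,3.4,,5.67,78.9,,,,,0.01,,";

# No LIMIT: trailing empty fields are stripped (inner ones are kept).
my @default = split ",", $line;

# LIMIT of -1: every field is kept, trailing empties included.
my @all = split ",", $line, -1;

print "without LIMIT: ", scalar(@default), " fields\n";   # prints 10
print "with -1:       ", scalar(@all), " fields\n";       # prints 12
```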

Now we feed that list to map:

   @valuesout = map { $_ eq "" ? "\\N" : $_ } @valuesin;

The map function iterates over @valuesin, internally setting $_ to each 
element and evaluating the code block.  The output of map is a list of 
the result of each of these evaluations.  The code block evaluates to 
"\\N" if the list element is null, otherwise it evaluates to the list 
element itself.  So the list returned by map is:

   (1.2, 3.4, '\N', 5.67, 78.9, '\N', '\N', '\N', '\N', 0.01)
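Here's that map step on its own, as a sketch; @valuesin is just the list 
above typed in by hand.  Note that "\\N" in double quotes is the 
two-character string \N (a backslash followed by N), not a newline.

```perl
use strict;
use warnings;

# The list split returned for the example, typed in by hand.
my @valuesin = ('1.2', '3.4', '', '5.67', '78.9', '', '', '', '', '0.01');

# Empty fields become the literal two-character string \N;
# every other field is passed through unchanged.
my @valuesout = map { $_ eq "" ? "\\N" : $_ } @valuesin;

print join(" | ", @valuesout), "\n";
```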

Now we join the list elements returned by map together, delimited by commas:

   join ",", @valuesout;

which returns a string:

   '1.2,3.4,\N,5.67,78.9,\N,\N,\N,\N,0.01'

Just add a newline and you're done.
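Putting the three steps together, here's the one-liner unrolled into a 
plain script, a sketch only.  The sub name null_fill and the sample lines 
are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the one-liner's split -> map -> join, as a sub.
sub null_fill {
    my ($line) = @_;
    chomp $line;                                 # drop the newline, like the -n loop's chomp
    my @valuesin  = split ",", $line, -1;        # keep leading, inner and trailing nulls
    my @valuesout = map { $_ eq "" ? "\\N" : $_ } @valuesin;
    return join(",", @valuesout) . "\n";         # rejoin with commas, add the newline back
}

# Assumed sample lines covering the corner cases: leading comma,
# trailing comma, zeros, and the example from this thread.
my @samples = (
    ",1.2,3.4\n",
    "1.2,3.4,\n",
    "0,0.0,\n",
    "1.2,3.4,,5.67,78.9,,,,,0.01\n",
);

print null_fill($_) for @samples;
```

To run it as a filter like the -n one-liner, swap the sample loop for 
while (my $line = <>) { print null_fill($line) }.  One corner case it 
shares with the one-liner: split on an empty string returns an empty list 
whatever the LIMIT, so a blank input line comes out blank rather than as 
a single \N.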

My first attempt used the map code block:

     {$_ or "\\N"}

which would evaluate to $_ if $_ was true (not null), else '\N'.  The 
problem pointed out by Ric was that if the list element was zero (0) it 
evaluates to '\N', because zero is false.  Interestingly, a different 
zero ('0.0') is true here, since the only false strings in Perl are '' 
and '0' :-)
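A quick way to see the truth test at work, sketched with a few 
hand-picked field values (assumed, not taken from your data):

```perl
use strict;
use warnings;

# Hand-picked field values, as split would return them (all strings).
my @fields = ('1.2', '0', '0.0', '');
my %result;

for my $f (@fields) {
    # Same test as the {$_ or "\\N"} block; || is used here because
    # "or" binds more loosely than "=" and would break the assignment.
    $result{$f} = $f || "\\N";
    printf "%-6s -> %s\n", "'$f'", $result{$f};
}
```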

Ric suggested:

   {defined $_ ? $_ : "\\N"}

which evaluates to $_ if $_ is defined, else '\N'.  However split 
returns a list element of '' for a null, not the value undef.  With this 
block map just returns the same list it was fed by split.
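To check that for yourself, here's a small comparison of the two blocks 
on a made-up line with one empty field and one zero:

```perl
use strict;
use warnings;

# Made-up input: one empty field and one zero.
my @valuesin = split ",", "1.2,,0", -1;

# Ric's block: '' is defined, so every field passes through unchanged.
my @ric = map { defined $_ ? $_ : "\\N" } @valuesin;

# The eq "" block: only the empty field becomes \N, and 0 survives.
my @fixed = map { $_ eq "" ? "\\N" : $_ } @valuesin;

print "defined: ", join(",", @ric), "\n";
print "eq \"\":   ", join(",", @fixed), "\n";
```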

Finally, this block should work as intended:

   { $_ eq "" ? "\\N" : $_ }

as it evaluates to '\N' if the list element is null, else $_.

   Anyway, this approach might be needed, rather than the regex, if you 
need to do some processing on those comma-delimited values.  For example, 
to round them to one digit after the decimal point, you could use this 
code block for the map:

   { $_ eq "" ? "\\N" : sprintf("%.1f", $_) }
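As a sketch, here's that rounding block dropped into the full pipeline 
and run against the example line (reconstructed from the list above, so 
treat it as an assumption).  It assumes every non-empty field is numeric; 
under use warnings anything else would trigger an "isn't numeric" warning.

```perl
use strict;
use warnings;

# Assumed input, reconstructed from the example list.
my $line = "1.2,3.4,,5.67,78.9,,,,,0.01";

# Round each non-empty field to one decimal place; empty fields become \N.
my $out = join ",",
    map { $_ eq "" ? "\\N" : sprintf("%.1f", $_) } split ",", $line, -1;

print "$out\n";
```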