SPUG: item 60 in effective perl programming

Thu Feb 24 10:26:30 PST 2005

On Thu, Feb 24, 2005 at 02:33:26AM -0800, John W. Krahn wrote:
> >$ echo 'foo' > test
> >$ echo 'bar' >> test
> >$ echo 'baz' >> test
> >$ wc -l test
> >      3 test
> >$ perl -pe 's/\n/" " . <>/e' test
> >foo bar
> 
> The two lines 'foo' and 'bar' *are* joined together!
> 
> 
> >baz 

Umm but not baz.

> >And in fact hangs until you Ctrl+D it if there is an odd number of lines
> >in the file.
> 
> It didn't when I tried it and it shouldn't according to the documentation.

It doesn't if you have an even number of lines.  With an odd number it
does.

The reason is that <> has some magical effects.  Which is why most of
the people on this thread are really oversimplifying things...

This script has two <>'s in it: one from the implicit loop introduced
by the -p command line parameter, and one in the Perl expression on the
replacement side of the substitution.  This means every pass through the
loop does two reads.  Here's a simpler script to demonstrate what is
happening:

echo stdin | perl -pe '$_ .= <>' test

First note that I'm putting the line "stdin" on the standard input to
the program.  So now let's run it with test containing three lines:
foo
bar
baz

$ echo stdin | perl -pe '$_ .= <>' test
foo
bar
baz
stdin

So what's happening here is as follows.  Every use of <> shares the same
magic: there is one ARGV filehandle and one piece of state behind all of
them.  foo, baz and stdin are read by the <> in the while loop test.
bar is read by the <> in the body of the loop.  The first <> to run sees
a file name in @ARGV, opens it on the ARGV filehandle and reads a line
from it.  After baz is read the file is exhausted, so the <> in the body
is the one that gets the undef, and the ARGV filehandle gets closed.
But <> only returns undef for end-of-file once.  The next call, which is
the one in the loop test, finds @ARGV empty, so it opens stdin and
starts reading it.

Now let's try it with a file with an even number of lines:
foo
bar
baz
ack

$ echo stdin | perl -pe '$_ .= <>' test
foo
bar
baz
ack

Note it didn't read stdin this time.  foo and baz were read by the loop
test, bar and ack by the <> within the loop.  However, stdin was never
read because <> in the while test returned undef after it saw all the
files had been consumed.  

Part of the problem with the documentation is they show you pseudo code
for using it in a loop but never show how it behaves in other
circumstances.  

They do say this:
   The <> symbol will return "undef" for end-of-file only once.  If you
   call it again after this, it will assume you are processing another
   @ARGV list, and if you haven't set @ARGV, will read input from STDIN.

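But it's not clear about the fact that the undef goes to whichever <>
happens to hit the end of the files, not necessarily the one in the loop
test, and that the next <> to run, wherever it is in the code, is the
one that starts reading STDIN.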

> > It hangs because the -p creates an implicit loop which
> >reads from <> and then a second 
> 
> A second what?

See my followup email.

> >Perhaps they meant this:
> >perl -pe 's/\n/ /' file
> >
> >tr would probably be moderately better:
> >perl -pe 'tr/\n/ /'
> >
> >But I'd prefer the following:
> >perl -pe 'chomp; $_ .= " "' file
> 
> Your examples replace *every* newline with a space which is not what the
> original does.

Umm yes I realize that.  I even explained that.  The original doesn't
join every line.  Which isn't what he said it did.  It joins every other
line.  Which I suppose if that's what you want is fine but that's not
what he said it did.

> >I can't imagine why anyone would want to use the regex version.
> 
> Because the substitution operator allows you to _e_valuate the replacement
> string as a perl expression.

I'm fully aware of what the original is doing.  But if it's described as
joining all the lines in a file it doesn't do that.  Just because you
can use the regex operator to do something doesn't mean it's the most
efficient.  And many times it's not the most readable.  

IMHO this script is seriously bugged.  It's misusing the <> operator and
I'm a bit surprised to hear that it was found in a book titled Effective
Perl.

-- 
Ben Reser <ben at reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken