SPUG: Some changes I'd like in Perl

Thu Nov 28 11:09:13 CST 2002

SPUGsters,

On this Thanksgiving day, I'm giving thanks to The Larry and
all his elves for the marvelous free invention that is Perl!

And in gratitude, and an effort to make Perl even more wonderful
than it already is, I'm recommending some changes that have
occurred to me either as a result of my extensive experience
in teaching this language to newbies, or my own personal
expectations as a long-time user of AWK, sed, grep, and the
shells.  None of these changes should be hard to implement,
but I believe each would provide a worthwhile improvement to
the language.

Before I run these changes past the "Perl Gods", I thought
I'd show them to the SPUG flock first, to obtain your comments
(and possibly corrections).

As an added bonus, by studying my suggested changes, some of
you (especially those who don't use the AWK-ish -n/-p options
much) might learn about Perl capabilities you didn't already
know.

Incidentally, if anybody has recommendations on how I should
submit these changes, I'd be grateful for your advice.

I submitted a serious bug report concerning "Restricted
Hashes" earlier this week, using "perlbug", and it went to
the perl5porters list, where it's been vigorously ignored.
If that's the reception real bugs get, what chance would I
have with minor recommendations there?  Maybe I should just
email these suggestions directly to the pumpking?  Who is that
these days?  Etc. . . .

So without further ado, here's Tim Maher's "wish list" of
(minor) changes to Perl.

Happy Thanksgiving,
-Tim

*----------------------------------------------------------------------------*
| Tim Maher, CEO, CONSULTIX  (206) 781-UNIX; (866) DOC-PERL; (866) DOC-LINUX |
|  Ph.D. & JAWCAR ("Just Another White Camel Award Recipient")               |
|  tim at consultix-inc.com  teachmeunix.com  teachmeperl.com  teachmelinux.net |
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  |
| CLASSES: Hashes and Arrays in Perl: 12/5;   Minimal Perl Programming: 12/6 |
*----------------------------------------------------------------------------*

Overview:

1) There should be a way to tell the In-Place-Editing option
(-i) to use a unique string in composing the file-extension on
the backup filename.

2) Perl programmers should have the ability to define a continue
block within -n programs, or override the default continue
block within -p programs.

3) There should be a variable that allows dynamic setting of
the field separator (used by -a) during execution.

4) Perl needs a warning for misplaced statements that will be
run within the implicit loop of -n/-p.

5) There should be a better warning when the -e option is used
without a corresponding argument on the shebang line.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
1) There should be a way to tell the In-Place-Editing option (-i)
to use a unique string in composing the file-extension on the backup
filename.

COMMON MISTAKE:

perl -wni.bak -e 's/this/that/' X	 # No print means file X emptied!

# Changing -n to -p is the fix,
# but running it now empties X.bak too!
perl -wpi.bak -e 's/this/that/' X	 # Empty input files -> backups trashed!

# We need the capability of using PIDs in filenames:
perl -wpi_PID.bak -e 's/this/that/' *  # X_3672.bak, Y_3672.bak, ...

# Some might prefer the following filename format,
#  for OSes with >3 chars allowed for file-extensions:
perl -wpi.PID -e 's/this/that/' *  # X.3672, Y.3672

DISCUSSION

Tragically, I've seen many students make the above mistake
of neglecting to print $_ after doing in-place editing on
"somefile", which causes it to become empty, although
"somefile.bak" would be a valid copy of the original.
And amazingly, the emotional response some have to the
emptying-out of their input files is to immediately re-run
the command, to see if it works better the second time (well,
it will definitely run FASTER, having empty input files to copy
over the backup files 8-{ ).

Even worse, those who are wise enough to fix the mistake
by changing -n to -p and then running the corrected program
(which is a generally valid strategy), will also ruin their
backup files, because if they neglected to change the "i.bak"
specification to something else, the empty files under the
original names will clobber the original backup files, causing
the *loss of all the original data*!

I personally fell into this trap once while doing:

	cd /pub_html; perl -wni.bak -e 's/this/that/' *.html

When I realized my mistake, I pulled that line out of the
shell's history and changed the -n to -p, and then, as is my
custom, I went to insert a # in front of it before hitting
<CR>, to store the command in my history so I could recover and
run it later after restoring the *.bak files to their original
names.

But somehow the # didn't get there before I hit the <CR> (yes,
after 24 years together VI still surprises me sometimes),
and I trashed my backups!

My personal experiences aside, what all perl programmers using
in-place editing need is an easy way to create a unique backup
filename for each run.

Implementing this recommendation would prevent disasters of the
kind I experienced, which apart from the obvious inconvenience
of (hopefully recoverable) data loss is exactly the kind of
thing that could easily turn an MIS manager against the language
forever, and even lead to sensational reporting in the media
that could be detrimental to Perl's reputation.

One obvious option would be to recognize $$ in -i's argument as
a request to use the PID of the perl process.  But that would
require single-quoting $$ at the UNIX shell level in -e programs,
to prevent the shell from substituting its own PID there, which
of course would be the same for all runs (which wouldn't help a bit).

So using $$ would create a burden on the programmer to
single-quote the perl option string, and that's bad for
three reasons: 1) weak shell programmers typically have lots
of trouble with proper quoting techniques, 2) strong shell
programmers know that requests for variable substitution are
generally *double-quoted* (which would lead to disaster in this
case), and 3) Perl programmers are not in the habit of quoting
Perl's invocation-options.  The result would undoubtedly be that
many programmers would leave $$ unquoted, or double-quoted.
And all we would have accomplished by recognizing $$ as -i's
argument would be to make it hard and unnatural to use the feature
that could help prevent the trashing of data files. 

My recommendation is to use the literal string "PID" as
the request for $$ to be used in the backup filename(s).
It would need no special quoting at the shell level, it clearly
represents what it delivers, Perl's Process-ID, and it would
help perl programmers avoid data-loss when using the in-place
editing option.
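
Until something like "PID" is recognized by -i, a rough workaround
is possible today, because the backup extension used by -i lives in
the special variable $^I and can be set at run time.  The following
is only a sketch: the helper name edit_in_place_with_pid_backup and
the "_PID.bak" naming scheme are my own assumptions for illustration,
not anything built into Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of Perl): edit files in place, keeping
# a backup of each one whose name includes this process's PID.
sub edit_in_place_with_pid_backup {
    my ($from, $to, @files) = @_;
    my $suffix = "_$$.bak";                   # e.g. "_3672.bak"
    local ($^I, @ARGV) = ($suffix, @files);   # same effect as -i_3672.bak
    while (<>) {
        s/\Q$from\E/$to/g;
        print;                                # written to the edited file
    }
    return $suffix;
}

# Demonstration on a scratch file, so no real data is at risk.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/X";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "this and this\n";
close $out or die "Cannot close $file: $!";

my $suffix = edit_in_place_with_pid_backup('this', 'that', $file);
print "backup kept as $file$suffix\n";
```

Because each run has a different PID, re-running the command after a
mistake writes a new set of backups instead of overwriting the first.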

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
2) Perl programmers should have the ability to define a continue
block within -n programs, or override the default continue
block within -p programs.

# THIS WORKS:
#! /usr/bin/perl -w
while (<>) {
	$_ ne "\n" or next;	# skip empty lines ($_ keeps its newline)
	print ;
}
continue {
	print "Finished with line $.\n";
}

# BUT EXPLICIT continue{} TRIGGERS A SYNTAX ERROR:
#! /usr/bin/perl -wn
 # while (<>) {
	$_ ne "\n" or next;	# skip empty lines ($_ keeps its newline)
	print ;
 # }
continue {
	print "Finished with line $.\n";
}

DISCUSSION

The problem is that continue blocks cannot be defined for the
implicit loop provided by the -n/-p options (although -p causes
a default $_-printing one to be included).  This limitation
is undesirable.  Perl programmers using implicit loops should
be allowed to define continue blocks.
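
In the meantime, one possible workaround is to wrap the per-line
work in a bare block and leave it with "last" rather than "next",
since a bare block behaves like a loop that runs once.  Whatever
follows the block then runs for every line, much as a continue
block would.  In this sketch the explicit while loop and the
in-memory input are stand-ins I've added so that it runs without
any command-line options; the statements inside the loop are what
would go in a -n program.

```perl
use strict;
use warnings;

my $input = "one\n\nthree\n";
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

my @log;
# This loop stands in for the implicit loop that -n would supply.
while (<$fh>) {
    {
        $_ ne "\n" or last;     # leaves the bare block only, not the loop
        push @log, "line: $_";
    }
    # Reached for every line, even after 'last' above.
    push @log, "Finished with line $.\n";
}
print @log;
```

The cost is that "next" can no longer be used for skipping, since it
would still jump past the trailing statements.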

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
3) There should be a variable that allows dynamic setting of the field
separator (used by -a) during execution.

In AWK programs, one can define the field separator in either of two ways:

awk -F':' ' program here'
	OR
awk 'BEGIN {FS=":"}; program here'  # FS can even be reset within
				    # the implicit loop in response
				    # to, for example, a change in
				    # input data

In comparison to the AWK examples above, Perl allows the
invocation-argument form of setting the field separator, but,
lacking the required variable, can't handle the second form:

perl -wnaF':'		-e 'program here'
perl -wnaF':' -i'.bak'	-e 'program here'
	OR
perl -wna -e 'BEGIN { ?? = ":" }; program here'	 # NO SUCH CAPABILITY!

This lack of a settable variable is a big disadvantage to Perl
programmers working on platforms that impose the restriction of
only a single option-cluster for shebang lines, because, for
example, it means they'd have to omit the -i.bak option-cluster
in the first example below.  That could be retained, and the -F:
replaced by a variable setting in BEGIN, if such a variable existed
(as in the second example below).

#! /usr/bin/perl -wnaF: -i.bak 	# -i.bak might not get parsed on some OSes!
#! /usr/bin/perl -wnai.bak 	# This would be okay if we could set an IFS variable!

DISCUSSION:

Perl can do almost everything else that AWK can do, so why
omit an "Input Field Separator" variable?  This is especially
incongruous in light of the fact that we have its output
counterpart, "$,".

This omission in Perl perplexes and annoys AWK refugees migrating
to Perl. (Another great feature that's obviously missing is
the ability to set the field separator to a *regex*, but at
least we have a (manual) workaround for that using split.)

Some time back, I asked Larry why there was no Awkish-FS
variable, and he said "because my Mom says don't buy anything
until you've felt the need for it on three separate occasions,
and you're the first guy to ask for this!"  What a guy! 8-} 

Larry's Mom gives excellent advice, I agree, but I think it's
time to rectify this oversight, and diminish the degree by which
"AWK has to be better at something" (another of my favorite 
Larry-isms).  Perl doesn't have to be less able than AWK!  And 
we won't hurt AWK's feelings by more completely emulating its
(excellent and, in 1977, ground-breaking) feature set.
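
For comparison, here is a minimal sketch of the manual split
workaround, changing the separator part-way through the input the
way an AWK program might reset FS.  The variable $fs, the "SWITCH"
marker line, and the sample data are all invented for illustration;
$fs is an ordinary lexical, not a Perl built-in.

```perl
use strict;
use warnings;

my $input = "root:x:0\nSWITCH\nalice,42,staff\n";
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

my $fs = ':';               # hypothetical stand-in for AWK's FS
my @second_fields;
while (my $line = <$fh>) {
    chomp $line;
    if ($line eq 'SWITCH') {    # the data format changes here
        $fs = ',';
        next;
    }
    my @F = split /\Q$fs\E/, $line;   # what -a would do, done by hand
    push @second_fields, $F[1];
}
print "@second_fields\n";   # prints "x 42"
```

This works, but note that it gives up -a entirely, which is exactly
the convenience an FS-style variable would preserve.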

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
4) Perl needs a warning for misplaced statements that will be run
within the implicit loop of -n/-p.

A warning is needed for the common newbie mistake, where
statement(s) are placed above the BEGIN block, or below the END
block, in a program using the -n (or -p) option.  The unintended
effect is as shown below:

# Run following as "echo | ./scriptname"
# To see that execution order matches statement numbers
#! /usr/bin/perl -wln

	print 's2';	# I'm too lazy or deficient with VI 
			# to put this statement within the BEGIN,
			# so I'll place it here to get *even earlier
			# execution* (yea, right!)

	BEGIN	{ print 's1'; }
		  print 's3';
	END	{ print 's5'; }

	print 's4';	# ditto for locating this here

Perl should say: "Warning: statement(s) placed before BEGIN block or 
	after END block will be run within implicit loop of -n/-p"

DISCUSSION:

Sure, in advanced Perl programming, there may be multiple
BEGIN/END blocks strewn throughout the program, with no
misconception at all about when the other statements in the
file will be executed.  But those programs won't typically be
using -n/-p, which cause any statements that aren't physically
within a BEGIN or END block to migrate into the scope of the
(invisible) implicit loop.  Beginners frequently run into trouble
with this, and are perplexed at the execution order that results.
A warning is all it would take to help them avoid this pitfall.

Although "use" statements might at first blush seem to need
similar treatment, they don't, because users don't (exactly)
compose the statements that "use" runs, and are generally
oblivious to the actual execution order of the statements it
generates automatically.
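
To make the migration concrete, here is roughly what the sample
program above looks like once the loop that -n supplies is written
out by hand.  This is a sketch: the @order array and the in-memory
input (standing in for "echo |") are additions of mine, so that the
execution order can be recorded and inspected.

```perl
use strict;
use warnings;

our @order;               # records the order in which statements run
my $input = "\n";         # one empty line, like "echo |"
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";

# Explicit form of the loop that -n wraps around the whole file:
LINE: while (<$fh>) {
    push @order, 's2';              # was above BEGIN, but runs per line
    BEGIN { push @order, 's1' }     # compile time: runs before all else
    push @order, 's3';
    END   { print "@order s5\n" }   # runs once, at program exit
    push @order, 's4';              # was below END, but runs per line
}
```

With two lines of input instead of one, s2, s3, and s4 would each
run twice, which is usually not what the beginner had in mind.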

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
5) There should be a better warning when the -e option is used
without a corresponding argument on the shebang line.

In my beginning Perl classes, I help get students oriented to
the language by showing them perl -e incantations that replicate
the functionality of grep and sed.  Then, when we start writing
scripts, some of them will typically include a trailing -e
option in their shebang lines, like so:

	#! /usr/bin/perl -wne
	print "Why doesn't this print?\n";

The current warning is:
	Can't emulate -e on #! line at /tmp/shebang line 1.

Or even worse, if there's a space following the -e in this kind
of program, here named /tmp/shebang:
	#! /usr/bin/perl -we<SP>
	die;

	Bareword found where operator expected at -e line 1, near
	"/tmp/shebang"
		(Missing operator before hebang?)
	Unquoted string "hebang" may clash with future reserved word
	at -e line 1.
	syntax error at -e line 1, next token ???
	Execution of -e aborted due to compilation errors.

DISCUSSION

A more useful warning than any of the above would be something like:

	The -e option requires a following argument that contains 
	a perl program at /tmp/shebang line 1.

And it would seem to be desirable that a -e followed only by a
<SP> should not be interpreted any differently than a -e lacking
the following <SP>.
