[Chicago-talk] Dec Tech mtg...

Tue Dec 2 18:50:55 CST 2003

Bedside reading :-)

--
Steven Lembark                               2930 W. Palmer
Workhorse Computing                       Chicago, IL 60647
                                            +1 888 359 3508
-------------- next part --------------
+----------------------------------------------------------------------+
|                                                                      |
|               Some Notes on Using the Perl Debugger                  |
|                                                                      |
|                               or                                     |
|                                                                      |
|                     Confessions of a typist.                         |
|                                                                      |
+----------------------------------------------------------------------+

(C) 2003 Steven Lembark, Workhorse Computing <lembark at wrkhors.com>

"Hi, my name is Steve and I'm ..."

Well, I'll admit up front that the title of this might better have
been "confessions of a programmer": not all of my bugs are typos.
There are also times when I have to debug other people's code --
some of whom are are neither typists nor programmers. Be it theirs
or mine, searching for logic buried in the code can be frustrating.
This is especially true when the modules call modules that call
modules and you cannot put print statements into all of them.

Even when logic doesn't fail me, searching for syntax buried in
Perl can be frustrating. This is especially true when looking for
"cleaner" alternatives to apparently messy constructs or checking
some of the more offball syntax used by the idiots who wrote this
code last month, er, me? ah, yeah: great syntax...

If premature optimization is the root of all evil, finding places
where it matters is the root of many paychecks. Most of this comes
down to finding out where the code spends most of its time, how
much of is wasted or even under the program's control (e.g., I/O
blocks vs. tight loops). Hunting for WHERE the code spends its
time can be, Oh such joy -- if you really, really like warm Mtn.
Dew or stale coffee...

Fortunately for me, what I really am is:

"... a Perl hacker." 

Which means I can use Perl to debug my code for me. Or at least
help. 

If the ultimate goal of programming is lazyness and the ultimate
lazy programmer lets the language do the work, then Perl's Debugger
is the ultimate tool. At the touch of a command switch you can have
command line and Tk-GUI debugging or a detailed profiler doing the 
work for you.

What follows here is a short guide to lazyness for the impatient
[which obvously requires some hubris to present].

-------------------------------------------------
Introductory Domination: Perl at the Command Line.
-------------------------------------------------

Among the alphabet soup available for perl's switches is "-d". This
puts perl into "debug" mode with the default debugger: DB.pm. If
you grew up with GNU's "gdb" program the rest of this will be deja vu
all over again; if not you can open your mind to a whole new source
of frustration.

Aside: One of the really nice things about Perl is its level of 
documentation. Using 'perldoc perldebug' on your system of choice
will give you a nice, short summary of the debugger. Then again,
so will turning to Page 147 of Perl in a Nutshell (2nd ed) or 506
of Camel III. In fact, as you start up the perl debugger it'll 
tell you where to get the docs:

	$ perl -d -e 42

	Loading DB routines from perl5db.pl version 1.22
	Editor support available.

	Enter h or `h h' for help, or `man perldebug' for more help.

	main::(-e:1):   42

	DB<1> 

Because I started up perl without a program (unless there was
a ./42 file nearby) I'm at the command line, with the return 
value of my program before me. This can be handy for checking
out Perl syntax or whether modules are installed/functioning
on your system.

In this case the prompt shows me that I'm in the "main" package
at program "-e" line 1. 

At this point I can check Perl syntax or test code just by entering
it on the command line:

	DB<1> x localtime
	0  8
	1  19
	2  13
	3  2
	4  11
	5  103
	6  2
	7  335
	8  0
	DB<2> x getpwuid 0
	0  'root'
	1  'x'
	2  0
	3  0
	4  ''
	5  ''
	6  'root'
	7  '/root'
	8  '/bin/bash'

The 'x' command pretty prints a structure (vs. 'p' which
is mainly for printing strings). In this case I the returns
from "localtime" and "getpwuid 0" and displayed the resulting
lists. 

The 'DB' prompt shows me that I'm in the debugger, the <1> or 
<2> shows me that I'm in the outer level of calls on the first
and second debugger command, respectively. As I descend into
the hell of code calls the <> will nest to show me the levels.

This mode is good for testing Perl syntax, and can be useful 
for Q&D data munging where you want to fondle the data as it
is being processed. 

The one thing to remember is that each step in the debugger is
treated as being in its own block, so that lexical ("my") variables
will not survive between commands:

	DB<3> my @a = ( 0..10 )

	DB<4> x @a
	empty array

	DB<5> @a = (0..10)

	DB<6> x @a
	0  0
	1  1
	2  2
	3  3
	4  4
	5  5
	6  6
	7  7
	8  8
	9  9
	10  10

	DB<7> %b = @a

	DB<8> x %b
	0  8
	1  9
	2  6
	3  7
	4  4
	5  5
	6  0
	7  1
	8  10
	9  undef
	10  2
	11  3

	DB<9> x \%b
	0  HASH(0x8411c04)
	0 => 1
	10 => undef
	2 => 3
	4 => 5
	6 => 7
	8 => 9

Interesting thing to note: the display of referents (\%b
at <9>) is a bit more readable than naked hashes (<8>).
In most cases I prefer to read the stuff put out by referents.

If you really care about how the output is handled, or want
to use it for pretty printing your own output the Dumpvalue
module is what the debugger uses for its output.

After a while, however, you might get sick of just typing
in commands and want to see something actually running. There
are, of course, multiple ways to do it. One way is to simply
add a "-d" switch to your #! code. This works if you want to 
run a program in the debugger, run from, say, a data directory,
are are in no mood to type the full path to the perl program.
Another way is to simply use "perl -d" and give a program name.

	#!/opt/bin/perl -d

	$ perl -d /path/to/foobar [arguments] [more arguments];

The one catch to using '-d' is that it has to be removed by
hand before releasing the code. One giveaway that your code
has a stray "-d" in it is that cron jobs leave huge logfiles 
(caused by the debugger going into trace mode if there is no
controlling terminal).

All of the examples here will use "perl -d" from the command
line.

As an example, what if you wanted to debug:

	#!/opt/bin/perl

	$\ = "\n";
	$, = "\n\t";

	print 'You are:', getpwuid $<;

	0

	__END__

And it prints out:

	You are:
		lembark
		x
		1024
		1024

		/home/lembark
		/bin/bash

Hmm... what are the blanks doing there? It might be helpful to
know if they are empty strings, undefs, whatever... 

	$ perl -d ./hak

	Loading DB routines from perl5db.pl version 1.22
	Editor support available.

	Enter h or `h h' for help, or `man perldebug' for more help.

	main::(./hak:3):        $\ = "\n";

OK so far: I'm in the "main" package, in source file 'hak' on
line 3. 

At this point I'd like to walk through the program. The simplest
way is to single step through the code. As always there are more
than one way to do this. In most cases you'll use the 'n' command
to step to the Next source line; for debugging into calls you can
use the 's' command to Step Into subroutine calls along the way.
For now it doesn't matter since there are no subroutines in the 
code, I'll use 'n':

	main::(./hak:3):        $\ = "\n";
	  DB<1> n
	main::(./hak:4):        $, = "\n\t";
	  DB<1> n
	main::(./hak:6):        print 'You are:', getpwuid $<;
	  DB<1> n
	You are:
			lembark
			x
			1024
			1024

			/home/lembark
			/bin/bash
	main::(./hak:9):        0

Each time I typed in 'n' it got me one step further through
the code. When you get sick of typing 'n', pressing enter
repeats the last 'n' or 's' command:

	main::(./hak:3):        $\ = "\n";
	  DB<1> n
	main::(./hak:4):        $, = "\n\t";
	  DB<1> 
	main::(./hak:6):        print 'You are:', getpwuid $<;
	  DB<1> 
	You are:
			lembark
			x
			1024
			1024

			/home/lembark
			/bin/bash
	main::(./hak:9):        0
	  DB<1> 

	Debugged program terminated.  Use q to quit or R to restart,
	use O inhibit_exit to avoid stopping after program termination,
	h q, h R or h O to get additional info.  

Just single stepping through doesn't do much good, however. The
real fun is in being able to try things out on the way:

	main::(./hak:3):        $\ = "\n";
	  DB<1> n
	main::(./hak:4):        $, = "\n\t";
	  DB<1> 
	main::(./hak:6):        print 'You are:', getpwuid $<;
	  DB<1> x getpwuid $<;
	0  'lembark'
	1  'x'
	2  1024
	3  1024
	4  ''
	5  ''
	6  ''
	7  '/home/lembark'
	8  '/bin/bash'
	  DB<2> n
	You are:
			lembark
			x
			1024
			1024

			/home/lembark
			/bin/bash
	main::(./hak:9):        0

OK, the blank lines were caused by empty strings in the result
of getpwuid.

Viewing variables by package can also be handled using the 
X and V commands. Using either of these in the main package
results in LOTS of output.

Changing the code just a bit allows me to show the difference
between 'n' and 's':

	#!/opt/bin/perl

	$\ = "\n";
	$, = "\n\t";

	sub userinfo
	{
		my $uid = @_ ? shift : $<;

		print "Checking: $uid";

		getpwuid $uid;
	}

	print 'Clark Kent is:', userinfo 0;
	print 'You are:', userinfo;

	0

OK, at this point there is a subroutine call: "userinfo".

The 'n' command I've been using so far gives me a look at
what happens from the top level:

	main::(./hak:3):        $\ = "\n";
	  DB<1> n
	main::(./hak:4):        $, = "\n\t";
	  DB<1> 
	main::(./hak:15):       print 'Clark Kent is:', userinfo 0;
	  DB<1> 
	Checking: 1024
	Clark Kent is:
			lembark
			x
			1024
			1024

			/home/lembark
			/bin/bash
	main::(./hak:16):       print 'You are:', userinfo;
	  DB<1> 
	Checking: 1024
	You are:
			lembark
			x
			1024
			1024

			/home/lembark
			/bin/bash
	main::(./hak:19):       0

Nothing all that surprising, but notice that you didn't get to
see what happend under the hood. In many cases (say calling
Date::Manip::UnixDate) you just want to see the results. If you
want to crawl around under the bed, however, use the 's' command
instead:

	main::(./hak:3):        $\ = "\n";
	  DB<1> n
	main::(./hak:4):        $, = "\n\t";
	  DB<1> 
	main::(./hak:15):       print 'Clark Kent is:', userinfo 0;
	  DB<1> s
	main::userinfo(./hak:8):                my $uid = @_ ? shift : $<;
	  DB<1> 
	main::userinfo(./hak:10):               print "Checking: $uid";
	  DB<1> 
	Checking: 0
	main::userinfo(./hak:12):               getpwuid $uid;
	DB<1> x $uid
	0  0

Notice the 's', this steps into the userinfo call rather than
to the next source line. Also notice that 'n' and 's' can be
mixed as necessary to step over or into calls. I can also look
at the contents of lexicals within the current scope and check
that, yes, the value 0 did get passed in and used with $uid.

You can check the return values from subroutines (or escape
from really long ones after an accidental 's') with the 'r'
("Return") command:

	main::(./hak:15):       print 'Clark Kent is:', userinfo 0;
	  DB<1> s
	main::userinfo(./hak:8):                my $uid = @_ ? shift : $<;
	  DB<1> x @_
	0  0
	  DB<2> r
	Checking: 0
	list context return from main::userinfo:
	0  'root'
	1  'x'
	2  0
	3  0
	4  ''
	5  ''
	6  'root'
	7  '/root'
	8  '/bin/bash'
	Clark Kent is:
			root
			x
			0
			0

			root
			/root
			/bin/bash
	main::(./hak:16):       print 'You are:', userinfo;

The 'r' shows what came back from the call. This is really
handy for tracking down bugs in multiple call levels: at 
some point you will get an unexpected return from a seemingly-
sane set of arguments.

The 'n', 's', and 'r' commands will suffice for most step-by
step work. 

-----------------------------------
Domination at its best: Breakpoints
-----------------------------------

On the other hand, you may not want to debug 5_000 lines of
code in an Apache module that uses Date::Manip step-by-step.
For these cases the 'c' and 'b' commands come into play, as
does $DB::single. The 'l' command Lists source and is handy
for spotting the line numbers at which to break, and 'R' is
nice if you Go To Far. 'T' is helpful for finding out where
you are.

The 'c' command Continues execution, optionally with a one-
time breakpoint at a subroutine or line. Typing 'c' at any
point in the debugging session will simply run the program 
until it completes or hits a breakpoint. Using 'c' with a 
subroutine or line runs up to that point and stops:

	main::(./hak:3):        $\ = "\n";
	  DB<1> c userinfo
	main::userinfo(./hak:8):                my $uid = @_ ? shift : $<;

Using 'c' is handy when you only want to break at the next 
call to a subroutine and don't want to set and delete a 
breakpoint. It can also be useful for skipping over a bunch
of lines:

	  DB<1> l userinfo
	7       {
	8:              my $uid = @_ ? shift : $<;
	9       
	10:             print "Checking: $uid";
	11      
	12:             getpwuid $uid;
	13      }
	  DB<2> c 10
	main::userinfo(./hak:10):               print "Checking: $uid";

Here I listed the userinfo sub to find where I wanted to
break at then continued to line 10. Notice that the command
history count actually went up for a change: the 'l' command
can be recalled later on with the '!' (repeat) command:

	  DB<3> !1
	l userinfo
	7       {
	8:              my $uid = @_ ? shift : $<;
	9       
	10==>           print "Checking: $uid";
	11      
	12:             getpwuid $uid;
	13      }

A bit more DB nomenclature is useful here: the lines shown with
':' after their numbers are valid breakpoints; the '==>' is the
current line of code (i.e., what will happen on the next 'n' or
's' command):

	  DB<3> x $uid
	0  0
	  DB<4> r
	Checking: 0
	list context return from main::userinfo:
	0  'root'
	1  'x'
	2  0
	3  0
	4  ''
	5  ''
	6  'root'
	7  '/root'
	8  '/bin/bash'
	Clark Kent is:
			root
			x
			0
			0

			root
			/root
			/bin/bash

The 'c' command is nice enough for running things once or 
twice, but after a while 'c 10' could get tiring. That's
where breakpoints come in: letting your fingers stay Lazy.

The breakpoint command has a few flavors:

	- b <place>

	- b <place> <expression>

The first sets an uncontitional breakpoint. All this means 
is that the code will stop execution every time it gets to
that line:

	  DB<1> b 10
	  DB<2> c
	main::userinfo(./hak:10):               print "Checking: $uid";
	  DB<2> c
	Checking: 0
	Clark Kent is:
			root
			x
			0
			0

			root
			/root
			/bin/bash
	main::userinfo(./hak:10):               print "Checking: $uid";
	  DB<2> c
	Checking: 1024
	You are:
			lembark
			x
			1024
			1024

			/home/lembark
			/bin/bash
	Debugged program terminated.  Use q to quit or R to restart,
	  use O inhibit_exit to avoid stopping after program termination,
	  h q, h R or h O to get additional info.  

If I only wanted to see non-SU lookups:

	  DB<1> b 10 $uid != 0
	  DB<2> c
	Checking: 0
	Clark Kent is:
			root
			x
			0
			0

			root
			/root
			/bin/bash
	main::userinfo(./hak:10):               print "Checking: $uid";
	  DB<2> x $uid
	0  1024

The "b 10 $uid != 0" says to evaluate the expression "$uid != 0"
in the local context at line 10 (i.e., $uid has to be in scope
wherever line 10 leaves you) and then break if the expression is
true. This prints SuperUser's info then breaks for mine.

Conditional breakpoints are particularly useful for tracking down
out-of-bounds errors (e.g., a program blows up on the 10_000'th
input row because $age is less than zero).

You can also set breakpoints at subroutine calls, which can be helpful
for checking calls for bogus input:

	  DB<3> b userinfo @_ == 0
	  DB<4> c
	Checking: 0
	Clark Kent is:
			root
			x
			0
			0

			root
			/root
			/bin/bash
	main::userinfo(./hak:8):                my $uid = @_ ? shift : $<;

The 'R' command Restarts the current program -- resetting @ARGV and 
a number of other useful things on the way. It is useful if you 
suddenly notice that you've stepped too far into the execution cycle
and want to set a breakpoint or examine some data before it was
changed in the current cycle.

One of the more common occurrances during debugging is loosing track
of where you are. The 'T' command shows a stack Trace from the current
exeuction line:

	  DB<1> b userinfo @_ == 0
	  DB<2> c
	Checking: 0
	Clark Kent is:
			root
			x
			0
			0

			root
			/root
			/bin/bash
	main::userinfo(./hak:8):                my $uid = @_ ? shift : $<;
	  DB<2> T
	@ = main::userinfo() called from file `./hak' line 16

This can be particularly handy if you have several breakpoints
scattered around.

There are also times that you'd like to have a breakpoint set 
programatically. One might be adding a '-d' switch to you own
program to set breakpoints at obvious points for debugging:

	my @optionz = qw( debug+ );
	my $cmdline = {};

	GetOptions $cmdline, @optionz or die 'trying...';

	my $debug = $cmdline->{debug};

	...

	$DB::single = 1 if $debug;

	frobnicate $_ while <ARGV>;

	...

	sub frobnicate
	{
		my $line = shift;

		$DB::single = 1 if $debug > 1;

		...
	}

The Perl debugger is written in Perl and lives in the DB
package. Its global variables are in the symbol table just like 
any other package's and can be set by name. $DB::single is what
the debugger uses to tell itself to stop at the next line. You 
can tell it to stop also. In this case if $debug is true the 
code stops before frobnicat is called, if it's greater than one
then execution goes single-step for each of the lines being 
called. You can use tests based on the data values, command line,
error status:

	eval
	{
	};

	# trace error handling only.

	$DB::trace = 1 if $@;

The trace variable, as you probably guessed, turns on execution
tracing, which is what the 't' command toggles trace mode.

The one other thing you probably want to do occasionally is remove
the breaks. This is done with the 'd' command (for a particular 
line or subroutine) or 'D' to delete all breakpoints. Using 'd'
with no line on it deletes the current line's breakpoint.

----------------------------------------------
Perly Submission: Making Perl Do Your Commands
----------------------------------------------

Even if breakpoints let your fingers be lazy on the 's', they
don't save you from having to type 'x' to examine things. One
classic example of this is having to type 'x $object' every line
watching an initializer set up data structures. But, this is Perl,
so even that can be lazy.

The '{' command tells the debugger to run a debugger command
before every prompt:

	  DB<1> b userinfo ! @_   
	  DB<2> { x $uid
	  DB<3> c
	Checking: 0
	Clark Kent is:
			root
			x
			0
			0

			root
			/root
			/bin/bash
	main::userinfo(./hak:8):                my $uid = @_ ? shift : $<;
	auto(-1)  DB<3> x $uid
	0  undef
	  DB<4> n
	main::userinfo(./hak:10):               print "Checking: $uid";
	auto(-1)  DB<4> x $uid
	0  1024
	  DB<5> 
	Checking: 1024
	main::userinfo(./hak:12):               getpwuid $uid;
	auto(-1)  DB<5> x $uid
	0  1024

While obviously overkill for something this simple, it can be
a big help if you are watching a structure change over 100 lines
of code.

See also '<', '<<',  and '<<' for manaing the commands.

The 'a' command sets an Action to be performed a line is
executed. This is Perl code (such as a print statement) 
that gets run every time a line is passed. Aside from 
printing you could also increment a counter or push bogus
values onto a stack for later checking in the debugger.

The 'A' command deletes all installed actions. 

-------------------------------------
Lazy Domination: Automated Submission
-------------------------------------

The Perl Debugger has a number of options that control how
it runs the code and edits commands. Some of these are controlled
via Term::ReadLine and ~/.inputrc, others the 'O' command that
sets Options. You can also use .perldb or $ENV{PERLDB_OPTS} 
set things for you. Either way, DB will be obliging.

Since I use an obviously superior editor, it seems a waste to
learn a whole new set of movement keys to edit commands in the
debugger -- or fumble around for arrow keys. If my ~/.inputrc
has the proper commands I can avoid keystroke mismanagement:

	set editing-mode vi
	set show-all-if-ambiguous on

This allows natrual movement commands to navigate the debugger
history, correct typos, or manage repetative edits within the 
debugger.

I could also use a local .perldb file to automate setting 
breakpoints or control variable output.

The .perldb file can also set up aliases for more common 
commands -- or ones you repeat often when dealing with particular
source code.

The Nutcase book has a short section "Customizing the Debugger" 
on pp. 156-7 with some usable examples.

------------------------------
Tying Up A Mouse: Devel::ptkdb
------------------------------

Installing this module and running your code through:

	perl -d:ptkdb hak;

will bring up an X11 window with a nice, usable GUI debugger. This
basicall uses Tk to feed commands directly into DB: co-opting the
built in DB command line handler. ptkdb supports breakpoints,
watchpoints, one-time expressions. The breakpoints can be set by 
clicking on the line numbers, and a few buttons on the page allow
step and next exeuction by line.

Another alternative debugger is Devel::DProf, called via 

	perl -d:DProf hak;

This profiles your code by subroutine call/function point into a 
datafile that is post-processed by 'dprofpp' into all sorts of 
wonderful statistics.