
Robert's Perl Tutorial

Version 4.1.1


The last hack was made on: 20th April 1999
so Henry can concentrate

THIS DOCUMENT IS COPYRIGHTED.
Reproduction in whole or part is prohibited. Please email me at robert@netcat.co.uk if you want to use this information anywhere.

The location of this document is
http://www.sthomas.net/roberts-perl-tutorial.htm
mirrored from
http://www.netcat.co.uk/rob/perl/win32perltut.html

with permission.

Introduction


This tutorial is...

A basic Perl course primarily for use on Win32 platforms. It assumes that the reader knows nothing of programming whatsoever, but needs a solid grounding for further work. After you finish this course you'll be ready to specialise in CGI, sysadmin or whatever you want to do with Perl.


This tutorial is not...

  • A reference manual. You won't find all the regex stuff under Regex. I think it's more fun to learn the basics then add little extras along the way. Keeps you awake, and it is a good excuse for not organising it better.
  • A FAQ.
  • Politically correct.
  • Complete. Please don't finish the course and assume you know all there is to know about Perl. There is certainly enough here to get you started, but consider the contents of this course as the tip of the iceberg. Except maybe a little warmer.
  • Eror three.


The Table of Contents

I've had a fair amount of requests for a ToC, so here it is:

Introduction

This tutorial is...
This tutorial is not...
How to Use...
The Table of Contents
Conventions used in this Tutorial
What you need to know
Use of this document

Personal Printouts
Intranet usage
Mirroring
Translations


A Short Introduction To Perl

What is Perl?
What is ActivePerl? Are the other Perls inactive?
Can I run Perl on my computer?
What can I do with Perl ?

The Internet
Systems Administration

What can't I do with Perl ?
Support


Setup

1. Getting the Software
2. Installation
3. Testing - Your First Perl Script


The Tutorial: The Journey Begins

Your First Time
What if it doesn't...?
Assuming it's now all right...
Shebang
Variables

Scalars
$ % @ are Good Things
Typing
Variable Interpolation
Changing Variables

Auto(de|in)crements

Escaping
Context: About Perl and @^$%&~`/?
Strings and Increments
Print: A List Operator
Subroutines -- A First Look
Comments


Comparisons

An iffy start
The Truth According to Perl
Equality and Perl

All Equality is Not Equal: Numeric versus String

An interlude -- The Perl Motto
The Comparison Operators Listed

The Golden Rule of Comparisons

More About If: Multiples

elsif


User Input

STDIN and other filehandles
Chop

Safe Chopping with Chomp


Arrays

Lists, herds -- what are arrays?

Basic Array Work

Elements of Arrays

How to refer to elements of an array

More ways to access arrays
For Loops

A for Loop demonstrated
For loops with .. , the range operator
foreach
The infamous $_
A Premature End to your loop

A little more control over the premature ending: Labels

Changing the Elements of an Array
Jiggerypokery with Arrays

A table of array hacking functions
Splice


Deleting Variables

False values versus Existence: It is, therefore...


Basic Regular Expressions

An introduction
Sensitivity -- regexes in touch with their inner child
Character Classes
Matching at specific points
Negating the regex
Returning the Match
* + -- regexes become line noise
The Difference Between + and *
Re-using the match -- \1, $1...
How to Avoid Making Mountains while Escaping Special Characters


Substitution and Yet More Regex Power

Basic changes
\w
Replacing with what was found
x

More Matching

Parentheses Again: OR

(?: OR Efficiency)

Matching specific amounts of...
Pre, Post, and Match
RHS Expressions

/e
/ee

A Worked Example: Date Change


Split and Join

Splitting
A very FAQ
What Humpty Dumpty needs : Join


A recap, but with some new functions

Randomness
Concatenation


Files

Opening
An unforgivable error
\\ or / in pathnames -- your choice
Reading a file
Writing to a File

A simple write
Appending
@ARGV: Command Line Arguments
Modifying a File with $^I
$/ -- Changing what is read into $_
HERE Docs


Reading Directories

Globbing
readdir : How to read from directories


Associative Arrays

The Basics
A Hash in Action
When you should use hashes
Hash Hacking Functions
More Hash Access: Iteration, keys and values


Sorting

A Simple Sort
Numeric Sorting -- How Sort Really Works
Sorting Multiple Lists


Grep and Map

Grep
Map
Writing your own grep and map functions


External Commands

Some ways to...
Exec
System
Backticks
When to use external calls
Opening a Process
Quote execute


Oneliners

A short example
File access
Modifying files with a oneliner and $^I


Subroutines and Parameters

Parameters
Namespaces
Variable Scope
my Variables
Multiple Returns
Local
Returning arrays


Modules

An introduction
File::Find -- using a module
ChangeNotify
Your Very Own Module


Bondage and Discipline

-w

Shebang

use strict;


Debugging

Logical Operators

or
Precedence: What comes First
And
Other Logical Operators


Last words

Thanks to...



What you need to know

You need to be able to differentiate between a PC and a toaster. No programming experience is necessary. You do need to understand the basics of PC operation. If you don't understand what directories and files are then you'll find this difficult. You might find it difficult even if you do :-)

You do need to exercise the brain cells, and you need time.

What you need to have

  • A PC which can run a Win32 operating system. That's Windows NT 3.5, 3.51, 4.0 or later, or Windows 95 or Windows 98. Not Windows 3.1. Sorry. Now, you finally have a reason to upgrade.
  • You need to get hold of a copy of Perl, so for that you might need an Internet connection. But if you can get it some other way, you don't.

Note: You don't even need a Win32 PC if you are comfortable installing Perl under other operating systems like Linux, but not all the information here will be relevant.

You don't need a compiler. Perl is an interpreted language, which means you run your code directly rather than compiling it first and then running it.


How to use this tutorial...

Just work through from start to finish.

Generally, the explanation follows the code sample. Before you read the explanation, try and work out what the code does. Then check if you're right. In this way, you'll derive maximum value from the tutorial and exercise the old grey cells a little.

When you finish, please send me a critique. In fact, send one even if you don't finish. I appreciate all feedback! Please note -- I am not a source of free technical support. Do not email me your general Perl problems. If you want support, ask on Usenet or the ActiveState mailing lists. That said, I welcome problems related to the tutorial itself.


Conventions used in this Tutorial

The humour is non-conventional. I think. Of more importance, the text is coloured strangely in places. My intention is to aid your comprehension, not attempt beautification. The meaning of the colours:

  • Sometimes you'll need to type something in on the command line. These commands will be in green, for example :
    perl changeworld.pl parm1 datafile.txt
  • Code that you should load into your editor and run is in blue (don't run this now, it's just an example):
    while (<DATFILE>) {
    	printf "%2s : $_",$.;
    }

  • When functions are referred to in the text, their names are highlighted in red. For example, later we discover an interesting function called split.

All the code examples have been tested, and you can just cut'n'paste (brave statement). I haven't listed the output of each example. You need to run it and see for yourself. Consider this course interactive. Consider it any which way you like.


Use of this document


Personal Printouts

Fine by me, feel free to print a copy for your own use.

Intranet usage

Just email me and let me know.

Mirroring

Again, all I ask is an email.

Translations

Every so often someone offers to translate the tutorial. Nobody has actually done so. If you want to, the conditions are:

  • You don't change the text other than what can be reasonably expected during a translation;
  • The content, format and authorship notices remain the same;
  • You can add a 'translated by' notice in the intro and at the end, plus your own message;
  • Version numbers are respected but the ISO code for your country is added, eg 3.3.2.ES;
  • and you need to email me to discuss.

Remember this document is copyrighted and all associated rights are strictly reserved.

--

Robert Pepper
mailto:Robert@netcat.co.uk



A Short Introduction To Perl

If you already understand what Perl is designed to do, know its features and limitations then you can skip this very small but highly informative section, over which I laboured long and hard for those that didn't know. If you are really sure, jump to the Setup Section.


What is Perl?

Perl is a programming language. Perl stands for Practical Extraction and Report Language. You'll notice people refer to 'perl' and "Perl". "Perl" is the programming language as a whole whereas 'perl' is the name of the core executable. There is no language called "Perl5" -- that just means "Perl version 5". Versions of Perl prior to 5 are very old and very unsupported.

Some of Perl's many strengths are:

  • Speed of development. You edit a text file, and just run it. You can develop programs very quickly like this. No separate compiler needed. I find Perl runs a program quicker than Java, and that's before you compare the complete modify-compile-run-oh-no-forgot-that-semicolon sequence.
  • Power. Perl's regular expressions are some of the best available. You can work with objects, sockets...everything a systems administrator could want. And that's just the standard distribution. Add the wealth of modules available on CPAN and you have it all. Don't equate scripting languages with toy languages.
  • Usability. All that power and capability can be learnt in easy stages. If you can write a batch file you can program Perl. You don't have to learn object oriented programming, but you can write OO programs in Perl. If autoincrementing non-existent variables scares you, make perl refuse to let you. There is always more than one way to do it in Perl. You decide your style of programming, and Perl will accommodate you.
  • Portability. On the Superhighway to the Portability Panacea, Perl's Porsche powers past Java's jaded jalopy. Many people develop Perl scripts on NT, or Win95, then just FTP them to a Unix server where they run. No modification necessary.
  • Editing tools. You don't need the latest Integrated Development Environment for Perl. You can develop Perl scripts with any text editor. Notepad, vi, MS Word 97, or even direct off the console. Of course, you can make things easy and use one of the many freeware or shareware programmer's file editors.
  • Price. Yes, 0 guilders, pounds, dmarks, dollars or whatever. And the peer to peer support is also free, and often far better than you'd ever get by paying some company to answer the phone and tell you to do what you just tried several times already, then look up the same reference books you already own.


What is ActivePerl? Are the other Perls inactive?

A company named ActiveState exists to provide Perl tools for the Win32 environment. ActiveState used to be ActiveWare, and before that it was sort of a part of Hip Communications. It now appears to be happy with its current name, having not changed it for over a year. Win32 means, at the time of writing, Windows 95, Windows 98 and Windows NT. It does not mean Windows 3.11, even with Win32s installed.

Prior to Perl version 5.005, there was one version of Perl for Win32, and another for all the other systems. The other version was known as the "native version".

The Win32 version was developed by ActiveState, called "Perl for Win32" and typically lagged slightly behind the native version. As of the 5.005 release, Perl for Win32 and the native version have merged -- the native version now supports Win32 directly and doesn't need any tweaking by ActiveState.

ActiveState have dropped "Perl for Win32" and renamed their distribution, which comes with an InstallShield installer, "ActivePerl".

Incidentally, a few months before the 5.005 merge the native Perl version was changed so it would run on Win32 directly. This version was best known by the creator's name, "Gurusamy Sarathy". However, there were still quite a few differences between it and Perl for Win32, so many people ran both. The merge brought the best of both worlds together.


Can I run Perl on my computer?

Probably. Perl runs on everything from Amigas to Macintoshes to Unix boxen. Perl also runs on Microsoft operating systems, namely Windows 95, Windows 98 and Windows NT 3.51 and later. There are versions of Perl that run on earlier versions of these operating systems but they are no longer developed or supported. See http://www.perl.com/ for full details.


What can I do with Perl ?

Just two popular examples :

The Internet

Go surf. Notice how many websites have dynamic pages with .pl or similar as the filename extension? That's Perl. It is the most popular language for CGI programming for many reasons, most of which are mentioned above. In fact, there are a great many more dynamic pages written with perl that may not have a .pl extension. If you code in Active Server Pages, then you should try using ActiveState's PerlScript. Quite frankly, coding in PerlScript rather than VBScript or JScript is like driving a car as opposed to riding a bicycle. Perl powers a good deal of the Internet.


Systems Administration

If you are a Unix sysadmin you'll know about sed, awk and shell scripts. Perl can do everything they can do and far more besides. Furthermore, Perl does it much more efficiently and portably. Don't take my word for it, ask around.

If you are an NT sysadmin, chances are you aren't used to programming. In which case, the advantages of Perl may not be clear. Do you need it? Is it worth it?

After you read this tutorial you will know more than enough to start using Perl productively. You really need very little knowledge to save time. Imagine driving a car for years, then realising it has five gears, not four. That's the sort of improvement learning Perl means to your daily sysadminery. When you are proficient, you find the difference like realising the same car has a reverse gear and you don't have to push it backwards. Perl means you can be lazier. Lazy sysadmins are good sysadmins, as I keep telling my boss.

A few examples of how I use Perl to ease NT sysadmin life:

  • User account creation. If you have a text file with the users' names in it, that is all you need. Create usernames automatically, generate a unique password for each one and create the account, plus create and share the home directory, and set the permissions.
  • Event log munging. NT has great Event Logging. Not so great Event Reading. You can use Perl to create reports on the event logs from multiple NT servers.
  • Anything else that you would have used a batch file for, or wished that you could automate somehow. Now you can.


What can't I do with Perl ?

The question is, "what shouldn't I do with Perl". Write office suites is one answer. Perl, like most scripting languages, is a glue language designed for short and relatively simple tasks. Just don't equate this philosophy with a lack of power or "serious" features.


Support

See the FAQs at www.perl.com. Of course there are Usenet groups, but also many mailing lists. Microsoft Windows users will be interested in those hosted by http://www.activestate.com/ which discuss all things Perl and Windows.

Please, before you ask any question, anywhere:

  1. Make sure you read the group charter. Many people put time and effort into the creation of those charters in the interests of efficient discussion, so don't degrade the discussion quality and insult us by ignoring the guidelines.
  2. Read the FAQs at least twice. Try and find related FAQs. Try hard. You won't be popular if you post a question starting "I've looked at all the FAQs..." and then ask something that actually is in the FAQs. Or the manual for that matter. Believe me, it will be patently obvious to all on the list if you haven't done your homework.
  3. Carefully phrase the questions and provide source code because if you do that, you may well end up solving the problem yourself because you have thought it through a little more.

Think to yourself -- honestly -- if I was a busy Perl Professional, would I want to answer my own question?

Does it clearly state what I want an answer to? Preferably just one question at a time. Am I being unreasonable, for example asking for someone to code it for me? Have I shown evidence that I have tried to help myself? Have I made any mistakes in grammar? Is it polite? Is there enough information in there for the answer to be given?

Why should you care? Well, if you ask poorly-formed questions or those already answered in the FAQ...let's just say you won't get the answers you want. If you care about your online reputation and wasting other people's time -- two more reasons.




Setup

There are four stages:

  1. Get the software.
  2. Install it.
  3. Run a test Script.
  4. Celebrate or troubleshoot.


1. Getting the Software

An old version of Perl for Win32 is included with the Windows NT Resource Kit. It is sadly out of date. Follow the steps below to get a newer version. Having said that, you can complete the tutorial with the Resource Kit version but you should upgrade as soon as you can.

Go to http://www.activestate.com/ and follow the links to download ActivePerl. It will be a single file, and the name will be something like api508e.exe. The i stands for Intel. If you have an Alpha, download apaXXXe.exe. If you're not sure, download the Intel version.

The 508e is the version number, so expect this to change quite rapidly. The file size will be just over 5Mb, so it will take a while to download via modem. If you know how to use FTP, try ftp.activestate.com/activeperl/.

When you find ActivePerl, save the file into any directory you please. I like to organise my downloads into c:\downloads but that is just personal preference. As long as ActivePerl ends up on your hard disk somewhere it doesn't matter.

2. Installation

So you now have apixxxx.exe. If you forget where you saved it, don't panic, just run Windows Explorer and search for api*e.exe

  1. Double-click the apixxxx.exe. You'll see the fantastic ActivePerl graphic and be advised to close all open applications before proceeding. The lizard thing is a gecko, which adorns the famous O'Reilly book "Learning Perl on Win32 Systems". This tutorial is aimed at a more basic level than that book, in terms of the author's knowledge, intended audience and quality of humour.
  2. Agree to the license agreement or cancel the install, stop this tutorial and deny yourself any hope of hackership.
  3. Destination directory is whatever you want. I usually install Perl in c:\progs\perl rather than c:\program files\perl because many Win32 programs don't properly handle long filenames, let alone those with spaces in. Or you could accept the default. Your choice.
  4. Select Components. All you'll need for this tutorial is "Perl for Win32 Core", but installing the "Online Help and Documentation" and "Example Files" is highly recommended. If you run Internet Information Server (IIS) 3 or later, or Personal Web Server (PWS), then install "Perl for ISAPI" and "PerlScript" too, although don't try either of these until you are proficient with the basics. The phrase running before walking comes to mind.
  5. Select Options.
    • "Associate '.pl' with Perl.exe". If you select this option then you can just type in the name of a script at the command line, or double-click it and the script will run. If you don't, then in order to get a script to execute you'll need to type:
      perl myscript.pl
      to execute myscript.pl. Personally, I prefer double-clicking to allow me to edit the file so I do not select this option. Also, perl has a plethora of command line arguments which are difficult to pass to a script if you run it by association. For the purposes of this tutorial I'm assuming that you haven't associated .pl with perl.
    • "Add the Perl bin directory to your path". Do this, otherwise you'll have to specify the full path to perl.exe every time you use it. Not fun.
    • "Standard I/O redirection for IIS". If you run IIS or PWS, select this. It is a Good Thing. Understand it later.
  6. IIS Options. If you use IIS or PWS you'll have this screen -- just accept both options.
  7. Program Folder. Whatever your preference is. This is just a link to the documentation, to the perl.exe itself.
  8. Confirmation. Make sure that what is displayed is what you have selected...
  9. The install program will now copy files. At the end it will run a few perl scripts itself, which briefly appear as DOS boxes. Don't worry, it is all quite normal.
  10. Release notes. Well worth a read.
  11. Reboot! Just so the path statement takes effect. In any case, it is always good practice to reboot after a new install.


3. Testing - Your First Perl Script

So you know what this tutorial is designed to do. You know what Perl is designed to do, and you have even installed it. It is now time to start the tutorial proper, and actually hack some code.



The Tutorial: The Journey Begins


Your First Time

Assuming all has gone to plan, you can now create your first Perl script. Follow these instructions, but before you start read them through once, then begin. That's a good idea with any form of computer-related procedure. So, to begin:

  1. Create a new directory for your perl scripts, separate to your data files and the perl installation. For example, c:\scripts\, which is what I'll assume you are using in this tutorial.
  2. Start up whatever text editor you're going to hack Perl with. Notepad.Exe is just fine. If you can't find Notepad on your Start menu, press the Start button, then select Run, type in 'notepad' and click OK.
  3. Type the following in Notepad
    print "My first Perl script\n";
  4. Save the file to c:\scripts\myfirst.pl. Be careful! Notepad may save files with a .txt extension, so you will end up with myfirst.pl.txt by default. Perl won't mind, it'll still execute the file as long as you give it that full name. If your version of Notepad does this, select "All files" before saving or rename the file then load it again. Better yet, use a decent text editor!
  5. You don't need to exit Notepad -- keep it open, as we'll be making changes very soon.
  6. Switch to your command prompt. If you don't know how to start a command prompt, click 'Start' and then 'Run'. If using Windows 9x, type in 'command' and press Enter. If using NT, type in 'cmd' and press Enter.
  7. Change to your perl scripts directory, for example cd \scripts
  8. Hold your breath, and execute the script: perl myfirst.pl

and you'll see the output. Welcome to the world of Perl ! See what I mean about it being easy to start ? However, it is difficult to finish with Perl once you begin :-)


What if it doesn't...?

So you typed in perl myfirst.pl and you didn't see My first Perl script on the screen. If you saw "bad command or filename" then either you haven't installed Perl or perl.exe is not in your path. Probably the latter. Reboot, then try again.

If you saw Can't open perl script "xxxx.pl": No such file or directory then perl is definitely installed, but you have either got the name of the script wrong or the script is not in the same directory as where you are trying to run it from. For example, maybe you saved the script in c:\windows and you are in c:\scripts so of course Perl complains it can't find the script. Could you? Well, don't expect Perl to then. You don't have to run the script from the directory in which it resides, but it is easier.

Assuming it's now all right...

We need to analyse what's going on here a little. First note that the line ends with a semicolon ; . Almost all lines of code in Perl have to end with semicolons, and those that don't have to will accept semicolons anyway. The moral is -- use semicolons. Sorry; the moral is; use semicolons.

Oh, one more thing -- if you haven't already done so, continue breathing.

Also note the \n . This is the code to tell Perl to output a newline. What's a newline? Delete the \n from the program and run it again:

print "My first Perl script";

and all should become clear. You have now written your first Perl script.


Shebang

Almost every Perl book is written for UN*X, which is a problem for Win32. This leads to scripts like:

#!c:/perl/perl.exe

print "I'm a cool Perl hacker\n";
   

The function of the 'shebang' line is to tell the shell how to execute the file. Under UNIX, this makes sense. Under Win32, the system must already know how to execute the file before it is loaded so the line is not needed.

However, the line is not completely ignored, as it is searched for any switches you may have given Perl (for example -w to turn on warnings).

You may also choose to add the line so your scripts run directly on UNIX without modification, as UNIX boxes probably do need it. Win32 systems do not. We shall continue with the lesson.
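If you'd like to see the switch-reading in action, here is a minimal sketch. It assumes you run it as perl shebang.pl (a made-up filename) and that the #! line is the very first line of the file. The $^W variable is Perl's built-in warnings flag, which we haven't met yet, so treat this as a preview rather than gospel.

```perl
#!perl -w
# Only the switch on the first line matters here: -w turns warnings on.
# Assumption: this is saved as shebang.pl (a hypothetical name) and run
# with "perl shebang.pl", with the #! line as the first line of the file.

# $^W is Perl's built-in warnings flag; it is true when -w is in effect.
$warnings = $^W ? "on" : "off";
print "Warnings are $warnings\n";
```

Remove the -w from the first line, run it again, and compare.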

Variables


Scalars

So Perl is working, and you are working with Perl. Now for something more interesting than simple printing. Variables. Let's take simple scalar variables first. A scalar variable is a single value. Like $var=10 which sets the variable $var to the value of 10. Later, we'll look at lists like arrays and hashes, where @var refers to more than one value. For the moment, remember that Scalar is Singular. If weird metaphors help, think of lots of scaly snakes at a singles bar. If that didn't help, I apologise for putting the thought into your mind.


$ % @ are Good Things

If you have any experience with other programming languages you might be surprised by the code $var=10. With most languages, if you want to assign the value 10 to a variable called var you'd write var=10.

Not so in Perl. This is a Feature. All variables are prefixed with a symbol such as $ @ % . This has certain advantages, like making programs easier to read. Honestly, I'm serious! It just takes some getting used to. The prefixes mean that you can see where the variables are quite easily. And not only that, what sort of variable it is. The human language German has a similar principle (except nouns are capitalised, not prefixed with $ and Perl is easier to pronounce). You'll agree later, I think.
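As a quick preview of what the three prefixes look like side by side, here is a small sketch. Arrays and hashes are explained properly later in the tutorial, and every name and value below is invented purely for illustration.

```perl
# Each prefix tells you what kind of variable you are looking at.
# All names and values here are made up for illustration.
$name  = "Robert";                  # $ : a scalar, one single value
@langs = ("Perl", "awk", "sed");    # @ : an array, an ordered list of values
%ports = ("http", 80, "ftp", 21);   # % : a hash, pairs of keys and values

print "Scalar: $name\n";
print "Array:  @langs\n";           # elements are printed separated by spaces
print "Hash lookup: $ports{http}\n";
```

Don't worry about the hash lookup syntax yet. The point is only that you can tell the three kinds of variable apart at a glance.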

So, ever onwards. Time to try some more variables:

$string="perl";
$num1=20;
$num2=10.75;
print "The string is $string, number 1 is $num1 and number 2 is $num2\n";

   

Typing

A closer look...notice you don't have to say what type of variable you are declaring. In other languages you need to say if the variable is a string, array, what sort of number it is and so on. You might even have to declare what type of number it is. As an example, in Java you'd be saying things like int var=10 which defines the variable var as an integer, with the value 10.

So, why do these other programming languages force you to declare exactly what your variables are? Wouldn't it be easier if we could just not bother?

For short programs, yes. For really big projects with many programmers working on the same application, no. That's because forcing variable type declaration also forces a certain discipline and rigour which is what you need on big projects.

As you know, Perl is not designed for gigantic software engineering efforts. It is all about small, quick programs. For these purposes you don't need the rigour of variable controls as much, so Perl doesn't bother.

This idea of forcing a programmer to declare what sort of variable is being created is called typing. As Perl doesn't by default enforce any rules on typing, it is said to be a loosely typed language, as opposed to something like C++ which is strongly typed.
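To make loose typing a little more concrete, here is a small sketch (the variable names are made up). The same scalar holds a number and then a string, and a string that looks like a number is happily used in arithmetic.

```perl
# The same scalar can hold a number, then a string, with no declaration.
$var = 10;
print "As a number: $var\n";

$var = "ten";
print "As a string: $var\n";

# A string that looks like a number is treated as one in arithmetic.
$text = "10";
$sum  = $text + 5;
print "\"10\" + 5 is $sum\n";
```

Try the equivalent in a strongly typed language and the compiler will have words with you.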


Variable Interpolation

We still haven't finished learning from that humble bit of code. To = refresh=20 your memory, here it is again:

$string="perl";
$num1=20;
$num2=10.75;
print "The string is $string, number 1 is $num1 and number 2 is $num2\n";

Notice the way the variables are used in the string. Sticking variables inside of strings has a technical term - "variable interpolation". Now, if we didn't have the handy $ prefix we'd have to do something like the example below, which is pseudocode. Pseudocode is code to demonstrate a concept, not designed to be run. Like certain Microsoft software.

print "The string is ".string." and the number is ".num."\n";

which is much more work. Convinced about those prefixes yet ?
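For the curious, the long-winded style can be written in real, runnable Perl too, using the . operator to glue strings together (the . is covered properly later on). This sketch, with made-up variable names, builds the same line both ways so you can compare the typing involved.

```perl
$string = "perl";
$num    = 20;

# Interpolation: the variables sit inside the double-quoted string.
$interpolated = "The string is $string and the number is $num\n";

# Concatenation: the same text glued together piece by piece with the . operator.
$joined = "The string is " . $string . " and the number is " . $num . "\n";

print $interpolated;
print $joined;
```

Both lines of output should be identical. Only the amount of typing differs.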

Try running the following code:

$string="perl";
$num=20;
print "Doubles: The string is $string and the number is $num\n";
print 'Singles: The string is $string and the number is $num\n';


Double quotes allow the aforementioned variable interpolation. Single quotes do not. Both have their uses as you will see later, depending on whether you wish to interpolate anything.


Changing Variables


Auto(de|in)crements

If you want to add 1 to a variable you can, logically, do this; $num=$num+1 . There is a shorter way to do this, which is $num++. This is an autoincrement. Guess what this is; $num-- . Yes, an autodecrement.

This example illustrates the above:

$num=10;
print "\$num is $num\n";

$num++;
print "\$num is $num\n";

$num--;
print "\$num is $num\n";

$num+=3;
print "\$num is $num\n";


The last example demonstrates that it doesn't have to be just 1 you can add or subtract.
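The += shorthand has relatives. The tutorial's example only shows += , so take this as a small sketch of two more operators that standard Perl provides, -= and *= , working the same way.

```perl
$num = 10;

$num += 3;     # add 3:        10 + 3 = 13
$num -= 5;     # subtract 5:   13 - 5 = 8
$num *= 2;     # multiply by 2: 8 * 2 = 16

# The backslash stops $num being interpolated the first time round.
print "\$num is $num\n";
```

Work out the answer in your head before you run it.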


Escaping

There's something else new in the code above. The \ . You can see what this does -- it 'escapes' the special meaning of $ .

Escaping means that just the $ symbol is printed instead of it referring to a variable.

Actually \ has a deeper meaning -- it escapes all of Perl's special characters, not just $ . Also, it turns some non-special characters into something special. Like what ? Like n . Add the magic \ and the humble 'n' becomes the mighty NewLine ! The \ character can also escape itself. So if you want to print a single \ try:

print "the MS-DOS path is c:\\scripts\\";
   

Oh, '\' is also used for other things like references. But that's not even covered here.
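A few escapes side by side may make the idea clearer. This is only a sketch: the variable names and values below are made up for illustration and do not appear elsewhere in this tutorial.

```perl
# Illustrative values only; none of these names come from the tutorial.
$cost  = 5;
$price = "Price: \$$cost";        # \$ is a literal dollar sign, then $cost is interpolated
$email = "user\@example.com";     # \@ stops Perl treating @example as an array
$path  = "c:\\scripts\\";         # each \\ becomes one backslash
$quote = "She said \"hi\"";       # \" puts a double quote inside a double-quoted string

print "$price\n$email\n$path\n$quote\n";
```

Try removing one backslash at a time and see what perl makes of it.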

There is a technical term for these 'special characters' such as @ $ %. They are called metacharacters. Perl uses plenty of metacharacters. In fact, you'll wear your keyboard pretty evenly during a night's perl hacking. I think it is safe to say that Perl uses every possible keystroke and shifted keystroke on a standard US PC keyboard.

You'll be working with all sorts of obscure characters in your Perl hacking career, and I also mean those on your keyboard. This has earned perl a reputation for being difficult to understand. That's entirely true. Perl does have such a reputation, no doubt about it.

Is the reputation justified? In my opinion, Perl does have a short but steep learning curve to begin with simply because it is so different. However, once you learn the character meanings reading perl code becomes much easier precisely because of all these strange characters.


Context: About Perl and @^$%&~`/?

Perl uses so many weird characters that there aren't enough to go round. So sometimes the same character has two or more meanings, depending on its context. As an example, the humble dot . can join two variables together, act as a wildcard or become a range operator if there are two of them together. The caret ^ has different effects in [^abc] as opposed to [a^bc] .
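Here is a rough sketch of those different meanings in action. The strings are invented for the example, and the regular expression parts are only a taster, as regexes are covered properly later on.

```perl
$joined = "Perl" . " " . "hacker";     # one dot joins strings together
@range  = (1 .. 5);                    # two dots make a range: 1 2 3 4 5

print "$joined\n";
print "@range\n";

# Inside a regular expression the dot matches any one character.
if ("cat" =~ /c.t/) {
        print "c.t matches cat\n";
}

# [^abc] means 'any character except a, b or c'; in [a^bc] the caret is just a caret.
if ("d" =~ /[^abc]/) {
        print "d is not a, b or c\n";
}
if ("^" =~ /[a^bc]/) {
        print "a caret matches [a^bc]\n";
}
```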

If this sounds crazy, think about the English language. What do the following mean to you ?

  • MEAN
  • POLISH
  • LIKE

Mean, in one context, is a word used to describe the purpose of something. It is also another word for average. Furthermore, it describes a nasty person, or a person who doesn't like spending money, and is used in slang to refer to something impressive and good.

That's five different uses for 'mean', and you don't have any trouble understanding which one I mean due to context.

Polish, when capitalised, can either mean pertaining to the country Poland, or the act of making something shiny. And 'like' can mean similar to, or affection for.

So, when you speak or write English (think of two, to and too) you know what these words mean by their context. It is exactly the same way with Perl. Just don't assume a given metacharacter always means what you first thought it did.

To finish off this section, try the following:

Strings and Increments

$string="perl";
$num=20;
$mx=3;

print "The string is $string and the number is $num\n";

$num*=$mx;
$string++;
print "The string is $string and the number is $num\n";

Note the easy shortcut *= meaning 'multiply $num by $mx' or, $num=$num*$mx .

Of course Perl supports the usual + - * / ** % operators. The last two are exponentiation (to the power of) and modulus (remainder of x divided by y).

Also note the way you can increment a string ! Is this language flexible or what ?
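If you are wondering what happens when a string increment runs off the end of the alphabet, this little sketch shows the carrying. The variable names are made up for the example; the behaviour applies to strings made only of letters followed by digits.

```perl
$word = "perl"; $word++;     # the last letter moves on one: perm
$wrap = "Az";   $wrap++;     # z wraps round to a and carries: Ba
$grow = "zz";   $grow++;     # carries off the front: aaa
$mix  = "a9";   $mix++;      # digits carry into letters: b0

print "$word $wrap $grow $mix\n";

print 2 ** 3, "\n";          # exponentiation: 8
print 10 % 3, "\n";          # modulus: 1
```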
   

Print: A List Operator

The print function is a list operator. That means it accepts a list of things to print, separated by commas. As an example:

print "a doublequoted string ", $var, 'that was a variable called var', $num," and a newline \n";

Of course, you could just put all the above inside a single doublequoted string:

print "a doublequoted string $var that was a variable called var $num and a newline \n";

to achieve the same effect. The advantage of using the print function in list context is that expressions are evaluated before being printed. For example, try this:

$var="Perl";
$num=10;
print "Two \$nums are $num * 2 and adding one to \$var makes $var++\n";
print "Two \$nums are ", $num * 2," and adding one to \$var makes ", $var++,"\n";
   

You might have been slightly surprised by the result of that last experiment. In particular, what happened to our variable $var ? It should have been incremented by one, resulting in Perm. The reason being that 'm' is the next letter after 'l' :-)

Actually, it was incremented by 1. We are postincrementing the variable with $var++ , rather than preincrementing it.

The difference is that with postincrements, the value of the variable is returned, then the operation is performed on it. So in the example above, the current value of $var was returned to the print function, then 1 was added. You can prove this to yourself by adding the line print "\$var is now $var\n"; to the end of the example above.

If we want the operation to be performed on $var before the value is returned to the print function, then preincrement is the way to go. ++$var will do the trick.
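The difference is perhaps easiest to see with a number. A minimal sketch, using variable names invented for this example:

```perl
$count = 5;

$post = $count++;    # $post gets the old value 5, then $count becomes 6
$pre  = ++$count;    # $count becomes 7 first, then $pre gets 7

print "post is $post, pre is $pre, count is $count\n";
```

Both lines add 1 to $count. They differ only in which value is handed back.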


Subroutines -- A First Look

Let's take another look at the example we used to show how the autoincrement system works. Messy, isn't it ? This is Batch File Writing Mentality. Notice how we use exactly the same code four times. Why not just put it in a subroutine?

$num=10;		# sets $num to 10
&print_results;		# prints variable $num

$num++;
&print_results;

$num*=3;
&print_results;

$num/=3;
&print_results;

sub print_results {
        print "\$num is $num\n";
}
   

Easier and neater. The subroutine can go anywhere in your script, at the beginning, end, middle...makes no difference. Personally I put all mine at the bottom and reserve the top part for setting variables and main program flow.

A subroutine is just some code you want to use more than once in the same script. In Perl, a subroutine is a user-defined function. There is no difference. For the purposes of clarity I'll refer to them as subroutines.

A subroutine is defined by starting with sub then the name. After that you need a curly left bracket { , then all the code for your subroutine. Finish it off with a closing brace } . The area between the two braces is called a block. Remember this. There are such things as anonymous subroutines but not here. Everything here has a name.

Subroutines are usually called by prefixing their name with an ampersand, that is one of these -- & , like so &print_results; . It used to be cool to omit the & prefix but all perl hackers are now encouraged to use it to avoid ambiguity. Ambiguity can hurt you if you don't avoid it.

If you are worrying about variable visibility, don't. All the variables we are using so far are visible everywhere. You can restrict visibility quite easily, but that's not important right now. If you weren't worrying about variable visibility, please don't start. I'd tell you it's not important but that'll only make you worried. (paranoid ?) We'll cover it later.


Comments

Did you see that a # crept in there? That's a comment. Everything after a # is ignored. You can't continue it onto a new line however, so if your comment won't fit on one line start a new one with # . There are ways to create Plain Old Documentation (POD) and more ways to comment but they are not detailed here.


Comparisons


An iffy start

An if statement is simple. if the day is Sunday, then lie in bed. A simple test, with two outcomes. Perl conversion (don't run this):

if ($day eq "sunday") {
	&lie_in_bed;
}

You already know that &lie_in_bed is a call to a subroutine. We assume $day is set earlier in the program. If $day is not equal to 'sunday' &lie_in_bed is not executed (pity). You don't need to say anything else. Try this:

$day="sunday";

if ($day eq "sunday") {
        print "Zzzzz....\n";
}

Note the syntax. The if statement requires something to test for Truth. This expression must be in (parens), then you have the braces to form a block.
   

The Truth According to Perl

There are many Perl functions which test for Truth. Some are if, while, unless . So it is important you know what truth is, as defined by Perl, not your tax forms. There are three main rules:

  1. Any string is true except for "" and "0".
  2. Any number is true except for 0. This includes negative numbers.
  3. Any undefined variable is false. An undefined variable is one which doesn't have a value, i.e. has not been assigned to.

Some example code to illustrate the point:

&isit;                   # $test1 is at this moment undefined

$test1="hello";         # a string, not equal to "" or "0"
&isit;

$test1=0.0;             # $test1 is now a number, effectively 0
&isit;

$test1="0.0";           # $test1 is a string, but NOT effectively 0 !
&isit;

sub isit {
        if ($test1) {                           # tests $test1 for truth or not
                print "$test1 is true\n";
        } else {                                # else statement if it is not true
                print "$test1 is false\n";
        }
}

The first test fails because $test1 is undefined. This means it has not been created by assigning a value to it. So according to Rule 3 it is false. The last two tests are interesting. Of course, 0.0 is the same as 0 in a numeric context. But it is not the same as 0 in a string context, so in that case it is true.
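A few more edge cases, in the same style as the example above, might help the rules sink in. This is just a sketch; $test, $report and the check subroutine are names invented for it.

```perl
$report = "";

$test = "";    &check;     # empty string: false
$test = "0";   &check;     # the string "0": false
$test = "00";  &check;     # a string that is not exactly "0": true
$test = " ";   &check;     # a single space: true
$test = -1;    &check;     # a negative number: true

print "$report\n";

sub check {
        if ($test) {
                $report = $report . "true ";
        } else {
                $report = $report . "false ";
        }
}
```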

So here we are testing single variables. What's more useful is testing the result of an expression. For example, this is an expression; $x * 2 and so is this; $var1 + $var2 . It is the end result of these expressions that is evaluated for truth.

An example demonstrates the point:

$x=5;
$y=5;

if ($x - $y) {
        print '$x - $y is ',$x-$y," which is true\n";
} else {
        print '$x - $y is ',$x-$y," which is false\n";
}
   

The test fails because 5-5 of course is 0, which is false. The print statement might look a little strange. Remember that print is a list operator? So we hand it a list. First item, a single-quoted string. It is single quoted because we do not want to perform variable interpolation on it. Next item is an expression which is evaluated, and the result printed. Finally, a double-quoted string is used because we want to print a newline, and without the doublequotes the \n won't be interpolated.

What is probably more useful than testing a specific variable for truth is equality testing. For example, has your lucky number been drawn?

$lucky=15;
$drawnum=15;

if ($lucky == $drawnum) {
        print "Congratulations!\n";
} else {
        print "Guess who hasn't won!\n";
}
   

The important point about the above code is the equality operator, == .


Equality and Perl

Now pay close attention, otherwise you'll end up posting an annoying question somewhere. This is a FAQ, as in a Frequently Asked Question.

The symbol = is an assignment operator, not a comparison operator. Therefore:

  • if ($x = 10) is always true, because $x is assigned the value 10 and the test then sees 10, which is true.
  • if ($x == 10) compares the two values, which might not be equal.
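You can watch the mistake happen with a short sketch like this one. The names $compared and $assigned are invented for the example.

```perl
$x = 3;

if ($x == 10) {
        $compared = "equal";
} else {
        $compared = "not equal";
}
print "Comparing: $compared, and \$x is still $x\n";

if ($x = 10) {                  # oops: this assigns 10 to $x, then tests 10 for truth
        $assigned = "true";
} else {
        $assigned = "false";
}
print "Assigning: $assigned, and \$x is now $x\n";
```

Notice that the second test not only gives the wrong answer, it also quietly changes $x.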

So far we have been testing numbers, but there is more to life than numbers. There are strings too, and these need testing too.

$name	 = 'Mark';

$goodguy = 'Tony';

if ($name == $goodguy) {
        print "Hello, Sir.\n";
} else {
        print "Begone, evil peon!\n";
}
   

Something seems to have gone wrong here. Obviously Mark is different to Tony, so why does perl consider them equal?

Mark and Tony are equal -- numerically. Neither name looks like a number, so each is treated as 0, and 0 equals 0. We should be testing them as strings, not as numbers. To do this, simply replace == with eq and everything will work as expected.
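To see both answers at once, something along these lines should do. The result variables $numeric and $string are made up for the sketch.

```perl
$name    = 'Mark';
$goodguy = 'Tony';

if ($name == $goodguy) {
        $numeric = "equal";          # both names count as 0 here
} else {
        $numeric = "different";
}

if ($name eq $goodguy) {
        $string = "equal";
} else {
        $string = "different";       # compared letter by letter
}

print "Numerically: $numeric\n";
print "As strings:  $string\n";
```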


All Equality is Not Equal: Numeric versus String

There are two types of comparison operator; numeric and string. You've already seen two, == and eq. Run this:

$foo=291;
$bar=30;

if ($foo < $bar) {
        print "$foo is less than $bar (numeric)\n";
}

if ($foo lt $bar) {
        print "$foo is less than $bar (string)\n";
}
   

The lt operator compares in a string context, and of course < compares in a numeric context.

Alphabetically, that is in a string context, 291 comes before 30. It is actually decided by the ASCII value, but alphabetically is close enough. Change the numbers around a little. Notice how Perl doesn't care whether it uses a string comparison operator on a numeric value, or vice versa. This is typical of Perl's flexibility.

Bondage and discipline are pretty much alien concepts to Perl (and the author). This flexibility does have a drawback. If you're on a programming precipice, threatening suicide by jumping off, Perl won't talk you out of your decision but will provide several ways of jumping, stepping or falling to your doom while silently watching your early conclusion. So be careful.


An interlude -- The Perl Motto

The Perl Motto is; "There is More Than One Way to Do It" or TIMTOWTDI. Pronounced 'Tim-Toady'. This tutorial doesn't try to mention all possible ways of doing everything, mainly because the author is far too lazy. Write your Perl programs the way you want to.


The Comparison Operators Listed

The rest of the operators are:

Comparison                  Numeric   String
Equal                       ==        eq
Not equal                   !=        ne
Greater than                >         gt
Less than                   <         lt
Greater than or equal to    >=        ge
Less than or equal to       <=        le


The Golden Rule of Comparisons

They may be odious, but remember the following:

  • if you are testing a value as a string there should be only letters in your comparison operator.
  • if you are testing a value as a number there should only be non-alpha characters in your comparison operator.
  • note 'as a' above. You can test numbers as strings and vice versa. Perl never complains.


More About If: Multiples

More about if statements. Run this:

$age=25;
$max=30;

if ($age > $max) {
        print "Too old !\n";
} else {
        print "Young person !\n";
}

It is easy to see what else does. If the expression is false then whatever is in the else block is evaluated (or carried out, executed, whatever term you choose to use). Simple. But what if you want another test ? Perl can do that too.
   

elsif

$age=25;
$max=30;
$min=18;

if ($age > $max) {
        print "Too old !\n";
} elsif ($age < $min) {
        print "Too young !\n";
} else {
        print "Just right !\n";
}

If the first test fails, the second is evaluated. This carries on until there are no more elsif statements, or an else statement is reached. An else statement is optional, and no elsif statements should come after it. Logical, really.

There is a big difference between the above example and the one below:

if ($age > $max) {
        print "Too old !\n";
}

if ($age < $min) {
        print "Too young !\n";
}
   

If you run it with an age that is too old or too young, it will return the same result. However, it is Bad Programming Practice. In this case we are testing a number, but suppose we were testing a string to see if it contained R or S. It is possible that a string could contain both R and S. So it would pass both 'if' tests. Using an elsif avoids this. As soon as one test is true, no further elsif tests (and no else block) are executed.

You don't need to take up a whole three lines:

print "Too old\n" if     $age > $max;
print "Too old\n" unless $age < $max;
   

I added some whitespace there for aesthetic beauty. There are other operators that you can use instead of if and unless , but that's for later on.

Incidentally, the two lines of code above do not do exactly the same thing. Consider a maximum age of 50 and input age of 50. Therefore, you should be very careful about your logic when writing code (nice obvious statement there).
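Here is that boundary case spelled out, as a small sketch. The variables $if_fired and $unless_fired are invented to record which line triggers.

```perl
$age = 50;
$max = 50;

$if_fired     = "no";
$unless_fired = "no";

$if_fired     = "yes" if     $age > $max;    # 50 > 50 is false, so nothing happens
$unless_fired = "yes" unless $age < $max;    # 50 < 50 is false, so unless goes ahead

print "if fired: $if_fired, unless fired: $unless_fired\n";
```

So unless $age < $max really means 'if $age is greater than or equal to $max', which is not the same test.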

For those that were wondering, Perl has no case statement. This is all explained in the FAQ, which is located at http://www.perl.com/.


User Input


STDIN and other filehandles

Sometimes you have to interact with the user. It is a pain, but sometimes necessary, especially for the live ones. To ask for input and do something with it try this:

print "Please tell me your name: ";
$name=<STDIN>;
print "Thanks for making me happy, $name !\n";

New things to learn here. Firstly, <STDIN> . STDIN is a filehandle. Filehandles are what you use to interact with things such as files, console input, socket connections and more.

You could say STDIN is the standard source for input. Guess what STDIN stands for. In this case the STDIN filehandle is reading from the console.

The angle brackets <> read data from a filehandle. Exactly how much is dependent on what you do, but in this case it is whatever was input at the prompt.

So we are reading from the STDIN filehandle. The value is assigned to $name and printed. Any idea why the ! ends up on a new line ?

As you pressed Enter, you of course included a newline with your name. The easy way to get rid of it is to chop it off:

Chop

print "Please tell me your name: ";
$name=<STDIN>;
chop $name
print "Thanks for making me happy, $name !\n"

and that fails with a syntax error. Can you spot why? Look at the error code, look at the line number and see where the syntax is wrong. The answer is a missing semicolon ( ; ) on the end of the last two lines.

If you add a ; to the end of line 3, but not to the last line, then the program works as it should. This is because Perl doesn't need a semicolon to end the last statement of a block. However, I'd advise ending all your statements with semicolons because you may well be adding more code to them and it is only one little keystroke.

When you add the semicolon(s), the program runs correctly. The chop function removes the last character of whatever it is given to chop, in this case removing the newline for us. In fact, that can be shortened:

print "Please tell me your name: ";
chop ($name=<STDIN>);
print "Thanks for making me happy, $name !";
   

The parentheses ( ) force chop to act on the result of what is inside them. So $name=<STDIN> is evaluated first, then the result from that, which is $name , is chopped. Try it without.

You can read from STDIN as much as you like. For your entertainment I have created a sophisticated multinational greeting machine:

print "Please tell me your name: ";
chop ($name=<STDIN>);

print "Please tell me your nationality: ";
chop ($nation=<STDIN>);

if ($nation eq "British" or $nation eq "New Zealand") {
	print "Hallo $name, pleased to meet you!\n";

} elsif ($nation eq "Dutch" or $nation eq "Flemish") {
	print "Hoi $name, hoe gaat het met u vandaag?!\n";

} else {
	print "HELLO!!!  SPEAKEEE ENGLIEESH???\n";
}
   

Aside from demonstrating the native English speaker's linguistic talents, this script also introduces the or logical operator. We'll cover or and its associates in more detail later on. First, a word of warning.

Chopping is dangerous, as my friend One Hand Harold will tell you. Everyone is concerned about various forms of safety these days, and your perl code should be no exception.


Safe Chopping with Chomp

Rather than just wantonly remove the last character regardless of whatever it is, without a care in the world, just simply consigning the poor little thing to the Great Bit Bucket in the Sky, you can remove the last character only if it is a newline with chomp :

chomp ($name=<STDIN>);

At this point the perl gurus are screaming "I found an error !". Well, chomp doesn't always remove the last character if it is a newline but if it doesn't, you have set a special variable, namely $/ , to something different. I presume that if you do set $/ you know what it does. It is explained later in this very document. Of course, being a good pupil, you wouldn't experiment with the unknown, blindly changing things just for the hell of it to see what happens.

If you don't, you'll never learn anything useful.
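In that spirit, here is a small experiment comparing chop and chomp on a string with a trailing newline and one without. It is only a sketch, it assumes $/ has been left alone, and the variable names are invented for it.

```perl
$with    = "Robert\n";
$without = "Robert";

chop  ($chopped_with    = $with);       # the newline goes
chop  ($chopped_without = $without);    # the final 't' goes. Ouch.
chomp ($chomped_with    = $with);       # the newline goes
chomp ($chomped_without = $without);    # nothing to remove, so nothing happens

print "chop:  [$chopped_with] [$chopped_without]\n";
print "chomp: [$chomped_with] [$chomped_without]\n";
```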


Arrays


Lists, herds -- what are arrays?

Perl has two types of array, associative arrays (hashes) and arrays. Both types are lists. A list is just a collection of variables referred to as the collection, not as individual elements.

You can think of Perl's lists as a herd of animals. List context refers to the entire herd, scalar context refers to a single element. A list is a herd of variables. The variables don't have to be all of the same type -- you might have a herd of ten sheep, three lions and two wolves. It would probably be just three lions and one wolf before long, but bear with me. In the same way, you might have a Perl list of three scalar variables, two array elements and ten hash elements.

Certain types of lists are known by certain names. Just as a herd of sheep is called a flock, a herd of lions is called a pride, a herd of wolves is called a pack and a herd of managers a confusion, some types of Perl list have special names.


Basic Array Work

For example, an array is an ordered list of scalar variables. This list can be referred to as a whole, or you can refer to individual elements in the list. The program below defines an array, called @names . It puts five values into the array.


@names=("Muriel","Gavin","Susanne","Sarah","Anna");

print "The elements of \@names are @names\n";
print "The first element is $names[0] \n";
print "The third element is $names[2] \n";
print 'There are ',scalar(@names)," elements in the array\n";

Firstly, notice how we define @names . As it is in a list context, we are using parens. Each value is comma separated, which is Perl's default list delimiter. The double quotes are not necessary, but as these are string values it makes it easier to read and change later on.

Next, notice how we print it. Simply refer to it as a whole, that is in list context. List context means referring to more than one element of a list at a time. The code print @names; will work perfectly well too. But....

I usually learn something about Perl every time I work with it. When running a course, a student taught me this trick which he had discovered:


@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

print @names;
print "\n";
print "@names";

When a list is placed inside doublequotes, it is space delimited when interpolated. Useful.

If we want to do anything with the array as a list, that is doing something with more than one value, then refer to the array as @array . That's important. The @ prefix is used when you want to refer to more than one element of a list.

When you refer to more than one, but not all elements of an array that is known as a slice . Cake analogies are appropriate. Pie analogies are probably healthier but equally accurate.


Elements of Arrays

Arrays are not much use unless we can get to individual elements. Firstly, we are dealing with a single element of the list, so we cannot use @ which refers to multiple elements of the array. It is a single, scalar variable, so $ is used. Secondly, we must specify which element we want. That's easy - $array[0] for the first, $array[1] for the second and so forth. Array indexes start at 0, unless you do something which is so highly deprecated ('deprecated' means allowed, usually for backwards compatibility, but disapproved of because there are better ways) I'm not even going to mention it.

Finally, we force what is normally list context (more than one element) into scalar context (single element) to give us the number of elements in the array. Without the scalar , it would be the same as the second line of the program.
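Using scalar is one way to ask for the count. Assigning the array to a scalar variable is another, since that also puts the array in scalar context. A quick sketch, where $count is a name made up for the example:

```perl
@names = ("Muriel","Gavin","Susanne","Sarah","Anna");

$count = @names;                 # an array assigned to a scalar gives its size

print "In a string:  @names\n";
print "With scalar:  ", scalar(@names), "\n";
print "Assigned:     $count\n";
```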

How to refer to elements of an array

Please understand this:

$myvar="scalar variable";
@myvar=("one","element","of","an","array","called","myvar");

print $myvar;        # refers to the contents of a scalar variable called myvar
print $myvar[1];     # refers to the second element of the array myvar
print @myvar;        # refers to all the elements of array myvar

   

The two variables $myvar and @myvar are not, in any way, related. Not even distantly. Technically, they are in different namespaces.

Going back to the animal analogy, it is like having a dog named 'Myvar' and a goldfish called 'Myvar'. You'll never get the two mixed up because when you call 'Myvar !!!!' or open a can of dog food the 'Myvar' dog will come running and goldfish won't. Now, you couldn't have two dogs called 'Myvar' and in the same way you can't have two Perl variables in the same namespace called 'Myvar'.

More ways to access arrays

The element number can be a variable.

print "Enter a number :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah","Anna");

print "You requested element $x who is $names[$x]\n";

print "The index number of the last element is $#names \n";

This is useful. Notice the last line of the example. It returns the index number of the last element. Of course you could always just do this $last=scalar(@names)-1; but this is more efficient. It is an easy way to get the last element, as follows:

print "Enter the number of the element you wish to view :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

print "The first two elements are @names[0,1]\n";
print "The first three elements are @names[0..2]\n";
print "You requested element $x who is $names[$x-1]\n";		# starts at 0
print "The elements before and after are : @names[$x-2,$x]\n";
print "The first, second, third and fifth elements are @names[0..2,4]\n";

print "a) The last element is $names[$#names]\n";	# one way
print "b) The last element is $names[-1]\n";		# different way

It looks complex, but it is not. Really. Notice you can have multiple values separated by a comma. As many as you like, in whatever order. The range operator .. gives you everything between and including the values. And finally look at how we print the last element - remember $#names gives us a number ? Simply enclose it inside square brackets and you have the last element.

Do also note that because element accesses such as [0,1] are more than one variable, we cannot use the scalar prefix, namely the $ symbol. We are accessing the array in list context, so we use the @ symbol. Doesn't matter that it is not the entire array. Remember, accessing more than one element of an array but not the entire array is called a slice. I won't go over the food analogies again.
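The same rule decides what sort of variable you store the result in. A minimal sketch, with variable names invented for the example:

```perl
@names = ("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

$one  = $names[2];          # one element, so $ : Susanne
@two  = @names[0,1];        # a slice of two, so @ : Muriel Gavin
@ends = @names[0,-1];       # first and last : Muriel Simon
@some = @names[1..3];       # a range : Gavin Susanne Sarah

print "$one\n@two\n@ends\n@some\n";
```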


For Loops


A for Loop demonstrated

All well and good, but what if we want to load each element of the array in turn ? Well, we could build a for loop like this:

@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

for ($x=0; $x <= $#names; $x++) {
        print "$names[$x]\n";
}

which sets $x to 0, runs the loop once, then adds one to $x , checks it is less than or equal to $#names , if so carries on. By the way, that was your introduction to for loops. Just to go into a little detail there, the for loop has three parts to it:

  • Initialisation
  • Test Condition
  • Modification

In this case, the variable $x is initialised to 0. It is immediately tested to see if it is smaller than, or equal to $#names . If that is true, then the block is executed once. Critically, if it is not true the block is not executed at all.

Once the block has been executed, the modification expression is evaluated. That's $x++ . Then, the test condition is checked to see if the block should be executed or not.
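To check the 'not executed at all' claim for yourself, try something like the sketch below. The empty array and the two counters are invented for the example.

```perl
@names = ("Muriel","Gavin","Susanne");
@empty = ();

$runs_full = 0;
for ($x=0; $x <= $#names; $x++) {
        $runs_full++;
}
print "Three names: block ran $runs_full times, \$x finished at $x\n";

$runs_empty = 0;
for ($x=0; $x <= $#empty; $x++) {     # $#empty is -1, so the test fails straight away
        $runs_empty++;
}
print "No names: block ran $runs_empty times, \$x finished at $x\n";
```

In the second loop the initialisation still happens, but neither the block nor the modification is ever reached.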

For loops with .. , the range operator

There is another version:

for $x (0 .. $#names) {
        print "$names[$x]\n";
}

which takes advantage of the range operator .. (two dots together). This simply gives $x the value of 0, then increments $x by 1 until it is equal to $#names .
   

foreach

For true beauty we must use foreach .

foreach $person (@names) {
        print "$person";
}

This goes through each element ('iterates', another good technical word to use) of @names , and assigns each element in turn to the variable $person . Then you can do what you like with the variable. Much easier. You can use

for $person (@names) {
        print "$person";
}

if you want. Makes no difference at all, aside from a little clarity.
   

The infamous $_

In fact, that gets shorter. And now I need to introduce you to $_ , which is the Default Input and Pattern Searching Variable.

foreach (@names) {
        print "$_";
}

If you don't specify a variable to put each element into, $_ is used instead as it is the default for this operation, and many, many others in Perl. Including the print function :

foreach (@names) {
        print ;
}

As we haven't supplied any arguments to print , $_ is printed as default. You'll be seeing a lot of $_ in Perl. Actually, that statement is not exactly true. You will be seeing a lot of places where $_ is used, but quite often when it is used, it is not actually written. In the above example, you don't actually see $_ but you know it is there.
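If you want some reassurance that the named and unnamed versions really do the same job, a sketch like this one builds the same string both ways. The variables $explicit and $implicit are made up for the comparison.

```perl
@names = ("Muriel","Gavin","Anna");

$explicit = "";
foreach $person (@names) {
        $explicit = $explicit . $person . " ";
}

$implicit = "";
foreach (@names) {
        $implicit = $implicit . $_ . " ";     # no loop variable given, so each element lands in $_
}

print "$explicit\n$implicit\n";
```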
   

A Premature End to your loop

A loop, by its nature, continues. If that didn't make sense, start reading this sentence again.

The old jokes are the best, aren't they?

The joke above is a loop. You continue re-reading the sentence until you realise I'm trying to be funny. Then you exit the loop. Or maybe somebody doesn't exit it. Whatever, loops always run until the expression they are testing returns false. In the case of the examples above, a false value is returned when all the elements of the array have been cycled through, and the loop ends.

If you want an everlasting loop, just test a condition you know will always be true:

while (1) {
	$x++;
        print "$x:  Did you know you can press CTRL-C to interrupt a perl program?\n";
}

Another way to exit a loop is a simple foreach over the elements, as we =
have seen. But if we=20
don't know when we want to exit a loop? For example, suppose we want to =
print out a list of=20
names but stop when we find one with a particular title? You are =
throwing a huge party,=20
someone is allergic to vodka, and this person has drunk from the punch =
bowl despite being=20
assured by someone holding two empty bottles of Absolut that he was just =
using the bottles=20
to convey yet more orange juice into said punch bowl. So you need a =
doctor, and so you write
a Perl script to find one from the list of attendees, wanting the =
doctor's name to be the last=20
item printed: 
@names=3D('Mrs Smith','Mr =
Jones','Ms Samuel','Dr Jansen','Sir Philip');

foreach $person (@names) {
	print "$person\n";
	last if $person=3D~/Dr /;
}

The last operator is our friend. Don't worry about the /Dr / business -- that is a regular expression which we cover next. All you need to know is that it returns true if the name contains 'Dr ' (which, for this list, means names beginning with it). When it does return true, last is executed and the loop ends early.
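As a minimal sketch of the same idea (the counter $visited and the variable $found are illustrative names, not part of the example above), this version records how far the loop got before last ended it:

```perl
use strict;
use warnings;

my @names   = ('Mrs Smith', 'Mr Jones', 'Ms Samuel', 'Dr Jansen', 'Sir Philip');
my $visited = 0;     # how many names the loop looked at
my $found   = '';    # the first name containing 'Dr '

foreach my $person (@names) {
    $visited++;
    if ($person =~ /Dr /) {
        $found = $person;
        last;        # leave the loop; Sir Philip is never examined
    }
}

print "Looked at $visited names, stopped at '$found'\n";
```

The loop body runs four times rather than five, because last skips everything after the first match.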


A little more control over the premature ending: Labels

So that's easy enough. But wait! We need a medical, human-fixer type doctor, not just anyone with a PhD. So, the same principle applies in this example here:

@names  =('Mrs Smith','Mr Jones','Ms Samuel','Dr Jansen','Sir Philip');
@medics =('Dr Black','Dr Waymour','Dr Jansen','Dr Pettle');

foreach $person (@names) {
	print "$person\n";
	if ($person=~/Dr /) {
		foreach $doc (@medics) {
			print "\t$doc\n";
			last if $doc eq $person;
		}
	}
}

Aside from showing one way to indent your code, this also demonstrates a nested loop. A nested loop is a loop within a loop. What happens is that the @names array is searched for a 'Dr ', and if it is found then the @medics array is searched to make sure the doctor is a human-fixing doctor not a professor of physics or something. The regular expression has been shifted into an if statement, where it works nicely as it only returns true or false.

The problem with the code is that after we find our medical doctor we want it to stop. But it doesn't. It only stops the loop it is in, so Dr Pettle never gets printed. However, the code just carries on with Sir Philip who is terribly sorry old chap, but can't be of any bally use at all, what ho! What we need is a way to break out of the entire loop from within a nest. Like so:

@names  =('Mrs Smith','Mr Jones','Ms Samuel','Dr Jansen','Sir Philip');
@medics =('Dr Black','Dr Waymour','Dr Jansen','Dr Pettle');

LBL: foreach $person (@names) {
	print "$person\n";
	if ($person=~/Dr /) {
		foreach $doc (@medics) {
			print "\t$doc\n";
			last LBL if $doc eq $person;
		}
	}
}

Only two changes here. We have defined a label, namely LBL. Instead of breaking out from the current loop, which is the default, we specify a label to break out to, which is in the outer loop. This works with as many nested loops as your brain can handle. You don't have to use uppercase names but for namespace reasons it is recommended, and you can call your labels whatever you please. I was just being unimaginative with the name of LBL, feel free to invent labels called DORIS or MATILDA if that's what floats your personal boat.
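To see what the label buys you, here is a hedged sketch that counts the passes through each loop. The label name PERSON and both counters are made up for this illustration:

```perl
use strict;
use warnings;

my @names  = ('Mrs Smith', 'Mr Jones', 'Ms Samuel', 'Dr Jansen', 'Sir Philip');
my @medics = ('Dr Black', 'Dr Waymour', 'Dr Jansen', 'Dr Pettle');
my ($outer_passes, $inner_passes) = (0, 0);

PERSON: foreach my $person (@names) {
    $outer_passes++;
    if ($person =~ /Dr /) {
        foreach my $doc (@medics) {
            $inner_passes++;
            last PERSON if $doc eq $person;   # leaves both loops at once
        }
    }
}

print "Outer loop ran $outer_passes times, inner loop ran $inner_passes times\n";
```

Neither Dr Pettle nor Sir Philip is ever looked at, which is exactly the behaviour we were after.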


Changing the Elements of an Array

So we have @names. We want to change it. Run this:

print "Enter a name :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah");

print "@names\n";

push (@names, $x);

print "@names\n";

Fairly self explanatory. The push function just adds a value on to the end of the array. Of course, Perl being Perl, it doesn't have to be just the one value:

print "Enter a name :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

print "@names\n";

push (@names, $x, 10, @cities[2..4]);

print "@names\n";

This is worth looking at in more detail. It appears there is no fifth element of @cities, as referred to by @cities[2..4].

Actually, Perl supplies one anyway: the slice returns an undefined value for the missing element. Add this to the end of the example :

print "There are ",scalar(@names)," elements in \@names\n";

There appear to be 8 elements in @names. However, we have just proved there are in fact 9. The reason there are 9 is that we referred to a non-existent element of @cities, and Perl has quite happily extended @names to suit. The array @cities remains unchanged. Try popping the array if you don't believe me.
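A small sketch of that claim, with a fixed name ("Robert") standing in for the typed input so it runs unattended; $count and $last_state are names invented here:

```perl
use strict;
use warnings;

my @names  = ("Muriel", "Gavin", "Susanne", "Sarah");
my @cities = ("Brussels", "Hamburg", "London", "Breda");

# Index 4 does not exist in @cities, so the slice yields an undefined value there
push(@names, "Robert", 10, @cities[2..4]);

my $count      = scalar(@names);
my $last_state = defined($names[-1]) ? "defined" : "undefined";

print "There are $count elements in \@names; the last one is $last_state\n";
print "\@cities still has ", scalar(@cities), " elements\n";
```

The ninth element is real but undefined, which is why printing the array only shows eight values.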

So that's push. Now for some...

Jiggerypokery with Arrays

@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

&look;

$last=pop(@names);
unshift (@cities, $last);

&look;

sub look {
        print "Names : @names\n";
        print "Cities: @cities\n";
}

Now we have two arrays. The pop function removes the last element of an array and returns it, which means you can do something like assign the returned value to a variable. The unshift function adds a value to the beginning of the array. Hope you didn't forget that &subroutinename calls a subroutine. Presented below are the functions you can use to work with arrays:
   

A table of array hacking functions

push      Adds value to the end of the array
pop       Removes and returns value from end of array
shift     Removes and returns value from beginning of array
unshift   Adds value to the beginning of array
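The four functions in the table can be tried together in a short sketch; the array @queue and its single-letter contents are invented for the illustration:

```perl
use strict;
use warnings;

my @queue = ("b", "c");

push(@queue, "d");              # add to the end:          b c d
unshift(@queue, "a");           # add to the beginning:    a b c d
my $first = shift(@queue);      # remove from beginning:   b c d
my $last  = pop(@queue);        # remove from the end:     b c

print "first=$first last=$last left=@queue\n";
```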

Now, accessing other elements of arrays. May I present the splice function ?

Splice

@names=("Muriel","Sarah","Susanne","Gavin");

&look;

@middle=splice (@names, 1, 2);

&look;

sub look {
        print "Names : @names\n";
        print "The Splice Girls are: @middle\n";
}

The first argument for splice is an array. The second is the offset. The offset is the index number of the list element to begin splicing at. In this case it is 1. Then comes the number of elements to remove, which is sensibly 1 or more in this case. You can set it to 0 and perl, in true perl style, won't complain. Setting it to 0 is handy because splice can add elements to the middle of an array, and if you don't want any deleted 0 is the number to use. Like so:

@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

&look;

splice (@names, 1, 0, @cities[1..3]);

&look;

sub look {
        print "Names : @names\n";
        print "Cities: @cities\n";
}

Notice how the assignment to @middle has gone -- it is no longer relevant.

If you assign the result of a splice to a scalar then:

@names=("Muriel","Sarah","Susanne","Gavin");

&look;

$middle=splice (@names, 1, 2);

&look;

sub look {
        print "Names : @names\n";
        print "The Splice Girls are: $middle\n";
}

then the scalar is assigned the last element removed, or undef if no elements were removed at all.
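A hedged sketch comparing what splice hands back in each situation; the variable names are made up, and a second copy of the array is used so both calls start from the same data:

```perl
use strict;
use warnings;

my @names = ("Muriel", "Sarah", "Susanne", "Gavin");
my @again = @names;                         # identical starting point

my @removed      = splice(@names, 1, 2);    # list: every removed element
my $last_removed = splice(@again, 1, 2);    # scalar: only the last removed element
my $nothing      = splice(@again, 1, 0);    # nothing removed, so undef

print "List   : @removed\n";
print "Scalar : $last_removed\n";
print "Nothing: ", (defined($nothing) ? $nothing : "undef"), "\n";
```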

The splice function is also a way to delete elements from an array. In fact, a discussion of :

Deleting Variables

is in order. Suppose we want to delete Hamburg from the following array. How do we do it ? Perhaps:

@cities=("Brussels","Hamburg","London","Breda");

&look;

$cities[1]="";

&look;

sub look {
	print "Cities: ",scalar(@cities), ": @cities\n";
}
   

would be appropriate. Certainly Hamburg is removed. Shame, such a great lake. But note, the array element still exists. There are still four elements in @cities. So what we need is the appropriate splice function, which removes the element entirely.

splice (@cities, 1, 1);
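The difference between blanking an element and splicing it out shows up in the element count. A minimal sketch, with $after_blanking and $after_splice as invented names:

```perl
use strict;
use warnings;

my @cities = ("Brussels", "Hamburg", "London", "Breda");

$cities[1] = "";                          # Hamburg is gone, but its slot remains
my $after_blanking = scalar(@cities);

splice(@cities, 1, 1);                    # now the slot itself is removed
my $after_splice = scalar(@cities);

print "After blanking: $after_blanking elements\n";
print "After splice  : $after_splice elements (@cities)\n";
```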
   

Now that's all well and good for arrays. What about ordinary variables, such as these:

$car ="Porsche 911";
$aircraft="G-BBNX";

&look;

$car="";

&look;

sub look {
	print "Car :$car: Aircraft:$aircraft:\n";
	print "Aircraft exists !\n" if $aircraft;
	print "Car exists !\n" if $car;
}
   

It looks like we have deleted the $car variable. Pity. But think about it. It is not deleted, it is just set to the null string "". As you recall (hopefully) from previous ramblings, the null string evaluates to false so the if test fails.

False values versus Existence: It is, therefore...

Just because something is false doesn't mean to say it doesn't exist. A wig is false hair, but a wig exists. Your variable is still there. Perl does have a function to test if something exists. Existence, in Perl terms, means defined. So:

print "Car is defined !\n" if defined $car;
   

will evaluate to true, as the $car variable does in fact exist.

This begs the question of how to really wipe variables from the face of the earth, or at least your Perl script. Simple.

$car 	 ="Porsche 911";
$aircraft="G-BBNX";

&look;

undef $car; # this undefines $car

&look;

sub look {
	print "Car :$car: Aircraft:$aircraft:\n";
	print "Aircraft exists !\n"  if $aircraft;
	print "Car exists !\n" 	     if defined $car;
}
   

The value of $car is eradicated, deleted, killed, destroyed: defined $car now returns false.
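The three states (a true value aside) can be put side by side in a short sketch; the result variables are invented names that simply record what each test returned:

```perl
use strict;
use warnings;

my $car = "";                                    # the null string

my $empty_is_true    = $car          ? 1 : 0;    # false: "" is a false value
my $empty_is_defined = defined($car) ? 1 : 0;    # true: it still holds a value

undef $car;                                      # now there is no value at all

my $undef_is_defined = defined($car) ? 1 : 0;    # false

print "Empty string: true=$empty_is_true defined=$empty_is_defined\n";
print "After undef : defined=$undef_is_defined\n";
```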

And now for something completely different....

Basic Regular Expressions


An introduction

Or regex for short. These can be a little intimidating. But I'll bet you have already used some regex in your computing life so far. Have you ever said "I'll have any Dutch beer ?" That's a regex which will match a Grolsch or Heineken, but not a Budweiser, orange juice or cheese toastie. What about dir *.txt ? That's a pattern in the same spirit (strictly a wildcard rather than a regex), listing any files ending in .txt.

Perl's regex often look like this:

$name=~/piper/
   

That is saying "If 'piper' is inside $name, then True."

The regular expression itself is between / / slashes, and the =~ operator assigns the target for the search.

An example is called for. Run this, and answer it with 'the faq'. Then try 'my tealeaves' and see what happens.

print "What do you read before joining any Perl discussion ? ";
chomp ($_=<STDIN>);

print "Your answer was : $_\n";

if ($_=~/the faq/) {
        print "Right !  Join up !\n";
} else {
        print "Begone, vile creature !\n";
}

So here $_ is searched for 'the faq'. Guess what we don't need ! The =~ .
This works just as well:

if (/the faq/) {

because if you don't specify a variable, then perl searches $_ by default.
In this particular case, it would be better to use

if ($_ eq "the faq") {

as we are testing for exact matches.
   

Sensitivity -- regexes in touch with their inner child

But what if someone enters 'The FAQ' ? It fails, because the regex is case sensitive. We can easily fix that:

if (/the faq/i) {

with the /i switch, which specifies case-insensitivity. Now it works for all variations, such as "the Faq" and "the FAQ".

Now you can appreciate why a regular expression is better in this situation than a simple test using eq . As the regex searches one string for another string, a response of "I would read the FAQ first !" will also work, because "the FAQ" will match the regex.
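A quick sketch of the three tests against that longer answer; the variables $exact, $strict_case and $loose are invented names holding each result:

```perl
use strict;
use warnings;

my $answer = "I would read the FAQ first !";

my $exact       = ($answer eq "the faq")   ? "yes" : "no";   # whole string must be identical
my $strict_case = ($answer =~ /the faq/)   ? "yes" : "no";   # substring, case must match
my $loose       = ($answer =~ /the faq/i)  ? "yes" : "no";   # substring, any case

print "eq: $exact, regex: $strict_case, regex with /i: $loose\n";
```

Only the last test succeeds, which is the point being made about eq versus a case-insensitive regex.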

Study this example just to clarify the above. Tabs and spaces have been added for aesthetic beauty:

$_="perl for Win32";                            # sets the string to be searched

if ($_=~/perl/) { print "Found perl\n" };       # is 'perl' inside $_ ?  $_ is "perl for Win32".
if (/perl/)     { print "Found perl\n" };       # same as the regex above.  Don't need the =~ as we are testing $_
if (/PeRl/)     { print "Found PeRl\n" };       # this will fail because of case sensitivity
if (/er/)       { print "Found er\n" };         # this will work, because there is an 'er' in 'perl'
if (/n3/)       { print "Found n3\n" };         # this will work, because there is an 'n3' in 'Win32'
if (/win32/)    { print "Found win32\n" };      # this will fail because of case sensitivity
if (/win32/i)   { print "Found win32 (i)\n" };  # this will *work* because of case insensitivity (note the /i)

print "Found!\n"  if      / /;                  # another way of doing it, this time looking for a space

print "Found!!\n" unless $_!~/ /;		# both these are the same, but reversing the logic with unless and !
print "Found!!\n" unless    !/ /;		# don't do this, it will always never not confuse nobody :-)
						# the ~ stays the same, but = is changed to ! (negation)

$find=32;                                       # Create some variables to search for
$find2=" for ";                                 # some spaces in the variable too

if (/$find/)  { print "Found '$find'\n" };      # you can search for variables like numbers
if (/$find2/) { print "Found '$find2'\n" };     # and of course strings !

print "Found $find2\n" if /$find2/;           # different way to do the above

As you can see from the last example, you can embed a variable in the regex too. Regular expressions could fill entire books (and they have done, see the book critiques at http://www.perl.com/) but here are some useful tricks:
   

Character Classes

@names=qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);

foreach (@names) {                      # sets each element of @names to $_ in turn
        if (/[KC]arl/) {                # this line will be changed a few times in the examples below
                print "Match !  $_\n";
        } else {
                print "Sorry.   $_\n";
        }
}

This time @names is initialised using whitespace as a delimiter instead of a comma. qw refers to 'quote words', which means split the list by words. A word ends with whitespace (like tabs, spaces, newlines etc).

The square brackets enclose single characters to be matched. Here either Karl or Carl must be in each element. It doesn't have to be two characters, and you can use more than one set. Change Line 4 in the above program to:

if (/[KCZ]arl[sa]/) {

matches if something contains K, C, or Z, followed by arl, then either s or a. It does not match KCZarl. Negation is possible too, so try this :

if (/[KCZ]arl[^sa]/) {

which returns things containing K, C or Z, then arl, and then any character EXCEPT s or a. The caret ^ has to be the first character, otherwise it doesn't work as the negation. Having said [ ] defines single characters only, I should mention that these two are the same :

/[abcdeZ]arl/;
/[a-eZ]arl/;

if you use a hyphen then you get the list of characters including the start and finish characters. And if you want to match a special character (metacharacter), you must escape it:

/[\-K]arl/;

matches Karl or -arl. Although the - character is represented by two characters, it is just the one character to match.
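To compare the three character-class variations without editing line 4 each time, here is a sketch that counts the matches for each one over the same list. The counter names are invented for the illustration:

```perl
use strict;
use warnings;

my @names = qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);
my ($plain, $followed, $negated) = (0, 0, 0);

foreach (@names) {
    $plain++    if /[KC]arl/;          # Karl or Carl anywhere
    $followed++ if /[KCZ]arl[sa]/;     # ...followed by s or a
    $negated++  if /[KCZ]arl[^sa]/;    # ...followed by a character that is not s or a
}

print "plain=$plain followed=$followed negated=$negated\n";
```

Only Carleon satisfies the negated class, because the character after 'Carl' is an 'e'.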

Matching at specific points

If you want to match at the end of the line, make sure a $ is the last character in the regex. This one pulls out all those names ending in a. Slot it into the example above :

if (/a$/) {
   

And there is a corresponding character, the caret ^ , which in this context matches at the beginning of the string. Yes, the caret also negates a character class like this [^KCZ]arl but in this case it anchors the match to the beginning of the string.

if (/n/i)  {
if (/^n/i) {

The first one is true if the word contains an 'n' anywhere in it. The second specifies that the 'n' must be at the beginning of the string to be matched. Use this anchor where you can, because it makes the whole regex faster, and safer if you know what the first character must be.
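A sketch of both anchors run over the same name list as before; the three counters are invented names:

```perl
use strict;
use warnings;

my @names = qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);
my ($ends_in_a, $has_n, $starts_with_n) = (0, 0, 0);

foreach (@names) {
    $ends_in_a++     if /a$/;      # 'a' must be the last character
    $has_n++         if /n/i;      # an 'n' or 'N' anywhere
    $starts_with_n++ if /^n/i;     # an 'n' or 'N' as the first character
}

print "ends in a: $ends_in_a, contains n: $has_n, starts with n: $starts_with_n\n";
```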
   

Negating the regex

If you want to negate the entire regex change =~ to !~ (Remember ! means 'not'.)

if ($_ !~/[KC]arl/) {

Of course, as we are testing $_ this works too:

if (!/[KC]arl/) {

   

Returning the Match

Now things get interesting. What if we want to pull something out of a string ? So far all we have done is test for truth, that is say yea or nay if a string matches, but not return what we found. Run this:

$_='My email address is <Robert@NetCat.co.uk>.';

/(<robert\@netcat.co.uk>)/i;

print "Found it ! $1\n";

Firstly, note the single quotes when $_ is assigned. If there were double quotes, we'd need \@ instead of @ . Remember, double quotes "" allow variable interpolation, so Perl looks for an array called @NetCat which does not exist.

Secondly, look at the parens around the entire regex. If you use parens, a side effect is that the first match is put into a variable called $1 . We'll get to the main effect later. The second match goes into $2 and so on. Also note that the \@ has been escaped, so perl doesn't think it is an array. Remember \ either escapes a special character, or gives a special meaning. Think of it as Superman's telephone box. Imagine Clark Kent walking around with his magic partner Back Slash.

Notice how we specify in the regex case-insensitivity with /i and the regex returns the case-sensitive string - that is, exactly what it found.

Try the regex without parens. Then try this one:

/<(robert)\@netcat.co.uk>/i;

You can put the parens anywhere. More or less. Now, run this :

$_='My email address is <Robert@NetCat.co.uk>.';

/<(robert)\@(netcat.co.uk)>/i;

print "Found it ! $1 at $2\n";

See, you can have more than one ! Look at the above regex. Looks easy now, don't you think ? What about five minutes ago ? It would have looked like a typing mistake ! Well, there are some hairier regex to come, but you'll have a good barber.
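One habit worth sketching here: $1 and $2 are only meaningful after a successful match, so it is safer to copy them inside an if. The names $user and $host are invented, and the dots are escaped in this version so they match only literal dots:

```perl
use strict;
use warnings;

my $text = 'My email address is <Robert@NetCat.co.uk>.';
my ($user, $host) = ('', '');

if ($text =~ /<(robert)\@(netcat\.co\.uk)>/i) {
    ($user, $host) = ($1, $2);     # copy the captures only when the match worked
}

print "Found it ! $user at $host\n";
```

The captures keep the capitalisation found in the string, not the lowercase used in the regex.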
   

* + -- regexes become line noise

What if we didn't know what the email address was going to be ?

$_='My email address is <webslave@work.com>.';

print "Found it ! :$1:" if /(<.*>)/i;

When you see an if statement like this, read it right to left. The print statement is only executed if code on the right of the expression is true.

We'll discuss this. Firstly, we have the opening parens ( . So everything from ( to ) will be put into $1 if the match is successful. Then the first character of what we are searching for, < . Then we have a dot, or period . . For this regex, we can assume . matches any character at all.

So we are now matching < followed by any character. The * means 0 or more of the previous character. The regex finishes by requiring > .

This is important. Get the basics right and all regex are easy (I read somewhere once). An example best illustrates the point. Slot this regex in instead:

$_='My email address is <webslave@work.com>.';

print "Found it ! :$1:" if /(<*>)/i;

What's happening here ?    

The regex starts, logically, at the start of the string. This doesn't mean it starts at 'M', it starts just before M. There is a 'nothing' between the string start and 'M'.

The regex is searching for <* , which is 0 or more < .

The first thing it finds is not < , but the nothing in between the start of the string and the 'M' from 'My email...'. Does this match ?

As the regex is looking for "0 or more" < , we can certainly say that there are 0 < at the start of the string. So the match is, so far, successful. We have dealt with <* .

However, the next item to match is > . Unfortunately, the next item in the string is 'M', from 'My email...'. The match fails at this point. Sure, it matched <* without any problem, but the complete match has to work.

The only two characters that can match successfully at this point are < or > . The 'point' being that <* has been matched successfully, and we need either > to complete the match or more of < to continue the '0 or more' match denoted by * .

'M' is neither of them, so it fails at this point, when it has matched only <* .

Quick clarification - the regex cannot successfully match < , then skip on ahead through the string until it matches > . The characters in the string between < > also need to match the regex, and they don't in this case.

All is not lost. Regexes are hardy little beasts and don't give up easily. An attempt is made to match the regex wherever possible. The regex system keeps trying the match at every possible place in the string, working towards the end.

Let's look at the match when it reaches the 'm' in 'work.com'.

Again, we have here 0 < . So the match works as before. After success on <* the next character is analysed - it is a > , so the match is successful.

But, be warned. The match may be successful but your job is not done. Assuming the objective was to return the email address within the angle brackets then that regex is a miserable failure. Watch for traps of this nature when regexing.
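The trap is easy to see when both regexes are run against the same string. In this sketch $with_star_only and $with_dot_star are invented names for what each one captured:

```perl
use strict;
use warnings;

my $text = 'My email address is <webslave@work.com>.';
my ($with_star_only, $with_dot_star) = ('', '');

$with_star_only = $1 if $text =~ /(<*>)/;     # zero or more '<', then '>'
$with_dot_star  = $1 if $text =~ /(<.*>)/;    # '<', then anything, then '>'

print "With <*>  : '$with_star_only'\n";
print "With <.*> : '$with_dot_star'\n";
```

Both matches succeed, but the first one captures nothing more than the closing bracket.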

That's * explained. Just to consolidate, a quick look at:

$_='My email address is <webslave@work.com>.';
print "Match 1 worked :$1:" if /(<*)/i;

$_='<My email address is <webslave@work.com>.';
print "Match 2 worked :$1:" if /(<*)/i;

$_='My email address is <webslave@work.com<<<<>.';
print "Match 3 worked :$1:" if /(<*>)/i;

Match 1 is true. It doesn't return anything, but it is true because there are 0 < at the very start of the string.

Match 2 works. After the 0 < at the start of the string, there is 1 < so the regex can match that too.

Match 3 works. After failing on the first < , it jumps to the second. After that, there are plenty more to match right up until the required ending.

Glad you followed that. Now, pay even closer attention ! Concentrate fully on the task at hand ! This should be straightforward now:

$_='HTML <I>munging</I> time !.';

/<I>(.*)<\/I>/i;

print "Found it ! $1\n";

Pretty much the same as the above, except the parens are moved so we return what's only inside the tags, not including the tags themselves. Also note how / is escaped like so; \/ otherwise Perl thinks that's the end of the regex.

Now, suppose we change $_ to :

$_='HTML <I>munging</I> time is here <I>again</I> !.';

and run it again. Interesting effect, eh ? This is known as Greedy Matching. What happens is that when Perl finds the initial match, that is <I> it jumps right to the end of the string and works back from there to find a match, so the longest string matches. This is fine unless you want the shortest string. And there is a solution:

/<I>(.*?)<\/I>/i;

Just add a question mark and Perl does stingy matching. No nationalistic jokes. I have Dutch and Scottish friends I don't want to offend.
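Greedy and stingy matching side by side, as a small sketch; $greedy and $stingy are invented names for the two captures:

```perl
use strict;
use warnings;

my $html = 'HTML <I>munging</I> time is here <I>again</I> !.';
my ($greedy, $stingy) = ('', '');

$greedy = $1 if $html =~ /<I>(.*)<\/I>/i;     # longest possible match
$stingy = $1 if $html =~ /<I>(.*?)<\/I>/i;    # shortest possible match

print "Greedy: $greedy\n";
print "Stingy: $stingy\n";
```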


The Difference Between + and *

You know what * means, namely match 0 or more. If you want to match 1 or more, then use + . The difference is important.

$_='The number is 2200 and the day is Monday';

($star)=/([0-9]*)/;

($plus)=/([0-9]+)/;

print "Star is '$star' and Plus is '$plus'\n";

You'll note that $star has no value. The match was successful though. It managed to match 0 or more characters from 0 to 9 at the very start of the string.

The second regex with $plus worked a little better, because we are matching one or more characters from 0 to 9. Therefore, unless one 0 to 9 is found the match will fail. Once a 0-9 is found, the match continues as long as the next character is 0-9, then it stops.

Now we know this, there is another way to remove an email address from within angle brackets:

$_='My email address is <robert@netcat.co.uk> !.';

/<([^>]+)/i;

print "Found it ! $1\n";

This regex matches <. Then the capturing parens start. They have no effect on this regex other than to capture the match. After that, there is a character class, containing one character. As ^ is the first character in the class, it negates the class. That's why we are using a character class with only one character in it, because it can be negated.

So far we have matched < and anything that is not >. The + ensures we match as many characters that are not >'s as we can. This has the same effect as .*? but is more efficient. It may also suit your purposes, as .*? relies on you knowing what you want to match up to, whereas [^>]+ simply continues matching until it finds something that fails its criteria. Just make sure you understand the difference because it is a crucial part of regexery.
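A sketch confirming that the two approaches capture the same text here; $by_class and $by_stingy are invented names. Note that the stingy version has to name the closing > explicitly, while the negated class stops by itself:

```perl
use strict;
use warnings;

my $text = 'My email address is <robert@netcat.co.uk> !.';
my ($by_class, $by_stingy) = ('', '');

$by_class  = $1 if $text =~ /<([^>]+)/;    # everything that is not a '>'
$by_stingy = $1 if $text =~ /<(.*?)>/;     # as little as possible up to the next '>'

print "Negated class: $by_class\n";
print "Stingy match : $by_stingy\n";
```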


Re-using the match -- \1, $1...

Suppose we didn't know what HTML tag we had to match ? It could be B, I, EM or whatever, and we want everything that is in between. Well, HTML container tags like B and EM have end tags which are the same as the start tag, except for the / . So what we could do is:

  • find out what is inside < >
  • search for exactly the same tag, but with the closing /
  • return whatever is in between.

Can this be done ? Of course. This is perl, all things are possible. Now, remember the side effect of parens. I promise I'll explain the primary effect at some point. If whatever is in (parens) matches, the result is stored in a variable called $1 . So we can use <(.*?)> which will find us < then as many anythings (the . and * ) up to the next, not last > (the ? forces stingy matching).

The result is stored in $1 because we used parens. Next, we need everything up to the closing tag. That's easy : (.*?) matches everything up until the next character or set of characters. And how exactly do we define where to stop ?

We can use $1 even in the same regex it was found in. However, it is not referred to within a regex as $1 , but \1 .

So we want to match </$1> which in perl code is <\/\1> . The / must be escaped because otherwise it would be taken as the end of the regex, and 1 is escaped so it refers to $1 instead of matching the number 1.

Still here ? This is what it looks like:

$_='HTML <I>munging</I> time is here <I>again</I> !.';
/<(.*?)>(.*?)<\/\1>/i;

print "Found it ! $2\n";
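The same regex can be tried on a string with different tags to check that it really does not care which tag it meets first. This is only a sketch; the test string and the names $tag and $inside are made up:

```perl
use strict;
use warnings;

my $html = 'HTML <B>munging</B> time is here <EM>again</EM> !.';
my ($tag, $inside) = ('', '');

if ($html =~ /<(.*?)>(.*?)<\/\1>/i) {
    ($tag, $inside) = ($1, $2);    # $1 is the tag name, $2 the text between the tags
}

print "Tag '$tag' contains '$inside'\n";
```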

If you want to know how to return all the matches above, read on. But before that:

How to Avoid Making Mountains while Escaping Special Characters

You want to match this; http://language.perl.com/faq/ . That's a real (useful) URL by the way. Hint. To match it, you need to do this:

/http:\/\/language\.perl\.com\/faq\//;

which should make the awful metaphor above clearer, if not funnier. The slash, / , is not normally a metacharacter but as it is being used for the regular expression delimiters, it needs to be escaped. We already know that . is special.

Fortunately for our eyes, Perl allows you to pick your delimiter if you prefix it with 'm' as this example shows. We'll use a #:

m#http://language\.perl\.com/faq/#;

Which is a huge improvement, as we change / to # . We can go further with readability by quoting everything:

m#\Qhttp://language.perl.com/faq/\E#;

The \Q escapes everything up until \E or the regex delimiter (so we don't really need the \E above). In this case # will not be escaped, as it delimits the regex.
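To see why the quoting matters, this sketch tests a deliberately wrong lookalike address (invented for the purpose) in which a dot has been replaced by an X:

```perl
use strict;
use warnings;

my $url       = 'http://language.perl.com/faq/';
my $lookalike = 'http://languageXperl.com/faq/';    # made-up string, X instead of a dot

# Unquoted: each . matches any character, so the lookalike slips through
my $unquoted = ($lookalike =~ m#http://language.perl.com/faq/#)     ? 1 : 0;

# Quoted: \Q makes every . a literal dot, so the lookalike is rejected
my $quoted   = ($lookalike =~ m#\Qhttp://language.perl.com/faq/\E#) ? 1 : 0;

# The genuine address still matches the quoted regex
my $genuine  = ($url =~ m#\Qhttp://language.perl.com/faq/\E#)       ? 1 : 0;

print "unquoted=$unquoted quoted=$quoted genuine=$genuine\n";
```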

Someone once posted a question about this to the Perl-Win32-Users mailing list and I was so intrigued about this apparently undocumented trick I spent the next twenty minutes figuring it out by trial and error, and posted a reply. Next day I found lots of messages telling the poster to read the manual because it was clearly documented. <face colour='red' intensity='high'> My excuse was I didn't have the docs to hand....moral of the story - RTFM and RTF FAQs !

Substitution and Yet More Regex Power


Basic changes

Suppose you want to replace bits of a string. For example, 'us' with 'them'.

$_='Us ? The bus usually waits for us, unless the driver forgets us.';

print "$_\n";

s/Us/them/;   # operates on $_, otherwise you need $foo=~s/Us/them/;

print "$_\n";

What happens here is that the string 'Us' is searched for, and when a match is found it is replaced with the right side of the expression, in this case 'them'. Simple.

You'll notice that only one substitution was made. To match globally use /g which runs through the entire string, changing wherever it can. Try:

s/Us/them/g;

which fails to change anything more. This is because regexes are, by default, case-sensitive. So:

s/us/them/ig;

would be a better bet. Now, everything is changed. A little too much, but one problem at a time. Everything you have learned about regex so far can be used with s/// , like parens, character classes [ ] , greedy and stingy matching and much more. Deleting things is easy too. Just specify nothing as the replacement, like so s/Us//; .

So we can use some of that knowledge to fix this problem. We need to make sure that a space precedes the 'us'. What about:

s/ us/them/g;

A small improvement. The first 'Us' is now no longer changed, but one problem at a time ! We'll first consider the problem of the regex changing 'usually' and other words with 'us' in them.

What we are looking for is a space, then 'us', then a comma, period or space. We know how to specify one of a number of options - the character class.

s/ us[. ,]/them/g;

Another tiny step. Unfortunately, that step wasn't really in the right direction, more on the slippery slope to Poor Programming Practice. Why? Because we are limiting ourselves. Suppose someone wrote ' send it to us; when we get it'.

You can't think of all the possible permutations. It is often easier, and safer, to simply state what must not follow the match. In this case, it can be anything except a letter. We can define that as a-z. So we can add that to the regex.

s/ us[^a-z]/ them/g;

the caret ^ negates the character class, and a-z represents every letter from a to z inclusive. A space has been added to the substitution part - as the original space was matched, it should be replaced to maintain readability.


\w

What would be more useful is to use a-zA-Z instead. If we weren't using /i we'd need that. As a-zA-Z is such a common construct, Perl provides an easy shorthand:

s/ us[^\w]/ them/g;

The \w construct actually means 'word' - equivalent to a-zA-Z_0-9 . So we'll use that instead.

To negate any construct, simply capitalise it:

s/ us[\W]/ them/g;

and of course we don't need the negating caret now. In fact, we don't even need the character class !

s/ us\W/ them/g;

So far, so good. Matching the first 'us' is going to be difficult though. Fortunately, there is an easy solution. We've seen Perl's definition of a word - \w . Between each word is a boundary. You can match this with \b .
s/\bus\W/ them/ig;

that's \b followed by 'us', not 'bus' :-)
Now, we require a word boundary before 'us'. As there is a 'nothing' at the start of the string, we have a match. There is a space after the first 'Us', so the match is successful (the /i takes care of the capital U). You might notice an extra space has crept in - that's the space we added earlier. The match doesn't include the space any more - it matches on the word boundary, that is just before the word begins. The space doesn't count.

Replacing with what was found

Did you notice the final period and the comma are replaced ? They are part of the match - it is the \W that matches them. We can't avoid that. We can however put back that part of the match.

s/\bus(\W)/them\1/ig;

We start with capturing whatever the \W matches, using parens. Then, we add it to the replacement string. The capture is of course in $1 , but as it is in a regex we refer to it as \1 . (Perl accepts \1 in the replacement part, though $1 is the preferred spelling there.)
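A sketch of the finished substitution, written with $1 in the replacement and with /i so that the leading 'Us' is caught as well; $text and $changed are names invented here:

```perl
use strict;
use warnings;

my $text    = 'Us ? The bus usually waits for us, unless the driver forgets us.';
my $changed = $text;                   # work on a copy so the original survives

# \b stops 'bus' from matching; (\W) captures the following character
# so it can be put back after 'them'
$changed =~ s/\bus(\W)/them$1/ig;

print "$text\n$changed\n";
```

The words 'bus', 'usually' and 'unless' are left alone, and the punctuation after each 'us' survives. The capital letter at the start is still lost.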

The final problem is of course capitalising the replacement string when appropriate. Which in old versions of the tutorial I left as an exercise to the reader, having run out of motivation. A reader by the name of Paul Trafford duly solved the problem, and I have just inserted his excellent explanation for the elucidation of all concerned:

#   Solution to the us/them problem...
#
#   The program works through the text assigning the
#   variable $1 to 'U' or 'u' for any words where this
#   letter is followed by 's' and then by non 'word'
#   characters.   The latter is assigned to variable $2.
#
#   For each such matching occurrence, $1 is replaced by
#   the letter that precedes it in the alphabet using
#   operations 'ord' and 'chr' that return the ASCII value
#   of a character and the character corresponding to a
#   given natural number.  After this 'hem' is tacked on
#   followed by $2, to retain the shape of the original
#   sentence.  The '/e' switch is used for evaluation.
#
#   NOTES
#   1. This solution will not replace US (short for
#   United States) with Them or them.
#
#   2. If a 'magical' decrement operator '--' existed for
#   strings then the solution could be simplified for we
#   wouldn't need to use the 'chr' and 'ord' operators.
$_='Us ? The bus usually waits for us, unless the driver forgets us.';

print "$_\n";

s/\b([Uu])s(\W)/chr(ord($1)-1).hem.$2/eg;

print "$_\n";
   

An excellent solution, thanks Paul.

There are several more constructs. We'll take a quick look at \d which means anything that is a digit, that is 0-9 . First we'll use the negated form, \D , which is anything except 0-9 :

print "Enter a number :";
chop ($input=<STDIN>);

if ($input=~/\D/) {
        print "Not a number !!!!\n";
} else {
        print 'Your answer is ',$input x 3,"\n";

}

this checks that there are no non-number characters in $input . It's not perfect because it'll choke on decimal points, but it's just an example. Writing your own number-checker is actually quite difficult, but it is an interesting exercise. Try it, and see how accurate yours is.
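As a starting point for that exercise, here is one possible sketch, not a definitive number-checker. The sub name is_plain_number is made up for this example, and it deliberately accepts only an optional sign, digits and an optional decimal fraction; exponents such as 1e5 are rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s looks like a plain decimal number.
sub is_plain_number {
    my ($s) = @_;
    # optional sign, then digits with an optional fraction,
    # or a fraction with no leading digits (such as .5)
    return $s =~ /^[+-]?(?:\d+(?:\.\d*)?|\.\d+)$/ ? 1 : 0;
}

foreach my $try ('42', '-3.14', '.5', '1e5', 'abc', '') {
    print "'$try' : ", (is_plain_number($try) ? 'number' : 'not a number'), "\n";
}
```

Note that $ also matches just before a trailing newline, so input that has not been chopped still passes.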


x

I hope you trusted me and typed the above in exactly as it is shown (or pasted it), because the x is not a mistake, it is a feature. If you were too smart and changed it to a * or something, change it back and see what it does.

Of course, there is another way to do it :

unless ($input=~/\D/) {
        print 'Your answer is ',$input x 3,"\n";
} else {
        print "Not a number !!!!\n";
}

which reverses the logic with an unless statement.
   

More Matching

Assume we have:

$_='HTML <I>munging</I> time is here <I>again</I> !.';

and we want to find all the italic words. We know that /g will match globally, so surely this will work :

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

$match=/<i>(.*?)<\/i>/ig;

print "$match\n";

except it returns 1, and there were definitely two matches. The match operator returns true or false, not the number of matches. So you can test it for truth with functions like if, while, unless. Incidentally, the s/// operator does return the number of substitutions.

To return what is matched, you need to supply a list.

($match) = /<i>(.*?)<\/i>/i;

which handily puts the whole of the first captured match into $match . Note that an = is used (for assignment), as opposed to =~ (to point the regex at a variable other than $_ ).

The parens force a list context in this case. There is just the one element in the list, but it is still a list. The entire match will be assigned to the list, or whatever is in the regex's parens. Try adding some parens:

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

($word1, $word2) = /<i>(.*?)<\/i>/ig;

print "Word 1 is $word1 and Word 2 is $word2\n";

In the example above notice /g has been added so a global match is done - this means perl carries on matching even after it finds the first match. Of course, you might not know how many matches there will be, so you can just use an array, or any other type of list:

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

@words = /<i>(.*?)<\/i>/ig;

foreach $word (@words) {
        print "Found $word\n";
}

and @words will be grown to the appropriate size for the matches. You really can supply what you like to be assigned to:

($word1, @words[2..3], $last) = /<i>(.*?)<\/i>/ig;

you'll need more italics for that last one to work. It was only a demonstration.
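To pull the different return values together, here is a small sketch. The variable names are made up for this example, and an explicit $text with =~ is used in place of $_ .

```perl
use strict;
use warnings;

my $text = 'HTML <I>munging</I> time is here <I>again</I> !';

my $truth   = ($text =~ /<i>(.*?)<\/i>/i);    # scalar context: true (1) or false
my ($first) = ($text =~ /<i>(.*?)<\/i>/i);    # list context: the first capture
my @all     = ($text =~ /<i>(.*?)<\/i>/ig);   # list context with /g: every capture
my $changes = ($text =~ s/<\/?i>//ig);        # s/// returns the number of substitutions

print "truth=$truth first=$first all=@all changes=$changes\n";
```

The last line removes the four tags, so $changes is the number of tags and not the number of italic words.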

There is another trick worth knowing. Because a regex returns true each time it matches, we can test that and do something every time it returns true. The ideal function is while which means 'do something as long as the condition I'm testing is true'. In this case, we'll print out the match every time it is true.

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

while (/<(.*?)>(.*?)<\/\1>/g) {
        print "Found the HTML tag $1 which has $2 inside\n";
}

So the while operator runs the regex, and if it is true, carries out the statements inside the block.

Try running the program above without the /g . Notice how it loops forever ? That's because the expression always evaluates to true. By using the /g we force the match to move on until it eventually fails.

Now we know this, an easy way to find the number of matches is:

$_='HTML <I>munging</I> time is here <I>again</I> ! What <EM>fun</EM> !';

$found++ while /<i>.*?<\/i>/ig;

print "Found $found matches\n";

You don't need braces in this case as nothing apart from the expression to be evaluated follows the while function.
   

Parentheses Again: OR

The real use for them. Precedence. Try this, and yes you can try it at home:

$_='One word sentences ? Eliminate. Avoid clichés like the plague.  They are old hat.';

while (/o(rd|ne|ld)/gi) {
        print "Matched $1\n";
}

Firstly, notice the subtle introduction of the or operator, in this case | , the pipe. What I really want to explain however, is that this regex matches o followed by rd, ne or ld. Without the parens it would be /ord|ne|ld/ which is definitely not what we want. That matches just plain ord, or ne or ld.
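The difference is easy to see with a few test words. In this sketch the word list and the hash names %grouped and %ungrouped are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my %grouped;
my %ungrouped;

foreach my $word (qw(word lone ne ld)) {
    $grouped{$word}   = ($word =~ /o(rd|ne|ld)/) ? 1 : 0;   # 'o' is always required
    $ungrouped{$word} = ($word =~ /ord|ne|ld/)   ? 1 : 0;   # 'o' belongs to 'ord' only
    print "$word: with parens $grouped{$word}, without $ungrouped{$word}\n";
}
```

Only the version with parens rejects 'ne' and 'ld', because they have no leading o.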


(?: OR Efficiency)

In the interests of efficiency, consider this:

print "Give me a name :";
chop($_=<STDIN>);

print "Good name\n" if /Pe(tra|ter|nny)/;

The code above functions correctly. If you were wondering what a good name is, Petra, Peter and Penny qualify. The regex is not as efficient as it could be though. Think about what Perl is doing with the regex, that you are just ignoring. Simply throwing away casually. Without consideration as to the effort that has gone into creating it for you. The resources squandered. The little bytes of memory whose sole function in life is to store this information, which will never be used.

What's happening is that because parens are used, perl is creating $1 for your usage and abusage. While this may not seem important, a fair amount of resources go into creating $1, $2 and so on. Not so much the memory used to store them, more the CPU effort involved. So, if you aren't going to use the parens for capturing purposes, why bother capturing the match?

print "Give me a name :";
chop($_=<STDIN>);

print "Good name\n" if /Pe(?:tra|ter|nny)/;

print "The match is :$1:\n";

The second print statement demonstrates that nothing is captured this time. You get the benefits of the parens' precedence-changing capabilities, but without the overhead of the capturing. This benefit is especially worthwhile if you are writing CGI programs which use parens in regex -- with CGI, every little bit of efficiency counts.


Matching specific amounts of...

Finally, take a look at this :

$_='I am sleepy....zzzz....DING ! Wake Up!';

if (/(z{5})/) {
        print "Matched $1\n";
} else {
        print "Match failed\n";
}

The braces { } specify how many of the preceding character to match. So z{2} matches exactly two 'z's and so on. Change z{5} to z{4} and see how it works. And there's more...

/z{3}/      3 z only
/z{3,}/     At least 3 z
/z{1,3}/    1 to 3 z
/z{4,8}/    4 to 8 z
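The table can be checked against the sleepy string with a short loop. This is just a sketch; the hash %got and the idea of interpolating the quantifier into the pattern are choices made for this example.

```perl
use strict;
use warnings;

my $text = 'I am sleepy....zzzz....DING ! Wake Up!';   # four 'z's

my %got;
foreach my $quant ('{3}', '{3,}', '{1,3}', '{4,8}', '{5}') {
    # the quantifier string is interpolated into the pattern
    $got{$quant} = ($text =~ /(z$quant)/) ? $1 : undef;
    print "z$quant : ", (defined $got{$quant} ? "matched '$got{$quant}'" : 'failed'), "\n";
}
```

Notice that z{3} still succeeds on a run of four 'z's: it matches the first three and ignores the fourth. If you need 'three and no more' you have to say what may come before and after.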

To any of the above you may suffix a question mark, the effect of which is demonstrated in the following program. Run it a couple of times, inputting 2, 3 and 4:

print "How many letters do you want to match ? ";
chomp($num=<STDIN>);

# we assign and print in one smooth move
print $_="The lowest form of wit is indeed sarcasm, I don't think.\n";

print "Matched \\w{$num,} : $1 \n"  if /(\w{$num,})/;

print "Matched \\w{$num,}?: $1 \n"  if /(\w{$num,}?)/;
   

The first match is 'match any word (that's a-zA-Z0-9_) equal to or longer than $num characters, and return it.' So if you enter 4, then 'lowest' is returned. The word 'The' doesn't match.

The second match is exactly the same, but the ? forces a minimal match, so only the part actually needed to satisfy the regex, the first $num characters, is returned.

Just to clear this up, amend the program thus:

print "\nMatched \\w{$num,} :";
print "$1 " while /(\w{$num,})/g;

print "\nMatched \\w{$num,}? :";
print "$1 " while /(\w{$num,}?)/g;

Note the addition of /g . Try it without - notice how the match never moves on ?


Pre, Post, and Match

And now on the Regex Programme Today, we have guest stars Prematch, Postmatch and Match. All of whom are going to slow our entire programme down, but are useful anyway :

$_='I am sleepy....snore....DING ! Wake Up!';

/snore/;	# look, no parens !

print "Postmatch: $'\n";
print "Prematch: $`\n";
print "Match: $&\n";

If you are wondering what the difference between match and using parens is you should remember that you can move the parens around, but you can't vary what $& and its ilk return. Also, using any of the above three variables does slow your entire program, whereas using parens will just slow the particular regex you use them for. However, once you've used one of the three matches you might as well use them all over the place as you've paid the speed penalty. Use parens where possible.
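For comparison, this is one way the same three pieces could be collected with parens alone. It is a sketch under the assumption that the thing you want is the first 'snore'; the variable names are invented for this example.

```perl
use strict;
use warnings;

my $text = 'I am sleepy....snore....DING ! Wake Up!';

# (.*?) takes as little as possible before 'snore', (.*) takes the rest;
# /s lets the dot match a newline, should the text contain one.
my ($pre, $match, $post) = ($text =~ /^(.*?)(snore)(.*)$/s);

print "Prematch: $pre\n";
print "Match: $match\n";
print "Postmatch: $post\n";
```

The cost is paid by this one regex only, which is the point made above.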
   

RHS Expressions


/e

RHS means Right Hand Side. Suppose we have an HTML file, which contains:

<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>

and we wish to double the size of each font so 2 becomes 4 and 4 becomes 8 etc. What about :

$data="<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>";

print "$data\n";

$data=~s/(size=)(\d)/\1\2 * 2/ig;

print "$data\n";

which doesn't really work out. What this does is match size=x, where x is any digit. The first match, size=, goes into $1 and the second match, whatever the digit is, goes into $2 . The second part of the regex simply prints $1 and $2 (referred to as \1 and \2 ), and the ' * 2' is printed after them as plain text instead of multiplying $2 by 2. Remember /i means case insensitive matching.

What we need to do is evaluate the right hand side of the regex as an expression - that is not just print out what it says, but actually evaluate it. That means work it through, not blindly treat it as a string. Perl can do this:

$data=~s/(size=)(\d)/$1.($2 * 2)/eig;

A little explanation....the LHS is the same as before. We add /e so Perl evaluates the RHS as an expression. So we need to change \1 into $1 and so on. The parens are there to ensure that $2 * 2 is evaluated, then joined to $1 . And that's it !
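Because the RHS is now ordinary Perl code, it can hold any expression, not just a multiplication. The sketch below assumes, purely for illustration, that sizes should be doubled but never go above 7; the variable $doubled is invented for the example.

```perl
use strict;
use warnings;

my $data = '<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>';

# Assumed rule for this sketch: double the size, but stop at 7.
(my $doubled = $data) =~ s/(size=)(\d)/$1 . ($2 * 2 > 7 ? 7 : $2 * 2)/eig;

print "$data\n";
print "$doubled\n";
```

Copying $data first and running the substitution on the copy leaves the original string untouched.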


/ee

It is even possible to have more than one /e . For example:

$data='The function is <5funcA>';

$funcA='*2+4';

print "$data\n";

$data=~s/<(\d)(\w+)>/($1+2).${$2}/;	# first time
# $data=~s/<(\d)(\w+)>/($1+2).${$2}/e;	# second time
# $data=~s/<(\d)(\w+)>/($1+2).${$2}/ee;	# third time

print "$data\n";

To properly appreciate this you need to run it three times, each time commenting out a different line. Only one regex line should be uncommented when the program is run.

The first time round the regex is a dumb variable interpolation. Perl just searches the string for any variables, finds $1 and $2, and replaces them.

Second time round the expression is evaluated, as opposed to just plain variable-interpolated. This means that $1+2 is evaluated. $1 has a value of 5, plus 2 == 7. The other part of the replacement, ${$2} is evaluated only so far as working out that the variable named by $2 should be placed in the string.

Third time round and Perl now makes a second pass through the string, looking for things to do. After the first pass, and just before that second pass the string looks like this; 7*2+4 . Perl evaluates this, and prints the result.

So the more /e 's you add on the end of the regex, the more passes Perl makes through the replacement string trying to evaluate the code.

This is fairly advanced stuff here, and it is probably not something you will use every day. But knowing it is there is handy.


A Worked Example: Date Change

Imagine you have a list of dates which are in the US format of month, day, year as opposed to the rest of the world's logical notion of day, month, year. We need a regex to transpose the day and month. The dates are:

@dates=(
'01/22/95',
'05/15/87',
'8-13-96',
'5.27.78',
'6/16/1993'
);

The task can be split into steps such as:

  1. Match the first digit, or two digits. Capture this result.
  2. Match the delimiter, which appears to be one of / - .
  3. Match the second two digits, and capture that result
  4. Rebuild the string, but this time reversing the day and month.

That may not be all the steps, but it is certainly enough for a start. Planning regex is important. So, first pass:

@dates=(
'01/22/95',
'5/15/87',
'8-13-96',
'5.27.78',
'6/16/1993'
);

foreach (@dates) {
	print;
	s#(\d\d)/(\d\d)#$2/$1#;
	print " $_\n";
}

Hmm. This hasn't worked for the dates delimited with - . , and the last date hasn't worked either. The first problem is pretty easy; we are just matching / , nothing else. The second problem arises because we are matching two digits. Therefore, 5/15/87 is matched on the 15 and 87, not the 5 and 15. The date 6/16/1993 is matched on the 16 and the 19 of 1993.

We can fix both of those. First, we'll match either 1 or 2 digits. There are a few ways of doing this, such as \d{1,2} which means either 1 or two of the preceding character, or perhaps more easily \d\d? which means match one \d and the other digit is optional, hence the question mark. If we used \d+ then that would match 19988883 which is not a valid date, at least not as far as we are concerned.

Secondly, we'll use a character class for all the possible date delimiters. Here is just the loop with those amendments:

foreach (@dates) {
	print;
	s#(\d\d?)[/-.](\d\d?)#$2/$1#;
	print " $_\n";
}

which fails. Examine the error statement carefully. The key word is 'range'. What range? Well, the range between / and . because - is the range operator within a character class. That means it is a special character, or a metacharacter. And to negate the special meaning of metacharacters we have to use a backslash.

But wait! I don't hear you cry. Surely . is a metacharacter too? It is, but not within a character class so it doesn't need to be escaped.

foreach (@dates) {
	print;
	s#(\d\d?)[/\-.](\d\d?)#$2/$1#;
	print " $_\n";
}

Nearly there. However, we are always replacing the delimiter with / which is messy. That's an easy fix:

foreach (@dates) {
	print;
	s#(\d\d?)([/\-.])(\d\d?)#$3$2$1#;
	print " $_\n";
}

so that fixes that. In case you were wondering, the . dot does not act as '1 of anything' inside a character class. It would defeat the object of the character class if it did. So it doesn't need escaping. There is a further improvement you can make to this regex:

$m='/.-';

foreach (@dates) {
	print;
	s#(\d\d?)([$m])(\d\d?)#$3$2$1#;
	print " $_\n";
}

which is good practice because you are bound to want to change your delimiters at some point, and putting them inside the regex is hardcoding, and we all know that ends in tears. You can also re-use the $m variable elsewhere, which is good practice.

Did you notice the difference between what we assign to $m and what we had before?

    /\-.
$m='/.-';

The difference is that the - is no longer escaped. Why not? Logic. Perl knows - is the range operator. Therefore, there must be a character to the immediate left and immediate right of it in order for it to work, for example e-f. When we assign a string to $m, the range operator is the last character and therefore has no character to the right of it, so Perl doesn't interpret it as a range operator. Try this:

$m='/-.';

and watch it fail.
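The three safe ways of writing the same class (escaped hyphen, hyphen first, hyphen last) can be compared side by side. A small sketch; the hash %hits and the test characters are assumptions for illustration.

```perl
use strict;
use warnings;

my %hits;
foreach my $class ('/\-.', '-/.', '/.-') {
    # each string is interpolated between [ and ] to form the class
    $hits{$class} = join ' ', grep { /^[$class]$/ } ('/', '-', '.', 'x');
    print "[$class] matches: $hits{$class}\n";
}
```

All three accept the slash, the hyphen and the dot, and none of them accepts x.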

Something else that causes heartache is matching what you don't mean to. Try this:

@dates=(
'01/22/95',
'5/15/87',
'8-13-96',
'5.27.78',
'/16/1993',
'8/1/993',
);

$m='/.-';

foreach (@dates) {
	print;
	s#(\d\d?)([$m])(\d\d?)#$3$2$1# or print "Invalid date! ";
	print " $_\n";
}

The two invalid dates at the end are let through. If you wanted to check the validity of every possible date since the start of the modern calendar then you might be better off with a database rather than a regex, but we can do some basic checking. The important point is that we know the limitations of what we are doing.

What we can do is make sure of two things; that there are three sets of digits separated by our chosen delimiters, and that the last set of digits is either two digits, eg 99, 98, 87, or four digits, eg 1999, 1998, 1987.

How can we do this? Extend the match. After the second digit match we need to match the delimiter again, then either 2 digits or four digits. How about:

$m='/.-';

foreach (@dates) {
	print;
	s#(\d\d?)([$m])(\d\d?)[$m](\d\d|\d{4})#$3$2$1$2# or print "Invalid date! ";
	print " $_\n";
}

which doesn't really work out. The problem is it lets 993 through. This is because \d\d will match on the front of 993. Furthermore, we aren't fixing the year back on to the end result.

The delimiter match is also faulty. We could match / as the first delimiter, and - as the second. So, three problems to fix:

foreach (@dates) {
	print;
	s#(\d\d?)([$m])(\d\d?)\2(\d\d|\d{4})$#$3$2$1$2$4# or print "Invalid!";
	print " $_\n";
}

This is now looking like a serious regex. Changes:

  1. We are re-using the second match, which is the delimiter, further on in the regex. That's what the \2 is. This ensures the second delimiter is the same as the first one, so 5/7-98 gets rejected.
  2. The $ on the end means end of string. Nothing allowed after that. So the regex now has to find either 2 or 4 digits at the end of the string, or it fails.
  3. Added the match of the year ($4) to the rebuild section of the regex.

Regex can be as complex as you need. The code above can be improved still further. We could reject all years that don't begin with either 19 or 20 if they are four-digit years. The other problem with the code so far is that it would reject an otherwise valid date such as 02/24/99 whenever there are characters after the year. Both can be fixed:

@dates=(
'01/22/95',
'5/15/87',
'8-13-96',
'5.27.78',
'/16/1993',
'8/1/993',
'3/29/1854',
'! 4/23/1972 !',
);

$m='/.-';

foreach (@dates) {
	print;
	s#(\d\d?)([$m])(\d\d?)\2(\d\d|(?:19|20)\d{2})(?:$|\D)#$3$2$1$2$4# or print "Invalid!";
	print " $_\n";
}

We have now got a nested OR, and the inner OR is non-capturing for reasons of efficiency and readability. At the end we alternate between letting the regex match either an end of line or any non-digit, symbolised with \D.
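If the regex is going to be used in more than one place it can be wrapped in a sub. This is a sketch only: the name swap_day_month and the choice to return undef for a rejected date are assumptions, not part of the original program.

```perl
use strict;
use warnings;

my $m = '/.-';   # the delimiters we accept

# Hypothetical helper: returns the date with day and month swapped,
# or undef if the regex does not accept it.
sub swap_day_month {
    my ($date) = @_;
    my $ok = ($date =~ s#(\d\d?)([$m])(\d\d?)\2(\d\d|(?:19|20)\d{2})(?:$|\D)#$3$2$1$2$4#);
    return $ok ? $date : undef;
}

foreach my $try ('01/22/95', '8-13-96', '5/7-98', '8/1/993', '3/29/1854') {
    my $result = swap_day_month($try);
    print "$try -> ", (defined $result ? $result : 'Invalid!'), "\n";
}
```

One limitation worth knowing: when a non-digit follows the year, the \D consumes it and the replacement does not put it back.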

We could go on. It is often very difficult to write a regex that matches anything of even minor complexity with absolute certainty. Think about IP addresses for example. What is important is to build the regex carefully, and understand what it can and cannot do. Catching anything supposedly invalid is a good idea too. Test your regex with all sorts of invalid data, and you'll understand what it can do.

Split and Join


Splitting

While you are in the regex mood, a quick look at split and join . Destruction is always easier (just ask your car mechanic), so let's start with split .

$_='Piper:PA-28:Archer:OO-ROB:Antwerp';

@details=split /:/, $_;

foreach (@details) {
        print "$_\n";
}

Here split is given two arguments. The first one is a regex specifying what to split on. The next is what to split. Actually, I could leave $_ out because as usual it is the default if nothing is specified.

The assignment can either be a scalar variable or a list like an array (or hash, but at this time 'hash' to you means what you think the Dutch do or a silly drinking event spoilt by some running). If it's a scalar variable you get the number of elements the split has splut. Should that be 'the split has splittered' or 'the split has splat'. Hmmm. Probably 'the split has split'. You know what I mean. I think I just generated a Fatal Error in English.dll. Whoops. In any case, splitting to a scalar variable is not always a Good Thing, as we'll see later.

If the assignment is an array, then as you can see in the above example the array is created with the relevant elements in order. You can also assign to scalars, for example :

$_='Piper:PA-28:Archer:OO-ROB:Antwerp';

($maker,$model,$name,$reg,$location) = split /:/, $_;
(@aircraft[0..1],$aname,@regdetails) = split /:/, $_;

$number=split /:/ ;             # not bothering with the $_ at the end, as it is the default

print "Using the first 'split'\n";
print "$reg is a $maker $model $name based in $location\n";
print "There are $number details available on this aircraft\n\n";

print "Using the second 'split'\n";
print "You can find $regdetails[0], an $aircraft[1], $regdetails[1]\n";

This demonstrates that a list can be a list of scalar variables (which is basically what an array is anyway), and that you can easily see how many elements the expression can be split into.

The example below adds a third parameter to split, which is how many elements you want returned. If you don't want the extra stuff at the end pop it.

$_='Piper:PA-28:Archer:OO-ROB:Antwerp';

@details=split /:/, $_, 3;

foreach (@details) {
        print "$_\n";
}

In the example below we split on whitespace. Whitespace, in perl terms, is a space, tab, newline, formfeed or carriage return. Instead of writing \t\n\f\r for each of the above, you can simply use \s , or the negated version \S which means anything except whitespace. Think of whitespace as anything you know is there, but you can't see.

The whitespace split is specially optimised for speed. I've used spaces, double spaces, a tab and a newline in the list below. Also note the + , which means one or more of the preceding character, so it will split on any combination of whitespace. And I think the final split is useful to know. The split function does not return the delimiter, so in this case the whitespace will not be returned.

$_='Piper       PA-28  Archer           OO-ROB
Antwerp';

@details=split /\s+/, $_;

foreach (@details) {
        print "$_\n";
}

@chars=split //, $details[0];

foreach $char (@chars) {
        print "$char !\n";
}
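Two variations are worth trying alongside the program above. This sketch uses made-up variable names; the behaviour it relies on (the special case of a single space as the pattern, and capturing parens in the pattern) is standard split behaviour.

```perl
use strict;
use warnings;

my $line = '  Piper   PA-28 Archer';   # note the leading spaces

my @by_regex = split /\s+/, $line;     # leading whitespace gives an empty first field
my @by_space = split ' ',   $line;     # the special case ' ' skips leading whitespace
my @kept     = split /(-)/, 'PA-28';   # capturing parens return the delimiter too

print scalar(@by_regex), " fields, then ", scalar(@by_space), " fields\n";
print join('|', @kept), "\n";
```

So the delimiter is thrown away unless you capture it with parens inside the split regex.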

   

A very FAQ

The following question has come up at least three times in the Perl-Win32-Users mailing list. Can you answer it ?

"My data is delimited by |, for example:
name|age|sex|height|
Why doesn't
@array=split /|/, $line;
work ?"

Why indeed. If you don't already know the answer, some simple troubleshooting steps can be applied. First, create a sample program and run it.

$line='name|age|sex|height';

@array=split /|/,$line;

foreach (@array) { print "$_\n" }

The effect is to split each character. The | is returned. As it is the delimiter, | should be ignored, not returned.

At this point you should be thinking 'metacharacter'. A little research (looking at the documentation) will reveal that | is indeed a metacharacter, which means 'or', when inside a regex. So, in effect, the regex /|/ means 'nothing, or nothing'. The split is therefore performed on 'nothings', and there are 'nothings' in between each character. The solution is easy ; /\|/ .

$line='name|age|sex|height';

@array=split /\|/,$line;

foreach (@array) { print "$_\n" }
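When the delimiter lives in a variable you can't put the backslash in by hand, but \Q and \E will do the escaping for you. A minimal sketch, assuming the delimiter arrives in a variable called $delim :

```perl
use strict;
use warnings;

my $delim = '|';                          # could just as well come from user input
my $line  = 'name|age|sex|height|';

my @array = split /\Q$delim\E/, $line;    # \Q...\E escapes any metacharacters

print scalar(@array), " fields: @array\n";
print 'Escaped delimiter: ', quotemeta($delim), "\n";
```

The trailing | in the data produces no extra element, because split drops empty fields at the end.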
   

So that's the fun stuff, destruction. Now to put it back together again with join .

What Humpty Dumpty needs : Join

$w1="Mission critical ?";
$w2="Internet ready modems !";
$w3="J(insert your cool phrase here)";	# anything prefixed by 'J' is now cool ;-)
$w4="y2k compatible.";
$w5="We know the Web.";
$w6="...the leading product in an emerging market.";

$cool=join ' ', $w1,$w2,$w3,$w4,$w5,$w6;

print $cool;

Join takes a 'glue' operator, which is not a regular expression. It can be a scalar variable however. In this case it is a space. Then it takes a list, which can either be a list of scalar variables, an array or whatever as long as it's a list. And you can see what the result is. You could assign it to an array, but you'd end up with everything in the first element of the array.

The example below adds an array into the list, and demonstrates use of a variable as the delimiter.

$w1="Mission critical ?";
$w2="Internet ready modems !";
$w3="J(insert your cool phrase here)"; 	# anything prefixed by 'J' is now cool ;-)
$w4="y2k approved, tested and safe !";
$w5="We know the Web.";
$w6="...the leading product in an emerging market.";
@morecool=("networkable","compatible");

$sep=" ";

$cool=join $sep, $w1,$w2,$w3,@morecool,$w4,$w5,$w6;

print $cool;
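Split and join are natural partners. The sketch below, which reuses the aircraft record from earlier with invented variable names, changes the delimiter of a record and then changes it back.

```perl
use strict;
use warnings;

my $record = 'Piper:PA-28:Archer:OO-ROB:Antwerp';

my @fields = split /:/, $record;          # take it apart on colons
my $commas = join ',', @fields;           # glue it back with commas
my $back   = join ':', split /,/, $commas;   # and the whole round trip in one line

print "$commas\n$back\n";
```

This only works as a round trip because none of the fields contains a comma or a colon.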

   

A recap, but with some new functions


Randomness

Aren't you wishing you could mix and match randomly so you too could get a job marketing vapourware ? Heh.

@cool=(
"networkable directory services",
"legacy systems compatible",
"Mission critical, Business Ready",
"Internet ready modems !",
"J(insert your cool phrase here)",
"y2k approved, tested and safe !",
"We know the Web. Yeah.",
"...the leading product in an emerging market."
);

srand;

print "How many phrases would you like (max ",scalar(@cool),") ?";
while (1) {
        chop ($input=<STDIN>);
        if ($input <= scalar(@cool) and $input > 0) {
                last;
        }
        print 'Sorry, invalid input, try again :';
}

for (1..$input) {
        $index=int(rand @cool);
        print "$cool[$index] ";
        splice @cool, $index, 1;
}

A few things to explain. Firstly, while (1) { . We want an everlasting loop, and this is one way to do it. 1 is always true, so round it goes. We could test $input directly, but that wouldn't allow last to be demonstrated.

Everlasting loops aren't useful unless you are a politician being interviewed. We need to break out at some point. This is done by the last function. When $input is between 1 and the number of elements in @cool then out we go. (You can also break out to labels, in case you were wondering. And break out in a sweat. Don't start now if you weren't.)

The srand operator initialises the random number generator. Works ok for us, but CGI programmers should think of something different because their programs are so frequently run (they hope :-).

rand generates a random number from 0 up to, but not including, 1, or from 0 up to, but not including, the number it is given. In this case that is the number of elements in @cool , 8 to begin with, so once int has done its work we get 0 to 7. There is no point generating numbers between 1 and 8 because the array elements run from 0 to 7. (Using $#cool instead would never pick the last element, as rand never quite reaches the number you give it.)

The int function makes sure it is an integer, that is no messy bits after the decimal point.

The splice function removes the printed element from the array so it won't appear again. Don't want to stress the point.
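The rand, int and splice combination can be packaged as a sub that picks a number of items without repeats. A sketch only; the name pick_phrases is invented, and the sub works on a copy of the list so the caller's array is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: returns up to $count items chosen at random,
# never the same item twice.
sub pick_phrases {
    my ($count, @pool) = @_;              # @pool is a copy of the caller's list
    my @picked;
    while ($count-- > 0 and @pool) {
        my $index = int(rand @pool);      # 0 .. last index of what is left
        push @picked, splice(@pool, $index, 1);
    }
    return @picked;
}

my @cool = ('networkable', 'compatible', 'mission critical', 'y2k approved');
print join(', ', pick_phrases(2, @cool)), "\n";
```

Asking for more items than there are simply returns everything, in a random order.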

Concatenation

There is another joining operator, this time the humble dot, or period: . . This concatenates (joins) variables:

$x="Hello";
$y=" World";
$z="\n";

print "$x\n";           # print $x and a newline

$prt=$x.$y.$z;          # make a new var $prt out of $x, $y and $z

print $prt;

$x.=$y." again ".$z;    # add stuff to $x

print $x;

   

Files


Opening

Perl is very good at handling files. Create, in your perl scripts directory c:\scripts, a file called stuff.txt. Copy the following into it :

The Main Perl Newsgroup:comp.lang.perl.misc
The Perl FAQ:http://www.perl.com/faq/
Where to download perl:http://www.activestate.com/

Now, to open and do things with this file. First, we must open the file and assign it to a filehandle. All operations will be done on the file via the filehandle. Earlier, we used <STDIN> as a filehandle - we read from it.

$stuff="c:\scripts\stuff.txt";

open STUFF, $stuff;

while (<STUFF>) {
        print "Line number $. is : $_";
}

What this script does is fail. What it should do is open the file defined in $stuff , assign it to the filehandle STUFF and then, while there are still lines left in the file, print the line number $. and the current line.


An unforgivable error

It fails. That's not so bad, everything fails sometimes. What is unforgivable is NOT CHECKING THE ERROR CODE !

This is a better version:

open STUFF, $stuff or die "Cannot open $stuff for read :$!";

If the open operation fails, the or means that the code on the RHS (right hand side) is evaluated. Perl dies. This means it exits the script and tells you the line number at which it died; the post-mortem on the failed open is written up in $! . Just because $! contains useful information doesn't mean to say it is automagically printed, in true perl fashion. Usually you will wish to avail yourself of the information inside as it is of great help when working out why something is not going according to plan. The moral of the chapter is:

Always check your return codes !
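The same habit carries over to later versions of Perl. The sketch below assumes Perl 5.6 or newer, which is newer than the Perl this tutorial was written for: it uses a lexical filehandle and the three-argument form of open. The sub name read_lines and the file name are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of a file, or dies with the reason.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path for read: $!\n";
    my @lines = <$fh>;
    close($fh) or die "Cannot close $path: $!\n";
    return @lines;
}

# eval traps the die, so the script can report the problem and carry on.
my @lines = eval { read_lines('no-such-file.txt') };
print "Failed: $@" if $@;
```

Whichever style you use, the point is the same: the return value of open is checked and $! is put in the message.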

\\ or / in pathnames -- your choice

The problem should now be apparent. The backslashes, being escape characters, are not displayed. There are two ways to fix this:

  • Escape the backslashes, like so $stuff="c:\\scripts\\stuff.txt";
  • Convert backslashes into forward slashes : $stuff="c:/scripts/stuff.txt";

The forward slashes are the preferred option, even under Win32, because you can then port the script direct to Unix or other platforms (assuming you don't use drive letters), and it is less typing. If you wish to use Perl to start external processes then you must use the \\ method, but this variable will be used only in a Perl program, not as a parameter to start an external program. Changing the $stuff variable results in a working script. Always check your return codes !

Reading a file

$stuff="c:/scripts/stuff.txt";

open STUFF, $stuff or die "Cannot open $stuff for read :$!";

while (<STUFF>) {
        print "Line $. is : $_";
}

A little more detail on what is happening here. The file is opened for read. You can append and write too. You don't have to use a variable, but I always do because it is then easy to change and easy to insert into the or die section, and it is easy to change later on. Hardcoding things is not the best way to write a maintainable and flexible program. Just ask the Year 2000 people about code that lived a little longer than the authors imagined :-).

open STUFF, "c:/scripts/stuff.txt" or die "Cannot open stuff.txt for read :$!";

is just as good but more work if you want to change anything.

The line input operator (that's the angle brackets <> ) reads from the beginning of the file up until and including the first newline. The read data goes into $_ , and you can do what you want with it there. On the next iteration of the loop data is read from where the last read left off, up to the next newline. And so on until there is no more data. When that happens the condition is false and the loop terminates. That's the default behaviour, but we can change this.

This means that you can open a 200Mb file in perl and run through = it=20 without having to load the entire file into memory. 200Mb of memory is = quite a=20 bit. If you really want to load the entire 200Mb file into one = variable, Perl=20 lets you. Limits are not the Perl Way.

The special variable $. = is the=20 current line number, starting at 1.
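If you don't have stuff.txt to hand, this sketch does the same read loop against a throwaway file. The temp file (made with the standard File::Temp module) and its three lines of contents are assumptions purely for the demonstration; the loop itself is the one shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file to read back (hypothetical contents).
my ($tmp, $stuff) = tempfile(UNLINK => 1);
print $tmp "Beer\nWine\nPizza\n";
close $tmp or die "Cannot close $stuff :$!";

open STUFF, $stuff or die "Cannot open $stuff for read :$!";
my @seen;
while (<STUFF>) {
        push @seen, "Line $. is : $_";   # $. is the number of lines read so far
}
close STUFF;

print @seen;
```

Each pass through the loop pulls in one line only, so memory use stays flat however big the file is.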

As usual, there is a quicker way to do the previous program.

$STUFF="c:/scripts/stuff.txt";

open STUFF or die "Cannot open $STUFF for read :$!";

while (<STUFF>) {
        print "Line $. is : $_";
}

This saves a little bit of typing, but does tie your filehandle to the variable name. In fact, that entire program could be compressed further, but that's for later.

If you are really into shortness, try this:

$STUFF="c:/scripts/stuff.txt";

open STUFF or die "Cannot open $STUFF for read :$!";

print "Line $. is : $_" while (<STUFF>);

Writing to a File


A simple write

$out="c:/scripts/out.txt";

open OUT, ">$out" or die "Cannot open $out for write :$!";

for $i (1..10) {
        print OUT "$i : The time is now : ",scalar(localtime),"\n";
}

Note the addition of > to the filename. This opens it for writing. If we want to print to the file we now just specify the filehandle name. You print to the filehandle, which is a gateway to the file.

Filehandles don't have to be capitalised, but it is wise. All Perl functions are lowercase, and Perl is case-sensitive. So if you choose uppercase names they are guaranteed not to conflict with current or future function words.

And a neat way to grab the date sneaked in there too. You should be aware that writing to a file overwrites the file. It does not append data! However, you may append:

Appending

$out="c:/scripts/out.txt";

&printfile;

open OUT, ">>$out" or die "Cannot open $out for append :$!";

print OUT 'The time is now : ',scalar(localtime),"\n";

close OUT;

&printfile;

sub printfile {
        open IN, $out or die "Cannot open $out for read :$!";
        while (<IN>) {
                print;
        }
        close IN;
}

This script demonstrates subroutines again, and how to append to a file, that is write additional data at the end. The close function is introduced here. This, well, closes a filehandle. You don't have to close a filehandle: you can just leave it open until the script finishes, or the next open command on the same filehandle will close it for you.
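The difference between > and >> is easy to check. This is a minimal sketch against a temporary file (created with File::Temp, an assumption so that nothing in c:/scripts gets clobbered); the text written is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $out) = tempfile(UNLINK => 1);
close $tmp;

# Two writes with ">" : the second one replaces the first.
for my $text ("first\n", "second\n") {
        open OUT, ">$out" or die "Cannot open $out for write :$!";
        print OUT $text;
        close OUT;
}

# One write with ">>" : the new line lands after the existing data.
open OUT, ">>$out" or die "Cannot open $out for append :$!";
print OUT "third\n";
close OUT;

open IN, $out or die "Cannot open $out for read :$!";
my @lines = <IN>;
close IN;

print @lines;
```

The file ends up holding two lines, not three, because the second write threw the first away.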
   

@ARGV: Command Line Arguments

Perl has a special array called @ARGV. This is the list of arguments passed along with the script name on the command line. Run the following perl script as:

perl myscript.pl hello world how are you


foreach (@ARGV) {
        print "$_\n";
}

Another useful way to get parameters into a program -- this time without user input. The relevance to filehandles is as follows. Run the following perl script as:

perl myscript.pl stuff.txt out.txt

while (<>) {
        print;
}

Short and sweet? If you don't specify anything in the angle brackets, whatever is in @ARGV is used instead. And after it finishes with the first file, it will carry on with the next and so on. You'll need to remove non-file elements from @ARGV before you use this.

It can be shorter still:

perl myscript.pl stuff.txt out.txt

print while <>;

Read it right to left. It is possible to shorten it even further!

perl myscript.pl stuff.txt out.txt

print <>;

This takes a little explanation. As you know, many things in Perl, including filehandles, can be evaluated in list or scalar context. The result that is returned depends on the context.

If a filehandle is evaluated in scalar context, it returns the next line of whatever file it is reading from. If it is evaluated in list context, it returns a list, the elements of which are all the remaining lines of the files it is reading from.

The print function is a list operator, and therefore evaluates everything it is given in list context. As the filehandle is evaluated in list context, print is given a list!
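A quick way to watch context at work, sketched with an in-memory filehandle (opening on a reference to a string) standing in for a real file. The $data contents and variable names are assumptions for the demo.

```perl
use strict;
use warnings;

my $data = "Beer\nWine\nPizza\n";

# An in-memory filehandle stands in for a real file here.
open my $fh, '<', \$data or die "Cannot open in-memory file :$!";

my $one  = <$fh>;    # scalar context: just the next line
my @rest = <$fh>;    # list context: every remaining line
close $fh;

print "Scalar context gave: $one";
print "List context gave ", scalar(@rest), " lines\n";
```

The same filehandle hands back one line or many, purely because of what is on the left of the assignment.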

Who said short is sweet? Not my girlfriend, but that's another story. The shortest scripts are not usually the easiest to understand, and not even always the quickest. Aside from knowing what you want to achieve with the program from a functional point of view, you should also know whether you are coding for maximum performance, easy maintenance or whatever -- because chances are those goals may be to some extent mutually exclusive.

Modifying a File with $^I

One of the most frequent Perl tasks is to open a file, make some changes and write it back to the original filename. You already have enough knowledge to do this. The steps would be:

  1. Make a backup copy of the file
  2. Open the file for read
  3. Open a new temporary file for write
  4. Go through the read file, and write it and any changes to the temp file
  5. When finished, close both files
  6. Delete the original file
  7. Rename the temp file to the original filename
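The steps above can be sketched roughly as follows. This is one possible way to do it, not the only one: it assumes the standard File::Copy and File::Temp modules, works in a throwaway directory, and the file names and the lowercasing change are made up for the illustration.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Work in a throwaway directory; the file names are hypothetical.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";
my $temp = "$file.tmp";

open SETUP, ">$file" or die "Cannot create $file :$!";
print SETUP "The Time Is NOW\n";
close SETUP;

copy($file, "$file.bk") or die "Cannot back up $file :$!";      # step 1
open IN,  $file    or die "Cannot open $file for read :$!";     # step 2
open OUT, ">$temp" or die "Cannot open $temp for write :$!";    # step 3
while (<IN>) {                                                  # step 4
        tr/A-Z/a-z/;            # the change: lowercase everything
        print OUT;
}
close IN;                                                       # step 5
close OUT;
unlink $file        or die "Cannot delete $file :$!";           # step 6
rename $temp, $file or die "Cannot rename $temp :$!";           # step 7
```

Every step that can fail is checked, which is most of the typing.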

If you have managed to get this far and assiduously worked through the examples, the above will be child's play. Play if you want, but there is a Better Way.

Make sure you have data in c:\scripts\out.txt then run this:

@ARGV="c:/scripts/out.txt";

$^I=".bk";              # let the magic begin

while (<>) {
        tr/A-Z/a-z/;    # another new function sneaked in
        print;          # this goes to the temp filehandle, ARGVOUT,
                        # not STDOUT as usual, so don't mess with it !
}

So, what's happening? First, we load up @ARGV with the name of a file. It doesn't matter how @ARGV is loaded. We could have shifted the filename off the command line instead.

The $^I is a special variable. You knew that just by looking at it. Its name is the Inplace Edit variable, and when it has a value the effects are:

  1. The name of the file to be edited in place is taken from the first element of @ARGV. In this case, that is c:/scripts/out.txt. The file is renamed to its existing name plus the value of $^I, i.e. out.txt.bk.
  2. The file is read as usual by the diamond operator <>, placing a line at a time into $_.
  3. A new filehandle is opened, called ARGVOUT, and no prizes for guessing it is opened on a file called out.txt. The original out.txt has already been renamed.
  4. The print prints automatically to ARGVOUT, not STDOUT as it would usually.

At the end of the operation you have neatly edited the file and made a backup. If you don't want a backup, assign a null string to $^I but don't go crying on any mailing lists if you lose data.

The usual method of in-place editing would involve just printing everything back where it came from until your regex finds whatever needs changing. You could of course slurp the whole file into memory and play with it there, which could be a lot easier but if you are dealing with files of more than a few megabytes this is probably not a feasible approach.

Now take a look at out.txt. Notice how all capital letters have been transliterated into lowercase. This is the tr operator at work, which is more efficient than regex for changing single characters. But that's only a small part of the tr function's value to the world. More later.

You should also have an out.txt.bk file. And finally, notice the way @ARGV has been created. You don't have to create it from the command line arguments -- it can be treated like an ordinary array, for that is what it is.


$/ -- Changing what is read into $_

On a different note, what if your input file doesn't look like this:

Beer
Wine
Pizza
Catfood

which is nicely delimited with a newline each time, but like this:

shorts
t-shirt
blouse

pizza
beer
wine
catfood

Viz
Private Eye
The Independent
Byte

toothpaste
soap
towel

which is delimited by TWO newlines, not one. You don't have to save the above as shop.txt, but if you don't, the examples will be difficult to follow.

Now, if you want each set of items as elements in an array you'll have to do something like this:

$SHOP="shop.txt";
$x=0;

open SHOP or die "Can't open $SHOP for read: $!\n";

while (<SHOP>) {
        if (/^\n/) {            # does line begin with newline ?
                $x++;           # if so, increment $x.  Rest of if statement not executed.
        } else {
                $list[$x].=$_;  # glue $_ on the end of whatever is in $list[$x], using a .
        }
}

foreach (@list) {
        print "Items are:\n$_\n\n";
}

which works, but there is a much easier way to do it. You knew I was going to say that.

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

while (<SHOP>) {
        push (@list, $_);
}

foreach (@list) {
        print "Items are:\n$_\n\n";
}

The $/ variable is a special variable (it even looks special). It is the Default Input Record Separator. Remember the operation of the angle brackets being to read a file in up until the next newline? Time to come clean. What the angle brackets actually do is read up until whatever $/ is set to. It is set to a newline by default.

So if we set it to two newlines, as above, then it reads up until it finds two consecutive newlines, then puts the data into $_. This makes the program a lot shorter and quicker. You can set $/ to just about anything, not just a newline. If you want to hack this list for example:

Tea:Beer:Wine:Pizza:Catfood:Coffee:Chicken:Salmon:Icecream

you could just leave $/ as a newline and slurp it into memory in one go, but imagine the above items are a list of clothes that your girlfriend wants to buy or a list of clothes your boyfriend should have thrown away by now. Either is going to be a really big file, and you might not want to read it all into memory in one go. So set $/=":"; and all will be well. There are also read and seek functions, but they aren't covered here. Those are useful for files where you read in a precise number of bytes.
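Here is a small sketch of the colon-separated case. Two things in it are my additions rather than anything from the examples so far: an in-memory filehandle stands in for the really big file, and local is used so that $/ goes back to a newline once the block ends. chomp removes whatever $/ currently holds from the end of $_.

```perl
use strict;
use warnings;

my $data = "Tea:Beer:Wine:Pizza:Catfood";
open my $fh, '<', \$data or die "Cannot open in-memory file :$!";

my @items;
{
        local $/ = ":";         # records now end at a colon, not a newline
        while (<$fh>) {
                chomp;          # strip the trailing colon, if there is one
                push @items, $_;
        }
}
close $fh;

print join(", ", @items), "\n";
```

Only one item is ever held in $_ at a time, which is the point of the exercise.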

We'll go back to the last example for a moment. It is useful to know how to read just one line (well, up to $/) at a time:

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

$clothes=<SHOP>;        # everything up until the first occurrence of $/ into $clothes

$food=<SHOP>;   # everything from first occurrence of $/ to the second into $food

print "We need...\n",$clothes,"...and\n",$food;

And now we know that, there is an even quicker way to achieve the aim of the original program:

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

@list=<SHOP>;   # dumps *all* of $SHOP into @list, not just one line.

foreach (@list) {
        print "Items are:\n$_\n\n";
}

and you don't need to keep it all:

@list[0..2]=<SHOP>;

We haven't mentioned list context for a while. Whether the line input operator <> returns a single value or a list depends on the context you use it in. When you supply @xxxxx then this must be a list. If you supply $xxxxx then that's a scalar variable. You can force it into list context by using parens.

The two lines below are provided so you can paste them into the above program. They demonstrate how parens force list context. Remember to replace the foreach with something that prints the variables.

($first, $second) = <SHOP>;
$first,  $second  = <SHOP>;
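If you'd rather not edit the program above, this sketch shows the same effect on its own. It assumes an in-memory filehandle with three made-up records, and it uses a plain scalar assignment for the second case because that is what the line without parens boils down to.

```perl
use strict;
use warnings;

my $data = "shorts\n\npizza\n\nViz\n\n";
my ($first, $second, $only);

{
        local $/ = "\n\n";

        open my $fh, '<', \$data or die "Cannot open in-memory file :$!";
        ($first, $second) = <$fh>;      # list context: all records are read, two are kept
        close $fh;

        open $fh, '<', \$data or die "Cannot open in-memory file :$!";
        $only = <$fh>;                  # scalar context: exactly one record is read
        close $fh;
}

print "first:  $first";
print "second: $second";
print "only:   $only";
```

With the parens both variables get a record each; without them only one record is ever read.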
   


HERE Docs

The problem:

print "This is a long line of text which might be too long to fit on just one line\n";
print "and I was right, it was too long to fit on one line.  In fact, it looks like it\n";
print "might very well take up to FOUR, yes FOUR lines to print.  That's four print\n";
print "statements, which takes up even more room.  But wait! I'm wrong!  It will take\n";
print "FIVE lines to print this statement!  Or is that six lines? I'm not sure....\n";

The solution:

$var='variable interpolated';

print <<PRT;
This is a long line of text which might be too long to fit on just one line
and I was right, it was too long to fit on one line.  In fact, it looks like
it might very well take up to FOUR, yes FOUR lines to print.

That's four print statements, which takes up even more room.  But wait! I'm
wrong!  It will take FIVE lines to print this statement!  Or maybe six lines?
I'm not sure....but anyway, just to prove this can be $var.
PRT

That's called a 'here' document and you don't need to use PRT, you can use whatever you like within reason. You don't need to put in explicit newlines, although if you do they perform as usual. Now you know about here docs you can stop wearing the print function out by calling it every couple of lines. You don't have to be printing to a file to use here docs; they work anywhere you'd normally put more than one print statement.
   

Reading Directories


Globbing

For this exercise, I suggest creating another directory where you have at least two text files and two or more binary files. Copy a couple of .dll files from your WINDIR directory if you need to, those will do for the binaries, and save a couple of random text files. Size doesn't matter, in this case.

Then run this, giving the directory as the command line argument:

$dir=shift;	# shifts @ARGV, the command line arguments after the script name

chdir $dir or die "Can't chdir to $dir:$!\n" if $dir;

while (<*>) {
	print "Found a file: $_\n" if -T;
}
   

The chdir function changes perl's working directory. You should, as ever, test to see if it worked or not. In this case we only try and change directory if $dir is true.

The <*> construct returns the name of each file in the current directory in turn, and we print it if it passes the file test -T, which returns true if the file is a non-binary, i.e. text, file. You can be more specific:

$dir=shift;
$type='txt';

chdir $dir or die "Can't chdir to $dir:$!\n" if $dir;

while (<*.$type>) {
	print "Found a file: $_\n";
}
like so. But, there is a better way to read from directories. The method above is rather slow and inflexible.
   

readdir : How to read from directories

Instead, there is readdir. Another version of the previous example:

$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

while ($file= readdir DIR) {
	print "Found a file: $file\n";
}
The first difference is the first line, which essentially says if shift is false, then $dir = '.', which is of course the current directory. Then, the directory is opened and we have the chance to trap the error. It is assigned a directory handle, which looks just like a filehandle. The readdir function reads each filename into $file. There is no while (<DIR>) { construct.

We can also apply the text file test. Run this, once without entering a directory and the second time with entering a directory path other than the one the script is in:

$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

while ($file= readdir DIR) {
	print "Found a file: $file\n" if -T $file ;
}
Firstly, because the filename is now not in $_ we have to explicitly apply the -T test to it with -T $file.

Why did this not work the second time? Look at the code carefully. You are testing $file. If perl doesn't get a fully qualified pathname, it assumes you are still in the directory the script was run from, or that of the last successful chdir. Not necessarily where you are readdir'ing from. So, to fix it:

        print "Found a file: $dir/$file\n" if -T "$dir/$file" ;

where we now specify the pathname, both in the printout and in the file test itself. The "" are used because otherwise perl treats the / as a division operator.

Try running this on a directory with only a few files in it:

$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

while ($file= readdir DIR) {
	print "Found a file: '$file'\n";
}
Notice that two files are found which have interesting names, namely . and .. . These two files are the current and parent directory respectively. Nothing new, they have always been there -- run the DOS command dir if you don't believe me. You don't usually want to know about them, so:

while ($file= readdir DIR) {
	next if $file=~/^\./;
	print "Found a file: '$file'\n";
}

is the usual workaround. You can use list context to dump everything to a list of some description:

$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

@files=readdir(DIR);

print "@files";

but that includes the . files, so it is best to ensure they aren't included:

@files=grep !/^\./, readdir(DIR);

We haven't met grep yet, but for the moment just remember it searches a list and, if the test returns true, lets the element pass. In this case, if the name doesn't begin with . then that's true so it goes into @files.
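Putting the pieces of this section together gives something like the sketch below. The temporary directory and the two file names are assumptions so the example can run anywhere; the sort is only there to make the output order predictable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory with two hypothetical text files in it.
my $dir = tempdir(CLEANUP => 1);
for my $name ('notes.txt', 'list.txt') {
        open NEW, ">$dir/$name" or die "Cannot create $dir/$name: $!\n";
        print NEW "some text\n";
        close NEW;
}

opendir DIR, $dir or die "Can't open directory $dir: $!\n";
# Skip the dot entries, and test each file by its full path.
my @files = grep { !/^\./ && -T "$dir/$_" } readdir(DIR);
closedir DIR;

@files = sort @files;
print "Found a file: $dir/$_\n" foreach @files;
```

Note that the -T test is given "$dir/$_" and not just $_, for the reason explained above.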

There are other commands associated with reading directories, which tell you where in a directory you are, and then where to go to return. You should be aware of their existence, because you never know when you might need them. The one other command of use is closedir, which closes a directory. Optional, but recommended for clarity.

Associative Arrays


The Basics

Very, very useful. First, a quick recap on arrays. Arrays are an ordered list of scalar variables, which you access by their index number starting at 0. The elements in arrays always stay in the same order.

Hashes are a list of scalars, but instead of being accessed by index number, they are accessed by a key. The tables below illustrate the point:

@myarray
Index No.   Value
0           The Netherlands
1           Belgium
2           Germany
3           Monaco
4           Spain

%myhash
Key   Value
NL    The Netherlands
BE    Belgium
DE    Germany
MC    Monaco
ES    Spain

So if we want 'Belgium' from @myarray and also from %myhash, it'll be:

print "$myarray[1]";
print "$myhash{'BE'}";
Notice that the $ prefix is used, because it is a scalar variable. Despite the fact it is part of a list, it is still a scalar variable. The hash syntax is simply to use braces { } instead of square brackets.

So why use hashes? When you want to look something up by a keyword. Suppose we wanted to create a program which returns the name of the country when given a country code. We'd input ES, and the program would come back with Spain.

You could do it with arrays. It would be messy however. One possible approach:

  1. Create @countries, and give it values such as 'ES,Spain'
  2. Iterate over the entire array and
  3. split each element of the array, and check the first result to see if it matches the input
  4. If so, report the match

@countries=('NL,The Netherlands','BE,Belgium','DE,Germany','MC,Monaco','ES,Spain');

print "Enter the country code:";
chop ($find=<STDIN>);

foreach (@countries) {
        ($code,$name)=split /,/;
        if ($find=~/$code/i) {
                print "$name has the code $code\n";
        }
}
Complex and slow. We could also store a reference to another array in each element of @countries, but that is not efficient. Whatever way we choose, you still need to search the whole thing. And what if @countries is a big array? See how much easier a hash is:
   

A Hash in Action

%countries=('NL','The Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');

print "Enter the country code:";
chop ($find=<STDIN>);

$find=~tr/a-z/A-Z/;
print "$countries{$find} has the code $find\n";

Very easy. All we need to do is make sure everything is in uppercase with tr and we are there. Notice the way %countries is defined -- exactly the same as a normal array, except that the values are put into the hash in key/value pairs.


When you should use hashes

So why use arrays? One excellent reason is because when an array is created, its variables stay in the same order you created them in. With a hash, perl reorders elements for quick access. Add print %countries; to the end of that program above and run it. See what I mean? No recognisable sequence at all. It's like trying to herd cats. If you were writing code that stored a list of variables over time and you wanted it back in the order you found it in, don't use a hash.

Finally, you should know that each key of a hash must be unique. Stands to reason, if you think about it. You are accessing the hash via keys, so how can you have two keys named 'NL' or something? If you do define a certain key twice, the second value overwrites the first. This is a feature, and useful. The values of a hash can be duplicates, but never the keys.

If you want to assign to a hash, there is of course no concept of push, pop and splice etc. Instead:

Hash Hacking Functions

Assigning   $countries{PT}='Portugal';
Deleting    delete $countries{NL};

Accessing Your Hash

Assuming you keep the same %countries hash as above, here are some useful ways to access it:

All the keys           print keys %countries;
All the values         print values %countries;
A Slice of Hash :-)    print @countries{'NL','BE'};
How many elements ?    print scalar(keys %countries);
Does the key exist ?   print "It's there !\n" if exists $countries{'NL'};

Well, that last one is not an access as such but useful anyway.
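Here are those operations strung together in one runnable sketch. The replacement value for BE and the variable names @codes, @slice and $count are assumptions for the demo; the keys are sorted only so the output is the same every time.

```perl
use strict;
use warnings;

my %countries = ('NL','The Netherlands','BE','Belgium','DE','Germany',
                 'MC','Monaco','ES','Spain');

$countries{PT} = 'Portugal';            # assign a new pair
$countries{BE} = 'Kingdom of Belgium';  # same key again: the old value is replaced
delete $countries{NL};                  # remove a pair

my @codes = sort keys %countries;       # sorted so the order is predictable
my @slice = @countries{'DE','ES'};      # a slice of hash
my $count = scalar(keys %countries);

print "Codes: @codes\n";
print "Slice: @slice\n";
print "Count: $count\n";
print "NL has gone\n" unless exists $countries{NL};
```

Five pairs in, one added, one deleted, one overwritten: still five keys at the end.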


More Hash Access: Iteration, keys and values

You may have noticed that keys and values return a list. And we can iterate over a list, using foreach:

foreach (keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}
which is useful. Note how any list can be fed to foreach, and off it goes. As usual, there is another way to do the above:

while (($code,$name)=each %countries) {
        print "The key $code contains $name\n";
}
The each function returns each key/value pair of the hash, and is slightly faster. In this example we assign them to a list (you spotted the parens?) and away we go. Eventually there are no more pairs, which returns false to the while loop and it stops.

If you are into brevity, both the above can be accomplished in a single line:

print "The key $code contains $name\n" while ($code,$name)=each %countries;

print "The key $_ contains $countries{$_}\n" foreach keys %countries;
   

Note -- this won't win any prizes for easily readable code by non-programmers of Perl.


Sorting


A Simple Sort

If I was reading this I'd be wondering about sorting. Wonder no more, and behold:

foreach (sort keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}
Spot the difference. Yes, sort crept in there. If you want the list sorted backwards, some cunning is called for. This is suitably foxy:

foreach (reverse sort keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}

Perl is just so difficult at times, don't you think? This works because:

  • keys returns a list
  • sort expects a list -- and gets one from keys, and sorts it
  • reverse also expects a list, so it gets one and returns it
  • then the whole list is foreach'd over.
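The chain above can be unrolled into separate steps, which may make it easier to see what each function hands to the next. The temporary arrays @keys, @sorted and @backward are assumptions added for the illustration, and the hash is trimmed to three entries.

```perl
use strict;
use warnings;

my %countries = ('NL','The Netherlands','BE','Belgium','ES','Spain');

my @keys     = keys %countries;     # a list, in no particular order
my @sorted   = sort @keys;          # BE ES NL
my @backward = reverse @sorted;     # NL ES BE

# The one-liner does the same three steps without the temporaries.
my @chained  = reverse sort keys %countries;

print "@backward\n";
print "@chained\n";
```

Both lines print the same thing; the second version just skips the note-taking.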

This is a quick example to make sure the meaning of reverse is clear:

print "Enter string to be reversed: ";
$input=<STDIN>;

@letters=split //,$input;	# splits on the 'nothings' in between each character of $input

print join ":", @letters;	# joins all elements of @letters with ':', prints it
print reverse   @letters;	# prints all of @letters, but sdrawkcab )-:

Perl's list operators can just feed directly to each other, saving many lines of code but also decreasing readability to those that aren't Perl-literate:

print "Enter string to be reversed: ";
print join ":",reverse split //,$_=<STDIN>;

This section is about sorting, so enough of reverse. Time to go forwards instead.


Numeric Sorting -- How Sort Really Works

That's easy alphabetical sorting by the keys. If you had a hash of international access numbers like this one:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort keys %countries) {
        print "The key $_ contains $countries{$_}\n";
}
You might want to sort numerically. In that case, you need to understand how Perl's sort function works.

The sort function compares two variables, $a and $b. They must be called $a and $b otherwise it won't work. One chap published a book with stolen code, and he changed $a and $b to $x and $y. He obviously didn't test the program because it would have failed and he would have noticed. And this book was really published! Don't believe everything you read in books -- but web tutorials are always 100% truthful :-)

Back to sorting. $a and $b are compared, and the result is:

  • 1 if $a is greater than $b
  • -1 if $b is greater than $a
  • 0 if $a and $b are equal

So as long as the sort function gets one of those three values back it is happy. This means we can write our own sort routines, and feed them to sort. For example, we know the default sort is alphabetical. But if we write this:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort supersort keys %countries) {
        print "$_ $countries{$_}\n";
}

sub supersort {
        if ($a > $b) {
                return 1;
        } elsif ($a < $b) {
                return -1;
        } else {
                return 0;
        }
}
then it works correctly. Of course, there is an easier way. The 'spaceship' operator <=>. It does exactly what the supersort subroutine does, namely return 1, -1 or 0 depending on the comparison of two given values.

So we can write the above much more easily as:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort { $a <=> $b } keys %countries) {
        print "$_ $countries{$_}\n";
}
Notice the { } braces, which define the contents as the subroutine sort must use. Pretty short subroutine. There is a companion operator to <=>, namely cmp, which does exactly the same thing but of course compares the values as strings, not numbers. Remember, if you are comparing numbers, your comparison operator should contain non-alphas; if you are comparing strings the operator should contain alphas only. And don't talk to strangers.
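To see why the choice of operator matters, try this sketch. It uses the same five access numbers in a plain array; the names @as_strings, @as_numbers, $num and $str are just for illustration.

```perl
use strict;
use warnings;

my @codes = (976, 52, 212, 64, 33);

my @as_strings = sort @codes;                  # default: compares as strings
my @as_numbers = sort { $a <=> $b } @codes;    # compares as numbers

my $num = 2 <=> 10;    # numerically, 2 is less than 10
my $str = 2 cmp 10;    # as strings, "2" sorts after "10"

print "String order : @as_strings\n";
print "Numeric order: @as_numbers\n";
print "2 <=> 10 gives $num\n";
print "2 cmp 10 gives $str\n";
```

The string sort puts 212 first because it only looks at the characters, and '2' comes before '3'.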

Anyway, you now have enough knowledge to sort a hash by value instead of keys. Suppose your pointy haired manager bounced up to you and demanded a hash sorted by value? What would you do? OK, what should you do?

Well, we could just sort the values.

foreach (sort values %countries) {

But Pointy Hair wants the keys too. And if you have a value you can't find the key.

So we have to iterate over the keys. But just because we are iterating over the keys doesn't mean to say we have to hand the keys over to sort. What about:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort { $countries{$a} cmp $countries{$b} } keys %countries) {
        print "$_ $countries{$_}\n";
}
Beautifully simple. If you want a reverse sort, transpose $a and $b.
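For the avoidance of doubt, this is what the transposed version looks like, as a sketch. The array name @by_value_desc is an assumption; everything else is the program above with $a and $b swapped.

```perl
use strict;
use warnings;

my %countries = ('976','Mongolia','52','Mexico','212','Morocco',
                 '64','New Zealand','33','France');

# $b before $a reverses the order: values now run from Z towards A.
my @by_value_desc = sort { $countries{$b} cmp $countries{$a} } keys %countries;

print "$_ $countries{$_}\n" foreach @by_value_desc;
```

New Zealand now comes out first and France last.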


Sorting Multiple Lists

You can sort several lists at the same time:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
@nations=qw(China Hungary Japan Canada Fiji);

@sorted= sort values %countries, @nations;

foreach (@nations, values %countries) {
        print "$_\n";
}

print "#----\n";

foreach (@sorted) {
        print "$_\n";
}
This sorts @nations and the values from %countries into a new array.

The example also demonstrates that you can foreach over more than one list value -- each list is processed in turn. How I discovered that particular trick with Perl is instructive. I just tried it. If you think you should be able to do something with Perl, try it. Adhere to the syntax and conventions you will be familiar with from experience, in this case delimiting a list with commas, and try it. I'm always finding new shortcuts just by experimentation.


Grep and Map


Grep

If you want to search a list, and create another list of things you found, grep is one solution. This is an example, which also demonstrates join again:

@stuff=qw(flying gliding skiing dancing parties racing);	# quote-worded list

@new = grep /ing/, @stuff;	# Creates @new, which contains elements of @stuff
				# with 'ing' in them.

print join ":",@stuff,"\n";	# makes one string out of the elements of @stuff and the
				# trailing \n, joined with ':' , then prints it

print join ":",@new,"\n";
Remember qw means 'quote words', so whitespace is used as the delimiter instead of commas and quotes. The grep function must be fed a list on the right hand side. On the left side, you may assign the results to a list or a scalar variable. Assigning to a list gives you each actual element, and to a scalar gives you the number of matches found:

@stuff=qw(flying gliding skiing dancing parties racing);

$new = grep /ing/, @stuff;

print join ":",@stuff,"\n";

print "Found $new elements of \@stuff which matched\n";
If you decide to modify the elements on their way through grep, you actually modify the original list. Be careful out there.

@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep s/ing//, @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";
To determine what actually matches you can either use an expression or a block. Up to now we've been using expressions, but when things become more complicated use a block:

@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep { s/ing// if /^[gsp]/ } @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";
Try removing the braces and you'll get an error. Notice that the comma before the list has gone. It is now obvious where the expression ends, as it is inside a block delimited with { }. The regex says if the element begins with g, s or p, then remove ing. The result is only assigned to @new if the expression is completely true -- 'parties' does begin with p, so that works, but s/ing// fails so the overall result is false, and the value is not assigned to @new.
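If you want the trimmed words but would rather leave the original list alone, one option is to run grep over a copy. This is only a sketch of that idea; the names @copy and @trimmed are made up.

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);

# grep works on the elements themselves, so give it a copy to chew on.
my @copy    = @stuff;
my @trimmed = grep { s/ing// } @copy;

print join(":", @stuff),   "\n";   # the original list is untouched
print join(":", @trimmed), "\n";   # only the elements where s/ing// succeeded
```

'parties' is missing from @trimmed because the substitution found nothing to remove.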
   

Map

Map works the same way as grep, in that they both iterate over a list, and return a list. There are two important differences however:

  • grep returns the value of everything it evaluates to be true;
  • map returns the results of everything it evaluates.

As usual, an example will assist the penny in dropping, clear the fog and turn on the light (if not make my metaphors easier to understand):

@stuff=qw(flying gliding skiing dancing parties racing);

print "There are ",scalar(@stuff)," elements in \@stuff\n";
print join ":",@stuff,"\n";

@mapped  = map  /ing/, @stuff;
@grepped = grep /ing/, @stuff;

print "There are ",scalar(@stuff)," elements in \@stuff\n";
print join ":",@stuff,"\n";

print "There are ",scalar(@mapped)," elements in \@mapped\n";
print join ":",@mapped,"\n";

print "There are ",scalar(@grepped)," elements in \@grepped\n";
print join ":",@grepped,"\n";

You can see that @mapped is just a list of 1's. Notice that there are five ones, whereas there are six elements in the original array, @stuff. This is because @mapped contains the true results of map -- in every case the expression /ing/ is successful, except for 'parties'.

In that case the expression is false, so the result is discarded. Contrast this action with the grep function, which returns the actual value, but only if it is true. Try this:

@letters=qw(a b c d e);

@ords=map ord, @letters;
print join ":",@ords,"\n";

@chrs=map chr, @ords;
print join ":",@chrs,"\n";

This uses the ord function to change each letter into its ASCII code, then the chr function converts the ASCII codes back to characters. If you change map to grep in the example above, you can see that nothing appears to happen. What is happening is that grep is trying the expression on each element, and if it succeeds (is true) it returns the element, not the result. The expression succeeds for each element, so each element is returned in turn. Another example:

@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped  = map  { s/(^[gsp])/$1 x 2/e } @stuff;
@grepped = grep { s/(^[gsp])/$1 x 2/e } @stuff;

print join ":",@stuff,"\n";
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

Recapping on regex, what that does is match any element beginning with g, s or p, and replace that first letter with two copies of itself. The caret ^ forces a match at the beginning of the string, the [square brackets] denote a character class, and /e forces Perl to evaluate the RHS as an expression.

The output from this is a mixture of 1 and nothing for map, and a three-element array called @grepped from grep. Yet another example:

@mapped  = map  { chop } @stuff;
@grepped = grep { chop } @stuff;

The chop function removes the last character from a string, and returns it. So that's what you get back from map, the result of the expression. The grep function gives you the mangled remains of the original value.
   

Writing your own grep and map functions

Finally, you can write your own functions:

@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped  = map  { &isit } @stuff;
@grepped = grep { &isit } @stuff;

print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

sub isit {
        ($word)=/(^.*)ing/;

        if (length $word == 3) {
                return "ok";
        } else {
                return 0;
        }
}

The subroutine isit first grabs everything up until 'ing', puts it into $word, then returns 'ok' if there are three characters in $word. If not, it returns the false value 0. You can make these subroutines (think of them as functions) as complex as you like.

Sometimes it is very useful to have map return the actual value, rather than the result. The answer is easy, but not obvious. Remember that subroutines return the value of the last expression evaluated? So, in this case, do blocks. What if the expression was, very simply:

@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map  { s/(^[gsp])/$1 x 2/e } @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

Now, make sure $_ is the last thing evaluated:

@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map  { s/(^[gsp])/$1 x 2/e;$_} @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

and there you have it. Now that you understand that, you can go and impress your friends, but please don't count on success.
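
Even with $_ as the last thing evaluated, the s/// inside the block still edits the original array, because $_ is an alias for each element. If the original list needs to survive, one option is to run the substitution on a copy. A minimal sketch, written with use strict and my (which this tutorial covers later); the names @original, @doubled and $copy are made up for illustration:

```perl
use strict;
use warnings;

my @original = qw(flying gliding skiing dancing parties racing);

# Copy each element into $copy, then run the substitution on the copy.
# $_ (and therefore @original) is never modified.
my @doubled = map {
    (my $copy = $_) =~ s/(^[gsp])/$1 x 2/e;
    $copy;
} @original;

print join(" ", @original), "\n";
print join(" ", @doubled), "\n";
```

The first line printed is the untouched list; the second has the doubled first letters.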
   

External Commands


Some ways to...

Perl can start external commands. There are five main ways to do this:

  • system
  • exec
  • Command Input, also known as `backticks`
  • Piping data from a process
  • Quote execute

We'll compare system and exec first.

Exec

Poor old exec is broken on Perl for Win32. What it should do is stop running your Perl script and start running whatever you tell it to. If it can't start the external process, it should return with an error code. This doesn't work properly under Perl for Win32. The exec function does work properly on the standard Perl distribution.


System

This runs an external command for you, then carries on with the script. It always returns, and the value it returns goes into $?. This means you can test to see if the program worked. Actually you are testing to see if it could be started; what the program does when it runs is outside your control if you use system.

This example demonstrates system in action. Run the 'vol' command from a command prompt first if you are not familiar with it. Then run the 'vole' command. I'm assuming you have no cute furry executables called vole on your system, or at least in the path. If you do have an executable called 'vole', be creative and change it.

system("vole");

print "\n\nResult: $?\n\n";

system("vol");

print "\n\nResult: $?\n\n";

As you can see, a successful system call returns 0. An unsuccessful one returns a value which you need to divide by 256 to get the real return value. Also notice you can see the output. And because system returns, the code after the first system call is executed. Not so with exec, which will terminate your perl script if it is successful. Perl's usual use of single and double quotes applies as per variable interpolation.
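
The divide-by-256 step can be sketched as below. To avoid depending on vol or any other OS-specific command, this sketch assumes it is acceptable to launch a second copy of perl itself (its path is in $^X) and have it exit with a known code. The variable names $raw and $code are made up for illustration.

```perl
use strict;
use warnings;

# $^X is the path of the perl binary running this script, so the example
# does not depend on 'vol' or any other OS-specific command being present.
system($^X, "-e", "exit 3");

my $raw  = $?;        # raw status as stored by Perl
my $code = $? >> 8;   # same as dividing by 256 and discarding the remainder

print "Raw status: $raw, exit code: $code\n";
```

Shifting right by eight bits is the usual way of writing the division by 256.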
   

Backticks

These `` are different again to system and exec. They also start external processes, but return the output of the process. You can then do whatever you like with the output. If you aren't sure where backticks are on your keyboard, try the top left, just left of the 1 key. Often around there. Don't confuse single quotes '' with backticks ``.

$volume=`vol`;

print "The contents of the variable \$volume are:\n\n";

print $volume;

print "\nWe shall regexise this variable thus :\n\n";

$volume=~m#Volume in drive \w is (.*)#;

print "$1\n";

As you can see here, the Win32 vol command is executed. We just print it out, escaping the $ in the variable name. Then a simple regex, using # as a delimiter just in case you'd forgotten delimiters don't have to be /.
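
For readers without a vol command, here is a small sketch of the same idea that runs perl itself as the external program. It assumes the path in $^X contains no spaces and that the shell accepts doublequoted arguments; both are assumptions, not guarantees. It also shows that backticks return a single string in scalar context and a list of lines in list context.

```perl
use strict;
use warnings;

# Scalar context: the whole output comes back as one string.
my $answer = `$^X -e "print 6*7"`;
print "The command printed: $answer\n";

# List context: one element per line of output.
my @lines = `$^X -e "print qq(one\\ntwo\\n)"`;
print "Got ", scalar(@lines), " lines\n";
```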
   

When to use external calls

Before you get carried away with creating elaborate scripts based on the output from NT's net commands, note there are plenty of excellent modules out there which do a very good job of this sort of thing, and that any form of external process call slows your script. Also note there are plenty of built in functions such as readdir which can be used instead of `dir`. You should use Perl functions where possible rather than calling external programs because Perl's functions:

  • are portable (usually, but there are exceptions). This means you can write a script on your Mac PowerBook, test it on an NT box and then use it live on your Unix box without modifying a single line of code;
  • are faster, as every external process significantly slows your program;
  • don't usually require regexing to find the result you want;
  • don't rely on output in a particular format, which might be changed in the next version of your OS or application;
  • are more likely to be understood by a Perl programmer -- for example, $files=`ls`; on a Unix box means little to someone that doesn't know that ls is the Unix command for listing files, as dir is in Windows.

Don't start using backticks all over the place when system will do. You might get a very large return value which you don't need, and will consequently slurp lots of memory. Just use them when you actually want to check the returned strings.
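
As an illustration of the readdir point, the sketch below lists the current directory without calling `dir` or `ls`. The choice of "." as the directory and the variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $dir = ".";   # current directory; any readable directory would do

opendir my $dh, $dir or die "Can't open $dir: $!";
my @files = grep { !/^\./ } readdir $dh;   # skip '.', '..' and dotfiles
closedir $dh;

print "$_\n" for sort @files;
```

The same script should run unchanged on Windows and Unix, which is the portability argument made above.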

Opening a Process

The problem with backticks is that you have to wait for the entire process to complete, then analyse the entire output. This is a big problem if you have large outputs or slow processes. For example, the DOS command tree. If you aren't familiar with this command, run a DOS/command prompt, switch to the root directory (C:\) and type tree. Examine the wondrous output.

We can open a process, and pipe data in via a filehandle in exactly the same way you would read a file. The code below is exactly the same as opening a filehandle on a file, with two exceptions:

  1. We use an external command, not a filename. That's the process name, in this case, tree.
  2. A pipe, ie |, is appended to the process name.

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
	print "$. $_";
}

Note the | which denotes that data is to be piped from the specified process. You can also pipe data to a process by using | as the first character.

As usual, $. is the line number. What we can do now is terminate our tree early. Environmentally unsound, but efficient.

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
	printf "%3s $_", $.;
	last if $. == 10;
}

As soon as $. hits 10 we shut the process off by exiting the loop. Easy.

Except, maybe it won't. What if this was a long program, and you forgot about that particular line of code which exits the loop? Suppose that $. somehow went from 9 to 11, or was assigned to? It would never reach 10. So, to be safe

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
	printf "%3s $_", $.;
	last if $. >= 10;
}

exit your loops in a paranoid manner, unless you really mean only to exit when at line ten. For maximum safety, maybe you should create your own counter variable because $. is a global variable. I'm not necessarily advocating doing any of the above, but I am suggesting these things be considered.
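
A sketch of the "own counter" idea follows. Since tree only exists on DOS/Windows, the child process here is a second perl printing numbered lines. The list form of a piped open is assumed to be available (it is on Unix; on Windows it needs a fairly recent Perl), and the names $pipe, $count and $line are made up for the example.

```perl
use strict;
use warnings;

# The child process here is just another perl printing 1000 lines; it is a
# stand-in for 'tree' so the sketch runs without that command.
open(my $pipe, '-|', $^X, '-e', 'print "line $_\n" for 1..1000')
    or die "Can't start child process: $!";

my $count = 0;              # our own counter, independent of the global $.
while (my $line = <$pipe>) {
    $count++;
    printf "%3s %s", $count, $line;
    last if $count >= 10;   # >= rather than == in case the count is ever skipped
}
close $pipe;                # the child may be cut short, so the result is not checked
```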

You might notice the presence of a new keyword - printf. It works like print, but formats the string before printing. The formatting is controlled by such parameters as %3s, which means "pad out to a total of three spaces". After the doublequoted string comes whatever you want to be printed in the format specified. Some examples follow. Just uncomment each line in turn to see what it does. There is a lot of new stuff below, but try and work out what is happening. An explanation follows after the code.

$windir=$ENV{'WINDIR'};		# yes, you can access the environment variables !

$x=0;

opendir WDIR, "$windir" or die "Can't open $windir !!! Panic : $!";

while ($file= readdir WDIR) {
	next if $file=~/^\./;		# try commenting this line to see why it is there

	$age= -M "$windir/$file";	# -M returns the age in days
	$age=~s/(\d*\.\d{3}).*/$1/;	# hmmmmm

	#### %4.4d - must take up 4 columns, and pad with 0s to make up space
	####         and minimum width is also 4
	#### %10s  - must take up 10 columns, pad with spaces
	# printf "%4.4d %10s %45s \n", $x, $age, $file;

	#### %-10s - left justify
	# printf "%4.4d %-10s %-45s \n", $x, $age, $file;

	####  %10.3d - use 10 columns, pad with 0s if less than 3 columns used
	# printf "%4.4d %10.3d %45s \n", $x, $age, $file;

	$x++;

	last if $x==15;			# we don't want to go through all the files :-)
}

There are some intentionally new functions there. When you start hacking Perl (actually, you already started if you have worked through this far) you'll see a lot of example code. Try and understand the above, then read the explanation below.

Firstly, all environment variables can be accessed and set via Perl. They are in the %ENV hash. If you aren't sure what environment variables are, refer to your friendly Microsoft documentation or books. The best known environment variable is path, and you can see its value and that of all other environment variables by simply typing set at your command prompt.

The regex /^\./ bounces out invalid entries before we bother doing any processing on them. Good programming practice. What it matches is "anything that begins with '.'". The caret anchors the match to the beginning of the string, and as . is a metacharacter it has to be escaped.

Perl has several tests to apply on files. The -M test returns the age in days. See the documentation for similar tests. Note that the calls to readdir return just the file, not the complete pathname. As you were careful to use a variable for the directory to be opened rather than hardcoding it (horrors) it is no trouble to glue it together by using doublequotes.

Try commenting out $age=~s/(\d*\.\d{3}).*/$1/ and note the size of $age. It could do with a trim. Just for regex practice, we make it a little smaller. What the regex does is:

  • start capturing with (
  • look for 0 or more digits \d*
  • then a . (escaped)
  • followed by three digits \d{3}
  • and that's all we want to capture so the parens are closed. )
  • Finally, everything else in the string is matched .* where . is any character (almost) and * 0 or more. This is pretty much guaranteed to match to the end of the line
  • Having matched the entire string (and put part of it into $1 by using parens) we simply replace the string with what we have captured.

Easy !
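
The steps listed above can be written out one per line. This sketch relies on the /x modifier, which lets a regex contain whitespace and comments. The starting value of $age is invented for the example rather than taken from a real -M test.

```perl
use strict;
use warnings;

my $age = "12.3456789";   # made-up value standing in for the result of -M

$age =~ s/
    (          # start capturing
      \d*      # 0 or more digits
      \.       # a literal dot
      \d{3}    # exactly three digits
    )          # stop capturing
    .*         # everything else, to the end of the line
/$1/x;

print "$age\n";
```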

Mention should also be made of sprintf, which is exactly like printf except it doesn't print. You just use it to format strings, which you can do something with later. For example:

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
	$line= sprintf "%3s $_", $.;
	print $line;
	last if $. == 10;
}
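
Because sprintf hands back the formatted string, it is a convenient way to see exactly what each format does. A short sketch using the same formats as the directory listing earlier, with made-up values and variable names:

```perl
use strict;
use warnings;

my $padded_num  = sprintf "%4.4d", 7;        # "0007"
my $right       = sprintf "%10s",  "abc";    # "       abc"
my $left        = sprintf "%-10s|", "abc";   # "abc       |"
my $three_digit = sprintf "%10.3d", 7;       # "       007"

print "[$padded_num] [$right] [$left] [$three_digit]\n";
```

The square brackets in the print are only there to make the padding visible.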

   

Quote execute

@opts=qw(w on ad oe b);

for (@opts) {
	$result=qx(dir /$_);
	print "dir /$_ resulted in:\n$result",'-' x 79;
	sleep 1;
}

Anything within qx( ) is executed, and duly variable interpolated. This sample also demonstrates qw, which is 'quote words', so the elements of @opts are delimited by whitespace, not the usual commas. You can also use for instead of foreach if you want to save typing four characters for the sake of legibility.

You may have noticed that system outputs the result of the command to the screen whereas qx does not. Each to its own.



Oneliners


A short example

You'll have noticed Perl packs a lot of power into a small amount of code. You can feed Perl code directly on the command line. This is known as a oneliner, for obvious reasons. An example:

perl -e"for (55..75) { print chr($_) }"

The -e switch tells Perl that a command is following. The command must be enclosed in doublequotes, not singles as on Unix. The command itself in this case simply prints the characters whose ASCII codes are 55 to 75 inclusive.


File access

This is a simple find routine. As it uses a regex, it is infinitely superior to NT's findstr:

perl -e"while (<>) {print if /^[bv]/i}" shop.txt

Remember, the while (<>) construct will open whatever is in @ARGV. In this case, we have supplied shop.txt so it is opened and we print lines that begin with either 'b' or 'v'.

That can be made shorter. Run perl -h and you'll see a whole list of switches. The one we'll use now is -n, which puts a while (<>) { } loop around whatever code you supply with -e. So:

perl -ne"print if /^[bv]/i" shop.txt

which does exactly the same as the previous program, but uses the -n switch to put a while (<>) loop around whatever other commands are supplied.

A slightly more sophisticated version:

perl -ne"printf \"$ARGV : %3s : $_\",$. if /^[bv]/i" shop.txt

which demonstrates that doublequotes must be escaped.


Modifying files with a oneliner and $^I

If you don't remember $^I then please review the section on Files before proceeding. When you're ready, copy shop.txt to shop2.txt.

perl -i.bk -ne"printf \"%4s : $_\",$." shop2.txt

The -i switch primes the inplace edit operator. We still need -n.

If you had a typical quoted email message such as:

>> this is what was said
>> blah blah
> blaaaaahhh

The new text

and you wanted to remove the >, then:

perl -i.bk -pe"s/^>+ ?//" email.txt

does the trick. Regex recap -- the caret anchors what follows to the beginning of the string, the + means one or more (no, we do not use * which means 0 or more), then we match one literal space, but it is not necessary for the space to be there for the match to be successful, hence ?.

What is new in terms of oneliners is the use of -p, which does exactly the same thing as -n except that it adds a print statement too. In case you were wondering why the previous example used -n and this one uses -p -- the previous example prints data with printf, whereas this example doesn't have an explicit print statement so we provide one with -p.
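
Roughly speaking, -p turns the oneliner into a loop that runs your code on each line and then prints the line. The sketch below imitates that loop as an ordinary script. To keep it self-contained it works on an array of sample lines rather than a file, and the names @email, @cleaned and $line are invented for the example.

```perl
use strict;
use warnings;

my @email = (
    ">> this is what was said\n",
    ">> blah blah\n",
    "> blaaaaahhh\n",
    "The new text\n",
);

my @cleaned;
for (@email) {
    my $line = $_;          # copy, so @email itself is left alone
    $line =~ s/^>+ ?//;     # the code given to -e
    push @cleaned, $line;   # -p would print here; we collect instead
}
print @cleaned;
```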

Some other useful oneliners -- a calculator and an ASCII number lookup:

perl -e"print 50/200+2"
perl -e"for (50..90) { print chr($_) }"

There are plenty more oneliners, and they are an essential part of any sysadmin's toolbox. The two examples below are functionally equivalent but the lower one is perhaps a little more readable:

perl -e"for $i (50..90) { print chr($i),\" is $i\n\" }"

perl -e"for $i (50..90) { print chr($i),qq| is $i\n| }

Whatever follows qq is used as a delimiter, instead of having to escape the doublequotes. I learnt this from the Perl-Win32-Users mailing list (see top) - I think it was Lennart Borgman who pointed it out. He also mentioned that you don't need the closing doublequote. Saves a little typing.


Subroutines and Parameters

In Perl, subroutines are functions are subroutines. If you like, a subroutine is a user defined function. It's a bit like calling a script a program, or a program a script. For the purposes of this tutorial we'll refer to functions as subroutines, except when we call them functions. Hope that's made the point.

For the purposes of this section we will develop a small program which, by the end, will demonstrate how subroutines work. It also serves to demonstrate how many programs are built, namely a little at a time, in manageable sections. At least, that method works for me.

The chosen theme is gliding. That's aeroplanes without engines. A subject close to every glider pilot's heart is how far they can fly from the altitude they are at. Our program will calculate this. To make it easy we'll assume the air is perfectly calm. Wind would be a complication we don't need, especially when in a crowded lift.

What we need in order to calculate the distance we can fly is:

  • How high we are (in feet)
  • How many metres we travel forward for every metre we drop. This is the glide ratio, for example 24:1 would mean travelling 24 metres forward for every 1 metre of height lost.

Obviously input is needed. We can either prompt the user or grab the input from the command line. The latter is easier so we'll just look at @ARGV for the command line parameters. Like so:

($height,$angle)=@ARGV;		# @ARGV is the command line parameters

$distance=$height*$angle;	# an easy calculation

print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

The above should be executed thus:

perl yourscript.pl 5000 24

or whatever your script is called, with whatever parameters you choose to use. I'm a poet and I don't even know it.

That works. What about a slight variation? The pilot does have some control over the glide ratio, for example he can fly faster but at a penalty of a lesser glide ratio. So we should perhaps give a couple of options either side of the given parameters:

($height,$angle)=@ARGV;

$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++;			# add 1 to $angle
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2;			# subtract 2 from $angle so it is 1 less than the original
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

That's cumbersome code. We repeat exactly the same statement. This wastes space, and if we want to change it there are three changes to be made. A better option is to put it into a subroutine:

($height,$angle)=@ARGV;

&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

sub howfar {				# sub subroutinename
	$distance=$height*$angle;
}

This is a basic subroutine, and you could stop here and have learnt a very useful technique for programming. Now, when changes are made they are made in one place. Less work, fewer chances of errors. Improvements can always be made. For example, pilots outside Eastern Europe generally measure height in feet, and glider pilots are usually concerned with how many kilometres they travel over the ground. So we can adapt our program to accept height in feet and output the distance in kilometres:

($height,$angle)=@ARGV;

$height/=3.2;			# divide feet by 3.2 to get metres

&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

sub howfar {
	$distance=$height*$angle;
}

When you run this you'll probably get a result which involves a fair few digits after the decimal point. This is messy, and we can fix this with the int function, which in Perl and most other languages returns a number as an integer, ie without those irritating numbers after the decimal point.

You might have also noticed a small bit of Bad Programming Practice slipped into the last example. It was the evil Constant, the '3.2' used to convert feet to metres. Why, I don't hear you ask, is this bad? Surely the conversion will never change?

It won't change, but our use of it might. We may decide that it should be 3.208 instead of 3.2. We may decide to convert from feet to nautical miles instead. You don't know what could happen. Therefore, code with flexibility in mind and that means avoiding constants.

The new improved version with int and constant removed:

($height,$ratio)=@ARGV;
$cnv1=3.2;			# now it is a variable.  Could easily be a cmd line
				# parameter too.  We have the flexibility.
$height  =int($height/$cnv1);	# divide feet by 3.2 to get metres

&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$ratio++;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$ratio-=2;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

sub howfar {
	$distance=int($height*$ratio);
}

We could of course build the print statement into the subroutine, but I usually separate output presentation from the calculation. Again, that means it is easier to modify later on.

Something else we can improve about this code is the use of the $ratio variable. We are having to keep track of what we do to it -- first add one, then subtract two in order to subtract one from the original input. In this case it is fairly easy, but with a complex program it can be difficult, and you don't want to be creating lots of variables just to track one input, for example $ratio1, $ratio2 etc.


Parameters

One solution is to pass the subroutine parameters. In the best tradition of American columnists, who seem to have a particular affection for this phrase, 'Here's how:'

($height,$ratio)=@ARGV;
$cnv1=3.2;

&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
	print "The parameters passed to this subroutine are @_\n";
	($ht,$rt)=@_;
	$ht  =int($ht/$cnv1);
	$distance=int($ht*$rt);
}

Quite a few things have changed here. Firstly, the subroutine is being called with parameters. These are a comma-delimited list in parens after the subroutine call. The two parameters are $height and $ratio.

The parameters end up in the subroutine as the @_ array. Being an array, they are in the same order as passed. All the usual array operations work. All we will do is assign the contents of the array to two variables.

We have also moved the conversion function into the subroutine, because we want to put all the code for generating the distance into one place.


Namespaces

We cannot use the variable names $height and $ratio because we modify them in the subroutine and that will affect the main program. So we choose new ones to do the operation on. Finally, a small change is made to the print output.

This approach works well enough for our small program here. For larger programs, having to think of new variable names all the time is difficult. It would be even more difficult if different programmers were working on different sections of the program. It would be impossible if a program were written, then an extension created by another person somewhere else, and that same extension had to be used by many people in many different programs. Obviously, the risk of using the same variable name is too great. There are only so many logical names out there.

There is a solution. Imagine you own a house with two gardens. You have two identical dogs, one in the front garden, one in the back garden. Bear with me, this is relevant. Both dogs are called Rover, because their owner lacks imagination.

When you go to the front garden and call 'Rover!!!' or open a can of dog food, the dog in the front garden comes running. Similarly, you go to the back garden, call your dog and the other dog bounces up to you.

You have two dogs, both called Rover, and you can change either one of them. Wash one, neuter the other -- it doesn't matter, but both are dogs and both have the same name. Changes to one won't affect the other. You don't get them confused because they are in different places, in two different namespaces.


Variable Scope

To bring things back to Perl, a short diversion is necessary to illustrate the point with actual Perl code instead of canine metaphors:

$name='Rover';
$pet ='dog';
$age =3;

print "$name the $pet is aged $age\n";

{
	my $age =4;	  # run this again, but comment this line out
	my $name='Spot';  # and this one
	$pet    ='cat';

	print "$name the $pet is aged $age\n";
}

print "$name the $pet is aged $age\n";

This is pretty straightforward until we get to the {. This marks the start of a block. One feature of a block is that it can have its own namespace. Variables declared, in other words initialised, within that block are just normal variables, unless they are declared with my.

When variables are declared with my they are visible inside the block only. Also, any variable which has the same name outside the block is ignored. Points to note from the example above:

  • The two my variables appear to overwrite the variables of the same name from outside the block.
  • The two original variables aren't really overwritten because as we prove after the block has ended, they haven't been touched.
  • The variable $pet is accessible inside and outside the block as usual. Of course, if we declare it with my then things will change.
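
To see what "things will change" means for $pet, here is a minimal sketch. It differs from the example above in that it runs under use strict, so the outer $pet is also declared with my; that is a choice made for this illustration only.

```perl
use strict;
use warnings;

my $pet = 'dog';
{
    my $pet = 'cat';               # a new $pet, visible only inside this block
    print "Inside:  $pet\n";
}
print "Outside: $pet\n";           # the outer $pet was never touched
```

Inside the block the pet is a cat; once the block ends, the outer value is visible again.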


my Variables

So there we have it. Namespaces. They work for all the other types of variable too, like arrays and hashes. This is how you can write code and not care about what other people use for variable names -- you just declare everything with my and have your own private party. Our original program about gliding can be improved now:

($height,$ratio)=@ARGV;
$cnv1=3.2;

&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
	my ($height,$ratio)=@_;
	$height  =int($height/$cnv1);
	$distance=int($height*$ratio);
}

The only change is that the parameters to the subroutine, ie the contents of the array @_, are declared with my. This means they are now only visible within that block. The block happens to also be a subroutine. Outside of the block, the original variables are still accessible. At this point I'll introduce the technical term, which is lexical scoping. That means the variable is confined to the block -- it is only visible within the block.

We still have to be concerned with what variables we use inside the subroutine. The variable $distance is created in the subroutine and used outside of it. With larger programs this will cause exactly the same problem as before -- you have to be careful that the subroutine variables you use are the same ones as outside the subroutine. For all the same reasons as before, like two different people working on the code and use of custom extensions to Perl, that can be difficult.

The obvious solution is to declare $distance with my, and thus lexically scope it. If we do this, then how do we get the result of the subroutine? Like so:

($height,$ratio)=@ARGV;
$cnv1=3.2;

$distance=&howfar($height,$ratio);  # run this again and delete '$distance='
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$distance=&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

$distance=&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
	my ($height,$ratio)=@_;
	my $distance;
	$height  =int($height/$cnv1);
	$distance=int($height*$ratio/1000); 	# output result in kilometres not metres
}

First change -- $distance is declared with my. Secondly, the result of the subroutine is assigned to a variable, which is also named $distance. However, it is a $distance in a different namespace. Remember the two gardens. You may wish to delete the $distance= from the first assignment and re-run the code. The only other change is one to change the output from metres to kilometres.

We have now achieved a sort of Black Box effect, where the subroutine is given input and creates output. We pass the subroutine two numbers, which may or may not be variables. We assign the output of the subroutine to a variable. We care not what goes on inside the subroutine, what variables it uses or what magic it performs. This is how subroutines should operate. The only exception is the variable $cnv1. This is declared in the main body of the program but also used in the subroutine. This has been done in case we need to use the variable elsewhere. In larger programs it would be a good idea to pass it to subroutines along with the other parameters too.


Multiple Returns

That's all the major learning out of the way. The next step is relatively easy, but we need to add new functionality to the program in order to demonstrate it. What we will do is work out how long it will take the glider pilot to fly the distance. For this calculation, we need to know his airspeed. That can be a third parameter. The actual calculation will be part of howfar. An easy change:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio-1,$airspeed);
print "Glide ratio ",$ratio-1,":1, $distance from $height taking $time\n";

sub howfar {
	my ($height,$ratio,$airspeed)=@_;
	my ($distance,$time);			    # how to 'my' multiple variables
	$airspeed*=$cnv2;		 	    # convert knots to kmph
	$height  =int($height/$cnv1);
	$distance=int($height*$ratio/1000);
	$time	 =int($distance/($airspeed/60));    # simple time conversion
	# print "Time:$time, Distance:$distance\n"; # uncomment this later
}

This doesn't work correctly. First, the changes. The result from howfar is now assigned to two variables. Subroutines return a list, and so assigning to some scalar variables between parens separated by commas will work. This is exactly the same as reading the command line arguments from @ARGV.

We are also passing a new parameter, $airspeed. There is another conversion and a one-line calculation to provide the number of minutes it will take to fly $distance.

If you look carefully, you can perhaps work out what the problem is. There was a clue in the Regex section, when /e was explained.

The problem is that Perl returns the result of the last expression evaluated. In this case, the last expression is the one calculating $time, so the value of $time is returned, and it is the only value returned. Therefore, the value of $time is assigned to $distance in the main program, the calculated distance never leaves the subroutine, and the main program's $time gets no value at all.

Re-run the program but this time uncomment the line in the subroutine which prints $distance and $time. You'll notice the returned value is now 1, because the last expression evaluated is the print, and print returns 1 when it succeeds. Perl is faithfully returning the value of the last expression evaluated.

This is all well and good, but not what we need. What is required is a method of telling Perl what needs to be returned, rather than what Perl thinks would be a good idea:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio-1,$airspeed);
print "Glide ratio ",$ratio-1,":1, $distance from $height taking $time\n";

sub howfar {
	my ($height,$ratio,$airspeed)=@_;
	my ($distance,$time);			 # how to lexically scope multiple variables
	$airspeed*=$cnv2;			 # convert knots to kmph
	$height  =int($height/$cnv1);
	$distance=int($height*$ratio/1000); 	 # output result in kilometres not metres
	$time	 =int($distance/($airspeed/60)); # simple time conversion
	return ($distance,$time);		 # explicit return
}

A simple fix. Now, we tell Perl what to return, with the aptly named return function. With this function we have complete control over what is returned and when. It is quite usual to use if statements to control different return values, but we won't bother with that here.

There is a subtle flaw in the program above. It is not backwards compatible with the old method of calling the subroutine. Run this:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

$distance=&howfar($height,$ratio-1);	# old way of calling it
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
	my ($height,$ratio,$airspeed)=@_;
	my ($distance,$time);
	$airspeed*=$cnv2;
	$height  =int($height/$cnv1);
	$distance=int($height*$ratio/1000);
	$time	 =int($distance/($airspeed/60));
	return ($distance,$time);
}

A division by 0 results the third time around. This is of course because $airspeed was never passed, so it is undefined and is treated as 0. Making your subroutines backwards compatible is important in large programs, or if you are writing an add-in module for other people to use. You can't expect everyone to retrofit additional parameters to their subroutine calls just because you decided to be a bit creative one day.

There are many ways to fix the problem, and this is just one:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

$distance=&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

# parens around join's arguments keep the trailing string out of the join
print "Direct print: ",join(",",&howfar(5000,55,60))," not bad for no engine!\n";

sub howfar {
	my ($height,$ratio,$airspeed)=@_;
	my ($distance,$time);			 # how to 'my' multiple variables
	$airspeed*=$cnv2;			 # convert knots to kmph
	$height  =int($height/$cnv1);
	$distance=int($height*$ratio/1000); 	 # output result in kilometres not metres
	if ($airspeed > 0) {
		$time	 =int($distance/($airspeed/60));
		return ($distance,$time);
	} else {
		return $distance;
	}
}

Here we just test the $airspeed to ensure we won't be doing any divisions by 0. It also affects what we return. There is also a new print statement, which shows that you don't need to assign to intermediate variables, or even pass variables as parameters. Constants, evil things that they are, work just as well. I already mentioned this, but a demonstration doesn't hurt. Unless you work for an electric chair manufacturer.

The astute reader.....:-) Every time I read that I wonder what I've missed. Usually something obscure which the author knows nobody will ever notice, but likes to belittle the reader. No exception here! Anyway, you may be wondering why this would not have sufficed instead of the if statement:

sub howfar {
	my ($height,$ratio,$airspeed)=@_;
	my ($distance,$time);			 # how to 'my' multiple variables
	$airspeed*=$cnv2;			 # convert knots to kmph
	$height  =int($height/$cnv1);
	$distance=int($height*$ratio/1000); 	 # output result in kilometres not metres
	$time	 =int($distance/($airspeed/60)) if $airspeed > 0;
	return ($distance,$time);
}

After all, the first item returned is $distance, so therefore it should be the first one assigned via:

$distance=&howfar($height,$ratio-1);

and $time should just disappear into the bit bucket.

The answer lies with scalars and lists. We are returning a list, but assigning it to a scalar. What happens when you do that? The scalar takes on the last value of the list. The last value of the list being returned is of course $time, which has been declared but not otherwise touched. Therefore, it is nothing and appears as such in the printed statement. A small program to demonstrate that point:

$word=&wordfunc("Greetings");
print "The word is $word\n";

(@words)=&wordfunc("Bonjour");
print "The words are @words\n";

sub wordfunc {
	my $word=shift;				# when in a subroutine, shifts @_ if no target specified
	my @words;				# how to my an array
	@words=split //,$word;			# splits on the nothings between each letter
	($first,$last)=($words[0],$words[$#words]);  # see section on Arrays if required
	return ($first,$last);			# Returns just the first and last
}

As you can see, the first call prints the letter 's', which is the last element of the list that is returned. You could of course use a list consisting of just one element:

($word)=&wordfunc("Greetings");

Now we are assigning a list to a list, so perl starts at the first element and keeps assigning till it runs out of elements. The parens turn a lonely scalar into an element of a list. You might consider always assigning the results of subroutines this way, as you never know when the subroutine might change. I know I've just evangelised about how subroutines shouldn't change, but if you take care and the subroutine writer takes care, there definitely won't be any problems!
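
One wrinkle worth knowing, sketched below with made-up subroutine names: the 'last value' rule applies when the subroutine returns a list written out with commas. If it returns an array instead, a scalar on the left receives the number of elements. Treat this as an illustration of the two cases rather than the full story on context.

```perl
# Two hypothetical subroutines returning the same three letters.
sub letters_list  { return ('a', 'b', 'c'); }                  # a list written out with commas
sub letters_array { my @l = ('a', 'b', 'c'); return @l; }      # an array

my $from_list  = letters_list();    # scalar on the left: last value of the list
my $from_array = letters_array();   # scalar on the left: number of elements
my ($first)    = letters_list();    # list on the left: first element

print "$from_list $from_array $first\n";   # prints: c 3 a
```

Putting parens around the variable on the left gives you the first element in both cases, which is one more reason to assign results that way.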

That's about it for good old my. There is a lot more to learn about it but that's enough to get started. You now know a little about variable visibility, and I don't mean changeable weather.


Local

There is one more function that I'd like to draw to your attention, and we'll launch straight into the demonstration:

@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,='_';

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
	print '   Words:', @words, "\n";
}

which should be executed something like this:

perl test.pl sarcasm is the lowest form of wit

The special variable $, defines what Perl should print between the items of a list it is given. By default, it is nothing. So the first two prints should have no spaces between the words. Then we assign '_' to $, so the next prints have underscores between the words.

If we want to use a different value for $, in the change subroutine, and not disturb the main value, we have a little problem. This problem cannot be solved by my because global variables like $, cannot at this time be lexically scoped. So, we could manually do it:

@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,="_";

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
	$save=$,;
	$,='*';
	print '   Words:', @words, "\n";
	$,=$save;
}

That works, but it is messy. Perl has a special function for occasions of this nature, called local. An example of local in action:

@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,="_";

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
	local $,="!-!";
	print '   Words:', @words, "\n";
}

You can try it with my instead but it won't work. I'm sure you'll try it anyway, I know you learn things the hard way otherwise you a) wouldn't be programming computers and b) wouldn't be using this tutorial to do it.

The local function works in a similar way to my, but assigns temporary values to global variables. The my function creates new variables that have the same name. The distinction is important, but the reasons require perl proficiency beyond the scope of this humble tutorial. In practice, the difference is:

  • lexically scoped variables (those declared with my) are faster than non-lexically scoped variables.
  • local variables are visible to called subroutines.
  • my doesn't work on global variables like $, so you must use local.
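
The second point is the one that tends to surprise people, so here is a small sketch of it. The variable $sep and the three subroutine names are invented for the example, and it uses our to declare the global, which needs a reasonably recent Perl (5.6 or later).

```perl
our $sep = '-';                 # a global (package) variable, hypothetical name

sub joined {                    # uses whatever the global $sep holds right now
    return join $sep, @_;
}

sub with_local {
    local $sep = '*';           # temporary value for the global: joined() sees it
    return joined(@_);
}

sub with_my {
    my $sep = '*';              # brand new lexical variable: joined() never sees it
    return joined(@_);
}

print with_local(qw(a b c)), "\n";   # a*b*c
print with_my(qw(a b c)), "\n";      # a-b-c
print joined(qw(a b c)), "\n";       # a-b-c, the local value has gone again
```

The local value lasts only until with_local finishes, after which the global gets its old value back without any saving and restoring on your part.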


Returning arrays

So that's the end of subroutines and parameters. Would you believe I have only scratched the surface? There are closures, prototypes, autoloading and references to learn. Not, however, in this tutorial. At least not yet. I'll finish with one last demonstration. You may have noticed that Perl returns one long list from subroutines. This is fine, but suppose you want two separate lists, for example two arrays? This is one way to do it:

($w1,$w2)=&wordfunc("Hello World");	# Assign the array references to scalars

print "@$w1 and @$w2\n";		# dereference, ie access, the arrays referred to
#print "$w1 and $w2\n";			# uncomment this next time round

sub wordfunc {
	my $phrase=shift;
	my (@words,@word1,@word2);	# declare three variables lexically
	@words=split /\s+/,$phrase;	# split the phrase on whitespace
	@word1=split //,$words[0];	# create array of letters from the first word
	@word2=split //,$words[1];	# and the second
	return (\@word1,\@word2);	# return references to the two arrays -- scalars
}

There is a lot going on there. It should be clear up until the return statement. As we know, Perl only returns a single list. So, we make Perl return a list of the arrays it has just created. Not the actual arrays themselves, but references to the arrays. A bit like a shopping list is just a bit of paper, not the actual goods themselves. The reference is created by use of the \ backslash.

Having returned two array references they are assigned to scalar variables. If you uncomment the second print line you'll see two references to arrays.

The next problem is how to dereference the references, or access the arrays. The construct @$xxx does that for us. I know I said I wouldn't cover references, and I haven't -- that is just a useful trick.
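
For the curious, a slightly longer sketch of the same trick follows. The subroutine name split_words is made up for the example; the point is only to show three things you can do with an array reference once you have one.

```perl
# Hypothetical helper: returns references to two arrays of letters.
sub split_words {
    my $phrase = shift;
    my @words  = split /\s+/, $phrase;
    my @first  = split //, $words[0];
    my @second = split //, $words[1];
    return (\@first, \@second);
}

my ($w1, $w2) = split_words("Hello World");

print "Whole array:   @$w1\n";                 # H e l l o
print "One element:   $$w2[0]\n";              # W
print "Element count: ", scalar(@$w2), "\n";   # 5
```

In each case the extra $ or @ in front of the reference says what you want back: @$w1 is the whole array, $$w2[0] is a single element of it.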

This little section is not designed as a complete guide, it is just a taster of things to come. Perl is immensely powerful. If you think something can't be done, the problem is likely to be that it is beyond your ability, not that of Perl.


Modules


An introduction

Subroutines are oft-used pieces of code. They exist so you can re-use the code and not have to constantly rewrite it.

A module is, in principle, similar to a subroutine. It is also an oft-used piece of code. The difference is that modules don't live in your program, they live in their own separate files outside your code. For example, you might write a routine to send email. You could then use this code in ten, a hundred, a thousand different programs just by referencing the original file.

As you would expect, the basic Perl package includes a large number of modules. These have been written by people who had a need for the code, made it a module and released it into the big wide world. Many of these modules have been debugged, improved and documented by yet more people. To quote the OpenSource mantra, all bugs are shallow under the scrutiny of every programmer.

Aside from the many modules included with Perl there are hundreds more available on CPAN, the Comprehensive Perl Archive Network. Refer to your documentation for details.


File::Find -- using a module

An example of a module included with Perl is File::Find. There are several modules under the File:: section, such as File::Basename, File::Compare and File::stat.

This is an example of how File::Find can be used:

use File::Find;

$dir1='/some/dir/with/lots/of/files';
$dir2='/another/directory/';

find(\&wanted, $dir1,$dir2);

sub wanted {
	print "Found it $File::Find::dir/$_\n" if /^[a-d]/i;
}

The first line is the most important. The use function loads the File::Find module. Now, all the power and functionality of File::Find is available for use. Such as the find function. This accepts two basic parameters:

  • A reference to a subroutine, usually called wanted, which defines what you want to do with each file found. The filename will be in $_.
  • A list of directories to be searched. Subdirectories will also be searched.

The subroutine wanted simply prints the directory and name of each file whose name begins with a, b, c or d. Make your own regex to suit. The variable $File::Find::dir means the $dir variable in the File::Find package. This is explained further in the next section.

Note -- the \&wanted parameter is a reference to a subroutine. Essentially, this means that the code in File::Find knows where to find the &wanted subroutine. It is basically like shortcuts under Windows 9x and NT4, instead of actual files (but the UNIX Perl people would slaughter me for that, so be quiet).
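
If you would rather collect the filenames than print them, something along these lines should do. It is a sketch, not gospel: the subroutine name files_starting_a_to_d and the sample filenames are invented, and it borrows the File::Temp module (also shipped with Perl) purely to make a throwaway directory so the example can run anywhere.

```perl
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: names of the files under the given directories
# that begin with a, b, c or d, in sorted order.
sub files_starting_a_to_d {
    my @dirs = @_;
    my @found;
    # The first argument is a reference to an unnamed subroutine,
    # doing the job that \&wanted does in the example above.
    find(sub { push @found, $_ if -f $_ && /^[a-d]/i; }, @dirs);
    my @sorted = sort @found;
    return @sorted;
}

# Build a scratch directory holding three empty files to search through.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(apple.txt Banana.txt zebra.txt)) {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!\n";
    close $fh;
}

print join(",", files_starting_a_to_d($dir)), "\n";   # Banana.txt,apple.txt
```

Capital letters sort before lower case ones, which is why Banana.txt comes out first.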


ChangeNotify

Another example is Win32::ChangeNotify. As you might expect there are a number of Win32-specific modules, and ChangeNotify is one of them. It waits until something changes in a directory, then acts. What it waits for and what it does are up to you, for example:

use Win32::ChangeNotify;

$Path='/downloads';
$WatchSubTree=0;
$Events='FILE_NAME';
$browser='E:/progs/netscape/Communicator/program/netscape.exe';
$changes=0;

$notify = Win32::ChangeNotify->new($Path,$WatchSubTree,$Events);

while (1) {
	print "- ",scalar(localtime)," $changes so far to $Path.\n";
	$notify->wait;
	++$changes;
	print "- ",scalar(localtime), " Launching $browser...\n";
	system("$browser $Path");
	$notify->reset;
}


Again, the module is incorporated into the program with use. An object referred to by the variable $notify is created. The parameters passed are the path to be watched, whether we want to watch subtrees, and what sort of events we want to be notified about, in this case only filename changes.

Then, we enter a loop which continues while 1 is true -- which will be forever.

The program pauses when the wait method of the $notify object is called. Only when there is a change to the directory does the rest of the loop body run, launching the browser. We then have to reset the $notify object before waiting again.

There is some pretty frightening stuff about objects in the explanation. But you don't actually need to understand anything about objects. Just read the documentation, and experiment.

You can use as many modules as you like in one program. As they are all written with carefully scoped variables you need not worry about programmers using the same variable names in different modules. Now you *really* appreciate scoping!


Your Very Own Module

You too can write your own modules. It is easy. First, we will create the fantastic bit of code that we want to re-use everywhere. We'll start by writing a normal Perl program:

$name=shift;

print &logname($name);

sub logname {
	my $name=shift;
	my @namebits;
	my ($logon,$inital);
	@namebits=split /\s+/,$name;
	($inital)=$name=~/(\w)/;
	$logon=$inital.$namebits[$#namebits];
	$logon=lc $logon;
	return $logon;
}

Execute like so: perl script.pl "Nick Bladon"

The script itself is nothing amazing. The lc function stands for LowerCase, or probably lOWERcASE -- you can see what it does.

In order to turn it into a module carry out the following steps:

  1. Find out where your copy of Perl is installed, for example c:\progs\perl.
  2. Within that directory there should be a lib directory.
  3. Make a directory within lib, for example c:\progs\perl\lib\RMP\

Now we'll make the module. Remember, a module is just code you are going to reuse. So we don't need all of the above example. Just this bit:

sub logname {
        my $name=shift;
        my @namebits;
        my ($logon,$inital);
        @namebits=split /\s+/,$name;
        ($inital)=$name=~/(\w)/;
        $logon=$inital.$namebits[$#namebits];
        $logon=lc $logon;
        return $logon;
}

1;

The bit that has been added is the 1 at the bottom. Why? Perl requires that all modules return true. We know that a subroutine always returns the value of the last expression evaluated, and a module file works the same way. As 1 evaluates to true, that'll do.

You need to save this as logon.pm in your newly created directory under lib. The pm stands for Perl Module.

That's it. A module created. To use, just make a normal Perl script such as:

use RMP::logon;

$name=shift;

print logname($name);

and hey presto! Module power is yours!

You don't have to create your own subdirectory within lib but I would advise it for the sake of neatness. And as you might expect, there is a lot more to learn about modules but this is supposed to be a basic tutorial, so that's enough for the time being.


Bondage and Discipline

Perl is a very flexible language. It is designed as a hacking tool, for quick sysadmin magic. It can do quite a bit more besides, but being small and powerful is a core Perl feature. Earlier on I said Perl is not a bondage and discipline language -- to qualify that, it doesn't have to be. However, there is a time and place for everything.

For tiny scripts you don't want to be declaring variables, typecasting and generally spending more time obeying rules than you do getting the job done. So, Perl doesn't force you to follow all of these good programming practices. However, not all your programs are going to be five-minute hacks. Some will be pretty large. Therefore, some Discipline is in order.

Perl has two primary methods of enforcing discipline. They are:

  • -w for Warnings
  • use strict;


-w

Consider for a moment this little program:

@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

$input2++;
$delay=2 if $input[0] eq 'sleep';

sleep $delay;

print "The first element of \@input is $input[0]\n";
print OUY "Slept $delay!\n";

It doesn't do much. Just prints out the first argument supplied, and demonstrates the uninspiring sleep function. The program itself is full of holes, and it is only a few lines. How many errors can you spot? Try and count them. When you are finished, execute the program with error-checks enabled:

perl -w script.pl hello

Perl finds quite a few errors. The -w switch finds, among other heinous sins:

  • Variables used only once. In the example, $input2 is used only once. It is a useless variable.
  • Filehandles used incorrectly. With print OUY I'm trying to print to a non-existent filehandle. With -w an alarm is raised, as it would be if I tried to write to a filehandle which was read-only.
  • Use of uninitialised variables. The variable $delay is uninitialised if 'sleep' is not the first parameter. Making variables spring into the air on demand is not good programming practice. They should be defined carefully first.

So, generally, -w is a Good Thing. It forces you to write cleaner code. So use it, but don't be afraid not to for very short programs.
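
If you want to see an uninitialised-variable warning without littering your screen, the sketch below catches it instead. It leans on two special variables: $^W, which is the switch that -w flips, and $SIG{__WARN__}, which lets you supply your own subroutine to receive warnings. The subroutine name slept_message and the @caught array are inventions for this example.

```perl
my @caught;    # any warnings raised end up in here (name made up for the example)

sub slept_message {
    my ($arg) = @_;
    my $delay;
    $delay = 2 if defined $arg and $arg eq 'sleep';

    local $^W = 1;                                        # like -w, but only for this call
    local $SIG{__WARN__} = sub { push @caught, $_[0] };   # collect warnings instead of printing them
    return "Slept $delay!";                               # complains if $delay was never set
}

slept_message('sleep');    # $delay is 2, nothing to complain about
slept_message('hello');    # $delay is undefined, so a warning is caught
print "Caught: $_" for @caught;
```

Note that both special variables are set with local, so the rest of the program carries on with whatever warning settings it had before.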


Shebang

You know that you can turn warnings on with -w on the command line. You can also turn them on within the script itself. For that matter, you can give perl any command line option within the script itself. For example, running:

perl script.pl hello

to execute this:

#!perl -w

@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

$input2++;
$delay=2 if $input[0] eq 'sleep';

sleep $delay;

print "The first element of \@input is $input[0]\n";
print OUY "Slept $delay!\n";

has the same effect as:

perl -w script.pl hello

It may be more convenient for you to put the flag inside the script. It doesn't have to be just -w, it can be any argument Perl supports. Run

perl -h

for a full list.

The first line, #!perl -w is the shebang line. This is derived from UNIX, where Perl was first developed. UNIX systems make a script executable by changing an attribute. The operating system then loads the file and works out how to execute it -- in this case by looking at the first line, then loading the perl interpreter. Windows systems know that all files with a certain extension must be passed to a certain program for execution, eg all .bat files are passed to command.com, and all .xls files are passed to Excel. The point of all this being that you don't need a shebang line, but it doesn't hurt.


use strict;

So what's strict and how do you use it? The module strict restricts 'unsafe constructs', according to the perldocs. The strict module is a pragma, which is a hint that must be obeyed. Like when your girlfriend says 'oh, that ring is *far* too expensive'.

There is no need to be frightened about unsafe code if you don't mind endless hours of debugging unstructured programs. When you enable the strict module, the three things that Perl becomes strict about are:

  • Variables 'vars'
  • References 'refs'
  • Subroutines 'subs'

This tutorial doesn't presently cover references (and let's hope I remember to remove this sentence if I do cover it in later versions) so we won't worry about refs.

Strict variables are useful. Essentially, this means that all variables must be declared, that is defined before use rather than springing into existence as required. Furthermore, each variable must be defined with my or fully qualified. This is an example of a program that is not strict, and should be executed something like this:

perl script.pl "Alain James Smith"

where the "" enclose the string as a single parameter as opposed to three separate space-delimited parameters.

#use strict;			# uncomment after running a couple of times

$name=shift;			# shifts @ARGV if no arguments supplied

print "The name is $name\n";
$inis=&initials($name);

$luck=int(rand(10)) if $inis=~/^(?:[a-d]|[n-p]|[x-z])/i;

print "The initials are $inis, lucky number: $luck\n";

sub initials {
        my $name=shift;
        $initials.=$1 while $name=~/(\w)\w+\s?/g;
        return $initials;
}

By now you should be able to work out what the above does. When you uncomment the use strict; pragma, and re-run the program, you will get output something like this:

Global symbol "$name" requires explicit package name at n1.pl line 3.
Global symbol "$inis" requires explicit package name at n1.pl line 6.
Global symbol "$luck" requires explicit package name at n1.pl line 8.
Global symbol "$initials" requires explicit package name at n1.pl line 14.
Execution of n1.pl aborted due to compilation errors.

These errors mean Perl is not exactly clear about what the scope of variables is. If Perl is not clear, you might not be either. So you need to be explicit about your variables, which means either declaring them with my so they are restricted to the current block, or referring to them with their fully qualified name. An example, using both methods:

use strict;

$main::name=shift;			# shifts @ARGV if no arguments supplied

print "The name is ",$main::name,"\n";
my $inis='';
my $luck='';

$inis=&initials($main::name);

$luck=int(rand(10)) if $inis=~/^(?:[a-d]|[n-p]|[x-z])/i;

print "The initials are $inis, lucky number: $luck\n";

sub initials {
	my $name=shift;
	my $initials;
	$initials.=$1 while $name=~/(\w)\w+\s?/g;
	return $initials;
}

The my variables in the subroutine are nothing new. The my variables outside the subroutine are. If you think about it, the main program itself is also a kind of block, and therefore variables can be lexically scoped to be visible only within the block.

The other interesting bit is the $main::name business. This, as you might expect, is the fully qualified name of the variable. The first part is the package name, in this case main, which is the package your program lives in unless you say otherwise. The second part is the actual variable name. Personally, I've never needed to refer to a variable this way. I'm not saying you'll never use the syntax, but I would suggest that knowing this is not on a perl student's Top 10 list of Things to Master.

The important thing about use strict is that it does enforce more discipline than you have been used to, and for all but the smallest of programs, that is most definitely a Good Thing.
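
The classic payoff is the misspelled variable name. The sketch below feeds two snippets of code to eval with use strict switched on and reports whether each one compiles. The helper name compiles_under_strict and the snippets themselves are invented for the demonstration.

```perl
# Hypothetical helper: returns 1 if the given code compiles and runs
# under 'use strict', 0 if it does not. The reason is left in $@.
sub compiles_under_strict {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? 1 : 0;
}

my $declared   = 'my $total = 1; $total = $total + 1';
my $misspelled = 'my $total = 1; $totl  = $total + 1';   # note the typo: $totl

print "declared:   ", compiles_under_strict($declared),   "\n";   # 1
print "misspelled: ", compiles_under_strict($misspelled), "\n";   # 0
print "Perl said:  $@";
```

Without strict, the misspelled version would run quite happily, quietly creating a new variable called $totl and leaving you to wonder why $total never changes.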


Debugging

Sooner or later you'll need to do some fairly hairy debugging. It will be later if you are using strict, -w and writing your subroutines properly, but the moment will come.

When it does you'll be poring over code, probably late at night, wondering where the hell the problem is. Some techniques I find useful are:

  • Print your variables and other information out at frequent intervals.
  • Split difficult components of the program out into small, throwaway scripts. Get these working, then copy the results back into the main program.
  • # Comment frequently.

Eventually, you'll be stuck. Such is the price of progress. In this case, Perl's own debugger can be invaluable. Run this code as normal first:

$name=shift;

print "Logon name creation program\n:Converting '$name'\n";

print &logname($name),"\n\n";

print "Program ended at ", scalar(localtime),"\n";

sub logname {
        my $name=shift;
        my @namebits;
        my ($logon,$inital);
        @namebits=split /\s+/,$name;
        ($inital)=$name=~/(\w)/;
        $logon=$inital.$namebits[$#namebits];
        $logon=lc $logon;
        return $logon;
}

We'll run it with the debugger so you can watch perl at work while it runs:

perl -d script.pl "Peter Dakin"

and you are into the debugger, which should look something like this:

c:\scripts\db.pl>perl -d db.pl "Peter Dakin"

Loading DB routines from perl5db.pl version 1.0401
Emacs support available.

Enter h or `h h' for help.

main::(db.pl:1):        $name=shift;
  DB<1>

  • db.pl -- Name of script being executed.
  • 1 -- Line number of script that is just about to be executed.
  • $name=shift; -- The code that is just about to be executed.

Type s for a single step and press enter. The code $name=shift; will be executed, and perl waits for your next command. Keep inputting s until the program terminates.

This by itself is useful as you see the subroutine flow, but if you enter h for help you'll see a bewildering range of debug options. I won't detail them all here, but some of the ones I find most useful are:

  • n -- Executes main program, but skips subroutine calls. The subroutine is executed, but you aren't stepped through it. Try using n instead of s.
  • /xx/ -- Searches through program for xx.
  • p -- Prints, for example p @namebits, p $name.
  • Enter -- Pressing the Enter key (inputting a carriage return) repeats the last n or s command.
  • perlcode -- You can type any perl code in and it will be evaluated, and have an effect on your program. In the example below I remove spaces from $name. Inputs are whatever follows each DB<n> prompt:

main::(db.pl:1):        $name=shift;
  DB<1> s
main::(db.pl:3):        print "Logon name creation program\n:Converting '$name'\n";
  DB<1> $name=~s/\s//g;

  DB<2> print $name
MarkGray
  DB<3>

There are many, many more debugger options which are worth becoming familiar with. Type h for a full list.


Logical Operators

Logical operators are such things as OR, NOT, AND. They all evaluate expressions. The expression evaluates to true, or false. Exactly what criteria for evaluation are used depends on the operator.

or

The or operator works as follows:

open STUFF, $stuff or die "Cannot open $stuff for read :$!";

This line means -- if the operation for opening STUFF fails, then do something else. Another example:

$_=shift;

/^R/ or print "Doesn't start with R\n";

If the regular expression is false, then whatever is on the right side of the or is executed, which here prints the message. As you know, shift works on @ARGV if no target is given, or @_ inside a subroutine.

Perl has two OR operators. One is the now familiar or and the other is ||.


Precedence: What comes First

To understand the difference between the two we need to talk about precedence. Precedence means priority, order, importance. A good example is:

perl -e"print 2+8"

which we know equals 10. But if we add:

perl -e"print 2+8/2"

Now, will this be 2+8 == 10, divided by 2 == 5? Or maybe 8/2 == 4, plus 2 == 6?

Precedence is about what is done first. In the example above, you can see that the division is done first, then the addition. Therefore, division has a higher precedence than addition.

You can force the issue with parens:

perl -e"print ((2+8)/2)"

which forces Perl, kicking and screaming, to evaluate 2+8 then divide the result by 2.
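The two calculations can be put side by side in a short script. This is only a sketch, and the variable names are mine:

```perl
use strict;
use warnings;

my $default = 2 + 8 / 2;      # division first: 8/2 is 4, plus 2 is 6
my $forced  = (2 + 8) / 2;    # parens first: 2+8 is 10, divided by 2 is 5

print "2+8/2   = $default\n";
print "(2+8)/2 = $forced\n";
```

The one-liner above has two sets of parens for a reason. If you write print (2+8)/2 then perl takes the first set of parens as the argument list of print, prints 10, and divides the return value of print by 2.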

So what has this to do with logical operators? Well, the main difference between or and || is precedence.

In the example below, we attempt to assign non-existent elements of an array to two variables. This will fail:

@list=qw(a b c);

$name1 =  $list[4] or "1-Unknown";

$name2 =  $list[4] || "2-Unknown";

print "Name1 is $name1, Name2 is $name2\n";

print "Name1 exists\n" if defined $name1;
print "Name2 exists\n" if defined $name2;

The output is interesting. The variable $name2 has been created and holds the fallback value "2-Unknown". However, $name1 is undefined. The reason is all about precedence. The or operator has a lower precedence than || .

This means or looks at the entire expression on its left hand side. In this case, that is $name1 = $list[4] . The assignment is carried out first, and or then tests its result. If the result is true, the right hand side is skipped. If it is false, as it is here, the right hand side is evaluated but nothing is done with its value. In the example above, all the right side evaluates is "1-Unknown" which may be true but doesn't produce any output, and $name1 keeps the undefined value it was assigned.

In the case of || , which has a higher precedence, only the code immediately on the left of the operator is tested. In this case, that is $list[4]. This is false, so the code immediately to the right is evaluated. But the code further to the left, $name2 = , is not forgotten: assignment has a lower precedence than || so it is applied afterwards to whatever || returns. Therefore, the expression evaluates to $name2 = "2-Unknown".
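Here is a hedged sketch of the same two assignments with the grouping spelled out in comments. It uses my and strict, which the example above does not, and it switches off one warning category because the first line throws a constant away on purpose.

```perl
use strict;
use warnings;
no warnings 'void';   # the first assignment deliberately discards a constant

my @list = qw(a b c);

my $name1 = $list[4] or "1-Unknown";   # grouped as (my $name1 = $list[4]) or "1-Unknown"
my $name2 = $list[4] || "2-Unknown";   # grouped as my $name2 = ($list[4] || "2-Unknown")

my $state1 = defined $name1 ? "defined as '$name1'" : "undefined";
my $state2 = defined $name2 ? "defined as '$name2'" : "undefined";

print "name1 is $state1\n";
print "name2 is $state2\n";
```

If you want perl itself to show the grouping, the B::Deparse module that ships with perl can help: running perl -MO=Deparse,-p script.pl prints the script back with the implied parens added.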

The example below should help clarify things:

@list=qw(a b c);

$ele1 = $list[4] or print "1 Failed\n";
$ele2 = $list[4] || print "2 Failed\n";

print <<PRT;
ele1 :$ele1:

ele2 :$ele2:

PRT

The two failure codes are both printed, but for different reasons. The first is printed because we are assigning $ele1 a false value, so the result of the operation is false. Therefore, the right hand side is evaluated.

The second is printed because $list[4] is itself false. Yet, as you can see, $ele2 exists. Any idea why?

The reason is that the result of print "2 Failed\n" has been assigned to $ele2. This is successful, and therefore returns 1.
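A small sketch makes the point visible by printing what ended up in the variable. Only the || line is shown, and my is added so that it runs under strict:

```perl
use strict;
use warnings;

my @list = qw(a b c);

# $list[4] is undefined, so || moves on to print, and the return value
# of print (1 when it succeeds) is what gets stored in $ele2.
my $ele2 = $list[4] || print "2 Failed\n";

print "ele2 holds: $ele2\n";
```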

Another example:

$file='not-there.txt';

open FILE, $file   || print "1: Can't open file:$!\n";

open FILE, $file   or print "2: Can't open file:$!\n";

In the first example, the error message is not printed. This is because || only tests what is immediately to its left, $file, which evaluates to true, so the print is skipped and the failure of open goes unreported. However, in the second example, or takes action on the result of evaluating the entire left hand side, which is the whole open call, and that fails.
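The sketch below records which of the two forms notices the failure. It assumes that not-there.txt really is absent from the current directory, and the two $reported variables are made up for the illustration.

```perl
use strict;
use warnings;

my $file = 'not-there.txt';   # assumed not to exist in the current directory

# With ||, the test applies to $file alone. $file is a true string, so
# the right side never runs and the failure of open goes unreported.
my $reported_by_pipes = 0;
open FILE, $file || ($reported_by_pipes = 1);

# With or, the test applies to the result of the whole open call.
my $reported_by_or = 0;
open FILE, $file or ($reported_by_or = 1);

print "|| reported the failure: ", ($reported_by_pipes ? "yes" : "no"), "\n";
print "or reported the failure: ", ($reported_by_or    ? "yes" : "no"), "\n";
```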

You can fix things with parens:

$file='not-there.txt';

open FILE, $file   || print "1: Can't open file:$!\n";

open FILE, $file   or print "2: Can't open file:$!\n";

open (FILE, $file) || print "3: Can't open file:$!\n";

like so, but why bother when you have a perfectly good operator in or ? You could apply parens elsewhere:

@list=qw(a b c);

$name1 =  $list[4]   or "1-Unknown";

($name2 =  $list[4]) || "2-Unknown";

print "Name1 is $name1, Name2 is $name2\n";

print "Name1 exists\n" if defined $name1;
print "Name2 exists\n" if defined $name2;

Now, ($name2 = $list[4]) is evaluated as a complete expression, not just $list[4], so we get exactly the same result as if we used or .

And

Now for something similar: And. Logical AND operators evaluate two expressions, and return true only if both are true. Contrast this with OR, which returns true if one or more of the two expressions are true. Perl has a few AND operators.

The first type of AND we will look at is && :

@list=qw(a b c);

print "List is:@list\n";

if ($list[0] eq 'x' && $list[2]++ eq 'd') {
	print "True\n";
	} else {
	print "False\n";
}

print "List is:@list\n";

The output here is False. It is clear that $list[0] does not equal x . As AND statements can only return true if both expressions being evaluated are true, then as the first statement is false this is an obvious non-starter and perl decides it need not continue to the second statement. Entirely sensible.

The second type of AND statement is & . This is similar to && . See if you can work out what the difference is using this example:

@list=qw(a b c);

print "List is:@list\n";

if ($list[0] eq 'x' & $list[2]++ eq 'd') {
	print "True\n";
	} else {
	print "False\n";
}

print "List is:@list\n";

The difference is that the second part of the expression is evaluated no matter what the result of the first part is. Despite the fact that the AND statement cannot possibly return true, perl goes ahead and evaluates the second part of the statement anyway, hence $list[2] ends up as d . Strictly speaking, & is Perl's bitwise AND operator, which is why it never takes the shortcut.
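If you would rather count than watch an array change, this sketch does the same comparison with a counter. The subroutine bump and the variable names are invented for the example.

```perl
use strict;
use warnings;

# bump is a hypothetical helper for this sketch: it counts how often it
# is called and always returns false.
my $calls = 0;
sub bump { $calls++; return 0 }

my $first = 0;    # stands in for a test that has already failed

$calls = 0;
my $logical = $first && bump();   # shortcut taken: bump() is never called
my $calls_with_logical = $calls;

$calls = 0;
my $bitwise = $first & bump();    # both sides evaluated: bump() is called
my $calls_with_bitwise = $calls;

print "&& called bump $calls_with_logical time(s)\n";
print "&  called bump $calls_with_bitwise time(s)\n";
```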

The third AND which we will look at is and . This behaves in the same way as && but is lower precedence. Therefore, all the guidelines about || and or apply.
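A minimal sketch of that precedence difference, using variables of my own choosing. One warning category is switched off because the second assignment leaves a value unused on purpose.

```perl
use strict;
use warnings;
no warnings 'void';   # the 'and' line deliberately leaves $no unused

my ($yes, $no) = (1, 0);

my $high = $yes && $no;    # grouped as my $high = ($yes && $no)
my $low  = $yes and $no;   # grouped as (my $low = $yes) and $no

print "with &&  the variable holds $high\n";
print "with and the variable holds $low\n";
```

So && is the one to use when you want the combined result stored, and and is better kept for deciding whether a second statement should run at all.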

Other Logical Operators

Perl has not , which works like ! except for low precedence. If you are wondering where you have seen ! before, what about:

$x !~/match/;

if ($t != 5) {

as two examples. There is also Exclusive OR, or XOR. This means:

  • If exactly one expression is true, XOR returns true
  • If both expressions are false, XOR returns false
  • If both expressions are true, XOR returns false (the crucial difference from OR)

This needs an example. Jane and Sonia are two known troublemakers, with a reputation for throwing good beer around, going topless at inappropriate moments and singing out of tune to the karaoke machine. You only want to let one of them into your party, and instead of a big muscle-bound bouncer you have this perl script on the door:

($name1,$name2)=@ARGV;

if ($name1 eq 'Jane' xor $name2 eq 'Sonia') {
	print "OK, allowed\n";
} else {
	print "Sorry, not allowed\n";
}

I would suggest running it thus:

perl script.pl Jane Karen (one true, one false)
perl script.pl Jim Sonia (one true, one false)
perl script.pl Jane Sonia (both true)
perl script.pl Jim Sam (both false)
   

Well, the script is not perfect as a doorman, as all Jane and Sonia have to do is type their names in lowercase, but hopefully it demonstrated xor .
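If typing four command lines sounds like hard work, the four suggested runs can be checked in one go. This is only a sketch: the subroutine allowed is a made-up wrapper around the same test, taking the names as arguments instead of reading @ARGV.

```perl
use strict;
use warnings;

# allowed is a hypothetical wrapper around the doorman test above.
sub allowed {
    my ($name1, $name2) = @_;
    # The parens are needed because xor has a lower precedence than =
    my $ok = ($name1 eq 'Jane' xor $name2 eq 'Sonia');
    return $ok ? 1 : 0;
}

for my $pair (['Jane', 'Karen'], ['Jim', 'Sonia'],
              ['Jane', 'Sonia'], ['Jim', 'Sam']) {
    my $verdict = allowed(@$pair) ? "OK, allowed" : "Sorry, not allowed";
    print "@$pair: $verdict\n";
}
```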

One thing to beware of is:

$_=shift;

print "OK\n" unless not(!/r/i || /o/i & /p/ or /q/);

over-complication, and believe me the above is not as complicated as it could be. Take the time to understand what you want to do. Perl provides a plethora of logical operators so you really don't have any excuse for not writing legible code. The above can be written a lot more concisely and clearly. As well as a lot more obscurely :-)
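As one possible tidy-up, the sketch below puts the tangled test next to a plainer version and runs both over a few sample words so you can compare the answers. The subroutine names and the word list are mine, and the rewrite is a suggestion rather than the only way to do it.

```perl
use strict;
use warnings;

# original_test mirrors the tangled statement above; clearer_test is one
# possible rewrite. Both names are made up for this sketch.
sub original_test {
    local $_ = shift;
    my $ok = 0;
    $ok = 1 unless not(!/r/i || /o/i & /p/ or /q/);
    return $ok;
}

sub clearer_test {
    local $_ = shift;
    # OK if there is no r of either case, or both an o and a lowercase p,
    # or a lowercase q.
    my $ok = (!/r/i || (/o/i && /p/) || /q/);
    return $ok ? 1 : 0;
}

for my $word (qw(hello robert rope quark drop RQ)) {
    printf "%-7s original=%d clearer=%d\n",
        $word, original_test($word), clearer_test($word);
}
```

The rewrite drops the double negative of unless not, and swaps & for && since nothing here depends on both matches being run.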

@ARGV

Last words

I hope you have enjoyed this tutorial and learnt something from it. I would appreciate an email letting me know how it could be improved. What you have learnt is just a fraction of Perl's functionality, but you'll find skills like regexes can be applied in many other places than Perl.

Good luck.

--
Robert


Thanks to...

Everyone that helped in the development of this tutorial. I do read all the feedback emails, but don't always action them the same year. What you have just read is better because of the people below. They fix the bugs, scream when they don't understand and I rewrite whole sections. Documents like this are written by the authors, but polished by the readers.

The roll of honour is, in a semi-chronological order:

  • Mark Miller for his long email suggesting improvements and highlighting typos. I cringed when I realised what I'd let through :-(
  • Roland to whom I am eternally grateful for sending in many typo reports, and pointing out where he didn't understand an explanation.
  • Katya de Vries for finding HTML errors and problems with the example code.
  • Steven Ham for being picky about spelling errors. Good going, considering English is his second language !
  • Carlos Jaramillo Uribe for pointing out where I could have explained postincrements and regex a little better and for pointing out a typo or two.
  • Sergio Polini who brought an interesting aspect of Perl's behaviour with arrays to my attention, and helped to improve parts of the Regex section.
  • Leo Durocher for telling me he had trouble with the regex section. If he did, I'm sure many others did too.
  • Paul Trafford for solving the Them/Us problem I was too lazy to bother with, and doing it so elegantly.
  • Eric Smith who was one of many people who made me a table of contents rather than just tell me I should include one. I never used any of them, and the one you see now is auto-generated by a program written in Java (only kidding, it's not auto-generated :-)
  • Mike Conkin who said he didn't understand $^I. Good point. I'd forgotten to explain it at all. Mike went on to list several other areas I could do with improving in one of the most amusing and useful missives I've had on the tutorial. Thanks.
  • Vasile Calamuti who picked up on my use of join before I'd explained it, and a couple more oversights.
  • Didier Owono for pointing out my original explanation of /ee didn't make sense. Hopefully the second version does.
  • Keen Meng Lew and Ever Olano who, independently (I assume) picked up exactly the same two typos. Which are now fixed.
  • Anna in Ohio who sent a polite email with a few errors she picked up on.
  • Ken Teuchler for knowing the difference between = and =~, and for his long list of improvements which varied from grammar errors to style suggestions to oversights. A huge help.
  • cookie, firstly for his Win9x experiments and error checks about my explanation of scoping. Secondly for his many subsequent emails pointing out minor problems which elevated him to status of #1 bugfixer. Appreciated.
  • Ginny for spotting an errant ; which in the best tradition of teachers I have changed into an exercise for debugging, of course I meant to leave it out in the first place. I should also point out that a major motivation for me to put the effort into this tutorial is the appreciation of the userbase, and Ginny sent me a particularly motivational missive.
  • Jeffery Jackson for noticing my error about 0-based arrays.
  • Kevin Haskins for pointing out Notepad's limitations and an equality issue.
  • Pat McCarthy for picking up a small typo.
  • Bob Kauten who noticed that I hadn't explained the range operator properly. I blame....well, me really.
  • Ayhan Tuncer for picking up a mistake where I'd carelessly cut and pasted pasted pasted. The next day Michael Kersey found the exact same error, before I'd had a chance to fix it. Ayhan also found quite a few more errors after that one during her work on the Turkish translation.
  • Ray Price who was another one who found the above error, and a couple more typos as well.
  • Henry Vermeulen, a Dutch chap who noticed I'd mispelled Heineken. Nothing to do with Perl, just one of my outlandish examples.
  • Everyone that has ever worked on perl, all the hackers on the perl-win32* mailing lists, ActiveState and the netizens of clpm.

The original location of this document is:
http://www.netcat.co.uk/rob/perl/win32perltut.html


This tutorial is copyright 1997, 1998, 1999 by Robert Pepper. Reproduction in whole or part is prohibited. Please contact me if you want to use this information anywhere. Thank you.
--
Robert Pepper     mailto:Robert@netcat.co.uk