[Melbourne-pm] basic regex - char array vs "pipe brackets"

Paul Fenwick pjf at perltraining.com.au
Thu Mar 5 18:35:33 PST 2009


Christopher Short wrote:

> Turns out the long-term perl developer who wrote that code always uses
> "pipe brackets" instead of character arrays.
> Do they really function identically?

Conceptually, for the example given, they work the same.  But under the
hood?  They're completely different, and if you work with large amounts of
data, that difference matters a great deal for speed.

Let's see what 'use re qw(debug)' has to say.  This is under Perl 5.10:

$ perl -wne'use re qw(debug); /^(\w|\@\.|\-)+$/;'
Compiling REx "^(\w|\@\.|\-)+$"
Final program:
   1: BOL (2)
   2: CURLYX[0] {1,32767} (17)
   4:   OPEN1 (6)
   6:     BRANCH (8)
   7:       ALNUM (14)
   8:     BRANCH (FAIL)
   9:       TRIE-EXACT[\-@] (14)
            <@.>
            <->
  14:   CLOSE1 (16)
  16: WHILEM[1/1] (0)
  17: NOTHING (18)
  18: EOL (19)
  19: END (0)



$ perl -wne'use re qw(debug); /^[\w@.-]+$/;'
Compiling REx "^[\w@.-]+$"
synthetic stclass "ANYOF[\-.0-9@-Z_a-z+utf8::IsWord]".
Final program:
   1: BOL (2)
   2: PLUS (15)
   3:   ANYOF[\-.0-9@-Z_a-z+utf8::IsWord] (0)
  15: EOL (16)
  16: END (0)


Without going into the guts too much, the important things to note here are
that the first example both does capturing (which takes memory and time)
and uses branches (which take more time when matching your string).  The
second example builds an 'ANYOF' structure, which is MUCH faster[1].
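
To make the capturing point concrete, here is a minimal sketch (not taken
from the code under discussion) that shows what ends up in $1 for each
spelling.  The helper name last_capture and the sample address are
hypothetical.  One assumption to flag: the alternation as quoted contains
\@\. as a single two-character branch (visible as <@.> in the debug
output), so the sketch lists \@ and \. as separate branches, on the guess
that this was the intent.

```perl
use strict;
use warnings;

# Three spellings of "one or more word characters, '@', '.' or '-'".
# ASSUMPTION: '@' and '.' are separate alternatives (see note above).
my $capturing     = qr/^(\w|\@|\.|-)+$/;
my $non_capturing = qr/^(?:\w|\@|\.|-)+$/;
my $char_class    = qr/^[\w\@.-]+$/;

# Hypothetical helper: returns whatever group 1 holds after a successful
# match, or undef if the match failed or the pattern has no group 1.
sub last_capture {
    my ($regex, $text) = @_;
    return undef unless $text =~ $regex;
    return $1;
}

my $sample = 'pjf@example.com';    # made-up input
for my $pair ( [ capturing     => $capturing ],
               [ non_capturing => $non_capturing ],
               [ char_class    => $char_class ] ) {
    my ($name, $regex) = @$pair;
    my $got = last_capture($regex, $sample);
    printf "%-14s \$1 = %s\n", $name, defined $got ? "'$got'" : 'undef';
}
```

All three patterns accept the sample, but only the capturing form has to
keep track of $1, which it overwrites on every trip round the loop and
leaves holding the last character matched.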

So not only are character classes (square-brackets) more compact and easier
to read, but they go faster, too!

All the very best,

	Paul


[1] On my mailserver, with Perl 5.8 and Benchmark, using a square-bracket
character class approximately doubles the rate at which Perl finds all
occurrences that match the regexp in /var/mail/mail.log, as compared to
capturing round brackets.  Using non-capturing round brackets instead
results in only about a 10% speed improvement.  Your mileage may vary.
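
For anyone who wants to try a comparison like this themselves, the
following is a rough sketch using the core Benchmark module.  It is not
the script behind the numbers above: it runs over synthetic in-memory
lines instead of a real mail log, the names @lines, %candidate and
count_matches are made up for illustration, and the alternation again
assumes '@' and '.' are separate branches.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption; substitute lines from a real log file).
my @lines = map { "user$_\@host-$_.example.com" } 1 .. 2000;
push @lines, map { "not a match $_ !" } 1 .. 2000;

# The three spellings being compared.
my %candidate = (
    capturing     => qr/^(\w|\@|\.|-)+$/,
    non_capturing => qr/^(?:\w|\@|\.|-)+$/,
    char_class    => qr/^[\w\@.-]+$/,
);

# Count how many of the input lines a given regexp matches.
sub count_matches {
    my ($regex) = @_;
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ $regex;
    }
    return $count;
}

# Wrap each candidate in a closure so Benchmark can time it.
my %timed;
for my $name (keys %candidate) {
    my $regex = $candidate{$name};
    $timed{$name} = sub { count_matches($regex) };
}

# A negative count asks Benchmark to run each entry for at least that
# many CPU seconds, then print a table of rates and relative speeds.
cmpthese(-1, \%timed);
```

The table printed by cmpthese shows iterations per second for each
spelling and the percentage difference between them.  The figures depend
heavily on the Perl version and on the input, so treat any single run as
indicative only.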

-- 
Paul Fenwick <pjf at perltraining.com.au> | http://perltraining.com.au/
Director of Training                   | Ph:  +61 3 9354 6001
Perl Training Australia                | Fax: +61 3 9354 2681

