[sf-perl] Thinking about pseudo-randomness

Joe Brenner doom at kzsu.stanford.edu
Tue Mar 31 13:38:38 PDT 2009


yary <not.com at gmail.com> wrote:

> Thanks for putting the slides on-line, lots of interesting thoughts in there.
>
> Since Randall chimed in with his bias of bias observation, I'll chime
> in with what I thought when reading the scramble_case code-
>
> @chars = split /(?=.)/, $string;
>
> . doesn't match newlines (unless you tell it to), and it doesn't
> matter, since nothing is lost anyways.
>
> "I thought the canonical way of splitting a string into its component
> chars was 'split //'" was my next thought.

Yes, precisely.  I have no idea why I used that pattern in there ten
years ago.  I verified that it's functionally identical to "split //"
(on single-line strings, at least) but left it as it was -- if I
hadn't been doing this as a lightning talk I would've invited comments
on that sort of thing.
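
For anyone who wants to repeat that check, here is a minimal sketch.
The helper name split_both is made up for illustration and is not
from the slides:

```perl
use strict;
use warnings;

# Hypothetical helper: split the same string both ways for comparison.
sub split_both {
    my ($string) = @_;
    my @plain     = split //,      $string;   # one element per character
    my @lookahead = split /(?=.)/, $string;   # split before anything "." matches
    return ( \@plain, \@lookahead );
}

my ( $plain, $lookahead ) = split_both("abc");
print "single line:  @$plain | @$lookahead\n";

( $plain, $lookahead ) = split_both("a\nb");
printf "with newline: %d vs %d elements\n", scalar @$plain, scalar @$lookahead;
```

On a string with an embedded newline the lookahead version leaves the
newline attached to the character before it, so the two lists differ
in length, but joining either one gives back the original string.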

> Then I thought, "Only alphas have case, non-alphas don't have to be
> separated out"- so
>
> @chars = split /(?<=[[:alpha:]])/, $string;
>
> could conceivably save a few cycles. maybe. It also preserves the
> first-letter-case bias when the string begins with non-alpha chars.

Interesting... it's a slightly odd split that works like this:

          't',
          'h',
          'e',
          ' l',
          'a',
          'w',
          's',
          ' f',
          'o',
          'r',
          ' y',
          'o',

My first thought was that there would be a problem with leaving the
whitespace at the start of each substring, but of course "uc" and "lc"
act on the entire string, so there's no problem there.
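
A small sketch of that behavior, using a made-up sample string and a
hypothetical helper name (alpha_chunks):

```perl
use strict;
use warnings;

# Split after every alphabetic character, so any non-alphas ride along
# at the front of the next chunk.
sub alpha_chunks {
    my ($string) = @_;
    return split /(?<=[[:alpha:]])/, $string;
}

my @chunks = alpha_chunks("the laws for you");
print join( ',', map { "'$_'" } @chunks ), "\n";

# uc leaves the leading space alone and upcases the letter after it.
print "'", uc( $chunks[3] ), "'\n";
```

The fourth chunk is ' l', and uc turns it into ' L', so the leading
space costs nothing beyond being carried along.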

Of course, the efficiency of this routine doesn't really matter that
much (it only takes about three seconds for me to process an entire
Rafael Sabatini novel), but what the hell, let's see if the more unusual
pattern match gains us something by making fewer uc/lc calls...

Not bad, looks like almost a 15% cut in CPU time against "split //"
(12.85 vs. 15.04 seconds):

  Benchmark: timing 10 iterations of alpha_split, silly_split, standard_split...
  alpha_split: 13 wallclock secs (12.84 usr +  0.01 sys = 12.85 CPU) @  0.78/s (n=10)
  silly_split: 16 wallclock secs (15.44 usr +  0.01 sys = 15.45 CPU) @  0.65/s (n=10)
  standard_split: 15 wallclock secs (15.04 usr +  0.00 sys = 15.04 CPU) @  0.66/s (n=10)




sub silly {
  open my $fh, '<', $file or die "Can't open $file: $!";
  while ( my $line = <$fh> ) {
    my $ret = scramble_case_silly( $line );  # @chars = split /(?=.)/, $string;
  }
}

sub standard { # as usual
  open my $fh, '<', $file or die "Can't open $file: $!";
  while ( my $line = <$fh> ) {
    my $ret = scramble_case( $line );        # @chars = split //, $string;
  }
}

sub alpha {
  open my $fh, '<', $file or die "Can't open $file: $!";
  while ( my $line = <$fh> ) {
    my $ret = scramble_case_alpha( $line );  # @chars = split /(?<=[[:alpha:]])/, $string;
  }
}
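
The timing lines above are in the format printed by timethese from the
standard Benchmark module.  A self-contained sketch of such a harness
follows; scramble_with and run_pattern are hypothetical stand-ins (the
real scramble_case routines aren't shown here), and the input is a
made-up in-memory sample rather than a novel read from $file, so the
numbers it prints won't match the ones above:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-in for the scramble_case variants (not the real code):
# split on the given pattern, then randomly upcase or downcase each chunk.
sub scramble_with {
    my ( $pattern, $string ) = @_;
    my @chunks = split $pattern, $string;
    return join '', map { rand() < 0.5 ? uc($_) : lc($_) } @chunks;
}

# Made-up sample input standing in for the lines of a file.
my @lines = ("The laws for you and me.\n") x 500;

# Run one pattern over every line, like the while loops above.
sub run_pattern {
    my ($pattern) = @_;
    scramble_with( $pattern, $_ ) for @lines;
    return;
}

timethese( 20, {
    silly_split    => sub { run_pattern(qr/(?=.)/) },
    standard_split => sub { run_pattern(qr//) },
    alpha_split    => sub { run_pattern(qr/(?<=[[:alpha:]])/) },
} );
```

With input this small the run is very short, so Benchmark may warn
about too few iterations; a larger sample or a higher count would be
needed for a meaningful comparison.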


