[Neworleans-pm] Fwd: Solutions and Discussion for Perl Quiz of the Week #20 (Expert Edition)

E. Strade, B.D. estrabd at yahoo.com
Mon Aug 9 17:13:49 CDT 2004




=====
http://www.brettsbsd.net/~estrabd

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

----- Original message -----
From: "Bradley K. Embree" <bke at bkecc.com>
To: perl-qotw at plover.com
Date: Mon, 09 Aug 2004 13:05:37 -0400
Subject: Solutions and Discussion for Perl Quiz of the Week #20 (Expert
Edition)


Sample solutions and discussion
Perl Expert Quiz of The Week #20 (20040729)

        This week's quiz is to write a limited replacement for the
        CPAN 'MLDBM' module.  The purpose of MLDBM is to allow storage
        of complex Perl data structures in a DBM file.

        The typical DBM implementation (such as SDBM_File, which comes
        standard with Perl) allows the programmer to manipulate a disk
        database as if it were an ordinary hash variable.  One says
        something like

                tie %hash, 'SDBM_File', $filename;

        and thereafter,

                $hash{"key"} = "value";

        stores "value" in the disk file under the key "key", and

                $x = $hash{"key"};

        retrieves the value again.  But most DBM implementations are
        limited to storing plain strings as values.  Attempting to
        store undef as a value actually stores the empty string
        instead, so that

                $hash{"key"} = undef;
                print defined($hash{"key"}) ? "oops" : "OK";

        prints "oops".  Similarly, attempting to store a compound
        object such as

                [1, 2, 3]

        or

                {name => "bill", numbers => [1, 2, 3] }

        actually stores a useless string like "ARRAY(0x436c1d)" or
        "HASH(0x436c1d)".  Similarly,

                $hash{"this"}{"that"} = "ouch";

        causes a "strict refs" failure, if you have "strict refs"
        checking on, and a bizarre error if not.  (Sub-quiz: Why?)

        The purpose of MLDBM is to work around this.  (See
        http://search.cpan.org/~chamas/MLDBM-2.01/lib/MLDBM.pm for
        complete details.)  Your task this week is to write a version
        of MLDBM.  It should be a tied hash class named 'ML_DBM', such
        that after

                tie %mldbm, 'ML_DBM', \%hash;

        the user can access %mldbm as if it were an ordinary hash, but
        all data apparently stored in %mldbm is actually stored in
        %hash instead.

        For example, this must work:

                $mldbm{'array'} = [5,6,7];
                print $mldbm{'array'}[2];    # This must print 7


        as must this:

                $mldbm{'hash'} = { I => { like => "pie" } }
                print $mldbm{'hash'}{'I'}{'like'};    # This must print
                "pie"

        and this:

                $mldbm{'a'}{'a2'} = 'AA';
                $z = $mldbm{'a'};
                $z->{'b2'} = "BB";
                print $mldbm{'a'}{'b2'};        # This must print "BB"


        There is a trivial solution to this, which is just to store
        the values into %hash and fetch them back as requested.
        However, this quiz has a restriction which rules out that
        solution: The values that ML_DBM stores in %hash must always
        be ordinary strings.

        The idea is that the user can tie %hash to their favorite DBM
        implementation, and then use that as the backing hash for
        ML_DBM:

                tie %hash => 'NDBM_File', $filename or die ...;
                tie %db => 'ML_DBM', \%hash;

        and ML_DBM will (unknowingly, but faithfully) store the data
        into the underlying disk file via %hash.  ML_DBM must restrict
        the values it stores into %hash to ordinary strings, or else
        the underlying DBM implementation (NDBM_File in this case)
        will not faithfully preserve them.

----------------------------------------------------------------

First off, apologies for the late summary.

        [ Most of the lateness in the report is my fault, not
          Bradley's; he sent me his postmortem several days ago, and I
          held onto it for a while because I was playing around with
          the problem some more.  - MJD ]

Executive Summary:

See http://search.cpan.org/~pinyan/MLDBM-Easy-0.01/lib/MLDBM/Easy.pm

My Aborted Attempt:

Since I had never played with tie'ing I thought I would make a go of
this quiz (although MJD warned me it would be a difficult one). [ It
turned out to be more difficult than I realized.  - MJD ] My first
stop was:

    perldoc perltie

which I perused for several minutes gathering what information I
thought necessary for my attempt. Armed with my newfound knowledge I
chose to focus on something mentioned in the quiz:

    'There is a trivial solution to this, which is just to store the
    values into %hash and fetch them back as requested.'

I decided that this would be my first goal, simply to implement the
trivial
solution. I quickly coded up a (far from complete) MyHash.pm that did
just
that:

    package MyHash;

    use strict;
    use warnings;

    sub TIEHASH {
        my ( $class, $hash ) = @_;

        return bless $hash, $class;
    }

    sub FETCH {
        my ( $self, $key ) = @_;

        return $self->{$key};
    }

    sub STORE {
        my ( $self, $key, $value ) = @_;

        $self->{$key} = $value;
    }

    package main;

    my ( %hash, %tied );

    tie %tied, 'MyHash', \%hash;

    $tied{'foo'}->{'bar'}->{'baz'} = 'test';
    print $tied{'foo'}->{'bar'}->{'baz'}, "\n";

Ok, that worked. My next step was to make use of Data::Dumper to see
what was actually being passed to STORE:

    sub STORE {
        my ( $self, $key, $value ) = @_;

        print Dumper \@_;

        $self->{$key} = $value;
    }

I assumed it would be a reference to a data structure that would have
to be serialized before being stored. Then it would have to be
unserialized (deserialized?) in the FETCH method. I had even gone so
far as deciding to try and use YAML for the serialization. Imagine my
confusion upon seeing this output from 'perl MyHash.pm':

    $VAR1 = [
              bless( {}, 'MyHash' ),
              'foo',
              {}
            ];
    test

'So the value passed to STORE is not yet fleshed out?' I thought to
myself.  'But how can you serialize data that is not there?'. The next
thought was: 'Now what?'

So I decided to actually visit

        http://search.cpan.org/~chamas/MLDBM-2.01/lib/MLDBM.pm

as recommended in the quiz. There I quickly learned of the limitations
of MLDBM with regards to working with references.

I decided that I could implement a solution that worked but was under
the same restrictions as MLDBM. I pseudo-coded until I was satisfied
that I could at least achieve that modest end. I then turned my
thoughts to ways of overcoming those restrictions. In all honesty
there was not much there. I briefly tried to think of a way to turn:

    $tied{'foo'}->{'bar'}->{'baz'} = 'test';

into:

    # where | is some arbitrary separator character
    %hash = ( 'foo'         => 'foo|bar',
              'foo|bar'     => 'foo|bar|baz',
              'foo|bar|baz' => 'test' );

The only thing I could think of was to make use of tie'd hashes that
would know their 'namespace' and define a STORE method that used the
'namespace' to store the value given. Then my brain exploded and I
gave up for the night.

That is as far as I got and I should now turn to submissions that
actually solved the given problem.


Submissions:

I counted three different submissions, two from Randy W. Sims and one
from Michael C. Toren. One of the submissions from Randy W. Sims has
the same restrictions as MLDBM (with the exception of correctly
storing an undef value) and so I will only consider his second
submission.

The similarity between the two (fully conformant) submissions is that
both make use of a 'frontend' hash that allows the serialization to be
delayed until the data structures being stored are fully fleshed
out. After this delay the serialized data is then stored in the
'backend' hash that was passed to tie.  Where the two solutions differ
is in how the data is serialized as well as how long the serialization
is delayed.

Michael C. Toren implemented his own serialization technique while
Randy W.  Sims made use of Data::Dumper. I have not spent enough time
looking at Michael C.  Toren's serialization code to feel confident in
making any comparisons between it and Data::Dumper.

Randy W. Sims' submission delays the serialization of data structures
until the next operation on the tied hash while Michael C. Toren's
submission delays the serialization until the hash is untied or
destroyed. From my understanding of their code Randy W. Sims' method
has an advantage since the frontend hash will never contain the entire
contents of the backend hash unlike Michael C.  Toren's submission.

Regardless of their differences, both submissions from Randy W. Sims
and Michael C. Toren solved the given quiz
successfully. Congratulations to them.

----------------------------------------------------------------

Sub-quiz:

        Similarly, attempting to store a compound object such as

                [1, 2, 3]

        or

                {name => "bill", numbers => [1, 2, 3] }

        actually stores a useless string like "ARRAY(0x436c1d)" or
        "HASH(0x436c1d)".  Similarly,

                $hash{"this"}{"that"} = "ouch";

        causes a "strict refs" failure, if you have "strict refs"
        checking on, and a bizarre error if not.  (Sub-quiz: Why?)

----------------------------------------------------------------

I did not see anyone comment on the given sub-quiz but here are my
observations:

The code:
    use strict;
    use warnings;

    use Fcntl;   # For O_RDWR, O_CREAT, etc.
    use SDBM_File;

    my %h;
    tie( %h, 'SDBM_File', 'filename', O_RDWR|O_CREAT, 0666 )
        or die "Couldn't tie SDBM file 'filename': $!; aborting";

    $h{"this"}{"that"} = "ouch";

gives an error:

    Can't use string ("HASH(0x1abedc4)") as a HASH ref while "strict
    refs"
        in use at SDBM_test.pl line 11.

which seems to be self-explanatory. Removing the 'use strict' did not
produce a bizarre error as was indicated it should so I cannot comment
on that part of the sub-quiz (perhaps my version of Perl affects the
expected error?).


[ The "bizarre error" is not a visible failure; it's an invisible
  erroneous behavior.    

  Perl has a feature called a "symbolic reference", in which an
  ordinary string, say "foo", can be used as if it were a reference.
  The result of using "foo" in a place where Perl expects an array
  reference is that Perl silently treats the string as though it were
  a reference to the array @foo.  Similarly, if "foo" is used as if it
  were a hash reference, it behaves as though it were a reference to
  %foo.  

  In advanced applications, this feature can be very useful, but it is
  also the cause of many common error that are extremely difficult to
  track down.  Let's consider this simple function, which is something
  like Perl's built-in "push" operator:

        sub my_push {
          my ($aref, @items) = @_;
          push @$aref, @items;
        }

  Normally, 'my_push' should be invoked like this:

        my_push(\@my_array, 1, 2, 3);
        print "@my_array";   # prints 1 2 3

  which pushes 1, 2, 3 onto the end of @my_array.

  This works also:

        my $aref = \@my_array;
        my_push($aref, 1, 2, 3);
        print "@my_array";   # prints 1 2 3

  But this does not:

        my $aref = \@my_array;
        my_push("$aref", 1, 2, 3);
        print "@my_array";   # prints NOTHING!

  The quotation marks around $aref turn $aref from an array reference
  into a string, something like "ARRAY(0x436c1d)".  'my_push' then
  treats this string as though it were a reference, but it isn't.
  Perl pushes the 1, 2, 3 not into @my_array but into the array with
  the extremely bizarre name @"ARRAY(0x436c1d)".  The programmer
  thought they were storing the 1, 2, 3 into a known place, but
  actually the 1, 2, 3 are now lost, stored into a mystery bizarro
  array.  

  This is exactly the kind of error that "strict refs" was created to
  diagnose.

  In Pr. Embree's example code, this error is exercised:

    my %h;
    tie( %h, 'SDBM_File', 'filename', O_RDWR|O_CREAT, 0666 )
        or die "Couldn't tie SDBM file 'filename': $!; aborting";

    $h{"this"}{"that"} = "ouch";

  Perl creates a new, fresh hash, {}, and tries to store a reference
  to it into the SDBM file under the key "this".  But doing this
  convers the reference to a string, and what is actually stored in
  the file is the string "HASH(0x1abedc4)".

  Perl then fetches this value back and uses it as a reference to a
  hash, in which "ouch" is stored under key "that".  In an ordinary
  hash, this works properly.  But with the SDBM file, what is used as
  a hash reference is the string "HASH(0x1abedc4)".  If "strict refs"
  is on, using this string as if it were a hash reference is a fatal
  error.  With "strict refs" disabled, Perl stores the string "ouch"
  into ${"HASH(0x1abedc4)"}{"that"}.  There is a good chance that this
  data will be lost for ever.  Even if it isn't, it is certainly not
  what the programmer intended.  That is the "bizarre error".

  My thanks to Pr. Embree and also to everyone who contributed,
  whether visibly or not.  

    -- MJD.
]



More information about the NewOrleans-pm mailing list