[Chicago-talk] Parsing a Hex file

Steven Lembark lembark at wrkhors.com
Fri Jan 19 10:40:33 PST 2007


> Thanks Andrew, I read the perldoc perlpacktut, and followed it for a
> while, but got sort of lost at the end (not having ever worked with
> this type of data).
>
> In my case I don't know (ahead of time) how many fields there are.  I
> need to read until the end of the file.  Do I need to loop, keeping
> track of where I am in the file, or is there some way, with unpack
> and "/", to have it spit everything out into an array without
> predefining the number of fields?

Ah, ya gotta love self-defining records :-) The trick
below can be extended to deal with smaller chunks by
having multiple sets of formats for the various sizes.
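As for the "/" question itself: when the whole record fits one template, unpack can handle an unknown count on its own. A "*" repeat count consumes as much of the string as it can, and "/" takes its repeat count from a value already unpacked from the data. The sample data below is made up for illustration; this is a minimal sketch, not a substitute for the per-packet loop when formats vary.

```perl
use strict;
use warnings;

# '*' as a repeat count: unpack as many 16-bit big-endian values
# as the string holds, without knowing the count in advance.
my $words = pack 'n*', 10, 20, 30, 40;
my @all   = unpack 'n*', $words;                # (10, 20, 30, 40)

# '/' takes the repeat count from the data itself: here a one-byte
# count (3) followed by that many 16-bit values.
my $counted_bytes = pack 'C n3', 3, 7, 8, 9;
my @counted       = unpack 'C/n', $counted_bytes;   # (7, 8, 9)

# Combining the two: a run of length-prefixed strings (count byte
# plus that many characters), unpacked until the data runs out.
my $string_bytes = join '', map { pack 'C/a*', $_ } 'hi', 'abc', 'hello';
my @strings      = unpack '(C/a)*', $string_bytes;  # ('hi', 'abc', 'hello')

print "@all | @counted | @strings\n";
```

The group form "(C/a)*" is the closest thing to "spit it all out into an array"; once records of different types are mixed together, you are back to peeking at a header and dispatching per packet.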

Easiest way is to snag the data in chunks (say, a page),
then use unpack to fondle the first few bytes and decide
what to do with the packet. Once you know how much data
the packet takes, process it, then chop it off the front
of the buffer:

  substr $buffer, 0, $bytes, '';

When the buffer falls below, say, 4K, read another page;
repeat until there is nothing left to slurp.
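The refill-and-chop cycle can be sketched as below. The packet layout (a 2-byte big-endian length followed by that many payload bytes) and the read_packets name are assumptions made up for this example. Two details carry the weight: read's fourth argument (OFFSET) appends the new page to the end of the buffer instead of overwriting it, and four-argument substr removes the finished packet so the next one starts at offset zero. Unlike the fixed 4K threshold above, this variant refills on demand whenever the buffer is shorter than the piece it needs next.

```perl
use strict;
use warnings;

# Hypothetical packet layout, assumed for this sketch only: a 2-byte
# big-endian length N followed by N bytes of payload.
sub read_packets {
    my ($fh, $page) = @_;
    my ($buffer, $eof) = ('', 0);
    my @payloads;

    # Append one page to the END of $buffer: read's fourth argument
    # is the offset at which the new data is stored.
    my $refill = sub {
        my $got = read $fh, $buffer, $page, length $buffer;
        die "read failed: $!" unless defined $got;
        $eof = 1 if $got == 0;
    };

    for (;;) {
        $refill->() while !$eof && length $buffer < 2;
        last if $buffer eq '';                      # clean end of data
        die "truncated header\n" if length $buffer < 2;

        my $size = unpack 'n', $buffer;             # peek at the header
        $refill->() while !$eof && length $buffer < 2 + $size;
        die "truncated packet\n" if length $buffer < 2 + $size;

        push @payloads, substr($buffer, 2, $size);  # "process" the packet
        substr($buffer, 0, 2 + $size, '');          # next one at offset 0
    }
    return @payloads;
}

# Usage with an in-memory filehandle and a deliberately small page,
# so some packets straddle page boundaries.
my $data = join '', map { pack 'n/a*', $_ } 'first', 'second packet', 'x' x 300;
open my $fh, '<', \$data or die "open: $!";
binmode $fh;
my @got = read_packets($fh, 64);
print scalar(@got), " packets\n";
```

The same loop works on a slurped file: with the whole file already in $buffer, the refills simply hit EOF immediately.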

Obviously, if the whole file fits into core, slurp it and
use the same process to walk down the thing.

The point is to keep the current packet at offset zero
so that you can deal with it as a single piece.

If there are a reasonable number of fixed formats, you can
use something like:

  my $headerformat = '....';

  my %packformatz =
  (
    $byte1  => [ $size1, $format1 ],
    ...
  );

  # if the determining byte is in a different header
  # value then change $header[0] to match it.

  my $buffer = '';

  for(;;)
  {
    # kwikhak for appending another chunk of data
    # to the buffer. this'll append nada to the
    # buffer when you reach EOF.

    read $fh, $buffer, 4096, length $buffer
    if length $buffer < 4096;

    last unless length $buffer;

    my @header = unpack $headerformat, $buffer;

    my $entry = $packformatz{ $header[0] }
    or die "Ack: formatless header value!", @header;

    my( $size, $format ) = @$entry;

    my @rest    = unpack $format, $buffer;

    substr $buffer, 0, $size, '';
  }
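For something you can actually run, here is the same loop with the elided formats replaced by two made-up record types; the type numbers, sizes, templates, and the parse_records wrapper are all hypothetical. It keeps the page-at-a-time refill, looks up [ size, format ] by the first header value, and assumes every record is much smaller than one 4096-byte page, so a record never needs more than one refill.

```perl
use strict;
use warnings;

# Hypothetical record types, made up for this sketch:
#   type 1: type byte + 32-bit big-endian counter          (5 bytes)
#   type 2: type byte + 16-bit big-endian id + 4-byte tag  (7 bytes)
# Both are assumed to be much smaller than one 4096-byte page.
my $headerformat = 'C';
my %packformatz  = (
    1 => [ 5, 'C N'    ],
    2 => [ 7, 'C n a4' ],
);

sub parse_records {
    my ($fh) = @_;
    my $buffer = '';
    my @records;

    for (;;) {
        # Top the buffer up by one page when it runs low; at EOF
        # read appends nothing and the buffer just drains.
        if (length $buffer < 4096) {
            defined read($fh, $buffer, 4096, length $buffer)
                or die "read failed: $!";
        }
        last unless length $buffer;

        my @header = unpack $headerformat, $buffer;
        my $entry  = $packformatz{ $header[0] }
            or die "Ack: formatless header value! @header\n";
        my ($size, $format) = @$entry;
        die "truncated record\n" if length $buffer < $size;

        push @records, [ unpack $format, $buffer ];
        substr($buffer, 0, $size, '');    # next record now at offset 0
    }
    return @records;
}

my $data = pack('C N', 1, 42) . pack('C n a4', 2, 7, 'abcd') . pack('C N', 1, 99);
open my $fh, '<', \$data or die "open: $!";
binmode $fh;
my @recs = parse_records($fh);
print join(' / ', map { "@$_" } @recs), "\n";
```

Each record comes back as an array ref whose first element is the type byte, since each template here starts with the header's own "C".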

enjoi

-- 
Steven Lembark                                         85-09 90th Street
Workhorse Computing                                  Woodhaven, NY 11421
lembark at wrkhors.com                                      +1 888 359 3508
