[Chicago-talk] Reading & writing variable length packed

imran javaid imranjj at gmail.com
Mon Mar 16 07:30:39 PDT 2009


This reminds me of a script I once wrote long ago to read a variable-length
file on a nine-track tape.
Here is a fairly straightforward way to do it (I did not test it, and I am
skipping some of the error checking on the return from read):

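open my $FILE, "<:raw", $filename or die "Couldn't open file $filename: $!\n";
my $buf;
# read() advances the file position by itself. Its optional 4th argument
# is an offset into $buf, not into the file, so no running $loc is needed.
while (read($FILE, $buf, 2) == 2) {
  my $fieldnum = unpack("v", $buf);       # 2-byte little-endian field#
  read($FILE, $buf, 1) == 1 or die "Truncated length in field $fieldnum\n";
  my $length = unpack("C", $buf);         # 1-byte length...
  if ($length == 255) {                   # ...unless it is the 0xff escape
    read($FILE, $buf, 2) == 2 or die "Truncated length in field $fieldnum\n";
    $length = unpack("v", $buf);          # then a 2-byte little-endian length
  }
  read($FILE, $buf, $length) == $length
    or die "Truncated data in field $fieldnum\n";
  print "Field: $fieldnum, Length: $length, Data: $buf\n";
}
close $FILE;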
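If the whole file is already slurped into a string, as in your loop, the same logic can walk it with an offset instead of calling read(). As far as I know a pack/unpack template has no "if", so there is still a loop either way. This is only a sketch: parse_records is a made-up name, and it assumes a length byte of 0xff always means a 2-byte little-endian length follows.

```perl
use strict;
use warnings;

# Hypothetical helper: walk an in-memory copy of the file and return a list
# of [field_num, value] pairs. Assumes 2-byte little-endian field numbers and
# that a length byte of 0xff means a 2-byte little-endian length follows.
sub parse_records {
    my ($data) = @_;
    my @records;
    my $pos = 0;
    my $end = length $data;
    while ($pos < $end) {
        die "Truncated header at offset $pos\n" if $end - $pos < 3;
        my ($field_num, $len) = unpack("v C", substr($data, $pos, 3));
        my $header = 3;                   # field# (2) + length byte (1)
        if ($len == 0xff) {
            die "Truncated long length at offset $pos\n" if $end - $pos < 5;
            $len = unpack("v", substr($data, $pos + 3, 2));
            $header = 5;                  # field# (2) + 0xff (1) + length (2)
        }
        die "Truncated data in field $field_num\n"
            if $end - $pos - $header < $len;
        push @records, [ $field_num, substr($data, $pos + $header, $len) ];
        $pos += $header + $len;           # jump to the start of the next record
    }
    return @records;
}

# Usage with the two short examples from Jay's mail, rebuilt from the hex dump.
my $bytes = pack("H*", "0100134164616d735269636861726430393032303130")
          . pack("H*", "200307" . ("39" x 7));
for my $rec (parse_records($bytes)) {
    print "Field: $rec->[0], Data: $rec->[1]\n";
}
```

That should print "Field: 1, Data: AdamsRichard0902010" followed by "Field: 800, Data: 9999999". Taking substr() of the header each time keeps the offset arithmetic explicit, which makes the 3-byte vs 5-byte header easy to see.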




On Sun, Mar 15, 2009 at 1:11 PM, Jay Strauss <me at heyjay.com> wrote:
> Hi, (sent to luni previously by accident)
>
> I'm trying to read & write a file with packed data.  The data file is
> created by a piece of software I use for my biz.  The way it's laid out
> is:
>
> field# - 2 bytes
> field length - layout changes based on the length of the next field
> character data - the data
>
> There are a couple of twists (to me at least, maybe it's old hat to you guys):
>
> 1) the bytes of the field # and length are flip-flopped (maybe this is
> little-endian; I'm just not used to seeing numbers this way)
> 2) the length field changes format depending on the length of the
> field.  When it's under 256 bytes long it is a single byte, but when it's
> over 256 it becomes:
> ff 00 00, that is, it's prefixed with ff and then 2 bytes giving the
> length of the field (little-endian).
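For the writing half, the layout above maps directly onto pack(). A sketch only: encode_field and write_fields are made-up names, the values are assumed to be byte strings (not decoded Unicode), and I'm assuming lengths up to 254 use the single byte while anything longer uses the ff form, since 0xff itself can't be a plain length. Worth checking against a file the software actually wrote.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one field. ASSUMPTION: lengths 0..254 use the
# single length byte; longer values (up to 65535) use 0xff followed by a
# 2-byte little-endian length, because 0xff is reserved as the marker.
sub encode_field {
    my ($field_num, $value) = @_;     # $value must be a byte string
    my $len = length $value;
    die "Field $field_num too long ($len bytes)\n" if $len > 0xffff;
    return $len < 255
        ? pack("v C a*",   $field_num, $len, $value)
        : pack("v C v a*", $field_num, 0xff, $len, $value);
}

# Hypothetical helper: a file is just the encoded fields concatenated.
# ":raw" keeps Perl from translating any bytes on the way out.
sub write_fields {
    my ($filename, @pairs) = @_;      # @pairs: ([field_num, value], ...)
    open my $out, ">:raw", $filename or die "Couldn't open $filename: $!\n";
    print {$out} encode_field(@$_) for @pairs;
    close $out or die "Couldn't close $filename: $!\n";
}
```

For example, encode_field(800, "9999999") should give back the bytes 20 03 07 39 39 39 39 39 39 39 from the example further down.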
>
> I had been reading the file like:
>
> while (length($file_contents) > 2) {
>  my ($field_num, $value) = unpack("SC/A", $file_contents);
>  $data{$field_num} = $value;
>  $file_contents = substr($file_contents,length($value)+3);
> }
>
> Which I now realize is incorrect, because of the change in field
> format for longer length fields.
>
> I have a couple of questions:
>
> 1) Do I need to loop over the whole file like I am, or is there some
> sort of magic I'm missing in unpack where it will spit out all the
> contents at once, rather than my manually plucking individual fields
> out?
>
> 2) My format needs to change based on whether the field# is followed
> by an "FF" or not.  Is there a way to tell unpack this, or do I have
> to inspect the byte after field# to determine the format?
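On question 2: as far as I know a pack/unpack template has no conditional, so you do have to peek at the byte after the field# first. Once you have, each case fits in a single template using the "/" length-prefix construct. unpack_field is a hypothetical name; it decodes one field at the start of a string and ignores anything after it. Note that "a" keeps trailing spaces, while the "A" in your template strips them.

```perl
use strict;
use warnings;

# Decode the field at the start of $chunk; returns (field_num, value).
#   "v C/a"   : 2-byte field#, 1-byte length, then that many bytes
#   "v x v/a" : 2-byte field#, skip the 0xff marker, 2-byte length, data
sub unpack_field {
    my ($chunk) = @_;
    my $marker = unpack("x2 C", $chunk);   # peek at the byte after field#
    my $template = $marker == 0xff ? "v x v/a" : "v C/a";
    return unpack($template, $chunk);
}

my ($num, $value) = unpack_field(pack("H*", "200307" . ("39" x 7)));
print "Field $num: $value\n";
```

To step to the next field you would still skip 3 (or 5) header bytes plus length($value).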
>
> Some example data below.
>
> AdamsRichard0902010 in the 1st field would look like:
> 01 00 13 41 64 61 6d 73 52 69 63 68 61 72 64 30 39 30 32 30 31 30
>
> 01 00 = 1, the field number (unsigned)
> 13 = 19 (dec) is the length
> 41 64 61 6d 73 52 69 63 68 61 72 64 30 39 30 32 30 31 30 = AdamsRichard0902010
>
>
> 9999999 in field 800 would look like:
> 20 03 07 39 39 39 39 39 39 39
>
> 20 03 = field number (notice that 0320 = 800, but the bytes are flip-flopped)
> 07 = 7 (the length)
> 39 39 39 39 39 39 39 = 9999999
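The flip-flopped bytes are little-endian, and unpack has an explicit code for each byte order. That is also why "S" in your original template happens to work: it is the machine's native order, which is little-endian on x86 but not everywhere, so "v" is the safer choice. A quick comparison on the field-800 bytes:

```perl
use strict;
use warnings;

my $bytes  = pack("H*", "2003");   # the "20 03" from the field-800 example
my $little = unpack("v", $bytes);  # little-endian: 0x0320 = 800
my $big    = unpack("n", $bytes);  # big-endian:    0x2003 = 8195
my $native = unpack("S", $bytes);  # this machine's order: 800 on x86
printf "v=%d n=%d S=%d\n", $little, $big, $native;
```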
>
> A value such as:
> Takes a LIST of values and converts it into a string using the rules
> given by the TEMPLATE. The resulting string is the concatenation of
> the converted values. Typically, each converted value looks like its
> machine-level representation. For example, on 32-bit machines an
> integer may be represented by a sequence of 4 bytes that will be
> converted to a sequence of 4 characters
>
> in the 901 field would look like:
>
> 85 03 ff 78 01 54 61 6b 65 73 20 61 20 4c 49 53 54 20 6f 66 20 76 61
> 6c 75 65 73 20 61 6e 64 20 63 6f 6e 76 65 72 74 73 20 69 74 20 69 6e
> 74 6f 20 61 20 73 74 72 69 6e 67 20 75 73 69 6e 67 20 74 68 65 20 72
> 75 6c 65 73 20 67 69 76 65 6e 20 62 79 20 74 68 65 20 54 45 4d 50 4c
> 41 54 45 2e 20 54 68 65 20 72 65 73 75 6c 74 69 6e 67 20 73 74 72 69
> 6e 67 20 69 73 20 74 68 65 20 63 6f 6e 63 61 74 65 6e 61 74 69 6f 6e
> 20 6f 66 20 74 68 65 20 63 6f 6e 76 65 72 74 65 64 20 76 61 6c 75 65
> 73 2e 20 54 79 70 69 63 61 6c 6c 79 2c 20 65 61 63 68 20 63 6f 6e 76
> 65 72 74 65 64 20 76 61 6c 75 65 20 6c 6f 6f 6b 73 20 6c 69 6b 65 20
> 69 74 73 20 6d 61 63 68 69 6e 65 2d 6c 65 76 65 6c 20 72 65 70 72 65
> 73 65 6e 74 61 74 69 6f 6e 2e 20 46 6f 72 20 65 78 61 6d 70 6c 65 2c
> 20 6f 6e 20 33 32 2d 62 69 74 20 6d 61 63 68 69 6e 65 73 20 61 6e 20
> 69 6e 74 65 67 65 72 20 6d 61 79 20 62 65 20 72 65 70 72 65 73 65 6e
> 74 65 64 20 62 79 20 61 20 73 65 71 75 65 6e 63 65 20 6f 66 20 34 20
> 62 79 74 65 73 20 74 68 61 74 20 77 69 6c 6c 20 62 65 20 63 6f 6e 76
> 65 72 74 65 64 20 74 6f 20 61 20 73 65 71 75 65 6e 63 65 20 6f 66 20
> 34 20 63 68 61 72 61 63 74 65 72 73
>
> 85 03 = 901 decimal (field#)
> ff = marker; when the field length is more than 256 I get an ff
> 78 01 = 376 decimal length (0178 = 376)
>
> Thanks
> Jay
> _______________________________________________
> Chicago-talk mailing list
> Chicago-talk at pm.org
> http://mail.pm.org/mailman/listinfo/chicago-talk
>

