[Chicago-talk] Reading & writing variable length packed

Jay Strauss me at heyjay.com
Sun Mar 15 11:11:03 PDT 2009


Hi, (sent to luni previously by accident)

I'm trying to read & write a file with packed data.  The data file is
created by a piece of software I use for my biz.  The way it's laid out
is:

field# - 2 bytes
field length - layout changes based on the length of the data that follows
character data - the data

There are a couple of twists (to me at least, maybe it's old hat to you guys):

1) the bytes of the field # and of the 2-byte length are flip/flopped
(maybe this is little endian, I'm just not used to seeing numbers this way)
2) the length field changes format depending on the length of the
field.  When it's under 256 long it is a single byte, but when it's over
256 it becomes:
ff 00 00, that is, it's prefixed with ff and then has 2 bytes indicating
the length of the field (little endian).
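
The length rule in twist 2 could be sketched roughly as below. This is only a sketch: read_length is a made-up helper name, and it assumes a first length byte of ff always means the long form (the description above doesn't say what happens at exactly 255).

```perl
use strict;
use warnings;

# Decode the length prefix that starts at byte offset $pos in $buf.
# Returns (length of the data, offset of the first data byte).
# Assumption: a first byte of ff always marks the long form.
sub read_length {
    my ($buf, $pos) = @_;
    my $first = unpack('C', substr($buf, $pos, 1));
    if ($first == 0xFF) {
        # Long form: ff marker, then a 16-bit little-endian length.
        my $len = unpack('v', substr($buf, $pos + 1, 2));
        return ($len, $pos + 3);
    }
    # Short form: the byte itself is the length.
    return ($first, $pos + 1);
}

my ($len, $data_pos) = read_length("\xff\x78\x01", 0);
print "$len $data_pos\n";    # 376 3
```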

I had been reading the file like:

while (length($file_contents) > 2) {
  my ($field_num, $value) = unpack("SC/A", $file_contents);
  $data{$field_num} = $value;
  $file_contents = substr($file_contents,length($value)+3);
}

Which I now realize is incorrect, because the length format changes
for longer fields.

I have a couple of questions:

1) Do I need to loop over the whole file like I am, or is there some
sort of magic I'm missing in unpack where it will spit out all the
contents at once, rather than my manually plucking individual fields
out?

2) My format needs to change based on whether the field# is followed
by an "FF" or not.  Is there a way to tell unpack this, or do I have
to inspect the byte after field# to determine the format?
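
For what it's worth, here is one way the loop might look if the byte after the field# is inspected by hand. As far as I can tell an unpack template has no way to branch on a value it has just read, so this sketch checks for ff itself. A grouped template such as '(v C/a)*' would probably pull out everything in one call, but only while every field uses the short form. parse_fields is a made-up name, and the ff rule is assumed to be exactly as described above.

```perl
use strict;
use warnings;

# Parse the whole buffer into a hash ref of field number => value.
sub parse_fields {
    my ($buf) = @_;
    my %data;
    my $pos = 0;
    my $end = length $buf;
    while ($pos + 3 <= $end) {
        # 'v' = 16-bit little-endian field number, 'C' = first length byte.
        my ($field_num, $len) = unpack('v C', substr($buf, $pos, 3));
        $pos += 3;
        if ($len == 0xFF) {
            # Long form: the real length is in the next two bytes.
            die "truncated length at offset $pos\n" if $pos + 2 > $end;
            $len = unpack('v', substr($buf, $pos, 2));
            $pos += 2;
        }
        die "truncated field $field_num\n" if $pos + $len > $end;
        # substr keeps the bytes untouched ('A' would strip trailing spaces).
        $data{$field_num} = substr($buf, $pos, $len);
        $pos += $len;
    }
    return \%data;
}

my $fields = parse_fields(pack('H*', '200307' . '39' x 7));
print "$_ => $fields->{$_}\n" for sort { $a <=> $b } keys %$fields;
# prints: 800 => 9999999
```

Walking an offset through the buffer avoids copying the remaining contents with substr on every pass, which may matter for large files.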

Some example data below.

AdamsRichard0902010 in the 1st field would look like:
01 00 13 41 64 61 6d 73 52 69 63 68 61 72 64 30 39 30 32 30 31 30

01 00 = field number 1 (unsigned, low byte first)
13 = 19 decimal (the length)
41 64 61 6d 73 52 69 63 68 61 72 64 30 39 30 32 30 31 30 = AdamsRichard0902010


9999999 in field 800 would look like:
20 03 07 39 39 39 39 39 39 39

20 03 = field number (notice that 0320 hex = 800, but the bytes are flip-flopped)
07 = 7 (the length)
39 39 39 39 39 39 39 = 9999999
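
The two short examples above can be checked with the 'v' template, which is always unsigned 16-bit little endian ('S' uses the byte order of whatever machine runs the script). A minimal sketch, with decode_short being a made-up helper that only handles the one-byte length form:

```perl
use strict;
use warnings;

# Turn a spaced hex dump into bytes and decode one short-form field.
sub decode_short {
    my ($hex) = @_;
    (my $digits = $hex) =~ s/\s+//g;     # drop the spaces between bytes
    my $bytes = pack('H*', $digits);     # hex digits -> raw bytes
    # v = 16-bit little-endian field number
    # C/a = one length byte, then that many bytes of data
    return unpack('v C/a', $bytes);
}

for my $hex (
    '01 00 13 41 64 61 6d 73 52 69 63 68 61 72 64 30 39 30 32 30 31 30',
    '20 03 07 39 39 39 39 39 39 39',
) {
    my ($field_num, $value) = decode_short($hex);
    print "$field_num: $value\n";
}
# prints:
#   1: AdamsRichard0902010
#   800: 9999999
```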

And a longer value, such as:
Takes a LIST of values and converts it into a string using the rules
given by the TEMPLATE. The resulting string is the concatenation of
the converted values. Typically, each converted value looks like its
machine-level representation. For example, on 32-bit machines an
integer may be represented by a sequence of 4 bytes that will be
converted to a sequence of 4 characters

in the 901 field would look like:

85 03 ff 78 01 54 61 6b 65 73 20 61 20 4c 49 53 54 20 6f 66 20 76 61
6c 75 65 73 20 61 6e 64 20 63 6f 6e 76 65 72 74 73 20 69 74 20 69 6e
74 6f 20 61 20 73 74 72 69 6e 67 20 75 73 69 6e 67 20 74 68 65 20 72
75 6c 65 73 20 67 69 76 65 6e 20 62 79 20 74 68 65 20 54 45 4d 50 4c
41 54 45 2e 20 54 68 65 20 72 65 73 75 6c 74 69 6e 67 20 73 74 72 69
6e 67 20 69 73 20 74 68 65 20 63 6f 6e 63 61 74 65 6e 61 74 69 6f 6e
20 6f 66 20 74 68 65 20 63 6f 6e 76 65 72 74 65 64 20 76 61 6c 75 65
73 2e 20 54 79 70 69 63 61 6c 6c 79 2c 20 65 61 63 68 20 63 6f 6e 76
65 72 74 65 64 20 76 61 6c 75 65 20 6c 6f 6f 6b 73 20 6c 69 6b 65 20
69 74 73 20 6d 61 63 68 69 6e 65 2d 6c 65 76 65 6c 20 72 65 70 72 65
73 65 6e 74 61 74 69 6f 6e 2e 20 46 6f 72 20 65 78 61 6d 70 6c 65 2c
20 6f 6e 20 33 32 2d 62 69 74 20 6d 61 63 68 69 6e 65 73 20 61 6e 20
69 6e 74 65 67 65 72 20 6d 61 79 20 62 65 20 72 65 70 72 65 73 65 6e
74 65 64 20 62 79 20 61 20 73 65 71 75 65 6e 63 65 20 6f 66 20 34 20
62 79 74 65 73 20 74 68 61 74 20 77 69 6c 6c 20 62 65 20 63 6f 6e 76
65 72 74 65 64 20 74 6f 20 61 20 73 65 71 75 65 6e 63 65 20 6f 66 20
34 20 63 68 61 72 61 63 74 65 72 73

85 03 = 901 decimal (field#)
ff = when the field length is more than 256 I get an ff
78 01 = 376 decimal length (0178 hex = 376)
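
For the writing side, a field might be packed along these lines. This is a sketch under assumptions: pack_field is a made-up name, and the cutoff (one-byte length below 255, ff form from 255 up) is a guess, since the exact boundary isn't clear from the examples.

```perl
use strict;
use warnings;

# Encode one field as: field number, length prefix, data.
# Assumption: lengths below 255 use the one-byte form; anything
# longer uses ff followed by a 16-bit little-endian length.
sub pack_field {
    my ($field_num, $value) = @_;
    my $len = length $value;
    die "value too long for a 16-bit length\n" if $len > 0xFFFF;
    return $len < 0xFF
        ? pack('v C a*',   $field_num, $len, $value)
        : pack('v C v a*', $field_num, 0xFF, $len, $value);
}

# Short form, matching the field 800 example.
print unpack('H*', pack_field(800, '9999999')), "\n";
# prints: 20030739393939393939

# Long form: just the 5-byte header for a 376-byte value in field 901.
print unpack('H*', substr(pack_field(901, 'x' x 376), 0, 5)), "\n";
# prints: 8503ff7801
```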

Thanks
Jay

