[Brisbane-pm] regex syntax - data extraction

Geoffrey Wood geoff at gmgsolutions.com.au
Tue Jun 12 07:49:56 PDT 2007


Thanks Barry for your input. I will start with your code and try it out.

I'm new to both pm and Perl, and wasn't sure if I had posted my request
correctly to the list.

A friend told me to try out Perl for a proof of concept idea I wanted to
test out.

I have now proved my concept but am at a stage where I would need some
professional Perl help to make a commercial product... Anyone interested in
taking a look at my project?

cheers
Geoff Wood
 
-----Original Message-----
From: Barry Downes [mailto:barry at bquotes.com] 
Sent: Tuesday, June 12, 2007 8:02 PM
To: Geoffrey Wood
Cc: brisbane-pm at pm.org
Subject: Re: [Brisbane-pm] Brisbane-pm Digest, Vol 29, Issue 1

Hey Geoffrey,

I haven't seen a response to your question yet.  I hope this isn't too 
late to be useful.

I find it easier in Perl to extract data from binary strings than hex 
strings like yours, so I'd convert it to binary first using "pack".  
Then I'd use "substr" and "unpack" to extract and interpret data.

You'll probably find the Perl documentation for "pack" particularly 
helpful.  It packs Perl data items into compact binary representations 
as used by lower-level languages and systems, and unpack does the 
opposite.  The binary data is stored in Perl scalars.  Perl's pretty 
good at handling binary because Perl scalars can store any 8-bit 
character, including the null character, without issues.
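
To make that concrete, here is a tiny sketch on a made-up hex string (not 
your data) showing the round trip through pack and unpack, and that a 
leading null byte survives in the scalar.  The variable names are only for 
illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# a made-up hex string: bytes 00 22 c4, a length byte 04, then "Unit"
my $hex = '0022c404556e6974';

# 'H*' turns each pair of hex digits into one byte
my $bin = pack('H*', $hex);

# the scalar holds 8 bytes, including the leading null byte
my $len = length($bin);

# unpacking with the same template reverses the conversion
my $back = unpack('H*', $bin);

# 'x3' skips 3 bytes, 'C' reads one unsigned byte, 'a4' reads 4 raw bytes
my ($size, $text) = unpack('x3 C a4', $bin);

print "$len $back $size $text\n";    # prints: 8 0022c404556e6974 4 Unit
```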

Knowing the right tools is probably enough, but here are a few lines of 
code showing how you might use them:

#!/usr/bin/perl

use strict;
use warnings;


# the original hex data, joined from separate pieces so that mail
# line-wrapping cannot put a newline inside the literal
my $hex = join '',
    '0504050000002a000022c402000000000009556e69742038393030000000007fffffff000120',
    '000f30303031303831373330333335303000000022c502000000000009556e69742038393031',
    '000000007fffffff000120000f30303031303831373330333335373000000022c60200000000',
    '0009556e69742038393032000000007fffffff000120000f3030303130383137333033333833',
    '30';

# convert to binary (because it's easier to work with)
my $bin = pack('H*', $hex);

my $val = get3Bytes($bin, 8);
my $str1 = getString($bin, 17);
my $str2 = getString($bin, 39);

print "$val\n";
print "$str1\n";
print "$str2\n";


sub get3Bytes {
    # extract a 3-byte value
    my ($bin, $ofs) = @_;
   
    # grab 3 bytes and pad with a leading 00 byte
    my $str = "\x00" . substr($bin, $ofs, 3);
   
    # unpack as a "network-order" 4-byte integer
    my $val = unpack('N', $str);
   
    return $val;
}

sub getString {
    # extract a string with 1-byte length
    my ($bin, $ofs) = @_;
   
    # interpret the length as a 1-byte unsigned integer
    my $size = unpack('C', substr($bin, $ofs, 1));
   
    # extract a string of the indicated length
    my $string = substr($bin, $ofs+1, $size);
   
    return $string;
}
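
The code above only pulls out the first record.  As a rough sketch, here 
is one way to walk all three.  It assumes the layout I can infer from this 
one sample holds in general: a 7-byte preamble, then records made of a 
4-byte big-endian number, 6 bytes I don't understand, a length-prefixed 
string, 12 more unknown bytes, a second length-prefixed string, and a 
single 00 byte between records.  The skip counts and the @records name are 
guesses from the sample, not from any spec:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# the sample hex data from the original post
my $hex = join '',
    '0504050000002a000022c402000000000009556e69742038393030000000007fffffff000120',
    '000f30303031303831373330333335303000000022c502000000000009556e69742038393031',
    '000000007fffffff000120000f30303031303831373330333335373000000022c60200000000',
    '0009556e69742038393032000000007fffffff000120000f3030303130383137333033333833',
    '30';

my $bin = pack('H*', $hex);

my @records;
my $pos = 7;    # assumption: records start after a 7-byte preamble

while ($pos < length $bin) {
    # 4-byte big-endian number (the 3-byte value with its leading 00 byte)
    my ($num) = unpack('N', substr($bin, $pos, 4));
    $pos += 4 + 6;    # step over the number and 6 bytes of unknown meaning

    # 'C/a' reads a 1-byte length, then that many bytes as a string
    my ($name) = unpack('C/a', substr($bin, $pos));
    $pos += 1 + length($name) + 12;    # step over 12 more unknown bytes

    my ($digits) = unpack('C/a', substr($bin, $pos));
    $pos += 1 + length($digits) + 1;   # assumed 1-byte separator

    push @records, [$num, $name, $digits];
}

printf "%d '%s' '%s'\n", @$_ for @records;
```

On the sample data this prints the three lines listed under "Wanted data" 
in the original post.  If the real data has variable gaps between fields, 
the fixed skip counts would need to be replaced with something smarter.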




Geoffrey Wood wrote:
> Sorry guys, wasn't thinking that text formatting would be suppressed (you
> can't see red, green, blue)
>
> Here is another way to look at it:
>
> Header          Hex No            L Alpha/Num string
> 0504050000002a000022c402000000000009556e69742038393030000000007fffffff000120
>   L Numeric string                    Hex No            L Alpha/Num string
> 000f30303031303831373330333335303000000022c502000000000009556e69742038393031
>                         L Numeric string                    Hex No
> 000000007fffffff000120000f30303031303831373330333335373000000022c60200000000
>   L Alpha/Num string                          L Numeric string
> 0009556e69742038393032000000007fffffff000120000f3030303130383137333033333833
> 30
>
> Header- 3 Bytes
> Hex No- 3 Bytes
> L	- String Length
>
> Hope this makes sense.
>
> ghw
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 6 Jun 2007 23:24:18 +1000
> From: "Geoffrey Wood" <geoff at gmgsolutions.com.au>
> Subject: [Brisbane-pm] regex syntax - data extraction
> To: <brisbane-pm at pm.org>
> Message-ID: <001b01c7a83d$fca1c4c0$2800a8c0 at DellD620>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi all
>
> Trying to work out how to extract both hexadecimal values(as numeric) and
> string text (alphanumeric+spaces) from the string variables such as below.
>
> String snippet:
>
> 0504050000002a000022c402000000000009556e69742038393030000000007fffffff000120
> 000f30303031303831373330333335303000000022c502000000000009556e69742038393031
> 000000007fffffff000120000f30303031303831373330333335373000000022c60200000000
> 0009556e69742038393032000000007fffffff000120000f3030303130383137333033333833
> 30
>
> Wanted data:
> Green   - 3 byte Hex value
> Blue      - string length
> Red      - string
>
> 8900 'Unit 8900' '000108173033500'
> 8901 'Unit 8901' '000108173033570'
> 8902 'Unit 8902' '000108173033830'
>
> Any help appreciated.
>
> Kind regards,
> Geoffrey Wood
> Technical Director
> GMG Solutions Pty Ltd
> m: +61 4 1514 8448
> f:   +61 7 5571 2877
> e: geoff at gmgsolutions.com.au
> w: www.gmgsolutions.com.au
>
> ------------------------------
>
> _______________________________________________
> Brisbane-pm mailing list
> Brisbane-pm at pm.org
> http://mail.pm.org/mailman/listinfo/brisbane-pm
>
> End of Brisbane-pm Digest, Vol 29, Issue 1
> ******************************************