APM: Parsing HTML

John Warner jwarner at texas.net
Mon Nov 1 21:00:18 PDT 2010


All,

It's been a while since I did any Perl programming and I could use a
pointer.  I work as a lab admin at Dell where one of my job duties is to
order equipment for the various teams I support.  The process works like
this:  we have the teams configure a system in a shopping cart at Dell.com
then submit the shopping cart to the lab admins.  We, the lab admins, take
the information from the cart (quantity, SKUs, and descriptions), do a whole
bunch of manual manipulation and then paste the processed info into a tool
we use for internal ordering.

The problem:

I have been using Win32::Watir to drive dell.com and navigate to the SKUs in
a shopping cart.  So far I have been unable to get the cart data out in a
format I can use.  When I implement an HTML::Parser start handler I get some
of the HTML, but not the contents of the frame that holds the SKUs.  I can
get the information I am after with the Parser text handler, but it comes
back with no delimiters between the fields, so I can't write a useful regex
to split it apart.
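One way to keep the field boundaries is to combine start, text and end handlers so that each table cell is collected separately. The following is only a minimal sketch under stated assumptions: it assumes the cart details are laid out as an HTML table with one row per item, and the sample markup, SKU values and variable names are hypothetical, not taken from the real dell.com page.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample markup; the real cart page will differ.
my $html = <<'HTML';
<table>
  <tr><td>2</td><td>SKU-0001</td><td>Example item A</td></tr>
  <tr><td>1</td><td>SKU-0002</td><td>Example item B</td></tr>
</table>
HTML

my @rows;          # finished rows, each an array ref of cell strings
my $row;           # row currently being built, undef outside <tr>
my $in_cell = 0;   # true while inside a <td> or <th>

my $parser = HTML::Parser->new(
    api_version => 3,
    # A new <tr> starts a row; a new cell starts an empty string in that row.
    start_h => [ sub {
        my ($tag) = @_;
        if ($tag eq 'tr') {
            $row = [];
        }
        elsif ($row && ($tag eq 'td' || $tag eq 'th')) {
            push @$row, '';
            $in_cell = 1;
        }
    }, 'tagname' ],
    # Text is appended to the current cell only; text may arrive in chunks.
    text_h => [ sub {
        my ($text) = @_;
        $row->[-1] .= $text if $in_cell;
    }, 'dtext' ],
    # Closing a cell stops collection; closing a row stores it.
    end_h => [ sub {
        my ($tag) = @_;
        if ($tag eq 'td' || $tag eq 'th') {
            $in_cell = 0;
        }
        elsif ($tag eq 'tr' && $row) {
            push @rows, $row;
            undef $row;
            $in_cell = 0;
        }
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

# Trim each cell and print one tab-separated line per row.
for my $r (@rows) {
    s/^\s+|\s+$//g for @$r;
    print join("\t", @$r), "\n";
}
```

With the cells held in @rows, quantity, SKU and description can be picked out by position rather than by regex. If the real page does not use a table for the cart, the tag names tested in the handlers would need to change accordingly.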

use strict;
use warnings;
use Win32::Watir;

my $url = "http://ecomm.dell.com/dellstore/basket_retrieve.aspx?c=us&cs=04&l=en&s=bsd&itemtype=CFG&cart_id=1013663916825&toEmail=john_warner at dell.com";

my $ie = Win32::Watir->new( visible => 1, maximize => 0);

print "Pointing IE to the URL\n";
$ie->goto($url);

print "Clicking \"Detail View\" link in basket\n";
$ie->getLink('linktext:', qr/Detail View/)->Click;

print "Showing details of cart\n";
$ie->getLink('linktext:', qr/Show Details/)->Click;

my $ordertext = $ie->text;
#my $ordertext = $ie->html;
print $ordertext;

#do useful processing here.

The crux of my problem (I think) is that I don't know what type of data
(array, hash, etc.) $ie->html or $ie->text returns.  Perhaps if I knew that,
I could make headway on the processing.
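A generic way to answer that kind of question in Perl is to call the method in list context and look at what comes back with ref and Data::Dumper. The sketch below makes no claim about what Win32::Watir actually returns: describe() is a hypothetical helper, and the literal values passed to it are stand-ins for $ie->text or $ie->html.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: report what kind of value(s) a call handed back.
# Arguments are evaluated in list context, so a method that returns a
# list shows up here as several values.
sub describe {
    my @got = @_;
    return 'nothing (empty list)' unless @got;
    return scalar(@got) . ' values (a list)' if @got > 1;
    return 'undef' unless defined $got[0];
    my $kind = ref $got[0];
    return $kind ? "a reference to $kind" : 'a plain scalar (string or number)';
}

# Stand-ins for the real call; swap in describe($ie->text) and so on.
print describe("line one\nline two\n"), "\n";
print describe([ 'a', 'b' ]), "\n";
print describe({ sku => 'X' }), "\n";
print describe('a', 'b', 'c'), "\n";

# Dumper shows the structure itself; Useqq makes newlines and tabs visible.
local $Data::Dumper::Useqq = 1;
print Dumper("line one\nline two\n");
```

If the result turns out to be one long string, split /\n/, $ordertext turns it into a list of lines; if it is an array or hash reference, Dumper shows how it is nested.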

Thanks for your time!

John Warner
jwarner at texas.net
H:  512.251.1270
C:  512.426.3813

 
