[Nh-pm] How to save complete web page not just text?

James Kellndorfer jameskel at adelphia.net
Fri Jun 24 11:02:01 PDT 2005


Thank you for trying this.

Yes, I know this works. I'm not interested in the textual portion of the
webpage.

If you view the web page in your browser you'll see 80 five-minute periods.
Each period contains data. These data are rendered using a graphic and web
forms. The input boxes display the data as you move your mouse. This is all
controlled by JavaScript.

The webpage source that you retrieved using LWP will not contain the .js
files that hold the data for each five-minute period. The HTML source LWP
retrieves is the textual portion of the webpage only. The data lies within
a secondary file named NYA.js, which is not retrieved because LWP fetches
only the URL it is given and doesn't implement a JavaScript interpreter.
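
A minimal sketch of how the secondary script files could be located and
fetched with LWP. It assumes the page references them through ordinary
<script src="..."> tags, which has not been checked against the real page.
The names script_urls and save_scripts are hypothetical helpers, and
HTML::LinkExtor is assumed to be installed (it comes with HTML-Parser, which
LWP depends on).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use LWP::Simple;

# Return the absolute URL of every <script src="..."> found in $html.
# $base is the URL the HTML was fetched from; relative links are
# resolved against it by HTML::LinkExtor.
sub script_urls {
    my ($html, $base) = @_;
    my @urls;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            push @urls, "$attr{src}"
                if $tag eq 'script' && defined $attr{src};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return @urls;
}

# Download each URL into the current directory, named after the last
# path segment. Returns the names of the files that were stored.
sub save_scripts {
    my (@urls) = @_;
    my @saved;
    for my $url (@urls) {
        my ($name) = $url =~ m{([^/?#]+)(?:[?#].*)?$} or next;
        push @saved, $name if is_success(getstore($url, $name));
    }
    return @saved;
}
```

Typical use would be to fetch the page with get(), pass the HTML and the
page URL to script_urls, and hand the result to save_scripts. This only
collects the files; it does not run the JavaScript in them.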

So, I'm not interested in retrieving the text. That's why I'm interested in
retrieving a complete webpage. I would be happy if someone could help me
create a "single file format" .mht or .maf (Mozilla Archive Format)
formatted file using Perl.
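
For the .mht side, a rough sketch follows. An .mht file is a MIME
multipart/related message in which each resource is one part tagged with a
Content-Location header (RFC 2557). The helper name build_mht and the shape
of its input are hypothetical, and whether a particular browser opens the
result has not been verified.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build an MHTML document as a string.
# $parts is an array ref of { url => ..., type => ..., data => ... };
# the first entry is treated as the root HTML page. The data must be
# bytes, not decoded character strings.
sub build_mht {
    my ($parts) = @_;

    # Every part is base64-encoded, so a boundary containing '_' and
    # '=' in this position cannot collide with the encoded data.
    my $boundary = '----=_perl_mht_sketch';
    my $root     = $parts->[0];

    my $out = "MIME-Version: 1.0\r\n"
            . qq{Content-Type: multipart/related; type="$root->{type}";\r\n}
            . qq{\tboundary="$boundary"\r\n\r\n};

    for my $p (@$parts) {
        $out .= "--${boundary}\r\n"
              . "Content-Type: $p->{type}\r\n"
              . "Content-Transfer-Encoding: base64\r\n"
              . "Content-Location: $p->{url}\r\n\r\n"
              . encode_base64($p->{data}, "\r\n");
    }

    # Closing delimiter marks the end of the multipart body.
    return $out . "--${boundary}--\r\n";
}
```

The returned string would be written to a file such as page.mht with the
filehandle in binmode. The page and each fetched .js file go in as separate
parts, with the root HTML first.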

I'm using Linux (FC4).

JK



----- Original Message ----- 
From: "Andrew Brosnan" <andrew at broscom.com>
To: "James Kellndorfer" <jameskel at adelphia.net>
Sent: Friday, June 24, 2005 1:40 PM
Subject: Re: [Nh-pm] How to save complete web page not just text?


On 6/24/05 at 1:13 PM, jameskel at adelphia.net (James Kellndorfer) wrote:

> I have a very simple task.
>
> I'm just trying to save this complete webpage:
> http://marketrac.nyse.com/ot/ordertrac_detail.html
>

This fetches exactly the same as the source viewed in my browser:
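#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
# get() returns undef on failure, so check before printing.
my $pg = get("http://marketrac.nyse.com/ot/ordertrac_detail.html");
die "Could not fetch the page\n" unless defined $pg;
print "Here is the page: \n\n $pg";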


The following one liner also works fine:
perl -MLWP::Simple -e 'getprint "http://marketrac.nyse.com/ot/ordertrac_detail.html"'

Regards,
Andrew
-- 
Andrew Brosnan - Broscom LLC - 1 207 925-1156
andrew at broscom.com - http://www.broscom.com
Websites, Hosting, Programming, Consulting


