[Nh-pm] How to save complete web page not just text?

James Kellndorfer jameskel at adelphia.net
Fri Jun 24 10:13:07 PDT 2005


I have a very simple task.

I'm just trying to save this complete webpage:
http://marketrac.nyse.com/ot/ordertrac_detail.html

I can save it manually, but I'd like to do it automatically (e.g., a
Perl-based cron job, or some other method). For other web pages, I simply
used lynx, wget, curl, or LWP. I'm trying to understand what makes this
page different, and how it is rendered, so that I can work out a solution.


Once the page is saved, the data I want is in one of the .js files, which I
can easily "rip".
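
A minimal sketch of that idea, assuming the .js files are referenced by
ordinary <script src="..."> tags in the page's HTML (scripts that are pulled
in dynamically by other JavaScript would not be found this way). It is
written in Python using only the standard library; the names script_urls,
ScriptSrcParser, and fetch are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ScriptSrcParser(HTMLParser):
    """Collect the src attribute of every <script> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_urls(html, base_url):
    """Return absolute URLs of all external scripts referenced in html."""
    parser = ScriptSrcParser()
    parser.feed(html)
    parser.close()
    # Relative paths are resolved against the URL the page was fetched from.
    return [urljoin(base_url, src) for src in parser.sources]


def fetch(url):
    """Download a URL and return its body as text (assumes UTF-8-ish data)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling script_urls(fetch(page_url), page_url) for some page_url would list
the candidate .js files; each could then be fetched and saved from a cron
job, without a browser having to render the page.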

I am unfamiliar with AppleScripting Firefox or Camino, but I'll start
learning these things. (Is AppleScript only for Mac users?)
As for JavaScript::SpiderMonkey or XPCOM, again I'll investigate and learn
what I can before asking for more help. I don't even know what these things
are.

Thanks for some direction,
JK


----- Original Message ----- 
From: "Bill McGonigle" <bill at bfccomputing.com>
To: "James Kellndorfer" <jameskel at adelphia.net>
Cc: <nh-pm at pm.org>
Sent: Friday, June 24, 2005 9:54 AM
Subject: Re: [Nh-pm] How to save complete web page not just text?


> On Jun 23, 2005, at 21:46, James Kellndorfer wrote:
>
> > "wget -r" or any other combination of command-line options doesn't
> > work. wget doesn't execute JavaScript, so downloading the complete
> > web page is impossible.
>
> Ah, so the page in question is loading content dynamically with some
> JavaScript/DOM programming?
>
> JavaScript::SpiderMonkey might be useful, as would perhaps the Perl
> XPCOM project.  None of these is a complete solution - it would be
> helpful to know what you're trying to do (e.g. what kinds of data
> you're accessing) to recommend a solution.  Unless you're trying to
> scrape JavaScript-guarded e-mail addresses, that is.
>
> A faster solution might be AppleScripting Firefox or Camino.
>
> -Bill


