From cfarinella at appropriatesolutions.com  Wed Jun  8 07:36:31 2005
From: cfarinella at appropriatesolutions.com (Charles Farinella)
Date: 08 Jun 2005 10:36:31 -0400
Subject: [Nh-pm] build new from source on an RPM system
Message-ID: <1118241390.15436.14.camel@lpc01>

I have had to do this a number of times and wonder what is the easiest,
most 'correct' way.  I have a system with Perl installed during OS
installation using RPMs.  I now want to compile the current version
from source and replace all the other versions that seem to be lying
around.  What's the best way to do this?

--charlie

-- 
Charles Farinella
Appropriate Solutions, Inc. (www.AppropriateSolutions.com)
cfarinella at AppropriateSolutions.com
603.924.6079

From glim at mycybernet.net  Thu Jun 16 21:54:00 2005
From: glim at mycybernet.net (Gerard Lim)
Date: Fri, 17 Jun 2005 00:54 -0400
Subject: [Nh-pm] Last-minute reminder -- YAPC::NA 2005
Message-ID:

Here's a last reminder about Yet Another Perl Conference, North America
(YAPC::NA 2005)  http://yapc.org/America

In case anyone out there has been sitting on the fence or has been
meaning to register but has put it on the backburner until now, here is
a final information package.

Dates:    Mon - Wed June 27 - 29, 2005 (11 days from now!)
Location: 89 Chestnut Street, University of Toronto,
          Toronto, Ontario, Canada

Accommodations
==============
Due to recent renegotiations with the conference facility and hotel,
89 Chestnut, there are still a few rooms left.

For details on accommodations go to:
http://www.yapc.org/America/accommodations-2005.shtml

For quick and easy booking:
  89 Chestnut Phone: +1-416-977-0707
  Conference booking code: perl0626

The base rate is approx. CAD$80/night, which is *great* for downtown
Toronto.  Add in taxes and in-room high speed internet and it's up to
about CAD$95/night.  Book yourself to check in on Sunday the 26th and
check out on the morning of Wednesday the 29th.

Conference Registration
=======================
Registration is easy and cheap - only USD$85 - see
http://yapc.org/America/register-2005.shtml for details or register
directly online at
http://donate.perlfoundation.org/index.pl?node=registrant%20info&conference_id=423

The schedule is awesome - http://yapc.org/America/schedule-2005/day1.html
From here, click on the "Day 2" and "Day 3" spots near the top to go
from page to page.  Click on a talk name to get details regarding the
talk.

Speakers include Larry Wall, Allison Randal, Autrijus Tang, Brian
Ingerson, Andy Lester, chromatic, brian d foy, Chip Salzenberg & Dan
Sugalski... and many more!

[ This message was sent by Gerard Lim on behalf of the YAPC::NA 2005
  Conference organizing committee of the Toronto Perl Mongers.
  Thanks for your patience and support. ]

From jameskel at adelphia.net  Thu Jun 23 07:40:36 2005
From: jameskel at adelphia.net (James Kellndorfer)
Date: Thu, 23 Jun 2005 10:40:36 -0400
Subject: [Nh-pm] How to save complete web page not just text?
Message-ID: <007401c57801$850c3f80$0300a8c0@jameskel>

I'm trying to "web fetch" an entire web page to a Linux machine using
Perl or some other scripting language.  I can successfully retrieve a
web page using wget and LWP::Simple.  However, these only retrieve the
text portion of the web page (the HTML and its inner text).  Even with
the graphics option enabled, wget still won't retrieve an entire web
page: JavaScript files and other referenced resources are not
downloaded.  Windows has the .mht "single file" format; is there a Perl
equivalent, or a way to use Firefox on Linux to do this?
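One shape the "LWP recipe" mentioned in the reply below could take:
fetch the page, then walk its static HTML for the scripts and images
that a text-only fetch leaves behind.  A minimal sketch using
LWP::UserAgent and HTML::LinkExtor follows; the URL is a placeholder,
and anything a script generates at runtime is still invisible to this
approach.

#!/usr/bin/perl
# Sketch only: fetch a page, then fetch the scripts/images its static
# HTML references.  $url is a placeholder; resources created by
# JavaScript at runtime will not be found this way.
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my $url = 'http://www.example.com/page.html';
my $ua  = LWP::UserAgent->new;

my $page = $ua->get($url);
die $page->status_line unless $page->is_success;

my @resources;
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attrs) = @_;
    # collect src/href values from script, img and link tags
    push @resources, values %attrs if $tag =~ /^(script|img|link)$/;
});
$parser->parse($page->content);
$parser->eof;

for my $link (@resources) {
    my $abs  = URI->new_abs($link, $url);      # resolve relative links
    my $file = ($abs->path_segments)[-1] || 'index';
    print "saving $abs as $file\n";
    $ua->get($abs, ':content_file' => $file);  # save alongside the HTML
}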
From bill at bfccomputing.com  Thu Jun 23 09:12:09 2005
From: bill at bfccomputing.com (Bill McGonigle)
Date: Thu, 23 Jun 2005 12:12:09 -0400
Subject: [Nh-pm] How to save complete web page not just text?
In-Reply-To: <007401c57801$850c3f80$0300a8c0@jameskel>
References: <007401c57801$850c3f80$0300a8c0@jameskel>
Message-ID: <885eca7cfc3cfb6677b85f381eef6c56@bfccomputing.com>

On Jun 23, 2005, at 10:40, James Kellndorfer wrote:

> or a way to use Firefox on Linux to do this?

File... Save Page As... ?

wget -r can help too.

There must be an LWP recipe out there?

-Bill

-----
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
bill at bfccomputing.com           Mobile: 603.252.2606
http://www.bfccomputing.com/    Pager: 603.442.1833
AIM: wpmcgonigle                Text: bill+text at bfccomputing.com

For fastest support contact, please follow:
http://bfccomputing.com/support_contact.html

From jameskel at adelphia.net  Thu Jun 23 18:46:22 2005
From: jameskel at adelphia.net (James Kellndorfer)
Date: Thu, 23 Jun 2005 21:46:22 -0400
Subject: [Nh-pm] How to save complete web page not just text?
References: <007401c57801$850c3f80$0300a8c0@jameskel>
	<885eca7cfc3cfb6677b85f381eef6c56@bfccomputing.com>
Message-ID: <005701c5785e$86bfc1a0$0300a8c0@jameskel>

"wget -r" or any other combination of command-line options doesn't
work: wget doesn't resolve JavaScript, so a complete download of this
web page doesn't seem possible with wget alone.

As for Firefox's Save Page As, that works fine manually.  Now the trick
is: how do I code this so that I can accomplish the task automatically?

As for LWP, I was hoping someone in the group would know how to drive a
JavaScript interpreter so that a complete web page can be rendered
using Perl.

JK

----- Original Message -----
From: "Bill McGonigle"
To: "James Kellndorfer"
Cc:
Sent: Thursday, June 23, 2005 12:12 PM
Subject: Re: [Nh-pm] How to save complete web page not just text?

From bill at bfccomputing.com  Fri Jun 24 06:54:57 2005
From: bill at bfccomputing.com (Bill McGonigle)
Date: Fri, 24 Jun 2005 09:54:57 -0400
Subject: [Nh-pm] How to save complete web page not just text?
In-Reply-To: <005701c5785e$86bfc1a0$0300a8c0@jameskel>
References: <007401c57801$850c3f80$0300a8c0@jameskel>
	<885eca7cfc3cfb6677b85f381eef6c56@bfccomputing.com>
	<005701c5785e$86bfc1a0$0300a8c0@jameskel>
Message-ID: <205a02464ac0ca8bf5f3270441aa78c7@bfccomputing.com>

On Jun 23, 2005, at 21:46, James Kellndorfer wrote:

> "wget -r" or any other combination of command-line options doesn't
> work: wget doesn't resolve JavaScript, so a complete download of this
> web page doesn't seem possible with wget alone.

Ah, so the page in question is loading content dynamically with some
JavaScript/DOM programming?

JavaScript::SpiderMonkey might be useful, as would perhaps the Perl
XPCOM project.  None of these is a complete solution - it would be
helpful to know what you're trying to do (e.g. what kinds of data
you're accessing) to recommend a solution.  Unless you're trying to
scrape JavaScript-guarded e-mail addresses, that is.

A faster solution might be AppleScripting Firefox or Camino.

-Bill

-----
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
bill at bfccomputing.com           Mobile: 603.252.2606
http://www.bfccomputing.com/    Pager: 603.442.1833
AIM: wpmcgonigle                Text: bill+text at bfccomputing.com

For fastest support contact, please follow:
http://bfccomputing.com/support_contact.html
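A note on JavaScript::SpiderMonkey, since it comes up above: it embeds
Mozilla's JS engine and gives you eval of JavaScript from Perl, but no
browser DOM, so any document objects a page expects have to be stubbed
in by hand from the Perl side.  A rough sketch along the lines of the
module's documented interface (the document.write capture is
illustrative only, not a page-rendering recipe):

#!/usr/bin/perl
# Rough sketch: evaluate JavaScript with JavaScript::SpiderMonkey and
# capture document.write() output.  There is no real DOM here; every
# object the script touches must be stubbed in from Perl.
use strict;
use warnings;
use JavaScript::SpiderMonkey;

my $js = JavaScript::SpiderMonkey->new();
$js->init();

# Stub out just enough of "document" to catch writes.
my $doc      = $js->object_by_path("document");
my $captured = "";
$js->function_set("write", sub { $captured .= join "", @_ }, $doc);

# In real use this would be the <script> text pulled from the page.
$js->eval(q{ document.write("<p>generated markup</p>"); });
$js->destroy();

print $captured, "\n";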
From clark_k at pannaway.com  Fri Jun 24 07:52:43 2005
From: clark_k at pannaway.com (Kevin D. Clark)
Date: Fri, 24 Jun 2005 10:52:43 -0400
Subject: [Nh-pm] How to save complete web page not just text?
In-Reply-To: <005701c5785e$86bfc1a0$0300a8c0@jameskel> (James
	Kellndorfer's message of "Thu, 23 Jun 2005 21:46:22 -0400")
References: <007401c57801$850c3f80$0300a8c0@jameskel>
	<885eca7cfc3cfb6677b85f381eef6c56@bfccomputing.com>
	<005701c5785e$86bfc1a0$0300a8c0@jameskel>
Message-ID:

"James Kellndorfer" writes:

> "wget -r" or any other combination of command-line options doesn't
> work.

Does wget's "--follow-tags" help?  (like, for example, maybe you need
to add "script" to the list of tags that wget knows to follow)

Does wget's "--page-requisites" help?

How about:

  wget -E -H -k -K -p your-url.html

--kevin
-- 
GnuPG ID: B280F24E                  And the madness of the crowd
alumni.unh.edu!kdc                  Is an epileptic fit
                                       -- Tom Waits

From jameskel at adelphia.net  Fri Jun 24 10:13:07 2005
From: jameskel at adelphia.net (James Kellndorfer)
Date: Fri, 24 Jun 2005 13:13:07 -0400
Subject: [Nh-pm] How to save complete web page not just text?
References: <007401c57801$850c3f80$0300a8c0@jameskel>
	<885eca7cfc3cfb6677b85f381eef6c56@bfccomputing.com>
	<005701c5785e$86bfc1a0$0300a8c0@jameskel>
	<205a02464ac0ca8bf5f3270441aa78c7@bfccomputing.com>
Message-ID: <001701c578df$fddd3520$0300a8c0@jameskel>

I have a very simple task.

I'm just trying to save this complete web page:
http://marketrac.nyse.com/ot/ordertrac_detail.html

I can save this manually, but I'm wondering how to do it automatically
(i.e. a Perl-based cron job, or some other method).  With other web
pages, I simply used lynx, wget, curl, or LWP.  I am trying to
understand why this web page is so special, and to learn how it is
rendered, in an effort to develop a solution.  Once the page is saved,
the data lies within one of the .js files, which I can easily "rip".

I am unfamiliar with AppleScripting Firefox or Camino, and I'll embark
on learning these things.  (Is AppleScripting only for Mac users?)  As
for JavaScript::SpiderMonkey or XPCOM, again I'll investigate and learn
what I can before asking for more help.  I don't even know what these
things are.

Thanks for some direction,
JK

----- Original Message -----
From: "Bill McGonigle"
To: "James Kellndorfer"
Cc:
Sent: Friday, June 24, 2005 9:54 AM
Subject: Re: [Nh-pm] How to save complete web page not just text?
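If the real goal is just the data file on a schedule, then once its URL
is known, a cron-driven fetch needs nothing beyond LWP::Simple.  The
NYA.js path below is the one that surfaces later in this thread, and it
is assumed stable here purely for illustration -- verify that before
relying on it:

#!/usr/bin/perl
# Sketch for a cron-driven grab of just the data file.  Assumes the
# NYA.js path that turns up later in this thread is stable.
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use POSIX qw(strftime);

my $url = 'http://marketrac.nyse.com/data/siac/OrderTrac/staticFiles/js/NYA.js';

# Timestamp each snapshot so successive cron runs don't clobber
# each other.
my $file = strftime("NYA-%Y%m%d-%H%M.js", localtime);

my $status = getstore($url, $file);
is_success($status) or die "fetch failed with HTTP status $status\n";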
From andrew at broscom.com  Fri Jun 24 10:41:49 2005
From: andrew at broscom.com (Andrew Brosnan)
Date: Fri, 24 Jun 2005 13:41:49 -0400
Subject: [Nh-pm] How to save complete web page not just text?
Message-ID:

On 6/24/05 at 1:13 PM, jameskel at adelphia.net (James Kellndorfer)
wrote:

> I have a very simple task.
> 
> I'm just trying to save this complete web page:
> http://marketrac.nyse.com/ot/ordertrac_detail.html
> 

This fetches exactly the same thing as the source viewed in my browser:

#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;

my $pg = get("http://marketrac.nyse.com/ot/ordertrac_detail.html");
print "Here is the page: \n\n $pg";

The following one-liner also works fine:

perl -MLWP::Simple -e 'getprint "http://marketrac.nyse.com/ot/ordertrac_detail.html"'

Regards,

Andrew

-- 
Andrew Brosnan - Broscom LLC - 1 207 925-1156
andrew at broscom.com - http://www.broscom.com
Websites, Hosting, Programming, Consulting

From jameskel at adelphia.net  Fri Jun 24 11:02:01 2005
From: jameskel at adelphia.net (James Kellndorfer)
Date: Fri, 24 Jun 2005 14:02:01 -0400
Subject: [Nh-pm] How to save complete web page not just text?
References:
Message-ID: <001701c578e6$d2b4c280$0300a8c0@jameskel>

Thank you for trying this.  Yes, I know this works, but I'm not
interested in the textual portion of the web page.

If you view the web page in your browser you'll see 80 five-minute
periods.  Each period contains data.  These data are rendered using a
graphic and web forms; the input boxes display the data as you move
your mouse.  This is all controlled by JavaScript.

The web page source that you retrieved using LWP will not contain the
.js files that hold the data for each five-minute period.  The HTML
source LWP retrieves is the textual portion of the web page only.  The
data lies within a secondary file named NYA.js, which is not retrieved,
as LWP doesn't implement a JavaScript interpreter.

So, I'm not interested in retrieving the text; that's why I'm
interested in retrieving a complete web page.  I would be happy if
someone could help me create a "single file format" .mht or .maf
(Mozilla Archive Format) file using Perl.  I'm using Linux FC4.

JK

----- Original Message -----
From: "Andrew Brosnan"
To: "James Kellndorfer"
Sent: Friday, June 24, 2005 1:40 PM
Subject: Re: [Nh-pm] How to save complete web page not just text?
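On the .mht question: an .mht file is essentially a MIME
multipart/related message whose parts carry Content-Location headers,
so MIME::Lite can assemble one.  The following is a sketch under that
assumption, not a tested .mht writer; the resource list is a
placeholder that a real version would fill by parsing the fetched HTML:

#!/usr/bin/perl
# Sketch: build an .mht-style archive by hand.  An .mht file is a MIME
# multipart/related message with a Content-Location header on each
# part.  @also is a placeholder for script/image URLs parsed out of
# the fetched page.
use strict;
use warnings;
use LWP::Simple qw(get);
use MIME::Lite;

my $page = 'http://marketrac.nyse.com/ot/ordertrac_detail.html';
my @also = ();   # e.g. the .js URLs referenced by the page

my $mht = MIME::Lite->new(
    Subject => 'Archived page',
    Type    => 'multipart/related',
);

for my $url ($page, @also) {
    my $body = get($url);
    next unless defined $body;
    my $part = MIME::Lite->new(
        Type => ($url =~ /\.js$/ ? 'application/x-javascript'
                                 : 'text/html'),
        Data => $body,
    );
    $part->add('Content-Location' => $url);  # lets a viewer resolve links
    $mht->attach($part);
}

open my $out, '>', 'page.mht' or die "page.mht: $!";
print {$out} $mht->as_string;
close $out;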
From clark_k at pannaway.com  Fri Jun 24 11:16:20 2005
From: clark_k at pannaway.com (Kevin D. Clark)
Date: Fri, 24 Jun 2005 14:16:20 -0400
Subject: [Nh-pm] How to save complete web page not just text?
In-Reply-To: <001701c578e6$d2b4c280$0300a8c0@jameskel> (James
	Kellndorfer's message of "Fri, 24 Jun 2005 14:02:01 -0400")
References: <001701c578e6$d2b4c280$0300a8c0@jameskel>
Message-ID:

"James Kellndorfer" writes:

> The web page source that you retrieved using LWP will not contain the
> .js files that hold the data for each five-minute period.

$ wget -E -H -k -K -p http://marketrac.nyse.com/ot/ordertrac_detail.html

....

FINISHED --14:11:41--
Downloaded: 69,000 bytes in 37 files
Converting marketrac.nyse.com/ot/ordertrac_detail.html... 50-3
Converting www.nyse.com/404.html... 1-0
Converted 2 files in 0.00 seconds.
$ ls
marketrac.nyse.com/  www.nyse.com/
$ find . -name \*.js
./marketrac.nyse.com/ot/_rnd.js
./marketrac.nyse.com/ot/_locate.js
./marketrac.nyse.com/data/siac/OrderTrac/staticFiles/js/snapshot.js
./marketrac.nyse.com/data/siac/OrderTrac/staticFiles/js/NYA.js
./www.nyse.com/redirect.js
$

Aren't those what you want?  snapshot.js and NYA.js have a lot of data
in them.

Regards,

--kevin
-- 
GnuPG ID: B280F24E                  And the madness of the crowd
alumni.unh.edu!kdc                  Is an epileptic fit
                                       -- Tom Waits

From jameskel at adelphia.net  Fri Jun 24 11:39:19 2005
From: jameskel at adelphia.net (James Kellndorfer)
Date: Fri, 24 Jun 2005 14:39:19 -0400
Subject: [Nh-pm] How to save complete web page not just text?
References: <001701c578e6$d2b4c280$0300a8c0@jameskel>
Message-ID: <003b01c578ec$093dfec0$0300a8c0@jameskel>

Yes, those are the .js files I'm looking for.  Thank you.

Darn it, my wget isn't working.  Your command, which I tried days ago,
didn't work for me.  Thank you for helping isolate the problem.  I'm
glad wget does indeed work; this is the simplest way to get at the
data.  I'll let you know if I get the .js files after I reinstall wget.

BTW, I have wget version 1.9.

Thanks,
JK

----- Original Message -----
From: "Kevin D. Clark"
To: "James Kellndorfer"
Cc:
Sent: Friday, June 24, 2005 2:16 PM
Subject: Re: [Nh-pm] How to save complete web page not just text?
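As for "ripping" the data once NYA.js is on disk: the thread never
shows that file's internal format, so any parser here is guesswork.
Assuming, purely for illustration, that it holds JavaScript array
literals of numbers, a first pass could look like this:

#!/usr/bin/perl
# Illustrative only: the real format of NYA.js is not shown in the
# thread.  This assumes JavaScript array literals of numbers, e.g.
#   var prices = [10481.60, 10479.25, ...];
use strict;
use warnings;

my $file = shift || 'NYA.js';
open my $fh, '<', $file or die "$file: $!";
local $/;                      # slurp the whole file
my $js = <$fh>;
close $fh;

# Pull out each "var name = [ ... ]" and count its numeric values.
while ($js =~ /var\s+(\w+)\s*=\s*\[([^\]]*)\]/g) {
    my ($name, $list) = ($1, $2);
    my @values = $list =~ /-?\d+(?:\.\d+)?/g;
    print "$name: ", scalar(@values), " values\n";
}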
From clark_k at pannaway.com  Fri Jun 24 12:19:41 2005
From: clark_k at pannaway.com (Kevin D. Clark)
Date: Fri, 24 Jun 2005 15:19:41 -0400
Subject: [Nh-pm] How to save complete web page not just text?
In-Reply-To: <003b01c578ec$093dfec0$0300a8c0@jameskel> (James
	Kellndorfer's message of "Fri, 24 Jun 2005 14:39:19 -0400")
References: <001701c578e6$d2b4c280$0300a8c0@jameskel>
	<003b01c578ec$093dfec0$0300a8c0@jameskel>
Message-ID: <1864w3h8aa.fsf@pannaway.com>

James Kellndorfer writes:

> Yes, those are the .js files I'm looking for.  Thank you.

Hey, you're welcome.

> BTW, I have wget version 1.9.

I have "GNU Wget 1.8.2".

Regards,

--kevin
-- 
GnuPG ID: B280F24E                  And the madness of the crowd
alumni.unh.edu!kdc                  Is an epileptic fit
                                       -- Tom Waits

From jameskel at adelphia.net  Fri Jun 24 14:10:00 2005
From: jameskel at adelphia.net (James Kellndorfer)
Date: Fri, 24 Jun 2005 17:10:00 -0400
Subject: [Nh-pm] How to save complete web page not just text?
References: <001701c578e6$d2b4c280$0300a8c0@jameskel>
Message-ID: <004501c57901$1572cb20$0300a8c0@jameskel>

I got wget to work properly.  As far as I'm concerned, using wget suits
my needs.  Problem solved.

Thanks,
JK

----- Original Message -----
From: "Kevin D. Clark"
To: "James Kellndorfer"
Cc:
Sent: Friday, June 24, 2005 2:16 PM
Subject: Re: [Nh-pm] How to save complete web page not just text?
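To close the loop on the original "Perl-based cron job" idea: with wget
doing the heavy lifting, the automation can be a thin Perl wrapper for
cron, along these lines (the target directory and script name are
placeholders, not anything from the thread):

#!/usr/bin/perl
# Thin wrapper around the wget invocation from this thread, suitable
# for cron.  Each run lands in its own timestamped directory.
use strict;
use warnings;
use POSIX qw(strftime);

my $url = 'http://marketrac.nyse.com/ot/ordertrac_detail.html';
my $dir = strftime('/var/tmp/ordertrac/%Y%m%d-%H%M', localtime);

my @cmd = ('wget', '-q', '-E', '-H', '-k', '-K', '-p', '-P', $dir, $url);
system(@cmd) == 0
    or die "wget exited with status " . ($? >> 8) . "\n";

A crontab entry such as

*/5 10-16 * * 1-5 /usr/local/bin/fetch_ordertrac.pl

would then sample the page every five minutes through the trading day.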