From dbii at mudpuddle.com Sun Apr 11 18:23:37 2004 From: dbii at mudpuddle.com (David Bluestein II) Date: Mon Aug 2 21:23:24 2004 Subject: APM: April's Topic Message-ID: <20040411232337.GF20627@mudpuddle.com> Mark- The web page for the meeting needs to be updated to show the April meeting date, it still shows the March one. I know it'll be a great topic, as we did a discussion of points at last months dinner only meeting. David From mlehmann at marklehmann.com Wed Apr 14 12:27:36 2004 From: mlehmann at marklehmann.com (Mark Lehmann) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Meeting next Wednesday 7:00pm Perl 6 Message-ID: <16509.29832.338360.170704@lehmbrain.marklehmann.com> Our next meeting is Wednesday April 21st at 7:00pm. The topic is "Perl 6." Meeting place to be determined. If you would like to recommend a place that is convenient to get to, has a computer projector, an internet connection, and free, please tell me. -- Mark Lehmann email mlehmann@marklehmann.com | phone 512 689-7705 From goldilox at teachnet.edb.utexas.edu Sat Apr 17 02:01:22 2004 From: goldilox at teachnet.edb.utexas.edu (Goldilox) Date: Mon Aug 2 21:23:24 2004 Subject: APM: LWP Question Message-ID: I recently moved a website from a shared Windows box running Activestate Perl (unknown version) to a different shared box running: SERVER_SOFTWARE="Apache/1.3.29 (Unix) AuthMySQL/2.20 FrontPage/4.0.4.3 PHP-CGI/0.1b" Perl Version: 5.008 (if that helps). I was using LWP::Simple on the Activestate box for a script - and I never bothered to get the version number of Perl for that box, but now, on this new Unix box, I get this error message: Can't locate LWP/Simple.pm in @INC (@INC contains: /usr/local/lib/perl5/5.8.0/i686-linux /usr/local/lib/perl5/5.8.0 /usr/local/lib/perl5/site_perl/5.8.0/i686-linux /usr/local/lib/perl5/site_perl/5.8.0 /usr/local/lib/perl5/site_perl .) at updatedata.pl line 4. BEGIN failed--compilation aborted at updatedata.pl line 4. I thought LWP was one of the standard modules installed with Perl. Am I missing something really obvious here? And if LWP really isn't there, what's the Internet API module most likely going to be? Thanks for any help. Rhett From ian at remmler.org Sat Apr 17 08:07:20 2004 From: ian at remmler.org (Ian Remmler) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Meeting next Wednesday 7:00pm Perl 6 In-Reply-To: <16509.29832.338360.170704@lehmbrain.marklehmann.com> References: <16509.29832.338360.170704@lehmbrain.marklehmann.com> Message-ID: <20040417130720.GA6321@remmler.org> On Wed, Apr 14, 2004 at 12:27:36PM -0500, Mark Lehmann wrote: > Meeting place to be determined. If you would like to recommend a place that > is convenient to get to, has a computer projector, an internet connection, > and free, please tell me. I found out at the CACTUS meeting on Thursday that we most likely can meet at ARL. I still need to speak with the person in charge of reservations, but apparently they just want someone who works there (which I conveniently do) to be at the meetings. I don't know if next Wednesday is doable, but I'll see what I can do. -- Ian Remmler | A monk asked Joshu, "Has a dog Buddha ian@remmler.org | nature or not?" Joshu replied, "Mu!" 
http://remmler.org | -- Mumon, "The Gateless Gate"

From peterbotros at yahoo.com Sat Apr 17 15:49:50 2004
From: peterbotros at yahoo.com (Peter botros)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: FTP
In-Reply-To: <200402251800.i1PI06118684@mail.pm.org>
Message-ID: <20040417204950.9947.qmail@web20607.mail.yahoo.com>

I am looking for FTP scripts to send every 15 min if the files are in a
directory, and/or to run scripts if files are in a directory.

Thanks

=====
Peter Botros

__________________________________
Do you Yahoo!?
Yahoo! Photos: High-quality 4x6 digital prints for 25¢
http://photos.yahoo.com/ph/print_splash

From goldilox at teachnet.edb.utexas.edu Sun Apr 18 00:12:30 2004
From: goldilox at teachnet.edb.utexas.edu (Goldilox)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: LWP Question
In-Reply-To: <20040417223026.49423.qmail@web20405.mail.yahoo.com>
References: <20040417223026.49423.qmail@web20405.mail.yahoo.com>
Message-ID:

Is there any other Internet API that comes standard with the Linux
distributions (since LWP is not standard - this being a shared box I have
referred to, I am not sure how easy it would be to get them to add the module
I need)? Is there any way to find out what the default modules included in a
distribution are? I searched around, but I couldn't seem to find it in the
documentation - do I have to install it on my own box to find out?

Thanks again
Rhett

Bill Raty writes:

>LWP comes installed with the ActiveState distribution. It
>doesn't always seem to be installed standard on many Linux
>distributions.
>
>It's easy enough to get -- fire up CPAN and have it install it.
>
> perl -MCPAN -e shell # to fire up cpan
>
>
>-Bill
>
>--- Goldilox wrote:
>> I recently moved a website from a shared Windows box running
>> Activestate Perl
>> (unknown version) to a different shared box running:
>> SERVER_SOFTWARE="Apache/1.3.29 (Unix) AuthMySQL/2.20
>> FrontPage/4.0.4.3
>> PHP-CGI/0.1b"
>> Perl Version: 5.008
>> (if that helps).
>> I was using LWP::Simple on the Activestate box for a script -
>> and I never
>> bothered to get the version number of Perl for that box, but
>> now, on this new
>> Unix box, I get this error message:
>> Can't locate LWP/Simple.pm in @INC
>> (@INC contains:
>> /usr/local/lib/perl5/5.8.0/i686-linux
>> /usr/local/lib/perl5/5.8.0
>> /usr/local/lib/perl5/site_perl/5.8.0/i686-linux
>> /usr/local/lib/perl5/site_perl/5.8.0
>> /usr/local/lib/perl5/site_perl .) at updatedata.pl line 4.
>> BEGIN failed--compilation aborted at updatedata.pl line 4.
>>
>> I thought LWP was one of the standard modules installed with
>> Perl. Am I missing
>> something really obvious here? And if LWP really isn't there,
>> what's the
>> Internet API module most likely going to be?
>> Thanks for any help.
>> Rhett
>>
>> _______________________________________________
>> Austin mailing list
>> Austin@mail.pm.org
>> http://mail.pm.org/mailman/listinfo/austin
>
>
>=====
>Let's not elect Bush in '04 either.
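(A quick way to answer the "what do I actually have installed here?" question above, from a shell prompt on the box in question -- a minimal sketch, assuming a stock perl 5.8; ExtUtils::Installed ships with perl and lists the add-on distributions recorded in .packlist files, so it won't enumerate the core modules themselves:)

    # is LWP::Simple present at all?  prints its version, or dies with the
    # familiar "Can't locate LWP/Simple.pm in @INC" error if it is missing
    perl -MLWP::Simple -le 'print LWP::Simple->VERSION'

    # list the add-on distributions this perl knows about
    perl -MExtUtils::Installed -le 'print for ExtUtils::Installed->new->modules'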
From msouth at shodor.org Sun Apr 18 00:35:41 2004
From: msouth at shodor.org (Mike South)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: FTP
In-Reply-To: <20040417204950.9947.qmail@web20607.mail.yahoo.com>
References: <20040417204950.9947.qmail@web20607.mail.yahoo.com>
Message-ID: <408213AD.mail9SZ1346SA@scan.shodor.org>

>From austin-bounces@mail.pm.org Sat Apr 17 16:50:01 2004
>Date: Sat, 17 Apr 2004 13:49:50 -0700 (PDT)
>From: Peter botros
>Subject: APM: FTP
>
>I am looking for FTP scripts to send every 15 min if the
>files are in a directory and/or to run scripts
>if files are in a directory

I am assuming you are wanting something that "reacts", so to speak, to files
being in a directory, and, when it sees some, does something to them (where
"something" includes transferring them out of the directory so that they
don't trigger a re-run or otherwise hang around in the way).

We have to do something like that, and maybe I can save you some headaches by
describing what I think we do (I haven't seen it firsthand, just know the
description).

First, for the "every fifteen minutes" part, we use cron. That way you can
just have a script that does whatever is supposed to be done with the files,
and you won't have to write the "every fifteen minutes" part, or make sure it
gets started again when the system reboots, or whatever.

Second, we have a lockfile that prevents two instances of the script from
running at the same time. One day, you might have so much going on that your
script isn't done in fifteen minutes, and then cron fires off another run of
your script and all hell breaks loose as they both try to work on the same
files.

Third, we don't look for "files in the directory", but "a trigger file in
that directory". The trigger file lists all the files that are to be
processed. The point here is that the trigger file gets transferred into the
directory with the other files, but it gets transferred last. Why? Because
sooner or later your "every fifteen minutes" is going to wake the script up
right in the middle of a file getting dumped into the directory, and then
you'll do your work on half a file.

So, something like this goes in your crontab:

0-59/15 * * * * /home/msouth/bin/handle_files.pl

handle_files.pl would be something like this:

# UNTESTED UNTESTED UNTESTED
#!/usr/bin/perl -w
use strict;

# put a lockfile in the same place as us named
# same thing as us with '.lock' appended
my @files_to_unlink;
my $lockfile = $0 . '.lock';
if (-e $lockfile ) {
    warn "a lockfile exists, I'm not running\n";
    exit;
    # would be better to see if the file just didn't get
    # cleaned up, and wipe it out if that's the case.
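    # (a rough, untested sketch of that stale-lock check -- it would replace
    #  the unconditional exit above, and it assumes the PID was written into
    #  the lockfile, as the else branch below does:
    #
    #    if (open(OLDLOCK, "<$lockfile")) {
    #        chomp(my $old_pid = <OLDLOCK>);
    #        close OLDLOCK;
    #        if ($old_pid and kill(0, $old_pid)) {
    #            warn "process $old_pid still holds the lock, exiting\n";
    #            exit;
    #        }
    #        warn "removing stale lockfile left by dead process $old_pid\n";
    #        unlink $lockfile;
    #        # after removing a stale lock you would still want to create a
    #        # fresh one, as in the else branch below
    #    }
    # )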
    #
    # you can probably "kill 0, PID" or something to see
    # if the PID that put the lockfile there is still
    # running, and just wipe out the file if it isn't
    # (that is, if you put the PID in the lockfile)
    # Also, in real life you will probably have to
    # keep the lockfile somewhere else, because
    # the directory where the script lives is likely not
    # to be writeable
} else {
    open(LOCK, ">$lockfile") or die "couldn't open lockfile:$!\n";
    # put the PID in the lockfile so future instances of
    # this script can check whether we are still running
    print LOCK "$$\n";
    close LOCK;
    push @files_to_unlink, $lockfile;
}

my $dir = '/home/msouth/dump';
my $trigger = "$dir/trigger.txt";

&cleanup_and_exit unless ( -e $trigger );

open (TRIGGER, "<$trigger") or die "couldn't open $trigger:$!\n";
chomp( my @lines = <TRIGGER> );
close TRIGGER;

my $saw_end = 0;
foreach my $line (reverse @lines) {
    if ($line eq 'END_FILES') {
        $saw_end++;
        last;
    }
}
unless ($saw_end) {
    warn qq{trigger file $trigger is missing "END_FILES" line. I am bailing, hopefully it's still being transferred\n};
    &cleanup_and_exit;
}

shift @lines while $lines[0] ne 'BEGIN_FILES';
unless (@lines) {
    warn "trigger file $trigger does not have 'BEGIN_FILES', this is not good\n";
    &cleanup_and_exit;
}
shift @lines; # $lines[0] is just 'BEGIN_FILES', remember

foreach my $line (@lines) {
    next if $line =~ /^\s*#/;
    last if $line eq 'END_FILES';
    my $this_file = "$dir/$line";
    &process_file($this_file);
    push @files_to_unlink, $this_file;
}

push @files_to_unlink, $trigger;
&cleanup_and_exit;

sub process_file {
    my $file = shift;
    if (system "cat $file >> /home/msouth/dump/all_dumped_files") {
        warn "$file didn't process\n";
        # cp file to error directory
    } else {
        # cp file to success directory
    }
}

sub cleanup_and_exit {
    unlink $_ for @files_to_unlink;
    exit(0);
}

__END__

Then you can use a trigger file like this:

BEGIN_FILES
yo
ya
ye
END_FILES

good luck,
mike

From dbii at mudpuddle.com Mon Apr 19 00:35:43 2004
From: dbii at mudpuddle.com (David Bluestein II)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Meeting next Wednesday 7:00pm Perl 6
In-Reply-To: <20040417130720.GA6321@remmler.org>
References: <16509.29832.338360.170704@lehmbrain.marklehmann.com> <20040417130720.GA6321@remmler.org>
Message-ID: <20040419053543.GA20627@mudpuddle.com>

I suggest ServerGraph if we can still meet there. Since it is up on the site
as ServerGraph, changing it for April may lead some people to go to the wrong
location. Then we can always check on moving the meeting in May (though I
like the current location).

Also, changing in April means I don't know where dinner hour would be :(

David

On Sat, Apr 17, 2004 at 08:07:20AM -0500, Ian Remmler wrote:
> On Wed, Apr 14, 2004 at 12:27:36PM -0500, Mark Lehmann wrote:
> > Meeting place to be determined. If you would like to recommend a place that
> > is convenient to get to, has a computer projector, an internet connection,
> > and free, please tell me.
>
> I found out at the CACTUS meeting on Thursday that we most likely
> can meet at ARL. I still need to speak with the person in charge
> of reservations, but apparently they just want someone who works
> there (which I conveniently do) to be at the meetings. I don't
> know if next Wednesday is doable, but I'll see what I can do.
>
> --
> Ian Remmler | A monk asked Joshu, "Has a dog Buddha
> ian@remmler.org | nature or not?" Joshu replied, "Mu!"
> http://remmler.org | -- Mumon, "The Gateless Gate" > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin From austin.pm at sam-i-am.com Mon Apr 19 09:45:13 2004 From: austin.pm at sam-i-am.com (Sam Foster) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? Message-ID: <4083E5F9.1040704@sam-i-am.com> (preface: my perl is fairly poor, perhaps fair on a good day. These are the kind of tasks I originally learnt perl for, but the sheer volume of the data is challenging me) I'm currently working with a fairly large set of data that consists of a deep filesystem directory structure, each directory having a (java-style) properties text file, along with miscellaneous directory contents. In addition there's an xml file for each that is our final output for delivery to the client. I've got some data clean-up to do, verification, reporting, and validation of the output against a schema. Lots of tree-crawling and text file parsing in other words. I'm in need of some performance tips. There's about 30,000 individual properties files (and a cross-references file in the same kind of format) - one for each directory. Simply crawling the tree and parsing each properties file is taking a while (an hour or more). Next up I need to fix some broken references (the xrefs file contains references like so: relatedLinks = [@/some/path/, @/someother/path] .) After that I'll need to verify and validate some xml output. Again, one file per directory. This data is on the local network, I'm working on a win2k box, having mapped a network drive. My machine is running activestate perl 5.8, with 1GB RAM, and a (single) 1600 mhz pentium processor. I've done a little benchmarking on parts of individual scripts, but I need a order of magnitude speed increase, not shaving micro-seconds off here and there. Any thoughts? I can attach a sample script if list protocol allows. thanks, Sam From jakulas at swbell.net Mon Apr 19 12:02:17 2004 From: jakulas at swbell.net (John Kulas) Date: Mon Aug 2 21:23:24 2004 Subject: APM: re: LWP Question Message-ID: <20040419170217.59461.qmail@web80603.mail.yahoo.com> LWP is not part of the base Perl installation. LWP is a Perl add-on package. It is often added on because it is so useful. I recommend you asking your sysadmin to add it to your new server. - John Kulas From erik at debill.org Mon Apr 19 12:12:28 2004 From: erik at debill.org (erik@debill.org) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <4083E5F9.1040704@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> Message-ID: <20040419171228.GA25971@debill.org> On Mon, Apr 19, 2004 at 09:45:13AM -0500, Sam Foster wrote: > I'm currently working with a fairly large set of data that consists of a > deep filesystem directory structure, each directory having a > (java-style) properties text file, along with miscellaneous directory > contents. In addition there's an xml file for each that is our final > output for delivery to the client. > > I've got some data clean-up to do, verification, reporting, and > validation of the output against a schema. Lots of tree-crawling and > text file parsing in other words. I'm in need of some performance tips. I'd start by processing each directory completely before moving on to the next one, if at all possible. Directory lookups on network filesystems can be surprisingly expensive, so doing everything in a single pass may be a win. 
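(For what it's worth, the "single pass" described above usually amounts to doing all the per-directory work inside one File::Find callback -- a rough sketch, not Sam's actual script; the *.properties name test, the "key = value" parsing, and the /path/to/data/root path are all assumptions:)

    use strict;
    use warnings;
    use File::Find;

    my %props_for;    # directory => { key => value, ... }

    find(sub {
        return unless -f $_ && /\.properties$/;
        my %props;
        open my $fh, '<', $_ or do { warn "can't read $File::Find::name: $!\n"; return };
        while (my $line = <$fh>) {
            chomp $line;
            next if $line =~ /^\s*$/ || $line =~ /^\s*[#!]/;   # blanks and comments
            my ($key, $val) = split /\s*[=:]\s*/, $line, 2;
            $props{$key} = $val if defined $val;
        }
        close $fh;
        $props_for{$File::Find::dir} = \%props;
        # ... any per-directory checking, fixing, and reporting goes here,
        # so the tree only has to be walked once ...
    }, '/path/to/data/root');

    printf "parsed properties for %d directories\n", scalar keys %props_for;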
Any chance of getting the files locally instead of via the network? I'm assuming SMB, if it was NFS I might be able to suggest some mount parameters to speed it up, but nothing beats a local disk. > There's about 30,000 individual properties files (and a cross-references > file in the same kind of format) - one for each directory. How deeply does this structure go? Some filesystems get bogged down when there are 1000s of files in a single directory. If all of these 30k directories are within a single parent directory just getting a list of them could be a serious slowdown. On Linux I try to avoid having more than a few hundred files in a directory if at all possible. > Simply crawling the tree and parsing each properties file is taking a > while (an hour or more). Next up I need to fix some broken references 30000/ 3600 = 8.3 files/sec. Not exactly blazing, but not incredibly slow either. > (the xrefs file contains references like so: relatedLinks = > [@/some/path/, @/someother/path] .) > After that I'll need to verify and validate some xml output. Again, one > file per directory. Does this mean you can't parallelize this? I suspect your script is spending a fair amount of time waiting for data. Running 2 copies in parallel each on its own subset of the directories might be a win (even with only a single processor to work with). Erik -- Humor soothes the savage cubicle monkey. -- J Jacques From austin.pm at sam-i-am.com Mon Apr 19 13:52:57 2004 From: austin.pm at sam-i-am.com (Sam Foster) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <20040419171228.GA25971@debill.org> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> Message-ID: <40842009.4020704@sam-i-am.com> erik@debill.org wrote: > I'd start by processing each directory completely before moving on to > the next one, if at all possible. Directory lookups on network > filesystems can be surprisingly expensive, so doing everything in a > single pass may be a win. I'm using File::Find, which I think does this by default. > Any chance of getting the files locally instead of via the network? > I'm assuming SMB, if it was NFS I might be able to suggest some mount > parameters to speed it up, but nothing beats a local disk. There's about 3-4 GB of data, that is being worked on collaboratively by a distributed team, so moving it isn't an option unfortunately. However, it is NFS... so what you got? >>There's about 30,000 individual properties files (and a cross-references >>file in the same kind of format) - one for each directory. > > How deeply does this structure go? Some filesystems get bogged down > when there are 1000s of files in a single directory. If all of these > 30k directories are within a single parent directory just getting a > list of them could be a serious slowdown. On Linux I try to avoid > having more than a few hundred files in a directory if at all possible. I have only 5-10 files in each directory. I'm using the pre-processing that File::Find offers to only visit the positive matches (FWIW) >>Simply crawling the tree and parsing each properties file is taking a >>while (an hour or more). Next up I need to fix some broken references > > 30000/ 3600 = 8.3 files/sec. Not exactly blazing, but not incredibly > slow either. I just tried benchmarking one of my scripts again (I called my &find from Benchmark::timeit) with a limited dataset, and got: 72 wallclock secs ( 0.21 usr + 1.24 sys = 1.45 CPU) @ 3. 45/s (n=5) which was after parsing just 160 files. 
2.22 files/sec. Not so stellar after all. I'll dig in to the module that's doing the parsing and see if there's an obvious culprit there. (starting with the bits I wrote :) > I suspect your script is > spending a fair amount of time waiting for data. Running 2 copies in > parallel each on its own subset of the directories might be a win > (even with only a single processor to work with). I didn't think of dividing up the directory list and simply running the same script again in parallel. I'll try that. Would forking achieve the same thing, or am I introducing unnecessary complexity? thanks, this was a help, Sam From erik at debill.org Mon Apr 19 15:19:30 2004 From: erik at debill.org (erik@debill.org) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <40842009.4020704@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> <40842009.4020704@sam-i-am.com> Message-ID: <20040419201930.GA26249@debill.org> On Mon, Apr 19, 2004 at 01:52:57PM -0500, Sam Foster wrote: > erik@debill.org wrote: > >filesystems can be surprisingly expensive, so doing everything in a > >single pass may be a win. > > I'm using File::Find, which I think does this by default. Ah. I'd assumed you were running that once for each step. As long as you only run it once you're good. > >Any chance of getting the files locally instead of via the network? > >I'm assuming SMB, if it was NFS I might be able to suggest some mount > >parameters to speed it up, but nothing beats a local disk. > > There's about 3-4 GB of data, that is being worked on collaboratively by > a distributed team, so moving it isn't an option unfortunately. However, > it is NFS... so what you got? I'm not sure what the exact options would be for NT, but you want to use tcp (instead of udp, which is a default lots of places), and crank the block size up. I use tcp,rsize=16000,wsize=16000 at home. Even larger block sizes are perfectly legit (I believe some companies default to 64000) and large sizes can save on the number of requests needed to transfer your data (as well as cutting down on actual read requests that get to the physical disks). Also, if you aren't defaulting to an async mount you might try that. I'm not sure how it interacts with NFS (for all I know they're always async) but it's usually a big throughput win to not wait for your writes to complete. > >>Simply crawling the tree and parsing each properties file is taking a > >>while (an hour or more). Next up I need to fix some broken references > > > >30000/ 3600 = 8.3 files/sec. Not exactly blazing, but not incredibly > >slow either. > > I just tried benchmarking one of my scripts again (I called my &find > from Benchmark::timeit) with a limited dataset, and got: > > 72 wallclock secs ( 0.21 usr + 1.24 sys = 1.45 CPU) @ 3. > 45/s (n=5) > > which was after parsing just 160 files. 2.22 files/sec. Not so stellar > after all. I'll dig in to the module that's doing the parsing and see if > there's an obvious culprit there. (starting with the bits I wrote :) 72 wall clock and only 1.45 CPU? Sounds like it's all IO wait. The good news is there's bound to be a way to make that go a lot faster :) Does it slow down as it handles more and more files? Is memory use growing? If your workstation goes into swap that would definitely cause a slowdown. > >parallel each on its own subset of the directories might be a win > >(even with only a single processor to work with). 
> > I didn't think of dividing up the directory list and simply running the > same script again in parallel. I'll try that. Would forking achieve the > same thing, or am I introducing unnecessary complexity? You could have the script fork a set number of times right at the beginning. You just need a way for each process to figure out what directories are its responsibility (even if it's "I only do odd numbered directories"). Easy to do if your directory names are relatively stable and predictable. I wouldn't modify the function that File::Find calls to fork(), since that's liable to make a fork bomb. > thanks, this was a help, Glad to help. Just let us know how things turn out. Erik -- Humor soothes the savage cubicle monkey. -- J Jacques From dbii at mudpuddle.com Mon Apr 19 16:26:10 2004 From: dbii at mudpuddle.com (David Bluestein II) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Simple BLOG w/RSS Message-ID: <20040419212610.GM20627@mudpuddle.com> Okay, I need to find a SIMPLE blog application, with RSS attached. Something for 3-4 people to use, post updated information (like a bulletin board), but with RSS so people can get the feeds if they want and know when it is updated. Any Perl based suggestions? I've looked over Blosxom (www.blosxom.com) and it looks like it fits the bill, but didn't know if anyone else had worked with a simple system (don't want a lot of overhead that comes with Movable Type) to setup and use. David From jakulas at swbell.net Tue Apr 20 12:27:48 2004 From: jakulas at swbell.net (John Kulas) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Re: Simple BLOG w/RSS In-Reply-To: <200404201700.i3KH05r32031@mail.pm.org> Message-ID: <20040420172748.86587.qmail@web80604.mail.yahoo.com> How about Twiki? See http://www.twiki.org/. Automatic text search, simple organization, access restriction if you want it, etc. - John Kulas From mlehmann at marklehmann.com Tue Apr 20 16:11:43 2004 From: mlehmann at marklehmann.com (Mark Lehmann) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Meeting tomorrow night at ServerGraph Message-ID: <16517.37391.258192.582099@lehmbrain.marklehmann.com> We are going to have the Perl Mongers meeting at ServerGraph tomorrow night at 7:00pm. As normal, we will be eating at the Pok-e-Jo's a block down 5th street from ServerGraph at 6:00pm. Please see the APM website (http://austin.pm.org/) for directions to ServerGraph. -- Mark Lehmann email mlehmann@marklehmann.com | phone 512 689-7705 From dbii at mudpuddle.com Wed Apr 21 08:54:14 2004 From: dbii at mudpuddle.com (David Bluestein II) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Re: Simple BLOG w/RSS In-Reply-To: <20040420172748.86587.qmail@web80604.mail.yahoo.com> References: <200404201700.i3KH05r32031@mail.pm.org> <20040420172748.86587.qmail@web80604.mail.yahoo.com> Message-ID: <20040421135414.GQ20627@mudpuddle.com> John- I've used Twiki before, but it doesn't quite meet the need. While simple, it is more difficult than the targetted end user (I'm not sure they would get Wikiwords) and we need something that is easy for them to put sequential text in. Also need a really good RSS feed mechanism. I looked at the Perl Module Kwiki too, but seemed to have too much extra that I didn't need. Thanks- David On Tue, Apr 20, 2004 at 10:27:48AM -0700, John Kulas wrote: > How about Twiki? See http://www.twiki.org/. > Automatic text search, simple organization, access > restriction if you want it, etc. 
> - John Kulas > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin From ian at remmler.org Fri Apr 23 09:07:22 2004 From: ian at remmler.org (Ian Remmler) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Tom Christiansen's pop in Message-ID: <20040423140722.GA18202@remmler.org> Here's a link to Tom's rare pop in on the perl6.language list that I mentioned at the meeting. Some of the other messages in the thread are equally, er, impressive... http://tinyurl.com/2jfun -- Ian Remmler | A monk asked Joshu, "Has a dog Buddha ian@remmler.org | nature or not?" Joshu replied, "Mu!" http://remmler.org | -- Mumon, "The Gateless Gate" From ian at remmler.org Fri Apr 23 09:15:59 2004 From: ian at remmler.org (Ian Remmler) Date: Mon Aug 2 21:23:24 2004 Subject: APM: ARL is a go Message-ID: <20040423141559.GB18202@remmler.org> I've scheduled the auditorium for Wednesday, May 19 from 7:00 to 9:00. We will have access to a projector and an ethernet port. It may be possible for someone to bring a wireless router and hook it up, but I'll have to check. -- Ian Remmler | A monk asked Joshu, "Has a dog Buddha ian@remmler.org | nature or not?" Joshu replied, "Mu!" http://remmler.org | -- Mumon, "The Gateless Gate" From eharris at puremagic.com Fri Apr 23 09:57:50 2004 From: eharris at puremagic.com (Evan Harris) Date: Mon Aug 2 21:23:24 2004 Subject: APM: ARL is a go In-Reply-To: <20040423141559.GB18202@remmler.org> Message-ID: Bah, who needs a wireless router? My notebook does a fine job of being an access point. Evan On Fri, 23 Apr 2004, Ian Remmler wrote: > I've scheduled the auditorium for Wednesday, May 19 from 7:00 to > 9:00. We will have access to a projector and an ethernet port. > It may be possible for someone to bring a wireless router and > hook it up, but I'll have to check. > > -- > Ian Remmler | A monk asked Joshu, "Has a dog Buddha > ian@remmler.org | nature or not?" Joshu replied, "Mu!" > http://remmler.org | -- Mumon, "The Gateless Gate" > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin > From austin.pm at sam-i-am.com Fri Apr 23 12:47:53 2004 From: austin.pm at sam-i-am.com (Sam Foster) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <20040419201930.GA26249@debill.org> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> <40842009.4020704@sam-i-am.com> <20040419201930.GA26249@debill.org> Message-ID: <408956C9.3030307@sam-i-am.com> So I'm still working on this one. Just now I ran a script that crawled a directory structure to identify "empty" directory (directories that had only some boiler plate properties files and no actual data) that produced a list of around 5 thousand matches. It took a while. Now I've taken that list, split it into 4 and given each piece to a rmtree script. I did this by cutting and pasting the lines into new text files, and creating new command prompts to start each instance of my script. This gives me 4 seperate processes running in parallel each tackling a part of the task. What I'd like is a wrapper that does this for me. I give it the script filename, the filelist and perhaps the number of clones to create, and have it basically do the above for me. But system calls wait for the process to finish before continuing so I'm not sure how to achieve this. I've looked at some forking code but I'll admit to being a little daunted. 
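(A rough sketch of that sort of wrapper -- untested, and every name in it (run_parallel.pl, worker.pl, filelist.txt, the .partN chunk files) is made up: it deals the list out into N chunk files, forks one child per chunk, execs the worker script on each, and waits for them all. One caveat: on ActivePerl/Win32 fork is emulated with threads, so a module such as Parallel::ForkManager, suggested a little further down the thread, or Win32::Process may behave better there.)

    #!/usr/bin/perl -w
    # usage: perl run_parallel.pl worker.pl filelist.txt 4
    use strict;

    my ($worker, $listfile, $nkids) = @ARGV;
    $nkids ||= 4;

    open my $fh, '<', $listfile or die "can't read $listfile: $!\n";
    chomp(my @items = <$fh>);
    close $fh;

    # deal the work items out round-robin into $nkids chunks
    my @chunks;
    push @{ $chunks[ $_ % $nkids ] }, $items[$_] for 0 .. $#items;

    my @pids;
    for my $i (0 .. $nkids - 1) {
        next unless $chunks[$i] && @{ $chunks[$i] };
        my $part = "$listfile.part$i";
        open my $out, '>', $part or die "can't write $part: $!\n";
        print $out "$_\n" for @{ $chunks[$i] };
        close $out;

        defined(my $pid = fork) or die "fork failed: $!\n";
        if ($pid == 0) {                  # child: run the worker on this chunk
            exec 'perl', $worker, $part;
            die "exec failed: $!\n";
        }
        push @pids, $pid;                 # parent just keeps launching
    }

    waitpid($_, 0) for @pids;             # block until every child has finished
    print scalar(@pids), " children finished\n";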
I also looked at Parallel::Jobs on cpan and took a stab at use it without success - the child processes weren't terminating and nor did they seem to be running in parallel. any pointers? Sam From rainking at feeding.frenzy.com Fri Apr 23 15:05:03 2004 From: rainking at feeding.frenzy.com (Dennis Moore) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <408956C9.3030307@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> <40842009.4020704@sam-i-am.com> <20040419201930.GA26249@debill.org> <408956C9.3030307@sam-i-am.com> Message-ID: <20040423200503.GA54835@feeding.frenzy.com> On Fri, Apr 23, 2004 at 12:47:53PM -0500, Sam Foster wrote: > So I'm still working on this one. > Just now I ran a script that crawled a directory structure to identify > "empty" directory (directories that had only some boiler plate > properties files and no actual data) that produced a list of around 5 > thousand matches. > It took a while. > Now I've taken that list, split it into 4 and given each piece to a > rmtree script. I did this by cutting and pasting the lines into new text > files, and creating new command prompts to start each instance of my > script. This gives me 4 seperate processes running in parallel each > tackling a part of the task. > > What I'd like is a wrapper that does this for me. I give it the script > filename, the filelist and perhaps the number of clones to create, and > have it basically do the above for me. > > But system calls wait for the process to finish before continuing so I'm > not sure how to achieve this. I've looked at some forking code but I'll > admit to being a little daunted. > > I also looked at Parallel::Jobs on cpan and took a stab at use it > without success - the child processes weren't terminating and nor did > they seem to be running in parallel. > > any pointers? http://hacks.dlux.hu/Parallel-ForkManager/ -- ;for (74,1970500640,1634627444,1751478816,1348825708,543711587, 1801810465){for($x=1<<1^1;$x>=1>>1;$x--) {$q=hex ff,$r=oct($x=~s,\d,$&* 10,e,$x),$x/=1/.1,$q<<=$r,$s.=chr (($_&$q)>>$r),$t++}}while($= ||= !$|) {$o=$o?$?:$/;$|=1;print $o?$s:$"x$t if$;;print"\b"x$t;sleep 1} From wwalker at bybent.com Fri Apr 23 20:38:33 2004 From: wwalker at bybent.com (Wayne Walker) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <4083E5F9.1040704@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> Message-ID: <20040424013833.GA1777@bybent.com> First, if you have the local disk space, then you should mirror the data, then parse it.walking directories on a net file system is slow. Rsync will allow you to mirror it once (SLOW) then mirror it again (much faster) as often as needed.. What is the maximum # of files/directories in any one directory? This has a large impact on performance, especially on networked disks. What is the size of the whole directory tree (in MBytes). On Mon, Apr 19, 2004 at 09:45:13AM -0500, Sam Foster wrote: > (preface: my perl is fairly poor, perhaps fair on a good day. These are > the kind of tasks I originally learnt perl for, but the sheer volume of > the data is challenging me) > > I'm currently working with a fairly large set of data that consists of a > deep filesystem directory structure, each directory having a > (java-style) properties text file, along with miscellaneous directory > contents. In addition there's an xml file for each that is our final > output for delivery to the client. 
> > I've got some data clean-up to do, verification, reporting, and > validation of the output against a schema. Lots of tree-crawling and > text file parsing in other words. I'm in need of some performance tips. > > There's about 30,000 individual properties files (and a cross-references > file in the same kind of format) - one for each directory. > Simply crawling the tree and parsing each properties file is taking a > while (an hour or more). Next up I need to fix some broken references > (the xrefs file contains references like so: relatedLinks = > [@/some/path/, @/someother/path] .) > After that I'll need to verify and validate some xml output. Again, one > file per directory. > > This data is on the local network, I'm working on a win2k box, having > mapped a network drive. My machine is running activestate perl 5.8, with > 1GB RAM, and a (single) 1600 mhz pentium processor. > > I've done a little benchmarking on parts of individual scripts, but I > need a order of magnitude speed increase, not shaving micro-seconds off > here and there. Any thoughts? > > I can attach a sample script if list protocol allows. > > thanks, > Sam > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin -- Wayne Walker wwalker@bybent.com Do you use Linux?! http://www.bybent.com Get Counted! http://counter.li.org/ Perl - http://www.perl.org/ Perl User Groups - http://www.pm.org/ Jabber IM: wwalker@jabber.phototropia.org AIM: lwwalkerbybent From chris at tooley.com Mon Apr 26 15:12:13 2004 From: chris at tooley.com (Chris Tooley) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Austin Geek Cruise Message-ID: <1083010333.6474.11.camel@localhost.localdomain> The other day my wife asked me if we could go on a cruise. Having never gotten to take honeymoon and with a father that owns a travel agency, I figured it was worth looking into. Turns out is a lot cheaper to do groups than individuals. This got me thinking about doing an Austin Geek Cruise. What transpired after talking to the travel agency for the day is something I wanted to propose to you all. I want to try to put this together for the fun of getting to take a trip, not for a profit. The trip would be a chartered bus from Austin to Galveston and back after the cruise. The rates are per person but there has to be two people in a cabin. If someone needs help with getting a cabin mate I'm sure that can be arranged. By all means bring that significant other. I'd never live through it if I didn't take my wife. It's a year out but for a group we have to start the process now. We have one of four of the speaker slots filled by Ray Ellis. He is going to speak about Aspect Oriented Programming. We are discussing arrangements with other speakers (no Mark, not woofers, or tweaters, or even mid range :)). If this is something people are interested in please reply to me directly. If I get no interest I promise I'll drop it. If it looks like it will work I'll probably expand it to other technology user groups in Austin or Central Texas. I'm not really opposed to people from outside the area joining in but it's a package price that's broken down here for the purposes of full disclosure. We need about 30 double occupancy cabins to make everything work. That's a decent sized group but it will be fun to take over a cruise ship. There's already been talk of putting together an insta-cluster to create the world's fastest floating supercomputer cluster. 
If you're interested please go here, and take a look at it:
http://www.carsontravel.com/AustinGeekCruise/

Don't mind the horrible HTML, I stole a lot of it from Carnival. :)

--
Chris Tooley
Home

From chris at tooley.com Tue Apr 27 15:37:18 2004
From: chris at tooley.com (Chris Tooley)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Austin Geek Cruise - Randal L Schwartz is coming
In-Reply-To: <1083010333.6474.11.camel@localhost.localdomain>
References: <1083010333.6474.11.camel@localhost.localdomain>
Message-ID: <1083098238.25775.35.camel@ws017.ltsp>

So I got it all worked out: Randal Schwartz is going with us. He doesn't know
what he's going to talk about just yet, either something Perl or something
Photoshop.

If you don't know who Randal Schwartz is, check out his site here:
http://www.stonehenge.com/merlyn/

Suffice it to say, he's a Perl Hacker.

Chris Tooley

From austin.pm at sam-i-am.com Wed Apr 28 09:21:12 2004
From: austin.pm at sam-i-am.com (Sam Foster)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: processing lots of files?
In-Reply-To: <20040424013833.GA1777@bybent.com>
References: <4083E5F9.1040704@sam-i-am.com> <20040424013833.GA1777@bybent.com>
Message-ID: <408FBDD8.8040000@sam-i-am.com>

Wayne Walker wrote:
> First, if you have the local disk space, then you should mirror the
> data, then parse it.walking directories on a net file system is slow.

I have the disk space, but not the time to mirror it. Though the rsync tip is
a good one and would mitigate this.

So far I've used activestate's perlapp to make an executable of each script
that I can drop on the server and run locally. That's really helped
performance enormously. I'll be stumping up the $100 for their PDK I think.

I also looked into Parallel::ForkManager and got some test scripts running,
but I'll need to spend more time with this to get it to wrap my existing
scripts, or adapt them to use it.

> What is the maximum # of files/directories in any one directory? This
> has a large impact on performance, especially on networked disks.
>
> What is the size of the whole directory tree (in MBytes).

There's no more than 10-20 files per directory. The whole thing is about
3.5 GB, 16,000 individual directories (I've been cleaning. It used to be
29,000)

The xml validation (against a schema) I handed off to a colleague who whipped
up a .NET console app that is speedy and adequate for the task.

thanks for all your help
Sam

From austin.pm at sam-i-am.com Wed Apr 28 12:55:56 2004
From: austin.pm at sam-i-am.com (Sam Foster)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: processing lots of files?
In-Reply-To: <408FBDD8.8040000@sam-i-am.com>
References: <4083E5F9.1040704@sam-i-am.com> <20040424013833.GA1777@bybent.com> <408FBDD8.8040000@sam-i-am.com>
Message-ID: <408FF02C.5010809@sam-i-am.com>

Sam Foster wrote:
> I'll be stumping up the $100 for their
> PDK I think.

I mean $200. The PO has been approved already, this must signal good things
for the economy when your employer actually buys you the software you need.

Sam

From goldilox at teachnet.edb.utexas.edu Wed Apr 28 19:12:11 2004
From: goldilox at teachnet.edb.utexas.edu (Goldilox)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Installing modules locally on a shared server
Message-ID:

I need to install some modules locally on a shared server. I do not have
access to root and it is a pay service with no support, especially with this
issue.
I have figured out I need to add my local dir to @INC

PERL5LIB=/path/to/my/perl-lib; export PERL5LIB;

but now I try to run: perl -MCPAN -e shell

and it basically tells me I am not root

so I need to get the modules installed to: /path/to/my/perl-lib

Can anyone point me to a tutorial?

Thanks
Rhett

From tim at toolman.org Thu Apr 29 07:47:10 2004
From: tim at toolman.org (Tim Peoples)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Installing modules locally on a shared server
In-Reply-To:
References:
Message-ID: <1083242830.15853.28.camel@localhost.localdomain>

The FAQ section of "perldoc CPAN" says:

  5) I am not root, how can I install a module in a personal directory?

     You will most probably like something like this:

       o conf makepl_arg "LIB=~/myperl/lib \
                          INSTALLMAN1DIR=~/myperl/man/man1 \
                          INSTALLMAN3DIR=~/myperl/man/man3"
       install Sybase::Sybperl

     You can make this setting permanent like all "o conf"
     settings with "o conf commit".

     You will have to add ~/myperl/man to the MANPATH environment
     variable and also tell your perl programs to
     look into ~/myperl/lib, e.g. by including

       use lib "$ENV{HOME}/myperl/lib";

     or setting the PERL5LIB environment variable.

     Another thing you should bear in mind is that the
     UNINST parameter should never be set if you are not
     root.

Tim.

On Wed, 2004-04-28 at 19:12, Goldilox wrote:
> I need to install some modules locally on a shared server. I do not have access
> to root and it is a pay service with no support, especially with this issue. I
> have figured out I need to add my local dir to @INC
>
> PERL5LIB=/path/to/my/perl-lib; export PERL5LIB;
>
> but now I try to run: perl -MCPAN -e shell
>
> and it basically tells me I am not root
>
> so I need to get the modules installed to: /path/to/my/perl-lib
>
> Can anyone point me to a tutorial?
>
> Thanks
> Rhett
>
> _______________________________________________
> Austin mailing list
> Austin@mail.pm.org
> http://mail.pm.org/mailman/listinfo/austin
--
 _______________________________________________________________________
   Timothy E. Peoples
   Have Camel, Will Code
   tim@toolman.org

From goldilox at teachnet.edb.utexas.edu Thu Apr 29 13:50:27 2004
From: goldilox at teachnet.edb.utexas.edu (Goldilox)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Installing modules locally on a shared server
In-Reply-To: <1083242830.15853.28.camel@localhost.localdomain>
References: <1083242830.15853.28.camel@localhost.localdomain>
Message-ID:

I did read this, and I honestly get lost many times reading these types of
documents when they assume a certain comfort level. I guess I confused myself
by trying to search for other more specific instructions (like how to add
items to the MANPATH env variable?) and do I want to do "o conf commit" if I
will always be installing modules to my local area (I don't want to mess
something else up in the process)?

And I assume I would type it:
>o conf commit
>makepl_arg ...

I'll see if I can do a little more research. Thanks for the feedback.

Rhett

Tim Peoples writes:
>
>The FAQ section of "perldoc CPAN" says:
>
>  5) I am not root, how can I install a module in a personal
>     directory?
>
>     You will most probably like something like this:
>
>       o conf makepl_arg "LIB=~/myperl/lib \
>                          INSTALLMAN1DIR=~/myperl/man/man1 \
>                          INSTALLMAN3DIR=~/myperl/man/man3"
>       install Sybase::Sybperl
>
>     You can make this setting permanent like all "o conf"
>     settings with "o conf commit".
>
>     You will have to add ~/myperl/man to the MANPATH environment
>     variable and also tell your perl programs to
>     look into ~/myperl/lib, e.g.
by including > > use lib "$ENV{HOME}/myperl/lib"; > > or setting the PERL5LIB environment variable. > > Another thing you should bear in mind is that the > UNINST parameter should never be set if you are not > root. > > >Tim. > > >On Wed, 2004-04-28 at 19:12, Goldilox wrote: >> I need to install some modules locally on a shared server. I do not have >access >> to root and it is a pay service with no support, especially with this issue. >I >> have figured out I need to add my local dir to @INC >> >> PERL5LIB=/path/to/my/perl-lib; export PERL5LIB; >> >> but now I try to run: perl -MCPAN -e shell >> >> and it basically tells me I am not root >> >> so I need to get the modules installed to: /path/to/my/perl-lib >> >> Can anyone point me to a tutorial? >> >> Thanks >> Rhett >> >> _______________________________________________ >> Austin mailing list >> Austin@mail.pm.org >> http://mail.pm.org/mailman/listinfo/austin >-- > _______________________________________________________________________ > Timothy E. Peoples > Have Camel, Will Code > tim@toolman.org > >
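(Putting the pieces of that FAQ answer together in order, since the thread left it a bit abstract -- a sketch, not a tested recipe: ~/myperl and LWP::Simple are just placeholders, the cpan> lines are typed one at a time inside the CPAN shell, and "o conf commit" is only needed if you want the makepl_arg setting remembered for future sessions:)

    $ perl -MCPAN -e shell
    cpan> o conf makepl_arg "LIB=~/myperl/lib INSTALLMAN1DIR=~/myperl/man/man1 INSTALLMAN3DIR=~/myperl/man/man3"
    cpan> o conf commit
    cpan> install LWP::Simple
    cpan> quit

    # then, in your shell startup (sh/bash syntax), so perl and man can find things:
    PERL5LIB=$HOME/myperl/lib; export PERL5LIB
    MANPATH=$MANPATH:$HOME/myperl/man; export MANPATH

    # or, instead of PERL5LIB, near the top of each script:
    use lib "$ENV{HOME}/myperl/lib";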