From dbii at mudpuddle.com Sun Apr 11 18:23:37 2004 From: dbii at mudpuddle.com (David Bluestein II) Date: Mon Aug 2 21:23:24 2004 Subject: APM: April's Topic Message-ID: <20040411232337.GF20627@mudpuddle.com> Mark- The web page for the meeting needs to be updated to show the April meeting date, it still shows the March one. I know it'll be a great topic, as we did a discussion of points at last months dinner only meeting. David From mlehmann at marklehmann.com Wed Apr 14 12:27:36 2004 From: mlehmann at marklehmann.com (Mark Lehmann) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Meeting next Wednesday 7:00pm Perl 6 Message-ID: <16509.29832.338360.170704@lehmbrain.marklehmann.com> Our next meeting is Wednesday April 21st at 7:00pm. The topic is "Perl 6." Meeting place to be determined. If you would like to recommend a place that is convenient to get to, has a computer projector, an internet connection, and free, please tell me. -- Mark Lehmann email mlehmann@marklehmann.com | phone 512 689-7705 From goldilox at teachnet.edb.utexas.edu Sat Apr 17 02:01:22 2004 From: goldilox at teachnet.edb.utexas.edu (Goldilox) Date: Mon Aug 2 21:23:24 2004 Subject: APM: LWP Question Message-ID: I recently moved a website from a shared Windows box running Activestate Perl (unknown version) to a different shared box running: SERVER_SOFTWARE="Apache/1.3.29 (Unix) AuthMySQL/2.20 FrontPage/4.0.4.3 PHP-CGI/0.1b" Perl Version: 5.008 (if that helps). I was using LWP::Simple on the Activestate box for a script - and I never bothered to get the version number of Perl for that box, but now, on this new Unix box, I get this error message: Can't locate LWP/Simple.pm in @INC (@INC contains: /usr/local/lib/perl5/5.8.0/i686-linux /usr/local/lib/perl5/5.8.0 /usr/local/lib/perl5/site_perl/5.8.0/i686-linux /usr/local/lib/perl5/site_perl/5.8.0 /usr/local/lib/perl5/site_perl .) at updatedata.pl line 4. BEGIN failed--compilation aborted at updatedata.pl line 4. I thought LWP was one of the standard modules installed with Perl. Am I missing something really obvious here? And if LWP really isn't there, what's the Internet API module most likely going to be? Thanks for any help. Rhett From ian at remmler.org Sat Apr 17 08:07:20 2004 From: ian at remmler.org (Ian Remmler) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Meeting next Wednesday 7:00pm Perl 6 In-Reply-To: <16509.29832.338360.170704@lehmbrain.marklehmann.com> References: <16509.29832.338360.170704@lehmbrain.marklehmann.com> Message-ID: <20040417130720.GA6321@remmler.org> On Wed, Apr 14, 2004 at 12:27:36PM -0500, Mark Lehmann wrote: > Meeting place to be determined. If you would like to recommend a place that > is convenient to get to, has a computer projector, an internet connection, > and free, please tell me. I found out at the CACTUS meeting on Thursday that we most likely can meet at ARL. I still need to speak with the person in charge of reservations, but apparently they just want someone who works there (which I conveniently do) to be at the meetings. I don't know if next Wednesday is doable, but I'll see what I can do. -- Ian Remmler | A monk asked Joshu, "Has a dog Buddha ian@remmler.org | nature or not?" Joshu replied, "Mu!" 
http://remmler.org | -- Mumon, "The Gateless Gate"

From peterbotros at yahoo.com Sat Apr 17 15:49:50 2004
From: peterbotros at yahoo.com (Peter botros)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: FTP
In-Reply-To: <200402251800.i1PI06118684@mail.pm.org>
Message-ID: <20040417204950.9947.qmail@web20607.mail.yahoo.com>

I am looking for FTP scripts to send every 15 min if the files are in a
directory, and/or to run scripts if files are in a directory.

Thanks

=====
Peter Botros

__________________________________
Do you Yahoo!?
Yahoo! Photos: High-quality 4x6 digital prints for 25¢
http://photos.yahoo.com/ph/print_splash

From goldilox at teachnet.edb.utexas.edu Sun Apr 18 00:12:30 2004
From: goldilox at teachnet.edb.utexas.edu (Goldilox)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: LWP Question
In-Reply-To: <20040417223026.49423.qmail@web20405.mail.yahoo.com>
References: <20040417223026.49423.qmail@web20405.mail.yahoo.com>
Message-ID:

Is there any other Internet API that comes standard with the Linux
distributions (since LWP is not standard - this being a shared box I have
referred to, I am not sure how easy it would be to get them to add the module
I need)? Is there any way to find out what the default modules included in a
distribution are? I searched around, but I couldn't seem to find it in the
documentation - do I have to install it on my own box to find out?

Thanks again
Rhett

Bill Raty writes:

>LWP comes installed with the ActiveState distribution. It
>doesn't always seem to be installed standard on many Linux
>distributions.
>
>It's easy enough to get -- fire up CPAN and have it install it.
>
> perl -MCPAN -e shell # to fire up cpan
>
>
>-Bill
>
>--- Goldilox wrote:
>> I recently moved a website from a shared Windows box running
>> Activestate Perl
>> (unknown version) to a different shared box running:
>> SERVER_SOFTWARE="Apache/1.3.29 (Unix) AuthMySQL/2.20
>> FrontPage/4.0.4.3
>> PHP-CGI/0.1b"
>> Perl Version: 5.008
>> (if that helps).
>> I was using LWP::Simple on the Activestate box for a script -
>> and I never
>> bothered to get the version number of Perl for that box, but
>> now, on this new
>> Unix box, I get this error message:
>> Can't locate LWP/Simple.pm in @INC
>> (@INC contains:
>> /usr/local/lib/perl5/5.8.0/i686-linux
>> /usr/local/lib/perl5/5.8.0
>> /usr/local/lib/perl5/site_perl/5.8.0/i686-linux
>> /usr/local/lib/perl5/site_perl/5.8.0
>> /usr/local/lib/perl5/site_perl .) at updatedata.pl line 4.
>> BEGIN failed--compilation aborted at updatedata.pl line 4.
>>
>> I thought LWP was one of the standard modules installed with
>> Perl. Am I missing
>> something really obvious here? And if LWP really isn't there,
>> what's the
>> Internet API module most likely going to be?
>> Thanks for any help.
>> Rhett
>>
>> _______________________________________________
>> Austin mailing list
>> Austin@mail.pm.org
>> http://mail.pm.org/mailman/listinfo/austin
>
>
>=====
>Let's not elect Bush in '04 either.
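(A quick way to answer the "what do I actually have installed here?" question above, from a shell prompt on the box in question -- a minimal sketch, assuming a stock perl 5.8; ExtUtils::Installed ships with perl and lists the add-on distributions recorded in .packlist files, so it won't enumerate the core modules themselves:)

    # is LWP::Simple present at all?  prints its version, or dies with the
    # familiar "Can't locate LWP/Simple.pm in @INC" error if it is missing
    perl -MLWP::Simple -le 'print LWP::Simple->VERSION'

    # list the add-on distributions this perl knows about
    perl -MExtUtils::Installed -le 'print for ExtUtils::Installed->new->modules'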
From msouth at shodor.org Sun Apr 18 00:35:41 2004
From: msouth at shodor.org (Mike South)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: FTP
In-Reply-To: <20040417204950.9947.qmail@web20607.mail.yahoo.com>
References: <20040417204950.9947.qmail@web20607.mail.yahoo.com>
Message-ID: <408213AD.mail9SZ1346SA@scan.shodor.org>

>From austin-bounces@mail.pm.org Sat Apr 17 16:50:01 2004
>Date: Sat, 17 Apr 2004 13:49:50 -0700 (PDT)
>From: Peter botros
>Subject: APM: FTP
>
>I am looking for FTP scripts to send every 15 min if the
>files are in a directory and/or to run scripts
>if files are in a directory

I am assuming you are wanting something that "reacts", so to speak, to files
being in a directory, and, when it sees some, does something to them (where
"something" includes transferring them out of the directory so that they
don't trigger a re-run or otherwise hang around in the way).

We have to do something like that, and maybe I can save you some headaches by
describing what I think we do (I haven't seen it firsthand, just know the
description).

First, for the "every fifteen minutes" part, we use cron. That way you can
just have a script that does whatever is supposed to be done with the files,
and you won't have to write the "every fifteen minutes" part, or make sure it
gets started again when the system reboots, or whatever.

Second, we have a lockfile that prevents two instances of the script from
running at the same time. One day, you might have so much going on that your
script isn't done in fifteen minutes, and then cron fires off another run of
your script and all hell breaks loose as they both try to work on the same
files.

Third, we don't look for "files in the directory", but "a trigger file in
that directory". The trigger file lists all the files that are to be
processed. The point here is that the trigger file gets transferred into the
directory with the other files, but it gets transferred last. Why? Because
sooner or later your "every fifteen minutes" is going to wake the script up
right in the middle of a file getting dumped into the directory, and then
you'll do your work on half a file.

So, something like this goes in your crontab:

0-59/15 * * * * /home/msouth/bin/handle_files.pl

handle_files.pl would be something like this:

# UNTESTED UNTESTED UNTESTED
#!/usr/bin/perl -w
use strict;

# put a lockfile in the same place as us named
# same thing as us with '.lock' appended
my @files_to_unlink;
my $lockfile = $0 . '.lock';
if (-e $lockfile ) {
    warn "a lockfile exists, I'm not running\n";
    exit;
    # would be better to see if the file just didn't get
    # cleaned up, and wipe it out if that's the case.
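    # (a rough, untested sketch of that stale-lock check -- it would replace
    #  the unconditional exit above, and it assumes the PID was written into
    #  the lockfile, as the else branch below does:
    #
    #    if (open(OLDLOCK, "<$lockfile")) {
    #        chomp(my $old_pid = <OLDLOCK>);
    #        close OLDLOCK;
    #        if ($old_pid and kill(0, $old_pid)) {
    #            warn "process $old_pid still holds the lock, exiting\n";
    #            exit;
    #        }
    #        warn "removing stale lockfile left by dead process $old_pid\n";
    #        unlink $lockfile;
    #        # after removing a stale lock you would still want to create a
    #        # fresh one, as in the else branch below
    #    }
    # )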
    #
    # you can probably "kill 0, PID" or something to see
    # if the PID that put the lockfile there is still
    # running, and just wipe out the file if it isn't
    # (that is, if you put the PID in the lockfile)
    # Also, in real life you will probably have to
    # keep the lockfile somewhere else, because
    # the directory where the script lives is likely not
    # to be writeable
} else {
    open(LOCK, ">$lockfile") or die "couldn't open lockfile:$!\n";
    # put the PID in the lockfile so future instances of
    # this script can check whether we are still running
    print LOCK "$$\n";
    close LOCK;
    push @files_to_unlink, $lockfile;
}

my $dir = '/home/msouth/dump';
my $trigger = "$dir/trigger.txt";

&cleanup_and_exit unless ( -e $trigger );

open (TRIGGER, "<$trigger") or die "couldn't open $trigger:$!\n";
chomp( my @lines = <TRIGGER> );
close TRIGGER;

my $saw_end = 0;
foreach my $line (reverse @lines) {
    if ($line eq 'END_FILES') {
        $saw_end++;
        last;
    }
}
unless ($saw_end) {
    warn qq{trigger file $trigger is missing "END_FILES" line. I am bailing, hopefully it's still being transferred\n};
    &cleanup_and_exit;
}

shift @lines while $lines[0] ne 'BEGIN_FILES';
unless (@lines) {
    warn "trigger file $trigger does not have 'BEGIN_FILES', this is not good\n";
    &cleanup_and_exit;
}
shift @lines; # $lines[0] is just 'BEGIN_FILES', remember

foreach my $line (@lines) {
    next if $line =~ /^\s*#/;
    last if $line eq 'END_FILES';
    my $this_file = "$dir/$line";
    &process_file($this_file);
    push @files_to_unlink, $this_file;
}

push @files_to_unlink, $trigger;
&cleanup_and_exit;

sub process_file {
    my $file = shift;
    if (system "cat $file >> /home/msouth/dump/all_dumped_files") {
        warn "$file didn't process\n";
        # cp file to error directory
    } else {
        # cp file to success directory
    }
}

sub cleanup_and_exit {
    unlink $_ for @files_to_unlink;
    exit(0);
}

__END__

Then you can use a trigger file like this:

BEGIN_FILES
yo
ya
ye
END_FILES

good luck,
mike

From dbii at mudpuddle.com Mon Apr 19 00:35:43 2004
From: dbii at mudpuddle.com (David Bluestein II)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Meeting next Wednesday 7:00pm Perl 6
In-Reply-To: <20040417130720.GA6321@remmler.org>
References: <16509.29832.338360.170704@lehmbrain.marklehmann.com> <20040417130720.GA6321@remmler.org>
Message-ID: <20040419053543.GA20627@mudpuddle.com>

I suggest ServerGraph if we can still meet there. Since it is up on the site
as ServerGraph, changing it for April may lead some people to go to the wrong
location. Then we can always check on moving the meeting in May (though I
like the current location).

Also, changing in April means I don't know where dinner hour would be :(

David

On Sat, Apr 17, 2004 at 08:07:20AM -0500, Ian Remmler wrote:
> On Wed, Apr 14, 2004 at 12:27:36PM -0500, Mark Lehmann wrote:
> > Meeting place to be determined. If you would like to recommend a place that
> > is convenient to get to, has a computer projector, an internet connection,
> > and free, please tell me.
>
> I found out at the CACTUS meeting on Thursday that we most likely
> can meet at ARL. I still need to speak with the person in charge
> of reservations, but apparently they just want someone who works
> there (which I conveniently do) to be at the meetings. I don't
> know if next Wednesday is doable, but I'll see what I can do.
>
> --
> Ian Remmler | A monk asked Joshu, "Has a dog Buddha
> ian@remmler.org | nature or not?" Joshu replied, "Mu!"
> http://remmler.org | -- Mumon, "The Gateless Gate" > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin From austin.pm at sam-i-am.com Mon Apr 19 09:45:13 2004 From: austin.pm at sam-i-am.com (Sam Foster) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? Message-ID: <4083E5F9.1040704@sam-i-am.com> (preface: my perl is fairly poor, perhaps fair on a good day. These are the kind of tasks I originally learnt perl for, but the sheer volume of the data is challenging me) I'm currently working with a fairly large set of data that consists of a deep filesystem directory structure, each directory having a (java-style) properties text file, along with miscellaneous directory contents. In addition there's an xml file for each that is our final output for delivery to the client. I've got some data clean-up to do, verification, reporting, and validation of the output against a schema. Lots of tree-crawling and text file parsing in other words. I'm in need of some performance tips. There's about 30,000 individual properties files (and a cross-references file in the same kind of format) - one for each directory. Simply crawling the tree and parsing each properties file is taking a while (an hour or more). Next up I need to fix some broken references (the xrefs file contains references like so: relatedLinks = [@/some/path/, @/someother/path] .) After that I'll need to verify and validate some xml output. Again, one file per directory. This data is on the local network, I'm working on a win2k box, having mapped a network drive. My machine is running activestate perl 5.8, with 1GB RAM, and a (single) 1600 mhz pentium processor. I've done a little benchmarking on parts of individual scripts, but I need a order of magnitude speed increase, not shaving micro-seconds off here and there. Any thoughts? I can attach a sample script if list protocol allows. thanks, Sam From jakulas at swbell.net Mon Apr 19 12:02:17 2004 From: jakulas at swbell.net (John Kulas) Date: Mon Aug 2 21:23:24 2004 Subject: APM: re: LWP Question Message-ID: <20040419170217.59461.qmail@web80603.mail.yahoo.com> LWP is not part of the base Perl installation. LWP is a Perl add-on package. It is often added on because it is so useful. I recommend you asking your sysadmin to add it to your new server. - John Kulas From erik at debill.org Mon Apr 19 12:12:28 2004 From: erik at debill.org (erik@debill.org) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <4083E5F9.1040704@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> Message-ID: <20040419171228.GA25971@debill.org> On Mon, Apr 19, 2004 at 09:45:13AM -0500, Sam Foster wrote: > I'm currently working with a fairly large set of data that consists of a > deep filesystem directory structure, each directory having a > (java-style) properties text file, along with miscellaneous directory > contents. In addition there's an xml file for each that is our final > output for delivery to the client. > > I've got some data clean-up to do, verification, reporting, and > validation of the output against a schema. Lots of tree-crawling and > text file parsing in other words. I'm in need of some performance tips. I'd start by processing each directory completely before moving on to the next one, if at all possible. Directory lookups on network filesystems can be surprisingly expensive, so doing everything in a single pass may be a win. 
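(For what it's worth, the "single pass" described above usually amounts to doing all the per-directory work inside one File::Find callback -- a rough sketch, not Sam's actual script; the *.properties name test, the "key = value" parsing, and the /path/to/data/root path are all assumptions:)

    use strict;
    use warnings;
    use File::Find;

    my %props_for;    # directory => { key => value, ... }

    find(sub {
        return unless -f $_ && /\.properties$/;
        my %props;
        open my $fh, '<', $_ or do { warn "can't read $File::Find::name: $!\n"; return };
        while (my $line = <$fh>) {
            chomp $line;
            next if $line =~ /^\s*$/ || $line =~ /^\s*[#!]/;   # blanks and comments
            my ($key, $val) = split /\s*[=:]\s*/, $line, 2;
            $props{$key} = $val if defined $val;
        }
        close $fh;
        $props_for{$File::Find::dir} = \%props;
        # ... any per-directory checking, fixing, and reporting goes here,
        # so the tree only has to be walked once ...
    }, '/path/to/data/root');

    printf "parsed properties for %d directories\n", scalar keys %props_for;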
Any chance of getting the files locally instead of via the network? I'm assuming SMB, if it was NFS I might be able to suggest some mount parameters to speed it up, but nothing beats a local disk. > There's about 30,000 individual properties files (and a cross-references > file in the same kind of format) - one for each directory. How deeply does this structure go? Some filesystems get bogged down when there are 1000s of files in a single directory. If all of these 30k directories are within a single parent directory just getting a list of them could be a serious slowdown. On Linux I try to avoid having more than a few hundred files in a directory if at all possible. > Simply crawling the tree and parsing each properties file is taking a > while (an hour or more). Next up I need to fix some broken references 30000/ 3600 = 8.3 files/sec. Not exactly blazing, but not incredibly slow either. > (the xrefs file contains references like so: relatedLinks = > [@/some/path/, @/someother/path] .) > After that I'll need to verify and validate some xml output. Again, one > file per directory. Does this mean you can't parallelize this? I suspect your script is spending a fair amount of time waiting for data. Running 2 copies in parallel each on its own subset of the directories might be a win (even with only a single processor to work with). Erik -- Humor soothes the savage cubicle monkey. -- J Jacques From austin.pm at sam-i-am.com Mon Apr 19 13:52:57 2004 From: austin.pm at sam-i-am.com (Sam Foster) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <20040419171228.GA25971@debill.org> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> Message-ID: <40842009.4020704@sam-i-am.com> erik@debill.org wrote: > I'd start by processing each directory completely before moving on to > the next one, if at all possible. Directory lookups on network > filesystems can be surprisingly expensive, so doing everything in a > single pass may be a win. I'm using File::Find, which I think does this by default. > Any chance of getting the files locally instead of via the network? > I'm assuming SMB, if it was NFS I might be able to suggest some mount > parameters to speed it up, but nothing beats a local disk. There's about 3-4 GB of data, that is being worked on collaboratively by a distributed team, so moving it isn't an option unfortunately. However, it is NFS... so what you got? >>There's about 30,000 individual properties files (and a cross-references >>file in the same kind of format) - one for each directory. > > How deeply does this structure go? Some filesystems get bogged down > when there are 1000s of files in a single directory. If all of these > 30k directories are within a single parent directory just getting a > list of them could be a serious slowdown. On Linux I try to avoid > having more than a few hundred files in a directory if at all possible. I have only 5-10 files in each directory. I'm using the pre-processing that File::Find offers to only visit the positive matches (FWIW) >>Simply crawling the tree and parsing each properties file is taking a >>while (an hour or more). Next up I need to fix some broken references > > 30000/ 3600 = 8.3 files/sec. Not exactly blazing, but not incredibly > slow either. I just tried benchmarking one of my scripts again (I called my &find from Benchmark::timeit) with a limited dataset, and got: 72 wallclock secs ( 0.21 usr + 1.24 sys = 1.45 CPU) @ 3. 45/s (n=5) which was after parsing just 160 files. 
2.22 files/sec. Not so stellar after all. I'll dig in to the module that's doing the parsing and see if there's an obvious culprit there. (starting with the bits I wrote :) > I suspect your script is > spending a fair amount of time waiting for data. Running 2 copies in > parallel each on its own subset of the directories might be a win > (even with only a single processor to work with). I didn't think of dividing up the directory list and simply running the same script again in parallel. I'll try that. Would forking achieve the same thing, or am I introducing unnecessary complexity? thanks, this was a help, Sam From erik at debill.org Mon Apr 19 15:19:30 2004 From: erik at debill.org (erik@debill.org) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <40842009.4020704@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> <40842009.4020704@sam-i-am.com> Message-ID: <20040419201930.GA26249@debill.org> On Mon, Apr 19, 2004 at 01:52:57PM -0500, Sam Foster wrote: > erik@debill.org wrote: > >filesystems can be surprisingly expensive, so doing everything in a > >single pass may be a win. > > I'm using File::Find, which I think does this by default. Ah. I'd assumed you were running that once for each step. As long as you only run it once you're good. > >Any chance of getting the files locally instead of via the network? > >I'm assuming SMB, if it was NFS I might be able to suggest some mount > >parameters to speed it up, but nothing beats a local disk. > > There's about 3-4 GB of data, that is being worked on collaboratively by > a distributed team, so moving it isn't an option unfortunately. However, > it is NFS... so what you got? I'm not sure what the exact options would be for NT, but you want to use tcp (instead of udp, which is a default lots of places), and crank the block size up. I use tcp,rsize=16000,wsize=16000 at home. Even larger block sizes are perfectly legit (I believe some companies default to 64000) and large sizes can save on the number of requests needed to transfer your data (as well as cutting down on actual read requests that get to the physical disks). Also, if you aren't defaulting to an async mount you might try that. I'm not sure how it interacts with NFS (for all I know they're always async) but it's usually a big throughput win to not wait for your writes to complete. > >>Simply crawling the tree and parsing each properties file is taking a > >>while (an hour or more). Next up I need to fix some broken references > > > >30000/ 3600 = 8.3 files/sec. Not exactly blazing, but not incredibly > >slow either. > > I just tried benchmarking one of my scripts again (I called my &find > from Benchmark::timeit) with a limited dataset, and got: > > 72 wallclock secs ( 0.21 usr + 1.24 sys = 1.45 CPU) @ 3. > 45/s (n=5) > > which was after parsing just 160 files. 2.22 files/sec. Not so stellar > after all. I'll dig in to the module that's doing the parsing and see if > there's an obvious culprit there. (starting with the bits I wrote :) 72 wall clock and only 1.45 CPU? Sounds like it's all IO wait. The good news is there's bound to be a way to make that go a lot faster :) Does it slow down as it handles more and more files? Is memory use growing? If your workstation goes into swap that would definitely cause a slowdown. > >parallel each on its own subset of the directories might be a win > >(even with only a single processor to work with). 
> > I didn't think of dividing up the directory list and simply running the > same script again in parallel. I'll try that. Would forking achieve the > same thing, or am I introducing unnecessary complexity? You could have the script fork a set number of times right at the beginning. You just need a way for each process to figure out what directories are its responsibility (even if it's "I only do odd numbered directories"). Easy to do if your directory names are relatively stable and predictable. I wouldn't modify the function that File::Find calls to fork(), since that's liable to make a fork bomb. > thanks, this was a help, Glad to help. Just let us know how things turn out. Erik -- Humor soothes the savage cubicle monkey. -- J Jacques From dbii at mudpuddle.com Mon Apr 19 16:26:10 2004 From: dbii at mudpuddle.com (David Bluestein II) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Simple BLOG w/RSS Message-ID: <20040419212610.GM20627@mudpuddle.com> Okay, I need to find a SIMPLE blog application, with RSS attached. Something for 3-4 people to use, post updated information (like a bulletin board), but with RSS so people can get the feeds if they want and know when it is updated. Any Perl based suggestions? I've looked over Blosxom (www.blosxom.com) and it looks like it fits the bill, but didn't know if anyone else had worked with a simple system (don't want a lot of overhead that comes with Movable Type) to setup and use. David From jakulas at swbell.net Tue Apr 20 12:27:48 2004 From: jakulas at swbell.net (John Kulas) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Re: Simple BLOG w/RSS In-Reply-To: <200404201700.i3KH05r32031@mail.pm.org> Message-ID: <20040420172748.86587.qmail@web80604.mail.yahoo.com> How about Twiki? See http://www.twiki.org/. Automatic text search, simple organization, access restriction if you want it, etc. - John Kulas From mlehmann at marklehmann.com Tue Apr 20 16:11:43 2004 From: mlehmann at marklehmann.com (Mark Lehmann) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Meeting tomorrow night at ServerGraph Message-ID: <16517.37391.258192.582099@lehmbrain.marklehmann.com> We are going to have the Perl Mongers meeting at ServerGraph tomorrow night at 7:00pm. As normal, we will be eating at the Pok-e-Jo's a block down 5th street from ServerGraph at 6:00pm. Please see the APM website (http://austin.pm.org/) for directions to ServerGraph. -- Mark Lehmann email mlehmann@marklehmann.com | phone 512 689-7705 From dbii at mudpuddle.com Wed Apr 21 08:54:14 2004 From: dbii at mudpuddle.com (David Bluestein II) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Re: Simple BLOG w/RSS In-Reply-To: <20040420172748.86587.qmail@web80604.mail.yahoo.com> References: <200404201700.i3KH05r32031@mail.pm.org> <20040420172748.86587.qmail@web80604.mail.yahoo.com> Message-ID: <20040421135414.GQ20627@mudpuddle.com> John- I've used Twiki before, but it doesn't quite meet the need. While simple, it is more difficult than the targetted end user (I'm not sure they would get Wikiwords) and we need something that is easy for them to put sequential text in. Also need a really good RSS feed mechanism. I looked at the Perl Module Kwiki too, but seemed to have too much extra that I didn't need. Thanks- David On Tue, Apr 20, 2004 at 10:27:48AM -0700, John Kulas wrote: > How about Twiki? See http://www.twiki.org/. > Automatic text search, simple organization, access > restriction if you want it, etc. 
> - John Kulas > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin From ian at remmler.org Fri Apr 23 09:07:22 2004 From: ian at remmler.org (Ian Remmler) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Tom Christiansen's pop in Message-ID: <20040423140722.GA18202@remmler.org> Here's a link to Tom's rare pop in on the perl6.language list that I mentioned at the meeting. Some of the other messages in the thread are equally, er, impressive... http://tinyurl.com/2jfun -- Ian Remmler | A monk asked Joshu, "Has a dog Buddha ian@remmler.org | nature or not?" Joshu replied, "Mu!" http://remmler.org | -- Mumon, "The Gateless Gate" From ian at remmler.org Fri Apr 23 09:15:59 2004 From: ian at remmler.org (Ian Remmler) Date: Mon Aug 2 21:23:24 2004 Subject: APM: ARL is a go Message-ID: <20040423141559.GB18202@remmler.org> I've scheduled the auditorium for Wednesday, May 19 from 7:00 to 9:00. We will have access to a projector and an ethernet port. It may be possible for someone to bring a wireless router and hook it up, but I'll have to check. -- Ian Remmler | A monk asked Joshu, "Has a dog Buddha ian@remmler.org | nature or not?" Joshu replied, "Mu!" http://remmler.org | -- Mumon, "The Gateless Gate" From eharris at puremagic.com Fri Apr 23 09:57:50 2004 From: eharris at puremagic.com (Evan Harris) Date: Mon Aug 2 21:23:24 2004 Subject: APM: ARL is a go In-Reply-To: <20040423141559.GB18202@remmler.org> Message-ID: Bah, who needs a wireless router? My notebook does a fine job of being an access point. Evan On Fri, 23 Apr 2004, Ian Remmler wrote: > I've scheduled the auditorium for Wednesday, May 19 from 7:00 to > 9:00. We will have access to a projector and an ethernet port. > It may be possible for someone to bring a wireless router and > hook it up, but I'll have to check. > > -- > Ian Remmler | A monk asked Joshu, "Has a dog Buddha > ian@remmler.org | nature or not?" Joshu replied, "Mu!" > http://remmler.org | -- Mumon, "The Gateless Gate" > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin > From austin.pm at sam-i-am.com Fri Apr 23 12:47:53 2004 From: austin.pm at sam-i-am.com (Sam Foster) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <20040419201930.GA26249@debill.org> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> <40842009.4020704@sam-i-am.com> <20040419201930.GA26249@debill.org> Message-ID: <408956C9.3030307@sam-i-am.com> So I'm still working on this one. Just now I ran a script that crawled a directory structure to identify "empty" directory (directories that had only some boiler plate properties files and no actual data) that produced a list of around 5 thousand matches. It took a while. Now I've taken that list, split it into 4 and given each piece to a rmtree script. I did this by cutting and pasting the lines into new text files, and creating new command prompts to start each instance of my script. This gives me 4 seperate processes running in parallel each tackling a part of the task. What I'd like is a wrapper that does this for me. I give it the script filename, the filelist and perhaps the number of clones to create, and have it basically do the above for me. But system calls wait for the process to finish before continuing so I'm not sure how to achieve this. I've looked at some forking code but I'll admit to being a little daunted. 
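(A rough sketch of that sort of wrapper -- untested, and every name in it (run_parallel.pl, worker.pl, filelist.txt, the .partN chunk files) is made up: it deals the list out into N chunk files, forks one child per chunk, execs the worker script on each, and waits for them all. One caveat: on ActivePerl/Win32 fork is emulated with threads, so a module such as Parallel::ForkManager, suggested a little further down the thread, or Win32::Process may behave better there.)

    #!/usr/bin/perl -w
    # usage: perl run_parallel.pl worker.pl filelist.txt 4
    use strict;

    my ($worker, $listfile, $nkids) = @ARGV;
    $nkids ||= 4;

    open my $fh, '<', $listfile or die "can't read $listfile: $!\n";
    chomp(my @items = <$fh>);
    close $fh;

    # deal the work items out round-robin into $nkids chunks
    my @chunks;
    push @{ $chunks[ $_ % $nkids ] }, $items[$_] for 0 .. $#items;

    my @pids;
    for my $i (0 .. $nkids - 1) {
        next unless $chunks[$i] && @{ $chunks[$i] };
        my $part = "$listfile.part$i";
        open my $out, '>', $part or die "can't write $part: $!\n";
        print $out "$_\n" for @{ $chunks[$i] };
        close $out;

        defined(my $pid = fork) or die "fork failed: $!\n";
        if ($pid == 0) {                  # child: run the worker on this chunk
            exec 'perl', $worker, $part;
            die "exec failed: $!\n";
        }
        push @pids, $pid;                 # parent just keeps launching
    }

    waitpid($_, 0) for @pids;             # block until every child has finished
    print scalar(@pids), " children finished\n";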
I also looked at Parallel::Jobs on cpan and took a stab at use it without success - the child processes weren't terminating and nor did they seem to be running in parallel. any pointers? Sam From rainking at feeding.frenzy.com Fri Apr 23 15:05:03 2004 From: rainking at feeding.frenzy.com (Dennis Moore) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <408956C9.3030307@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> <20040419171228.GA25971@debill.org> <40842009.4020704@sam-i-am.com> <20040419201930.GA26249@debill.org> <408956C9.3030307@sam-i-am.com> Message-ID: <20040423200503.GA54835@feeding.frenzy.com> On Fri, Apr 23, 2004 at 12:47:53PM -0500, Sam Foster wrote: > So I'm still working on this one. > Just now I ran a script that crawled a directory structure to identify > "empty" directory (directories that had only some boiler plate > properties files and no actual data) that produced a list of around 5 > thousand matches. > It took a while. > Now I've taken that list, split it into 4 and given each piece to a > rmtree script. I did this by cutting and pasting the lines into new text > files, and creating new command prompts to start each instance of my > script. This gives me 4 seperate processes running in parallel each > tackling a part of the task. > > What I'd like is a wrapper that does this for me. I give it the script > filename, the filelist and perhaps the number of clones to create, and > have it basically do the above for me. > > But system calls wait for the process to finish before continuing so I'm > not sure how to achieve this. I've looked at some forking code but I'll > admit to being a little daunted. > > I also looked at Parallel::Jobs on cpan and took a stab at use it > without success - the child processes weren't terminating and nor did > they seem to be running in parallel. > > any pointers? http://hacks.dlux.hu/Parallel-ForkManager/ -- ;for (74,1970500640,1634627444,1751478816,1348825708,543711587, 1801810465){for($x=1<<1^1;$x>=1>>1;$x--) {$q=hex ff,$r=oct($x=~s,\d,$&* 10,e,$x),$x/=1/.1,$q<<=$r,$s.=chr (($_&$q)>>$r),$t++}}while($= ||= !$|) {$o=$o?$?:$/;$|=1;print $o?$s:$"x$t if$;;print"\b"x$t;sleep 1} From wwalker at bybent.com Fri Apr 23 20:38:33 2004 From: wwalker at bybent.com (Wayne Walker) Date: Mon Aug 2 21:23:24 2004 Subject: APM: processing lots of files? In-Reply-To: <4083E5F9.1040704@sam-i-am.com> References: <4083E5F9.1040704@sam-i-am.com> Message-ID: <20040424013833.GA1777@bybent.com> First, if you have the local disk space, then you should mirror the data, then parse it.walking directories on a net file system is slow. Rsync will allow you to mirror it once (SLOW) then mirror it again (much faster) as often as needed.. What is the maximum # of files/directories in any one directory? This has a large impact on performance, especially on networked disks. What is the size of the whole directory tree (in MBytes). On Mon, Apr 19, 2004 at 09:45:13AM -0500, Sam Foster wrote: > (preface: my perl is fairly poor, perhaps fair on a good day. These are > the kind of tasks I originally learnt perl for, but the sheer volume of > the data is challenging me) > > I'm currently working with a fairly large set of data that consists of a > deep filesystem directory structure, each directory having a > (java-style) properties text file, along with miscellaneous directory > contents. In addition there's an xml file for each that is our final > output for delivery to the client. 
> > I've got some data clean-up to do, verification, reporting, and > validation of the output against a schema. Lots of tree-crawling and > text file parsing in other words. I'm in need of some performance tips. > > There's about 30,000 individual properties files (and a cross-references > file in the same kind of format) - one for each directory. > Simply crawling the tree and parsing each properties file is taking a > while (an hour or more). Next up I need to fix some broken references > (the xrefs file contains references like so: relatedLinks = > [@/some/path/, @/someother/path] .) > After that I'll need to verify and validate some xml output. Again, one > file per directory. > > This data is on the local network, I'm working on a win2k box, having > mapped a network drive. My machine is running activestate perl 5.8, with > 1GB RAM, and a (single) 1600 mhz pentium processor. > > I've done a little benchmarking on parts of individual scripts, but I > need a order of magnitude speed increase, not shaving micro-seconds off > here and there. Any thoughts? > > I can attach a sample script if list protocol allows. > > thanks, > Sam > _______________________________________________ > Austin mailing list > Austin@mail.pm.org > http://mail.pm.org/mailman/listinfo/austin -- Wayne Walker wwalker@bybent.com Do you use Linux?! http://www.bybent.com Get Counted! http://counter.li.org/ Perl - http://www.perl.org/ Perl User Groups - http://www.pm.org/ Jabber IM: wwalker@jabber.phototropia.org AIM: lwwalkerbybent From chris at tooley.com Mon Apr 26 15:12:13 2004 From: chris at tooley.com (Chris Tooley) Date: Mon Aug 2 21:23:24 2004 Subject: APM: Austin Geek Cruise Message-ID: <1083010333.6474.11.camel@localhost.localdomain> The other day my wife asked me if we could go on a cruise. Having never gotten to take honeymoon and with a father that owns a travel agency, I figured it was worth looking into. Turns out is a lot cheaper to do groups than individuals. This got me thinking about doing an Austin Geek Cruise. What transpired after talking to the travel agency for the day is something I wanted to propose to you all. I want to try to put this together for the fun of getting to take a trip, not for a profit. The trip would be a chartered bus from Austin to Galveston and back after the cruise. The rates are per person but there has to be two people in a cabin. If someone needs help with getting a cabin mate I'm sure that can be arranged. By all means bring that significant other. I'd never live through it if I didn't take my wife. It's a year out but for a group we have to start the process now. We have one of four of the speaker slots filled by Ray Ellis. He is going to speak about Aspect Oriented Programming. We are discussing arrangements with other speakers (no Mark, not woofers, or tweaters, or even mid range :)). If this is something people are interested in please reply to me directly. If I get no interest I promise I'll drop it. If it looks like it will work I'll probably expand it to other technology user groups in Austin or Central Texas. I'm not really opposed to people from outside the area joining in but it's a package price that's broken down here for the purposes of full disclosure. We need about 30 double occupancy cabins to make everything work. That's a decent sized group but it will be fun to take over a cruise ship. There's already been talk of putting together an insta-cluster to create the world's fastest floating supercomputer cluster. 
If you're interested please go here, and take a look at it:
http://www.carsontravel.com/AustinGeekCruise/

Don't mind the horrible HTML, I stole a lot of it from Carnival. :)

--
Chris Tooley
Home

From chris at tooley.com Tue Apr 27 15:37:18 2004
From: chris at tooley.com (Chris Tooley)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Austin Geek Cruise - Randal L Schwartz is coming
In-Reply-To: <1083010333.6474.11.camel@localhost.localdomain>
References: <1083010333.6474.11.camel@localhost.localdomain>
Message-ID: <1083098238.25775.35.camel@ws017.ltsp>

So I got it all worked out: Randal Schwartz is going with us. He doesn't know
what he's going to talk about just yet, either something Perl or something
Photoshop.

If you don't know who Randal Schwartz is, check out his site here:
http://www.stonehenge.com/merlyn/

Suffice it to say, he's a Perl Hacker.

Chris Tooley

From austin.pm at sam-i-am.com Wed Apr 28 09:21:12 2004
From: austin.pm at sam-i-am.com (Sam Foster)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: processing lots of files?
In-Reply-To: <20040424013833.GA1777@bybent.com>
References: <4083E5F9.1040704@sam-i-am.com> <20040424013833.GA1777@bybent.com>
Message-ID: <408FBDD8.8040000@sam-i-am.com>

Wayne Walker wrote:
> First, if you have the local disk space, then you should mirror the
> data, then parse it.walking directories on a net file system is slow.

I have the disk space, but not the time to mirror it. Though the rsync tip is
a good one and would mitigate this.

So far I've used activestate's perlapp to make an executable of each script
that I can drop on the server and run locally. That's really helped
performance enormously. I'll be stumping up the $100 for their PDK I think.

I also looked into Parallel::ForkManager and got some test scripts running,
but I'll need to spend more time with this to get it to wrap my existing
scripts, or adapt them to use it.

> What is the maximum # of files/directories in any one directory? This
> has a large impact on performance, especially on networked disks.
>
> What is the size of the whole directory tree (in MBytes).

There's no more than 10-20 files per directory. The whole thing is about
3.5 GB, 16,000 individual directories (I've been cleaning. It used to be
29,000)

The xml validation (against a schema) I handed off to a colleague who whipped
up a .NET console app that is speedy and adequate for the task.

thanks for all your help
Sam

From austin.pm at sam-i-am.com Wed Apr 28 12:55:56 2004
From: austin.pm at sam-i-am.com (Sam Foster)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: processing lots of files?
In-Reply-To: <408FBDD8.8040000@sam-i-am.com>
References: <4083E5F9.1040704@sam-i-am.com> <20040424013833.GA1777@bybent.com> <408FBDD8.8040000@sam-i-am.com>
Message-ID: <408FF02C.5010809@sam-i-am.com>

Sam Foster wrote:
> I'll be stumping up the $100 for their
> PDK I think.

I mean $200. The PO has been approved already, this must signal good things
for the economy when your employer actually buys you the software you need.

Sam

From goldilox at teachnet.edb.utexas.edu Wed Apr 28 19:12:11 2004
From: goldilox at teachnet.edb.utexas.edu (Goldilox)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Installing modules locally on a shared server
Message-ID:

I need to install some modules locally on a shared server. I do not have
access to root and it is a pay service with no support, especially with this
issue.
I have figured out I need to add my local dir to @INC

PERL5LIB=/path/to/my/perl-lib; export PERL5LIB;

but now I try to run: perl -MCPAN -e shell

and it basically tells me I am not root

so I need to get the modules installed to: /path/to/my/perl-lib

Can anyone point me to a tutorial?

Thanks
Rhett

From tim at toolman.org Thu Apr 29 07:47:10 2004
From: tim at toolman.org (Tim Peoples)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Installing modules locally on a shared server
In-Reply-To:
References:
Message-ID: <1083242830.15853.28.camel@localhost.localdomain>

The FAQ section of "perldoc CPAN" says:

  5) I am not root, how can I install a module in a personal directory?

     You will most probably like something like this:

       o conf makepl_arg "LIB=~/myperl/lib \
                          INSTALLMAN1DIR=~/myperl/man/man1 \
                          INSTALLMAN3DIR=~/myperl/man/man3"
       install Sybase::Sybperl

     You can make this setting permanent like all "o conf"
     settings with "o conf commit".

     You will have to add ~/myperl/man to the MANPATH environment
     variable and also tell your perl programs to
     look into ~/myperl/lib, e.g. by including

       use lib "$ENV{HOME}/myperl/lib";

     or setting the PERL5LIB environment variable.

     Another thing you should bear in mind is that the
     UNINST parameter should never be set if you are not
     root.

Tim.

On Wed, 2004-04-28 at 19:12, Goldilox wrote:
> I need to install some modules locally on a shared server. I do not have access
> to root and it is a pay service with no support, especially with this issue. I
> have figured out I need to add my local dir to @INC
>
> PERL5LIB=/path/to/my/perl-lib; export PERL5LIB;
>
> but now I try to run: perl -MCPAN -e shell
>
> and it basically tells me I am not root
>
> so I need to get the modules installed to: /path/to/my/perl-lib
>
> Can anyone point me to a tutorial?
>
> Thanks
> Rhett
>
> _______________________________________________
> Austin mailing list
> Austin@mail.pm.org
> http://mail.pm.org/mailman/listinfo/austin
--
 _______________________________________________________________________
   Timothy E. Peoples
   Have Camel, Will Code
   tim@toolman.org

From goldilox at teachnet.edb.utexas.edu Thu Apr 29 13:50:27 2004
From: goldilox at teachnet.edb.utexas.edu (Goldilox)
Date: Mon Aug 2 21:23:24 2004
Subject: APM: Installing modules locally on a shared server
In-Reply-To: <1083242830.15853.28.camel@localhost.localdomain>
References: <1083242830.15853.28.camel@localhost.localdomain>
Message-ID:

I did read this, and I honestly get lost many times reading these types of
documents when they assume a certain comfort level. I guess I confused myself
by trying to search for other more specific instructions (like how to add
items to the MANPATH env variable?) and do I want to do "o conf commit" if I
will always be installing modules to my local area (I don't want to mess
something else up in the process)?

And I assume I would type it:
>o conf commit
>makepl_arg ...

I'll see if I can do a little more research. Thanks for the feedback.

Rhett

Tim Peoples writes:
>
>The FAQ section of "perldoc CPAN" says:
>
>  5) I am not root, how can I install a module in a personal
>     directory?
>
>     You will most probably like something like this:
>
>       o conf makepl_arg "LIB=~/myperl/lib \
>                          INSTALLMAN1DIR=~/myperl/man/man1 \
>                          INSTALLMAN3DIR=~/myperl/man/man3"
>       install Sybase::Sybperl
>
>     You can make this setting permanent like all "o conf"
>     settings with "o conf commit".
>
>     You will have to add ~/myperl/man to the MANPATH environment
>     variable and also tell your perl programs to
>     look into ~/myperl/lib, e.g.
by including > > use lib "$ENV{HOME}/myperl/lib"; > > or setting the PERL5LIB environment variable. > > Another thing you should bear in mind is that the > UNINST parameter should never be set if you are not > root. > > >Tim. > > >On Wed, 2004-04-28 at 19:12, Goldilox wrote: >> I need to install some modules locally on a shared server. I do not have >access >> to root and it is a pay service with no support, especially with this issue. >I >> have figured out I need to add my local dir to @INC >> >> PERL5LIB=/path/to/my/perl-lib; export PERL5LIB; >> >> but now I try to run: perl -MCPAN -e shell >> >> and it basically tells me I am not root >> >> so I need to get the modules installed to: /path/to/my/perl-lib >> >> Can anyone point me to a tutorial? >> >> Thanks >> Rhett >> >> _______________________________________________ >> Austin mailing list >> Austin@mail.pm.org >> http://mail.pm.org/mailman/listinfo/austin >-- > _______________________________________________________________________ > Timothy E. Peoples > Have Camel, Will Code > tim@toolman.org > >
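(Putting the pieces of that FAQ answer together in order, since the thread left it a bit abstract -- a sketch, not a tested recipe: ~/myperl and LWP::Simple are just placeholders, the cpan> lines are typed one at a time inside the CPAN shell, and "o conf commit" is only needed if you want the makepl_arg setting remembered for future sessions:)

    $ perl -MCPAN -e shell
    cpan> o conf makepl_arg "LIB=~/myperl/lib INSTALLMAN1DIR=~/myperl/man/man1 INSTALLMAN3DIR=~/myperl/man/man3"
    cpan> o conf commit
    cpan> install LWP::Simple
    cpan> quit

    # then, in your shell startup (sh/bash syntax), so perl and man can find things:
    PERL5LIB=$HOME/myperl/lib; export PERL5LIB
    MANPATH=$MANPATH:$HOME/myperl/man; export MANPATH

    # or, instead of PERL5LIB, near the top of each script:
    use lib "$ENV{HOME}/myperl/lib";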