From jkeroes at eli.net Tue Nov 4 00:04:55 2003 From: jkeroes at eli.net (Joshua Keroes) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] Code review Message-ID: Come one, come all to a night of code review! We promise, this will be altogether unlike any code review at work. You lucky contestants get to: 1. Bring code. 2. Talk to people. 3. Get additional eyes to look over your code. 4. Treat those people to beer, food, pool, your HUMAN SOUL; or something else nice. Others may: 1. Constructively criticize code. 2. ...win a lifesize talking alarm clock of super-mega-sitcom star, Fran Drescher! Some lucky few may also witness code being refactored - before your very eyes! A select number will also watch as your fellow coders take on the physical and mental confidence of Charles Atlas that only smooth, clean, succinct, clear code can provide. No sand getting kicked in anyones' eyes here! --- So, the next Perl Mongers meeting is in about ten days. To make this happen, we need those who have improvable code and don't mind admitting that it can be improved, and those who can improve it without being either heavy-handed or low-handed. Reply if you're interested. There needs to be enough interested parties for this to work. -Joshua PS I'll post the location details later. From schwern at pobox.com Tue Nov 4 00:17:06 2003 From: schwern at pobox.com (Michael G Schwern) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] Code review In-Reply-To: References: Message-ID: <20031104061706.GB5649@localhost.comcast.net> On Mon, Nov 03, 2003 at 10:04:55PM -0800, Joshua Keroes wrote: > So, the next Perl Mongers meeting is in about ten days. To make this > happen, we need those who have improvable code and don't mind admitting > that it can be improved, and those who can improve it without being > either heavy-handed or low-handed. Can we at least administer corrective wedgies? -- Michael G Schwern schwern@pobox.com http://www.pobox.com/~schwern/ Playstation? Of course Perl runs on Playstation. -- Jarkko Hietaniemi From poec at yahoo.com Tue Nov 4 11:44:53 2003 From: poec at yahoo.com (Ovid) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] Code review In-Reply-To: Message-ID: <20031104174453.20460.qmail@web40408.mail.yahoo.com> --- Joshua Keroes wrote: > Reply if you're interested. There needs to be enough interested parties > for this to work. Err ... the code I most want reviewed right now is some C with a thin Perl wrapper. Fair game? I suspect not :( Cheers, Ovid ===== Silence is Evil http://users.easystreet.com/ovid/philosophy/indexdecency.htm Ovid http://www.perlmonks.org/index.pl?node_id=17000 Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/ __________________________________ Do you Yahoo!? Protect your identity with Yahoo! Mail AddressGuard http://antispam.yahoo.com/whatsnewfree From hydo at mac.com Tue Nov 4 18:12:18 2003 From: hydo at mac.com (Clint Moore) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] Code review In-Reply-To: <20031104174453.20460.qmail@web40408.mail.yahoo.com> References: <20031104174453.20460.qmail@web40408.mail.yahoo.com> Message-ID: On Nov 4, 2003, at 9:44 AM, Ovid wrote: > --- Joshua Keroes wrote: >> Reply if you're interested. There needs to be enough interested >> parties >> for this to work. > > Err ... the code I most want reviewed right now is some C with a thin > Perl wrapper. Fair game? I > suspect not :( > I'm probably pretty rusty but i'll take a look at it. Bring it! 
-cm From kyle at silverbeach.net Tue Nov 4 19:25:15 2003 From: kyle at silverbeach.net (Kyle Hayes) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] Code review In-Reply-To: <20031104174453.20460.qmail@web40408.mail.yahoo.com> References: <20031104174453.20460.qmail@web40408.mail.yahoo.com> Message-ID: <200311041725.15172.kyle@silverbeach.net> On Tuesday 04 November 2003 09:44, Ovid wrote: > --- Joshua Keroes wrote: > > Reply if you're interested. There needs to be enough interested parties > > for this to work. > > Err ... the code I most want reviewed right now is some C with a thin Perl > wrapper. Fair game? I suspect not :( I'll look at it. I've been playing with Inline::C anyway lately. Best, Kyle From john at digitalmx.com Tue Nov 4 20:58:34 2003 From: john at digitalmx.com (John Springer) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] saving state with CGI.pm Message-ID: I'm having a problem using CGI.pm to save state. Maybe I'm using the wrong tool?? Anyways... I have users going through several forms to collect information, and I'm saving the state of the CGI object in a session file. But I want to keep a "running list" of all the data that has been set across all the forms, so the user can bounce back and forth without losing anything. I got it to work but it's awkward and took a lot of trial and error. Here's what I'm doing: 1. create new cgi object with form values ($q= new CGI();) 2. open another CGI object from the previously saved state. ($p= new CGI(FILEHANDLE);) 3. add all the variables from $p that aren't in $q to $q. if($var is in $p but not in $q) { $val=$p->parm($var); $q->param($var,$val); #sets $var to $val in $q. } 4. Save the state of $new back to the file. $q->save(FILEHANDLE); One of the difficulties is that the ->param($var,$val) notation fails if $val is null. Also if $var was previously set but the current form clears it, it needs to get cleared. The if() in step 3 got rather complicated. I run through a sorted list of variables in $q and compare to a sorted list of variables in $p and pass only the vars in p but not in q. I have a feeling I've turned something simple into something really complicated. Can anyone put me on the path of righteousness? -- John Springer Somewhere in Portland Where it's probably raining. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 1517 bytes Desc: not available Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20031104/32ee309d/attachment.bin From merlyn at stonehenge.com Tue Nov 4 21:05:40 2003 From: merlyn at stonehenge.com (Randal L. Schwartz) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] saving state with CGI.pm In-Reply-To: References: Message-ID: <867k2fms5b.fsf@blue.stonehenge.com> >>>>> "John" == John Springer writes: John> I have users going through several forms to collect information, and John> I'm saving the state of the CGI object in a session file. But I want John> to keep a "running list" of all the data that has been set across all John> the forms, so the user can bounce back and forth without losing John> anything. I got it to work but it's awkward and took a lot of trial John> and error. Consider the code at as a possible solution. It's using client-side state management, but that's certainly a viable solution. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. 
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! From darthsmily at verizon.net Tue Nov 4 21:59:25 2003 From: darthsmily at verizon.net (darthsmily) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] Code review In-Reply-To: References: Message-ID: <3FA8759D.4080205@verizon.net> Joshua Keroes wrote: > > Come one, come all to a night of code review! We promise, this will be > altogether unlike any code review at work. > > You lucky contestants get to: > 1. Bring code. > 2. Talk to people. > 3. Get additional eyes to look over your code. > 4. Treat those people to beer, food, pool, your HUMAN SOUL; or > something else nice. > > Others may: > 1. Constructively criticize code. > 2. ...win a lifesize talking alarm clock of super-mega-sitcom star, > Fran Drescher! > > Some lucky few may also witness code being refactored - before your > very eyes! > > A select number will also watch as your fellow coders take on the > physical and mental confidence of Charles Atlas that only smooth, > clean, succinct, clear code can provide. No sand getting kicked in > anyones' eyes here! > > --- > > So, the next Perl Mongers meeting is in about ten days. To make this > happen, we need those who have improvable code and don't mind > admitting that it can be improved, and those who can improve it > without being either heavy-handed or low-handed. > > Reply if you're interested. There needs to be enough interested > parties for this to work. > > -Joshua > > PS I'll post the location details later. > > _______________________________________________ > Pdx-pm-list mailing list > Pdx-pm-list@mail.pm.org > http://mail.pm.org/mailman/listinfo/pdx-pm-list > When and where? From tex at off.org Wed Nov 5 02:19:32 2003 From: tex at off.org (Austin Schutz) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] saving state with CGI.pm In-Reply-To: References: Message-ID: <20031105081932.GA20965@gblx.net> On Tue, Nov 04, 2003 at 06:58:34PM -0800, John Springer wrote: > I'm having a problem using CGI.pm to save state. Maybe I'm using the > wrong tool?? Anyways... One way to do it is to use cookies. Benefits are that you don't have to save any state yourself and the user can go back to any part of the form at any point in the future and still access their data. You can set cookies at any part of your website and have them readable everywhere, sort of like global variables. Some folks used to say that users wouldn't always allow cookies, but that's probably not true any more. Austin From rootbeer at redcat.com Thu Nov 6 11:50:08 2003 From: rootbeer at redcat.com (Tom Phoenix) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] Anti-cookie rhetoric (was: saving state with CGI.pm) In-Reply-To: <20031105081932.GA20965@gblx.net> References: <20031105081932.GA20965@gblx.net> Message-ID: On Wed, 5 Nov 2003, Austin Schutz wrote: > Some folks used to say that users wouldn't always allow cookies, > but that's probably not true any more. It's worth remembering that a few users may not be able to use cookies even if they want to. For example, the user might be at a school or library net terminal, unable to change the preferences, while the site admin has ordained "no cookies" since each computer is shared among many users. Even when cookies succeed, they don't hold information for the user; they hold information for the _browser_. If I borrow your computer and use your browser, sites will think that you're visiting. If you use a different computer or browser, sites may think that a different person is visiting. 
That's one reason that most cookies should expire within a few hours or at end-of-session, the sooner the better. (Exception: The user asks to save state, such as "Remember my settings". Or you have users who are sure to have cookie support and mostly one-user-per-browser, such as with an in-house application.) We should all laugh at sites which use cookies to keep voters on a web-based poll from "stuffing the ballot box" with multiple votes. That inconveniences some people who share browsers while being impotent to prevent fraudulent votes. (That's a task for a captcha: http://www.captcha.net/ - but there's no fair way to stop someone who wants to vote more than once, short of some non-net-based registration.) Cookies can work for some purposes, but they have a lot of shortcomings. --Tom From poec at yahoo.com Thu Nov 6 12:21:47 2003 From: poec at yahoo.com (Ovid) Date: Mon Aug 2 21:34:25 2004 Subject: [Pdx-pm] saving state with CGI.pm In-Reply-To: <20031105081932.GA20965@gblx.net> Message-ID: <20031106182147.97425.qmail@web40404.mail.yahoo.com> --- Austin Schutz wrote: > One way to do it is to use cookies. Benefits are that you don't > have to save any state yourself and the user can go back to any part of the > form at any point in the future and still access their data. You can set > cookies at any part of your website and have them readable everywhere, sort > of like global variables. Er, sorry, but I have to say that this is a terrible idea. http://use.perl.org/~Ovid/journal/15165 (my credit card number and pin was stored in a cookie) http://use.perl.org/~Ovid/journal/13542 (Friendster stored password in cookie) http://use.perl.org/~Ovid/journal/13471 (Microsoft abuses cookies and a young lady may have gotten in trouble because a cookie revealed the location of her online journal) You can read about those horror stories of storing user data in the cookies. One response might be "store everything *but* sensitive data in the cookie", but at that point, it means you already have a server-side mechanism for maintaining state and you no longer need to rely on the cookie. Cheers, Ovid ===== Silence is Evil http://users.easystreet.com/ovid/philosophy/indexdecency.htm Ovid http://www.perlmonks.org/index.pl?node_id=17000 Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/ __________________________________ Do you Yahoo!? Protect your identity with Yahoo! Mail AddressGuard http://antispam.yahoo.com/whatsnewfree From tex at off.org Wed Nov 5 13:08:45 2003 From: tex at off.org (Austin Schutz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] saving state with CGI.pm In-Reply-To: <20031106182147.97425.qmail@web40404.mail.yahoo.com> References: <20031105081932.GA20965@gblx.net> <20031106182147.97425.qmail@web40404.mail.yahoo.com> Message-ID: <20031105190845.GA13945@gblx.net> On Thu, Nov 06, 2003 at 10:21:47AM -0800, Ovid wrote: > --- Austin Schutz wrote: > > One way to do it is to use cookies. Benefits are that you don't > > have to save any state yourself and the user can go back to any part of the > > form at any point in the future and still access their data. You can set > > cookies at any part of your website and have them readable everywhere, sort > > of like global variables. > > Er, sorry, but I have to say that this is a terrible idea. 
> > http://use.perl.org/~Ovid/journal/15165 > (my credit card number and pin was stored in a cookie) > http://use.perl.org/~Ovid/journal/13542 > (Friendster stored password in cookie) > http://use.perl.org/~Ovid/journal/13471 > (Microsoft abuses cookies and a young lady may have gotten in trouble > because a cookie revealed the location of her online journal) > > You can read about those horror stories of storing user data in the cookies. Three points of rebuttal... err.. I guess four: 1. If a credit card number has to be stored, I'd much rather have it stored on my computer than on some poorly maintained webserver run by joe shmoe on the other side of the 'Net. 2. You shouldn't be storing credit card information anyway. 3. Encryption works swell. Just because the data is stored on the user's computer doesn't mean it has to be available in plaintext. In addition to the point that if you can't trust the other users on an insecure operating system you shouldn't be using it anyway. In the "young lady" story her parents could just as well have installed a keystroke logger, etc. etc. etc. Austin From joe at oppegaard.net Thu Nov 6 13:38:29 2003 From: joe at oppegaard.net (Joe Oppegaard) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] saving state with CGI.pm In-Reply-To: References: Message-ID: On Tue, 4 Nov 2003, John Springer wrote: > I'm having a problem using CGI.pm to save state. Maybe I'm using the > wrong tool?? Anyways... > I have users going through several forms to collect information, and > I'm saving the state of the CGI object in a session file. But I want to > keep a "running list" of all the data that has been set across all the > forms, so the user can bounce back and forth without losing anything. My preferred way to do things like this are with sessions. See CGI::Session and the very good tutorial that comes with it. (Note the -ip-match switch). You can store the user sessionId in a cookie, hidden input fields, or in the URL query string itself, which is nice for non-cookie users. The preferred thing to do with sessions that hold sensitive data in the session file is to expire the sessionid after a set number of minutes or when the browser closes, making sure to cleanup the session files. Of course the session files should only be readable by the user the webserver is running as. I actually haven't done this in perl too much because most of the web code at my job uses PHP (/me ducks), which has very convienent built in session handling. -Joe Oppegaard From tex at off.org Wed Nov 5 13:41:46 2003 From: tex at off.org (Austin Schutz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Anti-cookie rhetoric (was: saving state with CGI.pm) In-Reply-To: References: <20031105081932.GA20965@gblx.net> Message-ID: <20031105194146.GB13945@gblx.net> On Thu, Nov 06, 2003 at 09:50:08AM -0800, Tom Phoenix wrote: > On Wed, 5 Nov 2003, Austin Schutz wrote: > > > Some folks used to say that users wouldn't always allow cookies, > > but that's probably not true any more. > > It's worth remembering that a few users may not be able to use cookies > even if they want to. For example, the user might be at a school or > library net terminal, unable to change the preferences, while the site > admin has ordained "no cookies" since each computer is shared among many > users. Sure, that could happen. That's a pretty smally minority, but it could be important. > > Even when cookies succeed, they don't hold information for the user; they > hold information for the _browser_. 
If I borrow your computer and use your > browser, sites will think that you're visiting. If you use a different > computer or browser, sites may think that a different person is visiting. > That's one reason that most cookies should expire within a few hours or at > end-of-session, the sooner the better. (Exception: The user asks to save > state, such as "Remember my settings". Or you have users who are sure to > have cookie support and mostly one-user-per-browser, such as with an > in-house application.) Well it's certainly possible to make sure the data in the cookies is user specific, and to make sure it's password protected and/or encrypted, all of which can be done without that much effort and without maintaining state by the server. The point of the exercise was to maintain state for the user anyway, so at some point unless the data gets flushed it will still be available in the browser no matter which method you use, and using any reasonable method the data can be flushed anyway. *shrug* > Cookies can work for some purposes, but they have a lot of shortcomings. They're definitely not a panacea, but they can be pretty handy IMO, especially for data that _isn't_ particularly sensitive, but should be stored over long periods. Preferences, in particular, can be saved in cookies and make a user's web browsing experience significantly better. I like 'em, but then again, I don't have the added burden of worrying or caring about library users, etc. Austin From poec at yahoo.com Thu Nov 6 14:04:07 2003 From: poec at yahoo.com (Ovid) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] 3 Simple ways to attack cookies (was: saving state with CGI.pm) In-Reply-To: <20031105190845.GA13945@gblx.net> Message-ID: <20031106200407.69151.qmail@web40412.mail.yahoo.com> --- Austin Schutz wrote: > Three points of rebuttal... err.. I guess four: > > 1. If a credit card number has to be stored, I'd much rather have it > stored on my computer than on some poorly maintained webserver run > by joe shmoe on the other side of the 'Net. Credit card numbers generally should not be stored. However, if they are, there is *no way* that information should be in a cookie. That information gets invisibly stored on whatever computer I'm using. If I am at a public place and I happen to pop over to the credit card company's Web site, then I've just stored my credit card number on this public computer (and session cookies can also be written to swap files and get stored on disk when you think they aren't!). There is no way we can hope to educate all users on how to manage this information and we shouldn't have to. The more things we ask people to remember, the more things they will forget. > 3. Encryption works swell. Just because the data is stored on the > user's computer doesn't mean it has to be available in plaintext. I can't remember the source of the quote, but I recall reading once a description of SSL as using an armored car to send credit card numbers from a guy on a park bench to a guy in a cardboard box. Now while that *seems* like an inappropriate analogy since you appear to be talking about encrypting the cookies as they are stored on the hard drive (thus making it an armored cardboard box), I think it's a perfect analogy because it reminds us that we are talking about complex systems and there are many parts to secure. For example, let's say you really, really think that you've got everything nailed down.
The Web site is using SSL for every single page (performance be damned), you have a physically secure computer and just to be paranoid, you use an encrypted file system. You've checked the server that the Web site is sitting on and, as far as you can tell, every security patch appears to be place and there are no known exploits. An in-depth security audit of the code also reveals that there are no known security holes in any portion of the Web code. Feel pretty safe, huh? Now the site uses your computer to store your personal data by storing it in a cookie on your side, but no worries, you're bullet proof. Tomorrow, the new programmer does a quick update to a page to allow a very limited subset of HTML in user-posted comments and someone slips in an XSS (cross-site scripting) attack and snags Joe User's cookie. Game over. Joe loses. But the Web site's security team is so top-notch that they would never allow anything like this to occur, so Joe User doesn't have to worry. And while Joe User isn't worrying, he is over at another site which, unbeknownst to him, allows users to add javascript and, as a result, Joe User finds himself the victim of an XST (cross-site tracing) attack whereby his cookie for the safe domain is sent to the attacker whose script resides on the unsafe domain. Nope. That won't work because Joe User is rather unusual in that he has his Javascript disabled (unlike the vast majority of surfers), so Joe User is now perfectly safe and doesn't have to worry. And while Joe User isn't worrying, he's connecting to the safe server and notices that they had a problem with their SSL certificate. Annoyed at seeing this *again* (so many sites are sloppy about this), he clicks "ignore" and is completely unaware of the man in the middle attack that's grabbing his credit card number. I could go on, but I think the point is made. Those were potential scenarious of attack that *assumed* everyone was cognizant of security issues. Most, as we are sadly aware, are not. With networks, there are so many ways of compromising them that it doesn't make sense to send more confidential information over the Web than is necessary. We know that this data will be sent, but it should be as limited as possible. This is not to say that we should never use the Web for anything personal. That's like saying we can never unlock our front door lest the thieves get in. The problem is that we shouldn't be leaving our front door unlocked and then driving to the coast (unless we have insurance and want new furniture). The point here is risk management. We need to understand what common sources of attack are and how we can guard against them. I probably won't hire a small army to guard against a physical assault against my servers as this is both unlikely and not cost-effective. On the other hand, XSS attacks are quite common and *should* be guarded against. Cheers, Ovid ===== Silence is Evil http://users.easystreet.com/ovid/philosophy/indexdecency.htm Ovid http://www.perlmonks.org/index.pl?node_id=17000 Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/ __________________________________ Do you Yahoo!? Protect your identity with Yahoo! 
Mail AddressGuard http://antispam.yahoo.com/whatsnewfree From rootbeer at redcat.com Thu Nov 6 16:28:47 2003 From: rootbeer at redcat.com (Tom Phoenix) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Anti-cookie rhetoric (was: saving state with CGI.pm) In-Reply-To: <20031105194146.GB13945@gblx.net> References: <20031105081932.GA20965@gblx.net> <20031105194146.GB13945@gblx.net> Message-ID: On Wed, 5 Nov 2003, Austin Schutz wrote: > Well it's certainly possible to make sure the data in the > cookies is user specific, Am I missing something here? The only ways I can think of to ensure that the cookie data belong to a particular user, instead of browser, would obviate the need for long-term cookies at all. For example, if the user logs in with a username-password combo, you know which user it is - but now, why keep anything in the cookie jar? You've already got the username and password (in some form) in a database, so you may as well keep everything in there, or at least everything important. Cookies get lost, but databases get backed up. (We hope!) > and to make sure it's password protected and/or encrypted, Encrypting user data in cookies is using a cheap database that sometimes loses data. :-) Seriously, disk space on the server is cheap; bandwidth consumed by large cookies that go back and forth on many transactions is expenive. A small cookie that has a session-ID is okay, but that's designed to expire at the end of a session. If you must use large cookies, ensure that they're not sent to and from your server except when necessary. Some servers send and require every cookie even when you're fetching the eighteen images on every page. For some reason, these pages load slowly... :-D > especially for data that _isn't_ particularly sensitive, but should be > stored over long periods. Long-term cookies are generally problematic. Most browsers implement some limit on cookies, deleting old cookies to make room for new ones. The RFC has some information on this, even though its suggested limits are pretty permissive. Section 6.3 says, Applications should use as few and as small cookies as possible, and they should cope gracefully with the loss of a cookie. http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2109.html#sec-6.3 > Preferences, in particular, can be saved in cookies and make a user's > web browsing experience significantly better. Yes, that's the usage I mentioned - so long as the _user_ chooses to save the state. If I borrow your browser, some site shouldn't save my preferences as if they were yours, though. I'm not opposed to all uses of cookies. But I'm opposed to most of their uses on the web today. --Tom Phoenix From rootbeer at redcat.com Thu Nov 6 16:32:36 2003 From: rootbeer at redcat.com (Tom Phoenix) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] 3 Simple ways to attack cookies (was: saving state with CGI.pm) In-Reply-To: <20031106200407.69151.qmail@web40412.mail.yahoo.com> References: <20031106200407.69151.qmail@web40412.mail.yahoo.com> Message-ID: Ya know, maybe we should have a lightning talks session devoted to cookies: pro, con, uses of, abuses of, and recipies. Those who don't want to talk should bring the chocolate chip cookies. 
--Tom From tex at off.org Wed Nov 5 17:08:07 2003 From: tex at off.org (Austin Schutz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Anti-cookie rhetoric (was: saving state with CGI.pm) In-Reply-To: References: <20031105081932.GA20965@gblx.net> <20031105194146.GB13945@gblx.net> Message-ID: <20031105230807.GC13945@gblx.net> On Thu, Nov 06, 2003 at 02:28:47PM -0800, Tom Phoenix wrote: > On Wed, 5 Nov 2003, Austin Schutz wrote: > I'm not opposed to all uses of cookies. But I'm opposed to most of their > uses on the web today. > I'll just say that I generally disagree, but I like the lightning talk/chocolate chip cookie idea. Austin From jkeroes at eli.net Thu Nov 6 17:37:35 2003 From: jkeroes at eli.net (Joshua Keroes) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Afterhours meet Message-ID: <3330DA76-10B2-11D8-AC50-000A95C466EC@eli.net> After the next meeting, where shall we go for beer, etc? J From lemming at quirkyqatz.com Thu Nov 6 17:47:49 2003 From: lemming at quirkyqatz.com (Mark Morgan) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] 3 Simple ways to attack cookies (was: saving state with CGI.pm) In-Reply-To: References: <20031106200407.69151.qmail@web40412.mail.yahoo.com> Message-ID: <55415.134.134.136.3.1068162469.squirrel@webmail.pair.com> Tom Phoenix said: > Ya know, maybe we should have a lightning talks session devoted to > cookies: pro, con, uses of, abuses of, and recipies. Those who don't > want to talk should bring the chocolate chip cookies. I have been thinking I should be making my chocolate chip cookies again. They're not in the same class as my wife's rum balls, but they're pretty good. -- Mark http://www.kittydream.org - House of Dreams Cat Shelter p.s. sorry Tom for getting you twice on the reply From john at digitalmx.com Thu Nov 6 18:18:43 2003 From: john at digitalmx.com (John Springer) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] 3 Simple ways to attack cookies (was: saving state with CGI.pm) In-Reply-To: <55415.134.134.136.3.1068162469.squirrel@webmail.pair.com> References: <20031106200407.69151.qmail@web40412.mail.yahoo.com> <55415.134.134.136.3.1068162469.squirrel@webmail.pair.com> Message-ID: Seems like those in favor of cookies should bring cookies; those opposed bring rum balls. I may choose sides based on the refreshments, assuming there's enough rum is in the rum balls. -- John Springer Somewhere in Portland Where it's probably raining. On Nov 6, 2003, at 3:47 PM, Mark Morgan wrote: > I have been thinking I should be making my chocolate chip cookies > again. > They're not in the same class as my wife's rum balls, but they're > pretty > good. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 565 bytes Desc: not available Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20031106/0a58be05/attachment.bin From kyle at silverbeach.net Thu Nov 6 22:55:38 2003 From: kyle at silverbeach.net (Kyle Hayes) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] saving state with CGI.pm In-Reply-To: <20031105190845.GA13945@gblx.net> References: <20031105081932.GA20965@gblx.net> <20031106182147.97425.qmail@web40404.mail.yahoo.com> <20031105190845.GA13945@gblx.net> Message-ID: <200311062055.38180.kyle@silverbeach.net> On Wednesday 05 November 2003 11:08, Austin Schutz wrote: > On Thu, Nov 06, 2003 at 10:21:47AM -0800, Ovid wrote: > > --- Austin Schutz wrote: > > > One way to do it is to use cookies. 
Benefits are that you don't > > [snip, horror stories] > > You can read about those horror stories of storing user data in the > > cookies. > > Three points of rebuttal... err.. I guess four: > > 1. If a credit card number has to be stored, I'd much rather have it > stored on my computer than on some poorly maintained webserver run > by joe shmoe on the other side of the 'Net. ??? but browsers will give it up to any server persuasive enough. Cookie attacks are legion. Tried cranking down control of your cookies real tight and then using Hotmail? HTTP cookies are like VD infections: they spread easily and no one wants to talk about it. Really, really important information I keep offline if at all possible. > 2. You shouldn't be storing credit card information anyway. Tricky to do if you need recurring billing. I agree that the best approach is to do everything possible to avoid storing a credit card in any form that could be turned back into a usable number. It is worth doing all kinds of gymnastics to avoid storing CC numbers in any usable form. > 3. Encryption works swell. Just because the data is stored on the > user's computer doesn't mean it has to be available in plaintext. This is only true when the decryption method is as safe as the credit card data itself. I.e. you've got all your credit card numbers carefully encrypted, but the Perl CGI that has the decryption key and salt is downloadable via misconfiguration or a web server bug.... Ouch. I've seen it happen. I've seen people make MySQL loadable user modules (compiled C++ code) to do the decryption. Great, tough to make the web server serve that module up as a binary. However, a little trickery, a couple of SQL injection attacks, and again, I've got your decryption key, or I can get your server to do the decryption for me. Cool. I prefer doing one-way crypto hashes on CC numbers. MD5 is a bit long in the tooth, but SHA1 and some of the newer, stronger crypto hashes do a fine job. You can't get the number back, but you can tell if you get the same CC twice (which is useful in stopping potential fraud). Sure, someone can crack your web site and stripmine the DB. Big deal. You haven't given away any CC numbers. It's far too easy to win the Visa lawsuit sweepstakes. The Secret Service will not help you. Visa will not help you. The first is too busy with too few people to do much. The second makes money on every transaction, coming or going. You are not worthy to talk to American Express. > In addition to the point that if you can't trust the other users > on an insecure operating system you shouldn't be using it anyway. In the > "young lady" story her parents could just as well have installed a > keystroke logger, etc. etc. etc. As noted in other venues (and on this list): The end points are not secure, but the transport is great. A bit of a conundrum for those of us that sometimes need to make sure that secret data stays secret. Best, Kyle From jkeroes at eli.net Tue Nov 11 11:44:30 2003 From: jkeroes at eli.net (Joshua Keroes) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Nov Meeting: Code Review Message-ID: Come one, come all to a night of code review! We promise, this will be altogether unlike any code review at work. You lucky contestants get to: 1. Bring code. 2. Talk to people. 3. Get additional eyes to look over your code. 4. Treat those people to beer, food, pool, your HUMAN SOUL; or something else nice. Others may: 1. Constructively criticize code. 2. 
...win a lifesize talking alarm clock of super-mega-sitcom star, Fran Drescher! Some lucky few may also witness code being refactored - before your very eyes! A select number will also watch as your fellow coders take on the physical and mental confidence of Charles Atlas that only smooth, clean, succinct, clear code can provide. No sand getting kicked in anyones' eyes here! --- Meeting: Weds Nov 12, 6:30-8:30, at the Urban Grind Cafe Map at http://urbangrindcoffee.com/ Afterhours: 8:30/9:00-whenever Goodfoot? Laurelwood? Moon & Sixpense? Basement Pub? Anywhere else? We'll vote on a location at Urban Grind. Email me privately if you only can hit the afterhours. -J From jkeroes at eli.net Wed Nov 12 14:50:56 2003 From: jkeroes at eli.net (Joshua Keroes) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Nov Meeting: Code Review TONIGHT Message-ID: Come one, come all to a night of code review! We promise, this will be altogether unlike any code review at work. You lucky contestants get to: 1. Bring code. 2. Talk to people. 3. Get additional eyes to look over your code. 4. Treat those people to beer, food, pool, your HUMAN SOUL; or something else nice. Others may: 1. Constructively criticize code. 2. ...win a lifesize talking alarm clock of super-mega-sitcom star, Fran Drescher! Some lucky few may also witness code being refactored - before your very eyes! A select number will also watch as your fellow coders take on the physical and mental confidence of Charles Atlas that only smooth, clean, succinct, clear code can provide. No sand getting kicked in anyones' eyes here! --- Meeting: Weds Nov 12, 6:30-8:30, at the Urban Grind Cafe Map at http://urbangrindcoffee.com/ Afterhours: 8:30/9:00-whenever Goodfoot? Laurelwood? Moon & Sixpense? Basement Pub? Anywhere else? We'll vote on a location at Urban Grind. Email me privately if you only can hit the afterhours. -J From poec at yahoo.com Wed Nov 12 16:37:00 2003 From: poec at yahoo.com (Ovid) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Nov Meeting: Code Review TONIGHT In-Reply-To: Message-ID: <20031112223700.65176.qmail@web40403.mail.yahoo.com> --- Joshua Keroes wrote: > Come one, come all to a night of code review! We promise, this will be > altogether unlike any code review at work. Crap. I can't make it. I'm nice and sick. Blah. Any chance that some of the before and afters can be posted to the Web site? Cheers, Ovid ===== Silence is Evil http://users.easystreet.com/ovid/philosophy/indexdecency.htm Ovid http://www.perlmonks.org/index.pl?node_id=17000 Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/ __________________________________ Do you Yahoo!? Protect your identity with Yahoo! Mail AddressGuard http://antispam.yahoo.com/whatsnewfree From merlyn at stonehenge.com Wed Nov 12 16:42:52 2003 From: merlyn at stonehenge.com (Randal L. Schwartz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Nov Meeting: Code Review TONIGHT In-Reply-To: <20031112223700.65176.qmail@web40403.mail.yahoo.com> References: <20031112223700.65176.qmail@web40403.mail.yahoo.com> Message-ID: <86ekwdryxm.fsf@blue.stonehenge.com> >>>>> "Ovid" == Ovid writes: Ovid> Crap. I can't make it. I'm nice and sick. Blah. And I'm in LA. Nearly the same thing. Nice and sick (read: twisted). Ovid> Any chance that some of the before and afters can be posted to the Web site? Ditto on the request. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. 
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! From kellert at ohsu.edu Wed Nov 12 16:57:17 2003 From: kellert at ohsu.edu (Thomas Keller) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Mac OS X 10.3 Message-ID: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> On my Mac running OS X 10.2 and perl 5.6, I had installed the fink package installer, and had added various libraries required by some of the perl modules I was using. But I was having trouble getting some perl modules to install. So for better or worse, I decided to rebuild my machine. I did a clean install of Mac 10.3 with all the developers tools, xcode, etc. My questions are: What libraries should I install right away, and where on Panther? e.g. expat, libgd, freetype, etc? What do people think about using fink as a software installation tool on the Mac, specifically for perl required libraries, and perl modules? What's your favorite reference for this type of perl-oriented system administration? Thanks, Tom K. Tom Keller, Ph.D. http://www.ohsu.edu/core kellert@ohsu.edu 503-494-2442 From jkeroes at eli.net Wed Nov 12 17:00:42 2003 From: jkeroes at eli.net (Joshua Keroes) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Nov Meeting: Code Review TONIGHT In-Reply-To: <86ekwdryxm.fsf@blue.stonehenge.com> References: <20031112223700.65176.qmail@web40403.mail.yahoo.com> <86ekwdryxm.fsf@blue.stonehenge.com> Message-ID: <0AC262AF-1564-11D8-A723-000A95C466EC@eli.net> On Nov 12, 2003, at 2:42 PM, Randal L. Schwartz wrote: >>>>>> "Ovid" == Ovid writes: > > Ovid> Crap. I can't make it. I'm nice and sick. Blah. > > And I'm in LA. Nearly the same thing. Nice and sick (read: twisted). > > Ovid> Any chance that some of the before and afters can be posted to > the Web site? > > Ditto on the request. Excellent idea. Everyone: please post befores and afters to the PDX.pm kwiki. There's an example listed at http://pdx.pm.org/kwiki/ . -J From schwern at pobox.com Wed Nov 12 20:07:35 2003 From: schwern at pobox.com (Michael G Schwern) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Mac OS X 10.3 In-Reply-To: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> References: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> Message-ID: <20031113020735.GG715@localhost.comcast.net> On Wed, Nov 12, 2003 at 02:57:17PM -0800, Thomas Keller wrote: > On my Mac running OS X 10.2 and perl 5.6, I had installed the fink > package installer, and had added various libraries required by some of > the perl modules I was using. But I was having trouble getting some > perl modules to install. So for better or worse, I decided to rebuild > my machine. I did a clean install of Mac 10.3 with all the developers > tools, xcode, etc. > > My questions are: > What libraries should I install right away, and where on Panther? e.g. > expat, libgd, freetype, etc? I dunno, install stuff with CPANPLUS as you need it. > What do people think about using fink as a software installation tool > on the Mac, specifically for perl required libraries, and perl modules? fink doesn't have enough perl modules in its system to be usable as your only source of Perl modules. You could try rolling .info files for each module you want to use and throwing them into /sw/fink/dists/local but it's probably not worth the trouble.
> What's your favorite reference for this type of perl-oriented system > administration? CPANPLUS does the job well. -- Michael G Schwern schwern@pobox.com http://www.pobox.com/~schwern/ ...someone always points out that we'll end up dressing like gay space pirates anyway, so why bother planning otherwise? - C.H.U.N.K. DCLXVI From john at digitalmx.com Wed Nov 12 23:39:37 2003 From: john at digitalmx.com (John Springer) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Mac OS X 10.3 In-Reply-To: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> References: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> Message-ID: This is Jaguar-oriented, but probably still useful: (How to install perl 5.8 on Jaguar) http://developer.apple.com/internet/macosx/perl.html -- John Springer Somewhere in Portland Where it's probably raining. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 301 bytes Desc: not available Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20031112/a20f83d2/attachment.bin From schwern at pobox.com Thu Nov 13 00:54:49 2003 From: schwern at pobox.com (Michael G Schwern) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Mac OS X 10.3 In-Reply-To: References: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> Message-ID: <20031113065449.GA14138@localhost.personaltelco.net> On Wed, Nov 12, 2003 at 09:39:37PM -0800, John Springer wrote: > This is Jaguar-oriented, but probably still useful: > (How to install perl 5.8 on Jaguar) > > http://developer.apple.com/internet/macosx/perl.html I would recommend against changing the system perl on any machine. Leave /usr/bin/perl alone and don't overwrite /System/Library/Perl. Its likely to cause pain and suffering and pain. If you want a newer Perl, install it into /usr/local/perl5.x.y and put a symlink in /usr/local/bin. Configure it so that /Library/Perl is added when it asks you for extra directories for @INC. -- Michael G Schwern schwern@pobox.com http://www.pobox.com/~schwern/ Kindly do not attempt to cloud the issue with facts. From merlyn at stonehenge.com Thu Nov 13 01:46:25 2003 From: merlyn at stonehenge.com (Randal L. Schwartz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Mac OS X 10.3 In-Reply-To: <20031113065449.GA14138@localhost.personaltelco.net> References: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> <20031113065449.GA14138@localhost.personaltelco.net> Message-ID: <86n0b0pv78.fsf@blue.stonehenge.com> >>>>> "Michael" == Michael G Schwern writes: Michael> If you want a newer Perl, install it into Michael> /usr/local/perl5.x.y and put a symlink in /usr/local/bin. Michael> Configure it so that /Library/Perl is added when it asks you Michael> for extra directories for @INC. I put my OSX Perl in /opt/perl/snap (for "snapshot") with the Configure line of: ./Configure -des -Dusedevel -Uversiononly -Dprefix=/opt/perl/snap -Dlocincpth=/sw/include -Dloclibpth=/sw/lib -Dperladmin=merlyn@stonehenge.com Note that I have fink installed, so I need to include the two /sw dirs as well. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! 
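A quick way to sanity-check a build along the lines described above is a short script (a minimal sketch, not tied to anyone's particular prefix or Configure answers) that reports which perl binary is actually running and whether the extra directories made it into @INC:

    #!/usr/bin/perl
    # Report which perl is executing, its version, and its module search
    # path, so a freshly built /opt or /usr/local perl can be told apart
    # from the system perl in /usr/bin.
    use strict;
    use Config;

    print "binary:  $^X\n";
    print "version: $Config{version}\n";
    print "\@INC:\n";
    print "  $_\n" for @INC;

Running it both as plain "perl" and via the full path of the new install makes PATH mixups like the ones discussed in the next message easy to spot.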
From tex at off.org Wed Nov 12 12:46:37 2003 From: tex at off.org (Austin Schutz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Mac OS X 10.3 In-Reply-To: <20031113065449.GA14138@localhost.personaltelco.net> References: <90818557-1563-11D8-8656-0003930405E2@ohsu.edu> <20031113065449.GA14138@localhost.personaltelco.net> Message-ID: <20031112184637.GC2485@gblx.net> On Wed, Nov 12, 2003 at 10:54:49PM -0800, Michael G Schwern wrote: > On Wed, Nov 12, 2003 at 09:39:37PM -0800, John Springer wrote: > > This is Jaguar-oriented, but probably still useful: > > (How to install perl 5.8 on Jaguar) > > > > http://developer.apple.com/internet/macosx/perl.html > > I would recommend against changing the system perl on any machine. Leave > /usr/bin/perl alone and don't overwrite /System/Library/Perl. Its likely > to cause pain and suffering and pain. > > If you want a newer Perl, install it into /usr/local/perl5.x.y and put a > symlink in /usr/local/bin. Configure it so that /Library/Perl is added when > it asks you for extra directories for @INC. > ..and after you do that you may want to put /usr/local/bin into your PATH ahead of your current PATH, e.g. in .profile: PATH=/usr/local/bin:$PATH; export PATH; Also on some machines I've used the sysadmins neglect to link /usr/local/bin/perldoc -> /usr/local/perl5.x.y/bin/perldoc. That can be quite bothersome when you are trying to read documentation for modules in the non-system version. Also another thing that's worked for me is to install it as /usr/local/bin/perl5 or perl5.8, so if you remember the name you can't confuse the two, even if you wind up using an account without /usr/local/bin in its PATH first. Ditto for perldoc, etc. Seems like this is in a faq somewhere, but my lack of coffee this morning isn't helping me find it. Austin From rootbeer at redcat.com Thu Nov 13 18:06:58 2003 From: rootbeer at redcat.com (Tom Phoenix) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Regular Expression Compendium Message-ID: A student in my class asked whether there's some sort of RE Compendium out there. We're thinking of a web page that has Perl patterns filed under entries like "North American phone number with optional area code" or "fully-qualified domain name" or "Character name from The Simpsons". Is there a page like this already? If not, it really should be on a Wiki, shouldn't it? I think I'll make one... http://pdx.pm.org/kwiki/index.cgi?RECompendium --Tom Phoenix From jkeroes at eli.net Thu Nov 13 18:25:57 2003 From: jkeroes at eli.net (Joshua Keroes) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Regular Expression Compendium In-Reply-To: References: Message-ID: <1D880654-1639-11D8-B9A8-000A95C466EC@eli.net> On Nov 13, 2003, at 4:06 PM, Tom Phoenix wrote: > A student in my class asked whether there's some sort of RE Compendium > out > there. We're thinking of a web page that has Perl patterns filed under > entries like "North American phone number with optional area code" or > "fully-qualified domain name" or "Character name from The Simpsons". http://search.cpan.org/~abigail/Regexp-Common-2.113/ won't do any of these things but I believe the first two are on the TODO list. 
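For a sense of the interface, here is a small sketch using two patterns Regexp::Common already documents (integers and IPv4 addresses); the phone-number and domain-name patterns discussed above are not assumed to exist:

    #!/usr/bin/perl
    # Match input lines against patterns from Regexp::Common's exported
    # %RE hash: a plain integer and a dotted-quad IPv4 address.
    use strict;
    use Regexp::Common;

    while (my $line = <STDIN>) {
        print "integer: $1\n" if $line =~ /($RE{num}{int})/;
        print "IPv4:    $1\n" if $line =~ /($RE{net}{IPv4})/;
    }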
J From joe at oppegaard.net Thu Nov 20 00:39:05 2003 From: joe at oppegaard.net (Joe Oppegaard) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Too much validation Message-ID: Mongers, I notice that sometimes in OO code that I write I'll do something like this obviously contrived example: ---- package WordCharacters; sub new { my ($class, $value) = @_; # Validation check here unless ($value =~ /^\w+$/) { die "Non-word character used in value: $value"; } my $self = { value => $value }; bless $self, ref($class) || $class; return $self; } package main; print "> "; chomp(my $input = <>); unless ($input =~ /^\w+$/) { # More validation here die "Word characters only!\n"; } my $wc = WordCharacters->new($input); ---- So as a general rule of thumb, when should data validation be done? Catch it early or catch it when it actually matters? Or both? (Ugh, duplicate code). Seems to me that typically you should catch it when it actually matters, so the calling code doesn't have to worry about what is and isn't acceptable. On the other hand, I guess I just feel dirty passing through data that I know could be invalid. -Joe Oppegaard From wcooley at nakedape.cc Thu Nov 20 00:56:19 2003 From: wcooley at nakedape.cc (Wil Cooley) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Too much validation In-Reply-To: References: Message-ID: <1069311379.9617.101.camel@denk.nakedape.priv> On Wed, 2003-11-19 at 22:39, Joe Oppegaard wrote: > So as a general rule of thumb, when should data validation be done? > Catch it early or catch it when it actually matters? Or both? (Ugh, > duplicate code). My guess (and IANAExpert) is that it should probably be done in both, depending on the circumstances. If you're writing a module or anything where you expect reuse, you should treat it as a black-box and make it as robust as possible. OTOH, if you're following an XP/YAGNI approach early in releases, then probably it's fine to just have it in one place. I've often wondered if it wouldn't be better to use objects instead of basic strings for a lot of attributes, where the class implements robust validation checks. How many places splattered throughout your code is the regex for testing if a telephone number or IP address is valid? Why do OO languages rarely ship standard with classes for common data formats? Wil -- Wil Cooley wcooley@nakedape.cc Naked Ape Consulting http://nakedape.cc * * * * Linux, UNIX, Networking and Security Solutions * * * * * Naked Ape Consulting http://nakedape.cc * -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20031119/e24df785/attachment.bin From schwern at pobox.com Thu Nov 20 02:54:15 2003 From: schwern at pobox.com (Michael G Schwern) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Too much validation In-Reply-To: References: Message-ID: <20031120085415.GA17333@windhund.schwern.org> On Wed, Nov 19, 2003 at 10:39:05PM -0800, Joe Oppegaard wrote: > So as a general rule of thumb, when should data validation be done? > Catch it early or catch it when it actually matters? Or both? (Ugh, > duplicate code). Depends on what you're doing, but in general I'd say catch it as the new data comes in. That way you an put the checks in one place and won't forget to validate on the way out. -- Michael G Schwern schwern@pobox.com http://www.pobox.com/~schwern/ Cottleston, Cottleston, Cottleston Pie. 
A fly can't bird, but a bird can fly. Ask me a riddle and I reply: "Cottleston, Cottleston, Cottleston Pie." From kellert at ohsu.edu Thu Nov 20 15:54:22 2003 From: kellert at ohsu.edu (Thomas Keller) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] tilde in paths Message-ID: <19B5940C-1BA4-11D8-A217-0003930405E2@ohsu.edu> Greetings, Forgive my laziness for not searching for the answer to this. But does someone off the top of their head know how to open a file with a tilde in the path? Specifically, I want something like: { open FILE, '~/Documents/myfile' or die; my @info = ; } to open myfile in the Documents dir in the users home directory. I know the problem is passing it to the shell, but I don't know how to do that Thanks, Tom K. From jkeroes at eli.net Thu Nov 20 15:55:36 2003 From: jkeroes at eli.net (Joshua Keroes) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] tilde in paths In-Reply-To: <19B5940C-1BA4-11D8-A217-0003930405E2@ohsu.edu> References: <19B5940C-1BA4-11D8-A217-0003930405E2@ohsu.edu> Message-ID: <45C296F9-1BA4-11D8-89E2-000A95C466EC@eli.net> On Nov 20, 2003, at 1:54 PM, Thomas Keller wrote: > Greetings, > Forgive my laziness for not searching for the answer to this. But does > someone off the top of their head know how to open a file with a tilde > in the path? Found in /usr/local/perl581/lib/5.8.1/pods/perlfaq5.pod How can I translate tildes (~) in a filename? Use the <> (glob()) operator, documented in perlfunc. Older versions of Perl require that you have a shell installed that groks tildes. Recent perl versions have this feature built in. The File::KGlob module (available from CPAN) gives more portable glob functionality. Within Perl, you may use this directly: $filename =~ s{ ^ ~ # find a leading tilde ( # save this in $1 [^/] # a non-slash character * # repeated 0 or more times (0 means me) ) }{ $1 ? (getpwnam($1))[7] : ( $ENV{HOME} || $ENV{LOGDIR} ) }ex; Have a nice day. :-) -------------- next part -------------- A non-text attachment was scrubbed... Name: Joshua Keroes.vcf Type: text/directory Size: 363 bytes Desc: not available Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20031120/e6e6ac64/JoshuaKeroes.bin -------------- next part -------------- From tkil at scrye.com Thu Nov 20 18:09:58 2003 From: tkil at scrye.com (Tkil) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Too much validation In-Reply-To: References: Message-ID: >>>>> "Joe" == Joe Oppegaard writes: Joe> So as a general rule of thumb, when should data validation be Joe> done? Catch it early or catch it when it actually matters? Or Joe> both? (Ugh, duplicate code). With object-oriented codde, I feel you should let the class decide what is acceptable or not. This lets me change what is considered acceptable without editing all callers. There are two ways I'd code this convention in Perl. One requires a bit of checking, but it is unobtrusive and straightforward: | while ( my $raw_data = get_data() ) | { | # MyClass::new will return undef if $raw_data is invalid | my $obj = MyClass->new( $raw_data ) | or next; | | # do stuff with $obj here | } The other way -- which I've adopted in most of my code of late -- is to use "eval BLOCK" to catch "die" calls as exceptions: | while ( my $raw_data = get_data() ) | { | eval | { | # MyClass::new will 'die' if $raw_data is invalid | my $obj = MyClass->new( $raw_data ); | | # do stuff with $obj here | }; | | if ( $@ ) | { | # complain | } | } This has the advantage of providing a description of what went wrong in $@. 
Further, it allows any method in MyClass to "die" if it can't do what it promises to do. There are existing modules that can be told to "die" if something goes wrong, freeing you from checking the error return of every call. A fine example of this is DBI, with its RaiseError attribute. t. From wcooley at nakedape.cc Thu Nov 20 18:20:53 2003 From: wcooley at nakedape.cc (Wil Cooley) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Too much validation In-Reply-To: References: Message-ID: <1069374053.30864.10.camel@denk.nakedape.priv> On Thu, 2003-11-20 at 16:09, Tkil wrote: > The other way -- which I've adopted in most of my code of late -- is > to use "eval BLOCK" to catch "die" calls as exceptions: Have you looked at using the Exception.pm module from CPAN? I read the docs for it but never actually got around to using it. I like the idea of being able to use exceptions with the familiar 'try' syntax used in other OO languages, although the lack of pervasive exceptions makes it less than ideal. Wil -- Wil Cooley wcooley@nakedape.cc Naked Ape Consulting http://nakedape.cc * * * * * * Linux Services for Small Businesses * * * * * * * Easy, reliable solutions for small businesses * * Naked Ape Business Server http://nakedape.cc/r/sms * -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20031120/30bcf059/attachment.bin From tex at off.org Thu Nov 20 18:31:28 2003 From: tex at off.org (Austin Schutz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Too much validation In-Reply-To: References: Message-ID: <20031121003128.GB2390@gblx.net> On Wed, Nov 19, 2003 at 10:39:05PM -0800, Joe Oppegaard wrote: > So as a general rule of thumb, when should data validation be done? > Catch it early or catch it when it actually matters? Or both? (Ugh, > duplicate code). My suggestion would be when it can be done with the least work. That would be "where it matters" in your example. > > Seems to me that typically you should catch it when it actually matters, > so the calling code doesn't have to worry about what is and isn't > acceptable. On the other hand, I guess I just feel dirty passing through > data that I know could be invalid. > If you use the module in many places you will soon tire of repeating the same code and be thankful the module does the validation for you. The other great advantage is that if you change your mind about what constitutes valid input it's in a single spot. Otherwise you may be chasing down regexes in 50 different CGI scripts, etc. I dunno, that's my 2c. Austin From raanders at acm.org Fri Nov 21 16:23:12 2003 From: raanders at acm.org (Roderick A. Anderson) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question Message-ID: New to the list and glad I found it. Not from the Portland area (Hayden, ID). I've been through the list archives but found nothing on this topic (which almost ended up being 'Hash-ish question'). I'm working on a web interface to a commercial application using CGI.pm, Win32::ODBC, and a pile of other modules. Using neat tricks I've got from "Programming Perl", "Learning Perl", "Perl Cookbook" and almost every other O'Reilly book on perl plus those from a few other publishers, I solved an append of one hash to another need but was wondering if there was a better way. And there is one problem that is Win32::ODBC caused that I'm looking for a solution to. 
First the append hash solution. I use the hash generated from some CGI.pm
params, then query a SQL Server database and use DataHash to return the
row. To append to the original hash I'm using a variation on code I got
out of "Perl Cookbook".

    %SignUpInfo = (%SignUpInfo, $db2->DataHash());

Is there a better or more efficient way to do this?

Then the Win32::ODBC issue. When I use the above $db2->DataHash(), whether
appending or creating a new hash, I end up with an empty key/value in the
hash.

Doing a

    foreach my $key (sort keys %SignUpInfo) {
        print "$key: $SignUpInfo{$key}\n";
    }

gets me output with one line with only the colon on it. Is there a way to
remove this key/value combination? I think it has to do with Win32::ODBC
returning some kind of row identifier.

TIA,
Rod
-- 
"Open Source Software - Sometimes you get more than you paid for..."

From joe at radiojoe.org Fri Nov 21 17:45:55 2003
From: joe at radiojoe.org (Joe Oppegaard)
Date: Mon Aug 2 21:34:26 2004
Subject: [Pdx-pm] Hash question
In-Reply-To: 
References: 
Message-ID: 

On Fri, 21 Nov 2003, Roderick A. Anderson wrote:

> New to the list and glad I found it. Not from the Portland area (Hayden,
> ID).

Cool, welcome to the list.

> First the append hash solution. I use the hash generated from some CGI.pm
> params, then query a SQL Server database and use DataHash to return the
> row. To append to the original hash I'm using a variation on code I got
> out of "Perl Cookbook".
>
>     %SignUpInfo = (%SignUpInfo, $db2->DataHash());
>
> Is there a better or more efficient way to do this?
>
> Then the Win32::ODBC issue. When I use the above $db2->DataHash(), whether
> appending or creating a new hash, I end up with an empty key/value in the
> hash.
>
> Doing a
>
>     foreach my $key (sort keys %SignUpInfo) {
>         print "$key: $SignUpInfo{$key}\n";
>     }
>
> gets me output with one line with only the colon on it. Is there a way to
> remove this key/value combination? I think it has to do with Win32::ODBC
> returning some kind of row identifier.

I'm guessing that %SignUpInfo was initially empty up top and
$db2->DataHash() had some type of error condition and returned undef
or ''. Take the following for example, which will print just a colon:

----------
sub ret_undef {
    return undef;
}

%a = ret_undef();

foreach (keys %a) {
    print "$_ : $a{$_}\n";
}
----------

Or to get a better idea of what's in the hash:

    use Data::Dumper;
    print Dumper(\%a);

Which shows you that you actually do have a blank key:

    $VAR1 = {
        '' => undef
    };

I don't think you really want to remove the blank key/value combination,
you probably just want to make sure you're properly checking for error
conditions when doing the Win32::ODBC calls.

-Joe Oppegaard

From ebaur at aracnet.com Fri Nov 21 18:35:12 2003
From: ebaur at aracnet.com (Eric Shore Baur)
Date: Mon Aug 2 21:34:26 2004
Subject: [Pdx-pm] reading a broken CSV file
Message-ID: 

I'm doing an import from a CSV-style text file into a SQL database.
The data is set up so that I have one set of text files with a field
listing in them (so I know what matches up with what) and then the data
files in a parent directory.

The data format looks something like this:

"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text"

Fine... I can import that. Unfortunately, some of the records have
embedded newlines in them, so you end up with something like this:

"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text
goes here
until
the record
is done"

...
or, potentially: "title","some text goes over multiple lines","a date is next",1999/05/10,T,123,F,F,T,"more text" What I've been doing is simply doing the data import - letting those screwed up lines fail when the SQL inserts run and then going back and hand entering the screwed up data (since I"ll end up with partial records, so I can search for the missing last field). This is not, however, a very maintainable method. (I have to re-import things when the data set changes, I get all new files, not just changes.) Is there any neat/slick way to get this data in there on the first pass? I tried using ParseWords, but I'm not sure if I utilized it to its fullest extent. I briefly played with a CSV driver for DBI, but it couldn't handle things split over the newlines, either. This was awhile ago that I did this in the first place, I'm just picking the project back up off the shelf, so to speak. Although I had kind of figured I'd have to re-write from scratch, I didn't want to fight the same issues if there was an easy way out of it... any ideas? Thanks, Eric From sechrest at peak.org Fri Nov 21 18:00:04 2003 From: sechrest at peak.org (John Sechrest) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: Your message of Fri, 21 Nov 2003 16:35:12 PST. Message-ID: <200311220000.hAM004g00719@jas.peak.org> Why not do a text substitution ? Do you have any indicator of what an end of field looks like? Can you say that a record only ends when you have a " on the end of a line? Or do you have to count the records. Sounds like a pre-parser to force things into the right form is a good place to start. Eric Shore Baur writes: % % I doing an import from a CSV-style text file into a SQL database. % The data is set up so that I have one set of text files with a field % listing in them (so I know what matches up with what) and then the data % files in a parent directory. % The data format looks something like this: % % "title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text" % % Fine... I can import that. Unfortunatly, some of the records have % embeded newlines in them, so you end up with something like this: % % "title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text % goes here % until % the record % is done" % % ... or, potentially: % % "title","some text goes % over % multiple % lines","a date is next",1999/05/10,T,123,F,F,T,"more text" % % What I've been doing is simply doing the data import - letting % those screwed up lines fail when the SQL inserts run and then going back % and hand entering the screwed up data (since I"ll end up with partial % records, so I can search for the missing last field). This is not, % however, a very maintainable method. (I have to re-import things when the % data set changes, I get all new files, not just changes.) % Is there any neat/slick way to get this data in there on the first % pass? I tried using ParseWords, but I'm not sure if I utilized it to its % fullest extent. I briefly played with a CSV driver for DBI, but it % couldn't handle things split over the newlines, either. % % This was awhile ago that I did this in the first place, I'm just % picking the project back up off the shelf, so to speak. Although I had % kind of figured I'd have to re-write from scratch, I didn't want to fight % the same issues if there was an easy way out of it... any ideas? 
% % Thanks, % Eric % % _______________________________________________ % Pdx-pm-list mailing list % Pdx-pm-list@mail.pm.org % http://mail.pm.org/mailman/listinfo/pdx-pm-list ----- John Sechrest . Helping people use CTO PEAK - . computers and the Internet Public Electronic . more effectively Access to Knowledge,Inc . 1600 SW Western, Suite 180 . Internet: sechrest@peak.org Corvallis Oregon 97333 . (541) 754-7325 . http://www.peak.org/~sechrest From jeff at vpservices.com Fri Nov 21 18:03:45 2003 From: jeff at vpservices.com (Jeff Zucker) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: References: Message-ID: <3FBEA7E1.7010800@vpservices.com> Eric Shore Baur wrote: Why not use DBD::CSV, which will let you query the data files with SQL and which handles embedded newlines just fine. -- Jeff > I doing an import from a CSV-style text file into a SQL database. >The data is set up so that I have one set of text files with a field >listing in them (so I know what matches up with what) and then the data >files in a parent directory. > The data format looks something like this: > >"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text" > > Fine... I can import that. Unfortunatly, some of the records have >embeded newlines in them, so you end up with something like this: > >"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text >goes here >until >the record >is done" > > ... or, potentially: > >"title","some text goes >over >multiple >lines","a date is next",1999/05/10,T,123,F,F,T,"more text" > > What I've been doing is simply doing the data import - letting >those screwed up lines fail when the SQL inserts run and then going back >and hand entering the screwed up data (since I"ll end up with partial >records, so I can search for the missing last field). This is not, >however, a very maintainable method. (I have to re-import things when the >data set changes, I get all new files, not just changes.) > Is there any neat/slick way to get this data in there on the first >pass? I tried using ParseWords, but I'm not sure if I utilized it to its >fullest extent. I briefly played with a CSV driver for DBI, but it >couldn't handle things split over the newlines, either. > > This was awhile ago that I did this in the first place, I'm just >picking the project back up off the shelf, so to speak. Although I had >kind of figured I'd have to re-write from scratch, I didn't want to fight >the same issues if there was an easy way out of it... any ideas? > >Thanks, >Eric > >_______________________________________________ >Pdx-pm-list mailing list >Pdx-pm-list@mail.pm.org >http://mail.pm.org/mailman/listinfo/pdx-pm-list > > > > From ckuskie at dalsemi.com Fri Nov 21 18:19:55 2003 From: ckuskie at dalsemi.com (Colin Kuskie) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: References: Message-ID: <20031122001955.GJ8719@dalsemi.com> On Fri, Nov 21, 2003 at 04:35:12PM -0800, Eric Shore Baur wrote: > > Is there any neat/slick way to get this data in there on the first > pass? I tried using ParseWords, but I'm not sure if I utilized it to its > fullest extent. I briefly played with a CSV driver for DBI, but it > couldn't handle things split over the newlines, either. If the number of columns in each file is a constant, then you could try the following: Get a line. Feed it into some module that handles CSV and returns an array of elements Do I have enough columns? 
NO: Remove newline from present line; Fetch another line from file; Append it to the current line and check again. YES: Push data into database. From jeff at vpservices.com Fri Nov 21 18:32:41 2003 From: jeff at vpservices.com (Jeff Zucker) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: <20031122001955.GJ8719@dalsemi.com> References: <20031122001955.GJ8719@dalsemi.com> Message-ID: <3FBEAEA9.40608@vpservices.com> Colin Kuskie wrote: >Get a line. >Feed it into some module that handles CSV and returns an array of elements > If that module is Text::CSV_XS, then the rest of this is irrelevant because it handles embedded newlines. But since Eric is dealing with DBI and a database already, I can't think of any reason that DBD::CSV isn't the right tool for this job, but I'm prejudiced. >Do I have enough columns? >NO: Remove newline from present line; > Fetch another line from file; > Append it to the current line and check again. >YES: Push data into database. > Text::CSV_XS and DBD::CSV do all that for you. -- Jeff From bruce at gridpoint.com Fri Nov 21 22:11:35 2003 From: bruce at gridpoint.com (Bruce J Keeler) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question In-Reply-To: References: Message-ID: <1069474294.2179.154.camel@scrunge.gridpoint.com> On Fri, 2003-11-21 at 14:23, Roderick A. Anderson wrote: > First the append hash solution. I use the hash generated from some CGI.pm > params then query a SQL Server database and and use DataHash to returned > the row. To append to the original hash I'm using a variation on code I > got out of "Perl Cookbook". > > %SignUpInfo = (%SignUpInfo, $db2->DataHash()); > > Is there a better or more efficient way to do this? This will recreate the hash, re-adding all the elements that were there before. In cases where there are a lot of items in the hash to begin with, that's going to be inefficient. Something like this might be better in that case: %tmp = $db2->DataHash(); @SignUpInfo{keys %tmp} = values %tmp; Though now the new hash entries are going to be hashed twice instead, so that's less efficient in the case where there's more data being added than was there to begin with. --Bruce From merlyn at stonehenge.com Fri Nov 21 22:25:09 2003 From: merlyn at stonehenge.com (Randal L. Schwartz) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question In-Reply-To: <1069474294.2179.154.camel@scrunge.gridpoint.com> References: <1069474294.2179.154.camel@scrunge.gridpoint.com> Message-ID: <868ym9ghcv.fsf@blue.stonehenge.com> >>>>> "Bruce" == Bruce J Keeler writes: Bruce> This will recreate the hash, re-adding all the elements that were there Bruce> before. In cases where there are a lot of items in the hash to begin Bruce> with, that's going to be inefficient. Something like this might be Bruce> better in that case: Bruce> %tmp = $db2->DataHash(); Bruce> @SignUpInfo{keys %tmp} = values %tmp; Bruce> Though now the new hash entries are going to be hashed twice instead, so Bruce> that's less efficient in the case where there's more data being added Bruce> than was there to begin with. I think it's even been shown that iteration is better: my @array = $db2->DataHash(); while (@array) { $SignUpInfo{shift @array} = shift @array; } Of course, I'm cheating here, knowing that the left side is eval'ed before the right. If you don't want that much magic: my %tmp = $db2->DataHash(); while (my ($k, $v) = each %tmp) { $SignUpInfo{$k} = $v; } -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. 
- +1 503 777 0095 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! From cdawson at webiphany.com Fri Nov 21 23:18:47 2003 From: cdawson at webiphany.com (Chris Dawson) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Net::SCP Message-ID: <3FBEF1B7.1090308@webiphany.com> Does anyone have experience using this? I plan to allow a user to upload files from a web script, and I am wondering if someone has an elegant way of generating keys. Perldoc suggests using keys over setting a password, and I see the logic here rather than storing them in cleartext within a script, but am not sure if I am thinking about this in the correct way. It would be nice if there were a OO method exposed for doing this sort of thing. This might sound like a rant or complaint about the interface, but trust me, it is a question. :) Thanks, Chris From bruce at gridpoint.com Sat Nov 22 15:38:18 2003 From: bruce at gridpoint.com (Bruce J Keeler) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question In-Reply-To: <868ym9ghcv.fsf@blue.stonehenge.com> References: <1069474294.2179.154.camel@scrunge.gridpoint.com> <868ym9ghcv.fsf@blue.stonehenge.com> Message-ID: <1069537097.2179.185.camel@scrunge.gridpoint.com> On Fri, 2003-11-21 at 20:25, Randal L. Schwartz wrote: > > I think it's even been shown that iteration is better: > > my @array = $db2->DataHash(); > while (@array) { > $SignUpInfo{shift @array} = shift @array; > } > Makes sense as it doesn't have to compute hash keys for %tmp. > Of course, I'm cheating here, knowing that the left side > is eval'ed before the right. If you don't want that much magic: > > my %tmp = $db2->DataHash(); > while (my ($k, $v) = each %tmp) { > $SignUpInfo{$k} = $v; > } You're saying that's cheaper than > @SignUpInfo{keys %tmp} = values %tmp; ? This I found hard to believe. Why would Perl pessimize it so? I whipped up the following: #!/usr/bin/perl use Benchmark qw( cmpthese ); push (@array, rand, rand) for (1..100); cmpthese ( -10, { iterated_hash => sub { my %dest; my %tmp = @array; while (my ($k, $v) = each %tmp) { $dest{$k} = $v; } }, atonce => sub { my %dest; my %tmp = @array; @dest{keys %tmp} = values %tmp; }, iterated_array => sub { my %dest; my @tmp = @array; while (@tmp) { $dest{shift @tmp} = shift @tmp; } }, } ); Results: Rate iterated_array iterated_hash atonce iterated_array 1198/s -- -56% -71% iterated_hash 2709/s 126% -- -34% atonce 4098/s 242% 51% -- It seems that the array method is worst of all. Most interesting. 
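One sanity check worth keeping next to those timings: the three styles are only interchangeable if they build the same hash. A quick sketch with made-up data (separate from the benchmark script) confirming that they do:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %orig  = ( a => 1, b => 2 );
    my @pairs = ( b => 20, c => 30 );   # flat key/value list, standing in for DataHash()

    # plain hash assignment
    my %h_assign = ( %orig, @pairs );

    # hash slice through a temporary hash (the "atonce" style)
    my %h_atonce = %orig;
    my %tmp      = @pairs;
    @h_atonce{ keys %tmp } = values %tmp;

    # walking the flat list two elements at a time (the "iterated_array" style)
    my %h_iter = %orig;
    my @copy   = @pairs;
    while (@copy) {
        my $k = shift @copy;
        my $v = shift @copy;
        $h_iter{$k} = $v;
    }

    # all three should print: a=1 b=20 c=30
    for my $h ( \%h_assign, \%h_atonce, \%h_iter ) {
        print join( ' ', map { "$_=$h->{$_}" } sort keys %$h ), "\n";
    }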
My perl is: bruce@scrunge| /tmp % perl -V Summary of my perl5 (revision 5.0 version 8 subversion 2) configuration: Platform: osname=linux, osvers=2.4.22-xfs+ti1211, archname=i386-linux-thread-multi uname='linux kosh 2.4.22-xfs+ti1211 #1 sat oct 25 10:11:37 est 2003 i686 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i386-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8.2 -Darchlib=/usr/lib/perl/5.8.2 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.2 -Dsitearch=/usr/local/lib/perl/5.8.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.2 -Dd_dosuid -des' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemultiplicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O3', cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include' ccversion='', gccversion='3.3.2 (Debian)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.3.2.so, so=so, useshrplib=true, libperl=libperl.so.5.8.2 gnulibc_version='2.3.2' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib' Characteristics of this binary (from libperl): Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT Built under linux Compiled at Nov 15 2003 17:52:08 @INC: /etc/perl /usr/local/lib/perl/5.8.2 /usr/local/share/perl/5.8.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8.2 /usr/share/perl/5.8.2 /usr/local/lib/site_perl /usr/local/lib/perl/5.8.0 /usr/local/share/perl/5.8.0 . From rootbeer at redcat.com Sat Nov 22 21:50:03 2003 From: rootbeer at redcat.com (Tom Phoenix) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Fireside Cafe, open 124 hours a week! In-Reply-To: <20030731064406.GC24299@windhund.schwern.org> References: <20030731064406.GC24299@windhund.schwern.org> Message-ID: On Wed, 30 Jul 2003, Michael G Schwern wrote: > BEST COFFEEHOUSE EVER > > Fireside Cafe, SE 13th and Powell. Free wireless. Free ethernet. Power > jacks galore. Comfy faux "cabin in the woods" feel. Friendly clerk > (the owner). Little side room with rocking chairs for quietness. > Populated mostly by studying college students. > > And the hours: Open 7pm Sunday to Midnight on Friday. 124 hours a > week! YOW! And now, it's open eternally. (Which sounds better to me than saying "24/7".) Also, they have sandwiches and some snacks, so you don't have to live on coffee alone. 
But the bad news is that I can no longer connect to their free WiFi. It worked an hour ago. But after I put my Mac to sleep and re-awakened it, "There was an error joining the selected AirPort network." The attendant on duty says that "Macs sometimes have problems", but can't offer any more assistance. And I can't find anything in log files or elsewhere telling me more about what's going on. Dang this user-friendly interface! Details for those who are interested: PowerBook G4 using internal AirPort card with Mac OS X 10.2.8. It worked without a problem on the first try; I selected the network name from the pop-up menu and needed no password. I used it for over an hour. Trying to reconnect, I tried manually specifying the network name, tried no password, tried made-up passwords, tried the network name as a password, told it to use the strongest network, told it to use the last-used network, told it to use a specific network, called it some bad names, tried totally different network settings (using my cell phone to connect to Verizon's network, which works fine; I'm using it now), then went back and tried everything again. My best theory: When my machine went to sleep, it failed to properly disconnect from their access point. When I try to connect again, their box thinks my MAC address is already connected, and won't let me connect back up. (Can this happen to WiFi?) Although this implies that the problem would be cured by rebooting their equipment, I can't suggest that when at least half a dozen folks are using their access point at the moment. Anybody else have this happen? Did you figure out how to cure the problem? --Tom Phoenix From ebaur at aracnet.com Mon Nov 24 10:04:09 2003 From: ebaur at aracnet.com (Eric Shore Baur) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: <3FBEAEA9.40608@vpservices.com> Message-ID: Sorry for being quiet for a couple days... got busy. :) At any rate, this was kind of an old program and (if I remember correctly) DBD:CSV was not handling things properly at the time... it sounds like it does handle embeded newlines properly now, so I just need to give it another go. On the other hand, I may be able to do some pre-processing, like another post suggested. I was trying to make sure I had the right line and put it into the database all at the same time... I think its a much better idea to make two passes instead. Thanks for all the suggestions, Eric On Fri, 21 Nov 2003, Jeff Zucker wrote: > Colin Kuskie wrote: > > >Get a line. > >Feed it into some module that handles CSV and returns an array of elements > > > If that module is Text::CSV_XS, then the rest of this is irrelevant > because it handles embedded newlines. But since Eric is dealing with > DBI and a database already, I can't think of any reason that DBD::CSV > isn't the right tool for this job, but I'm prejudiced. > > >Do I have enough columns? > >NO: Remove newline from present line; > > Fetch another line from file; > > Append it to the current line and check again. > >YES: Push data into database. > > > Text::CSV_XS and DBD::CSV do all that for you. > > From MichaelRWolf at att.net Sun Nov 23 03:38:03 2003 From: MichaelRWolf at att.net (Michael R. 
Wolf) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: <3FBEAEA9.40608@vpservices.com> (Jeff Zucker's message of "Fri, 21 Nov 2003 16:32:41 -0800") References: <20031122001955.GJ8719@dalsemi.com> <3FBEAEA9.40608@vpservices.com> Message-ID: Jeff Zucker writes: > Colin Kuskie wrote: > >>Get a line. >>Feed it into some module that handles CSV and returns an array of elements >> > If that module is Text::CSV_XS, then the rest of this is irrelevant > because it handles embedded newlines. But since Eric is dealing with > DBI and a database already, I can't think of any reason that DBD::CSV > isn't the right tool for this job, but I'm prejudiced. Prejudiced has such a negative connotation.. how 'bout well-informed? [...] Wanting to be more informed... It appears that Text::CSV will *not* handle multi-line "records". Will any other non-DBD module handle multi-line CSV's. This is a timely thread for me. I just received a multi-line CSV from an application. It appears to read into Excel OK, but not OpenOffice. Since I'm the token Open Source guy in a mixed open/non-open new venture, I'm wanting to get things done, and also wanting to use Perl when I can. It's good to know that the DBD::CSV module does what I want. Do other CSV modules also handle multi-line records? Thanks, Michael Wolf -- Michael R. Wolf All mammals learn by playing! MichaelRWolf@att.net From jeff at vpservices.com Mon Nov 24 10:18:11 2003 From: jeff at vpservices.com (Jeff Zucker) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: References: <20031122001955.GJ8719@dalsemi.com> <3FBEAEA9.40608@vpservices.com> Message-ID: <3FC22F43.4020404@vpservices.com> Michael R. Wolf wrote: >Prejudiced has such a negative connotation.. how 'bout well-informed? > > How about -- is the maintainer of the module in question :-) >It appears that Text::CSV will *not* handle multi-line "records". > > Yes, it will. set binary=1. If you have problems, let me know, I'm also its maintainer now. -- Jeff From jeff at vpservices.com Mon Nov 24 10:42:28 2003 From: jeff at vpservices.com (Jeff Zucker) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] reading a broken CSV file In-Reply-To: <3FC22F43.4020404@vpservices.com> References: <20031122001955.GJ8719@dalsemi.com> <3FBEAEA9.40608@vpservices.com> <3FC22F43.4020404@vpservices.com> Message-ID: <3FC234F4.6080200@vpservices.com> Jeff Zucker wrote: > Michael R. Wolf wrote: > >> It appears that Text::CSV will *not* handle multi-line "records". >> >> > Yes, it will. set binary=1. If you have problems, let me know, I'm > also its maintainer now. Grrr, I should read before responding. You're correct Text::CSV does not handle newlines but Text::CSV_XS does. I don't maintain the former, I do maintain the later. -- Jeff From robbyrussell at pdxlug.org Mon Nov 24 11:06:50 2003 From: robbyrussell at pdxlug.org (Robby Russell) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Fireside Cafe, open 124 hours a week! In-Reply-To: References: <20030731064406.GC24299@windhund.schwern.org> Message-ID: <3FC23AAA.7060909@pdxlug.org> Tom Phoenix wrote: > On Wed, 30 Jul 2003, Michael G Schwern wrote: > > >>BEST COFFEEHOUSE EVER >> >>Fireside Cafe, SE 13th and Powell. Free wireless. Free ethernet. Power >>jacks galore. Comfy faux "cabin in the woods" feel. Friendly clerk >>(the owner). Little side room with rocking chairs for quietness. >>Populated mostly by studying college students. >> >>And the hours: Open 7pm Sunday to Midnight on Friday. 
124 hours a PDXLUG (http://www.pdxlug.org/) hosts their monthly meetings there. Good place. -Robby From raanders at acm.org Mon Nov 24 14:57:23 2003 From: raanders at acm.org (Roderick A. Anderson) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question In-Reply-To: Message-ID: On Fri, 21 Nov 2003, Joe Oppegaard wrote: > use Data::Dumper; > print Dumper(\%a); I'll do that if it get to be too much of an irritation. I do need to look for errors. Or at least more indepth errors. I am selecting some attributes from several tables using a username I know is good. (I already checked it.) No other error checks but thinking on this it could be the way the Win32::ODBC DataHash method is handling duplicate attribute names from different tables even though I'm using table identifiers in the select. Thanks for the ideas. Rod -- "Open Source Software - Usually you get more than you pay for..." "Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL" From raanders at acm.org Mon Nov 24 15:09:37 2003 From: raanders at acm.org (Roderick A. Anderson) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question In-Reply-To: <868ym9ghcv.fsf@blue.stonehenge.com> Message-ID: On 21 Nov 2003, Randal L. Schwartz wrote: > I think it's even been shown that iteration is better: > > my @array = $db2->DataHash(); > while (@array) { > $SignUpInfo{shift @array} = shift @array; > } Very cool. Amazingly simple once you see it. > Of course, I'm cheating here, knowing that the left side > is eval'ed before the right. If you don't want that much magic: This is a great point to keep in mind. Magic works for me. And when someone else looks at the code I can impress the hell out of them with this neat trick. :-) > my %tmp = $db2->DataHash(); > while (my ($k, $v) = each %tmp) { > $SignUpInfo{$k} = $v; > } I was hoping to avoid this as it seemed _so_ sledge-hammerish. I used it for awhile but wasn't happy with the concept. I really like the one above. Thanks for the insight. Rod -- "Open Source Software - Usually you get more than you pay for..." "Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL" From raanders at acm.org Mon Nov 24 15:18:51 2003 From: raanders at acm.org (Roderick A. Anderson) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question In-Reply-To: <1069537097.2179.185.camel@scrunge.gridpoint.com> Message-ID: On Sat, 22 Nov 2003, Bruce J Keeler wrote: > You're saying that's cheaper than > > > @SignUpInfo{keys %tmp} = values %tmp; This too looks properly perlish. > Results: > > Rate iterated_array iterated_hash atonce > iterated_array 1198/s -- -56% -71% > iterated_hash 2709/s 126% -- -34% > atonce 4098/s 242% 51% -- > > It seems that the array method is worst of all. Most interesting. Hum. Looks over speed. Reminds me of my younger years and the low-riders verses the hot-rodders. But since the atonce is fastest and looks great I could get the best of both worlds. A low rider that performs. Great paint job and dual quads. Cool! Rod -- "Open Source Software - Usually you get more than you pay for..." "Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL" From raanders at acm.org Mon Nov 24 15:23:25 2003 From: raanders at acm.org (Roderick A. Anderson) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Slow replies Message-ID: Sorry to be so slow replying. A procmail typo had the messages going into a non-visible folder. I kept checking all week-end as was starting to mutter ill things until I checked the list archives and realized it was on my end. 
Rod -- "Open Source Software - Usually you get more than you pay for..." "Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL" From raanders at acm.org Mon Nov 24 16:00:47 2003 From: raanders at acm.org (Roderick A. Anderson) Date: Mon Aug 2 21:34:26 2004 Subject: [Pdx-pm] Hash question In-Reply-To: <1069537097.2179.185.camel@scrunge.gridpoint.com> Message-ID: On Sat, 22 Nov 2003, Bruce J Keeler wrote: > You're saying that's cheaper than > > > @SignUpInfo{keys %tmp} = values %tmp; Now trying to put this in place I'm confused as hell. Shouldn't this be $SignUpInfo{keys %tmp} = values %tmp; I know perl usually does the right thing with what it's handed but I really don't understand the @something{} verses the $something{} here. Do the curly braces over-ride(?) the at-symbol? Rod -- "Open Source Software - Usually you get more than you pay for..." "Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL" From tcaine at eli.net Mon Nov 24 16:46:12 2003 From: tcaine at eli.net (Todd Caine) Date: Mon Aug 2 21:34:27 2004 Subject: [Pdx-pm] Hash question In-Reply-To: References: <1069537097.2179.185.camel@scrunge.gridpoint.com> Message-ID: <20031124224612.GB12285@eli.net> It's called a hash slice. http://tlc.perlarchive.com/articles/perl/ug0001.shtml On (Mon, Nov 24 14:00), Roderick A. Anderson wrote: > > > @SignUpInfo{keys %tmp} = values %tmp; > > Now trying to put this in place I'm confused as hell. Shouldn't this be > > $SignUpInfo{keys %tmp} = values %tmp; > > I know perl usually does the right thing with what it's handed but I > really don't understand the @something{} verses the $something{} here. > Do the curly braces over-ride(?) the at-symbol? From merlyn at stonehenge.com Mon Nov 24 16:59:56 2003 From: merlyn at stonehenge.com (Randal L. Schwartz) Date: Mon Aug 2 21:34:27 2004 Subject: [Pdx-pm] Hash question In-Reply-To: References: Message-ID: <864qwt2x06.fsf@blue.stonehenge.com> >>>>> "Roderick" == Roderick A Anderson writes: Roderick> On Sat, 22 Nov 2003, Bruce J Keeler wrote: >> You're saying that's cheaper than >> >> > @SignUpInfo{keys %tmp} = values %tmp; Roderick> Now trying to put this in place I'm confused as hell. Shouldn't this be Roderick> $SignUpInfo{keys %tmp} = values %tmp; Roderick> I know perl usually does the right thing with what it's handed but I Roderick> really don't understand the @something{} verses the $something{} here. One is a hash element, the other is a hash slice. Roderick> Do the curly braces over-ride(?) the at-symbol? Define "override". In the same manner that we go from: $array[3] = "fred"; to @array[3, 5, 8] = ("fred", "barney", "dino"); we go from: $hash{"fred"} = "flintstone"; to @hash{("fred", "barney", "dino")} = ("flintstone", "rubble", undef); A hash slice sets many items at once in hash, like an array slice sets many items at once in an array. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! From raanders at acm.org Mon Nov 24 18:24:47 2003 From: raanders at acm.org (Roderick A. Anderson) Date: Mon Aug 2 21:34:27 2004 Subject: [Pdx-pm] Hash question In-Reply-To: <864qwt2x06.fsf@blue.stonehenge.com> Message-ID: On 24 Nov 2003, Randal L. Schwartz wrote: > Define "override". Can't but I see further down you've explained it. 
> @hash{("fred", "barney", "dino")} = ("flintstone", "rubble", undef); > > A hash slice sets many items at once in hash, like an array slice > sets many items at once in an array. Thanks to you and Todd, Rod -- "Open Source Software - Usually you get more than you pay for..." "Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL" From tkil at scrye.com Mon Nov 24 22:02:26 2003 From: tkil at scrye.com (Tkil) Date: Mon Aug 2 21:34:27 2004 Subject: [Pdx-pm] Hash question In-Reply-To: <1069537097.2179.185.camel@scrunge.gridpoint.com> References: <1069474294.2179.154.camel@scrunge.gridpoint.com> <868ym9ghcv.fsf@blue.stonehenge.com> <1069537097.2179.185.camel@scrunge.gridpoint.com> Message-ID: Regarding various ways to append one hash onto another, I played with it a bit this afternoon. Parameters to tune for: * difficulty of coding * size of existing hash * number of elements to add to existing hash * are auxilary data structures required * memory efficiency * time efficiency Note that I cheated: in all my tests below, number of existing elements is equal to the number being added. Test program at: http://scrye.com/~tkil/perl/append-hash.plx Observations: The simplest method is hash assignment. It is regularly about half the speed of the other methods (this might be an artifact of my "cheat" above, though.) It is easy to code and difficult to get wrong. Until you know this is your bottleneck, consider sticking with this. For sets up to the low thousands (on this hardware, at least), using array slices of @array as fodder for hash slice assignment into %dest seems the fastest. This is the "even/odd" approach described below. For small sets, this can be nearly 40% faster than any other method examined. For small sets (less than 100 elements or so), Bruce's copy-hash-at-once is the best method that doesn't require any "outside" knowledge. Larger that, the best self-contained method investigated below is to index into the source array directly. Some speedups (5-10%) can be obtained by unrolling the loops. Summary rankings of simplest and fastest methods: 10 elements 100 elements 1000 elements 10000 elements ---------------- --------------- ---------------- ---------------- h_assign 15085/s h_assign 1485/s h_assign 95.6/s h_assign 4.97/s duffs_16 22318/s a_index 2326/s h_atonce 148/s h_atonce 7.00/s a_index 22454/s h_atonce 2425/s a_index 165/s a_index 9.88/s h_atonce 23247/s unr_16 2571/s duffs_16 179/s even_odd 9.94/s unr_16 23423/s duffs_16 2594/s unr_16 181/s duffs_16 10.3/s even_odd 31992/s even_odd 3283/s even_odd 195/s unr_16 10.4/s Randal, Tom -- did anything ever come out of the p5p discussions to allow "push HASH, LIST" to do this? Or am I making that up? Long-winded crap: To make it a bit more realistic, and to try it out with different sizes of hashes, I put it into a big loop and looked at 10, 100, 1000, and 10000-element hashes and arrays: | for my $n_hash_elts ( 10, 100, 1000, 10000 ) | { | my @array = map { rand } 1 .. 2*$n_hash_elts; | my %orig_dest = map { rand } @array; On each of the sets of benchmarks, I included the previous winner: | my $h_atonce = sub { | my %dest = %orig_dest; | my %tmp = @array; | @dest{keys %tmp} = values %tmp; | }; And my first alternate solution, using array indexing to avoid the cost of copying or modifying @array: | my $a_index = sub { | my %dest = %orig_dest; | for ( my $i = 0; $i < @array; $i += 2 ) { | $dest{ $array[$i] } = $array[$i+1]; | } | }; A failed experiment, where I just tried the brute force approach. 
(Note that this isn't really a failure; if this is fast enough, by all means, use it...) | my $h_assign = sub { | my %dest = %orig_dest; | %dest = ( %dest, @array ); | }; Another failed experiment, where I tried to use 'splice' to minimize the number of changes to the @tmp copy of @array: | my $a_splice = sub { | my %dest = %orig_dest; | my @tmp = @array; | while (@tmp) { | my ( $k, $v ) = splice @tmp, 0, 2; | $dest{$k} = $v; | } | }; Some simple unrollings of the array index case, at 8, 16, and 32 elements (I included 24 elements later): | my $unr_8 = sub { | my %dest = %orig_dest; | my $i = 0; | while ( $i < @array-8 ) { | $dest{ $array[$i ] } = $array[$i+1]; | $dest{ $array[$i+2 ] } = $array[$i+3]; | $dest{ $array[$i+4 ] } = $array[$i+5]; | $dest{ $array[$i+6 ] } = $array[$i+7]; | $i += 8; | } | while ( $i < @array ) | { | $dest{ $array[$i ] } = $array[$i+1]; | $i += 2; | } | }; On the "that makes me feel dirty" scale, how about one that uses Duff's Device? | my $duffs_16 = sub { | my %dest = %orig_dest; | my $t = @array % 16; | my $i = $t - 16; | goto "TARGET_$t"; | while ( $i < @array ) { | TARGET_0: $dest{ $array[$i ] } = $array[$i+1 ]; | TARGET_14: $dest{ $array[$i+2 ] } = $array[$i+3 ]; | TARGET_12: $dest{ $array[$i+4 ] } = $array[$i+5 ]; | TARGET_10: $dest{ $array[$i+6 ] } = $array[$i+7 ]; | TARGET_8: $dest{ $array[$i+8 ] } = $array[$i+9 ]; | TARGET_6: $dest{ $array[$i+10] } = $array[$i+11]; | TARGET_4: $dest{ $array[$i+12] } = $array[$i+13]; | TARGET_2: $dest{ $array[$i+14] } = $array[$i+15]; | $i += 16; | } | }; And, in my final bow to the benchmarking gods: | my @odd = grep { $_ & 1 } 0 .. $#array; | my @even = map { $_-1 } @odd; | | my $even_odd = sub { | my %dest = %orig_dest; | @dest{@array[@even]} = @array[@odd]; | }; There is another implementation that comes to mind, if we can assert these conditions: 1. Having extra entries in %dest is ok; and 2. The universe of keys is fully distinct from the universe of values. 2. No values are undef (or you are running in "no warnings"): Then you could do something like: @dest{ '', @array } = ( @array, '' ); Heh. "Careful with that axe, Eugene!" 
Anyway, here are results for various set sizes: | $ ./append-hash.plx | | === 10 elements === | | original methods: | Rate a_shift h_each a_index h_atonce | a_shift 9878/s -- -47% -56% -58% | h_each 18654/s 89% -- -17% -21% | a_index 22540/s 128% 21% -- -5% | h_atonce 23685/s 140% 27% 5% -- | | failed experiments: | Rate h_assign a_splice a_index h_atonce | h_assign 15085/s -- -0% -33% -36% | a_splice 15123/s 0% -- -33% -36% | a_index 22583/s 50% 49% -- -4% | h_atonce 23559/s 56% 56% 4% -- | | unrolled: | Rate unr_32 a_index unr_8 h_atonce unr_16 | unr_32 21599/s -- -4% -7% -8% -9% | a_index 22454/s 4% -- -3% -4% -6% | unr_8 23239/s 8% 3% -- -1% -2% | h_atonce 23420/s 8% 4% 1% -- -2% | unr_16 23833/s 10% 6% 3% 2% -- | | the contenders: | Rate duffs_16 a_index h_atonce unr_16 even_odd | duffs_16 22318/s -- -1% -4% -5% -30% | a_index 22454/s 1% -- -3% -4% -30% | h_atonce 23247/s 4% 4% -- -1% -27% | unr_16 23423/s 5% 4% 1% -- -27% | even_odd 31992/s 43% 42% 38% 37% -- | | === 100 elements === | | original methods: | Rate a_shift h_each a_index h_atonce | a_shift 1018/s -- -48% -56% -59% | h_each 1942/s 91% -- -17% -22% | a_index 2329/s 129% 20% -- -6% | h_atonce 2477/s 143% 28% 6% -- | | failed experiments: | Rate h_assign a_splice a_index h_atonce | h_assign 1485/s -- -4% -36% -38% | a_splice 1544/s 4% -- -34% -36% | a_index 2325/s 57% 51% -- -3% | h_atonce 2401/s 62% 56% 3% -- | | unrolled: | Rate a_index h_atonce unr_8 unr_32 unr_16 | a_index 2322/s -- -4% -7% -9% -10% | h_atonce 2431/s 5% -- -3% -5% -6% | unr_8 2504/s 8% 3% -- -2% -3% | unr_32 2546/s 10% 5% 2% -- -1% | unr_16 2574/s 11% 6% 3% 1% -- | | the contenders: | Rate a_index h_atonce unr_16 duffs_16 even_odd | a_index 2326/s -- -4% -10% -10% -29% | h_atonce 2425/s 4% -- -6% -7% -26% | unr_16 2571/s 11% 6% -- -1% -22% | duffs_16 2594/s 12% 7% 1% -- -21% | even_odd 3283/s 41% 35% 28% 27% -- | | === 1000 elements === | | original methods: | Rate a_shift h_each h_atonce a_index | a_shift 88.5/s -- -39% -45% -54% | h_each 145/s 63% -- -10% -24% | h_atonce 160/s 81% 11% -- -16% | a_index 191/s 116% 32% 19% -- | | failed experiments: | Rate h_assign a_splice h_atonce a_index | h_assign 95.6/s -- -19% -37% -45% | a_splice 117/s 23% -- -23% -32% | h_atonce 153/s 60% 30% -- -12% | a_index 173/s 81% 48% 13% -- | | unrolled: | Rate h_atonce a_index unr_8 unr_32 unr_16 | h_atonce 152/s -- -10% -15% -15% -16% | a_index 169/s 12% -- -5% -5% -6% | unr_8 177/s 17% 5% -- -0% -2% | unr_32 178/s 17% 5% 0% -- -2% | unr_16 181/s 19% 7% 2% 2% -- | | the contenders: | Rate h_atonce a_index duffs_16 unr_16 even_odd | h_atonce 148/s -- -10% -17% -18% -24% | a_index 165/s 11% -- -8% -9% -15% | duffs_16 179/s 21% 9% -- -1% -8% | unr_16 181/s 22% 10% 1% -- -7% | even_odd 195/s 32% 18% 9% 8% -- | | === 10000 elements === | | original methods: | Rate a_shift h_atonce h_each a_index | a_shift 6.00/s -- -15% -21% -40% | h_atonce 7.06/s 18% -- -7% -30% | h_each 7.60/s 27% 8% -- -24% | a_index 10.0/s 67% 42% 32% -- | | failed experiments: | Rate h_assign h_atonce a_splice a_index | h_assign 4.97/s -- -29% -31% -50% | h_atonce 7.03/s 41% -- -2% -29% | a_splice 7.19/s 45% 2% -- -28% | a_index 9.94/s 100% 41% 38% -- | | unrolled: | Rate h_atonce a_index unr_8 unr_32 unr_16 | h_atonce 7.03/s -- -29% -31% -32% -32% | a_index 9.90/s 41% -- -3% -4% -5% | unr_8 10.2/s 45% 3% -- -1% -2% | unr_32 10.3/s 46% 4% 1% -- -1% | unr_16 10.4/s 48% 5% 2% 1% -- | | the contenders: | Rate h_atonce a_index even_odd duffs_16 unr_16 | h_atonce 7.00/s -- -29% -30% -32% -32% | a_index 
9.88/s 41% -- -1% -4% -5% | even_odd 9.94/s 42% 1% -- -3% -4% | duffs_16 10.3/s 47% 4% 4% -- -1% | unr_16 10.4/s 48% 5% 4% 1% -- For small data sets, the "even odd" approach pretty clearly dominates the field. The fact that there is some preprocessing involved doesn't disqualify it in my mind; in a database environment, you are often fetching exactly the same number of fields each time, so building the even/odd arrays once is not a problem. And note that these are indexes into the result arrays, not results themselves, so they're quite reusable. The "hash at once" construct does well until the sets get particularly large. It has an advantage over the "array index" in smaller sets, and is probably easier to code correctly offhand. "array index" is nearly as fast as "hash at once" with smaller sets, catching up as early as 100 elements. As an added bonus, it has the smallest memory and icache footprint of any of these methods. Unrolling the array index method does give additional speed, but it is probably not worth the code bulk. (Hm... the thought of using eval STRING to generate an unrolled subroutine at run time is tempting.) The fact that 16 is regularly faster than 8 and 32 is interesting; I wonder if I'm hitting an icache limitation at the 32. A quick run with 24 showed it performing about as well as the others | unrolled: | Rate h_atonce a_index unr_32 unr_8 unr_24 unr_16 | h_atonce 151/s -- -9% -14% -14% -15% -16% | a_index 167/s 10% -- -5% -5% -7% -7% | unr_32 175/s 16% 5% -- -0% -2% -3% | unr_8 175/s 16% 5% 0% -- -2% -2% | unr_24 178/s 18% 7% 2% 2% -- -1% | unr_16 180/s 19% 8% 3% 2% 1% -- Duff's Device is just silly, and provides less and less return as the set size gets larger.
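A practical footnote on the even/odd result: the index arrays depend only on the shape of the flat key/value list, so in the database case they can be built once and reused for every row. A small sketch, with invented row data standing in for whatever a per-row fetch (DataHash() or similar) hands back:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Flat key/value lists that all share one shape.
    my @rows = (
        [ id => 1, name => 'fred',   job => 'crane operator' ],
        [ id => 2, name => 'barney', job => 'sidekick'       ],
    );

    # Build the even/odd index arrays once, from the first row's shape.
    my @odd  = grep { $_ & 1 } 0 .. $#{ $rows[0] };   # value positions
    my @even = map  { $_ - 1 } @odd;                  # key positions

    for my $row (@rows) {
        my %record;
        # One hash-slice assignment per row, no temporary hash needed.
        @record{ @{$row}[@even] } = @{$row}[@odd];
        print "$record{id}: $record{name} ($record{job})\n";
    }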