<3C98F086.38780A58@mailhost.cmi.cypress.com> <20020320144204.A17417@acadcam.com>
Message-ID: <20020320181032.GA12614@phluffynet.com>
I've been wanting to take part for some time myself but haven't managed to for several reasons...one of which is that Wednesdays aren't that convenient. Oh by the way, on behalf of all the other lurkers, thank you to whoever decided to simply add folks to the new list (roughly a year ago when the lists changed) as opposed to needing them to sign up again. A wise decision that. Back to the discussion, I live in Eagan and work at West Group. I'm uncertain but I imagine it wouldn't be too hard to get a suitably equipped room here in this gargantuan building. I'd be willing to find out. I don't get Wednesdays off until mid-April though......
Mark Chaudhary
On Wed, Mar 20, 2002 at 02:42:04PM -0600, Jim Anderson wrote:
> On Wed, Mar 20, 2002 at 02:26:46PM -0600, Andrew Jonsson wrote:
> > I've been wanting to attend a PM meeting for some time, but they tend to
> > be a little North-Side specific. I work just south of the Mall O.A. and
> > am moving to lakeville. Has there been any discussion about an
> > occasional meeting at a more centralized or even slightly
> > southerly-skewed location?
>
> If you can supply a room with a VGA projector, an Internet connection,
> and room for about 20 or so people to sit, you can have it wherever you
> want.
>
> Personally, living and working in the Roseville/New Brighton/St. Anthony
> area, I like these northerly meetings.
>
> --
> Jim Anderson (612) 782-0456 jim@acadcam.com
> Anderson CAD/CAM, Inc Lucifer designed MS-DOS to try
> 3800 Apache Lane NE men's souls.
> St Anthony, MN 55421 Then he had a better idea...
--------------------------------------------------
Minneapolis Perl Mongers mailing list
To unsubscribe, send mail to majordomo@pm.org
with "unsubscribe mpls" in the body of the message.
From matt at omega.org Wed Mar 20 18:21:21 2002
From: matt at omega.org (Matt Johnson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: A meeting?
Message-ID: <0ceba2709001532FE5@mail5.mn.rr.com>
If the value of the information you receive or the value of the contacts you meet at a meeting is not worth the cost of the drive for you, I certainly recommend staying where you are.
You can always start a Lakeville perl mongers group.
-Matt
The following message was sent by Andrew Jonsson on Wed, 20 Mar 2002 16:57:16 -0600.
> I'm not going to fight at all. If Mpls PM wants to stay
> North-side-centric, then oh well. I may or may not attend. No biggie.
>
> Andrew Jonsson Cypress Semiconductor, MN Inc.
>
> Dave Rolsky wrote:
> >
> > On Wed, 20 Mar 2002, Andrew Jonsson wrote:
> >
> > > I've been wanting to attend a PM meeting for some time, but they tend to
> > > be a little North-Side specific. I work just south of the Mall O.A. and
> > > am moving to lakeville. Has there been any discussion about an
> > > occasional meeting at a more centralized or even slightly
> > > southerly-skewed location?
> >
> > It doesn't matter too much to me. I live in Mpls proper so most places
> > are about the same.
> >
> > Fight amongst yourselves (or we could vote).
> >
> > -dave
> >
From matt at omega.org Wed Mar 20 18:24:20 2002
From: matt at omega.org (Matt Johnson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: mrj is out for next week.
Message-ID: <0d6c42515001532FE6@mail6.mn.rr.com>
Next week I'm out of town. If anyone wants to come hang out at the Barley pub tonight (Wed), please page or call me.
matt-page@omega.org
612-281-3058
I can be there around 7:45pm -Matt
The following message was sent by Dave Rolsky on Wed, 20 Mar 2002 11:28:50 -0600 (CST).
> Do we want to do something next week (last Wednesday of the month)?
>
> It could be social or if somebody wants to present something we could do
> that.
>
> BTW, did the last (social) meeting happen? I couldn't go.
>
>
> -dave
>
> /*==================
> www.urth.org
> we await the New Sun
> ==================*/
From autarch at urth.org Wed Mar 20 18:52:00 2002
From: autarch at urth.org (Dave Rolsky)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: mrj is in. at Barleys? at 19:00 today?
In-Reply-To: <20020321010813.A92013@io.stderr.net>
Message-ID:
On Thu, 21 Mar 2002, Thomas Eibner wrote:
> Tonight?! Argh! Didn't Dave say wednesday next week? (Last wednesday of the
> month?)
Yes, I did. That is our agreed upon meeting day, isn't it?
-dave
/*==================
www.urth.org
we await the New Sun
==================*/
From thomas at stderr.net Wed Mar 20 19:49:59 2002
From: thomas at stderr.net (Thomas Eibner)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: mrj is in. at Barleys? at 19:00 today?
In-Reply-To: ; from autarch@urth.org on Wed, Mar 20, 2002 at 06:52:00PM -0600
References: <20020321010813.A92013@io.stderr.net>
Message-ID: <20020321024959.B92013@io.stderr.net>
On Wed, Mar 20, 2002 at 06:52:00PM -0600, Dave Rolsky wrote:
> On Thu, 21 Mar 2002, Thomas Eibner wrote:
>
> > Tonight?! Argh! Didn't Dave say wednesday next week? (Last wednesday of the
> > month?)
>
> Yes, I did. That is our agreed upon meeting day, isn't it?
Good. That was what my initial "I'm coming" was for anyway ;-)
--
Thomas Eibner DnsZone
mod_pointer
!(C)
Putting the HEST in .COM
From thomas at stderr.net Wed Mar 20 19:57:30 2002
From: thomas at stderr.net (Thomas Eibner)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: A meeting?
In-Reply-To: <20020320181032.GA12614@phluffynet.com>; from sharky@phluffynet.com on Wed, Mar 20, 2002 at 12:10:32PM -0600
References: <3C98F086.38780A58@mailhost.cmi.cypress.com> <20020320144204.A17417@acadcam.com> <20020320181032.GA12614@phluffynet.com>
Message-ID: <20020321025730.C92013@io.stderr.net>
On Wed, Mar 20, 2002 at 12:10:32PM -0600, Mark DeSharq wrote:
> I've been wanting to take part for some time myself but haven't managed to for several reasons...one of which is that Wednesdays aren't that convenient. Oh by the way, on behalf of all the other lurkers, thank you to whoever decided to simply add folks to the new list (roughly a year ago when the lists changed) as opposed to needing them to sign up again. A wise decision that. Back to the discussion, I live in Eagan and work at West Group. I'm uncertain but I imagine it wouldn't be too hard to get a suitably equipped room here in this gargantuan building. I'd be willing to find out. I don't get Wednesdays off until mid-April though......
I'd say we don't necessarily need the equipment. If anyone "down" south
has a place they can recommend (for one or more meetings), I'd be
willing to drive down there. So if you can get a room or suggest a
place, just put it up as a proposal on the list.
--
Thomas Eibner DnsZone
mod_pointer
!(C)
Putting the HEST in .COM
From matt at omega.org Wed Mar 20 20:56:24 2002
From: matt at omega.org (Matt Johnson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: mrj is in. at Barleys? at 19:00 today?
References: <20020321010813.A92013@io.stderr.net> <20020321024959.B92013@io.stderr.net>
Message-ID: <000801c1d083$fcf0feb0$6501a8c0@rip>
Sorry for confusing you! I was getting ahead of myself and thought (at
first) the end of the month was here!
----- Original Message -----
From: "Thomas Eibner"
To:
Sent: Wednesday, March 20, 2002 7:49 PM
Subject: Re: [mplspm]: mrj is in. at Barleys? at 19:00 today?
> On Wed, Mar 20, 2002 at 06:52:00PM -0600, Dave Rolsky wrote:
> > On Thu, 21 Mar 2002, Thomas Eibner wrote:
> >
> > > Tonight?! Argh! Didn't Dave say wednesday next week? (Last wednesday of the
> > > month?)
> >
> > Yes, I did. That is our agreed upon meeting day, isn't it?
>
> Good. That was what my initial "I'm coming" was for anyway ;-)
>
> --
> Thomas Eibner DnsZone
> mod_pointer
> !(C)
> Putting the HEST in .COM
From cpj1 at isis.visi.com Wed Mar 20 23:32:44 2002
From: cpj1 at isis.visi.com (Chris Josephes)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: A meeting?
In-Reply-To:
Message-ID:
On Wed, 20 Mar 2002, Dave Rolsky wrote:
> Do we want to do something next week (last Wednesday of the month)?
Sadly, I'll be out of town. Otherwise I'd definitely go. I just joined
the group, but it sounds pretty interesting so far.
-----------------------------------------------------------------------
Christopher Josephes | http://www.visi.com/~cpj1
cpj1@visi.com |
From cpj1 at isis.visi.com Thu Mar 21 09:21:20 2002
From: cpj1 at isis.visi.com (Chris Josephes)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Removing output from DBD::Sybase?
Message-ID:
I'm using DBD::Sybase with the FreeTDS drivers to access a Microsoft SQL
server.
It's working okay, but I get a lot of debugging messages like "Changed
database to ...", or "Changed language to ...".
I tried playing around with the trace() value, and even directed it to
"/dev/null", but it's still showing up.
After checking the source code, it looks like the output is coming
directly from the SQL server and is being piped to STDERR by either the
DBD module or the libtds library.
Anyone else run into this? The DBI-Users list archive appears to be
down.
-----------------------------------------------------------------------
Christopher Josephes | http://www.visi.com/~cpj1
cpj1@visi.com |
From chrome at real-time.com Thu Mar 21 12:35:01 2002
From: chrome at real-time.com (Carl Wilhelm Soderstrom)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: mrj is in. at Barleys? at 19:00 today?
In-Reply-To: <20020320174418.A17956@acadcam.com>; from jim@acadcam.com on Wed, Mar 20, 2002 at 05:44:18PM -0600
References: <0ecb01430231432FE1@mail1.mn.rr.com> <20020320174418.A17956@acadcam.com>
Message-ID: <20020321123501.G23695@real-time.com>
> I'll be at Barley John's until about 6:40 or so tonight. From there I
> head off to a UUM (Unix Users of Minnesota) SIGBAP (Special Interest
> Group on Beer And Pizza) at Savoy near downtown St. Paul. If you want
> to join us down there, feel free to. It's right at the north end of
> the Lafayette bridge.
argh. I was just a block away from Barley John's, at a job site until 4:00
or so. Thought about going to get a burger and beer, but decided to go home
instead (since it's cheaper at home); esp. since I didn't know how often Jim
frequented the place. :)
this is what I get for being away from e-mail all day. :)
Carl Soderstrom.
--
Network Engineer
Real-Time Enterprises
www.real-time.com
From chrome at real-time.com Thu Mar 21 13:29:57 2002
From: chrome at real-time.com (Carl Wilhelm Soderstrom)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: A meeting?
In-Reply-To: ; from autarch@urth.org on Wed, Mar 20, 2002 at 01:24:46PM -0600
References: <20020320115813.A17168@acadcam.com>
Message-ID: <20020321132957.M23695@real-time.com>
count me in for the next social get-together.
Carl Soderstrom.
--
Network Engineer
Real-Time Enterprises
www.real-time.com
From josha at mac.com Thu Mar 21 22:07:22 2002
From: josha at mac.com (Josh Aas)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
In-Reply-To:
Message-ID:
Hey MPM,
If I have an alphabetically sorted array of strings (containing up to 2
million strings), and I want to find out if any strings in that array equal
a certain string (yes or no, not how many), what is the fastest way to do
that search? This seems basic to me, I just can't come up with the answer
and I have an hour to do so. Thanks a lot!
-Josh
From thomas at stderr.net Thu Mar 21 22:25:27 2002
From: thomas at stderr.net (Thomas Eibner)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
In-Reply-To: ; from josha@mac.com on Thu, Mar 21, 2002 at 10:07:22PM -0600
References:
Message-ID: <20020322052527.A2676@io.stderr.net>
On Thu, Mar 21, 2002 at 10:07:22PM -0600, Josh Aas wrote:
> Hey MPM,
> If I have an alphabetically sorted array of strings (containing up to 2
> million strings), and I want to find out if any strings in that array equal
> a certain string (yes or no, not how many), what is the fastest way to do
> that search? This seems basic to me, I just can't come up with the answer
> and I have an hour to do so. Thanks a lot!
An "easy" way to do it would simply be to grep on the array, but it's
probably not the fastest way to do it.
if (grep {/^thestring$/} @sorted_array) {
    # match!
}
But since you say you have it sorted alphabetically, you might want to do
something like taking the number of entries, taking the middle element
out and comparing it by the first letter to see whether it is close to the
string or not, and then continue by moving halfway to one side. Just an
idea anyway.
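A side note on the grep approach: a regex match treats characters like "." or "+" in the search string as metacharacters, and grep always walks the whole array even after a hit. Here is a minimal sketch of an exact-match scan that stops at the first hit. The helper name `contains_string` and the sample data are made up for illustration, and this is still a linear scan, so it only helps when the match happens to sit early in the list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 as soon as an element equals $target,
# otherwise 0. It compares with eq, so "." or "+" in the target are
# taken literally rather than as regex metacharacters.
sub contains_string {
    my ($target, $list) = @_;    # $list is an array reference
    for my $item (@$list) {
        return 1 if $item eq $target;    # stop at the first hit
    }
    return 0;
}

# Made-up sample data, already in string (cmp) order.
my @sorted_array = qw(a.c abc apple banana);

for my $s (qw(banana a.c a+c cherry)) {
    printf "%s: %s\n", $s, contains_string($s, \@sorted_array) ? "yes" : "no";
}
```

For a one-off lookup this may be good enough; for repeated lookups in a 2-million-element list, taking advantage of the sort order is likely to pay off.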
--
Thomas Eibner DnsZone
mod_pointer
!(C)
Putting the HEST in .COM
From trammell at trammell.dyndns.org Thu Mar 21 22:50:48 2002
From: trammell at trammell.dyndns.org (John J. Trammell)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
In-Reply-To: ; from josha@mac.com on Thu, Mar 21, 2002 at 10:07:22PM -0600
References:
Message-ID: <20020321225048.A8799@trammell.dyndns.org>
On Thu, Mar 21, 2002 at 10:07:22PM -0600, Josh Aas wrote:
> If I have an alphabetically sorted array of strings (containing up to 2
> million strings), and I want to find out if any strings in that array equal
> a certain string (yes or no, not how many), what is the fastest way to do
> that search? This seems basic to me, I just can't come up with the answer
> and I have an hour to do so. Thanks a lot!
Binary search. Compare your test string with [n/2]; if it's
equal you're done, otherwise your new search range is (0,[n/2])
or ([n/2],n). Rinse, repeat.
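In case it helps to see it spelled out, here is a minimal sketch of that binary search in Perl. The sub name `binary_search` and the test data are made up for illustration, and it assumes the array is sorted in the same order that `cmp` uses (plain string order, not case-insensitive or locale order).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: binary search over a reference to an array that
# is already sorted in string (cmp) order. Returns 1 if found, else 0.
sub binary_search {
    my ($target, $sorted) = @_;
    my ($low, $high) = (0, $#{$sorted});    # inclusive bounds
    while ($low <= $high) {
        my $mid   = int(($low + $high) / 2);
        my $order = $target cmp $sorted->[$mid];
        return 1 if $order == 0;             # exact match
        if ($order < 0) { $high = $mid - 1 } # target sorts before mid
        else            { $low  = $mid + 1 } # target sorts after mid
    }
    return 0;    # search range is empty, so the string is not there
}

my @sorted_array = ("aa" .. "yy");
for my $s (qw(aa mm yy zz mmm)) {
    printf "%s: %s\n", $s, binary_search($s, \@sorted_array) ? "yes" : "no";
}
```

Each pass halves the range, so a 2-million-element array should need about 21 comparisons at most.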
From matt at omega.org Thu Mar 21 23:17:27 2002
From: matt at omega.org (Matt Johnson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: mrj is in. at Barleys? at 19:00 today?
References: <0ecb01430231432FE1@mail1.mn.rr.com> <20020320174418.A17956@acadcam.com>
Message-ID: <011e01c1d160$dbc9b6e0$6501a8c0@rip>
I missed it this time Jim, but that sounds fun too. -Matt
----- Original Message -----
From: "Jim Anderson"
To:
Sent: Wednesday, March 20, 2002 5:44 PM
Subject: Re: [mplspm]: mrj is in. at Barleys? at 19:00 today?
> On Wed, Mar 20, 2002 at 05:41:17PM -0600, Matt Johnson wrote:
> > I've been working on a basic script to retrieve an ip address off my
> > router, but (of course) it isn't ready yet. I can be social if I have to,
> > so we could try that. -Matt
>
> I'll be at Barley John's until about 6:40 or so tonight. From there I
> head off to a UUM (Unix Users of Minnesota) SIGBAP (Special Interest
> Group on Beer And Pizza) at Savoy near downtown St. Paul. If you want
> to join us down there, feel free to. It's right at the north end of
> the Lafayette bridge.
From troy.johnson at myrealbox.com Thu Mar 21 23:37:08 2002
From: troy.johnson at myrealbox.com (Troy Johnson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
References: <20020322052527.A2676@io.stderr.net>
Message-ID: <3C9AC304.D8CD2A22@myrealbox.com>
Fun!
>>>>>START>>>>>
#!/usr/local/bin/perl -w
my @array = ("aa" .. "yy");
foreach $s (qw(cc gg mm aa yy zz))
{
    my $rv = findit($s, \@array);
    print "debug: \$rv = $rv\n";
}
sub findit
{
    my $string = shift;
    my $sar = shift; # sorted array reference
    my $try = my $interval = my $size = scalar @{$sar};
    my $dir = -1;
    my %seen = (); # indexes already probed
    my $done = 0;
    my $found = 0;
    while (not $done)
    {
        $interval = int($interval / 2) || 1; # halve the step, minimum 1
        $try += $dir * $interval;
        if ($try < 0 or $try >= $size)
        {
            $done = 1; # ran off either end
            next;
        }
        $dir = $string cmp $sar->[$try];
        print "debug: $size $interval $try $dir $string $sar->[$try]\n";
        if (not $dir)
        {
            $found = 1;
            $done = 1;
            next;
        }
        # Track probed indexes rather than values, so that duplicate
        # strings in the array cannot end the search early.
        if (exists $seen{$try})
        {
            $done = 1;
            next;
        }
        $seen{$try} = undef;
    }
    return $found;
}
<<<<<
> On Thu, Mar 21, 2002 at 10:07:22PM -0600, Josh Aas wrote:
> > Hey MPM,
> > If I have an alphabetically sorted array of strings (containing up to 2
> > million strings), and I want to find out if any strings in that array equal
> > a certain string (yes or no, not how many), what is the fastest way to do
> > that search? This seems basic to me, I just can't come up with the answer
> > and I have an hour to do so. Thanks a lot!
>
> An "easy" way to do it would simply be to grep on the array, but it's
> probably not the fastest way to do it.
>
> if (grep {/^thestring$/} @sorted_array) {
>     # match!
> }
>
> But since you say you have it sorted alphabetically, you might want to do
> something like taking the number of entries, taking the middle element
> out and comparing it by the first letter to see whether it is close to the
> string or not, and then continue by moving halfway to one side. Just an
> idea anyway.
From troy.johnson at myrealbox.com Thu Mar 21 23:55:18 2002
From: troy.johnson at myrealbox.com (Troy Johnson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
References: <20020322052527.A2676@io.stderr.net> <3C9AC304.D8CD2A22@myrealbox.com>
Message-ID: <3C9AC746.9A777487@myrealbox.com>
The previous stuff is a result of my daily language jumping. I like this
code better, so if you would like to criticize it (please do!), please
do this version.
More "perly" and less "liney":
>>>>>START>>>>>
#!/usr/local/bin/perl -w
my @array = ("aa" .. "yy");
foreach $s (qw(cc gg mm aa yy zz mmm))
{
    my $rv = findit($s, \@array);
    print "debug: \$rv = $rv\n";
}
sub findit
{
    my $string = shift;
    my $sar = shift; # sorted array reference
    my $try = my $interval = my $size = scalar @{$sar};
    my $dir = -1;
    my %seen = (); # indexes already probed
    my $found = 0;
    while (1)
    {
        $interval = int($interval / 2) || 1; # halve the step, minimum 1
        $try += $dir * $interval;
        if ($try < 0 or $try >= $size) { last; } # ran off either end
        $dir = $string cmp $sar->[$try];
        print "debug: $size $interval $try $dir $string $sar->[$try]\n";
        if (not $dir) { $found = 1; last; }
        # Track probed indexes rather than values, so that duplicate
        # strings in the array cannot end the search early.
        if (exists $seen{$try}) { last; }
        $seen{$try} = undef;
    }
    return $found;
}
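<<<<<
From tim.burlowski at veritas.com Fri Mar 22 11:07:23 2002
From: tim.burlowski at veritas.com (Tim Burlowski)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
In-Reply-To: ; from josha@mac.com on Thu, Mar 21, 2002 at 10:07:22PM -0600
References:
Message-ID: <20020322110723.A20113@maple.min.ov.com>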
I read my mail in FIFO, so maybe someone has already submitted this
solution. It's a cookbook entry 5.14 from the Perl Cookbook.
-------------------------------------------------
#!/usr/bin/perl
my @list=(1,2,3,4,1,2,3,4,5,6);
my %count = ();
for (@list) {
    $count{$_}++;
}
foreach (keys %count) {
    print "$_ occurs $count{$_} in file\n";
}
-------------------------------------------------
If you like terser lines.
-------------------------------------------------
#!/usr/bin/perl
my @list=(1,2,3,4,1,2,3,4,5,6);
my %count = ();
$count{$_}++ for (@list);
print "$_ occurs $count{$_} in file\n" foreach (keys %count);
-------------------------------------------------
tim burlowski
Previously Josh Aas(josha@mac.com) wrote:
> Hey MPM,
> If I have an alphabetically sorted array of strings (containing up to 2
> million strings), and I want to find out if any strings in that array equal
> a certain string (yes or no, not how many), what is the fastest way to do
> that search? This seems basic to me, I just can't come up with the answer
> and I have an hour to do so. Thanks a lot!
> -Josh
--
tim burlowski
========================================
Weather conditions for Roseville, MN.
Condition are Fair, temperature is 12 degrees
with winds From the West at 15. Visibility is Unlimited.
Dewpoint 0, barometer reads 30.34 humidity at 49%.
From jim at acadcam.com Fri Mar 22 09:00:32 2002
From: jim at acadcam.com (Jim Anderson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
In-Reply-To: <20020321225048.A8799@trammell.dyndns.org>; from trammell@trammell.dyndns.org on Thu, Mar 21, 2002 at 10:50:48PM -0600
References: <20020321225048.A8799@trammell.dyndns.org>
Message-ID: <20020322090032.D22826@acadcam.com>
On Thu, Mar 21, 2002 at 10:50:48PM -0600, John J. Trammell wrote:
> On Thu, Mar 21, 2002 at 10:07:22PM -0600, Josh Aas wrote:
> > If I have an alphabetically sorted array of strings (containing up to 2
> > million strings), and I want to find out if any strings in that array equal
> > a certain string (yes or no, not how many), what is the fastest way to do
> > that search? This seems basic to me, I just can't come up with the answer
> > and I have an hour to do so. Thanks a lot!
>
> Binary search. Compare your test string with [n/2]; if it's
> equal you're done, otherwise your new search range is (0,[n/2])
> or ([n/2],n). Rinse, repeat.
If your strings are relatively evenly spread out, you can also try a
predictive compare for the first few passes. For example, if your
strings go from a-z, and your search string starts with a c, then
start 3/26ths of the way through the list.
As you get close to it, go to a binary search as previously described.
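A rough sketch of that idea, under some loud assumptions: the strings are lowercase a-z and spread evenly by first letter, and only the very first probe is predictive before it falls back to a plain binary search. The sub name `predictive_search` is made up. The first guess aims at the middle of the letter's share of the list (2.5/26ths for a "c"), a slight variation on the 3/26ths above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess the first probe position from the first
# letter of the target, then continue with an ordinary binary search.
# Assumes $sorted is in string (cmp) order and mostly lowercase a-z.
sub predictive_search {
    my ($target, $sorted) = @_;
    my ($low, $high) = (0, $#{$sorted});    # inclusive bounds
    return 0 if $high < 0;                   # empty list

    # Fraction of the way through the alphabet, aimed mid-letter.
    my $frac = (ord(lc substr($target, 0, 1)) - ord('a') + 0.5) / 26;
    $frac = 0 if $frac < 0;                  # clamp anything outside a-z
    $frac = 1 if $frac > 1;
    my $mid = int($frac * $high);            # predictive first probe

    while ($low <= $high) {
        my $order = $target cmp $sorted->[$mid];
        return 1 if $order == 0;
        if ($order < 0) { $high = $mid - 1 }
        else            { $low  = $mid + 1 }
        $mid = int(($low + $high) / 2);      # plain binary search from here
    }
    return 0;
}

my @sorted_array = ("aa" .. "yy");
for my $s (qw(cc aa yy zz mmm)) {
    printf "%s: %s\n", $s, predictive_search($s, \@sorted_array) ? "yes" : "no";
}
```

If the guess is badly off (say, most of the strings start with the same letter), the first probe is wasted but the result is still correct, since the binary search takes over on whatever range is left.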
--
Jim Anderson (612) 782-0456 jim@acadcam.com
Anderson CAD/CAM, Inc Lucifer designed MS-DOS to try
3800 Apache Lane NE men's souls.
St Anthony, MN 55421 Then he had a better idea...
From trammell at trammell.dyndns.org Fri Mar 22 09:04:19 2002
From: trammell at trammell.dyndns.org (John J. Trammell)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
In-Reply-To: <3C9AC746.9A777487@myrealbox.com>; from troy.johnson@myrealbox.com on Thu, Mar 21, 2002 at 11:55:18PM -0600
References: <20020322052527.A2676@io.stderr.net> <3C9AC304.D8CD2A22@myrealbox.com> <3C9AC746.9A777487@myrealbox.com>
Message-ID: <20020322090419.A12335@trammell.dyndns.org>
On Thu, Mar 21, 2002 at 11:55:18PM -0600, Troy Johnson wrote:
> The previous stuff is a result of my daily language jumping. I like this
> code better, so if you would like to criticize it (please do!), please
> do this version.
>
> More "perly" and less "liney":
>
> >>>>>START>>>>>
> #!/usr/local/bin/perl -w
use strict; # >:-)
>
> my @array = ("aa" .. "yy");
>
[snip]
From tim.burlowski at veritas.com Fri Mar 22 11:19:46 2002
From: tim.burlowski at veritas.com (Tim Burlowski)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322110723.A20113@maple.min.ov.com>; from tim.burlowski@veritas.com on Fri, Mar 22, 2002 at 11:07:23AM -0600
References: <20020322110723.A20113@maple.min.ov.com>
Message-ID: <20020322111946.A20319@maple.min.ov.com>
Whoops I think I misread the requirements. Let me try again.
Looking for literal "string".
#!/usr/bin/perl
my @list=(1,2,3,4,1,2,3,4,5,6,"string","a","string");
my %count = ();
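my $string = "string"; # the literal we are looking for
$count{$_}++ for (@list);
foreach (keys %count) {
    # eq compares the whole key; /string/ would also match "substring"
    print "$_ occurs $count{$_} in file\n" if $_ eq $string;
}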
tim
--
tim burlowski
========================================
Weather conditions for Roseville, MN.
Condition are Fair, temperature is 14 degrees
with winds From the West Northwest at 14. Visibility is Unlimited.
Dewpoint 0, barometer reads 30.33 humidity at 46%.
From patrickm at eltecinc.com Fri Mar 22 09:19:30 2002
From: patrickm at eltecinc.com (patrickm)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list
References:
Message-ID: <001c01c1d1b4$fa8b5bc0$3cc8a8c0@NTDOMAIN1>
my %seen = map { $_, 0 } @sorted_array;  # it doesn't have to be sorted
if (exists $seen{bob}) {                 # is string 'bob' in the array?
    print "yes\n";
} else {
    print "no\n";
}
----- Original Message -----
From: Josh Aas
To:
Sent: Thursday, March 21, 2002 10:07 PM
Subject: [mplspm]: In a sorted list
> Hey MPM,
> If I have an alphabetically sorted array of strings (containing up to 2
> million strings), and I want to find out if any strings in that array equal
> a certain string (yes or no, not how many), what is the fastest way to do
> that search? This seems basic to me, I just can't come up with the answer
> and I have an hour to do so. Thanks a lot!
> -Josh
From tim.burlowski at veritas.com Fri Mar 22 11:40:45 2002
From: tim.burlowski at veritas.com (Tim Burlowski)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322111946.A20319@maple.min.ov.com>; from tim.burlowski@veritas.com on Fri, Mar 22, 2002 at 11:19:46AM -0600
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com>
Message-ID: <20020322114045.A20663@maple.min.ov.com>
OK, well I missed the requirements again, as you don't want a count, only
yes or no. Doooh. I am a double dumbass today. How about this?
#!/usr/bin/perl
my @list=(1,2,3,4,1,2,3,4,5,6,"string","a","string");
my %count = ();
my $i = 0;
$count{$_}++ for (@list);
my $bool = "no";
CHECK: foreach (keys %count) {
    $i++;
    # eq compares the whole key; /string/ would also match "substring"
    if ($_ eq "string") {
        $bool = "yes";
        last CHECK;
    }
}
print "$bool, string found in list, $i iterations\n";
tim
Previously Tim Burlowski(tim.burlowski@veritas.com) wrote:
> Whoops I think I misread the requirements. Let me try again.
>
> Looking for literal "string".
>
> #!/usr/bin/perl
> my @list=(1,2,3,4,1,2,3,4,5,6,"string","a","string");
> my %count = ();
> $count{$_}++ for (@list);
> foreach (keys %count) {
> print "$_ occurs $count{$_} in file\n" if /string/;
> }
>
> tim
>
>
> Previously Tim Burlowski(tim.burlowski@veritas.com) wrote:
>
> > I read my mail in FIFO, so maybe someone has already submitted this
> > solution. It's a cookbook entry 5.14 from the Perl Cookbook.
> >
> > -------------------------------------------------
> > #!/usr/bin/perl
> > my @list=(1,2,3,4,1,2,3,4,5,6);
> > my %count = ();
> > for(@list){
> > $count{$_}++;
> > }
> > foreach (keys %count) {
> > print "$_ occurs $count{$_} in file\n";
> > }
> > -------------------------------------------------
> >
> > If you like terser lines.
> >
> > -------------------------------------------------
> > #!/usr/bin/perl
> > my @list=(1,2,3,4,1,2,3,4,5,6);
> > my %count = ();
> > $count{$_}++ for (@list);
> > print "$_ occurs $count{$_} in file\n" foreach (keys %count);
> > -------------------------------------------------
> >
> > tim burlowski
> >
> > Previously Josh Aas(josha@mac.com) wrote:
> >
> > > Hey MPM,
> > > If I have an alphabetically sorted array of strings (containing up to 2
> > > million strings), and I want to find out if any strings in that array equal
> > > a certain string (yes or no, not how many), what is the fastest way to do
> > > that search? This seems basic to me, I just can't come up with the answer
> > > and I have an hour to do so. Thanks a lot!
> > > -Josh
> > >
> > >
> > >
> > > --------------------------------------------------
> > > Minneapolis Perl Mongers mailing list
> > >
> > > To unsubscribe, send mail to majordomo@pm.org
> > > with "unsubscribe mpls" in the body of the message.
> >
> > --
> > tim burlowski
> > ========================================
> > Weather conditions for Roseville, MN.
> > Condition are Fair, temperature is 12 degrees
> > with winds From the West at 15. Visibility is Unlimited.
> > Dewpoint 0, barometer reads 30.34 humidity at 49%.
>
> --
> tim burlowski
> ========================================
> Weather conditions for Roseville, MN.
> Condition are Fair, temperature is 14 degrees
> with winds From the West Northwest at 14. Visibility is Unlimited.
> Dewpoint 0, barometer reads 30.33 humidity at 46%.
>
>
> --------------------------------------------------
> Minneapolis Perl Mongers mailing list
>
> To unsubscribe, send mail to majordomo@pm.org
> with "unsubscribe mpls" in the body of the message.
--
tim burlowski
========================================
Weather conditions for Roseville, MN.
Condition are Fair, temperature is 14 degrees
with winds From the West Northwest at 14. Visibility is Unlimited.
Dewpoint 0, barometer reads 30.33 humidity at 46%.
--------------------------------------------------
Minneapolis Perl Mongers mailing list
To unsubscribe, send mail to majordomo@pm.org
with "unsubscribe mpls" in the body of the message.
From jim at acadcam.com Fri Mar 22 09:40:54 2002
From: jim at acadcam.com (Jim Anderson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322114045.A20663@maple.min.ov.com>; from tim.burlowski@veritas.com on Fri, Mar 22, 2002 at 11:40:45AM -0600
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com>
Message-ID: <20020322094054.A22998@acadcam.com>
On Fri, Mar 22, 2002 at 11:40:45AM -0600, Tim Burlowski wrote:
> OK, well I missed the requirements again, as you don't want a count only
> yes or no. Doooh. I am a double dumbass today. How about this?
>
>
> #!/usr/bin/perl
> my @list=(1,2,3,4,1,2,3,4,5,6,"string","a","string");
> my %count = ();
> my $i = 0;
> $count{$_}++ for (@list);
> my $bool = "no";
> CHECK: foreach (keys %count) {
> $i++;
> if (/string/){
> $bool = "yes";
> last CHECK;
> }
> }
> print "$bool, string found in list, $i iterations";
So what's the point of spending all that time building a hash, when the
array was already sorted???
> > > Previously Josh Aas(josha@mac.com) wrote:
> > >
> > > > Hey MPM,
> > > > If I have an alphabetically sorted array of strings (containing up to 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > million strings), and I want to find out if any strings in that array equal
And with 2 million strings, building a hash is a non-trivial amount of
time.
And instead of that great big foreach loop, why not just use
print "Found it\n" if defined($count{"string"});
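Since the array is already sorted, the lookup being hinted at here can be done with a binary search and no hash at all. What follows is only a minimal sketch, not code from the thread: the name sorted_contains is made up for illustration, and it assumes the list was sorted with plain string comparison (what Perl's default sort produces), not a case-insensitive or locale-aware order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if $target is in the sorted array
# referenced by $list, else 0.
# Assumes the array is ordered the way "cmp" orders strings.
sub sorted_contains {
    my ($target, $list) = @_;
    my ($low, $high) = (0, $#{$list});
    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        my $cmp = $target cmp $list->[$mid];
        return 1 if $cmp == 0;
        if ($cmp < 0) { $high = $mid - 1 }   # target sorts before the midpoint
        else          { $low  = $mid + 1 }   # target sorts after the midpoint
    }
    return 0;
}

my @list = sort qw(string a b c apple zebra);
for my $word ('string', 'missing') {
    my $found = sorted_contains($word, \@list);
    print "$word: ", ($found ? "yes" : "no"), "\n";
}
# prints "string: yes" then "missing: no"
```

Each probe halves the remaining range, so 2 million strings need at most about 21 comparisons per lookup.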
--
Jim Anderson (612) 782-0456 jim@acadcam.com
Anderson CAD/CAM, Inc Lucifer designed MS-DOS to try
3800 Apache Lane NE men's souls.
St Anthony, MN 55421 Then he had a better idea...
From thomas at stderr.net Fri Mar 22 09:48:04 2002
From: thomas at stderr.net (Thomas Eibner)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322114045.A20663@maple.min.ov.com>; from tim.burlowski@veritas.com on Fri, Mar 22, 2002 at 11:40:45AM -0600
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com>
Message-ID: <20020322164804.A8519@io.stderr.net>
On Fri, Mar 22, 2002 at 11:40:45AM -0600, Tim Burlowski wrote:
> OK, well I missed the requirements again, as you don't want a count only
> yes or no. Doooh. I am a double dumbass today. How about this?
>
>
> #!/usr/bin/perl
> my @list=(1,2,3,4,1,2,3,4,5,6,"string","a","string");
> my %count = ();
> my $i = 0;
> $count{$_}++ for (@list);
> my $bool = "no";
> CHECK: foreach (keys %count) {
> $i++;
> if (/string/){
> $bool = "yes";
> last CHECK;
> }
> }
> print "$bool, string found in list, $i iterations";
If he just wants to know if it's in there or not why go through the first
for loop and consume that memory? (After all he says up to 2 million
strings)
my $found = 0;
CHECK: for (@list) {
    if ($_ eq 'string') {
        $found = 1;
        last CHECK;
    }
}
--
Thomas Eibner DnsZone
mod_pointer
!(C)
Putting the HEST in .COM
From tim.burlowski at veritas.com Fri Mar 22 11:56:03 2002
From: tim.burlowski at veritas.com (Tim Burlowski)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322094054.A22998@acadcam.com>; from jim@acadcam.com on Fri, Mar 22, 2002 at 09:40:54AM -0600
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com> <20020322094054.A22998@acadcam.com>
Message-ID: <20020322115603.A20868@maple.min.ov.com>
Right, I guess I was sticking with my original flow, instead of making
the solution fit the problem. I give up; my brain is too foggy today.
tim
From thomas at stderr.net Fri Mar 22 09:49:58 2002
From: thomas at stderr.net (Thomas Eibner)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322164804.A8519@io.stderr.net>; from thomas@stderr.net on Fri, Mar 22, 2002 at 04:48:04PM +0100
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com> <20020322164804.A8519@io.stderr.net>
Message-ID: <20020322164958.B8519@io.stderr.net>
On Fri, Mar 22, 2002 at 04:48:04PM +0100, Thomas Eibner wrote:
> If he just wants to know if it's in there or not why go through the first
> for loop and consume that memory? (After all he says up to 2 million
> strings)
>
> my $found = 0;
> CHECK: for (@list) {
> if ($_ eq 'string') {
> $found = 1;
> last CHECK;
> }
> }
[Not saying that Jim's solution isn't what I would have done, 'cause it
was what I was playing with last night]
--
Thomas Eibner DnsZone
mod_pointer
!(C)
Putting the HEST in .COM
From trammell at trammell.dyndns.org Fri Mar 22 10:59:12 2002
From: trammell at trammell.dyndns.org (John J. Trammell)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322094054.A22998@acadcam.com>; from jim@acadcam.com on Fri, Mar 22, 2002 at 09:40:54AM -0600
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com> <20020322094054.A22998@acadcam.com>
Message-ID: <20020322105912.A13477@trammell.dyndns.org>
On Fri, Mar 22, 2002 at 09:40:54AM -0600, Jim Anderson wrote:
> And with 2 million strings, building a hash is a non-trivial amount of
> time.
You said it:
#!/usr/bin/perl -w
use strict;
use Benchmark;
$| = 1;

my $binary = sub
{
    use POSIX qw[ floor ];
    my ($x,$a) = @_;
    my ($p,$q) = (0,scalar(@$a)-1);
    LOOP:
    {
        return 1 if $a->[$p] eq $x;
        return 1 if $a->[$q] eq $x;
        my $mid = floor( ($p+$q)/2 );
        return 0 if ($mid == $p) || ($mid == $q);
        for ($a->[$mid])
        {
            $x eq $_ && do { return 1; };
            $x lt $_ && do { ($p,$q) = ($p,$mid); last; };
            $x gt $_ && do { ($p,$q) = ($mid,$q); last; };
        }
        redo LOOP;
    }
};

my $hash = sub
{
    my ($x,$a) = @_;
    my %h = map { $_, 0 } @$a;
    return exists $h{$x};
};

{
    print "benchmark for size 26\n";
    my @a = ('a' .. 'z');
    my $x = $a[rand(@a)];
    my $y = "__foo__";

    timethese(50000,
    {
        binary => sub { $binary->($x,\@a); $binary->($y,\@a); },
        hash => sub { $hash->($x,\@a); $hash->($y,\@a); },
    });
}

{
    print "benchmark for size @{[ 26*26 ]}\n";
    my @a = ('aa' .. 'zz');
    my $x = $a[rand(@a)];
    my $y = "__foo__";

    timethese(50000,
    {
        binary => sub { $binary->($x,\@a); $binary->($y,\@a); },
        hash => sub { $hash->($x,\@a); $hash->($y,\@a); },
    });
}

__END__
[ ~ ] perl bar.pl
benchmark for size 26
Benchmark: timing 50000 iterations of binary, hash...
binary: 4 wallclock secs ( 3.68 usr + 0.00 sys = 3.68 CPU) @ 13586.96/s (n=50000)
hash: 7 wallclock secs ( 7.21 usr + 0.01 sys = 7.22 CPU) @ 6925.21/s (n=50000)
benchmark for size 676
Benchmark: timing 50000 iterations of binary, hash...
binary: 7 wallclock secs ( 6.86 usr + 0.00 sys = 6.86 CPU) @ 7288.63/s (n=50000)
hash: 187 wallclock secs (184.91 usr + 0.06 sys = 184.97 CPU) @ 270.31/s (n=50000)
[ ~ ] ls -l
Admittedly this benchmark isn't perfect, but I think it points
in the direction of the binary search solution scaling better.
From jim at acadcam.com Fri Mar 22 11:12:38 2002
From: jim at acadcam.com (Jim Anderson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322105912.A13477@trammell.dyndns.org>; from trammell@trammell.dyndns.org on Fri, Mar 22, 2002 at 10:59:12AM -0600
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com> <20020322094054.A22998@acadcam.com> <20020322105912.A13477@trammell.dyndns.org>
Message-ID: <20020322111238.A23254@acadcam.com>
On Fri, Mar 22, 2002 at 10:59:12AM -0600, John J. Trammell wrote:
> On Fri, Mar 22, 2002 at 09:40:54AM -0600, Jim Anderson wrote:
> > And with 2 million strings, building a hash is a non-trivial amount of
> > time.
A couple of other points. The original premise had the list sorted to start
with. Has it always been sorted? Building the hash may be faster than
doing a sort if it wasn't sorted to start with. Where did the list come
from? If it came from inside a database, having an index on the field
could eliminate the problem entirely. If the data is in a flat file on
the disk, and searches are relatively infrequent, it may be best to do
a binary search on the file itself, and not even read the list into
memory. There used to be a Unix utility that did exactly that (it might
have been called 'look'). If a great many searches are being done,
it might work to create the hash, especially if the very long startup
time to build the hash can be tolerated. Another possibility, if the
list is relatively stable, is to use a tie'd hash to a DBM file. That
way, you can build the hash once, and then use an index into the hash to
only read the bit of the file that has the string you're looking for.
This has the advantage of very fast searches, combined with little or no
setup time for the user program. There is a very slow initial startup
time to build the hash in the first place, though.
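The "binary search on the file itself" idea can be tried from Perl with the core Search::Dict module, whose look() function positions a filehandle at the first line greater than or equal to a key. The sketch below is an illustration under assumptions, not something posted in the thread: file_contains and the temp-file demo data are hypothetical, and the file is assumed to hold one word per line in plain string-sorted order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Search::Dict;                 # core module; exports look()
use File::Temp qw(tempfile);      # only used to make the demo self-contained

# Hypothetical helper: return 1 if $word is a whole line of the
# sorted text file at $path, else 0.
sub file_contains {
    my ($path, $word) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    # look() binary-searches the file and leaves the read position at the
    # first line that is stringwise greater than or equal to $word.
    my $pos = look($fh, $word, 0, 0);
    die "look() failed on $path" if $pos < 0;
    my $line = <$fh>;
    close($fh);
    return 0 unless defined $line;   # $word sorts after every line
    chomp $line;
    return $line eq $word ? 1 : 0;
}

# Demo data: a small sorted word list standing in for the real file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "$_\n" for sort qw(apple banana cherry string zebra);
close($tmp_fh);

for my $word ('string', 'str') {
    my $found = file_contains($tmp_path, $word);
    print "$word: ", ($found ? "yes" : "no"), "\n";
}
```

Only a handful of blocks are read per lookup, so the list never has to be loaded into memory.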
--
Jim Anderson (612) 782-0456 jim@acadcam.com
Anderson CAD/CAM, Inc Lucifer designed MS-DOS to try
3800 Apache Lane NE men's souls.
St Anthony, MN 55421 Then he had a better idea...
From djb at tc.umn.edu Fri Mar 22 11:14:27 2002
From: djb at tc.umn.edu (Dave Bianchi)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com> <20020322094054.A22998@acadcam.com> <20020322105912.A13477@trammell.dyndns.org>
Message-ID: <3C9B6673.E248CAAF@tc.umn.edu>
In all this discussion, the assumption is that you are starting with a
sorted list of strings. Perhaps that list should not have been created
in the first place? Maybe a hash would have been a more appropriate
data structure? With two million strings, maybe a hash mapped to a DBM
file would be appropriate, generated by one program and used by another?
Without more information about the whole problem, a binary search seems
appropriate. But I would take a step back and look at the bigger
picture and perhaps implement the solution differently.
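As a rough sketch of the "hash mapped to a DBM file, generated by one program and used by another" idea: the example below uses SDBM_File only because it ships with Perl (DB_File or GDBM_File would be tied the same way). The function names and the temporary path are hypothetical, and SDBM limits each key/value pair to roughly 1 KB, which is fine for single words.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                   # core DBM implementation
use File::Temp qw(tempdir);      # only used to make the demo self-contained

# "Generator" side: store every word as a key in the DBM file.
sub build_index {
    my ($dbm_path, @words) = @_;
    tie(my %index, 'SDBM_File', $dbm_path, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $dbm_path: $!";
    $index{$_} = 1 for @words;
    untie %index;
}

# "Consumer" side: open the existing DBM file read-only and test one key.
sub index_contains {
    my ($dbm_path, $word) = @_;
    tie(my %index, 'SDBM_File', $dbm_path, O_RDONLY, 0666)
        or die "Cannot tie $dbm_path: $!";
    my $found = exists $index{$word} ? 1 : 0;
    untie %index;
    return $found;
}

my $dir      = tempdir(CLEANUP => 1);
my $dbm_path = "$dir/wordlist";  # SDBM creates wordlist.pag and wordlist.dir

build_index($dbm_path, qw(apple banana string zebra));
for my $word ('string', 'missing') {
    print "$word: ", (index_contains($dbm_path, $word) ? "yes" : "no"), "\n";
}
```

The build cost is paid once by the generating program; each later lookup only touches the part of the file that holds the key.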
- Dave Bianchi
From djb at tc.umn.edu Fri Mar 22 11:16:12 2002
From: djb at tc.umn.edu (Dave Bianchi)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com> <20020322094054.A22998@acadcam.com> <20020322105912.A13477@trammell.dyndns.org> <20020322111238.A23254@acadcam.com>
Message-ID: <3C9B66DC.D4954472@tc.umn.edu>
Okay, Jim and I are thinking the same way :-)
From jim at acadcam.com Fri Mar 22 11:19:08 2002
From: jim at acadcam.com (Jim Anderson)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <3C9B66DC.D4954472@tc.umn.edu>; from djb@tc.umn.edu on Fri, Mar 22, 2002 at 11:16:12AM -0600
References: <20020322110723.A20113@maple.min.ov.com> <20020322111946.A20319@maple.min.ov.com> <20020322114045.A20663@maple.min.ov.com> <20020322094054.A22998@acadcam.com> <20020322105912.A13477@trammell.dyndns.org> <20020322111238.A23254@acadcam.com> <3C9B66DC.D4954472@tc.umn.edu>
Message-ID: <20020322111908.C23254@acadcam.com>
On Fri, Mar 22, 2002 at 11:16:12AM -0600, Dave Bianchi wrote:
> Okay, Jim and I are thinking the same way :-)
But I beat you by about 2 minutes...
--
Jim Anderson (612) 782-0456 jim@acadcam.com
Anderson CAD/CAM, Inc Lucifer designed MS-DOS to try
3800 Apache Lane NE men's souls.
St Anthony, MN 55421 Then he had a better idea...
From josha at mac.com Fri Mar 22 12:57:06 2002
From: josha at mac.com (Josh Aas)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322164804.A8519@io.stderr.net>
Message-ID:
Thanks a lot guys! Truthfully, I found a much better algorithm for my bigger
problem that does not involve a binary search, but it has been fun learning
about it. Surprisingly, it does not take that much time or even memory to
sort 2000000 strings. It makes no sense to me, but it works perfectly. For
the curious, my bigger problem was to take all the words in a 16 MB text
file (there was one word on each line) and remove all of the duplicates. I
wasn't using my head the first time around. Here's my app, that does all the
sorting and writes to file. I'm surprised that this works so quickly and
without fail... Anyone know how that sort routine finishes in about 6
seconds when given 2 million strings? When I used the algorithm that
involved binary searches, my app would have taken about 20 hours (as opposed
to seconds) to complete. Go perl!
-Josh
#!/usr/bin/perl
use Fcntl qw(:flock);    # provides LOCK_EX for the flock() call below

print "Loading file...\n";

# die() is used here because the script never defines &CgiDie
open (DATA, "/Users/Josh/Big_List.txt") || die ("Cannot open wordlist: $!");
my @data = <DATA>;
close (DATA);

print "Loading file done.\n";

print "Cleaning...\n";
foreach (@data) {
    chomp;
}
print "Done cleaning.\n";

print "Sorting...\n";
@data = sort @data;
print "Done sorting.\n";

$wordcount = scalar(@data);
print "There are ";
print $wordcount;
print " words in the file.\n";
$duplicates = 0;
$unique = 0;

for ($i = 0; $i < $wordcount; $i++) {
    $newdata[$unique] = $data[$i];
    $unique++;
    $b = 1;
    # count how many of the following entries repeat $data[$i]
    while (1) {
        if ($i + $b < $wordcount && $data[$i] eq $data[$i + $b]) {
            $b++;
        }
        else {
            # stop on the last copy; the loop's own $i++ then moves on to
            # the next distinct word instead of skipping it
            $i += $b - 1;
            $duplicates += $b - 1;
            last;
        }
    }
}

open (ADDFILE,">>/Users/josh/Good_Wordlist.txt") || die ("Cannot open edited wordlist for writing: $!");
flock(ADDFILE, LOCK_EX);
foreach (@newdata) {
    print ADDFILE "$_\n";
}
close (ADDFILE);

print "Duplicates removed: ";
print $duplicates;
print "\n";
print "Done!\n";

exit;
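On the question of why the sort is so quick: Perl's sort runs in compiled C inside the interpreter and needs on the order of n log n comparisons, so 2 million strings cost roughly 2,000,000 x 21, or about 42 million comparisons. For the duplicate removal itself, a common Perl idiom skips the sort entirely and uses a hash to remember what has been seen. The sketch below is illustrative only: unique_words and the sample words are made up, and it trades the sort for the memory needed to hold one hash key per distinct word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the distinct elements of the argument
# list, keeping the order in which each word first appears.
sub unique_words {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @words  = qw(pear apple pear fig apple);
my @unique = unique_words(@words);

print "@unique\n";                                    # pear apple fig
printf "Duplicates removed: %d\n", @words - @unique;  # Duplicates removed: 2
```

If sorted output is wanted as well, sorting the already de-duplicated list is the cheaper order of operations.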
From dion at almaer.com Fri Mar 22 14:31:45 2002
From: dion at almaer.com (Dion Almaer)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To:
Message-ID:
Why not use "uniq" (or even "sort -u") ?
D
From qglex at yahoo.com Fri Mar 22 14:59:34 2002
From: qglex at yahoo.com (Jeff Gleixner)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To:
Message-ID: <20020322205934.94503.qmail@web21108.mail.yahoo.com>
> For
> the curious, my bigger problem was to take all the words in a
> 16 MB text
> file (there was one word on each line) and remove all of the
> duplicates. [...]
Unless you need to use Perl for some specific reason (an
assignment for class? :-), you could simply use Unix utilities
already written to do what you need.
% sort < file | uniq > new_file
It won't get any faster than that.. :-)
See ya
From josha at mac.com Fri Mar 22 15:11:20 2002
From: josha at mac.com (Josh Aas)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322205934.94503.qmail@web21108.mail.yahoo.com>
Message-ID:
I realize I could do better, but making it faster than it is would be kind
of pointless. But I do appreciate the most elegant solution to problems
(shameless plug: that's why I use Mac OS X!). It wasn't for class - I've just
always used perl for everything text related. I usually just start writing
perl as opposed to digging for other command line tools. I should learn.
Thanks!
-Josh
From dion at almaer.com Fri Mar 22 16:13:05 2002
From: dion at almaer.com (Dion Almaer)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322205934.94503.qmail@web21108.mail.yahoo.com>
Message-ID:
or sort -u file > new_file
D
From ken at mathforum.org Mon Mar 25 17:16:27 2002
From: ken at mathforum.org (Ken Williams)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020322105912.A13477@trammell.dyndns.org>
Message-ID: <556019FC-4046-11D6-BE36-003065F6D85A@mathforum.org>
On Saturday, March 23, 2002, at 03:59 AM, John J. Trammell wrote:
> my $hash = sub
> {
> my ($x,$a) = @_;
> my %h = map { $_, 0 } @$a;
> return exists $h{$x};
> };
Hello from Australia.
Since you only care about the keys and not the values, it's generally
quicker to construct the hash like so:
my $hash = sub
{
    my ($x,$a) = @_;
    my %h;
    @h{@$a} = ();
    return exists $h{$x};
};
The real question is whether the original list would be better stored in
a hash in the first place, avoiding that conversion each time. The
answer is usually yes. In that case, it's *way* quicker.
-Ken
From ken at mathforum.org Mon Mar 25 17:23:34 2002
From: ken at mathforum.org (Ken Williams)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To:
Message-ID: <53B24E58-4047-11D6-BE36-003065F6D85A@mathforum.org>
Hey Josh,
Some pointers:
On Saturday, March 23, 2002, at 05:57 AM, Josh Aas wrote:
> #!/usr/bin/perl
>
> print "Loading file...\n";
>
> open (DATA, "/Users/Josh/Big_List.txt") || &CgiDie ("Cannot open
> wordlist.");
> my @data = <DATA>;
> close (DATA);
>
> print "Loading file done.\n";
>
> print "Cleaning...\n";
> foreach (@data) {
> chomp;
> }
> print "Done cleaning.\n";
Better to just do everything at once:
-------------------------------------------
print "Loading file...\n";
open (DATA, "/Users/Josh/Big_List.txt") || &CgiDie ("Cannot open
wordlist.");
chomp(my @data = sort <DATA>);
close (DATA);
print "Loading file, cleaning, & sorting done.\n";
-------------------------------------------
> $wordcount = scalar(@data);
> print "There are ";
> print $wordcount;
> print " words in the file.\n";
You can just do:
print "There are $wordcount words in the file.\n";
Oops - I was going to give some more pointers, but I have to go.
-Ken
From trammell at trammell.dyndns.org Tue Mar 26 11:46:55 2002
From: trammell at trammell.dyndns.org (John J. Trammell)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <556019FC-4046-11D6-BE36-003065F6D85A@mathforum.org>; from ken@mathforum.org on Tue, Mar 26, 2002 at 10:16:27AM +1100
References: <20020322105912.A13477@trammell.dyndns.org> <556019FC-4046-11D6-BE36-003065F6D85A@mathforum.org>
Message-ID: <20020326114655.A21967@trammell.dyndns.org>
On Tue, Mar 26, 2002 at 10:16:27AM +1100, Ken Williams wrote:
>
> Since you only care about the keys and not the values, it's generally
> quicker to construct the hash like so:
>
> my $hash = sub
> {
> my ($x,$a) = @_;
> my %h;
> @h{@$a} = ();
> return exists $h{$x};
> };
>
The results of the actual experiment (code below) surprised me. Is
there something wrong in my test maybe?
#!/usr/bin/perl -w
use strict;
use Benchmark;
sub __map
{
my ($a) = @_;
my %h = map { $_, 0 } @$a;
return \%h;
}
sub __slice
{
my ($a) = @_;
my %h;
@h{ @$a } = ();
return \%h;
}
foreach my $size ( 10, 100, 1000, 10_000 )
{
warn "$size-element array\n";
my @x = ( 1 .. $size );
timethese( 1_000_000, {
map => q[__map(\@x)],
slice => q[__slice(\@x)],
});
}
__END__
10-element array
Benchmark: timing 1000000 iterations of map, slice...
map: 3 wallclock secs ( 3.77 usr + 0.00 sys = 3.77 CPU) @ 265251.99/s (n=1000000)
slice: 3 wallclock secs ( 3.94 usr + 0.00 sys = 3.94 CPU) @ 253807.11/s (n=1000000)
100-element array
Benchmark: timing 1000000 iterations of map, slice...
map: 3 wallclock secs ( 3.81 usr + 0.00 sys = 3.81 CPU) @ 262467.19/s (n=1000000)
slice: 3 wallclock secs ( 3.94 usr + 0.00 sys = 3.94 CPU) @ 253807.11/s (n=1000000)
1000-element array
Benchmark: timing 1000000 iterations of map, slice...
map: 3 wallclock secs ( 3.87 usr + 0.00 sys = 3.87 CPU) @ 258397.93/s (n=1000000)
slice: 3 wallclock secs ( 3.97 usr + 0.00 sys = 3.97 CPU) @ 251889.17/s (n=1000000)
10000-element array
Benchmark: timing 1000000 iterations of map, slice...
map: 3 wallclock secs ( 3.89 usr + 0.00 sys = 3.89 CPU) @ 257069.41/s (n=1000000)
slice: 3 wallclock secs ( 4.06 usr + 0.00 sys = 4.06 CPU) @ 246305.42/s (n=1000000)
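A likely explanation for the flat timings, offered as a note rather than as part of the original test: Benchmark evaluates string code inside Benchmark.pm, so the string cannot see the lexical "my @x" declared in the script. There, \@x would refer to the package variable @main::x, which is empty, so every run builds an empty hash whatever $size is. A minimal sketch that passes code references instead (closures do capture the lexical) is shown here. The sub names and the small iteration count are assumptions chosen so the sketch runs quickly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Build a lookup hash with map: one key/value pair per element.
sub build_with_map {
    my ($list) = @_;
    my %h = map { $_ => 0 } @$list;
    return \%h;
}

# Build a lookup hash with a hash slice: keys only, values undef.
sub build_with_slice {
    my ($list) = @_;
    my %h;
    @h{@$list} = ();
    return \%h;
}

my $iterations = 2_000;    # assumption: small count, raise it for real timings

foreach my $size (10, 100, 1000) {
    my @x = (1 .. $size);
    print "$size-element array\n";
    # The anonymous subs close over the lexical @x, so each run
    # really processes $size elements.
    timethese($iterations, {
        map   => sub { build_with_map(\@x) },
        slice => sub { build_with_slice(\@x) },
    });
}
```

With code references the time per iteration should grow with the array size, which is a quick sanity check that the benchmark is measuring the intended work.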
> The real question is whether the original list would be better stored
> in a hash in the first place, avoiding that conversion each time. The
> answer is usually yes. In that case, it's *way* quicker.
Agreed.
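The point about storing the list in a hash from the start can be sketched as follows. This is only an illustration: the word list, the %is_word hash, and the in_list sub are hypothetical names, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical word list; in practice this would come from the file.
my @words = qw(apple fig pear);

# Build the lookup hash once, up front...
my %is_word;
@is_word{@words} = ();

# ...then each membership test is a single hash lookup,
# with no per-call conversion from array to hash.
sub in_list { return exists $is_word{ $_[0] } }

for my $candidate (qw(fig kiwi)) {
    printf "%s: %s\n", $candidate,
        in_list($candidate) ? "found" : "not found";
}
```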
--
I would like to see an anime where Love conquers all, then goes mad with the
power and sets up a repressive totalitarian regime that ruthlessly crushes
and oppresses all non-love. -- Frank Raymond Michaels
From autarch at urth.org Tue Mar 26 12:01:54 2002
From: autarch at urth.org (Dave Rolsky)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Tomorrow night
Message-ID:
So I'm assuming we're still on for tomorrow night.
So, we'll have a social meeting tomorrow night at 7PM at the Stonehouse.
Be there or be square (or a rectangle, at the very least).
Maybe next month we could have an actual presentation?
C'mon, one of you on this list must have done _something_ interesting
recently!
-dave
/*==================
www.urth.org
we await the New Sun
==================*/
From thomas at stderr.net Tue Mar 26 12:38:05 2002
From: thomas at stderr.net (Thomas Eibner)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Tomorrow night
In-Reply-To: ; from autarch@urth.org on Tue, Mar 26, 2002 at 12:01:54PM -0600
References:
Message-ID: <20020326193805.A44934@io.stderr.net>
On Tue, Mar 26, 2002 at 12:01:54PM -0600, Dave Rolsky wrote:
> So I'm assuming we're still on for tomorrow night.
>
> So, we'll have a social meeting tomorrow night at 7PM at the Stonehouse.
> Be there or be square (or a rectangle, at the very least).
Where was the Stonehouse again?
> Maybe next month we could have an actual presentation?
>
> C'mon, one of you on this list must have done _something_ interesting
> recently!
Not a chance. :)
--
Thomas Eibner DnsZone
mod_pointer
!(C)
Putting the HEST in .COM
From autarch at urth.org Tue Mar 26 15:32:54 2002
From: autarch at urth.org (Dave Rolsky)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Tomorrow night
In-Reply-To: <20020326193805.A44934@io.stderr.net>
Message-ID:
On Tue, 26 Mar 2002, Thomas Eibner wrote:
> Where was the Stonehouse again?
http://twincities.citysearch.com/profile/5593545/
/*==================
www.urth.org
we await the New Sun
==================*/
From Mark_Conty at cargill.com Tue Mar 26 16:23:28 2002
From: Mark_Conty at cargill.com (Mark Conty)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Local Perl training?
Message-ID: <200203262223.QAA14494@w408lynx.grain.cargill.com>
Someone here at work asked me about Perl training in-town, and I had to
shrug. I checked with majordomo@pm.org, but there doesn't appear to be an
archive for this mailing list, so I can't go back to see if this is
something that's been asked before. (Or is there an archive, but it's
stored elsewhere?)
I snooped around in www.pm.org, www.perl.org, and use.perl.org. In the
last, I found a reference to yet another site, perltraining.org, where
among other things, they had a list of about a dozen Perl training
institutions in the USA. Unfortunately, they were not grouped by state
or anything like that.
Looking in Yahoo, I found that our very own Euler Solutions (Dennis, are
you still on this list?) is supposed to offer some Perl training. Does
anyone know about this offering, or any other Perl training offering here
in the Cities?
Thanks...
--
Mark Conty
APS/NAGO IT Group
Server Team - MS 64
952-984-0503
From djb at tc.umn.edu Tue Mar 26 16:40:08 2002
From: djb at tc.umn.edu (Dave Bianchi)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Local Perl training?
References: <200203262223.QAA14494@w408lynx.grain.cargill.com>
Message-ID: <3CA0F8C8.20601BEE@tc.umn.edu>
Euler does offer "Essential Perl" and "CGI Programming using Perl",
according to their website (http://www.euler.com/). I'm not aware of
any other local Perl training, but you can go to the USENIX Technical
conference or the LISA conference and find Perl classes taught by Tom
Christiansen.
- Dave Bianchi
From ken at mathforum.org Tue Mar 26 16:44:39 2002
From: ken at mathforum.org (Ken Williams)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <20020326114655.A21967@trammell.dyndns.org>
Message-ID: <0E87FA68-410B-11D6-911B-003065F6D85A@mathforum.org>
Huh, funny.
Well, it might have something to do with the hash keys. Numeric hash
keys (especially sequential ones) notoriously trip up the perl hashing
algorithm, sometimes making everything go into a single bucket. I'm not
sure whether this has been patched in recent perls or not. Maybe try
with random words from /usr/dict/words ?
The other thing is that perl maintains a global table of hash keys so
that it doesn't have to re-hash a key that it's seen elsewhere in a
different context.
But come to think of it, maybe neither of these things affects this
benchmark. The Conventional Wisdom has been that slicing is faster for
this, but what does it know?
-Ken
From ytrah at sound.scc.net Tue Mar 26 17:26:11 2002
From: ytrah at sound.scc.net (Thomas A. Harty)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Local Perl training?
In-Reply-To: <200203262223.QAA14494@w408lynx.grain.cargill.com>
Message-ID:
Mark,
Hennepin Technical College offers a Perl course. It's actually a Perl/CGI
course, but should teach all the core information you would need.
I'm a little surprised to see Cargill is letting people use Perl around
the company. When I worked there for the CGO I and CGO II I couldn't get
anyone to sign off on using Perl. I had to write several CGIs in AWK.
Cheers,
Tom Harty
From trammell at trammell.dyndns.org Tue Mar 26 18:45:30 2002
From: trammell at trammell.dyndns.org (John J. Trammell)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: In a sorted list [better]
In-Reply-To: <0E87FA68-410B-11D6-911B-003065F6D85A@mathforum.org>; from ken@mathforum.org on Wed, Mar 27, 2002 at 09:44:39AM +1100
References: <20020326114655.A21967@trammell.dyndns.org> <0E87FA68-410B-11D6-911B-003065F6D85A@mathforum.org>
Message-ID: <20020326184530.A25174@trammell.dyndns.org>
On Wed, Mar 27, 2002 at 09:44:39AM +1100, Ken Williams wrote:
> Huh, funny.
>
> Well, it might have something to do with the hash keys. Numeric hash
> keys (especially sequential ones) notoriously trip up the perl hashing
> algorithm, sometimes making everything go into a single bucket. I'm not
> sure whether this has been patched in recent perls or not. Maybe try
> with random words from /usr/dict/words ?
Trying with randomly-generated nonsense words doesn't change the
flavor of the outcome (results posted below).
> The other thing is that perl maintains a global table of hash keys so
> that it doesn't have to re-hash a key that it's seen elsewhere in a
> different context.
I wonder if the flatness of the timing is related to that.
#!/usr/bin/perl -w
use strict;
use Benchmark;
sub __map
{
my ($a) = @_;
my %h = map { $_, 0 } @$a;
return \%h;
}
sub __slice
{
my ($a) = @_;
my %h;
@h{ @$a } = ();
return \%h;
}
foreach my $size ( 10, 100, 1000, 10_000 )
{
warn "$size-element array\n";
my @x = genarray($size);
warn "sample: @x[0..5]\n";
timethese( 1_000_000, {
map => q[ my $foo = __map(\@x) ],
slice => q[ my $bar = __slice(\@x) ],
});
}
sub genarray
{
my $size = shift;
my @a = ('a' .. 'z');
my @out;
for (1 .. $size)
{
push @out, join("", map $a[rand(@a)], 0 .. 2+rand(5));
}
return @out;
}
__END__
10-element array
sample: xfjpra awhtc kyu uylhkj ogctixb nzwj
Benchmark: timing 1000000 iterations of map, slice...
map: 4 wallclock secs ( 4.20 usr + 0.01 sys = 4.21 CPU) @ 237529.69/s (n=1000000)
slice: 5 wallclock secs ( 4.35 usr + 0.00 sys = 4.35 CPU) @ 229885.06/s (n=1000000)
100-element array
sample: hcc lggly hay ygdqh wlscd afdos
Benchmark: timing 1000000 iterations of map, slice...
map: 3 wallclock secs ( 4.08 usr + 0.00 sys = 4.08 CPU) @ 245098.04/s (n=1000000)
slice: 4 wallclock secs ( 4.39 usr + 0.00 sys = 4.39 CPU) @ 227790.43/s (n=1000000)
1000-element array
sample: vlkqrxe oebi msmc ghz xmkd vlcbrs
Benchmark: timing 1000000 iterations of map, slice...
map: 3 wallclock secs ( 4.12 usr + 0.00 sys = 4.12 CPU) @ 242718.45/s (n=1000000)
slice: 4 wallclock secs ( 4.37 usr + 0.00 sys = 4.37 CPU) @ 228832.95/s (n=1000000)
10000-element array
sample: evpo vsyprxl eidu bgqilvs piex zhb
Benchmark: timing 1000000 iterations of map, slice...
map: 3 wallclock secs ( 4.15 usr + 0.00 sys = 4.15 CPU) @ 240963.86/s (n=1000000)
slice: 4 wallclock secs ( 4.45 usr + 0.00 sys = 4.45 CPU) @ 224719.10/s (n=1000000)
From ken at mathforum.org Tue Mar 26 19:47:37 2002
From: ken at mathforum.org (Ken Williams)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Tomorrow night
In-Reply-To:
Message-ID: <9DB65B5E-4124-11D6-96EB-003065F6D85A@mathforum.org>
On Wednesday, March 27, 2002, at 05:01 AM, Dave Rolsky wrote:
> Maybe next month we could have an actual presentation?
>
> C'mon, one of you on this list must have done _something_ interesting
> recently!
Dave, how about a presentation on the Container architecture Mason's
using? It's kind of neat.
-Ken
From James.FitzGibbon at target.com Tue Mar 26 17:42:01 2002
From: James.FitzGibbon at target.com (James.FitzGibbon)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Local Perl training?
Message-ID:
We have had both really good and really bad experiences with Euler. Two of
our people went to Beginning Perl and had a great instructor named Jeff
Wolfe.
Then a few weeks later they went to Advanced Perl and had a terrible
instructor whose primary focus was CGI programming rather than Perl as a
generic language.
In discussions with our Euler account manager, we found out that Jeff Wolfe
is no longer training for them and the second (poor) instructor is now the
permanent Perl person at Euler. This is a shame because the other courses
we've sent people to Euler for (UNIX, KSH, etc.) have been really well
taught.
We have also found nothing locally; we plan on bringing someone from Tom
Christiansen's group in sometime in May/June. They charge $3000 a day for
up to 12 people; when you work it out (even allowing for T&E) the cost per
day per person is less than any of the established schools like Euler or
HOTT. For that caliber of training, it's a real bargain.
*** disclaimer *** the above (sans TCPC pricing) is my opinion, not my
employer's or client's. If someone has had a good experience with Euler's
Perl training recently I'd love to hear about it.
--
j.
James FitzGibbon
Consultant, Ajilon Services, TTS-3D@TPN4H
james.fitzgibbon@target.com
voice/fax 612-761-6121/4277
From autarch at urth.org Tue Mar 26 23:14:15 2002
From: autarch at urth.org (Dave Rolsky)
Date: Thu Aug 5 00:29:35 2004
Subject: [mplspm]: Tomorrow night
In-Reply-To: <9DB65B5E-4124-11D6-96EB-003065F6D85A@mathforum.org>
Message-ID:
On Wed, 27 Mar 2002, Ken Williams wrote:
>
> On Wednesday, March 27, 2002, at 05:01 AM, Dave Rolsky wrote:
> > Maybe next month we could have an actual presentation?
> >
> > C'mon, one of you on this list must have done _something_ interesting
> > recently!
>
> Dave, how about a presentation on the Container architecture Mason's
> using? It's kind of neat.
That might be interesting. Maybe I could present on other odds'n'ends
related to large-scale programming, like exceptions, Params::Validate,
etc.
-dave
/*==================
www.urth.org
we await the New Sun
==================*/
From james at ehlo.com Wed Mar 27 05:37:21 2002
From: james at ehlo.com (James FitzGibbon)
Date: Thu Aug 5 00:29:36 2004
Subject: [mplspm]: Tomorrow night
In-Reply-To:
References: <9DB65B5E-4124-11D6-96EB-003065F6D85A@mathforum.org>
Message-ID: <20020327113721.GA9938@ehlo.com>
* Dave Rolsky (autarch@urth.org) [020326 23:28]:
> That might be interesting. Maybe I could present on other odds'n'ends
> related to large-scale programming, like exceptions, Params::Validate,
> etc.
I'd definitely be interested in this. I think I've got the hang of
Exception::Class on a small scale, but ramping it up (as well as integrating
it with GBARR's Error) is proving to be a bit more difficult.
--
j.
From phil at archwing.com Thu Mar 28 09:01:07 2002
From: phil at archwing.com (Phil Platt)
Date: Thu Aug 5 00:29:36 2004
Subject: [mplspm]: Local Perl training with ArchWing
In-Reply-To:
Message-ID:
Perlmongers has had quite a focus on Perl training recently. At the risk of
being too commercial, I would like to let the group know about another local
option.
ArchWing Innovations offers training programs in selected areas of our
expertise. We currently offer the following Perl courses on a custom basis
to Twin Cities employers:
- Introduction to Perl and Web Programming
- Intermediate and Object-Oriented Perl
- Advanced Perl Objects, Patterns, and Web Programming
We can also help with one-on-one mentoring for really quick (What do I need
to accomplish right now!?) results.
Stan Kegel is the instructor for these offerings. Many will recall Stan's
presentation on Command Line Perl last fall.
If you would like some more details, please contact me directly. (We do not
have these details posted on our web site.)
ArchWing is a small local company with lots of connections to the OO
community. For example, last year we helped OTUG sponsor the Distinguished
Lecture Series.
Phil Platt
ArchWing Innovations LLC
1313 5th Street SE
Minneapolis MN 55414
www.ArchWing.com
612.379.2014
phil@ArchWing.com
"We do the Heavy Lifting for eBusiness."sm
From autarch at urth.org Thu Mar 28 21:13:58 2002
From: autarch at urth.org (Dave Rolsky)
Date: Thu Aug 5 00:29:36 2004
Subject: [mplspm]: Tomorrow night
In-Reply-To: <20020327113721.GA9938@ehlo.com>
Message-ID:
On Wed, 27 Mar 2002, James FitzGibbon wrote:
> I'd definitely be interested in this. I think I've got the hang of
> Exception::Class on a small scale, but ramping it up (as well as integrating
> it with GBARR's Error) is proving to be a bit more difficult.
Ok, here's a tentative outline for a presentation next time:
- Exception throwing and handling with Perl
-- Exception::Class - declare exceptions
-- Error(.pm) - try/catch
- Parameter validation
-- Params::Validate
-- Getargs::Long
-- Class::ParamParser
The latter two I'll probably just cover briefly because I don't really
like either of them ;)
- HTML::Mason::Container, which is basically a set of neat routines for
objects which contain/create other objects, in order to manage constructor
parameter validation and other cool things.
- ? Any requests along the lines of "more disciplined Perl programming"?
There's things like Class::Contract, Tie::SecureHash, and probably others
that might be of interest as well. I haven't used most of them, but I
could talk about a few if anybody has a great desire to hear me do so.
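To give a flavor of the first two outline items, here is a minimal sketch of exception objects and hand-rolled parameter checking using only core Perl. It deliberately does not use Exception::Class, Error.pm, or Params::Validate, and it does not reproduce their APIs; the MyApp::* classes and make_widget are hypothetical names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical exception class: a plain blessed hash, core Perl only.
package MyApp::Exception;
sub new {
    my ($class, %args) = @_;
    return bless { message => $args{message} }, $class;
}
sub throw   { my $class = shift; die $class->new(@_) }
sub message { return $_[0]{message} }

# Hypothetical subclass, so handlers can tell error kinds apart.
package MyApp::Exception::BadParam;
our @ISA = ('MyApp::Exception');

package main;
use Scalar::Util qw(blessed);

# Check named parameters by hand and throw an object on failure.
sub make_widget {
    my (%p) = @_;
    MyApp::Exception::BadParam->throw(message => "missing 'name'")
        unless defined $p{name};
    return { name => $p{name} };
}

# eval {} plays the role of "try"; inspecting $@ plays "catch".
my $widget = eval { make_widget(size => 3) };
if (my $err = $@) {
    if (blessed($err) && $err->isa('MyApp::Exception::BadParam')) {
        print "Bad parameter: ", $err->message, "\n";
    }
    else {
        die $err;    # not one of ours: rethrow
    }
}
```

Throwing objects rather than strings lets the caller dispatch on the class of the error instead of matching on message text.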
-dave
/*==================
www.urth.org
we await the New Sun
==================*/