From andy at petdance.com Fri Sep 9 07:44:14 2011
From: andy at petdance.com (Andy Lester)
Date: Fri, 9 Sep 2011 09:44:14 -0500
Subject: [Chicago-talk] Mongo Chicago 2011
Message-ID: <79D22505-FC54-4AB3-B5FB-AA60AD2151C2@petdance.com>

Hello everyone,

Many of you are probably aware of this but I thought I'd send out a reminder in case you missed it. 10gen's Mongo Chicago 2011 is happening on October 18th. Registration is $50 if you register before September 20th. Proposals for presentations are accepted through September 15th. More details are available on 10gen's site.

Hope to see you there,
Seth

--
Please Note: If you hit "REPLY", your message will be sent to everyone on this mailing list (Chicago-MongoDB-User-Group-list at meetup.com)
This message was sent by Seth Mabbott (seth.mabbott at gmail.com) from Chicago MongoDB User Group.
To learn more about Seth Mabbott, visit his/her member profile
Meetup, PO Box 4668 #37895 New York, New York 10163-4668 | support at meetup.com

From andy at petdance.com Fri Sep 9 07:50:52 2011
From: andy at petdance.com (Andy Lester)
Date: Fri, 9 Sep 2011 09:50:52 -0500
Subject: [Chicago-talk] Postgres Open conference in Chicago, September 14-16, 2011
Message-ID: <1EF55713-DAC7-496E-976A-67C70D064F9F@petdance.com>

http://postgresopen.org/2011/home/

Postgres Open features use cases, latest developments in the open source database PostgreSQL, and a variety of speakers who will talk about applications, database performance, and the current state of the database market. Many of the speakers and attendees are Oracle, MS SQL, Informix and MySQL DBAs who have recently converted to PostgreSQL.
Our schedule is up at: http://postgresopen.org/2011/schedule/

We're also trying to bring Postgres *to* an existing open source and database community in Chicago, and connect deeply with folks who already use Postgres but maybe aren't in touch with key members of the development community.

Our conference is a non-profit, backed by a 501(c)3, and has a program committee made up of core PostgreSQL community members, experienced speakers and myself.

We chose our city based on the number of books related to Postgres that were sold there. Austin and Chicago are the two places that have sold the most books, but have never had a Postgres conference located there.

We'd love to see you at our conference. We're offering a $150 discount for user groups: http://postgresopen.org/2011/tickets/

Enter the following discount code: PUGLUV

Feel free to pass the code along to others in the local community.

Thanks so much, and hope to see you there.
-selena

--
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance

From sean at blanton.com Fri Sep 9 08:41:21 2011
From: sean at blanton.com (Sean Blanton)
Date: Fri, 9 Sep 2011 11:41:21 -0400
Subject: [Chicago-talk] Mongo Chicago 2011
In-Reply-To: <79D22505-FC54-4AB3-B5FB-AA60AD2151C2@petdance.com>
References: <79D22505-FC54-4AB3-B5FB-AA60AD2151C2@petdance.com>
Message-ID:

Thanks, Seth

Regards,
Sean

On Fri, Sep 9, 2011 at 10:44 AM, Andy Lester wrote:
> Hello everyone,
>
> Many of you are probably aware of this but I thought I'd send out a
> reminder in case you missed it. 10gen's Mongo Chicago 2011 is happening on
> October 18th. Registration is $50 if you register before September 20th.
> Proposals for presentations are accepted through September 15th. More
> details are available on 10gen's site.
>
> Hope to see you there,
> Seth
> --
> Please Note: If you hit "*REPLY*", your message will be sent to *everyone* on this mailing list
> (Chicago-MongoDB-User-Group-list at meetup.com)
> This message was sent by Seth Mabbott (seth.mabbott at gmail.com) from Chicago
> MongoDB User Group.
> To learn more about Seth Mabbott, visit his/her member profile
>
> Meetup, PO Box 4668 #37895 New York, New York 10163-4668 |
> support at meetup.com
>
> _______________________________________________
> Chicago-talk mailing list
> Chicago-talk at pm.org
> http://mail.pm.org/mailman/listinfo/chicago-talk
>

From richard at rushlogistics.com Sun Sep 11 13:14:28 2011
From: richard at rushlogistics.com (richard at rushlogistics.com)
Date: Sun, 11 Sep 2011 20:14:28 +0000
Subject: [Chicago-talk] Spliting an up undelimited file
Message-ID: <1002890549-1315772071-cardhu_decombobulator_blackberry.rim.net-930738286-@b17.c5.bise6.blackberry>

I have a text file that I need to split up so I can put it into a database. However, it isn't exactly delimited. The structure is as follows:

March 1, 2006 Few interruptions. Operations proceed as planed.
March 2, 2006 Delays due to bad weather and worker absences.
March 3, 2006 Significant progress. Few absences reported and agreeable weather.

I want to split each line into two scalars: the date and the event description. However, since it's not delimited, I'm not sure how to go about this. Any suggestions appreciated.
Watch our 3 minute movie: http://www.rushlogistics.com/movie

From jim at jimandkoka.com Sun Sep 11 13:19:25 2011
From: jim at jimandkoka.com (Jim Thomason)
Date: Sun, 11 Sep 2011 15:19:25 -0500
Subject: [Chicago-talk] Spliting an up undelimited file
In-Reply-To: <1002890549-1315772071-cardhu_decombobulator_blackberry.rim.net-930738286-@b17.c5.bise6.blackberry>
References: <1002890549-1315772071-cardhu_decombobulator_blackberry.rim.net-930738286-@b17.c5.bise6.blackberry>
Message-ID:

On Sun, Sep 11, 2011 at 3:14 PM, wrote:
> I have a text file that I need to split up so I can put it into a database. However, it isn't exactly delimited. The structure is as follows:
>
> March 1, 2006 Few interruptions. Operations proceed as planed.
> March 2, 2006 Delays due to bad weather and worker absences.
> March 3, 2006 Significant progress. Few absences reported and agreeable weather.
>
> I want to split each line into two scalars: the date and the event description. However, since it's not delimited, I'm not sure how to go about this. Any suggestions appreciated.

This still looks rigidly structured - "date" "space" "run of text"

    while (<>) {
        if (/(\w+ \d+, \d{4}) (.+)/) {
            my ($date, $memo) = ($1, $2);
            # do something interesting with $date and $memo
        }
    }

or something to that effect. Be more or less paranoid about the format of the month, date, and year as desired.

-Jim.....

From tigerpeng2001 at yahoo.com Mon Sep 12 07:03:25 2011
From: tigerpeng2001 at yahoo.com (tiger peng)
Date: Mon, 12 Sep 2011 07:03:25 -0700 (PDT)
Subject: [Chicago-talk] Spliting an up undelimited file
In-Reply-To: <1002890549-1315772071-cardhu_decombobulator_blackberry.rim.net-930738286-@b17.c5.bise6.blackberry>
References: <1002890549-1315772071-cardhu_decombobulator_blackberry.rim.net-930738286-@b17.c5.bise6.blackberry>
Message-ID: <1315836205.81348.YahooMailNeo@web120528.mail.ne1.yahoo.com>

Are the date format(s) known? Is there a limited set of event descriptions? Can you post some (made-up) sample data?
________________________________
From: "richard at rushlogistics.com"
To: chicago-talk at pm.org
Sent: Sunday, September 11, 2011 3:14 PM
Subject: [Chicago-talk] Spliting an up undelimited file

I have a text file that I need to split up so I can put it into a database. However, it isn't exactly delimited. The structure is as follows:

March 1, 2006 Few interruptions. Operations proceed as planed.
March 2, 2006 Delays due to bad weather and worker absences.
March 3, 2006 Significant progress. Few absences reported and agreeable weather.

I want to split each line into two scalars: the date and the event description. However, since it's not delimited, I'm not sure how to go about this. Any suggestions appreciated.

Watch our 3 minute movie: http://www.rushlogistics.com/movie

From Andy_Bach at wiwb.uscourts.gov Mon Sep 12 07:20:01 2011
From: Andy_Bach at wiwb.uscourts.gov (Andy_Bach at wiwb.uscourts.gov)
Date: Mon, 12 Sep 2011 09:20:01 -0500
Subject: [Chicago-talk] Spliting an up undelimited file
In-Reply-To:
References: <1002890549-1315772071-cardhu_decombobulator_blackberry.rim.net-930738286-@b17.c5.bise6.blackberry>
Message-ID:

> This still looks rigidly structured - "date" "space" "run of text"
>
> while (<>) {
>     if (/(\w+ \d+, \d{4}) (.+)/) {
>         my ($date, $memo) = ($1, $2);
>         # do something interesting with $date and $memo
>     }
> }

Yeah, and just to be safe, use whitespace metas (and /x - "readability") to get:

    if (/(\w+ \s+ \d+, \s+ \d+) \s+ (.+)/x) {

if there's a chance for variability, as w/ those logs that outdent the single digit date number

    March  8
    March  9
    March 10

and add an 'else' if you want to worry about bad data.
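
For anyone following along, here is a minimal sketch that puts the two suggestions above together. It assumes one record per line in "Month D, YYYY text" form. The sub name parse_line and the sample lines are made up for illustration (the lines are modeled on the ones Richard posted); they are not from anyone's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one line into
# (date, description), or return an empty list if it does not match.
sub parse_line {
    my ($line) = @_;
    # /x lets us space the pattern out; \s+ tolerates padded days.
    if ($line =~ /^ \s* (\w+ \s+ \d{1,2}, \s+ \d{4}) \s+ (.+?) \s* $/x) {
        my ($date, $memo) = ($1, $2);
        $date =~ s/\s+/ /g;    # collapse padding such as "March  8, 2006"
        return ($date, $memo);
    }
    return;                    # the 'else' case: bad data
}

# Made-up sample input: one normal line, one padded day, one bad line.
my @lines = (
    'March 2, 2006 Delays due to bad weather and worker absences.',
    'March  8, 2006 Significant progress.',
    'no date on this line',
);

for my $line (@lines) {
    if (my ($date, $memo) = parse_line($line)) {
        print "date=[$date] memo=[$memo]\n";
    }
    else {
        warn "skipping unparsable line: $line\n";
    }
}
```

Tighten or loosen the month, day, and year parts of the pattern to taste; if the month names are known in advance, an explicit alternation would be stricter than \w+.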
a
----------------------
Andy Bach
Systems Mangler
Internet: andy_bach at wiwb.uscourts.gov
Voice: (608) 261-5738, Cell: (608) 658-1890

"One of the most striking differences between a cat and a lie is that a cat has only nine lives."
Mark Twain, Vice President, American Anti-Imperialist League, and erstwhile writer

From vjcang at gmail.com Wed Sep 14 01:50:14 2011
From: vjcang at gmail.com (Vijay Kumar)
Date: Wed, 14 Sep 2011 04:50:14 -0400
Subject: [Chicago-talk] Simulating "Save Link As" in Perl
Message-ID:

Hi,

When I access the $binaryfile_url below (some URL pointing to a binary file) from a web browser, I get HTTP Error 404.
However, I can save the binary file to my hard disk by right-clicking the URL and selecting 'Save Link As'.

Now, when I try this with LWP::Simple

    my $status=getstore($binaryfile_url, $download_file_fullpath);

it fails with the same 404 error.

I want to simulate the 'Save Link As' behavior of the web browser to download it programmatically from Perl. Any ideas?

Thanks a lot
VIJAY

From vjcang at gmail.com Wed Sep 14 02:14:26 2011
From: vjcang at gmail.com (Vijay Kumar)
Date: Wed, 14 Sep 2011 05:14:26 -0400
Subject: [Chicago-talk] Simulating "Save Link As" in Perl
In-Reply-To:
References:
Message-ID:

Apologies. Please ignore this mail. I made a mistake in my testing. It works.

Thanks
VIJAY

On 14 September 2011 04:50, Vijay Kumar wrote:
> Hi,
>
> When I access the $binaryfile_url below (some URL pointing to a binary file)
> from a web browser, I get HTTP Error 404.
> However, I can save the binary file to my hard disk by right-clicking the
> URL and selecting 'Save Link As'.
>
> Now, when I try this with LWP::Simple
> my $status=getstore($binaryfile_url, $download_file_fullpath);
> it fails with the same 404 error.
>
> I want to simulate the 'Save Link As' behavior of the web browser to
> download it programmatically from Perl. Any ideas?
>
> Thanks a lot
> VIJAY
>

--
VIJAY

From michael at potter.name Thu Sep 15 08:18:16 2011
From: michael at potter.name (Michael Potter)
Date: Thu, 15 Sep 2011 11:18:16 -0400
Subject: [Chicago-talk] Mechanical Turk
Message-ID:

Perl Crew,

I have been called upon to try to do "OCR" on handwriting.
In particular, I need to convert a handwritten name to ASCII. I could provide a small .tif with just the name in it.
It came to mind that this might be a good use of Mechanical Turk.
I am sending this to the Perl list because I seem to recall some of the Mongers have worked with Mechanical Turk.

Here are my specific questions:
1) How long is the typical turnaround for a response?
2) Is this a reasonable task for Mechanical Turk?

I looked at the Amazon website for HITs similar to what I am trying to do. I did not find any, but I question my ability to search completely. The closest I found was business card transcription.

Your comments are welcome.

--
Michael Potter
Replatform Technologies, LLC
+1 770 815 6142
michael at potter.name

From joel.a.berger at gmail.com Thu Sep 15 12:26:13 2011
From: joel.a.berger at gmail.com (Joel Berger)
Date: Thu, 15 Sep 2011 14:26:13 -0500
Subject: [Chicago-talk] Mechanical Turk
In-Reply-To:
References:
Message-ID:

Have you tried OCRing programmatically?
http://search.cpan.org/search?mode=all&query=ocr

How have the results been? It seems that if you could eliminate the easy ones and perhaps only shift the problematic ones to mTurk that would be cheaper.

Joel

On Thu, Sep 15, 2011 at 10:18 AM, Michael Potter wrote:
> Perl Crew,
> I have been called upon to try to do "OCR" on handwriting.
> In particular, I need to convert a handwritten name to ASCII. I could
> provide a small .tif with just the name in it.
> It came to mind that this might be a good use of Mechanical Turk.
> I am sending this to the Perl list because I seem to recall some of the
> Mongers have worked with Mechanical Turk.
> Here are my specific questions:
> 1) How long is the typical turnaround for a response?
> 2) Is this a reasonable task for Mechanical Turk?
> I looked at the Amazon website for HITs similar to what I am trying to do.
> I did not find any, but I question my ability to search completely. The
> closest I found was business card transcription.
> Your comments are welcome.
> --
> Michael Potter
> Replatform Technologies, LLC
> +1 770 815 6142
> michael at potter.name
>

From michael at potter.name Thu Sep 15 13:01:16 2011
From: michael at potter.name (Michael Potter)
Date: Thu, 15 Sep 2011 16:01:16 -0400
Subject: [Chicago-talk] Mechanical Turk
In-Reply-To:
References:

Message-ID:

Yes, we are using tesseract-3.00 for OCR of the computer-printed text.

We are going to try to get tesseract trained to read handwritten block letters, but I am not holding out a lot of hope that it will work.

I am researching the next best option, which might be Mechanical Turk.

On Thu, Sep 15, 2011 at 3:26 PM, Joel Berger wrote:
> Have you tried OCRing programmatically?
> http://search.cpan.org/search?mode=all&query=ocr
>
> How have the results been? It seems that if you could eliminate the
> easy ones and perhaps only shift the problematic ones to mTurk that
> would be cheaper.
>
> Joel
>
> On Thu, Sep 15, 2011 at 10:18 AM, Michael Potter wrote:
> > Perl Crew,
> > I have been called upon to try to do "OCR" on handwriting.
> > In particular, I need to convert a handwritten name to ASCII. I could
> > provide a small .tif with just the name in it.
> > It came to mind that this might be a good use of Mechanical Turk.
> > I am sending this to the Perl list because I seem to recall some of the
> > Mongers have worked with Mechanical Turk.
> > Here are my specific questions:
> > 1) How long is the typical turnaround for a response?
> > 2) Is this a reasonable task for Mechanical Turk?
> > I looked at the Amazon website for HITs similar to what I am trying to do.
> > I did not find any, but I question my ability to search completely. The
> > closest I found was business card transcription.
> > Your comments are welcome.
> > --
> > Michael Potter
> > Replatform Technologies, LLC
> > +1 770 815 6142
> > michael at potter.name
> >
>

--
Michael Potter
Replatform Technologies, LLC
+1 770 815 6142
michael at potter.name

From michael at potter.name Thu Sep 15 14:39:19 2011
From: michael at potter.name (Michael Potter)
Date: Thu, 15 Sep 2011 17:39:19 -0400
Subject: [Chicago-talk] Mechanical Turk
In-Reply-To:
References:

Message-ID:

Here are a couple more comments:

Errors are not a big deal. We already deal with typos in names all the time. To check, I think I would run each name twice and, if the two results did not match closely, run it a third time.

The names are not sensitive. The stranger would know that somewhere in the world a person lived named "Ruth Smith". Not a big deal. If at some time in the future someone decides that it is a big deal, I will run one HIT for the first name and another HIT for the last name.

Anyone know the trick to embedding the image in the HIT? From what I read I need to provide a URL to the image, but I would rather have the image embedded in the request. Seems easier to control security.

On Thu, Sep 15, 2011 at 4:01 PM, Michael Potter wrote:
> Yes, we are using tesseract-3.00 for OCR of the computer-printed text.
>
> We are going to try to get tesseract trained to read handwritten block
> letters, but I am not holding out a lot of hope that it will work.
>
> I am researching the next best option, which might be Mechanical Turk.
>
>
> On Thu, Sep 15, 2011 at 3:26 PM, Joel Berger wrote:
>
>> Have you tried OCRing programmatically?
>> http://search.cpan.org/search?mode=all&query=ocr
>>
>> How have the results been? It seems that if you could eliminate the
>> easy ones and perhaps only shift the problematic ones to mTurk that
>> would be cheaper.
>>
>> Joel
>>
>> On Thu, Sep 15, 2011 at 10:18 AM, Michael Potter wrote:
>> > Perl Crew,
>> > I have been called upon to try to do "OCR" on handwriting.
>> > In particular, I need to convert a handwritten name to ASCII. I could
>> > provide a small .tif with just the name in it.
>> > It came to mind that this might be a good use of Mechanical Turk.
>> > I am sending this to the Perl list because I seem to recall some of the
>> > Mongers have worked with Mechanical Turk.
>> > Here are my specific questions:
>> > 1) How long is the typical turnaround for a response?
>> > 2) Is this a reasonable task for Mechanical Turk?
>> > I looked at the Amazon website for HITs similar to what I am trying to do.
>> > I did not find any, but I question my ability to search completely. The
>> > closest I found was business card transcription.
>> > Your comments are welcome.
>> > --
>> > Michael Potter
>> > Replatform Technologies, LLC
>> > +1 770 815 6142
>> > michael at potter.name
>> >
>>
>
> --
> Michael Potter
> Replatform Technologies, LLC
> +1 770 815 6142
> michael at potter.name
>

--
Michael Potter
Replatform Technologies, LLC
+1 770 815 6142
michael at potter.name

From selenamarie at gmail.com Sun Sep 18 20:03:15 2011
From: selenamarie at gmail.com (Selena Deckelmann)
Date: Sun, 18 Sep 2011 22:03:15 -0500
Subject: [Chicago-talk] Slides from PostgreSQL 9.1 talk
In-Reply-To:
References:
Message-ID:

Hello Perlmongers!

Thanks for hosting me at BofA in Chicago last week. Stephen Frost, David Wheeler and I had a blast.

Here's a shortlink to the slides: http://chesnok.com/u/4U

I made one mistake in the discussion that I can correct now - unlogged tables *are* preserved after a clean shutdown as of the 9.1 release. Stephen and I discussed the issue with the author of the feature during Postgres Open, and he let us know that a long discussion happened about what the preferred default behavior should be, and those who thought clean shutdown *should not cause a truncate* prevailed.

The slides are updated to reflect that.

Thanks again!
-selena

--
http://chesnok.com

From richard at rushlogistics.com Wed Sep 21 17:28:59 2011
From: richard at rushlogistics.com (Richard Reina)
Date: Wed, 21 Sep 2011 20:28:59 -0400 (EDT)
Subject: [Chicago-talk] data mining
Message-ID: <20110922002859.A20EA611@swiftsure.xo.com>

I am hoping to create a US geography database comprised of information about US towns and cities. I was able to get the populations of all incorporated towns from a US census file into a table. However, I am hoping to create a table with facts (or trivia) about the towns themselves (large and small) -- when they were founded, what they're known for (if anything). Writing a Perl program that uses regexes to search the web is probably beyond my skills. However, I am wondering whether such a thing is possible and, if so, how hard it is. I looked on CPAN but confess I really am not sure if I would recognise what I need if I saw it. Any ideas on how or IF this can be done would be greatly appreciated.

Thanks,

Richard
--
Richard Reina
Rush Logistics, Inc.
Watch our 3 minute movie:
http://www.rushlogistics.com/movie

From don at drakeconsulting.com Wed Sep 21 17:49:51 2011
From: don at drakeconsulting.com (Don Drake)
Date: Wed, 21 Sep 2011 19:49:51 -0500
Subject: [Chicago-talk] data mining
In-Reply-To: <20110922002859.A20EA611@swiftsure.xo.com>
References: <20110922002859.A20EA611@swiftsure.xo.com>
Message-ID: <97F7107D-B647-409A-9A22-381659F97813@drakeconsulting.com>

I would look here for data:

https://simplegeo.com/products/context/#11.00/41.8639/-87.6091

or

http://www.factual.com

And use their APIs to get the data you need.

-Don

--
Don Drake
www.drakeconsulting.com
www.maillaunder.com
312-560-1574
800-733-2143

On Sep 21, 2011, at 7:28 PM, Richard Reina wrote:

> I am hoping to create a US geography database comprised of information about US towns and cities. I was able to get the populations of all incorporated towns from a US census file into a table.
> However, I am hoping to create a table with facts (or trivia) about the towns themselves (large and small) -- when they were founded, what they're known for (if anything). Writing a Perl program that uses regexes to search the web is probably beyond my skills. However, I am wondering whether such a thing is possible and, if so, how hard it is. I looked on CPAN but confess I really am not sure if I would recognise what I need if I saw it. Any ideas on how or IF this can be done would be greatly appreciated.
>
> Thanks,
>
> Richard
> --
> Richard Reina
> Rush Logistics, Inc.
> Watch our 3 minute movie:
> http://www.rushlogistics.com/movie

From dcmertens.perl at gmail.com Thu Sep 22 06:40:30 2011
From: dcmertens.perl at gmail.com (David Mertens)
Date: Thu, 22 Sep 2011 08:40:30 -0500
Subject: [Chicago-talk] data mining
In-Reply-To: <20110922002859.A20EA611@swiftsure.xo.com>
References: <20110922002859.A20EA611@swiftsure.xo.com>
Message-ID:

Richard, you said:

> Writing a Perl program that uses regexes to search the web is probably beyond my skills.

Can you elaborate on that? To what would you apply such a regex? Were you thinking about doing an all-out web crawl, then parsing the output to find relevant information about a given city? Are you looking for an existing database that you can tap? (If the latter, Don's suggestions look pretty good. See p3rl.org/Geo::Coder::SimpleGeo or p3rl.org/Net::HTTP::Factual.)

If you want an all-out web crawl, generating your data from the web from scratch, I can imagine putting something together with WWW::Mechanize to do the crawl. Determining which page has relevant---and authoritative---information about a city, and then managing to extract that information, can get very complicated.
What are your resources? What is your timeline? What is your expertise in data mining?

David