From benjamin.j.hayes at exxonmobil.com Sun Aug 1 21:43:24 2010
From: benjamin.j.hayes at exxonmobil.com (benjamin.j.hayes at exxonmobil.com)
Date: Mon, 2 Aug 2010 15:43:24 +1100
Subject: [Melbourne-pm] pod2html problem on Windows
Message-ID:

Hi Perl Mongers,

I'm in the process of porting a build script from Solaris to Windows. The script packages up a collection of Perl scripts for distribution on our corporate network, and one of the things it does is to build all the POD into a nice, pretty set of html pages. The code all lives in my TFS workspace on my c: drive and I'm having trouble with pod2html accepting a path which contains a : (like in c:\). I discovered this is because pod2html (in Pod::Html) tries to split the podpath on the : character, so it crashes because it attempts to open a file called C. Of course this all worked perfectly on Solaris, where file paths are sans : characters. I tried replacing C:\ with \\$ENV{COMPUTERNAME}\c$, but it appears that only works if you have admin rights on the machine, which in this instance I don't. It seems inconceivable to me that pod2html doesn't work on Windows and I feel there must be a simple solution, but I have not been able to find it. Can anyone help?

Regards

Ben Hayes
Onsite Application Support Coordinator
ExxonMobil Technical Computing Company / Upstream IT
Upstream Technical Computing / UTC Applications / Application & Data Integration
Esso Australia Pty Ltd
Room 5.36, 12 Riverside Quay, Southbank, VIC 3006, Australia
Phone: +61-3-9270-3538  Fax: +61-3-9270-3600  E-mail: benjamin.j.hayes at exxonmobil.com

From alfiejohn at gmail.com Sun Aug 1 21:59:15 2010
From: alfiejohn at gmail.com (Alfie John)
Date: Mon, 2 Aug 2010 14:59:15 +1000
Subject: [Melbourne-pm] pod2html problem on Windows
In-Reply-To:
References:
Message-ID:

Hi Benjamin,

In Pod::Html, it looks like the following line is the offender:

  @Podpath = split(":", $opt_podpath) if defined $opt_podpath;

If you want a quick fix, you can edit in place and get it working by looking at $^O to see what system you're on. Otherwise, submit a patch that does it more portably.

Alfie

On Mon, Aug 2, 2010 at 2:43 PM, wrote:
> Hi Perl Mongers,
> [snip]
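A minimal sketch of the quick fix Alfie describes, assuming a patched copy would take ';' as the separator on Windows (that convention, and the variable names, come from the offending line quoted above, not from anything Pod::Html actually ships):

  # Sketch only: choose the podpath separator per platform, so that
  # drive letters like C: survive the split on Windows.
  my $sep = $^O eq 'MSWin32' ? ';' : ':';
  @Podpath = split(/\Q$sep\E/, $opt_podpath) if defined $opt_podpath;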
From benjamin.j.hayes at exxonmobil.com Sun Aug 1 22:13:48 2010
From: benjamin.j.hayes at exxonmobil.com (benjamin.j.hayes at exxonmobil.com)
Date: Mon, 2 Aug 2010 16:13:48 +1100
Subject: [Melbourne-pm] pod2html problem on Windows
In-Reply-To:
Message-ID:

Thanks Alfie,

The problem is that : is used as a delimiter to allow multiple paths to be passed in on the -podpath option. So it would be necessary to change the UI to use a different delimiter on Windows. I was hoping there might be some way to specify an escape character to tell split to ignore particular : characters, and that wouldn't involve changing Html.pm. pod2html has been around for years and I'm frankly amazed that it appears not to work on Windows, which gives a strong feeling that this is user error and I'm missing something....

Regards

Ben Hayes
Onsite Application Support Coordinator
ExxonMobil Technical Computing Company / Upstream IT
Upstream Technical Computing / UTC Applications / Application & Data Integration
Esso Australia Pty Ltd
Room 5.36, 12 Riverside Quay, Southbank, VIC 3006, Australia
Phone: +61-3-9270-3538  Fax: +61-3-9270-3600  E-mail: benjamin.j.hayes at exxonmobil.com

From: Alfie John
To: benjamin.j.hayes at exxonmobil.com
Cc: melbourne-pm at pm.org
Date: 02/08/2010 02:59 PM
Subject: Re: [Melbourne-pm] pod2html problem on Windows

Hi Benjamin,

In Pod::Html, it looks like the following line is the offender:

  @Podpath = split(":", $opt_podpath) if defined $opt_podpath;

[snip]
From alfiejohn at gmail.com Sun Aug 1 22:35:32 2010
From: alfiejohn at gmail.com (Alfie John)
Date: Mon, 2 Aug 2010 15:35:32 +1000
Subject: [Melbourne-pm] pod2html problem on Windows
In-Reply-To:
References:
Message-ID:

Hey again,

I think because there is no more info given to the split, you're out of luck. Maybe try subclassing Pod::Html and overriding parse_command_line() or scan_podpath() to do what you want. I know, it should do the right thing, being an old module. I guess most users either were on a Unix platform, or on a Windows box with their source on the same drive.

Alfie

On Mon, Aug 2, 2010 at 3:13 PM, wrote:
> Thanks Alfie,
>
> The problem is that : is used as a delimiter to allow multiple paths to be
> passed in on the -podpath option.
> [snip]
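Short of patching Html.pm, one workaround is to sidestep the drive letter entirely: chdir onto the drive and hand pod2html only relative paths, so no ':' from a path ever reaches the split. A rough, untested sketch (the directory and file names are invented; the argument style follows the Pod::Html synopsis):

  use Cwd ();
  use Pod::Html;

  # Sketch: with everything relative to the current directory,
  # the ':' delimiter in --podpath is unambiguous again.
  my $oldcwd = Cwd::getcwd();
  chdir 'C:\\tfs\\workspace' or die "chdir failed: $!";
  pod2html("pod2html",
           "--podroot=.",
           "--podpath=lib:scripts",   # relative paths, no drive letters
           "--infile=lib/Foo.pm",
           "--outfile=html/Foo.html");
  chdir $oldcwd or die "chdir back failed: $!";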
From toby.corkindale at strategicdata.com.au Wed Aug 4 22:50:40 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Thu, 05 Aug 2010 15:50:40 +1000
Subject: [Melbourne-pm] Melbourne Perl Mongers August meeting
Message-ID: <4C5A5130.2040706@strategicdata.com.au>

Good afternoon,
The next Melbourne Perl Mongers meeting will be held on Wednesday the 11th of August at 6:30pm. It will be hosted by Strategic Data:

Strategic Data
Level 2
51-55 Johnston Street
Fitzroy 3065

(After this I think we will be moving back into the CBD for some meetings.)

Talks for Wednesday are:
* David doing a talk on building and deploying native Win32 Perl applications, including the installers, IIS, user permissions, etc.

Has anyone else spoken to me about talks they could do? I'd love to see one on Padre - if no-one else steps up, then I'll at least install it and give a 5 minute demo at the meeting.

After the meeting we will retire to the nearby "Standard" hotel on Fitzroy street.

Cheers,
Toby

From toby.corkindale at strategicdata.com.au Tue Aug 10 18:01:47 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Wed, 11 Aug 2010 11:01:47 +1000
Subject: [Melbourne-pm] Melbourne Perl Mongers TONIGHT!
Message-ID: <4C61F67B.70409@strategicdata.com.au>

Good morning, Mongers!
The Melbourne Perl Mongers meeting will be held TONIGHT at 6:30pm. It will be hosted by Strategic Data:

Strategic Data
Level 2
51-55 Johnston Street
Fitzroy 3065

We'll provide some refreshments.

Talks for Wednesday are:
* David doing a talk on building and deploying native Win32 Perl applications, including the installers, IIS, user permissions, etc.
* Hamish will be taking us through the wonders of Padre.

After the meeting we will retire to the nearby "Standard" hotel on Fitzroy street.

Cheers,
Toby

From ddick at iinet.net.au Wed Aug 11 05:31:51 2010
From: ddick at iinet.net.au (David Dick)
Date: Wed, 11 Aug 2010 22:31:51 +1000
Subject: [Melbourne-pm] Melbourne Perl Mongers TONIGHT!
In-Reply-To: <4C61F67B.70409@strategicdata.com.au>
References: <4C61F67B.70409@strategicdata.com.au>
Message-ID: <4C629837.7010801@iinet.net.au>

On 11/08/10 11:01, Toby Corkindale wrote:
> Talks for Wednesday are:
> * David doing a talk on building and deploying native Win32 Perl
> applications, including the installers, IIS, user permissions, etc.

My talk has been uploaded to http://perl.net.au/wiki/Melbourne_Perl_Mongers/Meeting_History_2010_08 for future reference.
From david.tulloh at AirservicesAustralia.com Wed Aug 18 23:52:41 2010
From: david.tulloh at AirservicesAustralia.com (Tulloh, David)
Date: Thu, 19 Aug 2010 16:52:41 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
Message-ID:

Dear List,

As part of my work I have built several modules to handle data files. The idea is to hide the structure and messiness of the data file in a nice reusable module. This also allows the script to focus on the processing rather than the data format.

Unfortunately, while the method I have evolved towards meets these objectives reasonably well, I'm running into significant memory and speed problems with large data files. I have some ideas of ways to restructure it to improve this but all involve some uncomfortable compromises.

I was hoping some of the more experienced eyes on the list could look over my approach and make a few suggestions. Following is the basic module structure followed by usage examples.

David

package DataType;
use Moose;
use 5.010;
use MyTypes;

around BUILDARGS => sub {
    my ($orig, $class, $file) = @_;
    return $class->$orig(_file => $file);
};

has '_file' => (
    is       => 'ro',
    isa      => 'MyTypes::File',   # File handle, IO handle or filename
    coerce   => 1,
    required => 1,
    trigger  => \&_process_file,
);

sub _process_file {
    my ($this, $file) = @_;
    # Break file into entries (@entry_strings stands in for the real parsing)
    $this->_set_rows([map {DataType::Entry->new($_)} @entry_strings]);
}

# An easy optimisation is to store a hash of array refs where the
# key of the hash is the most commonly searched for string.  If
# there is no strong key candidate I just leave it as an array.
has '_rows' => (
    is      => 'ro',
    isa     => 'ArrayRef[DataType::Entry]',
    writer  => '_set_rows',
    default => sub {[]},
);

sub find {
    my ($this, %fields) = @_;
    my @possibles = @{$this->_rows};
    foreach my $k (keys %fields) {
        @possibles = grep {$_->$k ~~ $fields{$k}} @possibles;
    }
    return @possibles;
}

no Moose;
__PACKAGE__->meta->make_immutable;

package DataType::Entry;
use Moose;
use 5.010;

around BUILDARGS => sub {
    my ($orig, $class, $string) = @_;
    # Process string into structure (%structure is a placeholder for the parsed fields)
    return $class->$orig(%structure);
};

has [qw(field list)] => (
    is => 'ro',
);

no Moose;
__PACKAGE__->meta->make_immutable;

Examples of typical usage:

my $data = DataType->new($filename);

# Convert to a different data format (the empty blocks are placeholders)
say join "\n", map {} sort {} map {} $data->find;

# Loop through all data
foreach ($data->find) {}

# loop through a subset
foreach ($data->find(destination => "YSSY")) {}
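The keyed-index optimisation the comment above mentions might look something like the following. This is only a sketch: the 'destination' accessor is assumed from the YSSY example, and in a real module the hot key would presumably be chosen per data type.

  has '_by_destination' => (
      is      => 'ro',
      isa     => 'HashRef[ArrayRef[DataType::Entry]]',
      lazy    => 1,
      builder => '_build_by_destination',
  );

  # Build the index once; lookups on the hot key then skip the full scan.
  sub _build_by_destination {
      my ($this) = @_;
      my %index;
      push @{ $index{ $_->destination } }, $_ for @{ $this->_rows };
      return \%index;
  }

  sub find_by_destination {
      my ($this, $dest) = @_;
      return @{ $this->_by_destination->{$dest} // [] };
  }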
From toby.corkindale at strategicdata.com.au Thu Aug 19 00:15:22 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Thu, 19 Aug 2010 17:15:22 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To:
References:
Message-ID: <4C6CDA0A.2090003@strategicdata.com.au>

On 19/08/10 16:52, Tulloh, David wrote:
> Dear List,
>
> As part of my work I have built several modules to handle data files.
> The idea is to hide the structure and messiness of the data file in a
> nice reusable module. This also allows the script to focus on the
> processing rather than the data format.
>
> Unfortunately while the method I have evolved towards meets these
> objectives reasonably well I'm running into significant memory and speed
> problems with large data files. I have some ideas of ways to
> restructure it to improve this but all involve some uncomfortable
> compromises.
>
> I was hoping some of the more experienced eyes on the list could look
> over my approach and make a few suggestions.

Suggestion 1:
Perhaps you should import the data file into a database, then let the database do all the hard work for you? By all means put a layer over the DB interface so as to make it nice for people to use. You are running the risk of reinventing the wheel otherwise.

Suggestion 2:
If you want to stick with processing the file in situ, then you'll need to approach it with a streaming processor, rather than loading the whole thing into memory at once. Are you familiar with that concept?

Cheers,
Toby

From david.tulloh at AirservicesAustralia.com Thu Aug 19 00:35:22 2010
From: david.tulloh at AirservicesAustralia.com (Tulloh, David)
Date: Thu, 19 Aug 2010 17:35:22 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To: <4C6CDA0A.2090003@strategicdata.com.au>
References: <4C6CDA0A.2090003@strategicdata.com.au>
Message-ID:

On 19/08/10 17:15, Toby Corkindale wrote:
> Suggestion 1:
> Perhaps you should import the data file into a database, then let the
> database do all the hard work for you? By all means put a layer over the
> DB interface so as to make it nice for people to use.
> You are running the risk of reinventing the wheel otherwise.
>
> Suggestion 2:
> If you want to stick with processing the file in situ, then you'll need
> to approach it with a streaming processor, rather than loading the whole
> thing into memory at once.
> Are you familiar with that concept?

Thanks for the ideas.

My hesitation with the first suggestion is that a database felt like overkill for what are normally simple data structures. Ideally I would like all the data to be permanently kept in a database but that's unlikely to happen soon. I'll have another look into temporary SQLite databases as an option.

The catch with processing in situ is that often I want random access, and some file formats need at least one full pass (data and cancellation entries, for example). The more I ponder, the more I feel that my objectives are too broad for a single solution. Switching to a database for the complex messy data sets and streaming for the simpler ones may be the ticket. Possibly with a file size check early on.

David
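To make the streaming option concrete against the DataType interface earlier in the thread, a minimal sketch (it assumes _file coerces to an open read handle and that entries are separated by blank lines; neither is guaranteed by the code posted above):

  # Sketch: visit one entry at a time instead of filling _rows.
  sub each_entry {
      my ($this, $callback) = @_;
      my $fh = $this->_file;   # assumption: an open read handle
      local $/ = "";           # paragraph mode: a blank line ends a record
      while (my $chunk = <$fh>) {
          $callback->( DataType::Entry->new($chunk) );
      }
      return;
  }

  # usage:
  # $data->each_entry(sub { my ($entry) = @_; ... });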
From sam at nipl.net Sun Aug 22 18:14:27 2010
From: sam at nipl.net (Sam Watkins)
Date: Mon, 23 Aug 2010 01:14:27 +0000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To:
References: <4C6CDA0A.2090003@strategicdata.com.au>
Message-ID: <20100823011427.GB21113@nipl.net>

hi David,

When you say 'large' datasets, how large do you mean? I did experiment with using Perl for a toy full-text search system; it's quite capable of handling medium sized datasets (maybe 500MB) and querying and processing them very quickly.

I think if you have datasets that are smaller than your RAM, and you don't create too many unnecessary perl strings and objects, you should be able to process everything in perl if you prefer to do it like that. It may even outperform a general relational database.

Say for example you have 6,000,000 objects each with 10 fields. I would store the objects on disk in the manner of Debian packages files:

name: Sam
email: sam at ai.ki

name: Fred
email: fred at yahoo.com

Text files, key-value pairs, records terminated with a blank line.

I'm not sure as I haven't tried this, but you might find that loading each object into a single string, and parsing out the fields 'on demand' will save you a lot of memory and the program will run faster. IO and specifically swapping is what will kill your performance.

You will also need to create indexes of course (perl hash tables). If you are really running out of RAM, you could compress objects using Compress::Zlib or similar - or buy some more RAM!

I do like to use streaming systems where possible, but sometimes you want random access. You could also look at creating your indexes in RAM, but reading the object data from files, or perhaps using Berkeley DB for indexes if your indexes become too big for RAM. I'm not a big fan of SQL, but I do like the mathematical concept of relational databases.

Sam

From toby.corkindale at strategicdata.com.au Sun Aug 22 18:49:17 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Mon, 23 Aug 2010 11:49:17 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To: <20100823011427.GB21113@nipl.net>
References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net>
Message-ID: <4C71D39D.90500@strategicdata.com.au>

On 23/08/10 11:14, Sam Watkins wrote:
> I think if you have datasets that are smaller than your RAM, and you don't
> create too many unnecessary perl strings and objects, you should be able to
> process everything in perl if you prefer to do it like that. It may even
> outperform a general relational database.

Outperform, yes, but it won't scale well at all.

[snip example]

> I'm not sure as I haven't tried this, but you might find that loading each
> object into a single string, and parsing out the fields 'on demand' will save
> you a lot of memory and the program will run faster.

To both of you - I suggest you benchmark this suggestion before implementing your program around it. My intuition suggests you won't save that much memory with this approach. Perl scalars aren't as inefficient as you imagine.

> You will also need to create indexes of course (perl hash tables). If you are
> really running out of RAM, you could compress objects using Compress::Zlib or
> similar - or buy some more RAM!

Or you could use a lightweight db or NoSQL system, which has already implemented those features for you. Perhaps MongoDB or CouchDB would suit you?

You can keep buying RAM in the short-term, but what happens when your dataset gets 10x bigger? You stop being able to economically install more RAM quite quickly... whereas using a scalable approach will enable you to process more data at no cost and a more linear increase in time.

> I do like to use streaming systems where possible, but sometimes you want
> random access. You could also look at creating your indexes in RAM, but
> reading the object data from files, or perhaps using Berkeley DB for indexes
> if your indexes become too big for RAM. I'm not a big fan of SQL, but I do
> like the mathematical concept of relational databases.
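On the lightweight-database side, the temporary-SQLite option David mentions can even skip the file on disk: DBD::SQLite accepts an in-memory database. A sketch only (the table layout and @rows are invented for illustration):

  use DBI;

  # Sketch: load parsed rows into a throwaway in-memory database and
  # let SQL do the searching and indexing.
  my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                         { RaiseError => 1, AutoCommit => 1 });

  $dbh->do('CREATE TABLE entry (field TEXT, destination TEXT)');
  $dbh->do('CREATE INDEX entry_dest ON entry (destination)');

  my $ins = $dbh->prepare('INSERT INTO entry (field, destination) VALUES (?, ?)');
  $ins->execute($_->field, $_->destination) for @rows;   # @rows: parsed entries

  my $subset = $dbh->selectall_arrayref(
      'SELECT field FROM entry WHERE destination = ?', {}, 'YSSY');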
[snip] If you went down this road and were considering exchanging data with others, I'd suggest using either JSON or YAML, as they model rich data structures without the (full) overhead of XML. Doctrine & Propel frameworks for PHP use YAML for ORM schema & data representation. If you want something fast, which parses the data file once, use a stream based approach. You could handle your complex field requirements using a design pattern like SAX (see http://search.cpan.org/~grantm/XML-SAX-0.96/SAX/Intro.pod). If you are going to query the parsed data more often than parsing it, a database is the way to go (as per the worthy suggestions previously). If you want to go full geek, you could look at writing a BTree index for your file, and record characters position (1 index per use case) ;). Adrian. From toby.corkindale at strategicdata.com.au Sun Aug 22 21:31:32 2010 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Mon, 23 Aug 2010 14:31:32 +1000 Subject: [Melbourne-pm] Fwd: WebDev Warehouse Message-ID: <4C71F9A4.3010305@strategicdata.com.au> Hey guys, I'm not at all affiliated with them, but they asked to forward this on, and maybe it's useful to you.. Seems to be an incubator/shared-office place. -Toby ---------- Forwarded message ---------- From: Shaun Moss Date: 21 August 2010 19:04 Subject: The Webdev Warehouse launch party Hi guys I've updated the site at http://webdevwarehouse.com/ again. I added a "Stuff We Need" page (thanks Mark) and a photos gallery for those who didn't see the pics on facebook. Here are the details of the launch party again - I really hope to see a few of you there, your support would mean a lot. Did I mention free beer? If you think you will come please let me know so I have an idea about how much beer and pizza to organise. Location: 16a Linden Street, Brunswick East (see the website for a map and tram instructions) Time: Thursday, 26th August, 2010 from 18:30 Thanks! If you are on the relevant list, can you please forward this information on to the Melbourne Perlmongers, Mobile Monday, the Melbourne Ruby Group, and any other Melbourne-based coders groups who may be interested in something like this. Much appreciated! Shaun From shaun at astromultimedia.com Sun Aug 22 22:31:05 2010 From: shaun at astromultimedia.com (Shaun Moss) Date: Mon, 23 Aug 2010 15:31:05 +1000 Subject: [Melbourne-pm] Fwd: WebDev Warehouse In-Reply-To: <4C71F9A4.3010305@strategicdata.com.au> References: <4C71F9A4.3010305@strategicdata.com.au> Message-ID: <4C720799.6070000@astromultimedia.com> Hi guys This email was from me - I've just joined this list (my knowledge of Perl is rudimentary, but I like it!) Anyway if there are any freelancers out there looking for a place to work, please check out the website and feel free to come along on Thursday night. Cheers, Shaun On 2010-08-23 14:31, Toby Corkindale wrote: > Hey guys, > I'm not at all affiliated with them, but they asked to forward this > on, and maybe it's useful to you.. Seems to be an > incubator/shared-office place. > -Toby > > ---------- Forwarded message ---------- > From: Shaun Moss > Date: 21 August 2010 19:04 > Subject: The Webdev Warehouse launch party > > Hi guys > > I've updated the site at http://webdevwarehouse.com/ again. I added a > "Stuff We Need" page (thanks Mark) and a photos gallery for those who > didn't see the pics on facebook. > > Here are the details of the launch party again - I really hope to see > a few of you there, your support would mean a lot. Did I mention free > beer? 
If you think you will come please let me know so I have an idea > about how much beer and pizza to organise. > > Location: 16a Linden Street, Brunswick East (see the website for a map > and tram instructions) > Time: Thursday, 26th August, 2010 from 18:30 > > Thanks! If you are on the relevant list, can you please forward this > information on to the Melbourne Perlmongers, Mobile Monday, the > Melbourne Ruby Group, and any other Melbourne-based coders groups who > may be interested in something like this. Much appreciated! > > Shaun > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm > From daniel at rimspace.net Mon Aug 23 01:48:31 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Mon, 23 Aug 2010 18:48:31 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <4C71D39D.90500@strategicdata.com.au> (Toby Corkindale's message of "Mon, 23 Aug 2010 11:49:17 +1000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <4C71D39D.90500@strategicdata.com.au> Message-ID: <87iq31ww80.fsf@rimspace.net> Toby Corkindale writes: > On 23/08/10 11:14, Sam Watkins wrote: > >> I think if you have datasets that are smaller than your RAM, and you don't >> create too many unnecessary perl strings and objects, you should be able to >> process everything in perl if you prefer to do it like that. It may even >> outperform a general relational database. > > Outperform, yes, but it won't scale well at all. *nod* Everything is easy, and every algorithm is sufficient, for data smaller than core memory. Given that 24 to 96 GB of memory is possible for a dedicated home user today, that makes a lot of the old scaling problems go away. (Don't forget persistence, and hardware contention, though :) [...] >> You will also need to create indexes of course (perl hash tables). If you are >> really running out of RAM, you could compress objects using Compress::Zlib or >> similar - or buy some more RAM! > > Or you could use a lightweight db or NoSQL system, which has already > implemented those features for you. Perhaps MongoDB or CouchDB would suit > you? For something like this I would also seriously consider Riak; the main differences between Riak and the MongoDB/CouchDB models are in how they scale across systems. (Internal, invisible sharding vs replication, basically.) They all use JavaScript based map/reduce as their inherent data mining tools, and can generally deliver reasonably on exploiting data locally and the like. Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? made with 100 percent post-consumer electrons From scottp at dd.com.au Mon Aug 23 03:54:59 2010 From: scottp at dd.com.au (Scott Penrose) Date: Mon, 23 Aug 2010 20:54:59 +1000 Subject: [Melbourne-pm] Last chance for OSDC Presentations Message-ID: <939274CF-D205-421C-869C-FFDFF1256492@dd.com.au> Hi Melbourne PM Team OSDC is in Melbourne this year ! You all know this. But we have had very few Perl talks, and I have not noticed any Melbourne PM people talking. A golden opportunity to do a talk in Melbourne, not have to pay for travel, get free access to the conference. But... today is the last day for proposals - remember, it is only a proposal - the idea, not the finished paper. There has been lots of good talk, lots of good advice on Melbourne PM lately, lets see it as a talk. 
Scott From sam at nipl.net Mon Aug 23 21:48:33 2010 From: sam at nipl.net (Sam Watkins) Date: Tue, 24 Aug 2010 04:48:33 +0000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> Message-ID: <20100824044833.GA9921@nipl.net> On Mon, Aug 23, 2010 at 12:41:02PM +1000, Adrian Masters wrote: > David, > > [snip] > > Say for example you have 6,000,000 objects each with 10 fields. I would store > > the objects on disk in the manner of Debian packages files: > > > > name: Sam > > email: sam at ai.ki > > > > name: Fred > > email: fred at yahoo.com > > > > > > Text files, key-value pairs, records terminated with a blank line. > [snip] > > If you went down this road and were considering exchanging data with others, I'd suggest using either JSON or YAML The format I'm suggesting is like YAML-lite, without the kitchen sink, as used in email and http headers. The only addition over those is the blank-line as record separator. It's the same as debian package files. I think it's more than sufficient for practically any task, and it's an extremely Simple and Readable format. I don't know of a dataset that can't be expressed nicely like this. If you want more compactness, I would suggest going with TSV. Other formats like XML and even YAML and JSON are unnecessarily over-complicated in my opinion. Simplicity, Clarity, Generality!! http://www.informit.com/ShowCover.aspx?isbn=020161586X > If you want to go full geek, you could look at writing a BTree index for your file, and record characters position (1 index per use case) ;). I like that method :) The file is text, the BTree index can be regenerated from the file. I'd recommend using libdb4 for the index rather than coding your own BTrees unless you'd like to do that. The illustrious postfix does something like this for its map files, well actually I think it creates binary .db files from the text files, not indexes. Although I do prefer to avoid them, It very likely would be much easier to use an SQL database. Sam From sam at nipl.net Mon Aug 23 21:54:16 2010 From: sam at nipl.net (Sam Watkins) Date: Tue, 24 Aug 2010 04:54:16 +0000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <4C71D39D.90500@strategicdata.com.au> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <4C71D39D.90500@strategicdata.com.au> Message-ID: <20100824045416.GB9921@nipl.net> On Mon, Aug 23, 2010 at 11:49:17AM +1000, Toby Corkindale wrote: > On 23/08/10 11:14, Sam Watkins wrote: >> I think if you have datasets that are smaller than your RAM, and you don't >> create too many unnecessary perl strings and objects, you should be able to >> process everything in perl if you prefer to do it like that. It may even >> outperform a general relational database. > > Outperform, yes, but it won't scale well at all. True, I guess it depends whether your database is growing faster than Moore's law. I could keep some basic data on 100 million users all in RAM on my 2GB laptop. (name, email, DOB, password). Is the dataset bigger than that? > Or you could use a lightweight db or NoSQL system, which has already > implemented those features for you. > Perhaps MongoDB or CouchDB would suit you? Speaking of 'NoSQL' has anyone used the 'nosql' package in Debian? 
It provides a TSV based RDB system based on pipes and processors (unix-style tools). I really like this approach and prefer it compared to SQL databases. You can do nice unixy things with this sort of textual database, such as diff <(sort db1/table1) <(sort db2/table2) Sam From daniel at rimspace.net Mon Aug 23 23:05:38 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Tue, 24 Aug 2010 16:05:38 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <20100824044833.GA9921@nipl.net> (Sam Watkins's message of "Tue, 24 Aug 2010 04:48:33 +0000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> Message-ID: <87vd70sfyl.fsf@rimspace.net> Sam Watkins writes: > On Mon, Aug 23, 2010 at 12:41:02PM +1000, Adrian Masters wrote: >> David, >> >> [snip] >> > Say for example you have 6,000,000 objects each with 10 fields. I would store >> > the objects on disk in the manner of Debian packages files: [...] >> > Text files, key-value pairs, records terminated with a blank line. >> [snip] >> >> If you went down this road and were considering exchanging data with others, I'd suggest using either JSON or YAML > > The format I'm suggesting is like YAML-lite, without the kitchen sink, as > used in email and http headers. Ah. So, it is entirely insensitive to linear whitespace inline, are not LWS-preserving, have a limit of 998 and 78 characters total and per-line, possibly including or excluding LWS, in an implementation defined fashion, have case-insensitive and ASCII-only keys, and contains only ASCII characters without encoding in one of URL or RFC2047 MIME word format, then. Right? > The only addition over those is the blank-line as record separator. It's > the same as debian package files. Once you add that it becomes clearer. So, do you support the 'single period' syntax for whitespace inside a line-folded record, and the optional non-folded headers that Debian package control files do, or not? [...] > Other formats like XML and even YAML and JSON are unnecessarily > over-complicated in my opinion. Simplicity, Clarity, Generality!! Sadly, without defining what you mean that very vague description doesn't actually *specify* anything, just give a vague (and English/ASCII oriented) hint in the general direction of what you were thinking. Much as I hate, loath and detest much of the hype around it, the one thing that XML got right (which, naturally, it inherited from SGML) is that it actually specifies the details of how you process arbitrary data in that format. Most of the "simple" things either don't scale to cover the world, or don't actually specify enough that you end up with crazy, crazy things. (STOMP, I am lookin' right at you, here.) Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? 
made with 100 percent post-consumer electrons From sam at nipl.net Tue Aug 24 21:23:31 2010 From: sam at nipl.net (Sam Watkins) Date: Wed, 25 Aug 2010 04:23:31 +0000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <87vd70sfyl.fsf@rimspace.net> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> Message-ID: <20100825042331.GA24581@nipl.net> On Tue, Aug 24, 2010 at 04:05:38PM +1000, Daniel Pittman wrote: > > The format I'm suggesting is like YAML-lite, without the kitchen sink, as > > used in email and http headers. > > Ah. So, it is entirely insensitive to linear whitespace inline, are not > LWS-preserving, have a limit of 998 and 78 characters total and per-line, > possibly including or excluding LWS, in an implementation defined fashion, > have case-insensitive and ASCII-only keys, and contains only ASCII characters > without encoding in one of URL or RFC2047 MIME word format, then. > > Right? No. I assume you're being sarcastic and attempting to demostrate how unsimple the header formats are. I am impressed by your knowledge anyway! I use something simpler than that. If a particular application wants to reject long lines or specify an encoding, that's not my concern. > > The only addition over those is the blank-line as record separator. It's > > the same as debian package files. > > Once you add that it becomes clearer. So, do you support the 'single period' > syntax for whitespace inside a line-folded record, and the optional non-folded > headers that Debian package control files do, or not? I think it's useful to support multi-line values. The single period thing sounds reasonable, but I would probably go with simplicity over readability and just use a lone tab or indent to indicate a blank line in the middle of a value, like this (a bad example as addresses seldom contain blank lines!): address: Spry Street, Corburg North 3058 Given that any more value lines after such a blank line must be indented, and headers must not be indented, it's not really a visual problem to omit the period. The difficulty might be that some editors are reluctant to indent blank lines, no big problem I think. > > Other formats like XML and even YAML and JSON are unnecessarily > > over-complicated in my opinion. Simplicity, Clarity, Generality!! > > Sadly, without defining what you mean that very vague description doesn't > actually *specify* anything, just give a vague (and English/ASCII oriented) > hint in the general direction of what you were thinking. sure, this conversation is not a specification. The format I have in mind is crystal clear, simple and unambiguous, and I can supply parsers and formatters for it in perl if you like. > Much as I hate, loath and detest much of the hype around it, the one thing > that XML got right (which, naturally, it inherited from SGML) is that it > actually specifies the details of how you process arbitrary data in that > format. I do like plain simple XML for markup, that's what it's for. I do not like it as a hierarchical file format for storing records, that is a misuse of XML. The format I'm describing can hold values with arbitrary binary data (or text in any chosen encoding) without the need for any escaping or encoding. This is simple and comprehensive. 
It would normally be used with utf-8 encoded keys and data I suppose, but it would be acceptable to insert binary or differently-encoded data for certain particular keys. The application can interpret the values however it wishes. > Most of the "simple" things either don't scale to cover the world, or don't > actually specify enough that you end up with crazy, crazy things. (STOMP, > I am lookin' right at you, here.) So what are you saying, that I'm crazy, crazy? Which of my things are 'crazy, crazy'? Don't tell me they've got you maintaining some of my perl code? I don't understand your apparent hostility, the 'maintainer' conjecture is the only explanation that comes to mind. I think a data format which can be produced and parsed in say 10 lines of code, and is simple, clear and general, such a format is a lot less crazy that the crock of complexity and featuritis which is full-blown XML. Sam From dsk_gr at hotmail.com Tue Aug 24 22:39:11 2010 From: dsk_gr at hotmail.com (Kostas Avlonitis) Date: Wed, 25 Aug 2010 15:39:11 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <20100825042331.GA24581@nipl.net> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> Message-ID: <4C74AC7F.4060900@hotmail.com> [snip] >> I am lookin' right at you, here.) >> > [snip] Wow. Probably some background I'm not aware of. However I don't know when was the last time anyone convinced anyone else using irony and personal calling-out as a method - unless they're playing for the audience - or perhaps it didn't come across as intended. I think Sam's concept of a flat file is valid for a single-user setup even with even a medium volume of data. However there are potential problems: Normalisation and maintenance may become issues if the types are not strictly handled by the app or in a loose multi-programmer environment. Also, as previous posters said, probably not easily scalable as the back-end of even a medium multi-end-user setup (locking records, cross-referencing it with other data, adding fields, indexing, massive data-growth, multiple data-maintainers, reporting additions etc). DBs are a pain but not as much a pain as maintaining files in my experience. I guess it depends on the scale and breadth of the application, but I'm putting my vote on the side of when-in-doubt use a DB. Kostas From daniel at rimspace.net Wed Aug 25 05:48:25 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Wed, 25 Aug 2010 22:48:25 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <20100825042331.GA24581@nipl.net> (Sam Watkins's message of "Wed, 25 Aug 2010 04:23:31 +0000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> Message-ID: <87tymisvs6.fsf@rimspace.net> Sam Watkins writes: > On Tue, Aug 24, 2010 at 04:05:38PM +1000, Daniel Pittman wrote: >> > The format I'm suggesting is like YAML-lite, without the kitchen sink, as >> > used in email and http headers. >> >> Ah. 
So, it is entirely insensitive to linear whitespace inline, are not >> LWS-preserving, have a limit of 998 and 78 characters total and per-line, >> possibly including or excluding LWS, in an implementation defined fashion, >> have case-insensitive and ASCII-only keys, and contains only ASCII characters >> without encoding in one of URL or RFC2047 MIME word format, then. >> >> Right? > > No. I assume you're being sarcastic and attempting to demostrate how > unsimple the header formats are. I think mostly bitter, because "simple" formats usually don't turn out to be, and like CSV this is one of my least favorite. :) > I am impressed by your knowledge anyway! I use something simpler than that. > If a particular application wants to reject long lines or specify an > encoding, that's not my concern. *nod* My point was, in part, that it isn't as simple as it sounds, because HTTP headers and Email headers have a whole lot of really weird properties as a result of their history. So, yeah: for your own use, not a problem. Any problem is easy when you don't have to interoperate. It gets tricky when you add other people, because you never know which out of those we both might thing were in or out unless we actually discussed it. :) [...] >> > Other formats like XML and even YAML and JSON are unnecessarily >> > over-complicated in my opinion. Simplicity, Clarity, Generality!! >> >> Sadly, without defining what you mean that very vague description doesn't >> actually *specify* anything, just give a vague (and English/ASCII oriented) >> hint in the general direction of what you were thinking. > > sure, this conversation is not a specification. The format I have in mind > is crystal clear, simple and unambiguous, and I can supply parsers and > formatters for it in perl if you like. Nah: just make sure that, if you are documenting it, you do supply a strict specification with it ? because it is harder than it sounds. >> Much as I hate, loath and detest much of the hype around it, the one thing >> that XML got right (which, naturally, it inherited from SGML) is that it >> actually specifies the details of how you process arbitrary data in that >> format. > > I do like plain simple XML for markup, that's what it's for. I do not like it > as a hierarchical file format for storing records, that is a misuse of XML. *nod* SGML is terrible for structuring data. It is wonderful for doing basic markup, though, which coincidentally is what it was designed for initially. Who would have thought? [...] >> Most of the "simple" things either don't scale to cover the world, or don't >> actually specify enough that you end up with crazy, crazy things. (STOMP, >> I am lookin' right at you, here.) > > So what are you saying, that I'm crazy, crazy? > Which of my things are 'crazy, crazy'? Ah, no. Sorry. I was absolutely not calling you crazy, and I am sorry that I wasn't clear about that. No, I was calling the situation that grew up around STOMP crazy: because the specification was so loose, and poor, you end up with a whole lot of versions that don't work together, and all sorts of conventions you need to understand to make it work that are not in the "spec", but are in most real-world implementations. At that point you don't have any more a *simple* messaging protocol, but a crazy mess full of work-arounds and other nasty stuff. [...] 
> I think a data format which can be produced and parsed in say 10 lines of > code, and is simple, clear and general, such a format is a lot less crazy > that the crock of complexity and featuritis which is full-blown XML. Almost certainly. The trick is getting everyone who works with that data to agree on the *same* ten lines of code, and their interpretation. ;) Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? made with 100 percent post-consumer electrons From dsk_gr at hotmail.com Wed Aug 25 06:07:03 2010 From: dsk_gr at hotmail.com (Kostas Avlonitis) Date: Wed, 25 Aug 2010 23:07:03 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <184DF3C7-F44F-464A-8731-D31419FF9105@strategicdata.com.au> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> <4C74AC7F.4060900@hotmail.com> <184DF3C7-F44F-464A-8731-D31419FF9105@strategicdata.com.au> Message-ID: <4C751577.3010903@hotmail.com> On 25/08/2010 5:26 PM, Adam Clarke wrote: [snip] > The original quote was " (STOMP, I am lookin' right at you, here.)" So I think you'll find that the thing being looked at was STOMP not Sam. > > http://stomp.codehaus.org/ > > I suspect that STOMP couldn't care less :) > > Cheers > > -- > Adam Clarke > www.strategicdata.com.au > > ...ooops, that's embarrassing. I have to apologise to Daniel and the list here. Was not aware of the STOMP protocol - thought it was some kind of aggressive, pounding emphasis to the sentence. I'll now go back to lurking. K. From daniel at rimspace.net Wed Aug 25 22:06:21 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Thu, 26 Aug 2010 15:06:21 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <4C751577.3010903@hotmail.com> (Kostas Avlonitis's message of "Wed, 25 Aug 2010 23:07:03 +1000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> <4C74AC7F.4060900@hotmail.com> <184DF3C7-F44F-464A-8731-D31419FF9105@strategicdata.com.au> <4C751577.3010903@hotmail.com> Message-ID: <878w3u7yk2.fsf@rimspace.net> Kostas Avlonitis writes: > On 25/08/2010 5:26 PM, Adam Clarke wrote: > > [snip] >> The original quote was " (STOMP, I am lookin' right at you, here.)" So I >> think you'll find that the thing being looked at was STOMP not Sam. >> >> http://stomp.codehaus.org/ >> >> I suspect that STOMP couldn't care less :) [...] > ...ooops, that's embarrassing. I have to apologise to Daniel and the list > here. Was not aware of the STOMP protocol - thought it was some kind of > aggressive, pounding emphasis to the sentence. I'll now go back to lurking. Hey, don't be embarrassed: I managed to completely miscommunicate my intentions and all, so your error was trivial by comparison. Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? 
made with 100 percent post-consumer electrons From toby.corkindale at strategicdata.com.au Mon Aug 30 00:37:28 2010 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Mon, 30 Aug 2010 17:37:28 +1000 Subject: [Melbourne-pm] Melbourne Perl Mongers September meeting In-Reply-To: <4C5A5130.2040706@strategicdata.com.au> References: <4C5A5130.2040706@strategicdata.com.au> Message-ID: <4C7B5FB8.9080207@strategicdata.com.au> Good evening, The next Melbourne Perl Mongers meeting will be held on Wednesday the 8th of August, at 6:30pm. It will be hosted by David Dick at Remasys. Remasys Pty Ltd Level 1 180 Flinders St MELBOURNE VIC 3121 I don't think we have any talks lined up yet.. Does anyone have a topic they would like to speak about? Thanks, Toby