From benjamin.j.hayes at exxonmobil.com Sun Aug 1 21:43:24 2010
From: benjamin.j.hayes at exxonmobil.com (benjamin.j.hayes at exxonmobil.com)
Date: Mon, 2 Aug 2010 15:43:24 +1100
Subject: [Melbourne-pm] pod2html problem on Windows
Message-ID:

Hi Perl Mongers,

I'm in the process of porting a build script from Solaris to Windows. The script packages up a collection of Perl scripts for distribution on our corporate network, and one of the things it does is to build all the POD into a nice, pretty set of html pages. The code all lives in my TFS workspace on my c: drive and I'm having trouble with pod2html accepting a path which contains a : (like in c:\). I discovered this is because pod2html (in Pod::Html) tries to split the podpath on the : character, so it crashes because it attempts to open a file called C. Of course this all worked perfectly on Solaris, where file paths are sans : characters. I tried replacing C:\ with \\$ENV{COMPUTERNAME}\c$, but it appears that only works if you have admin rights on the machine, which in this instance I don't. It seems inconceivable to me that pod2html doesn't work on Windows and I feel there must be a simple solution, but I have not been able to find it. Can anyone help?

Regards

Ben Hayes
Onsite Application Support Coordinator
ExxonMobil Technical Computing Company / Upstream IT
Upstream Technical Computing / UTC Applications / Application & Data Integration
Esso Australia Pty Ltd
Room 5.36, 12 Riverside Quay, Southbank, VIC 3006, Australia
Phone: +61-3-9270-3538  Fax: +61-3-9270-3600  E-mail: benjamin.j.hayes at exxonmobil.com

From alfiejohn at gmail.com Sun Aug 1 21:59:15 2010
From: alfiejohn at gmail.com (Alfie John)
Date: Mon, 2 Aug 2010 14:59:15 +1000
Subject: [Melbourne-pm] pod2html problem on Windows
In-Reply-To:
References:
Message-ID:

Hi Benjamin,

In Pod::Html, it looks like the following line is the offender:

  @Podpath = split(":", $opt_podpath) if defined $opt_podpath;

If you want a quick fix, you can edit in place and get it working by looking at $^O to see what system you're on. Otherwise, submit a patch that does it more portably.

Alfie

On Mon, Aug 2, 2010 at 2:43 PM, wrote:
> Hi Perl Mongers,
> [snip]
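A minimal sketch of the quick fix Alfie describes, assuming a patched copy would take ';' as the separator on Windows (that convention, and the variable names, come from the offending line quoted above, not from anything Pod::Html actually ships):

  # Sketch only: choose the podpath separator per platform, so that
  # drive letters like C: survive the split on Windows.
  my $sep = $^O eq 'MSWin32' ? ';' : ':';
  @Podpath = split(/\Q$sep\E/, $opt_podpath) if defined $opt_podpath;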
From benjamin.j.hayes at exxonmobil.com Sun Aug 1 22:13:48 2010
From: benjamin.j.hayes at exxonmobil.com (benjamin.j.hayes at exxonmobil.com)
Date: Mon, 2 Aug 2010 16:13:48 +1100
Subject: [Melbourne-pm] pod2html problem on Windows
In-Reply-To:
Message-ID:

Thanks Alfie,

The problem is that : is used as a delimiter to allow multiple paths to be passed in on the -podpath option. So it would be necessary to change the UI to use a different delimiter on Windows. I was hoping there might be some way to specify an escape character to tell split to ignore particular : characters, and that wouldn't involve changing Html.pm. pod2html has been around for years and I'm frankly amazed that it appears not to work on Windows, which gives a strong feeling that this is user error and I'm missing something....

Regards

Ben Hayes
Onsite Application Support Coordinator
ExxonMobil Technical Computing Company / Upstream IT
Upstream Technical Computing / UTC Applications / Application & Data Integration
Esso Australia Pty Ltd
Room 5.36, 12 Riverside Quay, Southbank, VIC 3006, Australia
Phone: +61-3-9270-3538  Fax: +61-3-9270-3600  E-mail: benjamin.j.hayes at exxonmobil.com

From: Alfie John
To: benjamin.j.hayes at exxonmobil.com
Cc: melbourne-pm at pm.org
Date: 02/08/2010 02:59 PM
Subject: Re: [Melbourne-pm] pod2html problem on Windows

Hi Benjamin,

In Pod::Html, it looks like the following line is the offender:

  @Podpath = split(":", $opt_podpath) if defined $opt_podpath;

[snip]
From alfiejohn at gmail.com Sun Aug 1 22:35:32 2010
From: alfiejohn at gmail.com (Alfie John)
Date: Mon, 2 Aug 2010 15:35:32 +1000
Subject: [Melbourne-pm] pod2html problem on Windows
In-Reply-To:
References:
Message-ID:

Hey again,

I think because there is no more info given to the split, you're out of luck. Maybe try subclassing Pod::Html and overriding parse_command_line() or scan_podpath() to do what you want. I know, it should do the right thing, being an old module. I guess most users either were on a Unix platform, or on a Windows box with their source on the same drive.

Alfie

On Mon, Aug 2, 2010 at 3:13 PM, wrote:
> Thanks Alfie,
>
> The problem is that : is used as a delimiter to allow multiple paths to be
> passed in on the -podpath option.
> [snip]
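Short of patching Html.pm, one workaround is to sidestep the drive letter entirely: chdir onto the drive and hand pod2html only relative paths, so no ':' from a path ever reaches the split. A rough, untested sketch (the directory and file names are invented; the argument style follows the Pod::Html synopsis):

  use Cwd ();
  use Pod::Html;

  # Sketch: with everything relative to the current directory,
  # the ':' delimiter in --podpath is unambiguous again.
  my $oldcwd = Cwd::getcwd();
  chdir 'C:\\tfs\\workspace' or die "chdir failed: $!";
  pod2html("pod2html",
           "--podroot=.",
           "--podpath=lib:scripts",   # relative paths, no drive letters
           "--infile=lib/Foo.pm",
           "--outfile=html/Foo.html");
  chdir $oldcwd or die "chdir back failed: $!";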
From toby.corkindale at strategicdata.com.au Wed Aug 4 22:50:40 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Thu, 05 Aug 2010 15:50:40 +1000
Subject: [Melbourne-pm] Melbourne Perl Mongers August meeting
Message-ID: <4C5A5130.2040706@strategicdata.com.au>

Good afternoon,
The next Melbourne Perl Mongers meeting will be held on Wednesday the 11th of August at 6:30pm. It will be hosted by Strategic Data:

Strategic Data
Level 2
51-55 Johnston Street
Fitzroy 3065

(After this I think we will be moving back into the CBD for some meetings.)

Talks for Wednesday are:
* David doing a talk on building and deploying native Win32 Perl applications, including the installers, IIS, user permissions, etc.

Has anyone else spoken to me about talks they could do? I'd love to see one on Padre - if no-one else steps up, then I'll at least install it and give a 5 minute demo at the meeting.

After the meeting we will retire to the nearby "Standard" hotel on Fitzroy street.

Cheers,
Toby

From toby.corkindale at strategicdata.com.au Tue Aug 10 18:01:47 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Wed, 11 Aug 2010 11:01:47 +1000
Subject: [Melbourne-pm] Melbourne Perl Mongers TONIGHT!
Message-ID: <4C61F67B.70409@strategicdata.com.au>

Good morning, Mongers!
The Melbourne Perl Mongers meeting will be held TONIGHT at 6:30pm. It will be hosted by Strategic Data:

Strategic Data
Level 2
51-55 Johnston Street
Fitzroy 3065

We'll provide some refreshments.

Talks for Wednesday are:
* David doing a talk on building and deploying native Win32 Perl applications, including the installers, IIS, user permissions, etc.
* Hamish will be taking us through the wonders of Padre.

After the meeting we will retire to the nearby "Standard" hotel on Fitzroy street.

Cheers,
Toby

From ddick at iinet.net.au Wed Aug 11 05:31:51 2010
From: ddick at iinet.net.au (David Dick)
Date: Wed, 11 Aug 2010 22:31:51 +1000
Subject: [Melbourne-pm] Melbourne Perl Mongers TONIGHT!
In-Reply-To: <4C61F67B.70409@strategicdata.com.au>
References: <4C61F67B.70409@strategicdata.com.au>
Message-ID: <4C629837.7010801@iinet.net.au>

On 11/08/10 11:01, Toby Corkindale wrote:
> Talks for Wednesday are:
> * David doing a talk on building and deploying native Win32 Perl
> applications, including the installers, IIS, user permissions, etc.

My talk has been uploaded to http://perl.net.au/wiki/Melbourne_Perl_Mongers/Meeting_History_2010_08 for future reference.
From david.tulloh at AirservicesAustralia.com Wed Aug 18 23:52:41 2010
From: david.tulloh at AirservicesAustralia.com (Tulloh, David)
Date: Thu, 19 Aug 2010 16:52:41 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
Message-ID:

Dear List,

As part of my work I have built several modules to handle data files. The idea is to hide the structure and messiness of the data file in a nice reusable module. This also allows the script to focus on the processing rather than the data format.

Unfortunately, while the method I have evolved towards meets these objectives reasonably well, I'm running into significant memory and speed problems with large data files. I have some ideas of ways to restructure it to improve this but all involve some uncomfortable compromises.

I was hoping some of the more experienced eyes on the list could look over my approach and make a few suggestions. Following is the basic module structure followed by usage examples.

David

package DataType;
use Moose;
use 5.010;
use MyTypes;

around BUILDARGS => sub {
    my ($orig, $class, $file) = @_;
    return $class->$orig(_file => $file);
};

has '_file' => (
    is       => 'ro',
    isa      => 'MyTypes::File',   # File handle, IO handle or filename
    coerce   => 1,
    required => 1,
    trigger  => \&_process_file,
);

sub _process_file {
    my ($this, $file) = @_;
    # Break file into entries (@entry_strings stands in for the real parsing)
    $this->_set_rows([map {DataType::Entry->new($_)} @entry_strings]);
}

# An easy optimisation is to store a hash of array refs where the
# key of the hash is the most commonly searched for string.  If
# there is no strong key candidate I just leave it as an array.
has '_rows' => (
    is      => 'ro',
    isa     => 'ArrayRef[DataType::Entry]',
    writer  => '_set_rows',
    default => sub {[]},
);

sub find {
    my ($this, %fields) = @_;
    my @possibles = @{$this->_rows};
    foreach my $k (keys %fields) {
        @possibles = grep {$_->$k ~~ $fields{$k}} @possibles;
    }
    return @possibles;
}

no Moose;
__PACKAGE__->meta->make_immutable;

package DataType::Entry;
use Moose;
use 5.010;

around BUILDARGS => sub {
    my ($orig, $class, $string) = @_;
    # Process string into structure (%structure is a placeholder for the parsed fields)
    return $class->$orig(%structure);
};

has [qw(field list)] => (
    is => 'ro',
);

no Moose;
__PACKAGE__->meta->make_immutable;

Examples of typical usage:

my $data = DataType->new($filename);

# Convert to a different data format (the empty blocks are placeholders)
say join "\n", map {} sort {} map {} $data->find;

# Loop through all data
foreach ($data->find) {}

# loop through a subset
foreach ($data->find(destination => "YSSY")) {}
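The keyed-index optimisation the comment above mentions might look something like the following. This is only a sketch: the 'destination' accessor is assumed from the YSSY example, and in a real module the hot key would presumably be chosen per data type.

  has '_by_destination' => (
      is      => 'ro',
      isa     => 'HashRef[ArrayRef[DataType::Entry]]',
      lazy    => 1,
      builder => '_build_by_destination',
  );

  # Build the index once; lookups on the hot key then skip the full scan.
  sub _build_by_destination {
      my ($this) = @_;
      my %index;
      push @{ $index{ $_->destination } }, $_ for @{ $this->_rows };
      return \%index;
  }

  sub find_by_destination {
      my ($this, $dest) = @_;
      return @{ $this->_by_destination->{$dest} // [] };
  }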
From toby.corkindale at strategicdata.com.au Thu Aug 19 00:15:22 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Thu, 19 Aug 2010 17:15:22 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To:
References:
Message-ID: <4C6CDA0A.2090003@strategicdata.com.au>

On 19/08/10 16:52, Tulloh, David wrote:
> Dear List,
>
> As part of my work I have built several modules to handle data files.
> The idea is to hide the structure and messiness of the data file in a
> nice reusable module. This also allows the script to focus on the
> processing rather than the data format.
>
> Unfortunately while the method I have evolved towards meets these
> objectives reasonably well I'm running into significant memory and speed
> problems with large data files. I have some ideas of ways to
> restructure it to improve this but all involve some uncomfortable
> compromises.
>
> I was hoping some of the more experienced eyes on the list could look
> over my approach and make a few suggestions.

Suggestion 1:
Perhaps you should import the data file into a database, then let the database do all the hard work for you? By all means put a layer over the DB interface so as to make it nice for people to use. You are running the risk of reinventing the wheel otherwise.

Suggestion 2:
If you want to stick with processing the file in situ, then you'll need to approach it with a streaming processor, rather than loading the whole thing into memory at once. Are you familiar with that concept?

Cheers,
Toby

From david.tulloh at AirservicesAustralia.com Thu Aug 19 00:35:22 2010
From: david.tulloh at AirservicesAustralia.com (Tulloh, David)
Date: Thu, 19 Aug 2010 17:35:22 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To: <4C6CDA0A.2090003@strategicdata.com.au>
References: <4C6CDA0A.2090003@strategicdata.com.au>
Message-ID:

On 19/08/10 17:15, Toby Corkindale wrote:
> Suggestion 1:
> Perhaps you should import the data file into a database, then let the
> database do all the hard work for you? By all means put a layer over the
> DB interface so as to make it nice for people to use.
> You are running the risk of reinventing the wheel otherwise.
>
> Suggestion 2:
> If you want to stick with processing the file in situ, then you'll need
> to approach it with a streaming processor, rather than loading the whole
> thing into memory at once.
> Are you familiar with that concept?

Thanks for the ideas.

My hesitation with the first suggestion is that a database felt like overkill for what are normally simple data structures. Ideally I would like all the data to be permanently kept in a database but that's unlikely to happen soon. I'll have another look into temporary SQLite databases as an option.

The catch with processing in situ is that often I want random access, and some file formats need at least one full pass (data and cancellation entries, for example). The more I ponder, the more I feel that my objectives are too broad for a single solution. Switching to a database for the complex messy data sets and streaming for the simpler ones may be the ticket. Possibly with a file size check early on.

David
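To make the streaming option concrete against the DataType interface earlier in the thread, a minimal sketch (it assumes _file coerces to an open read handle and that entries are separated by blank lines; neither is guaranteed by the code posted above):

  # Sketch: visit one entry at a time instead of filling _rows.
  sub each_entry {
      my ($this, $callback) = @_;
      my $fh = $this->_file;   # assumption: an open read handle
      local $/ = "";           # paragraph mode: a blank line ends a record
      while (my $chunk = <$fh>) {
          $callback->( DataType::Entry->new($chunk) );
      }
      return;
  }

  # usage:
  # $data->each_entry(sub { my ($entry) = @_; ... });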
From sam at nipl.net Sun Aug 22 18:14:27 2010
From: sam at nipl.net (Sam Watkins)
Date: Mon, 23 Aug 2010 01:14:27 +0000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To:
References: <4C6CDA0A.2090003@strategicdata.com.au>
Message-ID: <20100823011427.GB21113@nipl.net>

hi David,

When you say 'large' datasets, how large do you mean? I did experiment with using Perl for a toy full-text search system; it's quite capable of handling medium sized datasets (maybe 500MB) and querying and processing them very quickly.

I think if you have datasets that are smaller than your RAM, and you don't create too many unnecessary perl strings and objects, you should be able to process everything in perl if you prefer to do it like that. It may even outperform a general relational database.

Say for example you have 6,000,000 objects each with 10 fields. I would store the objects on disk in the manner of Debian packages files:

name: Sam
email: sam at ai.ki

name: Fred
email: fred at yahoo.com

Text files, key-value pairs, records terminated with a blank line.

I'm not sure as I haven't tried this, but you might find that loading each object into a single string, and parsing out the fields 'on demand' will save you a lot of memory and the program will run faster. IO and specifically swapping is what will kill your performance.

You will also need to create indexes of course (perl hash tables). If you are really running out of RAM, you could compress objects using Compress::Zlib or similar - or buy some more RAM!

I do like to use streaming systems where possible, but sometimes you want random access. You could also look at creating your indexes in RAM, but reading the object data from files, or perhaps using Berkeley DB for indexes if your indexes become too big for RAM. I'm not a big fan of SQL, but I do like the mathematical concept of relational databases.

Sam

From toby.corkindale at strategicdata.com.au Sun Aug 22 18:49:17 2010
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Mon, 23 Aug 2010 11:49:17 +1000
Subject: [Melbourne-pm] Designing modules to handle large data files
In-Reply-To: <20100823011427.GB21113@nipl.net>
References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net>
Message-ID: <4C71D39D.90500@strategicdata.com.au>

On 23/08/10 11:14, Sam Watkins wrote:
> I think if you have datasets that are smaller than your RAM, and you don't
> create too many unnecessary perl strings and objects, you should be able to
> process everything in perl if you prefer to do it like that. It may even
> outperform a general relational database.

Outperform, yes, but it won't scale well at all.

[snip example]

> I'm not sure as I haven't tried this, but you might find that loading each
> object into a single string, and parsing out the fields 'on demand' will save
> you a lot of memory and the program will run faster.

To both of you - I suggest you benchmark this suggestion before implementing your program around it. My intuition suggests you won't save that much memory with this approach. Perl scalars aren't as inefficient as you imagine.

> You will also need to create indexes of course (perl hash tables). If you are
> really running out of RAM, you could compress objects using Compress::Zlib or
> similar - or buy some more RAM!

Or you could use a lightweight db or NoSQL system, which has already implemented those features for you. Perhaps MongoDB or CouchDB would suit you?

You can keep buying RAM in the short-term, but what happens when your dataset gets 10x bigger? You stop being able to economically install more RAM quite quickly... whereas using a scalable approach will enable you to process more data at no cost and a more linear increase in time.

> I do like to use streaming systems where possible, but sometimes you want
> random access. You could also look at creating your indexes in RAM, but
> reading the object data from files, or perhaps using Berkeley DB for indexes
> if your indexes become too big for RAM. I'm not a big fan of SQL, but I do
> like the mathematical concept of relational databases.
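On the lightweight-database side, the temporary-SQLite option David mentions can even skip the file on disk: DBD::SQLite accepts an in-memory database. A sketch only (the table layout and @rows are invented for illustration):

  use DBI;

  # Sketch: load parsed rows into a throwaway in-memory database and
  # let SQL do the searching and indexing.
  my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                         { RaiseError => 1, AutoCommit => 1 });

  $dbh->do('CREATE TABLE entry (field TEXT, destination TEXT)');
  $dbh->do('CREATE INDEX entry_dest ON entry (destination)');

  my $ins = $dbh->prepare('INSERT INTO entry (field, destination) VALUES (?, ?)');
  $ins->execute($_->field, $_->destination) for @rows;   # @rows: parsed entries

  my $subset = $dbh->selectall_arrayref(
      'SELECT field FROM entry WHERE destination = ?', {}, 'YSSY');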
[snip] If you went down this road and were considering exchanging data with others, I'd suggest using either JSON or YAML, as they model rich data structures without the (full) overhead of XML. Doctrine & Propel frameworks for PHP use YAML for ORM schema & data representation. If you want something fast, which parses the data file once, use a stream based approach. You could handle your complex field requirements using a design pattern like SAX (see http://search.cpan.org/~grantm/XML-SAX-0.96/SAX/Intro.pod). If you are going to query the parsed data more often than parsing it, a database is the way to go (as per the worthy suggestions previously). If you want to go full geek, you could look at writing a BTree index for your file, and record characters position (1 index per use case) ;). Adrian. From toby.corkindale at strategicdata.com.au Sun Aug 22 21:31:32 2010 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Mon, 23 Aug 2010 14:31:32 +1000 Subject: [Melbourne-pm] Fwd: WebDev Warehouse Message-ID: <4C71F9A4.3010305@strategicdata.com.au> Hey guys, I'm not at all affiliated with them, but they asked to forward this on, and maybe it's useful to you.. Seems to be an incubator/shared-office place. -Toby ---------- Forwarded message ---------- From: Shaun Moss Date: 21 August 2010 19:04 Subject: The Webdev Warehouse launch party Hi guys I've updated the site at http://webdevwarehouse.com/ again. I added a "Stuff We Need" page (thanks Mark) and a photos gallery for those who didn't see the pics on facebook. Here are the details of the launch party again - I really hope to see a few of you there, your support would mean a lot. Did I mention free beer? If you think you will come please let me know so I have an idea about how much beer and pizza to organise. Location: 16a Linden Street, Brunswick East (see the website for a map and tram instructions) Time: Thursday, 26th August, 2010 from 18:30 Thanks! If you are on the relevant list, can you please forward this information on to the Melbourne Perlmongers, Mobile Monday, the Melbourne Ruby Group, and any other Melbourne-based coders groups who may be interested in something like this. Much appreciated! Shaun From shaun at astromultimedia.com Sun Aug 22 22:31:05 2010 From: shaun at astromultimedia.com (Shaun Moss) Date: Mon, 23 Aug 2010 15:31:05 +1000 Subject: [Melbourne-pm] Fwd: WebDev Warehouse In-Reply-To: <4C71F9A4.3010305@strategicdata.com.au> References: <4C71F9A4.3010305@strategicdata.com.au> Message-ID: <4C720799.6070000@astromultimedia.com> Hi guys This email was from me - I've just joined this list (my knowledge of Perl is rudimentary, but I like it!) Anyway if there are any freelancers out there looking for a place to work, please check out the website and feel free to come along on Thursday night. Cheers, Shaun On 2010-08-23 14:31, Toby Corkindale wrote: > Hey guys, > I'm not at all affiliated with them, but they asked to forward this > on, and maybe it's useful to you.. Seems to be an > incubator/shared-office place. > -Toby > > ---------- Forwarded message ---------- > From: Shaun Moss > Date: 21 August 2010 19:04 > Subject: The Webdev Warehouse launch party > > Hi guys > > I've updated the site at http://webdevwarehouse.com/ again. I added a > "Stuff We Need" page (thanks Mark) and a photos gallery for those who > didn't see the pics on facebook. > > Here are the details of the launch party again - I really hope to see > a few of you there, your support would mean a lot. Did I mention free > beer? 
If you think you will come please let me know so I have an idea > about how much beer and pizza to organise. > > Location: 16a Linden Street, Brunswick East (see the website for a map > and tram instructions) > Time: Thursday, 26th August, 2010 from 18:30 > > Thanks! If you are on the relevant list, can you please forward this > information on to the Melbourne Perlmongers, Mobile Monday, the > Melbourne Ruby Group, and any other Melbourne-based coders groups who > may be interested in something like this. Much appreciated! > > Shaun > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm > From daniel at rimspace.net Mon Aug 23 01:48:31 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Mon, 23 Aug 2010 18:48:31 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <4C71D39D.90500@strategicdata.com.au> (Toby Corkindale's message of "Mon, 23 Aug 2010 11:49:17 +1000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <4C71D39D.90500@strategicdata.com.au> Message-ID: <87iq31ww80.fsf@rimspace.net> Toby Corkindale writes: > On 23/08/10 11:14, Sam Watkins wrote: > >> I think if you have datasets that are smaller than your RAM, and you don't >> create too many unnecessary perl strings and objects, you should be able to >> process everything in perl if you prefer to do it like that. It may even >> outperform a general relational database. > > Outperform, yes, but it won't scale well at all. *nod* Everything is easy, and every algorithm is sufficient, for data smaller than core memory. Given that 24 to 96 GB of memory is possible for a dedicated home user today, that makes a lot of the old scaling problems go away. (Don't forget persistence, and hardware contention, though :) [...] >> You will also need to create indexes of course (perl hash tables). If you are >> really running out of RAM, you could compress objects using Compress::Zlib or >> similar - or buy some more RAM! > > Or you could use a lightweight db or NoSQL system, which has already > implemented those features for you. Perhaps MongoDB or CouchDB would suit > you? For something like this I would also seriously consider Riak; the main differences between Riak and the MongoDB/CouchDB models are in how they scale across systems. (Internal, invisible sharding vs replication, basically.) They all use JavaScript based map/reduce as their inherent data mining tools, and can generally deliver reasonably on exploiting data locally and the like. Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? made with 100 percent post-consumer electrons From scottp at dd.com.au Mon Aug 23 03:54:59 2010 From: scottp at dd.com.au (Scott Penrose) Date: Mon, 23 Aug 2010 20:54:59 +1000 Subject: [Melbourne-pm] Last chance for OSDC Presentations Message-ID: <939274CF-D205-421C-869C-FFDFF1256492@dd.com.au> Hi Melbourne PM Team OSDC is in Melbourne this year ! You all know this. But we have had very few Perl talks, and I have not noticed any Melbourne PM people talking. A golden opportunity to do a talk in Melbourne, not have to pay for travel, get free access to the conference. But... today is the last day for proposals - remember, it is only a proposal - the idea, not the finished paper. There has been lots of good talk, lots of good advice on Melbourne PM lately, lets see it as a talk. 
Scott From sam at nipl.net Mon Aug 23 21:48:33 2010 From: sam at nipl.net (Sam Watkins) Date: Tue, 24 Aug 2010 04:48:33 +0000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> Message-ID: <20100824044833.GA9921@nipl.net> On Mon, Aug 23, 2010 at 12:41:02PM +1000, Adrian Masters wrote: > David, > > [snip] > > Say for example you have 6,000,000 objects each with 10 fields. I would store > > the objects on disk in the manner of Debian packages files: > > > > name: Sam > > email: sam at ai.ki > > > > name: Fred > > email: fred at yahoo.com > > > > > > Text files, key-value pairs, records terminated with a blank line. > [snip] > > If you went down this road and were considering exchanging data with others, I'd suggest using either JSON or YAML The format I'm suggesting is like YAML-lite, without the kitchen sink, as used in email and http headers. The only addition over those is the blank-line as record separator. It's the same as debian package files. I think it's more than sufficient for practically any task, and it's an extremely Simple and Readable format. I don't know of a dataset that can't be expressed nicely like this. If you want more compactness, I would suggest going with TSV. Other formats like XML and even YAML and JSON are unnecessarily over-complicated in my opinion. Simplicity, Clarity, Generality!! http://www.informit.com/ShowCover.aspx?isbn=020161586X > If you want to go full geek, you could look at writing a BTree index for your file, and record characters position (1 index per use case) ;). I like that method :) The file is text, the BTree index can be regenerated from the file. I'd recommend using libdb4 for the index rather than coding your own BTrees unless you'd like to do that. The illustrious postfix does something like this for its map files, well actually I think it creates binary .db files from the text files, not indexes. Although I do prefer to avoid them, It very likely would be much easier to use an SQL database. Sam From sam at nipl.net Mon Aug 23 21:54:16 2010 From: sam at nipl.net (Sam Watkins) Date: Tue, 24 Aug 2010 04:54:16 +0000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <4C71D39D.90500@strategicdata.com.au> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <4C71D39D.90500@strategicdata.com.au> Message-ID: <20100824045416.GB9921@nipl.net> On Mon, Aug 23, 2010 at 11:49:17AM +1000, Toby Corkindale wrote: > On 23/08/10 11:14, Sam Watkins wrote: >> I think if you have datasets that are smaller than your RAM, and you don't >> create too many unnecessary perl strings and objects, you should be able to >> process everything in perl if you prefer to do it like that. It may even >> outperform a general relational database. > > Outperform, yes, but it won't scale well at all. True, I guess it depends whether your database is growing faster than Moore's law. I could keep some basic data on 100 million users all in RAM on my 2GB laptop. (name, email, DOB, password). Is the dataset bigger than that? > Or you could use a lightweight db or NoSQL system, which has already > implemented those features for you. > Perhaps MongoDB or CouchDB would suit you? Speaking of 'NoSQL' has anyone used the 'nosql' package in Debian? 
It provides a TSV based RDB system based on pipes and processors (unix-style tools). I really like this approach and prefer it compared to SQL databases. You can do nice unixy things with this sort of textual database, such as diff <(sort db1/table1) <(sort db2/table2) Sam From daniel at rimspace.net Mon Aug 23 23:05:38 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Tue, 24 Aug 2010 16:05:38 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <20100824044833.GA9921@nipl.net> (Sam Watkins's message of "Tue, 24 Aug 2010 04:48:33 +0000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> Message-ID: <87vd70sfyl.fsf@rimspace.net> Sam Watkins writes: > On Mon, Aug 23, 2010 at 12:41:02PM +1000, Adrian Masters wrote: >> David, >> >> [snip] >> > Say for example you have 6,000,000 objects each with 10 fields. I would store >> > the objects on disk in the manner of Debian packages files: [...] >> > Text files, key-value pairs, records terminated with a blank line. >> [snip] >> >> If you went down this road and were considering exchanging data with others, I'd suggest using either JSON or YAML > > The format I'm suggesting is like YAML-lite, without the kitchen sink, as > used in email and http headers. Ah. So, it is entirely insensitive to linear whitespace inline, are not LWS-preserving, have a limit of 998 and 78 characters total and per-line, possibly including or excluding LWS, in an implementation defined fashion, have case-insensitive and ASCII-only keys, and contains only ASCII characters without encoding in one of URL or RFC2047 MIME word format, then. Right? > The only addition over those is the blank-line as record separator. It's > the same as debian package files. Once you add that it becomes clearer. So, do you support the 'single period' syntax for whitespace inside a line-folded record, and the optional non-folded headers that Debian package control files do, or not? [...] > Other formats like XML and even YAML and JSON are unnecessarily > over-complicated in my opinion. Simplicity, Clarity, Generality!! Sadly, without defining what you mean that very vague description doesn't actually *specify* anything, just give a vague (and English/ASCII oriented) hint in the general direction of what you were thinking. Much as I hate, loath and detest much of the hype around it, the one thing that XML got right (which, naturally, it inherited from SGML) is that it actually specifies the details of how you process arbitrary data in that format. Most of the "simple" things either don't scale to cover the world, or don't actually specify enough that you end up with crazy, crazy things. (STOMP, I am lookin' right at you, here.) Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? 
made with 100 percent post-consumer electrons From sam at nipl.net Tue Aug 24 21:23:31 2010 From: sam at nipl.net (Sam Watkins) Date: Wed, 25 Aug 2010 04:23:31 +0000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <87vd70sfyl.fsf@rimspace.net> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> Message-ID: <20100825042331.GA24581@nipl.net> On Tue, Aug 24, 2010 at 04:05:38PM +1000, Daniel Pittman wrote: > > The format I'm suggesting is like YAML-lite, without the kitchen sink, as > > used in email and http headers. > > Ah. So, it is entirely insensitive to linear whitespace inline, are not > LWS-preserving, have a limit of 998 and 78 characters total and per-line, > possibly including or excluding LWS, in an implementation defined fashion, > have case-insensitive and ASCII-only keys, and contains only ASCII characters > without encoding in one of URL or RFC2047 MIME word format, then. > > Right? No. I assume you're being sarcastic and attempting to demostrate how unsimple the header formats are. I am impressed by your knowledge anyway! I use something simpler than that. If a particular application wants to reject long lines or specify an encoding, that's not my concern. > > The only addition over those is the blank-line as record separator. It's > > the same as debian package files. > > Once you add that it becomes clearer. So, do you support the 'single period' > syntax for whitespace inside a line-folded record, and the optional non-folded > headers that Debian package control files do, or not? I think it's useful to support multi-line values. The single period thing sounds reasonable, but I would probably go with simplicity over readability and just use a lone tab or indent to indicate a blank line in the middle of a value, like this (a bad example as addresses seldom contain blank lines!): address: Spry Street, Corburg North 3058 Given that any more value lines after such a blank line must be indented, and headers must not be indented, it's not really a visual problem to omit the period. The difficulty might be that some editors are reluctant to indent blank lines, no big problem I think. > > Other formats like XML and even YAML and JSON are unnecessarily > > over-complicated in my opinion. Simplicity, Clarity, Generality!! > > Sadly, without defining what you mean that very vague description doesn't > actually *specify* anything, just give a vague (and English/ASCII oriented) > hint in the general direction of what you were thinking. sure, this conversation is not a specification. The format I have in mind is crystal clear, simple and unambiguous, and I can supply parsers and formatters for it in perl if you like. > Much as I hate, loath and detest much of the hype around it, the one thing > that XML got right (which, naturally, it inherited from SGML) is that it > actually specifies the details of how you process arbitrary data in that > format. I do like plain simple XML for markup, that's what it's for. I do not like it as a hierarchical file format for storing records, that is a misuse of XML. The format I'm describing can hold values with arbitrary binary data (or text in any chosen encoding) without the need for any escaping or encoding. This is simple and comprehensive. 
It would normally be used with utf-8 encoded keys and data I suppose, but it would be acceptable to insert binary or differently-encoded data for certain particular keys. The application can interpret the values however it wishes. > Most of the "simple" things either don't scale to cover the world, or don't > actually specify enough that you end up with crazy, crazy things. (STOMP, > I am lookin' right at you, here.) So what are you saying, that I'm crazy, crazy? Which of my things are 'crazy, crazy'? Don't tell me they've got you maintaining some of my perl code? I don't understand your apparent hostility, the 'maintainer' conjecture is the only explanation that comes to mind. I think a data format which can be produced and parsed in say 10 lines of code, and is simple, clear and general, such a format is a lot less crazy that the crock of complexity and featuritis which is full-blown XML. Sam From dsk_gr at hotmail.com Tue Aug 24 22:39:11 2010 From: dsk_gr at hotmail.com (Kostas Avlonitis) Date: Wed, 25 Aug 2010 15:39:11 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <20100825042331.GA24581@nipl.net> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> Message-ID: <4C74AC7F.4060900@hotmail.com> [snip] >> I am lookin' right at you, here.) >> > [snip] Wow. Probably some background I'm not aware of. However I don't know when was the last time anyone convinced anyone else using irony and personal calling-out as a method - unless they're playing for the audience - or perhaps it didn't come across as intended. I think Sam's concept of a flat file is valid for a single-user setup even with even a medium volume of data. However there are potential problems: Normalisation and maintenance may become issues if the types are not strictly handled by the app or in a loose multi-programmer environment. Also, as previous posters said, probably not easily scalable as the back-end of even a medium multi-end-user setup (locking records, cross-referencing it with other data, adding fields, indexing, massive data-growth, multiple data-maintainers, reporting additions etc). DBs are a pain but not as much a pain as maintaining files in my experience. I guess it depends on the scale and breadth of the application, but I'm putting my vote on the side of when-in-doubt use a DB. Kostas From daniel at rimspace.net Wed Aug 25 05:48:25 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Wed, 25 Aug 2010 22:48:25 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <20100825042331.GA24581@nipl.net> (Sam Watkins's message of "Wed, 25 Aug 2010 04:23:31 +0000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> Message-ID: <87tymisvs6.fsf@rimspace.net> Sam Watkins writes: > On Tue, Aug 24, 2010 at 04:05:38PM +1000, Daniel Pittman wrote: >> > The format I'm suggesting is like YAML-lite, without the kitchen sink, as >> > used in email and http headers. >> >> Ah. 
So, it is entirely insensitive to linear whitespace inline, are not >> LWS-preserving, have a limit of 998 and 78 characters total and per-line, >> possibly including or excluding LWS, in an implementation defined fashion, >> have case-insensitive and ASCII-only keys, and contains only ASCII characters >> without encoding in one of URL or RFC2047 MIME word format, then. >> >> Right? > > No. I assume you're being sarcastic and attempting to demostrate how > unsimple the header formats are. I think mostly bitter, because "simple" formats usually don't turn out to be, and like CSV this is one of my least favorite. :) > I am impressed by your knowledge anyway! I use something simpler than that. > If a particular application wants to reject long lines or specify an > encoding, that's not my concern. *nod* My point was, in part, that it isn't as simple as it sounds, because HTTP headers and Email headers have a whole lot of really weird properties as a result of their history. So, yeah: for your own use, not a problem. Any problem is easy when you don't have to interoperate. It gets tricky when you add other people, because you never know which out of those we both might thing were in or out unless we actually discussed it. :) [...] >> > Other formats like XML and even YAML and JSON are unnecessarily >> > over-complicated in my opinion. Simplicity, Clarity, Generality!! >> >> Sadly, without defining what you mean that very vague description doesn't >> actually *specify* anything, just give a vague (and English/ASCII oriented) >> hint in the general direction of what you were thinking. > > sure, this conversation is not a specification. The format I have in mind > is crystal clear, simple and unambiguous, and I can supply parsers and > formatters for it in perl if you like. Nah: just make sure that, if you are documenting it, you do supply a strict specification with it ? because it is harder than it sounds. >> Much as I hate, loath and detest much of the hype around it, the one thing >> that XML got right (which, naturally, it inherited from SGML) is that it >> actually specifies the details of how you process arbitrary data in that >> format. > > I do like plain simple XML for markup, that's what it's for. I do not like it > as a hierarchical file format for storing records, that is a misuse of XML. *nod* SGML is terrible for structuring data. It is wonderful for doing basic markup, though, which coincidentally is what it was designed for initially. Who would have thought? [...] >> Most of the "simple" things either don't scale to cover the world, or don't >> actually specify enough that you end up with crazy, crazy things. (STOMP, >> I am lookin' right at you, here.) > > So what are you saying, that I'm crazy, crazy? > Which of my things are 'crazy, crazy'? Ah, no. Sorry. I was absolutely not calling you crazy, and I am sorry that I wasn't clear about that. No, I was calling the situation that grew up around STOMP crazy: because the specification was so loose, and poor, you end up with a whole lot of versions that don't work together, and all sorts of conventions you need to understand to make it work that are not in the "spec", but are in most real-world implementations. At that point you don't have any more a *simple* messaging protocol, but a crazy mess full of work-arounds and other nasty stuff. [...] 
> I think a data format which can be produced and parsed in say 10 lines of > code, and is simple, clear and general, such a format is a lot less crazy > that the crock of complexity and featuritis which is full-blown XML. Almost certainly. The trick is getting everyone who works with that data to agree on the *same* ten lines of code, and their interpretation. ;) Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? made with 100 percent post-consumer electrons From dsk_gr at hotmail.com Wed Aug 25 06:07:03 2010 From: dsk_gr at hotmail.com (Kostas Avlonitis) Date: Wed, 25 Aug 2010 23:07:03 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <184DF3C7-F44F-464A-8731-D31419FF9105@strategicdata.com.au> References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> <4C74AC7F.4060900@hotmail.com> <184DF3C7-F44F-464A-8731-D31419FF9105@strategicdata.com.au> Message-ID: <4C751577.3010903@hotmail.com> On 25/08/2010 5:26 PM, Adam Clarke wrote: [snip] > The original quote was " (STOMP, I am lookin' right at you, here.)" So I think you'll find that the thing being looked at was STOMP not Sam. > > http://stomp.codehaus.org/ > > I suspect that STOMP couldn't care less :) > > Cheers > > -- > Adam Clarke > www.strategicdata.com.au > > ...ooops, that's embarrassing. I have to apologise to Daniel and the list here. Was not aware of the STOMP protocol - thought it was some kind of aggressive, pounding emphasis to the sentence. I'll now go back to lurking. K. From daniel at rimspace.net Wed Aug 25 22:06:21 2010 From: daniel at rimspace.net (Daniel Pittman) Date: Thu, 26 Aug 2010 15:06:21 +1000 Subject: [Melbourne-pm] Designing modules to handle large data files In-Reply-To: <4C751577.3010903@hotmail.com> (Kostas Avlonitis's message of "Wed, 25 Aug 2010 23:07:03 +1000") References: <4C6CDA0A.2090003@strategicdata.com.au> <20100823011427.GB21113@nipl.net> <61f4cf2a76a8fc138f4609a857cdfcb3.squirrel@webmail.bella.lunarpages.com> <20100824044833.GA9921@nipl.net> <87vd70sfyl.fsf@rimspace.net> <20100825042331.GA24581@nipl.net> <4C74AC7F.4060900@hotmail.com> <184DF3C7-F44F-464A-8731-D31419FF9105@strategicdata.com.au> <4C751577.3010903@hotmail.com> Message-ID: <878w3u7yk2.fsf@rimspace.net> Kostas Avlonitis writes: > On 25/08/2010 5:26 PM, Adam Clarke wrote: > > [snip] >> The original quote was " (STOMP, I am lookin' right at you, here.)" So I >> think you'll find that the thing being looked at was STOMP not Sam. >> >> http://stomp.codehaus.org/ >> >> I suspect that STOMP couldn't care less :) [...] > ...ooops, that's embarrassing. I have to apologise to Daniel and the list > here. Was not aware of the STOMP protocol - thought it was some kind of > aggressive, pounding emphasis to the sentence. I'll now go back to lurking. Hey, don't be embarrassed: I managed to completely miscommunicate my intentions and all, so your error was trivial by comparison. Daniel -- ? Daniel Pittman ? daniel at rimspace.net ? +61 401 155 707 ? 
made with 100 percent post-consumer electrons From toby.corkindale at strategicdata.com.au Mon Aug 30 00:37:28 2010 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Mon, 30 Aug 2010 17:37:28 +1000 Subject: [Melbourne-pm] Melbourne Perl Mongers September meeting In-Reply-To: <4C5A5130.2040706@strategicdata.com.au> References: <4C5A5130.2040706@strategicdata.com.au> Message-ID: <4C7B5FB8.9080207@strategicdata.com.au> Good evening, The next Melbourne Perl Mongers meeting will be held on Wednesday the 8th of August, at 6:30pm. It will be hosted by David Dick at Remasys. Remasys Pty Ltd Level 1 180 Flinders St MELBOURNE VIC 3121 I don't think we have any talks lined up yet.. Does anyone have a topic they would like to speak about? Thanks, Toby