[Melbourne-pm] Designing modules to handle large data files
Tulloh, David
david.tulloh at AirservicesAustralia.com
Wed Aug 18 23:52:41 PDT 2010
Dear List,
As part of my work I have built several modules to handle data files.
The idea is to hide the structure and messiness of the data file in a
nice reusable module. This also allows the script to focus on the
processing rather than the data format.
Unfortunately, while the method I have evolved towards meets these
objectives reasonably well, I'm running into significant memory and speed
problems with large data files. I have some ideas for restructuring it
to improve this, but all involve some uncomfortable compromises.
I was hoping some of the more experienced eyes on the list could look
over my approach and make a few suggestions.
The basic module structure follows, then some usage examples.
David
package DataType;
use Moose;
use 5.010;
use MyTypes;
around BUILDARGS => sub {
    my ($orig, $class, $file) = @_;
    return $class->$orig(_file => $file);
};
has '_file' => (
    is => 'ro',
    isa => 'MyTypes::File', # File handle, IO handle or filename
    coerce => 1,
    required => 1,
    trigger => \&_process_file,
);
sub _process_file {
    my ($this, $file) = @_;
    # Break file into entries
    $this->_set_rows([map {DataType::Entry->new($_)} @entry_strings]);
}
# An easy optimisation is to store a hash of array refs where the
# key of the hash is the most commonly searched for string. If
# there is no strong key candidate I just leave it as an array.
has '_rows' => (
    is => 'ro',
    isa => 'ArrayRef[DataType::Entry]',
    writer => '_set_rows',
    default => sub {[]},
);
sub find {
    my ($this, %fields) = @_;
    my @possibles = @{$this->_rows};
    foreach my $k (keys %fields) {
        @possibles = grep {$_->$k ~~ $fields{$k}} @possibles;
    }
    return @possibles;
}
no Moose;
__PACKAGE__->meta->make_immutable;
package DataType::Entry;
use Moose;
use 5.010;
around BUILDARGS => sub {
    my ($orig, $class, $string) = @_;
    # Process string into structure
    return $class->$orig(%structure);
};
has [qw(field list)] => (
    is => 'ro',
);
no Moose;
__PACKAGE__->meta->make_immutable;
Examples of typical usage:
my $data = DataType->new($filename);
# Convert to a different data format
say join "\n", map {} sort {} map {} $data->find;
# Loop through all data
foreach ($data->find) {}
# loop through a subset
foreach ($data->find(destination => "YSSY")) {}
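A minimal sketch of the hash-of-array-refs optimisation mentioned in the
module comments, using plain hashes instead of Moose objects so that it
runs on core Perl. The callsign field, the sample records and the
find_indexed helper are made up for illustration; only the destination
lookup mirrors the usage above. It also uses plain string equality rather
than the smartmatch in find().

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical records; the real entries would be DataType::Entry objects.
my @rows = (
    { destination => 'YSSY', callsign => 'QFA1' },
    { destination => 'YMML', callsign => 'QFA2' },
    { destination => 'YSSY', callsign => 'VOZ3' },
);

# Build the index once: key value => array ref of the rows that share it.
my %by_destination;
push @{ $by_destination{ $_->{destination} } }, $_ for @rows;

# Look up by the indexed key first, then filter on any remaining fields.
sub find_indexed {
    my ($index, $key, %fields) = @_;
    my @possibles = @{ $index->{$key} // [] };   # unknown key => empty list
    foreach my $k (keys %fields) {
        @possibles = grep { $_->{$k} eq $fields{$k} } @possibles;
    }
    return @possibles;
}

my @sydney = find_indexed(\%by_destination, 'YSSY');
say scalar @sydney;
```

The lookup on the indexed key only touches the rows stored under that key,
so the grep passes run over a much smaller list than the full array.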