APM: Perl, Win32 OLE, and Excel
John Warner
jwarner at texas.net
Mon Jul 27 08:10:26 PDT 2009
Thank you all for the suggestions. I will try them out later this week. I
especially like the idea of converting to CSV as it would be nice to work
with something other than a 75MB Excel spreadsheet. Getting rid of the
array with the names helped a lot but given that I am only using Win32::OLE
and Win32::OLE::Const, I think bypassing OLE will give me more bang for the
buck.
Thanks!
John
From: austin-bounces+jwarner=texas.net at pm.org
[mailto:austin-bounces+jwarner=texas.net at pm.org] On Behalf Of Randall Smith
Sent: Thursday, July 23, 2009 3:33 PM
To: Eric Ellington
Cc: Austin: pm.org
Subject: Re: APM: Perl, Win32 OLE, and Excel
Is it possible for you to export the XLS file to a CSV and then process it
that way without having to go through the OLE modules? I used to process
Word documents using Perl and at a certain point I would run into issues
with the OLE, or it would just take a long time since Perl was spending most
of its time actually waiting on the OLE stuff to do its thing. If you can
export it to a CSV. I haven't had issues with processing just CSV data.
If you do need to write things back to a destination of some sort while
you're processing, maybe importing it into a databse (MySQL, PostgreSQL)
might be good, since you could create a database with a table holding the
data you're processing and then create whatever other tables you need to
store the results of your work. You could then dump the final product out
into a CSV file (or files) and reprocess it as need be.
Randy
On Thu, Jul 23, 2009 at 2:21 PM, Eric Ellington <e.ellington at gmail.com>
wrote:
I used to do this a bunch. You mention 133k rows. Excel used to max
out around something like 65k rows. Maybe I am out of date but how is
so much data crammed into a single worksheet?
What packages are you using?
Thanks,
Eric
On Thu, Jul 23, 2009 at 12:20 PM, John Warner<jwarner at texas.net> wrote:
> All,
>
> I have a project where I am trying to filter through a large amount of
data
> from an Excel spreadsheet. Since I don't have access to the databases
where
> the data actually resides, I have to use a spreadsheet that was given to
me.
> The spreadsheet contains 79 columns and approximately 113k rows. The data
> are customer satisfaction survey results along with a plethora of other
> garbage I don't need. I am only interested in a few columns.
>
> My code goes like this...
>
> Create an Excel Object
> Use Object to open Source and Destination spreadsheets
> Find the column and row boundaries of where data is within the source.
>
> my @ArrayOfNames = ('Bill', 'Bob', 'Jane', 'Tom', 'Dick', 'Harry');
>
> #Columns
> # Source Destination Description
> # Column Column
> # 28 3
Responsible
> Tech
> # 55 5 Tech Sat
> Score
> # 57 6 Overall
Sat
> Score
> #
> foreach my $row (2..$LastRow) #skip header row on row 1
> {
> #check the responsible tech
> foreach my $t (@ArrayOfNames)
> {
> my $cellObj = $srcSheet->Cells($row,28);
> print "Current: $t \t Incident tech: $cellObj->{Value}
";
>
> if ($t =~ m/$srcSheet->Cells($row,28)->{Value}/)
> {
> print "found a match!\n";
> if ($srcSheet->Cells($row,55)->{Value} < 7 ||
> $srcSheet->Cells($row,57)->{Value} < 7)
> {
> #copy data from source to destination
> }
> }else{
> #print "not a match \n";
> next;
> }
> }
> }
>
> My question: With 113k rows to go through, Perl runs out of memory and
the
> processing takes quite a while. How can I be more efficient?
>
>
> John Warner
> jwarner at texas.net
> H: 512.251.1270
> C: 512.426.3813
>
>
>
>
> _______________________________________________
> Austin mailing list
> Austin at pm.org
> http://mail.pm.org/mailman/listinfo/austin
>
--
Eric Ellington
e.ellington at gmail.com
_______________________________________________
Austin mailing list
Austin at pm.org
http://mail.pm.org/mailman/listinfo/austin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/austin/attachments/20090727/47611c87/attachment.html>
More information about the Austin
mailing list