APM: Perl, Win32 OLE, and Excel

John Warner jwarner at texas.net
Mon Jul 27 08:10:26 PDT 2009


Thank you all for the suggestions.  I will try them out later this week.  I
especially like the idea of converting to CSV as it would be nice to work
with something other than a 75MB Excel spreadsheet.  Getting rid of the
array with the names helped a lot but given that I am only using Win32::OLE
and Win32::OLE::Const, I think bypassing OLE will give me more bang for the
buck.

 

Thanks!

 

John

From: austin-bounces+jwarner=texas.net at pm.org
[mailto:austin-bounces+jwarner=texas.net at pm.org] On Behalf Of Randall Smith
Sent: Thursday, July 23, 2009 3:33 PM
To: Eric Ellington
Cc: Austin: pm.org
Subject: Re: APM: Perl, Win32 OLE, and Excel

 

Is it possible for you to export the XLS file to a CSV and then process it
that way without having to go through the OLE modules?  I used to process
Word documents using Perl and at a certain point I would run into issues
with the OLE, or it would just take a long time since Perl was spending most
of its time actually waiting on the OLE stuff to do its thing.  If you can
export it to a CSV.  I haven't had issues with processing just CSV data.  

If you do need to write things back to a destination of some sort while
you're processing, maybe importing it into a databse (MySQL, PostgreSQL)
might be good, since you could create a database with a table holding the
data you're processing and then create whatever other tables you need to
store the results of your work.  You could then dump the final product out
into a CSV file (or files) and reprocess it as need be.

Randy

On Thu, Jul 23, 2009 at 2:21 PM, Eric Ellington <e.ellington at gmail.com>
wrote:

I used to do this a bunch. You mention 133k rows. Excel used to max
out around something like 65k rows. Maybe I am out of date but how is
so much data crammed into a single worksheet?

What packages are you using?

Thanks,

Eric


On Thu, Jul 23, 2009 at 12:20 PM, John Warner<jwarner at texas.net> wrote:
> All,
>
> I have a project where I am trying to filter through a large amount of
data
> from an Excel spreadsheet.  Since I don't have access to the databases
where
> the data actually resides, I have to use a spreadsheet that was given to
me.
> The spreadsheet contains 79 columns and approximately 113k rows.  The data
> are customer satisfaction survey results along with a plethora of other
> garbage I don't need.  I am only interested in a few columns.
>
> My code goes like this...
>
> Create an Excel Object
> Use Object to open Source and Destination spreadsheets
> Find the column and row boundaries of where data is within the source.
>
> my @ArrayOfNames = ('Bill', 'Bob', 'Jane', 'Tom', 'Dick', 'Harry');
>
> #Columns
> #       Source          Destination             Description
> #       Column          Column
> #               28                      3
Responsible
> Tech
> #               55                      5                       Tech Sat
> Score
> #               57                      6                       Overall
Sat
> Score
> #
> foreach my $row (2..$LastRow) #skip header row on row 1
> {
>        #check the responsible tech
>        foreach my $t (@ArrayOfNames)
>        {
>                my $cellObj = $srcSheet->Cells($row,28);
>                print "Current:  $t  \t Incident tech:  $cellObj->{Value}
";
>
>                if ($t =~ m/$srcSheet->Cells($row,28)->{Value}/)
>                {
>                        print "found a match!\n";
>                        if ($srcSheet->Cells($row,55)->{Value} < 7 ||
> $srcSheet->Cells($row,57)->{Value} < 7)
>                        {
>                                #copy data from source to destination
>                        }
>                }else{
>                        #print "not a match \n";
>                        next;
>                }
>        }
> }
>
> My question:  With 113k rows to go through, Perl runs out of memory and
the
> processing takes quite a while.  How can I be more efficient?
>
>
> John Warner
> jwarner at texas.net
> H:  512.251.1270
> C:  512.426.3813
>
>
>
>
> _______________________________________________
> Austin mailing list
> Austin at pm.org
> http://mail.pm.org/mailman/listinfo/austin
>




--
Eric Ellington
e.ellington at gmail.com

_______________________________________________
Austin mailing list
Austin at pm.org
http://mail.pm.org/mailman/listinfo/austin

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.pm.org/pipermail/austin/attachments/20090727/47611c87/attachment.html>


More information about the Austin mailing list