APM: Perl, Win32 OLE, and Excel
Randall Smith
rhennig at gmail.com
Thu Jul 23 13:33:12 PDT 2009
Is it possible for you to export the XLS file to a CSV and then process it
that way, without going through the OLE modules? I used to process Word
documents with Perl, and at a certain point I would run into issues with
OLE, or it would just take a long time, since Perl spent most of its time
waiting on OLE to do its thing. If you can export it to a CSV, you should
be in better shape; I haven't had issues processing plain CSV data.
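The CSV route can be sketched with Text::CSV_XS (plain Text::CSV has the
same interface). The file names, and the assumption that the exported
columns keep their spreadsheet positions (28, 55, 57, i.e. zero-based
27, 54, 56), are mine for illustration:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# the six techs of interest, as a lookup hash
my %names = map { $_ => 1 } qw(Bill Bob Jane Tom Dick Harry);

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });

open my $in,  '<', 'survey.csv'   or die "survey.csv: $!";
open my $out, '>', 'filtered.csv' or die "filtered.csv: $!";

$csv->getline($in);    # skip the header row
while (my $row = $csv->getline($in)) {
    next unless $names{ $row->[27] };             # responsible tech
    next unless $row->[54] < 7 || $row->[56] < 7; # either sat score low
    $csv->print($out, [ @{$row}[27, 54, 56] ]);   # keep only those columns
}

close $in;
close $out;
```

Since this reads one line at a time, memory stays flat no matter how many
rows the export has.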
If you do need to write things back to a destination of some sort while
you're processing, importing the data into a database (MySQL, PostgreSQL)
might be good: create a table holding the data you're processing, plus
whatever other tables you need to store the results of your work. You
could then dump the final product out to a CSV file (or files) and
reprocess it as need be.
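A minimal sketch of the database idea, using SQLite through DBI so there
is no server to set up (MySQL or PostgreSQL would only change the connect
string); the table and file names are made up for illustration:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=survey.db', '', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do(q{CREATE TABLE IF NOT EXISTS survey
           (tech TEXT, tech_sat REAL, overall_sat REAL)});

# load rows here (e.g. from the exported CSV), then commit once
my $ins = $dbh->prepare('INSERT INTO survey VALUES (?, ?, ?)');
# $ins->execute($tech, $tech_sat, $overall_sat);
$dbh->commit;

# query for the rows of interest instead of scanning by hand
my $rows = $dbh->selectall_arrayref(
    q{SELECT tech, tech_sat, overall_sat
      FROM survey WHERE tech_sat < 7 OR overall_sat < 7});

$dbh->disconnect;
```

Wrapping the bulk insert in one transaction (AutoCommit off, a single
commit at the end) is what keeps the load step fast.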
Randy
On Thu, Jul 23, 2009 at 2:21 PM, Eric Ellington <e.ellington at gmail.com> wrote:
> I used to do this a bunch. You mention 113k rows. Excel used to max
> out at around 65k rows (65,536 in the old XLS format). Maybe I am out
> of date, but how is so much data crammed into a single worksheet?
>
> What packages are you using?
>
> Thanks,
>
> Eric
>
> On Thu, Jul 23, 2009 at 12:20 PM, John Warner <jwarner at texas.net> wrote:
> > All,
> >
> > I have a project where I am trying to filter through a large amount
> > of data from an Excel spreadsheet. Since I don't have access to the
> > databases where the data actually resides, I have to use a spreadsheet
> > that was given to me. The spreadsheet contains 79 columns and
> > approximately 113k rows. The data are customer satisfaction survey
> > results along with a plethora of other garbage I don't need. I am
> > only interested in a few columns.
> >
> > My code goes like this...
> >
> > Create an Excel Object
> > Use Object to open Source and Destination spreadsheets
> > Find the column and row boundaries of where data is within the source.
> >
> > my @ArrayOfNames = ('Bill', 'Bob', 'Jane', 'Tom', 'Dick', 'Harry');
> >
> > #Columns
> > #  Source   Destination   Description
> > #  Column   Column
> > #    28         3         Responsible Tech
> > #    55         5         Tech Sat Score
> > #    57         6         Overall Sat Score
> > #
> > foreach my $row (2..$LastRow)    #skip header row on row 1
> > {
> >     #fetch the cell value once per row; every Cells() call is an OLE round trip
> >     my $tech = $srcSheet->Cells($row,28)->{Value};
> >
> >     #check the responsible tech
> >     foreach my $t (@ArrayOfNames)
> >     {
> >         print "Current: $t \t Incident tech: $tech\n";
> >
> >         #a method call will not interpolate inside m//, so compare the
> >         #cached value directly
> >         if ($t eq $tech)
> >         {
> >             print "found a match!\n";
> >             if ($srcSheet->Cells($row,55)->{Value} < 7 ||
> >                 $srcSheet->Cells($row,57)->{Value} < 7)
> >             {
> >                 #copy data from source to destination
> >             }
> >         }
> >     }
> > }
> >
> > My question: With 113k rows to go through, Perl runs out of memory
> > and the processing takes quite a while. How can I be more efficient?
> >
> >
> > John Warner
> > jwarner at texas.net
> > H: 512.251.1270
> > C: 512.426.3813
> >
> >
> >
> >
> > _______________________________________________
> > Austin mailing list
> > Austin at pm.org
> > http://mail.pm.org/mailman/listinfo/austin
> >
>
>
>
> --
> Eric Ellington
> e.ellington at gmail.com
>