APM: Perl, Win32 OLE, and Excel
Randall Smith
rhennig at gmail.com
Thu Jul 23 13:33:12 PDT 2009
Is it possible for you to export the XLS file to a CSV and then process it
that way, without going through the OLE modules? I used to process Word
documents with Perl, and at a certain point I would run into issues with
OLE, or it would just take a long time, since Perl spent most of its time
waiting on OLE to do its thing. If you can export it to a CSV, you should
be in better shape; I haven't had issues processing plain CSV data.
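The CSV route can be sketched with Text::CSV_XS (plain Text::CSV has the
same interface). The file names, and the assumption that the exported
columns keep their spreadsheet positions (28, 55, 57, i.e. zero-based
27, 54, 56), are mine for illustration:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# the six techs of interest, as a lookup hash
my %names = map { $_ => 1 } qw(Bill Bob Jane Tom Dick Harry);

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });

open my $in,  '<', 'survey.csv'   or die "survey.csv: $!";
open my $out, '>', 'filtered.csv' or die "filtered.csv: $!";

$csv->getline($in);    # skip the header row
while (my $row = $csv->getline($in)) {
    next unless $names{ $row->[27] };             # responsible tech
    next unless $row->[54] < 7 || $row->[56] < 7; # either sat score low
    $csv->print($out, [ @{$row}[27, 54, 56] ]);   # keep only those columns
}

close $in;
close $out;
```

Since this reads one line at a time, memory stays flat no matter how many
rows the export has.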
If you do need to write things back to a destination of some sort while
you're processing, importing the data into a database (MySQL, PostgreSQL)
might be good: create a table holding the data you're processing, plus
whatever other tables you need to store the results of your work. You
could then dump the final product out to a CSV file (or files) and
reprocess it as need be.
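A minimal sketch of the database idea, using SQLite through DBI so there
is no server to set up (MySQL or PostgreSQL would only change the connect
string); the table and file names are made up for illustration:

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=survey.db', '', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do(q{CREATE TABLE IF NOT EXISTS survey
           (tech TEXT, tech_sat REAL, overall_sat REAL)});

# load rows here (e.g. from the exported CSV), then commit once
my $ins = $dbh->prepare('INSERT INTO survey VALUES (?, ?, ?)');
# $ins->execute($tech, $tech_sat, $overall_sat);
$dbh->commit;

# query for the rows of interest instead of scanning by hand
my $rows = $dbh->selectall_arrayref(
    q{SELECT tech, tech_sat, overall_sat
      FROM survey WHERE tech_sat < 7 OR overall_sat < 7});

$dbh->disconnect;
```

Wrapping the bulk insert in one transaction (AutoCommit off, a single
commit at the end) is what keeps the load step fast.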
Randy
On Thu, Jul 23, 2009 at 2:21 PM, Eric Ellington <e.ellington at gmail.com> wrote:
> I used to do this a bunch. You mention 113k rows. Excel used to max
> out at around 65k rows (65,536 in the old XLS format). Maybe I am out
> of date, but how is so much data crammed into a single worksheet?
>
> What packages are you using?
>
> Thanks,
>
> Eric
>
> On Thu, Jul 23, 2009 at 12:20 PM, John Warner <jwarner at texas.net> wrote:
> > All,
> >
> > I have a project where I am trying to filter through a large amount
> > of data from an Excel spreadsheet. Since I don't have access to the
> > databases where the data actually resides, I have to use a spreadsheet
> > that was given to me. The spreadsheet contains 79 columns and
> > approximately 113k rows. The data are customer satisfaction survey
> > results along with a plethora of other garbage I don't need. I am
> > only interested in a few columns.
> >
> > My code goes like this...
> >
> > Create an Excel Object
> > Use Object to open Source and Destination spreadsheets
> > Find the column and row boundaries of where data is within the source.
> >
> > my @ArrayOfNames = ('Bill', 'Bob', 'Jane', 'Tom', 'Dick', 'Harry');
> >
> > #Columns
> > #  Source   Destination   Description
> > #  Column   Column
> > #    28         3         Responsible Tech
> > #    55         5         Tech Sat Score
> > #    57         6         Overall Sat Score
> > #
> > foreach my $row (2..$LastRow)    #skip header row on row 1
> > {
> >     #fetch the cell value once per row; every Cells() call is an OLE round trip
> >     my $tech = $srcSheet->Cells($row,28)->{Value};
> >
> >     #check the responsible tech
> >     foreach my $t (@ArrayOfNames)
> >     {
> >         print "Current: $t \t Incident tech: $tech\n";
> >
> >         #a method call will not interpolate inside m//, so compare the
> >         #cached value directly
> >         if ($t eq $tech)
> >         {
> >             print "found a match!\n";
> >             if ($srcSheet->Cells($row,55)->{Value} < 7 ||
> >                 $srcSheet->Cells($row,57)->{Value} < 7)
> >             {
> >                 #copy data from source to destination
> >             }
> >         }
> >     }
> > }
> >
> > My question: With 113k rows to go through, Perl runs out of memory
> > and the processing takes quite a while. How can I be more efficient?
> >
> >
> > John Warner
> > jwarner at texas.net
> > H: 512.251.1270
> > C: 512.426.3813
> >
> >
> >
> >
> > _______________________________________________
> > Austin mailing list
> > Austin at pm.org
> > http://mail.pm.org/mailman/listinfo/austin
> >
>
>
>
> --
> Eric Ellington
> e.ellington at gmail.com
>