Is it possible for you to export the XLS file to a CSV and then process it that way without having to go through the OLE modules? I used to process Word documents using Perl and at a certain point I would run into issues with the OLE, or it would just take a long time since Perl was spending most of its time actually waiting on the OLE stuff to do its thing. If you can export it to a CSV. I haven't had issues with processing just CSV data. <br>
<br>If you do need to write things back to a destination of some sort while you're processing, maybe importing it into a databse (MySQL, PostgreSQL) might be good, since you could create a database with a table holding the data you're processing and then create whatever other tables you need to store the results of your work. You could then dump the final product out into a CSV file (or files) and reprocess it as need be.<br>
<br>Randy<br><br><div class="gmail_quote">On Thu, Jul 23, 2009 at 2:21 PM, Eric Ellington <span dir="ltr"><<a href="mailto:e.ellington@gmail.com">e.ellington@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I used to do this a bunch. You mention 133k rows. Excel used to max<br>
out around something like 65k rows. Maybe I am out of date but how is<br>
so much data crammed into a single worksheet?<br>
<br>
What packages are you using?<br>
<br>
Thanks,<br>
<br>
Eric<br>
<div><div></div><div class="h5"><br>
On Thu, Jul 23, 2009 at 12:20 PM, John Warner<<a href="mailto:jwarner@texas.net">jwarner@texas.net</a>> wrote:<br>
> All,<br>
><br>
> I have a project where I am trying to filter through a large amount of data<br>
> from an Excel spreadsheet. Since I don't have access to the databases where<br>
> the data actually resides, I have to use a spreadsheet that was given to me.<br>
> The spreadsheet contains 79 columns and approximately 113k rows. The data<br>
> are customer satisfaction survey results along with a plethora of other<br>
> garbage I don't need. I am only interested in a few columns.<br>
><br>
> My code goes like this...<br>
><br>
> Create an Excel Object<br>
> Use Object to open Source and Destination spreadsheets<br>
> Find the column and row boundaries of where data is within the source.<br>
><br>
> my @ArrayOfNames = ('Bill', 'Bob', 'Jane', 'Tom', 'Dick', 'Harry');<br>
><br>
> #Columns<br>
> # Source Destination Description<br>
> # Column Column<br>
> # 28 3 Responsible<br>
> Tech<br>
> # 55 5 Tech Sat<br>
> Score<br>
> # 57 6 Overall Sat<br>
> Score<br>
> #<br>
> foreach my $row (2..$LastRow) #skip header row on row 1<br>
> {<br>
> #check the responsible tech<br>
> foreach my $t (@ArrayOfNames)<br>
> {<br>
> my $cellObj = $srcSheet->Cells($row,28);<br>
> print "Current: $t \t Incident tech: $cellObj->{Value} ";<br>
><br>
> if ($t =~ m/$srcSheet->Cells($row,28)->{Value}/)<br>
> {<br>
> print "found a match!\n";<br>
> if ($srcSheet->Cells($row,55)->{Value} < 7 ||<br>
> $srcSheet->Cells($row,57)->{Value} < 7)<br>
> {<br>
> #copy data from source to destination<br>
> }<br>
> }else{<br>
> #print "not a match \n";<br>
> next;<br>
> }<br>
> }<br>
> }<br>
><br>
> My question: With 113k rows to go through, Perl runs out of memory and the<br>
> processing takes quite a while. How can I be more efficient?<br>
><br>
><br>
> John Warner<br>
> <a href="mailto:jwarner@texas.net">jwarner@texas.net</a><br>
> H: 512.251.1270<br>
> C: 512.426.3813<br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> Austin mailing list<br>
> <a href="mailto:Austin@pm.org">Austin@pm.org</a><br>
> <a href="http://mail.pm.org/mailman/listinfo/austin" target="_blank">http://mail.pm.org/mailman/listinfo/austin</a><br>
><br>
<br>
<br>
<br>
</div></div><font color="#888888">--<br>
Eric Ellington<br>
<a href="mailto:e.ellington@gmail.com">e.ellington@gmail.com</a><br>
</font><div><div></div><div class="h5">_______________________________________________<br>
Austin mailing list<br>
<a href="mailto:Austin@pm.org">Austin@pm.org</a><br>
<a href="http://mail.pm.org/mailman/listinfo/austin" target="_blank">http://mail.pm.org/mailman/listinfo/austin</a><br>
</div></div></blockquote></div><br>