<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
        {page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thank you all for the suggestions. I will try them out later
this week. I especially like the idea of converting to CSV as it would be
nice to work with something other than a 75MB Excel spreadsheet. Getting
rid of the array with the names helped a lot but given that I am only using Win32::OLE
and Win32::OLE::Const, I think bypassing OLE will give me more bang for the
buck.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Thanks!<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>John<o:p></o:p></span></p>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>
austin-bounces+jwarner=texas.net@pm.org
[mailto:austin-bounces+jwarner=texas.net@pm.org] <b>On Behalf Of </b>Randall
Smith<br>
<b>Sent:</b> Thursday, July 23, 2009 3:33 PM<br>
<b>To:</b> Eric Ellington<br>
<b>Cc:</b> Austin: pm.org<br>
<b>Subject:</b> Re: APM: Perl, Win32 OLE, and Excel<o:p></o:p></span></p>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal style='margin-bottom:12.0pt'>Is it possible for you to
export the XLS file to a CSV and then process it that way without having to go
through the OLE modules? I used to process Word documents using Perl and
at a certain point I would run into issues with the OLE, or it would just take
a long time since Perl was spending most of its time actually waiting on the
OLE stuff to do its thing. If you can export it to a CSV. I haven't
had issues with processing just CSV data. <br>
<br>
If you do need to write things back to a destination of some sort while you're
processing, maybe importing it into a databse (MySQL, PostgreSQL) might be
good, since you could create a database with a table holding the data you're
processing and then create whatever other tables you need to store the results
of your work. You could then dump the final product out into a CSV file
(or files) and reprocess it as need be.<br>
<br>
Randy<o:p></o:p></p>
<div>
<p class=MsoNormal>On Thu, Jul 23, 2009 at 2:21 PM, Eric Ellington <<a
href="mailto:e.ellington@gmail.com">e.ellington@gmail.com</a>> wrote:<o:p></o:p></p>
<p class=MsoNormal>I used to do this a bunch. You mention 133k rows. Excel used
to max<br>
out around something like 65k rows. Maybe I am out of date but how is<br>
so much data crammed into a single worksheet?<br>
<br>
What packages are you using?<br>
<br>
Thanks,<br>
<br>
Eric<o:p></o:p></p>
<div>
<div>
<p class=MsoNormal style='margin-bottom:12.0pt'><br>
On Thu, Jul 23, 2009 at 12:20 PM, John Warner<<a
href="mailto:jwarner@texas.net">jwarner@texas.net</a>> wrote:<br>
> All,<br>
><br>
> I have a project where I am trying to filter through a large amount of
data<br>
> from an Excel spreadsheet. Since I don't have access to the
databases where<br>
> the data actually resides, I have to use a spreadsheet that was given to
me.<br>
> The spreadsheet contains 79 columns and approximately 113k rows. The
data<br>
> are customer satisfaction survey results along with a plethora of other<br>
> garbage I don't need. I am only interested in a few columns.<br>
><br>
> My code goes like this...<br>
><br>
> Create an Excel Object<br>
> Use Object to open Source and Destination spreadsheets<br>
> Find the column and row boundaries of where data is within the source.<br>
><br>
> my @ArrayOfNames = ('Bill', 'Bob', 'Jane', 'Tom', 'Dick', 'Harry');<br>
><br>
> #Columns<br>
> # Source
Destination Description<br>
> # Column Column<br>
> # 28
3
Responsible<br>
> Tech<br>
> # 55
5
Tech Sat<br>
> Score<br>
> # 57
6
Overall Sat<br>
> Score<br>
> #<br>
> foreach my $row (2..$LastRow) #skip header row on row 1<br>
> {<br>
> #check the responsible tech<br>
> foreach my $t (@ArrayOfNames)<br>
> {<br>
> my $cellObj =
$srcSheet->Cells($row,28);<br>
> print
"Current: $t \t Incident tech: $cellObj->{Value}
";<br>
><br>
> if ($t =~
m/$srcSheet->Cells($row,28)->{Value}/)<br>
> {<br>
>
print "found a match!\n";<br>
>
if ($srcSheet->Cells($row,55)->{Value} < 7 ||<br>
> $srcSheet->Cells($row,57)->{Value} < 7)<br>
>
{<br>
>
#copy data from source to destination<br>
>
}<br>
> }else{<br>
>
#print "not a match \n";<br>
>
next;<br>
> }<br>
> }<br>
> }<br>
><br>
> My question: With 113k rows to go through, Perl runs out of memory
and the<br>
> processing takes quite a while. How can I be more efficient?<br>
><br>
><br>
> John Warner<br>
> <a href="mailto:jwarner@texas.net">jwarner@texas.net</a><br>
> H: 512.251.1270<br>
> C: 512.426.3813<br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> Austin mailing list<br>
> <a href="mailto:Austin@pm.org">Austin@pm.org</a><br>
> <a href="http://mail.pm.org/mailman/listinfo/austin" target="_blank">http://mail.pm.org/mailman/listinfo/austin</a><br>
><br>
<br>
<br>
<o:p></o:p></p>
</div>
</div>
<p class=MsoNormal><span style='color:#888888'>--<br>
Eric Ellington<br>
<a href="mailto:e.ellington@gmail.com">e.ellington@gmail.com</a></span><o:p></o:p></p>
<div>
<div>
<p class=MsoNormal>_______________________________________________<br>
Austin mailing list<br>
<a href="mailto:Austin@pm.org">Austin@pm.org</a><br>
<a href="http://mail.pm.org/mailman/listinfo/austin" target="_blank">http://mail.pm.org/mailman/listinfo/austin</a><o:p></o:p></p>
</div>
</div>
</div>
<p class=MsoNormal><o:p> </o:p></p>
</div>
</body>
</html>