<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">


<head>

<meta http-equiv=Content-Type content="text/html; charset=us-ascii">

<meta name=Generator content="Microsoft Word 12 (filtered medium)">

<style>

<!--

 /* Font Definitions */

 @font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Tahoma;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

 /* Style Definitions */

 p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Times New Roman","serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

span.EmailStyle17

        {mso-style-type:personal-reply;

        font-family:"Calibri","sans-serif";

        color:#1F497D;}

.MsoChpDefault

        {mso-style-type:export-only;}

@page Section1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.Section1

        {page:Section1;}

-->

</style>


</head>


<body lang=EN-US link=blue vlink=purple>


<div class=Section1>


<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";

color:#1F497D'>Thank you all for the suggestions.&nbsp; I will try them out later
this week.&nbsp; I especially like the idea of converting to CSV, since it would be
nice to work with something other than a 75MB Excel spreadsheet.&nbsp; Getting
rid of the array of names helped a lot, but since Win32::OLE and Win32::OLE::Const
are the only modules I am using, I think bypassing OLE altogether will give me more
bang for the buck.<o:p></o:p></span></p>
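<p class=MsoNormal>For anyone following along, here is a minimal sketch of what dropping the inner loop over @ArrayOfNames can look like. It is not necessarily what John did: it assumes an exact match on the tech name and reuses the row and column numbers from the original post. The names go into a hash once, the tech cell is read once per row instead of once per name, and the score cells are read only for rows whose tech matches. The cell reader is passed in as a code reference (a hypothetical design choice so the logic can be tested without Excel); with Win32::OLE it could wrap $srcSheet-&gt;Cells($row, $col)-&gt;{Value}.</p>
<pre>
```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Returns the row numbers (2 .. $last_row; row 1 is the header) whose
# responsible tech (column 28) is one of @$names and whose Tech Sat (55) or
# Overall Sat (57) score is below 7. $get_cell->($row, $col) must return that
# cell's value; the code-ref parameter is a hypothetical design, not a
# Win32::OLE API. Blank or non-numeric scores are skipped (an assumption).
sub rows_to_copy {
    my ($get_cell, $last_row, $names) = @_;
    my %is_tech = map { $_ => 1 } @$names;   # built once; each lookup is O(1)
    my @keep;
    for my $row (2 .. $last_row) {
        my $tech = $get_cell->($row, 28);    # one read per row, not one per name
        next unless defined $tech && $is_tech{$tech};
        # Scores are only read for matching rows; stop at the first low one.
        for my $col (55, 57) {
            my $score = $get_cell->($row, $col);
            if (defined $score && looks_like_number($score) && $score < 7) {
                push @keep, $row;
                last;
            }
        }
    }
    return @keep;
}

# With Win32::OLE (not run here), the reader could be:
#   my @rows = rows_to_copy(sub { $srcSheet->Cells($_[0], $_[1])->{Value} },
#                           $LastRow, \@ArrayOfNames);
```
</pre>
<p class=MsoNormal>Each Cells() call is a separate request to Excel, so cutting the number of calls per row is likely where much of the time goes.&nbsp; If more speed is needed, Win32::OLE should also be able to read a whole Range's {Value} in one call, returned as a reference to an array of rows, which avoids per-cell calls entirely.</p>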


<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";

color:#1F497D'><o:p>&nbsp;</o:p></span></p>


<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";

color:#1F497D'>Thanks!<o:p></o:p></span></p>


<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";

color:#1F497D'><o:p>&nbsp;</o:p></span></p>


<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";

color:#1F497D'>John<o:p></o:p></span></p>


<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>


<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span

style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>

austin-bounces+jwarner=texas.net@pm.org

[mailto:austin-bounces+jwarner=texas.net@pm.org] <b>On Behalf Of </b>Randall

Smith<br>

<b>Sent:</b> Thursday, July 23, 2009 3:33 PM<br>

<b>To:</b> Eric Ellington<br>

<b>Cc:</b> Austin: pm.org<br>

<b>Subject:</b> Re: APM: Perl, Win32 OLE, and Excel<o:p></o:p></span></p>


</div>


<p class=MsoNormal><o:p>&nbsp;</o:p></p>


<p class=MsoNormal style='margin-bottom:12.0pt'>Is it possible for you to
export the XLS file to a CSV and then process it that way without having to go
through the OLE modules?&nbsp; I used to process Word documents using Perl, and
at a certain point I would run into issues with OLE, or it would just take
a long time because Perl was spending most of its time waiting on OLE to
do its thing.&nbsp; If you can export it to CSV, I'd go that route; I haven't
had issues processing plain CSV data.&nbsp; <br>

<br>

If you do need to write things back to a destination while you're
processing, importing the data into a database (MySQL, PostgreSQL) might be
a good option: you could create a table holding the data you're
processing and then create whatever other tables you need to store the results
of your work.&nbsp; You could then dump the final product out into a CSV file
(or files) and reprocess it as needed.<br>

<br>

Randy<o:p></o:p></p>
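<p class=MsoNormal>Here is a rough sketch of the CSV route, assuming the sheet has been saved as CSV with the header on the first line and the same column positions as the spreadsheet (28 = Responsible Tech, 55 = Tech Sat Score, 57 = Overall Sat Score).&nbsp; The file is read one line at a time, so memory use stays flat no matter how many rows there are.&nbsp; The tech list is copied from @ArrayOfNames in the original post, writing out only those three columns is an assumption, and split_csv_line is a hypothetical minimal parser that handles quoted fields and doubled quotes but not line breaks inside a quoted field.&nbsp; It needs Perl 5.14 or later for s///r.</p>
<pre>
```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical tech list, copied from @ArrayOfNames in the original post.
my %is_tech = map { $_ => 1 } qw(Bill Bob Jane Tom Dick Harry);

# Column numbers from the post are 1-based (Excel); Perl arrays are 0-based.
my ($TECH, $TECH_SAT, $OVERALL_SAT) = (28 - 1, 55 - 1, 57 - 1);

# Hypothetical minimal CSV splitter: handles "quoted, fields" and "" escapes,
# but NOT line breaks inside a quoted field.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,]*))(,|\z)/g) {
        my ($quoted, $plain, $sep) = ($1, $2, $3);   # copy captures before s///r overwrites them
        push @fields, defined $quoted ? ($quoted =~ s/""/"/gr) : $plain;
        last if $sep eq '';                          # reached end of line
    }
    return @fields;
}

# True for a numeric score below 7; blank or non-numeric scores are skipped.
sub low_score {
    my ($v) = @_;
    return defined $v && looks_like_number($v) && $v < 7;
}

# Streams CSV from $in one line at a time, keeps rows handled by a listed tech
# with either score below 7, and writes the three columns of interest to $out.
# Returns the number of rows written.
sub filter_csv {
    my ($in, $out) = @_;
    my $written = 0;
    my $header = readline($in);                      # skip the header row
    while (defined(my $line = readline($in))) {
        $line =~ s/\r?\n\z//;                        # strip LF or CRLF
        my @f = split_csv_line($line);
        my ($tech, $tsat, $osat) = @f[$TECH, $TECH_SAT, $OVERALL_SAT];
        next unless defined $tech && $is_tech{$tech};
        next unless low_score($tsat) || low_score($osat);
        print {$out} join(',', map { $_ // '' } $tech, $tsat, $osat), "\n";
        $written++;
    }
    return $written;
}

# Usage (script name is hypothetical): perl filter.pl survey.csv results.csv
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "Cannot read $ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "Cannot write $ARGV[1]: $!";
    my $n = filter_csv($in, $out);
    close $out or die "Cannot close $ARGV[1]: $!";
    print "Wrote $n rows\n";
}
```
</pre>
<p class=MsoNormal>For real survey exports, where free-text comments can contain commas, quotes, and line breaks, the Text::CSV module from CPAN is the safer parser; the filtering logic stays the same.</p>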


<div>


<p class=MsoNormal>On Thu, Jul 23, 2009 at 2:21 PM, Eric Ellington &lt;<a

href="mailto:e.ellington@gmail.com">e.ellington@gmail.com</a>&gt; wrote:<o:p></o:p></p>


<p class=MsoNormal>I used to do this a bunch. You mention 113k rows. Excel used
to max out at around 65k rows (65,536). Maybe I am out of date, but how is
so much data crammed into a single worksheet?<br>

<br>

What packages are you using?<br>

<br>

Thanks,<br>

<br>

Eric<o:p></o:p></p>


<div>


<div>


<p class=MsoNormal style='margin-bottom:12.0pt'><br>

On Thu, Jul 23, 2009 at 12:20 PM, John Warner&lt;<a

href="mailto:jwarner@texas.net">jwarner@texas.net</a>&gt; wrote:<br>

&gt; All,<br>

&gt;<br>

&gt; I have a project where I am trying to filter through a large amount of

data<br>

&gt; from an Excel spreadsheet. &nbsp;Since I don't have access to the

databases where<br>

&gt; the data actually resides, I have to use a spreadsheet that was given to

me.<br>

&gt; The spreadsheet contains 79 columns and approximately 113k rows. &nbsp;The

data<br>

&gt; are customer satisfaction survey results along with a plethora of other<br>

&gt; garbage I don't need. &nbsp;I am only interested in a few columns.<br>

&gt;<br>

&gt; My code goes like this...<br>

&gt;<br>

&gt; Create an Excel Object<br>

&gt; Use Object to open Source and Destination spreadsheets<br>

&gt; Find the column and row boundaries of where data is within the source.<br>

&gt;<br>

&gt; my @ArrayOfNames = ('Bill', 'Bob', 'Jane', 'Tom', 'Dick', 'Harry');<br>

&gt;<br>

&gt; # Columns<br>
&gt; # Source column -&gt; Destination column: Description<br>
&gt; #&nbsp;&nbsp;&nbsp;&nbsp;28 -&gt; 3: Responsible Tech<br>
&gt; #&nbsp;&nbsp;&nbsp;&nbsp;55 -&gt; 5: Tech Sat Score<br>
&gt; #&nbsp;&nbsp;&nbsp;&nbsp;57 -&gt; 6: Overall Sat Score<br>
&gt; #<br>

&gt; foreach my $row (2..$LastRow)&nbsp; # skip header row on row 1<br>
&gt; {<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;# check the responsible tech<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;foreach my $t (@ArrayOfNames)<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;{<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;my $cellObj = $srcSheet-&gt;Cells($row, 28);<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print &quot;Current: $t\tIncident tech: $cellObj-&gt;{Value} &quot;;<br>
&gt;<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# compare the value directly; method calls are not interpolated inside m//<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ($cellObj-&gt;{Value} eq $t)<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print &quot;found a match!\n&quot;;<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if ($srcSheet-&gt;Cells($row, 55)-&gt;{Value} &lt; 7 ||<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$srcSheet-&gt;Cells($row, 57)-&gt;{Value} &lt; 7)<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# copy data from source to destination<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} else {<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# print &quot;not a match\n&quot;;<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;next;<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&gt; &nbsp;&nbsp;&nbsp;&nbsp;}<br>
&gt; }<br>

&gt;<br>

&gt; My question: &nbsp;With 113k rows to go through, Perl runs out of memory

and the<br>

&gt; processing takes quite a while. &nbsp;How can I be more efficient?<br>

&gt;<br>

&gt;<br>

&gt; John Warner<br>

&gt; <a href="mailto:jwarner@texas.net">jwarner@texas.net</a><br>

&gt; H: &nbsp;512.251.1270<br>

&gt; C: &nbsp;512.426.3813<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; _______________________________________________<br>

&gt; Austin mailing list<br>

&gt; <a href="mailto:Austin@pm.org">Austin@pm.org</a><br>

&gt; <a href="http://mail.pm.org/mailman/listinfo/austin" target="_blank">http://mail.pm.org/mailman/listinfo/austin</a><br>

&gt;<br>

<br>

<br>

<o:p></o:p></p>


</div>


</div>


<p class=MsoNormal><span style='color:#888888'>--<br>

Eric Ellington<br>

<a href="mailto:e.ellington@gmail.com">e.ellington@gmail.com</a></span><o:p></o:p></p>




</div>


<p class=MsoNormal><o:p>&nbsp;</o:p></p>


</div>


</body>


</html>