APM: Perl, Win32 OLE, and Excel

Pat Ludwig havoc at boldo.com
Thu Jul 23 11:29:10 PDT 2009


Some ideas:
a) throw the techs into a hash and use -- if (exists
$techhash{$cellObj->{Value}})
b) dump the Excel file into a text file if possible
c) after b) run the text file through the Unix utility "cut" to grab just the
columns you need
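
A minimal sketch of idea a), assuming the tech names are known up front. The
names, the @cell_values list (a stand-in for the column 28 values), and the
row numbering are made up for illustration and are not taken from the real
spreadsheet:

```perl
use strict;
use warnings;

# Hypothetical tech names; the real list comes from your own data.
my @techs = ('name1', 'name2', 'name3');

# Build the lookup hash once; each key is a tech name.
my %techhash = map { $_ => 1 } @techs;

# Stand-in for the values read from column 28 of each data row.
my @cell_values = ('name2', 'someone else', 'name1', 'name10');

my @index;
for my $i (0 .. $#cell_values) {
    my $value = $cell_values[$i];

    # exists is a single hash probe and matches the whole value only.
    # Data rows start at 2 because row 1 is the header.
    push @index, $i + 2 if exists $techhash{$value};
}

print "matching rows: @index\n";
```

Note that the hash test is an exact match, so "name10" is skipped here, while
an unanchored regex like m/name1|name2|name3/ would also match it.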

The amount of data you are parsing isn't overly large so I suspect that the
perl interface to excel you are using is the memory hog.
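
If the OLE layer really is the memory hog, ideas b) and c) can also be done in
Perl by streaming the exported text file one line at a time. This is only a
sketch under assumptions: the export is tab-delimited with no quoted fields,
and the sample data, column positions, and names are invented. For a real CSV
export with quoted fields, a proper CSV parser would be the safer choice.

```perl
use strict;
use warnings;

# Made-up tab-delimited export standing in for the real file.
my $export = join '',
    "id\ttech\tscore\tnotes\n",
    "1\tname1\t5\tfine\n",
    "2\tother\t3\tn/a\n",
    "3\tname3\t4\tok\n";

my %techhash = map { $_ => 1 } qw(name1 name2 name3);   # hypothetical names
my @keep     = (0, 1, 2);    # zero-based columns to keep (an assumption)
my $tech_col = 1;            # column holding the tech name (an assumption)

# In-memory filehandles keep the sketch self-contained; with real data
# these would be opened on the exported file and the destination file.
open my $in, '<', \$export or die "cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open output: $!";

my $header = <$in>;          # read and discard the header row

while (my $line = <$in>) {
    chomp $line;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    next unless exists $techhash{ $fields[$tech_col] };

    # Write the wanted columns straight out; no per-row array is kept.
    print {$out} join("\t", @fields[@keep]), "\n";
}

close $in;
close $out;

print $result;
```

Only one row is held in memory at a time, so memory use should stay flat no
matter how many rows the export has.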


--Pat
On Thu, Jul 23, 2009 at 1:16 PM, John Warner <jwarner at texas.net> wrote:

> In my actual code, I am copying from source file to destination file and
> the rows are independent of each other.  One thought that occurred to me was
> to not use an array of names using something like the code below.  This
> should reduce this process from an order n^2 operation to an order n
> operation.
>
> my @index;
>
> foreach my $row (2..$LastRow) #skip header row on row 1
> {
>       my $cellObj = $srcSheet->Cells($row,28);
>       print "Incident tech:  $cellObj->{Value} ";
>
>       if ($cellObj->{Value} =~ m/name1|name2|name3/)
>       {
>               push @index, $row;   # remember the matching row number
>       }else{
>               print "not a match \n";
>               next;
>       }
> }
>
> foreach my $i (@index)
> {
>     # copy target data from source file to destination file
> }
>
> I'll have to test to see what the difference in CPU and memory utilization
> are once I get into the office.  Time to start my day!
>
> Thanks for all the suggestions!
>
>
> John Warner
> jwarner at texas.net
> H:  512.251.1270
> C:  512.426.3813
>
>
>
>
> -----Original Message-----
> From: austin-bounces+jwarner=texas.net at pm.org [mailto:
> austin-bounces+jwarner=texas.net at pm.org] On
> Behalf Of jameschoate at austin.rr.com
> Sent: Thursday, July 23, 2009 1:04 PM
> To: austin
> Subject: Re: APM: Perl, Win32 OLE, and Excel
>
> Besides only processing each sub-set of each row via an input filter --
>
> Assuming each row is not dependent on the other rows, why use an output
> array at all? Just write it to a file. The only reason I can see writing
> this to an array is to keep it in memory for subsequent processing.
>
> ---- Keith Howanitz <howanitz at gmail.com> wrote:
> > On Thu, Jul 23, 2009 at 12:20 PM, John Warner<jwarner at texas.net> wrote:
> > > All,
> > >
> > > I have a project where I am trying to filter through a large amount of data
> > > from an Excel spreadsheet.  Since I don't have access to the databases where
> > > the data actually resides, I have to use a spreadsheet that was given to me.
> > > The spreadsheet contains 79 columns and approximately 113k rows.  The data
> > > are customer satisfaction survey results along with a plethora of other
> > > garbage I don't need.  I am only interested in a few columns.
> > [SNIP]
> >
> > Have you tried putting a simple output when reading the xls file to
> > show you how far you are getting in the file before having problems -
> > maybe you are really going beyond 113k records, or it is choking on
> > unusual data in one particular record.
> >
> > I wonder if you simply saved the xls file as a csv file and used the
> > Text::CSV_XS module if you would still have troubles.
> >
> > If you want to read the whole thing into memory, you could always read
> > each line, put the 3 important fields in array, and then read the next
> > line so that you only end up with an array that is 113k x 3 rather
> > than the whole spreadsheet in memory.
> > _______________________________________________
> > Austin mailing list
> > Austin at pm.org
> > http://mail.pm.org/mailman/listinfo/austin
>
> --
>  -- -- -- --
> Venimus, Vidimus, Dolavimus
>
> James Choate
> jameschoate at austin.rr.com
> james.choate at twcable.com
> 512-657-1279
> www.ssz.com
> http://www.twine.com/twine/1128gqhxn-dwr/solar-soyuz-zaibatsu
> http://www.twine.com/twine/1178v3j0v-76w/confusion-research-center
>
> Adapt, Adopt, Improvise
>  -- -- -- --



-- 
Pat Ludwig <havoc at boldo.com>
AIM: HaVoCPaT     MSN:pludwigtx at hotmail.com
 GTalk:havoclad at gmail.com   YiM:havoclad

"Having a public that actually knows something is our best defense against
ever again electing a president who knows nothing."  -- Bill Maher 5/8/2009