APM: Perl, Win32 OLE, and Excel

John Warner jwarner at texas.net
Thu Jul 23 11:16:00 PDT 2009


In my actual code, I am copying from a source file to a destination file, and the rows are independent of each other.  One thought that occurred to me was to do away with the array of names and use something like the code below instead.  That should reduce the process from an O(n^2) operation to an O(n) operation.

my @index;

foreach my $row (2..$LastRow)    # skip header row on row 1
{
    my $cellObj = $srcSheet->Cells($row, 28);
    print "Incident tech: $cellObj->{Value}\n";

    if ($cellObj->{Value} =~ m/name1|name2|name3/)
    {
        push @index, $row;       # remember the matching row number
    }
    else
    {
        print "not a match\n";
        next;
    }
}

foreach my $i (@index)
{
    # copy target data from source file to destination file
}
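
For the copy pass itself, something along these lines is what I have in mind.  This is only a sketch: $dstSheet, $dstRow, and the column numbers 1, 5, and 28 are placeholders for the real destination sheet and the columns I actually need.

my $dstRow = 2;    # row 1 of the destination holds the header
foreach my $i (@index)
{
    # copy only the columns of interest, cell by cell
    foreach my $col (1, 5, 28)
    {
        $dstSheet->Cells($dstRow, $col)->{Value} =
            $srcSheet->Cells($i, $col)->{Value};
    }
    $dstRow++;
}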

I'll have to test to see what the difference in CPU and memory utilization is once I get into the office.  Time to start my day!

Thanks for all the suggestions!


John Warner
jwarner at texas.net
H:  512.251.1270
C:  512.426.3813




-----Original Message-----
From: austin-bounces+jwarner=texas.net at pm.org [mailto:austin-bounces+jwarner=texas.net at pm.org] On Behalf Of jameschoate at austin.rr.com
Sent: Thursday, July 23, 2009 1:04 PM
To: austin
Subject: Re: APM: Perl, Win32 OLE, and Excel

Besides processing only the needed subset of each row via an input filter --

Assuming each row is not dependent on the other rows, why use an output array at all? Just write it to a file. The only reason I can see for writing this to an array is to keep it in memory for subsequent processing.
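
Something like this rough sketch is what I mean -- write each match straight to a tab-delimited file instead of collecting rows first (the filename and the column numbers are only placeholders):

open my $out, '>', 'matches.txt' or die "Cannot open matches.txt: $!";

foreach my $row (2..$LastRow)
{
    # skip rows whose column 28 does not match one of the names
    next unless $srcSheet->Cells($row, 28)->{Value} =~ m/name1|name2|name3/;

    # pull just the columns of interest and write them out immediately
    my @fields = map { $srcSheet->Cells($row, $_)->{Value} } (1, 5, 28);
    print $out join("\t", @fields), "\n";
}

close $out;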

---- Keith Howanitz <howanitz at gmail.com> wrote: 
> On Thu, Jul 23, 2009 at 12:20 PM, John Warner<jwarner at texas.net> wrote:
> > All,
> >
> > I have a project where I am trying to filter through a large amount of data
> > from an Excel spreadsheet.  Since I don't have access to the databases where
> > the data actually resides, I have to use a spreadsheet that was given to me.
> > The spreadsheet contains 79 columns and approximately 113k rows.  The data
> > are customer satisfaction survey results along with a plethora of other
> > garbage I don't need.  I am only interested in a few columns.
> [SNIP]
> 
> Have you tried adding a simple print statement when reading the xls file
> to show how far you get in the file before hitting problems?  Maybe you
> really are going beyond 113k records, or it is choking on unusual data
> in one particular record.
> 
> I wonder if you would still have trouble if you simply saved the xls
> file as a csv file and used the Text::CSV_XS module.
> 
> If you want to read the whole thing into memory, you could always read
> each line, put the 3 important fields in an array, and then read the
> next line, so that you only end up with an array that is 113k x 3
> rather than the whole spreadsheet in memory.
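> 
> A rough sketch of that approach (assuming the sheet was saved as
> survey.csv, and with made-up positions for the three fields):
> 
> use Text::CSV_XS;
> 
> my $csv = Text::CSV_XS->new({ binary => 1 });
> open my $fh, '<', 'survey.csv' or die "Cannot open survey.csv: $!";
> 
> my @rows;
> while (my $row = $csv->getline($fh)) {
>     # keep only the three fields of interest, not all 79 columns
>     push @rows, [ @{$row}[0, 4, 27] ];
> }
> close $fh;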

--
 -- -- -- --
Venimus, Vidimus, Dolavimus

James Choate
jameschoate at austin.rr.com
james.choate at twcable.com
512-657-1279
www.ssz.com
http://www.twine.com/twine/1128gqhxn-dwr/solar-soyuz-zaibatsu
http://www.twine.com/twine/1178v3j0v-76w/confusion-research-center

Adapt, Adopt, Improvise
 -- -- -- --
_______________________________________________
Austin mailing list
Austin at pm.org
http://mail.pm.org/mailman/listinfo/austin



