[mplspm]: RE: Help with a transpose script (included)

Fri Nov 30 14:22:50 CST 2001

It's hard for me to follow the transpose code because I'm not sure what
was originally intended. Along with the useless 's/(.)/$1/g', it seems
like there may be a number of loops that aren't really necessary:

       for ( $j .. $linesm1 ) {
           $line[$_] .= ' ';
       }

where $_ contains data character(s).
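
On the substitution mentioned above: a quick check, sketched here with a made-up sample string rather than real pattern data, suggests it leaves the line exactly as it found it, since every character is replaced with itself.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real data would be one transposed column.
my $before = 'xx00111000xxx';

# Copy the string, then run the substitution from the original script on the copy.
(my $after = $before) =~ s/(.)/$1/g;

print $before eq $after ? "unchanged\n" : "changed\n";
```

So dropping that line should not change the output, only the run time.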

At first I was thinking that the author wrote it to use little memory
and therefore compromised speed, but I don't think that's the case
either. If you're not worried about memory usage with really large
files, a quick and simple solution is attached. It just steps through
every line and appends each character to the array element for its column.

Anyone see any problems with this besides reading the entire data file
in before printing anything out?
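
One case the padding loop quoted above seems meant to handle is rows of unequal length. Below is a minimal sketch of the same append-per-character idea that pads short rows with spaces. The helper name transpose_rows and the sample rows are made up for illustration, and it still assumes the whole file fits in memory.

```perl
use strict;
use warnings;

# transpose_rows is a hypothetical helper, not part of the attached script.
# It takes a list of rows and returns one string per column, padding short
# rows with spaces so every column gets one character per row.
sub transpose_rows {
    my @rows = @_;

    # Find the widest row first so short rows can be padded to match.
    my $width = 0;
    for my $row (@rows) {
        $width = length($row) if length($row) > $width;
    }

    my @col = ('') x $width;
    for my $row (@rows) {
        my $padded = $row . ' ' x ($width - length($row));
        for my $i (0 .. $width - 1) {
            $col[$i] .= substr($padded, $i, 1);
        }
    }
    return @col;
}

# Made-up ragged input: prints "adf", "be ", "c  " on separate lines.
print "$_\n" for transpose_rows('abc', 'de', 'f');
```

Because every row is padded to the full width, transposing the result again gives back the original rows, apart from the added trailing spaces.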

-Shaun
------------------------------------------------------------
Shaun Hawkinson,     |        Qwest        | w:612-664-3062
Staff Software       |                     |
Development Engineer | shaun at hawkinson.net | p:612-613-5623
------------------------------------------------------------
OHNOSECOND - that minuscule fraction of time in which you realize
             you've just made a big mistake.

On Fri, Nov 30, 2001 at 10:07:18AM -0800, Nielsen, Aaron M wrote:
> What the code does is the following.  It takes large pattern files of the
> following format and transposes them.
> 
> cccccc ccccc cc
> oooooo ooooo oo
> llllll lllll ll
> ______ _____ __ 
> 123456 78911 11
>           12 34
> xxxxxx xxxxx xx
> xxxxxx 00000 00
> 000000 00000 00
> 000000 11111 11
> 111111 11111 11
> 111111 11111 11
> 111111 11111 11
> 000000 00000 00
> 000000 00000 00
> 000000 00000 00
> xxxxxx xxxxx xx
> xxxxxx xxxxx xx
> xxxxxx xxxxx xx
> 
> to this
> 
> col_1 xx00111000xxx
> col_2 xx00111000xxx
> col_3 xx00111000xxx
> col_4 xx00111000xxx
> col_5 xx00111000xxx
> col_6 xx00111000xxx
> 
> col_7 x001111000xxx
> col_8 x001111000xxx
> col_9 x001111000xxx
> col_10x001111000xxx
> col_11x001111000xxx
> col_12x001111000xxx
> 
> col_13x001111000xxx
> col_14x001111000xxx
> 
> what I do is transpose a large file as mentioned then do some pattern
> matching to mask transitions i.e. 1100 -> 1x00 or 11x00
> 
> then I transpose it back.
> 
> I'm new to perl and took this transpose code and modified it to do what I
> wanted but don't fully understand it.
> Any help would be appreciated.
> 
> Aaron
> 
> -----Original Message-----
> From: Ken Williams [mailto:ken at mathforum.org]
> Sent: Thursday, November 29, 2001 8:07 PM
> To: mpls at pm.org
> Subject: Re: [mplspm]: RE: Help with a transpose script (included)
> 
> 
> Aaron,
> 
> Can you explain, specifically, what the code is supposed to do?  Perhaps 
> an example?  What's the format of the file, i.e. what do you mean by 
> "column format"?
> 
> By the way, the line "$line[$_] =~ s/(.)/$1/g;" is a very slow way to not 
> do anything.  What's its intention?
> 
>  -Ken
> 
> 
> 
> "Nielsen, Aaron M" <aaron.m.nielsen at intel.com> wrote:
> 
> >
> >> I'm working on large pattern data in column format and am using the
> >> following sub to transpose the file so I can do pattern matching on
> >> columns.  I then call the transpose sub again to return the file to
> >> column format.  The problem I'm running into is that the code is very
> >> slow and though it will finish on small patterns it dies on large
> >> ones (around 10Meg).  Any ideas on more efficient transpose scripts or
> >> a more efficient way to match patterns on column data?
> >>
> >> ------------------------------------------------
> >> transpose($file,$file_transposed);
> >>
> >> sub transpose {
> >>
> >>     package transpose;
> >>     my (@ops) = @_;
> >>     my $linesm1 = undef;
> >>     my @line = undef;
> >>     my $j = undef;
> >>     my $i = undef;
> >>     open (IN, $ops[0]) || die "1 $!\n";
> >>     open (OUT,"> $ops[1]") || die "2 $!\n";
> >>     while ( <IN> ) {
> >>    $j = 0;
> >>    chomp;
> >>    @_ = split //;
> >>    for ( @_ ) {
> >>        if ( $j > $linesm1 ) {
> >>            $line[$j] = " " x $i;
> >>            $linesm1++;
> >>        }
> >>        $line[$j] .= $_;
> >>        $j++;
> >>    }
> >>    for ( $j .. $linesm1 ) {
> >>        $line[$_] .= ' ';
> >>    }
> >>    $i++;
> >>     }
> >>     for ( 0 .. $linesm1 ){
> >>    $line[$_] =~ s/(.)/$1/g;
> >>    print OUT "$line[$_]\n";
> >>     }
> >>     close OUT;
> >>     close IN;
> >> }
> >>
> >> ------------------------------------------------
> >>
> >> Thanks
> >>
> >>
> >> Aaron Nielsen                      PNG/CMO/PE&TE
> >>                                                    503-712-1822
> >>
> >>
> >
> >
> > --------------------------------------------------
> > Minneapolis Perl Mongers mailing list
> >
> > To unsubscribe, send mail to majordomo at pm.org
> > with "unsubscribe mpls" in the body of the message.
> 
-------------- next part --------------
#!/usr/bin/perl -w
#

transpose('infile', 'outfile');

sub transpose {
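	my @files = @_;	## (infile, outfile); unused until the handles below are swapped in
	my @col;
	my $num_cols = 0;

	## open inbound and outbound files and 
	## replace "DATA" and "STDOUT" below accordingly

	## read it all in
	while (my $row = <DATA>) {
		chomp $row;
		## width is taken from the first non-empty row; rows are assumed equal length
		$num_cols = length($row) if !$num_cols;
		for my $i (0 .. $num_cols-1) {
			$col[$i] .= substr($row,$i,1);
		}
	}

	## print it all out
	for my $i (0 .. $num_cols-1) {
		## defined(), not truth: a column that is just "0" must still print
		print STDOUT "$col[$i]\n" if defined $col[$i];
	}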
}

__DATA__
cccccc ccccc cc
oooooo ooooo oo
llllll lllll ll
______ _____ __
123456 78911 11
          12 34
xxxxxx xxxxx xx
xxxxxx 00000 00
000000 00000 00
000000 11111 11
111111 11111 11
111111 11111 11
111111 11111 11
000000 00000 00
000000 00000 00
000000 00000 00
xxxxxx xxxxx xx
xxxxxx xxxxx xx
xxxxxx xxxxx xx