[Pdx-pm] read and concatenate two lines at a time

Thomas J Keller kellert at ohsu.edu
Tue Dec 7 16:09:10 CST 2004


Good point. As it turned out, after capturing the header line, the file 
was consistent in that each subsequent pair of lines "went together". 
The data did have lots of blanks and odd characters, though, which made 
the sorting they wanted done (before separating the line-pairs again) 
more difficult.

At the risk of exposing my already battered ego to more bruising, what 
I decided to do (after concatenating the line-pairs into an unsorted 
array) was create an array of default values and "fill in" any blanks 
with the defaults, then do the sorting via the Schwartzian transform. 
Then I split the lines back out to a new file with the correctly sorted 
pairs of lines. The following worked for this data file:

#!/usr/bin/perl

use strict;
use warnings;

my $header = <>;						## read header line
chomp $header;
my @keys = split(/\t+/,$header);
$keys[0] = "ID";
## need to add a new element $keys[6] = "variance assumption";
my @new_keys = (@keys[0..5],"Variance Parameter",@keys[6..$#keys]);
my @defaults = qw(no_id no_strain 0.0 99 99 99 none 99 99 99 99 99 
no_strain 0.0 99 99 99 none 99 99 99 99 99);

## concatenate consecutive pairs of lines
my @unsorted;
while (my $line = <>){
	chomp $line;
	chomp($line .= <>);					## grab next line
	$line =~ s/\t\./\t1/g;				## substitute "1" for "." values
	push @unsorted, $line;				## push consecutive lines
}

## fill empty fields with default values
my @unsorted_filled;
foreach my $line (@unsorted) {
	my @data = split "\t", $line;
	foreach (0..$#defaults) {
		if ($data[$_] eq "0") {			## in case the data contains real 0 values
			$data[$_] = "0.000";		## this "zero" won't evaluate to false in boolean comparisons
		} else {
			$data[$_] = $data[$_] || $defaults[$_];
		}
	}
	push @unsorted_filled, join "\t", @data;
}


## Sort Data by P-value ##
my @sorted =
	map { $_->[0] }					## return sorted array of lines
	sort { $a->[1] <=> $b->[1] }	## sort on second value of each tuple
	map { [$_, (split "\t")[7]] }	## create [line, p-value] tuple as anon. array within array
	@unsorted_filled;						## from unsorted lines


## Output ##
print join("\t", @new_keys), "\n";
foreach (@sorted) {
	my @data = split("\t",$_);
	print join("\t", @data[0..11]),"\n";
	print join("\t", ($data[0], @data[12..$#data])),"\n";
	#print join( "\t", @data), "\n";
}
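
A possible variation on the fill-in step, sketched under assumptions rather than taken from the script above: testing each field for "defined and non-empty" instead of for truth lets a real 0 pass through unchanged, and giving split a limit of -1 keeps trailing empty fields so they can be filled as well. The sub name fill_defaults and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: fill empty or missing fields from a list of defaults.
sub fill_defaults {
    my ($line, $defaults) = @_;
    # A limit of -1 makes split keep trailing empty fields.
    my @data = split /\t/, $line, -1;
    for my $i (0 .. $#$defaults) {
        # Check for "missing or empty" rather than truth, so a real 0 survives.
        $data[$i] = $defaults->[$i]
            unless defined $data[$i] && length $data[$i];
    }
    return join "\t", @data;
}

# Made-up sample: empty second field, real 0 in the third, empty last field.
my @sample_defaults = qw(no_id no_strain 0.0 99);
print fill_defaults("A1\t\t0\t", \@sample_defaults), "\n";
```

A field holding only spaces still counts as non-empty in this sketch; if the blanks in the data are spaces, a test such as /\S/ would be needed instead of length.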

Any other suggestions or warnings gladly accepted.

Thanks for your help folks. I very much appreciate it.
Tom Keller


On Dec 7, 2004, at 1:44 PM, Randal L. Schwartz wrote:

>>>>>> "Ken" == Ken Brush <ken at cgi101.com> writes:
>
> Ken> FYI,  You can even reduce it by one more operation by doing:
>
> Ken> while( my $entry = <> . <>)  {
> Ken> 	$entry =~ s/\n//g;
> Ken> ...
>
> Not safely.  If the first operation returns undef, to indicate the end
> of the @ARGV list, the second operation will read a line from STDIN!
>
> -- 
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <merlyn at stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
> See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
> _______________________________________________
> Pdx-pm-list mailing list
> Pdx-pm-list at mail.pm.org
> http://mail.pm.org/mailman/listinfo/pdx-pm-list
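
One way to keep the two reads separate, so that a failed first read is noticed before the second is attempted, might look like the sketch below. It reads from an explicit filehandle rather than the magic <>, and it checks each read with defined. The sub name read_pairs and the in-memory sample data are hypothetical; the pair is concatenated directly, as in the script above, which assumes the first line of each pair ends with a tab.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines two at a time from a filehandle and
# return each pair concatenated into a single string.
sub read_pairs {
    my ($fh) = @_;
    my @pairs;
    while (defined(my $first = <$fh>)) {
        my $second = <$fh>;
        # An odd number of lines leaves the last one without a partner.
        last unless defined $second;
        chomp($first, $second);
        push @pairs, $first . $second;
    }
    return @pairs;
}

# Demo on an in-memory filehandle; a real script would open a data file.
my $text = "a\t\nb\nc\t\nd\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print "$_\n" for read_pairs($fh);
close $fh;
```

This sketch silently drops an unpaired final line; warning or dying there instead may suit some data better.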



