[Omaha.pm] Little tab-delimited file manipulator
Jay Hannah
jay at jays.net
Sun Mar 2 15:14:12 PST 2008
Yet another little program I wrote today.
Perl roolz. :)
(Let's see if this RTF format will post to the list OK w/o line
wrapping...)
j
$ cat in
zero one two three four
01 11 21 31 41
02 12 22 32 42
03 13 23 33 43
01 11 21 31 41
02 12 22 32 42
03 13 23 33 43
01 11 21 31 41
02 12 22 32 42
03 13 23 33 43
01 11 21 31 41
02 12 22 32 42
03 13 23 33 43
$ perl ./microarray_to_R.pl --label_column 4 --discard_columns "1..2"
--file in
four zero three
41 01 31
42 02 32
43 03 33
41 CLAB2 01 31
42 CLAB2 02 32
43 CLAB2 03 33
41 CLAB3 01 31
42 CLAB3 02 32
43 CLAB3 03 33
41 CLAB4 01 31
42 CLAB4 02 32
43 CLAB4 03 33
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
my ($discard_columns, $label_column, $file);
my $result = GetOptions (
"discard_columns=s" => \$discard_columns,
"label_column=s" => \$label_column,
"file=s" => \$file,
);
usage() unless (-r $file && defined $label_column);
my @discard_columns;
if ($discard_columns) {
@discard_columns = eval $discard_columns;
}
foreach my $column (reverse sort numerically @discard_columns) {
# Stop silliness
if ($column == $label_column) {
die "You can't discard your label_column.";
}
# Each splice might move my label_column to the left...
if ($column < $label_column) {
$label_column--;
}
}
my %labels;
open (IN, $file) or die;
my $row = 1;
while (<IN>) {
chomp;
my @input = split /\t/;
my @output = @input;
# discard_columns
foreach my $column (reverse sort numerically @discard_columns) {
splice @output, $column, 1;
}
# label_column
# Grab the label
my $label = splice @output, $label_column, 1;
# Make sure it's unique
$labels{$label}++;
if ($labels{$label} > 1) {
$label = "$label CLAB$labels{$label}";
}
# Stick it on the front of the array
unshift @output, $label;
no warnings 'uninitialized';
print join "\t", @output;
print "\n";
$row++;
}
close IN;
# END MAIN
sub numerically { $a <=> $b }
sub usage {
print <<EOT;
microarray_to_R.pl \
--discard_columns "2..5,7,9,10" \
--label_column 1 \
--file All_Jan_03_08.txt
Read the microarray data in the file above and output a file format
that will make the default read.table in R happy.
discard_columns: The columns listed will be removed. The value is
a Perl
expression, so use commas and the range operator (..). Column
numbers start at zero.
label_column: The column which we will sent to R as the label for
each row. Column
numbers start at zero.
All of the values in label_column must be unique. If they are not
this
program makes all values unique by adding " CLAB#" to the end of
non-unique
labels, starting at 2. For example, these duplicate labels:
"NM_020552"
"NM_020552"
"NM_020552"
"NM_020552"
Are turned into these:
"NM_020552"
"NM_020552 CLAB2"
"NM_020552 CLAB3"
"NM_020552 CLAB4"
EOT
exit;
}
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.pm.org/pipermail/omaha-pm/attachments/20080302/545e2b5b/attachment.html
More information about the Omaha-pm
mailing list