<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Yet another little program I wrote today.
<div><br class="webkit-block-placeholder"></div><div>Perl roolz. :)</div><div><br class="webkit-block-placeholder"></div><div>(Let's see if this RTF format will post to the list OK w/o line wrapping...)</div><div><br class="webkit-block-placeholder"></div><div>j</div><div><br class="webkit-block-placeholder"></div><div><br></div><div><br class="webkit-block-placeholder"></div><div><br class="webkit-block-placeholder"></div><div><div><font class="Apple-style-span" face="'Courier New'">$ cat in</font></div> <div><font class="Apple-style-span" face="'Courier New'">zero one two three four</font></div> <div><font class="Apple-style-span" face="'Courier New'">01 11 21 31 41</font></div> <div><font class="Apple-style-span" face="'Courier New'">02 12 22 32 42</font></div> <div><font class="Apple-style-span" face="'Courier New'">03 13 23 33 43</font></div> <div><font class="Apple-style-span" face="'Courier New'">01 11 21 31 41</font></div> <div><font class="Apple-style-span" face="'Courier New'">02 12 22 32 42</font></div> <div><font class="Apple-style-span" face="'Courier New'">03 13 23 33 43</font></div> <div><font class="Apple-style-span" face="'Courier New'">01 11 21 31 41</font></div> <div><font class="Apple-style-span" face="'Courier New'">02 12 22 32 42</font></div> <div><font class="Apple-style-span" face="'Courier New'">03 13 23 33 43</font></div> <div><font class="Apple-style-span" face="'Courier New'">01 11 21 31 41</font></div> <div><font class="Apple-style-span" face="'Courier New'">02 12 22 32 42</font></div> <div><font class="Apple-style-span" face="'Courier New'">03 13 23 33 43</font></div> <div><font class="Apple-style-span" face="'Courier New'">$ perl ./microarray_to_R.pl --label_column 4 --discard_columns "1..2" --file in </font></div> <div><font class="Apple-style-span" face="'Courier New'">four zero three</font></div> <div><font class="Apple-style-span" face="'Courier New'">41 01 31</font></div> <div><font class="Apple-style-span" face="'Courier 
New'">42 02 32</font></div> <div><font class="Apple-style-span" face="'Courier New'">43 03 33</font></div> <div><font class="Apple-style-span" face="'Courier New'">41 CLAB2 01 31</font></div> <div><font class="Apple-style-span" face="'Courier New'">42 CLAB2 02 32</font></div> <div><font class="Apple-style-span" face="'Courier New'">43 CLAB2 03 33</font></div> <div><font class="Apple-style-span" face="'Courier New'">41 CLAB3 01 31</font></div> <div><font class="Apple-style-span" face="'Courier New'">42 CLAB3 02 32</font></div> <div><font class="Apple-style-span" face="'Courier New'">43 CLAB3 03 33</font></div> <div><font class="Apple-style-span" face="'Courier New'">41 CLAB4 01 31</font></div> <div><font class="Apple-style-span" face="'Courier New'">42 CLAB4 02 32</font></div> <div><font class="Apple-style-span" face="'Courier New'">43 CLAB4 03 33</font></div></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><div><font class="Apple-style-span" face="'Courier New'">#!/usr/bin/perl -w</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">use strict;</font></div><div><font class="Apple-style-span" face="'Courier New'">use Getopt::Long;</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">my ($discard_columns, $label_column, $file);</font></div><div><font class="Apple-style-span" face="'Courier New'">my $result = GetOptions (</font></div><div><font class="Apple-style-span" face="'Courier New'"> "discard_columns=s" => 
\$discard_columns,</font></div><div><font class="Apple-style-span" face="'Courier New'"> "label_column=s" => \$label_column,</font></div><div><font class="Apple-style-span" face="'Courier New'"> "file=s" => \$file,</font></div><div><font class="Apple-style-span" face="'Courier New'">);</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">usage() unless (defined $file && -r $file && defined $label_column);</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">my @discard_columns;</font></div><div><font class="Apple-style-span" face="'Courier New'">if ($discard_columns) {</font></div><div><font class="Apple-style-span" face="'Courier New'"> @discard_columns = eval $discard_columns;</font></div><div><font class="Apple-style-span" face="'Courier New'">}</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">foreach my $column (reverse sort numerically @discard_columns) {</font></div><div><font class="Apple-style-span" face="'Courier New'"> # Stop silliness</font></div><div><font class="Apple-style-span" face="'Courier New'"> if ($column == $label_column) {</font></div><div><font class="Apple-style-span" face="'Courier New'"> die "You can't discard your label_column.";</font></div><div><font class="Apple-style-span" face="'Courier New'"> }</font></div><div><font class="Apple-style-span" face="'Courier New'"> # Each splice might move my label_column to the left...</font></div><div><font class="Apple-style-span" face="'Courier New'"> if ($column < $label_column) {</font></div><div><font class="Apple-style-span" face="'Courier New'"> $label_column--;</font></div><div><font class="Apple-style-span" face="'Courier New'"> 
}</font></div><div><font class="Apple-style-span" face="'Courier New'">}</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">my %labels;</font></div><div><font class="Apple-style-span" face="'Courier New'">open (IN, $file) or die "Can't open $file: $!";</font></div><div><font class="Apple-style-span" face="'Courier New'">my $row = 1;</font></div><div><font class="Apple-style-span" face="'Courier New'">while (<IN>) {</font></div><div><font class="Apple-style-span" face="'Courier New'"> chomp;</font></div><div><font class="Apple-style-span" face="'Courier New'"> my @input = split /\t/;</font></div><div><font class="Apple-style-span" face="'Courier New'"> my @output = @input;</font></div><div><font class="Apple-style-span" face="'Courier New'"> </font></div><div><font class="Apple-style-span" face="'Courier New'"> # discard_columns</font></div><div><font class="Apple-style-span" face="'Courier New'"> foreach my $column (reverse sort numerically @discard_columns) {</font></div><div><font class="Apple-style-span" face="'Courier New'"> splice @output, $column, 1;</font></div><div><font class="Apple-style-span" face="'Courier New'"> }</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> # label_column</font></div><div><font class="Apple-style-span" face="'Courier New'"> # Grab the label</font></div><div><font class="Apple-style-span" face="'Courier New'"> my $label = splice @output, $label_column, 1;</font></div><div><font class="Apple-style-span" face="'Courier New'"> # Make sure it's unique</font></div><div><font class="Apple-style-span" face="'Courier New'"> $labels{$label}++;</font></div><div><font class="Apple-style-span" face="'Courier New'"> if ($labels{$label} > 1) {</font></div><div><font class="Apple-style-span" face="'Courier New'"> $label 
= "$label CLAB$labels{$label}";</font></div><div><font class="Apple-style-span" face="'Courier New'"> }</font></div><div><font class="Apple-style-span" face="'Courier New'"> # Stick it on the front of the array</font></div><div><font class="Apple-style-span" face="'Courier New'"> unshift @output, $label;</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> no warnings 'uninitialized';</font></div><div><font class="Apple-style-span" face="'Courier New'"> print join "\t", @output;</font></div><div><font class="Apple-style-span" face="'Courier New'"> print "\n";</font></div><div><font class="Apple-style-span" face="'Courier New'"> $row++;</font></div><div><font class="Apple-style-span" face="'Courier New'">}</font></div><div><font class="Apple-style-span" face="'Courier New'">close IN;</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"># END MAIN</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">sub numerically { $a <=> $b }</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">sub usage {</font></div><div><font class="Apple-style-span" face="'Courier New'"> print <<EOT;</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'">microarray_to_R.pl \</font></div><div><font class="Apple-style-span" face="'Courier New'"> --discard_columns "2..5,7,9,10" 
\</font></div><div><font class="Apple-style-span" face="'Courier New'"> --label_column 1 \</font></div><div><font class="Apple-style-span" face="'Courier New'"> --file All_Jan_03_08.txt</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> Read the microarray data in the file above and output a file format </font></div><div><font class="Apple-style-span" face="'Courier New'"> that will make the default read.table in R happy.</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> discard_columns: The columns listed will be removed. The value is a Perl</font></div><div><font class="Apple-style-span" face="'Courier New'"> expression, so use commas and the range operator (..). Column numbers start at zero.</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> label_column: The column which we will send to R as the label for each row. Column</font></div><div><font class="Apple-style-span" face="'Courier New'"> numbers start at zero.</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> All of the values in label_column must be unique. If they are not, this </font></div><div><font class="Apple-style-span" face="'Courier New'"> program makes all values unique by adding " CLAB#" to the end of non-unique</font></div><div><font class="Apple-style-span" face="'Courier New'"> labels, starting at 2. 
For example, these duplicate labels:</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552"</font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552"</font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552"</font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552"</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> Are turned into these:</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552"</font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552 CLAB2"</font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552 CLAB3"</font></div><div><font class="Apple-style-span" face="'Courier New'"> "NM_020552 CLAB4"</font></div><div><font class="Apple-style-span" face="'Courier New'"> </font></div><div><font class="Apple-style-span" face="'Courier New'">EOT</font></div><div><font class="Apple-style-span" face="'Courier New'"> exit;</font></div><div><font class="Apple-style-span" face="'Courier New'">}</font></div><div><font class="Apple-style-span" face="'Courier New'"><br class="webkit-block-placeholder"></font></div><div><br class="webkit-block-placeholder"></div></div></body></html>