[Classiccity-pm] basic idiot question

Sat Oct 25 23:43:57 CDT 2003

On Sat, Oct 25, 2003 at 08:46:32PM -0600, Jeff Scarbrough wrote:
> Now, the only problem I have is when data does not exist for the key, I get 
> an "uninitialized" kind of error when I write...I need to include something 
> on the order of:
> 
> if ! $array_1{$key_sort} {
>   $array_1{$key_sort} = " ";
> }
> 
> before I do the write... I'm sure the syntax there is faulty, but that's 

You're probably looking for "defined".
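
Here's a minimal sketch of that check.  The hash name and $key_sort come from
your snippet; the sample values, the @all_keys list and the value_or_blank
helper are made up for illustration, so adjust to taste:

```perl
#!/usr/bin/perl -w
use strict;

# made-up sample data; the middle timestamp has no entry
my %array_1  = ( '200310200947' => '21.78', '200310200949' => '3.16' );
my @all_keys = ( '200310200947', '200310200948', '200310200949' );

# hypothetical helper: return the stored value, or " " if there isn't one
sub value_or_blank {
  my ($hashref, $key) = @_;
  return defined($hashref->{$key}) ? $hashref->{$key} : " ";
}

for my $key_sort (sort @all_keys) {
  print $key_sort, ",", value_or_blank(\%array_1, $key_sort), "\n";
}
```

Doing the test at write time means you never have to stuff placeholder
values into the hash itself.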

I've been mulling this over and although I don't have all the details yet, I
think a hash of arrays might be the way to go.  The idea is to read one file
after another (however many doesn't matter), build a timestamp key for each
line, and push the data you want onto the array stored in that timestamp's
hash element.  Sorting the keys afterwards puts everything in time order.  I
think you said you were newish to Perl so here's a prog I whipped up for your
sample data:

=================
#!/usr/bin/perl -w

use strict;

my %bow; #whole ball of wax   

#<> reads each file named on the command line in turn, a line at a time
while (my $line = <>) {
  chomp $line;
  next unless $line =~ /\S/; #skip blank lines

  my (undef, undef, undef, undef, undef, undef, undef, undef,
      $year, $month, $day, $hour, $min, undef, undef)
      = split /,\s+/, $line;

  #add leading 0's so things sort properly - could use sprintf

  if (length($month) == 1) { $month = "0" . $month }
  if (length($day) == 1) { $day = "0" . $day }
  if (length($hour) == 1) { $hour = "0" . $hour }
  if (length($min) == 1) { $min = "0" . $min }

  my $timestamp = $year . $month . $day . $hour . $min;

  #now push the line of data onto the array
  push @{ $bow{$timestamp}}, $line;
}

use Data::Dumper;

print Dumper(\%bow);

======================

When run you get (long lines):

pkeck@dirdir:~$ ./jeff.pl jeff1.txt jeff2.txt
$VAR1 = {
          '200310200952' => [
                              '  293.4113, 2261.00,   18.16,    2,  74,  125,   1678,  1681,  2003,  10,  20,  9, 52,  CH4OP-1005, 4001'
                            ],
          '200310200947' => [
                              '  293.4077, 2711.00,   21.78,    1,  81,  125,   1766,  1766,  2003,  10,  20,  9, 47,  CH4OP-1005, 4001'
                            ],
          '200310200949' => [
                              '  293.4091,  382.00,    3.16,    2,  98,  121,   5894,  5994,  2003,  10,  20,  9, 49,  CH4OP-1006, 4001'
                            ],
          '200310200950' => [
                              '  293.4100, 2379.50,   19.11,    2,  74,  125,   1653,  1676,  2003,  10,  20,  9, 50,  CH4OP-1005, 4001',
                              '  293.4099,  320.00,    2.64,    2,  98,  121,   5854,  5889,  2003,  10,  20,  9, 50,  CH4OP-1006, 4001'
                            ],
          '200310200951' => [
                              '  293.4105, 2210.00,   17.75,    1,  71,  125,   1659,  1659,  2003,  10,  20,  9, 51,  CH4OP-1005, 4001'
                            ],
          '200310200948' => [
                              '  293.4085, 2588.50,   20.79,    2,  79,  125,   1697,  1751,  2003,  10,  20,  9, 48,  CH4OP-1005, 4001',
                              '  293.4084,  356.00,    2.94,    1,  99,  121,   5873,  5873,  2003,  10,  20,  9, 48,  CH4OP-1006, 4001'
                            ]
        };
pkeck@dirdir:~$
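
As the comment in the program hints, the zero-padding could be done with
sprintf instead of the four length checks.  A rough sketch (make_timestamp
is just a name I picked for illustration, it's not in the program above):

```perl
#!/usr/bin/perl -w
use strict;

# hypothetical helper: build a sortable key from the five date fields
sub make_timestamp {
  my ($year, $month, $day, $hour, $min) = @_;
  # %02d pads a number with leading zeros to two digits, %04d to four
  return sprintf "%04d%02d%02d%02d%02d", $year, $month, $day, $hour, $min;
}

print make_timestamp(2003, 10, 20, 9, 47), "\n";   # prints 200310200947
```

Every key comes out the same width, so a plain string sort on the keys is
also a sort by time.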

I just used Dumper because it gives a nice view of the data.  To access it
yourself you should probably use something like:

============
for my $timestmp (sort keys %bow) {
  for my $line ( @{ $bow{$timestmp} } ) {
    print $line, "\n";
  }
}
============

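Notice the syntax " @{ $bow{$timestmp} } " - the hash element $bow{$timestmp}
holds a reference to an array (an ARRAYREF), and wrapping it in @{ } gets
you the array itself.  The push in the program creates that array the first
time it sees a timestamp, so you mostly don't need to care.

That loop gives: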

pkeck@dirdir:~$ ./jeff.pl jeff1.txt jeff2.txt
  293.4077, 2711.00,   21.78,    1,  81,  125,   1766,  1766,  2003,  10,  20,  9, 47,  CH4OP-1005, 4001
  293.4085, 2588.50,   20.79,    2,  79,  125,   1697,  1751,  2003,  10,  20,  9, 48,  CH4OP-1005, 4001
  293.4084,  356.00,    2.94,    1,  99,  121,   5873,  5873,  2003,  10,  20,  9, 48,  CH4OP-1006, 4001
  293.4091,  382.00,    3.16,    2,  98,  121,   5894,  5994,  2003,  10,  20,  9, 49,  CH4OP-1006, 4001
  293.4100, 2379.50,   19.11,    2,  74,  125,   1653,  1676,  2003,  10,  20,  9, 50,  CH4OP-1005, 4001
  293.4099,  320.00,    2.64,    2,  98,  121,   5854,  5889,  2003,  10,  20,  9, 50,  CH4OP-1006, 4001
  293.4105, 2210.00,   17.75,    1,  71,  125,   1659,  1659,  2003,  10,  20,  9, 51,  CH4OP-1005, 4001
  293.4113, 2261.00,   18.16,    2,  74,  125,   1678,  1681,  2003,  10,  20,  9, 52,  CH4OP-1005, 4001
pkeck@dirdir:~$

Now here is where my fuzziness comes in- back on the line with all the
undef's you could be assigning stuff to real variables and then just pushing
onto the arrays the things you really care about, rather than the entire
lines like I did.  And I don't know what exactly you're going to do with
this stuff later- maybe it would make sense to make a hash of hashes rather
than arrays.  If you wanted to go really crazy you could replace the line 

    my (undef, undef, undef, undef, undef, undef, undef, undef,
        $year, $month, $day, $hour, $min, undef, undef)
        = split /,\s+/, $line;

with 

    my (undef, undef, undef, undef, undef, undef, undef, undef,
        $year, $month, $day, $hour, $min, $device, undef)
        = split /,\s+/, $line;

and 

push @{ $bow{$timestamp}}, $line;

with 

$bow{$timestamp}{$device} = $line;

Running with the Dumper again gives you:

pkeck@dirdir:~$ ./jeff.pl jeff1.txt jeff2.txt
$VAR1 = {
          '200310200952' => {
                              'CH4OP-1005' => '  293.4113, 2261.00,   18.16,    2,  74,  125,   1678,  1681,  2003,  10,  20,  9, 52,  CH4OP-1005, 4001'
                            },
          '200310200947' => {
                              'CH4OP-1005' => '  293.4077, 2711.00,   21.78,    1,  81,  125,   1766,  1766,  2003,  10,  20,  9, 47,  CH4OP-1005, 4001'
                            },
          '200310200949' => {
                              'CH4OP-1006' => '  293.4091,  382.00,    3.16,    2,  98,  121,   5894,  5994,  2003,  10,  20,  9, 49,  CH4OP-1006, 4001'
                            },
          '200310200950' => {
                              'CH4OP-1006' => '  293.4099,  320.00,    2.64,    2,  98,  121,   5854,  5889,  2003,  10,  20,  9, 50,  CH4OP-1006, 4001',
                              'CH4OP-1005' => '  293.4100, 2379.50,   19.11,    2,  74,  125,   1653,  1676,  2003,  10,  20,  9, 50,  CH4OP-1005, 4001'
                            },
          '200310200951' => {
                              'CH4OP-1005' => '  293.4105, 2210.00,   17.75,    1,  71,  125,   1659,  1659,  2003,  10,  20,  9, 51,  CH4OP-1005, 4001'
                            },
          '200310200948' => {
                              'CH4OP-1006' => '  293.4084,  356.00,    2.94,    1,  99,  121,   5873,  5873,  2003,  10,  20,  9, 48,  CH4OP-1006, 4001',
                              'CH4OP-1005' => '  293.4085, 2588.50,   20.79,    2,  79,  125,   1697,  1751,  2003,  10,  20,  9, 48,  CH4OP-1005, 4001'
                            }
        };
pkeck@dirdir:~$

Notice now you have little hashes under your big hash, with each line
stored under the device it came from.  One catch: if a device ever reports
twice in the same minute, the later line overwrites the earlier one.

To do the "simple print" for that data structure:

=========
for my $timestmp (sort keys %bow) { 
  for my $subkey ( sort keys %{ $bow{$timestmp} } ) {
    print $bow{$timestmp}{$subkey}, "\n";
  } 
} 
=========

Notice the syntax "%{ $bow{$timestmp} }" - $bow{$timestmp} holds a reference
to a hash (a HASHREF), and wrapping it in %{ } gets you the hash itself, so
you can call keys on it.  Again, you mostly don't need to care.

That loop gives:

pkeck@dirdir:~$ ./jeff.pl jeff1.txt jeff2.txt
  293.4077, 2711.00,   21.78,    1,  81,  125,   1766,  1766,  2003,  10,  20,  9, 47,  CH4OP-1005, 4001
  293.4085, 2588.50,   20.79,    2,  79,  125,   1697,  1751,  2003,  10,  20,  9, 48,  CH4OP-1005, 4001
  293.4084,  356.00,    2.94,    1,  99,  121,   5873,  5873,  2003,  10,  20,  9, 48,  CH4OP-1006, 4001
  293.4091,  382.00,    3.16,    2,  98,  121,   5894,  5994,  2003,  10,  20,  9, 49,  CH4OP-1006, 4001
  293.4100, 2379.50,   19.11,    2,  74,  125,   1653,  1676,  2003,  10,  20,  9, 50,  CH4OP-1005, 4001
  293.4099,  320.00,    2.64,    2,  98,  121,   5854,  5889,  2003,  10,  20,  9, 50,  CH4OP-1006, 4001
  293.4105, 2210.00,   17.75,    1,  71,  125,   1659,  1659,  2003,  10,  20,  9, 51,  CH4OP-1005, 4001
  293.4113, 2261.00,   18.16,    2,  74,  125,   1678,  1681,  2003,  10,  20,  9, 52,  CH4OP-1005, 4001
pkeck@dirdir:~$

Which should look the same as the output from the first way.
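
If you only care about some of the fields, you can store just those instead
of the whole line.  Here's a sketch using two of your sample lines.  I'm
guessing that the second column is the reading you're after - that's purely
an assumption on my part, so change the slice to pick whatever you need:

```perl
#!/usr/bin/perl -w
use strict;

# two sample lines copied from the output above
my @lines = (
  '  293.4100, 2379.50,   19.11,    2,  74,  125,   1653,  1676,  2003,  10,  20,  9, 50,  CH4OP-1005, 4001',
  '  293.4099,  320.00,    2.64,    2,  98,  121,   5854,  5889,  2003,  10,  20,  9, 50,  CH4OP-1006, 4001',
);

my %bow;

for my $line (@lines) {
  # list slice: fields 1 and 8 through 13 of the split, counting from 0
  # (field 1 as "the reading" is an assumption)
  my ($reading, $year, $month, $day, $hour, $min, $device)
      = (split /,\s+/, $line)[1, 8 .. 13];

  my $timestamp = sprintf "%04d%02d%02d%02d%02d",
                          $year, $month, $day, $hour, $min;

  # keep only the reading, keyed by timestamp and then by device
  $bow{$timestamp}{$device} = $reading;
}

for my $timestamp (sort keys %bow) {
  for my $device (sort keys %{ $bow{$timestamp} }) {
    print "$timestamp $device $bow{$timestamp}{$device}\n";
  }
}
```

The list slice saves you from writing out all those undef's.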

> the idea - same for each of the other arrays.  Perhaps a nice single malt 
> will aid comprehension....

I second that!

I don't know how much you've used hashes of hashes or hashes of arrays, but
they help get a handle on these complicated sets of data that you might not
be able to do much with on a line-by-line basis.
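
For instance, a hash of hashes makes your original "no data for this key"
problem easy to handle: walk the timestamps, and for each device print a
blank when nothing is there.  A rough sketch, with a few values lifted from
the second column of your sample lines (row_for and the column layout are
my own invention, not anything from your program):

```perl
#!/usr/bin/perl -w
use strict;

# timestamp -> device -> value; not every device reports every minute
my %bow = (
  '200310200947' => { 'CH4OP-1005' => '2711.00' },
  '200310200948' => { 'CH4OP-1005' => '2588.50', 'CH4OP-1006' => '356.00' },
  '200310200949' => { 'CH4OP-1006' => '382.00' },
);

# collect every device name seen at any timestamp
my %seen;
for my $timestamp (keys %bow) {
  $seen{$_} = 1 for keys %{ $bow{$timestamp} };
}
my @devices = sort keys %seen;

# hypothetical helper: one comma-separated row, " " where data is missing
sub row_for {
  my ($timestamp) = @_;
  my @cells = map {
    exists $bow{$timestamp}{$_} ? $bow{$timestamp}{$_} : " "
  } @devices;
  return join ",", $timestamp, @cells;
}

print join(",", "timestamp", @devices), "\n";
print row_for($_), "\n" for sort keys %bow;
```

Each row has one column per device, so the "uninitialized" warning never
comes up at write time.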

-- 
Paul Keck       pkeck at uga.edu         http://www.arches.uga.edu/~pkeck
University of Georgia                 http://www.uga.edu/ucns/telecom
EITS Network Engineering              mailto:pkeck at ediacara.org
    --Opinions mine.--                Go fighting anomalocaridids!!!