[Classiccity-pm] basic idiot question

Jeff Scarbrough rail at uga.edu
Fri Oct 24 22:19:51 CDT 2003


At 07:08 PM 10/24/2003 -0500, you wrote:
>On Fri, 24 Oct 2003, Jeff Scarbrough wrote:
>
> > Today's project is a little more complicated:  I have two or three similar
> > data files from different instruments, and I need to read them all in and
> > output a file where the times recorded in each line match.  It's
>
>Can you give some examples of what the data looks like?

Sure... <G>

One file:

   293.4077, 2711.00,   21.78,    1,  81,  125,   1766,  1766, 2003,  10,  20,  9, 47,  CH4OP-1005, 4001
   293.4085, 2588.50,   20.79,    2,  79,  125,   1697,  1751, 2003,  10,  20,  9, 48,  CH4OP-1005, 4001
   293.4100, 2379.50,   19.11,    2,  74,  125,   1653,  1676, 2003,  10,  20,  9, 50,  CH4OP-1005, 4001
   293.4105, 2210.00,   17.75,    1,  71,  125,   1659,  1659, 2003,  10,  20,  9, 51,  CH4OP-1005, 4001
   293.4113, 2261.00,   18.16,    2,  74,  125,   1678,  1681, 2003,  10,  20,  9, 52,  CH4OP-1005, 4001

The other file:

   293.4084,  356.00,    2.94,    1,  99,  121,   5873,  5873, 2003,  10,  20,  9, 48,  CH4OP-1006, 4001
   293.4091,  382.00,    3.16,    2,  98,  121,   5894,  5994, 2003,  10,  20,  9, 49,  CH4OP-1006, 4001
   293.4099,  320.00,    2.64,    2,  98,  121,   5854,  5889, 2003,  10,  20,  9, 50,  CH4OP-1006, 4001

Columns are:

Day-Of-Year (not unique, and not identical across files: the DOY for the
same minute could be different in each file)
data1
data2
data3
data4
data5
data6
data7
year
month
day
hour
minute
ID
status

Every line ends with a newline character except the last one.

Output file will contain DOY, then data1 through status for each file.
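
A rough sketch of reading one record under the column layout above, written
in Python purely for illustration (the thread itself is about Perl). The
names parse_record and minute_key are hypothetical. It assumes each record
sits on a single line in the real file and that every record has exactly
15 comma-separated fields.

```python
def parse_record(line):
    """Split one comma-separated record into whitespace-stripped strings."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 15:  # assumption: DOY, data1..data7, 5 time fields, ID, status
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return fields


def minute_key(fields):
    """Return (year, month, day, hour, minute) as integers, from fields 8..12."""
    return tuple(int(x) for x in fields[8:13])


sample = ("   293.4085, 2588.50,   20.79,    2,  79,  125,   1697,  1751, "
          "2003,  10,  20,  9, 48,  CH4OP-1005, 4001")
fields = parse_record(sample)
key = minute_key(fields)
print(fields[0], key)
```

Keying on the year-through-minute columns rather than on the DOY sidesteps
the fact that the DOY for the same minute differs between files.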


>Basically, what you will want to do is iterate through both files at the
>same time, and parse the time.

This is sort of what I had in mind, but I will need to write a line of data 
if it exists in either file.  Also, the first column of each output line 
should hold the first-column value (the DOY) from the first file that has 
data for that minute.  Did that make sense?  I will be drawing graphs from 
the files, and the first column of the output file will be the x axis data.  
Data is to be comma separated, and the case of no data will be represented 
by consecutive commas.
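
The requirements above can be sketched for two files as a step-through
merge. This is Python for illustration only, not Perl, and the names
parse, merge_two and BLANK are hypothetical. It assumes both files are
already sorted by time, hold at most one record per minute, and keep each
record on one line.

```python
BLANK = [""] * 14  # data1..status left empty when a file has no record


def parse(line):
    """Return ((year, month, day, hour, minute), fields) for one record."""
    fields = [f.strip() for f in line.split(",")]
    return tuple(int(x) for x in fields[8:13]), fields


def merge_two(lines_a, lines_b):
    """Step through two time-sorted record lists, one output row per minute."""
    a = [parse(l) for l in lines_a if l.strip()]
    b = [parse(l) for l in lines_b if l.strip()]
    i = j = 0
    while i < len(a) or j < len(b):
        key_a = a[i][0] if i < len(a) else None
        key_b = b[j][0] if j < len(b) else None
        if key_b is None or (key_a is not None and key_a < key_b):
            fa, fb = a[i][1], None      # minute present only in file A
            i += 1
        elif key_a is None or key_b < key_a:
            fa, fb = None, b[j][1]      # minute present only in file B
            j += 1
        else:
            fa, fb = a[i][1], b[j][1]   # minute present in both files
            i += 1
            j += 1
        doy = (fa or fb)[0]             # DOY from the first file with data
        yield ",".join([doy] + (fa[1:] if fa else BLANK)
                             + (fb[1:] if fb else BLANK))


file_a = [
    "293.4077, 2711.00, 21.78, 1, 81, 125, 1766, 1766, 2003, 10, 20, 9, 47, CH4OP-1005, 4001",
    "293.4085, 2588.50, 20.79, 2, 79, 125, 1697, 1751, 2003, 10, 20, 9, 48, CH4OP-1005, 4001",
]
file_b = [
    "293.4084, 356.00, 2.94, 1, 99, 121, 5873, 5873, 2003, 10, 20, 9, 48, CH4OP-1006, 4001",
    "293.4091, 382.00, 3.16, 2, 98, 121, 5894, 5994, 2003, 10, 20, 9, 49, CH4OP-1006, 4001",
]
rows = list(merge_two(file_a, file_b))
for row in rows:
    print(row)
```

Every output row has the same number of fields, so a minute missing from
one file shows up as a run of consecutive commas in that file's position.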

One thing I wanted to try was to set it up to read more than two files.  We 
have three instruments right now, but may go as high as five or six in one 
location.  I'm not sure whether it's practical to write the program to do 
more than two at a time, though, at this point - I'd go for the easy one 
first, then complicate as necessary.

>If you can give lines of data as a test, I am sure one of us can write out
>a sample of code to handle it. I wouldn't consider this brute force, but
>it's the sanest way to handle the problem. If the data includes a date as
>well as the time, and is therefore guaranteed to be unique, you *could*
>parse one file into a hash using the time+date as the key and the entire
>line as the value, then scan the second file, and for every line with a
>matching key, delete the element from that hash and write it out to the
>output file... but that is so much messier, and requires more memory the
>larger the first file is.

I hadn't thought about the hash trick, but I was reading about hashes to 
see if they might be helpful...hmmm.  As a rule, the files are not large 
(<150k) so that might be an approach...though 'messy' doesn't appeal to me, 
the state of my office notwithstanding.
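
For files this small, a hash-based approach could look roughly like the
sketch below. It is Python rather than Perl (a dict standing in for a
hash), and it is a variation on the quoted suggestion, not the
delete-on-match version: every file is loaded into one table keyed by
minute, which also handles more than two files. The name merge_by_minute
is hypothetical, and the sketch assumes at most one record per minute per
file.

```python
def merge_by_minute(files):
    """files: one list of record lines per instrument. Returns merged rows."""
    table = {}  # (year, month, day, hour, minute) -> one slot per file
    for n, lines in enumerate(files):
        for line in lines:
            if not line.strip():
                continue
            fields = [f.strip() for f in line.split(",")]
            key = tuple(int(x) for x in fields[8:13])
            table.setdefault(key, [None] * len(files))[n] = fields
    rows = []
    for key in sorted(table):           # chronological order
        slots = table[key]
        # DOY comes from the first file that has data for this minute
        out = [next(f[0] for f in slots if f is not None)]
        for f in slots:
            out += f[1:] if f is not None else [""] * 14
        rows.append(",".join(out))
    return rows


file_a = [
    "293.4077, 2711.00, 21.78, 1, 81, 125, 1766, 1766, 2003, 10, 20, 9, 47, CH4OP-1005, 4001",
    "293.4085, 2588.50, 20.79, 2, 79, 125, 1697, 1751, 2003, 10, 20, 9, 48, CH4OP-1005, 4001",
]
file_b = [
    "293.4084, 356.00, 2.94, 1, 99, 121, 5873, 5873, 2003, 10, 20, 9, 48, CH4OP-1006, 4001",
    "293.4091, 382.00, 3.16, 2, 98, 121, 5894, 5994, 2003, 10, 20, 9, 49, CH4OP-1006, 4001",
]
merged = merge_by_minute([file_a, file_b])
for row in merged:
    print(row)
```

Because the table is sorted before output, the input files do not need to
be in time order, at the cost of holding everything in memory at once.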


>Hope this helps straighten out some of the mess...

Thanks, Mark.  I'm enjoying the learning process.  The last computer 
language I was formally trained in was FORTRAN 66.  My boss wrote a bunch of 
programs in BASIC, and I'm trying to show him that there are better 
solutions out there now....

Cheers,
Jeff 



