From westerman at purdue.edu Wed May 7 12:53:30 2008
From: westerman at purdue.edu (Rick Westerman)
Date: Wed, 07 May 2008 15:53:30 -0400
Subject: [Purdue-pm] May technical meeting in 6 days.
Message-ID: <482208BA.8020808@purdue.edu>

After canceling last month's meeting, I hope that everyone is fired up for this month's meeting, which will be held on Tuesday, May 13th, 6:30 - 8:00 (note the new time in honor of Dave's new job).

But we do need speakers! I have signed up for two provocatively titled talks:

* Does YAML::SYCK really S*CK? -- 20 minutes.
* A year later the dark side has seduced me. -- 5 minutes.

Does anyone else wish to speak? Surely you've discovered something interesting in the last couple of months. Please either get hold of me or directly edit the meeting page at:

http://pm.purdue.org/Wiki/wiki.pl/Meetings

Don't forget that YAPC::NA in Chicago is coming up Jun 16-20.

http://conferences.mongueurs.net/yn2008/

It is a cheap but very good, totally Perl-oriented conference. $100 registration. $200 tutorials.

And for people with more money and the inclination to travel to the west coast, OSCON (O'Reilly open source) is July 21-25. There are several Perl talks and Perl tutorials, along with talks on many other open source languages and projects.

--
Rick Westerman
westerman at purdue.edu

Bioinformatics specialist at the Genomics Facility.
Phone: (765) 494-0505 FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building

From westerman at purdue.edu Wed May 7 13:10:47 2008
From: westerman at purdue.edu (Rick Westerman)
Date: Wed, 07 May 2008 16:10:47 -0400
Subject: [Purdue-pm] More on conferences
In-Reply-To: <482208BA.8020808@purdue.edu>
References: <482208BA.8020808@purdue.edu>
Message-ID: <48220CC7.50105@purdue.edu>

And for the more biologically oriented Mongers, ISMB (the annual meeting of the International Society for Computational Biology) is coming back to North America and thus becomes at least theoretically affordable. It was last here (Detroit) in 2005. This year it is in Toronto, Canada, on July 19-23.

The major attraction for Mongers would be BOSC -- the Bioinformatics Open Source Conference -- which will be one of the SIGs featured at ISMB. There are also some other computer talks and tutorials.

--
Rick Westerman

From westerman at purdue.edu Tue May 13 10:23:36 2008
From: westerman at purdue.edu (Rick Westerman)
Date: Tue, 13 May 2008 13:23:36 -0400
Subject: [Purdue-pm] May technical meeting tonight.
In-Reply-To: <482208BA.8020808@purdue.edu>
References: <482208BA.8020808@purdue.edu>
Message-ID: <4829CE98.2080400@purdue.edu>

The tech meeting is from 6:30 onwards tonight, in ME 119 as usual. Mark and I are currently scheduled to talk. We may have other people pop up with small talks at the last moment.

--
Rick

From westerman at purdue.edu Thu May 22 13:30:22 2008
From: westerman at purdue.edu (Rick Westerman)
Date: Thu, 22 May 2008 16:30:22 -0400
Subject: [Purdue-pm] YAML
Message-ID: <4835D7DE.2050700@purdue.edu>

As people who attended the last PM meeting know, I talked about YAML::SYCK and how the YAML data serialization format can be used. Derrick asked how the performance of YAML compares to just reading a file. I tried two different methods to find this out.

First, I made up a random data set of about 300,000 people with addresses, names, phone numbers, etc. As one might expect, the YAML file is larger than a non-YAML file, since the YAML file has to contain tag words in addition to the data. Exactly how much larger depends on the ratio of keyword size to data size, but in the worst case one would probably expect no more than a 2:1 ratio. The YAML file would then be 3 times as large as the text file.

In the worst-case scenario YAML was about 2.5 times slower than a straight text file 'read and parse'. This is despite the YAML parser being written in C. In a better-case scenario YAML was still slower by a factor of 2. We are talking about seconds instead of minutes here; e.g., 21 seconds for YAML, 10 seconds for the text file read-n-parse.

Of course, at 300,000 records a person might just want to use a database instead. So how does YAML work with small real-life data sets?

For my second test I modified our pipeline routines (which use text files) to optionally read and write via YAML. There the data sets are much smaller -- we do have a file with 100,000+ records, but each record is fairly small -- and most of the other data sets have far fewer records, although more data per record.

So what is the conclusion? There is still a difference in read times between the YAML files and the text files, but since the files are small the difference may be a second, if that.

What is more troubling is that YAML has problems with data types.
I was having problems reading in data of the form '01234' until I realized that YAML was treating it as an octal number (except in cases like '09123', which contains the non-octal digit 9). This can be coded around but, still, it makes YAML not as friendly as desired.

YAML does have the advantage of producing easier-to-read files. For example, a YAML file could look like:

1:
  LastName: Westerman
  FirstName: Rick
  Phone: "(765) 494-0505"

instead of a more cryptic line in a text file:

1:Westerman:Rick:(765) 494-0505:

Bottom line:

1) YAML is slower than text read-n-parse, but not significantly so with sub-100,000 record files.
2) YAML's data conversion can be troubling.
3) YAML does produce a more friendly-to-edit file.

I am still up in the air about YAML's usefulness. I will probably continue to use it for some data files. I might slowly convert our pipeline to using it, but carefully, because of the data conversion problems.

--
Rick Westerman
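
The octal surprise described in the message above can be sketched in a few lines. This is only an illustration in Python, not YAML::Syck's actual code. The function names resolve_plain_scalar and parse_record are made up for the example, and the two patterns are simplified from the YAML 1.1 integer rules (underscores, hex and other forms are left out). It assumes a loader that applies those rules to unquoted values, which matches the behaviour reported here.

```python
import re

# Simplified YAML 1.1 integer patterns (an assumption for this sketch:
# underscores, hex, binary and sexagesimal forms are deliberately omitted).
OCTAL = re.compile(r"[-+]?0[0-7]+")
DECIMAL = re.compile(r"[-+]?(0|[1-9][0-9]*)")


def resolve_plain_scalar(text):
    """Return the value an implicit-typing loader might give an unquoted scalar."""
    if OCTAL.fullmatch(text):
        return int(text, 8)   # leading zero and only digits 0-7: read as octal
    if DECIMAL.fullmatch(text):
        return int(text)      # ordinary decimal integer
    return text               # anything else stays a string


def parse_record(line):
    """Split one colon-delimited text record; every field stays a string."""
    rec_id, last, first, phone = line.rstrip("\n").split(":")[:4]
    return {"id": rec_id, "LastName": last, "FirstName": first, "Phone": phone}


if __name__ == "__main__":
    print(resolve_plain_scalar("01234"))   # 668, the octal reading of 01234
    print(resolve_plain_scalar("09123"))   # stays '09123' because 9 is not an octal digit
    print(parse_record("1:Westerman:Rick:(765) 494-0505:"))
```

The plain text read-n-parse never changes '01234', which is why the problem only showed up after switching to YAML. In YAML itself, quoting the value (for example "01234") should keep it a string, which is one way to code around the conversion.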