From nkuipers at uvic.ca Wed Sep 4 23:18:23 2002
From: nkuipers at uvic.ca (nkuipers)
Date: Wed Aug 4 00:11:08 2004
Subject: question about DBI's fetchrow_arrayref()
Message-ID: <3D76EDEE@wm2.uvic.ca>

Hello everyone,

I'm a little puzzled about what's actually going on with the following code
snippet (from Programming the Perl DBI, page 114):

1 my @stash;
2 while ( my $array_ref = $sth->fetchrow_arrayref ) {
3     push @stash, [ @$array_ref ];
4 }

5 foreach $array_ref ( @stash ) {
6     print "Row: @$array_ref\n";
7 }

It's line 3. It looks like a dereferenced arrayref stored in a reference to
an anonymous array? Did I get that right? I realize what it is intended to
do: store a copy of the values that the original reference points to and not
just the reference itself. Just wanted to make sure I understand the
mechanics of that line.

Thanks, hope everything is well,

Nathanael Kuipers

From abez at abez.ca Wed Sep 4 23:28:33 2002
From: abez at abez.ca (abez)
Date: Wed Aug 4 00:11:08 2004
Subject: question about DBI's fetchrow_arrayref()
In-Reply-To: <3D76EDEE@wm2.uvic.ca>
Message-ID:

@$array_ref - dereferences $array_ref; for all intents and purposes
@$array_ref is now @tmp_arr

[ @tmp_arr ] - copies all values of @tmp_arr into a new anonymous array [ ]
and gives you a reference to it. For instance you could write
[ @tmp_arr, @tmp_arr ] to get an anonymous array holding the values twice.

Yes, you are right, it copies the array.

On Wed, 4 Sep 2002, nkuipers wrote:

> Hello everyone,
>
> I'm a little puzzled about what's actually going on with the following code
> snippet (from Programming the Perl DBI, page 114):
>
> 1 my @stash;
> 2 while ( my $array_ref = $sth->fetchrow_arrayref ) {
> 3     push @stash, [ @$array_ref ];
> 4 }
>
> 5 foreach $array_ref ( @stash ) {
> 6     print "Row: @$array_ref\n";
> 7 }
>
> It's line 3. It looks like a dereferenced arrayref stored in a reference to
> an anonymous array? Did I get that right?
I realize what it is intended to
> do: store a copy of the values that the original reference points to and not
> just the reference itself. Just wanted to make sure I understand the
> mechanics of that line.
>
> Thanks, hope everything is well,
>
> Nathanael Kuipers
>

--
ABeZ------------ ------- ------ - ---------- -- ------------
http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca)
---- ------- ----------- ----------- - - ------ --------ABeZ

From darren at DarrenDuncan.net Thu Sep 5 13:42:44 2002
From: darren at DarrenDuncan.net (Darren Duncan)
Date: Wed Aug 4 00:11:08 2004
Subject: question about DBI's fetchrow_arrayref()
In-Reply-To: <3D76EDEE@wm2.uvic.ca>
Message-ID:

On Wed, 4 Sep 2002, nkuipers wrote:
> 1 my @stash;
> 2 while ( my $array_ref = $sth->fetchrow_arrayref ) {
> 3     push @stash, [ @$array_ref ];
> 4 }
> 5 foreach $array_ref ( @stash ) {
> 6     print "Row: @$array_ref\n";
> 7 }

Since we are fetching the whole row set into an array anyway before using
it, I find it much more efficient to use DBI's built-in function for
selecting all rows, rather than manually doing it one at a time. This is how
I use it in my own code:

    ... code for preparing and executing $sth ...
    ... if we get here there was no db error generated by above ...

    my ($rowset);
    eval {
        $rowset = $sth->fetchall_arrayref({}); # get array of hashes
    };
    if( $@ ) {
        ... register that an error happened ...
    }

If there was no error, then we loop over the result as normal, for example
like this:

    foreach my $row (@{$rowset}) {
        print "\n-----------------------------\n";
        foreach my $field_name (keys %{$row}) {
            my $field_value = $row->{$field_name};
            print "field '$field_name' contains '$field_value'\n";
        }
    }

There is little point in copying to a literal array, as it is just needless
copying. By leaving the data structure as a reference, it is easy to pass
around as a unit between functions or store it in a larger structure.
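
As an aside on passing the row set around by reference, here is a minimal sketch. It assumes nothing about a real database: the hand-built $rowset merely stands in for what fetchall_arrayref({}) would return (a reference to an array of hash references), and format_rows is a hypothetical helper, not part of DBI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for $sth->fetchall_arrayref({}): a reference to an array of
# hash references, one hash per row. The data here is made up.
my $rowset = [
    { id => 1, name => 'alice' },
    { id => 2, name => 'bob' },
];

# Hypothetical helper: takes the row set as a single reference and returns
# one "field=value, ..." line per row. No row data is copied on the way in.
sub format_rows {
    my ($rows) = @_;
    my @lines;
    for my $row (@$rows) {
        my @pairs;
        # Hash keys have no fixed order, so sort them for stable output.
        for my $field (sort keys %$row) {
            push @pairs, "$field=$row->{$field}";
        }
        push @lines, join ', ', @pairs;
    }
    return @lines;
}

print "$_\n" for format_rows($rowset);
```

Only the one reference is handed to the function; the rows themselves stay where they are.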
The code I used above (with fetchall) is actually part of a function in my
own class that wraps DBI; that function returns $rowset.

Also, since SQL databases by their definition don't have columns/fields in a
particular order, fetching field values by name will make sure you get the
right one, rather than taking a chance they are returned in a particular
numerical array index. That's why I return rows as hashes.

-- Darren Duncan

From darren at DarrenDuncan.net Thu Sep 5 17:14:33 2002
From: darren at DarrenDuncan.net (Darren Duncan)
Date: Wed Aug 4 00:11:08 2004
Subject: Mac OS X.2 (Jaguar) detailed review by John Siracusa
Message-ID:

Hello. This is a cross-post to Victoria tech groups I am in, but there's
something for everyone.

Continuing the long tradition of excellence, John Siracusa of
ArsTechnica.com has just today published a detailed review of the latest
version of Mac OS X (code-named Jaguar). You can see it here:

http://www.arstechnica.com/reviews/02q3/macosx-10.2/macosx-10.2-1.html

For those not familiar with this series, John has published reviews of every
Mac OS X version since DP2 in 1999, since then including DP3, DP4, Public
Beta, 10.0, 10.1, 10.2, and a couple of others in between. I have also read
every one of them when each was new, and I can follow them fairly easily.

Each of the reviews has been around 14 pages long, with judicious use of
pictures or diagrams, and has taken me over 2 hours to read. But they are
well written, educational, and never boring.

The articles cover a lot of features, user interface issues, and
how-it-works technical information, done to please either a traditional Mac
user or a Unix/Linux user, and programmers or other people who like to know
what's so great (or missing) about Mac OS X. There are even speed tests on
two different machines.

I don't know what else John Siracusa has written, but I find his Mac OS X
series the best overview of the OS that I have seen anywhere.
-- Darren Duncan From Peter at PSDT.com Sat Sep 7 16:35:28 2002 From: Peter at PSDT.com (Peter Scott) Date: Wed Aug 4 00:11:08 2004 Subject: September Victoria.pm meeting Message-ID: <4.3.2.7.2.20020907143346.00b45dd0@shell2.webquarry.com> Date choices for our next meeting are September 17, 18, or 19. I've heard one (weak) preference for the 18th (Wednesday). Any stronger preferences for other choices? -- Peter Scott Pacific Systems Design Technologies http://www.perldebugged.com/ From abez at abez.ca Mon Sep 9 00:32:47 2002 From: abez at abez.ca (abez) Date: Wed Aug 4 00:11:08 2004 Subject: September Victoria.pm meeting In-Reply-To: <4.3.2.7.2.20020907143346.00b45dd0@shell2.webquarry.com> Message-ID: I have no opinion. On Sat, 7 Sep 2002, Peter Scott wrote: > Date choices for our next meeting are September 17, 18, or 19. I've > heard one (weak) preference for the 18th (Wednesday). Any stronger > preferences for other choices? > > -- > Peter Scott > Pacific Systems Design Technologies > http://www.perldebugged.com/ > -- ABeZ------------ ------- ------ - ---------- -- ------------ http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca) ---- ------- ----------- ----------- - - ------ --------ABeZ From peter at PSDT.com Mon Sep 9 11:14:11 2002 From: peter at PSDT.com (Peter Scott) Date: Wed Aug 4 00:11:08 2004 Subject: September Victoria.pm meeting Message-ID: <4.3.2.7.2.20020909091100.00ab6c60@shell2.webquarry.com> Okay, based upon preferences I've received, the next meeting will be on Thursday, September 19. Shall we try again for the object-oriented programming talk or do we have any suggestions for another topic? 
--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/

From darren at DarrenDuncan.net Mon Sep 9 11:30:37 2002
From: darren at DarrenDuncan.net (Darren Duncan)
Date: Wed Aug 4 00:11:08 2004
Subject: September Victoria.pm meeting
In-Reply-To: <4.3.2.7.2.20020909091100.00ab6c60@shell2.webquarry.com>
Message-ID:

On Mon, 9 Sep 2002, Peter Scott wrote:
> Okay, based upon preferences I've received, the next meeting will be on
> Thursday, September 19. Shall we try again for the object-oriented
> programming talk or do we have any suggestions for another topic?
> --
> Peter Scott
> Pacific Systems Design Technologies
> http://www.perldebugged.com/

What do you mean by "again"? Was there a previous meeting where you
discussed object-orientivity? I thought the group was brand new and had had
no meetings so far. -- Darren Duncan

From abez at abez.ca Mon Sep 9 11:35:52 2002
From: abez at abez.ca (abez)
Date: Wed Aug 4 00:11:08 2004
Subject: September Victoria.pm meeting
In-Reply-To: <4.3.2.7.2.20020909091100.00ab6c60@shell2.webquarry.com>
Message-ID:

OOP Perl might be keeping people from the meeting. So if one were to come to
the meeting one would have a choice of hearing it or not. So do come if you
want to hear about OOP Perl and do come if you don't :)

I still have my note-cards :)

Other possible topics I can think of are:
CGI Templating Perl/TK Perl/GTK

Personally I've never gotten around to GUIs in Perl and would be interested
in anything that would be reasonably cross-platform.

Abram

On Mon, 9 Sep 2002, Peter Scott wrote:

> Okay, based upon preferences I've received, the next meeting will be on
> Thursday, September 19. Shall we try again for the object-oriented
> programming talk or do we have any suggestions for another topic?
> -- > Peter Scott > Pacific Systems Design Technologies > http://www.perldebugged.com/ > -- ABeZ------------ ------- ------ - ---------- -- ------------ http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca) ---- ------- ----------- ----------- - - ------ --------ABeZ From abez at abez.ca Mon Sep 9 11:36:46 2002 From: abez at abez.ca (abez) Date: Wed Aug 4 00:11:08 2004 Subject: September Victoria.pm meeting In-Reply-To: Message-ID: We keep attempting a talk on OOP-Perl BUT there usually aren't enough people to make it worth it so we discuss solutions to people's current problems and other perl related fun. On Mon, 9 Sep 2002, Darren Duncan wrote: > On Mon, 9 Sep 2002, Peter Scott wrote: > > Okay, based upon preferences I've received, the next meeting will be on > > Thursday, September 19. Shall we try again for the object-oriented > > programming talk or do we have any suggestions for another topic? > > -- > > Peter Scott > > Pacific Systems Design Technologies > > http://www.perldebugged.com/ > > What do you mean by "again"? Was there a previous meeting where you > discussed object-orientivity? I thought the group was brand new and as > yet had no meetings so far. -- Darren Duncan > -- ABeZ------------ ------- ------ - ---------- -- ------------ http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca) ---- ------- ----------- ----------- - - ------ --------ABeZ From Peter at PSDT.com Mon Sep 9 11:44:16 2002 From: Peter at PSDT.com (Peter Scott) Date: Wed Aug 4 00:11:08 2004 Subject: September Victoria.pm meeting In-Reply-To: References: <4.3.2.7.2.20020909091100.00ab6c60@shell2.webquarry.com> Message-ID: <4.3.2.7.2.20020909094237.00ab23d0@shell2.webquarry.com> At 09:30 AM 9/9/02 -0700, Darren Duncan wrote: >On Mon, 9 Sep 2002, Peter Scott wrote: > > Okay, based upon preferences I've received, the next meeting will be on > > Thursday, September 19. 
Shall we try again for the object-oriented
> > programming talk or do we have any suggestions for another topic?
>
>What do you mean by "again"? Was there a previous meeting where you
>discussed object-orientivity? I thought the group was brand new and as
>yet had no meetings so far. -- Darren Duncan

This will be meeting #4, so we're new but not virgin. But attendance has
been low due (we think) to summer.

--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/

From darren at DarrenDuncan.net Mon Sep 9 11:55:47 2002
From: darren at DarrenDuncan.net (Darren Duncan)
Date: Wed Aug 4 00:11:08 2004
Subject: September Victoria.pm meeting
In-Reply-To:
Message-ID:

On Mon, 9 Sep 2002, abez wrote:
> We keep attempting a talk on OOP-Perl BUT there usually aren't enough
> people to make it worth it so we discuss solutions to people's current
> problems and other perl related fun.

For the present I agree with this idea. Just make up a topic at the meeting
itself, based on what people who are there need to know or want help with or
want to share.

Personally, I like OOP and use it wherever possible in Perl programs of any
decent size. More specifically, I use OOP for all of my CGI scripts and web
applications. The main places where I don't use OOP are smaller scripts
where, for example, I am extracting data from a text file and outputting
derived data. But that is just because said scripts are relatively simple
(100-200 lines); when these become more complicated, like my 10,000+ line
web/database apps, they will go OOP. I find OOP makes it much easier to keep
large programs organized, easy to fix, and easy to add features to.

-- Darren Duncan

From abez at abez.ca Sat Sep 14 01:48:53 2002
From: abez at abez.ca (abez)
Date: Wed Aug 4 00:11:08 2004
Subject: Formula Evaluation In Perl
Message-ID:

Since Perl can effectively evaluate Perl code, I thought wouldn't it be neat
to evaluate formulas. Turns out that with regexes it is pretty easy. This
also got me thinking..
one could effectively do some symbolic math just by mucking around with
strings. Symbolic math is what I'll define as executing functions as symbols
rather than discrete values. For instance dy/dx(x^4)=4*x**(4-1), which I
"demo'd" poorly.

You can type the following into this system and it will work well:

sort(5,3,4,1)
x=2;y=3;x*y*x*log(x)
sin=1;sin(sin)
x=3.14;cos(x)*10

I hacked in the derivative function at the last second and it only works if
it's the only outer function. It also only works for simple degrees of a
symbol, written with Perl's ** operator:

x=2;y=4
derivative(x,x**y)

would produce 32; it would do a symbolic differentiation before evaluating,
though. None of this took very long either. It was pretty cool.

#!/usr/bin/perl
#Sad Sad "Symbolic Math"
my $in;
%hash = ();
while($in=<>) {
    chomp($in);
    if ($in =~ /^derivative\((.*)\)\s*$/) {
        # this is a hack, more so a proof
        # of concept; a better design would
        # be needed so we'd know symbolic
        # functions from normal functions
        $in = $1;
        $in = derivative(split(",",$in));
    }
    # every bare word not followed by "(" is treated as a variable
    my @vars = ($in =~ /([a-zA-Z]+)\b(?!\()/g);
    my %symhash = ();
    foreach (@vars) {
        $symhash{$_} = 0;
    }
    my @syms = keys %symhash;
    my $str = $in;
    foreach (@syms) {
        if (!defined($hash{$_})) { $hash{$_} = 0 }
        # rewrite the variable into a lookup in %hash before eval
        $in =~ s/$_\b(?!\()/\$hash{$_}/g;
    }
    print join(",",eval $in.";"),"\n";
    warn $@ if $@;
    foreach my $key (sort keys %hash) {
        print "$key = ".$hash{$key}. " ";
    }
    print "\n";
}
#bad demo but shows some of the power of string manipulation
#for doing math..
sub derivative {
    my $symbol = shift;
    my $formula = shift;
    #exponentiation
    #d/dx x**y == (y)*x**(y-1)
    $formula =~ s/$symbol\*\*([a-zA-Z]+\b)/$symbol\*\*($1-1)*$1/g;
    return $formula;
}

--
ABeZ------------ ------- ------ - ---------- -- ------------
http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca)
---- ------- ----------- ----------- - - ------ --------ABeZ

From nkuipers at uvic.ca Wed Sep 18 13:49:44 2002
From: nkuipers at uvic.ca (nkuipers)
Date: Wed Aug 4 00:11:08 2004
Subject: Ideas?
Message-ID: <3D8A0C86@wm2.uvic.ca>

Hello all,

I have a bit of a problem. To present it, I need to first give a bit of a
biology primer.

A DNA sequence can be represented as a string of A,G,C,T, which are 1-letter
representations of different nucleotides. Think GATTACA :). Often, a
sequence is considered in blocks of 3 nucleotides; this block is called a
codon. An array of codons occupies a "reading frame", and for a given
sequence there are 6 reading frames. For example, for
ACG|GTC|TTT|CGA|TAA|AAA... the frames are:

1)as written
2)remove the first nucleotide from 1), giving CGG|TCT|TTC|GAT|AAA|A...
3)remove the first nucleotide from 2), giving GGT|CTT|TCG|ATA|AAA...

The other three frames are derived with similar mechanics, but the original
sequence is first reversed, then "complemented" (essentially, tr/ACGT/TGCA/).

I am interested in finding all instances of 3 specific codons, and have
created 2 regex objects (forward and reverse complement, for a total of 6
codons) that do this perfectly. I am also interested in knowing the
locations of each matched codon in the string. Currently I am using the pos
function, and this is fine for the first frame in either orientation.
But...my current implementation of creating the next frame involves removing
the current first nucleotide from the sequence with s/^\w//, which
compromises the "absolute" position of a match with pos. I need ideas
please. Arrays? Tmp vars? Adding/subtracting an appropriate integer to the
pos return (easy, viable, but sort of messy as I imagine it)? Is a better
logical foundation needed? I am quite sure I could come up with an answer to
this with more thought but wanted to hear other opinions which are likely
more elegant than mine. How would you best do a frame-specific search while
still being able to annotate the match location based on the original,
untouched sequence?
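
One way to approach the question above is to leave the sequence untouched and step through it by offset, so every reported position is already relative to the original string. What follows is only a sketch under assumptions: codon_positions is a hypothetical name, the target codons are assumed to be the three stop codons (TAA, TAG, TGA), and the sequence is the short example from the post with the trailing "..." dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report where any of the wanted codons occur in each
# of the three forward reading frames, without modifying the sequence.
# Positions are 1-based offsets into the original string.
sub codon_positions {
    my ($seq, @wanted) = @_;
    my %is_wanted = map { $_ => 1 } @wanted;
    my %hits;
    for my $frame (1 .. 3) {
        $hits{$frame} = [];
        # Frame N starts at 0-based offset N-1; step one codon at a time
        # and ignore an incomplete codon at the end.
        for (my $i = $frame - 1; $i + 3 <= length $seq; $i += 3) {
            my $codon = substr($seq, $i, 3);
            push @{ $hits{$frame} }, $i + 1 if $is_wanted{$codon};
        }
    }
    return \%hits;
}

# Example sequence from the post; the stop codons are an assumed target set.
my $seq  = 'ACGGTCTTTCGATAAAAA';
my $hits = codon_positions($seq, qw(TAA TAG TGA));
print "frame $_: @{ $hits->{$_} }\n" for 1 .. 3;
```

For this input the only hit is TAA in frame 1, starting at position 13. The other three frames could be handled by running the same routine on the reverse complement (reverse, then tr/ACGT/TGCA/); a hit at position p there covers positions length-p-1 through length-p+1 of the original sequence.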
From peter at PSDT.com Wed Sep 18 14:03:49 2002
From: peter at PSDT.com (Peter Scott)
Date: Wed Aug 4 00:11:08 2004
Subject: Meeting reminder
Message-ID: <4.3.2.7.2.20020918120016.00b10e40@shell2.webquarry.com>

Hello, this is a reminder that Victoria.pm will have its first meeting of
the fall (well, close enough) tomorrow (Thursday 19th). Meet between 6:45
and 7 at the art gallery area just inside the entrance to UVic's McPherson
Library and we'll decamp to a conference room at 7. If you arrive later we
usually leave a sign or the front desk might know where we are. Email for
directions if you need them.

This might finally be the night of the Object-Oriented Perl Programming
talk...!

--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/

From Peter at PSDT.com Wed Sep 18 15:24:06 2002
From: Peter at PSDT.com (Peter Scott)
Date: Wed Aug 4 00:11:08 2004
Subject: Ideas?
In-Reply-To: <3D8A0C86@wm2.uvic.ca>
Message-ID: <4.3.2.7.2.20020918120608.00b0d240@shell2.webquarry.com>

At 11:49 AM 9/18/02 -0700, nkuipers wrote:
>Hello all,
>
>I have a bit of a problem. To present it, I need to first give a bit of a
>biology primer.
>
>A DNA sequence can be represented as a string of A,G,C,T, which are 1-letter
>representations of different nucleotides. Think GATTACA :). Often, a
>sequence is considered in blocks of 3 nucleotides; this block is called a
>codon. An array of codons occupies a "reading frame", and for a given
>sequence there are 6 reading frames. For example, for
>ACG|GTC|TTT|CGA|TAA|AAA... the frames are:
>
>1)as written
>2)remove the first nucleotide from 1), giving CGG|TCT|TTC|GAT|AAA|A...

I'm confused. You have 5 terminal A's in (1) but 4 in (2). How is it still a
reading frame since it has an incomplete codon on the end? If you remove a
nucleotide then surely you no longer have an arr

>3)remove the first nucleotide from 2), giving GGT|CTT|TCG|ATA|AAA...

Ditto. I guess the ... is what's throwing me off.
When you say "for a given sequence there are 6 reading frames" and then talk about stripping off the first letter it seems to violate your definition. I could guess at what you mean, but I'd prefer an example where you show a complete (if artificially small) sequence. >The other three frames are derived with similar mechanics, but the original >sequence is first reversed, then "complemented" (essentially, tr/ACGT/TGCA/). > >I am interested in finding all instances of 3 specific codons, and have >created 2 regex objects (forward and reverse complement, for a total of 6 >codons) that do this perfectly. I am also interested in knowing the >locations >of each matched codon in the string. Currently I am using the pos function, >and this is fine for the first frame in either orientation. But...my current >implementation of creating the next frame involves removing the current first >nucleotide from the sequence with s/^\w// which comprimises the "absolute" >position of a match with pos. I need ideas please. Arrays? Tmp vars? >Adding/subtracting appropriate integer to the pos return (easy,viable, but >sort of messy as I imagine it). A better logical foundation is needed? I am >quite sure I could come up with an answer to this with more thought >but wanted >to hear other opinions which are likely more elegant than mine. How >would you >best do a frame-specific search while still being able to annotate the match >location based on the original, untouched sequence? I'd have a better idea once I see an example. One possibility is to just change the first character to a non-letter. -- Peter Scott Pacific Systems Design Technologies http://www.perldebugged.com/ From Peter at PSDT.com Wed Sep 18 19:27:23 2002 From: Peter at PSDT.com (Peter Scott) Date: Wed Aug 4 00:11:08 2004 Subject: Fwd: Answer to my own problem Message-ID: <4.3.2.7.2.20020918171509.00b08cf0@shell2.webquarry.com> Hah, I guess the mailing list thought you were trying to enter a command. 
Hopefully the indentation will stop that this time.

>Date: Wed, 18 Sep 2002 19:16:47 -0500
>From: owner-victoria-pm@pm.org
>X-Authentication-Warning: mail.pm.org: majordomo set sender to
>    owner-victoria-pm@pm.org using -f
>To: victoria-pm-approval@pm.org
>Subject: BOUNCE victoria-pm@pm.org: Admin request of type /^sub\b/i at
>    line 3
>
>
>From peter@psdt.com Wed Sep 18 19:16:46 2002
>Received: from cascara.uvic.ca (root@cascara.uvic.ca [142.104.5.28])
>    by mail.pm.org (8.11.6/8.11.3) with ESMTP id g8J0GjA30721
>    for ; Wed, 18 Sep 2002 19:16:45 -0500
>Received: from wm2.uvic.ca (ntsrvr5.comp.uvic.ca [142.104.5.65])
>    by cascara.uvic.ca (8.12.4/8.12.4) with ESMTP id g8J05Wof484896
>    for ; Wed, 18 Sep 2002 17:05:32 -0700
>X-WebMail-UserID: nkuipers
>Date: Wed, 18 Sep 2002 17:05:32 -0700
>Sender: nkuipers
>From: nkuipers
>To: victoria-pm@pm.org
>X-EXP32-SerialNo: 00003609
>Subject: Answer to my own problem
>Message-ID: <3D89190E@wm2.uvic.ca>
>Mime-Version: 1.0
>Content-Type: text/plain; charset="ISO-8859-1"
>Content-Transfer-Encoding: 7bit
>X-Mailer: WebMail (Hydra) SMTP v3.62
>X-UVic-Virus-Scanned: OK - Passed virus scan by Sophos v3.59 (sophie)
>on cascara
>X-Scanned-By: MIMEDefang 2.19 (www . roaringpenguin . com / mimedefang)
>
>Involved setting pos before calling it on a match location. Simple.
>
>sub scan_for_stops {
>    my ($seqref, $patternref, $type) = @_;
>    my $frame = 1;
>    my ($codon, $list);
>
>    SCANNER:
>    my $stoplocations = "$type$frame(";
>    pos $$seqref = $frame - 1;    # frame N starts at 0-based offset N-1
>    while ( $$seqref =~ m/(\w{3})/g and $codon = $1 ) {
>        if ( $codon =~ m/$$patternref/ ) {
>            # pos is just past the codon, so pos - 2 is its 1-based start
>            $stoplocations .= (pos($$seqref) - 2) . '&';
>        }
>    }
>    $stoplocations =~ s/&$//;  #just some formatting...
>    $stoplocations .= ") ";    #...to make it look nice
>    $list .= $stoplocations;
>    $frame++;
>    goto SCANNER unless $frame > 3;
>
>    $list =~ s/ $//;
>    return $list;
>}

I doubt your pattern is so long that you save more in passing a reference
than you do in dereferencing.
-- Peter Scott Pacific Systems Design Technologies http://www.perldebugged.com/ From peter at PSDT.com Thu Sep 19 18:25:41 2002 From: peter at PSDT.com (Peter Scott) Date: Wed Aug 4 00:11:08 2004 Subject: Meeting Message-ID: <4.3.2.7.2.20020919162404.00b14d40@shell2.webquarry.com> Reminder - Victoria.pm meeting tonight. I will bring a printout of SelfGol. If we have time after the O-O talk we can see how far we can get in figuring out how it works. If you've never seen Damian Conway's SelfGol presentation... well, it's worth it... -- Peter Scott Pacific Systems Design Technologies http://www.perldebugged.com/ From darren at DarrenDuncan.net Fri Sep 20 00:48:04 2002 From: darren at DarrenDuncan.net (Darren Duncan) Date: Wed Aug 4 00:11:08 2004 Subject: future meetings, more on XSL Message-ID: Hello. I have a few follow-up tidbits after today's meeting. First of all, I wanted to let you know that thursdays work best for me for future meetings (today it was a coincidence), transportation-wise. That is, since I normally travel by bus and I would take a 70 to somewhere near the Swartz Bay ferry terminal, I would have to leave down-town at 8:40 or earlier, meaning leaving UVIC at about 8:00pm or earlier. If we had a meeting some other day of the week than thursday we would need to have it an hour earlier (6-8), or I would leave half-way through at 8:00pm. But on thursday a family member who drives has their own event to attend that is also at UVIC and between 7 and 9. So if we meet thursday I can ride with them in and out as I did today. So as long as this is the case I recommend we keep to thursdays where possible. On the second issue, regarding my recommendation about using XSL and DOMs to solve a problem and have it last for the long term, let me also add that besides the normal control structures like conditionals and loops, XSL also supports function-like reusable constructs called templates. But templates don't just have to be called by name. 
They can also be called automatically when certain data is encountered. For
example, if you have a set of DOM nodes representing form fields, you can
simply say "apply-templates" on the node list, and appropriate templates
would be called automatically. The trick is just to define a template as
saying "match on varname having value or matching pattern". So you could
define a template for each form field type and it is automatically called
when a field of that type is to be made, as defined in your field metadata
node list. Yes, XSL is quite powerful. And you're saved having to parse the
XSL/XML yourself.

As for the DOM, given its tree structure, nodes keep the best of arrays and
hashes rolled into one. Any node can have any number of node children; the
node names are like hash keys, but you can have multiple siblings with the
same name; all nodes have an order with a first child/sibling and subsequent
ones in order. DOMs are also easy to serialize and deserialize (they are
simply parsed XML), without any manual work from you. Each node can have any
number of attributes.

That's all for now.

-- Darren Duncan

From abez at abez.ca Fri Sep 20 01:09:31 2002
From: abez at abez.ca (abez)
Date: Wed Aug 4 00:11:08 2004
Subject: Perl & Graphs
Message-ID:

Might be useful to someone.

vcg as well as AT&T graphviz are both great software packages that let you
easily create graphs from a list of defined nodes and edges. These apps are
great as they simplify making a graph down to just declaring nodes and
edges. BANG the graph is plotted for you!

Here's a quick script to convert from .vcg to .dot. It won't work for
everything but it works for the .vcgs I produced.
#!/usr/bin/perl
my ($file1) = @ARGV;
my @out = ();
# read the .vcg input from the named file, or from STDIN if none is given
if ($file1) {
    open(FILE, $file1) or die "Cannot open $file1: $!";
    @out = <FILE>;
    close(FILE);
} else {
    @out = <STDIN>;
}
my $title = "graph";
my @nodes = ();
my @edges = ();
# pick out the graph title, the node titles and the edge endpoints
foreach my $line (@out) {
    if ($line =~ /^\s*title:\s+"(.*)"\s*$/) {
        $title = $1;
    } elsif ($line =~ /^\s*node:\s*\{\s*title:\s+"(.*)"\s*\}\s*$/) {
        push @nodes,$1;
    } elsif ($line =~ /^\s*edge:\s*\{\s*sourcename:\s+"(.*)"\s*targetname:\s+"(.*)"\s*\}\s*$/) {
        push @edges,[$1,$2];
    }
}
# emit the same graph in dot syntax
print "digraph $title \{\n";
foreach my $node (@nodes) {
    print "\"$node\";\n";
}
foreach my $edge (@edges) {
    print "\"".$edge->[0] ."\" -> \"".$edge->[1]."\";\n";
}
print "}\n";

Here's an AT&T graphviz graph detailing dependencies of gawk .c files. If
you're in SENG420 this term, these 2 graphs are copyright Abram Hindle so
don't copy them (sorry, I have to cover myself). (c) 2002 Abram Hindle.

digraph TITLE { "re.c"; "gawkmisc.c"; "eval.c"; "field.c"; "missing.c"; "version.c"; "node.c"; "builtin.c"; "regex.c"; "msg.c"; "array.c"; "awktab.c"; "main.c"; "getopt.c"; "alloca.c"; "getopt1.c"; "random.c"; "dfa.c"; "io.c"; "array.c" -> "array.c"; "array.c" -> "awktab.c"; "array.c" -> "dfa.c"; "array.c" -> "eval.c"; "array.c" -> "field.c"; "array.c" -> "io.c"; "array.c" -> "main.c"; "awktab.c" -> "array.c"; "awktab.c" -> "awktab.c"; "awktab.c" -> "builtin.c"; "awktab.c" -> "dfa.c"; "awktab.c" -> "eval.c"; "awktab.c" -> "field.c"; "awktab.c" -> "getopt.c"; "awktab.c" -> "io.c"; "awktab.c" -> "main.c"; "awktab.c" -> "msg.c"; "awktab.c" -> "node.c"; "awktab.c" -> "random.c"; "awktab.c" -> "re.c"; "awktab.c" -> "regex.c"; "awktab.c" -> "version.c"; "builtin.c" -> "awktab.c"; "builtin.c" -> "builtin.c"; "builtin.c" -> "eval.c"; "builtin.c" -> "main.c"; "builtin.c" -> "node.c"; "builtin.c" -> "re.c"; "dfa.c" -> "alloca.c"; "dfa.c" -> "array.c"; "dfa.c" -> "awktab.c"; "dfa.c" -> "builtin.c"; "dfa.c" -> "dfa.c"; "dfa.c" -> "eval.c"; "dfa.c" -> "field.c"; "dfa.c" -> "gawkmisc.c"; "dfa.c" -> "getopt.c"; "dfa.c" -> "getopt1.c";
"dfa.c" -> "io.c"; "dfa.c" -> "main.c"; "dfa.c" -> "msg.c"; "dfa.c" -> "node.c"; "dfa.c" -> "random.c"; "dfa.c" -> "re.c"; "dfa.c" -> "regex.c"; "dfa.c" -> "version.c"; "eval.c" -> "array.c"; "eval.c" -> "eval.c"; "eval.c" -> "field.c"; "eval.c" -> "getopt.c"; "eval.c" -> "io.c"; "eval.c" -> "main.c"; "eval.c" -> "re.c"; "eval.c" -> "regex.c"; "field.c" -> "awktab.c"; "field.c" -> "eval.c"; "field.c" -> "field.c"; "field.c" -> "getopt.c"; "field.c" -> "io.c"; "field.c" -> "main.c"; "gawkmisc.c" -> "alloca.c"; "gawkmisc.c" -> "dfa.c"; "gawkmisc.c" -> "gawkmisc.c"; "gawkmisc.c" -> "regex.c"; "io.c" -> "alloca.c"; "io.c" -> "array.c"; "io.c" -> "awktab.c"; "io.c" -> "builtin.c"; "io.c" -> "eval.c"; "io.c" -> "field.c"; "io.c" -> "io.c"; "io.c" -> "main.c"; "io.c" -> "msg.c"; "io.c" -> "re.c"; "main.c" -> "alloca.c"; "main.c" -> "array.c"; "main.c" -> "awktab.c"; "main.c" -> "builtin.c"; "main.c" -> "dfa.c"; "main.c" -> "eval.c"; "main.c" -> "field.c"; "main.c" -> "gawkmisc.c"; "main.c" -> "getopt.c"; "main.c" -> "getopt1.c"; "main.c" -> "io.c"; "main.c" -> "main.c"; "main.c" -> "msg.c"; "main.c" -> "node.c"; "main.c" -> "random.c"; "main.c" -> "re.c"; "main.c" -> "regex.c"; "main.c" -> "version.c"; "msg.c" -> "alloca.c"; "msg.c" -> "awktab.c"; "msg.c" -> "builtin.c"; "msg.c" -> "dfa.c"; "msg.c" -> "eval.c"; "msg.c" -> "field.c"; "msg.c" -> "getopt.c"; "msg.c" -> "io.c"; "msg.c" -> "main.c"; "msg.c" -> "missing.c"; "msg.c" -> "msg.c"; "msg.c" -> "node.c"; "msg.c" -> "random.c"; "msg.c" -> "re.c"; "msg.c" -> "regex.c"; "msg.c" -> "version.c"; "node.c" -> "array.c"; "node.c" -> "awktab.c"; "node.c" -> "builtin.c"; "node.c" -> "eval.c"; "node.c" -> "field.c"; "node.c" -> "io.c"; "node.c" -> "main.c"; "node.c" -> "node.c"; "node.c" -> "re.c"; "random.c" -> "builtin.c"; "random.c" -> "eval.c"; "random.c" -> "random.c"; "random.c" -> "regex.c"; "re.c" -> "alloca.c"; "re.c" -> "awktab.c"; "re.c" -> "builtin.c"; "re.c" -> "dfa.c"; "re.c" -> "eval.c"; "re.c" -> "field.c"; 
"re.c" -> "io.c"; "re.c" -> "main.c"; "re.c" -> "re.c"; } Here's the same thing in VCG: graph: { title: "TITLE" color: lightgray node: { title: "re.c" } node: { title: "gawkmisc.c" } node: { title: "eval.c" } node: { title: "field.c" } node: { title: "missing.c" } node: { title: "version.c" } node: { title: "node.c" } node: { title: "builtin.c" } node: { title: "regex.c" } node: { title: "msg.c" } node: { title: "array.c" } node: { title: "awktab.c" } node: { title: "main.c" } node: { title: "getopt.c" } node: { title: "alloca.c" } node: { title: "getopt1.c" } node: { title: "random.c" } node: { title: "dfa.c" } node: { title: "io.c" } edge: { sourcename: "array.c" targetname: "array.c"} edge: { sourcename: "array.c" targetname: "awktab.c"} edge: { sourcename: "array.c" targetname: "dfa.c"} edge: { sourcename: "array.c" targetname: "eval.c"} edge: { sourcename: "array.c" targetname: "field.c"} edge: { sourcename: "array.c" targetname: "io.c"} edge: { sourcename: "array.c" targetname: "main.c"} edge: { sourcename: "awktab.c" targetname: "array.c"} edge: { sourcename: "awktab.c" targetname: "awktab.c"} edge: { sourcename: "awktab.c" targetname: "builtin.c"} edge: { sourcename: "awktab.c" targetname: "dfa.c"} edge: { sourcename: "awktab.c" targetname: "eval.c"} edge: { sourcename: "awktab.c" targetname: "field.c"} edge: { sourcename: "awktab.c" targetname: "getopt.c"} edge: { sourcename: "awktab.c" targetname: "io.c"} edge: { sourcename: "awktab.c" targetname: "main.c"} edge: { sourcename: "awktab.c" targetname: "msg.c"} edge: { sourcename: "awktab.c" targetname: "node.c"} edge: { sourcename: "awktab.c" targetname: "random.c"} edge: { sourcename: "awktab.c" targetname: "re.c"} edge: { sourcename: "awktab.c" targetname: "regex.c"} edge: { sourcename: "awktab.c" targetname: "version.c"} edge: { sourcename: "builtin.c" targetname: "awktab.c"} edge: { sourcename: "builtin.c" targetname: "builtin.c"} edge: { sourcename: "builtin.c" targetname: "eval.c"} edge: { sourcename: 
"builtin.c" targetname: "main.c"} edge: { sourcename: "builtin.c" targetname: "node.c"} edge: { sourcename: "builtin.c" targetname: "re.c"} edge: { sourcename: "dfa.c" targetname: "alloca.c"} edge: { sourcename: "dfa.c" targetname: "array.c"} edge: { sourcename: "dfa.c" targetname: "awktab.c"} edge: { sourcename: "dfa.c" targetname: "builtin.c"} edge: { sourcename: "dfa.c" targetname: "dfa.c"} edge: { sourcename: "dfa.c" targetname: "eval.c"} edge: { sourcename: "dfa.c" targetname: "field.c"} edge: { sourcename: "dfa.c" targetname: "gawkmisc.c"} edge: { sourcename: "dfa.c" targetname: "getopt.c"} edge: { sourcename: "dfa.c" targetname: "getopt1.c"} edge: { sourcename: "dfa.c" targetname: "io.c"} edge: { sourcename: "dfa.c" targetname: "main.c"} edge: { sourcename: "dfa.c" targetname: "msg.c"} edge: { sourcename: "dfa.c" targetname: "node.c"} edge: { sourcename: "dfa.c" targetname: "random.c"} edge: { sourcename: "dfa.c" targetname: "re.c"} edge: { sourcename: "dfa.c" targetname: "regex.c"} edge: { sourcename: "dfa.c" targetname: "version.c"} edge: { sourcename: "eval.c" targetname: "array.c"} edge: { sourcename: "eval.c" targetname: "eval.c"} edge: { sourcename: "eval.c" targetname: "field.c"} edge: { sourcename: "eval.c" targetname: "getopt.c"} edge: { sourcename: "eval.c" targetname: "io.c"} edge: { sourcename: "eval.c" targetname: "main.c"} edge: { sourcename: "eval.c" targetname: "re.c"} edge: { sourcename: "eval.c" targetname: "regex.c"} edge: { sourcename: "field.c" targetname: "awktab.c"} edge: { sourcename: "field.c" targetname: "eval.c"} edge: { sourcename: "field.c" targetname: "field.c"} edge: { sourcename: "field.c" targetname: "getopt.c"} edge: { sourcename: "field.c" targetname: "io.c"} edge: { sourcename: "field.c" targetname: "main.c"} edge: { sourcename: "gawkmisc.c" targetname: "alloca.c"} edge: { sourcename: "gawkmisc.c" targetname: "dfa.c"} edge: { sourcename: "gawkmisc.c" targetname: "gawkmisc.c"} edge: { sourcename: "gawkmisc.c" targetname: 
"regex.c"} edge: { sourcename: "io.c" targetname: "alloca.c"} edge: { sourcename: "io.c" targetname: "array.c"} edge: { sourcename: "io.c" targetname: "awktab.c"} edge: { sourcename: "io.c" targetname: "builtin.c"} edge: { sourcename: "io.c" targetname: "eval.c"} edge: { sourcename: "io.c" targetname: "field.c"} edge: { sourcename: "io.c" targetname: "io.c"} edge: { sourcename: "io.c" targetname: "main.c"} edge: { sourcename: "io.c" targetname: "msg.c"} edge: { sourcename: "io.c" targetname: "re.c"} edge: { sourcename: "main.c" targetname: "alloca.c"} edge: { sourcename: "main.c" targetname: "array.c"} edge: { sourcename: "main.c" targetname: "awktab.c"} edge: { sourcename: "main.c" targetname: "builtin.c"} edge: { sourcename: "main.c" targetname: "dfa.c"} edge: { sourcename: "main.c" targetname: "eval.c"} edge: { sourcename: "main.c" targetname: "field.c"} edge: { sourcename: "main.c" targetname: "gawkmisc.c"} edge: { sourcename: "main.c" targetname: "getopt.c"} edge: { sourcename: "main.c" targetname: "getopt1.c"} edge: { sourcename: "main.c" targetname: "io.c"} edge: { sourcename: "main.c" targetname: "main.c"} edge: { sourcename: "main.c" targetname: "msg.c"} edge: { sourcename: "main.c" targetname: "node.c"} edge: { sourcename: "main.c" targetname: "random.c"} edge: { sourcename: "main.c" targetname: "re.c"} edge: { sourcename: "main.c" targetname: "regex.c"} edge: { sourcename: "main.c" targetname: "version.c"} edge: { sourcename: "msg.c" targetname: "alloca.c"} edge: { sourcename: "msg.c" targetname: "awktab.c"} edge: { sourcename: "msg.c" targetname: "builtin.c"} edge: { sourcename: "msg.c" targetname: "dfa.c"} edge: { sourcename: "msg.c" targetname: "eval.c"} edge: { sourcename: "msg.c" targetname: "field.c"} edge: { sourcename: "msg.c" targetname: "getopt.c"} edge: { sourcename: "msg.c" targetname: "io.c"} edge: { sourcename: "msg.c" targetname: "main.c"} edge: { sourcename: "msg.c" targetname: "missing.c"} edge: { sourcename: "msg.c" targetname: "msg.c"} 
edge: { sourcename: "msg.c" targetname: "node.c"} edge: { sourcename: "msg.c" targetname: "random.c"} edge: { sourcename: "msg.c" targetname: "re.c"} edge: { sourcename: "msg.c" targetname: "regex.c"} edge: { sourcename: "msg.c" targetname: "version.c"} edge: { sourcename: "node.c" targetname: "array.c"} edge: { sourcename: "node.c" targetname: "awktab.c"} edge: { sourcename: "node.c" targetname: "builtin.c"} edge: { sourcename: "node.c" targetname: "eval.c"} edge: { sourcename: "node.c" targetname: "field.c"} edge: { sourcename: "node.c" targetname: "io.c"} edge: { sourcename: "node.c" targetname: "main.c"} edge: { sourcename: "node.c" targetname: "node.c"} edge: { sourcename: "node.c" targetname: "re.c"} edge: { sourcename: "random.c" targetname: "builtin.c"} edge: { sourcename: "random.c" targetname: "eval.c"} edge: { sourcename: "random.c" targetname: "random.c"} edge: { sourcename: "random.c" targetname: "regex.c"} edge: { sourcename: "re.c" targetname: "alloca.c"} edge: { sourcename: "re.c" targetname: "awktab.c"} edge: { sourcename: "re.c" targetname: "builtin.c"} edge: { sourcename: "re.c" targetname: "dfa.c"} edge: { sourcename: "re.c" targetname: "eval.c"} edge: { sourcename: "re.c" targetname: "field.c"} edge: { sourcename: "re.c" targetname: "io.c"} edge: { sourcename: "re.c" targetname: "main.c"} edge: { sourcename: "re.c" targetname: "re.c"} } -- ABeZ------------ ------- ------ - ---------- -- ------------ http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca) ---- ------- ----------- ----------- - - ------ --------ABeZ From nkuipers at uvic.ca Fri Sep 20 01:59:14 2002 From: nkuipers at uvic.ca (nkuipers) Date: Wed Aug 4 00:11:08 2004 Subject: background for my problem Message-ID: <3D8ACCD6@wm2.uvic.ca> Actually I wrote the following after our meeting to be the weekly progress report that I write for my employer. I apologize if it sounds over-complicated or pretentious...that would reflect my confused and somewhat dismayed state of mind. 
I'll elaborate the situation further as you fire questions my way.

Terms:

cluster: think of this as a superstring of DNA bases (A,C,G,T)
repeat: a substring of some minimum length (in this case 21) that occurs more than once in the cluster set
ORF: open reading frame; always delimited on at least one end by a stop codon, unless the entire superstring is contained in an even larger open reading frame, in which case we can't say where the stop codon might be

###############################################################################

This week was spent working on a tool for screening repeats located in putative coding regions. To do this, several data are needed for each cluster:

1) where the stop codons are located in each frame
2) where the repeats are located
3) a comparison of repeat locations to candidate ORFs

The comparison is based in turn on three statements:

1) clusters with no stop codons in at least one frame are entire ORFs
2) sequences with stop codons evenly distributed in a frame may be considered noncoding for that frame
3) sequences with stop codons nonrandomly distributed in a frame may be considered partially coding for that frame

I have code for finding stop codons in each frame. It works by adjusting the return value of the Perl built-in pos function on a per-frame basis. I have code for accurately matching repeats to clusters and getting the locations of every match. However, it is horribly inefficient, so my supervisor suggested an alternative method. In this approach, an index of all k-size DNA fragments (k-strings) found in the clusters is created; each k-string points to all clusters in which the k-string occurs. Then, for each repeat, the first k nucleotides are tested against the index. The full-length, global repeat matching is attempted only against the clusters pointed to by the k-string. The only difficulty I am having with this approach is deciding on data structures for capturing results.
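
The index lookup described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the sample sequences, the hand-written %index, the toy word size of 4, and the helper name match_positions are all hypothetical and not taken from the actual pipeline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $k = 4;    # toy word size, chosen only to keep the example short

# Hypothetical sample clusters: id => DNA string
my %clusters = (
    c1 => 'TTACGTACGTGG',
    c2 => 'GGGACGTTT',
    c3 => 'CCCCCCCC',
);

# k-string => ids of the clusters that contain it (hand-written here;
# a real index would be generated from %clusters)
my %index = (
    ACGT => [ 'c1', 'c2' ],
    CCCC => [ 'c3' ],
);

# Return { cluster_id => [ start positions ] } for one repeat.
# Only clusters listed under the repeat's first k bases are scanned.
sub match_positions {
    my ($repeat) = @_;
    my %hits;
    my $candidates = $index{ substr( $repeat, 0, $k ) } || [];
    for my $id (@$candidates) {
        my $pos = -1;
        # index() returns -1 when there are no further matches;
        # stepping by one position keeps overlapping matches
        while ( ( $pos = index( $clusters{$id}, $repeat, $pos + 1 ) ) >= 0 ) {
            push @{ $hits{$id} }, $pos;
        }
    }
    return \%hits;
}

my $hits = match_positions('ACGTG');
for my $id ( sort keys %$hits ) {
    print "$id: @{ $hits->{$id} }\n";
}
```

Cluster c2 is a candidate because it contains the first four bases, but the full-length scan rejects it, so only c1 is reported. Returning one small hash per repeat is just one possible way to capture the results.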
Initially, I had planned the following, which uses several levels of nested, anonymous structures to fully annotate each cluster:

HASH: { clust_id => ARRAY: [ STRING: clust_seq,
                             HASH: { frame => CSV: stop positions, },
                             HASH: { rep_seq => CSV: positions, }
                           ] ,
      }

It's nicely organized in theory but a total nightmare to build, access piecemeal, and pass around to subroutines for iterating etc.; the Data::Dumper module may be a viable tool for accessing the whole structure. In discussing data structures with other members of the Victoria Perl Mongers, it came up that the above schema could also be extraordinarily RAM intensive unless I am frequently dumping to a database or file. Though the annotation is interesting and deserves to be placed in a database for its own sake, the immediate goal is a series of logical comparisons, so relevant information needs to stay in memory unless comparisons and resulting decisions are made on a per-cluster basis and then flushed from memory, which is an attractive thought.

The Victoria PM also had the idea of compressing di-nucleotide combinations (of which there are only 16) by symbol. This would speed up pattern matching, but would add overhead in other ways. An interesting idea nonetheless and one which I am eager to try.

In summary, the biggest problems are data structure and input size, keeping in mind the need to use the data and not just gather it, and to integrate with the existing data pipeline. I've decided to back-burner everything but the repeats-to-cluster code until it is satisfactory. I can't help thinking that a more creative, brilliant, yet simpler solution to the overall task exists, so there is a lot to ponder.

From pavel at md5.ca Fri Sep 20 11:33:09 2002
From: pavel at md5.ca (Pavel Zaitsev)
Date: Wed Aug 4 00:11:08 2004
Subject: Perl & Graphs
In-Reply-To:
References:
Message-ID: <20020920163309.GA26414@md5.ca>

Hi,

Might be OT, but it is used in doxygen to draw very very nice graphs of C++ and Java code.
Wish there would be something like that for Perl. I learned a code base of 100,000 in 2 weeks.

p.

abez(abez@abez.ca)@Thu, Sep 19, 2002 at 11:09:31PM -0700:
> Might be useful to someone.
>
> vcg as well as AT&T graphviz are both great software packages such
> that you can easily create graphs from a list of defined nodes and
> edges.
>
> These apps are great as they simplify making a graph down to just declaring
> nodes and edges. BANG the graph is plotted for you!
>
> Here's a quick script to convert from .vcg to .dot . It won't work for
> everything but it works for the .vcgs I produced.
>
> #!/usr/bin/perl
> my ($file1) = @ARGV;
> my @out = ();
> if ($file1) {
>     open(FILE,$file1);
>     @out = <FILE>;
>     close(FILE);
> } else {
>     @out = <STDIN>;
> }
> my $title = "graph";
> my @nodes = ();
> my @edges = ();
> foreach my $line (@out) {
>     if ($line =~ /^\s*title:\s+"(.*)"\s*$/) {
>         $title = $1;
>     } elsif ($line =~ /^\s*node:\s*\{\s*title:\s+"(.*)"\s*\}\s*$/) {
>         push @nodes,$1;
>     } elsif ($line =~
> /^\s*edge:\s*\{\s*sourcename:\s+"(.*)"\s*targetname:\s+"(.*)"\s*\}\s*$/) {
>         push @edges,[$1,$2];
>     }
> }
> print "digraph $title \{\n";
> foreach my $node (@nodes) {
>     print "\"$node\";\n";
> }
> foreach my $edge (@edges) {
>     print "\"".$edge->[0] ."\" -> \"".$edge->[1]."\";\n";
> }
> print "}\n";
>
>
> Here's an AT&T graphviz graph detailing dependencies of gawk .c files,
> if you're in SENG420 this term these 2 graphs are copyright Abram Hindle so
> don't copy them (sorry I have to cover myself).
>
> (c) 2002 Abram Hindle.
> > digraph TITLE { > "re.c"; > "gawkmisc.c"; > "eval.c"; > "field.c"; > "missing.c"; > "version.c"; > "node.c"; > "builtin.c"; > "regex.c"; > "msg.c"; > "array.c"; > "awktab.c"; > "main.c"; > "getopt.c"; > "alloca.c"; > "getopt1.c"; > "random.c"; > "dfa.c"; > "io.c"; > "array.c" -> "array.c"; > "array.c" -> "awktab.c"; > "array.c" -> "dfa.c"; > "array.c" -> "eval.c"; > "array.c" -> "field.c"; > "array.c" -> "io.c"; > "array.c" -> "main.c"; > "awktab.c" -> "array.c"; > "awktab.c" -> "awktab.c"; > "awktab.c" -> "builtin.c"; > "awktab.c" -> "dfa.c"; > "awktab.c" -> "eval.c"; > "awktab.c" -> "field.c"; > "awktab.c" -> "getopt.c"; > "awktab.c" -> "io.c"; > "awktab.c" -> "main.c"; > "awktab.c" -> "msg.c"; > "awktab.c" -> "node.c"; > "awktab.c" -> "random.c"; > "awktab.c" -> "re.c"; > "awktab.c" -> "regex.c"; > "awktab.c" -> "version.c"; > "builtin.c" -> "awktab.c"; > "builtin.c" -> "builtin.c"; > "builtin.c" -> "eval.c"; > "builtin.c" -> "main.c"; > "builtin.c" -> "node.c"; > "builtin.c" -> "re.c"; > "dfa.c" -> "alloca.c"; > "dfa.c" -> "array.c"; > "dfa.c" -> "awktab.c"; > "dfa.c" -> "builtin.c"; > "dfa.c" -> "dfa.c"; > "dfa.c" -> "eval.c"; > "dfa.c" -> "field.c"; > "dfa.c" -> "gawkmisc.c"; > "dfa.c" -> "getopt.c"; > "dfa.c" -> "getopt1.c"; > "dfa.c" -> "io.c"; > "dfa.c" -> "main.c"; > "dfa.c" -> "msg.c"; > "dfa.c" -> "node.c"; > "dfa.c" -> "random.c"; > "dfa.c" -> "re.c"; > "dfa.c" -> "regex.c"; > "dfa.c" -> "version.c"; > "eval.c" -> "array.c"; > "eval.c" -> "eval.c"; > "eval.c" -> "field.c"; > "eval.c" -> "getopt.c"; > "eval.c" -> "io.c"; > "eval.c" -> "main.c"; > "eval.c" -> "re.c"; > "eval.c" -> "regex.c"; > "field.c" -> "awktab.c"; > "field.c" -> "eval.c"; > "field.c" -> "field.c"; > "field.c" -> "getopt.c"; > "field.c" -> "io.c"; > "field.c" -> "main.c"; > "gawkmisc.c" -> "alloca.c"; > "gawkmisc.c" -> "dfa.c"; > "gawkmisc.c" -> "gawkmisc.c"; > "gawkmisc.c" -> "regex.c"; > "io.c" -> "alloca.c"; > "io.c" -> "array.c"; > "io.c" -> "awktab.c"; > "io.c" -> 
"builtin.c"; > "io.c" -> "eval.c"; > "io.c" -> "field.c"; > "io.c" -> "io.c"; > "io.c" -> "main.c"; > "io.c" -> "msg.c"; > "io.c" -> "re.c"; > "main.c" -> "alloca.c"; > "main.c" -> "array.c"; > "main.c" -> "awktab.c"; > "main.c" -> "builtin.c"; > "main.c" -> "dfa.c"; > "main.c" -> "eval.c"; > "main.c" -> "field.c"; > "main.c" -> "gawkmisc.c"; > "main.c" -> "getopt.c"; > "main.c" -> "getopt1.c"; > "main.c" -> "io.c"; > "main.c" -> "main.c"; > "main.c" -> "msg.c"; > "main.c" -> "node.c"; > "main.c" -> "random.c"; > "main.c" -> "re.c"; > "main.c" -> "regex.c"; > "main.c" -> "version.c"; > "msg.c" -> "alloca.c"; > "msg.c" -> "awktab.c"; > "msg.c" -> "builtin.c"; > "msg.c" -> "dfa.c"; > "msg.c" -> "eval.c"; > "msg.c" -> "field.c"; > "msg.c" -> "getopt.c"; > "msg.c" -> "io.c"; > "msg.c" -> "main.c"; > "msg.c" -> "missing.c"; > "msg.c" -> "msg.c"; > "msg.c" -> "node.c"; > "msg.c" -> "random.c"; > "msg.c" -> "re.c"; > "msg.c" -> "regex.c"; > "msg.c" -> "version.c"; > "node.c" -> "array.c"; > "node.c" -> "awktab.c"; > "node.c" -> "builtin.c"; > "node.c" -> "eval.c"; > "node.c" -> "field.c"; > "node.c" -> "io.c"; > "node.c" -> "main.c"; > "node.c" -> "node.c"; > "node.c" -> "re.c"; > "random.c" -> "builtin.c"; > "random.c" -> "eval.c"; > "random.c" -> "random.c"; > "random.c" -> "regex.c"; > "re.c" -> "alloca.c"; > "re.c" -> "awktab.c"; > "re.c" -> "builtin.c"; > "re.c" -> "dfa.c"; > "re.c" -> "eval.c"; > "re.c" -> "field.c"; > "re.c" -> "io.c"; > "re.c" -> "main.c"; > "re.c" -> "re.c"; > } > > Here's the same thing in VCG: > graph: { > title: "TITLE" > color: lightgray > node: { title: "re.c" } > node: { title: "gawkmisc.c" } > node: { title: "eval.c" } > node: { title: "field.c" } > node: { title: "missing.c" } > node: { title: "version.c" } > node: { title: "node.c" } > node: { title: "builtin.c" } > node: { title: "regex.c" } > node: { title: "msg.c" } > node: { title: "array.c" } > node: { title: "awktab.c" } > node: { title: "main.c" } > node: { title: "getopt.c" } > 
node: { title: "alloca.c" } > node: { title: "getopt1.c" } > node: { title: "random.c" } > node: { title: "dfa.c" } > node: { title: "io.c" } > edge: { sourcename: "array.c" targetname: "array.c"} > edge: { sourcename: "array.c" targetname: "awktab.c"} > edge: { sourcename: "array.c" targetname: "dfa.c"} > edge: { sourcename: "array.c" targetname: "eval.c"} > edge: { sourcename: "array.c" targetname: "field.c"} > edge: { sourcename: "array.c" targetname: "io.c"} > edge: { sourcename: "array.c" targetname: "main.c"} > edge: { sourcename: "awktab.c" targetname: "array.c"} > edge: { sourcename: "awktab.c" targetname: "awktab.c"} > edge: { sourcename: "awktab.c" targetname: "builtin.c"} > edge: { sourcename: "awktab.c" targetname: "dfa.c"} > edge: { sourcename: "awktab.c" targetname: "eval.c"} > edge: { sourcename: "awktab.c" targetname: "field.c"} > edge: { sourcename: "awktab.c" targetname: "getopt.c"} > edge: { sourcename: "awktab.c" targetname: "io.c"} > edge: { sourcename: "awktab.c" targetname: "main.c"} > edge: { sourcename: "awktab.c" targetname: "msg.c"} > edge: { sourcename: "awktab.c" targetname: "node.c"} > edge: { sourcename: "awktab.c" targetname: "random.c"} > edge: { sourcename: "awktab.c" targetname: "re.c"} > edge: { sourcename: "awktab.c" targetname: "regex.c"} > edge: { sourcename: "awktab.c" targetname: "version.c"} > edge: { sourcename: "builtin.c" targetname: "awktab.c"} > edge: { sourcename: "builtin.c" targetname: "builtin.c"} > edge: { sourcename: "builtin.c" targetname: "eval.c"} > edge: { sourcename: "builtin.c" targetname: "main.c"} > edge: { sourcename: "builtin.c" targetname: "node.c"} > edge: { sourcename: "builtin.c" targetname: "re.c"} > edge: { sourcename: "dfa.c" targetname: "alloca.c"} > edge: { sourcename: "dfa.c" targetname: "array.c"} > edge: { sourcename: "dfa.c" targetname: "awktab.c"} > edge: { sourcename: "dfa.c" targetname: "builtin.c"} > edge: { sourcename: "dfa.c" targetname: "dfa.c"} > edge: { sourcename: "dfa.c" 
targetname: "eval.c"} > edge: { sourcename: "dfa.c" targetname: "field.c"} > edge: { sourcename: "dfa.c" targetname: "gawkmisc.c"} > edge: { sourcename: "dfa.c" targetname: "getopt.c"} > edge: { sourcename: "dfa.c" targetname: "getopt1.c"} > edge: { sourcename: "dfa.c" targetname: "io.c"} > edge: { sourcename: "dfa.c" targetname: "main.c"} > edge: { sourcename: "dfa.c" targetname: "msg.c"} > edge: { sourcename: "dfa.c" targetname: "node.c"} > edge: { sourcename: "dfa.c" targetname: "random.c"} > edge: { sourcename: "dfa.c" targetname: "re.c"} > edge: { sourcename: "dfa.c" targetname: "regex.c"} > edge: { sourcename: "dfa.c" targetname: "version.c"} > edge: { sourcename: "eval.c" targetname: "array.c"} > edge: { sourcename: "eval.c" targetname: "eval.c"} > edge: { sourcename: "eval.c" targetname: "field.c"} > edge: { sourcename: "eval.c" targetname: "getopt.c"} > edge: { sourcename: "eval.c" targetname: "io.c"} > edge: { sourcename: "eval.c" targetname: "main.c"} > edge: { sourcename: "eval.c" targetname: "re.c"} > edge: { sourcename: "eval.c" targetname: "regex.c"} > edge: { sourcename: "field.c" targetname: "awktab.c"} > edge: { sourcename: "field.c" targetname: "eval.c"} > edge: { sourcename: "field.c" targetname: "field.c"} > edge: { sourcename: "field.c" targetname: "getopt.c"} > edge: { sourcename: "field.c" targetname: "io.c"} > edge: { sourcename: "field.c" targetname: "main.c"} > edge: { sourcename: "gawkmisc.c" targetname: "alloca.c"} > edge: { sourcename: "gawkmisc.c" targetname: "dfa.c"} > edge: { sourcename: "gawkmisc.c" targetname: "gawkmisc.c"} > edge: { sourcename: "gawkmisc.c" targetname: "regex.c"} > edge: { sourcename: "io.c" targetname: "alloca.c"} > edge: { sourcename: "io.c" targetname: "array.c"} > edge: { sourcename: "io.c" targetname: "awktab.c"} > edge: { sourcename: "io.c" targetname: "builtin.c"} > edge: { sourcename: "io.c" targetname: "eval.c"} > edge: { sourcename: "io.c" targetname: "field.c"} > edge: { sourcename: "io.c" targetname: 
"io.c"} > edge: { sourcename: "io.c" targetname: "main.c"} > edge: { sourcename: "io.c" targetname: "msg.c"} > edge: { sourcename: "io.c" targetname: "re.c"} > edge: { sourcename: "main.c" targetname: "alloca.c"} > edge: { sourcename: "main.c" targetname: "array.c"} > edge: { sourcename: "main.c" targetname: "awktab.c"} > edge: { sourcename: "main.c" targetname: "builtin.c"} > edge: { sourcename: "main.c" targetname: "dfa.c"} > edge: { sourcename: "main.c" targetname: "eval.c"} > edge: { sourcename: "main.c" targetname: "field.c"} > edge: { sourcename: "main.c" targetname: "gawkmisc.c"} > edge: { sourcename: "main.c" targetname: "getopt.c"} > edge: { sourcename: "main.c" targetname: "getopt1.c"} > edge: { sourcename: "main.c" targetname: "io.c"} > edge: { sourcename: "main.c" targetname: "main.c"} > edge: { sourcename: "main.c" targetname: "msg.c"} > edge: { sourcename: "main.c" targetname: "node.c"} > edge: { sourcename: "main.c" targetname: "random.c"} > edge: { sourcename: "main.c" targetname: "re.c"} > edge: { sourcename: "main.c" targetname: "regex.c"} > edge: { sourcename: "main.c" targetname: "version.c"} > edge: { sourcename: "msg.c" targetname: "alloca.c"} > edge: { sourcename: "msg.c" targetname: "awktab.c"} > edge: { sourcename: "msg.c" targetname: "builtin.c"} > edge: { sourcename: "msg.c" targetname: "dfa.c"} > edge: { sourcename: "msg.c" targetname: "eval.c"} > edge: { sourcename: "msg.c" targetname: "field.c"} > edge: { sourcename: "msg.c" targetname: "getopt.c"} > edge: { sourcename: "msg.c" targetname: "io.c"} > edge: { sourcename: "msg.c" targetname: "main.c"} > edge: { sourcename: "msg.c" targetname: "missing.c"} > edge: { sourcename: "msg.c" targetname: "msg.c"} > edge: { sourcename: "msg.c" targetname: "node.c"} > edge: { sourcename: "msg.c" targetname: "random.c"} > edge: { sourcename: "msg.c" targetname: "re.c"} > edge: { sourcename: "msg.c" targetname: "regex.c"} > edge: { sourcename: "msg.c" targetname: "version.c"} > edge: { sourcename: 
"node.c" targetname: "array.c"} > edge: { sourcename: "node.c" targetname: "awktab.c"} > edge: { sourcename: "node.c" targetname: "builtin.c"} > edge: { sourcename: "node.c" targetname: "eval.c"} > edge: { sourcename: "node.c" targetname: "field.c"} > edge: { sourcename: "node.c" targetname: "io.c"} > edge: { sourcename: "node.c" targetname: "main.c"} > edge: { sourcename: "node.c" targetname: "node.c"} > edge: { sourcename: "node.c" targetname: "re.c"} > edge: { sourcename: "random.c" targetname: "builtin.c"} > edge: { sourcename: "random.c" targetname: "eval.c"} > edge: { sourcename: "random.c" targetname: "random.c"} > edge: { sourcename: "random.c" targetname: "regex.c"} > edge: { sourcename: "re.c" targetname: "alloca.c"} > edge: { sourcename: "re.c" targetname: "awktab.c"} > edge: { sourcename: "re.c" targetname: "builtin.c"} > edge: { sourcename: "re.c" targetname: "dfa.c"} > edge: { sourcename: "re.c" targetname: "eval.c"} > edge: { sourcename: "re.c" targetname: "field.c"} > edge: { sourcename: "re.c" targetname: "io.c"} > edge: { sourcename: "re.c" targetname: "main.c"} > edge: { sourcename: "re.c" targetname: "re.c"} > } > > -- > ABeZ------------ ------- ------ - ---------- -- ------------ > http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca) > ---- ------- ----------- ----------- - - ------ --------ABeZ > -- Create like god, rule like a king, work like a slave. From peter at PSDT.com Fri Sep 20 11:58:24 2002 From: peter at PSDT.com (Peter Scott) Date: Wed Aug 4 00:11:08 2004 Subject: float bit strings Message-ID: <4.3.2.7.2.20020920095540.00b1d580@shell2.webquarry.com> (Per question last night) Well, I *thought* this was the way: % perl -le 'print unpack"b*",pack "f*",3.14159' 00000010100100101111000000001011 But I don't understand this: % perl -le 'print unpack"b*",pack "f*",0' 00000000000000000000000000000000 That's not right, is it? Looks right for other cases, though. 
--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/

From abez at abez.ca Fri Sep 20 12:25:48 2002
From: abez at abez.ca (abez)
Date: Wed Aug 4 00:11:08 2004
Subject: float bit strings
In-Reply-To: <4.3.2.7.2.20020920095540.00b1d580@shell2.webquarry.com>
Message-ID:

IEEE representation of 0 is 0^32; this is denormalized form.

1st bit: sign
next 8 bits: exponent, with bias of 127 subtracted
next 23 bits: the mantissa

There is an implied extra bit in front of the mantissa.

On Fri, 20 Sep 2002, Peter Scott wrote:

> (Per question last night)
>
> Well, I *thought* this was the way:
>
> % perl -le 'print unpack"b*",pack "f*",3.14159'
> 00000010100100101111000000001011
>
>
> But I don't understand this:
>
> % perl -le 'print unpack"b*",pack "f*",0'
> 00000000000000000000000000000000
>
> That's not right, is it?
>
> Looks right for other cases, though.
> --
> Peter Scott
> Pacific Systems Design Technologies
> http://www.perldebugged.com/
>

--
ABeZ------------ ------- ------ - ---------- -- ------------
http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca)
---- ------- ----------- ----------- - - ------ --------ABeZ

From peter at PSDT.com Fri Sep 20 15:42:25 2002
From: peter at PSDT.com (Peter Scott)
Date: Wed Aug 4 00:11:08 2004
Subject: background for my problem
In-Reply-To: <3D8ACCD6@wm2.uvic.ca>
Message-ID: <4.3.2.7.2.20020920134117.00b20bf0@shell2.webquarry.com>

At 11:59 PM 9/19/02 -0700, nkuipers wrote:
>Initially, I had planned the following,
>which uses several levels of nested, anonymous structures to fully annotate
>each cluster:
>
>HASH: { clust_id => ARRAY: [ STRING: clust_seq,
>                             HASH: { frame => CSV: stop positions, },
>                             HASH: { rep_seq => CSV: positions, }
>                           ] ,
>      }
>
>It's nicely organized in theory but a total nightmare to build, access
>piecemeal, and pass around to subroutines for iterating etc.; the
>Data::Dumper
>module may be a viable tool for accessing the whole structure.
What this says to me is that you need to take the next step to turn
this into an object-oriented application so you can encapsulate
behaviour behind method calls. Then you'll be able to change internal
representations transparently if you've done the decomposition right.

--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com/

From nkuipers at uvic.ca Fri Sep 20 15:48:10 2002
From: nkuipers at uvic.ca (nkuipers)
Date: Wed Aug 4 00:11:08 2004
Subject: background for my problem
Message-ID: <3D8B9435@wm2.uvic.ca>

>What this says to me is that you need to take the next step to turn
>this into an object-oriented application so you can encapsulate
>behaviour behind method calls.

YEESSS. This is also the conclusion I have reached. It's one reason why I am so interested in OO themes in the PM. Or at least, I need to dump to a database on a per cluster basis and then use OO to access and manipulate the records for comparisons/calculations. I think this is the best strategy since my existing software is not OO to this point but can easily be made to talk to a database.

From nkuipers at uvic.ca Tue Sep 24 15:09:39 2002
From: nkuipers at uvic.ca (nkuipers)
Date: Wed Aug 4 00:11:08 2004
Subject: question about the nature of DBM ties
Message-ID: <3D90DB0C@wm2.uvic.ca>

Hi all,

When you tie a data structure to an external file, is populating that structure for large input updating directly into the file or is it crowding more and more stuff into memory which then gets dumped into the file, or what? In other words, does tying in this manner free up more RAM? The bottleneck in my code is the unique function but this function is necessary. All in all the code works perfectly but takes too long.

#!/usr/bin/perl

use strict;
use warnings;
use DB_File;

my $infile = shift;
my $wordsize = 10;
my %clusters;  #key=>value = 'id_string' => 'DNA_string'
my %k_strings; #key=>value =(ie.) 'ACGTGGTCAC' => [id_string1, id_string2,...]

tie(%k_strings, "DB_File", "index.tmp") or die "Can't open filename: $!";

%k_strings = &build_index(\%clusters);

untie %k_strings;

sub build_index {
    my $clusters_hashref = shift;
    my %k_hash;
    while ( (my $id, my $sequence) = each %$clusters_hashref ) {
        my $tmp = $sequence;
        while ( length($tmp) >= $wordsize ) {
            my $kstring = substr($tmp, 0, $wordsize);
            if ( exists $k_hash{$kstring} ) {
                push @{ $k_hash{$kstring} }, $id
                    if unique(\@{ $k_hash{$kstring} }, \$id)
            } else { $k_hash{$kstring} = [ $id ] }
            $tmp =~ s/^\w//;
        }
    }
    return %k_hash;
}

sub unique {
    my ($array_ref, $id_ref) = @_;
    my $flag = 0;
    for (@$array_ref) {
        if ( $_ eq $$id_ref ) {
            $flag = 1;
            last;
        }
    }
    $flag == 1 ? return 0 : (return 1);
}

__END__

From abez at abez.ca Tue Sep 24 16:46:03 2002
From: abez at abez.ca (abez)
Date: Wed Aug 4 00:11:08 2004
Subject: question about the nature of DBM ties
In-Reply-To: <3D90DB0C@wm2.uvic.ca>
Message-ID:

Have you tried it without saving to file?

On Tue, 24 Sep 2002, nkuipers wrote:

> Hi all,
>
> When you tie a data structure to an external file, is populating that
> structure for large input updating directly into the file or is it crowding
> more and more stuff into memory which then gets dumped into the file, or what?
> In other words, does tying in this manner free up more RAM? The bottleneck
> in my code is the unique function but this function is necessary. All in all
> the code works perfectly but takes too long.
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use DB_File;
>
> my $infile = shift;
> my $wordsize = 10;
> my %clusters;  #key=>value = 'id_string' => 'DNA_string'
> my %k_strings; #key=>value =(ie.) 'ACGTGGTCAC' => [id_string1, id_string2,...]
>
> tie(%k_strings, "DB_File", "index.tmp") or die "Can't open filename: $!";
>
> %k_strings = &build_index(\%clusters);
>
> untie %k_strings;
>
> sub build_index {
>     my $clusters_hashref = shift;
>     my %k_hash;
>     while ( (my $id, my $sequence) = each %$clusters_hashref ) {
>         my $tmp = $sequence;
>         while ( length($tmp) >= $wordsize ) {
>             my $kstring = substr($tmp, 0, $wordsize);
>             if ( exists $k_hash{$kstring} ) {
>                 push @{ $k_hash{$kstring} }, $id
>                     if unique(\@{ $k_hash{$kstring} }, \$id)
>             } else { $k_hash{$kstring} = [ $id ] }
>             $tmp =~ s/^\w//;
>         }
>     }
>     return %k_hash;
> }
>
> sub unique {
>     my ($array_ref, $id_ref) = @_;
>     my $flag = 0;
>     for (@$array_ref) {
>         if ( $_ eq $$id_ref ) {
>             $flag = 1;
>             last;
>         }
>     }
>     $flag == 1 ? return 0 : (return 1);
> }
>
> __END__
>

--
ABeZ------------ ------- ------ - ---------- -- ------------
http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca)
---- ------- ----------- ----------- - - ------ --------ABeZ

From nkuipers at uvic.ca Tue Sep 24 16:59:20 2002
From: nkuipers at uvic.ca (nkuipers)
Date: Wed Aug 4 00:11:08 2004
Subject: question about the nature of DBM ties
Message-ID: <3D91016D@wm2.uvic.ca>

>===== Original Message From abez =====
>Have you tried it without saving to file?

You mean without that tie statement? Yes. With and without. Makes no difference to the speed of the computation. The array of values that the unique sub must check before adding a new value grows so scans take longer and longer to execute. Although I would think that continuously dumping to the DBM file (rather than building up the hash in memory) would leave more memory for other stuff, anyway. I was looking through CPAN and saw a module called Array-Unique-0.03 which probably implements what I need (growing an anonymous array of unique values) more efficiently. I need to get the sysadmin to install it though since I am not root.
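
One way to avoid the ever-longer scans without any extra module might be to keep the ids for each k-string as the keys of an inner hash, so that a duplicate id simply overwrites itself. The following is a minimal sketch under stated assumptions, not a drop-in replacement: the sample clusters and the toy word size of 4 are hypothetical, substr with a moving offset is swapped in for the s/^\w// chopping, and build_index here returns a hash reference rather than a flattened hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $wordsize = 4;    # toy value for the example; the script above uses 10

# Build k-string => { cluster_id => 1 }. Assigning to an existing
# inner key is a no-op, so no uniqueness scan is needed.
sub build_index {
    my ($clusters) = @_;
    my %k_hash;
    while ( my ( $id, $sequence ) = each %$clusters ) {
        # the range is empty when the sequence is shorter than $wordsize
        for my $offset ( 0 .. length($sequence) - $wordsize ) {
            my $kstring = substr( $sequence, $offset, $wordsize );
            $k_hash{$kstring}{$id} = 1;
        }
    }
    return \%k_hash;
}

# Hypothetical sample data
my %clusters = ( c1 => 'ACGTACGTT', c2 => 'TTACGTA' );
my $index = build_index( \%clusters );

for my $kstring ( sort keys %$index ) {
    print "$kstring => ", join( ',', sort keys %{ $index->{$kstring} } ), "\n";
}
```

The id list for a k-string is then keys %{ $index->{$kstring} }. This trades some memory per entry for constant-time duplicate checks.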
*sigh*

nathanael

From abez at abez.ca Tue Sep 24 17:04:17 2002
From: abez at abez.ca (abez)
Date: Wed Aug 4 00:11:08 2004
Subject: question about the nature of DBM ties
In-Reply-To: <3D91016D@wm2.uvic.ca>
Message-ID:

No, you can install your own modules locally if you include a lib directory:

use lib qw(./lib);

I think that works; it will look in the ./lib dir for modules.

On Tue, 24 Sep 2002, nkuipers wrote:

> >===== Original Message From abez =====
> >Have you tried it without saving to file?
>
> You mean without that tie statement? Yes. With and without. Makes no
> difference to the speed of the computation. The array of values that the
> unique sub must check before adding a new value grows so scans take longer and
> longer to execute. Although I would think that continuously dumping to the
> DBM file (rather than building up the hash in memory) would leave more memory
> for other stuff, anyway. I was looking through CPAN and saw a module called
> Array-Unique-0.03 which probably implements what I need (growing an anonymous
> array of unique values) more efficiently. I need to get the sysadmin to
> install it though since I am not root. *sigh*
>
> nathanael
>

--
ABeZ------------ ------- ------ - ---------- -- ------------
http://www.indexdirect.com/abez/ Abram Hindle (abez@abez.ca)
---- ------- ----------- ----------- - - ------ --------ABeZ

From Peter at PSDT.com Wed Sep 25 17:12:08 2002
From: Peter at PSDT.com (Peter Scott)
Date: Wed Aug 4 00:11:08 2004
Subject: question about the nature of DBM ties
In-Reply-To: <3D90DB0C@wm2.uvic.ca>
Message-ID: <4.3.2.7.2.20020925151024.00ac19a0@shell2.webquarry.com>

At 01:09 PM 9/24/2002 -0700, nkuipers wrote:
>When you tie a data structure to an external file, is populating that
>structure for large input updating directly into the file or is it crowding
>more and more stuff into memory which then gets dumped into the file, or what?
> In other words, does tying in this manner free up more RAM?
>The bottleneck in my code is the unique function but this function is
>necessary. All in all the code works perfectly but takes too long.

I'm in the middle of an exercise break for my students, so I don't have long, but one thing I will suggest you check for is constructs which vitiate the use of tie. For instance,

    keys %tied_hash

makes the tie-ing pointless since it constructs the list of all the keys in memory. Look for the use of any lists like these:

    keys %hash
    values %hash
    %hash in list context

and get rid of them. Use only each() for iterating through the hash.

Peter Scott
peter@psdt.com
http://www.perldebugged.com
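
As a rough illustration of iterating a tied hash with each(), here is a minimal sketch. Several things in it are assumptions rather than part of the thread: SDBM_File stands in for DB_File only because it ships with Perl, the three records are made up, and the id lists are stored as comma-joined strings because a DBM file holds flat strings, so an array reference stored directly would be saved only as its stringified form.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use File::Temp qw(tempdir);
use SDBM_File;                  # stand-in for DB_File in this sketch

my $dir = tempdir( CLEANUP => 1 );
tie( my %k_strings, 'SDBM_File', "$dir/index", O_RDWR | O_CREAT, 0666 )
    or die "Can't tie index: $!";

# Hypothetical records: k-string => comma-separated cluster ids
$k_strings{ACGTGGTCAC} = join ',', qw(clust1 clust7);
$k_strings{TTGACCATGA} = 'clust2';
$k_strings{GGGTACCTAA} = join ',', qw(clust3 clust7 clust9);

# each() fetches one key/value pair per call, so the complete key
# list is never built in memory the way "keys %k_strings" would
my ( $records, $total_ids ) = ( 0, 0 );
while ( my ( $kstring, $ids ) = each %k_strings ) {
    my @ids = split /,/, $ids;
    $records++;
    $total_ids += @ids;
    print "$kstring => @ids\n";
}
print "$records k-strings, $total_ids cluster ids\n";

untie %k_strings;
```

The order in which each() returns the records is not specified. For real data sizes a different DBM back end such as DB_File would probably be the better choice.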