From lsprilus at bioinformatics.weizmann.ac.il Mon Oct 28 00:12:16 2002 From: lsprilus at bioinformatics.weizmann.ac.il (lsprilus@bioinformatics.weizmann.ac.il) Date: Mon Aug 2 21:35:51 2004 Subject: [Rehovot-pm] Speeding up processing Message-ID: <200210280612.IAA21307@bioinformatics.weizmann.ac.il> Sometimes we can speed up processing by avoiding repeated calculations or processing already known data. A hash is an excellent data structure for this purpose. There are also some perl modules to extend this principle, but let's take a look to a simple solution: A brief description: The Fibonacci series is formed by adding the latest two numbers to get the next one, starting from 0 and 1: 0 1 --the series starts like this. 0+1=1 so the series is now 0 1 1 1+1=2 so the series continues... 0 1 1 2 and the next term is 1+2=3 so we now have 0 1 1 2 3 and it continues as follows ... n: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ... Fib(n): 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 ... How can we generate the Fibonacci series: There are several formulas to compute this series. We will try a simple recursive approach in this fiboPlain.pl that will print the values for the positions 25 to 35 in the Fibonacci series. #!/usr/local/bin/perl # plain recursive Fibonacci series, J Prilusky, 2002 $|=1; foreach $position (25 .. 35) { printf ("%03d %30d\n",$position,fibo($position)); } sub fibo { my($n) = @_; return 0 if ($n == 0); return 1 if ($n == 1); return fibo($n-1) + fibo($n-2); } How this works?: The fibo subroutine does all the job by calling himself as a tool. Nice example of recursion. Can we speed this up?: A close examination shows that at any step we are computing twice the same values. We can store known values in a hash, as to avoid repeated calculation of the same values, in this fiboCache.pl #!/usr/local/bin/perl # cached recursive Fibonacci series, J Prilusky, 2002 $|=1; foreach $position (25 .. 35) { printf ("%03d %30d\n",$position,fibo($position)); } sub fibo { my($n) = @_; return 0 if ($n == 0); return 1 if ($n == 1); $cache{$n} = fibo($n-1) + fibo($n-2) if (!$cache{$n}); # <== HERE return $cache{$n}; } Try them both and see the difference. The trick is to ONLY compute a value for a given position if we haven't done it before. For each call of the fibo subroutine, we test to see if our %cache storage knows about the value. If $cache{$position} already has a value, we don't need to compute it again. The same caching technique can be used on several other situations. -- Dr Jaime Prilusky | Jaime.Prilusky@weizmann.ac.il Weizmann Institute of Science | fax: 972-8-9344113 76100 Rehovot - Israel | tel: 972-8-9344959 info URL http://bioinformatics.weizmann.ac.il/jaime_prilusky.html OCA is at http://bioinformatics.weizmann.ac.il:8500