From lsprilus at bioinformatics.weizmann.ac.il  Mon Oct 28 00:12:16 2002
From: lsprilus at bioinformatics.weizmann.ac.il (lsprilus@bioinformatics.weizmann.ac.il)
Date: Mon Aug  2 21:35:51 2004
Subject: [Rehovot-pm] Speeding up processing
Message-ID: <200210280612.IAA21307@bioinformatics.weizmann.ac.il>

Sometimes we can speed up processing by avoiding repeated calculations,
or by not reprocessing data we already know. A hash is an excellent data
structure for this purpose. There are also some Perl modules (Memoize, for
example) that extend this principle, but let's take a look at a simple
solution first:

A brief description:
The Fibonacci series is formed by adding the latest two numbers to get 
the next one, starting from 0 and 1: 
 
  0 1 --the series starts like this.
  0+1=1 so the series is now 
  0 1 1
    1+1=2 so the series continues...
  0 1 1 2 and the next term is
      1+2=3 so we now have
  0 1 1 2 3  and it continues as follows ...

       n:  0  1  2  3  4  5  6  7  8  9 10 11  12  13  14  15  16 ...
  Fib(n):  0  1  1  2  3  5  8 13 21 34 55 89 144 233 377 610 987 ...
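The rule above ("add the latest two numbers to get the next one") can also be
written as a short loop, without any recursion. This is only a minimal sketch
and not one of the scripts in this post; the helper name fib_series is made up
for this illustration. It should print the same 17 values as the Fib(n) row
above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $count Fibonacci numbers,
# Fib(0) .. Fib($count-1), built term by term.
sub fib_series {
  my ($count) = @_;
  my @series = (0, 1);                 # the series starts like this
  # each new term is the sum of the latest two
  push @series, $series[-1] + $series[-2] while @series < $count;
  return @series[0 .. $count - 1];     # trim in case $count < 2
}

print join(' ', fib_series(17)), "\n";
```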

How can we generate the Fibonacci series?:
There are several formulas to compute this series. We will try a simple
recursive approach in this script, fiboPlain.pl, which prints the values
at positions 25 to 35 of the Fibonacci series.


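#!/usr/local/bin/perl
# plain recursive Fibonacci series, J Prilusky, 2002
use strict;
use warnings;
$| = 1;    # unbuffered output: each line appears as soon as it is computed

foreach my $position (25 .. 35) {
  printf("%03d %30d\n", $position, fibo($position));
}

# fibo($n): the n-th Fibonacci number, by plain recursion
sub fibo {
  my ($n) = @_;
  return 0 if $n == 0;
  return 1 if $n == 1;
  return fibo($n - 1) + fibo($n - 2);
}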


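How does this work?:
The fibo subroutine does all the work by calling itself: fibo($n) is the
sum of fibo($n-1) and fibo($n-2), and the two base cases, fibo(0) = 0 and
fibo(1) = 1, stop the recursion. A nice example of recursion.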
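To see how much work the recursion actually does, we can count the calls.
This is an illustrative variant of fiboPlain.pl, not part of the original
script; the global counter $calls is a hypothetical addition.

```perl
use strict;
use warnings;

my $calls = 0;    # hypothetical counter: number of times fibo is entered

sub fibo {
  my ($n) = @_;
  $calls++;                           # count every invocation
  return 0 if $n == 0;
  return 1 if $n == 1;
  return fibo($n - 1) + fibo($n - 2);
}

foreach my $n (10, 20, 25) {
  $calls = 0;                         # measure each position separately
  my $value = fibo($n);
  printf "fibo(%d) = %d needed %d calls\n", $n, $value, $calls;
}
```

Each fibo($n) makes one call of its own plus all the calls made by fibo($n-1)
and fibo($n-2), so the count works out to 2*Fib(n+1) - 1: 177 calls for
n = 10, and 29,860,703 calls for n = 35.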

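Can we speed this up?:
A close examination shows that the same values are computed again and
again: fibo($n-1) itself calls fibo($n-2), which fibo($n) then computes a
second time, and this duplication repeats at every level below. We can
store the values we already know in a hash, so as to avoid calculating
them again, as in this fiboCache.pl: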


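#!/usr/local/bin/perl
# cached recursive Fibonacci series, J Prilusky, 2002
use strict;
use warnings;
$| = 1;    # unbuffered output

my %cache;    # position => Fibonacci value already computed

foreach my $position (25 .. 35) {
  printf("%03d %30d\n", $position, fibo($position));
}

sub fibo {
  my ($n) = @_;
  return 0 if $n == 0;
  return 1 if $n == 1;
  # compute fibo($n) only if it is not cached yet
  $cache{$n} = fibo($n - 1) + fibo($n - 2) unless exists $cache{$n}; # <== HERE
  return $cache{$n};
}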


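Try them both and see the difference.
The trick is to compute a value for a given position ONLY if we haven't
done it before. On each call, the fibo subroutine checks whether our %cache
storage already knows the value for $n. If $cache{$n} already has a value,
we don't need to compute it again; we simply return it. Since %cache lives
outside the subroutine, the values computed for position 25 are also reused
when computing positions 26, 27 and so on.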
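The same kind of call counter, applied to the cached version, shows the
difference in numbers rather than in waiting time. Again this is only a
sketch: $calls is a hypothetical addition, and the cache is emptied before
each run so that every measurement starts from scratch.

```perl
use strict;
use warnings;

my %cache;        # position => Fibonacci value already computed
my $calls = 0;    # hypothetical counter: number of times fibo is entered

sub fibo {
  my ($n) = @_;
  $calls++;
  return 0 if $n == 0;
  return 1 if $n == 1;
  # exists (not a truth test) so that a cached 0 would also count as known
  $cache{$n} = fibo($n - 1) + fibo($n - 2) unless exists $cache{$n};
  return $cache{$n};
}

foreach my $n (10, 20, 35) {
  %cache = ();                        # start each run with an empty cache
  $calls = 0;
  my $value = fibo($n);
  printf "fibo(%d) = %d needed %d calls\n", $n, $value, $calls;
}
```

With an empty cache, fibo($n) now needs only 2n - 1 calls (19 for n = 10,
69 for n = 35): fibo($n-1) fills the cache on its way down, so the following
call to fibo($n-2) finds its value already stored.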

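The same caching technique can be used in many other situations: whenever
a subroutine is called repeatedly with the same arguments and always returns
the same result for them, its results can be kept in a hash and reused.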
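One of the Perl modules that extend this principle is Memoize, which has
shipped with Perl since 5.8. It wraps a named subroutine with a cache, so the
hand-written %cache is no longer needed. A sketch, assuming Memoize is
available: the plain recursive fibo from fiboPlain.pl, unchanged, is simply
passed to memoize('fibo').

```perl
use strict;
use warnings;
use Memoize;

# plain recursive definition, exactly as in fiboPlain.pl
sub fibo {
  my ($n) = @_;
  return 0 if $n == 0;
  return 1 if $n == 1;
  return fibo($n - 1) + fibo($n - 2);
}

memoize('fibo');    # replace fibo with a caching wrapper

foreach my $position (25 .. 35) {
  printf("%03d %30d\n", $position, fibo($position));
}
```

Because memoize replaces the fibo entry in the symbol table, the recursive
calls inside fibo also go through the cache. As with the hand-written hash,
this only makes sense for subroutines whose result depends solely on their
arguments.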

-- 
 Dr Jaime Prilusky                | Jaime.Prilusky@weizmann.ac.il
 Weizmann Institute of Science    | fax: 972-8-9344113
 76100 Rehovot - Israel           | tel: 972-8-9344959

 info URL http://bioinformatics.weizmann.ac.il/jaime_prilusky.html
 OCA is at http://bioinformatics.weizmann.ac.il:8500