APM: Perl zero not being zero, but 7e-12

Mon Sep 12 23:06:05 PDT 2005

Let's see if I get in my answer before half a dozen other people do.

Short form: you're expecting exact representation of decimal numbers.
Modern computers use binary in their hardware, and Perl (without a
library) uses the hardware.  The computer cannot represent all decimal
numbers exactly, and must use an approximation.  That approximation
introduces an error.  There's no guarantee that the same error occurs
in different calculations.

On Mon, 12 Sep 2005, David Bluestein II <dbii at interaction.net> wrote:
> Okay, I've seen this before and have a question how to avoid it.
>
> I take two variables:
>
>     $a= 38071.63;
>     $b = $i + $j; # Where $i + $j => 38071.63
>     print "$a : $b"; # results in 38071.63 : 38071.63
>
> Yet if I do:
>
> if ($a == $b) {
> print "Equal";
> } else {
> print "Not equal";
> }

Indent your code.

> I get a "Not equal".
>
> If I subtract ($b-$a) the result is 7.27595761418343e-12. How do I
> get Perl to ignore this "noise" which comes from someplace?

You've just discovered round-off error.  In almost all modern
computers, numbers are represented in the machine hardware only as

- integers, which can represent only the integers between -(2**31)
  and (2**31)-1.

  Grungy detail: even more modern machines can do -(2**63)
  .. (2**63)-1, and much hardware also allows smaller sizes, like 8
  or 16 bits.

  Grungy detail: some languages allow unsigned integers, for 0
  .. (2**32)-1 or (2**64)-1, depending.

- floating-point numbers, which are represented using a shorter
  integer (the mantissa) and a small exponent of 2, so the value is
  mantissa * 2**exponent.

Some languages allow larger ranges, but on almost all machines, that
requires arrays of words and extra software to manipulate it.

Perl just uses the underlying hardware.  I believe the exact rules are:
- If you supply a constant with no decimal point and within the range
  of integers, it's an integer value
- If you supply any other numeric constant, it's a floating-point
  value
- If you do arithmetic involving a floating-point value, the result
  is floating-point, else it is integer when the exact result is an
  integer in range (1/3, for example, comes out floating-point)

38071.63 == 3807163e-02, and it cannot be represented exactly in the
internal binary floating-point notation.  So the computer hardware,
and therefore Perl, uses the closest approximation.  Other values may
get different approximations.  It's not visible via print because the
print software rounds to a nearby value (usually based on the number
of significant digits), so while it prints as 38071.63, the internal
binary value is something like 38071.630000000003 or whatever.
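
Here's a minimal sketch of that print-versus-internal-value point,
assuming Perl's floating-point values are ordinary 64-bit IEEE doubles
(the usual build).  The values 0.1, 0.2, and 0.3 are stand-ins of my
own, not David's $i and $j.  The last line is there because 2**-37 is
the gap between adjacent doubles from 32768 to 65536, and it prints as
the very 7.27595761418343e-12 reported above.

```perl
use strict;
use warnings;

# Two values that print identically but are not equal.
my $x = 0.1 + 0.2;
my $y = 0.3;
print "$x : $y\n";                          # prints 0.3 : 0.3
print $x == $y ? "Equal\n" : "Not equal\n"; # prints Not equal

# Asking for 17 significant digits exposes the difference that the
# default formatting rounds away.
printf "%.17g\n", $x;                       # 0.30000000000000004
printf "%.17g\n", $y;                       # 0.29999999999999999

# Spacing between neighbouring doubles in the range 32768 .. 65536.
print 2**-37, "\n";                         # 7.27595761418343e-12
```

So the "noise" in the original question is most likely a single step
between two neighbouring representable values.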

On some older machines, the easiest example was
    $a = 1/3;  $b = 3*$a;  print 1-$b, "\n";
which would print something like
    1e-12
That doesn't happen on my modernish CPU, though.

Another classic example still works:
    $a = 0.01;
    $b = 0;
    for my $i (1 .. 100) {    # add 0.01 one hundred times
        $b += $a;
    }
    print 1-$b, "\n";
prints, on my machine,
    -6.66133814775094e-16
It's because 0.01 is similarly approximate, and the small error in
that approximation adds up in the final result to a larger error.

Your fundamental problem is that you're expecting equality to work
between floating-point numbers.  That's a sin and you need to avoid
it.  If you need to represent numbers exactly, like dollars and cents
for financial calculations, then shun floating-point.  There are two
common techniques:

- Use integers with a scale factor.  It can be implicit, like storing
  all dollar quantities as integer pennies.  Or it can be explicit,
  like having a separate exponent of 10 (thus implementing a base-10
  version of what the hardware provides in base 2).

  That can be moderately hard to do safely without error.

- Find a Perl library in CPAN that implements the ranges of numbers
  you need exactly (that is, without doing an approximation).
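
A rough sketch of both techniques, under some assumptions of mine:
to_cents is a hypothetical helper (not from any library), amounts
arrive as strings with at most two decimal places, and Math::BigFloat
(which ships with Perl) is just one of several libraries that would
do.  The amounts are made up for illustration.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: parse a decimal string such as "38071.63" into
# integer cents without ever going through floating-point.
sub to_cents {
    my ($s) = @_;
    my ($sign, $whole, $frac) = $s =~ /^([+-]?)(\d+)(?:\.(\d{1,2}))?$/
        or die "bad amount: $s\n";
    $frac = '' unless defined $frac;
    $frac .= '0' while length($frac) < 2;   # "5.5" means 50 cents
    my $cents = $whole * 100 + $frac;
    return $sign eq '-' ? -$cents : $cents;
}

# Implicit scale factor: everything is an integer number of cents,
# so "==" is safe.
my $total = to_cents("30000.00") + to_cents("8071.63");
print $total == to_cents("38071.63") ? "Equal\n" : "Not equal\n";

# Library route: Math::BigFloat does its arithmetic in decimal.
my $sum = Math::BigFloat->new("0.1") + Math::BigFloat->new("0.2");
print $sum == Math::BigFloat->new("0.3") ? "Equal\n" : "Not equal\n";
```

Parsing from the string matters here: multiplying the floating-point
value 38071.63 by 100 would bring the approximation along with it.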

Or maybe you're doing something that doesn't need to be 100% exact but
just very precise, with the roughly 7 (single-precision) or 15
(double-precision) significant digits that the hardware provides.  In
that case, don't try "==" on floating-point numbers, because it will
often return false due to approximations.  If you know the ranges,
you can implement a "close enough" test.  Suppose you know that the
numbers are from -99999.99 to +99999.99, and you only care that
they're within 0.01.  Then
    abs($b-$a) < 0.01
is the close-enough test.  Or maybe you want them to agree to within
0.1%.  Then
    abs($b/$a - 1) < 0.001
is a first cut ... except that it dies if $a is zero, and if $a is
small enough and $b large enough, you can get overflow, and hence an
error or a bad result.
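
One way to sketch both close-enough tests as reusable subs.  The names
close_abs and close_rel are hypothetical, and the relative test
multiplies by the larger magnitude instead of dividing, which is one
possible way around the zero and overflow cases, not the only one.
It assumes the usual 64-bit doubles.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $x and $y differ by less than $tol.
sub close_abs {
    my ($x, $y, $tol) = @_;
    return abs($x - $y) < $tol;
}

# Hypothetical helper: true if $x and $y agree to within the fraction
# $rel of the larger magnitude.  No division, so a zero argument
# cannot kill the program.
sub close_rel {
    my ($x, $y, $rel) = @_;
    my $scale = abs($x) > abs($y) ? abs($x) : abs($y);
    return abs($x - $y) <= $rel * $scale;
}

my $p = 38071.63;
my $q = $p + 2**-37;   # one representable step above $p
print $p == $q                 ? "Equal\n" : "Not equal\n";
print close_abs($p, $q, 0.01)  ? "Close\n" : "Not close\n";
print close_rel($p, $q, 0.001) ? "Close\n" : "Not close\n";
```

On such a build that prints "Not equal", then "Close" twice.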

Despite the fact that it looks long, that's a brutally short
explanation.  There are entire tracks of college classes in numerical
analysis, which (inter alia) deals with how to use computer hardware
to compute actual values and minimize errors.

-- 
Tim McDaniel; Reply-To: tmcd at panix.com