Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Comparison of floats (if a>b then), float precision 3

Status
Not open for further replies.

YpjY

Instructor
Jan 19, 2008
3
ES
Hi, could someone explain clearly what exactly Delphi does when I write "if a > b then", given that "var a, b: Extended"?

By trial and error I have found that it only checks the first 19 non-zero digits, e.g.

0.000...00001 > 0.0 is True for any number of zeros
98765432109876543220.0 > 98765432109876543210.0 is True (differ in the 19th digit)
98765432109876543211.0 > 98765432109876543210.0 is False (differ in the 20th digit)

The last one is sort of surprising result for me, because the difference is 1.0, which is quite a lot (in absolute terms).

Further, there is a plenty of recommendations everywhere and methods implemented in a plenty of packages, which try to do something like

Code:
const PrecisionEpsilon = 3.36210314311209558E-4932;
function IsAGreaterThanB(a, b: Extended): Boolean;
begin
Result := ((a - b) > PrecisionEpsilon);
end;

Clearly, the relation that resulted False above, would now be True. So, what is better? Can IsAGreaterThanB result in any problems? What about time requirements and precision? Should PrecisionEpsilon be as above or simply 1E-20?

Is a similar rule of taking only the first 19 digits applied also in multiplication (subtraction etc.) or all possibly 4932 digits are being multiplied? (If the rest is ignored, then perhaps the function IsAGreaterThanB is useless...)

Why is it in Delphi Help written that the number of significant digits for Extended is 10-20 (while for other float types they only differ by 1, so I would expect 19-20 here)?

Can onyone provide me with the first ~20 digits of the smallest (negative) Extended number? Delphi Help says that number is -3.6 x 10^4951, but this is not enough if one wants to be precise.

And the last doubt. When I define
Code:
const MAXFLOATVALUE = 1.18973149535723103E+4932;
my Turbo Delphi shows in the Structure-Errors an error saying "Invalid real constant at line...", though it compiles OK. The constants are Extended by default. So why that warning?

Many thanks for any insightful answer.
 
Okay, this can be a bit dicey, so I'll try to explain it the best I can.

Hi, could someone explain clearly what exactly Delphi does when I write "if a > b then", given that "var a, b: Extended"?

Delphi works in scientific notation with numbers, as I'm sure you've figured out. What it is doing is as it would for an integer or any other value: It is comparing one against the other.

Now you are either using fixed types or floating point types. A fixed type has a finite precision that can be expected always. A floating-type is the same way, but the precision is only evident in the first part of the scientific number. A floating-point number stores a certain number of values in the first part and a certain number of values in the second part.

For your examples, what you gave isn't quite right: For what they display, they are all equal, but there is always some drift. The method you describe is okay to a point, but there will always be that drift that is produced.

Is a similar rule of taking only the first 19 digits applied also in multiplication (subtraction etc.) or all possibly 4932 digits are being multiplied? (If the rest is ignored, then perhaps the function IsAGreaterThanB is useless...)

It's only applying the values that are stored in the first part of the sci. number. In the case of extended, it's 15, when I tested it out. Consequently, all you will get is calculations accurate enough to the relevant significant digits in the calculation.

Why is it in Delphi Help written that the number of significant digits for Extended is 10-20 (while for other float types they only differ by 1, so I would expect 19-20 here)?

This depends on the type and how the data are stored.

my Turbo Delphi shows in the Structure-Errors an error saying "Invalid real constant at line...", though it compiles OK. The constants are Extended by default. So why that warning?

Untyped variables, which is what you gave an example of, is not necessarily typed by default. Try this:

Code:
const MAXFLOATVALUE: Extended = 1.18973149535723103E+4932;

Overall, though, remember that floating-point variables are only guaranteed to be scientifically accurate (in terms of significant digits) and not mathematically accurate. A great demonstration you can try is to define an extended value, assign the value of 0.01 to it, and then add 0.01 to it until you hit 1000000 0.01's. You will see a grossly different number than you expect (10000).

If you seek mathematical accuracy as opposed to general scientific accuracy (like if you're working with money), you would do well to do fixed-point decimal arithmetic (through a pre-provided unit or your own code).
 
OP - If you really need processing on no.'s of this magnitude, perhaps an external component will fill your bill. Sorry, I wish I had time to search, but I'm sure they're out there.

For all the time you put into your reply, a *4U!
It will go down as a link to whenever questions are asked about floating precision.

Roo
Delphi Rules!
 
Thanks for your answers. I have just verified that the result of
Code:
IsAGreaterThanB(98765432109876543211.0, 98765432109876543210.0)
is False! Not True, as I suspected above without having it verified.. So it seems that "a > b" and "IsAGreaterThanB(a, b)" give always the same result. Or has anyone a counterexample?

The same result of the both expressions can be justified from the scientific notation Delphi uses. E.g, 321 expressed in the scientific notation is {+}{3.21}x10^{2}, where the three braced-expressions are called 'sign', 'significand' and 'exponent', stored as integers. I.e., one could imagine that there exists something like
Code:
function "a > b": Boolean; // for a, b: Extended
begin
  aa.sign := "the sign of 'a'";
  aa.significand := "the first 19 non-zero digits of 'a'";
  aa.exponent := "the order of magnitude of 'a'";
  bb.sign := "the sign of 'b'";
  bb.significand := "the first 19 non-zero digits of 'b'";
  bb.exponent := "the order of magnitude of 'b'";
  if aa.sign = '+' and bb.sign = '-' then Result := True;
  if aa.sign = '-' and bb.sign = '+' then Result := False;
  if aa.sign = '+' and bb.sign = '+' then 
    begin
      if aa.exponent > bb.exponent then Result := True;
      if aa.exponent < bb.exponent then Result := False;
      if aa.exponent = bb.exponent then 
        begin
          if aa.significand > bb.significand then Result := True;
          if aa.significand <= bb.significand then Result := False;
        end;
    end;
  if aa.sign = '-' and bb.sign = '-' then ... // analogous
end;
{Note: to simplify, I omitted 'Exit;' command after each assignment to 'Result'}

From the above it follows that "a > b" is equivalent to "a - b > 0", as long as the substraction is also done using the scientific notation (and it is)...

Thus, my conclusion is that there is no need to define IsAGreaterThanB. Am I right?

Regarding my "Invalid real constant at line..." error, it persists even if I add ": Extended". Indeed, Delphi Help says that constants are predefined of type Extended. It might be a Delphi BUG..

PJ.
 
Okay, it seems like there is still a lack of understanding given the explanation above:

is False! Not True, as I suspected above without having it verified.. So it seems that "a > b" and "IsAGreaterThanB(a, b)" give always the same result. Or has anyone a counterexample?

As was talked about above, there is only a storage of a specific number of significant digits per data type. In the case you give, A is not > B, a is equal to b. One could prove this to themselves by writing the program to study how the values work - I'll show that to you later.

Thus, my conclusion is that there is no need to define IsAGreaterThanB. Am I right?

There is a drift in the numbers due to rounding and other factors - to that point, yes it's necessary to do that.

Regarding my "Invalid real constant at line..." error, it persists even if I add ": Extended". Indeed, Delphi Help says that constants are predefined of type Extended. It might be a Delphi BUG..

Perhaps you are defining too many significant digits. I notice I get away with a lot working with floating-point numbers in Delphi 3 that I probably should not (D3 ignores all superfluous digits in numbers), and I'm sure by the time they got to Turbo Delphi, they locked all of that down.

All can be illustrated in this example - it always pays to test for yourself to see what is going on:

Code:
{$APPTYPE CONSOLE}
program float_test; uses sysutils;
  const
    C: Extended = 1.18973149535723103E+4932;
  var
    a: Extended;
    b: Extended;
  begin
    A := 98765432109876543211.0;
    B := 98765432109876543210.0;
    writeln('Text Rep of A: ', '98765432109876543211.0');
    writeln('Regular Rep A: ', a:0:1);
    writeln('    Sci Rep A: ',  a);
    writeln('Text Rep of B: ', '98765432109876543210.0');
    writeln('Regular Rep B: ', b:0:1);
    writeln('    Sci Rep B: ',  b);
    if a > b then
      writeln('A is greater than B')
    else
    if b > a then
      writeln('A is less than B')
    else
      writeln('A is equal to B');
    writeln('C: ', c);
    writeln('C in Text: ', c:0:20);
  end.

Now let's look at what the output is in this program:
Code:
Text Rep of A: 98765432109876543211.0
Regular Rep A: 98765432109876543200.0
    Sci Rep A:  9.87654321098765E+0019

The first line is the text representation of your input, the second is the text representation (FloatToStrF), and the third is the scientific number representation.

What do I notice? You are putting in two more significant digits than what Extended can store. The second line is showing 18 digits, the third is showing 15 digits.

Code:
Text Rep of B: 98765432109876543210.0
Regular Rep B: 98765432109876543200.0
    Sci Rep B:  9.87654321098765E+0019

This is consistent as well with what is above. You provide 19 significant digits instead of 20, but the explanation above still holds, since the type can only handle 18 digits.

Now what happens when it comes time to do your comparison? Let's look at what the computer is doing and not what you are expecting it to do:

Code:
{ if a > b }
if 98765432109876543200.0 > 98765432109876543200.0

Given what the computer stores for the numbers, can we expect anything less than this for our next output:

Code:
A is equal to B

Now to look at the matter of the max-float line. You try to set the value to:
Code:
1.18973149535723103E+4932

But if we look at what the computer is doing again:
Code:
C:  1.18973149535723E+4932
C in Text:  1.1E+4932

1) You provide three more significant digits (again) than what is stored in the first, but if we look at the second line, we get what is expected.
2) To that end, you are providing a value that is as Delphi is telling you: "Invalid real constant at line..." . Not a Delphi bug, but operator error.

And actually if you consult the manual:
Code:
Type	                     Range	
---------------------------------------------------------
Extended	    3.4 x 10-4932 .. 1.1 x 104932	

Significant digits	Size in bytes
-------------------------------------
      19-20	              10

The proper line to use (assuming you do not want to try a larger value):
Code:
const MAXFLOATVALUE: Extended = 1.1E+4932;

My suggestion is consistent with roo0047. If you really have a need for such precision (or accuracy) beyond what is called for in the significant digit calculation rules I linked to in the first post, you really should be seeking out an external component which does fixed-point decimal arithmetic to the precision you are demanding.
 
excellent explanation Glenn!
a star for the effort...

[thumbsup]

-----------------------------------------------------
What You See Is What You Get
Never underestimate tha powah of tha google!
 
Great! Thank you very much, I overlooked those links you gave in your first post..

Though, try running your code with A ending in '20' instead of '11'. I get exactly the same output as before (i.e., both the regular and the scientific representation of A and B are equal), YET the result is
Code:
A is greater than B
Uff... Given the logic you explained, the computer is doing
Code:
{ if a > b }
if 98765432109876543200.0 > 98765432109876543200.0
but it yields True! So, it seems that it works with 19 significant digits..

I understand that, for Extended, the number of significant digits is 64 in binary representation, so it may be 19 or 20 for different numbers.
With the constant I get the warning even if I write
Code:
C: Extended = 1E+4932;
It disappears when I reduce the exponent to 308, which is the maximum value of Double type. So, again, a Delphi bug? :)

And if I consult in my Turbo Delphi (build 2600: Service Pack 2) the Borland Help on Real Types, it says
Code:
Type                         Range    
---------------------------------------------------------
Extended        -3.6 x 10^4951 .. 1.1 x 10^4932    

Significant digits    Size in bytes
-------------------------------------
      10-20                  10

What's your Delphi version?

Finally, what you mean by the drift? Can you show an example?

Sorry for so many questions, but I would like to understand how this stuff works, in order to decide whether to use an external component (because I am also interested in quick running..).
 
(I should note this is turning out to be a good example of the usefulness of trying out things, investigating and observing results)

Though, try running your code with A ending in '20' instead of '11'. I get exactly the same output as before (i.e., both the regular and the scientific representation of A and B are equal),

I just tried this myself (you learn and try things, truthfully I don't use any floating-point type much, reasoning below), and this is true (hard to understand, again not something intrinsic since Delphi does not write this last digit ever). (of course this gets really nutty, and like I said it's hard to explain - it's easily confusing)

What it seems to be doing (after writing a file to disk and reading back, and looking at the values in hex) is storing an extra digit and not writing it for purposes of rounding, etc. For your other example, it still makes the two equal, since the last stored significant digit is still the same.

From the output of my new program, where I added lines for hex dumps - you know what the rest looks like:
Code:
Hex of A: DE CF A4 1C A7 A9 54 AB 41 40 
Hex of B: DD CF A4 1C A7 A9 54 AB 41 40 
A is greater than B

If you know how to read hex dumps (I'll spare that explanation unless requested - same for generating them out of variables in code it's off-topic for this thread and probably more worthy of a FAQ (if it hasn't already been done) than a simple post anyway), you'll see the first one has its last significant digit as one greater than the second, which produces the answer.

Code:
It disappears when I reduce the exponent to 308, which is the maximum value of Double type. So, again, a Delphi bug? :)

I don't know - the line worked for me. I will say 1.2E+4932 produces +Inf, which is what I would expect out of the floating-point types. The maximum storable extended value is between 1.1E+4932 and 1.2E+4932

As for the manual cite, an X was likely left out of the Delphi 3 version of it.

Finally, what you mean by the drift? Can you show an example?

I gave you one earlier (adding 0.01 to itself 1000000 times). This is really an example of the reason why I tend to not use floating-point types much - you don't necessarily get what you expect due to rounding and other factors. Therefore it makes it hard to check and validate programs with extensive numbers of floating-point calculations (like if you attempt data processing involving them, like with money). I did that, and had it write every 1000th value.

Starting from the beginning (remember all this program is doing is adding 0.01 to an Extended variable repeatedly):
Code:
  [b]Num. Times             Extended Data Value
    ----------------------------------------------[/b]
      1000                 10.00000000000000012000
      2000                 20.00000000000000030000
      3000                 30.00000000000000050000
      4000                 39.99999999999999940000
      5000                 49.99999999999999780000
      6000                 59.99999999999999630000
      7000                 69.99999999999999690000
      8000                 79.99999999999999880000
      9000                 90.00000000000000070000
(the correct answer is the left value divided by 100)

You can already look to the 4000 section and see a big problem. I could apply some logic to my code relating to significant figures and come up with code that says that anything past the second decimal place is invalid and to be rounded (since my values are 2 decimal places). But it's slower, and what do I do for varying decimal places and other operations, like multiplication (which just multiplies the rounding errors)?

Our rounding errors grow, until:
Code:
    143000               1430.00000000000099000000
    144000               1440.00000000000100000000
    145000               1450.00000000000100000000
    146000               1460.00000000000101000000
...
    995000               9949.99999999987857000000
    996000               9959.99999999987836000000
    997000               9969.99999999987814000000
    998000               9979.99999999987793000000
    999000               9989.99999999987772000000
   1000000               9999.99999999987750000000

Final Answer: 9999.99999999987750000000

Extended does not show the errors as badly (and therefore is perhaps the most "drift" resistant type), but if you try this with Single or Double, the errors get nightmarish (single is off by 134.776371875!).

Code:
Real:     10000.00041124224663000000
Single:    9865.22363281250000000000
Double:   10000.00000017185630000000
Extended:  9999.99999999987750000000

If you wish to see the weakness of accuracy in this way for floating-point types, the telco benchmark is a great exercise to attempt to illustrate it to you (roughly 12,000,000 f-p calculations for the expon180 file). This exercise is also a great advocate for a true decimal type (as it is the point of the main site), and a good test of what you are doing for your calculation speeds.

Personally, I've gotten burned a couple of times trying floating-point decimal types and will use fixed-point decimal arithmetic when I want mathematically accurate work in the future (and why I asked about what people used here - I know it'd be a nightmare to try to regularly do data processing with Delphi without it).
 
Glen, I hesitate to post this, since I don't really know what I'm doing. That said, run this and tell me what you think:
Code:
program Project2;

{$APPTYPE CONSOLE}

uses
  SysUtils, FMTBcd;

var
  v1, v2: variant;
  n1, n2: extended;
  b1, b2: TBcd;
  i: integer;
begin
  n1:= 0.01;
  n2:= 0.01;
  v1:= VarFMTBcdCreate(0.01, 18, 4);
  v2:= VarFMTBcdCreate(0.01, 18, 4);
  b1:= VarToBcd(v1);
  b2:= VarToBcd(v2);
  for i:= 1 to 1000000 do begin
   if i mod 1000 = 0 then
     writeln(i,BcdToDouble(b2), n2); //set break-point here - clear after i > 450000
   BcdAdd(b1, b2, b2);
   n2:= n2 + n1;
  end;
  write('Press <Enter> to close...');
  readln;
end.
The limitation may be BcdToDouble() or lack of BcdToExtended().

Roo
Delphi Rules!
 
(Had to fire up my Turbo Delphi 2006 for this one - Delphi 3 didn't work with this one) Anyway, I ran this, and it seemed to be fine for the most part (I didn't see any particularly wrong values jump out at me - all 0's after the 2nd decimal), but when I modified it to use five decimal places (0.00001), it started producing calculation errors similar to what the extended type was doing.

I looked at the source, and it seems to be taking the right approach as opposed to the equivalent in Delphi 3, where it converts to a Currency type and back (more on this later, since this is a valid avenue of argument for my post above).

BCD is a good avenue to eliminate the accuracy, which is why it is one of the things used in data processing (a native type in COBOL as well as in many mainframe system processors - floating-point values are almost never used) - it is fixed point decimal arithmetic. All of the values of the data are stored in guaranteeably retrievable form (12 34 56 78 9C is +123456789), and the values are processed as if they were integers. The decimal position is not stored, but is implied. BCD was in no doubt included in Delphi for to be able to work with databases stored on the mainframe systems (the BCD code in Delphi 3 is associated with the DB code).

Now to the currency issue. A currency value is a 64 bit integer (essentially int64 or comp), which is divided by 10000 and then rounded. It's a great idea (and works for the original test of 1000000 0.01). The problem is when we have to work with data with more decimal places than 4. This is not entirely uncommon with money figures as well. The telco benchmark, mentioned above, features examples of this, but a simple one can be illustrated below:

Code:
Currency: 20.24921000000000000000
Extended: 20.26548750000000000000
Computed: 20.2654875

This is the result of multiplying 325.55 by 0.06225 (the price of a sold item * a sales tax or VAT of 6.225%). The currency value is a fine thing, but you can't use it where a value of more than four decimal places is relevant.

1) The purchase price of the item can be currency (always 2 decimal places at least in the US).
2) The sales tax can not be and must be a regular FP type.
3) Therefore, the result must be an FP type.

Currency is a great data type to have for attempting correct calculations in data processing applications, but the problem is you have to be able to step through things and know when to use either data type.

The telco benchmark can be done in Delphi, but it has to involve currency, and in the right spots. It's still a challenge, though (it took me two weeks off and on to get the right answers - where it goes wrong is hard to debug, since the code is almost right on the 1m record file, but completely right on the 10 record file!).

The final solution giving the correct answers:
Code:
sumT = 1004737.58
sumB = 57628.30
sumD = 25042.17

If we use all extended, the reported answers:
Code:
sumT = 1004784.10
sumB = 57628.30
sumD = 25042.17

We can not obviously use all currency (given the example above), because of the precisions of all the rates and taxes in the spec (4 or 5).

Currency is an improvement when you want mathematical accuracy, but it still requires more effort than necessary to return a good result.
 
An interesting factoid regarding the BCD discussion: Turbo Pascal 3 included an option to use BCD reals: "Eliminates round-off error! A must for any serious business application" Yet they were phased out in Turbo Pascal 5.5

Interesting how people think they know what is necessary, yet don't seem to stick with it. The whole computer industry seems to be an exercise of relearning things done in the past and reinventing wheels already made in the past.

All cyclical and all vain the end.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top