Maker Pro
Maker Pro

Fixed point Vs Floating point

V

Vladimir Vassilevsky

John said:
You could also compute 1/x by bitwise successive approximation,
especially if you have a fast multiply handy. Square roots, too.

With bitwise successive approximation, the gain is one bit per
iteration, whereas the Newton-Raphson basically doubles the number of
known bits. So the number of iterations will be dramatically lower for
the cost of the additional complexity, of course.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
 
V

Vladimir Vassilevsky

John said:
That's just an inefficient successive approximation to the
already-known sine curve, which it will eventually exactly match. The
better way is to FFT every random iteration and see if it improves the
fundamental and minimizes harmonics.

The systematic approach to this problem is the noise shaping and the
dithering. You can generate the table in the way so the junk will be
either evenly spread around the spectrum or concentrated in the
particular frequency band, whatever is required.


Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
 
W

whit3rd

For the x86 series the FPU was brought onboard with the 486, and with
the 68020 in that series.

Alas, not the 68020, nor the 68030; it was the MC68040
that sported onboard floating point (the hardware support
for floating point on the '020 and '030 was in the form of
coprocessor compatibility).

The floating point unit was omitted on the MC68LC040,
so there were Macintoshes where a 'floating point unit'
upgrade consisted of replacing one '040 CPU with
another, more expensive one.
 
W

whit3rd

The floating point division by a variable X is usually done as
multiplication by 1/X. The 1/X can be computed either sequentially by
Newton-Raphson iteration or as a parallel computation

Just to amplify on this, the numerator (A) and the
denominator (B) are normalized (shifted) until
each is in the range of (1, 0.5], then one repeatedly
computes

A := A *(2-B) and B := B*(2-B)

note, A/B = A*(2-B)/(B*(2-B)) so the ratio is unchanged.
Note also, B rapidly approaches "1" (this is a consequence
of the normalization condition, in that B is nearly 1 to begin
with).

At the first step, you can substitute a table value for
the "2-B" approximation, to good effect. Typically, for
a 24-bit result, use a 64-entry lookup table and make
three pairs of multiplies, and B will be exactly 1 (to
the first 24 bits), thus A is the ratio.

See, for instance,
"An Algorithm for Division", Svoboda, Inf. Process. Mach.
_9: 25-32 (1963)

The (infamous) Pentium divide bug resulted from
a fault in transcribing the lookup table. They didn't
test B, just took a number of steps based on the worst-case
value.

This assumes there is a fast (integer) multiplier, which
is true in most modern microprocessors...
 
M

MooseFET

On 17 Jun, 14:26, John Larkin
On Tue, 17 Jun 2008 07:47:17 -0400, Phil Hobbs
[clip]
I was just playing with the minimal amount of assembly code that will
make a nice 4096-point sine table. Looks like maybe a dozen lines or
so, but the rounding errors are annnoying. Maybe I'll just stick with
the 2048-word lookup table, which is accurate and fast.
John
I wanted a small 256 sine lookup for an AVR. Being lazy I pulled one
from the web. Didn't look right on the scope or spectrum analyser, so
pulled another which for some reason had slightly different values.
That wasn't right either. Curious I then pulled another 3, all
different all crap!. The annoyance was that they all came with their
own righteous mathematical analyses.
Eventually had to do it myself. Being lazy is sometimes hard work.
For small tables with short word lengths, there must be some subtle
arrangement of rounding that nets the purest spectrum. Hurts me head
to think about that one.
I suppose you could Fourier it and find the amplitude/phase of all the
harmonics, one by one, and try to poke the inverse correction into the
table at the sub-LSB level, sort of buried in the rounding noise.
Hurts me head.
Write a bit of code:
Fill a table with all ones

save a copy of the table
randomly increment or decrement one element
best fit a sine to the new table
if the fit is better keep the new table
if NOT bored yet goto loop

That's just an inefficient successive approximation to the
already-known sine curve, which it will eventually exactly match.

No, it is always integer.
The
better way is to FFT every random iteration and see if it improves the
fundamental and minimizes harmonics.

The RMS error is the harmonics so no FFT is needed. Once you fit to
the sine, what is left is all the harmonic content because you sweep
through the whole table exactly once per cycle of the fundamental. It
is a very different situation to the case where the sample rate and
the frequency are not related.
 
M

MooseFET

You could also compute 1/x by bitwise successive approximation,
especially if you have a fast multiply handy. Square roots, too.

Or given a fast byte multiply and the need to do a multi-byte divide,
you can take advantage of these facts:

1/1 = 1 (kind of obvious I know)

1/(2^N) = 2^-N

1/(1-e) = 1+e when e is very small

A*B*C*D *( 1/(A*B*C*D*X) = 1/X

If this is integer: First you normalize by sliding the number up.
For floats that is already done

The first step needs a small table.

A is a small number in the form (1+N/256) that is obtained from a
table look up using the upper bit of X. The table values are set so
that they move the value of X towards the 0FFFF...FF value, but are
certain never to go beyond that. The actual math is done as X+X*N/256
so the bytewise multiply works nicely.

B is another small number in this case it is in the form of (1+N/8192)
(IIRC) producing this number does not require a table the N is made by
complimenting the bits of X.

C is the same thing only 6 more bits down.

D is again 6 more bits down.

Now you use the 1/(1+e)=1-e rule to invert the number very near 2^N.

Next you multiply the result by the A,B,C,D values.

I hope I explained that well enough that you can see the point. It
works nicely on an 8051.
 
V

Vladimir Vassilevsky

John Larkin wrote:

If the table is walked point-by-point, the RMS error from a perfect
sine function will show up as various mixes of harmonics, depending on
the values. I'd probably go for minimizing the worst harmonic, or
maybe minimizing THD, which probably aren't exactly the same thing.

In the case of DDS, we don't walk the table in successive points, but
sort of hop and skip all over the place, so math errors create
harmonics, non-harmonic spurs, and the equivalent of wideband noise.
That could be improved by some numerical analysis/iteration process,
if we had a few billion years to spare.

This problem is well known, and there are the standard approaches to it.
Add a random (or quasi-random) dither to the LSB of the DDS phase to
break the periodicity, and the trash will be evenly distributed over the
bandwidth. The quantization of the amplitude is the different story; the
way to deal with that is the interpolation and the noise shaping.
I'm sure work has been done on optimizing DDS waveform tables for
spectral content; I don't have any refs handy. It gets worse when
downstream gain and offset factors are applied digitally.

You can optimize the DDS table for the frequency. You can't optimize a
DDS table for a frequency.
We have found that hardware interpolation of "ideal" sinewave lookup
tables improves the spectra a lot.

It certainly does. The interpolation of the N-th order is equvalent to
the increase the number of table points by the power of (N+1).



Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
 
J

JosephKK

Do you know of any processor that handles denormals in hardware, other
than by flushing to zero? I haven't found one, and I don't think any
exists. That's what I mean by "no hardware support".

Cheers,

Phil Hobbs

Gosh, just after i named multiple ones. Go study the databooks before
you respond.
 
J

JosephKK

Alas, not the 68020, nor the 68030; it was the MC68040
that sported onboard floating point (the hardware support
for floating point on the '020 and '030 was in the form of
coprocessor compatibility).

The floating point unit was omitted on the MC68LC040,
so there were Macintoshes where a 'floating point unit'
upgrade consisted of replacing one '040 CPU with
another, more expensive one.

My memory differs, though i am aware it fails me occasionally. Please
provide some pointers so that all may witness the resolution.
 
J

JosephKK

On 17 Jun, 14:26, John Larkin
On Tue, 17 Jun 2008 07:47:17 -0400, Phil Hobbs
[clip]
I was just playing with the minimal amount of assembly code that will
make a nice 4096-point sine table. Looks like maybe a dozen lines or
so, but the rounding errors are annnoying. Maybe I'll just stick with
the 2048-word lookup table, which is accurate and fast.

I wanted a small 256 sine lookup for an AVR. Being lazy I pulled one
from the web. Didn't look right on the scope or spectrum analyser, so
pulled another which for some reason had slightly different values.
That wasn't right either. Curious I then pulled another 3, all
different all crap!. The annoyance was that they all came with their
own righteous mathematical analyses.
Eventually had to do it myself. Being lazy is sometimes hard work.
For small tables with short word lengths, there must be some subtle
arrangement of rounding that nets the purest spectrum. Hurts me head
to think about that one.
I suppose you could Fourier it and find the amplitude/phase of all the
harmonics, one by one, and try to poke the inverse correction into the
table at the sub-LSB level, sort of buried in the rounding noise.
Hurts me head.

What does "small 256 sine lookup" mean?  256 points?  How many bits
per point?

                                        ...Jim Thompson
--
| James E.Thompson, P.E.                           |    mens     |
| Analog Innovations, Inc.                         |     et      |
| Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    |
| Phoenix, Arizona  85048    Skype: Contacts Only  |             |
| Voice:(480)460-2350  Fax: Available upon request |  Brass Rat  |
| E-mail Icon athttp://www.analog-innovations.com|    1962     |

         America: Land of the Free, Because of the Brave

A list of 256 values, each value 8 bits. A 256 point approximation of
one complete sine cycle, rather than the accurate, 16 bit, 1/4 cycle
shown by Vladimir.

Full cycle tables are useful only of you have gates, cells, blocks of
ram or rom to piss away. Other than that a quadrant and tiny bit of
program or logic is quite sufficient.
 
J

JosephKK

The "decimal" point isn't necessarily fixed in "fixed point"
arithmetic. There is no exponent though.


Because of the exponent.


Neither are "hard", given an unlimited transistor budget. ;-) In
fact the fixed point unit I worked on was every bit as complicated
as the floating point unit.


Not at all clear.


No, there aren't two paths. As you note, bits are bits.


A floating point unit must treat the mantissa and exponent
differently. A multiply would add the exponent and multiply the
mantissa. What sort of "example" do you want?

Plus there is another interesting technique called block floating
point with one exponent and a block of mantissas.
 
J

JosephKK

None of the ones that you named do it--certainly not any Intel ones.
Support for denormals is provided by handling the EM_DENORMAL exception
in software, or by flushing to zero.

Nice bluff, though.

Cheers,

Phil Hobbs

Hmm. Though it was some time ago i had studied that databook rather
closely. IMM it did act properly, which at the time included throwing
interrupts (exceptions) {By the way there is a difference}. Perhaps
that property was lost when the FPU was brought onboard in the 486 and
never restored.
 
J

JosephKK

JosephKK said:
On 17 Jun, 17:24, Jim Thompson <To-Email-Use-The-Envelope-I...@My-Web-
Site.com> wrote:
On Tue, 17 Jun 2008 09:07:29 -0700, John Larkin



On 17 Jun, 14:26, John Larkin
On Tue, 17 Jun 2008 07:47:17 -0400, Phil Hobbs
[clip]

I was just playing with the minimal amount of assembly code that will
make a nice 4096-point sine table. Looks like maybe a dozen lines or
so, but the rounding errors are annnoying. Maybe I'll just stick with
the 2048-word lookup table, which is accurate and fast.

John

I wanted a small 256 sine lookup for an AVR. Being lazy I pulled one
from the web. Didn't look right on the scope or spectrum analyser, so
pulled another which for some reason had slightly different values.
That wasn't right either. Curious I then pulled another 3, all
different all crap!. The annoyance was that they all came with their
own righteous mathematical analyses.
Eventually had to do it myself. Being lazy is sometimes hard work.

For small tables with short word lengths, there must be some subtle
arrangement of rounding that nets the purest spectrum. Hurts me head
to think about that one.

I suppose you could Fourier it and find the amplitude/phase of all the
harmonics, one by one, and try to poke the inverse correction into the
table at the sub-LSB level, sort of buried in the rounding noise.
Hurts me head.

John

What does "small 256 sine lookup" mean? 256 points? How many bits
per point?

...Jim Thompson
--
| James E.Thompson, P.E. | mens |
| Analog Innovations, Inc. | et |
| Analog/Mixed-Signal ASIC's and Discrete Systems | manus |
| Phoenix, Arizona 85048 Skype: Contacts Only | |
| Voice:(480)460-2350 Fax: Available upon request | Brass Rat |
| E-mail Icon athttp://www.analog-innovations.com| 1962 |

America: Land of the Free, Because of the Brave

A list of 256 values, each value 8 bits. A 256 point approximation of
one complete sine cycle, rather than the accurate, 16 bit, 1/4 cycle
shown by Vladimir.

Full cycle tables are useful only of you have gates, cells, blocks of
ram or rom to piss away. Other than that a quadrant and tiny bit of
program or logic is quite sufficient.

Quadrants are nice if program loop time (ie output frequency) is not an
issue. Add a few quadrant manipulation instructions to a tight loop and the
performance suffers drastically (like 10x worse!).

And what part of "it is a time - memory space" tradeoff did you not
understand? Nor is the performance hit necessarily that bad, even in
assembler.
 
Top