Maker Pro

Sine Lookup Table with Linear Interpolation


glen herrmannsfeldt

Interesting. This tickles a few grey cells. Two values based solely on
the MS portion of x and a value based solely on the LS portion. Two
tables, three lookups, a multiply and an add. That could work. The
value for sin(M) would need to be full precision, but I assume the
values of sin(L) could have less range because sin(L) will always be a
small value...

The cos(M)sin(L) is one table, with some bits from each.

Well, when you do the interpolation, you first need to know the
spacing between points in the sine table. That spacing is
proportional to cos(x). So, the interpolation table is
indexed by the high four bits of M and the four bits of L.
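In C, that scheme might look something like this (a sketch; the widths and the 18-bit scale are illustrative, not the ROM sizes from the app note):

#include <math.h>
#include <stdint.h>

#define MBITS 8   /* coarse phase bits */
#define LBITS 4   /* fine phase bits */

static int32_t sin_m[1 << MBITS];      /* sin(M), full precision */
static int32_t corr[16][1 << LBITS];   /* ~cos(M)*sin(L), indexed by the high
                                          four bits of M and the bits of L */
void build_tables(void)
{
    double q = M_PI / 2;               /* one quadrant */
    for (int m = 0; m < (1 << MBITS); m++)
        sin_m[m] = (int32_t)lround(131071.0 * sin(q * m / (1 << MBITS)));
    for (int mh = 0; mh < 16; mh++)
        for (int l = 0; l < (1 << LBITS); l++) {
            double M = q * (mh + 0.5) / 16.0;   /* center of this M range */
            double L = q * l / (1 << (MBITS + LBITS));
            corr[mh][l] = (int32_t)lround(131071.0 * cos(M) * sin(L));
        }
}

/* sin(M+L) ~= sin(M) + cos(M)*sin(L): three lookups and an add,
   no multiplier at lookup time */
int32_t quadrant_sine(uint32_t x)      /* x = M:L, MBITS+LBITS wide */
{
    uint32_t m = x >> LBITS, l = x & ((1u << LBITS) - 1);
    return sin_m[m] + corr[m >> (MBITS - 4)][l];
}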

(snip on getting to 90 degrees, or 2^n in binary)
(snip)
I think one of the other posters was saying to add an entry for 90
degrees. I don't like that. It could be done but it complicates the
process of using a table for 0-90 degrees.

I am not sure yet how they do that. I did notice that when adding the
interpolation value they propagate the carry, so the output can go to
the full 1.0000 instead of 0.99999. But it will do that before 90
degrees.

(snip)
I'm not using an actual ROM chip. This is block ram in a rather small
FPGA with only six blocks. I need two channels and may want to use some
of the blocks for other functions.

So, the one referenced uses three 256x4 ROMs to get 12 bits of sin(M)
from 8 bits of M, and then a fourth 128x8 ROM to get five bits to
add to the low bits of the 12 bit sin(M).

-- glen
 

robert bristow-johnson

this *should* be a relatively simple issue, but i am confused

I've been studying an approach to implementing a lookup table (LUT) to
implement a sine function. The two msbs of the phase define the
quadrant. I have decided that an 8 bit address for a single quadrant is
sufficient with an 18 bit output.

10 bits or 1024 points. since you're doing linear interpolation, add one
more, copy the zeroth point x[0] to the last x[1024] so you don't have
to do any modulo (by ANDing with 1023) on the address of the second
point. (probably not necessary for hardware implementation.)


x[n] = sin( (pi/512)*n ) for 0 <= n <= 1024

So you are suggesting a table with 2^n+1 entries? Not such a great idea
in some apps, like hardware.

i think about hardware as a technology, not so much an app. i think
that apps can have either hardware or software realizations.
What is the advantage?

in *software* when using a LUT *and* linear interpolation for sinusoid
generation, you are interpolating between x[n] and x[n+1] where n is
from the upper bits of the word that comes outa the phase accumulator:

#define PHASE_BITS 10             // table index bits: 2^10 = 1024 entries
#define PHASE_MASK 0x001FFFFF     // 2^21 - 1, the whole phase word
#define FRAC_BITS 11              // fraction bits used in the interpolation
#define FRAC_MASK 0x000007FF      // 2^11 - 1
#define ROUNDING_OFFSET 0x000400  // 2^(FRAC_BITS-1), rounds the shift below

long phase, phase_increment, int_part, frac_part;

long y, x[1025];                  // 1024 points plus one guard point

....

phase = phase + phase_increment;  // phase accumulator
phase = phase & PHASE_MASK;

int_part = phase >> FRAC_BITS;    // top 10 bits index the table
frac_part = phase & FRAC_MASK;    // low 11 bits interpolate

y = x[int_part];
y = y + (((x[int_part+1]-y)*frac_part + ROUNDING_OFFSET)>>FRAC_BITS);


now if the lookup table is 1025 long with the last point a copy of the
first (that is x[1024] = x[0]), then you need not mask int_part+1.
x[int_part+1] always exists.
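for completeness, one way to fill that table (the 24-bit amplitude here is just an assumption i'm making for the example, not part of anyone's spec):

#include <math.h>

#define TABLE_SIZE 1024
#define AMPLITUDE  ((1L << 23) - 1)   /* assumed full scale, illustrative */

void fill_sine_table(long x[TABLE_SIZE + 1])
{
    for (int n = 0; n < TABLE_SIZE; n++)
        x[n] = lround(AMPLITUDE * sin((M_PI/512.0)*n)); /* sin(2*pi*n/1024) */
    x[TABLE_SIZE] = x[0];   /* guard point: x[int_part+1] always exists */
}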

Also, why 10 bit
address for a 1024 element table?

uh, because 2^10 = 1024?

that came from your numbers: 8-bit address for a single quadrant and
there are 4 quadrants, 2 bits for the quadrant.

My calculations indicate a linear
interpolation can be done with 4 ppm accuracy with a 256 element LUT.
I'm not completely finished with my simulation, but I'm pretty confident
this much is correct.

i'm agnostic about the specs for your numbers. you are saying that 8
bits per quadrant is enough and i am stipulating to that.
No, that is the phase sent to the LUT. The total phase accumulator can
be larger as the need requires.

why wouldn't you use those extra bits in the linear interpolation if
they are part of the phase word? not doing so makes no sense to me at
all. if you have more bits of precision in the fractional part of the phase
Yes, thanks for the correction. The max error? I'm not so worried about
that exactly. The error is a curve with the max magnitude near the
middle if nothing further is done to minimize it.



I'm talking about the LUT. The LUT only considers the first quadrant.

not in software. i realize that in hardware and you're worried about
real estate, you might want only one quadrant in ROM, but in software
you would have an entire cycle of the waveform (plus one repeated point
at the end for the linear interpolation).

and if you *don't* have the size constraint in your hardware target and
you can implement a 1024 point LUT as easily as a 256 point LUT, then
why not? so you don't have to fiddle with reflecting the single
quadrant around a couple of different ways.
No, not quite right. There is a LUT with points spaced 90/256 degrees
apart starting at just above 0 degrees. The values between points in the
table are interpolated with a maximum deviation near the center of the
interpolation. Next to 90 degrees the interpolation is using the maximum
interpolation factor, which will result in a value as close as you can
get to the correct value if the end points are used to construct the
interpolation line. 90 degrees itself won't actually be represented, but
rather points on either side, 90±delta, where delta is 360° / 2^(n+1)
with n being the number of bits in the input to the sin function.

i read this over a couple times and cannot grok it quite. need to be
more explicit (mathematically, or with C or pseudocode) with what your
operations are.
Yes, it is a little tricky because at this point we are working with
integer math (or technically fixed point I suppose).
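Roughly, in C, the way the table points get generated (the 18-bit scale here is only for illustration):

#include <math.h>
#include <stdint.h>

#define N     256                 /* entries covering 0..90 degrees */
#define SCALE ((1 << 17) - 1)     /* illustrative 18-bit full scale */

/* entry k holds sin at (k + 0.5)*90/N degrees -- the half-lsb bias --
   so no entry lands exactly on 0 or 90 degrees */
void build_quadrant_lut(int32_t lut[N])
{
    for (int k = 0; k < N; k++)
        lut[k] = (int32_t)lround(SCALE * sin((k + 0.5) * (M_PI/2) / N));
}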

not with defining the points of the table. you do that in MATLAB or in
C or something. your hardware gets its LUT spec from something you
create with a math program.
Rounding errors are
what this is all about. I've done some spreadsheet simulations and I
have some pretty good results. I updated it a bit to generalize it to
the LUT size and I keep getting the same max error counts (adjusted to
work with integers rather than fractions), ±3, no matter what the size of
the interpolation factor. I don't expect this and I think I have
something wrong in the calculations. I'll need to resolve this.



We are talking about the lsbs of a 20+ bit word. Do you think there will
be much of a difference in result?

i'm just thinking that if you were using LUT to generate a sinusoid that
is a *signal* (not some parameter that you use to calculate other
stuff), then i think you want to minimize the mean square error (which
is the power of the noise relative to the power of the sinusoid). so
the LUT values might not be exactly the same as the sine function
evaluated at those points.
I need to actually be able to do the
calculations and get this done rather than continue to work on the
process. Also, each end point affects two lines,

what are "lines"? not quite following you.
so there are tradeoffs,
make one better and the other worse? It seems to get complicated very
quickly.

if you want to define your LUT to minimize the mean square error,
assuming linear interpolation between LUT entries, that can be a little
complicated, but i don't think harder than what i remember doing in grad
school. want me to show you? you get a big system of linear equations
to get the points. if you want to minimize the maximum error (again
assuming linear interpolation), then it's that fitting a straight line
to that little snippet of quadratic curve bit that i mentioned.
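just to sketch the setup (assuming a hat-function basis $\varphi_k$ with uniform spacing $h$, which is one standard way to pose it): minimize

$$ \int \Big( f(\theta) - \sum_k c_k\,\varphi_k(\theta) \Big)^2 \, d\theta $$

over the table values $c_k$, and the normal equations come out tridiagonal:

$$ \frac{h}{6}\,\big( c_{k-1} + 4\,c_k + c_{k+1} \big) \;=\; \int f(\theta)\,\varphi_k(\theta)\, d\theta $$

so it's one cheap banded solve, not point sampling.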
How does that imply a quadratic curve at 90 degrees?

sin(t) = t + ... small terms when t is small

cos(t) = 1 - (t^2)/2 + ... small terms when t is small
At least I think like the greats!

but they beat you by a few hundred years.

:)
 

Eric Jacobsen

I should learn that. I know it has been used to good effect in FPGAs.
Right now I am happy with the LUT, but I would like to learn more about
CORDIC. Any useful references?

If you are implementing in an FPGA with no multipliers, then CORDIC is
the first thing you should be looking at. Do a web search on
"CORDIC" for references. I know that's not at all obvious to you,
but it works to try. Or you could keep coming back here for more
spoonfeeding.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
 

glen herrmannsfeldt

(snip, I wrote)
I just tried it, copy and paste from the above link, and it worked.

You might make sure to use Adobe Reader, if that matters.

Also, it is page 273 in the document, page 282 in the PDF.

(snip)
 

Robert Baer

rickman said:
I can't say I follow this. How do I get a 22.5 degree sine table to
expand to 90 degrees?

As to my improvements, I can see where most implementations don't bother
pushing the limits of the hardware, but I can't imagine no one has done
this before. I'm sure there are all sorts of apps, especially in older
hardware where resources were constrained, where they pushed for every
drop of precision they could get. What about space apps? My
understanding is they *always* push their designs to be the best they
can be.
Once upon a time, about 30 years ago, i fiddled with routines to
multiply a millions-of-digits number by another millions-of-digits number.
Naturally, i used the FFT and convolution (and bit-reordering) to do
this.
As the number of digits grew, i needed to use a larger N (in 2^N
samples), so different algorithms were needed in different ranges to
maximize speed.
So i had to try a goodly number of published, standard, accepted,
used routines.
ALL algorithms for N larger than 5 worked but i was able to speed
them up anywhere from 10% to 30% by using "tricks" which were the
equivalent of "folding" around 22.5 degrees.
Start with a full 360 degree method.
"Fold" in half: sin(-x)=-sin(x) or 180 degrees.
Half again on the imaginary axis for what you are doing for 90
degrees: sin(90-x) = sin(90)*cos(x)-cos(90)*sin(x) = -cos(x) exact.
Half again: sin(45)=cos(45)=2^(-1/2)~~.707 carry out as far as you
want; sin/cos to 45 degrees. sin(x=0..45) direct, sin(x=45..90)
=cos(45-x) accuracy depending on value of sin/cos(45).
That is prolly as far as you want to go with a FPGA, because (as you
indicated) you do not want to attempt multiplies in the FPGA.
That last step of "folding" might give better results or better FPGA
/ LUT calcs.

Calculating actual values for the LUT for maximum accuracy: the trig
series x-(x^3/3!)+(x^5/5!) etc should give better results from 0..90
than the approx on page 51.
Do not hesitate to try alternate schemes: sin(2x) = 2*sin(x)*cos(x)
and sin(0.5*x)=+/- sqrt((1-cos(x))/2) where that sign depends on the
quadrant of x.
 

Jasen Betts

If you are implementing in an FPGA with no multipliers, then CORDIC is
the first thing you should be looking at.

?? CORDIC requires even more multiplies


But you can multiply two short lookup tables to simulate a longer lookup
table using the identity:

sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b)

so you do a coarse lookup table with sin(a) (and you can get cos(a) from
it if a is some integer factor of 90 degrees)

and do two fine lookup tables for sin(b) and cos(b)

eg: you can do 16 bits of angle at 16 bit resolution in 512 bytes of
table and ~80 lines of assembler code
(but the computation needs two multiplies)

the first two bits determine the quadrant.

the next 7 bits are on a coarse lookup table
128 entries x 16 bits is 256 bytes

next bit defines the sign of b (if set, b is negative; add one to a)

and the final 6 are the rest of b

from a 64 entry table you can get sin(|b|)
and another 64 for cos(|b|)

treat b=0 as a special case - have entries for 1..64 in the table
correct sin(|b|) for negative b

a few branches and bit-shifts, two multiplies, one add and you have
your value.
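roughly, in C (the scaling here is my choice for the sketch: full circle = 65536 angle units, 16-bit table values; in the real thing the tables would be precomputed constants):

#include <math.h>
#include <stdint.h>

static int16_t sin_a[129];   /* coarse: sin over 0..90 deg in 128 steps */
static int16_t sin_b[65];    /* fine: sin(b), b in 0..64 angle units */
static int16_t cos_b[65];    /* fine: cos(b) */

void init_tables(void)
{
    for (int i = 0; i <= 128; i++)
        sin_a[i] = (int16_t)lround(32767.0 * sin((M_PI/2) * i / 128.0));
    for (int i = 0; i <= 64; i++) {
        sin_b[i] = (int16_t)lround(32767.0 * sin((M_PI/2) * i / 16384.0));
        cos_b[i] = (int16_t)lround(32767.0 * cos((M_PI/2) * i / 16384.0));
    }
}

int16_t sin16(uint16_t phase)
{
    int quad = phase >> 14;            /* first two bits: quadrant */
    int idx  = phase & 0x3FFF;
    if (quad & 1) idx = 0x4000 - idx;  /* sin(180 - x) = sin(x) */

    int a = idx >> 7;                  /* next 7 bits: coarse index */
    int s = (idx >> 6) & 1;            /* next bit: sign of b */
    int f = idx & 0x3F;                /* final 6 bits */
    int b = s ? f - 64 : f;            /* if set, b is negative... */
    if (s) a++;                        /* ...and one is added to a */

    int bi = b < 0 ? -b : b;           /* b = 0 just works: sin 0, cos 1 */
    int32_t sa = sin_a[a];
    int32_t ca = sin_a[128 - a];       /* cos(a) from the same table */
    int32_t sb = b < 0 ? -sin_b[bi] : sin_b[bi];
    int32_t cb = cos_b[bi];

    /* sin(a+b) = sin(a)cos(b) + cos(a)sin(b): two multiplies, one add */
    int32_t y = (sa * cb + ca * sb + 16384) >> 15;
    return (int16_t)(quad & 2 ? -y : y);
}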
 

Spehro Pefhany

?? CORDIC requires even more multiplies

CORDIC requires shift and add. That's why it was used in early
scientific calculators; very little hardware was required.
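For flavor, a minimal rotation-mode CORDIC in C; the Q formats and iteration count here are arbitrary choices for the sketch, not anything from a particular part:

#include <stdint.h>

#define ITERS 16
/* atan(2^-i) in Q29 radians: round(atan(2^-i) * 2^29) */
static const int32_t atan_q29[ITERS] = {
    421657428, 248918914, 131521917, 66762578,
     33510843,  16771758,   8387925,  4194219,
      2097141,   1048575,    524288,   262144,
       131072,     65536,     32768,    16384
};
/* gain compensation K = prod 1/sqrt(1 + 2^-2i) ~= 0.60725, in Q30 */
#define CORDIC_K 652032874

/* theta in Q29 radians, |theta| <= ~pi/2 (fold quadrants first);
   outputs are cos/sin in Q30 -- shifts and adds only, no multiplies */
void cordic_sincos(int32_t theta, int32_t *s, int32_t *c)
{
    int32_t x = CORDIC_K, y = 0, z = theta;
    for (int i = 0; i < ITERS; i++) {
        int32_t xs = x >> i, ys = y >> i;
        if (z >= 0) { x -= ys; y += xs; z -= atan_q29[i]; }
        else        { x += ys; y -= xs; z += atan_q29[i]; }
    }
    *c = x;
    *s = y;
}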
 

Rob Gaddi

I've been studying an approach to implementing a lookup table (LUT) to
implement a sine function. The two msbs of the phase define the
quadrant. I have decided that an 8 bit address for a single quadrant is
sufficient with an 18 bit output. Another 11 bits of phase will give me
sufficient resolution to interpolate the sin() to 18 bits.

If you assume a straight line between the two endpoints, the midpoint of
each interpolated segment will have an error of
((sin(high)+sin(low))/2) - sin(mid)

Without considering rounding, this reaches a maximum at the last segment
before the 90 degree point. I calculate about 4-5 ppm, which is about
the same as the quantization of an 18 bit number.
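For reference, the textbook bound for linear interpolation of f with segment width h agrees with that figure:

$$ |e|_{\max} \le \frac{h^2}{8}\,\max|f''|, \qquad f=\sin,\; h=\frac{\pi/2}{256}\approx 6.1\times 10^{-3} \;\Rightarrow\; |e|_{\max}\approx 4.7\ \text{ppm.} $$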

There are two issues I have not come across regarding these LUTs. One
is adding a bias to the index before converting to the sin() value for
the table. Some would say the index 0 represents the phase 0 and the
index 2^n represents 90 degrees. But this is 2^n+1 points which makes a
LUT inefficient, especially in hardware. If a bias of half the lsb is
added to the index before converting to a sin() value, the values 0 to
2^n-1 become symmetrical with 2^n to 2^(n+1)-1, fitting a binary sized
table properly. I assume this is commonly done and I just can't find a
mention of it.

The other issue is how to calculate the values for the table to give the
best advantage to the linear interpolation. Rather than storing the
exact sine values at the end points, an adjustment could be made
to minimize the deviation over each interpolated segment. Without this,
the errors are always in the same direction. With an adjustment the
errors become bipolar and so will reduce in magnitude by half (approx).
Is this commonly done? It will require a bit of computation to get
these values, but even a rough approximation should improve the max
error by a factor of two, to around 2-3 ppm.
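One rough way to do that adjustment (sizes and scale illustrative, and this version indexes endpoints directly rather than with the half-lsb bias, just to keep it short): lift each entry by half the local midpoint sag, which centers the error at roughly ±h²/16 instead of a one-sided h²/8:

#include <math.h>
#include <stdint.h>

#define N     256
#define SCALE 131071.0                 /* illustrative 18-bit full scale */

void build_adjusted_lut(int32_t lut[N + 1])
{
    double h = (M_PI / 2) / N;         /* segment width in radians */
    for (int k = 0; k <= N; k++) {
        /* the chord sags (h^2/8)*sin(th) below the curve at mid-segment;
           splitting that between the endpoints makes the error bipolar */
        lut[k] = (int32_t)lround(SCALE * (1.0 + h*h/16.0) * sin(k * h));
    }
}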

Now if I can squeeze another 16 dB of SINAD out of my CODEC to take
advantage of this resolution! lol

One thing I learned while doing this is that near 0 degrees the sin()
function is linear (we all knew that, right?) but near 90 degrees, the
sin() function is essentially quadratic. Who would have thunk it?

Just one note, since you're doing this in an FPGA. If your look up
table is a dual-port block RAM, then you can look up the the slope
simultaneously rather than calculate it. sin(x)/dx = cos(x) = sin(x +
90 deg).
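As a sketch of that trick in C (table size and fraction width are illustrative):

#include <stdint.h>

#define TBITS 10
#define TSIZE (1 << TBITS)            /* full-cycle table */
#define FBITS 11

extern const int32_t sine_ram[TSIZE]; /* the dual-port block RAM contents */

int32_t sine_interp(uint32_t phase)   /* TBITS + FBITS phase bits */
{
    uint32_t i    = (phase >> FBITS) & (TSIZE - 1);
    uint32_t frac = phase & ((1u << FBITS) - 1);

    int32_t y     = sine_ram[i];                            /* port A */
    int32_t slope = sine_ram[(i + TSIZE/4) & (TSIZE - 1)];  /* port B:
                                      sin(x + 90 deg) = cos(x) = d sin/dx */

    /* scale cos(x) by the angle step per table entry, 2*pi/TSIZE,
       here as the Q20 constant round((2*pi/TSIZE) * 2^20) = 6434 */
    return y + (int32_t)(((int64_t)slope * frac * 6434) >> (FBITS + 20));
}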
 

Jerry Avins

Oops, should be:
cos(x) ~= 1 - x^2/2
sin(x) ~= x (- x^3/6)
Mixed up your odd/even powers :)

Of course, the cubic is less important than the quadratic, so sin(x) ~=
x is sufficient for most purposes.

I suppose you could potentially synthesize the functions using low-order
power-series approximations (like these) and interpolating between them
over modulo
pi/4 segments. Probably wouldn't save many clock cycles relative to a
CORDIC, higher order polynomial, or other, for a given accuracy, so the
standard space-time tradeoff (LUT vs. computation) isn't changed.

An in-between approach is quadratic interpolation. Rickman reads Forth.
http://users.erols.com/jyavins/typek.htm may inspire him. (It's all
fixed-point math.) ((As long as there is interpolation code anyway, an
octant is sufficient.))
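For a flavor of quadratic interpolation (a generic three-point Newton form, not the code at that link; the table name and widths are made up):

#include <stdint.h>

#define FBITS 11
#define F_ONE (1 << FBITS)

extern const int32_t tbl[1026];   /* full cycle + 2 guard points */

int32_t sine_quadratic(uint32_t i, uint32_t frac)  /* frac < F_ONE */
{
    int64_t f  = frac;
    int64_t y0 = tbl[i], y1 = tbl[i + 1], y2 = tbl[i + 2];
    int64_t d1 = y1 - y0;             /* first difference */
    int64_t d2 = y2 - 2*y1 + y0;      /* second difference */

    /* y(f) = y0 + f*d1 + f*(f-1)/2 * d2, with f in units of 2^-FBITS;
       assumes arithmetic right shift of negative values */
    return (int32_t)(y0 + ((f * d1) >> FBITS)
                        + ((f * (f - F_ONE) * d2) >> (2*FBITS + 1)));
}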

Jerry
 

Eric Jacobsen

?? CORDIC requires even more multiplies


But you can multiply two short lookup tables to simulate a longer lookup
table using the identity:

sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b)

so you do a coarse lookup table with sin(a) (and you can get cos(a) from
it if a is some integer factor of 90 degrees)

and do two fine lookup tables for sin(b) and cos(b)

eg: you can do 16 bits of angle at 16 bit resolution in 512 bytes of
table and ~80 lines of assembler code
(but the computation needs two multiplies)

He indicated that his target is an FPGA that has no multiplies. This
is the exact environment where the CORDIC excels. LUTs can be
expensive in FPGAs depending on the vendor and the device.

Almost any DSP process, including a large DFT, can be done in a LUT,
it's just that nobody wants to build memories that big. It's all
about the tradeoffs. FPGA with no multipliers -> CORDIC, unless there
are big memories available in which case additional tradeoffs have to
be considered. Most FPGAs that don't have multipliers also don't
have a lot of memory for LUTS.

the first two bits determine the quadrant.

the next 7 bits are on a coarse lookup table
128 entries x 16 bits is 256 bytes

next bit defines the sign of b (if set, b is negative; add one to a)

and the final 6 are the rest of b

from a 64 entry table you can get sin(|b|)
and another 64 for cos(|b|)

treat b=0 as a special case - have entries for 1..64 in the table
correct sin(|b|) for negative b

a few branches and bit-shifts, two multiplies, one add and you have
your value.

There are lots of ways to do folded-wave implementations for sine
generation, e.g., DDS implementations, etc. Again, managing the
tradeoffs with the available resources usually drives how to get it
done. Using a quarter-wave LUT with an accumulator has been around
forever, so it's a pretty well-studied problem.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
 

rickman

Why can't some harmonic component be added to the table points to improve
downstream distortion? Of course, you'd need a full 360 degree table, not a
folded one.

Mostly because the table is already optimal, unless you want to correct
for other components.

For everything, including the DAC, but especially downstream stuff. The amps and
stuff may have 100x more distortion than the lookup table does.

Ok, then you can add that to the LUT. As you say, it will require a
full 360 degree table. This will have a limit on the order of the
harmonics you can apply, but I expect the higher order harmonics would
have very, very small coefficients anyway.
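As a sketch of what that table build might look like (the correction coefficients here are placeholders, not measured values):

#include <math.h>
#include <stdint.h>

#define N     1024                /* full 360 degree table */
#define SCALE 131071.0            /* illustrative 18-bit full scale */

void build_predistorted_lut(int32_t lut[N])
{
    /* hypothetical 2nd/3rd harmonic corrections, e.g. from measuring
       the downstream chain; higher orders would be smaller still */
    double c2 = -1e-4, c3 = 5e-5;
    for (int n = 0; n < N; n++) {
        double th = 2.0 * M_PI * n / N;
        lut[n] = (int32_t)lround(SCALE *
                     (sin(th) + c2 * sin(2*th) + c3 * sin(3*th)));
    }
}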
 

rickman

The simple method would be to compute a zillion-point FFT. Since the
function is really truly periodic, the FFT gives you the right answer
for the continuous-time Fourier transform, provided you have enough
points that you can see all the relevant harmonics.

Ok, as a mathematician would say, "There exists a solution!"
 

rickman

Once upon a time, about 30 years ago, i fiddled with routines to
multiply a millions-of-digits number by another millions-of-digits number.
Naturally, i used the FFT and convolution (and bit-reordering) to do this.
As the number of digits grew, i needed to use a larger N (in 2^N
samples), so different algorithms were needed in different ranges to
maximize speed.
So i had to try a goodly number of published, standard, accepted, used
routines.
ALL algorithms for N larger than 5 worked but i was able to speed them
up anywhere from 10% to 30% by using "tricks" which were the equivalent
of "folding" around 22.5 degrees.
Start with a full 360 degree method.
"Fold" in half: sin(-x)=-sin(x) or 180 degrees.
Half again on the imaginary axis for what you are doing for 90 degrees:
sin(90-x) = sin(90)*cos(x)-cos(90)*sin(x) = -cos(x) exact.

Almost exact, sin(90-x) = cos(x), not -cos(x). More important is
sin(180-x) = sin(x) from the same equation above. This is the folding
around 90 degrees. Or sin(90+x) = cos(x), but I don't have a cos
table... lol

This is the two levels of folding that are "free" or at least very
inexpensive. It requires controlled inversion on the bits of x and a
controlled 2's complement on the LUT output.
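In C terms, those two folds amount to this (table width illustrative); with the half-lsb bias the index inversion is exact, since point k mirrors to point 2^q-1-k about 90 degrees:

#include <stdint.h>

#define QBITS 9                         /* 512-entry quadrant table */
#define QMASK ((1u << QBITS) - 1)

extern const int32_t sin_q[1 << QBITS]; /* half-lsb biased quadrant LUT */

int32_t sine_folded(uint32_t phase)     /* 2 quadrant bits + QBITS index */
{
    uint32_t idx  = phase & QMASK;
    uint32_t quad = (phase >> QBITS) & 3;

    if (quad & 1)
        idx = ~idx & QMASK;   /* controlled inversion: sin(180-x) = sin(x) */

    int32_t y = sin_q[idx];
    if (quad & 2)
        y = -y;               /* controlled 2's complement: lower half-cycle */
    return y;
}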

Half again: sin(45)=cos(45)=2^(-1/2)~~.707 carry out as far as you want;
sin/cos to 45 degrees. sin(x=0..45) direct, sin(x=45..90) =cos(45-x)
accuracy depending on value of sin/cos(45).

You really lost me on this one. How do I get cos(45-x)? If using this
requires a multiply by sqrt(2), that isn't a good thing. I don't have a
problem with a 512 element table. The part I am using can provide up to
36 bit words, so I should be good to go with the full 90 degree table.

Separating x into two parts, 45 and x', where x' ranges from 0 to 45,
the notation should be, sin(45+x') = sin(45)cos(x') + cos(45)sin(x') =
sin(45) * (cos(x') + sin(x')). I'm not sure that helps on two levels.
I don't have cos(x') and I don't want to do multiplies unless I have to.

That is prolly as far as you want to go with a FPGA, because (as you
indicated) you do not want to attempt multiplies in the FPGA.
That last step of "folding" might give better results or better FPGA /
LUT calcs.

Calculating actual values for the LUT for maximum accuracy: the trig
series x-(x^3/3!)+(x^5/5!) etc should give better results from 0..90
than the approx on page 51.
Do not hesitate to try alternate schemes: sin(2x) = 2*sin(x)*cos(x) and
sin(0.5*x)=+/- sqrt((1-cos(x))/2) where that sign depends on the
quadrant of x.

I don't have resources to do a lot of math. A linear interpolation will
work ok primarily because it only uses one multiply of limited length.
I can share this entire circuit between the two channels because the
CODEC interface is time multiplexed anyway.

I did some more work on the interpolation last night and I put to rest
my last nagging concerns. I am confident I can get max errors of about
±2 lsbs using a 512 element LUT and 10 bit interpolation. This is not
just the rounding errors in the calculations but also the error due to
the input resolution which is 21 bits. The errors without the input
resolution are limited to about 1.5 lsbs. I expect the RMS error is
very small indeed as most point errors will be much smaller than the max.
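In C, the exhaustive version of that check would look something like this (a sketch with my own rounding choices; it runs once on a PC, not in the part, and prints whatever it finds rather than assuming the answer):

#include <math.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define QBITS 9                    /* 512-entry quadrant LUT */
#define FBITS 10                   /* 10-bit interpolation factor */
#define SCALE ((1 << 17) - 1)      /* 18-bit output */

static int32_t lut[(1 << QBITS) + 1];  /* +1 guard point for the checker */

int main(void)
{
    int n = 1 << QBITS;
    for (int k = 0; k <= n; k++)   /* half-lsb bias: sin at (k+0.5)*90/n */
        lut[k] = (int32_t)lround(SCALE * sin((k + 0.5) * (M_PI/2) / n));

    int32_t max_err = 0;
    for (uint32_t ph = 0; ph < (1u << (QBITS + FBITS)); ph++) {
        uint32_t i = ph >> FBITS;
        uint32_t f = ph & ((1u << FBITS) - 1);
        int32_t y = lut[i] + (int32_t)(((int64_t)(lut[i+1] - lut[i]) * f
                                        + (1 << (FBITS - 1))) >> FBITS);
        double ideal = SCALE * sin(((i + 0.5) + f / (double)(1 << FBITS))
                                   * (M_PI/2) / n);
        int32_t e = y - (int32_t)lround(ideal);
        if (abs(e) > max_err) max_err = abs(e);
    }
    printf("max error over one quadrant: %ld lsb\n", (long)max_err);
    return 0;
}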
 

rickman

Problem with that... could see only the front and back cover.
Different source perhaps?

It is a lot of document for just two pages. Try this link to see the
short version...

arius.com/foobar/1972_National_MOS_Integrated_Circuits_Pages_273-274.pdf
 

rickman

The cos(M)sin(L) is one table, with some bits from each.

Well, when you do the interpolation, you first need to know the
spacing between points in the sine table. That spacing is
proportional to cos(x). So, the interpolation table is
indexed by the high four bits of M and the four bits of L.

Yes, but the error is much larger than the errors I am working with. I
want an 18 bit output from the process... at least 16 bits accurate,
since the DAC is only ~90 dB SINAD. I expect to get something that is
about 17-17.5 ENOB going into the DAC.

I don't think the method described in the paper is so good when you want
accuracies this good. The second LUT gets rather large. But then it
doesn't use a multiplier, does it?

I am not sure yet how they do that. I did notice that when adding the
interpolation value they propagate the carry, so the output can go to
the full 1.0000 instead of 0.99999. But it will do that before 90
degrees.

They just meant to use a 2^n+1 long table rather than just 2^n length.
No real problem if you are working in software with lots of memory I
suppose.

So, the one referenced uses three 256x4 ROMs to get 12 bits of sin(M)
from 8 bits of M, and then a fourth 128x8 ROM to get five bits to
add to the low bits of the 12 bit sin(M).

Yep. About 35 dB worse than my design, I expect.

One thing I realized about the descriptions most of these designs give
is they only analyze the noise from the table inaccuracies *at the point
of the sample*. In other words, they don't consider the noise from the
input resolution. Even if you get perfect reconstruction of the sine
values, such as a pure LUT, you still have the noise of the resolution
of the input to the sine generation. Just my 2 cents worth...
 