Maker Pro

Help comparing two FFT approaches

RobertMacy

Years ago I thought I verified there was no difference between these two
techniques, but may have done it wrong.

Trying to improve S/N, pulling signals out of noise: assume the noise is
white and the signals are long term stable.
FFT over a fixed length, done 100 times and all averaged together vs
FFT over 100 times the fixed length.

Somehow it just seems like it should be better to do the 100 times fixed
length vs doing the fixed length 100 times. I had expected it to be
better, but at the time could NOT verify through simulations. Became a
moot point, so moved on. However, today the system tradeoff again rears
its ugly head, thus I ask the knowledgeable people here.
 
o pere o

Years ago I thought I verified there was no difference between these two
techniques, but may have done it wrong.

Trying to improve S/N, pulling signals out of noise: assume the noise is
white and the signals are long term stable.
FFT over a fixed length, done 100 times and all averaged together vs
FFT over 100 times the fixed length.

Somehow it just seems like it should be better to do the 100 times fixed
length vs doing the fixed length 100 times. I had expected it to be
better, but at the time could NOT verify through simulations. Became a
moot point, so moved on. However, today the system tradeoff again rears
its ugly head, thus I ask the knowledgeable people here.

This may depend on the kind of signals involved and what you want to
extract from the FFT. Consider a pure sine wave, for instance. The long
FFT will give you better frequency resolution: you will get a narrower
spike. Each short FFT will give you a blurred response (in the general
case), and averaging won't sharpen it. Averaging the short FFTs will
require fewer resources.

It could well be that the result for those frequencies that you get with
the short FFT is the same with averaging or without. A quick thought for
f=0 and for f=fs/2 makes me think so... and a quick Octave simulation
seems to confirm this!

(Tried abs((fft(x1)+fft(x2))) and abs(fft([x1,x2])) for two vectors x1
and x2)
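
Something along these lines reproduces that check in Octave (the segment
length, sample rate and test tone below are arbitrary placeholders):

% Do averaged short FFTs match the long FFT at the frequencies they share?
N  = 256;  fs = 1000;
t1 = (0:N-1)/fs;  t2 = (N:2*N-1)/fs;
x1 = sin(2*pi*125*t1) + 0.5*randn(1, N);   % first segment: tone + white noise
x2 = sin(2*pi*125*t2) + 0.5*randn(1, N);   % second, consecutive segment

A = abs(fft(x1) + fft(x2));                % coherent sum of the two short FFTs
B = abs(fft([x1, x2]));                    % one FFT over the whole record
max(abs(A - B(1:2:end)))                   % ~0: A equals B at every second bin,
                                           % i.e. the frequencies the short FFTs resolve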

Pere
 
Martin Brown

Years ago I thought I verified there was no difference between these two
techniques, but may have done it wrong.

Trying to improve S/N, pulling signals out of noise: assume the noise is
white and the signals are long term stable.
FFT over a fixed length, done 100 times and all averaged together vs
FFT over 100 times the fixed length.

You have to decide first what answer you are interested in and what
frequency resolution you really need in the final result. If you only
want the amplitude then choosing a convenient transform length and
averaging the (all positive) amplitudes will get you the nicest looking
results provided that the frequency resolution is adequate.

You will also have to pay attention to end effects if there is any long
term drift or your answer will be dominated by the FFT of a sawtooth.

Windowing is the traditional fixup for trading resolution against end
effects but you can do better at higher computational cost with the
likes of maximum entropy depending on exactly what you need to do.

Somehow it just seems like it should be better to do the 100 times fixed
length vs doing the fixed length 100 times. I had expected it to be
better, but at the time could NOT verify through simulations. Became a
moot point, so moved on. However, today the system tradeoff again rears
its ugly head, thus I ask the knowledgeable people here.

100x longer baseline allows you to distinguish between finer frequency
differences which might or might not be important in your application.
If it isn't then you pay extra for doing the longer transform.

N*log2(N)

vs

(N/M)*M*log2(M) = N*log2(M)

Assuming, say, N = 2^23 and M = 2^16 done 2^7 times as an example:

23*2^23 vs 16*2^23

The time difference isn't huge, but it favours using the shorter
transform if it will give you the frequency resolution you need. It is
easy to make mistakes taking shortcuts for go-faster stripes...

If applicable, using a real-to-complex conjugate-symmetric transform
will also get you almost a factor of two performance improvement.
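
Octave's fft always returns the full complex spectrum, but the symmetry is
easy to check, and half-spectrum storage is what real-to-complex libraries
(e.g. FFTW's r2c routines) exploit. A quick sketch:

% For a real-valued record the spectrum is conjugate-symmetric, so only the
% bins from DC up to Nyquist carry independent information.
N = 1024;
x = randn(1, N);                             % any real-valued data
X = fft(x);
Xhalf = X(1:N/2+1);                          % keep DC .. Nyquist only
max(abs(X(N/2+2:end) - conj(X(N/2:-1:2))))   % ~0: the rest is redundant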
 
Phil Hobbs

What about truncation/windowing artifacts?

Shift-and-add isn't hard to get right for power spectra. The effect of
windowing depends a lot on how much you overlap the data sets. If you
take, say, 1024 samples, apply a von Hann window, transform, take another
1024, and so on, you're wasting a lot of information because not every
sample has the chance to be in the middle of a window, where it can
contribute more.

The other extreme is to compute the full windowed FFT at every sample
point, so that every sample eventually occupies each position in the
window function. This is called a "sliding transform".

Sampling theory comes to the rescue here. Each of the frequency
channels of the sliding transform is equivalent to a FIR filter whose
frequency response is the FT of the window function. This will be many
times narrower than the Nyquist frequency of the original sampled data,
and so the frequency channels don't have to be sampled as frequently.

A good window function will let you do a good job with an overlap of
about 3/4, i.e. you compute a 1024-point windowed transform every 256
samples. Folks like Martin B, whose data points are very expensive,
will probably argue for doing more computation than that, e.g. using an
overlap of 15/16 or even more.
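
A rough Octave sketch of that 3/4-overlap recipe (window length, hop size
and the stand-in data are placeholders, and this averages power spectra
rather than keeping the full sliding transform at every sample):

N   = 1024;                              % window length
hop = N/4;                               % 256 new samples per transform: 3/4 overlap
w   = 0.5 - 0.5*cos(2*pi*(0:N-1)/N);     % von Hann window
x   = randn(1, 100*N);                   % stand-in data record

nseg = floor((length(x) - N)/hop) + 1;
P = zeros(1, N);
for k = 0:nseg-1
  seg = x(k*hop + (1:N)) .* w;           % windowed, overlapped segment
  P   = P + abs(fft(seg)).^2;
end
P = P / nseg;                            % averaged power spectrum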

[I've spent a good chunk of this evening discussing signal processing
details with a contract-engineering company working for a client of
mine. They don't understand this stuff in any great depth, but at least
they're willing to learn. (It would have been good to know that they
didn't understand before we started, but oh well.) They're nice guys who
want to do a good job, but not much fire in the belly, unfortunately.]

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510

hobbs at electrooptical dot net
http://electrooptical.net
 
Phil Hobbs

Think average sum versus sum of averages. If SNR is sufficiently
high then both are equivalent. At low SNR, average of sum is more
accurate.

Vladimir Vassilevsky DSP and Mixed Signal Designs www.abvolt.com

If all the operations are linear, the average of a sum is the same as
the sum of the average. I think you're referring to coherent vs.
incoherent averaging, i.e. averaging power spectra vs taking the power
spectrum of the average. That's different, for sure. I don't think the
OP has told us which he's actually contemplating.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510

hobbs at electrooptical dot net
http://electrooptical.net
 
Robert Baer

RobertMacy said:
Years ago I thought I verified there was no difference between these two
techniques, but may have done it wrong.

Trying to improve S/N, pulling signals out of noise: assume the noise is
white and the signals are long term stable.
FFT over a fixed length, done 100 times and all averaged together vs
FFT over 100 times the fixed length.

Somehow it just seems like it should be better to do the 100 times fixed
length vs doing the fixed length 100 times. I had expected it to be
better, but at the time could NOT verify through simulations. Became a
moot point, so moved on. However, today the system tradeoff again rears
its ugly head, thus I ask the knowledgeable people here.

Seems to me the S/N ratio would be the same for "short" periods as for
"long" periods, and so there would be concern if one saw a difference.
Now if you did the equivalent of bandwidth limiting and filtering...
 
miso

Years ago I thought I verified there was no difference between these two
techniques, but may have done it wrong.

Trying to improve S/N, pulling signals out of noise: assume the noise is
white and the signals are long term stable.
FFT over a fixed length, done 100 times and all averaged together vs
FFT over 100 times the fixed length.

Somehow it just seems like it should be better to do the 100 times fixed
length vs doing the fixed length 100 times. I had expected it to be
better, but at the time could NOT verify through simulations. Became a
moot point, so moved on. However, today the system tradeoff again rears
its ugly head, thus I ask the knowledgeable people here.

You need to read up on periodograms and specifically the Welch method.
The proper technique uses overlapping segments.
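
If the Octave signal package is installed, its pwelch (MATLAB-style
interface) does the overlapped-segment averaging in one call; the sample
rate, record and segment numbers below are placeholders:

pkg load signal
fs = 1000;                                   % placeholder sample rate
x  = randn(1, 1e5);                          % placeholder data record
% 1024-point Hann segments with 512 samples of overlap:
[Pxx, f] = pwelch(x, hanning(1024), 512, 1024, fs);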
 
Martin Brown

Think average sum versus sum of averages.

They are identical to within a linear scaling factor, by definition, because
addition and multiplication are both associative and distributive.
If SNR is sufficiently high then both are equivalent.
At low SNR, average of sum is more accurate.

Utter nonsense. If it is all linear there is no difference.

The difference is between

|FFT{all data}|

and

Average( |FFT{N subsets of data}| ) useful

vs

| FFT{ Average(subsets of data) } | likely to be meaningless

It is taking the absolute modulus of the FFT that makes the process
non-linear and allows independent data to pull signal out of noise.
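
A small Octave illustration of the contrast (the tone level, segment count
and the half-bin frequency offset are chosen here just to make the effect
obvious):

% Weak tone in white noise, 100 segments of 1024 samples.  The tone sits
% half-way between bins, so its phase flips from one segment to the next:
% it survives power-spectrum averaging but cancels in a time-domain average.
N = 1024;  M = 100;  fs = 1024;
t = (0:N*M-1)/fs;
x = 0.2*sin(2*pi*200.5*t) + randn(1, N*M);
segs = reshape(x, N, M);                     % one segment per column

P_avg = mean(abs(fft(segs)).^2, 2);          % Average(|FFT{subsets}|^2): tone visible
x_bar = mean(segs, 2);                       % average the subsets in time first
P_coh = abs(fft(x_bar)).^2;                  % |FFT{Average(subsets)}|^2: tone gone
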
Also for the OP you need to decide whether the interesting frequencies
are the high ones or the low ones and choose your subsets accordingly.
 
RobertMacy

You need to read up on periodograms and specifically the Welch method.
The proper technique uses overlapping segments.

Interesting. The idea of overlapping the sample strings did occur to me to
try *when* the signals were changing. But [and I find this really hard to
believe] there may be some advantage to overlapping with a stable signal.
It's just that, looking at the FFT equations, I just don't see the
advantage of either technique, due to the summations that are the basis of
an FFT. Nor any advantage to overlapping.

As stated in the original, the assumption is that signals are stable and
the noise is white. Simulations between the two techniques yielded
identification of the signals within numerical accuracies [10^-15, those
kinds of ranges]. Yet recent data suggests something else is going on. There
is a possibility that going to the 100 times longer FFT 'culls' out the
'distortion related' harmonics of the signals by increasing the resolution
to the point that the bins reject 'close', but unrelated, coherent energy;
perhaps that is what I'm seeing.

That could be it, because the deterioration appears to go up with
frequency, where more and more upper harmonics exist, but the difference
does not appear to go up linearly; it rises more abruptly at the high end.
But that random increase could be just an accident of signal harmonic
locations.
 
RobertMacy

Think average sum versus sum of averages.
If SNR is sufficiently high then both are equivalent.
At low SNR, average of sum is more accurate.

Vladimir Vassilevsky
DSP and Mixed Signal Designs
www.abvolt.com

Years ago, from memory, simulating signals in white noise, there was NO
difference [within numerical accuracies, 10^-15] between the two
techniques. However, you have me curious to go back and do the experiments
again, this time with 'normal' S/N and with extremely low S/N, just to see
if the simulations predict a difference. I predict there will be NO
difference in the simulation results, BECAUSE the results were identical
except for numerical accuracies. It would seem that any potential
differences related to the original S/N would have shown up. ...just
thinking out loud BEFORE doing all that work.

In the ACTUAL data, when comparing the two techniques:
The average of a series of short packet lengths caused the higher
frequency signals to completely disappear from the spectral display.
With the very long packet length, the higher frequency signals were very
discernible.

That seems like the opposite of your statement that the average of the sum
of short lengths would yield better results.
 
RobertMacy

The difference is between

|FFT{all data}|

and

Average( |FFT{N subsets of data}| ) useful

vs

| FFT{ Average(subsets of data) } | likely to be meaningless

It is taking the absolute modulus of the FFT that makes the process
non-linear and allows independent data to pull signal out of noise.

Also for the OP you need to decide whether the interesting frequencies
are the high ones or the low ones and choose your subsets accordingly.

Don't understand your comment. Why is it important to 'select' interesting
frequencies?

At least simulations show that energy is energy and S/N does not matter
whether in low freq band or high freq band. The FFT process pulls them out
regardless.
 
Martin Brown

Don't understand your comment. Why is it important to 'select'
interesting frequencies?

Think carefully about whether you are most interested in low frequency
detail or high frequency detail if you intend to segment the data. There
is more than one way of forming subsets of the raw data.

A concrete small example:

a b c d e f g h i j k l m n o p    (16 values)

Taken as subsets of 4, the most obvious grouping is sequential:

a b c d | e f g h | i j k l | m n o p

This gives you maximum sensitivity to the alternating component.

Real implementations with windowing may overlap the data.

But you could study only the lower frequencies from the same time series by
subsampling every fourth data point:

a e i m | b f j n | c g k o | d h l p

If there is a real risk of out-of-band components being present (as
there would be with white noise) you have to preconvolve the time series
with a suitable short filter kernel to control it.

This is in essence what FFTs do behind the scenes for decimation in time,
and since you don't care about the phases if you are going to average
the power spectra, you can study detail in the lower frequency band more
effectively this way (with some risk of aliasing).
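
A sketch of that subsampling route in Octave (fir1 from the signal package
is just one convenient way to get the anti-alias kernel; the lengths and
cutoff are placeholders):

pkg load signal
fs = 1000;
x  = randn(1, 16384);                   % stand-in record
h  = fir1(64, 0.2);                     % low-pass kernel, cutoff below fs/8
xf = filter(h, 1, x);                   % preconvolve to control aliasing
xd = xf(1:4:end);                       % keep every 4th sample
P  = abs(fft(xd)).^2;                   % same 4096-point FFT size, but 4x finer
                                        % bins covering only 0..fs/8
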
At least simulations show that energy is energy and S/N does not matter
whether in low freq band or high freq band. The FFT process pulls them
out regardless.

The only question here is whether you can get to the same result faster.
There is no free lunch but sometimes you can get a discount...
 
George Herold

What about truncation/windowing artifacts?

Fun thread! So I'm fairly ignorant when it comes to windows and FFTs. But if
you were going to average 100 FFTs with no window, would the edge/clipping
artifacts tend to average away?
(Next time I have the SRS770 fired up I'll have to try. Nothing like real data!)

Windowing seems to cause a lot of confusion. A colleague was doing FFTs of
ringing exponential decays and then finally understood that he wanted no window.

Say, I think it's been mentioned before, but what's the classic text for DFTs
(discrete Fourier transforms)?

George H.
 
George Herold

The difference is between

|FFT{all data}|

and

Average( |FFT{N subsets of data}| ) useful

vs

| FFT{ Average(subsets of data) } | likely to be meaningless

Well, not completely meaningless if you make it a conditional average.
So feed noise into your DSO and take the FFT.
Now average the input, but trigger near the top of the signal.
And then ask for the FFT.
(This also takes out all the end effects, so a flat-top window is OK... oh,
assuming that the time-base zero is in the center of the display and not at
the left-hand side.)

Hmm (thinking out loud) I guess this depends a bit on what the noise looks like.
If it's a random series of delta functions, then I'm not sure if it still 'works'.
George H.
 
Martin Brown

No. They would be coherent on every dataset with an edge discontinuity.
Random noise will average away but systematic defects caused by the
implicit periodicity of the FFT can easily bite you in the backside.

There are also several conventions on how the tiling periodicity is
done. I view the traditional tiling as translational.

true data : ... ? ? ? a b c d ? ? ? ...

DFT input data : ... 0 0 0 a b c d 0 0 0 ...

FFT implicit : ... a b c d a b c d a b c d ...
(translation)

FFT implicit': ... a b c d d c b a a b c d ...
(mirror)

or

FFT implicit": ... a b c d c b a b c d ...
(mirror)

Jodrell Bank used one of the latter two for MERLIN. I forget which.
It is closely related to the cosine transform that underpins JPEG.

It affects how artefacts from the edge discontinuity influence things.
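
A tiny Octave illustration of why the mirror convention matters (the ramp
is just a worst case for the wrap-around jump):

x  = (0:63)/63;                        % a ramp: translational tiling implies a big
                                       % jump from the last sample back to the first
P1 = abs(fft(x));                      % strong, slowly decaying leakage
P2 = abs(fft([x, fliplr(x)]));         % mirrored (even) extension: the jump is gone
                                       % and the leakage falls off much faster
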
Well, the old one on how to code FFTs is by E. Oran Brigham, but the
best imo is Bracewell's "The Fourier Transform and Its Applications".
Bracewell himself was a kick in the ass.

Although some of what he wrote wrt the Fast Hartley Transform was a
little misleading for non-practitioners, it may actually help the OP,
depending on whether he has a real-valued time series to transform.

http://en.wikipedia.org/wiki/Discrete_Hartley_transform

I'd also vote for Bracewell's book as an introduction to FFTs & DFTs.

There are many more recent books on implementing FFTs and FFTW is now
the one to beat - their classic paper is also online:

http://fftw.org/newsplit.pdf

For some of the intricate practicalities, the optimal gridding
functions for using an FFT to compute a DFT are in one of the VLA research
papers by Schwab (~1980). In summary, the definition is here:

http://www.aoc.nrao.edu/~wyoung/lacook/html/CookHookap7.html

The paper memo scan is still online although it isn't an easy read.

http://www.vla.nrao.edu/memos/sci/132.pdf

It deals with the problem of gridding non-uniformly sampled Fourier data
where the intention is to compute an ideal DFT using an FFT to do it. It
is also applicable to controlling aliasing elsewhere.

It is important to remember that a multiplication in the time domain is
a convolution in frequency and vice-versa. Choosing how you prepare your
data for transformation is important if you care about the results.
 
George Herold

Well, the old one on how to code FFTs is by E. Oran Brigham, but the
best imo is Bracewell's "The Fourier Transform and Its Applications".
Bracewell himself was a kick in the ass.

Great, Bracewell is the name I'm looking for. Any wisdom on the various editions?
It looks like I can get the first for ~$15,
the second for ~$45 (less if I order from France),
and the current third edition for ~$100 (in paperback, yuck).

No Kindle edition offered :^)
OT: Kindle is the only kind of book my daughter reads these days.
(Well, at least she is reading!)
But I complain to her that when I was young and a new book came out, I'd have to wait a year for the paperback version that I could afford. Now when she finishes one it's click-click-click and the next in the series is downloaded; she doesn't even have to hike herself to the library or book store...
Kids these days.

George H.
 
George Herold

No. They would be coherent on every dataset with an edge discontinuity.
Random noise will average away but systematic defects caused by the
implicit periodicity of the FFT can easily bite you in the backside.

There are also several conventions on how the tiling periodicity is
done. I view the traditional tiling as translational.

true data : ... ? ? ? a b c d ? ? ? ...

DFT input data : ... 0 0 0 a b c d 0 0 0 ...

FFT implicit : ... a b c d a b c d a b c d ...

(translation)

FFT implicit': ... a b c d d c b a a b c d ...

(mirror)

or

FFT implicit": ... a b c d c b a b c d ...
(mirror)

Jodrell Bank used one of the latter two for MERLIN. I forget which.
It is closely related to the cosine transform that underpins JPEG.

Thanks for that Martin, the mirroring formats are interesting.
But then do you have to double your data set, so it takes longer to do the FFT?
So one period is now abcddcba (or abcdcba) rather than abcd. I admit to having no intuition for FFTs. (Bracewell may help in that regard.)

George H.
 
Phil Hobbs

Great, Bracewell is the name I'm looking for. Any wisdom on the
various editions? It looks like I can get the first for ~$15 the
second for ~$45 (less if I order from France.) And the current third
edition is ~$100 (in paperback, yuck).

The second edition is sort of the classic. I have the second and third,
but it's the second that I find myself looking up stuff in. The third
has all the Hartley stuff, but that's a niche interest.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics

160 North State Road #203
Briarcliff Manor NY 10510

hobbs at electrooptical dot net
http://electrooptical.net
 
Martin Brown

Thanks for that Martin, the mirroring formats are interesting.
But then do you have to double your data set, so it takes longer to do the FFT?

Heck no! The mirror symmetry is coded into the transform implementation
in a devious way to minimise non-trivial complex multiplications.

Here is a paper on the 8 point DCT that illustrates the principles (not
an easy read and unchecked but it looks about right at a glance).

http://homepages.cae.wisc.edu/~ece734/references/feig92.pdf

So one period is now abcddcba (or abcdcba) rather than abcd. I admit to having no intuition for FFTs. (Bracewell may help in that regard.)

The crucial point is you no longer have sharp edge discontinuities, but
you pay for it with a different set of artefacts. There are now tricks
to compute a true DFT for a subset range of the FFT to very high accuracy
by putting a guard band on (and discarding) the ends of the array.

Most of this got discovered when the lowest-noise receivers came along:
the dynamic range of synthetic aperture radar and radio astronomy was
pushed to the limits by self-calibration, and then systematic
data-processing faults started to show up above the noise floor.
 