Maker Pro

FFT short-time Fourier transform (STFT) question

Jamie

Hi,

I have begun to use an FFT in software, and I started thinking about
how to turn the intensity-over-frequency FFT chart into a
frequency-over-time chart, i.e. a strip chart in the time/frequency
domain (a "Gabor spectrum" STFT), with each XY point of the chart
having an intensity based on the power at that frequency. So I was
thinking of taking that a step further (and pushing the limits of a
modern CPU): instead of generating the strip-chart data with FFT
blocks of 1000 samples, what would happen if an FFT were generated for
each incoming sample, i.e. 1000 FFTs per second for a 1 kHz sample
rate, with each new FFT centered on the next incoming sample? If this
data were then strip-charted (1000 new columns per second added to the
chart) for a 250 Hz sine-wave input, what would the frequency-over-time
greyscale image look like? For a normal Gabor spectrum STFT chart
(i.e. one FFT per 1000 samples) with a 250 Hz sine-wave input and a
1 kHz sampling frequency, the chart should just show a steady intensity
peak running from left to right halfway up the chart (the 500 Hz
Nyquist frequency would be at the top of the chart). This might also be
a good way to encode/decode data for communication, as a 2-dimensional
representation converted via FFTs to and from voltage/time for each
sample, as long as the hardware can do one FFT per sample.

Gabor spectrum (STFT, short-time Fourier transform) chart:
http://sonicawe.com/content/tutorials/img/tut5/image06.png
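
A minimal NumPy sketch of the per-sample-hop idea (the 1 kHz rate,
250 Hz tone and 1000-sample window come from the post above; everything
else is illustrative):

import numpy as np

fs = 1000                      # sample rate, Hz
n = 1000                       # FFT window size, samples
t = np.arange(2 * fs) / fs     # 2 s of signal
x = np.sin(2 * np.pi * 250 * t)

# One FFT per incoming sample: hop = 1. A conventional STFT would use
# hop = n (one FFT per 1000 samples).
hop = 1
columns = []
for start in range(0, len(x) - n + 1, hop):
    power = np.abs(np.fft.rfft(x[start:start + n])) ** 2
    columns.append(power)      # one greyscale column per FFT

# Rows are frequency bins from 0 to fs/2 (Nyquist at the top of the
# chart), columns are time. For a steady 250 Hz tone every column is
# identical, so the image is still a flat horizontal line halfway up.
image = np.array(columns).T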

cheers,
Jamie
 
Jamie

I think the real benefit of one FFT per sample is that it gives an
instant impulse response, compared to one FFT per 1000 samples. I.e. if
the 250 Hz sine wave is changed to 225 Hz, each new per-sample FFT will
show a moving intensity line, and the slope of this line will be
proportional to how fast the 250 Hz was changed to 225 Hz.

cheers,
Jamie
 
Martin Brown

Hi,

I have begun to use an FFT in software, and I started thinking about
how to turn the intensity-over-frequency FFT chart into a
frequency-over-time chart, i.e. a strip chart in the time/frequency
domain (a "Gabor spectrum" STFT), with each XY point of the chart
having an intensity based on the power at that frequency. So I was
thinking of taking that a step further (and pushing the limits of a
modern CPU): instead of generating the strip-chart data with FFT
blocks of 1000 samples, what would happen if an FFT were generated for
each incoming sample, i.e. 1000 FFTs per second for a 1 kHz sample
rate, with each new FFT centered on the next incoming sample? If this
data were then strip-charted (1000 new columns per second added to the
chart) for a 250 Hz sine-wave input, what would the frequency-over-time
greyscale image look like? For a normal Gabor spectrum STFT chart
(i.e. one FFT per 1000 samples) with a 250 Hz sine-wave input and a
1 kHz sampling frequency, the chart should just show a steady intensity
peak running from left to right halfway up the chart (the 500 Hz
Nyquist frequency would be at the top of the chart). This might also be
a good way to encode/decode data for communication, as a 2-dimensional
representation converted via FFTs to and from voltage/time for each
sample, as long as the hardware can do one FFT per sample.

The main effect of this (if, as seems likely from the question, you
don't know what you are doing) is that you will see variable ripple
effects in the sidebands, caused by the edge discontinuities at the
start and end of the sampling window. You would be better off using
larger overlaps if you really need time-domain resolution beyond what
sequential blocking will give you; the term for it in their application
is "Overlap". BTW, such diagrams are often easier to interpret in false
colour rather than monochrome, since you can distinguish several orders
of magnitude of difference in signal strength more easily.
Gabor spectrum (STFT, short-time Fourier transform) chart:
http://sonicawe.com/content/tutorials/img/tut5/image06.png

DAQARTA does this stuff out of the box and is pretty good at it too.
Why reinvent the wheel?

Regards,
Martin Brown
 
Jamie

The main effect of this (if, as seems likely from the question, you
don't know what you are doing) is that you will see variable ripple
effects in the sidebands, caused by the edge discontinuities at the
start and end of the sampling window.

I'm just using a rectangular FFT windowing function so far.
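
For a feel of what the rectangular window costs, a small NumPy
comparison sketch (the off-bin 250.5 Hz tone is a made-up value, chosen
to force the edge discontinuity Martin describes):

import numpy as np

fs, n = 1000, 1000
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 250.5 * t)   # falls between bins -> edge mismatch

rect_db = 20 * np.log10(np.abs(np.fft.rfft(x)))
hann_db = 20 * np.log10(np.abs(np.fft.rfft(x * np.hanning(n))))

# Normalise each to its own peak and look near the tone: the rectangular
# window's sidebands sit around -13 dB, the Hann window's around -31 dB.
print((rect_db - rect_db.max())[245:256].round(1))
print((hann_db - hann_db.max())[245:256].round(1))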


You would be better off using larger overlaps if you really need
time-domain resolution beyond what sequential blocking will give you;
the term for it in their application is "Overlap". BTW, such diagrams
are often easier to interpret in false colour rather than monochrome,
since you can distinguish several orders of magnitude of difference in
signal strength more easily.

I think there must be a formula showing that increased frequency
resolution is available based on the amount of FFT overlap. I.e., just
as ADC voltage oversampling gives more effective bits of voltage
resolution, overlapped FFTs should give more frequency bins, up to the
maximum overlap of the FFT window size minus one sample (one FFT per
sample). Theoretically, to add the most extra frequency bins at the
highest time resolution, the FFT window size (n) should be as large as
possible, and the FFT overlap should be n - 1 samples.

I take a wild guess that for a non-overlapped FFT with 2^x bins,
adding the full n-1 overlap can theoretically increase the effective
number of frequency bins to 2^(x+p), where p is some small integer
related to the FFT window size n. For an FFT window size n = 2^10, p
might be 5 or 10 at most. That would require some form of
post-processing to generate the missing extra frequency bins from all
that FFT data though, similar to decimation of ADC sample rate. Also,
one FFT per sample will generate data at 1024x the sample rate for an
FFT window size of 1024, and use 1024 times more CPU than normal FFT
code, but I am pretty sure it's possible to get more effective
frequency bins with that technique.

cheers,
Jamie
 
Jamie

The main effect of this (if, as seems likely from the question, you
don't know what you are doing) is that you will see variable ripple
effects in the sidebands, caused by the edge discontinuities at the
start and end of the sampling window. You would be better off using
larger overlaps if you really need time-domain resolution beyond what
sequential blocking will give you; the term for it in their application
is "Overlap". BTW, such diagrams are often easier to interpret in false
colour rather than monochrome, since you can distinguish several orders
of magnitude of difference in signal strength more easily.

DAQARTA does this stuff out of the box and is pretty good at it too.
Why reinvent the wheel?

DAQARTA has the limitation that there is a tradeoff between frequency
and temporal resolution in the spectrogram output.

I found this: www.comm.utoronto.ca/~dimitris/ece431/slidingdft.pdf

It describes the sliding DFT, which is designed to do one FFT per
incoming sample: once the first FFT is done, each successive FFT takes
only "two real adds and one complex multiply", independent of the FFT
window size (assuming the last incremental FFT is available)!
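
A sketch of that recurrence as I read the linked notes (k is the single
bin being tracked, N the window size; Python/NumPy, illustrative only):

import numpy as np

def sliding_dft_bin(x, N, k):
    """Track bin k of an N-point DFT as each new sample arrives, via
    S_k(n) = e^{j*2*pi*k/N} * (S_k(n-1) + x[n] - x[n-N]):
    two real adds and one complex multiply per sample, independent of
    N. After the first N samples, the output equals
    np.fft.fft(x[:N])[k]. Note it is only marginally stable, so real
    implementations damp the twiddle factor very slightly."""
    W = np.exp(2j * np.pi * k / N)
    S = 0j
    buf = np.zeros(N)            # the last N samples (zeros to start)
    out = []
    for i, sample in enumerate(x):
        oldest = buf[i % N]
        buf[i % N] = sample
        S = W * (S + sample - oldest)
        out.append(S)
    return np.array(out)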

So now I just need to find an FFT decimation filter to crunch the
one-FFT-per-sample data down; then I can increase frequency and
temporal resolution on a spectrogram at the same time :D

cheers,
Jamie
 
Martin Brown

DAQARTA has the limitation that there is a tradeoff between frequency
and temporal resolution in the spectrogram output.

There is *always* a tradeoff between frequency resolution and time.
They are conjugate variables in the Fourier transform: confine one and
the other spreads out. This is intrinsic to the mathematics.

If you want better frequency resolution and have a decent
signal-to-noise ratio, then you can use maximum entropy, which gives
the optimum tradeoff of resolution against signal to noise, but with
some risk of unfamiliar artefacts for practitioners not skilled in
Fourier interpretation.

An introduction is in Numerical Recipes (whilst I don't think much of
their algorithm or code, it does sort of work):

http://homepage.univie.ac.at/mario.barbatti/papers/NRF/bookfpdf/f13-7.pdf

Better algorithms do exist. You can't use them in real time unless you
have an exceptionally fast processor, but offline they will give you
the most precise answer for the frequencies of significant components
in a time series, with an accuracy that depends on their intensity.

Strong signals have a much better defined frequency than weak ones.
Overfit the system noise and they will spuriously split into two.
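
For reference, the usual concrete realisation of maximum entropy
spectral analysis is Burg's AR estimator. A from-scratch sketch (this
is not the Numerical Recipes code; the 250.3 Hz test tone and the model
order are made-up values):

import numpy as np

def burg_ar(x, order):
    """Burg's method: fit an autoregressive model by minimising forward
    and backward prediction error; the AR spectrum is the
    maximum-entropy spectrum."""
    x = np.asarray(x, dtype=float)
    f, b = x[1:].copy(), x[:-1].copy()   # forward/backward errors
    a = np.array([1.0])                  # AR polynomial, leading 1
    e = np.dot(x, x) / len(x)            # residual error power
    for _ in range(order):
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        a = np.append(a, 0.0) + k * np.append(0.0, a[::-1])
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
        e *= 1.0 - k * k
    return a, e

# 0.2 s of data: an FFT bin is 5 Hz wide, yet the AR spectral peak
# lands close to 250.3 Hz when the SNR is high. Overfit (order too
# high) and strong lines start to spuriously split.
fs = 1000.0
t = np.arange(200) / fs
x = np.sin(2 * np.pi * 250.3 * t) + 0.01 * np.random.randn(len(t))
a, e = burg_ar(x, 8)
freqs = np.linspace(0.0, fs / 2, 2000)
A = np.polyval(a[::-1], np.exp(-2j * np.pi * freqs / fs))
psd = e / np.abs(A) ** 2
print(freqs[np.argmax(psd)])
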
I found this: www.comm.utoronto.ca/~dimitris/ece431/slidingdft.pdf

It describes the sliding DFT, which is designed to do one FFT per
incoming sample: once the first FFT is done, each successive FFT takes
only "two real adds and one complex multiply", independent of the FFT
window size (assuming the last incremental FFT is available)!

So now I just need to find an FFT decimation filter to crunch the
one-FFT-per-sample data down; then I can increase frequency and
temporal resolution on a spectrogram at the same time :D

You do realise this only calculates a single known frequency
component, don't you?

To do a whole spectrum this way would be O(N^2), not fast at all, or
it would require massive hardware parallelism (as is done for sigint
devices).

Regards,
Martin Brown
 
Jamie

There is *always* a tradeoff between frequency resolution and time.
They are conjugate variables in the Fourier transform: confine one and
the other spreads out. This is intrinsic to the mathematics.

If you want better frequency resolution and have a decent
signal-to-noise ratio, then you can use maximum entropy, which gives
the optimum tradeoff of resolution against signal to noise, but with
some risk of unfamiliar artefacts for practitioners not skilled in
Fourier interpretation.

An introduction is in Numerical Recipes (whilst I don't think much of
their algorithm or code, it does sort of work):

http://homepage.univie.ac.at/mario.barbatti/papers/NRF/bookfpdf/f13-7.pdf

Better algorithms do exist. You can't use them in real time unless you
have an exceptionally fast processor, but offline they will give you
the most precise answer for the frequencies of significant components
in a time series, with an accuracy that depends on their intensity.

Strong signals have a much better defined frequency than weak ones.
Overfit the system noise and they will spuriously split into two.

You do realise this only calculates a single known frequency
component, don't you?

Hi,

Thanks, I missed that line: "the kth value of the DFT". I am starting
to think the best method for what I would like to do is to use parallel
bandpass filters with slight overlap. Setting a threshold trigger for
delta-power over time on the output of each of these filters can give
accurate time resolution of frequency events, which an FFT can't do.
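
A sketch of that filter-bank idea with SciPy Butterworth sections (the
band edges and the 250-to-225 Hz hop are made-up test values):

import numpy as np
from scipy.signal import butter, sosfilt

def band_powers(x, fs, bands):
    """Instantaneous power over time in a bank of slightly overlapping
    band-pass filters, one row per band."""
    rows = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        y = sosfilt(sos, x)
        rows.append(y * y)
    return np.array(rows)

fs = 1000.0
t = np.arange(2000) / fs
# tone hops from 250 Hz to 225 Hz at t = 1.0 s
x = np.where(t < 1.0, np.sin(2 * np.pi * 250 * t),
             np.sin(2 * np.pi * 225 * t))
bands = [(210, 237), (233, 260)]       # slight overlap between bands
p = band_powers(x, fs, bands)

# Smooth each row and watch for a delta-power jump: the 225 Hz band
# switches on within a few filter time-constants of the hop, which
# timestamps the event far more tightly than a 1000-sample FFT block.
w = np.ones(50) / 50
print(np.convolve(p[0], w, mode="same")[[900, 1100]])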

cheers,
Jamie
 
Jon Kirwan

<snip>
Thanks, I missed that line: "the kth value of the DFT". I am starting
to think the best method for what I would like to do is to use parallel
bandpass filters with slight overlap. Setting a threshold trigger for
delta-power over time on the output of each of these filters can give
accurate time resolution of frequency events, which an FFT can't do.
<snip>

biquads?

http://www.maxim-ic.com/app-notes/index.mvp/id/1762
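
For instance, one band of the bank could be a band-pass biquad built
from the RBJ audio-EQ-cookbook formulas (constant 0 dB peak-gain
variant; the f0 and Q per band are whatever the bank needs). A plain
Python sketch:

import math

def bandpass_biquad(f0, q, fs):
    """Coefficients for one band-pass biquad section centred on f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def run_biquad(x, b, a):
    """Direct form I, one sample at a time (what a small MCU would do)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out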

Jon
 
Jon Kirwan

<snip>
Five years ago I built a 2 GSPS real-time FFT engine using
big Xilinx Virtex FPGAs that spit out a 1024-point FFT every
1.2 ms (total frame time was 1.5 ms)
<snip>

That's fast, but I was doing 1024-pt, complex-in, complex-out
floating-point FFTs on an integer-only ADSP-2111, and memory says it
took slightly over 3 ms -- back in 1992 and before FFTW was around. I'd
have to dig up the details to be absolutely sure, but I coded it and
that's what I recall now as the rate (we needed less than 4.5 ms). I'm
not so impressed that in 2006, on big fancy FPGAs, the rate would be
1.2 ms (not knowing whether it handled complex in/out values). I
learned out of Brigham's Fast Fourier Transform and Its Applications, a
book I can definitely recommend, even if it is old. There is probably
better stuff around now, with more graphics to help hammer in the
points, but I've not been exposed to recent writings.

Jon
 
DonMack

"Jamie" wrote in message
Hi,

I have begun to use an FFT in software, and I started thinking about
how to turn the intensity-over-frequency FFT chart into a
frequency-over-time chart, i.e. a strip chart in the time/frequency
domain (a "Gabor spectrum" STFT), with each XY point of the chart
having an intensity based on the power at that frequency. So I was
thinking of taking that a step further (and pushing the limits of a
modern CPU): instead of generating the strip-chart data with FFT
blocks of 1000 samples, what would happen if an FFT were generated for
each incoming sample, i.e. 1000 FFTs per second for a 1 kHz sample
rate, with each new FFT centered on the next incoming sample? If this
data were then strip-charted (1000 new columns per second added to the
chart) for a 250 Hz sine-wave input, what would the frequency-over-time
greyscale image look like? For a normal Gabor spectrum STFT chart
(i.e. one FFT per 1000 samples) with a 250 Hz sine-wave input and a
1 kHz sampling frequency, the chart should just show a steady intensity
peak running from left to right halfway up the chart (the 500 Hz
Nyquist frequency would be at the top of the chart). This might also be
a good way to encode/decode data for communication, as a 2-dimensional
representation converted via FFTs to and from voltage/time for each
sample, as long as the hardware can do one FFT per sample.

Gabor spectrum (STFT, short-time Fourier transform) chart:
http://sonicawe.com/content/tutorials/img/tut5/image06.png

cheers,
Jamie

--------------------
It won't do any good. There is an innate inverse relationship between
frequency and time (f = 1/t). The Fourier transform's basis functions
are not localized and cannot describe local behavior well.

Take a square wave. The Fourier transform of this has infinite
bandwidth, yet the function is one of the most basic functions that
exist. The Fourier transform has extreme difficulty describing
functions that have sharp transitions, because sharp transitions occur
over short periods of time, and short periods of time mean high
frequencies (the f = 1/t thing).
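
A quick numerical illustration of that point: the square wave's
odd-harmonic amplitudes fall off only as 1/n and never reach zero, so
no finite bandwidth captures the edges exactly:

import numpy as np

n = np.arange(1, 16, 2)         # odd harmonics of a unit square wave
amps = 4 / (np.pi * n)          # Fourier series: sum of 4/(pi*n) * sin terms
for k, a in zip(n, amps):
    print(f"harmonic {k:2d}: amplitude {a:.4f}")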

We generally like to think of a square wave, or any other periodic
function with period p, as having a frequency of 1/p. The Fourier
transform can recover this fundamental frequency, but really it can't:
remember, we have to apply a low-pass filter to it (mathematically we
can take the limit, though).

So, if something as simple as a square wave has infinite bandwidth,
then no matter what kind of Fourier transform you use, you won't be
able to describe it accurately with finite bandwidth. Your sliding
windowed Fourier transform can't do any better than a multiresolution
windowed Fourier transform.

There are several transforms that take a time-based signal and
transform it into frequency and time. The wavelet transform is one
such transform.

But no matter how you slice and dice it, frequency and time are
inversely correlated. You can attempt to use a basis that is more local
than the Fourier basis, and you'll end up with something like wavelets.
You can go further and attempt to use the signal itself to determine
the basis, and get something like empirical mode decomposition.

Ultimately you'll always end up with the same problem of trying to pin
down both frequency and time, and it just can't be done.

For example, what is the frequency of some signal s(t) at time t = 0?
It's a nonsensical question. Ultimately, what you have to do to answer
such a question is use not the point at t = 0 but an interval around
that point. This lets you "zoom out" a bit, see what the signal is
doing, and hence determine its frequency. For lower frequencies you
have to zoom out more, because they have a larger period (p = 1/f, and
when f is small, p is large).

The real issue at hand is that we define frequency as the rate of
something occurring periodically. Mathematically, we actually compare
it to sinusoids; that is, sinusoids are the model for frequency. But a
sinusoid has no instantaneous frequency, and therefore anything that is
composed of sinusoids will not have an instantaneous frequency.
Wavelets have more of a localized frequency, but now frequency becomes
more vague (in fact wavelets do not have a frequency, and we are
comparing apples to oranges).

There is a way to define instantaneous frequency as a mathematical
generalization:

http://en.wikipedia.org/wiki/Instantaneous_phase

The problem is that it seems to be useless to interpret as a real
frequency, or we just haven't found out how.

I.e., with the instantaneous frequency we can actually answer the above
question about the frequency of s(t) at time 0, but don't expect it to
make a lot of sense.
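
A sketch of that definition via the analytic signal (SciPy's Hilbert
transform; the clean 250 Hz tone is the thread's example, the one case
where the answer does make sense):

import numpy as np
from scipy.signal import hilbert

fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 250 * t)

z = hilbert(x)                         # analytic signal
phase = np.unwrap(np.angle(z))
f_inst = np.diff(phase) * fs / (2 * np.pi)
print(f_inst[100:105])                 # ~250 Hz for a clean single tone
# add a second tone and the sample-by-sample value stops being
# interpretable as a real frequency, as noted above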

The main point is that you may not get much benefit: you are limited
mathematically, so no matter what your algorithms are, they can't
overcome the frequency/time limitations described above. The limitation
is not inherent in the mathematics itself but in the concept of
frequency.
 
Jon Kirwan

The input arrays were real, but to double-up the rate I fed
two real arrays as one complex array into the FFT engine,
then I would extract the two FFTs on the other side (I will
have to dig up my design notes to get the algorithm, it's
not something I can remember in detail right now).

No need. I'm well aware of the technique. Used it many
times. Can describe it, even.

Anyway, now I'm mostly curious whether you are saying that you were
doing _two_ 1024-pt real arrays in 1.2 ms, or actually doing them in
2.4 ms with a complex-in, complex-out 1024-pt FFT that you packed and
then disassembled. Completely different performance meanings to me,
depending.

Jon
 
Martin Brown

Five years ago I built a 2 GSPS real-time FFT engine using big Xilinx
Virtex FPGAs that spit out a 1024-point FFT every 1.2 ms (total frame
time was 1.5 ms). Dozens of these FFTs were raster-scanned to a video
display with intensity displayed as different colors, but I am not at
liberty to discuss what was being viewed, or why. It was pretty
awesome, though.

At the time, it was the 2nd-fastest of its kind in the world. The
fastest one used Virtex-II devices, but I never heard whether they
actually got their system put to use or not. Mine was put to use.

I saw one a long time ago, ~1983-4, on "loan" to a big dish radio
observatory; it came with two US guards for the duration. At the time
it was the fastest of its kind in the world.
I am not so sure your idea *still* doesn't need hardware (as opposed to software) to run at the speed you describe.

I think he will struggle. FFTW would be the first thing to try, to see
if the ballpark performance figures are even remotely close; then strip
it down to a hardwired fixed-length transform and choose a CPU with
appropriate cache lines to optimise a 1024-point real transform. It is
anybody's guess what the best factorisation will be these days.

Intel's IPPS benchmarks about 20% faster than FFTW3 at these lengths.

Temporal resolution could be improved by using sub-blocks to preserve
intermediate results, which are then twiddled together to avoid rework.
Out-of-place transforms are usually faster, so this should not hurt.
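
A rough ballpark check of the sort suggested above, using a stock
library (numpy.fft purely for illustration; FFTW or IPPS would be the
serious candidates):

import time
import numpy as np

x = np.random.randn(1024)
iters = 20000
t0 = time.perf_counter()
for _ in range(iters):
    np.fft.rfft(x)               # 1024-point real transform
rate = iters / (time.perf_counter() - t0)
print(f"{rate:,.0f} 1024-pt real FFTs per second")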

Regards,
Martin Brown
 
Jon Kirwan

The system was composed of two 1 GSPS digitizers from
Thompson CSF with the companion Demux chips. They were run
on opposite phase clocks. The Demuxed outputs were fed into
a Xilinx Virtex 2000 for framing and scaling (and Gray to
offset binary conversion), then sent to a Virtex 3000 series
which contained the FFT engine.

The first FPGA stored a frame of data in internal SRAM and
passed framed data as two real 1024-point arrays to the FFT
engine, loaded as the {Re} and {Im} components of one
complex 1024-pt array. Then it would extract the two FFTs
from the output of the engine.

The FFT engine did its work in about 800 usec (IIRC). The
rest of the 1.2 ms was for extraction and data transfers.
The total frame time was 1.5 ms because another 300 us was
devoted to transmitting the extracted FFT arrays to a video
processor DSP, off-board.

Now that I am remembering more about this, I recall there
were four FFT engines working in parallel.

Too many projects have come and gone since that one.

Tom Pounds
near Albuquerque

Now I understand much better and am more impressed, overall.
Thanks very much for the detailed recollections. I really
enjoyed reading this.

Jon
 
Jamie

Hi,

I have begun to use an FFT in software, and I started thinking about
how to turn the intensity-over-frequency FFT chart into a
frequency-over-time chart, i.e. a strip chart in the time/frequency
domain (a "Gabor spectrum" STFT), with each XY point of the chart
having an intensity based on the power at that frequency. So I was
thinking of taking that a step further (and pushing the limits of a
modern CPU): instead of generating the strip-chart data with FFT
blocks of 1000 samples, what would happen if an FFT were generated for
each incoming sample, i.e. 1000 FFTs per second for a 1 kHz sample
rate, with each new FFT centered on the next incoming sample? If this
data were then strip-charted (1000 new columns per second added to the
chart) for a 250 Hz sine-wave input, what would the frequency-over-time
greyscale image look like? For a normal Gabor spectrum STFT chart
(i.e. one FFT per 1000 samples) with a 250 Hz sine-wave input and a
1 kHz sampling frequency, the chart should just show a steady intensity
peak running from left to right halfway up the chart (the 500 Hz
Nyquist frequency would be at the top of the chart). This might also be
a good way to encode/decode data for communication, as a 2-dimensional
representation converted via FFTs to and from voltage/time for each
sample, as long as the hardware can do one FFT per sample.

Gabor spectrum (STFT, short-time Fourier transform) chart:
http://sonicawe.com/content/tutorials/img/tut5/image06.png

cheers,
Jamie


Hi,

A single FFT done over 1024 samples will give linearly spaced
frequency bins on the X-axis. However, theoretically the frequency-bin
dispersion could be logarithmic, so that there are more frequency bins,
closer together, at higher frequencies for the same timestep. A single
FFT can't produce this data though, as it averages over the whole
sample period. That is where using multiple overlapped FFTs is a
possible advantage: a spectrogram chart calculated from overlapped FFTs
should be able to show a logarithmic Y-axis with the frequency bins
getting closer together up the Y-axis, giving more time and frequency
resolution for that timestep than the lower frequency bins in the same
timestep.

cheers,
Jamie
 
Martin Brown

Hi,

A single FFT done over 1024 samples will give linearly spaced
frequency bins on the X-axis. However, theoretically the frequency-bin
dispersion could be logarithmic, so that there are more frequency bins,
closer together, at higher frequencies for the same timestep. A single
FFT can't produce this data though, as it averages over the whole
sample period. That is where using multiple overlapped FFTs is a
possible advantage: a spectrogram chart calculated from overlapped FFTs
should be able to show a logarithmic Y-axis with the frequency bins
getting closer together up the Y-axis, giving more time and frequency
resolution for that timestep than the lower frequency bins in the same
timestep.

ITYM the opposite of what you said. There is an efficient constant-Q
fast transform that gives something vaguely useful for audio and studio
band-equaliser-type displays. But it has worse resolution with
increasing frequency, in qualitatively the same way as the human ear.
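
A naive constant-Q sketch showing that behaviour: one windowed DFT
kernel per bin, with the window shortening (and the resolution
worsening) as frequency rises. All parameter values are illustrative;
the real constant-Q transform computes this efficiently:

import numpy as np

def naive_constant_q(x, fs, fmin, n_bins, bins_per_octave=12, Q=17):
    """Constant-Q spectrum of the start of x, the slow obvious way."""
    out = np.zeros(n_bins, dtype=complex)
    for i in range(n_bins):
        f = fmin * 2.0 ** (i / bins_per_octave)
        n = min(int(round(Q * fs / f)), len(x))  # shorter window as f rises
        w = np.hanning(n)
        kernel = np.exp(-2j * np.pi * f / fs * np.arange(n))
        out[i] = np.dot(x[:n] * w, kernel) / n
    return out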

There are techniques to get very high-resolution frequency measurements
from limited time-sampled data, provided that the signal-to-noise ratio
is sufficiently high. Heuristically, the time between zero crossings on
a simple waveform will give you a feel for why this is possible. To do
it for an unknown signal requires some complex mathematics, and it
isn't really amenable to realtime displays, typically taking a couple
of orders of magnitude more processing time. But if you are stuck with
the measurements you can actually make and need the answer, it can be
useful.
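
A sketch of that zero-crossing heuristic (clean single tone and high
SNR assumed; the 250.3 Hz value is made up):

import numpy as np

def zero_crossing_freq(x, fs):
    """Estimate a single tone's frequency from the mean spacing of its
    rising zero crossings."""
    neg = np.signbit(x)
    rising = np.flatnonzero(neg[:-1] & ~neg[1:])   # - to + transitions
    if len(rising) < 2:
        return None
    return fs / np.mean(np.diff(rising))

fs = 1000.0
t = np.arange(1000) / fs
print(zero_crossing_freq(np.sin(2 * np.pi * 250.3 * t), fs))
# ~250.3 Hz from 1 s of data, versus 1 Hz bins from a 1000-point FFT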

Regards,
Martin Brown
 
Jamie

ITYM the opposite of what you said. There is an efficient constant-Q
fast transform that gives something vaguely useful for audio and studio
band-equaliser-type displays. But it has worse resolution with
increasing frequency, in qualitatively the same way as the human ear.

There are techniques to get very high-resolution frequency measurements
from limited time-sampled data, provided that the signal-to-noise ratio
is sufficiently high. Heuristically, the time between zero crossings on
a simple waveform will give you a feel for why this is possible. To do
it for an unknown signal requires some complex mathematics, and it
isn't really amenable to realtime displays, typically taking a couple
of orders of magnitude more processing time. But if you are stuck with
the measurements you can actually make and need the answer, it can be
useful.


Hi,

Thanks for the clarification, I find it a fairly confusing topic! :)
I was looking at wavelets a bit; they seem like they would be quite
useful:
http://www.compsensor.com/images/wavelet.gif

cheers,
Jamie
 
josephkk

I'm just using a rectangular FFT windowing function so far.


You would be better off using larger

I think there must be a formula showing that increased frequency
resolution is available based on the amount of FFT overlap. I.e., just
as ADC voltage oversampling gives more effective bits of voltage
resolution, overlapped FFTs should give more frequency bins, up to the
maximum overlap of the FFT window size minus one sample (one FFT per
sample). Theoretically, to add the most extra frequency bins at the
highest time resolution, the FFT window size (n) should be as large as
possible, and the FFT overlap should be n - 1 samples.

I take a wild guess that for a non-overlapped FFT with 2^x bins,
adding the full n-1 overlap can theoretically increase the effective
number of frequency bins to 2^(x+p), where p is some small integer
related to the FFT window size n. For an FFT window size n = 2^10, p
might be 5 or 10 at most. That would require some form of
post-processing to generate the missing extra frequency bins from all
that FFT data though, similar to decimation of ADC sample rate. Also,
one FFT per sample will generate data at 1024x the sample rate for an
FFT window size of 1024, and use 1024 times more CPU than normal FFT
code, but I am pretty sure it's possible to get more effective
frequency bins with that technique.

cheers,
Jamie

No, it does not increase the frequency bins in any way. It does add back
some temporal information so that you can see how the various frequency
lines behave versus time.

?-)
 