Maker Pro

16-bit SPI trying to read from 22-clock cycle ADC

krw

Bill said:
Well, I was wrong about at least one thing: I thought that, with CSAAT=0,
CS would be deasserted (high) between consecutive "word transfers"
within one "block transfer", but it is not. It was clear to me from the
beginning (from the diagrams and text) that there was a way to keep CS=0
between word transfers, but I thought that implied CSAAT=1, and that is
not true. CS stays at 0 between consecutive word transfers (of the same
block transfer) regardless of the value of CSAAT.

So, yes, I can leave CSAAT=0 permanently; no CPU intervention is needed
(other than at the beginning and at the end of each block transfer),
and I can use DMA, with two 11-bit word transfers per block transfer.
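
As a concrete illustration, here is a minimal configuration sketch of
that setup in C. The register and field names are assumed to match
Atmel's AT91SAM7S64.h header, and the SCBR divider is just an example
value; check everything against the datasheet:

/* Sketch only: names assumed from Atmel's AT91SAM7S64.h header. */
#include <AT91SAM7S64.h>

static void spi_init_11bit(void)
{
    AT91PS_SPI spi = AT91C_BASE_SPI;

    spi->SPI_MR = AT91C_SPI_MSTR;          /* master, fixed NPCS0 select   */

    /* 11-bit transfers on NPCS0, CSAAT left at 0: CS still stays low
     * between the two 11-bit words of one PDC block transfer.         */
    spi->SPI_CSR[0] = AT91C_SPI_BITS_11
                    | (4 << 8);            /* SCBR: SPCK = MCK/4 (example) */

    spi->SPI_CR = AT91C_SPI_SPIEN;         /* enable the SPI               */
}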


This is good, but I think that it could be better. Difficult to
explain, but I'll try:

Imagine my external ADC (with SPI interface) is sampling the analog
input at 100 kSa/s (the TI ADS8320 that I mentioned allows that). So,
10 us between samples. Not much. Each sample needs 22 clock cycles
inside each assertion of CS=0, so each sample needs one DMA block
transfer (with, for instance, two 11-bit word transfers inside). Each
DMA block transfer needs CPU intervention. So, I need CPU intervention
every 10 us. That's a short time: only 480 cycles of my 48 MHz SAM7.
Since (as far as I know) a DMA block transfer cannot be triggered
directly by a timer overflow or underflow, an interrupt service
routine (triggered by a 10 us timer underflow) must be executed every
so often, so that the CPU can manually trigger the DMA block transfer
and collect the data. Adding up the overhead of the interrupt context
switching and the instructions needed to move data from and to the
block buffers, to re-trigger the block transfer, and all this in C++,
I think that all that may consume a "significant" portion of those 480
cycles. And the CPU is supposed to do something with that data, and
some other things besides. I see that hog as a killer, or at least a
pity.
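
To make the per-sample cost concrete, here is a sketch of roughly what
that 10 us service routine has to do (again assuming Atmel's
AT91SAM7S64.h names; rx, dummy, fifo and head are hypothetical
application buffers, the timer acknowledge and AIC end-of-interrupt are
omitted, and the exact bit alignment of the two halves depends on the
ADC's frame):

#include <AT91SAM7S64.h>

static volatile unsigned short rx[2];            /* the two 11-bit halves       */
static const unsigned short dummy[2] = { 0, 0 }; /* TX words only generate SPCK */
static unsigned long fifo[1024];                 /* hypothetical sample FIFO    */
static unsigned int head;

void adc_timer_isr(void)   /* entered every 10 us; ack/EOI code omitted */
{
    AT91PS_SPI spi = AT91C_BASE_SPI;

    /* Combine the block that just finished; the first word is assumed
     * to carry the high 11 bits (check the ADC's frame layout). */
    fifo[head++ & 1023] = ((unsigned long)(rx[0] & 0x7FF) << 11)
                        |  (unsigned long)(rx[1] & 0x7FF);

    /* Re-arm both PDC channels with one 2-word block; the transfer,
     * and with it a fresh CS assertion, starts immediately. */
    spi->SPI_RPR = (unsigned long)rx;    spi->SPI_RCR = 2;
    spi->SPI_TPR = (unsigned long)dummy; spi->SPI_TCR = 2;
    spi->SPI_PTCR = AT91C_PDC_RXTEN | AT91C_PDC_TXEN;
}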

If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
allowed triggering the next word transfer (inside a block transfer)
when a certain timer underflows, then the DMA blocks wouldn't need to
be so small. Each analog sample could travel in one single SPI word
transfer, and one DMA block could be planned to carry, for instance,
1000 word transfers. That would be one DMA block every 10 ms. The
buffer (FIFO) memory would be larger, but far less CPU intervention
would be needed. There would be the same number of useful cycles, but
far fewer wasted cycles. There would be no need for an interrupt
service routine executed every 10 us, which is the killer. That would
be a good SPI and a good DMA, in my opinion, and the extra cost in
silicon is negligible compared with the added benefit. Why don't most
MCUs allow that? Even cheap MCUs could include it. An MCU at the price
of a SAM7 should include it, in my opinion.

It's not a matter of silicon area, but of which SPI devices they want
to cover. SPI is a thousand twisty little passages, all different.
How are they going to service them all? The bottom line is that they
put in enough to get the bullet point onto the front page of the
datasheet. If you want custom I/O, do it in an FPGA.
 
Meindert Sprang

Bill said:
Imagine my external ADC (with SPI interface) is sampling the analog
input at 100 kSa/s (the TI ADS8320 that I mentioned allows that). So,
10 us between samples. Not much. Each sample needs 22 clock cycles
inside each assertion of CS=0, so each sample needs one DMA block
transfer (with, for instance, two 11-bit word transfers inside). Each
DMA block transfer needs CPU intervention. So, I need CPU intervention
every 10 us. That's a short time: only 480 cycles of my 48 MHz SAM7.
Since (as far as I know) a DMA block transfer cannot be triggered
directly by a timer overflow or underflow, an interrupt service
routine (triggered by a 10 us timer underflow) must be executed every
so often, so that the CPU can manually trigger the DMA block transfer
and collect the data. Adding up the overhead of the interrupt context
switching and the instructions needed to move data from and to the
block buffers, to re-trigger the block transfer, and all this in C++,
I think that all that may consume a "significant" portion of those 480
cycles. And the CPU is supposed to do something with that data, and
some other things besides. I see that hog as a killer, or at least a
pity.

In those instances, I always revert to assembly language for the
interrupt part. Handle things as quickly as you can, and a context
switch suddenly isn't half as bad: no more than pushing and popping a
few registers. Hell, I even did a fast interrupt handler in C on a
Motorola DSP56K. The interrupt occurred every 200 ns... (no typo).
Just... be smart...
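
On an ARM7 part like the SAM7, the analogous trick is the FIQ. A
minimal sketch, assuming GCC for ARM and Atmel's AT91SAM7S64.h header,
with a hypothetical ring buffer (the AIC setup that routes the SPI
interrupt to the FIQ line is omitted):

#include <AT91SAM7S64.h>

volatile unsigned short ring[256];   /* hypothetical ring buffer */
volatile unsigned int wr;

void fiq_handler(void) __attribute__((interrupt("FIQ")));

/* GCC emits the FIQ entry/exit; the body itself is a handful of
 * instructions, nothing like a full context switch.  Reading SPI_RDR
 * also clears the RDRF flag that raised the interrupt. */
void fiq_handler(void)
{
    ring[wr++ & 255] = (unsigned short)AT91C_BASE_SPI->SPI_RDR;
}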

Meindert
 
Bill

Meindert Sprang said:
In those instances, I always revert to assembly language for the
interrupt part. Handle things as quickly as you can, and a context
switch suddenly isn't half as bad: no more than pushing and popping a
few registers. Hell, I even did a fast interrupt handler in C on a
Motorola DSP56K. The interrupt occurred every 200 ns... (no typo).
Just... be smart...

What was the clock frequency of your DSP56K?
 
Ulf Samuelsson

Bill wrote:
Well, I was wrong about at least one thing: I thought that, with CSAAT=0,
CS would be deasserted (high) between consecutive "word transfers"
within one "block transfer", but it is not. It was clear to me from the
beginning (from the diagrams and text) that there was a way to keep CS=0
between word transfers, but I thought that implied CSAAT=1, and that is
not true. CS stays at 0 between consecutive word transfers (of the same
block transfer) regardless of the value of CSAAT.

So, yes, I can leave CSAAT=0 permanently; no CPU intervention is needed
(other than at the beginning and at the end of each block transfer),
and I can use DMA, with two 11-bit word transfers per block transfer.


This is good, but I think that it could be better. Difficult to
explain, but I'll try:

Imagine my external ADC (with SPI interface) is sampling the analog
input at 100 kSa/s (the TI ADS8320 that I mentioned allows that). So,
10 us between samples. Not much. Each sample needs 22 clock cycles
inside each assertion of CS=0, so each sample needs one DMA block
transfer (with, for instance, two 11-bit word transfers inside). Each
DMA block transfer needs CPU intervention. So, I need CPU intervention
every 10 us. That's a short time: only 480 cycles of my 48 MHz SAM7.
Since (as far as I know) a DMA block transfer cannot be triggered
directly by a timer overflow or underflow, an interrupt service
routine (triggered by a 10 us timer underflow) must be executed every
so often, so that the CPU can manually trigger the DMA block transfer
and collect the data. Adding up the overhead of the interrupt context
switching and the instructions needed to move data from and to the
block buffers, to re-trigger the block transfer, and all this in C++,
I think that all that may consume a "significant" portion of those 480
cycles. And the CPU is supposed to do something with that data, and
some other things besides. I see that hog as a killer, or at least a
pity.

If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
allowed triggering the next word transfer (inside a block transfer)
when a certain timer underflows, then the DMA blocks wouldn't need to
be so small. Each analog sample could travel in one single SPI word
transfer, and one DMA block could be planned to carry, for instance,
1000 word transfers. That would be one DMA block every 10 ms. The
buffer (FIFO) memory would be larger, but far less CPU intervention
would be needed. There would be the same number of useful cycles, but
far fewer wasted cycles. There would be no need for an interrupt
service routine executed every 10 us, which is the killer. That would
be a good SPI and a good DMA, in my opinion, and the extra cost in
silicon is negligible compared with the added benefit. Why don't most
MCUs allow that? Even cheap MCUs could include it. An MCU at the price
of a SAM7 should include it, in my opinion.

Best,

An idea:

Run a timer whose output is connected to the SSC input clock and the
ADC clock. It also clocks a second timer in PWM mode that generates
the ADC chip select.

The ADC will see 22 active and 10 passive bits,
and the SSC will see 32 bits.

(Did not test this.)
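
A rough, untested sketch of that chip-select generator on a SAM7 Timer
Counter channel, assuming the field names from Atmel's AT91SAM7S64.h
header (XC0 stands in for the shared sample clock; the TC_BMR routing
of XC0 and the initial TIOA level are omitted):

#include <AT91SAM7S64.h>

static void cs_pwm_init(void)
{
    AT91PS_TC tc = AT91C_BASE_TC0;

    tc->TC_CMR = AT91C_TC_CLKS_XC0        /* clock the channel from XC0     */
               | AT91C_TC_WAVE            /* waveform mode                  */
               | AT91C_TC_WAVESEL_UP_AUTO /* count up, reset on RC compare  */
               | AT91C_TC_ACPA_SET        /* TIOA (CS) high at RA ...       */
               | AT91C_TC_ACPC_CLEAR;     /* ... and low again at RC        */
    tc->TC_RA  = 22;                      /* CS low during the 22 active clocks     */
    tc->TC_RC  = 32;                      /* 32-clock frame: 22 active + 10 passive */
    tc->TC_CCR = AT91C_TC_CLKEN | AT91C_TC_SWTRG;
}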

Best Regards
Ulf Samuelsson
 
Bill

Ulf Samuelsson said:
An idea:

Run a timer whose output is connected to the SSC input clock and the
ADC clock. It also clocks a second timer in PWM mode that generates
the ADC chip select.

The ADC will see 22 active and 10 passive bits,
and the SSC will see 32 bits.

(Did not test this.)

Hey, that's a wonderful idea!!
It opens up a broad array of new possibilities :)

Thanks!
 