How to enable SPI and access it in Debian OS

@suicidaleggroll

There are some issues that have been identified with the block mode of the spi-qup driver. Transactions over a specific size (31 bytes, I believe) use block mode. DMA is not used unless a very specific set of prerequisites is satisfied (like buffers aligned to the cache line size).

I worked up a set of patches that should fix this issue. I have not seen any transaction errors since applying them; without these patches I was seeing multiple failures at various block sizes.

git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git
branch: spi-wip

8ec8b82 spi: qup: Fix block mode to work correctly
807582a spi: qup: Use correct check for DMA prep return
767532d spi: qup: Wait for QUP to complete
2c438ae spi: qup: Fix transaction done signaling

These four are the ones you need. Please let me know if this fixes your issue. I need to redo some of the commit messages, but based on testing from srini and hopefully you, I can send them to the lists soon.

Regards,

Andy

@agross

That’s definitely made an improvement! (BTW, I think that most recent commit should be 698179c rather than 8ec8b82; I wasn’t getting any results with 8ec8b82, and 698179c matches the name you gave.)

The problem hasn’t been fixed completely. I can still trigger it by using a batch size that’s a multiple of 16 and at least 32 (e.g. 32, 48, 64, 80), but anything other than that (like 4095) appears to run indefinitely. I was getting some intermittent “unexpected irq” warnings in the debug console when using a size of 127, but no hard failures, and I haven’t seen it with any other sizes so far.

That said, the data is coming out in bursts, with significant delays between groups. The pattern is odd, but repeatable:
18 bytes with 60 ns between each byte
5000 ns delay
1 byte
170 ns delay
3 bytes with 60 ns between each byte
430 ns delay
1 byte
170 ns delay
3 bytes with 60 ns between each byte
430 ns delay
1 byte
170 ns delay
3 bytes with 60 ns between each byte
430 ns delay
1 byte
170 ns delay
1 byte
10700 ns delay
It keeps up this pattern of 18 bytes, delay, 14 bytes, delay as it moves through the entire 4095-byte buffer, then there’s an 84500 ns delay, and then the whole process repeats.

I know it can be hard to visualize it with numbers, so I took some screenshots as well:
4095-byte chunks:

18 and 14 byte pairs as it moves through the 4095 byte chunk:

zoomed in on one 18 and 14 byte pair:

With all of these delays, the total throughput is better than it was before, but still only ~1.2 MBps. Any ideas on how to improve this, or where these delays could be coming from? The big ones are the 5000 ns delay between the first 18 and last 14 bytes of a 32-byte chunk, and the 10700 ns delay between 32-byte chunks.

Thanks for your help with this.

@suicidaleggroll

To be clear, you get overruns/underruns? I’ll try the cases that cause issues for you. My hardware has a SPI flash device on it, and I have been testing using dd to the raw mtd device and also jffs2 to the block device. So when I send transactions, I typically get 2 transactions before my data to set up the address on the flash device. This is different from your case, where you are basically piping it out in bursts.

Let me explain what should be happening in the hardware. For transactions of < 32 bytes, we use FIFO mode. This means that we have to read/write bytes to the FIFO one at a time. Each time we write a set, we check to see if the FIFO is full; if it is, we wait until the next IRQ to fill more. So you’d typically see some number of bytes, a small gap, and some more bytes. The latency of handling the IRQ generally frees up more FIFO space, so we tend to send a number of bytes each time.

For transactions of 32 bytes or more, we load up the FIFO using blocks of data. The block size is predetermined in the hardware configuration. On my hardware, the block size is 16 bytes and the FIFO is 64 bytes; it may be larger on the 410c. We write out up to a block’s worth of data, then check to see if the FIFO can take another block. We do this until it can no longer take any more blocks, then we wait for the next IRQ. Rinse and repeat.

For DMA, it’s different. We load up the transfer count and then set up the DMA. Once the DMA is complete, we wait for an IRQ from the QUP that lets us know the QUP is done dealing with the data. Even though the DMA completes, the data doesn’t necessarily drain out of the QUP until after that IRQ.

For spidev, you’ll find that you’ll almost never do a DMA due to misaligned buffers; the buffer has to be cache-line aligned. If you adjust the spidev code to allocate buffers aligned to the DMA cache line size, it’ll use DMA. That said, I need to figure out what is going on with block mode.

BTW, I idle on IRC as agross on Freenode. Hit me up if you use IRC and we can discuss more.

Regards,
Andy

Hi Andy/suicidaleggroll,

I think the problem is that the dma is not working as expected!

We will continue to debug this issue, but in the meantime could you try this patch (srinivas.kandagatla/linux.git - srini kernel working tree)?

This patch basically removes the dma properties on the spi device.

With this patch I am able to test all the sizes with the dump program.
The reason batch sizes that are a multiple of 16 and at least 32 fail is that DMA is used in all those cases. Any other size is non-DMA.

thanks,
srini

OK. So let’s try this again. Can you guys try out:

git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git
branch: spi-wip-v2

This should get DMA working solidly. It also adjusts the prerequisites for DMA on > v1 hardware. And then once you test that, can you disable DMA by removing the dma lines from the .dts?

I want to see if block mode works fine for you on the 410c. On the apq8064/ipq8064, block mode doesn’t work at all.

Regards,

Andy

Hi Andy,
Thanks for the patches. These two patches work, and I can confirm that they fix the reported DMA issue.

I will make sure that these patches get merged into the next Debian release.

Thanks,
srini

Odd, the new version isn’t working very well at all for me.

With DMA disabled using srini’s trick in msm8916.dtsi, the test will run for any size, but I get a LOT of BUFFER_OVER_RUN and unexpected irq errors in the log, and for large sizes (close to 4096) it will eventually fail with “spidev spi0.0: SPI transfer failed: -5” after a few seconds.

With DMA enabled (the default), the test will only run with a size <=16. Anything over 16 instantly fails with “spidev spi0.0: SPI transfer failed: -110”, “spi_master spi0: failed to transfer one message from queue”, and the output of the test code is simply “can’t send spi message: Connection timed out”

Hi suicidaleggroll,
It was my bad: I had the dmas property removed from the DT, which is why it worked for me.
And Andy had rx_buf commented out in the spidev test, which is why it worked for him with DMA.
But now we have fixed all the DMA-related issues with the spi driver. Can you try these two patches?

https://git.kernel.org/cgit/linux/kernel/git/agross/linux.git/commit/?h=spi-wip-v2&id=40bea4f5604d8d862d921f1ae0b613669726d1fd
https://git.kernel.org/cgit/linux/kernel/git/agross/linux.git/commit/?h=spi-wip-v2&id=ba08046f853a2eabd7d1fefb236d9e94e1dd3226

thanks,
srini

Nice! Everything is looking beautiful on my end. All of the sizes I’ve tried have worked correctly with no warnings/errors, and the data rate is now up where I expect. For anybody who’s interested, this is the average throughput I’m measuring for different sizes:

Size (bytes)   Throughput
16             160 kBps
32             265 kBps
64             500 kBps
128            850 kBps
256            1.31 MBps
512            1.78 MBps
1024           2.35 MBps
2048           2.92 MBps
4096           3.35 MBps


Hi,

I am very new to Linux and Android. I read the guide linked below to understand how to enable SPI, and I have some questions. Why does the guide tell us: “When the SPI bus is registered, create a slave device driver and register it with the SPI master”?

It looks like the slave driver is for a touch screen. Is this step just for testing? I am wondering if we need to connect hardware (a touch screen) to run this test.

thanks a lot.

– link to the guide I read for enabling SPI –
https://developer.qualcomm.com/qfile/28819/lm80-p0436-5_peripherals_programming_guide.pdf chapter 5

@gossiper

You can certainly attach hardware if you like. However, a minimum test would be to connect the MISO and MOSI lines together to form a hardware loopback. Then any data you write out of the SPI can be read back and compared.

One odd thing about SPI is that you clock in data as you clock it out. So even with nothing connected, you will get data in (at the logic level of the floating or pulled-up MISO).

Thanks, agross.

I am very new to these driver things, and the guide doesn’t tell me why we need to install a slave device driver. With the method you mention, the slave driver isn’t actually necessary? It’s just for testing (which of course would need the touch screen hardware attached)?

many thanks, I just want to make sure…

Hi everyone,

Thanks to all your information, I was able to get SPI on the LS expansion header sending and receiving data using libsoc. However, I’ve noticed that I wasn’t able to change the clock speed; it seems to be stuck at either 1 MHz or 5 MHz (measured with an oscilloscope). I am using this code, which is based off the libsoc SPI example: http://pastebin.com/K9zRVAGS.

If I try to set the speed to 1 MHz, 250 kHz, or 10 kHz and then read back the speed, it is what I expect. However, the physical clock pin seems to stay above 1 MHz.

I have tried setting spi-max-frequency in the device tree under spidev’s node but it seemed to have no effect. Is there a minimum clock speed? Am I missing something? Thanks.

I’m seeing a similar issue using the http://learn.linksprite.com/96-board/sliding-rheostat example modified for the HiKey. The user program sets the clock to 10 kHz, but I measured 500 kHz on the analyzer, which is also the max frequency stated in the DT. Is it possible that the DT “locks” the speed?

I recently added spidev on the HS connector as well, and there’s one thing I was ignoring before, but it’s happening on the HS connector too and has me curious: the spidev/spi-qup driver doesn’t seem to be operating the CS line on my boards. I’ve been having to drive it manually by exporting the GPIO and writing to it under /sys/class/gpio.

Is there something that needs to be added to msm8916.dtsi or another file in order to tell the SPI driver to control the CS line?

I did notice that in Qualcomm’s example dtsi file on page 64:

They have:
pinctrl-0 = <&spi0_default &spi0_cs0_active>;
and similar for pinctrl-1, while on the dragonboard we just have:
pinctrl-0 = <&spi3_default>;
or spi5, or what have you for the interface you’re looking at.

Is it as simple as adding &spi3_cs0_active and sleep to pinctrl-0 and pinctrl-1, or is there more to it than that?

Can you also try adding the cs-gpios property to the blsp_spi5 node (or whichever it is for the HS connector) and see if that works? Something like below, though I’m not sure the number designation is exact.


num-cs = <1>;
cs-gpios = <&gpio18 0 1>;

Thanks for the suggestion. Unfortunately I tried all the combinations I could think of with num-cs, cs-gpios, and the lines I mentioned earlier, as well as removing the CS GPIO from pinconf earlier in the dtsi file (since it’s called out explicitly in pinconf_cs a few lines later, I thought it might be causing a conflict), but none of them got CS to move.

In general, it’d be good to start new threads for new problems. Can you please specify which spi you are trying to use on the system? Or which pins, on which header?

Given your answers, I’ll provide some snippets of DT that should work for you.

I figured it was suitable here since it still has to do with getting spi running on Debian, but I can see your point.

Ultimately I’m using spi3 on the HS header, but my configuration has both of them enabled and I’m using spi5 on the LS header for testing since it’s easier to probe. If I can see how to get CS running on either of them, I should be able to port the changes to the other without much hassle. Thanks.

I changed

                pinmux_cs {
                        function = "gpio";
                        pins = "gpio18";
                };

to

                pinmux_cs {
                        function = "blsp_spi5";
                        pins = "gpio18";
                };

and now it drives the chipselect.

Unfortunately it created a new problem: the chip select goes high between each byte transferred, but the device needs CS to stay low for all 3 bytes that are read/written to the device.

I dug into the source of drivers/spi/spi-qup.c and found that the bit MX_CS_MODE is never set, hence the CS goes high between bytes. I added a line of code

      control |= SPI_IO_C_MX_CS_MODE;

at line 554, and now CS stays low for the entire transfer.