How to enable SPI and access it in Debian OS

So it looks like both spidev and spi-qup are designed to handle at least 4096 bytes per transfer, but spi-qup throws a buffer overrun error with anything more than 31 bytes. Is this a bug in spi-qup or elsewhere?

Apparently the problem is worse than I thought. While it will allow a single transfer of up to 31 bytes, back-to-back transfers still cause it to fail unless the size is reduced to 16 bytes or smaller. Add in a ~70 ns delay between bytes and a ~78 µs delay between transfers, and this drops the total throughput to a measly 200 kBps (1.6 Mbps) when running with a 50 MHz SPI clock.

I tested on an RPi2, and while it has a longer delay between transfers of ~140 µs and a slightly slower clock speed (41.67 MHz), it's able to do 4096 bytes per transfer and sustain a throughput of a hair over 4 MBps (32 Mbps), about 20x faster than the DragonBoard, with a significantly slower processor.

Something is definitely wrong with spi-qup on the current Debian release for the DragonBoard. It shouldn't be throwing buffer overflow faults when sending it more than 16 bytes per transfer.

Any idea what could be causing this?

Hi suicidaleggroll,
I'm happy to help you debug this issue, but to do that I'd like a few details from you so I can reproduce it on my side:
1> exact hardware setup details.
2> software instructions to reproduce the issue.

On another note, recently I tried to hook up an ENC28J60 SPI Ethernet module with level shifters to the DB410C and I could test it successfully; here is the log: http://paste.ubuntu.com/14680065/
and here are the kernel sources with patches https://git.linaro.org/people/srinivas.kandagatla/linux.git/shortlog/refs/heads/spi-debian-qcom-dragonboard410c-15.11 that I used for testing.

thanks,
srini

Sure thing.

The hardware is nothing special, just the 410c with SPI on the LS expansion connector hooked up to a logic analyzer for testing. Eventually it will be attached to a custom CPLD/SRAM FIFO that I designed and built, which will buffer data from a high-speed A/D so the 410c can pull it off when it gets around to it (avoiding the need for an RTOS on the 410c: all I need is an average transfer rate > 3 MBps to keep up, and the FIFO will buffer through any hiccups or delays).

For the RPi, just load the latest Raspbian image and use raspi-config to turn on spi, then reboot.

For the 410c, the build system is on OpenSUSE Tumbleweed, using the gcc-linaro-4.9-2014.11-x86_64_aarch64-linux-gnu compiler from here:
https://releases.linaro.org/14.11/components/toolchain/binaries/aarch64-linux-gnu/

The build process follows this guide:
http://builds.96boards.org/releases/dragonboard410c/linaro/debian/15.11/

With the following modifications:
in .config:
CONFIG_SPI_SPIDEV=m

in arch/arm64/boot/dts/qcom/msm8916.dtsi, in blsp_spi5 (line 1279):

spidev@0 {
   compatible = "spidev";
   spi-max-frequency = <50000000>;
   reg = <0>;
};

in arch/arm64/boot/dts/qcom/apq8016-sbc.dtsi, in aliases (line 29):

spi0    = &blsp_spi5;
spi1    = &blsp_spi3;

Once the boot image is built and loaded with fastboot, and the 410c is started, copy over spidev.ko and insert it.

Then build this code:
http://thesuicidaleggroll.com/hosting/drain.c

It started out as this:
http://free-electrons.com/kerneldoc/latest/spi/spidev_test.c
and I modified it slightly to just send the numbers 0-255 over and over again in configurable batch sizes. Use the “size” variable on line 294 to control the batch size. If it fails, it will also print out the number of batches it was able to successfully send before the failure.
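
For reference, the core of that modification boils down to something like the following. This is a minimal sketch rather than the actual drain.c, assuming the standard spidev ioctl interface and /dev/spidev0.0 as the device node; "size" plays the same role as in the post:

    #include <stdint.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/spi/spidev.h>

    int main(void)
    {
        int size = 16;                       /* batch size under test, 1-4096 */
        uint8_t tx[4096], rx[4096];
        unsigned long batches = 0;

        int fd = open("/dev/spidev0.0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        uint32_t speed = 50000000;           /* 50 MHz, as in the 410c test */
        ioctl(fd, SPI_IOC_WR_MAX_SPEED_HZ, &speed);

        for (int i = 0; i < size; i++)
            tx[i] = i & 0xff;                /* repeating 0-255 pattern */

        struct spi_ioc_transfer tr = {
            .tx_buf        = (unsigned long)tx,
            .rx_buf        = (unsigned long)rx,
            .len           = size,
            .speed_hz      = speed,
            .bits_per_word = 8,
        };

        for (;;) {                           /* send batches back-to-back until something fails */
            if (ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 1) {
                perror("can't send spi message");
                fprintf(stderr, "batches sent before failure: %lu\n", batches);
                return 1;
            }
            batches++;
        }
    }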

On the RPi, any size between 1-4096 works fine. With it set to 4096, the RPi clocks out the 4096 bytes continuously (no gap between bytes) at a rate of 41.67 MHz, plus a gap of ~140 µs between batches. This means it takes ~925 µs per 4096 bytes, which is a rate of ~4.2 MBps. When it's set to 4097 or higher, it throws the following error:
can’t send spi message: Message too long

which is the expected behavior.
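
As a cross-check on those RPi numbers, a quick back-of-the-envelope calculation follows; plain C, nothing board-specific, using the measured clock, transfer size, and inter-batch gap, and assuming zero gap between bytes as observed:

    #include <stdio.h>

    int main(void)
    {
        double clk_hz  = 41.67e6;  /* measured SPI clock on the RPi2 */
        double gap_s   = 140e-6;   /* measured gap between batches */
        int    bytes   = 4096;     /* bytes per transfer */

        double shift_s = bytes * 8.0 / clk_hz;  /* time to clock the data out, no inter-byte gap */
        double total_s = shift_s + gap_s;       /* one full transfer period */

        printf("%.0f us per transfer, %.2f MBps\n",
               total_s * 1e6, bytes / total_s / 1e6);
        /* prints roughly "926 us per transfer, 4.42 MBps"; the measured ~925 us and
         * ~4.2 MBps above are in the same ballpark, with real-world overheads on top */
        return 0;
    }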

On the 410c, any size between 1-16 works fine. With it set to 16, the 410c clocks out the 16 bytes at 50 MHz, with a gap of 70 ns between bytes, and a gap of ~78 µs between batches. This means it takes ~82 µs per 16 bytes, which is a rate of ~200 kBps. By my estimate, I need to be able to set the size to at least 650 in order to keep up with my desired data rate (a hair under 3 MBps). With it set to 4097, it throws the following error:
can’t send spi message: Message too long

which is the expected behavior. With any size between 17-31, it works initially but crashes within a few seconds. Any size between 32-4096 crashes it immediately. When it crashes, the output of the code is:
can’t send spi message: Input/output error

The output on the serial console is:
[51953.068850] spi_qup 78b9000.spi: OUTPUT_OVER_RUN
[51953.068972] spidev spi0.0: SPI transfer failed: -5
[51953.072568] spi_master spi0: failed to transfer one message from queue

And the output in /var/log/messages is:
Jul 4 09:12:08 linaro-developer kernel: [51953.068850] spi_qup 78b9000.spi: OUTPUT_OVER_RUN

When it crashes, the spidev and spi-qup kernel modules need to be removed and re-inserted (or the system rebooted) before SPI will work again.

@ljking Did you also have to change GPIO_CS 18 in the sample program to something else? I see no such number in /sys/class/gpio, only 0, 472, 476 and 480, so not sure which one to use instead.

Hi @vchong. No, I did not change GPIO_CS 18 in the sample program; that was working just fine. I did do one thing differently: I did not install the libsoc package (sudo apt-get install libsoc-dev), but instead downloaded the libsoc source from https://github.com/jackmitch/libsoc and rebuilt it from scratch. The latest libsoc has patches specific to the 410c board and it may be doing GPIO number translation (I'm not sure). Let me know if this works for you (I think this is what I did):

    sudo apt-get update
    sudo apt-get upgrade
    sudo apt-get install autoconf automake libtool
    git clone https://github.com/jackmitch/libsoc.git
    cd libsoc
    ./autogen.sh
    ./configure --enable-board=dragonboard410c
    make
    sudo make install

@ljking Thanks! I've looked at the libsoc_gpio.conf file for the db410c but see no 18 either. Can you please paste the output of your ls /sys/class/gpio?

My power-up /sys/class/gpio looks exactly the same as yours. There is no gpio18.

GPIO_18 is the correct GPIO. If you look at the 410c schematics (http://linaro.co/db410c-schematics) (page 5) you will see that GPIO_18 is the SPI chip select and it goes to the low-speed connector (page 29) at pin 12.

By default this SoC pin is not exported (same with most of the other GPIOs, for that matter). What libsoc does is first check whether the GPIO has been exported; if it has not (which is the case here), libsoc exports it by writing "18" to /sys/class/gpio/export (you can do this manually with echo). After the export you will find gpio18 in /sys/class/gpio. Finally, you can drive the CS high or low by writing a "1" or "0" to /sys/class/gpio/gpio18/value (again, you can do this manually with echo).
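
As a minimal sketch of those same steps from C (plain sysfs, no libsoc; the "18" and the levels simply follow the description above, so check them against your wiring, and run it with sufficient privileges to write the sysfs files):

    #include <stdio.h>

    /* Write a short string to a sysfs file, e.g. the gpio export or value nodes. */
    static int sysfs_write(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fputs(val, f);
        fclose(f);
        return 0;
    }

    int main(void)
    {
        sysfs_write("/sys/class/gpio/export", "18");             /* creates /sys/class/gpio/gpio18 */
        sysfs_write("/sys/class/gpio/gpio18/direction", "out");  /* make the pin an output */
        sysfs_write("/sys/class/gpio/gpio18/value", "0");        /* drive CS low */
        /* ... perform the SPI transfer here ... */
        sysfs_write("/sys/class/gpio/gpio18/value", "1");        /* drive CS high */
        return 0;
    }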

The libsoc GPIO config file only aliases GPIO_A through GPIO_L so that code written to one 96Boards SOC can be ported to another SOC. I suspect that there should be aliases added for things like the SPI chip select.

Full Disclosure: I am an employee of Qualcomm Canada, any opinions expressed in this or any other post may not reflect the opinions of my employer.

@suicidaleggroll

There are some issues that have been identified with the block mode of the spi-qup driver. For transactions over a specific size (31 bytes, I believe), block mode is used. DMA is not used unless a very specific set of prerequisites is satisfied (such as buffers aligned to the cache line size).

I worked up a set of patches that should fix this issue. I have not seen any errors in transactions since applying them. I was seeing multiple failures at various block sizes without these patches.

git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git
branch: spi-wip

8ec8b82 spi: qup: Fix block mode to work correctly
807582a spi: qup: Use correct check for DMA prep return
767532d spi: qup: Wait for QUP to complete
2c438ae spi: qup: Fix transaction done signaling

These four are the ones you need. Please let me know if this fixes your issue. I need to redo some of the commit messages, but based on testing from srini and hopefully you, I can send them to the lists soon.

Regards,

Andy

@agross

That's definitely made an improvement! (BTW, I think the most recent commit should be 698179c rather than 8ec8b82; I wasn't getting any results with 8ec8b82, and 698179c matches the name you gave.)

The problem hasn't been fixed completely; I can still trigger it by using a batch size that's a multiple of 16 and is at least 32 (e.g. 32, 48, 64, 80), but anything other than that (like 4095) appears to run indefinitely. I was getting some "unexpected irq" warnings in the debug console when using a size of 127, but it was very intermittent and I didn't get any hard failures. I haven't seen it with any other sizes so far.

That said, the data is coming out in bursts, with significant delays between groups. The pattern is odd, but repeatable:
18 bytes with 60 ns between each byte
5000 ns delay
1 byte
170 ns delay
3 bytes with 60 ns between each byte
430 ns delay
1 byte
170 ns delay
3 bytes with 60 ns between each byte
430 ns delay
1 byte
170 ns delay
3 bytes with 60 ns between each byte
430 ns delay
1 byte
170 ns delay
1 byte
10700 ns delay
It keeps up this pattern of 18 bytes, delay, 14 bytes, delay as it moves through the entire 4095-byte buffer, then there's an 84500 ns delay, and then the whole process repeats.

I know it can be hard to visualize it with numbers, so I took some screenshots as well:
4095-byte chunks:

18 and 14 byte pairs as it moves through the 4095 byte chunk:

zoomed in on one 18 and 14 byte pair:

With all of these delays, the total throughput is better than it was before, but still only ~1.2 MBps. Any ideas on how to improve this, or where these delays could be coming from? The big ones are the 5000 ns delay between the first 18 and last 14 bytes of a 32-byte chunk, and the 10700 ns delay between 32-byte chunks.

Thanks for your help with this.

@suicidaleggroll

To be clear, you get overruns/underruns? I'll try your cases that cause issues. My hardware has a SPI flash device on it and I have been testing using dd to the raw mtd device and also jffs2 to the block device. So when I send transactions, I typically get 2 transactions before my data to set up the address on the flash device. This is different from your case, where you are basically piping it out in bursts.

Let me explain what should be happening in the hardware. For transactions of fewer than 32 bytes, we use FIFO mode. This means that we have to read/write bytes to the FIFO one at a time. Each time we write a set, we check to see if the FIFO is full. If full, we wait until the next IRQ to fill more. So you'd typically see some number of bytes, a small gap, and some more bytes. The latency of handling the IRQ generally frees up more FIFO space, so we tend to send a number of bytes each time.

For transactions of 32 bytes or more, we load up the FIFO using blocks of data. The block size is predetermined in the hardware configuration. On my hardware, the block size is 16 bytes and the FIFO is 64 bytes; it may be larger on the 410c. We write out up to a block's worth of data, then check to see if it can take another block. We do this until it can no longer take any more blocks. Then we wait for the next IRQ. Rinse and repeat.

For DMA, it's different. We load up the transfer count and then set up the DMA. Once the DMA is complete, we wait for an IRQ from the QUP that lets us know the QUP is done dealing with the data. Even though the DMA completes, the data doesn't necessarily drain out of the QUP until after the IRQ.

For spidev, you'll find that you'll almost never do a DMA due to misaligned buffers; they have to be cache-line aligned. If you adjust the spidev code to allow for the DMA cache line size, it'll use DMA. That said, I need to figure out what is going on with block mode.
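
For illustration, aligning the buffers in the test program could look something like this. It is a sketch only; the 64-byte alignment is an assumed A53 cache line size, and whether DMA then kicks in still depends on the driver's other checks:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Assumed cache line size for this sketch; check the actual value for your SoC. */
    #define CACHE_LINE 64

    /* Allocate a zeroed, cache-line-aligned buffer to use as a spidev tx/rx buffer. */
    static uint8_t *alloc_aligned(size_t len)
    {
        void *buf = NULL;
        if (posix_memalign(&buf, CACHE_LINE, len) != 0)
            return NULL;
        memset(buf, 0, len);
        return buf;
    }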

BTW, I idle on IRC as agross on freenode. Hit me up if you use IRC and we can discuss more.

Regards,
Andy

Hi Andy/suicidaleggroll,

I think the problem is that the dma is not working as expected!

We will continue to debug this issue, but in the meantime could you try this patch (srinivas.kandagatla/linux.git, srini's kernel working tree)?

This patch basically removes the dma properties on the spi device.

With this patch I am able to test all the sizes with the dump program.
The reason batch sizes that are a multiple of 16 and at least 32 fail is that the driver would use DMA in all those cases; any other size is non-DMA.

thanks,
srini

OK. So let's try this again. Can you guys try out:

git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux.git
branch: spi-wip-v2

This should get DMA working solidly. It also adjusts the prerequisites for DMA on > v1 hardware. And then, once you test that, can you disable DMA by removing the dma lines from the .dts?

I want to see if block mode works fine for you on the 410c. On the apq8064/ipq8064, block mode doesn’t work at all.

Regards,

Andy

Hi Andy,
Thanks for the patches. These 2 patches work and I can confirm that they fix the reported DMA issue.

I will make sure that these patches get merged in the next Debian release.

Thanks,
srini

Odd, the new version isn’t working very well at all for me.

With DMA disabled using srini’s trick in msm8916.dtsi, the test will run for any size, but I get a LOT of BUFFER_OVER_RUN and unexpected irq errors in the log, and for large sizes (close to 4096) it will eventually fail with “spidev spi0.0: SPI transfer failed: -5” after a few seconds.

With DMA enabled (the default), the test will only run with a size <=16. Anything over 16 instantly fails with “spidev spi0.0: SPI transfer failed: -110”, “spi_master spi0: failed to transfer one message from queue”, and the output of the test code is simply “can’t send spi message: Connection timed out”

Hi suicidaleggroll,
It was my bad; I had the dmas property removed from the DT, which is why it worked for me.
And Andy had rx_buf commented out in the spidev test, which is why it worked for him with DMA.
But now we have fixed all the DMA-related issues in the SPI driver. Can you try these two patches?

https://git.kernel.org/cgit/linux/kernel/git/agross/linux.git/commit/?h=spi-wip-v2&id=40bea4f5604d8d862d921f1ae0b613669726d1fd
https://git.kernel.org/cgit/linux/kernel/git/agross/linux.git/commit/?h=spi-wip-v2&id=ba08046f853a2eabd7d1fefb236d9e94e1dd3226

thanks,
srini

Nice! Everything is looking beautiful on my end. All of the sizes I’ve tried have worked correctly with no warnings/errors, and the data rate is now up where I expect. For anybody who’s interested, this is the average throughput I’m measuring for different sizes:

size (bytes)    throughput
16              160 kBps
32              265 kBps
64              500 kBps
128             850 kBps
256             1.31 MBps
512             1.78 MBps
1024            2.35 MBps
2048            2.92 MBps
4096            3.35 MBps

Hi,

I am very new to Linux and Android. I read the following link to understand how to enable SPI and have some questions. Why does the guide instruct us: "When the SPI bus is registered, create a slave device driver and register it with the SPI master."?

It looks like the slave driver is for a touch screen. Is this step just for testing? I am wondering if we need to connect hardware (a touch screen) to run this test.

thanks a lot.

– link to the guide I read for enabling SPI –
https://developer.qualcomm.com/qfile/28819/lm80-p0436-5_peripherals_programming_guide.pdf chapter 5

@gossiper

You can certainly attach hardware if you like. However, a minimum test would be to connect the miso and mosi lines together to form a hardware loopback. Then any data you write out of the spi can be read back and compared.

One odd thing about SPI is that you clock in data as you clock it out. So even with nothing connected, you will get data in (at the logic level of the floating or pulled up MISO).
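
A minimal loopback check along those lines might look like the sketch below; it assumes spidev is enabled and the device shows up as /dev/spidev0.0 (adjust for your board). With MOSI wired to MISO, rx should match tx exactly; with nothing connected, rx shows whatever level MISO floats or is pulled to:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/spi/spidev.h>

    int main(void)
    {
        uint8_t tx[64], rx[64] = {0};
        for (int i = 0; i < (int)sizeof(tx); i++)
            tx[i] = i;                           /* known test pattern */

        int fd = open("/dev/spidev0.0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct spi_ioc_transfer tr = {
            .tx_buf        = (unsigned long)tx,
            .rx_buf        = (unsigned long)rx,  /* full duplex: capture what comes back */
            .len           = sizeof(tx),
            .speed_hz      = 1000000,            /* modest 1 MHz for a wiring check */
            .bits_per_word = 8,
        };

        if (ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 1) {
            perror("can't send spi message");
            return 1;
        }

        puts(memcmp(tx, rx, sizeof(tx)) == 0 ? "loopback OK" : "loopback MISMATCH");
        close(fd);
        return 0;
    }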

Thanks, agross.

I am very new to these driver things and the guide doesn't tell me why we need to install a slave device driver. With the method you mention, is the slave driver indeed not necessary? Is it just for testing (which would of course need the touch screen hardware attached)?

many thanks, I just want to make sure…