Code and Life

Programming, electronics and other cool tech stuff

Benchmarking Raspberry Pi GPIO Speed

UPDATE2: You may also want to check out my Raspberry 2 vs 1 GPIO benchmark!

UPDATED: 2015-02-15! This article has been very popular, so I’ve now updated all the benchmarks using the latest firmware and library versions. The scope has also been upgraded to a PicoScope 5444B with better resolution and bandwidth than the earlier models. :)

main2015

Don’t try this at home! Shorting GND and VCC with a probe might fry your Pi and more!

Method and Summary of Results

The basic test setup was to toggle one of the GPIO pins between zero and one as fast as possible. GPIO 4 was selected due to easy access and no overlapping functionality. This is basically the “upper limit” for any signalling one can hope to achieve with the GPIO pins – real-life scenarios where processing needs to be done would need to aim for some fraction of these values. Here are the current results:

| Language | Library | Tested / version | Square wave |
|---|---|---|---|
| Shell | sysfs (/sys/class/gpio) access | 2015-02-14 | 2.8 kHz |
| Shell / gpio utility | WiringPi gpio utility | 2015-02-15 / 2.25 | 40 Hz |
| Python | RPi.GPIO | 2015-02-15 / 0.5.10 | 70 kHz |
| Python | wiringpi2 bindings | 2015-02-15 / latest github | 28 kHz |
| Ruby | wiringpi bindings | 2015-02-15 / latest gem (1.1.0) | 21 kHz |
| C | Native library | 2015-02-15 / latest RaspPi wiki code | 22 MHz |
| C | BCM 2835 | 2015-02-15 / 1.38 | 5.4 MHz |
| C | wiringPi | 2015-02-15 / 2.25 | 4.1 – 4.6 MHz |
| Perl | BCM 2835 | 2015-02-15 / 1.9 | 48 kHz |
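To relate the square wave figures to the cost of a single call, note that one full period contains two pin writes (one high, one low). The sketch below converts a measured frequency into an average time per write. The helper name write_time_us is made up for this illustration, and the result includes loop overhead, so treat it as an upper bound on the cost of one write.

```python
def write_time_us(square_wave_hz):
    """Average time per pin write in microseconds.

    One period of the square wave holds two writes (high and low),
    so each write takes at most half a period.
    """
    period_s = 1.0 / square_wave_hz
    return period_s / 2.0 * 1e6


# Frequencies taken from the results table above
for name, hz in [("Python RPi.GPIO", 70e3), ("C native", 22e6)]:
    print("%s: %.3f us per write" % (name, write_time_us(hz)))
```

For the Python RPi.GPIO result this works out to roughly 7 microseconds per call, which gives a feel for how much time is left for real processing between writes.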

Shell script

The easiest way to manipulate the Pi GPIO pins is via the console. Here’s a simple shell script to toggle GPIO 4 as fast as possible (add sleep 1 after each of the two echo lines in the loop to get a nice LED toggle test):


#!/bin/sh

echo "4" > /sys/class/gpio/export
echo "out" > /sys/class/gpio/gpio4/direction

while true
do
	echo 1 > /sys/class/gpio/gpio4/value
	echo 0 > /sys/class/gpio/gpio4/value
done

As expected, the performance of this implementation is not good: a 2.9 kHz square wave can be generated using this method. For some reason, this figure has come down since 2012, when I measured 3.4 kHz; a firmware update might be the cause. For turning things on and off this is enough, but signalling is out of the question and even LED PWM is hardly feasible.

2015_shell2

Update: Note that I have my probes at 1:10 setting, so the actual voltage value is 10x what is displayed in the figures!
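The shell loop above reopens the value file for every echo. A reader reports in the comments below reaching 116 kHz through /sys/class/gpio from a compiled program, which suggests the shell rather than sysfs is the main cost here. As a rough sketch of the same sysfs writes with the file held open (not measured for this article; the function name toggle_sysfs is hypothetical):

```python
def toggle_sysfs(value_path, cycles):
    """Toggle a sysfs GPIO value file `cycles` times, leaving it low."""
    # Unbuffered binary mode so every write goes to the kernel immediately
    with open(value_path, "wb", buffering=0) as f:
        for _ in range(cycles):
            f.write(b"1")
            f.seek(0)  # rewind so the next write starts at offset 0
            f.write(b"0")
            f.seek(0)


# On a Pi with GPIO 4 already exported and set to "out":
#   toggle_sysfs("/sys/class/gpio/gpio4/value", 100000)
```

The path is passed in as a parameter so the function can be tried against an ordinary file on a machine without GPIO hardware.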

Shell with WiringPi gpio utility

WiringPi comes with the gpio command, but at 40 Hz its performance is roughly 70x slower than the plain shell, possibly due to the start-up delay of the executable. The code is a bit cleaner, though:


#!/bin/sh

gpio mode 7 out

while true
do
        gpio write 7 1
        gpio write 7 0
done

2015_shell_wiring
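A 40 Hz square wave means 80 launches of the gpio executable per second, or about 12.5 ms per launch. To get a feel for how expensive starting a process is, here is a minimal sketch that times a trivial child process. It launches a fresh Python interpreter as a stand-in for any short-lived executable, so the absolute number will differ from the gpio utility's; spawn_cost_s is a name invented for this example.

```python
import subprocess
import sys
import time


def spawn_cost_s(runs=5):
    """Average wall-clock time to start and finish a trivial child process."""
    start = time.perf_counter()
    for _ in range(runs):
        # Launch a fresh interpreter that exits immediately, standing in
        # for a short-lived executable such as the gpio utility
        subprocess.call([sys.executable, "-c", "pass"])
    return (time.perf_counter() - start) / runs


cost = spawn_cost_s()
print("process start-up: %.1f ms, at most %.0f launches per second"
      % (cost * 1e3, 1.0 / cost))
```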

Python with RPi.GPIO

One of the simplest ways to access the GPIO with a “real programming language” (sorry bashers :) is with the RPi.GPIO Python library. Installing it was simple: Just download the .tar.gz file, extract files and run python setup.py install. Our test script is simple as well:


import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)

GPIO.setup(4, GPIO.OUT)

while True:
    GPIO.output(4, True)
    GPIO.output(4, False)

The library performance has increased steadily. 0.2.0 was less than 1 kHz, but 0.3.0 already bumped this to 44 kHz. As of version 0.5.10, the rate has again increased, and is now around 70 kHz!

2015_python_RPi.GPIO 0.5.10

The improved performance in Python is probably enough for simple multiplexing and LED PWM applications. Note that the new version requires some additional steps in installation, namely getting the Python development kit with sudo apt-get install python-dev. I originally got errors while trying this, but upgrading my packages solved that problem.
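Without an oscilloscope, the average toggle rate can be estimated in software by timing a fixed number of cycles. The sketch below is one way to do that; measure_toggle_rate is a hypothetical helper, and it takes the two pin operations as callables so it works with any library. It reports only the average, not the short pauses visible on a scope, and the extra function call adds some overhead of its own.

```python
import time


def measure_toggle_rate(set_high, set_low, cycles=100000):
    """Return the average square wave frequency in Hz over `cycles` periods."""
    start = time.perf_counter()
    for _ in range(cycles):
        set_high()
        set_low()
    elapsed = time.perf_counter() - start
    return cycles / elapsed


# On a Pi, after GPIO.setmode(GPIO.BCM) and GPIO.setup(4, GPIO.OUT):
#   rate = measure_toggle_rate(lambda: GPIO.output(4, True),
#                              lambda: GPIO.output(4, False))

# Dry run with do-nothing callables: shows the overhead of the loop itself
print("empty loop: %.0f Hz" % measure_toggle_rate(lambda: None, lambda: None, 10000))
```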

Python with WiringPi2 bindings

Another alternative for Python is the wiringPi Python bindings. Installation requires cloning the respective version and installing python-dev and python-setuptools with apt-get.

I installed the newer WiringPi2-Python version. Earlier tests with the older version 1 gave a 19.5 kHz square wave. The new test with the wiringpi2 module has improved to 28 kHz:


import wiringpi2

io = wiringpi2.GPIO(wiringpi2.GPIO.WPI_MODE_PINS)

io.pinMode(7,io.OUTPUT)

while True:
    io.digitalWrite(7,io.HIGH)
    io.digitalWrite(7,io.LOW)

2015_python_wiringpi2

Ruby with WiringPi bindings

WiringPi also has Ruby bindings, which can easily be installed:


sudo apt-get install ruby-dev
sudo gem install wiringpi

Code is also very simple:


require 'wiringpi'

io = WiringPi::GPIO.new

while true do
        io.write(7,0)
        io.write(7,1)
end

Performance is close to that of the Python wiringpi2 version, with a square wave of around 21 kHz:

2015_Ruby_wiringPi

C: Maximum performance

The Raspberry Pi Wiki gives a nice C code example for true hardware-level access to the GPIO. The interfacing is slightly more difficult, but code isn’t too bad. I took the example program and simplified the main method after setup_io() to this:


// Set GPIO pin 4 to output
INP_GPIO(4); // must use INP_GPIO before we can use OUT_GPIO
OUT_GPIO(4);

while(1) {
  GPIO_SET = 1<<4;
  GPIO_CLR = 1<<4;
}

Without any optimizations, I got an excellent 14 MHz square wave. Adding -O3 to the compile command (gcc -O3 strobe.c -o strobe) increases the rate to a hefty 22 MHz. Measuring the waveform with an oscilloscope starts to require VERY short wiring between probe and ground, otherwise it just looks like a sine wave due to capacitance in the helper wires!

2015_C-O3
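The reason the loop above is so short is that the BCM2835 has separate "set" and "clear" registers: writing a 1 bit raises or lowers that pin, and 0 bits leave the other pins untouched, so no read-modify-write is needed. The sketch below computes which register and bit a given GPIO number maps to. The byte offsets 0x1C and 0x28 are the GPSET0 and GPCLR0 offsets also quoted in a reader comment below; the helper name set_clear_words is made up for this illustration.

```python
GPSET0_OFFSET = 0x1C  # byte offset of the first "set" register
GPCLR0_OFFSET = 0x28  # byte offset of the first "clear" register


def set_clear_words(pin):
    """Return (set_offset, clear_offset, mask) for a BCM GPIO number."""
    bank = pin // 32          # each 32-bit register covers 32 pins
    mask = 1 << (pin % 32)    # one bit per pin within the register
    return (GPSET0_OFFSET + 4 * bank, GPCLR0_OFFSET + 4 * bank, mask)


set_off, clr_off, mask = set_clear_words(4)
print(hex(set_off), hex(clr_off), hex(mask))
```

For GPIO 4 the mask is 1<<4, the same value the C loop writes to GPIO_SET and GPIO_CLR.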

C with BCM2835 library

Mike McCauley has made a nice C library called bcm2835 that can also be used to interface with the GPIO pins using C. Its installation was also quite easy: download, run the standard configure / make / make install commands and you’re good to go. The code is compiled with the -lbcm2835 linker flag to link against the library. Benchmark code looked like this (note that Broadcom GPIO 4 is pin 7 on the P1 header, hence the name RPI_GPIO_P1_07):


#include <bcm2835.h>

#define PIN RPI_GPIO_P1_07 // GPIO 4

int main(int argc, char *argv[]) {
    if(!bcm2835_init())
        return 1;

    // Set the pin to be an output
    bcm2835_gpio_fsel(PIN, BCM2835_GPIO_FSEL_OUTP);

    while(1) { // Blink
        bcm2835_gpio_write(PIN, HIGH);
        //delay(500);
        bcm2835_gpio_write(PIN, LOW);
        //delay(500);
    }

    return 0;
}

The performance is lower than in the earlier C example, but still solid: 5.4 MHz with the -O3 optimization flag. Definitely enough for most applications!

2015_C_bcm

C with WiringPi

Gordon Henderson has written an Arduino-like wiringPi library using C. It’s a popular one and quite easy to use. Here’s the simple test program:


#include <wiringPi.h>

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main() {
  if (wiringPiSetup () == -1)
    exit (1) ;

  pinMode(7, OUTPUT);

  while(1) {
    digitalWrite(7, 0);
    digitalWrite(7, 1);
  }

  return 0 ;
}

With the normal GPIO access method, the library already clocks an impressive 4.1 MHz square wave:

2015_C_wiringPi

There’s also a GPIO access method which involves calling wiringPiSetupGpio() instead of wiringPiSetup(), and using the normal GPIO numbering instead of wiringPi native renumbering system, so 7 becomes 4 in the above code. The performance is increased slightly to 4.6 MHz:

2015_C_wiringPi_gpio
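Since the two numbering schemes are easy to mix up, a small lookup can make the intent explicit. The sketch below records only the pairs that appear on this page: 7 and 4 from the text above, and 0/17 and 2/27 as quoted in a reader comment below. It is not a complete pin map, and the names WPI_TO_BCM and wpi_to_bcm are invented for this example.

```python
# wiringPi pin -> BCM GPIO number; only the pairs mentioned on this page
WPI_TO_BCM = {
    7: 4,    # the pin used throughout these benchmarks
    0: 17,
    2: 27,
}


def wpi_to_bcm(wpi_pin):
    """Translate a wiringPi pin number to the BCM GPIO number."""
    try:
        return WPI_TO_BCM[wpi_pin]
    except KeyError:
        raise ValueError("no mapping recorded for wiringPi pin %d" % wpi_pin)


print("wiringPi 7 is BCM GPIO %d" % wpi_to_bcm(7))
```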

Since 2012, the WiringPi performance has somewhat decreased, as I originally got 7.1 MHz from the GPIO access method. This might of course also be due to firmware changes (I am running the tests on a multitasking OS in an SSH shell, after all).

Also, a /sys/class/gpio based access method was provided, but it was a lot slower, running at 120 kHz on average (200 kHz). The wiringPi library also has Python, Ruby and Perl bindings. See above for the Python and Ruby results; I’d expect the Perl bindings to be on the same speed level.

Perl with BCM2835.pm

Mike McCauley has also made a Perl module that uses the above C library to provide GPIO access in our favorite language (who doesn’t love Perl?). For installation, I recommend skipping cpan command and searching for the latest version from CPAN, downloading the .tar.gz with wget, extracting, and running perl Makefile.PL / make / make install commands. Like it usually is, the Perl code isn’t pretty, but it does the job well:


use Device::BCM2835;
use strict;

Device::BCM2835::init() || die "Could not init library";

# Set RPi pin P1_07 (GPIO 4) to be an output
Device::BCM2835::gpio_fsel(&Device::BCM2835::RPI_GPIO_P1_07, 
                            &Device::BCM2835::BCM2835_GPIO_FSEL_OUTP);

while (1) { # Strobe
    Device::BCM2835::gpio_write(&Device::BCM2835::RPI_GPIO_P1_07, 1);
    Device::BCM2835::gpio_write(&Device::BCM2835::RPI_GPIO_P1_07, 0);
}

Compared to the Python version, the Perl module packs a bit more punch: 48 kHz square wave was achieved – enough for some PWM applications, if not quite enough for audio generation etc.

2015_Perl_bcm

As with the Python version, any tips to improve Perl execution performance are welcome! Interestingly enough, the 1.0 version achieved slightly better performance than the latest 1.9 version – around 59 kHz. The difference isn’t large enough to not upgrade, though.

Conclusion

Based on these simple benchmarks, I can conclude that shell is only usable for simple automation tasks etc., but at least Python/Ruby/Perl is needed for anything more complex such as LED PWM. Python with RPi.GPIO is the fastest of these, but Perl with BCM 2835 bindings comes close. For actual signalling applications, C seems like the only choice. I haven’t tried the C# and Java interfaces, but I’d expect them to be on the level of C and Perl, respectively, or a bit slower.
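To put the "enough for LED PWM" judgement into numbers, here is a rough, idealized estimate of how many duty-cycle steps software PWM could offer at a given toggle rate. It assumes the shortest possible pulse is one write interval and ignores the scheduling pauses discussed below, so real results will be worse. The helper name pwm_levels and the 100 Hz PWM frequency are assumptions chosen for illustration.

```python
def pwm_levels(square_wave_hz, pwm_hz):
    """Approximate number of distinct duty-cycle steps for software PWM."""
    # Two writes per square wave period, so this many write slots per second
    writes_per_second = 2 * square_wave_hz
    return int(writes_per_second // pwm_hz)


# Frequencies measured in the sections above
for name, hz in [("shell", 2.9e3), ("Perl", 48e3), ("Python RPi.GPIO", 70e3)]:
    print("%s: about %d steps at 100 Hz PWM" % (name, pwm_levels(hz, 100)))
```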

What is not evident from the snapshots, however, is that due to the multitasking nature of Linux, the GPIO manipulation is constantly interrupted for short periods when the CPU is doing something else, such as receiving or sending data over the network, writing log files, etc. For timing-critical work, the options are a realtime OS or the hardware-level support for I2C, UART and such. A good alternative is an independent add-on board with a microcontroller, such as an Arduino. Communicating over UART is simple with such devices.
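These interruptions can be made visible without a scope by recording how long each pass through a tight loop takes and comparing the typical gap with the worst one. The following is a minimal sketch of that idea; loop_gaps is a hypothetical helper, and the timer calls themselves dominate the typical gap, so only the ratio between the two numbers is meaningful.

```python
import time


def loop_gaps(iterations=200000):
    """Return (median_gap, max_gap) in seconds between loop iterations."""
    gaps = []
    last = time.perf_counter()
    for _ in range(iterations):
        now = time.perf_counter()
        gaps.append(now - last)
        last = now
    gaps.sort()
    return gaps[len(gaps) // 2], gaps[-1]


median_gap, max_gap = loop_gaps()
print("median %.2f us, worst %.2f us" % (median_gap * 1e6, max_gap * 1e6))
```

On a busy system the worst gap is typically many times the median, which is the same effect that would show up as occasional stretched pulses in the square wave.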

117 comments

florian:

nice work!
I am wondering how your measurement setup (e.g. probe-impedance) is. It seems that the signal is affected by some RC-filtering. Is the GPIO port max. level really only 350 mV? Seems a bit low to me.
Do you know the theoretical spec of the GPIO port?

jokkebk:

Actually now that you mention it, my scope is only rated for 10 MHz (sample rate 50 MS/s), so it might well be just the software doing interpolation with the 20 MHz signal.

The probes are at 1:10 which explains the ~340 mV value, the actual value is thus about 3.4V. In that setting, their impedance should be 10 Mohm.

Thanks for the comment, I’ll add some additional clarification to my post based on this!

pepper:

That was exactly what i have searched for! Nice article!

However, i wonder how stable the frequency is? Isn’t it affected by the scheduling of your thread?

Vinod S:

Hi,
so the top frequency obtained is 21.9MHz with -O3 option. I think this is done in user space.

If we are running the loop toogle test in the kernel space, may be we can obtain a very high value, is it?

Vinod S:

ooops.. not toogle but toggle ;-)

jokkebk:

I would guess (without better knowledge, though) that moving to kernel space would mainly eliminate some individual longer delays (it’s likely that every time the kernel does multitasking, the toggling stops for a microsecond or so), but not increase the frequency by much.

Stefan:

Hi jokkebk, i was wondering if you could state an version number of python library that you used, because from what i have read, there is a new version of python RPi library (3.0.1a) and it should pack a much more “punch” then the old one, it might be worth looking into (and update the python benchmark values if necessary)

jokkebk:

Hi! Good to know. I used the 0.2.0 version when doing my tests. I’ll try the newer one when I have the chance – now I just encountered errors when trying to compile it and have to get some sleep. :(

Stefan:

Sure take your time, i would benchmark it my self but my RPi is still stuck somewhere in the post office =<

jokkebk:

Hehe. I hope you’ll get yours soon, it’s really a great piece of equipment!

I’ve now updated the benchmark with results from 0.3.0 version. As you guessed, it’s a lot faster.

Mike:

I’m working on a 48 stage shift register board (24 inputs, 24 outputs), Shift out and read in at the same time either on two pins or one pin I switch back and forth between read and write using the 10k resister trick at this link http://robots.freehostia.com/Software/ShiftRegister/ShiftRegister.html

The plan is to have a daemon monitor an output file in ram (tmpfs) and write to an input file each containing a 24 bit unsigned integer. Sounds like a C-based daemon would be plenty fast!

jokkebk:

Sounds like a cool project. The C-based interface is definitely the fastest way to go currently. I just got a new scope and will be updating the measurements shortly.

Gordon Henderson:

That’s fantastic! I was able to get marginally faster myself with a bit of optimisation and overclocking (10MHz), but we’re really at the limits of general-purpose here. I have a little DSO quad and the waveforms at that speed really are on it’s limit of usability too.

And benchmarking a bash script with an oscilloscope – classic :-)

I’ve also been told that there is a real hardware limitation to the speed too – even if the loop were 3 lines of ARM assembler, then about 21MHz is what its going to top-out at, so there must be something else going on when we access hardware, but nice to know we can get there using C if we need to.

Issues I’ve seen myself (and from others) is that sometimes it’s too fast! e.g. scanning a matrix keypad – the pulse to the row isn’t long enough when you scan the columns – espeically if the keypad is at the end of a long bit of ribbon cable, so that’s something to be aware of too – there’s more to it than absolute speed!

Cheers,

-Gordon

jokkebk:

Great information, thanks for your post! Seems like ~20 MHz really is the limit. It’s a lot more than many microcontrollers, still, although the normal Linux isn’t a realtime OS, so some limitations. I’ll need to test reading speed and latency at some point, too. :)

Dan:

Thanks for the analysis, these are useful results. I’m curious to know what the CPU load of the C test at high frequency was?

jokkebk:

Good idea. I’ll try it out when I next have my RaspPi powered up and let you know. :)

Seryoga:

Greate job!
Did you test the reading speed and latency?
cheers Seryoga

Francisco:

Hi! These benchmarks are great! You’ve done a remarkable job, and gathered very useful information. I’d like to ask you a question, if I may. These tests were all performed with the GPIO as outputs, right? Any idea if the results are similar if used as inputs?

jokkebk:

Yes, this is output only. I haven’t tested the input speed (and likewise with outputs, there may be glitches with input speed so even if 10 million samples / sec could be achieved, there might be short periods of no polls at all, and this is even harder to verify with inputs than outputs)

jokkebk:

The C test with ~22 MHz uses almost 99 % of the CPU if the Pi is otherwise idle.

Multitasking works fine though, I was able to log in with another SSH shell and run “top” without any problems – it’s probably the square wave that suffers from this activity.

skynet:

Did you test GPIO speed in kernel space by writing yourself a little kernel? (CPU in real mode, no OS interrupts activated) You can find a little help at: http://www.cl.cam.ac.uk/freshers/raspberrypi/tutorials/os/

Robert Savage:

Here is a similar article that covers Java on the Raspberry Pi running on the various JVMs:
http://www.savagehomeautomation.com/projects/raspberry-pi-java-gpio-frequency-benchmarks.html

RichP:

The issue with many hw/sw design interactions is determinism, and this trumps raw speed in many designs. That is, if you start an operation, can you be sure not to be interrupted? An uninterruptible section of code must be created to ensure this; it is called a critical section. In its crudest implementation, it is formed by disabling interrupts, executing the real-time code, and then re-enabling the interrupts. Speed is nothing if there can be indeterminate interruptions at any time when interacting with high-speed hw.

Another way (e.g. as in an SQL database) is to have a common flag with an uninterruptible test-and-set operation, which forms a semaphore that locks a record from other processes. The scheduler then looks at this flag to delay scheduling the next thread. This is clearly problematic in Linux and is the reason for real-time OSes and for partitioning real-time tasks to dedicated MPUs (Arduinos) and hardware state machines (FPGAs).

In my experience, MPUs interacting with real-time hardware are best kept separate, as an Arduino might be. A simple I2C interface can link to as many Arduinos as needed with a very simple model: I2C is able to read/write the memory space while the semaphore flag in each CPU turns on/off access to the memory space. A symbol table is produced each time Arduino code is recompiled for each real-time MPU on the bus and feeds the code on the RPi.

The other alternative is to use a direct hardware state machine in fpga code or a very simple and fast soft-cpu inside an FPGA with an instruction set geared exactly to the task at hand. Xilinx has several soft-cpus that compile into their FPGAs.

Also, for high-speed, fast-development designs, SystemC can actually compile a specialized version of C into hardware for an FPGA or other targets. Many also use Verilog/VHDL to create machines from pseudocode.

The point of all this is … beware real-time code in a multithreaded OS, as it’s a debugging nightmare with all the possible varying interactions with hardware events (grows with n factorial). Keep it simple and partition real-time code to dedicated machines that cooperate with a master. This has saved me many times because it is common to spend more time in debug (or simulation) than in design. It’s also prudent for the master to have an upfront debugging role for the real-time slaves as a part of the initial design. Visibility into the slaves is the key to making real-time processes manageable and able to be properly sync’ed and optimized.

José Ricardo Borba:

Very nice and VERY useful post. My 2c is: try numba for python scripts. I’m not sure if exists an ARM version at this time, but it highly improves your code.

exco:

trying to reproduce your results … I only get about 1/5th of your benchmark results… odd. (3.6.11)

jokkebk:

Which language & library is that?

radioing:

No. The next step to increase the speed is to move from C to asm and optimize toggle routine for ARM core instruction set. You will be probably limited by waitstates generated by GPIO periphery in Broadcom CPU not by the CPU speed.

jokkebk:

Very good writeup! And I agree fully – one definitely should not expect much of GPIO determinism when running under Linux, unless there is a kernel-level (or MPU level) support for it, such might be the case with I2C or similar protocols. If certain latency or speed is always required, either realtime OS or separate device connected to the Pi (like Arduino) is definitely a better option.

jokkebk:

No, that one I didn’t do, it would require quite a bit of work to do that. :)

Zaid Pirwani:

While trying to find max speed of I2C, SPI and UART on Raspberry Pi, I found this… equally interesting and useful for me as my BE project is based on Raspi but I would love to have the info of speeds as well..
This is what I got till now:
UART: 476baud to 31.25Mbaud*
SPI: 3.8 kHz to 250 MHz*
I2C: 400kbps
*theoretical maximum

not tested on hardware as I don’t have the tools for it, if you got time, this is interesting to check out.. :)

BTW, Thanks for the different code examples and the benchmarks… these are GREAT

Manuel:

Hi,

You have done a pretty nice job, thanks for sharing!

We have performed a similar test to evaluate if the RaspPi can be useful for us at work. We got same results as you for python and C, but we also gave java a try.

– Using a standard VM from openJDK: 2KHz

– Using VM from Oracle SE embedded (ARMv6/7): 165KHz

We also examined the Arduino for comparision.

– Using standard IDE functions: 88KHz

– Using low level C functions (AVR standard C): 1MHz(*)

(*) we got 8MHz with a large program that toggled the output continuously instead of using a for/while loop.

Dave Cheney:

Hello,

I am developing a GPIO library in Go and recently did a similar test. Some basic results, running flat out in a loop

* using the /sys/class/gpio interface, I was able to get 116Khz
* using /dev/kmem to map in the GPIO control memory (this is the same approach libbcm2835 uses) I got 7.1 Mhz, but was able to increase this 12.2Mhz by inlining all the calls.

Afterwards I thought of a way to reduce the overhead even more, but was not able to measure the results. I’d estimate 14 to 16Mhz is possible, but only if you precalcuated all the addresses and shifts.

Tom:

Hello jokkebk,

thanks for this nice IO speed benchmark. I tried to reproduce your C language results and got the same for BCM2835 C library and Maximum performance.

Only exception: Compiling with -o3 does not change the IO speed, it is still 14 MHz. Do you have an idea what I can do to achieve your 22 MHz? I am using the Raspbian wheezy image.
Thanks, Tom

jokkebk:

Thanks! And nice to hear confirmation for the results. I don’t know what may have caused the speed bump with -O3, it might be an older version of the libraries I had at the time, I haven’t tested with recent distros.

J. F.:

Have you tried pypy (JIT)? It speeds python up to 10x. It might be worth a try.

fjw:

I wrote a wrapper of C solution for node.js:

https://github.com/fjw/node-fast-gpio

Thanks for your awesome benchmarks!

Marcus:

Thanks!!!
this help me so much!!

kayel:

Bravo for a great idea!
I would like to see the BBC Basic inline assembler tested under RISCOS. Any chance?

Joonas Pihlajamaa:

No, I don’t have any parts of that request at hand. :)

Alex:

FYI:
I just tried mono and C#. With direct Memory Access the GPIO speed was about 7.7Mhz. I didn’t use any wrappers, but unsafe code to directly address the GPIO Memory…
When using a wrapper around wiringPi, then the C# solution is about 200kHz…

Regards,

Alex

Neil Fazakerley:

GPIOs via BBC Basic/Assembler was speed tested here – 19.7MHz for output, 7.7Mhz for input:

maneesh:

i have a camera in wich video is coming on camlink interface with pixel clock of 40MHz(16 bit video data for each pixel) i want to use rpi to process and display the video can it be done with GPIO or with any interface of rpi

kayel:

Thanks for the information. I haven’t been back here for a while.My brother got 18.7Mhz for output on a home-made frequence meter he built just to test this.

Joonas Pihlajamaa:

I’d wager about zero chance of doing 40 MHz data capture or comms with GPIO, even without O/S it would be hard for 1 bit, let alone 16. If the Pi has a HW camlink bus, then it might be possible. But probably you need some extra HW to do that.

Gil Megidish:

Incredible information! Saved me a lot of time running these tests myself. I wish others would be so detailed :)

Rajan:

Hi,
We are instead trying to read external signal into Rasp Pi and with basic C code we are not getting reliable signals even at 100kHz. We are reading the data into a vector using digitalRead and later storing it.
The data looks irregular with missing 1’s when the input is a regular 100kHz signal. Any clue? Is reading into GPIO different than writing to from GPIO?
Thanks

Joonas Pihlajamaa:

I think it’s the same with output, the benchmark results only discuss average frequency, but not consistency – both are affected by Linux kernel which will probably launch its own interrupts several times a second, each lasting probably some microseconds at least.

Only way to use RaspPi for high frequency (anything more than few kHz I’d think) would be a realtime OS instead of Linux, or blocking all kernel interrupts (I encountered a piece of code when googling around, but not certain how long can you do it and what are the ill effects, e.g. will there be data loss if SD write or USB is interrupted or something like that). Or then use an external microcontroller that communicates with RaspPi via serial.

paddyg:

A while ago I did some more superficial tests with cython (see here http://www.raspberrypi.org/forum/viewtopic.php?p=446411#p446411); it would be interesting to see the oscilloscope reading for it. It wasn’t too hard to set up and was significantly faster than the RPi.GPIO method. One thing I noticed was that the Dom/Gert code was a bit faster than the standard C libraries if you just want to do input and output.

Ravikanth S:

Hey I just tried what you mentioned (a bare ASM program loaded on the SDCard). I could get only about 8.8MHz. But its strange that there are cases where even 22 MHz could be obtained through C.

Adam:

With a little bit of assembler inside an FIQ, I actually measure ~41.6 MHz for one complete high-low, low-high pin toggle with my Tek scope. So the true GPIO latency without higher level sugar is actually something around 12 nanoseconds…

Here’s the core portion my basic benchmark code:

.macro SINGLE_SHIFTED_BIT reg, bit
ldr \reg, =0x1
lsl \reg, \reg, \bit
.endm

SINGLE_SHIFTED_BIT R11, #15 // GPIO 15
str R11, [R9,#0x1C] // 0x1C = GPSET0 offset, R9 = GPIO pointer
str R11, [R9,#0x28] // 0x28 = GPCLR0 offset
str R11, [R9,#0x1C]
str R11, [R9,#0x28]
str R11, [R9,#0x1C]
str R11, [R9,#0x28]
str R11, [R9,#0x1C]
str R11, [R9,#0x28]
str R11, [R9,#0x1C]
str R11, [R9,#0x28]
str R11, [R9,#0x1C]
str R11, [R9,#0x28]
str R11, [R9,#0x1C]
str R11, [R9,#0x28]
str R11, [R9,#0x1C]
str R11, [R9,#0x28]

Joonas Pihlajamaa:

Cool! Thanks for sharing!

joan:

Using the PWM peripheral you can get a gpio to toggle in 4ns (so you could do 4ns on, 4ns off). A pretty pointless exercise.

http://abyz.co.uk/rpi/pigpio/code/nanopulse_c.zip

joan:

The pigpio library uses DMA to sample the gpios at up to 1MHz. It will certainly work at 100KHz.

http://abyz.co.uk/rpi/pigpio/

Jason:

Thanks a lot man! your an angel!!

Karthik:

The link to creating a bare-bones OS is fantastic! Thanks for the awesome job.

Oliver:

I can confirm that the absolute maximum of the GPIO speed is 25 MHz. I read and wrote an SRAM (5V) with that speed, but even with assembler it is not possible to speed it up further. Details here: http://d-fence.sytes.net/raspberry-pis-gpio-speed/

Andy:

Thanks for taking the time to perform this analysis.

Joao Matos:

Hello,

Fantastic benchmark.
Is it possible to update it to current software versions and RPi B+ and A+?

Thanks,

JM

Joonas Pihlajamaa:

Unfortunately I do not have a wide array of Pis in my possession. However, I would be really surprised if there were any differences between the hardware revisions, as the processor and chipset is identical in every Pi. The network/USB part has some minor changes but even network IRQ handling would only block signal generation for short periods, which would not change the benchmark results in a visible way.

I may revisit the benchmark at some point, but probably mainly to check how new SW versions have changed the situation.

lorenzo:

Hi, I tried the input limit of one pin with the WiringPi library. This is the code in C:

#include
#include
#include
#include
#include

#define PIN 3

while(1){
digitalRead(PIN);
}

The maximum speed without errors is around 100 kHz; going faster means other kernel processes will take the CPU and leave you blind for some microseconds.

Remember this is just a benchmark, putting some code after the read will reduce the speed.

I don’t understand why this huge difference in reading and writing… 100Khz and 6Mhz is a pretty big difference.

DW:

Great info. I’m using your example code to do some inline testing with a serial data connection. Dropping down to native c for the speed but my syntax is a little off.

Here is what is looks like in WirePi. What would be the equivalent in native C?

#include
#include

int main (void)
{
  int state;
  printf ("Raspberry Pi In & Out\n") ;

  if (wiringPiSetup () == -1)
    return 1 ;

  pinMode (0, INPUT) ; // aka BCM_GPIO pin 17
  pinMode (2, OUTPUT) ; // aka BCM_GPIO pin 27

  for (;;)
  {
    state = digitalRead (0); // Read it
    // printf ("Input is %d \n",state);
    // digitalWrite (2, 1) ; // On
    // delay (500) ; // mS
    // digitalWrite (2, 0) ; // Off
    // delay (500) ;
    // delay (1) ;
    digitalWrite (2, state) ; // forward input
  }
  return 0 ;
}

DW:

The first two includes got mangled on the copy / paste. They should be stdio.h and wiringPi.h.

Ferran:

Good job,

But I get in most tests (native C, BCM and wiringPi) the half of the frequency you achieved, or a little bit less than you.
I don’t know why. Any help? I need all the speed I can get from my gpios.

Joonas Pihlajamaa:

I redid the benchmarks this month and some figures were a bit lower with the newest firmware. C benchmarks also, except the fastest, which requires -O3 flag.

Steve:

Very interesting article and useful information. Have you done any tests with PHP scripts ?

Cold Diamondz:

If you didn’t know, the python RPi.GPIO module has a pwm class.
It is still started the normal way but you set a variable up with the class and you can change frequency as well as work time.

PulsedPower:

I just tried your “C: Maximum performance” code on a PI2. The results are 41.667MHz. Measured with a Tek MSO4104 & TDP1000 probe.

Joonas Pihlajamaa:

Wow, great, thanks for the information! Maybe I should get a PI2 myself…

uriandwubber:

What about Raspberry Pi 2?

Joonas Pihlajamaa:

Just got one yesterday, will likely do a similar benchmark shortly.

Mark Williams:

I am very eager to see the Pi 2 performance figures.

Joonas Pihlajamaa:

Yes I noticed that some libraries expose the PWM functionality. I left it outside the benchmark scope, as hardware PWM is independent of programming language performance, and of course a bit more limited on what you can do (no signalling, for example).

Lukas M:

Please consider benchmarking on javascript with onoff library for Node.js (or io.js). https://github.com/fivdi/onoff
Thanks (-;

Ch.Idea:

https://pypi.python.org/pypi/RPi.GPIO
In one of official page above is saying that there is no H/W PWM implemented yet and PWM function in the python library itself is S/W implementation.
This means that basically both PWM and output are the same and it’s not worth to try comparing them…

Ch.Idea:

Leaving note that the maximum GPIO clock from official chipset manual (https://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf) is 25MHz.
So, native C is almost there to the maximum..

Ch.Idea:

But, also in the manual, mash filter can be manually turned off to make higher frequency up to 125MHz.
Here’s another project that did this to make a FM radio. It used C mmap function to accomplish this.
http://www.icrobotics.co.uk/wiki/index.php/Turning_the_Raspberry_Pi_Into_an_FM_Transmitter

TD:

Could you test pigpio (http://abyz.co.uk/rpi/pigpio) also?

IanH:

I’ve split the code samples on the eLinux wiki into a separate page, as the ‘Low level peripherals’ page was getting a bit large.

Code samples are now at:
http://elinux.org/RPi_GPIO_Code_Samples

Thanks
Ian

Sam:

Did you try PWM mode in Python with WiringPi? In theory that should allow MHz.

Joonas Pihlajamaa:

No, PWM would have little point as a benchmark, as once it is set up, the performance should be the same regardless of programming language, and only tied to clock speed.

Shogan:

Hey mate,
in python 2.7 for loops are faster than while loops.
Can you measure using?

while True:
    for i in xrange(4294967295): #2^32-1
        G.output(4,True)
        G.output(4,False)

Also make sure to use xrange, not range.
Cheers

Joonas Pihlajamaa:

No. Longer answer: Based on a quick test, a raw while True -loop with counter increment and test can do about 600 000 iterations per second on a RaspPi. With 70 kHz strobe it means the Pi spends less than 10 % (most likely even less than 5 %) of its time processing the while statement, and over 90 % of time doing the strobe. So if the for loop on Python 2.7 is 2x faster, it would only change the result by less than 5 %.

My personal guess would be that the optimized for-loop version might give a 1 % speed boost at best. For some reason, a quick timing of a for loop and a while loop actually gave me worse results for the for loop…
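To put numbers on the loop overhead discussed above, here is a minimal sketch that times an empty for loop against a while loop with a counter increment and test. It is written for Python 3 (so range rather than Python 2.7's xrange), the function names are made up for illustration, and the figures will vary with the interpreter version and the Pi model.

```python
import time


def time_for_loop(n):
    """Return the seconds taken by an empty for loop of n iterations."""
    start = time.perf_counter()
    for _ in range(n):
        pass
    return time.perf_counter() - start


def time_while_loop(n):
    """Return the seconds taken by a while loop with counter increment and test."""
    start = time.perf_counter()
    i = 0
    while i < n:
        i += 1
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1000000
    for name, fn in (("for", time_for_loop), ("while", time_while_loop)):
        elapsed = fn(n)
        print("%s loop: %.0f iterations per second" % (name, n / elapsed))
```

Comparing either figure with the measured toggle rate gives a rough idea of how much of each period goes to the loop itself rather than to the GPIO calls.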

W Sanders:

PWM with RPi.GPIO produces some flicker, particularly at short (<10%) duty cycles. As I understand it, the PWM routines use software timing for the PWM, not the hardware.

John:

I believe it can go beyond the results gathered. There’s a Pi program called PiFM that is used to transmit FM signals from 88 MHz to 108 MHz, so GPIO 4 can definitely reach >100 MHz. The Pi runs at a 700 MHz clock speed, so without the interrupts it would be possible to achieve GPIO speeds close to 700 MHz.

LFF:

Hi, this article is great. Have you had a chance to test the Java library (pi4j)? I made a small Java class to test it, but the result is very disappointing. On a Raspberry Pi 2, it can only toggle the GPIO 1300 times per second, which is 1.3 kHz. I find it quite strange that it is much slower than Python or Ruby. (I think it is reasonable that Java is slower than C, but it is not reasonable that Java is so much slower than Python or Perl.)

I have attached my test class. I will also do some investigating.

import com.pi4j.io.gpio.*;

/**
 * Created by LFF on 2015/11/7.
 */
public class SpeedTest {

    private static GpioPinDigitalOutput DIO;

    static final GpioController gpio = GpioFactory.getInstance();

    public static void main(String[] argu) {
        init();
        test();
        gpio.shutdown();
    }

    private static void test() {
        long l0 = System.currentTimeMillis();
        int count = 10000;
        for (int i = 0; i < count; i++) {
            DIO.high();
            DIO.low();
        }
        long time = (System.currentTimeMillis() - l0);
        System.out.println(time + " used.");

        float per = time / (float) count;

        System.out.println(per + " per toggle, " + (1000 / per) + " per second");
    }

    private static void init() {
        // provision the pin as a digital output, initially high
        DIO = gpio.provisionDigitalOutputPin(RaspiPin.GPIO_04, "dio", PinState.HIGH);
    }
}
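A software-timed test like the one above is sensitive to how the measurement is taken: a millisecond clock over only 10000 iterations is fairly coarse, and the first iterations may run slower than the rest. As a rough sketch of one way to reduce those effects (in Python rather than Java, with a made-up function name and no-op stand-ins for the real pin calls), the loop can be warmed up first and timed with a high-resolution clock:

```python
import time


def measure_toggle_rate(set_high, set_low, iterations=100000, warmup=1000):
    """Return full high/low cycles per second for the given pin callables."""
    # Untimed warm-up so one-off start-up costs are not counted
    for _ in range(warmup):
        set_high()
        set_low()
    start = time.perf_counter()
    for _ in range(iterations):
        set_high()
        set_low()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very short runs
    return iterations / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # No-op stand-ins; on a Pi these would be the library's pin calls
    rate = measure_toggle_rate(lambda: None, lambda: None)
    print("%.0f cycles per second (no-op pin calls)" % rate)
```

A scope reading remains the more trustworthy figure, since a software timer cannot show the gaps caused by the O/S scheduling other tasks.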

Maxim Kamensky:

http://1drv.ms/1YtsSxK

Only 33,(3) On Pi2

rasz_pl:

you can go >100MHz if you abuse parallel Video port
DPI is used by VGA666, but you can hijack it for (18 pins) output GPIO by filling screen buffer with your data on Vblank. As a bonus you get reliable timing with no jitter and smooth frequency setting

https://github.com/fenlogic/vga666

Jonathan Perkin:

One easy way to improve the speed of the shell version is to use a file descriptor rather than opening the “value” file each time. Try benchmarking this instead (and be sure to use /bin/dash and not /bin/bash for additional speed):

#!/bin/sh

echo "4" >/sys/class/gpio/export
echo "out" >/sys/class/gpio/gpio4/direction

# open the value file once, on file descriptor 3
exec 3>/sys/class/gpio/gpio4/value

while true
do
	echo 1 >&3
	echo 0 >&3
done
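The same idea, opening the value file once and reusing the handle, can be sketched in Python. This is only an illustration under assumptions: the function name is made up, the demo writes to a temporary file so it runs anywhere, and on a Pi the path would be /sys/class/gpio/gpio4/value after the export and direction steps from the script.

```python
import os
import tempfile


def toggle_with_open_handle(path, cycles):
    """Write 1 and 0 to path `cycles` times, opening the file only once."""
    # buffering=0 makes every write() call go straight to the file
    with open(path, "wb", buffering=0) as value:
        for _ in range(cycles):
            value.write(b"1\n")
            value.write(b"0\n")


if __name__ == "__main__":
    # Demonstrated on a temporary file instead of the sysfs value file
    fd, path = tempfile.mkstemp()
    os.close(fd)
    toggle_with_open_handle(path, 3)
    with open(path, "rb") as f:
        print(f.read())
    os.remove(path)
```

The saving comes from skipping the open and close on every write, which is what the shell redirection does when it names the file each time.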

walter pinto:

Hi, you could try an asm blinking program to test the speed. I think it may be the fastest, but you’re right that the Raspi OS isn’t an RTOS and is constantly being interrupted by other tasks.

Sorry for my English, thanks for your work.

Ravi Gupta:

Would you be interested in doing similar tests for interrupt handling. Say a simple pulse counter?

Joonas Pihlajamaa:

I might be, what would the exact test setup be? Generate a square wave on a microcontroller/signal generator, and have an input loop on the Pi side and record the transitions? Due to O/S lag, probably with a 1 kHz test signal (2k transitions) and the Pi achieving 2+ kHz read speed, there would be a 1-5 % loss of transitions. Or does the Pi have an interrupt which could be used to count transitions?

Ravi Gupta:

The Pi has interrupt handling, so the question is how close interrupt triggers can be before they get missed. For example, if we are monitoring RPM using the interrupt handler, at what pulse frequency do we start to miss triggers? Google "raspberry pi interrupts".

Further: how long does the pin read take? Is there a minimum period for which a pin must be kept stable for a read to complete?

So there are two aspects: the frequency of the square wave and the duty cycle.

Real application: a sensor/instrument triggers an interrupt telling the Pi that data is ready to be read. How long must the data be kept stable before it is changed?
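A pulse counter of the kind described could be sketched roughly as follows. It assumes the RPi.GPIO library and its edge-detection callback; the class name, the attach helper and the choice of pin 17 are made up for illustration, and the sketch says nothing about the rate at which edges start to go missing.

```python
import threading


class PulseCounter:
    """Counts edges reported by an interrupt-style callback."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def on_edge(self, channel):
        # Called once per detected edge; keep the work here minimal
        with self._lock:
            self._count += 1

    def take(self):
        """Return the count since the last call and reset it."""
        with self._lock:
            count, self._count = self._count, 0
        return count


def attach(counter, pin=17):
    """Hook the counter to rising edges on a BCM pin (pin 17 is arbitrary)."""
    import RPi.GPIO as GPIO  # imported here so the class also works off-Pi
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)
    GPIO.add_event_detect(pin, GPIO.RISING, callback=counter.on_edge)
```

Feeding in a known frequency and calling take() once per second would show where the counted edges start to fall short of the generated ones.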

Ravi Gupta:

We are building a weather station for our club.
Anemometer – Will generate pulses fed into a ripple counter. Too few pulses per revolution leads to degraded low speed accuracy. Too many pulses per revolution may lead to high speed errors on the Pi. Yup, there are lots of solutions and we will likely take the trial and error approach.
Wind Vane – If you watch a weather station wind vane it almost never stays still. We are using an optical disk setup with latching and a monostabilizer.
At the end of the day I’m sure the Pi performance will not be an issue but since I came across your work I figured I’d ask :)

egorks:

hi! nice article.

did you try any tests with real-time kernels for the raspbian? i wonder how much would it help if prio’s are set right.

and also: what was the CPU load when doing above tests? 100%?

Mick:

Would love to see this redone with the multicore high speed Pi 3.

Johann CAMBOLY:

How to get 100MHz ? Give the program

jonathan scott james:

does anyone here know assembly?
what happens if you give it a block output command?
and
how fast can it read the gpio?
say i want to put a million sps conversion adc on the gpio with a 50 MHz oscillator, how many samples can i store in a burst?

Michael:

Toggling GPIO pin 20 on a 2016 Raspberry Pi 3 Model B running on Win 10 Core IoT 10.0.14393.67 delivers a maximum of around 230 kHz with the following c# code:
long i = 0;
long kHz;
long HiResTimeStamp1 = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
while (i < 1000000)
{
    pin.Write(GpioPinValue.High);
    pin.Write(GpioPinValue.Low);
    i++;
}
long HiResTimeStamp2 = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
// toggles per millisecond equals kHz
kHz = (i / (HiResTimeStamp2 - HiResTimeStamp1));

Haven't got a scope so can't provide any waveforms… :-(

ronal erquinigo:

Nice article. Do you know how I can set up a real-time OS on the Raspberry Pi? What options do I have?

Thank you

Joonas Pihlajamaa:

No idea on those, sorry!

Max Mustermann:

Really helpful article! Thanks!

Nathan Neulinger:

Would be very interested in seeing an update to your benchmark using a simplistic kernel driver for the pin access instead of in user space, as that would give an indication of the maximum performance that could theoretically be delivered by the Pi hardware in any practical implementation.

Ali:

pigpio on an RPi 2 using the hardware PWM function can do a clean 1 MHz, but the signal gets distorted above that. Although you can go up to 30 MHz, your square wave will be distorted. http://abyz.co.uk/rpi/pigpio/python.html#hardware_PWM
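For reference, a minimal sketch of starting hardware PWM through pigpio's Python module might look like this. It assumes the pigpiod daemon is running and that GPIO 18 is used as the PWM-capable pin; the two helper function names are made up for illustration, while pigpio.pi(), connected and hardware_PWM() come from the pigpio Python interface linked above.

```python
def duty_to_pigpio(fraction):
    """Convert a 0.0-1.0 duty cycle to pigpio's 0-1000000 scale."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    return int(round(fraction * 1000000))


def start_hardware_pwm(gpio=18, frequency_hz=1000000, duty=0.5):
    """Start hardware PWM via the pigpio daemon and return the connection."""
    import pigpio  # needs the pigpiod daemon to be running
    pi = pigpio.pi()
    if not pi.connected:
        raise RuntimeError("could not connect to pigpiod")
    # hardware_PWM takes the GPIO, the frequency in Hz and the scaled duty
    pi.hardware_PWM(gpio, frequency_hz, duty_to_pigpio(duty))
    return pi
```

Calling pi = start_hardware_pwm() would request a 1 MHz, 50 % square wave; pi.hardware_PWM(18, 0, 0) followed by pi.stop() switches it off again. As noted in the earlier comments, this measures the PWM peripheral rather than the speed of the Python code.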

Lakshmanan:

Can someone help me get Python code to run on a Raspberry Pi Model B for generating a 1 MHz frequency?

Ulaş:

I could not see a direct register access example in C. We can access the registers directly, so is it possible to bypass the Linux kernel, meaning there would be no dependency on Linux for blinking an LED?

jonathan scott james:

I wonder how much difference the RPi 3 would make. On an RPi 3 I got Python to do bursts of well over 280kc with a simple edited python3 script with room a small tabulator program; it was actually switching at 400 with the Raspbian Stretch default clock speed.

Simon:

Hi Joonas,
Did you (or someone else) try with 2 GPIOs?
Is the square wave frequency divided by two?
Example with Python RPi.GPIO:

while True:
    GPIO.output((4,5), True)
    GPIO.output((4,5), False)

With a Raspi 3?

William Cerniuk:

Excellent work, thank you very much for taking the time and publishing.

Jack:

Ur not mixing hardware pwm with software pwm right ? That’d make the article useless…

Joonas Pihlajamaa:

Goodness no. It’s actually discussed in the earlier comments. :)

LISSANDRO BASSANI:

It is a bit disappointing to find that a 700 MHz processor can only put out a 22 MHz square wave using C. Does anyone have some bare metal figures for it, or figures using a lighter O.S. or with ints disabled or so?

Patrick J Kramer:

The fastest possible way to toggle a GPIO pin indefinitely takes 11 assembly instructions:

set_pin:
    ldr r3, [sp, #8]       // virt GPIO base
    add r3, r3, #0x1C      // GPSET0
    ldr r2, [r3]           // get content of GPSET0
    orr r2, r2, #1<<18     // set bit for PIN 18
    str r2, [r3]           // set PIN 18 @ GPSET0
clear_pin:
    ldr r3, [sp, #8]       // virt GPIO base
    add r3, r3, #0x28      // GPCLR0
    ldr r2, [r3]           // get content of GPCLR0
    orr r2, r2, #1<<18     // set bit for PIN 18
    str r2, [r3]           // clear PIN 18 @ GPCLR0
    b set_pin              // branch to beginning of loop

The max theoretical rate @ 700 MHz would be approximately 63.6 MHz. 22 MHz is about a third of that, which is actually pretty good considering the OS overhead. It would be interesting to see how fast this runs on the same OS as the fast C code example; I bet they are pretty close.
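The 63.6 MHz figure follows from dividing the clock by the instruction count, which can be written out as a small calculation. It rests on the simplifying assumption that every instruction takes exactly one clock cycle, with no memory or peripheral bus stalls; the function name is made up for illustration.

```python
def max_toggle_frequency_hz(cpu_hz, instructions_per_period):
    """Upper bound on the square wave frequency for a given loop length.

    Assumes one instruction per CPU clock cycle and no stalls, which is
    optimistic for loads and stores to peripheral registers.
    """
    return cpu_hz / instructions_per_period


if __name__ == "__main__":
    bound = max_toggle_frequency_hz(700e6, 11)
    measured = 22e6  # the native C result from the table
    print("theoretical bound: %.1f MHz" % (bound / 1e6))
    print("measured / bound: %.2f" % (measured / bound))
```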

mutluit:

Operating Frequency
The maximum operating frequency of the General Purpose clocks is ~125MHz at 1.2V but
this will be reduced if the GPIO pins are heavily loaded or have a capacitive load.

(BCM2835 ARM Peripherals, pdf, p. 106)

Mav:

Does the same benchmark exist for testing input speed vs. language/RasPi?

Joonas Pihlajamaa:

No, unfortunately not, that would be a lot more complex endeavour. One possibility, if you have a variable speed square wave generator, would be to see how many edges are reliably counted. But due to Linux O/S delays, some edges would always be missed (much like the generated square wave on outputs does…).

Jukka:

How about testing these numbers with RPi3 and the new 4?

Yu Jie:

Hi, I’m very new to the Pi. In the maximum performance C test, where exactly do I paste this section:

// Set GPIO pin 4 to output
INP_GPIO(4); // must use INP_GPIO before we can use OUT_GPIO
OUT_GPIO(4);

while(1) {
    GPIO_SET = 1<<4;
    GPIO_CLR = 1<<4;
}

I’m not sure, as the int main() appears before the void setup_io()