
Don’t try the above setup at home: my Raspberry Pi rebooted when I was removing the alligator clip!
Once I broke ground on Raspberry Pi hacking with the UART tutorial, I decided it would be interesting to see just how capable the GPIO offered really was. Considering I had a Picoscope at hand, I chose to see how fast those GPIO pins really are under various programming environments.
The basic test setup was to toggle one of the GPIO pins, namely the GPIO 4 (it was easily accessible with my adapter and didn’t interfere with UART) and see what frequency square wave could be achieved. This is basically the “upper limit” for any signalling one can hope to achieve with the GPIO pins – likely real-life scenarios where processing needs to be done would aim for some fraction of these values.
Here’s a cheat sheet of the current benchmark results:
| Language | Library | Version / tested | Square wave |
|---|---|---|---|
| Shell | sysfs (/sys/class/gpio) access | not applicable / July 3, 2012 | 3.4 kHz |
| Python | RPi.GPIO | 0.3.0 / August 1, 2012 | 44 kHz |
| Python | wiringPi | github @ August 14, 2012 | 20 kHz |
| C | Native library | not applicable / July 3 and August 14, 2012 | 14-22 MHz |
| C | BCM 2835 | 1.3? / July 3, 2012 | 4.7 – 5.1 MHz |
| C | wiringPi | not available / August 14, 2012 | 6.9 – 7.1 MHz |
| Perl | BCM 2835 | 1.0 / July 3, 2012 | 35 kHz |
Note: The earlier test images were taken with longer leads, so the high-frequency waveforms exhibit some roundness and overshoot that would not be there without the jumper wires I originally used to avoid resetting the Pi when poking the GPIO pins directly with a probe. So don’t read too much into the waveform shape. See the section on C benchmarks, which have been redone with a shorter ground lead, and you’ll see that even at 22 MHz the generated square wave is quite nice.
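To put the table in perspective: a square wave of frequency f needs two pin writes per period (one high, one low), so each write takes at most 1/(2f). The Python sketch below converts the cheat sheet figures into an approximate time per write. The function name is made up for this illustration, and the result lumps the loop overhead together with the GPIO call itself.

```python
# Rough conversion from measured square-wave frequency to time per pin write.
# Assumes one period consists of exactly two writes (one high, one low).

def seconds_per_write(square_wave_hz):
    """Upper bound on the time one GPIO write takes, loop overhead included."""
    return 1.0 / (2.0 * square_wave_hz)

# Figures copied from the cheat sheet (lower bound where a range was given)
results_hz = {
    "Shell script": 3.4e3,
    "Python RPi.GPIO 0.3.0": 44e3,
    "Python wiringPi": 20e3,
    "C native": 14e6,
    "C bcm2835": 4.7e6,
    "C wiringPi": 6.9e6,
    "Perl bcm2835": 35e3,
}

if __name__ == "__main__":
    for name, hz in results_hz.items():
        print("%-24s %10.3f us per write" % (name, seconds_per_write(hz) * 1e6))
```

By this estimate the shell script spends about 147 µs per write, while the unoptimized native C loop needs about 36 ns.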
Shell script
The easiest way to manipulate the Pi GPIO pins is via console. Here’s a simple shell script to toggle the GPIO 4 as fast as possible (add sleep 1 after both to get a nice LED toggle test):
#!/bin/sh
echo "4" > /sys/class/gpio/export
echo "out" > /sys/class/gpio/gpio4/direction
while true
do
    echo 1 > /sys/class/gpio/gpio4/value
    echo 0 > /sys/class/gpio/gpio4/value
done
As expected, the performance of this implementation is not good: a 3400 Hz square wave can be generated using this method. For turning things on and off this is enough, but signalling is out of the question and even LED PWM is hardly feasible.
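To see why LED PWM is marginal at this speed, consider a naive software PWM in which the smallest time step is one pin write. The sketch below is a rough estimate under that assumption; the 256 brightness levels and the function name are choices made for this illustration, not something measured here.

```python
# Back-of-the-envelope software PWM rate, assuming the shortest possible
# on/off step equals the duration of one pin write.

def soft_pwm_hz(square_wave_hz, levels=256):
    """PWM repetition rate if one PWM period spans `levels` write-sized steps."""
    writes_per_second = 2.0 * square_wave_hz  # two writes per square-wave period
    return writes_per_second / levels

if __name__ == "__main__":
    print("shell script, 256 levels: %.1f Hz" % soft_pwm_hz(3.4e3))
    print("shell script, 16 levels:  %.1f Hz" % soft_pwm_hz(3.4e3, levels=16))
```

With 256 levels the shell script would manage a PWM rate of only about 27 Hz, which is low enough for an LED to flicker visibly. Fewer brightness levels raise the rate at the cost of resolution.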
Update: Note that I have my probes at 1:10 setting, so the actual voltage value is 10x what is displayed in the figures!
Python
One of the simplest ways to access the GPIO with a “real programming language” (sorry bashers :) is with the RPi.GPIO Python library. Installing it was simple: Just download the .tar.gz file, extract files and run python setup.py install. Our test script is simple as well:
import RPi.GPIO as GPIO
GPIO.setmode(GPIO.BCM)
GPIO.setup(4, GPIO.OUT)
while True:
    GPIO.output(4, True)
    GPIO.output(4, False)
If you expected the Python library performance to be any better, prepare for a disappointment: with version 0.2.0, the above script achieved only a 900 Hz square wave, meaning that you can’t do anything fancy with that version. Update: While the 0.2.0 version’s performance was terrible, the newer 0.3.0 version is significantly faster: a 44 kHz square wave could be generated. The diagram below is updated for version 0.3.0, see the older version’s results here.
I was really surprised by the lackluster performance of the Python implementation. I’m not very familiar with Python optimizations, so if someone can suggest any improvements to the code or execution parameters, drop me a line!
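One commonly suggested Python micro-optimization is to bind frequently called functions to local names, which avoids repeated module attribute lookups inside the loop. The harness below is a minimal sketch of that idea, assuming Python 3 for time.perf_counter. The function name and the stand-in write function are hypothetical, and whether the local binding makes a measurable difference on the Pi is untested.

```python
import time

def measure_toggle_rate(write, pin, duration=1.0, batch=1000):
    """Toggle `pin` through write(pin, level) for about `duration` seconds
    and return the average square-wave frequency in Hz."""
    clock = time.perf_counter  # local name instead of a module attribute lookup
    periods = 0
    start = clock()
    deadline = start + duration
    while clock() < deadline:
        for _ in range(batch):  # consult the clock only once per batch
            write(pin, True)    # `write` is a local name as well
            write(pin, False)
        periods += batch
    return periods / (clock() - start)

if __name__ == "__main__":
    # Stand-in so the sketch runs anywhere, without GPIO hardware
    def fake_write(pin, level):
        pass

    print("%.0f Hz (no hardware attached)" % measure_toggle_rate(fake_write, 4, 0.2))
```

On the Pi, the call would be measure_toggle_rate(GPIO.output, 4) after the GPIO.setup() lines of the script above. A software count like this only gives the average rate; it says nothing about jitter, which is what the oscilloscope shows.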
The improved performance in Python is probably enough for simple multiplexing and LED PWM applications. Note that the new version requires some additional installation steps, namely installing the Python development headers with sudo apt-get install python-dev. I originally got errors while trying this, but upgrading my packages solved that problem.
Update: Another alternative for Python are the wiringPi Python bindings. With the following simple test program, a square wave of 19.5 kHz was generated – about half the speed of the updated RPi.GPIO library:
import wiringpi
io = wiringpi.GPIO(wiringpi.GPIO.WPI_MODE_PINS)
io.pinMode(7,io.OUTPUT)
while True:
    io.digitalWrite(7,io.HIGH)
    io.digitalWrite(7,io.LOW)
C: Maximum performance
The Raspberry Pi Wiki gives a nice C code example for true hardware-level access to the GPIO. The interfacing is slightly more difficult, but code isn’t too bad. I took the example program and simplified the main method after setup_io() to this:
// Set GPIO pin 4 to output
INP_GPIO(4); // must use INP_GPIO before we can use OUT_GPIO
OUT_GPIO(4);
while(1) {
    GPIO_SET = 1<<4;
    GPIO_CLR = 1<<4;
}
Without any optimizations, I got an excellent 14 MHz square wave. However, since we are operating at the very extremes of the device, the waveform isn’t too square anymore.
Update 3: The measurements for these high-frequency signals are now completely redone. First there was an issue with the inadequate sampling rate of the Picoscope 2204. Then I had adequate bandwidth, but too long test leads (I used a female-male jumper wire for ground and test lead). The waveforms below are now completely redone and the rest of the text in this section rewritten. Sorry for the inconvenience, it should be accurate now:
Compiling with the -O3 flag gives even more impressive results: a 21.9 MHz square wave. As seen from the waveform, one cannot expect the device to go any further than this. As a clock signal, this might still work nicely, but I wouldn’t call it a square wave anymore. However, we can conclude that using C, signalling at several MHz speeds should be achievable.
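The loop above gets away with a single write per edge because the BCM2835 has separate set and clear registers: writing a 1 bit to the set register drives that pin high, writing a 1 bit to the clear register drives it low, and 0 bits are ignored, so no read-modify-write cycle is needed. The Python model below illustrates those semantics only. The class and method names are invented for this sketch, and it touches no hardware.

```python
class GpioRegisterModel:
    """Toy model of a GPIO bank with separate set and clear registers."""

    def __init__(self):
        self.levels = 0  # bit n holds the output level of GPIO n

    def write_set(self, mask):
        # 1 bits drive those pins high; 0 bits leave the other pins untouched
        self.levels |= mask

    def write_clr(self, mask):
        # 1 bits drive those pins low; 0 bits leave the other pins untouched
        self.levels &= ~mask

    def level(self, pin):
        return (self.levels >> pin) & 1

if __name__ == "__main__":
    bank = GpioRegisterModel()
    bank.write_set(1 << 17)  # pretend some other pin is already high
    bank.write_set(1 << 4)   # corresponds to GPIO_SET = 1<<4
    print(bank.level(4), bank.level(17))
    bank.write_clr(1 << 4)   # corresponds to GPIO_CLR = 1<<4
    print(bank.level(4), bank.level(17))
```

The same mechanism allows several pins to be switched in one write by OR-ing their bits into the mask.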
BCM2835 C library
Mike McCauley has made a nice C library called bcm2835 that can also be used to interface with the GPIO pins using C. Its installation was also quite easy: download, run the standard configure / make / make install commands and you’re good to go. Compiling the code is done with the -l bcm2835 flag to link against the library. Benchmark code looked like this (note that GPIO 4 in Broadcom numbering is pin 7 on the P1 header, hence RPI_GPIO_P1_07):
#include <bcm2835.h>
#define PIN RPI_GPIO_P1_07 // GPIO 4
int main(int argc, char *argv[]) {
    if(!bcm2835_init())
        return 1;
    // Set the pin to be an output
    bcm2835_gpio_fsel(PIN, BCM2835_GPIO_FSEL_OUTP);
    while(1) { // Blink
        bcm2835_gpio_write(PIN, HIGH);
        //delay(500);
        bcm2835_gpio_write(PIN, LOW);
        //delay(500);
    }
    return 0;
}
The performance is not too far behind the earlier C example: a solid 4.7 MHz, which could be bumped to 5.1 MHz with the -O3 optimization flag. Definitely enough for most applications!
Note that the overshoot in the above image is likely due to a long ground lead (see the notes in the C section and at the beginning of the article for details).
WiringPi
Gordon Henderson has written an Arduino-like wiringPi library using C. I had a request to benchmark that, too. Here’s the simple test program:
#include <wiringPi.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main() {
    if (wiringPiSetup() == -1)
        exit(1);
    pinMode(7, OUTPUT);
    while(1) {
        digitalWrite(7, 0);
        digitalWrite(7, 1);
    }
    return 0;
}
With the normal GPIO access method, the library already clocks an impressive 6.9 MHz square wave. The picture below has also been updated for a more accurate waveform:
There’s also a GPIO access method which involves calling wiringPiSetupGpio() instead of wiringPiSetup(), and using the normal GPIO numbering instead of wiringPi native renumbering system, so 7 becomes 4 in the above code. The performance is increased slightly to ~7.1 MHz.
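Since the same physical pin goes by three different numbers in this article (wiringPi pin 7, Broadcom GPIO 4, P1 header pin 7), a small lookup table helps keep them apart. The sketch below is deliberately partial: only the first row comes from this article, the other two are recalled from the wiringPi pin table and should be checked against it, and the function names are made up for this illustration.

```python
# Three ways of naming the same pin. Only the first row is taken from the
# article; the others are assumptions to verify against the wiringPi pin table.
PIN_TABLE = [
    # (wiringPi pin, BCM GPIO, P1 header pin)
    (7, 4, 7),
    (0, 17, 11),
    (1, 18, 12),
]

def wiringpi_to_bcm(wpi_pin):
    """Translate a wiringPi pin number to Broadcom GPIO numbering."""
    for wpi, bcm, _header in PIN_TABLE:
        if wpi == wpi_pin:
            return bcm
    raise KeyError("wiringPi pin %d is not in this partial table" % wpi_pin)

def bcm_to_header(bcm_gpio):
    """Translate a Broadcom GPIO number to its P1 header pin."""
    for _wpi, bcm, header in PIN_TABLE:
        if bcm == bcm_gpio:
            return header
    raise KeyError("GPIO %d is not in this partial table" % bcm_gpio)

if __name__ == "__main__":
    print("wiringPi 7 -> GPIO %d -> P1 pin %d"
          % (wiringpi_to_bcm(7), bcm_to_header(wiringpi_to_bcm(7))))
```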
Also, a /sys/class/gpio based access method was provided, but it was a lot slower, running at 200 kHz on average. The wiringPi library also comes with a gpio command that can be used to access the GPIO, but the performance is very poor. The program below achieved an 80 Hz square wave:
gpio -g mode 4 out
while true
do
    gpio -g write 4 1
    gpio -g write 4 0
done
There’s also a “pwm” command, but I couldn’t get that to work. The wiringPi library also has Python, Ruby and Perl bindings. I tried out the Python version and got an almost 20 kHz square wave, which is quite good (about half the speed of RPi.GPIO 0.3.0). I’d expect the Perl and Ruby bindings to be on the same speed level.
Perl
Mike McCauley has also made a Perl module that uses the above C library to provide GPIO access in our favorite language (who doesn’t love Perl?). For installation, I recommend skipping the cpan command and just downloading the module from its CPAN page, extracting it, and running the perl Makefile.PL / make / make install commands. As usual, the Perl code isn’t pretty, but it does the job well:
use Device::BCM2835;
use strict;
Device::BCM2835::init() || die "Could not init library";
# Set RPi pin P1_07 (GPIO 4) to be an output
Device::BCM2835::gpio_fsel(&Device::BCM2835::RPI_GPIO_P1_07,
                           &Device::BCM2835::BCM2835_GPIO_FSEL_OUTP);
while (1) { # Strobe
    Device::BCM2835::gpio_write(&Device::BCM2835::RPI_GPIO_P1_07, 1);
    Device::BCM2835::gpio_write(&Device::BCM2835::RPI_GPIO_P1_07, 0);
}
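Compared to the original Python result (900 Hz with RPi.GPIO 0.2.0), the Perl module packs a bit more punch: a 35 kHz square wave was achieved, although the updated RPi.GPIO 0.3.0 has since overtaken it at 44 kHz. That is enough for some simple PWM applications, if not quite enough for audio generation etc.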
As with the Python version, any tips to improve Perl execution performance are welcome!
Conclusion
Based on these simple benchmarks, I can conclude that shell and Python access to GPIO is enough for any automation tasks, but at least the Perl level performance is needed to do any fancier stuff. For actual signalling applications, C seems like the only choice. I haven’t tried the C# and Java interfaces, but I’d expect them to be on the level of C and Perl, respectively, or a bit slower.
Hope you enjoyed this article! As always, I recommend you to subscribe to the feed to get the latest posts without checking back manually. And if you really like the content, donations towards future hacking efforts are very much appreciated.









florian:
July 10, 2012 at 12:47
nice work!
I am wondering how your measurement setup (e.g. probe-impedance) is. It seems that the signal is affected by some RC-filtering. Is the GPIO port max. level really only 350 mV? Seems a bit low to me.
Do you know the theoretical spec of the GPIO port?
jokkebk says:
July 10, 2012 at 13:18
Actually now that you mention it, my scope is only rated for 10 MHz (sample rate 50 MS/s), so it might well be just the software doing interpolation with the 20 MHz signal.
The probes are at 1:10 which explains the ~340 mV value, the actual value is thus about 3.4V. In that setting, their impedance should be 10 Mohm.
Thanks for the comment, I’ll add some additional clarification to my post based on this!
pepper:
July 14, 2012 at 13:45
That was exactly what i have searched for! Nice article!
However, i wonder how stable the frequency is? Isn’t it affected by the scheduling of your thread?
Vinod S:
July 16, 2012 at 16:18
Hi,
so the top frequency obtained is 21.9MHz with -O3 option. I think this is done in user space.
If we are running the loop toogle test in the kernel space, may be we can obtain a very high value, is it?
Vinod S says:
July 16, 2012 at 16:25
ooops.. not toogle but toggle ;-)
jokkebk says:
July 18, 2012 at 10:32
I would guess (without better knowledge, though) that moving to kernel space would mainly eliminate some individual longer delays (it’s likely that every time the kernel does multitasking, the toggling stops for a microsecond or so), but not increase the frequency by much.
radioing says:
March 15, 2013 at 18:48
No. The next step to increase the speed is to move from C to asm and optimize toggle routine for ARM core instruction set. You will be probably limited by waitstates generated by GPIO periphery in Broadcom CPU not by the CPU speed.
Stefan:
July 30, 2012 at 14:27
Hi jokkebk, i was wondering if you could state an version number of python library that you used, because from what i have read, there is a new version of python RPi library (3.0.1a) and it should pack a much more “punch” then the old one, it might be worth looking into (and update the python benchmark values if necessary)
jokkebk says:
July 30, 2012 at 23:47
Hi! Good to know. I used the 0.2.0 version when doing my tests. I’ll try the newer one when I have the chance – now I just encountered errors when trying to compile it and have to get some sleep. :(
Stefan says:
July 31, 2012 at 17:10
Sure take your time, i would benchmark it my self but my RPi is still stuck somewhere in the post office =<
jokkebk says:
August 1, 2012 at 22:33
Hehe. I hope you’ll get yours soon, it’s really a great piece of equipment!
I’ve now updated the benchmark with results from 0.3.0 version. As you guessed, it’s a lot faster.
Mike:
August 10, 2012 at 21:29
I’m working on a 48 stage shift register board (24 inputs, 24 outputs), Shift out and read in at the same time either on two pins or one pin I switch back and forth between read and write using the 10k resister trick at this link http://robots.freehostia.com/Software/ShiftRegister/ShiftRegister.html
The plan is to have a daemon monitor an output file in ram (tmpfs) and write to an input file each containing a 24 bit unsigned integer. Sounds like a C-based daemon would be plenty fast!
jokkebk says:
August 13, 2012 at 15:07
Sounds like a cool project. The C-based interface is definitely the fastest way to go currently. I just got a new scope and will be updating the measurements shortly.
Gordon Henderson:
August 17, 2012 at 20:47
That’s fantastic! I was able to get marginally faster myself with a bit of optimisation and overclocking (10MHz), but we’re really at the limits of general-purpose here. I have a little DSO quad and the waveforms at that speed really are on it’s limit of usability too.
And benchmarking a bash script with an oscilloscope – classic :-)
I’ve also been told that there is a real hardware limitation to the speed too – even if the loop were 3 lines of ARM assembler, then about 21MHz is what its going to top-out at, so there must be something else going on when we access hardware, but nice to know we can get there using C if we need to.
Issues I’ve seen myself (and from others) is that sometimes it’s too fast! e.g. scanning a matrix keypad – the pulse to the row isn’t long enough when you scan the columns – espeically if the keypad is at the end of a long bit of ribbon cable, so that’s something to be aware of too – there’s more to it than absolute speed!
Cheers,
-Gordon
jokkebk says:
August 19, 2012 at 22:44
Great information, thanks for your post! Seems like ~20 MHz really is the limit. It’s a lot more than many microcontrollers, still, although the normal Linux isn’t a realtime OS, so some limitations. I’ll need to test reading speed and latency at some point, too. :)
Seryoga says:
November 12, 2012 at 16:18
Greate job!
Did you test the reading speed and latency?
cheers Seryoga
Dan:
October 10, 2012 at 9:19
Thanks for the analysis, these are useful results. I’m curious to know what the CPU load of the C test at high frequency was?
jokkebk says:
October 10, 2012 at 9:22
Good idea. I’ll try it out when I next have my RaspPi powered up and let you know. :)
jokkebk says:
December 3, 2012 at 18:19
The C test with ~22 MHz uses almost 99 % of the CPU if the Pi is otherwise idle.
Multitasking works fine though, I was able to log in with another SSH shell and run “top” without any problems – it’s probably the square wave that suffers from this activity.
Francisco:
December 3, 2012 at 15:48
Hi! This benchmarks are great! You’ve done a remarkable job, and gathered very useful information. I’d like to ask you a question, if I may. These tests where all performed with the GPIO as outputs, right? Any idea if the results are similar if used as inputs?
jokkebk says:
December 3, 2012 at 16:01
Yes, this is output only. I haven’t tested the input speed (and likewise with outputs, there may be glitches with input speed so even if 10 million samples / sec could be achieved, there might be short periods of no polls at all, and this is even harder to verify with inputs than outputs)
skynet:
January 4, 2013 at 20:24
Did you tested GPIO speed in kernel space by writing yourself a little kernel? (CPU in real mode, no SO interrupt activated) You can find a little help at: http://www.cl.cam.ac.uk/freshers/raspberrypi/tutorials/os/
jokkebk says:
April 20, 2013 at 16:55
No, that one I didn’t do, it would require quite a bit of work to do that. :)
Robert Savage:
January 10, 2013 at 13:39
Here is a similar article that covers Java on the Raspberry Pi running on the various JVMs:
http://www.savagehomeautomation.com/projects/raspberry-pi-java-gpio-frequency-benchmarks.html
RichP:
January 31, 2013 at 9:41
The issue with many hw/sw design interactions is determinism, and this trumps raw speed in many designs. That is, if you start an operation, can you be sure not to be interrupted? An uninterruptible section of code must be made to ensure this; that is called a critical section. In its crudest implementation, it’s formed by disabling interrupts, executing the real-time code, then re-enabling the interrupts. Speed is nothing if there can be indeterminate interruptions at any time when interacting with high-speed hw.
Another way (e.g. as in an SQL database) is to have a common flag with an uninterruptible test-and-set operation, which forms a semaphore that locks a record from other processes. The scheduler then looks at this flag to delay scheduling the next thread. This is clearly problematic in Linux and is the reason for real-time OSes and for partitioning real-time tasks to dedicated MPUs (Arduinos) and hardware state machines (FPGAs).
In my experience, MPUs interacting with real-time hardware are best kept separate, as an Arduino might be. A simple I2C interface can link to as many Arduinos as needed with a very simple model: I2C is able to read/write the memory space while the semaphore flag in each CPU turns on/off access to the memory space. A symbol table is produced each time the Arduino code is recompiled for each real-time MPU on the bus and feeds the code on the RPi.
The other alternative is to use a direct hardware state machine in FPGA code or a very simple and fast soft-CPU inside an FPGA with an instruction set geared exactly to the task at hand. Xilinx has several soft-CPUs that compile into their FPGAs.
Also, with high-speed fast dev designs, SystemC can actually compile a specialized version of C into hardware for an FPGA or other targets. Or many use Verilog/VHDL to create machines from pseudocode.
The point of all this is … beware real-time code in a multithreaded OS, as it’s a debugging nightmare with all the possible varying interactions with hardware events (grows with n factorial). Keep it simple and partition real-time code to dedicated machines that cooperate with a master. This has saved me many times because it is common to spend more time in debug (or simulation) than in design. It’s also prudent for the master to have an upfront debugging role for the real-time slaves as a part of the initial design. Visibility into the slaves is the key to making real-time processes manageable and able to be properly sync’ed and optimized.
jokkebk says:
April 20, 2013 at 15:04
Very good writeup! And I agree fully – one definitely should not expect much of GPIO determinism when running under Linux, unless there is a kernel-level (or MPU level) support for it, such might be the case with I2C or similar protocols. If certain latency or speed is always required, either realtime OS or separate device connected to the Pi (like Arduino) is definitely a better option.
José Ricardo Borba:
February 13, 2013 at 20:19
Very nice and VERY useful post. My 2c is: try numba for python scripts. I’m not sure if exists an ARM version at this time, but it highly improves your code.
exco:
February 22, 2013 at 1:13
trying to reproduce your results … I only get about 1/5th of your benchmark results… odd. (3.6.11)
jokkebk says:
February 22, 2013 at 9:26
Which language & library is that?
Zaid Pirwani:
April 26, 2013 at 1:11
While trying to find max speed of I2C, SPI and UART on Raspberry Pi, I found this… equally interesting and useful for me as my BE project is based on Raspi but I would love to have the info of speeds as well..
This is what I got till now:
UART: 476baud to 31.25Mbaud*
SPI: 3.8 kHz to 250 MHz*
I2C: 400kbps
*theoretical maximum
not tested on hardware as I don’t have the tools for it, if you got time, this is interesting to check out.. :)
BTW, Thanks for the different code examples and the benchmarks… these are GREAT