Benchmarking Raspberry Pi GPIO Speed

UPDATE2: You may also want to check out my Raspberry Pi 2 vs 1 GPIO benchmark!

UPDATED 2015-02-15! This article has been very popular, so I’ve now updated all the benchmarks using the latest firmware and library versions. The scope has also been upgraded to a PicoScope 5444B, which has better resolution and bandwidth than the earlier models. :)


Don’t try this at home! Shorting GND and VCC with a probe might fry your Pi and more!

Method and Summary of Results

The basic test setup was to toggle one of the GPIO pins between zero and one as fast as possible. GPIO 4 was selected because it is easy to access and has no overlapping functionality. This is basically the “upper limit” for any signalling one can hope to achieve with the GPIO pins: real-life scenarios where processing needs to be done can only aim for some fraction of these values. Here are the current results:

Language | Library | Tested / version | Square wave
-------- | ------- | ---------------- | -----------
Shell | /sys/class/gpio | 2015-02-14 | 2.8 kHz
Shell | WiringPi gpio utility | 2015-02-15 / 2.25 | 40 Hz
Python | RPi.GPIO | 2015-02-15 / 0.5.10 | 70 kHz
Python | wiringpi2 bindings | 2015-02-15 / latest GitHub | 28 kHz
Ruby | wiringpi bindings | 2015-02-15 / latest gem (1.1.0) | 21 kHz
C | Native (direct register access) | 2015-02-15 / latest RasPi wiki code | 22 MHz
C | BCM 2835 | 2015-02-15 / 1.38 | 5.4 MHz
C | wiringPi | 2015-02-15 / 2.25 | 4.1–4.6 MHz
Perl | BCM 2835 | 2015-02-15 / 1.9 | 48 kHz

Shell script

The easiest way to manipulate the Pi GPIO pins is from the console. Here’s a simple shell script to toggle GPIO 4 as fast as possible (add sleep 1 after each echo to get a nice LED toggle test):

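#!/bin/sh

# Make GPIO 4 available through sysfs and configure it as an output
# (usually needs root)
echo "4" > /sys/class/gpio/export
echo "out" > /sys/class/gpio/gpio4/direction

# Toggle the pin as fast as the shell can manage
while true
do
	echo 1 > /sys/class/gpio/gpio4/value
	echo 0 > /sys/class/gpio/gpio4/value
done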
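For comparison, here is a minimal Python sketch of the same three sysfs steps the script performs: export the pin, set its direction, and write its value. It assumes the standard /sys/class/gpio layout. The SysfsPin class is hypothetical, and its root directory is a parameter only so the logic can be tried against a temporary directory instead of real hardware. This variant has not been benchmarked here.

```python
from pathlib import Path

# Standard sysfs GPIO location. The root is configurable only so the sketch
# can be exercised without a Pi (hypothetical helper, not a library API).
SYSFS_GPIO_ROOT = Path("/sys/class/gpio")


class SysfsPin:
    """Drive one GPIO pin through the kernel's sysfs files, like the shell script."""

    def __init__(self, pin: int, root: Path = SYSFS_GPIO_ROOT) -> None:
        self.pin = pin
        self.root = Path(root)
        self.pin_dir = self.root / f"gpio{pin}"

    def export(self) -> None:
        # echo "4" > /sys/class/gpio/export (on a real Pi the kernel then creates gpio4/)
        (self.root / "export").write_text(f"{self.pin}\n")

    def set_output(self) -> None:
        # echo "out" > /sys/class/gpio/gpio4/direction
        (self.pin_dir / "direction").write_text("out\n")

    def write(self, level: int) -> None:
        # echo 1 / echo 0 > /sys/class/gpio/gpio4/value; the file is reopened on every call
        (self.pin_dir / "value").write_text("1\n" if level else "0\n")
```

On a Pi (as root), usage would be pin = SysfsPin(4); pin.export(); pin.set_output(), followed by alternating pin.write(1) and pin.write(0) in a loop.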


As expected, the performance of the shell script is not good: it generates only a 2.9 kHz square wave. For some reason, this figure has come down since 2012, when I measured 3.4 kHz; a firmware update might be the cause. This is enough for turning things on and off, but signalling is out of the question and even LED PWM is hardly feasible.

[Scope capture: shell script via /sys/class/gpio]

Update: Note that my probes are at the 1:10 setting, so the actual voltage is 10x what is displayed in the figures!
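The correction is a single multiplication: with a 1:10 probe the scope sees one tenth of the real signal. A small helper, assuming a plain 10x attenuation factor (the function name is hypothetical):

```python
def actual_voltage(displayed_volts: float, attenuation: float = 10.0) -> float:
    """Scale a scope reading taken through an attenuating probe.

    With a 1:10 (10x) probe the scope input sees one tenth of the real
    signal, so the displayed value is multiplied by the attenuation factor.
    """
    return displayed_volts * attenuation
```

For example, a plateau of about 340 mV on screen corresponds to roughly 3.4 V at the pin, in line with the Pi's 3.3 V logic level.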

Shell with WiringPi gpio utility

WiringPi comes with the gpio command, but its performance is roughly 70x worse (40 Hz) than the plain shell, possibly due to the startup delay of launching the executable for every write. The code is a bit cleaner, though:

#!/bin/sh

# wiringPi pin 7 is BCM GPIO 4
gpio mode 7 out

while true
do
        gpio write 7 1
        gpio write 7 0
done

[Scope capture: shell with WiringPi gpio utility]

Python with RPi.GPIO

One of the simplest ways to access the GPIO with a “real programming language” (sorry bashers :) is with the RPi.GPIO Python library. Installing it was simple: Just download the .tar.gz file, extract files and run python setup.py install. Our test script is simple as well:

import RPi.GPIO as GPIO

GPIO.setmode(GPIO.BCM)   # use Broadcom GPIO numbering

GPIO.setup(4, GPIO.OUT)  # configure GPIO 4 as an output

while True:
    GPIO.output(4, True)
    GPIO.output(4, False)

The library performance has increased steadily. 0.2.0 was less than 1 kHz, but 0.3.0 already bumped this to 44 kHz. As of version 0.5.10, the rate has again increased, and is now around 70 kHz!

[Scope capture: Python, RPi.GPIO 0.5.10]

The improved performance in Python is probably enough for simple multiplexing and LED PWM applications. Note that the new version requires an additional installation step, namely getting the Python development headers with sudo apt-get install python-dev. I originally got errors while trying this, but upgrading my packages solved the problem.
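Without a scope, a rough figure can still be obtained in software by timing the toggle loop. The sketch below accepts any write function, so it is not tied to RPi.GPIO; the function name and defaults are hypothetical. It reports only the average rate and cannot see the short stalls a scope reveals.

```python
import time
from typing import Callable


def estimate_toggle_frequency(write: Callable[[int], None], cycles: int = 100_000) -> float:
    """Estimate the square-wave frequency (Hz) of a write-high/write-low loop.

    write takes a level (1 or 0). The result is an average over the whole run,
    so individual interruptions by the OS are invisible here.
    """
    start = time.perf_counter()
    for _ in range(cycles):
        write(1)  # high half of the cycle
        write(0)  # low half of the cycle
    elapsed = time.perf_counter() - start
    return cycles / elapsed  # full cycles per second
```

On a Pi it could be called as estimate_toggle_frequency(lambda level: GPIO.output(4, level)) after the GPIO.setmode and GPIO.setup calls above. The lambda adds an extra Python function call per write, so the estimate will probably come out somewhat lower than the bare loop.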

Python with WiringPi2 bindings

Another alternative for Python is the wiringPi Python bindings. Installation requires cloning the repository and installing python-dev and python-setuptools with apt-get.

I installed the newer WiringPi2-Python version. Earlier tests with the older version 1 gave a 19.5 kHz square wave; the new wiringpi2 module has improved this to 28 kHz:

import wiringpi2

# WPI_MODE_PINS selects wiringPi pin numbering: pin 7 is BCM GPIO 4
io = wiringpi2.GPIO(wiringpi2.GPIO.WPI_MODE_PINS)

io.pinMode(7, io.OUTPUT)

while True:
    io.digitalWrite(7, io.HIGH)
    io.digitalWrite(7, io.LOW)

[Scope capture: Python, wiringpi2]

Ruby with WiringPi bindings

WiringPi also has Ruby bindings, which can easily be installed:

sudo apt-get install ruby-dev
sudo gem install wiringpi

Code is also very simple:

require 'wiringpi'

io = WiringPi::GPIO.new

while true do
        io.write(7,0)
        io.write(7,1)
end

Performance is about the same as with the Python wiringpi2 bindings, generating a square wave of around 21 kHz:

[Scope capture: Ruby, wiringpi]

C: Maximum performance

The Raspberry Pi Wiki gives a nice C code example for true hardware-level access to the GPIO. The interfacing is slightly more difficult, but the code isn’t too bad. I took the example program and simplified the main function after setup_io() to this:

// Set GPIO pin 4 to output
INP_GPIO(4); // must use INP_GPIO before we can use OUT_GPIO
OUT_GPIO(4);

while(1) {
  GPIO_SET = 1<<4; // drive GPIO 4 high
  GPIO_CLR = 1<<4; // drive GPIO 4 low
}

Without any optimizations, I got an excellent 14 MHz square wave. Adding -O3 to the compile command (gcc -O3 strobe.c -o strobe) increases the rate to a hefty 22 MHz. Measuring the waveform with an oscilloscope starts to require VERY short wiring between probe and ground; otherwise it just looks like a sine wave due to the capacitance of the jumper wires!
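The wiki macros boil down to a bit of register arithmetic. Each pin has a 3-bit function-select field in the GPFSELn registers (ten pins per register), and GPSET0/GPCLR0, at byte offsets 0x1C and 0x28 (the same offsets used in the assembler comment further down), take a mask with one bit per pin. The Python sketch below only reproduces that arithmetic on plain integers, assuming the macros match the commonly published wiki version; it does not map or touch /dev/mem, and the function names are hypothetical.

```python
from typing import Tuple

GPFSEL_BITS = 3       # width of each pin's function-select field
GPSET0_OFFSET = 0x1C  # byte offset: writing 1 bits drives those pins high
GPCLR0_OFFSET = 0x28  # byte offset: writing 1 bits drives those pins low
FSEL_OUTPUT = 0b001   # 000 = input, 001 = output


def fsel_location(g: int) -> Tuple[int, int]:
    """Return (GPFSELn register index, bit shift) for BCM GPIO g."""
    return g // 10, (g % 10) * GPFSEL_BITS


def make_output(fsel_word: int, g: int) -> int:
    """Apply INP_GPIO(g) followed by OUT_GPIO(g) to one GPFSELn value."""
    _, shift = fsel_location(g)
    word = fsel_word & ~(0b111 << shift)  # INP_GPIO: clear the field (000 = input)
    word |= FSEL_OUTPUT << shift          # OUT_GPIO: OR in 001 (output)
    return word & 0xFFFFFFFF              # keep it a 32-bit register value


def set_clr_mask(g: int) -> int:
    """Value written to GPSET0 / GPCLR0 (GPIO_SET / GPIO_CLR) for GPIO g."""
    if not 0 <= g < 32:
        raise ValueError("GPSET0/GPCLR0 only cover GPIO 0-31")
    return 1 << g
```

This also shows why INP_GPIO must come first: OUT_GPIO only ORs in the lowest bit, so a pin left in ALT0 (100) would end up in ALT1 (101) instead of output.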

[Scope capture: C, direct register access, -O3]

C with BCM2835 library

Mike McCauley has made a nice C library called bcm2835 that can also be used to interface with the GPIO pins using C. Its installation was also quite easy: download, run the standard configure / make / make install commands and you’re good to go. Compile with the -lbcm2835 linker flag to include the library. The benchmark code looked like this (note that Broadcom GPIO 4 is on pin 7 of the P1 header, hence RPI_GPIO_P1_07):

#include <bcm2835.h>

#define PIN RPI_GPIO_P1_07 // GPIO 4

int main(int argc, char *argv[]) {
    if (!bcm2835_init())
        return 1;

    // Set the pin to be an output
    bcm2835_gpio_fsel(PIN, BCM2835_GPIO_FSEL_OUTP);

    while (1) { // Blink
        bcm2835_gpio_write(PIN, HIGH);
        //delay(500);
        bcm2835_gpio_write(PIN, LOW);
        //delay(500);
    }

    return 0;
}

The performance falls well short of the direct register access example: a solid 5.4 MHz with the -O3 optimization flag. Still, that is definitely enough for most applications!

[Scope capture: C, bcm2835]

C with WiringPi

Gordon Henderson has written an Arduino-like wiringPi library using C. It’s a popular one and quite easy to use. Here’s the simple test program:

#include <wiringPi.h>

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main() {
  if (wiringPiSetup () == -1)
    exit (1) ;

  pinMode(7, OUTPUT); // wiringPi pin 7 is BCM GPIO 4

  while(1) {
    digitalWrite(7, 0);
    digitalWrite(7, 1);
  }

  return 0 ;
}

With the default wiringPiSetup() method and wiringPi pin numbering, the library already clocks an impressive 4.1 MHz square wave:

[Scope capture: C, wiringPi]

There’s also a GPIO access method which involves calling wiringPiSetupGpio() instead of wiringPiSetup() and using Broadcom GPIO numbering instead of wiringPi’s own pin numbering, so 7 becomes 4 in the above code. Performance increases slightly to 4.6 MHz:

[Scope capture: C, wiringPi with GPIO numbering]
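Because the examples switch between wiringPi pin numbers, Broadcom GPIO numbers and P1 header pins, a lookup table helps. The sketch below lists wiringPi pins 0-16 for Model B revision 2 boards (assumption: revision 1 boards differ for wiringPi pins 2, 8 and 9). The table and helper names are hypothetical, not part of wiringPi.

```python
from typing import Dict, Tuple

# wiringPi pin -> (BCM GPIO number, physical pin on the 26-pin P1 header).
# Assumption: revision 2 board; on rev 1, wiringPi 2, 8 and 9 are BCM 21, 0 and 1.
WPI_PINS_REV2: Dict[int, Tuple[int, int]] = {
    0: (17, 11), 1: (18, 12), 2: (27, 13), 3: (22, 15),
    4: (23, 16), 5: (24, 18), 6: (25, 22), 7: (4, 7),
    8: (2, 3), 9: (3, 5), 10: (8, 24), 11: (7, 26),
    12: (10, 19), 13: (9, 21), 14: (11, 23), 15: (14, 8),
    16: (15, 10),
}


def _lookup(wpi_pin: int) -> Tuple[int, int]:
    try:
        return WPI_PINS_REV2[wpi_pin]
    except KeyError:
        raise ValueError(f"wiringPi pin {wpi_pin} is not in this table") from None


def wpi_to_bcm(wpi_pin: int) -> int:
    """Number to use with wiringPiSetupGpio() instead of wiringPiSetup()."""
    return _lookup(wpi_pin)[0]


def wpi_to_physical(wpi_pin: int) -> int:
    """Physical P1 header pin (the number in bcm2835's RPI_GPIO_P1_xx names)."""
    return _lookup(wpi_pin)[1]
```

With this table, wiringPi pin 7 used in the gpio utility, wiringpi2, Ruby and C examples is BCM GPIO 4, which the bcm2835 library calls RPI_GPIO_P1_07.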

Since 2012, WiringPi performance has decreased somewhat, as I originally got 7.1 MHz with the GPIO access method. This might of course also be due to firmware changes (I am running the tests on a multitasking OS from an SSH shell, after all).

A /sys/class/gpio based access method is also provided, but it was a lot slower, running at 120 kHz on average (200 kHz). The wiringPi library also has Python, Ruby and Perl bindings. See above for the Python and Ruby results; I’d expect the Perl bindings to be at a similar speed level.

Perl with BCM2835.pm

Mike McCauley has also made a Perl module that uses the above C library to provide GPIO access in our favorite language (who doesn’t love Perl?). For installation, I recommend skipping the cpan command: look up the latest version on CPAN, download the .tar.gz with wget, extract it, and run the perl Makefile.PL / make / make install commands. As usual, the Perl code isn’t pretty, but it does the job well:

use Device::BCM2835;
use strict;

Device::BCM2835::init() || die "Could not init library";

# Set RPi pin P1_07 (GPIO 4) to be an output
Device::BCM2835::gpio_fsel(&Device::BCM2835::RPI_GPIO_P1_07, 
                            &Device::BCM2835::BCM2835_GPIO_FSEL_OUTP);

while (1) { # Strobe
    Device::BCM2835::gpio_write(&Device::BCM2835::RPI_GPIO_P1_07, 1);
    Device::BCM2835::gpio_write(&Device::BCM2835::RPI_GPIO_P1_07, 0);
}

Compared to the Python wiringpi2 bindings, the Perl module packs a bit more punch, though it trails RPi.GPIO: a 48 kHz square wave was achieved. That is enough for some PWM applications, if not quite enough for audio generation etc.

[Scope capture: Perl, Device::BCM2835]

As with the Python version, any tips to improve Perl execution performance are welcome! Interestingly enough, the 1.0 version achieved slightly better performance than the latest 1.9 version, around 59 kHz. The difference isn’t large enough to justify not upgrading, though.

Conclusion

Based on these simple benchmarks, I conclude that the shell is usable only for simple automation tasks, and at least Python, Ruby or Perl is needed for anything more complex, such as LED PWM. Python with RPi.GPIO is the fastest of these, with Perl and the BCM 2835 bindings in second place. For actual signalling applications, C seems like the only choice. I haven’t tried the C# and Java interfaces, but I’d expect them to be on the level of C and Perl, respectively, or a bit slower.

What is not evident from the snapshots, however, is that due to the multitasking nature of Linux, the GPIO manipulation is constantly interrupted for short periods when the CPU is doing something else, such as receiving or sending data over the network or writing log files. For timing-critical work, only a realtime OS or the hardware-level support for I2C, UART and the like is suitable. A good alternative is an independent add-on board with a microcontroller, such as an Arduino. Communicating with such devices over UART is simple.
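One way to make those interruptions visible is to capture edge timestamps (for example from a logic analyser export) and flag intervals that are far longer than usual. A minimal sketch, assuming timestamps in any consistent unit and an arbitrary threshold of three times the median interval; the function name and the sample numbers are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def find_stalls(edge_times: Sequence[float], factor: float = 3.0) -> Tuple[float, List[Tuple[int, float]]]:
    """Return (nominal_interval, stalls) for a list of edge timestamps.

    nominal_interval is the median time between consecutive edges; a stall is
    any interval longer than factor times that, reported as (index, length).
    """
    intervals = [b - a for a, b in zip(edge_times, edge_times[1:])]
    if not intervals:
        raise ValueError("need at least two edges")
    nominal = median(intervals)
    stalls = [(i, dt) for i, dt in enumerate(intervals) if dt > factor * nominal]
    return nominal, stalls
```

With made-up edges at 0, 1, 2, 3, 10, 11 and 12 µs, the median interval is 1 µs and the 7 µs gap after the fourth edge is reported as a stall. Using the median rather than the mean keeps a few long stalls from inflating the nominal interval.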

Published by

Joonas Pihlajamaa

Coding since 1990 in Basic, C/C++, Perl, Java, PHP, Ruby and Python, to name a few. Also interested in math, movies, anime, and the occasional slashdot now and then. Oh, and I also have a real life, but let's not talk about it!

130 thoughts on “Benchmarking Raspberry Pi GPIO Speed”

  1. nice work!
    I am wondering what your measurement setup (e.g. probe impedance) is. It seems that the signal is affected by some RC filtering. Is the GPIO port's maximum level really only 350 mV? Seems a bit low to me.
    Do you know the theoretical spec of the GPIO port?

    1. Actually now that you mention it, my scope is only rated for 10 MHz (sample rate 50 MS/s), so it might well be just the software doing interpolation with the 20 MHz signal.

      The probes are at 1:10 which explains the ~340 mV value, the actual value is thus about 3.4V. In that setting, their impedance should be 10 Mohm.

      Thanks for the comment, I’ll add some additional clarification to my post based on this!

  2. That was exactly what I have been searching for! Nice article!

    However, I wonder how stable the frequency is? Isn’t it affected by the scheduling of your thread?

  3. Hi,
    so the top frequency obtained is 21.9 MHz with the -O3 option. I think this is done in user space.

    If we run the toggle loop in kernel space, maybe we can obtain a much higher value?

    1. I would guess (without better knowledge, though) that moving to kernel space would mainly eliminate some individual longer delays (it’s likely that every time the kernel does multitasking, the toggling stops for a microsecond or so), but not increase the frequency by much.

    2. No. The next step to increase the speed is to move from C to asm and optimize the toggle routine for the ARM instruction set. You will probably be limited by wait states generated by the GPIO peripheral in the Broadcom SoC, not by the CPU speed.

  4. Hi jokkebk, I was wondering if you could state the version number of the Python library you used. From what I have read, there is a new version of the Python RPi library (3.0.1a) that should pack much more “punch” than the old one. It might be worth looking into (and updating the Python benchmark values if necessary).

    1. Hi! Good to know. I used the 0.2.0 version when doing my tests. I’ll try the newer one when I have the chance – now I just encountered errors when trying to compile it and have to get some sleep. :(

        1. Hehe. I hope you’ll get yours soon, it’s really a great piece of equipment!

          I’ve now updated the benchmark with results from 0.3.0 version. As you guessed, it’s a lot faster.

  5. I’m working on a 48-stage shift register board (24 inputs, 24 outputs). It shifts out and reads in at the same time, either on two pins or on one pin that I switch back and forth between read and write using the 10k resistor trick at this link: http://robots.freehostia.com/Software/ShiftRegister/ShiftRegister.html

    The plan is to have a daemon monitor an output file in ram (tmpfs) and write to an input file each containing a 24 bit unsigned integer. Sounds like a C-based daemon would be plenty fast!

    1. Sounds like a cool project. The C-based interface is definitely the fastest way to go currently. I just got a new scope and will be updating the measurements shortly.

  6. That’s fantastic! I was able to get marginally faster myself with a bit of optimisation and overclocking (10MHz), but we’re really at the limits of general-purpose hardware here. I have a little DSO Quad and the waveforms at that speed really are at its limit of usability too.

    And benchmarking a bash script with an oscilloscope – classic :-)

    I’ve also been told that there is a real hardware limitation to the speed too: even if the loop were 3 lines of ARM assembler, about 21MHz is what it’s going to top out at, so there must be something else going on when we access hardware, but it’s nice to know we can get there using C if we need to.

    Issues I’ve seen myself (and heard from others) are that sometimes it’s too fast! E.g. when scanning a matrix keypad, the pulse to the row isn’t long enough when you scan the columns, especially if the keypad is at the end of a long bit of ribbon cable. So that’s something to be aware of too: there’s more to it than absolute speed!

    Cheers,

    -Gordon

    1. Great information, thanks for your post! Seems like ~20 MHz really is the limit. It’s a lot more than many microcontrollers, still, although the normal Linux isn’t a realtime OS, so some limitations. I’ll need to test reading speed and latency at some point, too. :)

  7. Thanks for the analysis, these are useful results. I’m curious to know what the CPU load of the C test at high frequency was?

    1. The C test with ~22 MHz uses almost 99 % of the CPU if the Pi is otherwise idle.

      Multitasking works fine though, I was able to log in with another SSH shell and run “top” without any problems – it’s probably the square wave that suffers from this activity.

  8. Hi! These benchmarks are great! You’ve done a remarkable job and gathered very useful information. I’d like to ask you a question, if I may. These tests were all performed with the GPIO as outputs, right? Any idea if the results are similar when used as inputs?

    1. Yes, this is output only. I haven’t tested the input speed (and likewise with outputs, there may be glitches with input speed so even if 10 million samples / sec could be achieved, there might be short periods of no polls at all, and this is even harder to verify with inputs than outputs)

    2. Hi, I tried the input limit of one pin with the WiringPi library. This is the code in C:

      #include
      #include
      #include
      #include
      #include

      #define PIN 3

      while(1){
      digitalRead(PIN);
      }

      The maximum speed without errors is around 100 kHz; any faster and other kernel processes take the CPU, leaving you blind for some microseconds.

      Remember this is just a benchmark, putting some code after the read will reduce the speed.

      I don’t understand why there is such a huge difference between reading and writing… 100 kHz vs. 6 MHz is a pretty big difference.

    1. Hey, I just tried what you mentioned (a bare ASM program loaded on the SD card). I could only get about 8.8 MHz. But it’s strange that in some cases even 22 MHz could be obtained through C.

  9. The issue with many hardware/software design interactions is determinism, and this trumps raw speed in many designs. That is, if you start an operation, can you be sure not to be interrupted? An uninterruptible section of code, called a critical section, must be created to ensure this. In its crudest implementation, it’s formed by disabling interrupts, executing the real-time code, and then re-enabling interrupts. Speed is nothing if there can be indeterminate interruptions at any time when interacting with high-speed hardware.

    Another way (as in an SQL database) is to have a common flag with an uninterruptible test-and-set operation, which forms a semaphore that locks a record from other processes. The scheduler then looks at this flag to delay scheduling the next thread. This is clearly problematic in Linux, and it is the reason for real-time OSes and for partitioning real-time tasks to dedicated MPUs (Arduinos) and hardware state machines (FPGAs).

    In my experience, MPUs interacting with real-time hardware are best kept separate, as an Arduino might be. A simple I2C interface can link to as many Arduinos as needed with a very simple model: I2C is able to read/write the memory space while the semaphore flag in each CPU turns access to the memory space on/off. A symbol table is produced each time the Arduino code is recompiled for each real-time MPU on the bus, and it feeds the code on the RPi.

    The other alternative is a direct hardware state machine in FPGA code, or a very simple and fast soft CPU inside an FPGA with an instruction set geared exactly to the task at hand. Xilinx has several soft CPUs that compile into their FPGAs.

    Also, for high-speed, fast-development designs, SystemC can actually compile a specialized version of C into hardware for an FPGA or other targets. Or many use Verilog/VHDL to create machines from pseudocode.

    The point of all this is… beware of real-time code in a multithreaded OS, as it’s a debugging nightmare with all the possible varying interactions with hardware events (growing with n factorial). Keep it simple and partition real-time code to dedicated machines that cooperate with a master. This has saved me many times, because it is common to spend more time in debugging (or simulation) than in design. It’s also prudent for the master to have an upfront debugging role for the real-time slaves as part of the initial design. Visibility into the slaves is the key to making real-time processes manageable and able to be properly synced and optimized.

    1. Very good writeup! And I agree fully: one definitely should not expect much GPIO determinism when running under Linux, unless there is kernel-level (or MPU-level) support for it, as might be the case with I2C or similar protocols. If a certain latency or speed is always required, either a realtime OS or a separate device connected to the Pi (like an Arduino) is definitely a better option.

  10. Very nice and VERY useful post. My 2c: try Numba for Python scripts. I’m not sure if an ARM version exists at this time, but it can greatly speed up your code.

  11. While trying to find the max speed of I2C, SPI and UART on the Raspberry Pi, I found this… equally interesting and useful for me, as my BE project is based on the RasPi, but I would love to have the speed info as well.
    This is what I got till now:
    UART: 476baud to 31.25Mbaud*
    SPI: 3.8 kHz to 250 MHz*
    I2C: 400kbps
    *theoretical maximum

    Not tested on hardware as I don’t have the tools for it; if you have time, this is interesting to check out. :)

    BTW, Thanks for the different code examples and the benchmarks… these are GREAT

  12. Hi,

    You have done a pretty nice job, thanks for sharing!

    We have performed a similar test to evaluate whether the RasPi can be useful for us at work. We got the same results as you for Python and C, but we also gave Java a try.

    – Using a standard VM from OpenJDK: 2 kHz

    – Using the VM from Oracle SE Embedded (ARMv6/7): 165 kHz

    We also examined the Arduino for comparison.

    – Using standard IDE functions: 88 kHz

    – Using low-level C functions (AVR standard C): 1 MHz (*)

    (*) we got 8 MHz with a large program that toggled the output continuously instead of using a for/while loop.

  13. Hello,

    I am developing a GPIO library in Go and recently did a similar test. Some basic results, running flat out in a loop:

    * using the /sys/class/gpio interface, I was able to get 116 kHz
    * using /dev/kmem to map in the GPIO control memory (this is the same approach libbcm2835 uses), I got 7.1 MHz, but was able to increase this to 12.2 MHz by inlining all the calls.

    Afterwards I thought of a way to reduce the overhead even more, but was not able to measure the results. I’d estimate 14 to 16 MHz is possible, but only if you precalculated all the addresses and shifts.

  14. Hello jokkebk,

    thanks for this nice IO speed benchmark. I tried to reproduce your C language results and got the same for BCM2835 C library and Maximum performance.

    Only exception: Compiling with -o3 does not change the IO speed, it is still 14 MHz. Do you have an idea what I can do to achieve your 22 MHz? I am using the Raspbian wheezy image.
    Thanks, Tom

    1. Thanks! And nice to hear confirmation for the results. I don’t know what may have caused the speed bump with -O3, it might be an older version of the libraries I had at the time, I haven’t tested with recent distros.

      1. Thanks for the information. I haven’t been back here for a while. My brother got 18.7 MHz for output on a home-made frequency meter he built just to test this.

  15. FYI:
    I just tried Mono and C#. With direct memory access the GPIO speed was about 7.7 MHz. I didn’t use any wrappers, but unsafe code to directly address the GPIO memory…
    When using a wrapper around wiringPi, the C# solution manages about 200 kHz…

    Regards,

    Alex

  16. I have a camera whose video comes over a camlink interface with a pixel clock of 40 MHz (16-bit video data for each pixel). I want to use the RPi to process and display the video. Can it be done with GPIO or with any other interface of the RPi?

    1. I’d wager about zero chance of doing 40 MHz data capture or comms with GPIO, even without O/S it would be hard for 1 bit, let alone 16. If the Pi has a HW camlink bus, then it might be possible. But probably you need some extra HW to do that.

  17. Hi,
    We are instead trying to read an external signal into the RasPi, and with basic C code we are not getting reliable signals even at 100 kHz. We are reading the data into a vector using digitalRead and storing it later.
    The data looks irregular, with missing 1’s, when the input is a regular 100 kHz signal. Any clue? Is reading from GPIO different from writing to GPIO?
    Thanks

    1. I think it’s the same with output, the benchmark results only discuss average frequency, but not consistency – both are affected by Linux kernel which will probably launch its own interrupts several times a second, each lasting probably some microseconds at least.

      The only way to use the RasPi for high frequencies (anything more than a few kHz, I’d think) would be a realtime OS instead of Linux, or blocking all kernel interrupts (I encountered a piece of code when googling around, but I’m not certain how long you can do it and what the ill effects are, e.g. whether there will be data loss if an SD write or USB transfer is interrupted). Or use an external microcontroller that communicates with the RasPi via serial.

  18. With a little bit of assembler inside an FIQ, I actually measure ~41.6 MHz for one complete high-low, low-high pin toggle with my Tek scope. So the true GPIO latency without higher level sugar is actually something around 12 nanoseconds…

    Here’s the core portion of my basic benchmark code:

    .macro SINGLE_SHIFTED_BIT reg, bit
    ldr \reg, =0x1
    lsl \reg, \reg, \bit
    .endm

    SINGLE_SHIFTED_BIT R11, #15 // GPIO 15
    str R11, [R9,#0x1C] // 0x1C = GPSET0 offset, R9 = GPIO pointer
    str R11, [R9,#0x28] // 0x28 = GPCLR0 offset
    str R11, [R9,#0x1C]
    str R11, [R9,#0x28]
    str R11, [R9,#0x1C]
    str R11, [R9,#0x28]
    str R11, [R9,#0x1C]
    str R11, [R9,#0x28]
    str R11, [R9,#0x1C]
    str R11, [R9,#0x28]
    str R11, [R9,#0x1C]
    str R11, [R9,#0x28]
    str R11, [R9,#0x1C]
    str R11, [R9,#0x28]
    str R11, [R9,#0x1C]
    str R11, [R9,#0x28]

  19. Hello,

    Fantastic benchmark.
    Is it possible to update it to current software versions and RPi B+ and A+?

    Thanks,

    JM

    1. Unfortunately I do not have a wide array of Pis in my possession. However, I would be really surprised if there were any differences between the hardware revisions, as the processor and chipset are identical in every Pi. The network/USB part has some minor changes, but even network IRQ handling would only block signal generation for short periods, which would not change the benchmark results in a visible way.

      I may revisit the benchmark at some point, but probably mainly to check how new SW versions have changed the situation.

  20. Great info. I’m using your example code to do some inline testing with a serial data connection. I’m dropping down to native C for the speed, but my syntax is a little off.

    Here is what it looks like in wiringPi. What would be the equivalent in native C?

    #include
    #include

    int main (void)
    {
      int state;
      printf ("Raspberry Pi In & Out\n") ;

      if (wiringPiSetup () == -1)
        return 1 ;

      pinMode (0, INPUT) ;  // aka BCM_GPIO pin 17
      pinMode (2, OUTPUT) ; // aka BCM_GPIO pin 27

      for (;;)
      {
        state = digitalRead (0); // Read it
        // printf ("Input is %d \n",state);
        // digitalWrite (2, 1) ; // On
        // delay (500) ; // mS
        // digitalWrite (2, 0) ; // Off
        // delay (500) ;
        // delay (1) ;
        digitalWrite (2, state) ; // forward input
      }
      return 0 ;
    }

  21. Good job,

    But in most tests (native C, BCM and wiringPi) I get about half the frequency you achieved, or a little less.
    I don’t know why. Any help? I need all the speed I can get from my GPIOs.
