Code and Life

Programming, electronics and other cool tech stuff

How to calculate PBKDF2 HMAC SHA256 with Python, example code

Having just spent 4 hours trying to get a Python pseudocode version of PBKDF2 to match the output of hashlib.pbkdf2_hmac(), I thought I'd post Yet Another Example of how to do it. I thought I could just use hashlib.sha256 to calculate the steps, but it turns out HMAC is not just a hash of the concatenated password, salt and counter.
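To make the mismatch concrete, here is a quick check (a minimal sketch; the variable names naive and keyed are mine). It compares a plain SHA256 of the concatenation with an HMAC keyed by the password, using the same password and salt as the examples in this post:

```python
import hashlib, hmac

pwd, salt = b'password', b'salt'
block = salt + b'\x00\x00\x00\x01'  # salt followed by the 32-bit big-endian block counter

naive = hashlib.sha256(pwd + block).digest()           # plain hash of the concatenation
keyed = hmac.new(pwd, block, hashlib.sha256).digest()  # HMAC-SHA256 keyed with the password

print(naive == keyed)  # False
print(keyed == hashlib.pbkdf2_hmac('sha256', pwd, salt, 1))  # True
```

With a single iteration, PBKDF2 is exactly one HMAC call, which is why the second comparison matches.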

So, without further ado, here's 256-bit key generation from a password and salt:

import hashlib, hmac

def pbkdf2(pwd, salt, iterations):
    h = hmac.new(pwd, digestmod=hashlib.sha256) # HMAC-SHA256 keyed with the password
    m = h.copy() # U_1 = PRF(Password, Salt + INT_32_BE(1))
    m.update(salt)
    m.update(b'\x00\x00\x00\x01') # block counter 1 as a 32-bit big-endian integer
    U = m.digest()
    T = U # T accumulates U_1 ^ U_2 ^ ... (bytes are immutable, no copy needed)
    for _ in range(1, iterations):
        m = h.copy() # fresh HMAC instance with the same key
        m.update(U) # U_i = PRF(Password, U_(i-1))
        U = m.digest()
        T = bytes(a^b for a,b in zip(U,T))
    return T

pwd = b'password'
salt = b'salt'

# both should print 120fb6cffcf8b32c43e7225256c4f837a86548c92ccc35480805987cb70be17b
print(pbkdf2(pwd, salt, 1).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 1).hex())

# both should print c5e478d59288c841aa530db6845c4c8d962893a001ce4e11a4963873aa98134a
print(pbkdf2(pwd, salt, 4096).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 4096).hex())

Getting from pseudocode to an actual working example was surprisingly hard, especially since most implementations on the web are in lower-level languages, and the Python results mostly just call a library.
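The function above only produces a single 32-byte block, which is all a 256-bit key needs. For longer keys, PBKDF2 repeats the same loop with the block counter set to 2, 3, and so on, then concatenates the blocks and truncates. Here is a rough sketch of that generalization; the name pbkdf2_sha256 and the dklen parameter are my own choices, modeled on the hashlib call:

```python
import hashlib, hmac

def pbkdf2_sha256(pwd, salt, iterations, dklen=32):
    h = hmac.new(pwd, digestmod=hashlib.sha256)  # HMAC keyed with the password
    blocks = []
    for i in range(1, -(-dklen // 32) + 1):      # ceil(dklen / 32) blocks, numbered from 1
        m = h.copy()
        m.update(salt + i.to_bytes(4, 'big'))    # Salt + INT_32_BE(i)
        U = m.digest()
        T = U
        for _ in range(1, iterations):
            m = h.copy()
            m.update(U)                          # U_j = PRF(Password, U_(j-1))
            U = m.digest()
            T = bytes(a ^ b for a, b in zip(U, T))
        blocks.append(T)
    return b''.join(blocks)[:dklen]              # truncate to the requested length

# should print True: 80 bytes need three blocks, the last one truncated
print(pbkdf2_sha256(b'password', b'salt', 10, 80) ==
      hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 10, 80))
```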

Simplifying the pseudocode further

If you want to avoid the new...update...digest dance and skip the hmac library altogether, the code becomes even simpler. HMAC is quite simple to implement in Python. Here's a gethmac function hard-coded to SHA256 and an even shorter pbkdf2:

import hashlib

sha256 = lambda b: hashlib.sha256(b).digest()

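def gethmac(key, content):
    if len(key) > 64: key = sha256(key) # HMAC hashes keys longer than the 64 byte block
    key = key.ljust(64, b'\0') # zero-pad the key to the block size
    okeypad = bytes(v ^ 0x5c for v in key)
    ikeypad = bytes(v ^ 0x36 for v in key)
    return sha256(okeypad + sha256(ikeypad + content))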

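def pbkdf2(pwd, salt, iterations):
    U = gethmac(pwd, salt+b'\x00\x00\x00\x01') # U_1 = PRF(Password, Salt + INT_32_BE(1))
    T = U # bytes are immutable, no copy needed
    for _ in range(1, iterations):
        U = gethmac(pwd, U) # U_i = PRF(Password, U_(i-1))
        T = bytes(a^b for a,b in zip(U,T))
    return T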

pwd = b'password'
salt = b'salt'

# both should print 120fb6cffcf8b32c43e7225256c4f837a86548c92ccc35480805987cb70be17b
print(pbkdf2(pwd, salt, 1).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 1).hex())

# both should print c5e478d59288c841aa530db6845c4c8d962893a001ce4e11a4963873aa98134a
print(pbkdf2(pwd, salt, 4096).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 4096).hex())

As you can see, HMAC just builds two padded 64-byte arrays from the key and then makes two nested hash calls (per the HMAC spec, a key longer than 64 bytes must first be replaced by its hash). It also makes pbkdf2() quite easy to read compared to the hmac library version!
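If you want to convince yourself that a hand-written HMAC really matches the hmac library, a small comparison like the following works. This is a standalone sketch: hmac_sha256 is a name I made up for this check, and it includes the long-key step so that keys over 64 bytes can be tested too:

```python
import hashlib, hmac

def hmac_sha256(key, content):
    if len(key) > 64:                        # keys longer than the block size are hashed first
        key = hashlib.sha256(key).digest()
    key = key.ljust(64, b'\0')               # zero-pad to the 64-byte block size
    okeypad = bytes(v ^ 0x5c for v in key)
    ikeypad = bytes(v ^ 0x36 for v in key)
    inner = hashlib.sha256(ikeypad + content).digest()
    return hashlib.sha256(okeypad + inner).digest()

# short key, key of exactly one block, and a key longer than one block
for key in (b'password', b'k' * 64, b'k' * 100):
    expected = hmac.new(key, b'message', hashlib.sha256).digest()
    print(len(key), hmac_sha256(key, b'message') == expected)  # True each time
```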

If you want to simplify even further, you can fold the first round of U and T into the for loop by taking advantage of the fact that val^0 == val:

def pbkdf2(pwd, salt, iterations):
    U = salt+b'\x00\x00\x00\x01' # input to the first PRF call
    T = bytes(32) # 32 zero bytes, same length as a SHA256 digest
    for _ in range(iterations):
        U = gethmac(pwd, U)
        T = bytes(a^b for a,b in zip(U,T))
    return T
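The same val^0 trick also works with a Python integer as the accumulator, which avoids rebuilding a bytes object on every round. The following is only a sketch under a couple of assumptions: it uses hmac.digest() (available since Python 3.7) as the PRF instead of gethmac so that it runs on its own, and it still produces just one 32-byte block:

```python
import hashlib, hmac

def pbkdf2(pwd, salt, iterations):
    U = salt + b'\x00\x00\x00\x01'          # input to the first PRF call
    T = 0                                   # integer accumulator, since 0 ^ x == x
    for _ in range(iterations):
        U = hmac.digest(pwd, U, 'sha256')   # one-shot HMAC-SHA256
        T ^= int.from_bytes(U, 'big')       # XOR the digest in as a 256-bit integer
    return T.to_bytes(32, 'big')            # back to 32 bytes, keeping leading zeros

# both should print c5e478d59288c841aa530db6845c4c8d962893a001ce4e11a4963873aa98134a
print(pbkdf2(b'password', b'salt', 4096).hex())
print(hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 4096).hex())
```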