How to calculate PBKDF2 HMAC SHA256 with Python, example code
Mon, Dec 9, 2024 in post python encryption pbkdf2 sha256
Having just spent 4 hours trying to get a Python pseudocode version of PBKDF2 to match the output of
hashlib.pbkdf2_hmac(), I thought I'd post Yet Another Example of how to do it. I assumed I could just use
hashlib.sha256 to calculate the steps, but it turns out HMAC is not simply a hash of the password, salt and counter concatenated together.
So, without further ado, here's 256-bit key generation from a password and salt:
import hashlib, hmac

def pbkdf2(pwd, salt, iterations):
    h = hmac.new(pwd, digestmod=hashlib.sha256)  # create HMAC using SHA256
    m = h.copy()  # calculate U1 = PRF(Password, Salt+INT_32_BE(1))
    m.update(salt)
    m.update(b'\x00\x00\x00\x01')
    U = m.digest()
    T = bytes(U)  # copy
    for _ in range(1, iterations):
        m = h.copy()  # new instance of hmac(key)
        m.update(U)  # U_i = PRF(Password, U_(i-1))
        U = m.digest()
        T = bytes(a ^ b for a, b in zip(U, T))  # XOR the new U into T
    return T

pwd = b'password'
salt = b'salt'

# both should print 120fb6cffcf8b32c43e7225256c4f837a86548c92ccc35480805987cb70be17b
print(pbkdf2(pwd, salt, 1).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 1).hex())

# both should print c5e478d59288c841aa530db6845c4c8d962893a001ce4e11a4963873aa98134a
print(pbkdf2(pwd, salt, 4096).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 4096).hex())
Getting from pseudocode to an actual working example was surprisingly hard, especially since most implementations on the web are in lower-level languages, and Python results mostly just use a library.
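The pitfall I mentioned at the top, that HMAC is not a plain hash of the concatenated inputs, is easy to check directly. Here is a minimal sketch using the same password and salt as above; the variable names (naive, u1, reference) are just mine for illustration:

```python
import hashlib
import hmac

pwd = b'password'
salt = b'salt'
block_index = b'\x00\x00\x00\x01'  # INT_32_BE(1)

# Naive guess: hash the concatenation of password, salt and counter
naive = hashlib.sha256(pwd + salt + block_index).digest()

# What PBKDF2 uses for U1: HMAC keyed with the password
u1 = hmac.new(pwd, salt + block_index, hashlib.sha256).digest()

# With a single iteration, the PBKDF2 output is exactly U1
reference = hashlib.pbkdf2_hmac('sha256', pwd, salt, 1)

print(naive == reference)  # False
print(u1 == reference)     # True
```

The password is the HMAC key and only salt plus counter is the message, which is why plain concatenation never lines up with the library output.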
Simplifying the pseudocode further
If you want to avoid the new() ... update() ... digest() calls and skip the hmac library altogether, the code becomes even simpler. HMAC is quite simple to implement with Python. Here's a gethmac function hard-coded to SHA256 and an even shorter pbkdf2:
import hashlib

sha256 = lambda b: hashlib.sha256(b).digest()

def gethmac(key, content):
    if len(key) > 64:  # keys longer than the 64-byte block are hashed first
        key = sha256(key)
    okeypad = bytes(v ^ 0x5c for v in key.ljust(64, b'\0'))  # outer padded key
    ikeypad = bytes(v ^ 0x36 for v in key.ljust(64, b'\0'))  # inner padded key
    return sha256(okeypad + sha256(ikeypad + content))

def pbkdf2(pwd, salt, iterations):
    U = gethmac(pwd, salt + b'\x00\x00\x00\x01')  # U1
    T = bytes(U)  # copy
    for _ in range(1, iterations):
        U = gethmac(pwd, U)
        T = bytes(a ^ b for a, b in zip(U, T))
    return T

pwd = b'password'
salt = b'salt'

# both should print 120fb6cffcf8b32c43e7225256c4f837a86548c92ccc35480805987cb70be17b
print(pbkdf2(pwd, salt, 1).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 1).hex())

# both should print c5e478d59288c841aa530db6845c4c8d962893a001ce4e11a4963873aa98134a
print(pbkdf2(pwd, salt, 4096).hex())
print(hashlib.pbkdf2_hmac('sha256', pwd, salt, 4096).hex())
As you can see, HMAC just creates a couple of padded 64-byte arrays from the key and then makes two nested hash calls. It also makes pbkdf2() quite easy to read compared to the version using the hmac library!
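If you'd rather not take a hand-rolled HMAC on faith, it can be compared against the hmac library for a few key lengths. The sketch below is one way to do that; hmac_sha256 and matches_library are hypothetical names of mine, and the function is written out in full so the snippet runs on its own. It includes the RFC 2104 rule that keys longer than one block get hashed first, which only matters for passwords over 64 bytes:

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 64  # SHA-256 works on 64-byte blocks

def hmac_sha256(key, content):
    # Keys longer than one block are replaced by their hash (RFC 2104)
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b'\0')  # zero-pad to the block size
    okeypad = bytes(v ^ 0x5c for v in key)
    ikeypad = bytes(v ^ 0x36 for v in key)
    inner = hashlib.sha256(ikeypad + content).digest()
    return hashlib.sha256(okeypad + inner).digest()

def matches_library(key, content):
    # Compare against the standard library's HMAC-SHA256
    expected = hmac.new(key, content, hashlib.sha256).digest()
    return hmac_sha256(key, content) == expected

# Short, exactly block-sized, and over-long random keys
for key_len in (8, 64, 65, 200):
    print(key_len, matches_library(os.urandom(key_len), b'some message'))
```

Each line should report True. For anything beyond learning purposes I would still reach for the hmac module.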
If you want to shorten it even further, you can fold even the first round of U and T into the for loop by taking advantage of the fact that val^0 == val:
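def pbkdf2(pwd, salt, iterations):
    U = salt + b'\x00\x00\x00\x01'  # input to the first PRF call
    T = bytes(32)  # 32 zero bytes (SHA256 digest size), so the first XOR yields U1
    for _ in range(iterations):
        U = gethmac(pwd, U)
        T = bytes(a ^ b for a, b in zip(U, T))
    return T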
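All the versions above hard-code the counter to 1, so they produce exactly one 32-byte block. As far as I understand the PBKDF2 definition, longer keys come from repeating the same loop with counters 2, 3, ... and concatenating the blocks. A rough sketch of that, assuming the hmac library as the PRF; prf, pbkdf2_block and pbkdf2_long are names I made up for this example:

```python
import hashlib
import hmac

def prf(key, msg):
    # HMAC-SHA256 as the pseudorandom function
    return hmac.new(key, msg, hashlib.sha256).digest()

def pbkdf2_block(pwd, salt, iterations, index):
    U = salt + index.to_bytes(4, 'big')  # Salt + INT_32_BE(index)
    T = bytes(32)                        # zeros, so the first XOR yields U1
    for _ in range(iterations):
        U = prf(pwd, U)
        T = bytes(a ^ b for a, b in zip(U, T))
    return T

def pbkdf2_long(pwd, salt, iterations, dklen):
    blocks = -(-dklen // 32)  # ceiling division: number of 32-byte blocks
    out = b''.join(pbkdf2_block(pwd, salt, iterations, i)
                   for i in range(1, blocks + 1))
    return out[:dklen]        # truncate the last block if needed

derived = pbkdf2_long(b'password', b'salt', 1000, 80)
expected = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 1000, dklen=80)
print(derived == expected)  # True
```

With dklen=32 this collapses back to the single-block pbkdf2() above.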