Python on-the-fly AES encryption/decryption and transfer to AWS S3
Tue, Feb 8, 2022 in post General python encryption decryption aes aws s3 boto3 fileson
So, I started writing a file database and toolset called fileson to take advantage of AWS S3 Glacier Deep Archive (let's just call it GDA from now on). With 1 €/mo/TB storage cost, it is essentially a dirt cheap option to store very infrequently accessed data like offsite backups.
Why not just use rclone? Well, I disliked the fact that all tools do a ton of (paid) queries against S3 when syncing. I thought a simple JSON file database should work to keep track of what to copy and delete. Well, that work is progressing, but as a part of that...
Encrypting on the fly with Python and Pycrypto(dome)
I started thinking that client side encryption would be useful as well. AES is tried and tested, and it's easy to find sample code to do it. But it seems wasteful to first create encrypted files on your hard drive, then upload them to AWS and finally delete everything.
Luckily, the Python AWS SDK boto3 has a great example on how to upload a file to S3 with upload_fileobj that accepts "a readable file-like object". What does that mean? Let's find out!

(Note that you need to have the boto3 and pycryptodome libraries installed to successfully run these examples.)
#!/usr/bin/env python3
import hashlib, os, boto3

class FileLike:
    def __init__(self, filename, mode):
        self.fp = open(filename, mode)

    def write(self, data):
        print('write', len(data), 'bytes')
        return self.fp.write(data)

    def read(self, size=-1):
        print('read', size, 'bytes')
        return self.fp.read(size)

    def tell(self):
        print('tell =', self.fp.tell())
        return self.fp.tell()

    def seek(self, offset, whence=0):
        print('seek', offset, whence)
        self.fp.seek(offset, whence)

    def close(self):
        print('close')
        self.fp.close()

s3 = boto3.client('s3')
fp = FileLike('hash.py', 'rb')
print('Uploading...')
s3.upload_fileobj(fp, 'mybucket', 'please_remove.txt')
print('Done...')
The FileLike class is a dummy wrapper around basic file functions that prints out what is happening when s3.upload_fileobj uses the provided object.
user@server:~$ ./s3_test.py
Uploading...
seek 0 1
tell = 0
seek 0 2
tell = 357
seek 0 0
tell = 0
tell = 0
read 357 bytes
read 0 bytes
seek 0 0
read 357 bytes
read 0 bytes
close
Done...
So what happens? upload_fileobj seems to:
- Seek to the current position with fp.seek(offset=0, whence=1).
- Call tell, most likely to see where in the file it was when called.
- Seek to the end of the file with fp.seek(offset=0, whence=2).
- Again call tell to know where the file ends.
- Seek back to where it started.
- Make sure it's where it started with tell.
- Read the rest of the data (probably very large files would be read in chunks).
- Go back to the beginning and read the data again (most likely the first round is for a checksum).
- Close the file (surprising, as it is not opening it...).
This tells us exactly what is the minimum needed to implement an "on the fly" AES encoding file object:
- A read function that takes the number of bytes.
- A tell function that can return 0 at the beginning and filesize in the end, possibly intermediate values after some reads.
- A seek function that can go to beginning (0,0), nowhere (0,1) and end (0,2).
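To check whether a custom object covers that minimum, the call sequence from the log can be replayed by hand. This is only a sketch: probe_like_upload is a hypothetical helper that imitates the calls observed above (it is not what boto3 runs internally), and io.BytesIO stands in for a real file.

```python
import io

def probe_like_upload(fp):
    """Replay the seek/tell/read calls seen in the log above.

    Hypothetical helper: it imitates the observed call sequence,
    it is not boto3's actual implementation.
    """
    fp.seek(0, 1)                  # "nowhere": stay at the current position
    start = fp.tell()              # remember where we started
    fp.seek(0, 2)                  # jump to the end
    end = fp.tell()                # the end position gives the size
    fp.seek(start, 0)              # back to the start
    first = fp.read(end - start)   # first pass over the data
    fp.seek(start, 0)              # rewind again
    second = fp.read(end - start)  # second pass must return the same bytes
    return end - start, first == second

size, repeatable = probe_like_upload(io.BytesIO(b'x' * 357))
print(size, repeatable)
```

Any object that survives this probe and returns the same bytes on both passes should at least handle the calls seen in the log. Note that the sketch assumes the object starts at position 0, as it did in the log.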
Now I chose to do a little complicating twist: When on-the-fly encrypting with AES CTR (which I chose to avoid padding), one needs to store the randomized initial value (usual shorthand for this is iv) of the counter somewhere. With a 128 bit counter, this is 16 bytes. Usually this is stored in the beginning of the encrypted file. Now it means that the first 16 bytes "read" from my wrapper should return the iv, and then start returning encrypted data. Also, the tell function in the end should return a length 16 bytes longer than the file being encrypted to accommodate. Doable, but as we are not 100 % sure the first read will not be something like 10 bytes (leaving 6 more bytes of iv to return), we need to do some conditionals in the read function.

Also, when you are seeking back to start, you need to reset the AES encryption, as boto3 does two passes on the upload (presumably for checksumming). Here's the final wrapper (with "write" support as well to support on-the-fly decryption when downloading from S3 and writing to disk):
from Crypto.Cipher import AES
from Crypto.Util import Counter
import hashlib, os

class AESFile:
    """On-the-fly AES encryption (on read) and decryption (on write).

    When reading, returns 16 bytes of iv first, then encrypted payload.
    On writing, first 16 bytes are assumed to contain the iv.

    Does the bare minimum, you may get errors if not careful."""

    @staticmethod
    def key(passStr, saltStr, iterations=100000):
        return hashlib.pbkdf2_hmac('sha256', passStr.encode('utf8'),
            saltStr.encode('utf8'), iterations)

    def initAES(self):
        # (re)start the CTR counter from the iv
        self.obj = AES.new(self.key, AES.MODE_CTR, counter=Counter.new(
            128, initial_value=int.from_bytes(self.iv, byteorder='big')))

    def __init__(self, filename, mode, key, iv=None):
        if mode not in ('wb', 'rb'):
            raise RuntimeError('Only rb and wb modes supported!')

        self.pos = 0
        self.key = key
        self.mode = mode
        self.fp = open(filename, mode)

        if mode == 'rb':
            self.iv = iv or os.urandom(16)
            self.initAES()
        else: self.iv = bytearray(16) # filled in by the first writes

    def write(self, data):
        datalen = len(data)
        if self.pos < 16:
            ivlen = min(16-self.pos, datalen)
            self.iv[self.pos:self.pos+ivlen] = data[:ivlen]
            self.pos += ivlen
            if self.pos == 16: self.initAES() # ready to init now
            data = data[ivlen:]
        if data: self.pos += self.fp.write(self.obj.decrypt(data))
        return datalen

    def read(self, size=-1):
        ivpart = b''
        if self.pos < 16:
            # only the part of the iv not yet returned
            if size == -1: ivpart = self.iv[self.pos:]
            else:
                ivpart = self.iv[self.pos:min(16, self.pos+size)]
                size -= len(ivpart)
        enpart = self.obj.encrypt(self.fp.read(size)) if size else b''
        self.pos += len(ivpart) + len(enpart)
        return ivpart + enpart

    def tell(self): return self.pos

    # only in read mode (encrypting)
    def seek(self, offset, whence=0): # enough seek to satisfy AWS boto3
        if offset: raise RuntimeError('Only seek(0, whence) supported')

        self.fp.seek(offset, whence) # offset=0 works for all whences
        if whence==0: # absolute positioning, offset=0
            self.pos = 0
            self.initAES()
        elif whence==2: # relative to file end, offset=0
            self.pos = 16 + self.fp.tell()

    def close(self): self.fp.close()
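The trickiest part of read is the partially returned iv, and that logic can be tried out in isolation without any encryption. The following is a minimal sketch using only the standard library: HeaderPrefixedReader is a hypothetical stand-in (not part of fileson) that serves a fixed 16-byte header before the payload, the same way AESFile serves the iv before the ciphertext.

```python
import io

class HeaderPrefixedReader:
    """Serve a fixed header first, then the wrapped stream's bytes.

    Hypothetical stand-in for the read logic of AESFile, minus the AES.
    """
    def __init__(self, header, stream):
        self.header = header
        self.stream = stream
        self.pos = 0  # position in the combined header + payload stream

    def read(self, size=-1):
        head = b''
        if self.pos < len(self.header):
            if size < 0:
                head = self.header[self.pos:]   # rest of the header
            else:
                head = self.header[self.pos:self.pos + size]
                size -= len(head)               # what is left for the payload
        body = self.stream.read(size) if size else b''
        self.pos += len(head) + len(body)
        return head + body

reader = HeaderPrefixedReader(b'H' * 16, io.BytesIO(b'payload'))
first = reader.read(10)   # 10 header bytes
second = reader.read(10)  # remaining 6 header bytes + 4 payload bytes
rest = reader.read()      # whatever is left of the payload
print(first, second, rest)
```

The second read is the interesting one: it straddles the boundary between the header and the payload, which is exactly the "10 bytes first, 6 bytes of iv left" case described above.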
Using the wrapper locally is trivial, just replace the normal fp = open(filename, 'rb') with fp = AESFile(filename, 'rb', key) (you can generate a 16 byte key yourself or use AESFile.key to get a proper PBKDF2-derived key from a password and salt). Reading from that file pointer will give you first the iv and then the contents of filename encrypted.

To decrypt, you replace rb with wb and write the encrypted data, and the wrapper writes decrypted data into the chosen file. I've provided a complete encryption/decryption utility with the source file in fileson crypt.py.
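The key derivation itself needs nothing but the standard library, so it is easy to try on its own. A small sketch, where derive_key is a hypothetical free function that makes the same hashlib.pbkdf2_hmac call as AESFile.key:

```python
import hashlib

def derive_key(password, salt, iterations=100000):
    # Same call as AESFile.key: PBKDF2 with HMAC-SHA256
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf8'),
                               salt.encode('utf8'), iterations)

key = derive_key('password', 'salt')
print(len(key))  # 32 bytes, i.e. an AES-256 key
```

With SHA-256 the derived key is 32 bytes by default, so the wrapper ends up using AES-256. If a 16 byte (AES-128) key is wanted instead, hashlib.pbkdf2_hmac accepts a dklen argument for that.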
Wrapping it up into AWS S3
Armed with the above class, it becomes trivial to adapt the boto3 AWS S3 examples to encrypt on the fly during upload, and decrypt on the fly during download. Note that you need to configure boto3 properly before running the code below, so follow the SDK docs first and only do this after you've successfully run their example without encryption.
#!/usr/bin/env python3
import boto3
from crypt import AESFile
import argparse, time

parser = argparse.ArgumentParser(description='AWS S3 upload/download with on-the-fly encryption')
parser.add_argument('mode', type=str, choices=['upload','download'], help='Mode')
parser.add_argument('bucket', type=str, help='S3 bucket')
parser.add_argument('input', type=str, help='Input file or S3 object name')
parser.add_argument('output', type=str, help='Output file or S3 object name')
parser.add_argument('password', type=str, help='Password')
parser.add_argument('salt', type=str, help='Salt')
parser.add_argument('-i', '--iterations', type=int, default=100000,
    help='PBKDF2 iterations (default 100000)')
args = parser.parse_args()

s3 = boto3.client('s3')
key = AESFile.key(args.password, args.salt, args.iterations)

if args.mode == 'upload':
    fp = AESFile(args.input, 'rb', key)
    s3.upload_fileobj(fp, args.bucket, args.output)
else:
    fp = AESFile(args.output, 'wb', key)
    s3.download_fileobj(args.bucket, args.input, fp)
fp.close()
Super cool. You need to have a file and a bucket, but armed with those, let's try it out (writing the script itself to a folder called test in S3):
user@server:~$ ./aws.py upload mybucket aws.py test/aws.bin password salt
user@server:~$ ./aws.py download mybucket test/aws.bin aws2.py password salt
user@server:~$ diff aws.py aws2.py
If all went perfectly, diff should find your files identical. You can download the test/aws.bin yourself to view the encrypted version.
Awesome. You can now store encrypted stuff to AWS at will.