Python on-the-fly AES encryption/decryption and transfer to AWS S3
So, I started writing a file database and toolset called fileson to take advantage of AWS S3 Glacier Deep Archive (let's just call it GDA from now on). With 1 €/mo/TB storage cost, it is essentially a dirt cheap option to store very infrequently accessed data like offsite backups.
Why not just use rclone? Well, I disliked the fact that all tools do a ton of (paid) queries against S3 when syncing. I thought a simple JSON file database should be enough to keep track of what to copy and delete. Well, that work is progressing, but as a part of that...
Encrypting on the fly with Python and Pycrypto(dome)
I started thinking that client side encryption would be useful as well. AES is tried and tested, and it's easy to find sample code to do it. But it seems wasteful to first create encrypted files on your hard drive, then upload them to AWS and finally delete everything.
(note that you need to have the boto3 and pycryptodome packages installed to successfully run these examples)

    #!/usr/bin/env python3
    import hashlib, os, boto3

    class FileLike:
        def __init__(self, filename, mode):
            self.fp = open(filename, mode)

        def write(self, data):
            print('write', len(data), 'bytes')
            return self.fp.write(data)

        def read(self, size=-1):
            print('read', size, 'bytes')
            return self.fp.read(size)

        def tell(self):
            print('tell =', self.fp.tell())
            return self.fp.tell()

        def seek(self, offset, whence=0):
            print('seek', offset, whence)
            self.fp.seek(offset, whence)

        def close(self):
            print('close')
            self.fp.close()

    s3 = boto3.client('s3')
    fp = FileLike('hash.py', 'rb')
    print('Uploading...')
    s3.upload_fileobj(fp, 'mybucket', 'please_remove.txt')
    print('Done...')

The FileLike class is a dummy wrapper around basic file functions that prints out what is happening when s3.upload_fileobj uses the provided file object:

    user@server:~$ ./s3_test.py
    Uploading...
    seek 0 1
    tell = 0
    seek 0 2
    tell = 357
    seek 0 0
    tell = 0
    tell = 0
    read 357 bytes
    read 0 bytes
    seek 0 0
    read 357 bytes
    read 0 bytes
    close
    Done...

So what happens?
upload_fileobj seems to:
- Find the current position with seek(0, 1) and tell, most likely to see where in the file it was when called.
- Seek to the end of the file with seek(0, 2).
- Again call tell to know where the file ends.
- Seek back to where it started, seek(0, 0).
- Make sure it's where it started with tell.
- Read the rest of the data (probably very large files would be read in chunks).
- Go back to the beginning and read the data again; most likely the first round is for a checksum.
- Close the file (surprising, as it did not open it...).
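The size-probing part of that sequence is easy to reproduce with any seekable file object. The snippet below is only a sketch of the pattern seen in the log, not boto3's actual code; probe_remaining is a made-up helper name and io.BytesIO stands in for the 357-byte file.

```python
import io

def probe_remaining(fp):
    """Return the number of bytes between the current position and the end
    of fp, leaving the position where it was (hypothetical helper)."""
    start = fp.tell()   # where are we now?
    fp.seek(0, 2)       # whence=2: jump to the end of the file
    end = fp.tell()     # position at the end equals the total size
    fp.seek(start, 0)   # whence=0: go back to where we started
    return end - start

buf = io.BytesIO(b'x' * 357)  # stand-in for the 357-byte file in the log
remaining = probe_remaining(buf)
print(remaining, buf.tell())
```

Any wrapper that answers these three calls consistently should look like a normal file of known size to the caller.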
This tells us exactly what the minimum is for implementing an "on the fly" AES encrypting file object:
- A read function that takes the number of bytes to read
- A tell function that can return 0 at the beginning and filesize at the end, possibly intermediate values after some reads
- A seek function that can go to the beginning (0,0), nowhere (0,1) and the end (0,2)
Now I chose to add a little complicating twist: when encrypting on the fly with AES CTR (which I chose to avoid padding), one needs to store the randomized initial value (the usual shorthand for this is iv) of the counter somewhere. With a 128 bit counter, this is 16 bytes. Usually this is stored at the beginning of the encrypted file. This means that the first 16 bytes "read" from my wrapper should be the iv, and only after that should it start returning encrypted data. Also, the tell function at the end should return a length 16 bytes longer than the file being encrypted, to accommodate the iv. Doable, but as we cannot be 100% sure that the first read will not be something like 10 bytes (leaving 6 more bytes of iv to return), we need some conditionals in the read function.
Also, when seeking back to the start, you need to reset the AES encryption, as boto3 does two passes on the upload (presumably for checksumming). Here's the final wrapper (with "write" support as well, to allow on-the-fly decryption when downloading from S3 and writing to disk):

    from Crypto.Cipher import AES
    from Crypto.Util import Counter
    import hashlib, os

    class AESFile:
        """On-the-fly AES encryption (on read) and decryption (on write).

        When reading, returns 16 bytes of iv first, then encrypted payload.
        On writing, first 16 bytes are assumed to contain the iv.

        Does the bare minimum, you may get errors if not careful."""

        @staticmethod
        def key(passStr, saltStr, iterations=100000):
            return hashlib.pbkdf2_hmac('sha256', passStr.encode('utf8'),
                saltStr.encode('utf8'), iterations)

        def initAES(self):
            self.obj = AES.new(self.key, AES.MODE_CTR, counter=Counter.new(
                128, initial_value=int.from_bytes(self.iv, byteorder='big')))

        def __init__(self, filename, mode, key, iv=None):
            if mode not in ('wb', 'rb'):
                raise RuntimeError('Only rb and wb modes supported!')

            self.pos = 0
            self.key = key
            self.mode = mode
            self.fp = open(filename, mode)

            if mode == 'rb':
                self.iv = iv or os.urandom(16)
                self.initAES()
            else:
                self.iv = bytearray(16) # filled in by the first 16 bytes written

        def write(self, data):
            datalen = len(data)
            if self.pos < 16: # still collecting the iv
                ivlen = min(16-self.pos, datalen)
                self.iv[self.pos:self.pos+ivlen] = data[:ivlen]
                self.pos += ivlen
                if self.pos == 16: self.initAES() # ready to init now
                data = data[ivlen:]
            if data: self.pos += self.fp.write(self.obj.decrypt(data))
            return datalen

        def read(self, size=-1):
            ivpart = b''
            if self.pos < 16: # part of the iv has not been returned yet
                if size == -1: ivpart = self.iv[self.pos:] # only the unread part
                else:
                    ivpart = self.iv[self.pos:min(16, self.pos+size)]
                    size -= len(ivpart)
            enpart = self.obj.encrypt(self.fp.read(size)) if size else b''
            self.pos += len(ivpart) + len(enpart)
            return ivpart + enpart

        def tell(self):
            return self.pos

        # only in read mode (encrypting)
        def seek(self, offset, whence=0): # enough seek to satisfy AWS boto3
            if offset: raise RuntimeError('Only seek(0, whence) supported')

            self.fp.seek(offset, whence) # offset=0 works for all whences
            if whence==0: # absolute positioning, offset=0
                self.pos = 0
                self.initAES() # restart the counter for the next pass
            elif whence==2: # relative to file end, offset=0
                self.pos = 16 + self.fp.tell()

        def close(self):
            self.fp.close()

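The trickiest part of the wrapper is returning the 16-byte prefix correctly when a read does not line up with it. Here is that logic in isolation, as a minimal sketch with no encryption at all: PrefixedReader is a hypothetical name, and the prefix and payload are made-up test values.

```python
import io

class PrefixedReader:
    """Return `prefix` first, then the bytes of `fp` (no encryption here)."""

    def __init__(self, fp, prefix):
        self.fp = fp
        self.prefix = prefix
        self.pos = 0  # position in the combined prefix + payload stream

    def read(self, size=-1):
        head = b''
        if self.pos < len(self.prefix):
            if size < 0:
                head = self.prefix[self.pos:]  # everything left of the prefix
            else:
                head = self.prefix[self.pos:self.pos + size]
                size -= len(head)  # what remains goes to the payload
        body = self.fp.read(size) if size else b''
        self.pos += len(head) + len(body)
        return head + body

reader = PrefixedReader(io.BytesIO(b'payload'), b'0123456789ABCDEF')
first = reader.read(10)   # entirely inside the prefix
second = reader.read(10)  # rest of the prefix plus start of the payload
rest = reader.read()      # whatever is left of the payload
print(first, second, rest)
```

The second read is the interesting one: it straddles the boundary, so it has to return the 6 remaining prefix bytes and only 4 bytes of payload.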
Using the wrapper locally is trivial: just replace the normal fp = open(filename, 'rb') with fp = AESFile(filename, 'rb', key) (you can generate a 16, 24 or 32 byte key yourself or use the AESFile.key helper to get a proper PBKDF2 derived key from a password and salt). Reading from that file pointer will give you first the iv and then the encrypted contents of the file. To decrypt, you open the output file with mode wb instead of rb and write the encrypted data, and the
wrapper writes decrypted data into the chosen file. I've provided a
complete encryption/decryption utility with the source file in
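To see what the PBKDF2 key derivation produces, here is a small standalone sketch using only hashlib. The function name derive_key and the password and salt strings are placeholders for illustration; the call itself is the same one AESFile.key makes.

```python
import hashlib

def derive_key(password, salt, iterations=100000):
    # Same call as AESFile.key: PBKDF2 with HMAC-SHA256, default output length
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf8'),
                               salt.encode('utf8'), iterations)

key = derive_key('password', 'salt')
same = derive_key('password', 'salt')     # same inputs give the same key
other = derive_key('password', 'pepper')  # a different salt gives a different key
print(len(key), key == same, key == other)
```

The derived key is 32 bytes long, so passing it to AES.new selects AES-256. Since the key depends on the password, salt and iteration count, all three have to match between encryption and decryption.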
Wrapping it up into AWS S3
Armed with the above class, it becomes trivial to adapt the boto3 AWS S3 examples to encrypt on the fly during upload, and decrypt on the fly during download. Note that you need to configure boto3 properly before running the code below, so follow the SDK docs first and only do this after you've successfully run their example without encryption.

    #!/usr/bin/env python3
    import boto3
    from crypt import AESFile
    import argparse, time

    parser = argparse.ArgumentParser(description='AWS S3 upload/download with on-the-fly encryption')
    parser.add_argument('mode', type=str, choices=['upload','download'], help='Mode')
    parser.add_argument('bucket', type=str, help='S3 bucket')
    parser.add_argument('input', type=str, help='Input file or S3 object name')
    parser.add_argument('output', type=str, help='Output file or S3 object name')
    parser.add_argument('password', type=str, help='Password')
    parser.add_argument('salt', type=str, help='Salt')
    parser.add_argument('-i', '--iterations', type=int, default=100000,
        help='PBKDF2 iterations (default 100000)')
    args = parser.parse_args()

    s3 = boto3.client('s3')
    key = AESFile.key(args.password, args.salt, args.iterations)

    if args.mode == 'upload':
        fp = AESFile(args.input, 'rb', key)
        s3.upload_fileobj(fp, args.bucket, args.output)
    else:
        fp = AESFile(args.output, 'wb', key)
        s3.download_fileobj(args.bucket, args.input, fp)

    fp.close()

Super cool. You need to have a file and a bucket, but armed with those, let's try it out (writing the script itself to a folder called test in S3):

    user@server:~$ ./aws.py upload mybucket aws.py test/aws.bin password salt
    user@server:~$ ./aws.py download mybucket test/aws.bin aws2.py password salt
    user@server:~$ diff aws.py aws2.py

If all went perfectly, diff should find your files identical (it prints nothing in that case). You can download test/aws.bin yourself to view the encrypted version.
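If you would rather do the comparison from Python than with diff, something along these lines should work. It is a sketch using the standard filecmp module; same_contents is a hypothetical helper, and the two temporary files only stand in for aws.py and aws2.py.

```python
import filecmp, os, tempfile

def same_contents(path_a, path_b):
    # shallow=False forces a byte-by-byte comparison instead of trusting os.stat()
    return filecmp.cmp(path_a, path_b, shallow=False)

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, 'aws.py')
    b = os.path.join(tmp, 'aws2.py')
    for path in (a, b):
        with open(path, 'wb') as f:
            f.write(b'print("hello")\n')
    identical = same_contents(a, b)   # two files with the same bytes

    with open(b, 'ab') as f:
        f.write(b'# changed\n')       # simulate a corrupted round trip
    filecmp.clear_cache()
    different = same_contents(a, b)

print(identical, different)
```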
Awesome. You can now store encrypted stuff to AWS at will.