Encryption of backups using age and the value of simplicity

xenon, 06.11.2024 15:58

I will briefly introduce the age encryption tool. We will learn to encrypt both personal and corporate backups with a single simple command, and then argue about why openssl is such a bad tool, unsuitable for practical use by ordinary users and admins whose understanding of cryptographic algorithms is roughly captured by the title picture.

A Bit of the Obvious

Backups need to be stored (preferably in multiple copies, at least one of them off-site), and they need to be encrypted. Everyone seems to agree with this, but in reality few people do it. So people are divided into those who do not encrypt their backups yet (they have not yet truly felt the possible risks) and responsible, experienced people whose dick pics the whole Internet has already seen (people we know, figuratively speaking, by face).

The author of the utility (can it be trusted?) is Filippo Valsorda, who worked on the cryptography team at Cloudflare and led the Go security team at Google.

Just use it!

First, the simplest option, for beginners: encryption with just a password.

# encrypt
$ age -p -o paris.jpg.age paris.jpg 
Enter passphrase (leave empty to autogenerate a secure one): 
Confirm passphrase: 

# decrypt
$ age -d -o paris-decrypted.jpg paris.jpg.age 
Enter passphrase:

This is already enough to store files somewhere like Yandex.Disk without fearing that Yandex, a hacker, or intelligence agencies will read them (provided the passphrase is strong; the autogenerated one is).

But let's consider a more interesting option that you would not be ashamed to use in production. We want to encrypt database backups before uploading them to S3. There are a few complications:

- We do not want to store any private or symmetric keys on the server.
- Several people in the company ("recipients") should be able to decrypt the backups, so that even if one of them leaves the team, someone else can still decrypt them.
- We do not want a single "super-key" handed out to everyone. Such a key usually leaks quickly (through hacked messengers or old correspondence), and worst of all, we cannot even find out who leaked it. Each recipient should decrypt with their own individual key, which they never share with anyone.

First, a little theory on how it works. Asymmetric encryption is impractical for encrypting large amounts of data directly. Therefore age generates a random file key, encrypts the file itself symmetrically with ChaCha20-Poly1305 using a key derived from it, and then writes the file key into the header of the output file several times, each copy encrypted to the public key of one recipient.

Where do we get the keys? The first option is to generate them: age-keygen -o key.txt creates an X25519 key pair (the file holds the private key, and the matching public key, starting with age1, is printed to the terminal). But... why? Nowadays everyone already has an SSH key, ed25519 or at least RSA. We won't create new keys; we'll use the SSH keys we already have, and age can do that! Recipients can be added either by repeating -r KEY (passing the public key value directly as an argument) or with -R FILE, where the file lists one or more public keys. Your ~/.ssh/id_ed25519.pub works fine as an encryption key. But we will write all recipients into a single file, recipients.txt:

# john doe
ssh-ed25519 AAAA....

# my key
ssh-ed25519 AAAA....

Yes, empty lines and lines starting with # are allowed, and even ~/.ssh/authorized_keys works as such a file.

So let's go:

$ age -o paris.jpg.age -R recipients.txt paris.jpg

The resulting paris.jpg.age file can be freely uploaded to your favorite file sharing service, be it S3, Dropbox, or Skype.

Why I like age (a bit of a holy war)

First, age is simple, cheap, and quick to adopt. As Boris the Blade said about cryptography: simplicity is good, simplicity is reliable! From a practical point of view, age is a tool you read an article about, play with for 15 minutes, see that it works, and by evening you are reliably encrypting backups. It just works. With more complex schemes, there is a higher chance that while you are figuring them out well enough, other important tasks will come up, you will switch to them, and you will simply never get around to encryption. That is why I favor simple and reliable solutions.

Second, the simplicity of age also pays off from a cryptographic point of view. Most likely, the algorithms the author chose are good enough for this task, which is fairly typical for everyone and does not call for special algorithms tailored to your unique needs. I think a competent author of the utility made a better choice of algorithms than I would have (given my cryptographic literacy, as confessed at the start).

The third plus is that age simply has no unnecessary knobs or options you would never need. There is little room to misuse age, shoot yourself in the foot, and end up with backups that someone can crack on a home computer in an hour.

I have not actively used PGP/GPG beyond playing around, but I have used openssl. The experience was so painful and traumatic that my psychological trauma from using openssl to work with X.509 certificates was sublimated into the showcert project (70+ stars on GitHub). All typical certificate operations, from viewing a certificate in a file or on a remote server to creating your own CA, are done with simple, short, intuitive commands. Compare:

# you will never forget how to read a certificate with showcert
showcert habr.com

# two redirections, a pipe, two invocations and 5 unneeded options
openssl s_client -connect habr.com:443 </dev/null 2>/dev/null | openssl x509 -inform pem -text

I used openssl for many years to check certificates and could never type this two-command pipeline from memory without errors; I always had to start with Google. Compare that with the first command, where there is simply no room for a mistake. That is why age looks to me like the same kind of sensible simplification.

openssl left me with the impression that it is not a tool for solving a problem but rather a thin shim, a CLI interface to a very rich zoo of cryptographic functions (99% of which you don't need, and for the remaining 1% you don't understand how to use them, for example when GCM is better than XTS, but you can google a spell and copy it from the browser into the shell).

Below, in the "links for the holy war", there is a good illustration (Stack Overflow) of a long openssl "spell" that does encrypt the file. The answer is upvoted, many thousands of people copy and use it, and only further down, in the comments to the answer, does someone explain why it is wrong to do so. It is the same ECB-penguin effect as at the beginning of the article: people do cryptography based on a semi-literate answer from an unknown person, upvoted by people who are even less literate in cryptography.

But age is a utility for encryption and nothing else, a tool for one specific purpose. It does not pretend to be a Swiss Army knife that can also repair a spaceship, but it cuts cervelat perfectly.

Also, I really liked this argument from the article:

When was the last time we saw a format or protocol broken because it used a vulnerable cipher? RC4, which was never seriously used anywhere that cryptographers were involved? But wherever a protocol or format can negotiate or select algorithms, we have seen plenty of downgrade attacks and vulnerabilities in ASN.1 parsers (caused by the forced complexity of the formats). Single-algorithm designs are simply not subject to these attacks.