Machine Learning and Cryptography: Getting to Know CipherGAN
Machine learning is now used, to one degree or another, across many industries, and cryptanalysis is no exception. In this article, we will look at CipherGAN, a generative adversarial network used to recover the underlying encryption mapping from unpaired banks of ciphertext and plaintext.
CycleGAN
At the heart of CipherGAN is another generative adversarial network — CycleGAN. This type of network is used for image style transfer: for example, CycleGAN can be trained to convert images from one domain (e.g., Fortnite) to another, such as PUBG. Training is unsupervised, meaning there is no one-to-one pairing between images of the two domains.
This opens up the possibility of performing many interesting tasks, such as improving photo quality, coloring images, style transfer, etc. All you need is a source and target dataset (which is just a catalog of images). The general principle of this network is shown in the following diagram.
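The key idea that makes unpaired training work is cycle consistency: translating an image from X to Y and back again should reproduce the original, which is enforced by a reconstruction penalty. The sketch below illustrates that loss with toy linear "generators" standing in for CycleGAN's real convolutional networks (the matrices `G` and `F` are my own illustrative stand-ins, not part of any library):

```python
import numpy as np

# Toy "generators": linear maps standing in for CycleGAN's networks.
# G maps domain X -> Y, F maps Y -> X. In the real model both are deep
# networks trained jointly with adversarial discriminators.
rng = np.random.default_rng(0)
G = np.array([[2.0, 0.0], [0.0, 2.0]])  # X -> Y (here: scaling by 2)
F = np.linalg.inv(G)                    # Y -> X (here: the exact inverse)

def cycle_consistency_loss(x, G, F):
    """Mean L1 distance between x and F(G(x)) -- the cycle loss |F(G(x)) - x|."""
    x_reconstructed = (x @ G.T) @ F.T
    return np.abs(x_reconstructed - x).mean()

x = rng.normal(size=(4, 2))  # a batch of points from domain X
loss = cycle_consistency_loss(x, G, F)
print(loss < 1e-9)  # True: F inverts G exactly, so the cycle loss vanishes
```

During real training, `G` and `F` are imperfect, so this term is added to the adversarial losses and pushes the two generators toward being mutual inverses.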
But let's return to the main topic of our article — CipherGAN. As mentioned above, this network is used to determine the basic encryption mapping from banks of unpaired ciphertext and plaintext.
Substitution ciphers exhibit the property of confusion, which obscures the relationship between the encryption key and the resulting ciphertext. Confusion, together with diffusion, is a basic building block of modern symmetric cryptography. Thus, understanding the structure and vulnerabilities of substitution ciphers can form a basis for the cryptanalysis of modern ciphers. Table 2 presents various AI-based cryptanalysis studies.
CipherGAN, however, is capable of breaking language data encrypted with shift ciphers and the Vigenère cipher with high accuracy, and for vocabularies significantly larger than previously achieved.
Let's recall what the Vigenère cipher is. A table of alphabets, called the tabula recta or the Vigenère square (table), can be used for encryption. For the Latin alphabet, the Vigenère table consists of 26 rows of 26 characters, with each subsequent row cyclically shifted by one position relative to the previous one.
At each stage of encryption, different alphabets are used, chosen depending on the character of the keyword. For example, suppose the original text looks like this:
ATTACKATDAWN
The person sending the message writes the keyword ("LEMON") cyclically until its length matches the length of the original text:
LEMONLEMONLE
The first character of the plaintext ("A") is encrypted using row L, since "L" is the first character of the key: the first character of the ciphertext ("L") is at the intersection of row L and column A in the Vigenère table. Similarly, the second character of the plaintext uses the second character of the key; that is, the second character of the ciphertext ("X") is found at the intersection of row E and column T. The rest of the plaintext is encrypted in the same way.
Original text: ATTACKATDAWN
Key: LEMONLEMONLE
Encrypted text: LXFOPVEFRNHR
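The table lookup above is equivalent to adding the key letter to the plaintext letter modulo 26. A short Python sketch reproduces the worked example:

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Classic Vigenère: shift each plaintext letter by the matching key letter."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - ord('A')
        out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
    return ''.join(out)

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Inverse operation: subtract the key shift modulo 26."""
    out = []
    for i, ch in enumerate(ciphertext):
        shift = ord(key[i % len(key)]) - ord('A')
        out.append(chr((ord(ch) - ord('A') - shift) % 26 + ord('A')))
    return ''.join(out)

print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR
print(vigenere_decrypt("LXFOPVEFRNHR", "LEMON"))  # ATTACKATDAWN
```

Note that with a one-character key this degenerates into a simple shift (Caesar) cipher, the other cipher family CipherGAN is trained on.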
Of course, by modern standards, this is, to put it mildly, not the most powerful encryption algorithm, but for educational purposes it can be quite useful.
Installing CipherGAN
CipherGAN is written in Python, which makes working with this network much easier. To install, run:
pip install -r CipherGAN/requirements.txt
To train our neural network, we use data generators. Special attention should be paid to the cipher_generator (https://github.com/for-ai/CipherGAN/blob/master/data/data_generators/cipher_generator.py), which can be used to generate data for shift and Vigenère ciphers.
The settings required for generation are passed to the script as flags. For example, to generate word-level Vigenère cipher data (key "345") over the Brown corpus with a vocabulary of 200 words, run:
python CipherGAN/data/data_generators/cipher_generator.py \
  --cipher=vigenere \
  --vigenere_key=345 \
  --percentage_training=0.9 \
  --corpus=brown \
  --vocab_size=200 \
  --test_name=vigenere345-brown200-eval \
  --train_name=vigenere345-brown200-train \
  --output_dir=tmp/data \
  --vocab_filename=vigenere345_brown200_vocab.txt
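At the word level, the cipher operates on vocabulary indices rather than letters: each word is mapped to a token id, and the key shifts those ids modulo the vocabulary size. The sketch below is only an illustration of that idea under my own assumptions (the function name `word_level_vigenere` is hypothetical, not the repository's API — see cipher_generator.py for the actual implementation):

```python
def word_level_vigenere(token_ids, key_digits, vocab_size):
    """Illustrative sketch: shift each token id by the matching key digit,
    wrapping around modulo the vocabulary size."""
    return [(t + key_digits[i % len(key_digits)]) % vocab_size
            for i, t in enumerate(token_ids)]

# A toy sentence as word ids in a 200-word vocabulary, key "345"
# as in the generator flags above.
ids = [17, 42, 199, 3]
print(word_level_vigenere(ids, [3, 4, 5], 200))  # [20, 46, 4, 6]
```

This is why `--vocab_size` matters for the cipher itself, not just for tokenization: it defines the modulus of the substitution.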
Starting training
To train the neural network, you can use the train.py script shown below.
import shutil
import os

import tensorflow as tf

from .hparams.registry import get_hparams
from .models.registry import _MODELS
from .data.registry import _INPUT_FNS, get_dataset
from .metrics.registry import get_metrics
from .train_utils.lr_schemes import get_lr
from .train_utils.vocab_utils import read_vocab

tf.flags.DEFINE_string("model", "cycle_gan", "Which model to use.")
tf.flags.DEFINE_string("data", "cipher", "Which data to use.")
tf.flags.DEFINE_string("hparam_sets", "cipher_default", "Which hparams to use.")
tf.flags.DEFINE_string("hparams", "", "Run-specific hparam settings to use.")
tf.flags.DEFINE_string("metrics", "xy_mse",
                       "Dash separated list of metrics to use.")
tf.flags.DEFINE_string("output_dir", "tmp/tf_run", "The output directory.")
tf.flags.DEFINE_string("data_dir", "tmp/data", "The data directory.")
tf.flags.DEFINE_integer("train_steps", 1e4,
                        "Number of training steps to perform.")
tf.flags.DEFINE_integer("eval_steps", 1e2,
                        "Number of evaluation steps to perform.")
tf.flags.DEFINE_boolean("overwrite_output", False,
                        "Remove output_dir before running.")
tf.flags.DEFINE_string("train_name", "data-train*",
                       "The train dataset file name.")
tf.flags.DEFINE_string("test_name", "data-eval*", "The test dataset file name.")

FLAGS = tf.app.flags.FLAGS
tf.logging.set_verbosity(tf.logging.INFO)


def _run_locally(train_steps, eval_steps):
  """Run training, evaluation and inference locally.

  Args:
    train_steps: An integer, number of steps to train.
    eval_steps: An integer, number of steps to evaluate.
  """
  hparams = get_hparams(FLAGS.hparam_sets)
  hparams = hparams.parse(FLAGS.hparams)
  hparams.total_steps = FLAGS.train_steps

  if "vocab_file" in hparams.values():
    hparams.vocab = read_vocab(hparams.vocab_file)
    hparams.vocab_size = len(hparams.vocab)
    hparams.vocab_size += int(hparams.vocab_size % 2 == 1)

  hparams.input_shape = [hparams.sample_length, hparams.vocab_size]

  output_dir = FLAGS.output_dir
  if os.path.exists(output_dir) and FLAGS.overwrite_output:
    shutil.rmtree(FLAGS.output_dir)
  if not os.path.exists(output_dir):
    os.makedirs(output_dir)

  def model_fn(features, labels, mode):
    lr = get_lr(hparams)
    return _MODELS[FLAGS.model](hparams, lr)
Training is then launched with the following command:

python -m CipherGAN.train \
  --output_dir=runs/vig345 \
  --test_name="vigenere345-brown200-eval*" \
  --train_name="vigenere345-brown200-train*" \
  --hparam_sets=vigenere_brown_vocab_200
Conclusion
Of course, artificial intelligence still cannot crack AES or GOST. The CipherGAN network presented in this article can currently break only classical, educational ciphers such as the Vigenère cipher. In the future, however, neural networks may well become an effective assistant for cryptanalysts.
The article was prepared in anticipation of the start of the course "Cryptographic Information Protection".