AES-NI example programs

Intel processors since around 2010 support the AES-NI instruction set, which provides hardware acceleration for the AES block cipher.

There is plenty of AES-NI code out there, including in the Linux kernel and Intel's own sample code. However, I struggled to find a really clear, self-contained example of how these instructions work. Eventually I put together these programs as a test of my understanding, and as a demonstration which may be useful to others. The programs are very simple: there is little going on besides AES-NI, and they are thoroughly commented.

This code is available under a BSD-style license; see LICENSE.

Warning

Do not use this code in any context where actual security is required! This is just a demonstration, for learning purposes, of an AES-128 block encryption / decryption primitive. I've expended zero effort at the level of protocol or application security. Each block is encrypted the same way regardless of where it appears in the input stream. This is known as ECB mode, and it's very insecure: identical plaintext blocks always produce identical ciphertext blocks, so patterns in the input remain visible in the output. Furthermore, there may be implementation bugs, side channel exposures, etc. Even with a perfect implementation of AES, there are many ways to screw up using it.

It really should go without saying, but you should not use assembly programs from some random person on GitHub in your security-critical systems. I take absolutely no responsibility for what happens if you do (see LICENSE).

Usage

The encrypt program reads a 16-byte AES-128 key from standard input, followed by zero or more 16-byte plaintext blocks, and writes the corresponding ciphertext to standard output. The decrypt program works the same way, with the roles of plaintext and ciphertext reversed. Run build-and-test.sh to build both programs and then test them against a few hundred AES-128 test vectors.
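To make the input format concrete, here is a small Python sketch (not part of this repository) that assembles the byte stream described above and pipes it through the program. The helper names build_input and run_encrypt are hypothetical, and the path ./encrypt assumes the built binary sits in the current directory.

```python
import subprocess

BLOCK = 16  # AES block size and AES-128 key size, in bytes


def build_input(key, plaintext):
    """Return the stdin stream: a 16-byte key followed by whole 16-byte blocks."""
    if len(key) != BLOCK:
        raise ValueError("AES-128 key must be exactly 16 bytes")
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of 16 bytes")
    return key + plaintext


def run_encrypt(key, plaintext, program="./encrypt"):
    """Pipe key and plaintext through the binary and return its stdout.

    The default program path is an assumption about where the build
    leaves the executable.
    """
    result = subprocess.run(
        [program],
        input=build_input(key, plaintext),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Since the programs do no padding, the length checks happen on the caller's side here; the same stream layout, with ciphertext in place of plaintext, would apply to decrypt.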

Both programs are written in assembly for amd64 Linux, and will run without needing libc or any other libraries. If you want to port these programs to another amd64 platform (with a GNU-compatible assembler), all of the OS-specific code is at the top of common.s.

To see if your CPU supports AES-NI, check for the aes flag in /proc/cpuinfo.
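If you would rather script that check, a minimal Python sketch might look like the following. The function name cpu_has_aesni is made up for illustration; it takes the text of /proc/cpuinfo as a string so that it can be tried without touching the real file.

```python
def cpu_has_aesni(cpuinfo_text):
    """Return True if any 'flags' line in the given cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        # Compare whole words so that related flags such as 'vaes'
        # are not mistaken for 'aes'.
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False
```

On Linux you would call it as cpu_has_aesni(open("/proc/cpuinfo").read()).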

Other caveats: This code doesn't have any error handling. It assumes it can always read/write 16 bytes at a time, and doesn't handle EINTR. It doesn't handle input that is not a multiple of the AES block size, nor does it check the decrypted plaintext against any kind of padding spec. It doesn't prevent key material from being swapped to disk.

Details

In AES, a single block of data goes through a number of encryption or decryption rounds, each against a separate round key. The 16-byte round keys are derived from the user-specified key (also 16 bytes in AES-128) through a process called key expansion.

The macro key_expand in common.s computes a single round key, by invoking the AESKEYGENASSIST instruction and then calling the function key_combine for additional processing. The actual encryption / decryption rounds are performed by AESENC / AESDEC instructions, with a variant used on the last round. See the Intel white paper for more details on the individual instructions. That document also explains how to use AES-NI with 192- or 256-bit keys, which my code does not support.
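For readers who want to see what the hardware is computing, the following is a plain-software model of AES-128 encryption in Python, written from the AES specification (FIPS-197) rather than taken from the code in this repository. The functions aesenc and aesenclast are named after the instructions they imitate; expand_key and encrypt_block are hypothetical names. Inside expand_key, the rotate / substitute / round-constant step is the part that AESKEYGENASSIST performs, and the XOR chaining that follows is presumably the sort of work key_combine does. This sketch is slow and is not constant-time, so treat it purely as a reference for checking your understanding.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Derive the S-box: GF(2^8) inverse (0 maps to 0), then the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def expand_key(key):
    """Return the 11 round keys for AES-128; round key 0 is the user key."""
    if len(key) != 16:
        raise ValueError("AES-128 key must be 16 bytes")
    words = [key[i:i + 4] for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            # Rotate the word, substitute each byte, add the round constant.
            temp = bytes(SBOX[b] for b in temp[1:] + temp[:1])
            temp = bytes([temp[0] ^ rcon]) + temp[1:]
            rcon = gmul(rcon, 2)
        words.append(xor(words[i - 4], temp))
    return [b"".join(words[4 * r:4 * r + 4]) for r in range(11)]


def shift_rows(s):
    # The state is column-major: byte 4*c + r sits in row r, column c.
    return bytes(s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4))


def sub_bytes(s):
    return bytes(SBOX[b] for b in s)


def mix_columns(s):
    out = bytearray()
    for c in range(0, 16, 4):
        a = s[c:c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return bytes(out)


def aesenc(state, round_key):
    """Model of one full round: ShiftRows, SubBytes, MixColumns, add key."""
    return xor(mix_columns(sub_bytes(shift_rows(state))), round_key)


def aesenclast(state, round_key):
    """Model of the final round, which omits MixColumns."""
    return xor(sub_bytes(shift_rows(state)), round_key)


def encrypt_block(key, block):
    """Encrypt one 16-byte block: initial key XOR, nine rounds, last round."""
    round_keys = expand_key(key)
    state = xor(block, round_keys[0])
    for rk in round_keys[1:10]:
        state = aesenc(state, rk)
    return aesenclast(state, round_keys[10])
```

With the FIPS-197 Appendix C.1 vector (key 000102030405060708090a0b0c0d0e0f, plaintext 00112233445566778899aabbccddeeff), encrypt_block returns 69c4e0d86a7b0430d8cdb78070b4c55a, which is a convenient value to compare against when experimenting with the assembly programs.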

In these programs, almost all of the action happens in registers; we only use memory for calling read and write. Conveniently, an AES round key or data block is the same size as an SSE register (128 bits). The user-specified key goes in %xmm5, and the ten round keys derived from it are computed and stored in %xmm6 through %xmm15. %xmm0 through %xmm2 are used for various scratch purposes; %xmm3 and %xmm4 are unused.

The key expansion code in encrypt.s and decrypt.s is similar and could be combined using an additional macro, but I chose to leave them separate for clarity.

AES-NI itself is very fast. However, don't expect great performance from these programs, because they make two system calls per 16 bytes processed. A high-throughput version would buffer I/O to reduce system call overhead.
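As a rough sketch of what buffering could look like (in Python for brevity, not as a proposal for the assembly), the loop below reads and writes in large chunks and applies a per-block transform in between. The name process_stream, the transform callback, and the 64 KiB default are all assumptions made for illustration; transform stands in for whatever encrypts or decrypts one 16-byte block.

```python
BLOCK = 16  # AES block size in bytes


def process_stream(reader, writer, transform, chunk_size=64 * 1024):
    """Apply transform to every 16-byte block, doing I/O in large chunks.

    reader and writer are binary file-like objects; transform maps one
    16-byte block to one 16-byte block.
    """
    pending = b""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Only whole blocks are processed; a trailing partial block is
        # carried over to the next read, which also covers short reads.
        usable = len(pending) - len(pending) % BLOCK
        out = bytearray()
        for i in range(0, usable, BLOCK):
            out += transform(pending[i:i + BLOCK])
        writer.write(out)
        pending = pending[usable:]
    if pending:
        raise ValueError("input is not a multiple of the block size")
```

With a 64 KiB chunk, each read/write pair covers up to 4096 blocks instead of one, which is where the reduction in system call overhead would come from.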

Using AES-NI eliminates the need for AES lookup tables, which have been a source of cache-related timing side channel vulnerabilities. Indeed, my programs have no data-dependent control flow or memory access. (You can see this clearly in the disassembly, which is free of the usual libc noise and is actually quite readable.) However, I won't go so far as to claim there are no timing side channels, especially because I haven't performed any measurements.
