Let’s kill two birds with one stone: explain how encryption in AES works and discuss why AES hardware acceleration is so much more efficient than a software implementation.
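Advanced Encryption Standard (AES) takes 16-byte (128-bit) blocks of input data and encrypts each block in rounds: 10, 12, or 14 of them for 128-, 192-, and 256-bit keys respectively. Before the 1st round the block goes through an extra Add round key step. After that, every round consists of 4 main stages, except the last round, which skips the column-mixing stage. Those stages are: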
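- Substitute bytes: each byte is replaced with a predefined one taken from a fixed 256-entry lookup table (the S-box),
- Shift rows: the 16 bytes form a 4×4 grid whose columns are the data words, and each row is rotated to the left by a different offset (0, 1, 2, or 3 bytes),
- Mix columns: the 4 bytes of each data word (column) are combined into 4 new bytes by a fixed matrix multiplication over the finite field GF(2^8),
- Add round key: the data is XORed with a key specific to the round, which is derived from the main key by the key schedule.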
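To make the four stages more concrete, here is a minimal Python sketch of AES-128 (10 rounds) that follows the structure described above. It is an illustration only and assumes the column-major byte layout used in FIPS-197; all function names are my own. It is slow, not constant-time, and should not be used to protect real data.

```python
# Minimal AES-128 sketch for illustration only: slow, not constant-time, not for real use.
# The 16-byte block is stored column-major as in FIPS-197: state[r + 4*c] is row r, column c.

def xtime(a: int) -> int:
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF

def build_sbox() -> list:
    """S-box: multiplicative inverse in GF(2^8) (0 maps to 0), then a fixed affine transform."""
    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    return [x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63 for x in inv]

SBOX = build_sbox()

def sub_bytes(state):
    # Substitute bytes: every byte goes through the same lookup table.
    return [SBOX[b] for b in state]

def shift_rows(state):
    # Shift rows: row r is rotated left by r positions.
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def mix_columns(state):
    # Mix columns: each 4-byte column is multiplied by a fixed matrix over GF(2^8).
    out = []
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c:4 * c + 4]
        out += [
            gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
            a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
            a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
            gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
        ]
    return out

def add_round_key(state, round_key):
    # Add round key: plain XOR with the 16-byte round key.
    return [s ^ k for s, k in zip(state, round_key)]

def expand_key_128(key: bytes) -> list:
    """AES-128 key schedule: 44 four-byte words, returned as 11 round keys of 16 bytes."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([x ^ y for x, y in zip(words[i - 4], temp)])
    return [sum(words[4 * k:4 * k + 4], []) for k in range(11)]

def aes128_encrypt_block(key: bytes, block: bytes) -> bytes:
    if len(key) != 16 or len(block) != 16:
        raise ValueError("AES-128 needs a 16-byte key and a 16-byte block")
    round_keys = expand_key_128(key)
    state = add_round_key(list(block), round_keys[0])  # extra key addition before round 1
    for rnd in range(1, 10):  # rounds 1..9: all four stages
        state = add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_keys[rnd])
    # Round 10 (last): no Mix columns stage
    state = add_round_key(shift_rows(sub_bytes(state)), round_keys[10])
    return bytes(state)
```

For example, encrypting the block `00112233445566778899aabbccddeeff` with the key `000102030405060708090a0b0c0d0e0f` should give `69c4e0d86a7b0430d8cdb78070b4c55a`, the AES-128 example vector from FIPS-197, Appendix C.1. Every stage here loops over bytes one at a time, which is exactly the cost the hardware discussion below is about.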
Now let’s look at the advantages a hardware implementation has over software. These are:
- parallel processing,
- the simplicity of implementing some of the operations,
- chain processing (pipelining) in hardware.
Parallel processing means that substituting or XORing the bytes can be done for all bytes at the same time. A plain software implementation has to work through the data piece by piece: on a 32-bit processor, XORing the 16-byte block takes at least 4 instructions (one per 32-bit word), and substitution takes many more, since each of the 16 bytes needs its own table lookup. In hardware, each of these operations can be applied to all bytes within a single clock cycle.
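One rough way to see the difference is to count operations. The Python sketch below XORs a 16-byte block with a round key in three ways: byte by byte, as four 32-bit words (as a plain 32-bit processor would), and as one 128-bit value that stands in for a 16-byte-wide hardware datapath. The function names are hypothetical, and the returned counts are a simple model, not a timing measurement: Python itself does none of this in a single clock cycle.

```python
import struct

def xor_bytewise(block: bytes, key: bytes):
    # One XOR per byte: 16 operations.
    out = bytearray(16)
    ops = 0
    for i in range(16):
        out[i] = block[i] ^ key[i]
        ops += 1
    return bytes(out), ops

def xor_wordwise(block: bytes, key: bytes):
    # Treat the block as four 32-bit words: 4 operations.
    b = struct.unpack("<4I", block)
    k = struct.unpack("<4I", key)
    words = [x ^ y for x, y in zip(b, k)]
    return struct.pack("<4I", *words), len(words)

def xor_128bit(block: bytes, key: bytes):
    # Model of a 128-bit-wide datapath: the whole block in a single operation.
    v = int.from_bytes(block, "little") ^ int.from_bytes(key, "little")
    return v.to_bytes(16, "little"), 1
```

All three return the same 16 bytes; only the modelled operation count changes (16, 4, and 1).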
But that’s just the beginning. Shifting rows in hardware takes no clock cycles at all, because it is implemented purely by wiring: the outputs of the 1st stage (Substitute bytes) are connected to the appropriate inputs of the 3rd stage (Mix columns). In software, the bytes have to be physically moved around, which costs extra instructions and processor time.
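Shift rows never changes the value of a byte, only its position. The Python sketch below (hypothetical names, assuming the FIPS-197 column-major layout where index `r + 4*c` is row `r`, column `c`) precomputes that mapping once as a table of source indices. Applying it is pure data movement, the software counterpart of the fixed wires a hardware design places between stages.

```python
# SHIFT_ROWS_SRC[i] is the input position whose byte ends up at output position i.
# Row r of the 4x4 grid is rotated left by r, so this table never changes.
SHIFT_ROWS_SRC = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]

def shift_rows(block: bytes) -> bytes:
    # No arithmetic on the data: each output byte is simply copied from its source position.
    return bytes(block[src] for src in SHIFT_ROWS_SRC)
```

For instance, `shift_rows(bytes(range(16)))` returns the bytes 0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11. In hardware this table is fixed at design time, so the "copy" costs nothing; in software every one of those 16 moves is real work.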
Last but not least, in hardware, as soon as the 1st block has passed a stage, the next block can enter it. In other words, all 4 stages can work at the same time on 4 different data blocks, provided the blocks can be processed independently of each other. In software, these would be 4 different functions executed one after another.
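The gain from pipelining can be estimated with a toy model. The Python sketch below assumes every stage takes exactly one clock cycle and uses the 4-stage split from this article (real designs may divide the work differently, for example one pipeline stage per round). It simulates the pipeline cycle by cycle and compares it with running the stages back to back; the function names are hypothetical.

```python
def sequential_cycles(n_blocks: int, n_stages: int = 4) -> int:
    # Each block goes through all stages before the next block starts.
    return n_blocks * n_stages

def pipelined_cycles(n_blocks: int, n_stages: int = 4) -> int:
    stages = [None] * n_stages  # stages[k] = id of the block currently in stage k
    next_block = finished = cycles = 0
    while finished < n_blocks:
        # Start of a cycle: every block moves one stage forward and a new block enters stage 0.
        entering = next_block if next_block < n_blocks else None
        if entering is not None:
            next_block += 1
        stages = [entering] + stages[:-1]
        cycles += 1
        # End of the cycle: the block in the last stage has completed all stages.
        if stages[-1] is not None:
            finished += 1
    return cycles
```

For 100 independent blocks the model gives 400 cycles without pipelining and 103 with it: once the pipeline is full, one block comes out every cycle, so the cost approaches 1 cycle per block instead of 4.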
To sum up, AES seems almost tailor-made for dedicated hardware. That’s why many modern processors and microcontrollers include hardware acceleration for this encryption standard.
Do share your views on AES hardware acceleration in the comments.