The idea of homomorphic encryption was first proposed in 1978 by Rivest et al. In 2009, Gentry’s seminal work provided a framework that made fully homomorphic encryption feasible, and almost a decade of subsequent work has made it realizable in practice. Although homomorphic encryption is now realistic, it remains several orders of magnitude too slow, which makes it expensive and resource intensive. No existing homomorphic encryption scheme performs well enough to allow large-scale practical use. Substantial effort has gone into developing full-fledged software libraries for homomorphic encryption. Such libraries include SEAL, Palisade, cuHE, HElib, NFLLib, Lattigo, and HEAAN. All of these libraries build on RLWE-based encryption, and they generally implement the Brakerski-Gentry-Vaikuntanathan (BGV), Fan-Vercauteren (FV), and Cheon-Kim-Kim-Song (CKKS) homomorphic encryption schemes with very similar parameters.
Because improvements and new schemes continue to appear in the field of HE, our design methodology is to provide a hardware suite that supports the operations and functionalities common across the different HE algorithms, instead of optimizing the hardware for one particular algorithm. This approach provides flexibility to experiment with new designs and ideas using the core operations for lattices, polynomials, arithmetic, logic, statistical sampling, and finite fields.
Word size directly affects the signal-to-noise ratio (SNR) with which a ciphertext is stored and manipulated during computation. No current hardware architecture natively supports the register sizes or the execution units needed for the fundamental mathematical and logical operations of lattice FHE schemes. To make matters worse, x86 natively operates on 64-bit registers and operands, and GPU architectures are built for 16-bit or 32-bit operands and operators. These hardware designs, which were built to solve other problems, impose heavy overhead on FHE computations that would natively be performed with registers and operators thousands of bits wide. We are designing an architecture with LAWS to address this limitation of today's state-of-the-art hardware, which hinders FHE computation.
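One common way to run wide-operand arithmetic on narrow machine words is a residue number system (RNS): a large integer is represented by its remainders modulo several small, pairwise-coprime moduli, each addition or multiplication runs independently on word-sized values, and the Chinese Remainder Theorem (CRT) recovers the wide result. The Python sketch below illustrates only this idea. The moduli and the function names (`to_rns`, `rns_add`, `rns_mul`, `from_rns`) are hypothetical choices made for illustration and are not taken from any library or from the architecture described here.

```python
from math import prod

# Illustrative pairwise-coprime moduli (four Mersenne primes, each < 2**31).
# These are assumptions for the sketch, not parameters of any real HE library.
MODULI = [2**13 - 1, 2**17 - 1, 2**19 - 1, 2**31 - 1]
Q = prod(MODULI)  # composite modulus of roughly 80 bits


def to_rns(x):
    """Represent x mod Q by its residues modulo each small modulus."""
    return [x % m for m in MODULI]


def rns_add(a, b):
    """Add two RNS values channel by channel; no carries cross channels."""
    return [(x + y) % m for x, y, m in zip(a, b, MODULI)]


def rns_mul(a, b):
    """Multiply two RNS values channel by channel."""
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]


def from_rns(residues):
    """Reconstruct the integer in [0, Q) from its residues using the CRT."""
    total = 0
    for r, m in zip(residues, MODULI):
        q_i = Q // m              # product of all the other moduli
        inv = pow(q_i, -1, m)     # modular inverse (Python 3.8+)
        total += r * q_i * inv
    return total % Q


a = 123456789012345678
b = 987654321
# Channel-wise arithmetic agrees with wide arithmetic modulo Q.
assert from_rns(rns_mul(to_rns(a), to_rns(b))) == (a * b) % Q
assert from_rns(rns_add(to_rns(a), to_rns(b))) == (a + b) % Q
```

Each channel in this sketch fits in a 32-bit word, so the per-channel operations map onto ordinary integer units; the cost moves into the conversion and reconstruction steps, which is part of the overhead that narrow-word hardware pays.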
This design methodology seeks to create an ISA based on the core mathematical foundations required by FHE algorithms and informed by research on how to optimize FHE schemes. The large number of finite field operations involved in virtually every known FHE scheme is a major computational bottleneck. The heavy use of sampling from various statistical distributions is also a computational concern that the methodology addresses.
It is not yet certain that any FHE scheme known today is optimal, nor that the algorithms eventually standardized will be among those known today. We therefore want a platform that provides significant performance gains in lattice-based FHE as well as the flexibility to experiment with new designs and ideas, using the core operations for lattices, polynomials, arithmetic, logic, and finite fields that our platform makes available as instructions.
Using our methodology, we introduce an open-source, first-of-its-kind arithmetic hardware library focused on accelerating the arithmetic operations involved in Ring Learning with Errors (RLWE)-based homomorphic encryption (HE). We design and implement a hardware accelerator consisting of submodules such as Residue Number System (RNS), Chinese Remainder Theorem (CRT), NTT-based polynomial multiplication, modulo inverse, modulo reduction, and all the other polynomial and scalar operations involved in HE.
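To make the NTT-based polynomial multiplication concrete, the following Python sketch multiplies two polynomials in the ring Z_q[x]/(x^n + 1) that RLWE schemes typically use. It is a software illustration of the general technique, not the hardware design of the library. The parameters (q = 7681, n = 8) and the function names (`find_psi`, `ntt`, `negacyclic_mul`, `negacyclic_schoolbook`) are hypothetical choices made for this example.

```python
def find_psi(n, q):
    """Find a primitive 2n-th root of unity modulo the prime q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0, "q must satisfy q = 1 (mod 2n)"
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:   # psi^n = -1, so psi has order exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, omega, q):
    """Recursive radix-2 NTT of a (length a power of two) with n-th root omega."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def negacyclic_mul(a, b, q, psi):
    """Multiply a and b modulo (x^n + 1, q) using forward and inverse NTTs."""
    n = len(a)
    omega = psi * psi % q
    psi_pows = [pow(psi, i, q) for i in range(n)]
    # Pre-scaling by powers of psi turns the cyclic convolution into a negacyclic one.
    a_t = ntt([x * p % q for x, p in zip(a, psi_pows)], omega, q)
    b_t = ntt([x * p % q for x, p in zip(b, psi_pows)], omega, q)
    c_t = [x * y % q for x, y in zip(a_t, b_t)]      # pointwise product
    c = ntt(c_t, pow(omega, -1, q), q)               # inverse transform, unscaled
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def negacyclic_schoolbook(a, b, q):
    """O(n^2) reference multiplication modulo (x^n + 1, q), used only for checking."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:                                    # x^n wraps around to -1
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c


q, n = 7681, 8                    # illustrative parameters only
psi = find_psi(n, q)
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert negacyclic_mul(a, b, q, psi) == negacyclic_schoolbook(a, b, q)
```

The NTT route needs O(n log n) modular multiplications instead of the O(n^2) of the schoolbook method, and the butterflies inside each stage are independent of one another, which is what makes the operation a natural candidate for both serial and parallel hardware implementations.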
For all of these operations, wherever possible, the library includes both a hardware-cost-efficient serial implementation and a fast parallel implementation. A modular and parameterized design approach makes customization easy and also provides the flexibility to extend these operations for use in most homomorphic encryption applications that fit well into emerging FPGA-equipped cloud architectures.
Using the submodules from the library, we prototype a hardware accelerator on an FPGA. The evaluation of this hardware accelerator shows speedups of approximately 4200× and 2950× for homomorphic multiplication and addition, respectively, compared to an existing software implementation.