IACR eprint · Jun 2026
Character Block Encodings for Discrete CKKS: Single-Level LUTs and Low-Depth Arithmetic
Jules Dumezy, Elias Suvanto
Abstract
Functional bootstrapping has made discrete computation practical in the Cheon-Kim-Kim-Song (CKKS) scheme, but it fuses four distinct tasks - lookup table (LUT) evaluation, modular reduction, noise cleaning, and ciphertext refreshing - into a single rigid pipeline. As a consequence, a generic LUT over an alphabet of size $t$ costs multiplicative depth proportional to $\log_2 t$ and consumes a large share of the modulus budget during a fixed bootstrapping procedure, invoked each time a LUT evaluation or modular reduction is needed. We show that this pipeline can be unbundled by changing the representation, rather than optimizing the bootstrapping, through block encodings. A finite-alphabet value is carried across several CKKS slots whose coordinates form a basis of functions on the alphabet, typically the characters of a finite abelian group. In such a basis, every LUT is an affine plaintext map evaluated in a single multiplicative level, with depth independent of $t$. Modular reduction comes for free: a block encoding cannot represent anything but a residue, so arithmetic modulo $t$ is native. Because the encoded values lie on the unit circle, noise growth is independent of the alphabet size $t$. In the worst case, it matches the noise growth of standard discrete CKKS on the smallest alphabet $\mathbb Z_2$, and in more typical workloads it is linear in the number of operations, exponentially better than discrete CKKS at every $t > 2$. Noise cleaning becomes a constant-depth procedure of at most four levels, because the alphabet-dependent part is an LUT and only a fixed-degree-3 smoothstep is nonlinear. Finally, since LUTs are no longer part of the bootstrapping, refreshing reverts to its classical role as a maintenance operation invoked only to regain multiplicative depth. Any CKKS bootstrapping can be used, rather than a constrained and expensive pipeline. We instantiate the framework with several block encodings that make modular addition, modular multiplication, xor or min/max possible with a single CKKS multiplication. We use them to build CRT arithmetic over large composite moduli, and finite-state prefix scans for radix addition and subtraction in depth $4 + \lceil\log_2 d\rceil$ and for equality and comparison in depth $3 + \lceil\log_2 d\rceil$ for $d$ radix digits. For example, a 256-bit CRT modular addition or multiplication consumes a single multiplicative level and has a latency of 4.7 ms on a single thread.