Mikael Olsson; Niklas Gullberg , pp. 67. COM/School of Computing, 2012.
Context. The cryptographically secure pseudo-random number generator Blum Blum Shub (BBS) is a simple algorithm with a strong security proof, however it requires very large numbers to be secure, which makes it computationally heavy. The Graphics Processing Unit (GPU) is a common vector processor originally dedicated to computer-game graphics, but has since been adapted to perform general-purpose computing. The GPU has a large potential for fast general-purpose parallel computing but due to its architecture it is difficult to adapt certain algorithms to utilise the full computational power of the GPU.
Objectives. The objective of this thesis was to investigate if an implementation of the BBS pseudo-random number generator algorithm on the GPU would be faster than a CPU implementation.
Methods. In this thesis, we modelled the performance of a multi-precision number system with different data types; to decide which data type should be used for a multi-precision number system implementation on the GPU. The multi-precision number system design was based on a positional number system. Because multi-precision numbers were used, conventional methods for arithmetic were not efficient or practical. Therefore, addition was performed by using Lazy Addition that allows larger carry values in order to limit the amount of carry propagation required to perform addition. Carry propagation was done by using a technique derived from a Kogge-Stone carry look-ahead adder. Single-precision multiplication was done using Dekker splits and multi-precision modular multiplication used Montgomery multiplication.
Results. Our results showed that using the floating-point data type would yield greater performance for a multi-precision number system on the GPU compared to using the integer data type. The performance results from our GPU bound BBS implementation was about 4 times slower than a CPU version implemented with the GNU Multiple Precision Arithmetic Library (GMP).
Conclusions. The conclusion made from this thesis, is that our GPU bound BBS implementation, is not a suitable alternative or replacement for the CPU bound implementation.