CPU Mining

Mangonote employs a fair proof-of-work (PoW) mining algorithm that is efficient on consumer CPUs.

Mangonote uses the RandomX mining algorithm to enable CPU mining and provide ASIC/FPGA resistance. A description of the algorithm follows.

RandomX

RandomX is a proof-of-work (PoW) algorithm that is optimized for general-purpose CPUs. RandomX uses random code execution (hence the name) together with several memory-hard techniques to minimize the efficiency advantage of specialized hardware.

RandomX utilizes a virtual machine that executes programs in a special instruction set that consists of integer math, floating point math and branches. These programs can be translated into the CPU's native machine code on the fly (example: program.asm). At the end, the outputs of the executed programs are consolidated into a 256-bit result using a cryptographic hashing function (Blake2b).
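
For readers who want to try this locally, the sketch below hashes an arbitrary input with the C API of the reference RandomX library (randomx.h), running in light mode with a cache only. The key and input strings are placeholder values for the example, not anything defined by Mangonote; in practice the key is derived from the blockchain (e.g. a recent block hash) and the input is the block hashing blob.

  // Light-mode hashing sketch using the reference RandomX C API (randomx.h).
  #include <randomx.h>
  #include <cstdio>

  int main() {
      const char key[]   = "example RandomX key";      // placeholder key
      const char input[] = "example RandomX input";    // placeholder input

      randomx_flags flags = randomx_get_flags();       // detect usable CPU features
      randomx_cache *cache = randomx_alloc_cache(flags);
      randomx_init_cache(cache, key, sizeof key);      // 256 MiB cache (light mode)
      randomx_vm *vm = randomx_create_vm(flags, cache, nullptr);

      char hash[RANDOMX_HASH_SIZE];                    // 32 bytes = 256-bit result
      randomx_calculate_hash(vm, input, sizeof input, hash);

      for (char c : hash)
          std::printf("%02x", (unsigned char)c);
      std::printf("\n");

      randomx_destroy_vm(vm);
      randomx_release_cache(cache);
  }

Internally, randomx_calculate_hash generates and executes the random programs described above and consolidates their outputs with Blake2b.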

RandomX can operate in two main modes with different memory requirements:

  • Fast mode - requires 2080 MiB of shared memory.
  • Light mode - requires only 256 MiB of shared memory, but runs significantly slower.

Both modes are interchangeable as they give the same results. The fast mode is suitable for "mining", while the light mode is expected to be used only for proof verification.
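
As a rough sketch of the difference (again using the reference RandomX C API; a production miner would split the dataset initialization across several threads), fast mode only adds the 2080 MiB dataset on top of the light-mode setup shown earlier:

  // Fast-mode setup sketch (reference RandomX C API). Compared with the
  // light-mode example above, the only change is allocating and initializing
  // the full dataset and passing it to the VM instead of the cache.
  #include <randomx.h>
  #include <cstddef>

  randomx_vm *create_fast_vm(const void *key, size_t keySize,
                             randomx_dataset **datasetOut) {
      randomx_flags flags = randomx_get_flags() | RANDOMX_FLAG_FULL_MEM;

      randomx_cache *cache = randomx_alloc_cache(flags);
      randomx_init_cache(cache, key, keySize);

      randomx_dataset *dataset = randomx_alloc_dataset(flags);   // ~2080 MiB
      randomx_init_dataset(dataset, cache, 0, randomx_dataset_item_count());
      randomx_release_cache(cache);       // only the dataset is needed from here on

      *datasetOut = dataset;              // caller releases it with randomx_release_dataset
      return randomx_create_vm(flags, nullptr, dataset);
  }

A VM created this way is passed to randomx_calculate_hash exactly as in the light-mode example and produces identical hashes, only faster.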

To minimize the performance advantage of specialized hardware, a proof of work (PoW) algorithm must achieve device binding by targeting specific features of existing general-purpose hardware. This is a complex task because we have to target a large class of devices with different architectures from different manufacturers.

There are two distinct classes of general processing devices: central processing units (CPUs) and graphics processing units (GPUs). RandomX targets CPUs for the following reasons:

  • CPUs, being less specialized devices, are more prevalent and widely accessible. A CPU-bound algorithm is more egalitarian and allows more participants to join the network.
  • A large common subset of native hardware instructions exists among different CPU architectures. The same cannot be said about GPUs. For example, there is no common integer multiplication instruction for NVIDIA and AMD GPUs.
  • All major CPU instruction sets are well documented with multiple open source compilers available. In comparison, GPU instruction sets are usually proprietary and may require vendor specific closed-source drivers for maximum performance.

Design considerations

The most basic idea of a CPU-bound proof of work is that the "work" must be dynamic. This takes advantage of the fact that CPUs accept two kinds of inputs: data (the main input) and code (which specifies what to perform with the data).

Conversely, typical cryptographic hashing functions [3] do not represent suitable work for the CPU because their only input is data, while the sequence of operations is fixed and can be performed more efficiently by a specialized integrated circuit.

Dynamic proof of work

A dynamic proof of work algorithm can generally consist of the following 4 steps:

  1. Generate a random program.
  2. Translate it into the native machine code of the CPU.
  3. Execute the program.
  4. Transform the output of the program into a cryptographically secure value.

The actual 'useful' CPU-bound work is performed in step 3, so the algorithm must be tuned to minimize the overhead of the remaining steps.
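
The toy program below walks through the four steps end to end on a deliberately tiny, made-up instruction set, purely to make the structure concrete. The register file, the two-byte instructions and the interpretation loop are inventions for this example (real RandomX programs, the JIT compiler and the finalization are far more elaborate), and libsodium is assumed to be available for the BLAKE2b step.

  // Toy illustration of the four steps of a dynamic proof of work.
  // Step 2 is interpreted rather than JIT-compiled to keep the sketch short.
  #include <sodium.h>      // randombytes_buf, crypto_generichash (BLAKE2b)
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  int main() {
      if (sodium_init() < 0) return 1;

      // Step 1: generate a random program - every byte pattern is valid.
      // (In a real PoW the program bytes are derived deterministically from
      // the hash input so that verifiers reproduce the same program.)
      std::vector<uint8_t> program(256);
      randombytes_buf(program.data(), program.size());

      // Step 2 (translation to native machine code) is skipped here;
      // the buffer is interpreted directly in step 3 instead.

      // Step 3: execute the program against a small register file.
      uint64_t r[4] = {1, 2, 3, 4};
      for (size_t i = 0; i + 1 < program.size(); i += 2) {
          uint8_t op = program[i] % 3;
          uint8_t a  = program[i + 1] % 4;
          uint8_t b  = (program[i + 1] / 4) % 4;
          if      (op == 0) r[a] += r[b];
          else if (op == 1) r[a] *= r[b] | 1;
          else              r[a] ^= r[b] >> 7;
      }

      // Step 4: consolidate the output into a 256-bit value with BLAKE2b.
      unsigned char hash[32];
      crypto_generichash(hash, sizeof hash,
                         reinterpret_cast<const unsigned char *>(r), sizeof r,
                         nullptr, 0);
      for (unsigned char c : hash) std::printf("%02x", c);
      std::printf("\n");
  }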

Generating a random program

Early attempts at a dynamic proof of work design were based on generating a program in a high-level language, such as C or JavaScript. However, this is very inefficient for two main reasons:

  • High-level languages have a complex syntax, so generating a valid program is relatively slow since it requires the creation of an abstract syntax tree (AST).
  • Once the source code of the program is generated, the compiler will generally parse the textual representation back into the AST, which makes the whole process of generating source code redundant.

The fastest way to generate a random program is to use a logic-less generator - simply filling a buffer with random data. This of course requires designing a syntaxless programming language (or instruction set) in which all random bit strings represent valid programs.
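
A minimal sketch of this idea is shown below: fill a buffer with random bytes and reinterpret it as fixed-width instructions, so any bit string decodes to a valid program with no parsing and no AST. The 8-byte field layout loosely mirrors the RandomX instruction encoding, but the decoding and the seeding here are simplified placeholders for illustration.

  // Logic-less program generation: random bytes reinterpreted as instructions.
  #include <cstdint>
  #include <cstdio>
  #include <cstring>
  #include <random>
  #include <vector>

  struct Instruction {      // 8 bytes; every bit pattern is a valid instruction
      uint8_t  opcode;      // mapped onto a fixed operation table
      uint8_t  dst;         // destination register index
      uint8_t  src;         // source register index
      uint8_t  mod;         // addressing / rounding modifier
      uint32_t imm32;       // immediate operand
  };

  int main() {
      std::vector<uint8_t> buffer(256 * sizeof(Instruction));
      std::mt19937_64 rng(0xC0FFEE);   // placeholder; RandomX seeds its generator from the input
      for (auto &b : buffer) b = static_cast<uint8_t>(rng());

      Instruction ins;                 // decode the first few instructions
      for (size_t i = 0; i < 4; ++i) {
          std::memcpy(&ins, buffer.data() + i * sizeof ins, sizeof ins);
          std::printf("op=%3u dst=r%u src=r%u imm=0x%08x\n",
                      (unsigned)ins.opcode, (unsigned)(ins.dst % 8),
                      (unsigned)(ins.src % 8), ins.imm32);
      }
  }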

Translating the program into machine code

This step is inevitable because we don't want to limit the algorithm to a specific CPU architecture. In order to generate machine code as fast as possible, we need our instruction set to be as close to native hardware as possible, while still generic enough to support different architectures. There is not enough time for expensive optimizations during code compilation.
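
To make this step concrete, the sketch below hand-translates a single abstract "dst = src1 + src2" operation into x86-64 machine code at runtime and executes it; this is the core mechanism of a just-in-time compiler, stripped down to one instruction. It assumes an x86-64 POSIX system that permits a writable-and-executable mapping.

  // Minimal runtime code generation sketch (x86-64, System V ABI, POSIX mmap).
  #include <sys/mman.h>
  #include <cstdint>
  #include <cstring>
  #include <cstdio>

  int main() {
      const uint8_t code[] = {0x48, 0x89, 0xF8,   // mov rax, rdi
                              0x48, 0x01, 0xF0,   // add rax, rsi
                              0xC3};              // ret -> returns arg1 + arg2

      void *buf = mmap(nullptr, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buf == MAP_FAILED) return 1;            // some systems forbid W+X pages
      std::memcpy(buf, code, sizeof code);

      auto add = reinterpret_cast<uint64_t (*)(uint64_t, uint64_t)>(buf);
      std::printf("%llu\n", (unsigned long long)add(40, 2));   // prints 42

      munmap(buf, 4096);
  }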

Executing the program

The actual program execution should utilize as many CPU components as possible. Some of the features that should be utilized in the program are:

  • multi-level caches (L1, L2, L3)
  • μop cache
  • arithmetic logic unit (ALU)
  • floating point unit (FPU)
  • memory controller
  • instruction level parallelism
    • superscalar execution
    • out-of-order execution
    • speculative execution
    • register renaming

Calculating the final result

Blake2b is a cryptographically secure hashing function that was specifically designed to be fast in software, especially on modern 64-bit processors, where it's around three times faster than SHA-3 and can run at a speed of around 3 clock cycles per byte of input. This function is an ideal candidate to be used in a CPU-friendly proof of work.

For processing larger amounts of data in a cryptographically secure way, the Advanced Encryption Standard (AES) can provide the fastest processing speed because many modern CPUs support hardware acceleration of these operations.
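
For example, on x86 CPUs a full AES round is available as a single hardware instruction through the AES-NI extension. The snippet below applies one round to a 128-bit block; the block and round key are arbitrary example values, and the code must be built with AES support enabled (e.g. -maes with GCC/Clang).

  // One hardware-accelerated AES round via AES-NI (x86; compile with -maes).
  #include <wmmintrin.h>   // _mm_aesenc_si128
  #include <emmintrin.h>   // _mm_set_epi64x, _mm_store_si128
  #include <cstdint>
  #include <cstdio>

  int main() {
      __m128i block    = _mm_set_epi64x(0x0123456789abcdefULL, 0xfedcba9876543210ULL);
      __m128i roundKey = _mm_set_epi64x(0x0f0e0d0c0b0a0908ULL, 0x0706050403020100ULL);

      // SubBytes + ShiftRows + MixColumns + AddRoundKey in a single instruction.
      block = _mm_aesenc_si128(block, roundKey);

      alignas(16) uint64_t out[2];
      _mm_store_si128(reinterpret_cast<__m128i *>(out), block);
      std::printf("%016llx%016llx\n",
                  (unsigned long long)out[1], (unsigned long long)out[0]);
  }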