Benchmarking zkVMs on Metamath Proof Checking

We evaluate the performance of zkVMs to validate mathematical proofs of on-chain transactions

Benchmarking zkVMs on Metamath Proof Checking

In a previous post, we introduced zkVMs as virtual machines that rely on zero-knowledge proofs to enable verifiable computation. We established their importance for building secure and efficient on-chain applications, allowing developers to prove the correctness of program execution without exposing sensitive information.

In this post, we’ll explore the zkVM ecosystem, dissecting some of the most prominent zkVMs, their tradeoffs, and how they contribute to implementing a core component of our proof-of-proof protocol.

Exploring zkVMs

The zkVM ecosystem has grown significantly in recent years, and this has introduced a variety of systems with unique designs and tradeoffs. As part of implementing our proof-of-proof protocol, we conducted an in-depth exploration of these zkVMs.

The goal of this research was twofold: to evaluate their performance against the tasks our protocol demands and to understand how different approaches to zero-knowledge computation compare in practice.

Through this analysis, we focused on seven prominent zkVMs, each bringing something unique to the table:

  • Risc Zero - The original RISC-V based zkVM. It uses a STARK proving system, and features GPU support. Risc Zero has multiple modes for creating proofs depending on verifier requirements, which we investigated separately:
    • Composite: Breaks computation into segments, producing a STARK for each segment.
    • Succinct: Reduces a composite proof into a single, more efficient STARK.
  • Jolt -  A zkVM created by a16z that centers its design around cryptographic lookup tables.
  • zkWasm - A zkVM designed for the web assembly binary format.  zkWasm utilizes a Halo2 circuit to enable verifiable computation.
  • Cairo - Unlike the others, Cairo is a domain-specific language from StarkWare, optimized for cryptographic computation rather than general-purpose programming.
  • Nexus -  A zkVM supporting the RISC-V architecture, but using the Hypernova proof system to differentiate itself.
  • SP1 - Developed by Succinct Labs, SP1 is another RISC-V zkVM leveraging Plonky3 cryptographic tooling. Like Risc Zero, it features two proof modes which are Core and Compressed.
  • Lurk - A domain-specific ZK language rooted in LISP. Lurk is designed with recursion in mind, offering a unique approach to verifiable computation.

Our Application: A Metamath Proof Checker

To encode mathematical proofs, our proof-of-proof system uses a formal system called Metamath. This system is flexible enough to represent any claim, while being simple enough that a Rust implementation of the checker is only several hundred lines of code.

We implemented checkers for Metamath in all of the zkVMs we tried, and ran them on a variety of Metamath files. These included simple files, (for example, hol_wov.mm proves a basic application of modus ponens), as well as files representing specific tasks from runs of our Proof of Proof system, like verifications of ERC20 transfers implemented in IMP and Solidity. We ran these checker codes using open source provers, following the recommendations of teams involved regarding which provers would be easiest to install and run on our system.

Comparison of zkVMs

Our overview of all the data we collected, along with the code that produced it, can be found in Pi Squared’s zk-benchmark repository.

Below, we present a comparative analysis of prover times for some of the most performant zkVMs we inspected and tested on various Metamath files.

Important Notes on the Data

There are a few important caveats in discussing results. The following are some of them:

  1. Programming models: Most zkVMs tested here use our Rust implementation of the Metamath checker. However, the Cairo and Lurk systems operate with their own domain-specific languages (DSLs). These differences mean the comparisons are not always one-to-one. Detailed documentation of these DSL-based zkVMs can be found here and here.
  2. Prover variations: For some zkVMs, multiple versions were tested. For instance, we compared CPU and GPU-based provers, and those that compress proofs cryptographically or produce larger proofs more quickly. Compression impacts both prover and verification time.
  3. Cairo deprecation: The Cairo prover tested here is based on an open-source prover from Lambdaworks. The latest prover from StarkWare, Stwo, is still a work in progress, so we opted for this prover as an alternative, but because this prover is now deprecated, the associated numbers may be outdated.
  4. zkVM evolution: The zkVM space evolves rapidly, with new provers frequently offering improved performance. Therefore, the benchmarks presented here may not remain valid for long. We encourage PRs to our benchmarking repository that add new versions of provers to the mix!

Experiment Design

We conducted two main experiments:

  1. Basic logic files: These are simple Metamath files encoding basic logical proofs.
  2. ERC20 transfer components: These represent segments of an ERC20 transaction executed within Pi Squared's Proof-of-Proof system.

Results

Note that some data is missing in cases where the prover ran out of memory or otherwise could not complete the proof of the benchmark file within the 15-minute timeout window we set. To see the full CSV data for this table, you can refer to the zk-benchmark-repository.

(The relevant files are hol_idi.mm, hol_wov.mm, hol_ax13.mm, hol_cbvf.mm, 45.erc20transfer_success_tm_0_6.mm, 25.erc20transfer_success_tm_0_9.mm, 3.erc20transfer_success_tm_0.mm, and 9.erc20transfer_success.mm)

The four smallest files are demonstrations of simple logical facts within Metamath. The latter four files are segments of an execution of an EVM transaction from Pi Squared’s own proof generation framework.

Bulk Experiments on Metamath Files

In addition, we ran bulk tests on 1225 Metamath files generated by our system. These results provide further insights into the performance of zkVMs under various conditions.

We summarize results for these tests in the following charts:

Input Size vs Prover Time: This chart illustrates prover time plotted against input size across all tested zkVMs, along with a linear trendline computed for each zkVM.

Metamath Size vs ZK Proof Time: This chart focuses exclusively on GPU-based zkVMs, demonstrating how they handle Metamath files of varying complexity.

Takeaways and Lessons

Through this process, we’ve gained valuable insights into the capabilities and limitations of current zkVMs:

  1. GPU acceleration matters: When supported, GPU acceleration significantly improves prover performance, particularly for larger or more complex files. This makes it a critical consideration for applications requiring scalability.
  2. Memory constraints: Many zkVMs have strict memory limits by design. For memory-intensive applications, it’s essential to choose a system that can handle your workload without compromising efficiency or reliability.
  3. Development environment and optimization: Leveraging a familiar programming environment (like Rust in our case) can greatly enhance the ability to fine-tune performance. Familiarity with the ecosystem often translates into improved productivity and optimized outcomes.

Building the Future with zkVMs

One of the best things about the zkVM space is that the software is getting faster all the time! Thanks to the work these teams put in, users can expect their code to perform better with every update to the stack and application-specific optimization. We are looking forward to the ZK space continuing to improve as refinements to both the cryptography and the applications programmed on top of it accumulate over time.

At Pi Squared, we’re excited to be part of this journey. We envision a world where users of our Universal Settlement Layer can come to the space with their own trust levels in existing zkVM systems, consider tradeoffs between performance, and ultimately decide what is best for them.

Ready to dive into zkVMs? Check out our zk-benchmark repository to explore the data behind this post, or head to our docs to learn more about how we’re driving interoperability and innovation in the zkVM space.