Arun Patro

benchmarking rust-vs-cpp for graphics

Introduction

The graphics class at NYU was a great motivation for me to play with compiled languages like C++ and Rust. Here, I write 3 simple graphics programs in C++ (the class default) and rewrite them in Rust. I try to replicate the code structure, floating-point precision and types as closely as possible, but obvious differences in programming philosophy remain, such as OOP vs. struct/impl/trait and functional style. This benchmark runs purely on a CPU, an i7-1068NG7. GPUs involve shading languages and more thoughtful logic. I would also like to extend these studies to modern graphics tech like WebGPU, Vulkan and Metal 3, and also target the wasm architecture.

Tasks

We deal with 3 tasks:

  1. Ray Tracing
  2. Ray Tracing with BVH acceleration
  3. Rasterization on CPU

TLDR

Rust is faster than C++ at runtime in these benchmarks, but it compiles more slowly and produces a larger binary.


1> Ray Tracing

What is Ray Tracing?

Ray tracing simulates light rays and their interactions with objects to model how a scene may appear in real life. The trick here is to trace the rays backwards: each ray starts at the camera, passes through a pixel on the screen and hits objects in the scene, which determines the color of that pixel.

This is a very intensive computation, which is usually CPU bound. It can be parallelized, because each pixel value is independent of the others.
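As a rough illustration of the per-pixel work, here is a minimal sketch of casting one primary ray per pixel and intersecting it with spheres. The names (`Vec3`, `Ray`, `Sphere`, `intersect`, `primary_ray`, `render_mask`) and the pinhole camera setup are assumptions made for this example, not the types used in the benchmarked programs.

```rust
// Hypothetical minimal types for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 { x: f64, y: f64, z: f64 }

impl Vec3 {
    fn new(x: f64, y: f64, z: f64) -> Self { Vec3 { x, y, z } }
    fn sub(self, o: Vec3) -> Vec3 { Vec3::new(self.x - o.x, self.y - o.y, self.z - o.z) }
    fn dot(self, o: Vec3) -> f64 { self.x * o.x + self.y * o.y + self.z * o.z }
    fn normalized(self) -> Vec3 {
        let len = self.dot(self).sqrt();
        Vec3::new(self.x / len, self.y / len, self.z / len)
    }
}

struct Ray { origin: Vec3, dir: Vec3 }
struct Sphere { center: Vec3, radius: f64 }

/// Distance along the ray to the nearest hit in front of the origin, if any.
/// Assumes `ray.dir` has unit length.
fn intersect(ray: &Ray, s: &Sphere) -> Option<f64> {
    let oc = ray.origin.sub(s.center);
    let b = oc.dot(ray.dir);
    let c = oc.dot(oc) - s.radius * s.radius;
    let disc = b * b - c; // discriminant of t^2 + 2bt + c = 0
    if disc < 0.0 {
        return None;
    }
    let sq = disc.sqrt();
    let (t0, t1) = (-b - sq, -b + sq);
    if t0 > 0.0 { Some(t0) } else if t1 > 0.0 { Some(t1) } else { None }
}

/// Primary ray through the center of pixel (px, py).
/// Assumed camera: pinhole at the origin looking down -z, image plane at z = -1.
fn primary_ray(px: usize, py: usize, width: usize, height: usize) -> Ray {
    let aspect = width as f64 / height as f64;
    let u = ((px as f64 + 0.5) / width as f64 * 2.0 - 1.0) * aspect;
    let v = 1.0 - (py as f64 + 0.5) / height as f64 * 2.0;
    Ray { origin: Vec3::new(0.0, 0.0, 0.0), dir: Vec3::new(u, v, -1.0).normalized() }
}

/// Hit mask for the whole image. Each pixel depends only on its own ray,
/// which is why the loop body could run in parallel.
fn render_mask(width: usize, height: usize, spheres: &[Sphere]) -> Vec<bool> {
    let mut out = Vec::with_capacity(width * height);
    for py in 0..height {
        for px in 0..width {
            let ray = primary_ray(px, py, width, height);
            out.push(spheres.iter().any(|s| intersect(&ray, s).is_some()));
        }
    }
    out
}
```

A real tracer would keep the nearest hit and shade it instead of storing a boolean, but the loop structure stays the same.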

We render the following image, which consists of 7 spheres, 1 plane and 7 lights. The objects have Lambertian, specular or reflective materials, which determine how their color is computed.
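To give a feel for how a material affects the color calculation, the sketch below shows the Lambertian (diffuse) term and the mirror direction used for reflective surfaces. The function names and the array-based vector type `V3` are assumptions for this example. The specular term is left out because the exact model used in the benchmarked code is not stated here.

```rust
// Hypothetical helper type: a 3D vector as a plain array.
type V3 = [f64; 3];

fn dot(a: V3, b: V3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

/// Lambertian term: cosine of the angle between the unit surface normal
/// and the unit direction towards the light, clamped at zero.
fn lambert(normal: V3, to_light: V3) -> f64 {
    dot(normal, to_light).max(0.0)
}

/// Mirror reflection of the incoming unit direction `d` about the unit normal `n`.
/// A reflective material would trace a new ray along this direction.
fn reflect(d: V3, n: V3) -> V3 {
    let k = 2.0 * dot(d, n);
    [d[0] - k * n[0], d[1] - k * n[1], d[2] - k * n[2]]
}
```

With 7 lights, the diffuse contribution at a hit point would be the sum of `lambert` over the lights, each scaled by the light and surface colors.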

Results:

                  compile time   runtime 1600x800   runtime 2400x1200   binary size
C++               6.077s         3.110s             6.70 +- 0.46s       109KB
C++ (G++ LTO)     -              -                  4.60 +- 0.49s       70KB
Rust              1m 23s         1.567s             2.778s              1.5MB
Rust (LTO)        54.227s        3.101s             6.914s              760KB
Rust (Thin LTO)   1m 15s         2.056s             4.170s              1.4MB

2> Ray Tracing with BVH acceleration

What is a Bounding Volume Hierarchy (BVH)?

Complex geometries can be represented as a collection of primitive elements like triangles. When the scene has many objects, intersecting a ray with all of them can be really intensive. It helps to recursively partition the space or the objects into hierarchies, and to search only the partitions that the ray intersects. This can bring the cost per ray down to logarithmic in the number of objects.

A BVH is an acceleration data structure that subdivides the objects (instead of space). We create an axis-aligned bounding box (AABB) for each object and group them together using a bottom-up approach.
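The two building blocks of such a structure can be sketched as follows: computing and merging AABBs, and testing a ray against a box with the common slab method. The `Aabb` type and its method names are assumptions for this example, and the slab test is one standard choice rather than necessarily the one used in the benchmarked code.

```rust
/// Hypothetical axis-aligned bounding box, stored as min and max corners.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Aabb {
    min: [f64; 3],
    max: [f64; 3],
}

impl Aabb {
    /// Tightest box around a triangle given by its three vertices.
    fn from_triangle(v: [[f64; 3]; 3]) -> Aabb {
        let mut b = Aabb { min: v[0], max: v[0] };
        for p in &v[1..] {
            for a in 0..3 {
                b.min[a] = b.min[a].min(p[a]);
                b.max[a] = b.max[a].max(p[a]);
            }
        }
        b
    }

    /// Smallest box enclosing both boxes. In a bottom-up build this is
    /// the box of a parent node created from two children.
    fn union(self, o: Aabb) -> Aabb {
        let mut b = self;
        for a in 0..3 {
            b.min[a] = b.min[a].min(o.min[a]);
            b.max[a] = b.max[a].max(o.max[a]);
        }
        b
    }

    /// Slab test: does the ray origin + t * dir, with t >= 0, hit the box?
    fn hit(&self, origin: [f64; 3], dir: [f64; 3]) -> bool {
        let (mut t_min, mut t_max) = (0.0_f64, f64::INFINITY);
        for a in 0..3 {
            let inv = 1.0 / dir[a];
            let mut t0 = (self.min[a] - origin[a]) * inv;
            let mut t1 = (self.max[a] - origin[a]) * inv;
            if inv < 0.0 {
                std::mem::swap(&mut t0, &mut t1);
            }
            // Intersect the running interval with this axis' slab.
            t_min = t_min.max(t0);
            t_max = t_max.min(t1);
            if t_max < t_min {
                return false;
            }
        }
        true
    }
}
```

During traversal, a ray that misses a node's box can skip every triangle below that node, which is where the speed-up comes from.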

We render the following dragon, which consists of 856294 triangles. For this task, we slightly modify the code and assume all lights are visible, ignoring shadow rays, which simplifies the task. (Note: one may need to additionally comment out 2 LOC in the Rust code.)

Results:

                  compile time   runtime 1600x800   runtime 2400x1200   binary size
C++               6.077s         3.110s             6.70 +- 0.46s       109KB
C++ (G++ LTO)     -              -                  4.60 +- 0.49s       70KB
Rust              1m 23s         1.567s             2.778s              1.5MB
Rust (LTO)        54.227s        3.101s             6.914s              760KB
Rust (Thin LTO)   1m 15s         2.056s             4.170s              1.4MB

3> Rasterization on CPU

What is Rasterization?

Ray tracing can be computationally intense. A trick to speed things up is to project the objects onto the camera plane instead. In this process we discretize the vector objects into pixels. This task is highly parallelizable and can leverage GPUs. Rasterization is quite an involved pipeline and consists of Vertex, Geometry, Fragment and Blending Shaders. Each of these shaders is restricted in what it can do and hence can be parallelized very efficiently.
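The discretization step can be sketched with edge functions: a pixel center is covered when it lies on the inner side of all three triangle edges. This is a minimal CPU sketch under the assumption that the triangle is already projected to pixel coordinates. The names `P2`, `edge` and `rasterize` are made up for this example and do not mirror the benchmarked pipeline.

```rust
// Hypothetical 2D point in pixel coordinates.
type P2 = (f64, f64);

/// Twice the signed area of triangle (a, b, p). The sign tells on which
/// side of the edge a -> b the point p lies.
fn edge(a: P2, b: P2, p: P2) -> f64 {
    (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
}

/// Coverage mask of one triangle on a width x height grid,
/// sampled at pixel centers.
fn rasterize(tri: [P2; 3], width: usize, height: usize) -> Vec<bool> {
    let mut mask = vec![false; width * height];
    let area = edge(tri[0], tri[1], tri[2]);
    if area == 0.0 {
        return mask; // degenerate triangle covers nothing
    }
    for y in 0..height {
        for x in 0..width {
            let p = (x as f64 + 0.5, y as f64 + 0.5);
            // Barycentric weights; dividing by the area makes the test
            // independent of the triangle's winding order.
            let w0 = edge(tri[1], tri[2], p) / area;
            let w1 = edge(tri[2], tri[0], p) / area;
            let w2 = edge(tri[0], tri[1], p) / area;
            mask[y * width + x] = w0 >= 0.0 && w1 >= 0.0 && w2 >= 0.0;
        }
    }
    mask
}
```

The same weights `w0`, `w1`, `w2` can be used to interpolate per-vertex attributes such as depth or color across the triangle. A practical rasterizer would also restrict the loops to the triangle's bounding box.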

The above dragon is re-rendered much faster!

Results:

                  compile time   runtime 1600x800   runtime 2400x1200   binary size
C++               6.077s         3.110s             6.70 +- 0.46s       109KB
C++ (G++ LTO)     -              -                  4.60 +- 0.49s       70KB
Rust              1m 23s         1.567s             2.778s              1.5MB
Rust (LTO)        54.227s        3.101s             6.914s              760KB
Rust (Thin LTO)   1m 15s         2.056s             4.170s              1.4MB

Observations

  1. Rust is faster!
  2. C++ compiles faster
  3. C++ produces a smaller binary
  4. Rust with link-time optimization (LTO) compiles faster than Rust without LTO ??
  5. Rust with LTO runs slower than Rust without LTO ??
  6. How can thin LTO be faster than fat LTO ??

Additional Comments