3D Gaussian Splatting

A beginner friendly introduction to 3D Gaussian Splats and tutorial on how to train them.

3D Gaussian Splatting is a new method for novel-view synthesis of scenes captured with a set of photos or videos. It belongs to the family of Radiance Field methods (like NeRFs), but is simultaneously faster to train (at equal quality), faster to render, and reaches better or similar quality. It is also easier to understand and to post-process (more on that later). This is a beginner-friendly introduction to 3D Gaussian Splats and how to train them.

What are 3D Gaussian Splats?

At a high level, 3D Gaussian splats, like NeRFs or photogrammetry methods, are a way to create a 3D representation of a scene from a set of 2D images. Practically, this means that all you need is a video or a set of photos of a scene to obtain a 3D representation of it, enabling you to reshoot it or render it from any angle.

Here's an example of a capture I made. As input, I used 750 images of a plush toy, which I recorded with my phone from different angles.

Once trained, the model is a point cloud of 3D Gaussians. Here is the point cloud visualized as simple points.

But what are 3D Gaussians? They are a generalization of 1D Gaussians (the bell curve) to 3D. Essentially they are ellipsoids in 3D space, with a center, a scale, a rotation, and "softened edges".
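To make this concrete, here is a minimal sketch in Python (using NumPy; the parameterization and variable names are illustrative, not taken from the official codebase) of how a single 3D Gaussian can be built from a center, a scale, and a rotation, and then evaluated at a point:

import numpy as np

# Illustrative parameters of one 3D Gaussian (not the official storage format)
mu = np.array([0.0, 0.0, 0.0])        # center of the ellipsoid
scale = np.array([0.5, 0.2, 0.1])     # per-axis extent
theta = np.pi / 4                     # rotation around the z-axis, kept simple here
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

# The covariance combines rotation and scale: Sigma = R S S^T R^T
S = np.diag(scale)
Sigma = R @ S @ S.T @ R.T

def gaussian_falloff(x):
    # Unnormalized density: 1 at the center, smoothly fading outwards
    # (this is what gives the ellipsoid its "softened edges")
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

print(gaussian_falloff(np.array([0.1, 0.0, 0.0])))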

Each 3D Gaussian is optimized along with a (view-dependent) color and an opacity. When all of them are blended together, here's the visualization of the full model, rendered from any angle. As you can see, 3D Gaussian Splatting captures the fuzzy and soft nature of the plush toy extremely well, something that photogrammetry-based methods struggle to do.
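When rendering, each Gaussian is projected ("splatted") onto the image and the projections are alpha-blended front to back. Here is a minimal sketch of that compositing step for a single pixel, assuming the Gaussians have already been sorted by depth and reduced to a per-pixel color and opacity (all names are illustrative):

def composite_pixel(colors, alphas):
    # colors: per-Gaussian RGB contributions at this pixel, sorted front to back
    # alphas: matching opacities, after the Gaussian falloff has been applied
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that still passes through
    for color, alpha in zip(colors, alphas):
        for k in range(3):
            pixel[k] += transmittance * alpha * color[k]
        transmittance *= (1.0 - alpha)  # each Gaussian occludes the ones behind it
    return pixel

print(composite_pixel([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.6, 0.8]))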

How to train your own models? (Tutorial)

Important: before starting, check the requirements (OS & GPU) for training 3D Gaussian Splats here. In particular, you will need a CUDA-ready GPU with 24 GB of VRAM.
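A quick way to check your GPU model and available VRAM from a terminal (assuming NVIDIA drivers are already installed):

nvidia-smi --query-gpu=name,memory.total --format=csv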

Step 1: Record the scene

If you want to use the same model as me for testing (the plush toy), I have made all images, intermediate files and outputs available, so you can skip to step 2.

Recording the scene is one of the most important steps, because that is what the model will be trained on. You can either record a video (and extract the frames afterwards, as shown in the sketch after this list) or take individual photos. Be sure to move around the scene and to capture it from different angles. Generally, the more images you have, the better the model will be. A few tips to keep in mind to get the best results:

  • Avoid moving too fast, as it can cause blurry frames (which 3D Gaussian Splats will try to reproduce)
  • Try to aim for 200-1000 images. Fewer than 200 images will result in a low-quality model, and more than 1000 images will take a long time to process in step 2.
  • Lock the exposure of your camera. If it's not consistent between frames, it will cause flickering in the final model.
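If you recorded a video, you can extract frames with ffmpeg, for example (video.mp4 and the fps / quality settings below are placeholders, adjust them to your own footage):

mkdir -p input
ffmpeg -i video.mp4 -qscale:v 1 -vf fps=2 input/%06d.jpg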

Just for reference, I recorded the plush toy using a turntable and a fixed camera. You can find cheap ones on Amazon, like here. But you can also record the scene simply by moving around it.

Once you're done, place your images in a folder called input, like this:

📦 $FOLDER_PATH
 ┣ 📂 input
 ┃ ┣ 📜 000000.jpg
 ┃ ┣ 📜 000001.jpg
 ┃ ┗ 📜 ...

Step 2: Obtain Camera poses

Obtaining camera poses is probably the most finicky step of the entire process for inexperienced users. The goal is to obtain the position and orientation of the camera for each frame. This is called the camera pose. There are several ways to do so:

  • Use COLMAP. COLMAP is free and open-source Structure-from-Motion (SfM) software. It will take your images as input and output the camera poses. It comes with a GUI and is available on Windows, Mac, and Linux.
  • Use desktop software such as RealityCapture or Metashape (both commercial).
  • Use mobile apps such as Polycam or Record3D. They take advantage of the LiDAR sensor on recent iPhones to obtain the camera poses. Unfortunately, these are only available on iOS, on iPhones equipped with a LiDAR sensor.
Again, if you want to use the same model for testing, download the sample "sparse.zip" and skip to step 3.

Because it is free and open-source, we will show how to use COLMAP to obtain the camera poses.

First, install COLMAP: follow the instructions of the official installation guide.

From now on, we suggest two ways to obtain the camera poses: with an automated script, or manually with the GUI.

Download the code from the official repo. Make sure to clone it recursively to get the submodules, like this:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive

Then run the following script:

python convert.py -s $FOLDER_PATH

This will automatically run COLMAP and extract the camera poses for you. Be patient, as this can take a few minutes to a few hours depending on the number of images. The camera poses will be saved in a folder called sparse, and the undistorted images in a folder called images.
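Under the hood, the script runs a standard COLMAP pipeline. For illustration, the manual equivalent looks roughly like this (paths simplified; the actual script uses additional options, so treat this as a sketch rather than a drop-in replacement):

colmap feature_extractor --database_path database.db --image_path input
colmap exhaustive_matcher --database_path database.db
colmap mapper --database_path database.db --image_path input --output_path sparse
colmap image_undistorter --image_path input --input_path sparse/0 --output_path . --output_type COLMAP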

To visualize the camera poses, you can open the COLMAP GUI. On Linux, you can run colmap gui in a terminal. On Windows and Mac, you can open the COLMAP application.

Then select File > Import model and choose the path to the folder $FOLDER_PATH/sparse/0.

The folder structure of your model dataset should now look like this:

📦 $FOLDER_PATH
 ┣ 📂 (input)
 ┣ 📂 (distorted)
 ┣ 📂 images
 ┣ 📂 sparse
 ┃ ┣ 📂 0
 ┃ ┃ ┣ 📜 points3D.bin
 ┃ ┃ ┣ 📜 images.bin
 ┃ ┃ ┗ 📜 cameras.bin
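If you'd rather sanity-check the reconstruction from Python instead of the GUI, the pycolmap package (a separate pip install, not part of the gaussian-splatting codebase) can read the sparse model. A minimal sketch:

import pycolmap

# Load the sparse reconstruction written by COLMAP / convert.py
# (replace $FOLDER_PATH with the actual path to your dataset)
reconstruction = pycolmap.Reconstruction("$FOLDER_PATH/sparse/0")
print("registered images:", len(reconstruction.images))
print("3D points:", len(reconstruction.points3D))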

Step 3: Train the 3D Gaussian Splatting model

If you want to visualize my model, simply download the sample "output.zip" and skip to step 4.

If not already done, download the code from the official repo. Make sure to clone it recursively to get the submodules, like this:

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive

Installation is extremely easy as the codebase has almost no dependencies. Just follow the instructions in the README. If you already have a Python environment with PyTorch, you can simply run:

pip install plyfile tqdm
pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn

Once installed, you can train the model by running:

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output

Since my scene has a white background, I'm adding the -w option. This tells the training script that the base background color should be white (instead of black, the default).

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output -w

This will save the model in the $FOLDER_PATH/output folder.

The entire training (30,000 steps) will take about 30-40 minutes, but an intermediate model is saved after 7,000 steps and already looks great. You can visualize that model right away by following step 4.
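If you run out of VRAM or want to change when checkpoints are written, train.py exposes a few extra options. For example (flag names as found in the official repo at the time of writing; double-check with python train.py --help):

python train.py -s $FOLDER_PATH -m $FOLDER_PATH/output -w -r 2 --save_iterations 7000 15000 30000

Here -r 2 trains on images downscaled by a factor of 2 (lowering memory use), and --save_iterations controls at which steps intermediate models are written.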

Step 4: Visualize the model

The folder structure of your model dataset should now look like this:

📦 $FOLDER_PATH
 ┣ 📂 images
 ┣ 📂 sparse
 ┣ 📂 output
 ┃ ┣ 📜 cameras.json
 ┃ ┣ 📜 cfg_args
 ┃ ┣ 📜 input.ply
 ┃ ┗ 📂 point_cloud
 ┃ ┃ ┣ 📂 iteration_7000
 ┃ ┃ ┃ ┗ 📜 point_cloud.ply
 ┃ ┃ ┗ 📂 iteration_30000
 ┃ ┃ ┃ ┗ 📜 point_cloud.ply
To view the trained model, you need the SIBR viewer from the official repo:

  • If you're on Windows, download the pre-built binaries for the visualizer here.

  • On Ubuntu 22.04, you can build the visualizer yourself by running:

    # Dependencies
    sudo apt install -y libglew-dev libassimp-dev libboost-all-dev libgtk-3-dev libopencv-dev libglfw3-dev libavdevice-dev libavcodec-dev libeigen3-dev libxxf86vm-dev libembree-dev
    # Project setup
    cd SIBR_viewers
    cmake -Bbuild . -DCMAKE_BUILD_TYPE=Release  # add -G Ninja to build faster
    cmake --build build -j24 --target install

Once installed, find the SIBR_gaussianViewer_app binary and run it with the path to the model as argument:

SIBR_gaussianViewer_app -m $FOLDER_PATH/output

You get a beautiful visualizer of your trained model! Make sure to select Trackball mode for a better interactive experience.
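If you want to inspect the trained model programmatically, the saved point_cloud.ply can be opened with the plyfile package installed earlier. A minimal sketch (the attribute layout reflects the official export format, but it's worth verifying on your own file):

from plyfile import PlyData

# Load the trained Gaussians (replace $FOLDER_PATH with your actual path)
ply = PlyData.read("$FOLDER_PATH/output/point_cloud/iteration_30000/point_cloud.ply")
vertices = ply["vertex"]
print("number of Gaussians:", vertices.count)
print("stored attributes:", [p.name for p in vertices.properties])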

Any remaining questions? Don't hesitate to ask them in the comments here.