Barnes-Hut N-Body Simulation in CUDA

15-418/618 Spring 2026 Final Project
Andrew Koulogeorge

Home | Midterm | Final Report | GitHub


Summary

I am going to implement the Barnes-Hut N-body simulation algorithm from scratch in CUDA and do detailed profiling and analysis to understand its behavior and optimize its performance through several iterative implementations. I will evaluate performance across different problem sizes, initial body configurations, and approximation thresholds.

Background

The Barnes-Hut algorithm estimates pairwise gravitational forces between a collection of N bodies by approximating groups of bodies sufficiently far away as a single body with a corresponding center of mass. This reduces the complexity of the algorithm from O(N²) to O(N log N).

The input data to the algorithm is a collection of points in 3D space. The main computational loop includes: (1) constructing an octree where each node represents a region of space, (2) computing the center of mass for each node in the tree, and (3) traversing the tree to compute the gravitational force on each particle. Barnes Hut Overview

Learning Goal

Gain experience in performance engineering and analysis in a setting outside a problem set with standard performance benchmarks. This project is very similar in flavor to one of the 3 programming labs during the semester. With a longer timeline and less pressure to hit a fixed performance threshold, I am excited to gain experience with profiling tools and obtain deep insight into various aspects of this parallel program.

The Challenge

The structure of the tree is data-dependent and changes across time steps, creating several parallelism challenges (non exhaustive list):

Resources

I will use the NVIDIA GPUs available on the GHC cluster machines. I am starting the implementation from scratch. I plan to reference various online resources and papers related to Barnes-Hut implementations in CUDA, but not simply reimplement them. I want to explore my own ideas for optimization based on my own measurments. Barnes-Hut on Wikipedia Barnes-Hut on GPU

Goals and Deliverables

Plan to Achieve: Implement a sequential Barnes-Hut algorithm in C++. Implement a naive CUDA version and at least two optimized CUDA version driven by profiling data. Evaluate how each implementation scales across: number of bodies, initial spatial configuration, and the theta threshold for force approximation. Compare my best implementation against published state-of-the-art GPU Barnes-Hut implementations.

Hope to Achieve: Analyze performance of CUDA code on a different GPU generations hypothesize about causes of performance differences based on architectural details.

Poster Deliverable: Present a comparison of my implementations against a sequential baseline with analysis of the measurements and profiling data that motivated each optimization. Included detailed analysis of my best implementation. Include comparison against existing state-of-the-art implementations.

Schedule


Milestone Report

Coming soon.


Final Report

Coming soon.