The Biggest Identity Sandpiles and How to Compute Them
When I wrote Beautiful Abelian Sandpiles I wanted to show off some nice images of large identity sandpiles. But the simple algorithm I used was horrendously slow. Showing a sandpile identity that was larger than 100 by 100 took multiple seconds! That's not good enough! I became obsessed with trying to find a faster way, all in an effort to compute bigger and bigger sandpile identities, bigger than anything anyone had seen before. In the end, I did exactly that.
Before I discuss anything else, I want to show off this gigantic sandpile identity that is 16384 by 16384. Here it is in all its glory! Feel free to zoom in and explore all the hidden detail; fullscreen gives the best experience.
This is coloured with the matplotlib inferno colourmap, so black, purple, orange, and yellow correspond to 0, 1, 2, and 3 grains of sand respectively. The sandpile identity was computed on an AMD Ryzen 7 4800H, and it took just under an hour, slightly beating the 10 days required to compute this 10 000 by 10 000 example which, prior to this, was the largest sandpile identity that I knew of. But how do we compute sandpiles quickly? Let's first get an overview of the current algorithms that exist and understand why they work!
Current Methods
There are essentially two different methods for computing an identity. The first is the method that I used in Beautiful Abelian Sandpiles which is given in pseudocode below:
def identity(width, height):
    sixes = [[6 for _ in range(width)]
             for _ in range(height)]
    sixes_stabilized = stabilize(sixes)
    difference = [[6 - sixes_stabilized[j][i]
                   for i in range(width)]
                  for j in range(height)]
    return stabilize(difference)
I will refer to this as the Difference method. The stabilize function topples unstable cells until every cell holds fewer than 4 grains. Creating a 2D array of 6s and subtracting two 2D arrays is cheap, so the two calls to stabilize dominate the running time of this method.
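For reference, stabilize itself can be written in a few lines of Python. This is a naive, unoptimized sketch (the optimized versions are discussed later); one small trick even here is toppling a cell holding \(k \ge 4\) grains \(\lfloor k/4 \rfloor\) times at once:

```python
def stabilize(grid):
    """Topple every cell with >= 4 grains until the grid is stable.

    Grains pushed across the boundary fall into the sinks and vanish,
    which is why the total amount of sand can decrease.
    """
    height, width = len(grid), len(grid[0])
    unstable = True
    while unstable:
        unstable = False
        for i in range(height):
            for j in range(width):
                if grid[i][j] >= 4:
                    unstable = True
                    topples = grid[i][j] // 4   # topple in bulk
                    grid[i][j] -= 4 * topples
                    if i > 0:
                        grid[i - 1][j] += topples
                    if i < height - 1:
                        grid[i + 1][j] += topples
                    if j > 0:
                        grid[i][j - 1] += topples
                    if j < width - 1:
                        grid[i][j + 1] += topples
    return grid
```

Repeatedly rescanning the whole grid like this is exactly the inefficiency that the later performance-tuning section attacks.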
The second method iteratively builds up the identity by adding sand around the edges in a special configuration called the Burning Configuration. Again, the pseudocode for this method is given below:
def add_burning_config(grid, height, width):
    for i in range(height):
        grid[i][0] += 1
        grid[i][-1] += 1
    for i in range(width):
        grid[0][i] += 1
        grid[-1][i] += 1

def identity(width, height):
    grid = [[0 for _ in range(width)]
            for _ in range(height)]
    while True:
        old_grid = clone(grid)
        add_burning_config(grid, height, width)
        stabilize(grid)
        if grids_equal(grid, old_grid):
            return grid
Since this method repeatedly adds the special burning configuration to the grid, I will refer to this method as the Iterated Burning method. In my experience, using this method directly is quite slow compared to the difference method, but it is more flexible which will be important later.
There are some other methods that I am aware of, but they are essentially modifications to either of the above methods. So if we can understand both the difference method and the iterated burning method, then we will have a good grasp of how to compute the sandpile identity.
Analysis
Before we begin, let's state some definitions.
Sandpile: For a particular graph \(G = (V, E)\), a sandpile configuration \(S: V \to \mathbb{N}_0\) is a mapping from each vertex of the graph to the number of grains of sand on that vertex. For the majority of this post \(G\) is the square grid and I will refer to a vertex in the square grid as a grid cell. Since \(|V| = n\) is finite, I will use \(S_i\) to mean the number of grains of sand on the \(i^{th}\) vertex. I will use \(\mathbb{N}^n\) to refer to the set of all sandpiles.
Stable / Unstable: A vertex \(v\) is stable in a sandpile \(S\) if \(S(v) < \text{deg}(v)\). In the square grid \(\text{deg}(v) = 4\) everywhere. This includes the edges and corners. The vertices at the edges and corners are considered to be connected to special vertices called _sinks_, which capture how sand "falls off" the edge. A sandpile is stable if all the vertices are stable, except sinks. I will denote the set of all stable sandpiles as \(\mathcal{S}\).
Stabilization: The process of taking an unstable sandpile and toppling the unstable vertices until there are no unstable vertices left. This process is guaranteed to terminate and toppling order is irrelevant. I will write the stabilization of \(S\) as \(\mathrm{stab}(S)\).
Recurrence: A sandpile \(S\) is recurrent if it can be reached from every stable sandpile via addition and stabilization.
\[ S\;\text{is recurrent} \iff \forall c \in \mathcal{S}: \exists z \in \mathcal{S}: \mathrm{stab}(c+z) = S \]
I will denote the set of recurrent sandpiles by \(\mathcal{R}\). The above definition of recurrence can be difficult to work with, so I'll state another without proof.
\[ S\;\text{is recurrent} \iff \exists z \in \mathcal{S}: \mathrm{stab}(c_{\text{max}}+z) = S \]
Where \(c_{\text{max}}\) is the sandpile that has three grains of sand in every cell. It is the maximal stable sandpile and it is trivially recurrent.
I will also state two simple lemmas about recurrence that we will use later:
\[ \forall i: S_i \ge 3 \implies \mathrm{stab}(S) \in \mathcal{R} \]
Which states that the stabilization of a sufficiently large sandpile is always recurrent. And
\[ S \in \mathcal{R} \land z \in \mathbb{N}^n \implies \mathrm{stab}(S + z) \in \mathcal{R} \]
Which states that a recurrent state plus anything is still recurrent after stabilization. In essence, this means that once you've entered the set \(\mathcal{R}\), you can never leave it via just addition and stabilization.
For sandpiles \(S_1, S_2, S_3 \in \mathcal{R}\), the following rules hold:
\[ \begin{align} & \text{Commutativity:} & \mathrm{stab}(S_1 + S_2) = \mathrm{stab}(S_2 + S_1) \\ & \text{Associativity:} & \mathrm{stab}\big(\mathrm{stab}(S_1 + S_2) + S_3\big) = \mathrm{stab}\big(S_1 + \mathrm{stab}(S_2 + S_3)\big) \\ & \text{Identity:} & \exists I \in \mathcal{R}: \mathrm{stab}(S_1 + I) = \mathrm{stab}(I + S_1) = S_1 \\ & \text{Inverse:} & \exists S_1^{-1} \in \mathcal{R}: \mathrm{stab}(S_1 + S_1^{-1}) = \mathrm{stab}(S_1^{-1} + S_1) = I \end{align} \]
Note that the associativity rule follows from stabilization being idempotent and that the order of stabilization is irrelevant. We can actually state a slightly more useful version of the associative property for sandpiles.
\[ \mathrm{stab}\big(\mathrm{stab}(S_1 + S_2) + S_3\big) = \mathrm{stab}(S_1 + S_2 + S_3) \]
These are the group laws, together with commutativity, which means that \(\mathcal{R}\) is an abelian group with respect to addition and stabilization. If we want to understand this abelian group, we can study structures isomorphic to it and transfer anything we learn back to our sandpiles. In particular, I want to associate each recurrent sandpile with a lattice over \(\mathbb{Z}^n\) and then study our identity calculation algorithms through the lens of these lattices.
To do this, we will turn our attention to the toppling operation and re-describe it with linear algebra. For the moment, we will ignore whether a particular topple is legal or not. We will describe a sandpile as a vector \(s\) and we will create a vector of topples \(t\) where \(t_i\) is the number of times we will topple cell \(s_i\). We can describe toppling with a matrix \(\Delta L\). In the literature, \(\Delta L\) is typically called the "Reduced Laplacian".
\[ \Delta L_{i,j} = \begin{cases} -4 & \text{if}\ i = j, \\ 1 & \text{if}\ i, j \ \text{are adjacent}, \\ 0 & \text{otherwise}. \end{cases} \]
Keep in mind that \(s\) and \(t\) are vectors that represent 2D grids.
Let's consider an example of a simple 2 by 2 sandpile \(s\) that is filled with all 6s, and we topple the top left and bottom right grid cells to get a new sandpile \(\hat{s}\). We map the grid to a vector in row-major order. This can now be described as follows:
\[ \begin{align} \hat{s} &= s + \Delta L \, t \\ &= \begin{bmatrix} 6 \\ 6 \\ 6 \\ 6 \end{bmatrix} + \begin{bmatrix} -4 & 1 & 1 & 0 \\ 1 & -4 & 0 & 1 \\ 1 & 0 & -4 & 1 \\ 0 & 1 & 1 & -4 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} 6 \\ 6 \\ 6 \\ 6 \end{bmatrix} + \begin{bmatrix} -4 \\ 2 \\ 2 \\ -4 \end{bmatrix} \\ &= \begin{bmatrix} 2 \\ 8 \\ 8 \\ 2 \end{bmatrix} \end{align} \]
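This small example is easy to check numerically (a quick numpy sanity check; the matrix and the row-major ordering are exactly as above):

```python
import numpy as np

# Reduced Laplacian of the 2x2 grid, cells in row-major order.
dL = np.array([[-4,  1,  1,  0],
               [ 1, -4,  0,  1],
               [ 1,  0, -4,  1],
               [ 0,  1,  1, -4]])

s = np.full(4, 6)            # the all-sixes sandpile
t = np.array([1, 0, 0, 1])   # topple top-left and bottom-right once
s_hat = s + dL @ t
print(s_hat)                 # [2 8 8 2]
```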
The vector \(t\) is sometimes called the odometer. We can now ask some interesting questions about the equation \(\hat{s} = s + \Delta L \, t\).
- What values of \(t\) cause \(\hat{s}\) to be stable?
- What values of \(t\) cause \(\hat{s}\) to be recurrent?
- How many such values of \(t\) are there, if any?
- What happens if we consider all integer values of \(t\) simultaneously?
To answer the first question we can find a stable value for \(\hat{s}\) by iteratively incrementing \(t_i\) wherever \(\hat{s}_i \ge 4\). This is what stabilize is doing. Doing so gives the following solution:
\[ \begin{align} \hat{s} &= \begin{bmatrix} 6 \\ 6 \\ 6 \\ 6 \end{bmatrix} + \begin{bmatrix} -4 & 1 & 1 & 0 \\ 1 & -4 & 0 & 1 \\ 1 & 0 & -4 & 1 \\ 0 & 1 & 1 & -4 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix} \\ &= \begin{bmatrix} 6 \\ 6 \\ 6 \\ 6 \end{bmatrix} + \begin{bmatrix} -4 \\ -4 \\ -4 \\ -4 \end{bmatrix} \\ &= \begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix} \end{align} \]
Notice that we can always find a stable sandpile in this way and so:
\[ \exists t^\star : \mathrm{stab}(s) = s + \Delta L \, t^\star \]
However, there may be other values of \(t\) that give stable sandpiles. If we keep incrementing entries in \(t\), then some entries of \(\hat{s}\) will go negative, but if we temporarily allow negative values, then we can create another stable \(\hat{s}\). If \(\forall i: t_i = 3\) then \(\forall i: \hat{s}_i = 0\), which is another stable sandpile. Note that while the all 2s sandpile is recurrent, the all 0s sandpile is not.
This brings me to the second and third questions, which have a somewhat surprising answer that I will state without proof. For any initial \(s\) there exists exactly one \(t\) such that \(\hat{s}\) is both stable and recurrent, i.e. only one \(\hat{s}\) corresponds to a sandpile in the abelian sandpile group.
Therefore, instead of dealing with sandpiles in the abelian sandpile group directly, we can instead think of all solutions to \(\hat{s} = s + \Delta L t\) for some given \(s\). The benefit is that the solutions for \(\hat{s}\) form a lattice over \(\mathbb{Z}^n\), and it is easier to think about a lattice over the integers than about the sandpile group. Another benefit is that any \(s \in \mathbb{Z}^n\) now characterizes a single recurrent sandpile: the unique recurrent member of the lattice associated with \(s\).
What is especially of interest to us right now is the identity of the sandpile group. So let us consider what happens when we study the lattice formed from \(s = 0\). As stated above, there exists a unique \(t'\) such that \(\hat{s} = 0 + \Delta L t'\) is stable and recurrent, i.e. a member of the sandpile group. Let's call this element \(I\). Let \(g\) be another element of the sandpile group. Then:
\[ \begin{align} g + I &= \mathrm{stab}\big(g + (0 + \Delta L t' )\big) \\ &= \mathrm{stab}(g + \Delta L t' ) \\ &= g + \Delta L t' + \Delta L t^\star \\ &= g + \Delta L ( t' + t^\star ) \end{align} \]
Recall that a recurrent sandpile plus anything must be recurrent. This means that
\[ g + I = g + \Delta L (t' + t^\star ) \]
must be recurrent. But, since there is a unique recurrent sandpile in each lattice and \(g\) is already recurrent, we must conclude that \(\Delta L (t' + t^\star) = 0\); since \(\Delta L\) is invertible, this forces \(t' = -t^\star\) and \(g + I = g\). This shows that \(I\) must be the identity of the sandpile group. The usefulness of our lattices immediately becomes clear because we can now study the identity by studying the lattice generated by \(s = 0\)! From here on I'll refer to this lattice as the zero-lattice.
You may have noticed that the calculation above never uses the fact that \(I\) is recurrent. It is only important that \(g\) is recurrent. So we can also consider \(B = 0 - \Delta L 1 \notin \mathcal{R}\), which acts like an identity on \(g\) without being the identity. This \(B\) is precisely the burning configuration which is used in the Iterated Burning method.
Pathfinding on the Zero-lattice
We are now ready to reinterpret the Difference method and the Iterated Burning method in terms of lattices. In their essence, both algorithms must solve two problems:
- Find a point on the zero-lattice.
- Move around on the lattice to find the unique recurrent sandpile.
Since the unique recurrent sandpile on the zero-lattice is the identity sandpile, an algorithm that can solve these two problems is an algorithm for computing a sandpile identity.
Difference Method
Let's first consider how the Difference method solves these two problems. Recall that the difference method starts with \(\forall i: s_i = 6\). It then computes the identity with the following formula:
\[ I = \mathrm{stab}( \underbrace{s - \mathrm{stab}( s )}_{\text{finds a large point on the zero-lattice}} ) \]
Let's rewrite this using our \(\Delta L\) matrix. The first and second stabilizations can be described with toppling vectors \(t_1\) and \(t_2\) respectively.
\[ \begin{align} I &= s - ( s + \Delta L t_1 ) + \Delta L t_2 \\ &= 0 + \Delta L ( t_2 - t_1 ) \end{align} \]
We can see that \(s\) cancels with itself, which puts us on the zero-lattice.
I will also claim that this solution is recurrent, which would solve problem 2. To see why this is the case, recall the lemma from earlier:
\[ \forall i: S_i \ge 3 \implies \mathrm{stab}(S) \in \mathcal{R} \]
Since the starting sandpile \(s\) is 6 everywhere and since stabilization ensures that no cell is greater than 3, then
\[ \begin{align} s_i - \mathrm{stab}(s)_i &= 6 - \mathrm{stab}(s)_i \\ &\ge 6 - 3 \\ &= 3 \end{align} \]
Which shows that the input to the second stabilization has at least 3 grains in every cell, so its result is recurrent. Problem two is therefore solved and we have the sandpile identity.
Iterated Burning Method
For the Iterated Burning method the problem of finding a point on the zero-lattice is trivial because we start with 0 in every grid cell. Therefore, we only need to focus on the second problem: ensuring that the algorithm eventually terminates with a recurrent sandpile.
Recall that \(B = -\Delta L 1\) is the burning configuration. Let \(s\) be a sandpile. Then the Iterated Burning method stops when the following condition is met:
\[ \mathrm{stab}(s + B) = s \]
We need to show that when this condition is met, then \(s\) must be recurrent. Again, we will use the argument that we can force \(s\) to be sufficiently large so that each cell is 3 or greater.
First, notice that we can apply the stopping condition to itself.
\[ \begin{align} \mathrm{stab}\big(\mathrm{stab}(s + B) + B\big) &= \mathrm{stab}(s + B) \\ \underbrace{\mathrm{stab}(s + 2B)}_{\text{associativity}} &= s \end{align} \]
We can repeat this argument to get
\[ \forall k \in \mathbb{N} : \mathrm{stab}(s + kB) = s \]
We can now let \(k \gg 0\) and partially stabilize, moving sand around until every cell has at least 3 grains of sand. Since this gives us a configuration with at least 3 grains of sand everywhere, its stabilization must be recurrent; but we also know that its stabilization is \(s\). So \(s\) must be recurrent, as desired.
We still haven't proven if this algorithm terminates. Since there are finitely many stable sandpiles, it is sufficient to show that the algorithm never gets stuck in a loop that doesn't satisfy the termination condition. Assume for the sake of contradiction that such a loop exists. Since the termination condition is not met, there must be at least two distinct sandpiles in the loop. Let these two sandpiles be \(s_1\) and \(s_2\) and let the size of the loop be \(l\). Then we have
\[ \begin{align} \mathrm{stab}(s_1 + lB) &= s_1 \\ \mathrm{stab}(s_2 + lB) &= s_2 \end{align} \]
From here we reuse the argument from above to show that both \(s_1\) and \(s_2\) are recurrent. But since \(s_1\) and \(s_2\) are in the same lattice, this contradicts the fact that there is a unique recurrent sandpile in each lattice. Hence, such a loop cannot exist; the Iterated Burning method must eventually terminate and correctly return the sandpile identity.
Building an Intuition
Using our new intuitions gained from our lattice interpretation, we can imagine ways to improve these methods:
- We could try different methods for finding a point on the zero-lattice. Ideally we want a cheap way of finding a point that is already close to the identity.
- Once we're on the zero-lattice, we could try different methods to move faster towards the identity.
Let's quickly consider the second point. In the original iterated burning method, we always add \(B\), but it's clear that our correctness proofs would still work with any multiple of \(B\). We could even use different multiples at each step. Taking inspiration from the exponential search algorithm we could double the multiple at each step. This gives a new method:
\[ \begin{align} s_0 &= 0 \\ s_{i+1} &= \mathrm{stab}(s_i + 2^i B) \end{align} \]
But we can actually achieve the same effect in a simpler way by just doubling \(s_i\).
\[ \begin{align} s_0 &= B \\ s_{i+1} &= \mathrm{stab}(2 s_i) \end{align} \]
The termination condition is the same as for the iterated burning method. Let's call this the Exponential Burning method. Simple benchmarking shows that this is much faster than the Iterated Burning method, but not significantly faster than the Difference method.
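As a sketch, the Exponential Burning method looks like this in plain, self-contained Python (stabilize and burning_config here are naive stand-ins for the optimized versions; the termination test is the same stab(s + B) = s condition as for Iterated Burning):

```python
import copy

def stabilize(grid):
    # Naive stabilization: topple any cell holding >= 4 grains.
    h, w = len(grid), len(grid[0])
    changed = True
    while changed:
        changed = False
        for i in range(h):
            for j in range(w):
                if grid[i][j] >= 4:
                    changed = True
                    n = grid[i][j] // 4
                    grid[i][j] -= 4 * n
                    if i > 0:
                        grid[i - 1][j] += n
                    if i < h - 1:
                        grid[i + 1][j] += n
                    if j > 0:
                        grid[i][j - 1] += n
                    if j < w - 1:
                        grid[i][j + 1] += n

def burning_config(height, width):
    # B = -dL . 1: one grain per missing neighbour (edges 1, corners 2).
    b = [[0] * width for _ in range(height)]
    for i in range(height):
        b[i][0] += 1
        b[i][-1] += 1
    for j in range(width):
        b[0][j] += 1
        b[-1][j] += 1
    return b

def identity_exponential_burning(width, height):
    B = burning_config(height, width)
    grid = copy.deepcopy(B)                            # s_0 = B
    while True:
        grid = [[2 * v for v in row] for row in grid]  # s_{i+1} = stab(2 s_i)
        stabilize(grid)
        # Termination test: stab(s + B) == s.
        test = [[grid[i][j] + B[i][j] for j in range(width)]
                for i in range(height)]
        stabilize(test)
        if test == grid:
            return grid
```

Every iterate stays on the zero-lattice (doubling and stabilizing only add \(\Delta L\)-images), so when the termination condition fires the result is the identity.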
But we can hope for more speed-ups by experimenting with different methods to find a point on the zero-lattice. We could build a set of heuristics to guess what the identity should be and use that as a starting point. Since the sandpile identities appear similar even at vastly different scales, we could imagine computing a small identity exactly, scaling it up to the desired size, and using that as the starting point. However, correctness of the resulting algorithm requires that our guess is on the zero-lattice, so even though we might have a guess that is close to the true identity, the scaled identity is not guaranteed to be on the zero-lattice. So this algorithm would not be correct.
But this idea is not hopeless. If instead of estimating the sandpile identity directly, we try to estimate its odometer \(t\), then we can easily ensure that we are on the zero-lattice. Recall that for the sandpile identity \(I\) there exists some \(t \in \mathbb{Z}^n\) such that \(I = \Delta L t\). In this case, any guess \(t' \in \mathbb{Z}^n\) would be valid. This gives us a lot of freedom in choosing our heuristics.
This avenue has already been explored with success by C. Fish in A GPU approach to the Abelian sandpile model. Here the author notes that despite the complexity of the identity sandpile, the odometer appears smooth by comparison. The author then approximates that odometer by a polynomial function with experimentally determined coefficients. The author calls this the Surface method. Using the polynomial estimate of \(t'\) as a starting point on the zero-lattice, the author was able to achieve about an order of magnitude speedup over the tested sizes. This shows the size of the impact we can hope to have by improving the initial starting point.
Benchmarking
Before moving on with more theory, let's take a more practical detour and see just how fast or slow these methods really are.
I ran each method on grid sizes from 64 by 64 to 1024 by 1024 in increments of 64. For each method, I computed the identity 3 times and took the average. Since each method depends on being able to stabilize a sandpile, I've spent some effort to give each method an optimized stabilize function which uses OpenMP and SIMD for parallelism.
Notice that iterated burning is quite slow compared to other methods. As we suggested earlier, adding multiples of \(B\) means that we converge to the identity sandpile faster, but the improvement is not significant enough to make this method competitive with the other methods. However, we do see that the exponential burning method is significantly faster than iterated burning and it even outcompetes the difference method on larger grids, but not by much.
It is surprising that these methods don't slow down monotonically as the grids get larger, but this is caused by an optimization to the stabilize method which can only be applied when the grid width is divisible by 128. This optimization is impactful enough to overcome the additional work to compute a slightly larger identity sandpile causing the compute time to decrease slightly. There's no theoretical reason why I couldn't apply this optimization to all grid sizes, but I am too lazy to deal with edge cases.
If we then ignore these alternating decreases in runtime, we can see that the time to compute the identity sandpile for the methods we have discussed is increasing rapidly, with the notable exception of My Method, which computes the final 1024 by 1024 grid in an average of 645ms. It's worth noting that my implementation of the surface method is slower than what is reported by C. Fish, but that is likely because their implementation is designed for the GPU. The largest known identity sandpile, up to this point, was computed with the difference method by Null Program (Chris Wellons) and it was reported to take 10 days! With my level of patience (low), neither the difference method nor the exponential burning method would work for computing a larger sandpile identity.
Luckily, my method, which builds on the surface method by C. Fish, is fast enough for my level of patience. The only drawback is that it currently only works for grid sizes that are a power of 2. Again, this is not a theoretical restriction. I am simply not bothered to deal with the edge cases that occur when the grid size is not a power of 2.
My Method
We have now seen how much faster my method is compared to the ones discussed above, and we have explained enough of the theory that we can start to build a faster method. We will rely on two insights:
- Approximating the odometer always gives a point on the zero-lattice.
- The identity sandpile appears scale invariant, meaning that a 256 by 256 identity sandpile looks similar to a scaled-up 128 by 128 identity.
With these two insights in hand, we will take a drastic change of course and instead solve a partial differential equation.
\[ \nabla^2 u(x, y) = f(x, y) \]
Which can also be written as
\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \]
This is the 2D Poisson equation. Unlike calculating large sandpile identities, there is a large and active body of research that aims to find fast methods to calculate numerical solutions to differential equations like the Poisson equation. To my surprise, exact discrete solutions for \(u\) can be found in \(\mathcal{O}(N \log N)\) time using the fast Fourier transform, and there are iterative methods that converge quickly enough to compute solutions in \(\mathcal{O}(N)\) time. In particular, a technique called "Multigrid" finds solutions in linear time. If only we could make use of these refined methods for our interests!
On closer inspection, we can notice that the Poisson equation and our sandpiles are closely related. Let us first consider how we would find a numerical solution to the Poisson equation. It is necessary to discretize it; I will use the second-order finite difference.
\[ \begin{align} \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} &\approx \frac{u(x+h, y) - 2u(x, y) + u(x-h, y)}{h^2} + \frac{u(x, y+h) - 2u(x, y) + u(x, y-h)}{h^2} \\ &= \frac{1}{h^2} \Big[ u(x+h, y) + u(x-h, y) + u(x, y+h) + u(x, y-h) - 4u(x, y) \Big] \end{align} \]
We can rewrite this compactly as a matrix vector product
\[ \frac{1}{h^2} A u \]
where
\[ A_{i,j} = \begin{cases} -4 & \text{if}\ i = j, \\ 1 & \text{if}\ i, j \ \text{are adjacent}, \\ 0 & \text{otherwise}. \end{cases} \]
Notice that \(A = \Delta L\). This means that the discretized Poisson equation, in the case \(h = 1\), is precisely the equation that relates a sandpile to its odometer, \(S = \Delta L \, t\). This gives us a way to use the FFT (fast Fourier transform) or Multigrid to solve for \(t\) in \(\mathcal{O}(N \log N)\) or \(\mathcal{O}(N)\) time respectively.
Solving for the odometer, however, doesn't immediately give us a fast method for finding the sandpile identity. What it does give us is a way to "project" any sandpile onto the zero-lattice. Recall that if \(t \in \mathbb{Z}^n\) then \(\Delta L \, t\) is a sandpile that is on the zero-lattice. If we are given an arbitrary sandpile \(S\), we can use Multigrid or FFT to solve \(S = \Delta L \, t_s\) where \(t_s \in \mathbb{R}^n\). We can then element-wise round \(t_s\) to get \(t_{s_0} \in \mathbb{Z}^n\), and then \(S_0 = \Delta L \, t_{s_0}\) is guaranteed to be on the zero-lattice. Since we don't need an exact solution, using Multigrid lets us project \(S\) onto the zero-lattice in \(\mathcal{O}(N)\) time. This means we can start with any guess we like for the sandpile identity and simply project it onto the zero-lattice.
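Here is a sketch of this projection on a small grid. The function names are mine, and a dense np.linalg.solve stands in for the Multigrid/FFT solver, since only the rounding step matters for the lattice argument:

```python
import numpy as np

def reduced_laplacian(h, w):
    # Assemble the reduced Laplacian (5-point stencil, sinks at the
    # boundary), cells in row-major order. For h = w = 2 this
    # reproduces the 4x4 matrix from the toppling example.
    n = h * w
    dL = np.zeros((n, n), dtype=int)
    for i in range(h):
        for j in range(w):
            k = i * w + j
            dL[k, k] = -4
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    dL[k, ni * w + nj] = 1
    return dL

def project_to_zero_lattice(S):
    # Solve S = dL t for a real-valued odometer, round it to integers,
    # and map back: the result is guaranteed to be on the zero-lattice.
    h, w = S.shape
    dL = reduced_laplacian(h, w)
    t = np.linalg.solve(dL, S.ravel().astype(float))
    t0 = np.rint(t).astype(int)            # element-wise round
    return (dL @ t0).reshape(h, w)
```

Because \(\Delta L\) is invertible, solving and rounding recovers an integer odometer exactly whenever \(S\) was already on the zero-lattice, and otherwise lands on a nearby lattice point.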
The next idea is that we will leverage the apparent scale invariance of the sandpile identity. Our initial guess for the \(n\) by \(n\) identity will be a scaled up version of the \(\frac{n}{2}\) by \(\frac{n}{2}\) identity. We can then project this initial guess onto the zero-lattice and then continue with the iterated burning method to converge to the true identity. This still requires us to have computed a smaller identity, but we can apply this method recursively and apply a difference method to resolve the base case.
The method is given as pseudocode below:
def identity(width, height):
    if width * height <= 2500:
        return identity_difference(width, height)
    small_identity = identity(width // 2, height // 2)
    upscaled_identity = upscale(small_identity, width, height)
    grid = project(upscaled_identity)
    while True:
        old_grid = clone(grid)
        add_burning_config(grid, height, width)
        stabilize(grid)
        if grids_equal(grid, old_grid):
            return grid

def project(sandpile):
    odometer = multigrid_solve(sandpile)
    rounded = [[round(v) for v in row] for row in odometer]
    height, width = len(sandpile), len(sandpile[0])
    projected = [[0 for _ in range(width)] for _ in range(height)]
    for i in range(height):
        for j in range(width):
            projected[i][j] = (
                -4 * rounded[i][j]
                + (rounded[i + 1][j] if i + 1 < height else 0)
                + (rounded[i - 1][j] if i - 1 >= 0 else 0)
                + (rounded[i][j + 1] if j + 1 < width else 0)
                + (rounded[i][j - 1] if j - 1 >= 0 else 0)
            )
    return projected
For upscale, I used bilinear interpolation, but nearest neighbour also works well in my testing. For brevity, I will omit a description of multigrid_solve, but if you want to learn more about how this works then I can highly recommend the first few chapters of "A Multigrid Tutorial", which is what I used to understand the technique.
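A nearest-neighbour upscale is only a few lines (a sketch; bilinear interpolation is what I actually used):

```python
def upscale(grid, new_width, new_height):
    # Nearest-neighbour upscaling: each new cell copies the old cell
    # covering the same relative position in the grid.
    old_h, old_w = len(grid), len(grid[0])
    return [[grid[i * old_h // new_height][j * old_w // new_width]
             for j in range(new_width)]
            for i in range(new_height)]
```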
If you just want the executive summary, regular iterative methods (like Gauss-Seidel) can only propagate information between neighbouring cells in each iteration. This means that large grids will need many iterations to ensure that information can flow from one side to the other. Multigrid solves this by transforming the input grid into a pyramid of grids. Each layer of the pyramid reduces the resolution of the grid. Low resolution grids solve for low frequency errors and the high resolution original grid only needs to solve for high frequency errors. This means that information can propagate globally in each iteration.
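To make that pyramid concrete, here is a minimal recursive V-cycle for the 1D Poisson problem \(-u'' = f\) with zero boundaries, using weighted-Jacobi smoothing, full-weighting restriction, and linear interpolation. This is an illustrative textbook sketch, not my tuned 2D solver, and it assumes grids of \(2^k - 1\) interior points:

```python
import numpy as np

def residual(u, f, h):
    # r = f - A u, where A is the standard 3-point Laplacian.
    up = np.pad(u, 1)  # zero Dirichlet boundaries
    return f - (2 * up[1:-1] - up[:-2] - up[2:]) / (h * h)

def smooth(u, f, h, sweeps=3):
    # Weighted Jacobi: cheap, and good at killing high-frequency error.
    for _ in range(sweeps):
        u = u + (2 / 3) * (h * h / 2) * residual(u, f, h)
    return u

def restrict(r):
    # Full-weighting restriction onto the next coarser grid.
    return 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e_coarse, n):
    # Linear interpolation of a coarse correction back to n fine points.
    e = np.zeros(n)
    e[1::2] = e_coarse
    ep = np.pad(e_coarse, 1)
    e[0::2] = 0.5 * (ep[:-1] + ep[1:])
    return e

def v_cycle(u, f, h):
    u = smooth(u, f, h)
    if len(u) > 3:
        # Low-frequency error looks high-frequency on a coarser grid,
        # so solve for it there and interpolate the correction back.
        r = restrict(residual(u, f, h))
        e = v_cycle(np.zeros((len(u) - 1) // 2), r, 2 * h)
        u = u + prolong(e, len(u))
    else:
        u = smooth(u, f, h, sweeps=20)  # "exact" solve on the tiny grid
    return smooth(u, f, h)
```

Each V-cycle touches every level of the pyramid once, so the work per cycle is linear in the grid size, and a handful of cycles is typically enough to shrink the residual by several orders of magnitude.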
Performance Tuning
As shown by the benchmark, this method is quite fast for smaller sandpiles, but it still slows down quickly as the size of the identity increases. The majority of the time is spent running stabilize. The performance of everything else is virtually irrelevant.
I tried many things to improve performance of this function, but I will only describe the tweaks that made a meaningful difference.
First, I started with an implementation of stabilize from Null Program (Chris Wellons). This uses AVX256 and OpenMP for parallelism and follows, as per his blog post, a "pull" based approach to parallelization. This is already orders of magnitude faster than the JavaScript implementation from my previous post, but when I profiled this implementation there were many L1d cache misses. The miss rate was sometimes as high as 10% on my hardware.
Focusing on the memory issue:
- I capped thread counts to be equal to my core count.
- I aligned all memory to a 64-byte boundary.
- I switched (against the advice of Null Program) to a "push" based approach to parallelism which lets me drop the grid copy. This halves the memory needed at the cost of requiring locks, but I can get away with using OpenMP atomic capture instead of mutexes, which performs well enough.
Since I had aligned the memory, I could also tweak the AVX256 instructions to use aligned loads instead of unaligned loads which increases throughput slightly. This might be slower on different hardware, because I now need to shuffle between SIMD lanes which is not free. On my hardware, however, it was faster than unaligned loads.
Additionally, linearly scanning over the whole grid to find unstable cells is very efficient when there are many unstable cells, but as the number of unstable cells decreases this becomes wasteful. The final few iterations of linear scan on a 16384 by 16384 grid can end up reading a quarter of a billion cells to only perform a couple thousand topples. Switching implementations when the unstable cells become sparse avoids this problem. For the sparse case, I maintain a bounded stack of unstable cells. Toppling a cell adds its neighbours to the stack. If the stack grows too large, then bottom entries are dropped since they are likely to be stale. This implementation cannot make use of AVX instructions (as far as I know), but it can still be parallelized with OpenMP.
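The sparse phase can be sketched as a worklist algorithm (single-threaded Python here; the real version is parallel C with a bounded stack that may drop stale entries):

```python
from collections import deque

def stabilize_sparse(grid, unstable):
    """Finish stabilizing when only a few cells are unstable.

    `unstable` is an iterable of (i, j) coordinates that may hold >= 4
    grains. Instead of rescanning the whole grid, we only revisit cells
    that a topple could have destabilized.
    """
    h, w = len(grid), len(grid[0])
    work = deque(unstable)
    while work:
        i, j = work.popleft()
        if grid[i][j] < 4:
            continue  # stale entry: already stabilized by an earlier pop
        n = grid[i][j] // 4
        grid[i][j] -= 4 * n
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < h and 0 <= nj < w:
                grid[ni][nj] += n
                if grid[ni][nj] >= 4:
                    work.append((ni, nj))
    return grid
```

The work done is proportional to the number of topples rather than the grid area, which is exactly the regime where the dense linear scan wastes its time.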
Finally, if the input to stabilization is symmetric, then the output is also symmetric. So we can get away with only stabilizing the top left corner and then mirroring the result. This is not valid when done naively, but if you tweak the toppling rules such that cells only lose 3 grains of sand if they are on the right or bottom edges then this gives the correct result after mirroring. The only drawback is that it only works on even grid sizes and since my method is recursive that means that all the sub-grids also need to be even, hence the method only works fully on grid sizes that are a power of 2. Additionally, for AVX to work reliably, I require the size of the top left quarter grid to be divisible by 64. This is what causes the non-monotonic performance characteristics in my benchmarks as exploitation of symmetry can only be done when the grid size is a multiple of 128. There are ways to apply the same trick to odd sized grids, but I am lazy. Obviously, working with a quarter of the data has a massive impact on performance since we can make better use of the cache and there is simply less work to do.
Overall, I was able to get about a 5x speedup over the stabilizer written by Null Program, which is a nice boost, but I think that a 10x improvement might still be possible. All benchmarking was done with this improved stabilize function to give a fair comparison.
Can we go Faster?
Even after all this work, I can't help but feel that it should be possible to compute the identity much faster. But I don't have much more time that I can give to this, so I'll leave it for another time, or for another person. There are certainly still low-hanging fruit on the implementation side:
- Using AVX512 instead of AVX256 if your CPU supports it.
- Running these methods on the GPU.
- Fine-tuning the Multigrid implementation to get better convergence rates.
- Taking full advantage of the 8-fold symmetry of \(\Delta L\) instead of just the simple 4-fold symmetry.
On the algorithmic side, there might be better ways to exploit the odometer than just using it for projection onto the zero-lattice. Perhaps other related sandpile models can give new ideas; the harmonic model in particular caught my eye, but I did not investigate it properly.
Always more to explore!
If you absolutely want to read the code I wrote for this and cannot be convinced otherwise, you can find it here. I make no claims of quality or readability. Read at your own risk.