Sparse sets
A sparse set is a simple data structure that has a few interesting properties:

O(1) to add an item.

O(1) to remove an item.

O(1) to look up an item.

O(1) to clear the set.

O(n) to iterate over the set, where n is the number of items currently in the set.

A set does not require its internal items storage to be initialised upon creation (!).
Sparse sets are commonly used to implement the Entity Component System (ECS) architectural pattern. I plan to cover ECS in one of my future posts, but for now let’s try to understand and implement sparse sets.
Overview
Sparse sets use two integer arrays internally: dense and sparse.
The former is a packed array that stores the set’s items (integers) in the insertion order.
The latter is an array that can have holes (hence the name – sparse) and it maps the set’s items to their indices in dense.
The set also keeps track of how many items it has.
We call it N.
Sparse sets could be implemented as growable, i.e. able to reallocate the memory they use, but I will use a non-growable version in this post for simplicity. This means the user has to specify the size of the set upon creation.
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/** An item stored in a sparse set. */
typedef uint32_t rho_ss_id;
/** Maximum possible value of the rho_ss_id. */
static const rho_ss_id rho_ss_id_max = UINT32_MAX - 1;
/** The sparse set. */
struct rho_ss {
    rho_ss_id *sparse; /**< Sparse array used to speed-optimise the set. */
    rho_ss_id *dense; /**< Dense array that stores the set's items. */
    rho_ss_id n; /**< Number of items in the dense array. */
    rho_ss_id max; /**< Maximum allowed item ID. */
};
An empty set
We start with an empty set that accepts item IDs from 0 to max.
Consider a sparse set that can hold up to ten items.
N is zero, dense and sparse are allocated but not initialised.
struct rho_ss rho_ss_alloc( rho_ss_id max_id )
{
    assert( max_id > 0 && max_id <= rho_ss_id_max );
    /* One slot per possible ID, 0..max_id inclusive. */
    size_t array_size = sizeof( rho_ss_id ) * ( max_id + 1 );
    struct rho_ss ss = { 0 };
    ss.dense = malloc( array_size );
    ss.sparse = malloc( array_size );
    if ( !ss.dense || !ss.sparse ) {
        /* Allocation failed: release whatever we got and return an empty set. */
        free( ss.dense );
        free( ss.sparse );
        ss.dense = NULL;
        ss.sparse = NULL;
    } else {
        ss.max = max_id;
    }
    return ss;
}
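The iteration example later in this post calls rho_ss_free, which is not shown here. A minimal sketch of what such a function might look like, assuming it only has to release the two arrays and reset the struct. The body is my guess, not the original implementation, and the typedef and struct are repeated so the block compiles on its own:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t rho_ss_id;

struct rho_ss {
    rho_ss_id *sparse;
    rho_ss_id *dense;
    rho_ss_id n;
    rho_ss_id max;
};

/* Assumed counterpart to rho_ss_alloc: frees both arrays and zeroes
 * the struct, so calling it twice on the same set is harmless. */
void rho_ss_free( struct rho_ss *ss )
{
    assert( ss );
    free( ss->dense );  /* free( NULL ) is a no-op */
    free( ss->sparse );
    ss->dense = NULL;
    ss->sparse = NULL;
    ss->n = 0;
    ss->max = 0;
}
```

Resetting the pointers to NULL means a set whose allocation failed can be passed to this function as well.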
Adding a first item
Let’s add 4 as a first item to the set.
First, we add the item to the dense array at index 0 (this is the current value of N).
Then we write N to the sparse array at index 4.
Now both slots in the dense and sparse arrays point to each other.
Lastly, we increase N by one.
Adding a second item
Let’s add 6 as a second item.
The steps are the same.
We put 6 into the next free slot of the dense array, write N to the sparse array at index 6, and increase N by one.
The sparse array has holes, while the dense array items are placed next to each other.
Adding a third item
Now we add 0 as a third item.
There is nothing new here.
The item we added is 0, so we put its dense index (which is 2) into the sparse array at index 0.
And this is how we can implement it:
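void rho_ss_add( struct rho_ss *ss, rho_ss_id id )
{
    assert( ss );
    assert( id <= ss->max );
    if ( rho_ss_has( ss, id ) ) {
        return; /* already in the set, nothing to do */
    }
    ss->dense[ss->n] = id;  /* store the item in the next free dense slot */
    ss->sparse[id] = ss->n; /* remember where it went */
    ++ss->n;
}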
The rho_ss_has function checks if the given item is in the sparse set.
We implement it in the next section.
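To see the three insertions from the walkthrough in one place, here is a small trace, written as a sketch rather than as part of the library. The names trace_set, trace_add, trace_walkthrough and CAPACITY are made up for this illustration. It uses fixed-size arrays that start zeroed (unlike the malloc'ed arrays above) and leaves out the duplicate check:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

enum { CAPACITY = 10 }; /* hypothetical fixed size: IDs 0..9 */

/* Hypothetical fixed-size stand-in for struct rho_ss. */
struct trace_set {
    uint32_t dense[CAPACITY];
    uint32_t sparse[CAPACITY];
    uint32_t n;
};

/* Performs the three steps from the walkthrough and prints the result.
 * Unlike rho_ss_add it has no duplicate check, so add each ID only once. */
void trace_add( struct trace_set *s, uint32_t id )
{
    assert( s && id < CAPACITY && s->n < CAPACITY );
    uint32_t index = s->n;
    s->dense[index] = id;  /* 1. store the item in the next free dense slot */
    s->sparse[id] = index; /* 2. remember where it went */
    ++s->n;                /* 3. one more item in the set */
    printf( "add %" PRIu32 ": dense[%" PRIu32 "] = %" PRIu32
            ", sparse[%" PRIu32 "] = %" PRIu32 ", N = %" PRIu32 "\n",
            id, index, id, id, index, s->n );
}

/* Replays the walkthrough: add 4, then 6, then 0. */
struct trace_set trace_walkthrough( void )
{
    struct trace_set s = { { 0 }, { 0 }, 0 };
    trace_add( &s, 4 );
    trace_add( &s, 6 );
    trace_add( &s, 0 );
    return s;
}
```

Calling trace_walkthrough() prints one line per insertion and ends with dense holding 4, 6, 0 in its first three slots and N equal to 3.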
Checking whether an item is in the set
Let’s test if 6 is in the set.
To do so we go to the sparse array and check its value at index 6.
The value is 1.
Now we use this value as an index in the dense array.
The dense array has value 6 at this index, which means 'yes, the set contains item six'.
Now let’s check if 9 is in the set.
We go to the sparse array and check its value at index 9.
The value is garbage (we haven’t initialised the memory, remember?) and can be anything, let’s say it is X.
If X >= N, then X is an out-of-bounds index for the dense array, so 9 is not in the set.
Suppose X is less than N.
In our case it must be one of 0, 1, or 2 because N is 3.
We check if the dense array’s value at index X is 9.
It is not, and it cannot be, because the first N slots of the dense array hold only items that were actually added. So the answer is 'no, the set does not contain item nine'.
Now we can implement the operation:
bool rho_ss_has( struct rho_ss *ss, rho_ss_id id )
{
    assert( ss );
    assert( id <= ss->max );
    rho_ss_id dense_index = ss->sparse[id]; /* may be garbage */
    /* Valid only if the index is in range and the dense slot points back. */
    return dense_index < ss->n && ss->dense[dense_index] == id;
}
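The argument that garbage in the sparse array can never produce a false positive can be checked by brute force. The sketch below is illustrative only: lookup_has, garbage_can_fool_lookup and CAPACITY are hypothetical names, and the garbage is simulated by writing chosen values into sparse[9] instead of actually reading uninitialised memory. It rebuilds the {4, 6, 0} state and tries several values of X:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { CAPACITY = 10 }; /* hypothetical fixed size: IDs 0..9 */

/* Same test as rho_ss_has, written against plain arrays. */
bool lookup_has( const uint32_t *dense, const uint32_t *sparse,
                 uint32_t n, uint32_t id )
{
    uint32_t dense_index = sparse[id];
    return dense_index < n && dense[dense_index] == id;
}

/* Returns true if any simulated garbage value in sparse[9] makes the
 * lookup wrongly report that 9 is in the set {4, 6, 0}. */
bool garbage_can_fool_lookup( void )
{
    uint32_t dense[CAPACITY] = { 4, 6, 0 }; /* only the first n slots matter */
    uint32_t sparse[CAPACITY] = { 0 };
    uint32_t n = 3;
    sparse[4] = 0;
    sparse[6] = 1;
    sparse[0] = 2;

    /* Values below N (0, 1, 2) and values at or above N. */
    const uint32_t garbage[] = { 0, 1, 2, 3, 9, UINT32_MAX };
    for ( size_t i = 0; i < sizeof garbage / sizeof garbage[0]; ++i ) {
        sparse[9] = garbage[i];
        if ( lookup_has( dense, sparse, n, 9 ) ) {
            return true;
        }
    }
    return false;
}
```

garbage_can_fool_lookup() returns false: for X below N the dense slot holds 4, 6 or 0, never 9, and for X at or above N the bounds check rejects it before the dense array is read.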
Removing an item
Let’s remove 6 from the set.
First, we replace 6 with the last item from the dense array, which is 0.
Then we have to update the sparse array at index 0 with the new index of 0 in the dense array, which is 1.
And then we reduce N by one.
In other words, to remove an item from the set, we replace it with the last item in the dense array and update the corresponding sparse array’s slot to point to the new location.
The implementation is as follows:
void rho_ss_remove( struct rho_ss *ss, rho_ss_id id )
{
    assert( ss );
    assert( id <= ss->max );
    if ( rho_ss_has( ss, id ) ) {
        --ss->n; /* ss->n is now the index of the last item */
        rho_ss_id dense_index = ss->sparse[id];
        rho_ss_id item = ss->dense[ss->n];
        ss->dense[dense_index] = item;  /* move the last item into the gap */
        ss->sparse[item] = dense_index; /* and record its new position */
    }
}
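The removal of 6 from {4, 6, 0} can be replayed with a self-contained sketch. The tiny_* names and CAPACITY are invented for this example; the logic follows the swap-with-last steps described above, using fixed-size arrays instead of malloc:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CAPACITY = 10 }; /* hypothetical fixed size: IDs 0..9 */

/* Hypothetical fixed-size stand-in for struct rho_ss. */
struct tiny_set {
    uint32_t dense[CAPACITY];
    uint32_t sparse[CAPACITY];
    uint32_t n;
};

bool tiny_has( const struct tiny_set *s, uint32_t id )
{
    uint32_t i = s->sparse[id];
    return i < s->n && s->dense[i] == id;
}

void tiny_add( struct tiny_set *s, uint32_t id )
{
    assert( id < CAPACITY );
    if ( tiny_has( s, id ) ) {
        return;
    }
    s->dense[s->n] = id;
    s->sparse[id] = s->n;
    ++s->n;
}

/* Swap-with-last removal: the last dense item fills the gap. */
void tiny_remove( struct tiny_set *s, uint32_t id )
{
    assert( id < CAPACITY );
    if ( !tiny_has( s, id ) ) {
        return;
    }
    --s->n;
    uint32_t gap = s->sparse[id];
    uint32_t last = s->dense[s->n];
    s->dense[gap] = last;
    s->sparse[last] = gap;
}

/* Builds {4, 6, 0}, removes 6, and returns the resulting set. */
struct tiny_set removal_demo( void )
{
    struct tiny_set s = { { 0 }, { 0 }, 0 };
    tiny_add( &s, 4 );
    tiny_add( &s, 6 );
    tiny_add( &s, 0 );
    tiny_remove( &s, 6 );
    return s;
}
```

After removal_demo() the dense array starts with 4, 0 and N is 2. Note that 0 now sits where 6 used to be, so removal does not preserve insertion order.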
Iterating over the set
This one is super simple.
You just iterate over the dense array.
Like this (see the for loop):
struct rho_ss ss = rho_ss_alloc( 10 );
rho_ss_add( &ss, 4 );
rho_ss_add( &ss, 6 );
rho_ss_add( &ss, 0 );
/* Visit only the first N slots of the dense array. */
for ( rho_ss_id i = 0; i < ss.n; ++i ) {
    printf( "%lu ", (unsigned long) ss.dense[i] );
}
printf( "%s", "\n" );
rho_ss_free( &ss );
Clearing the set
Just set N to zero:
void rho_ss_clear( struct rho_ss *ss )
{
    assert( ss );
    ss->n = 0;
}
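Clearing leaves the old contents of both arrays in place, which is safe because the membership test never trusts an index that is not below N. A small sketch of that behaviour, with the hypothetical names mini_set, mini_has, mini_add, mini_clear and CAPACITY, using fixed-size arrays for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CAPACITY = 10 }; /* hypothetical fixed size: IDs 0..9 */

/* Hypothetical fixed-size stand-in for struct rho_ss. */
struct mini_set {
    uint32_t dense[CAPACITY];
    uint32_t sparse[CAPACITY];
    uint32_t n;
};

bool mini_has( const struct mini_set *s, uint32_t id )
{
    uint32_t i = s->sparse[id];
    return i < s->n && s->dense[i] == id;
}

void mini_add( struct mini_set *s, uint32_t id )
{
    assert( id < CAPACITY );
    if ( mini_has( s, id ) ) {
        return;
    }
    s->dense[s->n] = id;
    s->sparse[id] = s->n;
    ++s->n;
}

/* O(1) clear: the arrays keep their stale contents. */
void mini_clear( struct mini_set *s )
{
    s->n = 0;
}

/* Builds {4, 6, 0}, clears it, then adds 6 again. */
struct mini_set clear_demo( void )
{
    struct mini_set s = { { 0 }, { 0 }, 0 };
    mini_add( &s, 4 );
    mini_add( &s, 6 );
    mini_add( &s, 0 );
    mini_clear( &s );
    mini_add( &s, 6 ); /* lands in dense[0], overwriting the stale 4 */
    return s;
}
```

After clear_demo() only 6 is reported as present. The stale sparse entry for 4 still holds index 0, but dense[0] now holds 6, so the lookup for 4 fails as it should.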
Like I said before, every sparse set operation except iteration runs in O(1) time.
What’s next?
You could play with my implementation of the sparse set or (and this is the preferred way) implement it on your own.