Devlog: Wave Function Collapse #1: Implementation, Float Equality

2023-06-03

(Images used as WFC input were taken from https://github.com/mxgmn/WaveFunctionCollapse, which contains more detailed attributions.)

The initial implementation of 2D overlapping pattern model WFC is finished!

*Angular.png, N=3 (left: input, right: output)*

The key word is initial - there are no extra features, eg. no rotations or reflections are gathered from the input. And it is painfully slow as I haven’t put much effort into optimizing my code. But it does work and seems to produce sensible, wrapping outputs.

API

Some time was spent deciding what the C API should be. My demo program is an SDL application, but if you look at SDL_Surface and SDL_PixelFormat you’ll see mentions of pixel byte sizes, padding between rows (pitch), and even a BitsPerPixel field which isn’t always a multiple of 8 (making it unclear if a simple memcmp can effectively compare pixel values). Instead of natively supporting all of these properties, I eventually settled on this simple API that expects 32-bit pixel values:

int wfc_generate(
    int n,
    int srcW, int srcH, const uint32_t *src,
    int dstW, int dstH, uint32_t *dst);

To get from an image file to the output surface with SDL, the code would look something like this (error handling and cleanup omitted):

SDL_Surface *loadSurface = IMG_Load(path);
SDL_Surface *srcSurface = SDL_ConvertSurfaceFormat(loadSurface,
    SDL_PIXELFORMAT_RGBA32, 0);

// RGBA32 likely doesn't need locking.
assert(SDL_MUSTLOCK(srcSurface) == 0);

void *dstPixels = malloc(dstW * dstH * 4);
wfc_generate(
    n,
    srcSurface->w, srcSurface->h, srcSurface->pixels,
    dstW, dstH, dstPixels);

SDL_Surface *dstSurface = SDL_CreateRGBSurfaceWithFormatFrom(
    dstPixels, dstW, dstH,
    32, dstW * 4, srcSurface->format->format);

If you are using stb_image.h this should be simpler. Notice that size arguments are ints instead of size_t - there are good reasons why.

Steps

Dynamically resizable arrays are a very useful data structure, but I decided to delay using them as much as possible. The current implementation allocates all needed memory at the start and then uses it throughout.

An example - the first part of WFC is about analyzing the input image for patterns. I wrote a two-part O(srcW^2 * srcH^2 * n^2) procedure for this - each part goes through all the source image positions and checks if the pattern starting at that position appears for the first time so far. It checks this by going through all the previous patterns and comparing to the current one. In the first part, it just counts the total number of patterns, then it allocates an array for pattern data, then, in the second part, it iterates again and fills in the array. The approach is suboptimal, but it was easy to write and only has one call to malloc. No vector<>s or hash maps were involved.

Just for the sake of it, I created a little visualization of how frequent each NxN pattern is:

(Brighter spots mean higher frequency, encoded independently between images. Each pixel is the top-left corner of a NxN pattern. Patterns wrap around image edges.)

The propagation step is, unsurprisingly, the one with the greatest time complexity. Propagation starts from the most recently obeserved point and constraints the set of patterns of its neighbours to only the ones that wouldn’t cause an immediate contradiction with the observed point. It then ripples recursively from the neighbours that got their set of patterns reduced, stopping when no new points are being constrained further. It ripples in all directions, so we may end up visiting the same point several times.

I saw that some WFC implementations use a queue of points that they insert into and remove from. Since I don’t use resizable arrays, I instead decided to iterate through the entire matrix and unconditionally resolve constraints, then repeat until one iteration through the matrix introduces no new changes. This code would have had a total of 9 nested loops! (Iterate until no new changes: for each wave point: for each of its neighbours: for each pair of their patterns: for each pixel slot in their pattern overlap: check if pattern pixels are equal or not.)

Obviously, it was very slow. I improved it by keeping a matrix of what points were recently constrained and only propagating constraints to their neighbours. Matrix elements kept two bits of information - one for which points’ neighbours need to be verified in the current propagation iteration, and one with the same info, but for the next iteration. The code flips their roles on odd/even iterations, double buffer style. I still continuously iterate through the entire wave matrix over and over, but I perform much fewer propagations between pairs of neighbours. This improves the algorithm from really, painfully, stupidly slow to just really, really slow.

There were a few more fine points along the way, the one I’d mention is float comparisons. There are various articles online about it, to this one I gave the most attention. However, its final proposed code uses unions for type punning - this is fine in C, but UB in C++ (I want my code to work on both C and C++). memcpy can be used for type punning without incurring nasty strict aliasing bugs, so I changed the code to:

// WARNING: Assumes IEEE 754 representation of float!
int approxEq(float a, float b) {
    const int ulpsDiff = ...;

    int32_t ia, ib;
    memcpy(&ia, &a, sizeof(a));
    memcpy(&ib, &b, sizeof(b));

    if (ia < 0) ia = 0x80000000 - ia;
    if (ib < 0) ib = 0x80000000 - ib;

    return abs(ia - ib) < ulpsDiff;
}

However, it assumes IEEE 754 representation of float. I abandoned this approach and went for a simpler one:

int approxEq(float a, float b) {
    const float absDiff = 0.001f;
    const float relDiff = FLT_EPSILON;

    if (fabsf(a - b) < absDiff) return 1;

    if (fabsf(a) < fabsf(b)) return fabsf((a - b) / b) < relDiff;
    return fabsf((a - b) / a) < relDiff;
}

Thoughts on C

I have a tad more C experience now, so a few new thoughts on it. Overall, I’m still enjoying it, but I wanted to mention a few things.

C has fewer language features, so I spent less time considering the best way to write code since often there was only one possible approach. On the other hand, I couldn’t use templates for my multi-dimensional array code (utility code is REALLY important when doing linear algebra!^{though, this is more just storing values and indexing them in a particular way and can hardly be called linear algebra, but still}), so I created a bunch of macros instead. It’s somewhat less ergonomic overall.

I like how C forces programmers to think about memory - I ended up having very few calls to malloc and free and they were mostly centralized in a single function.

One thing I foresee might be an issue as the codebase grows is that pointers in API boundaries carry so little meaning with them. What you want to know when you see a pointer function argument or return value is:

Is NULL an expected value?
Is this an array pointer or does it point to a single value?
Does it hold ownership?

There are ways of providing some of that information, eg. with the [static ...] syntax (though some argue against using it). Or you can simply write a comment explaining the semantics. But it’s overall not nearly as type-safe as what C++ offers.

I mentioned how I did error-checking in the previous post. I changed it a bit to only have a single cleanup goto label. This makes the code shorter and I don’t have to come up with a new name every time I call a function that can fail. (I am of the opinion that the real benefit of lambdas and tuples in programming languages is that you don’t have to invent names for them.)

Lastly, I was regularly running my code with ASan and UBSan. They did have several findings, but honestly not as many as the C-always-segfaults memes would lead you to expect. I couldn’t use MSan or valgrind since they were giving false positives on SDL’s code. I plan to write standalone tests in which I will integrate various sanitizers.

What comes next?

The code is very slow. My plan is to first write a few tests to be able to easily check for correctness and UB. Then, create some benchmarks and start optimizing this mess.

After that, I would add new features, the first one will probably be adding rotated and reflected patterns to be able to create more interesting images. And speaking of images…