All posts

I Asked An AI To Fix My Code. It Went On A 1 Hour Existential Crisis

--

Introduction

I was trying to parallelize HNSW graph construction in C++ (more on that here). The idea seemed reasonable at the time. The insertion logic is well defined, and I thought with some careful locking around the graph mutations, it should be possible to make multiple threads build it faster.


At some point I brought in Claude Sonnet 4.6 Thinking to help debug things when it started breaking. That turned into a much longer session than I expected (~1 hour 😭). Well, I knew it got stuck after a couple of minutes, but since Cursor is request-based and I had some extra "month-end" requests left, I just let it run to see what it would do.


It ended up going through around 15 million tokens, and despite a lot of changes and fairly detailed reasoning at each step, it never actually got to a correct solution. This is roughly the core logic I was working with for insertion :

 1void HnswCPU::add(const vector<float>& embedding) {
 2    int level = sampleLevel();
 3    int id = currentId++;
 4
 5    embeddings.insert(embeddings.end(), embedding.begin(), embedding.end());
 6
 7    int offset = neighbors.size();
 8    neighbors.resize(offset + M0 + level * M, -1);
 9    nodes.emplace_back(level, offset);
10
11    if (entryPoint == -1) {
12        entryPoint = id;
13        maxLevel = level;
14        return;
15    }
16
17    int curr = entryPoint;
18
19    for (int l = maxLevel; l > level; l--)
20        curr = searchLayer(embedding.data(), curr, 1, l)[0];
21
22    for (int l = std::min(level, maxLevel); l >= 0; l--) {
23        auto candidates =
24            searchLayer(embedding.data(), curr, efConstruction, l);
25
26        auto selected =
27            selectNeighborsWithHeuristic(
28                embedding.data(),
29                candidates,
30                getNeighborCount(l),
31                l
32            );
33
34        int* nbr = getNeighborPtr(id, l);
35
36        for (size_t i = 0; i < selected.size(); i++)
37            nbr[i] = selected[i];
38
39        for (int other : selected) {
40            int* onbr = getNeighborPtr(other, l);
41
42            for (int i = 0; i < getNeighborCount(l); i++) {
43                if (onbr[i] < 0) {
44                    onbr[i] = id;
45                    break;
46                }
47            }
48        }
49    }
50
51    if (level > maxLevel) {
52        entryPoint = id;
53        maxLevel = level;
54    }
55}

The parallel version followed the same structure, just with multiple threads calling into this logic and some locks added around shared state. That is where things started to fall apart. The following sections are basically me watching the AI fail in real time.

SegFaults ...

Initially the issues looked pretty standard. There were segfaults during the benchmark runs, which pointed to invalid memory access somewhere in the graph. The first few fixes were around things like container growth and pointer stability. At one point it suspected deque reallocation, then switched to thinking it was a dangling pointer, then iterator invalidation. Each of those led to small refactors, and each time it seemed plausible.

Oh Wait, Maybe It's A Concurrency Problem ...

When that did not fix things, the focus shifted to concurrency. The diagnosis became that one thread was iterating over a node’s neighbor list while another thread was modifying it. The proposed fix was to add locks around neighbor access and, in some places, copy the neighbor list under a lock before iterating over it.


That got rid of the segfaults, but introduced deadlocks. The locking order was inconsistent, and once multiple nodes were involved, threads started waiting on each other in cycles. Fixing that required restructuring parts of the code to avoid holding one lock while trying to acquire another.

Okay, It Has To Be Multithreading

After that, the program stopped crashing but became extremely slow. Multiple threads were running at full CPU, but the work was not progressing. The issue there was heavy lock contention. The search phase visits a large number of nodes, and if every neighbor access requires taking a mutex, the threads spend most of their time waiting rather than doing useful work.

You Know What, Let's Remove Locks

At that point the approach changed again. Instead of locking reads, the idea was to make reads effectively lock-free by pre-allocating enough space in the neighbor lists so that they would never reallocate, and then allowing concurrent reads and writes to the same underlying buffers. The reasoning was that even if readers saw slightly stale or inconsistent data, it would still be safe because the memory itself would remain valid.


This is where things got more subtle. Some of the reasoning sounded correct in isolation, like avoiding reallocation to keep pointers stable. But operations like clearing and rewriting neighbor lists break those assumptions in ways that are not obvious at first. You can end up reading partially updated state, and even if individual integer writes are atomic, the structure as a whole is not consistent.


There was also a separate issue with how the entry point and maximum level were stored. They were kept in separate atomic variables, which meant one thread could observe a new maximum level but an old entry point. That leads to accessing levels that do not exist for that node, which explains some of the crashes. Fixing that improved stability, but it still did not solve the overall problem.

Now, This Is On Me (The Human)

After going through all of this, the main realization was that the approach itself was flawed. HNSW insertion is not just a local operation.


Each insertion depends on the current state of the graph, and it actively mutates that state in ways that affect subsequent insertions. Trying to run multiple insertions in parallel without a very carefully designed synchronization strategy leads to either incorrect graphs, crashes, or severe performance issues.


In other words, the problem was not that the locks were wrong or that the data structures were slightly off. The problem was that I was trying to parallelize something that is not naturally parallel in this form.

A Silver Lining ... Cope, Ik 😭

What stood out to me was not just that it failed, but how it failed. The model kept finding reasonable explanations for each individual issue and proposing fixes that made sense locally. But it took a looong time to step back and question whether the overall approach was viable in the first place.


By the end of it, the code was more complex, harder to reason about, and still incorrect. The simplest version, the single-threaded one, was the only one that actually worked reliably.

Conclusion

Looking back, this was less about a bug and more about chasing the wrong idea for too long. The AI did what it is good at. It kept finding local problems and fixing them one by one. Each step made sense in isolation, and honestly, I would have probably gone down a similar path myself. The difference is I would have given up much earlier.


What it never really did, at least not until very late, was question whether the whole thing should exist in the first place. Parallelizing HNSW insertion like this sounds reasonable, but once you get into the details, everything fights you. The graph is constantly changing, every insertion depends on the current structure, and small inconsistencies are enough to break either correctness or performance.


So you end up in this loop of fixing symptoms instead of the cause. Segfault, fix it. Deadlock, fix it. Slow, fix it. And somehow each “fix” just moves the problem somewhere else. If there is one takeaway from this, it is that not every problem is meant to be solved the way you initially frame it. Sometimes the right move is not better locking or cleaner code, it is stepping back and asking whether the approach itself makes sense.


Also, if an AI starts confidently rewriting half your system for an hour straight, there is a good chance you are both lost 😭😭

More like this

Comments