yshui --log-level=trace

As the title suggests, this is a dump of my random thoughts. Well, that is the intention at least. I have just started so it is still pretty barren here.


Fuzzing an X compositor

Background

Warning

X11 rant incoming.

I don't know if you know this, but X programming is not fun.

To start off, interactions with the X server are inherently racy. Let me give you an example. Say, you receive an event telling you that a window has been created. You are interested in what properties are set on new windows, so you send out a request for that. Because everything happens concurrently, when your request has arrived at the server, the window might have already been destroyed, and you get an error.

OK, this one doesn't sound too bad. You look at the error, deduce the window must have been destroyed, and move on. But this same problem applies to literally everything you do with the X server. And X has tens of different kinds of objects - some of which have very complex relationships with each other, and hundreds of ways to manipulate them. And every time you do something, you have to consider what you would need to do if anything changed due to a race condition, how you would detect such a change, how to tell race conditions and real errors apart... The list just goes on and on.

What's worse is that the libraries I need to use to interact with the server weren't really designed with these kinds of considerations in mind. For one thing, they encourage you to handle server messages out-of-order. Continuing from the example above, with Xlib, the function that lets you get properties from a window is XGetWindowProperty. What happens internally when it is called is that it sends a GetProperty request, blocks until a reply for that request is received, and finally returns the reply to you. Blocking I/O aside, this doesn't look too bad. Until you realize the reply might not be the immediate next thing you get from the server - any events received before the reply will be skipped over. You will be processing the reply first, despite it actually coming after. What if one of those events was telling you the property you were trying to get has changed? OK, you figured out the correct thing to do in this case, now you just need to figure out the rest of the hundreds of cases. What would be more logical is to handle all messages in the order they come, regardless of whether they are replies or events. But it is very difficult to do this with either of the two first-party libraries (Xlib and libxcb), if not outright impossible.
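To make the ordering problem concrete, here is roughly what that round-trip looks like with libxcb (a sketch; error handling omitted):

#include <stdlib.h>
#include <xcb/xcb.h>

void fetch_property(xcb_connection_t *c, xcb_window_t win, xcb_atom_t prop) {
    xcb_get_property_cookie_t cookie =
        xcb_get_property(c, 0, win, prop, XCB_ATOM_ANY, 0, 1024);
    // This call blocks until the reply arrives. Any *events* the server
    // sent before the reply are quietly queued, so we end up handling
    // the reply before messages that actually preceded it.
    xcb_get_property_reply_t *reply = xcb_get_property_reply(c, cookie, NULL);
    // ... use the reply, e.g. via xcb_get_property_value(reply) ...
    free(reply);
}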

Hold on, you might say, for a concurrent program like this, surely there are some synchronization primitives I can use to make this easier? Well, yeah. X does have this global lock, which lets you block everyone else. Which is already a bad start. You know what's worse? It doesn't even work! Because of a bug, holding the global lock doesn't fully prevent the server state from changing! It's useless!

Does this sound bad enough? Well, there is more. picom, as you may know, is an X compositor, which puts your windows on your screen in a slightly more eye-candy fashion, which means it needs to know what windows there are. Seems like an easy enough task, surely I can just ask the X server for a list of all windows? Wrong, you have to query the server one window at a time. And remember, as you are doing this, windows are constantly coming and going, moving around in the window tree! That already sounds like a problem, but let's assume there is a way to do that. Now, you also have to monitor the window tree for any future changes - users will get confused if they open a new window and it doesn't show up. Is it possible to ask the server to send you events every time the window tree changes? Of course not! You can only ask for events from each individual window. If a new window is created and you haven't had a chance to enable events for it, you will have no idea if any new child windows are created under it!

You think that's all? But there is still more! Do you know that the X server actively reuses its IDs? Yeah, if a window is destroyed, another window could be created with the same ID immediately after. So you see a window created, you send a request to fetch some information from it, what you get back could be from a completely different, unrelated window! How cool is that!?!?

Sorry, I was losing my mind a little bit. At some point, you just start to feel all this is just impossible. But I eventually managed to find a way. As you can guess, it takes a very complex algorithm to handle all the intricacies mentioned above. Which means the likelihood that I didn't make any mistakes is practically zero. Unfortunately, as is usually the case with concurrent programs, testing and debugging it is going to be extremely difficult. So how will I be able to make sure everything won't blow up after I ship this code?

Fuzzing!

Yeah, sure, fuzzing. Just throw every possible scenario at picom and see if it crashes, right? But it's not that simple. While it is possible to run picom with the X connection as its sole source of input, I can't just feed it random data through that connection. Generally speaking, we do trust the X server to not send us malformed data. If we really want to fuzz picom at this level, we need to convincingly mimic the behavior of the X server, which would be way too much work.

Here, what we want to test is the part of picom that replicates the window tree from the X server. So it would be much better if I could strip out this part of picom and test it separately. The code base picom inherited from compton isn't in a state where this is possible, but I need to implement the new tree replication algorithm anyway. This would be a great opportunity to refactor the code base to make it more compartmentalized.

Turning it inside-out

Here, there is an interesting design pattern I want to talk about. As I was making the tree replication code more independent, eventually I needed to design some kind of interface between the tree replication and the rest of picom. To see what I mean, consider the case where a new application window is added to the tree: the tree replication code needs to inform picom about this so it can set up the window for rendering. Doing it naively, it might look something like this:

void tree_handle_new_x_event(context, x_event) {
    // ...
    if (x_event is new window) {
        // `context` holds necessary compositor states - we try to
        // avoid global variables
        picom_setup_window(context, x_event.window);
    }
    // ...
}

Which is fine. But picom_setup_window would involve much of the code we aren't testing, so it must be stubbed out for fuzzing. And there could be many more cases like this.

Stubbing things out is one way to do it, and it does work. But I don't like it, because I feel it's too easy for extra dependencies to creep in, and it's difficult to tell at a glance what external dependencies there are. It's also annoying to carry a context argument everywhere, even in functions that don't use it directly - just because they transitively call an external function.

The way I usually prefer, is turning the whole thing inside-out:

TreeActions tree_handle_new_x_event(x_event) {
    TreeActions return_value;
    // ...
    if (x_event is new window) {
        return_value = TreeActions::SetupWindow;
    }
    // ...
    return return_value;
}

This way, the tree replication code can be entirely self-contained. The only input is through function arguments, and the only output is function return values. All the actions the caller needs to support can be found in one place. And there is no infectious context parameter needed.
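For illustration, the caller side might look something like this (a sketch; the names and types here are hypothetical stand-ins, and the real interface carries richer data than a single enum). In the fuzzing harness, this thin dispatch layer is replaced by a stub that merely records the actions:

enum tree_action { TREE_ACTION_NONE, TREE_ACTION_SETUP_WINDOW };

enum tree_action tree_handle_new_x_event(struct x_event *ev);

void handle_x_event(struct context *ctx, struct x_event *ev) {
    // The tree code stays pure; only this thin layer touches the rest of picom.
    switch (tree_handle_new_x_event(ev)) {
    case TREE_ACTION_SETUP_WINDOW:
        picom_setup_window(ctx, ev->window);
        break;
    case TREE_ACTION_NONE:
        break;
    }
}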

I don't know if this is an established pattern, or what it is called. (Please let me know if it has a name!) But since I learned about algebraic effects, I started to think this is just poor-programmer's algebraic effects. I hope you can see it too - TreeActions is the "effect" of tree_handle_new_x_event which the caller must handle. Except you can't do any of the nice things you can do with a real effect system :'(

Anyway, I won't say this is always the best approach. But I think this is definitely a design pattern worth considering for doing things like this.

Making the testing harness

After all the refactoring is done, and the tree replication code isolated, there are 4 X requests left [1] that we still need to (partially) support. Considering this number would likely be in the hundreds otherwise, this is not too bad. But it does mean I needed to re-implement (a tiny) part of the X server for the testing harness. I essentially needed to maintain a window tree - just the basic stuff.

But more importantly, the testing harness also has to model the concurrency of the X server - this is what we set out to test, after all. This was a bit more tricky: I had to simulate the incoming and outgoing message queues, and randomly interleave message deliveries with all the other processing.
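As a sketch, one fuzzing iteration might look like this (every function here is a hypothetical stand-in for the real harness):

#include <stdlib.h>

// Hypothetical stand-ins for the real harness:
void server_handle_one_request(void);   // mutate the fake window tree
void server_deliver_one_message(void);  // move one event/reply to the client
void client_process_one_message(void);  // drive the code under test

void run_one_fuzz_iteration(unsigned int seed) {
    srand(seed);
    // Randomly interleave the three kinds of progress, so that replies
    // and events can reach the client in any order the real X server
    // could have produced them in.
    for (int i = 0; i < 10000; i++) {
        switch (rand() % 3) {
        case 0: server_handle_one_request(); break;
        case 1: server_deliver_one_message(); break;
        case 2: client_process_one_message(); break;
        }
    }
}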

And that's it! Now we are ready to fuzz picom!

Results

I was expecting to see some bugs going in, but I didn't expect how many I would actually get. One after another, the fuzzer uncovered race conditions I had forgotten to consider. Some of them were so complex that they took me quite some time to figure out - even with access to full traces of exactly what happened from my testing harness! Just imagine how hard it would be to debug these from a user's bug report - the mere thought makes me shudder. This just goes to show how difficult it is to do X programming correctly. Eventually, I managed to fix all the bugs found, though it required some significant design changes. Afterwards, the fuzzer ran for days without finding any new issues.

So I guess now I can say with reasonable confidence that picom's window tree code is bug free!

Conclusion

First of all, X11 sucks. This might make some people mad, but this is a fact. If you are wondering, yes, wayland does solve all of the problems mentioned above. If you see developers leaving X for wayland, things like this are the reasons why.

Fuzzing is an incredibly powerful tool for uncovering bugs. But only certain kinds of codebases are fuzzable. It is already an accepted fact that it is good practice to modularize and decouple your code. Now, you can add fuzzability to the long list of reasons why you should do that.

Besides fuzzing, I also looked into symbolic execution and model checking for this problem. Compared to fuzzing, I feel they are much less explored. Information on how to use them is more limited, and the quality of the documentation for the few tools that exist is generally poor. While I managed to get the tools to work, they unfortunately didn't yield many useful results.

[1] These are: QueryTree, ChangeWindowAttributes (for event mask changes only), GetWindowAttributes, and GetProperty (for WM_STATE).

Go debug your core dumps somewhere else

Have you ever had this happen to you? You caught your code crashing in CI, which gave you a core dump. So you downloaded this core dump and tried to debug it, but all you got was this:

(gdb) bt
#0  0x00005651f0f09c00 in ?? ()
#1  0x00005651f0ed774e in ?? ()
#2  0x00005651f0ee3ada in ?? ()
#3  0x00005651f0ee41e4 in ?? ()
#4  0x00007f2b654c124a in ?? ()
#5  0x0000000000000000 in ?? ()

:'(

If this is you, I have good news for you.

Um, actually

Oh I know what you are going to say. The reason I was getting this is because I didn't have all the shared library files on the machine I tried to debug it on, right? And what I need to do is to figure out what shared library files were loaded by the program, copy them back, and make gdb load them somehow.

To find out what libraries are loaded, I would need to attach gdb to the core file and list them (info proc mappings). Then, I would need to copy them while maintaining the relative directory structure (e.g. /usr/lib/libc.so needs to be copied to /tmp/coredump_storage/usr/lib/libc.so). And finally, when I load the core file, I should ask gdb to load libraries from a different path with a combination of set sysroot and/or set solib-search-path.
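For reference, the manual version of that workflow looks something like this (the paths here are just examples):

$ gdb ./picom core        # first, find out what was mapped
(gdb) info proc mappings
(gdb) quit
# ... copy each listed library into /tmp/coredump_storage, preserving
# the relative directory structure, then load the core again:
$ gdb ./picom
(gdb) set sysroot /tmp/coredump_storage
(gdb) core-file core
(gdb) bt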

This sounds like a reasonable solution. But I don't want to do all of this manually every time something crashes. Besides, not all CI platforms support connections to the CI machine, and even when they do, they usually require the build to be run again with SSH enabled. By that time, the crash could be gone if it is not deterministic.

So, all of this has to be automated. Maybe I could've made it work with a hacky shell script that parses gdb's text output, plus a .gdbinit script so I don't need to set sysroot manually every time. For a reasonable person, that might be good enough.

But I am not a reasonable person.

Here we go

So, this is how my thought process went: gdb knows what libraries are loaded, so that information must be stored in the core dump somewhere, and I just need to write a program to find it. And if my program can find it, it can't take that much extra work to modify it to point somewhere else.

yeah... about that.

So where is it actually stored?

A core dump file is actually just a normal ELF file. So the natural thing to do first is look at it with readelf. And I was excited to see this among the outputs:

  CORE                 0x0000331c       NT_FILE (mapped files)
    Page size: 4096
                 Start                 End         Page Offset
    0x00005603fa4fa000  0x00005603fa526000  0x0000000000000000
        /tmp/workspace/build/src/picom
    0x00005603fa526000  0x00005603fa5d4000  0x000000000000002c
        /tmp/workspace/build/src/picom
    0x00005603fa5d4000  0x00005603fa605000  0x00000000000000da
        /tmp/workspace/build/src/picom
    0x00005603fa605000  0x00005603fa614000  0x000000000000010a
        /tmp/workspace/build/src/picom
    0x00005603fa614000  0x00005603fa62e000  0x0000000000000119
        /tmp/workspace/build/src/picom
    0x00007f7b87d05000  0x00007f7b87d0a000  0x0000000000000000
        /usr/lib/x86_64-linux-gnu/libgpg-error.so.0.33.1
    0x00007f7b87d0a000  0x00007f7b87d20000  0x0000000000000005
        /usr/lib/x86_64-linux-gnu/libgpg-error.so.0.33.1
    0x00007f7b87d20000  0x00007f7b87d2b000  0x000000000000001b
        /usr/lib/x86_64-linux-gnu/libgpg-error.so.0.33.1
    0x00007f7b87d2b000  0x00007f7b87d2c000  0x0000000000000025
        /usr/lib/x86_64-linux-gnu/libgpg-error.so.0.33.1

...

That's a list of all the shared libraries and where in memory they were mapped! It couldn't be this easy, could it?

And as it turned out, no, it couldn't.

I wrote a program to parse the core dump and look for this NT_FILE note, copy the files it lists, and modify the core dump so the paths would point to where I want them to be. I tried it, and it did not work. Frustratingly, gdb was still trying to look for the library files where they originally were, for some reason.
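Locating the note itself is the easy part. Here is a minimal sketch of it, assuming a 64-bit little-endian core dump and omitting most error handling (a real tool should also check that the note's name is "CORE"):

#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// ELF notes are 4-byte aligned in core dumps.
static size_t align4(size_t x) { return (x + 3) & ~(size_t)3; }

int main(int argc, char **argv) {
    if (argc != 2)
        return 1;
    FILE *f = fopen(argv[1], "rb");
    Elf64_Ehdr ehdr;
    fread(&ehdr, sizeof(ehdr), 1, f);

    for (int i = 0; i < ehdr.e_phnum; i++) {
        // Walk the program headers looking for PT_NOTE segments.
        Elf64_Phdr phdr;
        fseek(f, ehdr.e_phoff + i * sizeof(phdr), SEEK_SET);
        fread(&phdr, sizeof(phdr), 1, f);
        if (phdr.p_type != PT_NOTE)
            continue;

        uint8_t *buf = malloc(phdr.p_filesz);
        fseek(f, phdr.p_offset, SEEK_SET);
        fread(buf, 1, phdr.p_filesz, f);

        // Scan the notes inside the segment for NT_FILE.
        size_t off = 0;
        while (off + sizeof(Elf64_Nhdr) <= phdr.p_filesz) {
            Elf64_Nhdr *n = (Elf64_Nhdr *)(buf + off);
            size_t desc_off = off + sizeof(*n) + align4(n->n_namesz);
            if (n->n_type == NT_FILE)
                printf("found NT_FILE: %u bytes of mapped-file entries\n",
                       n->n_descsz);
            off = desc_off + align4(n->n_descsz);
        }
        free(buf);
    }
    fclose(f);
    return 0;
}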

I could have stopped here. I already had a program that could automatically copy the libraries for me, and doing a set sysroot in gdb really isn't that bad. But at this point my curiosity was piqued, and I had to find out what was actually going on.

Debugging the debugger

I looked at the core dump file again with a hex editor, and indeed, there were still paths to the original library files scattered around. But unlike NT_FILE, this time there seemed to be no structure to it. Those paths were just... there.

How was the debugger able to find them then? I tried to read the code, but as you would expect, gdb does not have the easiest code base to get into. lldb is a little better, but I still didn't know where to start.

So I attached a debugger, to the debugger. (I just think this is funny to say.)

I want to give lldb praise for how amazingly detailed its logs are. I was barely able to get anything out of gdb; lldb, on the other hand, literally tells you about every single little thing it does. With the help of that, and a debugger, I was finally able to narrow it down.

Rendezvous with the dynamic linker

Now, we are going to take a little detour. You see, finding out what libraries are loaded isn't just a problem when you analyze a core dump. The debugger needs to be informed about that when it debugs a live program too. There is no syscall for loading a library (there was one once, long story); it's all done in user space by something called the dynamic linker, which just opens the file and maps it into memory. So how could the debugger know when this happens? It couldn't just set a breakpoint in the dynamic linker, right?

As it turned out, yeah, it totally could. There is such a thing as the "dynamic linker rendezvous" struct, which is located at a predefined location in memory. In it, there is a field r_brk, which is the memory location where the debugger should put a breakpoint. The breakpoint target is usually an empty function, which the linker calls every time it is about to load a library. Whenever that breakpoint is hit, the debugger knows a new library has been loaded.

This feels like a hack, doesn't it? Well, when a hack becomes the standard, it is no longer a hack anymore.

This is fascinating, but how is it related to what we wanted to do? Well, how does the debugger know what has just been loaded when the breakpoint is hit? The answer is that there is another field, r_map, in the rendezvous struct, which is a linked list of all the libraries currently loaded.
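For reference, this is (roughly) how glibc declares these two structures in <link.h>:

struct r_debug {
    int r_version;
    struct link_map *r_map;  /* linked list of loaded objects */
    ElfW(Addr) r_brk;        /* where the debugger puts its breakpoint */
    enum { RT_CONSISTENT, RT_ADD, RT_DELETE } r_state;
    ElfW(Addr) r_ldbase;     /* base address of the dynamic linker */
};

struct link_map {
    ElfW(Addr) l_addr;       /* load offset of this object */
    char *l_name;            /* absolute path it was loaded from */
    ElfW(Dyn) *l_ld;         /* pointer to its dynamic section */
    struct link_map *l_next, *l_prev;
};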

And that's exactly what we need.

Welcome back

OK, so now we know how to find loaded libraries in a live program, how does that help us debug a core dump?

Well you see, what is a core dump, but a complete dump of the program's memory at the point of crash. Which is to say the rendezvous struct is dumped too. And all the debugger has to do, is pretend the core dump is just another live program, and read the r_map linked list from its "memory".

And all we have to do is expand the program's "memory" with a copy of this linked list - with all the paths rewritten to the ones we want - then point the rendezvous struct at the linked list we just created.
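The core of that rewriting step might look something like this (a hypothetical sketch, not my actual code): serialize a new link_map chain into a buffer, append the buffer to the core file as an extra PT_LOAD segment mapped at an address the program never used, and patch r_map to point there.

#include <link.h>
#include <stdint.h>
#include <string.h>

#define NEW_BASE 0x700000000000ULL /* assumed unused by the original program */

static size_t align8(size_t x) { return (x + 7) & ~(size_t)7; }

/* Serialize a rewritten link_map chain into `buf`; returns the number of
 * bytes used, which is how much to append as the new PT_LOAD segment. */
size_t emit_chain(uint8_t *buf, const char **new_paths,
                  const ElfW(Addr) *load_addrs, int n) {
    size_t off = 0;
    for (int i = 0; i < n; i++) {
        struct link_map *m = (struct link_map *)(buf + off);
        size_t name_off = off + sizeof(*m);
        size_t next = align8(name_off + strlen(new_paths[i]) + 1);

        memset(m, 0, sizeof(*m)); /* l_prev/l_ld left unset in this sketch */
        strcpy((char *)buf + name_off, new_paths[i]);
        m->l_addr = load_addrs[i];                 /* load address, unchanged */
        m->l_name = (char *)(NEW_BASE + name_off); /* the rewritten path */
        m->l_next = i + 1 < n ? (struct link_map *)(NEW_BASE + next) : NULL;
        off = next;
    }
    return off;
}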

Conclusion

Voilà! We've done it. I tested this with gdb and lldb, and it works. I now have a little tool that automatically copies shared libraries from a core dump, as well as updates the core dump file to look up these libraries from their new paths. Now I can debug core dumps on another machine without worrying about setting sysroot! How cool is that?

Is this all worth it? To be honest, probably not. But at least I have learned how the dynamic linker talks with the debugger. And now you have too!

I found an 8-year-old Xorg bug

Let me set the right expectations first. The bug I found is actually not that complicated - it's very straightforward once you see what's going on. But I still think the process I went through to uncover it could be interesting. It's also kind of interesting that such a simple bug stayed undiscovered for so long; I will speculate about why that is later. Now let's start.

The big X server lock

To give you some background, I was working on picom, the X11 compositor, when I encountered this problem. picom utilizes an X request, called GrabServer, which is essentially a giant lock that locks the entire X server.

Why do we need this? Well, that's because the X server is a terrible database, but that would take a long article to explain (let me know if you would like to read about that). To put it simply, picom needs to fetch the window tree from X. But there is no way to get the whole tree in one go, so we have to do this piece by piece. If the window tree keeps changing as we are fetching it, we will just get horribly confused. So we lock the server, then fetch the tree in peace.

And GrabServer is just the tool for that, quoting the X protocol specification:

[GrabServer] disables processing of requests and close-downs on all connections other than the one this request arrived on.
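With libxcb, the lock-fetch-unlock dance looks roughly like this (a sketch; error handling omitted):

#include <xcb/xcb.h>

void fetch_tree_in_peace(xcb_connection_t *c, xcb_window_t root) {
    xcb_grab_server(c); // lock out every other client
    // ... recursively walk the tree starting from `root` with
    // xcb_query_tree(), one window at a time, without it shifting
    // under our feet ...
    xcb_ungrab_server(c);
    xcb_flush(c); // make sure the ungrab is actually sent
}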

Cool, until I found out that ...

... It doesn't work

I have a habit of putting assertions everywhere in my code. This way, if something is not going where I expected it to go, I will know. I would hate for things to quietly keep going and only fail mysteriously much later.

And that is how I found out something wasn't right - windows that we knew existed suddenly disappeared while we were holding the X server lock. Basically, when a window is created, we receive an event. After getting that event, we lock the X server, then ask it about the new window. And sometimes, the window is just not there. How could this happen if the server is locked by us?

The first thing I did was to check the protocol again. Did I somehow misunderstand it? Unlikely, as the protocol is pretty clear about what GrabServer does. OK, does picom have a bug then? Did we somehow forget to lock the server? Did we miss a window destroyed event? I checked everywhere, and didn't really find anything.

This seems to lead to a single possible conclusion ...

An Xorg bug?

It could be, though I didn't want to jump to conclusions that quickly. I wanted to at least figure out what was going on inside the X server when those windows were destroyed.

I could attach a debugger to the X server; however, debugging the X server pauses it, which would be a problem if I was debugging from inside that X session. Besides that, window destruction happens quite often, which can be prohibitive for manual debugging. It's still possible with a remote ssh connection and gdb scripting, but it's inconvenient.

The other option is modifying the X server, adding printfs to print out logs when interesting things happen. That still felt like too much work.

Luckily, there is a better way to do this. It's called eBPF and uprobe. Essentially, they let you run arbitrary code when your target program reaches certain points in its code, without modifying the program or disrupting its execution.

Yeah, we live in the future now.
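The hooks can be as simple as a bpftrace one-liner along these lines (the binary path and function names here are assumptions - check them against your own Xorg build):

$ sudo bpftrace -e '
    uprobe:/usr/lib/xorg/Xorg:ProcGrabServer { printf("grab by pid %d\n", pid); }
    uprobe:/usr/lib/xorg/Xorg:DeleteWindow   { print(ustack); }'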

So, I hooked into GrabServer, so I could see who was currently grabbing the server; then I hooked into window destruction to print a stack trace every time a window was destroyed. When everything was ready, I set it off and collected the logs. At first there were a couple of false positives, because some applications do legitimately grab the server and destroy windows. But after a while, I saw something that stood out:

0x4755a0 DeleteWindow (window.c:1071)
0x46ef75 FreeClientResources (resource.c:1146) | FreeClientResources (resource.c:1117)
0x4450bc CloseDownClient (dispatch.c:3549)
0x5bfd12 ospoll_wait (ospoll.c:643)
0x5b8901 WaitForSomething (WaitFor.c:208)
0x445bb5 Dispatch (dispatch.c:492)
0x44a1bb dix_main (main.c:274)
0x729a77b6010e __libc_start_call_main (:0)

Aha, CloseDownClient! So the window is closed because a client disconnected? But I remember the protocol specification says

... disables processing of requests and close-downs ...

Oh yeah, this is indeed a Xorg bug! So what's going on here?

A simple bug

The Xorg server uses epoll to handle multiple client connections. When GrabServer is used, the server stops listening for readability on all clients besides the one that grabbed the server. This is all well and good, except for connection errors. When an error happens, epoll will notify the server even if it is not listening for anything. The epoll_ctl(2) man page says:

EPOLLERR

Error condition happened on the associated file descriptor. This event is also reported for the write end of a pipe when the read end has been closed.

epoll_wait(2) will always report for this event; it is not necessary to set it in events when calling epoll_ctl().

Turns out, it's just a simple misuse of epoll. Checking the git logs shows this bug has been there for at least 8 years.
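In other words, the pause logic presumably amounts to something like this (a simplified illustration, not the actual Xorg code):

#include <sys/epoll.h>

void pause_client_during_grab(int epfd, int client_fd, void *client) {
    // To "pause" a client, stop watching its fd for readability.
    struct epoll_event ev = { .events = 0, .data.ptr = client };
    epoll_ctl(epfd, EPOLL_CTL_MOD, client_fd, &ev);
    // But EPOLLERR/EPOLLHUP cannot be masked out: if the connection dies
    // now, epoll_wait() still reports this fd, and the close-down path
    // runs immediately instead of waiting for the ungrab.
}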

So how does a simple bug like this slip under the radar for so long? Actually, I think I might have the answer for this.

You see, an X11 compositor sits in a very special niche in the system. Normal applications only care about their own windows most of the time, so they only need to synchronize within themselves. And for window managers, well, they manage windows. They have the authority to decide when a window should be destroyed (well, most of the time). So there is no race condition there either. Only the compositor needs to know about all windows, yet doesn't have a say in when they are closed. So it's in a unique position that made using the big X server lock necessary.

Besides that, this problem rarely happens despite picom's heavy use of the lock. I was only able to trigger it by installing .NET Framework on Linux using Wine. (I will not explain why I was doing that.)

Conclusion

I actually don't have much more to say. Hopefully you found this little story interesting. I definitely recommend learning about eBPF and uprobe. They are amazing tools, and have a lot more uses beyond just debugging.


Additional note 1: Despite me claiming it is necessary to use the server lock in picom, there might be a way of updating the window tree reliably without it. I do want to get rid of the lock if I can, but I am still trying to figure it out.

Did GitHub Copilot really increase my productivity?

Translations: 🇯🇵 Japanese


I had free access to GitHub Copilot for about a year. I used it, got used to it, and slowly started to take it for granted - until one day it was taken away. I had to re-adapt to a life without Copilot, but it also gave me a chance to look back at how I used Copilot, and reflect: had Copilot actually been helpful to me?

Copilot definitely feels a little bit magical when it works. It's like it plucked code straight from my brain and put it on the screen for me to accept. Without it, I find myself getting grumpy a lot more often when I need to write boilerplate code - "Ugh, Copilot would have done it for me!", and now I have to type it all out myself. That being said, the answer to my question above is a very definite "no, I am more productive without it". Let me explain.

Disclaimer! This article only talks about my own personal experiences; as you will be able to see, the kind of code I ask Copilot to write is probably a little bit atypical. Still, if you are contemplating whether you should pay for Copilot, I hope this article can serve as a data point. Also, I want to acknowledge that generative AI is a hot-potato topic right now - Is it morally good? Is it infringing copyright? Is it fair that companies train their models on open source code and then profit from them? These are all very, very important problems. However, please allow me to put all that aside for this article, and talk about productivity only.

OK, let me give you some background first. For reasons you can probably guess, I do not use Copilot for my day job. I use it for my own projects only, and nowadays most of my free time is spent on a singular project - picom, an X11 compositor, which I am a maintainer of. I am not sure how many people reading this will know what a "compositor" is. It really is a dying breed after all, given the fact that X11 is pretty much at its end-of-life, and everyone is slowly but surely moving to wayland. Yes, each of the major desktop environments comes with its own compositor, but if you want something that is not attached to any DE, picom is pretty much the only option left. Which is to say, it is a somewhat "one of a kind" project.

Of course, as is the case with any software project, you will be able to find many commonly seen components in picom: a logging system, string manipulation functions, sorting, etc. But how they all fit together in picom is pretty unique. As a consequence, large-scale reasoning about the codebase with Copilot is out of the window. Since it has not seen a project like this during training, it's going to have a really hard time understanding what it's doing. Which means my usage of Copilot is mostly limited to writing boilerplate, repetitive code, etc. To give a concrete example, say you need to parse an escaped character:

if (pattern[offset] == '\\') {
	switch (pattern[offset + 1]) {
	case 't': *(output++) = '\t'; break;
	// ????
	}
}

If you put your cursor at the position indicated by ????, you can pretty reliably expect Copilot to write the rest of the code for you. Other examples include mapping enums to strings, writing glue functions that follow a common pattern, etc. In other words, the most simple and boring stuff. Which is very good. See, I am someone who wants programming to be fun, and writing this boring, repetitive code is the least fun part of programming for me. I am more than delighted to have someone (or rather, something) take it away from me.

So, what is wrong then? Why did I say I am more productive without Copilot? Well, that's because Copilot has two glaring problems:

1. Copilot is unpredictable

Copilot can be really, really helpful when it gets things right; however, it's really difficult to predict what it will get right, and what it won't. After a year of working with Copilot, I would say I am better at that than when I first started using it, but I have yet to fully grasp all the intricacies. It is easy to fall into the trap of anthropomorphising Copilot, and trying to gauge its ability like you would a human's. For instance, you might think, "Hmm, it was able to write that function based on my comments, so it must be able to write this too". But you are more than likely to be proven wrong by the chunk of gibberish Copilot throws at you. This is because Artificial Intelligence is very much unlike Human Intelligence. The intuition you've developed through a lifetime's interaction with other humans is not going to work with an AI. Which means, short of letting Copilot actually try, there is oftentimes no surefire way to know whether it's going to work or not. And this problem is compounded by the other big problem of Copilot:

2. Copilot is slooooow

clangd, my C language server of choice, is very fast. It's faster than I can type, which means that, practically speaking, its suggestions are instant. Even when the suggestions are unhelpful, they cost me nothing. I don't have to pause or wait, so my flow is uninterrupted. Compared to that, Copilot is much, much slower. I would wait at least 2~3 seconds to get any suggestion from Copilot. If Copilot decided, for whatever reason, to write a large chunk of code, it would take a lot longer. And in many instances I would wait all those seconds only to see Copilot spit out unusable code. Then I would have to decide whether to refine the instructions in comments and try again, or partially accept the suggestion and do the rest myself. Even though this doesn't happen that often (after you have gotten to know Copilot a bit better), much time is wasted in the back-and-forth.


So yeah, that's pretty much all I have to say. At least at this very moment, I do not think Copilot improves my productivity, so I definitely won't be paying for it. If GitHub's plan was to give me a year's free access to Copilot to get me addicted, then their plot has conclusively failed. That being said, if Copilot ever becomes a little bit smarter, or several times faster than it currently is, maybe the scale will tip the other way.

Hmm, should I be scared?

I want a different Nix

I have been daily driving NixOS for about six months, and it has been great. I don't think I'll ever switch to a different distro again (don't quote me on this). I'm sure you've already heard why nix is great many times, so I'll try not to parrot my fellow nix enthusiasts. (And if you have not, it's not hard to find such an article)

Instead, I am here to complain about one thing I dislike strongly about Nix: it does not support dynamic dependencies.

To see what I mean by this, let me give you some background first. With Nix, a package's dependency was fixed when it was built. Say you have this derivation (what Nix calls a package):

package = mkDerivation {
   # ...
   buildInputs = [ dep1 dep2 ];
};

Then after package is built, it will contain hard-coded references to dep1 and dep2, which cannot be changed. If either of the dependencies changes, e.g. a version update, you will get a different package as output. This can be great if you want your packages to be absolutely deterministic and reproducible. But, as an average Linux user, this has caused me much pain.

Because of all the darn rebuilds!

In the example above, anything that depends on package will be rebuilt if either of package's dependencies changes, because package is now an entirely different package. And all the transitive dependents will get rebuilt too! Which means if you want to install a slight variant of a package, you could be getting yourself into a rebuild hell. And because of your change, none of the packages that need rebuilding can be found in NixOS' binary cache.

Last week I spent more than an hour just to enable debug info for xorg.xorgserver, because Nix had to recompile the entirety of Qt and webkit2gtk, along with 100 other packages. And the last time I tried to use a different version of xz (you might be able to guess why), Nix wanted to recompile literally everything, because xz is one of the bootstrap packages, so basically every other package depends on it.
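For instance, a seemingly innocent overlay like this one (a sketch; the patch file is hypothetical) gives xz a new store path, and with it every single package that references xz:

final: prev: {
  # Any change here - even just adding a patch - yields a new output
  # hash for xz, so everything built against it must be rebuilt too.
  xz = prev.xz.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./my-xz-fix.patch ];
  });
}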

And this is pretty hard on NixOS developers too. Changes to certain packages trigger huge rebuilds, which are so computationally intensive that NixOS developers choose to lump them together into big pull requests. And those often take weeks to be validated and merged. Even urgent security fixes have to go through the same pipeline.

This problem is intrinsic to Nix, so I don't think it can be solved. I just wish there were an alternative to Nix that does most of what Nix does but allows dynamic dependencies. If you know such a thing exists, please, please let me know.