Good Job concurrency lock contention
Probably a mediocre job in retrospect
This week I ran a long backfill task in GoodJob that spawned a large number of workers, each operating on a large number of database records. This kind of thing can get a bit stressful and takes some planning to avoid straining resources, not to mention too much babysitting. I’m more familiar with Sidekiq, but I think the two are pretty comparable.
The details of those workers might be worth going into later, since they involved LLMs and Amazon Bedrock calls, but the takeaway I want to document here is about over-engineering for performance.
I’d optimistically added a perform_limit with an associated key to stop too many workers running at once, hogging the queue and/or stressing out Bedrock and triggering rate limits. At the same time, I kept running into resource problems: the database was overloaded, and the resulting latency was bad enough to hurt production and worry DevOps. After going around in circles for a while, reducing concurrent jobs and so on, I realised the concurrency locks themselves were causing the problem. We had a good number of workers available, and while 5 were happily performing the task, all the others were spamming queries to ask whether the lock was available. Having workers hit Postgres over and over to check on locks smells a bit like poor architecture on GoodJob’s part, but I’ll leave that question open for another day.
I’d incorrectly diagnosed the number of concurrent workers as the problem, when in fact it was the limit itself causing the database strain. This wasn’t something I could see myself; it took some coordination with DevOps to identify which processes were the bottleneck and what they were trying to do. The number of workers available was a sufficient throttling mechanism on its own, and adding a concurrency limit and key simply caused lock contention and a flurry of activity while the workers tried to negotiate who was using the key. As soon as I removed those restrictions, the workers chomped through the jobs easily.
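To make that mechanism concrete, here’s a toy model in plain Ruby. It is not GoodJob’s implementation, and it doesn’t touch a database. The simulate method, its parameters and the numbers (20 workers, 100 jobs, 10 ticks per job) are all made up for illustration. It only counts how often an idle worker dequeues a job, finds the limit already reached, and has to back off. For reference, the real setting is declared with good_job_control_concurrency_with(perform_limit: ..., key: ...) from GoodJob::ActiveJobExtensions::Concurrency, and as far as I understand the README, a job that exceeds the limit gets retried with backoff rather than parked quietly.

```ruby
# Toy model only: NOT GoodJob's implementation. It counts how many wasted
# "is there a free slot?" checks a perform limit can generate when there are
# more workers than the limit allows to run.
def simulate(workers:, jobs:, job_ticks:, perform_limit: nil)
  remaining = jobs      # jobs still waiting in the queue
  running = []          # ticks left for each job currently performing
  wasted_checks = 0     # dequeues that only discovered the limit was hit
  ticks = 0

  while remaining.positive? || running.any?
    ticks += 1
    idle = workers - running.size

    idle.times do
      break if remaining.zero?

      if perform_limit && running.size >= perform_limit
        wasted_checks += 1 # worker queried, found no free slot, backed off
      else
        remaining -= 1
        running << job_ticks
      end
    end

    # Advance the clock for every performing job and drop the finished ones.
    running = running.map { |left| left - 1 }.reject(&:zero?)
  end

  { ticks: ticks, wasted_checks: wasted_checks }
end

# Hypothetical numbers: 20 workers, 100 jobs, each job taking 10 ticks.
limited   = simulate(workers: 20, jobs: 100, job_ticks: 10, perform_limit: 5)
unlimited = simulate(workers: 20, jobs: 100, job_ticks: 10)

p limited
p unlimited
```

In this model the limited run spends far more dequeues on checks that find no free slot than on actually starting jobs, while the unlimited run never wastes a check and finishes in fewer ticks. The real thing has backoff and other details this ignores, so treat it as a picture of the shape of the problem rather than a measurement.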
Completely unrelated TIL
Shift + A puts vim into Insert mode at the end of the line, whereas $ just moves to the last character. Annoyance solved.