
On Rails
On Rails invites Rails developers to share real-world technical challenges and solutions, architectural decisions, and lessons learned while building with Rails. Through technical deep-dives and retrospectives with experienced engineers in the Rails community, we explore the strategies behind building and scaling Rails applications.
Hosted by Robby Russell of Planet Argon.
Rosa Gutiérrez & Solid Queue
In this episode of ‘On Rails’, host Robby Russell (@planetargon) chats with Rosa Gutiérrez, Principal Programmer at 37signals, about the technical decisions behind Solid Queue - a database-backed job queue replacing Resque in their Rails apps.
Rosa dives into why her team built Solid Queue, how it improves reliability, visibility, and maintainability, and the challenges of migrating live apps like Hey during active development. Learn how they tackled recurring jobs, long-running tasks, and testing strategies, plus insights on system design, scaling, and the joy of deleting old code.
Topics:
- Why 37signals replaced Resque
- Building a job queue with ActiveJob + MySQL
- Transparent job states & using Mission Control as a dashboard
- Migrating with minimal impact
- Best practices for recurring and long-running jobs
- Recommended tools, testing gems, and dev books
Links:
- Acidic Job
- Chaotic Job
- Rosa.codes
- Rosa’s videos on Ruby Events
- Rosa on the 37signals Dev blog
- Rosa on GitHub
- Rosa on LinkedIn
- Book Recommendation: Refactoring, Second Edition by Martin Fowler
#RubyOnRails #SolidQueue #BackgroundJobs #37signals #OnRailsPodcast
On Rails is a podcast focused on real-world technical decision-making, exploring how teams are scaling, architecting, and solving complex challenges with Rails.
On Rails is brought to you by The Rails Foundation, and hosted by Robby Russell of Planet Argon.
Welcome to On Rails. This is the podcast where we dig into the technical decisions behind building and maintaining production Ruby on Rails apps. I'm your host, Robby Russell. In this episode, I'm joined by Rosa Gutiérrez, a principal programmer at 37signals. We talk about how her team tackled the growing complexity of background job management by building Solid Queue, a database-backed queuing system designed to replace their aging Resque setup. Rosa walks us through how they approached migrating live apps like Basecamp and Hey to this new system while still actively developing it. We also explore how her team handles job retries, failures, recurring tasks, and long-running exports. Rosa shares practical strategies for debugging, simplifying job logic, and why she still leans on the one-off rake task when the situation calls for it. Rosa joins us from Madrid, Spain. All right, check for your belongings. All aboard. Rosa, I have to ask, what keeps you, dare I say, on Rails?
Rosa:That's a great question. If I'm being completely honest, it's mostly my teammates at my current job. They are so great that even if the framework wasn't as great as it is, I would still be on Rails thanks to them.
Robby:I'm actually curious, how long have you been working with Ruby on Rails?
Rosa:It's been now 11 years.
Robby:Where does that place you on the version roadmap?
Rosa:I started on Rails 3. I think Rails 4 was already out or it was about to be released, I think. But the company where I started, they were still on Rails 3. So that was my first.
Robby:Was that Rails 3.0 getting to 3.2 and then getting up to Rails 4? Were you part of any of those early upgrade projects there then?
Rosa:No, I never did any of those upgrades.
Robby:Okay. So what brings you here today is that I wanted to talk with you about Solid Queue in particular. I know that Solid Queue was born out of your team at 37signals' decision to move away from Resque. And for anyone who hasn't read those several posts from David, we'll include links for those in the show notes. But from your perspective, what started to break down with Resque?
Rosa:Yeah, the main reason was complexity. We had accumulated a lot of custom-made code and different gems that we built at my company, forks of existing gems, all to manage different issues we had been hitting with Resque over the years. So when we started a new app and we wanted to use jobs in that new app, we checked: how do we have this set up in other apps? And David saw seven different gems and a lot of different stuff that was very, very hard to wrap your head around. He said, what is this? This cannot be so complicated. So that's why we started Solid Queue.
Robby:Interesting. And I'm curious, that's a pretty common thing, where over time you might have to fork a gem to do something that's special to you, especially if you're working on a larger platform or application with a lot of users, a lot of edge cases, things that most SaaS applications may not be encountering. So I think it's interesting going through that process and being like, okay, what's a fresh take on these sorts of programming paradigms? It sounded like that might have been a decision that came from David, but for you and your role at the organization, what sort of things did you try to navigate? Were there a lot of edge cases that you were able to get rid of, or did you just try to account for them when Solid Queue started getting developed?
Rosa:Yeah. So when we started developing Solid Queue, we basically took account of all the requirements, everything that we needed, everything that we knew we had put together for Resque. So we built Solid Queue with those requirements in mind already, thinking a lot about those specific things that we had to patch Resque for. We also kept in mind the possibility that perhaps we could avoid some of the edge cases, some of the quirks we were patching Resque for. But mostly we were guided by those requirements.
Robby:Interesting. And did you go through the process like, we're going to build this new approach or new paradigm for the new application? Or was it very much like, well, we're also going to use this opportunity to make this work in our existing applications?
Rosa:Yeah. Yeah, it was actually the existing application approach, because building it for a new application would probably not have been real. We wouldn't be sure, because everything was new. We wouldn't have the volume in jobs, the problems, the edge cases. We wouldn't have anything there; it wouldn't have been very realistic. So actually the approach was to start with an existing application. And for that, we chose Hey, our email service, for a couple of reasons. One, the code base was fairly recent, started in 2019, and we had kept it up to date with Rails. It had all the quirks that we had been seeing in Basecamp with jobs already; we were using the exact same seven gems and all of that. And the other reason was that we didn't want to try with Basecamp, because it has lots and lots more users, and it's also our main app. Hey is still high criticality because it's email, and in some aspects, like privacy, it's even higher criticality. But for the jobs themselves, we had a little bit more margin there to deal with problems.
Robby:You know, out of curiosity, as someone that's used both Hey and Basecamp and different versions of Basecamp over the years, I'm curious, what sorts of background jobs are you typically needing to think about? From a user perspective, it's a lot of CRUD interface type things: you're saving data, displaying data. What kind of behind-the-scenes stuff typically happens? A lot of email notification type stuff, or...?
Rosa:In Basecamp or in Hey?
Robby:Basecamp?
Rosa:Yeah, in Basecamp. So we have all the email notifications, of course. Then anything to do with bulk permissions: when you add someone to an account, adding them to projects, subscribing them to a lot of different recordings. All that happens in jobs. Then processing all the stuff for the notifications themselves, not just sending them. Say, mentions: scanning a text for mentions happens in a job as well. Processing actions that cascade through hierarchies: for example, you have a message with attachments and comments, and say you archive that message. We propagate that through the hierarchy in a job as well. And anything to do with deletions. We soft delete a lot of stuff, and we delete things in the background too.
Robby:Interesting. And, you know, those listening might be curious: you had patched Resque, so might it have been easier to try to get your patches into Resque itself? Versus, when does a team think, maybe we should just go build a new thing?
Rosa:Yeah, that's a great question. I think Resque, I'm not sure it's still maintained, but maybe we could have forked it and done a new version. But it's actually a very, very old code base. It's really old. It supports any Ruby application; you don't need to use Resque with Rails, which means it implements a bunch of the things that you now get with ActiveJob for free. So building a new system would allow us to do something simpler that could leverage ActiveJob. ActiveJob has a lot of things that a job queueing system needs, like retrying, error handling, all the serialization; all that is built into ActiveJob. So that's great. But the main reason was actually that David wanted us to try to use a database instead of Redis. Resque uses Redis as the backend, like Sidekiq; they use Redis to store jobs, which is great. Redis has some data structures that are really, really well suited to this problem of having job queues. But we had had great success with Solid Cache. That was the first Solid gem; it implemented a cache backend for Rails using a database, and it had already been running in production for one of our apps. So David wanted to see if we could, you know, use a database for this.
Robby:That's interesting. Yeah, I remember when Solid Cache came out, I was like, oh, that's so interesting. I'm curious, in that scenario, were you typically using the same database for your cache, just as other tables alongside Basecamp's tables? Or did you have a separate database that you were connecting to, a different SQL database or something? Or was it all just in the same one?
Rosa:So for the cache, I think we started directly with a separate database, which kind of makes sense, to have the cache completely independent from the application. For Solid Queue, actually, we started with sharing the database, having everything together. One of the reasons was to have transactional integrity with jobs, so that if you are doing any modifications to your data within a transaction and you enqueue your job within that transaction, you get certain advantages. You can roll back the whole thing if the job fails to enqueue, for example. And you don't need to worry that when the job gets enqueued the data is not yet there, because the job won't be enqueued until the transaction where you are changing your data is committed. So it has some advantages. That was something we sort of had in mind, and other ActiveJob backends that use the database, like GoodJob, do the same. GoodJob is a great example; we looked a lot into it because it's a great piece of software, and by default it uses the same database, so it takes advantage of this transactional integrity. So we started there, and then, you know, life takes you other places, and we ended up with a separate database. That is the default now and what we recommend, but we started that way.
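As a sketch of the same-database advantage Rosa describes (the model and job names here are hypothetical, not from Basecamp or Hey):

```ruby
# When Solid Queue's tables live in the app's own database, the enqueue
# shares the surrounding transaction: if anything raises, the data change
# and the enqueued job row are rolled back together.
ApplicationRecord.transaction do
  message = Message.create!(body: "Hello")  # hypothetical model
  ProcessMessageJob.perform_later(message)  # hypothetical job, enqueued atomically
end
```

With a separate queue database that atomicity goes away, which is why enqueuing after commit (for example from an `after_commit` callback) becomes the safer pattern.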
Robby:Okay. I appreciate that; that's helpful context, especially when you're in a local development environment and getting something running. In production, I was wondering about the trade-off of writing and reading from the same database versus, say, the same database server with a different database, so you can see, well, this one's starting to get a little noisy, we might want to isolate it. But at least you have that flexibility there. And with Rails 8, you don't necessarily have to think about spinning up and running another process in parallel in development, in your test environment, and in production. So I appreciate that about Rails, coming back and revisiting some of those default assumptions over the years: we went from Memcached to Redis and back to the database. There you go. And I'm curious, you mentioned Sidekiq at least once there. Was there anything about Sidekiq or GoodJob that just didn't quite align with what you were looking for? Or was it just that primary factor of, let's make this more Rails focused and take advantage of the SQL database?
Rosa:Yeah. Well, for Sidekiq, it was obviously that it uses Redis. It's also good, but we explicitly wanted to use a database. And for GoodJob, the problem was that it uses very specific features of Postgres that aren't available in SQLite and MySQL, and we use MySQL. For example, Postgres has a listen/notify feature that is super, super handy for implementing job queues, but MySQL and SQLite don't have it. And if we wanted to offer this for Rails, it would have to support the three official databases. So one thing we looked at in the beginning was whether perhaps we could make GoodJob compatible with MySQL as well and just contribute to GoodJob. I spent a few days looking into the code and getting familiar with it. It's really cool, really, really nice. But it didn't seem feasible; it would have felt like we were shoehorning MySQL in there, making it much, much worse. Way before we launched Solid Queue, I had the chance to meet GoodJob's creator, Ben Sheldon, at the first Rails World in Amsterdam. I told him that I had been looking at the code and thinking about whether we could support MySQL, and he didn't see that as a good idea either; he confirmed it wasn't a good idea. So I was a bit relieved, you know, like we made the right call there.
Robby:That's great. What was your role in shaping the direction of Solid Queue early on?
Rosa:I was the main developer in charge of that. It was my project, so I could do whatever. I remember in the beginning, I looked at all the gems and wrote down everything I found, and I did some tests as well. I always wrote everything down for everyone to read. And I got great feedback from David, and from Donal on my team, who was the author of Solid Cache. So they, you know, gave their opinions, but ultimately it was my decision.
Robby:Nice. You know, one of the boldest parts of this is that your team was migrating Hey, specifically, to Solid Queue while still building it, right? So what did that look like in practice? Were you running both systems in parallel at any point?
Rosa:Yeah, yeah, exactly. All the time, running Resque and Solid Queue together. So that's a great question. What I did was to start with the basics: what's the most basic thing you can build that allows you to move some jobs? That was the requirement. First, I chose which jobs to move first, which was the jobs that would have no user impact at all. And for that, I chose what we call incineration. That is the real deletion happening, because we soft delete everything. For Hey, for example, most deletions, for emails, are first put in the trash for 25 days, I think. And accounts are canceled: once your subscription expires, it lasts for 30 days, I think, and then it's canceled, and then 30 days after, we delete everything, all the data. So there is always that period, and those jobs have no user impact. Of course, we need to ensure they run, because in our privacy policy we assure you that we're going to delete the data. But if the job is delayed one day, it's not a problem. So those were the jobs that I chose. We had quite a few different ones of those for soft deletion. And for those, we basically needed to be able to enqueue and run jobs, and to schedule them in the future as well. Because we do this, a very questionable pattern that I don't love, which is enqueuing a lot of jobs in the future to perform things that can still be undone. But that's okay; that's how we were doing it, and we had to support scheduling jobs in the future anyway. So those were the main things we needed. With that, I first built a prototype that just worked and did that. And then I moved those jobs. The nice thing about ActiveJob is that it allows you to define the queue adapter per job. So you can just move one job to a new adapter, and that's it. You just need to ensure you are running that.
Of course, you are running both. What we did back then was... actually, there was a pause in this migration. I haven't explained that to anyone yet, but the official version is that I was building Solid Queue for 18 months or so. The truth is that in those 18 months, there were many months where I didn't touch Solid Queue at all. That was because building Solid Queue overlapped with our migration to Kamal in Hey as well, and that lasted quite some time. I remember when I first started getting Solid Queue deployed at the same time as Resque, it was all Kubernetes still; we were still in the cloud using Kubernetes. But then, since we were also moving the app to Kamal, we decided, okay, let's keep the moving pieces to a minimum and wait. Let's move this to Kamal first, exit the cloud, and then, when the dust has settled, we start moving to Solid Queue. So I took a pause from the project. I completely stopped and worked on other, completely unrelated stuff. A few months after, I took it back up, and by then you could deploy the app and run Solid Queue using Kamal; it's super simple. So in the end, it was nice, right? We didn't have to do complicated Helm charts and stuff. Having both Resque and Solid Queue running together was very simple.
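The per-job adapter switch Rosa describes is a one-line ActiveJob declaration. A sketch, using the incineration example she gives (the class body here is a hypothetical illustration, not Hey's actual code):

```ruby
class IncinerationJob < ApplicationJob
  # Route only this job through Solid Queue while the rest of the app
  # keeps enqueuing through the existing Resque adapter.
  self.queue_adapter = :solid_queue

  def perform(record)
    record.destroy!  # the "real deletion" after the soft-delete grace period
  end
end
```

Moving a job class over is then just a matter of changing this declaration and deploying, which is what made the gradual, queue-by-queue migration possible.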
Robby:You know, I think for those listening, there are a couple of things about that story that I find fascinating. It's very relatable for a lot of developers: you might be working on a large project initiative for a while, but then another decision comes down, like, hey, we need to switch gears for a while, and you have to put the work on hold to some degree. Do you feel like you've developed, at this point in your career, a good way of tracking where you're at, so that it's relatively simple for someone to pick that work back up and build momentum again after the dust settles? Because I think a lot of listeners might be working in scenarios where they're working on an overdue upgrade, or they're trying to do some migration, and then something switches and they have to put it on hold. And then they're like, where were we? It's been several months. How do I regain momentum on this?
Rosa:Yeah, that's a great question. I'm not sure I've mastered this, but I've done it a lot. Actually, I'm now in a pause on a long-running project, and I know I'll be worried when I go back to it, maybe next week. But the approach that really works for me, though I'm not always so disciplined that I do it, is just writing: writing down where you are and writing updates. Basecamp is very good for this, and at 37signals we are usually pretty good about writing things down all the time. I found that's the best: periodic updates to the project, so that when you come back, you can just refer to those, remember where you were, and regain momentum fast. Though I have to say that I'm not always doing this perfectly, strictly. And then I regret it.
Robby:Have you been involved in scenarios where you're picking up work that someone else had been working on, trying to make sense of it, and having to rely on their notes? So maybe another way I could ask the question is: what would you recommend to someone else as career guidance? It's one thing for us to tell ourselves, well, I could always be better about this. But if you're giving advice to someone else, like, please go through this process, this will make your and your coworkers' lives much easier?
Rosa:Yeah, totally. I've definitely picked up other projects, though I'm not sure anyone has picked up mine, at least not at 37signals. But it's just: do the updates. Basecamp, the product, is great for this. It has these periodic check-ins, and it has this new feature for projects called Move the Needle, where you need to say where the project is at and write an update. And it nudges you a bit when it's been a while since you posted an update there. So, something like that. Something that just nudges you to write and say where you are.
Robby:I think that's helpful. Thanks for going on a little tangent with me there, for our audience and for myself as well. There's always this assumption, like, well, people working on these really complex systems and applications have it all figured out. But, you know, we're all human and we've got a lot of things distracting us. I'm also curious, thinking about this type of scenario where you're putting things on hold: when you're building something like Solid Queue, are you opening up issues in GitHub for the project to work against? Or is it just, once you ship it, then you kind of... What did that process look like, if you're building out and extracting something that you're then going to share with the rest of the world?
Rosa:That's a great question. For this, we don't really use GitHub issues; we just use Basecamp. We usually use to-dos or cards or whatever there. But it's a great question, because when I started, I wasn't quite sure what I had to do. It's not like when our product team is building a feature and it's shaped and you have a clear vision of what you are building; then it's easier, I think, to write a list of to-dos of all the things you need to do, and you just do them. With Solid Queue, early on, the to-dos were very vague, you know, like "figure out enqueuing jobs" or "run jobs" or whatever. So I didn't quite track the work that way. I would just have a goal and try to do it, and then I would write an update about that goal. It wasn't until I got real with this, you know, actually shipping code, that I started getting more traction. In the beginning, I remember I struggled a bit with this, because it was sort of the blank page block. I got a bit of that, and I still do when I need to start. It takes me some time and it's hard. I try to write, even long form, you know, like, I need to do this, this, this, in long-form writing, not to-dos, because I find it very hard. But then once it's gotten real, I have something working and I can see small steps, and I start doing, in my case, always to-dos. I write everything there. I try to divide them, but sometimes some to-dos end up being super big and others are very small. And then I may do one pull request that closes a few to-dos, or I need multiple pull requests for one single to-do sometimes. So it depends. I would love to have a perfect, effective, you know, flawless procedure to do this in a structured way, but I don't.
Robby:I think we're all trying to figure out how to find our momentum, make progress, show progress, and keep ourselves on track sometimes. And when you're working on something new like that, or replacing something like Resque, I'm assuming there was some feature parity that you needed to think about. Did you approach it with a test-driven approach, where you map out, it needs to do this, it needs to do this, and until this is all green, it's not really working? Or was it more piece by piece? You probably had to play around a little bit and experiment, I would imagine, as well.
Rosa:Yeah, yeah, totally. In my case, I think this part was easy, sort of, because it came given by the jobs I had to move. Basically, I was moving jobs, and then I would try to move certain jobs that would need a specific feature. Like they would need, I don't know, the concurrency controls, or something else that we still didn't have. So that was my next goal, you know: build this so I can move those jobs.
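For reference, the concurrency controls Rosa mentions ended up in Solid Queue's API as a per-job declaration. A sketch, with a hypothetical job and key:

```ruby
class DeliverWebhookJob < ApplicationJob
  # Solid Queue's concurrency controls: allow at most one of these jobs
  # to run at a time per account (the key lambda receives the perform
  # arguments). Job name and key are illustrative, not from Hey.
  limits_concurrency to: 1, key: ->(webhook) { webhook.account_id }

  def perform(webhook)
    # ... deliver the webhook ...
  end
end
```

Jobs that exceed the limit are blocked rather than dropped, and get released as running ones finish.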
Robby:So I'm curious, did anything get tricky with retries or serialization?
Rosa:With retries, not quite. The only thing that got tricky was that in the beginning we didn't have a way to retry jobs other than going to the Rails console, searching for the jobs there if they failed, and retrying them manually. And that wasn't very nice. First, we had to wire up some alerts, because how would you know that they failed at all unless you were checking? So we put together a very simple alert. We have a separate table for failures, so the alert was as simple as checking that table and seeing if there is anything in it. It's very similar to what we had with Resque, which has the failed jobs stored separately in Redis: you just check there and see if there is anything, and so you know. But then you have to go to the console. And that's where Mission Control Jobs came in. We already had it, but it didn't support Solid Queue. So I actually had to stop doing anything else with Solid Queue until we had Mission Control running, so you could see the jobs fail there and retry them there. Because otherwise, for anyone on call, or anyone seeing anything failing, it was a pain having to go to the database to inspect the failure and everything.
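Because Solid Queue keeps failures in their own table, exposed through the `SolidQueue::FailedExecution` model, the simple alert Rosa describes can be little more than a count check. A sketch, where the notification hook is hypothetical:

```ruby
# Run periodically (from a recurring job or cron): check the failures
# table and page someone if anything has landed in it.
failed = SolidQueue::FailedExecution.count
notify_on_call("#{failed} failed jobs need attention") if failed.positive?  # hypothetical alerting hook
```

The same table backs the retry/discard actions in Mission Control, so the alert and the dashboard are looking at the same rows.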
Robby:For those listening that might not be that familiar with mission control, can you explain that a little bit?
Rosa:Yeah, it's a very simple dashboard to manage your jobs. You can inspect queues, the jobs that are in progress, the jobs that are already finished, if you are storing those, and the jobs that have failed. You can discard jobs that have failed, and you can retry them. You know, all the basic stuff that you usually need to do when you are intervening manually. Because, of course, you have automatic retries and all that ActiveJob gives you for free, but there are cases where you need manual intervention.
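For anyone wanting to try it, Mission Control Jobs ships as a Rails engine; per its README, installing the gem and mounting the engine is roughly all it takes (the mount path is your choice):

```ruby
# Gemfile
gem "mission_control-jobs"

# config/routes.rb
Rails.application.routes.draw do
  # Serves the jobs dashboard at /jobs; protect it with your app's auth.
  mount MissionControl::Jobs::Engine, at: "/jobs"
end
```

It works against any supported ActiveJob adapter, which is how 37signals could use it with both Resque and Solid Queue during the migration.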
Robby:Nice. This episode of On Rails is brought to you by Strong Params: enterprise-grade input filtering built right in. Before Strong Parameters, any user could post whatever they wanted, and your models would just take it, no questions asked. With Strong Params, you define exactly what's allowed. No more mystery fields, no more surprise admin accounts, just peace of mind. Trusted by Rails developers everywhere since version 4.0. Strong Params: because params.require(...).permit(...) is the grown-up thing to do. You know, I kind of want to touch back on the migration process. You mentioned that you would start moving certain jobs over as you had the capabilities in Solid Queue to do that. But at some point you might have needed to make the switch. Or did you just get to a point where you let everything that was still in the Resque world finish? You mentioned that you have scheduled-in-the-future jobs going on. Did you ever port any of those over, or did you just let them finish out, and then at some point you could turn it off?
Rosa:Yeah, that's a great question. I actually moved them all; I moved them manually. All the ones that were scheduled in the future, meaning, I remember, anything more than a couple of hours in the future. Because jobs scheduled in the future, in both Resque and Solid Queue, also include retries. In ActiveJob, you can retry with delay. Imagine you got rate limited or whatever, and you don't want to be slamming whatever thing you are hitting all the time; you have the option to retry a number of times and add some delay. So it may have happened that we had jobs in that case. I remember I migrated something like, you know, everything more than 12 hours in the future, but I migrated all of them to Solid Queue at some point. Once we had those jobs running in Solid Queue, so that new jobs would be enqueued in Solid Queue, I then moved all the ones in Resque.
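The retry-with-delay behavior Rosa refers to is ActiveJob's `retry_on SomeError, wait: :polynomially_longer`, whose base delay (ignoring the random jitter ActiveJob adds on top) grows as the fourth power of the attempt count plus two seconds. A quick sketch of that schedule in plain Ruby:

```ruby
# Base delay in seconds for each retry attempt under :polynomially_longer,
# jitter omitted: executions**4 + 2.
delays = (1..5).map { |executions| executions**4 + 2 }
p delays  # => [3, 18, 83, 258, 627]
```

So by the fifth attempt a retry is already scheduled roughly ten minutes out, which is why in-flight retries had to be migrated along with the explicitly scheduled jobs.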
Robby:What did that migration process look like then? Were you... pulling stuff out and just sticking it in the new...
Rosa:Yeah, pretty much. It's very simple. It's just a little bit strange, because Resque has a very weird API for interacting with it. It doesn't feel super intuitive to use, but it's simple; you just need to look at how it's done. At that point, I think we had about 30 million jobs or something like that. Well, fewer in Resque, because we also had our own homemade system to schedule jobs in the future that was using the database. In that case, it was just a matter of copying between two different tables. But for Resque, we also had a bunch there. So it was just fetching jobs and inserting them in bulk into Solid Queue, basically.
Robby:So the process was just extracting it, moving it over, and then were you just deleting it in Resque? Yeah. Okay. And did you have to manipulate much of what you were getting out of Resque before you could stick it into...?
Rosa:We used ActiveJob for that. But yeah, you need to, because Resque uses its own serialization. You can just use the jobs the same way Resque enqueues them, you know, with the Resque adapter in Rails; you can just use that.
Robby:Interesting.
Rosa:I don't know if it's public API or not, but it didn't matter because it was this one time.
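For listeners curious what such a one-off script could look like, here is a rough, hypothetical sketch (the real scripts live in the apps' Git history and haven't been published); it leans on Resque internals and ActiveJob serialization, so treat it as throwaway migration code, not public API:

```ruby
# Drain a Resque queue and re-enqueue each job through ActiveJob, so the
# currently configured adapter (Solid Queue) picks it up.
queue = "some_queue"  # hypothetical queue name

while (payload = Resque.pop(queue))
  # For jobs enqueued via ActiveJob's Resque adapter, the payload wraps
  # the serialized ActiveJob data inside its "args" array.
  serialized_job = payload["args"].first
  ActiveJob::Base.deserialize(serialized_job).enqueue
end
```

Scheduled-in-the-future jobs live in resque-scheduler's delayed schedule rather than a plain queue, so they need a similar pass over those internals; and at the volumes Rosa mentions, bulk insertion (for example with `ActiveJob.perform_all_later` on Rails 7.1+) is worth considering over one-at-a-time enqueues.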
Robby:Did anyone on your team ever share any of what that might have looked like for people? Because if people are listening, they're like, oh, I might be interested in migrating, but we're going to have to figure out how to reverse engineer these two different things and...
Rosa:Right, right. Yeah, good question. I don't think we've ever done it, but I would totally go and find those scripts, because they are on GitHub. We had them in Git for sure. I probably deleted them already, but they are in the history, so I can totally find them.
Robby:I think that would be interesting for people to get a chance to look at, because that ends up being... I mean, in your situation, you were building Solid Queue and then moving things over. But sometimes it's that middle little layer, that temporary thing you need to write to move things, where people are like, all right, that's the work I need to do. I don't have to build Solid Queue, but I might be interested in moving, so how big of a lift is that going to be? And if there was some code out there, or at least some patterns people could follow, I think that would demystify it. And maybe a lot more people would get to migrate sooner, if they're like, oh, that's actually not so complicated.
Rosa:Yeah, yeah, for sure.
Robby:That's good to hear.
Rosa:Yeah, yeah, totally.
Robby:You know, and so you were able to migrate, move them over, and then turn Resque off, and you didn't necessarily have to keep them running?
Rosa:Yeah, that's right. So for the scheduled jobs in the future, we did that. For anything else, we just waited.
Robby:Just for scheduling, okay.
Rosa:Because those were just... yeah, the jobs that are ready to be run are usually very fast. So as we were moving specific jobs, you know, I mean the Active Job, changing the adapter, deploying, and making sure that they were running in Solid Queue only, once we saw that we had moved all the jobs running in a specific queue, we would just remove that queue from the Resque configuration, and that's it.
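The gradual cutover Rosa describes leans on the fact that Active Job lets each job class override the app-wide queue adapter (in real Rails it's `self.queue_adapter = :solid_queue` inside the job class). A stdlib-only sketch of the idea, with the adapters simulated as plain objects:

```ruby
# Fake adapters standing in for Resque and Solid Queue backends.
class FakeAdapter
  attr_reader :enqueued
  def initialize
    @enqueued = []
  end
  def enqueue(job)
    @enqueued << job
  end
end

RESQUE_ADAPTER      = FakeAdapter.new
SOLID_QUEUE_ADAPTER = FakeAdapter.new

class ApplicationJob
  class << self
    attr_writer :queue_adapter
    def queue_adapter
      # Fall back to the superclass (app-wide) adapter unless overridden.
      @queue_adapter ||
        (superclass.respond_to?(:queue_adapter) ? superclass.queue_adapter : nil)
    end
    def perform_later(*args)
      queue_adapter.enqueue([name, args])
    end
  end
  self.queue_adapter = RESQUE_ADAPTER   # the app default stays on Resque
end

class LegacyJob < ApplicationJob; end   # untouched: still enqueues to Resque

class WelcomeEmailJob < ApplicationJob
  self.queue_adapter = SOLID_QUEUE_ADAPTER  # this one class is cut over
end

LegacyJob.perform_later(1)
WelcomeEmailJob.perform_later(2)
```

Once every job class in a queue has flipped, that queue can be dropped from the old backend's configuration, which is exactly the queue-by-queue removal described above.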
Robby:Awesome. You know, one of the things I was looking at, I was looking at the source code and the readme on GitHub for Solid Queue, and I'll include links to this in the show notes for everybody as well, but one of the things that caught my attention is how job state is so transparent in Solid Queue: waiting, scheduled, running, failed, all right there in your database. So you mentioned Mission Control maybe takes advantage of that. How has that changed the way your team thinks about debugging or monitoring jobs now?
Rosa:It's been so much easier. Honestly, it's been really, really easy. We had a case, actually, when we were about 60% or so, or 50%, I don't remember, even less, of jobs moved. And we had an issue where we were seeing lots and lots of jobs being enqueued of a specific type. Lots. And we didn't know what was going on. With the Solid Queue UI, it was so much easier to inspect the data. And once we figured out what was happening, it was a really crazy thing with an automatic scanner we were running; we were basically DDoSing ourselves. But anyway, once we found that, cleaning it up was so much simpler than with Resque, because Resque has a very weird API. For example, in Resque it's very hard to fetch the jobs of a specific class in a queue, because they are stored in a Redis list. In a database, you just filter by a column value or whatever. It's so much simpler.
Robby:Yeah, that does seem a lot simpler.
Rosa:Yeah, it was super simple.
Robby:So there are a few questions I think our listeners would be curious about that aren't necessarily specific to Solid Queue. And you kind of touched on this a little bit already, but I want to talk a little bit more about how your team currently approaches certain job patterns. How do you think about the trade-offs between enqueuing batches of jobs versus, say, triggering one at a time?
Rosa:Yeah, that's a good question. I'm not quite sure we have an official answer to that. I think we usually think of different possibilities. For example, if you are going to do... You mean a batch of actions in the same job? Or enqueuing a whole batch at once, or enqueuing the jobs in that batch one by one? Just making sure I understood your question.
Robby:Yeah, no, that's a good follow-up there. From my experience, I've seen a lot of teams either go the route of, all right, we're going to have something that's going to queue up a thousand jobs individually, and then maybe some of those will queue up some more. Or we're going to have a job that's going to process a thousand things as one job. And then there's kind of this chain effect, like, well, then maybe that spins off another job, and it keeps cascading a little bit. And so...
Rosa:Okay. Okay. Got it. Yeah. So I think what we would do would be to try to keep jobs the smallest you can, in a way that it makes sense for the action to happen independently from the other actions. Say, for example, those thousand things you need to do, say they are completely independent, they aren't related. Then in that case we would go for different jobs, because it'll be faster and it'll also be much easier to retry. If, say, that one huge job fails, you will need to ensure that it's reentrant, that you don't process the same things over, or that if you do, it's okay, and all those things. So separate jobs are much easier. The problem is if the specific things depend on one another, like you have a bunch of things you need to do, and one needs to happen only if the previous one happened, and so on. I know there are different opinions on that. Personally, I don't like to orchestrate these kinds of things using the jobs to do the orchestration. I don't like to have jobs that depend on other jobs to run. I know some people like this; actually, Solid Queue doesn't have this feature yet. People want it to have that sort of flow, or batches, you know, where you need to run all these operations and then that triggers another job and all that. We don't have that yet. The main reason we don't have it is because we don't use it, so we didn't build it. The truth is that I don't like that very much, because it becomes very hard to debug. I find it very hard to troubleshoot when you use your job system as the orchestrator. I find it simpler if I do that in the code: I use a single job, and then I make sure I can retry that job and resume that job if it fails or needs to be resumed. And actually, we are going to do a small sneak peek: we are going to open source a very simple solution, very, very nice, that we put together in my team.
It was Donal, you know, who worked on Solid Cache. He worked on this. A very simple solution to easily interrupt and resume jobs. That would be for these kinds of jobs that do a lot and can be interrupted, and you need to be sure that you can resume them. So you don't need to use independent jobs that need to run in a specific sequence, depending on other jobs. This is going to be out, I don't know, soon, I think. I hope.
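The fan-out-into-small-jobs pattern Rosa prefers can be sketched in plain Ruby, with the queue and worker simulated (job and method names here are made up). The key properties are that the fan-out step only enqueues, and each unit of work is reentrant so a retry or replay is harmless:

```ruby
# Stand-in queue; in Rails this would be perform_later on a small job class.
QUEUE = []

# Fan-out step: one cheap loop that only enqueues and does no real work.
def enqueue_report_job(customer_id)
  QUEUE << { job: "GenerateReportJob", customer_id: customer_id }
end

[1, 2, 3].each { |id| enqueue_report_job(id) }

# Worker for one unit of work: the guard makes it reentrant, so retrying
# or replaying a payload never does the same work twice.
DONE = []
def perform(payload)
  return if DONE.include?(payload[:customer_id])  # already processed: skip
  DONE << payload[:customer_id]                   # "generate the report" here
end

QUEUE.each { |payload| perform(payload) }
QUEUE.each { |payload| perform(payload) }  # replay everything: no double work
```

With one giant job instead, that reentrancy guard would have to cover every partial failure point inside the loop, which is exactly the difficulty Rosa calls out.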
Robby:That's interesting. I'm always thinking about how teams are trying to approach that with their different things. So maybe we can talk about a specific reason why you would have some sort of scheduled or recurring job, things that maybe many years ago we might have used cron for. Every morning we need to generate a bunch of reports for every one of our customers, so that when they log into their interface, they can see some updated data. Relatively up-to-date data, but maybe not live data, because it's too intensive to do in real time. So in that scenario, if it's a daily thing, you're going to run the process for all of your customers and sequentially go through each one. How would you think about approaching that today?
Rosa:For that I would totally use a recurring job, a cron job, that enqueues independent jobs for each customer. I would totally do that. And we had another very fun incident with jobs. In Hey, we have a recycling feature. You can set a specific contact, like a specific sender, or a specific box to be recycled. That means that after 30 days, everything will be deleted. The way we had implemented that was that we would create a recycler for a specific contact, and then the recycler would be scheduled, it would run, and then it would schedule its next run. What happened was that some recyclers, for some reason we weren't sure of, because it was long before and we didn't have any logs anymore, maybe there was an issue with Redis, whatever, failed to enqueue their next run. What happens there is that you just lose the recycling completely. They stopped, and you don't know. And this can never happen if you have a daily recurring job, because if something goes wrong one day, you know that the following day it's going to run again, and whatever happened is going to be remediated in a way. So for the case of the reports, maybe you want to make sure that you get alerted if the reports are not delivered, and for that I would probably track that information somewhere, a delivered-at or something like that. But I would go for the simpler option. As I say, it's just hard to account for all the things that can go wrong if you are assuming enqueuing jobs is always going to be fine. You need to assume that enqueuing jobs may fail, like anything may fail. So I think you need to account for that. It may not matter for some jobs.
Like maybe, I don't know, some Turbo Stream jobs, for example; if you use Turbo, you get those. Some of them may not matter, because it's just updating the UI, and it doesn't have any big consequences for the data. No data is being lost, no action is being interrupted, it's just a UI blip. You know, if you reload the page, it's going to be there, whatever. But there are cases where you do need to think about what happens if your jobs are not enqueued.
Robby:So a couple of things to extract out of that. You're not a proponent of having jobs schedule the next time that they should run? No. So what do you think in that scenario should be triggering that? What is kind of the current, modern-day approach to that? Does your team still use cron-type stuff anymore?
Rosa:Yeah, we do. And actually, for the stuff that we don't, like the incinerations, I really want to move them to cron, because just last week or two weeks ago, we had an edge case in Basecamp. When you deleted an event from the calendar that corresponded to a recurring event, and it wasn't the first event in the series or the last event in the series, so it was a middle occurrence, you deleted that, it went to the trash, and it would just stay there forever, because we weren't correctly enqueuing. The logic was so complicated that we weren't enqueuing the incineration job for that case. And that just doesn't happen if you have some record in your database that says this needs to be deleted and is due whenever, and then you have a cron job that checks everything that is due and does the work. If you miss it, the next time it will be available. I find it so much simpler than relying on enqueuing those jobs in the future and all that. So yes, we do use those, and I really want to move those incinerations to this model, because it's much simpler.
Robby:And it's always been there.
Rosa:Yeah, exactly. It's just a simpler thing. The truth is that sometimes I love simpler solutions that have always worked fine.
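The model Rosa is describing, a due marker on the record plus a recurring sweep, can be sketched in plain Ruby (the `incinerate_at` column name is made up; in a real app the sweep would be a recurring job running something like `Record.where("incinerate_at <= ?", Time.current)`):

```ruby
# Each trashed record carries the moment it becomes due for incineration.
# A recurring sweep deletes whatever is due *now*, so a run that is
# missed one day is simply picked up by the next one.
Record = Struct.new(:id, :incinerate_at)

def sweep(records, now)
  records.reject do |r|
    due = !r.incinerate_at.nil? && r.incinerate_at <= now
    # when due is true, the real incineration (destroy, purge files) runs here
    due
  end
end

NOW = Time.utc(2024, 1, 31)
RECORDS = [
  Record.new(1, Time.utc(2024, 1, 1)),   # overdue: gets incinerated
  Record.new(2, Time.utc(2024, 2, 15)),  # not due yet: kept
  Record.new(3, nil)                     # restored from trash: never touched
]
REMAINING = sweep(RECORDS, NOW)
```

Restoring a record just clears its due marker; there is no stale future-dated job to guard against, which is the whole appeal of the approach.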
Robby:It's such a good point there. One of the things I've appreciated the last few years within the Rails community is that we're getting back in touch with things we were doing 20 years ago. There are some types of technology, cron as an example, that have been around forever, and it's such a reliable thing. But at the same time, it doesn't feel modern. So why do you think engineers build these new approaches? It's like, well, let's not trust that the computer is going to be good about remembering what time it is or something, you know? Which we hope it does. But I think people might be listening and thinking, when you run things by cron, it feels like you have a little less visibility into what's happening. So maybe there's a trade-off there.
Rosa:Yeah. Yeah, totally. So for that, we actually have cron jobs built into Solid Queue. I say we use cron jobs, we call them cron jobs, but they don't run with Unix cron. That was actually a thing we had to do because of Kamal. Resque has a plugin called Resque Scheduler that allows you to schedule jobs in the future, because Resque by itself doesn't have this. But Resque Scheduler also has a cron-like recurring job thing where you can define jobs that run recurrently. And with Kamal, it wasn't that easy to run cron in the Docker container, so we had to move. We still have real cron jobs using Unix cron, but we had to move some of them to Resque Scheduler. So then, when building Solid Queue, we needed a replacement for that. Solid Queue actually has recurring jobs, which are like cron, it's just that they are defined in a YAML file and run automatically by Solid Queue. So, yes.
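For reference, Solid Queue's recurring tasks are declared in `config/recurring.yml`; the task names and classes below are made up, but the shape follows the gem's documented format, where each task gets a `class` (or a `command`) plus a `schedule`:

```yaml
# config/recurring.yml -- task names and classes here are illustrative
production:
  daily_reports:
    class: GenerateDailyReportsJob
    queue: background
    schedule: every day at 6am
  periodic_cleanup:
    command: "TrashedItem.due.incinerate_all"
    schedule: every hour
```

The scheduler process reads this file and enqueues each task on its schedule, which is what replaces the Unix cron entries Rosa mentions.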
Robby:Okay, so that's helpful. Thanks for kind of explaining that.
Rosa:Yeah, yeah. It doesn't need to be Unix, just the general model of having a recurring job, rather than something that needs to schedule itself, or something that you need to schedule in the future and trust. It's also especially useful when there is something that can change state. Trashing things is an easy example. You trash an item in Basecamp, so it goes to the trash, and you enqueue a job 30 days from now, for example, to delete it. And then you restore it because you changed your mind. So that job you enqueued is not valid anymore, but it's there. And then you trash it again, and so you enqueue another job. Now you have two jobs enqueued, and one will do nothing. It just gets wasteful in a way. And of course, you need to account in the jobs for doing nothing if the actual thing is not trashed anymore, because otherwise you would be losing data. All those things get so much simpler if you just have a process that checks everything that's due for deletion and deletes it in that moment.
Robby:Right. And just to dig into that, I'm going to ask a couple of maybe dumb questions here, just to make sure I understand the difference for our audience. So you might have a job, let's say you're deleting something. You create a job, schedule it for 30 days out or whatever. The plan is that in 30 days, we're going to delete all of this stuff for real. Right now it's like a soft delete, and we're going to permanently remove it later. And one approach could be that in that job, when it starts to run, it checks: is this still something you should be deleting? Is it still in a state where that should happen? Right, right. Maybe another curiosity here is, what happens when the business logic changes between now and 30 days from now?
Rosa:Yes. Exactly. We have that. We've had that problem. What happens there is that you may get a lot of errors for some reason. Imagine you change the 30 days, right? We have that case. The logic is as simple as you say: basically, in a deletion job, we check if the record is still trashed, and then we delete it. And if it's not, we just do nothing, but the job runs. But then, say you change the date. Because it's also not enough to check if the object is still deleted; you need to check if it's been deleted for enough time, right? Say you delete something, you restore it, and then one week later you trash it again. Then you will have two jobs, one running in three weeks from the second deletion, and one running in four weeks or 30 days or whatever. So when the job in 30 days comes, it shouldn't do anything: even though the record is in the trash, you don't want to delete it, because it hasn't been 30 days. It's been 30 days since the first time, but not the second. So you need to check that. What we check is basically: is it due for deletion? If it's not due, we don't do anything, or in some other cases we may raise an error, it depends. And if you raise an error and you change the period, it happens to us: records that were actually due for deletion are not anymore when that date comes, because you've changed the period. So then you will have all those jobs erroring, or if they are not erroring, they may just do nothing, and then the records will be there forever, because you need to make sure that when you change the logic, you schedule new jobs for everything that already had its jobs scheduled. You know what I mean? Yeah. It's so much more complicated.
Robby:Yeah.
Rosa:Whereas with the recurring approach, you don't need to worry at all. You just change the logic and track the date when each record was deleted, just the last time, of course; you only care about the last time a record was deleted. So when the recurring job comes around, you just check it, and if it's not due, you don't need to do anything, but it will also come again in the future. And if you make the period shorter, everything that wasn't due before may be due the next day, but that will be fine. Everything gets the new period applied immediately, if that makes sense.
Robby:Yeah, yeah. And in that scenario, you're maybe flagging a column or something that says this needs to be deleted at some time.
Rosa:Yeah, yeah, exactly.
Robby:And then, so on that date... Right, right.
Rosa:Yeah, that would be... I mean, we still do the scheduling in the future; we haven't migrated that. You know, saying all those things out loud makes me want to just finish this podcast and go and migrate it, because it's so much simpler.
Robby:When you think about retries in particular, because jobs can fail fairly unpredictably at times. And you mentioned something about how you could interrupt and resume, like there might be something coming down the road. Can you give me an example of when you would want to do that?
Rosa:Yeah, so for example, one good reason for that is the way Kamal works. When you deploy, you have a grace period for the running containers to stop. They stop, and then the new ones start. And with Kamal, you are not allowed to have two versions of the code running simultaneously, except for that grace period when one is stopping and one is starting. So that means you may have jobs that are very long. For example, in our case, some of our longest jobs are exports. People can export their data in Hey and in Basecamp, and depending on how much data you have, it may take a while, like hours. If you have a very big account with uploaded files and you are exporting everything, it can take hours. So that's a good example: if we deploy and interrupt that, it would be really, really annoying to have to start again from the beginning. And imagine if it's a weekday and we are working and deploying often; you could have the case of an export never finishing, because we keep interrupting it and it has to start over. So that would be a good case. Ideally, when you are interrupted, you should be keeping your state, and then when that job starts again, it can just resume. It can be an export, but it can be any other kind of long-running job.
Robby:You know, that's another good example, where, I'm assuming with an export, you're not just adding a bunch of new data into the database. You're maybe literally exporting files somewhere, to some server or something. And if that process were to get interrupted or stop halfway in the middle, it has to restart. If the job fails, how do you think about patterns for cleaning up what had already been generated? Or do you take the approach of, if these files already exist, we'll just keep using them? What does that look like?
Rosa:Yeah, yeah. In the case of exports, it's actually very simple. When the export starts, we store a sort of file, a token thing, where we store some information about the export itself. We keep exports as records in the database so we can track the status: we know if this is pending, if this is ready to resume, or if it was interrupted. So we track that there, and then we store a little file in place. We have a separate volume that we mount in the Docker container where the export jobs run. We create a folder for each export and store a little file there, so when the job starts again, it just checks if it's there. Actually, we had a way to force this: we use a feature of Active Job where you can use different queue names, and we have a little hack. We should publish this, because it's very clever; I think it was James or Jeremy on my team who had this idea. We use the queue name to make sure that all export jobs in a specific queue always run on a specific server, a specific VM, so we know the volume is there. It's not like the next time the job runs, it's going to be picked up by a different server where, of course, the files won't be there; we ensure it goes to the same one. But if you have just one server, you don't need to worry about this. And so we store that little file, and when the job starts, it checks that it's there, and we store the progress there, so we just see where we were. There are other techniques; you could also track it in the database, perhaps, at certain points: I've done this, I've done this. So it doesn't need to be a file, but it's what we do.
Robby:Thanks for kind of diving into that. I think that's helpful to get some context for some of the lower level details that people might be trying to figure out. How would we approach that? You know, tell us a little bit more about how you think about testing background jobs.
Rosa:Yeah, so usually what we do for jobs, and this is something that I wasn't doing before joining 37signals, something I totally learned there, is that at 37signals they advocate for having almost no logic at all in the job. If you look at our jobs themselves, they are tiny. They are usually one line, the perform method, so the job logic is one thing in most cases, or two things, three things. And that logic, we always put in the model, so we test the model for the job logic itself. Then, of course, there's the matter of testing situations where the job may go wrong, when you get errors, what happens with the retries. For that we usually use the testing helpers that Active Job has. Maybe not everyone is familiar with those, but Active Job includes some really, really handy helpers that allow you to check if a job was enqueued with certain arguments, at a certain time; in case it's a retry, you can check that it was enqueued with the delay that you expect. So that's super useful. Actually, for us, it's been enough. But it's true that sometimes we've added tests for that after we had a problem, like a job that wasn't being retried as it should. We discovered it, we fixed it, and then we added the test. And lately, I don't know if you've seen this, but Stephen Margheim, I'm not sure if I'm pronouncing his surname correctly, he's the SQLite guy, published a gem. He has a gem called Acidic Job, and he has another testing gem that is called Chaotic Job, I think. It has a lot of really, really cool methods for testing jobs and simulating a lot of scenarios, like running this job and introducing random errors. Chaotic Job is this sort of Chaos Monkey that is supposed to go into your infrastructure and your system and just cause random problems. So he did this for jobs, and it's super, super neat.
I saw a talk from him explaining how it works and it's really, really neat. We are not using that, but I think that has a lot of potential for us, for everyone.
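In Rails, the helpers Rosa means come from `ActiveJob::TestHelper` (for example `assert_enqueued_with(job:, args:, at:)`). A plain-Ruby version of the same idea, with the enqueue recorder and job names invented for the demo: record every enqueue, then assert that a retry was scheduled with exactly the delay you expect.

```ruby
# Record of everything "enqueued" during the test.
ENQUEUED = []

def enqueue(job_class, args, at: nil)
  ENQUEUED << { job: job_class, args: args, at: at }
end

# Job under test (made-up name): on a simulated rate-limit error,
# it schedules its own retry five minutes later.
def perform_sync(now:)
  raise "rate limited"
rescue RuntimeError
  enqueue("SyncJob", [123], at: now + 5 * 60)
end

T0 = Time.utc(2024, 6, 1, 12, 0, 0)
perform_sync(now: T0)
# ENQUEUED.last now records the job name, arguments, and scheduled time,
# which is exactly what assert_enqueued_with lets you assert on in Rails.
```

Checking the `at:` timestamp is what catches the "wasn't being retried as it should" class of bug Rosa mentions.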
Robby:I think I'll definitely include links for Stephen's gems in the show notes for everybody as well, so we can look at that. What was the first one you mentioned, Chaotic Job?
Rosa:One is Acidic Job for sure. And the other one, I think it's Chaotic Job, but I'm not completely sure because I'm speaking from memory. It should be super easy to find.
Robby:Great. I'll definitely track those down for everybody. You know, I'm curious about another thing. You mentioned that you might have already removed the code, but for that migration, moving jobs from Resque into Solid Queue, where did you stick the logic for that? Was it more than a rake task that you just ran on the server?
Rosa:In that case, it was just that, because it wasn't very complicated. Just a rake task I ran on the server. Well, first I did some tests manually, just in the console, checking that I wasn't breaking anything, and always without deleting anything. The first time I did it, I didn't delete the jobs; I deleted them only when I was sure that they had been correctly moved. But that case was just a simple task. In other cases, where it's been a little more complicated, I may have created a simple Ruby object, a class, and sometimes I add tests for that if the logic gets complicated. In that particular case, I didn't, because it was just: copy this here, that's it. But for some more complicated migrations we've done, we do that. We just create a regular class, we add tests, and we run a rake task from the server using that code, which we will have deployed beforehand.
Robby:Okay. So it sounds like something I've talked about with teams, because I work on a lot of projects that have been around for a while, and we'll come into them trying to make sense of what code is still worth keeping or what still needs to be around. Does your team have a healthy approach for removing code? You mentioned that migration code has since been deleted from the project. What does that process look like? Or is it just kind of informal, like, we don't need that anymore, let's just go ahead and remove it?
Rosa:Yeah, it's sort of that. We don't have a real process; sometimes I think we keep some stuff that could be removed, and it's still there. In this case, it was some sort of culmination of moving from Resque to Solid Queue: deleting everything Resque-related, everything deleted. And it was so satisfying. But in other cases, I mean, I'm sure we have some migration scripts that don't make any sense anymore, and they are just there.
Robby:It's such an interesting thing. Like you mentioned, that code's still in Git technically, and people are like, well, if we delete it, then someone could track it down. But I'm like, what about the person that joins five years from now? They're never going to find that, and maybe that's a great thing, you know? I think sometimes teams or developers are a little precious about keeping their code around. But we can figure this out again one day if we ever need to re-reference it. So I think about how teams could be a little more proactive about it: let's delete this stuff, nobody needs to see it. It's there, and it's getting counted every time we get stats on how big our application is, or our test coverage. Let's just delete this stuff, come on.
Rosa:It's okay. Yeah, I love deleting things. I'm very quick to delete, but I'm also good at writing things down. So usually, for example, in that case, I wrote that somewhere in the to-do to delete all the last stuff. I probably linked it so that you can search for it in the future if you need to know, and it will have a link to the pull request, so you can see how it looked before being deleted. So usually I'm one for delete, delete, delete.
Robby:Do you hear that, everybody? Rosa says to delete some code. Go do it today. Do you delete other people's code very often, though? Or do you think in that scenario it was because you had written it, so it was a lot easier?
Rosa:Yeah, it's true. I mean, it depends. But I think I would delete everyone's code. Now, at this point, I've been at the company for eight years, so it's not the same. If you are new, maybe I would be more shy about deleting things. But in that sense, I don't think anyone at my company is very precious about that, so that's also a factor. If I was at a different company where people were more inclined to keep things, I would just keep things. I wouldn't be trying to impose my view and deleting everyone's code. It would depend.
Robby:I'm going to keep advocating that we should probably delete our code. Yes, me too. I think it would be just better for everybody. Okay. You know, for teams listening and thinking about maybe moving from Resque or Sidekiq to Solid Queue, what would you want them to think about ahead of time?
Rosa:I think the main thing perhaps will be to get volumes, to get some numbers. Like how many jobs are you enqueuing at peak? What types? Like how long can they be delayed? You know, all your numbers, your restrictions and all that. And then just do some calculations with the hardware they have and the database they have just to make sure that they are not trying to move to a database that is too small, maybe, and then running into trouble. That's what I would do, I suppose.
Robby:Okay. And was there anything you maybe would think about doing differently if you had to do that migration again?
Rosa:Yes. One thing for sure, the main one, would be to start from the beginning with a separate database. Because we started with the same database as the application, and we discovered later on that we couldn't do that, because for a specific type of job we had to enqueue them using concurrency controls. That is a feature of Solid Queue to ensure that some jobs don't overlap. I'm not a fan of this feature. It is my most hated feature of Solid Queue, the trickiest one of them. I'm not super happy with how it works, but it was the best I could come up with that didn't hurt the polling, the regular operation. So I'm not a fan. It has a lot of overhead; it needs a lot of writes in the database. For this kind of feature, Redis is great. You have the TTL, the kind of lock that you can implement in Redis to make sure that it only sets a key if it's not already set. Implementing locks like that is so nice, and the expiration as well; you get it for free. For this, the database is not great. But we needed the feature, so I had to do it. Anyway, it is what it is. But the thing is, we had a lot of those jobs and we had to move them, and we saw the amount of writes we would have to do in our app database. So one person from the SRE team and I got together and made some calculations, and we thought it was kind of risky to suddenly put all that write load on the main database. That was when we said, okay, we need to move to another database. And that was just much more complicated. That was way harder than having Resque and Solid Queue together; it was much worse to have Solid Queue in one database and then in another database. So I would definitely start with a separate database from the beginning and save myself from that.
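In Solid Queue itself this is the `limits_concurrency` declaration on a job class, implemented with database rows, which is where the write overhead Rosa mentions comes from. The Redis idea she is comparing it to is the classic `SET key NX EX ttl` lock: acquire only if the key is absent, with automatic expiry. A sketch of that lock, simulated with a Hash instead of Redis:

```ruby
class TtlLock
  def initialize
    @locks = {}  # key => expiry time
  end

  # Returns true if the lock was acquired (key unset, or its TTL expired).
  # This mirrors Redis SET with NX (only if absent) and EX (expiry).
  def acquire(key, ttl:, now: Time.now)
    return false if @locks[key] && @locks[key] > now
    @locks[key] = now + ttl
    true
  end
end

lock = TtlLock.new
T0 = Time.utc(2024, 1, 1)
FIRST  = lock.acquire("mailbox:42", ttl: 60, now: T0)       # acquired
SECOND = lock.acquire("mailbox:42", ttl: 60, now: T0 + 30)  # still held: blocked
THIRD  = lock.acquire("mailbox:42", ttl: 60, now: T0 + 90)  # expired: re-acquired
```

In Redis, the store handles expiry for you; in a relational database, every acquire and release is a write plus cleanup of stale rows, which is the trade-off Rosa is describing.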
Robby:What are you most proud of from this migration?
Rosa:That we had basically almost no customer impact. I say almost because there was customer impact once, but it wasn't because of Solid Queue. It was a mistake I made with Kamal, not related to Solid Queue. With Kamal, you needed to run some mb5 thing that I forgot to run, and so I caused a small outage. It wasn't a full outage; it was just that some actions would get a 500 error. But that was the single customer-facing issue we had throughout the whole migration, so I'm proud of that. We had other issues that weren't customer-facing, but still, customer-facing is the most important thing.
Robby:Sure, that's wonderful. All right, before we wrap up, I want to ask you a couple of last questions, Rosa. Do you have a software book that you find yourself recommending to peers?
Rosa:Yeah, I think so. I've got one here that I grabbed from my shelf. It's a bit old, but it's Refactoring by Martin Fowler. This book.
Robby:Yes. I have it right there.
Rosa:Nice. So DHH is a big fan of this book, and when it was released a few years ago, 37signals bought it for all the programmers in the company. That was when I got it, and I think it's really nice. The other one is one I got recently. It's this one, from Gergely Orosz. He's a very famous guy, the Pragmatic Engineer, and it's a pretty nice book about software engineering in general: The Software Engineer's Guidebook. I think it's pretty nice.
Robby:Nice. I'm trying to remember, is that the version of Refactoring where it's all in Java?
Rosa:I think it's using JavaScript.
Robby:JavaScript, okay.
Rosa:Oh, yeah, yeah, yeah, JavaScript. I was remembering correctly.
Robby:Okay, yeah. The original one was all in Java. I got familiar with that book right when I started using Ruby on Rails, and that was like 2004, 2005. It was given to me as a gift from someone, and it definitely got me really excited about wanting to improve existing code. I was like, wow, okay, this is interesting. Do you find yourself referencing it quite a bit?
Rosa:Sometimes I think about it, but not really when refactoring. That's the funny thing. Sometimes when writing new code, I think, okay, I could do this in this way, use, for example, a different method here or whatever. I'm thinking that this is how it would look if I had refactored it, not written it from scratch: refactoring it from some fictional other code that never existed, applying those refactorings.
Robby:You know, out of curiosity, do you do much pseudocoding? Like, I wish I could express this concept like this, and then figure out how to make it happen. From a style or taste perspective, Ruby allows us to do a lot to be expressive with our programming, and I know that seems to be really important to David. How do you think about that?
Rosa:I don't find myself writing pseudocode very often. I only do it at a higher level. When I need to do a task that is complex, I write down what needs to happen, but I wouldn't call that pseudocode, because it's higher level, I think.
Robby:I didn't mean to imply that David writes pseudocode; I actually don't know if he does. But I'm thinking more of, say, you're working on Solid Queue and you're asking, how am I going to interface with this?
Rosa:Ah, I see what you mean.
Robby:Like, how might I lay this out? What would be a nice way to interact with something, without worrying about the behind-the-scenes yet? This is how I want to interface with this type of entity.
Rosa:Yeah, I see what you mean. That's a good question. I think I just imagine in my head how the API will look and then try to write it a little bit, but directly using Ruby. I'll just use Ruby, but I'll try to write that part first, how the interaction will be, trying to imagine it as if I were the person using it rather than the one writing it. I think I do that, but I also realize I don't do it consciously. I hadn't thought consciously about it until now.
Robby:Just kind of happens. All right. Well, where's the best place for folks to follow your work or dig into your writing online?
Rosa:I don't do a lot of writing online, I'm sorry. I think the best place for all this kind of stuff would be the development blog at 37signals. That's where I've been publishing everything to do with Solid Queue, but it's been very little. This is one of the things I was hoping to get better at this year, but I don't think it's happening this year. I think it will be a 2026 resolution.
Robby:Okay, and I know that you've given a number of talks as well. Are those included on your own website?
Rosa:Not yet, but they are on the wonderful Ruby video site. It's great. It's been renamed to Ruby Events.
Robby:Oh, yeah, yeah, yeah.
Rosa:And all the talks are there, and it's super nice because it saves me from having to list them on my own website. I can just send people there. It's really cool. It's really nice.
Robby:That's great. Thanks for that. I will include links to those as well for all of our listeners. Rosa, thank you so much for joining us on On Rails.
Rosa:Oh, no, thanks to you. Thanks to you.
Robby:It was really great to hear how your team approached the move to Solid Queue and all the thought that went into making it work in production. I really appreciate you sharing all those details and diving into the deep end with us. That's it for this episode of On Rails. This podcast is produced by the Rails Foundation with support from its core and contributing members. If you enjoyed the ride, leave a quick review on Apple Podcasts, Spotify, or YouTube. It helps more folks find the show. Again, I'm Robby Russell. Thanks for riding along. See you next time.