[v3] Migrations backwards compatibility #1189

ewjoachim · 2024-09-08T09:56:45Z

According to our own doc, migrations named with version X are the migrations you can apply once you run on version X.
According to our doc, people on v2.13 wishing to upgrade without disruption would:

Change the code to v3.0
Apply the v3.0 migrations
Change the code to v3.1
Apply the v3.1 migrations
All the while their instance is running and processing jobs.

Compatibility matrix:

Code ➡️ ⬇️ Schema	v2.13	v3.0	v3.1
v2.13	✅	✅	❌
v3.0	❌	✅	✅
v3.1	❌	❌	✅

❌ means we don't have to make it compatible, but if it happens to be, it's not a problem, of course

We need to decide what we put in v3.0 and v3.1

Note: the change of status = aborting -> abort_requested might need a couple of trigger just for the time of the dual compatibility (adding the `abort_requested column and update its value based on the status, then add a trigger that synchronizes the 2 columns, and in the next release, delete the triggers and the column)

The text was updated successfully, but these errors were encountered:

medihack · 2024-09-25T21:22:34Z

I must confess that I get totally confused by naming the migrations now. In the contributing docs we say "xx.yy.zz is the number of the latest released version of Procrastinate" and in the migrations docs "you want to update to Procrastinate 1.15.0 ... migration scripts whose versions are greater than or equal to 1.9.0, and lower than 1.15.0". But then in the above table and #1186 you name the migration 03.00.00_01_cancel_notification.sql which won't be lower when we include this migration in a 3.0 release.
Unfortunately, I also don't fully understand the "Note". Have I missed something in my migration?

ewjoachim · 2024-09-26T09:38:08Z

It is confusing :(
When we wrote procrastinate, we were in an environment where we couldn't stop the service while applying migrations, so we had to ensure we created compatible migrations, that there would always be a path where you upgrade code, then db, then code, then db etc, without service interruption. As of today this is still an official commitment procrastinate does. We can revisit this but I believe it's worth trying to keep it.

For v3, you're right that the move I make is probably wrong but the migration was named for 2.9.2 which is also wrong. The reality is that:

Either when we're ready to make v3, we make one last v2 that adds stuff, and then in the v3 migrations we remove stuff
Or we'll do a v3.0 that does almost no change in the code (so almost identical to v2.x) but introduces DB changes and a v3.1 that will introduce everything we've added to the v3 branch. I think that's the preferred solution (even if it means the v3.0 will actually not be the v3, but if we release both at the same time, I guess no one will use the v3.0, everyone will jump straight to the v3.1)

We should definitely run the tests with the DB on the previous schema to enforce this, otherwise it's probably prone to wishful thinking.
I think it would also be interesting to start versioning our procedures, so anytime we change them, we can control easily what version of procrastinate uses what version of the procedures while still having multiple ones in the code. That might be much less 🤯 for us. Though, we can't do the same with all objects (tables, enums, triggers...), of course.

I sometimes wonder if it's worth using procedures at all, compared to doing everything in the queries, but that's a whole other debate.

medihack · 2024-09-27T09:36:45Z

Sorry, I am still a bit confused 😞. So with #1186 we will do it the other way around than

there would always be a path where you upgrade code, then db, then code, then db

because

we'll do a v3.0 that does almost no change in the code (so almost identical to v2.x) but introduces DB changes

If the migration will be part of v3.0, wouldn't it be more congruent to check the latest v2.x version (straight before the release) and name it that way?

But maybe I take care of those other open issues for v3 and just watch and learn what you do with this issue here 😉.

We should definitely run the tests with the DB on the previous schema to enforce this, otherwise it's probably prone to wishful thinking. I think it would also be interesting to start versioning our procedures, so anytime we change them, we can control easily what version of procrastinate uses what version of the procedures while still having multiple ones in the code. That might be much less 🤯 for us. Though, we can't do the same with all objects (tables, enums, triggers...), of course.

I sometimes wonder if it's worth using procedures at all, compared to doing everything in the queries, but that's a whole other debate.

And all those triggers that use them?

ewjoachim · 2024-09-27T14:29:10Z

and just watch and learn what you do with this issue here 😉.

Haha you seem to think I have it all thought through and I have a plan. I'm learning as I go, and what I've learnt so far is that if you say there's something strange, I better get looking, because you've shown an excellent gut feeling for those issues so far :)

And all those triggers that use them?

I believe the set of procedures used by triggers is quite distinct from the set of "business-logic" procedures, the former aren't very complex (save procrastinate_trigger_status_events_procedure_update) nor often updated. The ones that may benefit from not being procedures could be procrastinate_defer_job, procrastinate_fetch_job, procrastinate_finish_job and the likes.

medihack · 2024-10-03T18:45:16Z

It is confusing :( When we wrote procrastinate, we were in an environment where we couldn't stop the service while applying migrations, so we had to ensure we created compatible migrations, that there would always be a path where you upgrade code, then db, then code, then db etc, without service interruption. As of today this is still an official commitment procrastinate does. We can revisit this but I believe it's worth trying to keep it.

I'm curious how this is supposed to work with all the changes in the database schema. Can't we justify the service interruption with a major release and the breaking changes (I really regret introducing that aborting state 🙄)?
I also wonder why the migrations don't have the same version as their associated release. That way, deciding what to migrate to would be pretty straightforward (I am happy that we use the Django migrations in our apps), but I'm probably just missing something.

ewjoachim · 2024-10-04T16:59:12Z

I'm curious how this is supposed to work with all the changes in the database schema.

Usually, it's about making the schema support 2 versions:

When you create a field, you create it one version before you start using it, and if you need it filled with an initial value, you use a migration and a trigger so that up until your code actually starts using it, it has the correct value. Alternatively, if the new field has a simple default value, you don't need a trigger, just create the field before the code starts using it. If the field is required, create it without a constraint and after the code has started using it, deal with potential wrong values created in the meantime and add the constraint.
When you delete a field, you delete it one version after you stop using it. Usually you don't need to add triggers because you don't care what the value will be at the time of deletion.
Many column changes can be expressed with a field addition and a field deletion, potentially with triggers synchronizing the value of multiple fields for the time in between.

This made sense for procrastinate at the time we did it partly because we didn't imagine there would be a lot of migrations, partly because it was the way we did things. I can see that it's really annoying to everyone now.

Can't we justify the service interruption with a major release and the breaking changes

The process was supposed to help users do blue-green deployments even on major releases. If it happens that we can't it means we designed the process badly 😅

(I really regret introducing that aborting state 🙄)

Don't be. Process shouldn't get in the way of improving the lib. This highlights that we need to change the process. But all this discussion is also about how we can change it without it being just annoying for users who can't afford breaking compatibility easily.

There will be (hopefully very few, realistically at least some) places where, upon reading that deployment needs to shut off the service, people will schedule it off-hours (evening or weekend). I know because I've been in those kind of situations. As much as I'd appreciate that this kind of practice doesn't exist, as a lib maintainer, I want to make sure that I don't have other people sacrifice their time because I didn't make the effort to make the lib blue-green deployable. It doesn't mean we have to do the impossible, but at least try and see if it's actually easy or not.

I also wonder why the migrations don't have the same version as their associated release.

In the beginning, they had. and then we got it wrong multiple times because when we did the PR, we didn't know if the next release would be a minor or an incremental.
We changed it to "name the migration like the current release" so at least we'd know what it is, but there's still the risk that we would release while there are open PRs and forget to rename the migrations.
I think the migrations need their own version number independent from the release.

ewjoachim · 2024-10-04T17:00:19Z

I'm going to make a PR to try and fix this mess. It's long overdue.

ewjoachim added this to the Version 3.0 milestone Sep 8, 2024

This was referenced Oct 4, 2024

Increasing worker concurrency increases number of fetch_job queries to procrstatinate database tables #1209

Open

v3: fix the migration mess #1211

Closed

medihack mentioned this issue Oct 19, 2024

New migration system in v3 #1227

Open

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v3] Migrations backwards compatibility #1189

[v3] Migrations backwards compatibility #1189

ewjoachim commented Sep 8, 2024 •

edited

Loading

medihack commented Sep 25, 2024

ewjoachim commented Sep 26, 2024

medihack commented Sep 27, 2024

ewjoachim commented Sep 27, 2024

medihack commented Oct 3, 2024

ewjoachim commented Oct 4, 2024 •

edited

Loading

ewjoachim commented Oct 4, 2024

[v3] Migrations backwards compatibility #1189

[v3] Migrations backwards compatibility #1189

Comments

ewjoachim commented Sep 8, 2024 • edited Loading

medihack commented Sep 25, 2024

ewjoachim commented Sep 26, 2024

medihack commented Sep 27, 2024

ewjoachim commented Sep 27, 2024

medihack commented Oct 3, 2024

ewjoachim commented Oct 4, 2024 • edited Loading

ewjoachim commented Oct 4, 2024

ewjoachim commented Sep 8, 2024 •

edited

Loading

ewjoachim commented Oct 4, 2024 •

edited

Loading