-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Fix handling shovels with old supervisor id format #10004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@gomoripeti any reason why this is currently a draft PR? |
This is a draft because I haven't tested with khepri. Eg what happens after upgrade 3.12 -> 3.13 -> enable khepri (Does |
I upgraded from 3.12.x with a shovel to 3.13.0-rc2. Then applied this patch (and cleaned up duplicate shovel). At this point there is a shovel worker with an old child id format. When trying to enable khepri_db feature flag, I get the below exception because
Additionally I did not test federation but I think it has the same set of issues with old style child ids starting from 3.12.8. Wouldn't it be best to have a feature flag that converts from old style id to new style id? (both for federation and shovels) That would prevent the coexistence of both formats. |
@gomoripeti that would means that you must enable that flag before you enable Khepri. Extending |
I will look into it since this is one of our two biggest outstanding issues before the next 3.13 RC. |
Alternatively we can make the plugin migrate this data. It might be easier. I will take another look. |
@dcorbacho suggests that modifying |
My problem with solving this is that the mnesia->khepri conversion happens on the mirrored_supervisor level which does not know about the specific id format of each implementation (that is the shovel or federation). I got a slight hope when I saw the module name (eg |
I pushed another commit to fix the case when Khepri is enabled. With the assumption that Mnesia->Khepri migration will make sure that no child id remains in the old format, I think this PR is ready. |
Before 3.13.0 and 3.12.8 supervisor id format was different. Don't start a shovel with the new id format if one exists with the old id format.
Because the clause `({{_, N}, _, _, _}) -> N =:= Name;` matches both new and old id format but always returns false for old id format, `child_exists` also returned false for old ids. Also rename SupId to ChildId to be pedantically correct.
1fdfad9
to
397b3eb
Compare
It is assumed that old child id format cannot exist in this case.
397b3eb
to
d284eaf
Compare
@gomoripeti thank you. This is a fair set of changes to merge, and we can look into Shovel process child ID conversion at a later point. |
@gomoripeti Could you describe the steps to reproduce the migration failure with the current |
hi @dcorbacho it needs to be a multi-node cluster on a version with old id format (eg 3.12.6). Create a shovel on the old version (will store old id format to Mnesia) Upgrade to 3.13/main in a rolling fashion -> the old id preserved in Mnesia. ("mirrored supervisor record is updated by the shovel plugin when it starts." I think this does not happen with multi-node rolling restarts) Then trying migration to khepri will hit the unexpected old id. |
@gomoripeti thanks! That makes sense, I've tried a single node upgrade. I'll test this now |
Proposed Changes
This is a forward port of #9965 and #9968 from v3.12.x to main (future 3.13.x)
(Sorry for the unusual method of not first submitting to main. I felt it was more urgent to fix the already released 3.12.x series and I thought that it will be easier to add khepri support than to remove it - this might not be the case)
Not fully tested yet, hence submitting a draft PR first
Types of Changes
What types of changes does your code introduce to this project?
Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply.You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.
CONTRIBUTING.md
documentFurther Comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc.