TRUNK-6409: Initializer module should not re-run initialization on each node#307
TRUNK-6409: Initializer module should not re-run initialization on each node#307IamMujuziMoses wants to merge 3 commits intomekomsolutions:mainfrom
Conversation
|
@rkorytkowski @dkayiwa @ibacher could you please review this PR, your feedback will be appreciated. |
| InitializerMessageSource messageSource = getInitializerMessageSource(); | ||
| Context.getMessageSourceService().setActiveMessageSource(messageSource); | ||
|
|
||
| boolean isPrimary = Boolean.parseBoolean(getPropertyValue(PROPS_PRIMARY_STARTUP, "false")); |
There was a problem hiding this comment.
@rkorytkowski do you mean something like this where the system property primary.startup is added to thee runtime.properties
There was a problem hiding this comment.
@IamMujuziMoses TRUNK-6409 is about persisting checksums in a dedicated table in DB instead of in filesystem and then adding a check to run import only on one replica at a time. It can be achieved similar to how liquibase does it i.e. have a mutex like flag in db that is checked before running an import and having a node that starts the import to hold it until it's done before any other node can do it. This way we don't have to set any node as primary. The drawback is that each node will be doing a check if checksums changed and comparing to DB persisted values, but that should be quick and can be further optimised if needed. I think it's simplifies deployments. You don't have to keep 2 different deployments and remember to deploy in sequence. I'll update the issue description.
There was a problem hiding this comment.
@rkorytkowski Thanks for the review, this makes sense, I'll rework the solution to use a DB-based lock.
| InitializerMessageSource messageSource = getInitializerMessageSource(); | ||
| Context.getMessageSourceService().setActiveMessageSource(messageSource); | ||
|
|
||
| boolean isPrimary = Boolean.parseBoolean(getPropertyValue(PROPS_PRIMARY_STARTUP, "false")); |
There was a problem hiding this comment.
@IamMujuziMoses TRUNK-6409 is about persisting checksums in a dedicated table in DB instead of in filesystem and then adding a check to run import only on one replica at a time. It can be achieved similar to how liquibase does it i.e. have a mutex like flag in db that is checked before running an import and having a node that starts the import to hold it until it's done before any other node can do it. This way we don't have to set any node as primary. The drawback is that each node will be doing a check if checksums changed and comparing to DB persisted values, but that should be quick and can be further optimised if needed. I think it's simplifies deployments. You don't have to keep 2 different deployments and remember to deploy in sequence. I'll update the issue description.
|
An important additional consideration here is that it's possible to disable Iniz's run on start-up and run it manually. People doing something like this need to have an easy API to correctly obtain whatever mutex we're using. |
e113bf6 to
eaa59ff
Compare
|
@rkorytkowski @ibacher could you please review this, did you mean something like this? |
rkorytkowski
left a comment
There was a problem hiding this comment.
Thanks! Going in the good direction. Still a few tweaks needed.
| @Override | ||
| public String getOrCreateNodeId() { | ||
| AdministrationService admin = Context.getAdministrationService(); | ||
| String nodeId = admin.getGlobalProperty(GP_NODE_ID); |
There was a problem hiding this comment.
Am I missing something or each node uses the same GP? Is it just for debugging purposes? I would just get rid of node id entirely.
There was a problem hiding this comment.
We could use e.g. hostname for locked_by.
| private Date lockedAt; | ||
|
|
||
| @Column(name = "locked_by") | ||
| private String lockedBy; |
There was a problem hiding this comment.
It would be good to have lock_until set to e.g. 15 minutes so that it is automatically freed if not released due to failure.
| private Integer id; | ||
|
|
||
| @Column(name = "locked", nullable = false) | ||
| private Boolean locked; |
There was a problem hiding this comment.
Rather than having the locked field, I would just have a String primary key with a name of the lock e.g. 'initializer'. I would insert when taking the lock and delete the row when releasing.
There was a problem hiding this comment.
Having locks with different names would mean that if someone runs initializer in their own module, it can use a different lock name and thus run initialisation regardless of the default initialization done by the core module.
| // Check if config changed | ||
| if (!getInitializerService().isConfigChanged()) { | ||
| log.info("No config changes... skipping initializer"); | ||
| return; |
There was a problem hiding this comment.
It shouldn't be just skipping. It should be waiting until lock is released.
There was a problem hiding this comment.
okay, I get it now, thanks for the review
There was a problem hiding this comment.
I mean it should wait and then skip. The wait is so that we don't start up another instance before initialization is done.
| + "SET locked = true, lockedAt = current_timestamp, lockedBy = :node " | ||
| + "WHERE id = 1 AND (locked = false OR lockedAt < :expiryTime)"; | ||
|
|
||
| Date expiryTime = new Date(now - LOCK_TIMEOUT); |
There was a problem hiding this comment.
I see you have expiryTime. I would still put it in the table so that if we need we can have different locks with different expiry times.
| String checksum = DigestUtils.sha256Hex(fis); | ||
| map.put(rel.toString(), checksum); | ||
| } catch (Exception e) { | ||
| map.put(f.getName(), "ERROR"); |
There was a problem hiding this comment.
Isn't the initalizer module computing the checksums elsewhere and persisting them in the filesystem? We need to remove that old behavior.
There was a problem hiding this comment.
Yes I think it's this, should I remove it completely?
There was a problem hiding this comment.
Yes, we need to remove it, but also if checkums are there, we should put them in DB so that we don't re-run everything when upgrading the module.
eb55043 to
2f03a29
Compare
2f03a29 to
ff6848d
Compare
|
@rkorytkowski could you please review these updates |
|
@IamMujuziMoses thanks, I'll go through your code and do testing on Monday. |
|
|
||
| // Run Initializer | ||
| log.info("OpenMRS config loading process started..."); | ||
| getInitializerService().loadUnsafe(true, throwError); |
There was a problem hiding this comment.
My understanding is that with your changes if only a single checksum is different, then all files are being loaded again instead of just the one that changed... It's not correct. Only that single file should run. I'll fix that issue.
|
|
||
| try (InputStream is = Files.newInputStream(path)) { | ||
| Path rel = base.relativize(path); | ||
| String checksum = DigestUtils.sha256Hex(is); |
There was a problem hiding this comment.
You changed the checksum from md5Hex to sha256Hex... which is probably fine, but you don't seem to handle migration from file stored checksums to db checksums. I'll address that as well.
| getInitializerService().loadUnsafe(true, throwError); | ||
|
|
||
| // Save new checksums | ||
| getInitializerService().updateChecksums(); |
There was a problem hiding this comment.
We should not updateChecksums all at once... It's possible that some files get imported and checksums should be updated and some fail... I'll address that as well.
|
Thanks @IamMujuziMoses! I've left a few review comments. Let me address them as I need to get it done today. |
Issue I worked on
see https://issues.openmrs.org/browse/TRUNK-6409
Checklist: I completed these to help reviewers :)
My IDE is configured to follow the code style of this project.
No? Unsure? -> configure your IDE, format the code and add the changes with
git add . && git commit --amendI have added tests to cover my changes. (If you refactored
existing code that was well tested you do not have to add tests)
No? -> write tests and add them to this commit
git add . && git commit --amendI ran
mvn clean packageright before creating this pull request andadded all formatting changes to my commit.
No? -> execute above command
All new and existing tests passed.
No? -> figure out why and add the fix to your commit. It is your responsibility to make sure your code works.
My pull request is based on the latest changes of the master branch.
No? Unsure? -> execute command
git pull --rebase upstream master