Skip to content

TRUNK-6409: Initializer module should not re-run initialization on each node#307

Open
IamMujuziMoses wants to merge 3 commits intomekomsolutions:mainfrom
IamMujuziMoses:TRUNK-6409
Open

TRUNK-6409: Initializer module should not re-run initialization on each node#307
IamMujuziMoses wants to merge 3 commits intomekomsolutions:mainfrom
IamMujuziMoses:TRUNK-6409

Conversation

@IamMujuziMoses
Copy link
Copy Markdown

Issue I worked on

see https://issues.openmrs.org/browse/TRUNK-6409

Checklist: I completed these to help reviewers :)

  • My IDE is configured to follow the code style of this project.

    No? Unsure? -> configure your IDE, format the code and add the changes with git add . && git commit --amend

  • I have added tests to cover my changes. (If you refactored
    existing code that was well tested you do not have to add tests)

    No? -> write tests and add them to this commit git add . && git commit --amend

  • I ran mvn clean package right before creating this pull request and
    added all formatting changes to my commit.

    No? -> execute above command

  • All new and existing tests passed.

    No? -> figure out why and add the fix to your commit. It is your responsibility to make sure your code works.

  • My pull request is based on the latest changes of the master branch.

    No? Unsure? -> execute command git pull --rebase upstream master

@IamMujuziMoses
Copy link
Copy Markdown
Author

@rkorytkowski @dkayiwa @ibacher could you please review this PR, your feedback will be appreciated.

InitializerMessageSource messageSource = getInitializerMessageSource();
Context.getMessageSourceService().setActiveMessageSource(messageSource);

boolean isPrimary = Boolean.parseBoolean(getPropertyValue(PROPS_PRIMARY_STARTUP, "false"));
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkorytkowski do you mean something like this where the system property primary.startup is added to thee runtime.properties

Copy link
Copy Markdown

@rkorytkowski rkorytkowski Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IamMujuziMoses TRUNK-6409 is about persisting checksums in a dedicated table in DB instead of in filesystem and then adding a check to run import only on one replica at a time. It can be achieved similar to how liquibase does it i.e. have a mutex like flag in db that is checked before running an import and having a node that starts the import to hold it until it's done before any other node can do it. This way we don't have to set any node as primary. The drawback is that each node will be doing a check if checksums changed and comparing to DB persisted values, but that should be quick and can be further optimised if needed. I think it's simplifies deployments. You don't have to keep 2 different deployments and remember to deploy in sequence. I'll update the issue description.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkorytkowski Thanks for the review, this makes sense, I'll rework the solution to use a DB-based lock.

InitializerMessageSource messageSource = getInitializerMessageSource();
Context.getMessageSourceService().setActiveMessageSource(messageSource);

boolean isPrimary = Boolean.parseBoolean(getPropertyValue(PROPS_PRIMARY_STARTUP, "false"));
Copy link
Copy Markdown

@rkorytkowski rkorytkowski Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IamMujuziMoses TRUNK-6409 is about persisting checksums in a dedicated table in DB instead of in filesystem and then adding a check to run import only on one replica at a time. It can be achieved similar to how liquibase does it i.e. have a mutex like flag in db that is checked before running an import and having a node that starts the import to hold it until it's done before any other node can do it. This way we don't have to set any node as primary. The drawback is that each node will be doing a check if checksums changed and comparing to DB persisted values, but that should be quick and can be further optimised if needed. I think it's simplifies deployments. You don't have to keep 2 different deployments and remember to deploy in sequence. I'll update the issue description.

Copy link
Copy Markdown

@rkorytkowski rkorytkowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please note that the module already persist checksums for files in the filesystem. We need to put those checksums in DB so they are available for all nodes and not lost with filesystem failure.

@ibacher
Copy link
Copy Markdown
Contributor

ibacher commented Feb 12, 2026

An important additional consideration here is that it's possible to disable Iniz's run on start-up and run it manually. People doing something like this need to have an easy API to correctly obtain whatever mutex we're using.

@IamMujuziMoses
Copy link
Copy Markdown
Author

@rkorytkowski @ibacher could you please review this, did you mean something like this?

Copy link
Copy Markdown

@rkorytkowski rkorytkowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Going in the good direction. Still a few tweaks needed.

@Override
public String getOrCreateNodeId() {
AdministrationService admin = Context.getAdministrationService();
String nodeId = admin.getGlobalProperty(GP_NODE_ID);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something or each node uses the same GP? Is it just for debugging purposes? I would just get rid of node id entirely.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use e.g. hostname for locked_by.

private Date lockedAt;

@Column(name = "locked_by")
private String lockedBy;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have lock_until set to e.g. 15 minutes so that it is automatically freed if not released due to failure.

private Integer id;

@Column(name = "locked", nullable = false)
private Boolean locked;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having the locked field, I would just have a String primary key with a name of the lock e.g. 'initializer'. I would insert when taking the lock and delete the row when releasing.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having locks with different names would mean that if someone runs initializer in their own module, it can use a different lock name and thus run initialisation regardless of the default initialization done by the core module.

// Check if config changed
if (!getInitializerService().isConfigChanged()) {
log.info("No config changes... skipping initializer");
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be just skipping. It should be waiting until lock is released.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I get it now, thanks for the review

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean it should wait and then skip. The wait is so that we don't start up another instance before initialization is done.

+ "SET locked = true, lockedAt = current_timestamp, lockedBy = :node "
+ "WHERE id = 1 AND (locked = false OR lockedAt < :expiryTime)";

Date expiryTime = new Date(now - LOCK_TIMEOUT);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have expiryTime. I would still put it in the table so that if we need we can have different locks with different expiry times.

String checksum = DigestUtils.sha256Hex(fis);
map.put(rel.toString(), checksum);
} catch (Exception e) {
map.put(f.getName(), "ERROR");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the initalizer module computing the checksums elsewhere and persisting them in the filesystem? We need to remove that old behavior.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think it's this, should I remove it completely?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to remove it, but also if checkums are there, we should put them in DB so that we don't re-run everything when upgrading the module.

@IamMujuziMoses
Copy link
Copy Markdown
Author

@rkorytkowski could you please review these updates

@rkorytkowski
Copy link
Copy Markdown

@IamMujuziMoses thanks, I'll go through your code and do testing on Monday.


// Run Initializer
log.info("OpenMRS config loading process started...");
getInitializerService().loadUnsafe(true, throwError);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that with your changes if only a single checksum is different, then all files are being loaded again instead of just the one that changed... It's not correct. Only that single file should run. I'll fix that issue.


try (InputStream is = Files.newInputStream(path)) {
Path rel = base.relativize(path);
String checksum = DigestUtils.sha256Hex(is);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You changed the checksum from md5Hex to sha256Hex... which is probably fine, but you don't seem to handle migration from file stored checksums to db checksums. I'll address that as well.

getInitializerService().loadUnsafe(true, throwError);

// Save new checksums
getInitializerService().updateChecksums();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not updateChecksums all at once... It's possible that some files get imported and checksums should be updated and some fail... I'll address that as well.

@rkorytkowski
Copy link
Copy Markdown

Thanks @IamMujuziMoses! I've left a few review comments. Let me address them as I need to get it done today.

@rkorytkowski
Copy link
Copy Markdown

rkorytkowski commented Mar 10, 2026

PR with fixes: #319

This one (#307) can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants