When a parent actor has a fault, child actors should also fault and not keep running #2198

@dulinriley

Description

As part of our principles of supervision, a child actor should never be orphaned, because then there is no one to deliver its supervision events to.
If the owner of a mesh hits a fault, it won't have a chance to run cleanup and shut down its child actors, which orphans them.
Each child should have some way of detecting that its parent is down, and then stop itself (and all of its children, recursively).
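
To make the recursive part concrete, here is a minimal sketch of the cascade. `ActorNode` and `stop_subtree` are hypothetical names used only for illustration, not part of any existing API; the sketch assumes each actor keeps a list of its direct children and stops descendants before itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActorNode:
    """Hypothetical stand-in for an actor in a supervision tree."""

    name: str
    children: List["ActorNode"] = field(default_factory=list)
    stopped: bool = False

    def stop_subtree(self) -> List[str]:
        """Stop descendants first, then this actor; return names in stop order."""
        order: List[str] = []
        for child in self.children:
            if not child.stopped:
                order.extend(child.stop_subtree())
        self.stopped = True
        order.append(self.name)
        return order
```

A child that decides its parent is gone would call `stop_subtree()` on itself, so no descendant is left running without a supervisor.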

A key to implementing this correctly is that it must not rely on the owner or parent doing anything at crash time. Crashes
come in all forms, including the power being yanked on a machine, so the children need to be able to figure out on their own
that the parent is dead.

Some ways to implement this:

  • A periodic message from each child actor to its owner to check that the owner is alive: the owner counts as alive if the child gets a timely reply back. If the message can't be delivered, it'll go through the undeliverable message handler, which we'll need to make sure also fails the actor.
  • Piggyback off some other periodic message. For example, an owner should be sending out periodic GetState messages to its children to see if they are alive, so a child can assume the parent is alive as long as it keeps getting those messages. This could be a problem if we want to remove that particular periodic message, or if other things can stall the sending of messages without the actor actually being dead or unreachable.
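
Both options reduce to the same child-side check: "have I heard from my parent recently, and has a message to it bounced?" The sketch below is one possible shape for that check, not the project's implementation. `ParentWatchdog` and its methods are hypothetical names, and the injected `clock` parameter is an assumption made so the logic can be exercised without real sleeps.

```python
import time
from typing import Callable


class ParentWatchdog:
    """Child-side liveness tracker (hypothetical; not an existing API)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_contact = clock()
        self._undeliverable = False

    def record_parent_contact(self) -> None:
        """Call on a ping reply, or on any periodic message received from the parent."""
        self._last_contact = self._clock()

    def record_undeliverable(self) -> None:
        """Call from the undeliverable-message handler when a ping to the parent bounces."""
        self._undeliverable = True

    def parent_presumed_dead(self) -> bool:
        """True if a ping bounced or the parent has been silent longer than the timeout."""
        if self._undeliverable:
            return True
        return self._clock() - self._last_contact > self._timeout_s
```

A child would poll `parent_presumed_dead()` on its own timer and, when it returns true, fail itself and stop its subtree. Choosing a timeout that spans several message intervals is one way to tolerate the stalls mentioned above, where the parent is slow to send but not actually dead or unreachable.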
