Description
@ryanday36 and I were discussing offline this morning that the singleton dependency may or may not have allowed a user to inadvertently bypass certain per-queue resource and running job limits and take up all the resources on the system. The scenario looked something like the following:
- the queue had a per-association max nodes limit of X
- the user submitted a number of jobs (with a total node count > X), each with the `singleton` dependency, which is satisfied only when there are no other active jobs with the same userid and job name that are not themselves already held on a singleton dependency.
If all of the singleton-dependency jobs were submitted before any of them reached RUN state (the point at which the priority plugin increments the user's running job and resource counts), then I think those jobs might never have had any flux-accounting dependencies applied to them, since none of them had started running and the counts had not been incremented yet.
I wonder if perhaps incrementing the resource counts for a job as it enters SCHED state instead of RUN state would prevent this from happening in the future.
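To make the suspected timing problem concrete, here is a rough toy model in Python. It is not the flux-accounting plugin code: `simulate`, `job_nodes`, `max_nodes`, and `count_state` are all hypothetical names, the singleton dependency itself is not modeled, and it simply assumes every job is submitted (and limit-checked) before any job reaches RUN.

```python
def simulate(job_nodes, max_nodes, count_state):
    """Toy model of a per-association max-nodes check (hypothetical).

    job_nodes:   node count of each job, in submission order
    max_nodes:   the per-association node limit (X in the issue)
    count_state: "SCHED" or "RUN", the state at which a job's nodes
                 start counting against the limit

    Returns (held_job_indexes, total_nodes_admitted).
    """
    counted = 0      # nodes the limit check can currently "see"
    admitted = []    # jobs that got no accounting dependency
    held = []        # jobs that would be held by the limit

    for job_id, nodes in enumerate(job_nodes):
        # Assumed: the limit check happens at submission time and only
        # compares against nodes that have already been counted.
        if counted + nodes > max_nodes:
            held.append(job_id)
            continue
        admitted.append(job_id)
        # Assumed: an admitted job enters SCHED right away, but does
        # not reach RUN until after every job has been submitted.
        if count_state == "SCHED":
            counted += nodes

    # With count_state == "RUN", the increments would only happen here,
    # after every limit check above has already passed.
    return held, sum(job_nodes[j] for j in admitted)


limit = 4
print(simulate([2, 2, 2], limit, "RUN"))    # nothing is held
print(simulate([2, 2, 2], limit, "SCHED"))  # the third job is held
```

In this sketch, counting at RUN admits all 6 nodes against a limit of 4, while counting at SCHED holds the third job and keeps the admitted total at 4. Whether the real plugin behaves this way depends on when its limit checks actually fire, which would need to be confirmed.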