In FAUST-BENCHOU#28 I added an e2e test with the following steps:
1. Create a ModelServing with 5 replicas.
2. Change the template image to nginxAlpineImage and perform a rolling update.
I expected the ServingGroup indices 0,1,2,3,4 to remain 0,1,2,3,4 after the rollout, with all Pods running the new image.
However, the group-names of the final 5 Running Pods are ...-4, ...-5, ...-6, ...-7, ...-8 respectively.
=== RUN TestModelServingServingGroupOrdinalNoUnboundedIndexDuringRollingUpdate
model_serving_test.go:781: Creating ModelServing test-sg-ordinal-roll-no-partition replicas=5 (no partition; rollout via image change)
utils.go:33: Waiting for ModelServing to be ready...
model_serving_test.go:787: Initial CurrentRevision: 9df7c648c
model_serving_test.go:791: Rolling update: image -> nginx:alpine
model_serving_test.go:826: Rolling converged (no partition): revision=795b89cdd9
model_serving_test.go:956: Running pod count: 5 (expecting 5)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
model_serving_test.go:844:
Error Trace: /home/runner/work/kthena/kthena/test/e2e/controller-manager/model_serving_test.go:844
Error: ServingGroup ordinals not dense 0..N-1 after rollout
Test: TestModelServingServingGroupOrdinalNoUnboundedIndexDuringRollingUpdate
Messages: last=ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
- name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
- name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
--- FAIL: TestModelServingServingGroupOrdinalNoUnboundedIndexDuringRollingUpdate (131.28s)
I'm not sure whether this is by design, but I think it may cause the same problem as in #584: the index will keep increasing endlessly.
Why do we use the max ordinal + 1 when scaling up, instead of filling missing indices as the partition logic does? Currently the scale-down logic is:
// scaleDownServingGroups scales down the ServingGroups to the expected count with two-level priority-based selection:
// 1. Primary: Not-ready groups (Creating, NotFound) are deleted first
// 2. Secondary: Among groups with same status, lower deletion cost = delete first
Therefore, intermediate indices can go missing after a scale-down. But scaleUpServingGroups only ever uses max + 1:
// Otherwise, it creates new ServingGroups with increasing indices starting from the current max index + 1.
This could also lead to a situation where gaps in the index range can never be filled (we only handle this at the partition level).
Is it expected that the index grows indefinitely during a rolling update?
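To make the difference concrete, here is a small standalone sketch comparing the two index-allocation strategies. It is a simplified model and not the controller code: `nextMaxPlusOne`, `nextLowestFree`, and `simulate` are hypothetical names, and the simulation assumes each rollout step deletes the lowest-index group and then creates one replacement, which may not match the controller's exact ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// nextMaxPlusOne models the behaviour described in the scaleUpServingGroups
// comment: the new index is the current max index + 1.
func nextMaxPlusOne(used []int) int {
	max := -1
	for _, i := range used {
		if i > max {
			max = i
		}
	}
	return max + 1
}

// nextLowestFree models the alternative: reuse the smallest index not in use.
func nextLowestFree(used []int) int {
	taken := make(map[int]bool, len(used))
	for _, i := range used {
		taken[i] = true
	}
	for i := 0; ; i++ {
		if !taken[i] {
			return i
		}
	}
}

// simulate replaces one group per round: delete the lowest-index group,
// then create a replacement whose index is chosen by next.
func simulate(start []int, rounds int, next func([]int) int) []int {
	cur := append([]int(nil), start...)
	sort.Ints(cur)
	for r := 0; r < rounds && len(cur) > 0; r++ {
		cur = cur[1:]                // delete the lowest-index group
		cur = append(cur, next(cur)) // create its replacement
		sort.Ints(cur)
	}
	return cur
}

func main() {
	// A gap left by scale-down (index 2 was deleted), then scale up by one.
	gap := []int{0, 1, 3, 4}
	fmt.Println(nextMaxPlusOne(gap), nextLowestFree(gap)) // prints "5 2"

	// Five replacements of a 5-replica set under each strategy.
	start := []int{0, 1, 2, 3, 4}
	fmt.Println(simulate(start, 5, nextMaxPlusOne)) // prints "[5 6 7 8 9]"
	fmt.Println(simulate(start, 5, nextLowestFree)) // prints "[0 1 2 3 4]"
}
```

Under this model, max + 1 shifts the whole index range upward on every rollout, while lowest-free keeps it at 0..N-1. Whether reusing a freed index is safe while the old group is still terminating is a separate question that this sketch does not address.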