During RollingUpdate Index is NoUnbounded? #874

@FAUST-BENCHOU

In FAUST-BENCHOU#28 I added an e2e test with the following steps:
1. Create a ModelServing with 5 replicas.
2. Change the template image to nginxAlpineImage and perform a rolling update.

After the rollout, the ServingGroup indices should still be 0,1,2,3,4, and all Pods should be Running with the new image.
However, the group-names of the final 5 Running Pods are ...-4, ...-5, ...-6, ...-7, ...-8 respectively.

=== RUN   TestModelServingServingGroupOrdinalNoUnboundedIndexDuringRollingUpdate
    model_serving_test.go:781: Creating ModelServing test-sg-ordinal-roll-no-partition replicas=5 (no partition; rollout via image change)
    utils.go:33: Waiting for ModelServing to be ready...
    model_serving_test.go:787: Initial CurrentRevision: 9df7c648c
    model_serving_test.go:791: Rolling update: image -> nginx:alpine
    model_serving_test.go:826: Rolling converged (no partition): revision=795b89cdd9
    model_serving_test.go:956: Running pod count: 5 (expecting 5)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:841: ServingGroup ordinal dense (retry): ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
          - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
          - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
    model_serving_test.go:844: 
        	Error Trace:	/home/runner/work/kthena/kthena/test/e2e/controller-manager/model_serving_test.go:844
        	Error:      	ServingGroup ordinals not dense 0..N-1 after rollout
        	Test:       	TestModelServingServingGroupOrdinalNoUnboundedIndexDuringRollingUpdate
        	Messages:   	last=ordinal 5 not in [0,4] (group-name=test-sg-ordinal-roll-no-partition-5 pod=test-sg-ordinal-roll-no-partition-5-prefill-0-0)
        	            	
        	            	pod list (5 items) for ms="test-sg-ordinal-roll-no-partition":
        	            	  - name=test-sg-ordinal-roll-no-partition-4-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-4" -> parent="test-sg-ordinal-roll-no-partition" ordinal=4 (parent==msName=true)
        	            	  - name=test-sg-ordinal-roll-no-partition-5-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-5" -> parent="test-sg-ordinal-roll-no-partition" ordinal=5 (parent==msName=true)
        	            	  - name=test-sg-ordinal-roll-no-partition-6-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-6" -> parent="test-sg-ordinal-roll-no-partition" ordinal=6 (parent==msName=true)
        	            	  - name=test-sg-ordinal-roll-no-partition-7-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-7" -> parent="test-sg-ordinal-roll-no-partition" ordinal=7 (parent==msName=true)
        	            	  - name=test-sg-ordinal-roll-no-partition-8-prefill-0-0 phase=Running deleting=false group-name="test-sg-ordinal-roll-no-partition-8" -> parent="test-sg-ordinal-roll-no-partition" ordinal=8 (parent==msName=true)
--- FAIL: TestModelServingServingGroupOrdinalNoUnboundedIndexDuringRollingUpdate (131.28s)
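
For readers who have not seen the test, the invariant it asserts can be sketched as follows. This is a minimal illustration, not the code in model_serving_test.go: the function name `checkDenseOrdinals` and its signature are hypothetical, and the error text only imitates the message in the log above.

```go
package main

import "fmt"

// checkDenseOrdinals is a hypothetical helper (not from the kthena repo).
// It returns an error unless the ordinals are exactly 0..replicas-1,
// each appearing once.
func checkDenseOrdinals(ordinals []int, replicas int) error {
	seen := make(map[int]bool, len(ordinals))
	for _, o := range ordinals {
		if o < 0 || o >= replicas {
			return fmt.Errorf("ordinal %d not in [0,%d]", o, replicas-1)
		}
		if seen[o] {
			return fmt.Errorf("ordinal %d appears more than once", o)
		}
		seen[o] = true
	}
	if len(seen) != replicas {
		return fmt.Errorf("expected %d distinct ordinals, got %d", replicas, len(seen))
	}
	return nil
}

func main() {
	// Expected state after the rollout.
	fmt.Println(checkDenseOrdinals([]int{0, 1, 2, 3, 4}, 5))
	// State observed in the log above.
	fmt.Println(checkDenseOrdinals([]int{4, 5, 6, 7, 8}, 5))
}
```

With the observed ordinals 4..8, ordinal 4 passes and ordinal 5 is the first one rejected, which is why the log reports ordinal 5 rather than 8.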

I'm not sure whether this is by design, but I think it may cause the same problem as in #584: the index will increase endlessly.
Why do we use ordinal+1 to scale up instead of filling missing indices, as the partition logic does? Currently the scale-down logic is:

// scaleDownServingGroups scales down the ServingGroups to the expected count with two-level priority-based selection:
// 1. Primary: Not-ready groups (Creating, NotFound) are deleted first
// 2. Secondary: Among groups with same status, lower deletion cost = delete first

Therefore, intermediate indices can go missing. But scaleUpServingGroups only uses max+1:

// Otherwise, it creates new ServingGroups with increasing indices starting from the current max index + 1.

This could also lead to a situation where gaps in the index are never filled (we only handle this at the partition level).
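
To make the difference concrete, here is a small sketch comparing the two allocation strategies. It is not the controller's code: `nextIndicesMaxPlusOne` only imitates the behavior described in the quoted scale-up comment, and `nextIndicesFillGaps` is a hypothetical alternative that reuses the lowest free index first.

```go
package main

import "fmt"

// nextIndicesMaxPlusOne imitates the quoted scale-up comment: new groups
// get increasing indices starting from the current max index + 1.
// (Hypothetical helper, not the actual scaleUpServingGroups.)
func nextIndicesMaxPlusOne(existing []int, count int) []int {
	maxIdx := -1
	for _, idx := range existing {
		if idx > maxIdx {
			maxIdx = idx
		}
	}
	out := make([]int, 0, count)
	for i := 1; i <= count; i++ {
		out = append(out, maxIdx+i)
	}
	return out
}

// nextIndicesFillGaps is the assumed alternative: walk upward from 0 and
// take every index that is not already in use until count are found.
func nextIndicesFillGaps(existing []int, count int) []int {
	used := make(map[int]bool, len(existing))
	for _, idx := range existing {
		used[idx] = true
	}
	out := make([]int, 0, count)
	for candidate := 0; len(out) < count; candidate++ {
		if !used[candidate] {
			out = append(out, candidate)
		}
	}
	return out
}

func main() {
	// Suppose a scale-down removed groups 0 and 3 out of 0..4.
	existing := []int{1, 2, 4}
	fmt.Println(nextIndicesMaxPlusOne(existing, 2)) // [5 6]
	fmt.Println(nextIndicesFillGaps(existing, 2))   // [0 3]
}
```

With max+1, the gaps at 0 and 3 stay open and the highest index keeps climbing; with gap filling, the set returns to 0..4.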

Is it expected that the index grows indefinitely during a rolling update?
