Skip to content

Conversation

@Joshzxj
Copy link

@Joshzxj Joshzxj commented Jan 16, 2026

Description

Use cloneDevice() to create a deep copy of the device list before modification, preventing race between AssignController writes and PropertyCollector's deepCopy reads.

I was hitting a race between
A data race occurs between two concurrent operations:

  1. Write: AssignController() in object/virtual_device_list.go:547 appends to controller.Device slice during ReconfigVMTask has been called.
  2. Read: deepCopy() in simulator/object.go:51 marshals VM properties (including the same controller) during PropertyCollector.RetrievePropertiesEx()
// deepCopy uses xml encode/decode to copy src to dst
func deepCopy(src, dst any) {
	b, err := xml.Marshal(src)
	if err != nil {
		panic(err)
	}

	dec := xml.NewDecoder(bytes.NewReader(b))
	dec.TypeFunc = types.TypeFunc()
	err = dec.Decode(dst)
	if err != nil {
		panic(err)
	}
}
Write at 0x00c00032e2e8 by goroutine 226:
  github.com/vmware/govmomi/object.VirtualDeviceList.AssignController()
      /Users/zexingj/project/govmomi/object/virtual_device_list.go:547 +0x2b0
  github.com/vmware/govmomi/simulator.(*VirtualMachine).configureDevice()
      /Users/zexingj/project/govmomi/simulator/virtual_machine.go:1495 +0x24d4



Previous read at 0x00c00032e2e8 by goroutine 42:
  reflect.Value.Index()
      /Users/zexingj/sdk/go1.24.11/src/reflect/value.go:1422 +0x24c
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:477 +0x196f
...
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalInterface()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:705 +0x23b
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:459 +0xfac
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalStruct()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:992 +0x13c5
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:564 +0x1791
  github.com/vmware/govmomi/vim25/xml.(*Encoder).Encode()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:172 +0x64
  github.com/vmware/govmomi/simulator.(*Service).ServeSDK()
      /Users/zexingj/project/govmomi/simulator/simulator.go:600 +0x15a7
  github.com/vmware/govmomi/simulator.(*Service).ServeSDK-fm()
      <autogenerated>:1 +0x51
  net/http.HandlerFunc.ServeHTTP()

The fix is to use deep copy of devicelist to do config device.

With shared slice (without fix):
configureDevices() starts
├─> devices points to SAME array as vm.Config.Hardware.Device
├─> AssignController modifies controller.Device
│ └─> VM sees PARTIAL changes immediately (inconsistent state)
└─> ctx.Update() - already partially modified

With deep copy (with fix):
configureDevices() starts
├─> devices is a SEPARATE copy
├─> AssignController modifies the COPY
│ └─> VM still has OLD state (consistent)
└─> ctx.Update() replaces VM's devices with modified copy
└─> VM now has NEW state (consistent)

How Has This Been Tested?

It is hard to repo inside vcsim package with the whole AssignController call .

I wrote similiar code to mimic what AssignController did(Especially focus on directly inplace replacement device slice by append) , it 100% reproduce the same backtrace.

func TestRaceReconfigVMWithPropertyCollector(t *testing.T) {
	m := VPX()
	defer m.Remove()

	err := m.Create()
	if err != nil {
		t.Fatal(err)
	}

	// Get the VM directly from the simulator registry
	vmObj := m.Map().Any("VirtualMachine").(*VirtualMachine)

	var wg sync.WaitGroup
	start := make(chan struct{})
	done := make(chan struct{})

	// Reader goroutines - continuously call deepCopy on VM config
	// This simulates PropertyCollector reading VM properties via
	// PropertyCollector.RetrievePropertiesEx() -> collect() -> deepCopy()
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start

			for {
				select {
				case <-done:
					return
				default:
					var dst mo.VirtualMachine
					deepCopy(&vmObj.VirtualMachine, &dst)
				}
			}
		}()
	}

	// Writer goroutine - simulates what configureDevices does
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-start

		// This mirrors configureDevices() line 2089-2090:
		//   devices := object.VirtualDeviceList(vm.cloneDevice()) <============ this passed
		// The cloneDevice() call is critical - without it, modifications to
		// controller.Device would race with deepCopy reading the same data.
		devices := vmObj.Config.Hardware.Device // WITHOUT FIX

		// Simulate what AssignController does (object/virtual_device_list.go:547):
		//   c.Device = append(c.Device, device.GetVirtualDevice().Key)
		// This appends to controller.Device, which would race with deepCopy
		// if devices wasn't a deep copy.
		for _, d := range devices {
			if controller, ok := d.(*types.VirtualPCIController); ok {
				// Run many iterations to increase chance of race detection
				for j := 0; j < 1000; j++ {
					controller.Device = append(controller.Device, int32(4000+j))
				}
				break
			}
		}
		close(done)
	}()

	close(start)
	wg.Wait()
}
➜  govmomi git:(fix/configureDevices-race-condition) ✗ go test -race -v -run TestRaceReconfigVMWithPropertyCollector ./simulator 2>&1
=== RUN   TestRaceReconfigVMWithPropertyCollector
==================
WARNING: DATA RACE
Read at 0x00c0003e62e8 by goroutine 46:
  reflect.Value.Index()
      /Users/zexingj/sdk/go1.24.11/src/reflect/value.go:1422 +0x24c
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:477 +0x196f
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalStruct()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:992 +0x13c5
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:564 +0x1791
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:477 +0x1995
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalStruct()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:992 +0x13c5
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:564 +0x1791
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalStruct()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:992 +0x13c5
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:564 +0x1791
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalStruct()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:992 +0x13c5
  github.com/vmware/govmomi/vim25/xml.(*printer).marshalValue()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:564 +0x1791
  github.com/vmware/govmomi/vim25/xml.(*Encoder).Encode()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:172 +0x64
  github.com/vmware/govmomi/vim25/xml.Marshal()
      /Users/zexingj/project/govmomi/vim25/xml/marshal.go:85 +0x1f6
  github.com/vmware/govmomi/simulator.deepCopy()
      /Users/zexingj/project/govmomi/simulator/object.go:51 +0x44
  github.com/vmware/govmomi/simulator.TestRaceReconfigVMWithPropertyCollector.func1()
      /Users/zexingj/project/govmomi/simulator/race_test.go:434 +0xf0

Previous write at 0x00c0003e62e8 by goroutine 50:
  github.com/vmware/govmomi/simulator.TestRaceReconfigVMWithPropertyCollector.func2()
      /Users/zexingj/project/govmomi/simulator/race_test.go:460 +0x26d

Goroutine 46 (running) created at:
  github.com/vmware/govmomi/simulator.TestRaceReconfigVMWithPropertyCollector()
      /Users/zexingj/project/govmomi/simulator/race_test.go:424 +0x4bd
  testing.tRunner()
      /Users/zexingj/sdk/go1.24.11/src/testing/testing.go:1792 +0x225
  testing.(*T).Run.gowrap1()
      /Users/zexingj/sdk/go1.24.11/src/testing/testing.go:1851 +0x44

Goroutine 50 (running) created at:
  github.com/vmware/govmomi/simulator.TestRaceReconfigVMWithPropertyCollector()
      /Users/zexingj/project/govmomi/simulator/race_test.go:442 +0x796
  testing.tRunner()
      /Users/zexingj/sdk/go1.24.11/src/testing/testing.go:1792 +0x225
  testing.(*T).Run.gowrap1()
      /Users/zexingj/sdk/go1.24.11/src/testing/testing.go:1851 +0x44
==================
    testing.go:1490: race detected during execution of test
--- FAIL: TestRaceReconfigVMWithPropertyCollector (0.73s)
FAIL
FAIL    github.com/vmware/govmomi/simulator     1.556s
FAIL

Guidelines

Please read and follow the CONTRIBUTION guidelines of this project.

@Joshzxj Joshzxj changed the title fix: data race in configureDevices with PropertyCollector fix: data race in configureDevices with PropertyCollector in vcsim Jan 16, 2026
@Joshzxj Joshzxj marked this pull request as ready for review January 16, 2026 19:38
@Joshzxj Joshzxj force-pushed the fix/configureDevices-race-condition branch from 578b5fb to 80de3b0 Compare January 16, 2026 19:48
@Joshzxj
Copy link
Author

Joshzxj commented Jan 16, 2026

Unit test failed might unrelated to this change, I see same sympton on https://github.com/vmware/govmomi/actions/runs/20923278206

Use cloneDevice() to create a deep copy of the device list before
modification, preventing race between AssignController writes and
PropertyCollector's deepCopy reads..

Signed-off-by: Zexing Jiang <zexing.jiang@broadcom.com>
@Joshzxj Joshzxj force-pushed the fix/configureDevices-race-condition branch from 80de3b0 to 24c2602 Compare January 16, 2026 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant