WebGPURenderer: Adjust getArrayBufferAsync behavior, add support for partial readback#33322

Open
gkjohnson wants to merge 13 commits into mrdoob:dev from gkjohnson:readback-fixes

Conversation

@gkjohnson
Collaborator

@gkjohnson gkjohnson commented Apr 3, 2026

Fix #33281
Fix #33282

This PR fixes a number of issues introduced in #33300, adjusts support for "ReadbackBuffer" (detailed below), and adds support for passing "offset" and "count" to the function for mapping and reading back a portion of the buffer attribute.

Signature

This is the new call signature. The function optionally takes a target ArrayBuffer to write into, or a ReadbackBuffer to copy the data into and map. If no "target" is provided then a new ArrayBuffer is constructed and returned, as is done in r183, meaning this function is backwards compatible. Both "count" and "offset" are in bytes. And the "target" object is the one returned from the function, aligning with the pattern used across the project for functions that take a "target" object:

getArrayBufferAsync(
  attribute: BufferAttribute,
  count: number = - 1,
  offset: number = 0,
  target: ArrayBuffer | ReadbackBuffer | null = null
): Promise<ArrayBuffer | ReadbackBuffer>;
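Since "count" and "offset" are specified in bytes rather than elements, converting an element range into the byte-based arguments is a small bit of arithmetic. A quick standalone sketch (the element range here is a hypothetical example, not from the PR):

```javascript
// Convert a hypothetical element range into the byte-based
// "count" / "offset" arguments described above.
const BYTES_PER_ELEMENT = Uint32Array.BYTES_PER_ELEMENT; // 4

const firstElement = 2; // start at the third element
const elementCount = 4; // read four elements

const offset = firstElement * BYTES_PER_ELEMENT; // 8 bytes
const count = elementCount * BYTES_PER_ELEMENT;  // 16 bytes

console.log( offset, count ); // 8 16
```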

The ReadbackBuffer shape has changed like so. It represents an intermediate buffer of "maxByteLength" bytes on the GPU for copying data into and mapping to the CPU. The "buffer" field is set to the mapped buffer provided by the WebGPU API and is subsequently set to "null" when the readback buffer instance is released or disposed.

class ReadbackBuffer {
  name: string;
  maxByteLength: number;
  buffer: ArrayBuffer | null;
  constructor( maxByteLength: number );

  release();
  dispose();
}
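As a rough illustration of the lifecycle described above, here is a CPU-side mock of how "buffer" is populated while mapped and nulled on release. This is NOT the three.js implementation, just a sketch of the intended shape; `_setMapped` is a hypothetical stand-in for the renderer's internal mapping step:

```javascript
// Illustrative mock of the ReadbackBuffer lifecycle. Not renderer code.
class MockReadbackBuffer {

	constructor( maxByteLength ) {

		this.name = '';
		this.maxByteLength = maxByteLength;
		this.buffer = null; // set while mapped, null otherwise

	}

	// hypothetical internal hook, called once the GPU copy is mapped
	_setMapped( arrayBuffer ) {

		this.buffer = arrayBuffer;

	}

	release() {

		// unmap: the previously exposed buffer is no longer valid
		this.buffer = null;

	}

	dispose() {

		this.release();

	}

}

const rb = new MockReadbackBuffer( 16 );
rb._setMapped( new ArrayBuffer( 16 ) );
const mappedWhileInUse = rb.buffer !== null;
rb.release();
const mappedAfterRelease = rb.buffer !== null;
console.log( mappedWhileInUse, mappedAfterRelease ); // true false
```

Because "buffer" is null outside the mapped window, consumers can use it directly as an "in use" flag, which is part of the motivation discussed later in this thread.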

Behavior Changes

  • Fixed the WebGL code path logging an error and returning a buffer of all zeroes.
  • BufferAttribute.array is no longer read in either the WebGPU or WebGL code path, so users can continue to truncate the CPU-side arrays after upload, as designed.
  • The WebGL readback path now correctly returns an ArrayBuffer rather than a TypedArray.
  • Removed internal allocation and retention of ReadbackBuffers when the user passes only a BufferAttribute. This retained a lot of unnecessary GPU memory, created confusing overwrite behavior when running multiple interleaved kernel operations and readbacks, and caused an error to be thrown when issuing a subsequent readback before the first one finished.
  • Only the necessary content is copied and mapped to the CPU.
  • Simplified the WebGL code path to remove unnecessary intermediate buffers.
  • Decoupled "ReadbackBuffer" from "BufferAttribute" so the same readback buffer can be used for any buffer attribute's data. This allows instances to be pooled and reused.

Testing

Running this code snippet with WebGPURenderer demonstrates that the code works as expected, including the ability to run multiple readbacks in parallel. This works with "forceWebGL" set to both true and false.

const mesh = new THREE.Mesh( new THREE.SphereGeometry() );
mesh.geometry.index.array = new Uint32Array( mesh.geometry.index.array );
scene.add( mesh );

renderer.render( scene, camera );

requestAnimationFrame( async () => {

	const attr = mesh.geometry.index;
	const targetHandle = new THREE.ReadbackBuffer( 8 );
	const targetBuffer = new ArrayBuffer( 8 );
	const [ buff0 ] = await Promise.all( [
		renderer.getArrayBufferAsync( attr, 8, 8 ),
		renderer.getArrayBufferAsync( attr, 8, 8, targetBuffer ),
		renderer.getArrayBufferAsync( attr, 8, 8, targetHandle ),
	] );

	console.log(
		new Uint32Array( buff0 ),
		new Uint32Array( targetBuffer ),
		new Uint32Array( targetHandle.buffer ),
	);

} );
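For reference, the expected contents of those 8-byte readbacks can be emulated CPU-side by slicing the source buffer at the same byte offset. A standalone sketch, using hypothetical stand-in index data rather than the sphere geometry's actual indices:

```javascript
// CPU-side emulation of an 8-byte readback at byte offset 8.
// The index data here is a hypothetical stand-in.
const indexArray = new Uint32Array( [ 0, 1, 2, 3, 4, 5 ] );

const byteOffset = 8; // skip the first two Uint32 elements
const byteCount = 8;  // read two Uint32 elements

const readback = indexArray.buffer.slice( byteOffset, byteOffset + byteCount );
const view = new Uint32Array( readback );

console.log( view ); // Uint32Array [ 2, 3 ]
```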

Tested with basic integration on a small buffer in this three-edge-projection demo which runs multiple compute kernels and readbacks in parallel to perform hidden edge removal.

@gkjohnson gkjohnson added this to the r184 milestone Apr 3, 2026
@gkjohnson gkjohnson requested a review from sunag April 3, 2026 12:04
@github-actions

github-actions bot commented Apr 3, 2026

📦 Bundle size

Full ESM build, minified and gzipped.

|  | Before | After | Diff |
| --- | --- | --- | --- |
| WebGL | 360.57 kB (85.58 kB gzip) | 360.57 kB (85.58 kB gzip) | +0 B (+0 B gzip) |
| WebGPU | 635.53 kB (176.41 kB gzip) | 635.66 kB (176.5 kB gzip) | +125 B (+89 B gzip) |
| WebGPU Nodes | 633.65 kB (176.12 kB gzip) | 633.77 kB (176.2 kB gzip) | +125 B (+86 B gzip) |

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

|  | Before | After | Diff |
| --- | --- | --- | --- |
| WebGL | 492.95 kB (120.2 kB gzip) | 492.95 kB (120.2 kB gzip) | +0 B (+0 B gzip) |
| WebGPU | 707.71 kB (191.31 kB gzip) | 707.62 kB (191.35 kB gzip) | -89 B (+40 B gzip) |
| WebGPU Nodes | 656.93 kB (178.57 kB gzip) | 656.84 kB (178.64 kB gzip) | -89 B (+68 B gzip) |

@gkjohnson gkjohnson requested a review from Mugen87 April 3, 2026 12:15
@Mugen87
Collaborator

Mugen87 commented Apr 4, 2026

I don't feel strongly about which direction the API should take here, especially since I have only used GPU readbacks sporadically in the past. IMO, it's best if you and @sunag settle on an API. Related: #33315.

@sunag
Collaborator

sunag commented Apr 4, 2026

I think the idea of sharing buffers is great, but the API proposed in PR #33315 seems simpler and can cover the same use cases as the current one, with the exception of offset/size, which should be easy to implement.

Cloning the ArrayBuffer after obtaining it seems a bit contradictory to the observations we made earlier, since it uses slice() and set() on the buffers, which introduces unnecessary overhead for a user who only wants to read the buffer after obtaining it. This was improved in the latest updates I made to that PR, and I think we should keep that. Cloning the buffer after obtaining it should be the user's responsibility.

I don't think the "buffer" property on ReadbackBuffer, combined with the return overload, brings real benefits, since we can obtain the array directly and getArrayBufferAsync always returns an ArrayBuffer, consistent with the function's own name.

I would prefer to see a simpler function signature with parameter and return overloads:

class ReadbackBuffer {
  name: string;
  size: number;
  constructor( size: number );

  release();
  dispose();
}
getArrayBufferAsync( attribute ) : ArrayBuffer
getArrayBufferAsync( attribute, offset, size ) : ArrayBuffer

getArrayBufferAsync( attribute, readbackBuffer ) : ArrayBuffer
getArrayBufferAsync( attribute, readbackBuffer, offset, size ) : ArrayBuffer
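One way parameter overloads like these are typically disambiguated at runtime is a type check on the second argument. A hedged sketch of that pattern; `MockReadbackBuffer` and `normalizeArgs` are hypothetical illustrations, not code from either PR:

```javascript
// Hypothetical sketch of normalizing the proposed parameter overloads.
class MockReadbackBuffer {

	constructor( size ) {

		this.size = size;

	}

}

function normalizeArgs( attribute, a, b, c ) {

	if ( a instanceof MockReadbackBuffer ) {

		// ( attribute, readbackBuffer, offset, size )
		return { attribute, readbackBuffer: a, offset: b ?? 0, size: c ?? - 1 };

	}

	// ( attribute, offset, size )
	return { attribute, readbackBuffer: null, offset: a ?? 0, size: b ?? - 1 };

}

const rb = new MockReadbackBuffer( 64 );
console.log( normalizeArgs( 'attr', rb, 8, 16 ).offset );       // 8
console.log( normalizeArgs( 'attr', 8, 16 ).readbackBuffer );   // null
```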

@gkjohnson
Collaborator Author

gkjohnson commented Apr 5, 2026

seems simpler and can cover the same use cases as the current one

It does not cover the same use cases. It includes all the same problematic behavior I called out in the original PR and this one. I would appreciate it if you asked why I prefer something and have recommended it, rather than making sweeping claims like this. I have not suggested this approach for no reason.

Clone arraybuffer after obtaining it seem a bit contradictory to the observations we made earlier, since they use slice() and set() for the buffers, which will introduce unnecessary overhead for the user

All of my comments on why this is a problem seem to be continually ignored, and I'm at a complete loss as to why. First of all, I never asked for this, and as a user I do not want this behavior. This doesn't solve anything I asked for, mandates incredibly confusing functionality, and breaks existing code unnecessarily.

This code will no longer work as expected, since both results now reference the same buffer:

let buff0, buff1;
renderer.compute( kernel, [ 1000, 1, 1 ] );
buff0 = await renderer.getArrayBufferAsync( attr );

renderer.compute( kernel, [ 1000, 1, 1 ] );
buff1 = await renderer.getArrayBufferAsync( attr );

And this code will throw an error, whereas it does not here or in r183:

let pr0, pr1;
renderer.compute( kernel, [ 1000, 1, 1 ] );
pr0 = renderer.getArrayBufferAsync( attr );

renderer.compute( kernel, [ 1000, 1, 1 ] );
pr1 = renderer.getArrayBufferAsync( attr );

const [ buff0, buff1 ] = await Promise.all( [ pr0, pr1 ] );
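The clobbering hazard can be shown with a CPU-side mock stripped of the GPU entirely: any design that hands back the same retained intermediate buffer from consecutive calls makes the first result alias the second. All names below are hypothetical illustrations, not renderer internals:

```javascript
// Hypothetical mock of a readback that reuses one retained scratch
// buffer per attribute, versus one that copies out per call.
const shared = new Float32Array( 1 );

function readbackShared( value ) {

	shared[ 0 ] = value; // overwrites the shared intermediate buffer
	return shared;       // caller receives a view over shared memory

}

function readbackCopied( value ) {

	const out = new Float32Array( 1 );
	out[ 0 ] = value;
	return out;          // caller receives an independent buffer

}

const a0 = readbackShared( 1 );
const a1 = readbackShared( 2 );

const b0 = readbackCopied( 1 );
const b1 = readbackCopied( 2 );

console.log( a0[ 0 ], a1[ 0 ] ); // 2 2  (first result clobbered)
console.log( b0[ 0 ], b1[ 0 ] ); // 1 2  (independent results)
```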

Not having to worry about managing these buffers and race conditions is a benefit. I will frequently write code quickly using less efficient mechanisms specifically so I can get something working, then switch to a more optimal system (like ReadbackBuffer in this case), so I plan to continue using this simple path in dev.

This is even before we get to the fact that there is nearly no benefit and a ton of downsides:

Cons

  • Hidden, unexpected, and unnecessary buffer overwrite behavior with no indicator.
  • Inability to read from the same buffer attribute back to back.
  • Persistent GPU & CPU memory allocation the user has no control or knowledge of.
  • Not backwards compatible.

Pros

  • A small perf benefit from avoiding the slice, maybe? But it's no slower than the "set" that replaced it in my testing.

Bottom line: this hasn't actually solved a problem anyone has raised. If we're going to make such dramatic changes to an existing API, can we at least wait for someone to actually complain about the code path first? Again, this isn't anything I wanted, and I raised the issue. The ReadbackBuffer otherwise enables the behavior I'm looking for.

I think the implementations of the buffer property in ReadbackBuffer with return overload not seems bring real benefits

It absolutely does. The buffer attached to "ReadbackBuffer" is the buffer that has been mapped and is the buffer that will be unmapped. This means the given buffer can be set to null or detached at any time. So passing the "ReadbackBuffer" instance to other parts of an application means they can attach "release" or "dispose" events to it to determine exactly when that buffer will have been invalidated or changed. It also serves as a flag indicating whether the buffer is currently mapped, denoting that it's "in use".

I would prefer to see a simpler function signature with parameter and return overloads:

We can discuss rearranging the arguments, but across the project the "target" object is consistently the final argument in math functions etc. and is returned from the function. I'm not sure why this pattern should be different here. And if the function is going to work without passing a "readbackBuffer" as an option, then my proposed order still allows you to read back a smaller buffer with "count" and "offset".

@sunag
Collaborator

sunag commented Apr 5, 2026

Could you make a simple fiddle, as you usually do, for your use case based on this PR?

@sunag
Collaborator

sunag commented Apr 5, 2026

I think we should avoid cloning the ArrayBuffer in the core; the user can easily do this with slice() after obtaining it, if needed. It is more common for the user to manipulate the ArrayBuffer in one frame and expect the changes in another; if they need something more specific like this, they can still do it:

let buff0, buff1;
renderer.compute( kernel, [ 1000, 1, 1 ] );
buff0 = ( await renderer.getArrayBufferAsync( attr ) ).slice( 0 );

renderer.compute( kernel, [ 1000, 1, 1 ] );
buff1 = ( await renderer.getArrayBufferAsync( attr ) ).slice( 0 );
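The slice( 0 ) in that pattern matters because ArrayBuffer.prototype.slice returns an independent copy of the bytes, so later writes to the original do not affect it. A quick standalone illustration:

```javascript
// ArrayBuffer.slice( 0 ) produces an independent copy of the bytes.
const original = new ArrayBuffer( 4 );
const view = new Uint8Array( original );
view[ 0 ] = 1;

const copy = original.slice( 0 );

// mutate the original after copying
view[ 0 ] = 99;

console.log( new Uint8Array( copy )[ 0 ] );     // 1 (unchanged)
console.log( new Uint8Array( original )[ 0 ] ); // 99
```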

@sunag
Collaborator

sunag commented Apr 5, 2026

I'm not sure why this pattern should be different here.

The count, offset signature is very unusual; people are used to offset, size. It would be better to use the term "size" as well, since we refer to these same terms in other related places. I think we can add an internal parameter overload to bring this benefit to the user.

@sunag
Collaborator

sunag commented Apr 5, 2026

All of my comments on why this is a problem seem to continually be ignored and I'm at a complete loss for why.

Sorry if I wasn't able to be that specific, but this problem can be solved as shown below using the changes made in #33315. You can paste this code into the webgpu_compute_audio example. I don't think it's justifiable to clone the GPU/CPU buffer on every request made via getArrayBufferAsync( attribute ) just to support this use case as proposed in this PR, since the previous change benefits the most common use cases.

If the user wants to obtain multiple fractions of the same GPU buffer in parallel, they should use ReadbackBuffer, even to better manage this, since it's moving toward something more advanced. The user will usually read back portions to optimize processing, and recreating the GPU/CPU buffer every time is counterintuitive to that.

async function getArrayBuffer() {

	const buffer = new THREE.ReadbackBuffer( waveArray.value.array.byteLength );

	const array = ( await renderer.getArrayBufferAsync( waveArray.value, buffer ) ).slice( 0 );
	buffer.dispose();

	return array;

}

let c1 = getArrayBuffer();
let c2 = getArrayBuffer();

const [ waveBuffer1, waveBuffer2 ] = await Promise.all( [ c1, c2 ] );

const wave = new Float32Array( waveBuffer2 );
