Skip to content

Add C API to unset all query buffers.#5783

Merged
teo-tsirpanis merged 4 commits intomainfrom
teo/core-501-add-c-api-to-unset-all-query-buffers
Mar 19, 2026
Merged

Add C API to unset all query buffers.#5783
teo-tsirpanis merged 4 commits intomainfrom
teo/core-501-add-c-api-to-unset-all-query-buffers

Conversation

@teo-tsirpanis
Copy link
Copy Markdown
Member

@teo-tsirpanis teo-tsirpanis commented Mar 17, 2026

This PR adds new C and C++ APIs that unset all previously set query buffers. It can be useful to provide additional memory safety by guaranteeing that the Core will not hold references to user buffers after a certain point, which will be used by TileDB-rs to safely decouple the buffers' lifetime from the query object (CORE-502).


TYPE: C_API
DESC: Add tiledb_query_unset_buffers function, that unsets all previously set buffers.


TYPE: CPP_API
DESC: Add Query::unset_buffers function, that unsets all previously set buffers.

@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-501-add-c-api-to-unset-all-query-buffers branch from 9e653ab to e44cf5a Compare March 17, 2026 19:08
@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review March 17, 2026 19:44
@teo-tsirpanis teo-tsirpanis requested review from rroelke and ypatia March 17, 2026 19:44
Comment on lines +559 to +574
/**
* Unsets all buffers set by previous calls to tiledb_query_set_data_buffer,
* tiledb_query_set_offsets_buffer, and tiledb_query_set_validity_buffer,
* resetting the query to a state as if no buffers had been set.
*
* This function can be useful to call before releasing the memory from the
* query buffers, providing additional memory safety.
*
* @param query The TileDB query.
*
* @return `TILEDB_OK` for success and `TILEDB_ERR` for error. The function will
* return an error only if the query pointer is invalid.
*/
TILEDB_EXPORT int32_t tiledb_query_unset_buffers(tiledb_query_t* query)
TILEDB_NOEXCEPT;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the idea of not accepting a context here, because this API can only fail if called with a null pointer. Let me know what you think.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

@rroelke
Copy link
Copy Markdown
Member

rroelke commented Mar 18, 2026

This implementation looks fine to me - but I wonder if it's the right way to solve the problem, which as I understand it is that we can't detach buffers from the Rust query object in order to fill them with the next round of global order write data.

I've suggested before that we tried replacing the tiledb-rs query buffer API with Arrow to help with this problem. That's a very high-altitude fruit. Maybe there's a lower-hanging fruit which would be a better way to address this, just from the Rust side.

fn set_buffer<'data>(&mut Self, field: &str, &'data mut [u8]) -> BufferGuard<'data>;

This pattern didn't occur to us 2 years ago; but now maybe it's something we can try. The BufferGuard mutably borrows the data buffer and keeps it attached to the query during its lifetime (so maybe we would need to unattach a specific buffer). Perhaps it could provide mutable access to the buffer as well so that the user can modify data without needing to detach/reattach to the query.

What do you think?

If there's other potential uses of this C API then I'll approve anyway.

@teo-tsirpanis
Copy link
Copy Markdown
Member Author

teo-tsirpanis commented Mar 18, 2026

I am not a fan of replicating the set_***_buffer API shape to Rust because I do not believe that it is the most fitting UX for reading and writing TileDB arrays. Rust makes it more evident because we could not make the first iteration of the TileDB-rs query API look like this, and there are other problems with manually setting buffers, like having to mutably borrow buffers even for write queries.

For me, the ideal UX for writing arrays would look like this (see also TileDB-Inc/TileDB-Py#2237 (comment)):

// One-shot
let array = open_array();
array.write({"attr1": data, "attr2": data2}, other_options);

// Incremental
let writer = array.start_global_order_write(options);
while (…)
{
    writer.write({"attr1": data, "attr2": data2});
}
writer.finish();

TileDB-rs is halfway through realizing this vision, and thus manually setting buffers would be IMO a step backwards.

If there's other potential uses of this C API

APIs for garbage-collected languages like C# and Go can unpin the query buffers after unsetting them from a query (e.g. after each global order write step or when resizing buffers in incomplete reads), allowing to increase memory efficiency.

@teo-tsirpanis teo-tsirpanis requested a review from rroelke March 18, 2026 16:00
@rroelke
Copy link
Copy Markdown
Member

rroelke commented Mar 19, 2026

I am not a fan of replicating the set_***_buffer API shape to Rust because I do not believe that it is the most fitting UX for reading and writing TileDB arrays. Rust makes it more evident because we could not make the first iteration of the TileDB-rs query API look like this, and there are other problems with manually setting buffers, like having to mutably borrow buffers even for write queries.

For me, the ideal UX for writing arrays would look like this (see also TileDB-Inc/TileDB-Py#2237 (comment)):

// One-shot
let array = open_array();
array.write({"attr1": data, "attr2": data2}, other_options);

// Incremental
let writer = array.start_global_order_write(options);
while (…)
{
    writer.write({"attr1": data, "attr2": data2});
}
writer.finish();

I'll nitpick you a bit here to point out that there's an implicit map construction here which we probably ought to avoid. Or at least have the option to avoid. One idiomatic way would be to do

fn write<I>(&mut self, fields: I) -> Result where I: IntoIterator<Item = (&str, Buffer)>

The needs for write and read queries are quite different and I definitely agree with you that write queries should not need to do a mutable borrow. You didn't specifically say so but it seems to me like you would be on board to having entirely different code pathways for reads and writes.

TileDB-rs is halfway through realizing this vision, and thus manually setting buffers would be IMO a step backwards.

If there's other potential uses of this C API

APIs for garbage-collected languages like C# and Go can unpin the query buffers after unsetting them from a query (e.g. after each global order write step or when resizing buffers in incomplete reads), allowing to increase memory efficiency.

Does the query object in these languages also pin the buffers? Surely it must. Wouldn't garbage collection clear the query, and then clear the buffers after that?

I approved anyway - I'm just curious.

@teo-tsirpanis
Copy link
Copy Markdown
Member Author

You didn't specifically say so but it seems to me like you would be on board to having entirely different code pathways for reads and writes.

Definitely; Query has been a god object with so many responsibilities that should be split; at least at high-level APIs first.

Does the query object in these languages also pin the buffers? Surely it must. Wouldn't garbage collection clear the query, and then clear the buffers after that?

Yes, query buffers are pinned before passing them to the C API. Because garbage collection finalizers run in a non-deterministic order, there is a single "query handle" object that manages both the tiledb_query_t* and the buffer pins. When this object gets finalized (which happens if the user has not manually called Dispose()/Free()), we call tiledb_query_free and unpin the buffers afterwards.

https://github.com/TileDB-Inc/TileDB-CSharp/blob/60f47ad9be7071b6f91b19945aeebb50f2a5e252/sources/TileDB.CSharp/Marshalling/SafeHandles/QueryHandle.cs#L54-L77

https://github.com/TileDB-Inc/TileDB-Go/blob/656934b708aa4ff8a98c3b925b80c082ab583eb8/query.go#L21-L30


REST-CI has succeeded on rerun (unfortunately it's not visible from this page), so merging…

@teo-tsirpanis teo-tsirpanis merged commit 16cf508 into main Mar 19, 2026
56 of 57 checks passed
@teo-tsirpanis teo-tsirpanis deleted the teo/core-501-add-c-api-to-unset-all-query-buffers branch March 19, 2026 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants