update datafusion to 52 & update deps #525

Merged
phillipleblanc merged 10 commits into main from datafusion_52
Feb 26, 2026
@hozan23
Collaborator

@hozan23 hozan23 commented Jan 13, 2026

No description provided.

@hozan23
Collaborator Author

hozan23 commented Jan 14, 2026

Since datafusion removed the pyarrow feature, we should migrate to datafusion-python. However, version 52 of datafusion-python has not yet been published, which is why we encountered the previous issue with CI. Let's wait for it.

@hozan23
Collaborator Author

hozan23 commented Feb 3, 2026

Updates:
datafusion-python has merged the branch preparing for version 52, but it hasn't been released yet. I am tracking the release and waiting for version 52 to be published.

@hozan23 hozan23 force-pushed the datafusion_52 branch 2 times, most recently from 1c64cf8 to f85f7c6 Compare February 3, 2026 11:10
@nuno-faria
Contributor

@hozan23 datafusion-python has been updated: https://crates.io/crates/datafusion-python/52.0.0

The FFI_TableProvider stores a Weak reference to the TaskContextProvider,
but the Arc was created locally and dropped immediately. Use a static
OnceLock to keep it alive. Also update the __datafusion_table_provider__
signature for the v52 protocol and bump datafusion dependency to >=52.0.0.
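
The lifetime issue described in this commit message can be reproduced with the standard library alone. The following is a minimal sketch, not the code from this PR: `ContextProvider` and `TableHandle` are hypothetical stand-ins for the task context provider and for a struct that, like `FFI_TableProvider`, holds only a `Weak` reference to it.

```rust
use std::sync::{Arc, OnceLock, Weak};

// Hypothetical stand-in for the task context provider (not a DataFusion type).
struct ContextProvider {
    name: &'static str,
}

// Hypothetical stand-in for a struct that keeps only a Weak reference
// to its context provider.
struct TableHandle {
    ctx: Weak<ContextProvider>,
}

impl TableHandle {
    // Returns the provider's name if the provider is still alive.
    fn context_name(&self) -> Option<&'static str> {
        self.ctx.upgrade().map(|ctx| ctx.name)
    }
}

// Problem: the only strong reference is a local variable, so the provider
// is dropped when this function returns and the Weak can never upgrade.
fn handle_with_local_arc() -> TableHandle {
    let ctx = Arc::new(ContextProvider { name: "local" });
    TableHandle { ctx: Arc::downgrade(&ctx) }
}

// Fix: a static OnceLock owns the Arc for the lifetime of the process,
// so every Weak created from it stays upgradable.
static CONTEXT: OnceLock<Arc<ContextProvider>> = OnceLock::new();

fn handle_with_static_arc() -> TableHandle {
    let ctx = CONTEXT.get_or_init(|| Arc::new(ContextProvider { name: "static" }));
    TableHandle { ctx: Arc::downgrade(ctx) }
}
```

In this sketch, `handle_with_local_arc().context_name()` returns `None`, while `handle_with_static_arc().context_name()` returns `Some("static")`. The trade-off of the static approach is that the provider is never dropped, which is usually acceptable for a process-wide context.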
@hozan23
Collaborator Author

hozan23 commented Feb 24, 2026

@nuno-faria @phillipleblanc Can you have a final review please before merging to main?

Contributor

@nuno-faria nuno-faria left a comment

Thanks @hozan23, overall LGTM but there appears to be an error when executing the integration tests, namely at test_flight_sql_data_source. The df.count() triggers a panic in arrow:

thread 'flight::test_flight_sql_data_source' panicked at .../arrow-select-57.3.0/src/coalesce.rs:462:9:
assertion `left == right` failed
  left: 2
 right: 0

According to this comment, it appears that this was never supposed to work in the first place. As a workaround, we can replace the df.count() with df.collect().await?.iter().map(|b| b.num_rows()).sum::<usize>().
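
The counting step of that workaround can be sketched without any dependencies. This is only an illustration under assumptions: `Batch` is a hypothetical stand-in for an Arrow record batch that models nothing but its row count, and `total_rows` is an illustrative helper name, not something from this PR.

```rust
// Hypothetical stand-in for an Arrow record batch; only the row count is modeled.
struct Batch {
    rows: usize,
}

impl Batch {
    fn num_rows(&self) -> usize {
        self.rows
    }
}

// Count rows by summing over batches that have already been collected,
// instead of asking the engine to run a count aggregation.
fn total_rows(batches: &[Batch]) -> usize {
    batches.iter().map(|b| b.num_rows()).sum::<usize>()
}
```

Note that collecting materializes every batch in memory before counting, so this is better suited to a test than to large result sets.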

@hozan23
Copy link
Collaborator Author

hozan23 commented Feb 25, 2026

Hi @nuno-faria
I have fixed the df.count() issue you mentioned. I also tried adding the integration tests to CI, but that turned out to be non-trivial, so I reverted those changes and kept only the df.count() fix.

Let me know if it looks good to you so we can merge it.

Contributor

@nuno-faria nuno-faria left a comment

LGTM, thanks again @hozan23.

@phillipleblanc phillipleblanc merged commit ebdf355 into main Feb 26, 2026
11 checks passed
@phillipleblanc phillipleblanc deleted the datafusion_52 branch February 26, 2026 05:25
3 participants