sentence-transformers as an optional dependency (lose: sem_join, sem_topk) #289

Open
andrewjradcliffe wants to merge 1 commit into mitdbg:main from andrewjradcliffe:ajr/optional-sentence-transformers

@andrewjradcliffe

Motivation: there are scenarios in which the torch dependency implied by sentence-transformers can prevent the inclusion of this package in an application (binary size). With this modification, sem_join and sem_topk will simply throw an exception, but the rest of the semantic operators are unaffected. Not ideal, but has the intended effect.
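
A minimal sketch of the kind of guard described above, assuming the check happens when an operator is invoked rather than at import time. The names `require_optional` and `MissingOptionalDependency` are hypothetical and are not taken from this PR's diff.

```python
import importlib.util


class MissingOptionalDependency(ImportError):
    """Raised when an operator needs a package that is not installed."""


def require_optional(module_name: str, operator_name: str) -> None:
    """Raise a descriptive error if `module_name` cannot be imported.

    Hypothetical helper: it only checks that the module can be located,
    so the heavy import (and its torch dependency) is never triggered here.
    """
    if importlib.util.find_spec(module_name) is None:
        raise MissingOptionalDependency(
            f"{operator_name} requires the optional package "
            f"'{module_name}'; install it to enable this operator."
        )


def sem_topk_entrypoint(*args, **kwargs):
    # Hypothetical stand-in for an operator that depends on embeddings.
    # Operators that do not need sentence-transformers skip this check.
    require_optional("sentence_transformers", "sem_topk")
    raise NotImplementedError("real operator logic would go here")
```

With a guard like this, importing the package succeeds without `sentence-transformers`, and only a call to `sem_join` or `sem_topk` fails, with a message that names the missing package.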

@mdr223
Collaborator

mdr223 commented Mar 9, 2026

Hi @andrewjradcliffe, thanks again for opening this PR!

This PR gets at a major pain point that I would like to address (narrowing the scope of PZ dependencies, especially hefty ones like sentence-transformers and torch). However, we also want to be able to support multimodal queries for all operators out-of-the-box. The one issue with making sentence-transformers optional is that it removes the built-in support for some of our optimized multimodal semantic join and top-k implementations.

As a result, I'm going to hold this PR in limbo for at least one more week so that I have time to implement a fallback strategy for Semantic Top-K which will create a textual description of the image and then embed that description. In the long term, my hope is that the frontier labs will release multimodal embedding models which will make this problem moot.
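
A rough sketch of the fallback idea described above (describe each image in text, embed the description, then rank by similarity to the query). This is an illustration under stated assumptions, not PZ's implementation: `describe_image` and `embed_text` are hypothetical callables standing in for whatever captioning model and text embedding model would actually be used.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topk_via_descriptions(
    query: str,
    images: Sequence[bytes],
    k: int,
    describe_image: Callable[[bytes], str],  # hypothetical captioning call
    embed_text: Callable[[str], Vector],     # hypothetical text embedder
) -> List[int]:
    """Return indices of the k images whose descriptions best match `query`."""
    query_vec = embed_text(query)
    scored = []
    for idx, image in enumerate(images):
        description = describe_image(image)  # image -> text
        scored.append((cosine(query_vec, embed_text(description)), idx))
    # Highest similarity first; keep only the top k indices.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

Because both model calls are passed in as plain callables, a text-only embedding API could be plugged in without pulling in `torch`, at the cost of one extra description call per image.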

@andrewjradcliffe
Author

Quite understandable and there's no rush. I was viewing it from the perspective of using the optional-dependencies elements as feature flags.

Admittedly, in Rust the cfg attribute and the inherent strictness of the compiler reduce the cognitive burden to almost nil (e.g. try to use Top-K without the feature enabled: the code won't compile and the compiler returns an informative error).

I think the principle of least surprise should apply, which implies that adding another flag (--extra) is not ideal.
