Batch Open-vocabulary Detection with Grounding Models #18

@NetZissou

About

Add a batch pipeline that takes

  • (a) an image corpus (a folder, or a Parquet file of binary images or URIs), and
  • (b) one or more text labels,

and returns detection boxes (with scores and optional masks) for each image/label pair, using an open-vocabulary grounding model such as OWLv2.
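The pipeline shape described above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: `Detection`, `iter_batches`, `run_batch_detection`, the `detect_fn` interface, the `(x_min, y_min, x_max, y_max)` box convention, and the default threshold are all hypothetical names and choices. The model itself is injected as a callable, so a wrapper around OWLv2 (or any other grounding model) could be plugged in without changing the batching logic.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    image_id: str
    label: str
    score: float
    box: Box


# Hypothetical model interface: given a batch of image ids/paths and the
# label list, return one list of (label_index, score, box) per image.
DetectFn = Callable[
    [Sequence[str], Sequence[str]], List[List[Tuple[int, float, Box]]]
]


def iter_batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_batch_detection(
    image_ids: Iterable[str],
    labels: Sequence[str],
    detect_fn: DetectFn,
    batch_size: int = 8,
    score_threshold: float = 0.1,  # arbitrary default, tune per model
) -> List[Detection]:
    """Run detect_fn batch by batch and flatten the kept detections."""
    results: List[Detection] = []
    for batch in iter_batches(image_ids, batch_size):
        per_image = detect_fn(batch, labels)
        if len(per_image) != len(batch):
            raise ValueError("detect_fn must return one result list per image")
        for image_id, raw in zip(batch, per_image):
            for label_idx, score, box in raw:
                if score >= score_threshold:
                    results.append(
                        Detection(image_id, labels[label_idx], score, box)
                    )
    return results
```

With a stub such as `lambda batch, labels: [[(0, 0.9, (0.0, 0.0, 10.0, 10.0))] for _ in batch]` in place of a real model, `run_batch_detection(["a.jpg", "b.jpg"], ["Fish"], stub)` returns one `Detection` per image. Keeping the model behind a callable is a design choice made here for testability; the issue itself does not prescribe it.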

Objective

  • Support open-vocabulary text prompts
    • Single label
    • Multiple labels
  • Run efficiently on GPU(s) with batch inference
  • Emit results in interoperable formats with a stable schema
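One way to read "stable schema" is a fixed, versioned set of columns with one row per detection. The sketch below assumes JSON Lines as the output format purely because it needs only the standard library; the same rows could be written to Parquet instead. The field names, `SCHEMA_VERSION`, and the `mask_rle` column are hypothetical and not part of this issue.

```python
import json
from typing import Dict, Iterable, Optional, Sequence, TextIO

SCHEMA_VERSION = "0.1"  # hypothetical version tag

# Fixed column order; every row must carry exactly these keys.
FIELDS = (
    "schema_version", "image_id", "label", "score",
    "x_min", "y_min", "x_max", "y_max", "mask_rle",
)


def to_row(
    image_id: str,
    label: str,
    score: float,
    box: Sequence[float],
    mask_rle: Optional[str] = None,  # optional mask, encoding left open
) -> Dict[str, object]:
    """Build one detection row with the fixed field set."""
    x_min, y_min, x_max, y_max = (float(v) for v in box)
    return {
        "schema_version": SCHEMA_VERSION,
        "image_id": image_id,
        "label": label,
        "score": float(score),
        "x_min": x_min,
        "y_min": y_min,
        "x_max": x_max,
        "y_max": y_max,
        "mask_rle": mask_rle,
    }


def write_jsonl(rows: Iterable[Dict[str, object]], fp: TextIO) -> int:
    """Write rows as JSON Lines, rejecting rows that drift from FIELDS."""
    count = 0
    for row in rows:
        if tuple(row) != FIELDS:
            raise ValueError(f"row does not match schema: {sorted(row)}")
        fp.write(json.dumps(row) + "\n")
        count += 1
    return count
```

Rejecting rows whose keys differ from `FIELDS` keeps downstream readers from silently seeing a changed layout; a detection without a mask still carries the `mask_rle` key, set to `null`.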

Examples

Single-label Detection

- RGB Image
- Text Label: ["Fish"]
Image

Multi-label Detection

- RGB Image
- Text Label: ["coffee mug", "plate", "spoon"]
Image
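For the multi-label case, some grounding-model APIs take the text queries as one list of labels per image and report each detection's label as an integer index into that list. Assuming that convention (the issue does not specify one), the helpers below show how the label list might be repeated across a batch and how indices could be mapped back to label strings. `build_text_queries` and `group_by_label` are hypothetical names, and the scores in the usage note are made up for illustration.

```python
from typing import Dict, List, Sequence


def build_text_queries(labels: Sequence[str], num_images: int) -> List[List[str]]:
    """Repeat the same label list once per image in the batch."""
    return [list(labels) for _ in range(num_images)]


def group_by_label(
    labels: Sequence[str],
    label_indices: Sequence[int],
    scores: Sequence[float],
) -> Dict[str, List[float]]:
    """Map integer label indices back to label strings, grouping scores."""
    grouped: Dict[str, List[float]] = {label: [] for label in labels}
    for idx, score in zip(label_indices, scores):
        grouped[labels[idx]].append(float(score))
    return grouped
```

For example, `group_by_label(["coffee mug", "plate", "spoon"], [1, 0, 1], [0.8, 0.6, 0.3])` returns `{"coffee mug": [0.6], "plate": [0.8, 0.3], "spoon": []}`. Labels with no detections stay in the result with an empty list, so every requested label is accounted for.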

Labels: enhancement (New feature or request)
