Context support #9

@domenic

It's often useful to give context when doing translations. For example, "Turkey" might be translated as 七面鳥 or as トルコ depending on the context.

I've found some evidence of people attempting to give context to non-LLM translation models by using punctuation and sentence structure. For example, if you translate "Togo" by itself with Google Translate, you get "持ち帰り" (basically, "take out" or "to go" for food orders). But if you translate "The country: Togo", you get "国: トーゴ", which is correct. People then build hacks on top of this, hoping that the output stays consistent across slightly varied inputs so that they can use regular expressions or other programming tricks to pull out the "トーゴ" part.

This is fragile. For example, translating "The country: America" gives "国:アメリカ". This is also correct, but the colon punctuation is different: it's a full-width (Japanese) colon for this second example, instead of a half-width (Latin script) colon. So a developer's first-draft code would not work.
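
To make the failure mode concrete, here is a minimal sketch of the kind of extraction hack described above, run against the two outputs quoted in this issue. The function names `naiveExtract` and `tolerantExtract` are hypothetical and only for illustration; the regular expressions are one guess at what a first draft and a patched second draft might look like, not code taken from any real project.

```javascript
// The two translated strings quoted above, for
// "The country: Togo" and "The country: America".
const samples = ["国: トーゴ", "国:アメリカ"];

// First-draft hack (hypothetical): assumes a half-width colon
// followed by a single space, as seen in the Togo output.
function naiveExtract(translated) {
  const match = /^[^:]*: (.+)$/.exec(translated);
  return match ? match[1] : null;
}

// Patched hack (hypothetical): accepts either a half-width colon (U+003A)
// or a full-width colon (U+FF1A), with optional whitespace after it.
function tolerantExtract(translated) {
  const match = /^[^:\uFF1A]*[:\uFF1A]\s*(.+)$/u.exec(translated);
  return match ? match[1] : null;
}

for (const sample of samples) {
  console.log(sample, "naive:", naiveExtract(sample), "tolerant:", tolerantExtract(sample));
}
```

The first-draft version returns `null` for the America output because it never sees a half-width colon. The patched version handles both samples, but it only covers the two variations observed so far; any other change in punctuation or word order would break it again.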

It would be ideal if we could abstract over this process for developers, using something like:

translator.translate("Togo", { context: "%s is a country name" });
