Langid: Train for remaining languages that weren't in opus-100 #213

@unhammer

We have trained model files lid.beta.ftz and lid.release.ftz in the repo, covering the languages that were in the opus-100 corpus. We should get corpora for the languages that weren't there and retrain, preferably in a fairly reproducible way (see the scripts in ./ft-train).

Corpus suggestions: #207 (comment)

Missing in release:

Got only 35791 lines for oci oc
Got only 35907 lines for sme se
Got only 67312 lines for bel be
Got only 6961 lines for arg an
Got only 79927 lines for kaz kk
No corpus found for crh
No corpus found for frp
No corpus found for szl
No corpus found for zlm

Full missing-list for beta and release: https://github.com/apertium/apertium-apy/blob/master/ft-train/download-extract-corpus#L56
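
To keep track of which languages still need corpora, the log lines above can be turned into a small table. This is only a rough sketch: it assumes the log format is exactly the two line shapes quoted in this issue ("Got only N lines for xxx yy" and "No corpus found for xxx"), and parse_log is a hypothetical helper name, not something that exists in ./ft-train.

```python
import re

# Log text copied verbatim from the "Missing in release" list in this issue.
LOG = """\
Got only 35791 lines for oci oc
Got only 35907 lines for sme se
Got only 67312 lines for bel be
Got only 6961 lines for arg an
Got only 79927 lines for kaz kk
No corpus found for crh
No corpus found for frp
No corpus found for szl
No corpus found for zlm
"""

# Assumed line shapes; adjust if the real script prints something different.
SHORT = re.compile(r"^Got only (\d+) lines for (\w+) (\w+)$")
MISSING = re.compile(r"^No corpus found for (\w+)$")


def parse_log(text):
    """Hypothetical helper: return (short, missing).

    short maps a three-letter language code to the number of lines found;
    missing lists the codes for which no corpus was found at all.
    """
    short, missing = {}, []
    for line in text.splitlines():
        line = line.strip()
        if m := SHORT.match(line):
            short[m.group(2)] = int(m.group(1))
        elif m := MISSING.match(line):
            missing.append(m.group(1))
    return short, missing


short, missing = parse_log(LOG)

# Print the smallest corpora first, since those need the most extra data.
for code, n in sorted(short.items(), key=lambda kv: kv[1]):
    print(f"{code}\t{n}")
print("no corpus:", " ".join(missing))
```

Sorting by line count puts arg (6961 lines) at the top, followed by the other under-resourced languages, with the four languages that have no corpus at all listed last.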
