Langid: Train for remaining languages that weren't in opus-100

The repo contains trained model files, lid.beta.ftz and lid.release.ftz, covering the languages that were in the opus-100 corpus. We should get corpora for the languages that weren't there and retrain, preferably in a fairly reproducible way (see the scripts in ./ft-train).
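fastText's supervised mode expects one example per line, with the label written as `__label__<name>` in front of the text. Below is a minimal sketch of preparing a new corpus in that format. It assumes each corpus is a plain iterable of sentences and that the label is whatever code the existing models use. The actual label scheme in ./ft-train may differ, and `to_fasttext_lines` is a hypothetical helper, not something from the repo.

```python
import re
from typing import Iterable, Iterator


def to_fasttext_lines(label: str, sentences: Iterable[str]) -> Iterator[str]:
    """Turn raw corpus sentences into fastText supervised-training lines.

    `label` is assumed to match the codes used by the existing lid models
    (hypothetical; check the ./ft-train scripts for the real scheme).
    """
    for sentence in sentences:
        # Collapse internal whitespace (tabs, newlines, repeated spaces) to one space
        text = re.sub(r"\s+", " ", sentence).strip()
        if not text:
            # Skip blank lines so they don't become label-only training examples
            continue
        yield f"__label__{label} {text}"
```

For example, `list(to_fasttext_lines("oc", ["Bonjorn  a totes\n", "   "]))` yields a single line, `__label__oc Bonjorn a totes`, which can be appended to the shared training file alongside the opus-100 languages.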

Corpus suggestions: https://github.com/apertium/apertium-apy/pull/207#issuecomment-1398455482

Missing in release:

    Got only 35791 lines for oci oc
    Got only 35907 lines for sme se
    Got only 67312 lines for bel be
    Got only 6961 lines for arg an
    Got only 79927 lines for kaz kk
    No corpus found for crh
    No corpus found for frp
    No corpus found for szl
    No corpus found for zlm
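The log above reflects a per-language check that runs before training: a language is left out either because its corpus has too few lines or because no corpus exists at all. Below is a minimal sketch that reproduces that message format. It assumes a hypothetical `MIN_LINES` threshold, since the real cutoff lives in the ./ft-train scripts and is not stated in this issue. It also assumes that `None` stands for a missing corpus.

```python
from typing import Optional

# Hypothetical cutoff; the real value is defined in the ./ft-train scripts
MIN_LINES = 100_000


def coverage_message(
    iso3: str,
    iso1: Optional[str],
    n_lines: Optional[int],
    min_lines: int = MIN_LINES,
) -> Optional[str]:
    """Return a log message if a language can't be trained, else None."""
    if n_lines is None:
        # No corpus file at all (e.g. crh, frp, szl, zlm in the log above)
        return f"No corpus found for {iso3}"
    if n_lines < min_lines:
        # Some languages have no two-letter code, so print it only if present
        codes = f"{iso3} {iso1}" if iso1 else iso3
        return f"Got only {n_lines} lines for {codes}"
    return None
```

Under this sketch, `coverage_message("oci", "oc", 35791)` returns `"Got only 35791 lines for oci oc"`, and a language whose corpus clears the threshold returns `None` and goes on to training.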

Full list of missing languages for beta and release: https://github.com/apertium/apertium-apy/blob/master/ft-train/download-extract-corpus#L56

Langid: Train for remaining languages that weren't in opus-100 #213