[Project]: Language community health #158

@Jibec

Project Name (1 - 3 words)

Language community health

Description

The free software community is huge.
Most users are not English speakers (80% of the world does not speak English).
Translators have to go and translate in every place where software is produced.

How do we assess the current translation progress for a specific language, as seen by the end user?
How healthy is a language community? Is it growing over time or not?
What is the language coverage for a specific country?
How do we identify successful projects, and how did they achieve that success?
How do we identify issues in the way the free software community handles languages?
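
One possible way to make the first two questions concrete, as a minimal sketch only: treat progress for a language as the share of translatable words that are translated, summed across packages, and compare that share across releases. The record layout (`release`, `package`, `lang`, `translated_words`, `total_words`), the function names, and the sample numbers are all hypothetical. They are not the format produced by fedora-localization-statistics.

```python
from collections import defaultdict

# Hypothetical records: one per (release, package, language) measurement.
# Field names and values are illustrative assumptions, not real Fedora data.
RECORDS = [
    {"release": 38, "package": "pkg-a", "lang": "fr", "translated_words": 800, "total_words": 1000},
    {"release": 38, "package": "pkg-b", "lang": "fr", "translated_words": 100, "total_words": 1000},
    {"release": 39, "package": "pkg-a", "lang": "fr", "translated_words": 900, "total_words": 1000},
    {"release": 39, "package": "pkg-b", "lang": "fr", "translated_words": 300, "total_words": 1000},
]


def progress_by_release(records, lang):
    """Return {release: translated share in [0, 1]} for one language."""
    translated = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        if rec["lang"] != lang:
            continue
        translated[rec["release"]] += rec["translated_words"]
        total[rec["release"]] += rec["total_words"]
    # Skip releases with nothing to translate to avoid dividing by zero.
    return {rel: translated[rel] / total[rel] for rel in sorted(total) if total[rel] > 0}


def trend(progress):
    """Difference between the latest and the earliest release share."""
    releases = sorted(progress)
    if len(releases) < 2:
        return 0.0
    return progress[releases[-1]] - progress[releases[0]]


fr_progress = progress_by_release(RECORDS, "fr")
print(fr_progress)
print(trend(fr_progress))
```

Weighting by word count means a large, poorly translated package pulls the figure down more than a small one. Whether that matches what an end user actually sees is one of the open questions above.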

This work focuses on the Fedora operating system, and could in theory be reused for any operating system.

Help required:

  • statistics production is complex and a little bit broken -> infrastructure help (maybe from the Fedora community?)
  • statistics are published as files, which makes them hard to share with stakeholders -> dev help
  • there are a lot of statistics: each file is a measurement (200k files for Fedora 39), and there are thousands of packages and hundreds of languages. I have trouble making sense of all this data -> data science help
  • there are millions of different situations; how do we detect processing bugs that affect the results? -> data science help
  • as the subject is wide, how do we identify the key metrics to focus on for communication purposes? -> meetings with language-focused groups and stakeholders
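
As a starting point for the question about detecting processing bugs, here is a rough sketch of rule-based sanity checks over the measurements. It assumes each measurement can be loaded as a record with hypothetical fields `release`, `package`, `lang`, `translated_words` and `total_words`. The function name, the `max_drop` threshold and the sample data are illustrative, not part of the existing tooling.

```python
def find_suspect_records(records, max_drop=0.5):
    """Return (record, reason) pairs that deserve a manual look.

    Assumed rules: counts must be non-negative, translated words cannot
    exceed total words, and the translated share for a (package, language)
    pair should not fall by more than `max_drop` between consecutive
    measurements.
    """
    suspects = []
    previous_share = {}  # (package, lang) -> share seen in the previous release
    for rec in sorted(records, key=lambda r: r["release"]):
        translated, total = rec["translated_words"], rec["total_words"]
        if translated < 0 or total < 0:
            suspects.append((rec, "negative count"))
            continue
        if translated > total:
            suspects.append((rec, "translated exceeds total"))
            continue
        if total == 0:
            continue  # nothing to translate, so no ratio to compare
        share = translated / total
        key = (rec["package"], rec["lang"])
        if key in previous_share and previous_share[key] - share > max_drop:
            suspects.append((rec, "sudden drop since previous release"))
        previous_share[key] = share
    return suspects


# Illustrative data only.
SAMPLE = [
    {"release": 38, "package": "pkg-a", "lang": "fr", "translated_words": 900, "total_words": 1000},
    {"release": 39, "package": "pkg-a", "lang": "fr", "translated_words": 100, "total_words": 1000},
    {"release": 39, "package": "pkg-b", "lang": "fr", "translated_words": 1200, "total_words": 1000},
]

for record, reason in find_suspect_records(SAMPLE):
    print(record["release"], record["package"], record["lang"], reason)
```

A sudden drop is not necessarily a bug (a package may have added many new strings), so the output is a list of candidates to review rather than a verdict.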

These few items alone may take years, and since there is global access to translated content, a lot of interesting projects could be built on top of the translation dataset (so we are sure to be busy in the next life too).

Related Links

Some statistics are published here: https://languages.fedoraproject.org
Source code is here: https://pagure.io/fedora-l10n/fedora-localization-statistics/tree/staging (only tested with the Fedora operating system)

How would you like to be involved in this project?

Other (please specify in the “Additional Notes” at the end of this form)

Additional Notes.

I'll be happy to help here and continue to manage this project (this is not my day job), but not on the data science part, in which I have no expertise.
While I'm comfortable learning new tools and techniques, I am not looking to become a data scientist.
