Set the language codes in the build execution data for echo builds #853

pmachapman · 2026-01-14T17:14:01Z

Fixes #840

codecov-commenter · 2026-01-14T17:31:07Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.17%. Comparing base (226d952) to head (40c021f).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #853   +/-   ##
=======================================
  Coverage   66.17%   66.17%           
=======================================
  Files         382      382           
  Lines       20793    20793           
  Branches     2721     2721           
=======================================
  Hits        13760    13760           
  Misses       6067     6067           
  Partials      966      966

Enkidu93

@Enkidu93 reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @pmachapman).

src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):

                            ExecutionData = new ExecutionData
                            {
                                TrainCount = 0,

I wonder about making these values reflect the actual number of training rows/inference rows even though the Echo engine isn't really using that data per se. Do you think it makes more sense just to have them be 0?

If you did the UpdateBuildExecutionDataAsync call after the parallel corpus preprocessing, you could get those values during preprocessing like we do with the true MT engine builds and incorporate them in the execution data.

...but maybe this spirals into including other 'realistic' values like those for the quote convention analysis and build warnings which we probably do not want to do. On the other hand, getting these values would be easy since we're already running preprocessing on the corpora. What do you think?

pmachapman

@pmachapman made 1 comment.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93).

src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

I wonder about making these values reflect the actual number of training rows/inference rows even though the Echo engine isn't really using that data per se. Do you think it makes more sense just to have them be 0?

If you did the UpdateBuildExecutionDataAsync call after the parallel corpus preprocessing, you could get those values during preprocessing like we do with the true MT engine builds and incorporate them in the execution data.

...but maybe this spirals into including other 'realistic' values like those for the quote convention analysis and build warnings which we probably do not want to do. On the other hand, getting these values would be easy since we're already running preprocessing on the corpora. What do you think?

Done. Sounds good. Thank you!

Enkidu93

@Enkidu93 reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @pmachapman).

src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):

Previously, pmachapman (Peter Chapman) wrote…

Done. Sounds good. Thank you!

Thank you! I think we should mirror the counting of the training rows and inference rows here exactly as in

serval/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs

Line 54 in 226d952

await ParallelCorpusPreprocessingService.PreprocessAsync(

and

serval/src/Machine/src/Serval.Machine.Shared/Services/WordAlignmentPreprocessBuildJob.cs

Line 54 in 226d952

await ParallelCorpusPreprocessingService.PreprocessAsync(

- notice that we don't increment on every call.
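The point about not incrementing on every call can be illustrated with a minimal sketch. This is not the Serval code: `Preprocess`, the tuple row shape, and both counting conditions are assumptions made for illustration, and the real `ParallelCorpusPreprocessingService.PreprocessAsync` signature is not reproduced here. The only idea taken from the comment above is that the callbacks run for every row while the counters advance only for rows that qualify.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the preprocessing service: it walks the rows and
// invokes one callback per training row and another per inference row.
static void Preprocess(
    IEnumerable<(string Source, string Target, bool IsTraining)> rows,
    Action<string, string> onTrain,
    Action<string, string> onInference)
{
    foreach (var (source, target, isTraining) in rows)
    {
        if (isTraining)
            onTrain(source, target);
        else
            onInference(source, target);
    }
}

var rows = new List<(string Source, string Target, bool IsTraining)>
{
    ("In the beginning", "Au commencement", true),
    ("", "", true),               // empty row: callback fires, nothing is counted
    ("And God said", "", false),  // source only: counted as an inference row
    ("", "", false),              // empty row: callback fires, nothing is counted
};

int trainCount = 0;
int inferenceCount = 0;

Preprocess(
    rows,
    (source, target) =>
    {
        // Assumed condition: count only rows with text on both sides.
        if (source.Length > 0 && target.Length > 0)
            trainCount++;
    },
    (source, target) =>
    {
        // Assumed condition: count only rows that have source text.
        if (source.Length > 0)
            inferenceCount++;
    });

Console.WriteLine($"TrainCount={trainCount}, InferenceCount={inferenceCount}");
// prints "TrainCount=1, InferenceCount=1"
```

With counters gathered this way, the execution data could be updated once after preprocessing finishes, rather than being written with zeros beforehand.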

ddaspit

@ddaspit reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman).

pmachapman

@pmachapman made 1 comment.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93).

src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Thank you! I think we should mirror the counting of the training rows and inference rows here exactly as in

serval/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs

Line 54 in 226d952

await ParallelCorpusPreprocessingService.PreprocessAsync(

and

serval/src/Machine/src/Serval.Machine.Shared/Services/WordAlignmentPreprocessBuildJob.cs

Line 54 in 226d952

await ParallelCorpusPreprocessingService.PreprocessAsync(

- notice that we don't increment on every call.

Done.

Enkidu93

@Enkidu93 reviewed 2 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @pmachapman).

pmachapman requested review from Enkidu93 and ddaspit January 14, 2026 17:14

Enkidu93 requested changes Jan 14, 2026

pmachapman force-pushed the echo_executiondata branch from 988d64a to 55c0925 Compare January 14, 2026 18:28

pmachapman commented Jan 14, 2026

Enkidu93 requested changes Jan 15, 2026

ddaspit approved these changes Jan 15, 2026

Set the language codes in the build execution data for echo builds

40c021f

pmachapman force-pushed the echo_executiondata branch from 55c0925 to 40c021f Compare January 18, 2026 19:49

pmachapman commented Jan 18, 2026

Enkidu93 approved these changes Jan 19, 2026

pmachapman merged commit 3e5fc0c into main Jan 19, 2026
3 checks passed

pmachapman deleted the echo_executiondata branch January 19, 2026 18:06
