-
-
Notifications
You must be signed in to change notification settings - Fork 2
Set the language codes in the build execution data for echo builds #853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #853 +/- ##
=======================================
Coverage 66.17% 66.17%
=======================================
Files 382 382
Lines 20793 20793
Branches 2721 2721
=======================================
Hits 13760 13760
Misses 6067 6067
Partials 966 966 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @pmachapman).
src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):
ExecutionData = new ExecutionData { TrainCount = 0,
I wonder about making these values reflect the actual number of training rows/inference rows even though the Echo engine isn't really using that data per se. Do you think it makes more sense just to have them be 0?
If you did the UpdateBuildExecutionDataAsync call after the parallel corpus preprocessing, you could get those values during preprocessing like we do with the true MT engine builds and incorporate them in the execution data.
...but maybe this spirals into including other 'realistic' values like those for the quote convention analysis and build warnings which we probably do not want to do. On the other hand, getting these values would be easy since we're already running preprocessing on the corpora. What do you think?
988d64a to
55c0925
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmachapman made 1 comment.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93).
src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I wonder about making these values reflect the actual number of training rows/inference rows even though the Echo engine isn't really using that data per se. Do you think it makes more sense just to have them be 0?
If you did the
UpdateBuildExecutionDataAsynccall after the parallel corpus preprocessing, you could get those values during preprocessing like we do with the true MT engine builds and incorporate them in the execution data....but maybe this spirals into including other 'realistic' values like those for the quote convention analysis and build warnings which we probably do not want to do. On the other hand, getting these values would be easy since we're already running preprocessing on the corpora. What do you think?
Done. Sounds good. Thank you!
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @pmachapman).
src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Done. Sounds good. Thank you!
Thank you! I think we should mirror the counting of the training rows and inference rows here exactly as in
serval/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs
Line 54 in 226d952
| await ParallelCorpusPreprocessingService.PreprocessAsync( |
serval/src/Machine/src/Serval.Machine.Shared/Services/WordAlignmentPreprocessBuildJob.cs
Line 54 in 226d952
| await ParallelCorpusPreprocessingService.PreprocessAsync( |
ddaspit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddaspit reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman).
55c0925 to
40c021f
Compare
pmachapman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pmachapman made 1 comment.
Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93).
src/Echo/src/EchoEngine/TranslationEngineServiceV1.cs line 135 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Thank you! I think we should mirror the counting of the training rows and inference rows here exactly as in
andserval/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs
Line 54 in 226d952
await ParallelCorpusPreprocessingService.PreprocessAsync( - notice that we don't increment on every call.serval/src/Machine/src/Serval.Machine.Shared/Services/WordAlignmentPreprocessBuildJob.cs
Line 54 in 226d952
await ParallelCorpusPreprocessingService.PreprocessAsync(
Done.
Enkidu93
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 reviewed 2 files and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @pmachapman).
Fixes #840
This change is