-
-
Notifications
You must be signed in to change notification settings - Fork 7
Improvements to saving confidence, quality estimation, and sample usability #900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…sed indexing to reflect row number in file
benjaminking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjaminking reviewed 6 files and all commit messages, and made 8 comments.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @ddaspit and @mshannon-sil).
silnlp/common/sample_usability.py line 128 at r1 (raw file):
) parser.add_argument( "--random-state",
Maybe rename this "random-seed"? for consistency with other scripts?
silnlp/nmt/quality_estimation.py line 44 at r1 (raw file):
@dataclass class BookScores:
Nice use of inheritance and composition here. It really helps to make the code clearer.
silnlp/nmt/quality_estimation.py line 76 at r1 (raw file):
book = file_scores[0].vref.book if file_scores else None verse_scores += file_scores if confidence_file.with_suffix(".chapters.tsv").is_file():
It would probably require some refactoring (maybe a class wrapping confidence_files?), but I'd like if there was a way to keep the logic about what the various confidence files are called all in one place, so we wouldn't have to update in multiple places if it changes.
silnlp/nmt/quality_estimation.py line 127 at r1 (raw file):
def get_verse_scores(input_file_path: Path, slope: float, intercept: float) -> List[VerseScore]:
If I'm correct, input_file_path is pointing to a confidences file? Changing the name to reflect that might be helpful for understanding what's going on (since there are so many different files involved here)
silnlp/nmt/translate.py line 129 at r1 (raw file):
tags, ) if save_confidences:
I wonder if we should just always be saving confidence files? Do you see any downsides of that? Now that we're having to pass confidence files around in the code, it would help to simplify things.
silnlp/nmt/experiment.py line 162 at r1 (raw file):
raise RuntimeError("A Scripture book, file, or file prefix must be specified for translation.") # Run quality estimation once after all translations complete
I wonder if we ought to be doing quality estimation in each of the translate functions instead (probably calling a function to do that)? We could just pass along the quality_estimation parameter and then don't have to return the list of confidence files.
silnlp/nmt/test.py line 435 at r1 (raw file):
def write_sentence_bleu(
What was the impetus in getting rid of SentenceBleu?
silnlp/common/translator.py line 148 at r1 (raw file):
def get_all_sequence_confidence_scores(self) -> List[float]: return [ scs for scs in [t.get_sequence_confidence_score() for t in self._sentence_translations] if scs is not None
Was taking the exponential incorrect here?
mshannon-sil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mshannon-sil made 7 comments.
Reviewable status: 0 of 8 files reviewed, 7 unresolved discussions (waiting on @benjaminking and @ddaspit).
silnlp/common/sample_usability.py line 128 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Maybe rename this "random-seed"? for consistency with other scripts?
Done.
silnlp/common/translator.py line 148 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Was taking the exponential incorrect here?
No I just changed how get_sequence_confidence_score() works so that it returns the exponentiated version, so I removed the exp() call here to compensate. It seemed more apt since what we've been calling confidence scores is the exponentiated version, and it also makes the code cleaner since we don't have to remember to call exp() every time we get the score.
silnlp/nmt/experiment.py line 162 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
I wonder if we ought to be doing quality estimation in each of the translate functions instead (probably calling a function to do that)? We could just pass along the
quality_estimationparameter and then don't have to return the list of confidence files.
Done.
silnlp/nmt/quality_estimation.py line 76 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
It would probably require some refactoring (maybe a class wrapping
confidence_files?), but I'd like if there was a way to keep the logic about what the various confidence files are called all in one place, so we wouldn't have to update in multiple places if it changes.
Done.
silnlp/nmt/quality_estimation.py line 127 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
If I'm correct,
input_file_pathis pointing to a confidences file? Changing the name to reflect that might be helpful for understanding what's going on (since there are so many different files involved here)
Done.
silnlp/nmt/test.py line 435 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
What was the impetus in getting rid of SentenceBleu?
I used the sacrebleu version in write_pair_verse_scores, because the comment explaining the rational for including our own custom implementation says: Substitute for the sacrebleu version of sentence_bleu, which uses settings that aren't consistent with the values we use for corpus_bleu, and isn't fully parameterized. However, I believe I was able to pass in all the values we use for corpus_bleu into the sacrebleu sentence_bleu or take advantage of default values that already match, and so it seems fully parameterized to me. If there's no difference, it's better to just use sacrebleu's version.
Let me double check though, @ddaspit , do you still see a need for our own version of sentence_bleu? If so, I can remove it in the other places it's used since I haven't fully removed it yet.
silnlp/nmt/translate.py line 129 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
I wonder if we should just always be saving confidence files? Do you see any downsides of that? Now that we're having to pass confidence files around in the code, it would help to simplify things.
Based on our discussion with other members, it sounds like we're in favor of always saving confidence, especially to avoid accidently forgetting to save the flag. It would clutter the infer directory up a bit, but the storage memory impact would be minimal.
However, there's still an issue with confidence related to multiple translations, and I'd like to get this PR merged first to get the current features into master as soon as possible. So I think it's best to hold off on removing the flag until the next PR that address the multiple translations issue.
|
Previously, mshannon-sil wrote…
From Damien: "I believe that prior to sacrebleu 2.0 there were a couple of options that we could not set for So I'll go ahead and change test.py to always use the sacrebleu version and remove the custom implementation. |
|
Previously, mshannon-sil wrote…
Just removed the custom implementation. Also, if your question had to do with removing sentencebleu from the list of scorers, that's because verse level scores are now always being calculated, so sentencebleu will be run for the verse scores whenever bleu is requested as a scorer. |
benjaminking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work. Just a couple small things before we should be able to get this in.
@benjaminking reviewed 8 files and all commit messages, made 7 comments, and resolved 6 discussions.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @ddaspit and @mshannon-sil).
silnlp/nmt/quality_estimation.py line 76 at r1 (raw file):
Previously, mshannon-sil wrote…
Done.
Could you have confidence_file here be an instance of ConfidenceFile so that you can call get_chapter_path() instead of specifying the suffix? It might also be nice if ConfidenceFile had some sort of iterator method that could iterate over books/chapters/verses and their confidence scores, so that other methods like this wouldn't need to know the format of the file.
silnlp/nmt/experiment.py line 223 at r2 (raw file):
) parser.add_argument( "--test-data-file",
Maybe we could call this --test-data-scores-file? I was confused about what exactly we were passing around until I got to this description below.
silnlp/common/translator.py line 177 at r2 (raw file):
class ConfidenceFile: def __init__(self, path: Path, trg_file_path: Optional[Path] = None):
It looks like you only ever use the from_trg_path method to initiate this class. Is there a way to get rid of the trg_file_path parameter in the constructor, since it should be predictable from path?
silnlp/nmt/translate.py line 66 at r2 (raw file):
postprocess_handler: PostprocessHandler = PostprocessHandler(), tags: Optional[List[str]] = None, ) -> List[Path]:
It doesn't look like this method returns anything.
silnlp/nmt/translate.py line 157 at r2 (raw file):
test_data_path: Optional[Path] = None, tags: Optional[List[str]] = None, ) -> List[Path]:
Same issue with this function.
silnlp/nmt/translate.py line 219 at r2 (raw file):
postprocess_handler: PostprocessHandler = PostprocessHandler(), tags: Optional[List[str]] = None, ) -> List[Path]:
Same issue here as well.
|
Previously, benjaminking (Ben King) wrote…
What about |
benjaminking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjaminking made 1 comment and resolved 1 discussion.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ddaspit and @mshannon-sil).
silnlp/nmt/experiment.py line 223 at r2 (raw file):
Previously, mshannon-sil wrote…
What about
--verse-test-scores-file? That would distinguish it from the aggregate/book-level scores-(ckpt).csv file that test.py already outputs.
Yes, that sounds good to me.
mshannon-sil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mshannon-sil made 5 comments.
Reviewable status: 3 of 8 files reviewed, 5 unresolved discussions (waiting on @benjaminking and @ddaspit).
silnlp/common/translator.py line 177 at r2 (raw file):
Previously, benjaminking (Ben King) wrote…
It looks like you only ever use the
from_trg_pathmethod to initiate this class. Is there a way to get rid of thetrg_file_pathparameter in the constructor, since it should be predictable frompath?
I use the init method to initiate the class in quality_estimation.py at the end of the main() method. But yeah I think I did overengineer this, so I simplified it to just using the init method and deriving the trg_draft_file_path (renamed to be more accurate & consistent w/the repo) from path.
silnlp/nmt/quality_estimation.py line 76 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Could you have
confidence_filehere be an instance ofConfidenceFileso that you can callget_chapter_path()instead of specifying the suffix? It might also be nice ifConfidenceFilehad some sort of iterator method that could iterate over books/chapters/verses and their confidence scores, so that other methods like this wouldn't need to know the format of the file.
Yeah sorry looks like I didn't integrate it in quality_estimation.py as fully as I thought. I fixed it to use get_chapter_path() and get_book_path(). I think I'll postpone using an interator method for another time as I'm not 100% clear on its purpose and I'd like to get these core changes into master soon.
silnlp/nmt/translate.py line 66 at r2 (raw file):
Previously, benjaminking (Ben King) wrote…
It doesn't look like this method returns anything.
Done. This was leftover from when quality estimation was happening outside the method. I also caught a bug when I was looking at this. In the call to glob() at the end that looks for confidence files, I changed it to trg_file_path.stem rather than trg_file_path.name so that it correctly picks up drafts that get their suffix changed in postprocessing/multiple translations.
silnlp/nmt/translate.py line 157 at r2 (raw file):
Previously, benjaminking (Ben King) wrote…
Same issue with this function.
Done.
silnlp/nmt/translate.py line 219 at r2 (raw file):
Previously, benjaminking (Ben King) wrote…
Same issue here as well.
Done.
benjaminking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you get it merged, I can create an issue to track the bit that we postponed.
@benjaminking reviewed 5 files and all commit messages, made 2 comments, and resolved 5 discussions.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @ddaspit).
silnlp/nmt/quality_estimation.py line 76 at r1 (raw file):
Previously, mshannon-sil wrote…
Yeah sorry looks like I didn't integrate it in quality_estimation.py as fully as I thought. I fixed it to use get_chapter_path() and get_book_path(). I think I'll postpone using an interator method for another time as I'm not 100% clear on its purpose and I'd like to get these core changes into master soon.
That's ok. Maybe we can create an issue to track this once it's merged. The idea is that we'd like to have all the logic related to reading the confidence files (in whatever format they're in) in one place, ideally encapsulated in a class. That way, if the format ever needs to change, we don't need to track down all the places that read the confidence files and risk missing one of them.
The iterator would be a way to get access to the contents of the confidence files without needing to know the format. You could write something like for chapter, confidence in confidence_file.chapter_score_iterator, and project those scores, with the ConfidenceFile class taking care of all the file IO details.
A different way to approach the problem that would still preserve encapsulation would be to have the ConfidenceFile class create the projected chrF3 scores, e.g. confidence_file.project_chrf3_scores(slope, intercept).
|
Previously, benjaminking (Ben King) wrote…
Oh I see what you mean now! Yes let's track that in an issue and make that part of the next PR. |
…approach across translate.py and experiment.py; fix handling and minimize number of confidence files when post_processing is applied
This PR addresses a number of requested improvements across features related to confidence, quality estimation, and sampling usability:
This change is