
Word-Final Spaces in Dictionary Entries #48

@Trey314159


These commands reveal entries that have inappropriate word-final spaces in them:
grep " ," core_lex.csv (62 entries)
grep " ," notcore_lex.csv (16 entries)
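The same check can be sketched in Python. This is a minimal sketch that assumes the lexicon files are UTF-8 CSV with the surface form in the first column. The helper names `find_field_final_spaces` and `report` are hypothetical. Because it parses fields instead of matching raw text, it also catches a trailing space in the last field of a row and ignores " ," sequences inside quoted fields, so its counts may differ slightly from the grep counts.

```python
import csv
from typing import Iterable, Iterator, List, Tuple


def find_field_final_spaces(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (row_number, fields) for every CSV row in which some field ends with a space.

    Hypothetical helper; assumes plain comma-separated rows as in core_lex.csv.
    """
    for row_no, row in enumerate(csv.reader(lines), start=1):
        if any(field.endswith(" ") for field in row):
            yield row_no, row


def report(path: str, preview: int = 4) -> int:
    """Print how many rows in `path` have field-final spaces, plus a short preview."""
    # Assumption: the lexicon files are UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as f:
        hits = list(find_field_final_spaces(f))
    print(f"{path}: {len(hits)} entries")
    for row_no, row in hits[:preview]:
        # repr() makes the trailing space visible in the output.
        print(f"  row {row_no}: {row[0]!r}")
    return len(hits)
```

For example, `report("core_lex.csv")` would list the affected rows with their trailing spaces shown explicitly, e.g. `'alkeran '`.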

Here are the first few from core_lex.csv:

alkeran ,4785,4785,5000,Alkeran ,名詞,固有名詞,一般,*,*,*,アルケラン,アルケラン,*,A,*,*,*,*
blopress ,4785,4785,5000,Blopress ,名詞,固有名詞,一般,*,*,*,ブロプレス,ブロプレス,*,A,*,*,*,*
bosna i hercegovina ,4793,4793,5000,Bosna i Hercegovina ,名詞,固有名詞,地名,国,*,*,ボスニア・ヘルツェゴビナ,ボスニア・ヘルツェゴビナ,*,A,*,*,*,022705
engel ,4787,4787,5000,Engel ,名詞,固有名詞,人名,一般,*,*,エンゲル,エンゲル,*,A,*,*,*,017728

This creates tokens with spaces at the end of them, and inconsistent tokenization when the words are followed by punctuation. For example, the string "engel engel." gets two different tokens for the same word: "engel " (with a space) and "engel" (without a space).

These entries should have their word-final spaces removed.
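One way to do the cleanup, sketched in Python under the same assumptions (UTF-8 CSV, "\n" line endings). The helper names are hypothetical. It strips only trailing spaces, so internal spaces in multi-word entries such as "bosna i hercegovina" are preserved. Writing through `csv.writer` may normalize quoting, so the output should be diffed against the original to confirm that only the trailing spaces changed.

```python
import csv
from typing import Iterable, Iterator, List


def strip_field_final_spaces(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield each row with trailing spaces removed from every field.

    Internal spaces (e.g. in "bosna i hercegovina") are left untouched.
    """
    for row in rows:
        yield [field.rstrip(" ") for field in row]


def fix_file(src: str, dst: str) -> None:
    """Write a copy of the lexicon CSV at `src` to `dst` with field-final spaces removed."""
    with open(src, newline="", encoding="utf-8") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        # Assumption: the original files use "\n" line endings.
        writer = csv.writer(fout, lineterminator="\n")
        writer.writerows(strip_field_final_spaces(csv.reader(fin)))
```

For example, `fix_file("core_lex.csv", "core_lex.fixed.csv")` writes a cleaned copy. Running the grep check on the new file should then find no field-final spaces.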
