alkeran ,4785,4785,5000,Alkeran ,名詞,固有名詞,一般,*,*,*,アルケラン,アルケラン,*,A,*,*,*,*
blopress ,4785,4785,5000,Blopress ,名詞,固有名詞,一般,*,*,*,ブロプレス,ブロプレス,*,A,*,*,*,*
bosna i hercegovina ,4793,4793,5000,Bosna i Hercegovina ,名詞,固有名詞,地名,国,*,*,ボスニア・ヘルツェゴビナ,ボスニア・ヘルツェゴビナ,*,A,*,*,*,022705
engel ,4787,4787,5000,Engel ,名詞,固有名詞,人名,一般,*,*,エンゲル,エンゲル,*,A,*,*,*,017728
This creates tokens that end in a space and makes tokenization inconsistent when a word is followed by punctuation. For example, the string "engel engel." yields two different tokens for the same word: "engel " (with a trailing space) and "engel" (without one).
These entries should have their word-final spaces removed.
These commands reveal entries that have inappropriate word-final spaces in them:
grep " ," core_lex.csv (62 entries)
grep " ," notcore_lex.csv (16 entries)
The entries listed above are the first few from core_lex.csv.
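The cleanup of word-final spaces could be scripted along these lines. This is a minimal sketch, not the project's own tooling: fix_lexicon is a hypothetical helper, and it assumes that only the first and fifth fields carry the stray space (as in the sample rows) and that the lexicon parses as ordinary CSV.

```python
import csv
import io


def fix_lexicon(src, dst, columns=(0, 4)):
    """Copy CSV rows from src to dst, stripping trailing spaces in `columns`.

    `columns` defaults to the first and fifth fields, which is an assumption
    based on the sample rows. Returns the number of rows that were changed.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    changed = 0
    for row in reader:
        fixed = [
            field.rstrip(" ") if i in columns else field
            for i, field in enumerate(row)
        ]
        if fixed != row:
            changed += 1
        writer.writerow(fixed)
    return changed


# Small in-memory demonstration using one of the affected entries.
sample = io.StringIO(
    "engel ,4787,4787,5000,Engel ,名詞,固有名詞,人名,一般,*,*,"
    "エンゲル,エンゲル,*,A,*,*,*,017728\n"
)
output = io.StringIO()
rows_changed = fix_lexicon(sample, output)
print(rows_changed)
print(output.getvalue(), end="")
```

For the real files, the same function can be called with file objects opened with encoding="utf-8" and newline="". Round-tripping through the csv module may normalize quoting on other rows, so it is worth diffing the result before committing it.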