Commit 85d33fe

📝 English GRS with rule description
1 parent eeb180d commit 85d33fe

1 file changed: +48 -0 lines changed
benchmarks/english_golden_rules.py

Lines changed: 48 additions & 0 deletions
@@ -1,157 +1,205 @@
 # -*- coding: utf-8 -*-

 GOLDEN_EN_RULES = [
+    # 1) Simple period to end sentence
     ("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."]),
+    # 2) Question mark to end sentence
     ("What is your name? My name is Jonas.", ["What is your name?", "My name is Jonas."]),
+    # 3) Exclamation point to end sentence
     ("There it is! I found it.", ["There it is!", "I found it."]),
+    # 4) One letter upper case abbreviations
     ("My name is Jonas E. Smith.", ["My name is Jonas E. Smith."]),
+    # 5) One letter lower case abbreviations
     ("Please turn to p. 55.", ["Please turn to p. 55."]),
+    # 6) Two letter lower case abbreviations in the middle of a sentence
     ("Were Jane and co. at the party?", ["Were Jane and co. at the party?"]),
+    # 7) Two letter upper case abbreviations in the middle of a sentence
     ("They closed the deal with Pitt, Briggs & Co. at noon.",
      ["They closed the deal with Pitt, Briggs & Co. at noon."]),
+    # 8) Two letter lower case abbreviations at the end of a sentence
     (
         "Let's ask Jane and co. They should know.",
         ["Let's ask Jane and co.", "They should know."]),
+    # 9) Two letter upper case abbreviations at the end of a sentence
     (
         "They closed the deal with Pitt, Briggs & Co. It closed yesterday.", [
             "They closed the deal with Pitt, Briggs & Co.",
             "It closed yesterday."
         ],
     ),
+    # 10) Two letter (prepositive) abbreviations
     ("I can see Mt. Fuji from here.", ["I can see Mt. Fuji from here."]),
+    # 11) Two letter (prepositive & postpositive) abbreviations
     (
         "St. Michael's Church is on 5th st. near the light.",
         ["St. Michael's Church is on 5th st. near the light."],
     ),
+    # 12) Possessive two letter abbreviations
     ("That is JFK Jr.'s book.", ["That is JFK Jr.'s book."]),
+    # 13) Multi-period abbreviations in the middle of a sentence
     ("I visited the U.S.A. last year.", ["I visited the U.S.A. last year."]),
+    # 14) Multi-period abbreviations at the end of a sentence
     (
         "I live in the E.U. How about you?",
         ["I live in the E.U.", "How about you?"],
     ),
+    # 15) U.S. as sentence boundary
     (
         "I live in the U.S. How about you?",
         ["I live in the U.S.", "How about you?"],
     ),
+    # 16) U.S. as non sentence boundary with next word capitalized
     ("I work for the U.S. Government in Virginia.",
      ["I work for the U.S. Government in Virginia."]),
+    # 17) U.S. as non sentence boundary
     ("I have lived in the U.S. for 20 years.",
      ["I have lived in the U.S. for 20 years."]),
     # Most difficult sentence to crack
+    # 18) A.M. / P.M. as non sentence boundary and sentence boundary
     (
         "At 5 a.m. Mr. Smith went to the bank. He left the bank at 6 P.M. Mr. Smith then went to the store.",
         [
             "At 5 a.m. Mr. Smith went to the bank.",
             "He left the bank at 6 P.M.", "Mr. Smith then went to the store."
         ]
     ),
+    # 19) Number as non sentence boundary
     ("She has $100.00 in her bag.", ["She has $100.00 in her bag."]),
+    # 20) Number as sentence boundary
     ("She has $100.00. It is in her bag.", ["She has $100.00.", "It is in her bag."]),
+    # 21) Parenthetical inside sentence
     ("He teaches science (He previously worked for 5 years as an engineer.) at the local University.",
      ["He teaches science (He previously worked for 5 years as an engineer.) at the local University."]),
+    # 22) Email addresses
     ("Her email is [email protected]. I sent her an email.",
      ["Her email is [email protected].", "I sent her an email."]),
+    # 23) Web addresses
     ("The site is: https://www.example.50.com/new-site/awesome_content.html. Please check it out.",
      ["The site is: https://www.example.50.com/new-site/awesome_content.html.",
       "Please check it out."]),
+    # 24) Single quotations inside sentence
     (
         "She turned to him, 'This is great.' she said.",
         ["She turned to him, 'This is great.' she said."],
     ),
+    # 25) Double quotations inside sentence
     (
         'She turned to him, "This is great." she said.',
         ['She turned to him, "This is great." she said.'],
     ),
+    # 26) Double quotations at the end of a sentence
     (
         'She turned to him, "This is great." She held the book out to show him.',
         [
             'She turned to him, "This is great."',
             "She held the book out to show him."
         ],
     ),
+    # 27) Double punctuation (exclamation point)
     ("Hello!! Long time no see.", ["Hello!!", "Long time no see."]),
+    # 28) Double punctuation (question mark)
     ("Hello?? Who is there?", ["Hello??", "Who is there?"]),
+    # 29) Double punctuation (exclamation point / question mark)
     ("Hello!? Is that you?", ["Hello!?", "Is that you?"]),
+    # 30) Double punctuation (question mark / exclamation point)
     ("Hello?! Is that you?", ["Hello?!", "Is that you?"]),
+    # 31) List (period followed by parens and no period to end item)
     (
         "1.) The first item 2.) The second item",
         ["1.) The first item", "2.) The second item"],
     ),
+    # 32) List (period followed by parens and period to end item)
     (
         "1.) The first item. 2.) The second item.",
         ["1.) The first item.", "2.) The second item."],
     ),
+    # 33) List (parens and no period to end item)
     (
         "1) The first item 2) The second item",
         ["1) The first item", "2) The second item"],
     ),
+    # 34) List (parens and period to end item)
     ("1) The first item. 2) The second item.",
      ["1) The first item.", "2) The second item."]),
+    # 35) List (period to mark list and no period to end item)
     (
         "1. The first item 2. The second item",
         ["1. The first item", "2. The second item"],
     ),
+    # 36) List (period to mark list and period to end item)
     (
         "1. The first item. 2. The second item.",
         ["1. The first item.", "2. The second item."],
     ),
+    # 37) List with bullet
     (
         "• 9. The first item • 10. The second item",
         ["• 9. The first item", "• 10. The second item"],
     ),
+    # 38) List with hyphen
     (
         "⁃9. The first item ⁃10. The second item",
         ["⁃9. The first item", "⁃10. The second item"],
     ),
+    # 39) Alphabetical list
     (
         "a. The first item b. The second item c. The third list item",
         ["a. The first item", "b. The second item", "c. The third list item"],
     ),
+    # 40) Geo Coordinates
     (
         "You can find it at N°. 1026.253.553. That is where the treasure is.",
         [
             "You can find it at N°. 1026.253.553.",
             "That is where the treasure is."
         ],
     ),
+    # 41) Named entities with an exclamation point
     (
         "She works at Yahoo! in the accounting department.",
         ["She works at Yahoo! in the accounting department."],
     ),
+    # 42) I as a sentence boundary and I as an abbreviation
     (
         "We make a good team, you and I. Did you see Albert I. Jones yesterday?",
         [
             "We make a good team, you and I.",
             "Did you see Albert I. Jones yesterday?"
         ],
     ),
+    # 43) Ellipsis at end of quotation
     (
         "Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”",
         [
             "Thoreau argues that by simplifying one’s life, “the laws of the universe will appear less complex. . . .”"
         ],
     ),
+    # 44) Ellipsis with square brackets
     (
         """"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).""",
         [
             '"Bohr [...] used the analogy of parallel stairways [...]" (Smith 55).'
         ],
     ),
+    # 45) Ellipsis as sentence boundary (standard ellipsis rules)
     ("If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . . Next sentence.",
      [
          "If words are left off at the end of a sentence, and that is all that is omitted, indicate the omission with ellipsis marks (preceded and followed by a space) and then indicate the end of the sentence with a period . . . .",
          "Next sentence."
      ]),
+    # 46) Ellipsis as sentence boundary (non-standard ellipsis rules)
     (
         "I never meant that.... She left the store.",
         ["I never meant that....", "She left the store."],
     ),
+    # 47) Ellipsis as non sentence boundary
     (
         "I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it.",
         [
             "I wasn’t really ... well, what I mean...see . . . what I'm saying, the thing is . . . I didn’t mean it."
         ],
     ),
+    # 48) 4-dot ellipsis
     (
         "One further habit which was somewhat weakened . . . was that of combining words into self-interpreting compounds. . . . The practice was not abandoned. . . .",
         [
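
Each entry in GOLDEN_EN_RULES pairs an input text with the list of sentences a segmenter is expected to return. The sketch below shows one way such pairs could be scored. It is not taken from this commit: `SAMPLE_RULES` is a three-entry excerpt copied from the list, and `naive_segment` and `failed_rules` are hypothetical names used only for illustration.

```python
import re
from typing import Callable, List, Sequence, Tuple

Rule = Tuple[str, List[str]]

# Three entries copied from GOLDEN_EN_RULES; the real list is much longer.
SAMPLE_RULES: List[Rule] = [
    ("Hello World. My name is Jonas.", ["Hello World.", "My name is Jonas."]),
    ("My name is Jonas E. Smith.", ["My name is Jonas E. Smith."]),
    ("Hello!? Is that you?", ["Hello!?", "Is that you?"]),
]


def naive_segment(text: str) -> List[str]:
    """Hypothetical baseline: split after '.', '!' or '?' followed by whitespace."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text) if part]


def failed_rules(segment: Callable[[str], List[str]],
                 rules: Sequence[Rule]) -> List[int]:
    """Return the 1-based positions (within `rules`) whose output differs from the expectation."""
    failures = []
    for position, (text, expected) in enumerate(rules, start=1):
        if segment(text) != expected:
            failures.append(position)
    return failures


failures = failed_rules(naive_segment, SAMPLE_RULES)
passed = len(SAMPLE_RULES) - len(failures)
print(f"failed positions: {failures}, passed {passed} of {len(SAMPLE_RULES)}")
```

The naive baseline fails the single-letter abbreviation case ("Jonas E. Smith"), which is the kind of behaviour the numbered rule descriptions make easy to pinpoint. Any callable that maps a string to a list of sentences can be passed in place of `naive_segment`.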
