Commit 9bbb4d2

Finish 2.6.0
2 parents ec6bbd8 + eac29d1 commit 9bbb4d2

39 files changed, with 385 additions and 193 deletions.

.github/workflows/ci.yml

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,7 @@ jobs:
1919
strategy:
2020
fail-fast: false
2121
matrix:
22-
ruby: ['3.0', 3.1, 3.2, ruby-head, jruby]
22+
ruby: ['3.0', 3.1, 3.2, 3.3, ruby-head, jruby]
2323
steps:
2424
- name: Clone repository
2525
uses: actions/checkout@v3
@@ -33,6 +33,6 @@ jobs:
3333
run: ruby --version; bundle exec rspec spec || $ALLOW_FAILURES
3434
- name: Coveralls GitHub Action
3535
uses: coverallsapp/github-action@v2
36-
if: "matrix.ruby == '3.2'"
36+
if: "matrix.ruby == '3.3'"
3737
with:
3838
github-token: ${{ secrets.GITHUB_TOKEN }}

Gemfile

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ group :development do
1313
gem "redcarpet", platforms: :mri
1414
gem "rocco", platforms: :mri
1515
gem "pygmentize", platforms: :mri
16+
gem 'getoptlong'
1617
end
1718

1819
group :development, :test do

README.md

Lines changed: 11 additions & 11 deletions
@@ -26,10 +26,9 @@ As LL(1) grammars operate using `alt` and `seq` primitives, allowing for a match
2626
* Transform `a ::= b+` into `a ::= b b*`
2727
* Transform `a ::= b*` into `a ::= _empty | (b a)`
2828
* Transform `a ::= op1 (op2)` into two rules:
29-
```
30-
a ::= op1 _a_1
31-
_a_1_ ::= op2
32-
```
29+
30+
a ::= op1 _a_1
31+
_a_1_ ::= op2
3332

3433
Of note in this implementation is that the tokenizer and parser are streaming, so that they can process inputs of arbitrary size.
3534

@@ -75,7 +74,7 @@ Generate formatted grammar using HTML (requires [Haml][Haml] gem):
7574

7675
### Parsing an ISO/IEC 14977 Grammar
7776

78-
The EBNF gem can also parse [ISO/EIC 14977] Grammars (ISOEBNF) to [S-Expressions][S-Expression].
77+
The EBNF gem can also parse [ISO/IEC 14977][] Grammars (ISOEBNF) to [S-Expressions][S-Expression].
7978

8079
grammar = EBNF.parse(File.open('./etc/iso-ebnf.isoebnf'), format: :isoebnf)
8180

@@ -96,15 +95,15 @@ The {EBNF::Writer} class can be used to write parsed grammars out, either as for
9695
The formatted HTML results are designed to be appropriate for including in specifications.
9796

9897
### Parser Errors
99-
On a parsing failure, and exception is raised with information that may be useful in determining the source of the error.
98+
On a parsing failure, an exception is raised with information that may be useful in determining the source of the error.
10099

101100
## EBNF Grammar
102101
The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][]
103102
(see [EBNF grammar](https://dryruby.github.io/ebnf/etc/ebnf.ebnf))
104103
as defined in the
105104
[XML 1.0 recommendation](https://www.w3.org/TR/REC-xml/), with minor extensions:
106105

107-
Note that the grammar includes an optional `[identifer]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser, add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.
106+
Note that the grammar includes an optional `[number]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser, add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.
108107

109108
The character set for EBNF is UTF-8.
110109

@@ -116,7 +115,7 @@ which can also be proceeded by an optional number enclosed in square brackets to
116115

117116
[1] symbol ::= expression
118117

119-
(Note, this can introduce an ambiguity if the previous rule ends in a range or enum and the current rule has no identifier. In this case, enclosing `expression` within parentheses, or adding intervening comments can resolve the ambiguity.)
118+
(Note, this introduces an ambiguity if the previous rule ends in a range or enum and the current rule has no number. The parsers dynamically determine the terminal rules for the `LHS` (the identifier, symbol, and `::=`) and `RANGE`.)
120119

121120
Symbols are written in CAPITAL CASE if they are the start symbol of a regular language (terminals); otherwise they are treated as non-terminal rules. Literal strings are quoted.
122121

@@ -134,7 +133,7 @@ Within the expression on the right-hand side of a rule, the following expression
134133
<tr><td><code>[^abc], [^#xN#xN#xN]</code></td>
135134
<td>matches any UTF-8 R\_CHAR or HEX with a value not among the characters given. The last component may be '-'. Enumerations and ranges of excluded values may be mixed in one set of brackets.</td></tr>
136135
<tr><td><code>"string"</code></td>
137-
<td>matches a literal string matching that given inside the double quotes.</td></tr>
136+
<td>matches a literal string matching that given inside the double quotes case insensitively.</td></tr>
138137
<tr><td><code>'string'</code></td>
139138
<td>matches a literal string matching that given inside the single quotes.</td></tr>
140139
<tr><td><code>A (B | C)</code></td>
@@ -158,7 +157,8 @@ Within the expression on the right-hand side of a rule, the following expression
158157
</table>
159158

160159
* Comments include `//` and `#` through end of line (other than hex character) and `/* ... */ (* ... *) which may cross lines`
161-
* All rules **MAY** start with an identifier, contained within square brackets. For example `[1] rule`, where the value within the brackets is a symbol `([a-z] | [A-Z] | [0-9] | "_" | ".")+`
160+
* All rules **MAY** start with a number, contained within square brackets. For example `[1] rule`, where the value within the brackets is a symbol `([a-z] | [A-Z] | [0-9] | "_" | ".")+`, which is not retained after parsing
161+
* Symbols **MAY** be enclosed in angle brackets `'<'` and `>`, which are dropped when parsing.
162162
* `@terminals` causes following rules to be treated as terminals. Any terminal which is all upper-case (eg`TERMINAL`), or any rules with expressions that match characters (`#xN`, `[a-z]`, `[^a-z]`, `[abc]`, `[^abc]`, `"string"`, `'string'`, or `A - B`), are also treated as terminals.
163163
* `@pass` defines the expression used to detect whitespace, which is removed in processing.
164164
* No support for `wfc` (well-formedness constraint) or `vc` (validity constraint).
@@ -177,7 +177,7 @@ Intermediate representations of the grammar may be serialized to Lisp-like [S-Ex
177177

178178
is serialized as
179179

180-
(rule ebnf "1" (star (alt declaration rule)))
180+
(rule ebnf (star (alt declaration rule)))
181181

182182
Different components of an EBNF rule expression are transformed into their own operator:
183183

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
2.5.0
1+
2.6.0

bin/ebnf

Lines changed: 6 additions & 1 deletion
@@ -9,6 +9,7 @@ $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), "..", 'lib')))
99
require 'rubygems'
1010
require 'getoptlong'
1111
require 'ebnf'
12+
require 'rdf/spec'
1213

1314
options = {
1415
output_format: :sxp,
@@ -86,7 +87,11 @@ end
8687

8788
input = File.open(ARGV[0]) if ARGV[0]
8889

89-
ebnf = EBNF.parse(input || STDIN, **options)
90+
logger = Logger.new(STDERR)
91+
logger.level = options[:level] || Logger::ERROR
92+
logger.formatter = lambda {|severity, datetime, progname, msg| "%5s %s\n" % [severity, msg]}
93+
94+
ebnf = EBNF.parse(input || STDIN, logger: logger, **options)
9095
ebnf.make_bnf if options[:bnf] || options[:ll1]
9196
ebnf.make_peg if options[:peg]
9297
if options[:ll1]
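
The logger setup added to `bin/ebnf` above can be tried in isolation. The following is a minimal sketch, not the gem's own code: it assumes only Ruby's standard `Logger`, writes to a `StringIO` instead of `STDERR` so the result can be inspected, and uses a hypothetical empty `options` hash to stand in for the parsed command-line options.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the options hash built by bin/ebnf.
options = {}

# Capture log output in memory rather than writing to STDERR.
io = StringIO.new
logger = Logger.new(io)

# Fall back to ERROR when no level was supplied, as in the diff above.
logger.level = options[:level] || Logger::ERROR

# Same shape as the formatter in the diff: severity right-aligned to
# five characters, then the message, with no timestamp or program name.
logger.formatter = lambda do |severity, _datetime, _progname, msg|
  "%5s %s\n" % [severity, msg]
end

logger.error("unexpected token")  # emitted
logger.debug("scanning input")    # suppressed at ERROR level

puts io.string
```

With the default level, only messages at `ERROR` or above reach the output; passing a lower level through `options[:level]` would let warnings and debug messages through with the same compact format.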

ebnf.gemspec

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ Gem::Specification.new do |gem|
3535
gem.add_runtime_dependency 'rdf', '~> 3.3' # Required by sxp
3636
gem.add_runtime_dependency 'htmlentities', '~> 4.3'
3737
gem.add_runtime_dependency 'unicode-types', '~> 1.8'
38+
gem.add_runtime_dependency 'base64', '~> 0.2'
3839
gem.add_development_dependency 'amazing_print', '~> 1.4'
3940
gem.add_development_dependency 'rdf-spec', '~> 3.3'
4041
gem.add_development_dependency 'rdf-turtle', '~> 3.3'

etc/ebnf.ebnf

Lines changed: 6 additions & 5 deletions
@@ -5,9 +5,8 @@
55

66
# Use the LHS terminal to match the identifier, rule name and assignment due to
77
# confusion between the identifier and RANGE.
8-
# Note, for grammars not using identifiers, it is still possible to confuse
9-
# a rule ending with a range the next rule, as it may be interpreted as an identifier.
10-
# In such case, best to enclose the rule in '()'.
8+
# The PEG parser has special rules for matching LHS and RANGE
9+
# so that RANGE is not confused with LHS.
1110
[3] rule ::= LHS expression
1211

1312
[4] expression ::= alt
@@ -34,11 +33,13 @@
3433

3534
[11] LHS ::= ('[' SYMBOL ']' ' '+)? SYMBOL ' '* '::='
3635

37-
[12] SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
36+
[12] SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
37+
38+
[12a] O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
3839

3940
[13] HEX ::= '#x' ([a-f] | [A-F] | [0-9])+
4041

41-
[14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']' - LHS
42+
[14] RANGE ::= '[' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
4243

4344
[15] O_RANGE ::= '[^' ((R_CHAR '-' R_CHAR) | (HEX '-' HEX) | R_CHAR | HEX)+ '-'? ']'
4445

etc/ebnf.html

Lines changed: 8 additions & 2 deletions
@@ -1,4 +1,4 @@
1-
<!-- Generated with ebnf version 2.4.0. See https://github.com/dryruby/ebnf. -->
1+
<!-- Generated with ebnf version 2.5.0. See https://github.com/dryruby/ebnf. -->
22
<table class="grammar">
33
<tbody id="grammar-productions" class="ebnf">
44
<tr id="grammar-production-ebnf">
@@ -77,6 +77,12 @@
7777
<td>[12]</td>
7878
<td><code>SYMBOL</code></td>
7979
<td>::=</td>
80+
<td><code class="grammar-paren">(</code>'<code class="grammar-literal">&lt;</code>' <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a> '<code class="grammar-literal">&gt;</code>'<code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a></td>
81+
</tr>
82+
<tr id="grammar-production-O_SYMBOL">
83+
<td>[12a]</td>
84+
<td><code>O_SYMBOL</code></td>
85+
<td>::=</td>
8086
<td><code class="grammar-paren">(</code><code class="grammar-brac">[</code><code class="grammar-literal">a-z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">A-Z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">0-9</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> '<code class="grammar-literal">_</code>' <code class="grammar-alt">|</code> '<code class="grammar-literal">.</code>'<code class="grammar-paren">)</code><code class="grammar-plus">+</code></td>
8187
</tr>
8288
<tr id="grammar-production-HEX">
@@ -89,7 +95,7 @@
8995
<td>[14]</td>
9096
<td><code>RANGE</code></td>
9197
<td>::=</td>
92-
<td>'<code class="grammar-literal">[</code>' <code class="grammar-paren">(</code><code class="grammar-paren">(</code><a href="#grammar-production-R_CHAR">R_CHAR</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-R_CHAR">R_CHAR</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <code class="grammar-paren">(</code><a href="#grammar-production-HEX">HEX</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code class="grammar-alt">|</code> <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code><code class="grammar-plus">+</code> '<code class="grammar-literal">-</code>'<code class="grammar-opt">?</code> <code class="grammar-paren">(</code>'<code class="grammar-literal">]</code>' <code class="grammar-diff">-</code> <a href="#grammar-production-LHS">LHS</a><code class="grammar-paren">)</code></td>
98+
<td>'<code class="grammar-literal">[</code>' <code class="grammar-paren">(</code><code class="grammar-paren">(</code><a href="#grammar-production-R_CHAR">R_CHAR</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-R_CHAR">R_CHAR</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <code class="grammar-paren">(</code><a href="#grammar-production-HEX">HEX</a> '<code class="grammar-literal">-</code>' <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-R_CHAR">R_CHAR</a> <code class="grammar-alt">|</code> <a href="#grammar-production-HEX">HEX</a><code class="grammar-paren">)</code><code class="grammar-plus">+</code> '<code class="grammar-literal">-</code>'<code class="grammar-opt">?</code> '<code class="grammar-literal">]</code>'</td>
9399
</tr>
94100
<tr id="grammar-production-O_RANGE">
95101
<td>[15]</td>

etc/ebnf.ll1.rb

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# This file is automatically generated by ebnf version 2.4.0
1+
# This file is automatically generated by ebnf version 2.5.0
22
# Derived from etc/ebnf.ebnf
33
module Meta
44
START = :ebnf

etc/ebnf.ll1.sxp

Lines changed: 3 additions & 5 deletions
@@ -100,13 +100,11 @@
100100
(seq '@pass' expression))
101101
(terminals _terminals (seq))
102102
(terminal LHS "11" (seq (opt (seq '[' SYMBOL ']' (plus ' '))) SYMBOL (star ' ') '::='))
103-
(terminal SYMBOL "12" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
103+
(terminal SYMBOL "12" (alt (seq '<' O_SYMBOL '>') O_SYMBOL))
104+
(terminal O_SYMBOL "12a" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
104105
(terminal HEX "13" (seq '#x' (plus (alt (range "a-f") (range "A-F") (range "0-9")))))
105106
(terminal RANGE "14"
106-
(seq '['
107-
(plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX))
108-
(opt '-')
109-
(diff ']' LHS)) )
107+
(seq '[' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
110108
(terminal O_RANGE "15"
111109
(seq '[^' (plus (alt (seq R_CHAR '-' R_CHAR) (seq HEX '-' HEX) R_CHAR HEX)) (opt '-') ']'))
112110
(terminal STRING1 "16" (seq '"' (star (diff CHAR '"')) '"'))
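
The new `SYMBOL` and `O_SYMBOL` productions in this commit allow a symbol to be wrapped in angle brackets, and the README change states that the brackets are dropped when parsing. The sketch below illustrates that behaviour with plain Ruby regular expressions. It is an illustration under stated assumptions, not the gem's parser: the constant names mirror the grammar, while `normalize_symbol` is a hypothetical helper introduced here.

```ruby
# O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
O_SYMBOL = /[a-zA-Z0-9_.]+/

# SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
# Group 1 captures the bracketed form, group 2 the bare form.
SYMBOL = /\A(?:<(#{O_SYMBOL.source})>|(#{O_SYMBOL.source}))\z/

# Hypothetical helper: returns the symbol name without angle brackets,
# or nil when the text is not a valid SYMBOL.
def normalize_symbol(text)
  match = SYMBOL.match(text)
  return nil unless match
  (match[1] || match[2]).to_sym
end

p normalize_symbol("<rule>")
p normalize_symbol("rule")
p normalize_symbol("<rule")
```

Both the bracketed and the bare spelling yield the same symbol, which is why a grammar written with `<rule>` and one written with `rule` would serialize identically under this reading.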
