Improve lark-cython compatibility#1528
Improve lark-cython compatibility#1528ornariece wants to merge 10 commits intolark-parser:masterfrom
Conversation
| return self.postlex.process(tokens) | ||
|
|
||
| def __copy__(self): | ||
| return type(self)(self.lexer, copy(self.state), self.postlex) |
There was a problem hiding this comment.
Are we sure that the postlexer shouldn't be copied too?
There was a problem hiding this comment.
i would say so? i see it as some kind of processor, it holds no data per se about the current states. i could be wrong.
There was a problem hiding this comment.
But it does hold a state, e.g. https://github.com/lark-parser/lark/blob/master/lark/indenter.py#L33
There was a problem hiding this comment.
Although, tbh, those shouldn't be instance attributes, but local variables inside of process. I think this postlexer design should already be broken on the current main if this copy method is called since lexer=PostLexConnector doesn't get copied either.
There was a problem hiding this comment.
i see, i agree those don't really make sense as instance attributes. it makes sense that a postlexer instance should be agnostic to a stream of tokens. this matter is probably beyond the scope of this PR though.
…tting, and support for a dark theme
Docs: Updated link of DSL article to a new version
|
The "Python type check" test should now work properly if you rebase over master. |
|
Overall looks good, but I have a few questions -
Also, I still think copy() should make a copy of the postlex too, since it's not an object we have control of. (and if there's no data to copy, the cost is negligible anyway) |
This PR addresses two significant compatibility issues when using lark-cython:
1. Fix Token handling for lark-cython
In the standard lark implementation,
Tokeninherits fromstr, allowing string methods to be called directly onTokenobjects. However, lark-cython implementsTokendifferently - it doesn't inherit fromstr, which means string methods likersplit()won't work on token objects.Changes:
Tokeninstances with calls ontoken.valueinstead2. Refactor postlexer handling architecture
The original implementation used a
PostLexConnectorclass that acted as a wrapper around a lexer but only implemented part of theLexerinterface (thelex()method but notnext_token()). This partial implementation worked in standard lark through duck typing but caused errors in lark-cython's more strict implementation.Changes:
PostLexConnectorclassself.postlexattribute to store the postlexer separatelyPostLexwhile ensuring proper interface implementationThis approach is more consistent with the architectural intent of postlexers as stream processors, while avoiding type compatibility issues in lark-cython.