Conversation
|
I have fairly limited rust experience, but want to become more involved in this part of the substrait community. This is just a small PR to get my feet wet :) |
| owner: owner.to_string(), | ||
| id: id.to_string(), | ||
| }), | ||
| _ => Err(InvalidUrn), |
There was a problem hiding this comment.
Actually, I caught a small bug that allowed urns like extension:one:two:three:four:five. Let me know if this is an okay implementation!
There was a problem hiding this comment.
Where is it specified that this is an invalid identifier?
There was a problem hiding this comment.
You are right, it isn't specified anywhere. However, if we do use extension:one:two:three:four:five, then there is an ambiguous parse. Any one of the following could be the owner for example:
- "one"
- "one:two"
- "one:two:three"
- "one:two:three:four"
Since this isn't formally captured in the substrait docs, I have opened a PR in upstream here.
The behavior introduced in this PR is consistent with the implementations in other projects
For consistency with the other libraries, we could instead use a regex to validate? It would probably be slower, but what do you think?
I could also introduce the regex into the substrait upstream itself.
There was a problem hiding this comment.
I would suggest following https://en.wikipedia.org/wiki/Uniform_Resource_Name#Syntax
There was a problem hiding this comment.
Ah I think I see the distinction now. You raise a fair point. Perhaps the correct solution is actually to upstream to the parent substrait lib a clarification that extension:owner:<something that may contain ':'> will always parse in the way you had it before as owner=owner, and id=<something that may contain ':'>.
There was a problem hiding this comment.
Actually, we are already breaking from urn convention, as our "urns" do not have urn: prepended to them
I originally wrote about the <NID> being invalid because it has periods, but I realize now that our <NID> is in fact extension, so that part is fine.
Shall we clarify then that for us, urns are valid "official" urns but just with the urn: prefix chopped off?
There was a problem hiding this comment.
Shall we clarify then that for us, urns are valid "official" urns but just with the urn: prefix chopped off?
Sounds good to me (this was also my understanding). Let's discuss and document this in the main repo.
There was a problem hiding this comment.
Reopened the PR in upstream with this clarification. Thanks!
add a few more tests as well
|
Making this one a draft until the upstream (substrait-io/substrait#881) is resolved. |
Adds some very simple tests validating the already existing URN logic.