-
Notifications
You must be signed in to change notification settings - Fork 23
Description
Minimal reproduction:
(-> "[\"^ \",\"~:foo}\",\"bar\"]"
(.getBytes "UTF-8")
(java.io.ByteArrayInputStream.)
(cognitect.transit/reader :json)
cognitect.transit/read
pr-str
clojure.edn/read-string)The above results in a RuntimeException from read-string ("Map literal must contain an even number of forms"), due to the } at the end of the :foo} in the transit string (which, to be clear, I've constructed by hand, rather than outputting from Transit). If you remove the read-string and replace it with keys first, you can see that Transit has not identified the keyword as invalid, and has produced a keyword :foo}. This issue is me asking for Transit instead to report that the input was malformed or is not representable as a valid Clojure data structure.
I understand that for performance reasons Clojure doesn't validate keywords when interning, and I'm not proposing to bikeshed that decision. However, for Transit, that approach has the unfortunate consequence that you have to re-validate data that Transit has read from the potentially-untrusted network before you can assume that its keywords are safe to use in everyday Clojure code.
Specifically, we have a Clojure web app with a ClojureScript front-end, and we use Transit to communicate between them. Sometimes we'll need to take some of the data received from the front-end and write it out somewhere else (to be read back in future, or to communicate it to a separate process, etc). The natural way to do that is via pr-str and clojure.edn/read. This issue introduces a new point of failure in our application -- reading the EDN back in might fail, even though it was written out via pr-str, because Transit might have given us a keyword which is not printable/readable in Clojure.
For additional context, this issue was discovered by an external pen-test, attempting an XSS attack (by manually replaying a modified request), so the answer for us can't just be "well then don't have your front end send data across the wire that you're not going to be capable of reading correctly". To give ourselves some level of confidence that our app isn't going to break when someone tries XSS, we're now having to write additional code to wrap around our calls to transit/read to ensure that the keywords it has read are valid keywords, etc. That seems like it would be better handled when reading the input in the first place.