Skip to content

Transit reader produces Clojure keywords that cannot be round-tripped to EDN #32

@samroberton

Description

@samroberton

Minimal reproduction:

(-> "[\"^ \",\"~:foo}\",\"bar\"]"
    (.getBytes "UTF-8")
    (java.io.ByteArrayInputStream.)
    (cognitect.transit/reader :json)
    cognitect.transit/read
    pr-str
    clojure.edn/read-string)

The above results in a RuntimeException from read-string ("Map literal must contain an even number of forms"), due to the } at the end of the :foo} in the transit string (which, to be clear, I've constructed by hand, rather than outputting from Transit). If you remove the read-string and replace it with keys first, you can see that Transit has not identified the keyword as invalid, and has produced a keyword :foo}. This issue is me asking for Transit instead to report that the input was malformed or is not representable as a valid Clojure data structure.

I understand that for performance reasons Clojure doesn't validate keywords when interning, and I'm not proposing to bikeshed that decision. However, for Transit, that approach has the unfortunate consequence that you have to re-validate data that Transit has read from the potentially-untrusted network before you can assume that its keywords are safe to use in everyday Clojure code.

Specifically, we have a Clojure web app with a ClojureScript front-end, and we use Transit to communicate between them. Sometimes we'll need to take some of the data received from the front-end and write it out somewhere else (to be read back in future, or to communicate it to a separate process, etc). The natural way to do that is via pr-str and clojure.edn/read. This issue introduces a new point of failure in our application -- reading the EDN back in might fail, even though it was written out via pr-str, because Transit might have given us a keyword which is not printable/readable in Clojure.

For additional context, this issue was discovered by an external pen-test, attempting an XSS attack (by manually replaying a modified request), so the answer for us can't just be "well then don't have your front end send data across the wire that you're not going to be capable of reading correctly". To give ourselves some level of confidence that our app isn't going to break when someone tries XSS, we're now having to write additional code to wrap around our calls to transit/read to ensure that the keywords it has read are valid keywords, etc. That seems like it would be better handled when reading the input in the first place.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions