
Code Review and Suggestions #1

@beanpuppy

Description

Hey, I saw your issue on Dingo asking for a code review. I've been working on my own "compiles to Go" language - https://github.com/halcyonnouveau/soppo, so I think I have some relevant experience here.

I think this is a good start, and honestly has a much better foundation for this kind of thing than Dingo (but that's a story for another time). I've structured this into three parts: language design (semantic issues with the language itself), architecture (compiler structure), and code (implementation bugs). I've focused on issues with the current approach rather than features that just aren't implemented yet - I get that missing features are expected for an early-stage project.

Anyway, I hope you find this helpful!

Language Design

The ?string syntax suggests a nullable string, but in Go, string is a value type that cannot be nil - only empty (""). Only pointers, slices, maps, channels, interfaces, and functions can be nil.

type User struct {
    email: ?string  // This can't actually be nil in Go
}

For this to work, you'd need to compile ?string to *string (pointer to string). Currently the codegen doesn't transform nullable types, so either the ? is stripped and it's just string (which can't be nil), or it emits ?string literally (which is invalid Go syntax).

The same issue applies to the null coalescing operator ?:. If email is a string (without a pointer), the generated email == nil check won't compile because you can't compare a value type to nil.

For reference, Soppo handles this by only allowing ? on types that are actually nilable in Go (pointers, slices, maps, etc.). Writing ?string would be a compile error - you'd need ?*string (nullable pointer to string) instead.
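To make the first option concrete, here is a sketch of what the generated Go could look like if `?string` were lowered to `*string` and `?:` expanded to a nil check on the pointer. All names here (`User`, `coalesce`) are hypothetical illustrations, not your codegen's actual output:

```go
package main

import "fmt"

// Hypothetical lowering: `email: ?string` becomes a pointer field.
type User struct {
	Email *string // ?string lowered to *string so nil is representable
}

// coalesce is one way the `?:` operator could expand for pointer types:
// return the pointed-to value, or the fallback when the pointer is nil.
func coalesce(p *string, fallback string) string {
	if p == nil {
		return fallback
	}
	return *p
}

func main() {
	addr := "a@b.co"
	fmt.Println(coalesce(&addr, "none")) // a@b.co
	fmt.Println(coalesce(nil, "none"))   // none
}
```

The cost of this approach is that every read of a nullable value goes through a pointer dereference, which is why restricting `?` to already-nilable types (the Soppo approach) is also defensible.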

Architecture

The type checker uses a simple symbol table that stores types as strings:

scopes []map[string]string

This processes declarations in source order, which means forward references won't work - you can't call a function that's defined later in the file. Most compilers use two passes: first collect all declarations (function signatures, type definitions), then check the bodies. This lets you reference things in any order.
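The two-pass structure can be sketched in a few lines. The `FuncDecl` shape and `check` function below are hypothetical stand-ins for your AST and checker, just to show where the pass boundary sits:

```go
package main

import "fmt"

// Hypothetical, simplified declaration: just a name, a return type,
// and the names of functions its body calls.
type FuncDecl struct {
	Name  string
	Ret   string
	Calls []string
}

func check(decls []FuncDecl) error {
	// Pass 1: collect every signature before looking at any body.
	sigs := map[string]string{}
	for _, d := range decls {
		sigs[d.Name] = d.Ret
	}
	// Pass 2: check bodies. Forward references now resolve because
	// every declaration is already in the table.
	for _, d := range decls {
		for _, callee := range d.Calls {
			if _, ok := sigs[callee]; !ok {
				return fmt.Errorf("%s: undefined function %q", d.Name, callee)
			}
		}
	}
	return nil
}

func main() {
	// main calls helper, which is declared later in the file: still fine.
	decls := []FuncDecl{
		{Name: "main", Calls: []string{"helper"}},
		{Name: "helper", Ret: "int"},
	}
	fmt.Println(check(decls)) // <nil>
}
```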

The string-based representation also limits what's possible. There's no type inference - x := getValue() just returns "interface{}" instead of inferring the actual return type. Generics aren't possible since there's no way to represent type variables or solve constraints. Return types also aren't validated - return expressions are type-checked, but never compared against the function's declared return type.
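A structured type representation is the usual fix. As a sketch (these type names are made up, not a proposal for your exact design), an interface with concrete node types can express things strings can't, such as a type variable that inference later solves:

```go
package main

import "fmt"

// Type is a structured representation instead of a string.
type Type interface{ String() string }

type Basic struct{ Name string } // int, string, bool, ...
type Pointer struct{ Elem Type } // *T

// TypeVar is a placeholder that inference can later bind to a real type.
type TypeVar struct {
	ID    int
	Bound Type // nil until solved
}

func (b Basic) String() string   { return b.Name }
func (p Pointer) String() string { return "*" + p.Elem.String() }
func (v TypeVar) String() string {
	if v.Bound != nil {
		return v.Bound.String()
	}
	return fmt.Sprintf("t%d", v.ID)
}

func main() {
	t := Pointer{Elem: Basic{Name: "string"}}
	fmt.Println(t) // *string
}
```

With this shape, comparing types is structural traversal rather than string equality, and "the return type of getValue()" is a real value you can propagate through `:=`.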

I've recently come across this document: https://jimmyhmiller.com/easiest-way-to-build-type-checker which looks like a good introduction to building a more robust type checker using bidirectional type checking.

Code

expect() return value is ignored

Throughout the parser, expect() returns an error but the return value is never checked:

p.expect(lexer.TOKEN_RPAREN)  // Returns error, never checked

This means parse errors are silently swallowed. If a closing paren is missing, parsing continues with garbage.
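The fix is to propagate the error at every call site. A minimal self-contained sketch (the `Parser`/`expect` shapes here are placeholders, not your parser's API):

```go
package main

import "fmt"

// Minimal parser sketch with string tokens standing in for real ones.
type Parser struct {
	toks []string
	pos  int
}

// expect consumes the next token if it matches, else returns an error.
func (p *Parser) expect(want string) error {
	if p.pos >= len(p.toks) || p.toks[p.pos] != want {
		return fmt.Errorf("expected %q at token %d", want, p.pos)
	}
	p.pos++
	return nil
}

func (p *Parser) parseCall() error {
	// Propagate instead of dropping the error on the floor.
	if err := p.expect("("); err != nil {
		return err
	}
	if err := p.expect(")"); err != nil {
		return err
	}
	return nil
}

func main() {
	p := &Parser{toks: []string{"("}} // missing ")"
	fmt.Println(p.parseCall())        // expected ")" at token 1
}
```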

Unterminated strings don't error

for l.pos < len(l.input) && l.current() != '"' {
    // Loops to EOF if no closing quote
}

An unterminated string like "hello will loop to EOF and produce a garbage token instead of a proper error.
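One way to fix it is to check after the loop whether the closing quote was actually found. A self-contained sketch (the function signature is hypothetical; your lexer tracks position differently):

```go
package main

import "fmt"

// scanString scans a string literal body; pos points just past the
// opening quote. It reports an error at EOF instead of producing a
// garbage token.
func scanString(input string, pos int) (lit string, next int, err error) {
	start := pos
	for pos < len(input) && input[pos] != '"' {
		pos++
	}
	if pos >= len(input) {
		return "", pos, fmt.Errorf("unterminated string literal starting at %d", start-1)
	}
	return input[start:pos], pos + 1, nil // skip the closing quote
}

func main() {
	lit, _, err := scanString(`"hello"`, 1)
	fmt.Println(lit, err) // hello <nil>

	_, _, err = scanString(`"hello`, 1)
	fmt.Println(err) // unterminated string literal starting at 0
}
```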

:= declarations skip type checking

ShortAssignStmt isn't handled in checkStatement, so := declarations bypass type checking entirely:

case *parser.VarDecl:
    return tc.checkVar(s)
// case *parser.ShortAssignStmt: <- missing
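Handling it mostly means inferring the right-hand side's type and binding the name in the current scope. A compact sketch with stand-in node types (your real `checkStatement` will look different):

```go
package main

import "fmt"

// Stand-in AST nodes for the two declaration forms.
type VarDecl struct{ Name, Type string }
type ShortAssignStmt struct{ Name, InferredType string }

type Checker struct{ scope map[string]string }

func (tc *Checker) checkStatement(stmt any) error {
	switch s := stmt.(type) {
	case *VarDecl:
		tc.scope[s.Name] = s.Type
		return nil
	case *ShortAssignStmt: // the previously missing case
		// In the real checker, infer the RHS type here instead of
		// taking it from the node.
		tc.scope[s.Name] = s.InferredType
		return nil
	}
	return fmt.Errorf("unhandled statement %T", stmt)
}

func main() {
	tc := &Checker{scope: map[string]string{}}
	_ = tc.checkStatement(&ShortAssignStmt{Name: "x", InferredType: "int"})
	fmt.Println(tc.scope["x"]) // int
}
```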

Byte-based lexer breaks UTF-8

The lexer indexes by byte instead of rune, which breaks UTF-8:

func (l *Lexer) current() byte {  // byte, not rune

UTF-8 characters can be 1-4 bytes, so indexing by byte gives you partial characters:

s := "héllo"
s[1]  // Returns 0xC3 (first byte of 'é'), not a valid character

unicode.IsLetter(rune(0xC3))  // false - so 'héllo' won't parse as an identifier

Identifiers and strings containing non-ASCII characters (é, 日, emoji) won't work correctly.
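The standard fix is to decode runes with `unicode/utf8` and advance by each rune's byte width. A sketch of an identifier scanner in that style (the function shape is illustrative, not your lexer's):

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// scanIdent scans an identifier starting at byte offset pos, decoding
// whole runes so multi-byte characters stay intact.
func scanIdent(input string, pos int) (ident string, next int) {
	start := pos
	for pos < len(input) {
		r, size := utf8.DecodeRuneInString(input[pos:])
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' {
			break
		}
		pos += size // advance by the rune's byte width, not by 1
	}
	return input[start:pos], pos
}

func main() {
	id, _ := scanIdent("héllo := 1", 0)
	fmt.Println(id) // héllo
}
```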

Suggestions

I'd prioritise fixing expect() error handling first - silent parse failures make everything else harder to debug.

For testing, snapshot tests (also called golden tests) work well for compilers - save the expected Go output and compare against it. This makes it easy to add test cases and catch regressions. This is what Soppo does, along with fuzzing the parser to find edge cases like unterminated strings or unexpected token sequences. Go has built-in fuzz testing with go test -fuzz that would work well for this.

For the nullable types issue, I'd step back and decide on the semantics before building more on top of it. Either ?T compiles to *T for value types, or ? is only allowed on already-nilable types. The current design doesn't map to valid Go.

I'm not sure what your process for writing this language is, but I'd suggest doing the design work first - sketch out all the features you want, how they interact, and what Go code they should generate before implementing them. It's easier to spot issues like the nullable types problem on paper than after you've written the compiler. Having a spec or design doc also makes it clearer what's a bug vs what's just not implemented yet.
