Skip to content

Conversation

@TwoF1nger
Copy link

The new parameter lastCommentStart was needed so line comments can be detected even when nested in block comments, without having to check characters we've already advanced over.

If checking is #"*" at s - 1 andalso is #"(" at s - 2 is not a problem, then lastCommentStart can be avoided altogether. I just noticed that the rest of the code only compares to non-negative offsets to s, and went along.

Comment on lines 623 to 624
loop_inComment (s + 1) {commentStart = s - 1, nesting = 1}
loop_inComment (s + 1)
{commentStart = s - 1, nesting = 1, lastCommentStart = s - 1}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I wonder if it would be simpler to just immediately check for is #")" at (s+1) ? I think then you could avoid needing lastCommentStart.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would work, but only for top-level line comments, because loop_afterOpenParen is only called at top level.

This check alone would not detect line comments that happen to be enclosed in a block comment. Another check would need to be added in loop_inComment.

Since both checks are related to comments, I found a way to have them both in loop_inComment.

@shwestrick
Copy link
Owner

This is great! Thanks for working on it.

I do think it would be preferable to avoid lastCommentStart. See my note above for a look-ahead suggestion. The look-behind strategy seems totally reasonable too; I'd be happy with either!

@shwestrick
Copy link
Owner

I did some quick testing and noticed something going wrong with indentation. An extra space is being prepended before fun:

Input:

(*) hello
fun foo () = ()

Output:

(*) hello
 fun foo () = ()

Throughout smlfmt, comments are implicitly attached to nearby tokens and then incorporated into the final layout on-the-fly. I think the line comment is attaching to fun and then a space is being spuriously added... perhaps here although I'm not 100% sure:

(Seq.map (fn x => at tab (concat (Text x, space)))

The bug might be due to how tokens are split into multiple line pieces. See this function:

fun tokenToPieces {tabWidth: int} tok =

Perhaps the line comment is being split into ManyPieces, where the second piece is an empty string? Not sure...

@TwoF1nger could you look into this?

@TwoF1nger
Copy link
Author

I did some quick testing and noticed something going wrong with indentation. An extra space is being prepended before fun:
...
@TwoF1nger could you look into this?

I'll leave this for another day, looks difficult.

Meanwhile, let me know if you have any further remarks on the lexing changes.

@TwoF1nger
Copy link
Author

I seem to have stumbled on a fix for the extra space. If loop_inLineComment doesn't consume the terminating newline, the problem goes away.

Not sure if this is the real fix, or just covers up wrong behavior in the places you mentioned. I still don't quite understand the code.

Also, another problem disappeared - an empty line after a line comment used to be filled with a bunch of spaces.

@shwestrick
Copy link
Owner

shwestrick commented Oct 9, 2025

Gotcha -- this fixes the issue with the extra space!

Testing now and it looks like layout is not able to automatically re-insert the newline. (The layout engine is treating the line-comment as though it were an inline comment.)

For example on this input:

(*) hello
fun foo () =
  (*) bar
  1 + 2 + 3

It's producing this output:

(*) hello
fun foo () = (*) bar 1 + 2 + 3

@shwestrick
Copy link
Owner

shwestrick commented Oct 9, 2025

I think I've figured out a fix. Could you try this patch on top of your most recent commit (d0cc5e0)? This seems to fix the above issue.

diff --git a/src/lex/Token.sml b/src/lex/Token.sml
index 3d31cf3..6648996 100644
--- a/src/lex/Token.sml
+++ b/src/lex/Token.sml
@@ -117,6 +117,7 @@ sig
   val isReserved: token -> bool
   val isStringConstant: token -> bool
   val isComment: token -> bool
+  val isLineComment: token -> bool
   val isWhitespace: token -> bool
   val isCommentOrWhitespace: token -> bool
   val isComma: token -> bool
@@ -444,6 +445,17 @@ struct
       Comment => true
     | _ => false
 
+  fun isLineComment tok =
+    case getClass tok of
+      Comment =>
+        let
+          val src = getSource tok
+        in
+          Source.length src >= 3 andalso Source.nth src 0 = #"("
+          andalso Source.nth src 1 = #"*" andalso Source.nth src 2 = #")"
+        end
+    | _ => false
+
   fun isWhitespace tok =
     case getClass tok of
       Whitespace => true
diff --git a/src/prettier-print/TabbedTokenDoc.sml b/src/prettier-print/TabbedTokenDoc.sml
index 4350aa8..f8d37dc 100644
--- a/src/prettier-print/TabbedTokenDoc.sml
+++ b/src/prettier-print/TabbedTokenDoc.sml
@@ -105,6 +105,12 @@ local
   fun tokenToPieces {tabWidth: int} tok =
     if not (Token.isComment tok orelse Token.isStringConstant tok) then
       OnePiece (SyntaxHighlighter.highlightToken tok)
+    else if Token.isLineComment tok then
+      (* A bit of a hack. "ManyPieces" forces a layout strategy using
+       * a fresh rigid tab which is forcibly activated, effectively creating
+       * a single whole line in the output for this line comment.
+       *)
+      ManyPieces (Seq.singleton (SyntaxHighlighter.highlightToken tok))
     else
       let
         val src = Token.getSource tok

@TwoF1nger
Copy link
Author

This seems to be sensitive to where the line comment is among the other code tokens.

The patch fixes the issue you described, but in the following, slightly different, situation the next code line is still joined with the (*) bar comment:

(*) hello
fun foo () = (*) bar
  1 + 2 + 3

becomes

(*) hello
fun foo () = (*) bar 1 + 2 + 3

Probably because the comment shares a line with the code that preceded it.
(In your example above the (*) bar comment was on a line of its own.)

Thanks a lot for helping!

@shwestrick
Copy link
Owner

Ah, interesting -- I have a suspicion of what's going wrong here, but I will have to look closer later. (I am currently at a week-long conference and will be a bit busy!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants