SuccessorML line comments #102

TwoF1nger · 2025-10-06T17:00:30Z

The new parameter lastCommentStart was needed so line comments can be detected even when nested in block comments, without having to check characters we've already advanced over.

If looking behind with is #"*" at s - 1 andalso is #"(" at s - 2 is not a problem, then lastCommentStart can be avoided altogether. I only introduced it because I noticed that the rest of the code checks characters only at non-negative offsets from s, and I followed that convention.

shwestrick · 2025-10-06T17:50:45Z

src/lex/Lexer.sml

-          loop_inComment (s + 1) {commentStart = s - 1, nesting = 1}
+          loop_inComment (s + 1)
+            {commentStart = s - 1, nesting = 1, lastCommentStart = s - 1}


Here I wonder if it would be simpler to just immediately check for is #")" at (s+1) ? I think then you could avoid needing lastCommentStart.

TwoF1nger · 2025-10-07

This would work, but only for top-level line comments, because loop_afterOpenParen is only called at top level.

This check alone would not detect line comments that happen to be enclosed in a block comment. Another check would need to be added in loop_inComment.

Since both checks are related to comments, I found a way to have them both in loop_inComment.
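
For readers following along, here is a minimal sketch of the two checks being discussed, written with look-ahead. It is not the smlfmt lexer: it scans a plain string instead of smlfmt's Source, and the names is, skipLine, scanBlock and scanComment are hypothetical. It also assumes that a line comment nested in a block comment runs to the end of its line and hides any comment closer on that line.

```sml
(* Hypothetical sketch, not the actual smlfmt code. *)

(* True when index i is in bounds and holds character c. *)
fun is (src: string) (c: char) (i: int) : bool =
  i < String.size src andalso String.sub (src, i) = c

(* Advance to the newline that ends a line comment, without consuming it. *)
fun skipLine (src: string) (i: int) : int =
  if i >= String.size src orelse String.sub (src, i) = #"\n" then i
  else skipLine src (i + 1)

(* Scan inside a block comment at depth `nesting` (at least 1).
 * Returns SOME of the index just past the final closer,
 * or NONE when the input ends first. *)
fun scanBlock (src: string) (s: int) (nesting: int) : int option =
  if s >= String.size src then
    NONE
  else if is src #"(" s andalso is src #"*" (s + 1) then
    if is src #")" (s + 2) then
      (* nested line comment: skip the rest of the line, depth unchanged *)
      scanBlock src (skipLine src (s + 3)) nesting
    else
      scanBlock src (s + 2) (nesting + 1)
  else if is src #"*" s andalso is src #")" (s + 1) then
    if nesting = 1 then SOME (s + 2)
    else scanBlock src (s + 2) (nesting - 1)
  else
    scanBlock src (s + 1) nesting

(* Precondition: a comment opener starts at index s. *)
fun scanComment (src: string) (s: int) : int option =
  if is src #")" (s + 2) then
    SOME (skipLine src (s + 3)) (* top-level line comment *)
  else
    scanBlock src (s + 2) 1

val example = scanComment "(* a (*) *) \n b *) c" 0
```

Here example evaluates to SOME 18: the closer on the first line is hidden by the nested line comment, so the block comment ends at the closer on the second line. The sketch makes the point from the reply visible: the line-comment check appears once at top level and once inside scanBlock.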

shwestrick · 2025-10-06T18:03:10Z

This is great! Thanks for working on it.

I do think it would be preferable to avoid lastCommentStart. See my note above for a look-ahead suggestion. The look-behind strategy seems totally reasonable too; I'd be happy with either!

shwestrick · 2025-10-06T18:21:01Z

I did some quick testing and noticed something going wrong with indentation. An extra space is being prepended before fun:

Input:

(*) hello
fun foo () = ()

Output:

(*) hello
 fun foo () = ()

Throughout smlfmt, comments are implicitly attached to nearby tokens and then incorporated into the final layout on-the-fly. I think the line comment is attaching to fun and then a space is being spuriously added... perhaps here although I'm not 100% sure:

smlfmt/src/base/PrettyTabbedDoc.sml

Line 664 in 5c297d2

(Seq.map (fn x => at tab (concat (Text x, space)))

The bug might be due to how tokens are split into multiple line pieces. See this function:

smlfmt/src/prettier-print/TabbedTokenDoc.sml

Line 105 in 5c297d2

fun tokenToPieces {tabWidth: int} tok =

Perhaps the line comment is being split into ManyPieces, where the second piece is an empty string? Not sure...

@TwoF1nger could you look into this?
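
To illustrate the hypothesis only (this is not how tokenToPieces is implemented, and the helper name pieces is made up): if a comment token's text ends with a newline, splitting it on newlines produces a trailing empty piece.

```sml
(* Hypothetical stand-in for splitting a token's text into line pieces. *)
fun pieces (text: string) : string list =
  String.fields (fn c => c = #"\n") text

(* Two pieces; the second one is the empty string. *)
val withNewline = pieces "(*) hello\n"

(* A single piece. *)
val withoutNewline = pieces "(*) hello"
```

If the layout code emitted each piece followed by a space, that empty second piece would be one possible source of the stray space before fun.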

TwoF1nger · 2025-10-07T08:00:20Z

I did some quick testing and noticed something going wrong with indentation. An extra space is being prepended before fun:
...
@TwoF1nger could you look into this?

I'll leave this for another day, looks difficult.

Meanwhile, let me know if you have any further remarks on the lexing changes.

TwoF1nger · 2025-10-07T14:13:54Z

I seem to have stumbled on a fix for the extra space. If loop_inLineComment doesn't consume the terminating newline, the problem goes away.

Not sure if this is the real fix, or just covers up wrong behavior in the places you mentioned. I still don't quite understand the code.

Also, another problem disappeared - an empty line after a line comment used to be filled with a bunch of spaces.
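
A small sketch of the idea, assuming a plain string in place of smlfmt's Source and a hypothetical helper lineCommentEnd (the real loop_inLineComment is not shown in this thread): the scan stops at the newline, so the comment token excludes it and the newline stays in the input that follows.

```sml
(* Hypothetical helper: exclusive end index of a line comment that starts
 * at `start`. The terminating newline is not consumed. *)
fun lineCommentEnd (src: string) (start: int) : int =
  let
    val n = String.size src
    fun go i =
      if i >= n orelse String.sub (src, i) = #"\n" then i else go (i + 1)
  in
    go start
  end

val src = "(*) hello\nfun foo () = ()\n"
val stop = lineCommentEnd src 0

(* The comment token: 9 characters, no trailing newline. *)
val comment = String.substring (src, 0, stop)

(* Everything after the token; it begins with the newline. *)
val rest = String.extract (src, stop, NONE)
```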

shwestrick · 2025-10-09T12:26:34Z

Gotcha -- this fixes the issue with the extra space!

Testing now and it looks like layout is not able to automatically re-insert the newline. (The layout engine is treating the line-comment as though it were an inline comment.)

For example on this input:

(*) hello
fun foo () =
  (*) bar
  1 + 2 + 3

It's producing this output:

(*) hello
fun foo () = (*) bar 1 + 2 + 3

shwestrick · 2025-10-09T12:27:36Z

I think I've figured out a fix. Could you try this patch on top of your most recent commit (d0cc5e0)? This seems to fix the above issue.

diff --git a/src/lex/Token.sml b/src/lex/Token.sml
index 3d31cf3..6648996 100644
--- a/src/lex/Token.sml
+++ b/src/lex/Token.sml
@@ -117,6 +117,7 @@ sig
   val isReserved: token -> bool
   val isStringConstant: token -> bool
   val isComment: token -> bool
+  val isLineComment: token -> bool
   val isWhitespace: token -> bool
   val isCommentOrWhitespace: token -> bool
   val isComma: token -> bool
@@ -444,6 +445,17 @@ struct
       Comment => true
     | _ => false
 
+  fun isLineComment tok =
+    case getClass tok of
+      Comment =>
+        let
+          val src = getSource tok
+        in
+          Source.length src >= 3 andalso Source.nth src 0 = #"("
+          andalso Source.nth src 1 = #"*" andalso Source.nth src 2 = #")"
+        end
+    | _ => false
+
   fun isWhitespace tok =
     case getClass tok of
       Whitespace => true
diff --git a/src/prettier-print/TabbedTokenDoc.sml b/src/prettier-print/TabbedTokenDoc.sml
index 4350aa8..f8d37dc 100644
--- a/src/prettier-print/TabbedTokenDoc.sml
+++ b/src/prettier-print/TabbedTokenDoc.sml
@@ -105,6 +105,12 @@ local
   fun tokenToPieces {tabWidth: int} tok =
     if not (Token.isComment tok orelse Token.isStringConstant tok) then
       OnePiece (SyntaxHighlighter.highlightToken tok)
+    else if Token.isLineComment tok then
+      (* A bit of a hack. "ManyPieces" forces a layout strategy using
+       * a fresh rigid tab which is forcibly activated, effectively creating
+       * a single whole line in the output for this line comment.
+       *)
+      ManyPieces (Seq.singleton (SyntaxHighlighter.highlightToken tok))
     else
       let
         val src = Token.getSource tok

TwoF1nger · 2025-10-09T14:18:25Z

This seems to be sensitive to where the line comment is among the other code tokens.

The patch fixes the issue you described, but in the following, slightly different, situation the next code line is still joined with the (*) bar comment:

(*) hello
fun foo () = (*) bar
  1 + 2 + 3

becomes

(*) hello
fun foo () = (*) bar 1 + 2 + 3

Probably because the comment shares a line with the code that preceded it.
(In your example above the (*) bar comment was on a line of its own.)

Thanks a lot for helping!

shwestrick · 2025-10-12T01:25:02Z

Ah, interesting -- I have a suspicion of what's going wrong here, but I will have to look closer later. (I am currently at a week-long conference and will be a bit busy!)

TwoFinger added 2 commits October 7, 2025 00:07

add SuccessorML line comments

7001a26

reformat Lexer.sml

0cee20d

shwestrick reviewed Oct 6, 2025

TwoFinger added 2 commits October 7, 2025 14:56

Revert "reformat Lexer.sml"

b58fc1e

This reverts commit 0cee20d.

rewrite without lastCommentStart

ecd88cb

fix the excessive indent after a line comment

d0cc5e0
