add benchmarking for TraverseStrategy concept #4834

Draft
alokkumardalei-wq wants to merge 2 commits into typelevel:main from alokkumardalei-wq:TraverseStrategy-check

@alokkumardalei-wq alokkumardalei-wq commented Mar 1, 2026

Motivation and Context

This PR introduces TraverseStrategy to optimize traverse and sequence operations over collections such as Vector, List, and Chain when the effect type is lazily evaluated and inherently stack-safe (such as Eval).

Previously, traverse and traverseVoid incurred redundant Eval allocations, which increased Garbage Collection (GC) pressure and added computational overhead. With the Direct and ViaEval execution paths of TraverseStrategy, a traversal can run strictly when the type is already stack-safe and skip the Eval wrapping entirely.

Furthermore, we updated the cats.instances implementations (such as List and Vector) to use a balanced divide-and-conquer recursion (runHalf) to prevent stack overflows on very large collections.

I used AI to learn about TraverseStrategy, Garbage Collection (GC) pressure in Scala, the differences between strict evaluation (List/Vector, Either) and lazy, stack-safe evaluation (Eval), and how to write and run JMH benchmarks.

Changes Made

  • Introduced TraverseStrategy: Added Direct and ViaEval paths inside cats.Apply.
  • Refactored Core Instances: Changed List and Vector to use balanced recursion (runHalf) for traverseVoid, so that large inputs stay stack-safe.
  • Added JMH Benchmarks: Created cats.bench.TraverseStrategyBench to measure the changes in throughput and allocation (b/op).

Code Highlights

To provide deeper context on the implementation structure:

1. Building the Execution Strategy (cats/Apply.scala)
We introduced TraverseStrategy with two evaluation models. Direct targets types that are already stack-safe and need no Eval wrapping, while ViaEval is the safe fallback that wraps each step in Eval.

sealed trait TraverseStrategy[G[_], A, B] {
  def traverse[F[_]](fa: F[A])(f: A => G[B])(implicit F: Traverse[F]): G[F[B]]
}

private[cats] object TraverseStrategy {
  final case class Direct[G[_], A, B](A: Apply[G]) extends TraverseStrategy[G, A, B] {
    // Fast path: bypasses intermediate Eval allocation
    def traverse[F[_]](fa: F[A])(f: A => G[B])(implicit F: Traverse[F]): G[F[B]] =
      F.traverse(fa)(f)(A)
  }

  final case class ViaEval[G[_], A, B](A: Apply[G]) extends TraverseStrategy[G, A, B] {
    // Safe-fallback pathway
    def traverse[F[_]](fa: F[A])(f: A => G[B])(implicit F: Traverse[F]): G[F[B]] =
      F.traverse(fa)(a => A.map(f(a))(Eval.now))(A).value
  }
}
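
The two-path idea can be illustrated without cats. The following is only a rough sketch under stated assumptions: `Strategy`, `Direct`, `Deferred`, and `traverseOption` are hypothetical names invented for this example (they are not the cats API or the PR's code), `Option` stands in for the effect type, and the standard library's `TailCalls` stands in for `Eval`. One path combines results in a plain loop, the other routes every step through a trampoline.

```scala
import scala.util.control.TailCalls.{TailRec, done, tailcall}

// Hypothetical, simplified model; these names are NOT the cats API.
sealed trait Strategy
case object Direct extends Strategy   // combine results in a plain loop
case object Deferred extends Strategy // route every step through a trampoline (stand-in for Eval)

object StrategySketch {
  // Appends one optional element to an optional accumulator.
  private def snoc[B](acc: Option[Vector[B]], b: Option[B]): Option[Vector[B]] =
    for (xs <- acc; x <- b) yield xs :+ x

  def traverseOption[A, B](as: Vector[A], strategy: Strategy)(f: A => Option[B]): Option[Vector[B]] =
    strategy match {
      case Direct =>
        // Fast path: no per-step wrapper objects.
        as.foldLeft(Option(Vector.empty[B]))((acc, a) => snoc(acc, f(a)))
      case Deferred =>
        // Fallback path: each step is suspended in a TailRec node.
        def go(i: Int, acc: Option[Vector[B]]): TailRec[Option[Vector[B]]] =
          if (i == as.length) done(acc)
          else tailcall(go(i + 1, snoc(acc, f(as(i)))))
        go(0, Some(Vector.empty[B])).result
    }
}
```

Both paths return the same value; in this sketch they differ only in whether intermediate wrapper objects are allocated per step, which is the cost the Direct path is meant to avoid.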

2. Fixing List & Vector Stack Overflows (cats/instances/vector.scala)
Previously, the recursion inside Vector.traverseVoid was not stack-safe. We replaced it with a runHalf divide-and-conquer implementation that splits the index range in half at each step and defers both halves through Eval, so that very large inputs no longer overflow the stack.

// We updated direct recursion to runHalf to prevent Stack Overflows
private def runHalf[A, B](as: Vector[A], start: Int, end: Int)(f: A => Eval[B]): Eval[Vector[B]] = {
  val length = end - start
  if (length == 0) Eval.now(Vector.empty)
  else if (length == 1) f(as(start)).map(_ +: Vector.empty)
  else {
    val mid = start + length / 2
    val evalLeft = Eval.defer(runHalf(as, start, mid)(f))
    val evalRight = Eval.defer(runHalf(as, mid, end)(f))
    evalLeft.flatMap(l => evalRight.map(r => l ++ r))
  }
}
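
To try the halving shape without a cats dependency, here is a minimal sketch that mirrors runHalf but uses `scala.util.control.TailCalls` from the standard library as a stand-in for `Eval`. That substitution, and the object name `RunHalfSketch`, are assumptions made for this example only; it is not the code in this PR.

```scala
import scala.util.control.TailCalls.{TailRec, done, tailcall}

object RunHalfSketch {
  // Same halving shape as runHalf, with TailRec standing in for Eval.
  def runHalf[A, B](as: Vector[A], start: Int, end: Int)(f: A => TailRec[B]): TailRec[Vector[B]] = {
    val length = end - start
    if (length == 0) done(Vector.empty[B])
    else if (length == 1) f(as(start)).map(Vector(_))
    else {
      val mid = start + length / 2
      // Both halves are suspended, so nothing recurses on the JVM stack here.
      tailcall(runHalf(as, start, mid)(f)).flatMap { l =>
        tailcall(runHalf(as, mid, end)(f)).map(r => l ++ r)
      }
    }
  }
}
```

Calling `RunHalfSketch.runHalf(Vector.range(0, 100000), 0, 100000)(a => done(a + 1)).result` runs the whole traversal on the trampoline and preserves element order, because the left half is always concatenated before the right half.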

3. Implementing Strategy Instances in Core Types (cats/data/Kleisli.scala)
We propagated the new traverseStrategy through the instances of wrapper types so that each one follows the strategy of the type it wraps. For example, Kleisli defines its strategy lazily and mirrors the stack-safety model of the inner F type:

override lazy val traverseStrategy = {
  val stratF = F.traverseStrategy

  stratF match {
    case Apply.TraverseStrategy.Direct(_) =>
      Apply.TraverseStrategy.Direct(this)
    case _ =>
      new Apply.TraverseStrategy[Kleisli[F, A, *]] {
        type Rhs[B] = A => stratF.Rhs[B]

        def map2[A0, B, C](left: Rhs[A0], right: Rhs[B])(fn: (A0, B) => C): Rhs[C] = { a =>
          val l = stratF.applyOnRhs(left, a)
          val r = stratF.applyOnRhs(right, a)
          stratF.map2(l, r)(fn)
        }
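
The map2 in the excerpt above lifts the inner strategy's map2 through a function: both sides receive the same input, and the inner map2 combines their results. A minimal standalone sketch of that pattern follows, with `Option` as a hypothetical stand-in for the inner effect; `ReaderMap2Sketch` and its `Rhs` alias are illustrative names, not part of cats.

```scala
object ReaderMap2Sketch {
  // Hypothetical stand-in: Option plays the role of the inner effect.
  type Rhs[E, B] = E => Option[B]

  // Feed the same input `e` to both sides, then combine with the inner map2
  // (written here as a for-comprehension over Option).
  def map2[E, X, Y, Z](left: Rhs[E, X], right: Rhs[E, Y])(fn: (X, Y) => Z): Rhs[E, Z] = { e =>
    val l = left(e)
    val r = right(e)
    for (x <- l; y <- r) yield fn(x, y)
  }
}
```

For example, combining `n => if (n % 2 == 0) Some(n / 2) else None` with `n => Some("n=" + n)` gives a function that returns a value for even inputs and `None` for odd ones.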
4. Safely Batching Large Chain Traversals (cats/data/Chain.scala)
Similar to Vector, Chain needs to build its internal traversal tree without overflowing the JVM stack. We used G.traverseStrategy together with a grouping factor of width = 128 when processing elements in traverseFilterViaChain:
val strat = G.traverseStrategy
val toG: A => G[List[B]] = { (a: A) =>
  G.map(f(a)) { optB =>
    if (optB.isDefined) optB.get :: Nil
    else Nil
  }
}

def loop(start: Int, end: Int): strat.Rhs[Chain[B]] =
  if (end - start <= width) {
    var flist = strat.applyToRhs(toG, as(end - 1))
    var idx = end - 2
    while (start <= idx) {
      val fa = strat.applyToRhs(f, as(idx))
      val right = flist
      flist = strat.map2(fa, right)(consB)
      idx = idx - 1
    }
    strat.mapRhs(flist)(Chain.fromSeq(_))
  } else {
    // Continues recursively processing width-chunks
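
As a rough illustration of width-based batching in general, here is a sketch that drops the effect type G and the strategy's Rhs entirely and only shows the branching shape: ranges of at most `width` elements are handled by a plain loop, and larger ranges are split into sub-ranges that are processed recursively. It is not the else branch elided above; `ChunkSketch`, `collect`, and the small `width` value are assumptions for this example.

```scala
object ChunkSketch {
  // The PR uses width = 128; a small value here makes the branching easy to trace.
  val width = 4

  def collect[A, B](as: IndexedSeq[A])(f: A => Option[B]): Vector[B] = {
    def loop(start: Int, end: Int): Vector[B] =
      if (end - start <= width) {
        // Leaf: at most `width` elements, handled by a plain loop.
        var acc = Vector.empty[B]
        var idx = start
        while (idx < end) {
          f(as(idx)).foreach(b => acc = acc :+ b)
          idx += 1
        }
        acc
      } else {
        // Branch: split into sub-ranges of size `step` and recurse on each.
        // Since end - start > width, step is at least 1 and every sub-range
        // is strictly smaller than the current range.
        val step = (end - start) / width
        var out = Vector.empty[B]
        var s = start
        while (s < end) {
          val e = math.min(end, s + step)
          out = out ++ loop(s, e)
          s = e
        }
        out
      }
    loop(0, as.length)
  }
}
```

Because each level shrinks the range by roughly a factor of `width`, the recursion depth grows with the logarithm of the input size rather than with the size itself.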

Benchmark Results

To verify these performance improvements, I ran the JMH TraverseStrategyBench suite locally. The results show a significant throughput improvement with no stack overflows, even on collections with 10,000 or more elements.

[Image: screenshot of the JMH benchmark results]

The results show that lazy types like Eval allocate far fewer bytes per operation (b/op), and therefore put less pressure on the GC, than strict types like Either in every iteration.

Demo video:

This video demonstrates how the benchmark results were produced:
Demo

Contributor

satorg commented Mar 1, 2026

Thank you for the PR!

I'd suggest addressing a couple of issues with the PR first:

  1. The PR title and description don't correspond to the PR content. The PR mentions benchmarks only, but in fact it introduces new functionality.
  2. Please avoid using screenshots for textual information. Consider markdown syntax for preformatted code blocks instead.

Author

alokkumardalei-wq commented Mar 2, 2026

Hello @satorg, thank you for your feedback.
I have adjusted the PR according to your suggestions. First, the PR title and description should now correspond to the PR content. I think this is clear in the new PR; if not, please let me know where I have gone wrong.

Second, regarding avoiding screenshots for textual information and providing markdown-formatted code blocks instead: I believe I now comply with this, except for the benchmark results, where I need to highlight the difference in b/op between the types. If there is a better way to do this, or if I have done anything wrong here, please let me know.

If everything is fine, please let me know whether the PR is ready for review.

Thank you.


He-Pin commented Mar 2, 2026

If this works, you could send a PR to scala/scala too, where TailCall works the same way.

3 participants