Deadlock between Nashorn ScriptLoader and OTel WeakConcurrentMap under concurrent script compilation on JDK 8 #17224
Unanswered
YaoYingLong
asked this question in
Q&A
Description
When using the OpenTelemetry Java Agent (tested with version 8.2604.1-RELEASE, based on opentelemetry-java-instrumentation) on JDK 8 with an application that concurrently creates and
executes Nashorn ScriptEngine instances, a JVM-level deadlock reliably occurs. The deadlock completely hangs all HTTP threads and requires a JVM restart to recover.
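The root cause is a lock-ordering inversion (AB-BA deadlock) between the intrinsic lock of Nashorn's StructureLoader and a ConcurrentHashMap bin lock inside the agent's WeakConcurrentMap (skipCache).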
Root Cause Analysis
The deadlock involves two concurrent threads acquiring the same two locks in opposite orders.
Thread A path — exec-19 (holds StructureLoader, waits for CHM$Node)
This thread is compiling a Nashorn script. During CompilationPhase$InstallPhase, Nashorn calls MethodHandles.Lookup.findStatic() which triggers a ClassLoader.loadClass() call while holding
the StructureLoader lock. The OTel agent's LoadInjectedClassInstrumentation advice intercepts this loadClass() call and invokes HelperInjector.loadHelperClass(). This calls
WeakLockFreeCache.get(), which invokes WeakConcurrentMap$WithInlinedExpunction.getIfPresent(). Before returning, this method calls the static expungeStaleEntries(), which drains the
globally shared REFERENCE_QUEUE. A dead WeakKey belonging to skipCache.target is found and AbstractWeakConcurrentMap.removeWeakKey() calls skipCache.target.remove(deadKey) →
ConcurrentHashMap.replaceNode(), which tries to acquire the bin lock for CHM$Node@0x00000005c7ab34a0 — already held by exec-33.
"http-nio-8080-exec-19" BLOCKED
at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1117)
- waiting to lock <0x00000005c7ab34a0> (a java.util.concurrent.ConcurrentHashMap$Node)
at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.removeWeakKey(AbstractWeakConcurrentMap.java:243)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.expungeStaleEntries(AbstractWeakConcurrentMap.java:236)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.getIfPresent(WeakConcurrentMap.java:193)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.get(WeakLockFreeCache.java:26)
at io.opentelemetry.javaagent.tooling.HelperInjector.loadHelperClass(HelperInjector.java:375)
at io.opentelemetry.javaagent.bootstrap.InjectedClassHelper.loadHelperClass(InjectedClassHelper.java:55)
at java.lang.ClassLoader.loadClass(ClassLoader.java:398)
at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
- locked <0x00000005c191cc18> (a jdk.nashorn.internal.runtime.StructureLoader) ← holds StructureLoader
at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
- locked <0x00000005c7cac8d8> (a jdk.nashorn.internal.runtime.ScriptLoader)
...
at jdk.nashorn.internal.codegen.CompilationPhase$InstallPhase.transform(CompilationPhase.java:523)
at jdk.nashorn.internal.codegen.Compiler.compile(Compiler.java:655)
at jdk.nashorn.internal.runtime.Context.compile(Context.java:1317)
- locked <0x00000005c2fa6f98> (a jdk.nashorn.internal.runtime.Context)
...
at com.vocust.demo.deadlock.service.DeadlockDemoService.runDynamicEvalBurst(DeadlockDemoService.java:120)
at com.vocust.demo.deadlock.service.DeadlockDemoService.transitionScenario(DeadlockDemoService.java:60)
Thread B path — exec-33 (holds CHM$Node, waits for StructureLoader)
This thread is also compiling a Nashorn script. During ScriptLoader.installClass(), the JVM calls ClassLoader.defineClass(), which is intercepted by ByteBuddy's ExecutingTransformer. This
invokes IgnoredClassLoadersMatcher.matches() to decide whether to transform the class. Since the ScriptLoader is being seen for the first time, skipCache.computeIfAbsent(scriptLoader, ...)
is called. On JDK 8, ConcurrentHashMap.computeIfAbsent holds the bin lock while executing the mapping function. Inside the mapping function, loadsExpectedClass() calls
scriptLoader.loadClass(PatchLogger.class.getName()), which delegates to StructureLoader.loadClass() and tries to acquire the StructureLoader intrinsic lock — already held by exec-19.
"http-nio-8080-exec-33" BLOCKED
at java.lang.ClassLoader.loadClass(ClassLoader.java:398)
- waiting to lock <0x00000005c191cc18> (a jdk.nashorn.internal.runtime.StructureLoader) ← waits for StructureLoader
at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
- locked <0x0000000715f59fb8> (a jdk.nashorn.internal.runtime.ScriptLoader)
at jdk.nashorn.internal.runtime.ScriptLoader.loadClass(ScriptLoader.java:55)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at io.opentelemetry.javaagent.tooling.ignore.IgnoredClassLoadersMatcher.loadsExpectedClass(IgnoredClassLoadersMatcher.java:78)
at io.opentelemetry.javaagent.tooling.ignore.IgnoredClassLoadersMatcher.delegatesToBootstrap(IgnoredClassLoadersMatcher.java:69)
at io.opentelemetry.javaagent.tooling.ignore.IgnoredClassLoadersMatcher.lambda$matches$0(IgnoredClassLoadersMatcher.java:57)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1688)
- locked <0x00000005c7ab34a0> (a java.util.concurrent.ConcurrentHashMap$Node) ← holds CHM$Node
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.AbstractWeakConcurrentMap.computeIfAbsent(AbstractWeakConcurrentMap.java:182)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.weaklockfree.WeakConcurrentMap$WithInlinedExpunction.computeIfAbsent(WeakConcurrentMap.java:218)
at io.opentelemetry.javaagent.shaded.instrumentation.api.internal.cache.WeakLockFreeCache.computeIfAbsent(WeakLockFreeCache.java:21)
at io.opentelemetry.javaagent.tooling.ignore.IgnoredClassLoadersMatcher.matches(IgnoredClassLoadersMatcher.java:45)
at net.bytebuddy.agent.builder.AgentBuilder$Default$ExecutingTransformer.doTransform(AgentBuilder.java:12425)
...
at jdk.nashorn.internal.runtime.ScriptLoader.installClass(ScriptLoader.java:98)
- locked <0x0000000715f59fb8> (a jdk.nashorn.internal.runtime.ScriptLoader)
...
at com.vocust.demo.deadlock.service.DeadlockDemoService.runDynamicEvalBurst(DeadlockDemoService.java:120)
at com.vocust.demo.deadlock.service.DeadlockDemoService.transitionScenario(DeadlockDemoService.java:60)
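The two paths above can be reduced to a small standalone sketch without Nashorn or the agent. This is an illustration of the lock inversion only, not the actual reproducer: loaderLock is a plain object standing in for the StructureLoader monitor, skipCache is a plain ConcurrentHashMap rather than the agent's weak map, and all class and thread names are made up. The threads are daemons so the JVM can still exit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockInversionDemo {

    /** Returns true once both threads are BLOCKED on the lock the other one holds. */
    static boolean reproduce() {
        final Object loaderLock = new Object(); // hypothetical stand-in for the StructureLoader monitor
        final ConcurrentHashMap<String, Boolean> skipCache = new ConcurrentHashMap<>();
        final CountDownLatch loaderLockHeld = new CountDownLatch(1);
        final CountDownLatch binLockHeld = new CountDownLatch(1);

        // Thread A (like exec-19): holds the loader monitor, then needs the bin lock.
        Thread a = new Thread(() -> {
            synchronized (loaderLock) {
                loaderLockHeld.countDown();
                awaitQuietly(binLockHeld);
                skipCache.remove("scriptLoader"); // blocks on the bin held by thread B
            }
        }, "demo-thread-a");

        // Thread B (like exec-33): holds the bin lock inside computeIfAbsent,
        // then needs the loader monitor from within the mapping function.
        Thread b = new Thread(() -> {
            awaitQuietly(loaderLockHeld);
            skipCache.computeIfAbsent("scriptLoader", k -> {
                binLockHeld.countDown();
                synchronized (loaderLock) { // blocks on the monitor held by thread A
                    return Boolean.TRUE;
                }
            });
        }, "demo-thread-b");

        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        // Poll for up to five seconds; with the latches above, both threads
        // being BLOCKED at the same time means each waits on the other.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            if (a.getState() == Thread.State.BLOCKED && b.getState() == Thread.State.BLOCKED) {
                return true;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + reproduce());
    }
}
```

The latches only force the interleaving that happens by chance in the real application; the lock order itself is the same as in the two stack traces.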
JVM deadlock detection output
Found one Java-level deadlock:
"http-nio-8080-exec-200":
waiting to lock monitor 0x000000012b9a0de0 (object 0x00000005c191cc18, a jdk.nashorn.internal.runtime.StructureLoader),
which is held by "http-nio-8080-exec-19"
"http-nio-8080-exec-19":
waiting to lock monitor 0x000000013c3620c0 (object 0x00000005c7ab34a0, a java.util.concurrent.ConcurrentHashMap$Node),
which is held by "http-nio-8080-exec-33"
"http-nio-8080-exec-33":
waiting to lock monitor 0x000000012b9a0de0 (object 0x00000005c191cc18, a jdk.nashorn.internal.runtime.StructureLoader),
which is held by "http-nio-8080-exec-19"
Found 1 deadlock.
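For anyone who wants to catch this from inside the application instead of taking a thread dump, the same information is available through the standard ThreadMXBean. A minimal sketch follows; the class and method names are made up for illustration. It uses findMonitorDeadlockedThreads() because both locks in this report are plain object monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class DeadlockProbe {

    /** One line per thread in a monitor deadlock; empty when the JVM reports none. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findMonitorDeadlockedThreads(); // null when no cycle exists
        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread terminated between the two calls
            }
            lines.add(info.getThreadName()
                + " waits for " + info.getLockName()
                + " held by " + info.getLockOwnerName());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no monitor deadlock" : String.join("\n", report));
    }
}
```

Calling describeDeadlocks() from a scheduled health check would at least turn the silent hang into an alert, although it does not help with recovery.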
Two Contributing Bugs
Bug 1 — IgnoredClassLoadersMatcher.loadsExpectedClass() calls loadClass() inside computeIfAbsent mapping function
IgnoredClassLoadersMatcher.matches() uses skipCache.computeIfAbsent(loader, ...). On JDK 8, ConcurrentHashMap.computeIfAbsent holds the bin lock during the mapping function. The mapping
function calls loadsExpectedClass() → loader.loadClass(), which acquires the ClassLoader intrinsic lock. This means the CHM bin lock is held while acquiring a ClassLoader lock — an unsafe
lock ordering.
Relevant code: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/javaagent-tooling/src/main/java/io/opentelemetry/javaagent/tooling/ignore/IgnoredClassLoadersMatcher.java#L45-L78
// Line 45
return skipCache.computeIfAbsent(cl, c -> !delegatesToBootstrap(cl));
// ...
// Line 78 — called inside the computeIfAbsent mapping function while holding CHM bin lock:
return loader.loadClass(expectedClass.getName()) == expectedClass;
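One possible direction for Bug 1 is to run the mapping function before touching the bin, and only then publish the result with putIfAbsent. The sketch below shows the idea on a plain ConcurrentHashMap. OutsideLockCache is a hypothetical class, not part of the agent, and the real skipCache is weak-keyed, so an actual patch would look different.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class OutsideLockCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
        V existing = map.get(key); // plain read, takes no bin lock
        if (existing != null) {
            return existing;
        }
        // Runs with no map lock held, so it may safely call loadClass().
        // Under contention it can run more than once for the same key.
        V computed = mappingFunction.apply(key);
        if (computed == null) {
            return null;
        }
        // The bin lock is held only for the insert itself.
        V raced = map.putIfAbsent(key, computed);
        return raced != null ? raced : computed;
    }

    public static void main(String[] args) {
        OutsideLockCache<String, Boolean> cache = new OutsideLockCache<>();
        System.out.println(cache.computeIfAbsent("loader", k -> Boolean.TRUE));
    }
}
```

The trade-off is that two threads may both evaluate the function for the same key and one result is discarded. That seems acceptable for an idempotent check like delegatesToBootstrap(), but it is an assumption about the matcher, not something verified here.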
Bug 2 — expungeStaleEntries() uses a globally shared static REFERENCE_QUEUE
AbstractWeakConcurrentMap.REFERENCE_QUEUE is private static final — shared across all WeakConcurrentMap instances in the JVM. expungeStaleEntries() is a static method that drains the queue
and calls weakKey.ownerRef.get().remove(weakKey) on the original owner map of each dead key. This means that calling get() or computeIfAbsent() on any WithInlinedExpunction instance can
trigger cleanup of any other map's dead entries — including skipCache.target.remove() — while holding an unrelated lock (e.g., a ClassLoader lock from HelperInjector's loadClass advice
path).
Relevant code: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/internal/cache/weaklockfree/AbstractWeakConcurrentMap.java#L233-L244
// expungeStaleEntries is static and drains a global queue:
public static void expungeStaleEntries() {
  Reference reference;
  while ((reference = REFERENCE_QUEUE.poll()) != null) {
    removeWeakKey((WeakKey) reference);
  }
}
// removeWeakKey routes to the key's original map, crossing map boundaries:
private static void removeWeakKey(WeakKey weakKey) {
  ConcurrentMap map = weakKey.ownerRef.get();
  if (map != null) {
    map.remove(weakKey); // may acquire lock in any map's CHM
  }
}
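To make the alternative for Bug 2 concrete, here is a rough sketch of a weak-keyed map that owns its reference queue, so that cleanup triggered by one map can only touch that map's bins. PerMapQueueWeakMap and its nested classes are hypothetical and much simpler than AbstractWeakConcurrentMap; this shows the idea only and is not a proposed patch.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PerMapQueueWeakMap<K, V> {
    // Per-instance queue: dead keys of other maps never show up here.
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    private final ConcurrentMap<Object, V> target = new ConcurrentHashMap<>();

    V get(K key) {
        Objects.requireNonNull(key);
        expungeStaleEntries();
        return target.get(new LookupKey<>(key));
    }

    V put(K key, V value) {
        Objects.requireNonNull(key);
        expungeStaleEntries();
        return target.put(new WeakKey<>(key, queue), value);
    }

    // Instance method: only removes entries that belong to this map.
    void expungeStaleEntries() {
        Reference<? extends K> dead;
        while ((dead = queue.poll()) != null) {
            target.remove(dead);
        }
    }

    // Stored key: weak reference compared by referent identity.
    static final class WeakKey<K> extends WeakReference<K> {
        private final int hash;

        WeakKey(K key, ReferenceQueue<? super K> queue) {
            super(key, queue);
            this.hash = System.identityHashCode(key);
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object other) {
            if (other == this) return true;
            K referent = get();
            if (referent == null) return false;
            if (other instanceof WeakKey) return ((WeakKey<?>) other).get() == referent;
            return other instanceof LookupKey && ((LookupKey<?>) other).key == referent;
        }
    }

    // Strong key used only for lookups, so get() creates no weak reference.
    static final class LookupKey<K> {
        final K key;

        LookupKey(K key) { this.key = key; }

        @Override public int hashCode() { return System.identityHashCode(key); }

        @Override public boolean equals(Object other) {
            if (other instanceof WeakKey) return ((WeakKey<?>) other).get() == key;
            return other instanceof LookupKey && ((LookupKey<?>) other).key == key;
        }
    }
}
```

With this layout a get() on the HelperInjector cache could no longer end up in skipCache.target.remove(). It does not stop cleanup from running while the caller holds a ClassLoader lock, so it narrows the problem rather than removing it.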
Why jdk.nashorn.* Is Not Protected
GlobalIgnoredTypesConfigurer currently only ignores the standalone Nashorn module:
// Line 125
ignoreClassLoader("org.openjdk.nashorn.internal.runtime.ScriptLoader")
The JDK 8 built-in Nashorn (jdk.nashorn.internal.runtime.ScriptLoader) is not in the ignore list, so the agent instruments its loadClass() and triggers
InjectedClassHelper.loadHelperClass() on every class load, creating the Thread A path.
Suggested Fixes
Fix 1 (immediate / low risk): Add jdk.nashorn.internal.runtime.ScriptLoader to GlobalIgnoredTypesConfigurer.ignoreClassLoader():
ignoreClassLoader("jdk.nashorn.internal.runtime.ScriptLoader")
ignoreClassLoader("org.openjdk.nashorn.internal.runtime.ScriptLoader") // already present