Does Valhalla help? (JDK 27 EA, JEP 401 value records)
Every component in japes is a record, which means the backing
component storage is a reference array: reading a Position is a
pointer chase, and writing one allocates a fresh heap record.
Valhalla's JEP 401 allows a flat layout for value records: the
same backing array can then behave like a flat float[], and loads
become plain array indexing.
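The difference between the two layouts can be imitated on a stock JDK. The sketch below is only an illustration: LayoutSketch, flatten and the two sum methods are hypothetical names, and the hand-rolled float[] merely mimics what a flattened value-record array could look like. It is not how japes or the JVM implements storage.

```java
public class LayoutSketch {
    // Stock-JDK component: every instance is its own heap object.
    public record Position(float x, float y) {}

    // Reference-array storage: each slot holds a pointer to a Position.
    public static float sumXReference(Position[] storage) {
        float sum = 0f;
        for (Position p : storage) {
            sum += p.x();            // load the reference, then load the field
        }
        return sum;
    }

    // Hand-flattened storage (assumption: x and y interleaved in one float[]).
    public static float sumXFlat(float[] storage) {
        float sum = 0f;
        for (int i = 0; i < storage.length; i += 2) {
            sum += storage[i];       // plain array indexing, no dereference
        }
        return sum;
    }

    // Copies a reference array into the interleaved layout used above.
    public static float[] flatten(Position[] storage) {
        float[] flat = new float[storage.length * 2];
        for (int i = 0; i < storage.length; i++) {
            flat[2 * i] = storage[i].x();
            flat[2 * i + 1] = storage[i].y();
        }
        return flat;
    }

    public static void main(String[] args) {
        Position[] refs = {
            new Position(1f, 2f), new Position(3f, 4f), new Position(5f, 6f)
        };
        System.out.println(sumXReference(refs));      // prints 9.0
        System.out.println(sumXFlat(flatten(refs)));  // prints 9.0
    }
}
```

Both loops compute the same sum; only the memory access pattern differs, which is the effect the benchmarks on this page try to measure.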
What this page measures
The ecs-benchmark-valhalla module ports every japes benchmark
with value record components and runs them against a Valhalla
EA build (openjdk 27-jep401ea3). The two runs use the same
Java source, same --enable-preview, same JMH settings, same
tier-1 generator — only the component declaration
(record vs value record) and the runtime JVM differ.
Results
Numbers are µs/op, lower is better. japes-v is japes on the
Valhalla EA JVM with value records. Δ is the japes time divided by
the japes-v time, so values above 1× mean Valhalla is faster.
Copied verbatim from DEEP_DIVE.md.
| benchmark | case | japes | japes-v | Δ |
|---|---|---|---|---|
| iterateSingleComponent | 10k | 2.43 | 1.06 | 2.29× real |
| iterateSingleComponent | 100k | 34.4 | 9.31 | 3.69× real |
| iterateTwoComponents | 10k | 4.33 | 1.85 | 2.34× real |
| iterateTwoComponents | 100k | 65.4 | 20.0 | 3.27× real |
| iterateWithWrite | 10k | 38.5 | 53.2 | 0.72× slower |
| iterateWithWrite | 100k | 377 | 536 | 0.70× slower |
| NBody simulateOneTick | 1k | 4 | 5.84 | 0.68× slower |
| NBody simulateOneTick | 10k | 41 | 57.3 | 0.72× slower |
| NBody simulateTenTicks | 10k | 399 | 577 | 0.69× slower |
| ParticleScenario tick | 10k | 107 | 180 | 0.59× slower |
| SparseDelta tick | 10k | 1.88 | 1.96 | 0.96× slower |
| RealisticTick tick | 10k / st | 5.86 | 11.9 | 0.49× slower |
| RealisticTick tick | 10k / mt | 10.3 | 17.8 | 0.58× slower |
The reads tell the real story
The DCE trap and why this table has a real suffix
An earlier revision of the japes iteration benchmarks had empty
system bodies (void iterate(@Read Position p) {}), which let
the JIT prove the load was unused and eliminate the whole
iteration as dead code. The previously reported "20–80×" speedups
were measuring nothing. With a JMH Blackhole consumer on every
read system (bh.consume(pos)), the JIT has to actually touch
each element, and the real Valhalla number comes out:
roughly 2.3× faster on reads at 10k entities and 3.3–3.7× at 100k.
That's the JEP 401 flat-array layout paying off exactly where it should: sequential dense iteration over primitive-backed storage.
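The trap can be sketched without JMH on the classpath. In the example below, Sink is a hypothetical stand-in for JMH's Blackhole, and DceSketch and its methods are illustrative names only. A real benchmark should keep using Blackhole.consume, which is built to defeat more optimisations than a single volatile field can.

```java
public class DceSketch {
    public record Position(float x, float y) {}

    // Hypothetical stand-in for JMH's Blackhole: folding each value into a
    // volatile field gives the loop an observable side effect.
    public static final class Sink {
        private volatile int bits;
        private int consumed;

        public void consume(Position p) {
            bits ^= Float.floatToIntBits(p.x()) ^ Float.floatToIntBits(p.y());
            consumed++;
        }

        public int consumed() {
            return consumed;
        }
    }

    // Shape of the old benchmark body: the loaded element is never used, so
    // the JIT is free to drop the load and possibly the whole loop.
    public static void iterateEmpty(Position[] storage) {
        for (Position p : storage) {
            // intentionally empty
        }
    }

    // Shape of the fixed benchmark body: every element reaches the sink.
    public static void iterateConsumed(Position[] storage, Sink sink) {
        for (Position p : storage) {
            sink.consume(p);
        }
    }

    public static void main(String[] args) {
        Position[] storage = { new Position(1f, 2f), new Position(3f, 4f) };
        Sink sink = new Sink();
        iterateEmpty(storage);
        iterateConsumed(storage, sink);
        System.out.println(sink.consumed());  // prints 2
    }
}
```

Timing iterateEmpty says little about storage layout, because nothing forces the elements to be read; iterateConsumed is the shape whose timings are worth comparing across JVMs.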
What the table actually shows:
- Reads: big and real. At 100k entities Valhalla finishes iterateSingleComponent in ~27% of stock japes's time (3.69×), and iterateTwoComponents in ~31% (3.27×). The flat backing layout turns every read into a direct aaload against a primitive region instead of a pointer chase + field load on a heap record, and the tier-1 generator's tight chunk loop inlines cleanly on top of it. This is the biggest cross-JVM number in the whole project.
- Writes: stock is now faster. iterateWithWrite and NBody are ~30% faster on stock JDK 26 than under Valhalla EA. Writes still allocate new Position(...) (either a flat value or a heap record depending on the JVM), and the Valhalla EA JIT appears to regress on the write path: the reference-array store that stock HotSpot has optimised for years still outpaces it, so these loops are hurt rather than helped by Valhalla.
- Scenarios: Valhalla regresses. ParticleScenario is 68% slower under Valhalla, RealisticTick st 103% slower, RealisticTick mt 73% slower. SparseDelta has tightened to 4% slower, down from the 40% gap seen in earlier rounds. The PR's ChangeTracker.swapRemove fix and the concurrent ArchetypeGraph cache both trim Valhalla overhead disproportionately, because the EA JIT was amplifying the pre-fix hot paths. GC profiling still shows Valhalla allocating ~2× more per op on the scenario benchmarks than stock japes; the residual regression comes from value records crossing the erased Record parameter of World.setComponent, which forces the JVM to box the value into a heap wrapper even though the storage layer is value-aware.
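The Δ column and the "% slower" figures in the bullets are two views of the same pair of timings, which is easy to mix up. A minimal sketch of the arithmetic, with hypothetical helper names (DeltaSketch, ratio, percentSlower) and inputs taken from the results table:

```java
public class DeltaSketch {
    // Δ column: stock time divided by Valhalla time (above 1 means Valhalla wins).
    public static double ratio(double stockMicros, double valhallaMicros) {
        return stockMicros / valhallaMicros;
    }

    // Extra time Valhalla needs relative to stock, in percent.
    public static double percentSlower(double stockMicros, double valhallaMicros) {
        return (valhallaMicros / stockMicros - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // iterateSingleComponent 100k: 34.4 vs 9.31 µs/op
        System.out.printf("read ratio: %.2fx%n", ratio(34.4, 9.31));
        // ParticleScenario tick 10k: 107 vs 180 µs/op
        System.out.printf("scenario: %.0f%% slower%n", percentSlower(107, 180));
        // iterateWithWrite 10k: 38.5 vs 53.2 µs/op
        System.out.printf("write ratio: %.2fx%n", ratio(38.5, 53.2));
    }
}
```

A ratio of 0.72× therefore means stock needs about 28% less time, while the same pair of numbers read the other way round makes Valhalla about 38% slower.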
Does an explicit flat-array opt-in fix it?
JEP 401 EA exposes an experimental flat-array allocator at
jdk.internal.value.ValueClass.newNullRestrictedNonAtomicArray(Class, int, Object)
plus a class-level
@jdk.internal.vm.annotation.LooselyConsistentValue opt-in.
Both are wired into DefaultComponentStorage and the Valhalla
benchmark records (see the DefaultComponentStorage static
initialiser — it's gated behind
-Dzzuegg.ecs.useFlatStorage=true so it's off by default). The
resulting backing array genuinely is flat
(ValueClass.isFlatArray(arr) == true, verified in-process), but
in an A/B comparison on the same JVM it was measurably worse:
| benchmark | flat OFF | flat ON | Δ |
|---|---|---|---|
| iterateTwoComponents 10k | 1.79 | 6.18 | 3.4× slower |
| iterateTwoComponents 100k | 18.4 | 64.3 | 3.5× slower |
| RealisticTick st | 14.0 | 16.3 | 16% slower |
| SparseDelta | 2.57 | 2.49 | noise |
The EA JIT evidently doesn't yet emit optimised get/set code for flat null-restricted arrays: the flat layout is in place, but accessing it goes through a slower path than the reference-array fallback that the JIT has had longer to optimise. All the real Valhalla wins above (the 2–4× reads) come from the reference-array path, where the JIT scalar-replaces well and the value-record layout wins through escape analysis instead of through an explicit flat backing. The opt-in is there and correct; it'll become the right default once the Valhalla JIT's flat-array path catches up with its reference-array path.
SparseDelta is within noise. The bottleneck is change-tracker
bookkeeping, not component reads, so there's nothing for Valhalla
to flatten.
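For readers who want a feel for how such a property-gated opt-in can be wired, here is a minimal sketch. It is not the DefaultComponentStorage code: FlatStorageGateSketch and newBackingArray are hypothetical names, and the reflective lookup is an assumption chosen so the example also compiles and runs on a stock JDK, where the internal allocator is missing or inaccessible and the reference-array fallback is used. Only the property name and the allocator's name and parameter list come from the text above.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Method;

public class FlatStorageGateSketch {
    public record Position(float x, float y) {}

    // Property name taken from the text; off unless explicitly set to "true".
    // (Read per call here for simplicity, not once in a static initialiser.)
    public static boolean useFlatStorage() {
        return Boolean.getBoolean("zzuegg.ecs.useFlatStorage");
    }

    // Returns a backing array for the component type. When the gate is on, the
    // EA-only allocator is looked up reflectively (an assumption of this sketch);
    // any failure falls back to a plain reference array with null slots.
    public static Object[] newBackingArray(Class<?> type, int length, Object initial) {
        if (useFlatStorage()) {
            try {
                Class<?> valueClass = Class.forName("jdk.internal.value.ValueClass");
                Method alloc = valueClass.getMethod(
                        "newNullRestrictedNonAtomicArray",
                        Class.class, int.class, Object.class);
                return (Object[]) alloc.invoke(null, type, length, initial);
            } catch (ReflectiveOperationException | RuntimeException | LinkageError e) {
                // Allocator missing, not exported, or type rejected: use the fallback.
            }
        }
        return (Object[]) Array.newInstance(type, length);
    }

    public static void main(String[] args) {
        Object[] storage = newBackingArray(Position.class, 4, new Position(0f, 0f));
        System.out.println(storage.getClass().getSimpleName() + " x " + storage.length);
    }
}
```

Keeping the gate off by default matches the measurements above: the reference-array path is currently the faster one, so the flat path stays an explicit experiment.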
Predator / prey under Valhalla
The relations scenario (PredatorPreyForEachPairBenchmarkValhalla,
in the ecs-benchmark-valhalla module) ports the benchmark to
@LooselyConsistentValue value record Position, Velocity,
Predator, Prey. The Hunting relation payload stays a plain
record because it lives in TargetSlice.values, an Object[]
inside the relation store, not in a flat ComponentStorage — so
there is nothing to flatten on the payload side. Same scheduler,
same @ForEachPair dispatch, same tier-1 generator, same grid
parameters as the stock benchmark.
| predators × prey | Stock JDK 26 | Valhalla EA (value records, ref arrays) | Valhalla EA (value records, flat arrays) |
|---|---|---|---|
| 100 × 500 | 6.3 µs | 6.7 µs (+6 %) | 18.6 µs (+195 %) |
| 100 × 2000 | 14.0 µs | 14.3 µs (+2 %) | 25.8 µs ( +85 %) |
| 100 × 5000 | 26.4 µs | 26.7 µs (+1 %) | 37.7 µs ( +43 %) |
| 500 × 500 | 22.1 µs | 25.0 µs (+13 %) | 80.9 µs (+266 %) |
| 500 × 2000 | 31.7 µs | 33.9 µs (+7 %) | 90.0 µs (+184 %) |
| 500 × 5000 | 55.9 µs | 57.9 µs (+4 %) | 108.8 µs ( +95 %) |
| 1000 × 500 | 43.1 µs | 48.9 µs (+13 %) | 161.0 µs (+274 %) |
| 1000 × 2000 | 55.3 µs | 61.1 µs (+10 %) | 169.3 µs (+206 %) |
| 1000 × 5000 | 88.4 µs | 93.1 µs (+5 %) | 195.7 µs (+121 %) |
Two things jump out.
Value-record + reference-array storage is essentially a tie with
stock. Declaring Position / Velocity as value record with
@LooselyConsistentValue while keeping the backing storage a
plain reference array costs between 1 and 13% across every cell,
well inside the JMH error bars at most cells. For this workload
the value-record declaration alone gives no measurable win:
pursuit's inner body is so tight (two component reads, one write,
one payload read, one invokevirtual) that the tier-1 generator
already lets the JIT scalar-replace short-lived Position /
Velocity instances on both JVMs. Nothing left for value
semantics to recover.
Flat-array storage is a 1.4×–3.7× regression at every grid cell, matching the same warning already documented on the iteration micro-benchmarks. The absolute overhead scales with predator count, not with prey count:
| predators | 500 prey Δ | 2000 prey Δ | 5000 prey Δ |
|---|---|---|---|
| 100 | +12.3 µs | +11.8 µs | +11.3 µs |
| 500 | +58.8 µs | +58.3 µs | +52.9 µs |
| 1000 | +117.9 µs | +114.0 µs | +107.3 µs |
That shape fingerprints the overhead as per-pair component access:
~predators × 3 pairs × (2 reads + 1 write) of flat-array I/O per
tick, roughly +13 ns per access above the reference-array fast
path. The unoptimised EA JIT code for flat get/set dominates
everything the tier-1 pair runner was built to eliminate.
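The per-access estimate can be reproduced from the tables. The sketch below assumes, as the text does, about 3 pairs per predator and 3 flat-array accesses per pair (2 reads + 1 write); FlatOverheadSketch and nanosPerAccess are hypothetical names, and the timings are the stock and flat-array columns of the grid above.

```java
public class FlatOverheadSketch {
    static final int PAIRS_PER_PREDATOR = 3;  // "~3 pairs" per predator, from the text
    static final int ACCESSES_PER_PAIR = 3;   // 2 reads + 1 write

    // Extra nanoseconds per flat-array access, relative to stock JDK 26.
    public static double nanosPerAccess(int predators, double stockMicros, double flatMicros) {
        double extraNanos = (flatMicros - stockMicros) * 1_000.0;
        long accesses = (long) predators * PAIRS_PER_PREDATOR * ACCESSES_PER_PAIR;
        return extraNanos / accesses;
    }

    public static void main(String[] args) {
        // 100 x 500, 500 x 500 and 1000 x 5000 cells of the grid.
        System.out.printf("%.1f ns%n", nanosPerAccess(100, 6.3, 18.6));
        System.out.printf("%.1f ns%n", nanosPerAccess(500, 22.1, 80.9));
        System.out.printf("%.1f ns%n", nanosPerAccess(1000, 88.4, 195.7));
    }
}
```

Across these cells the estimate lands between roughly 12 and 14 ns per access, which is where the "roughly +13 ns" figure comes from.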
The upshot is the same conclusion the earlier sections reach:
value records themselves cost nothing, the value-record layout
hasn't yet unlocked a new win on top of the existing tier-1
generator for short-lived component shapes, and flat-array storage
remains gated behind -Dzzuegg.ecs.useFlatStorage=true until the
Valhalla JIT matures. Filed as a re-benchmark target for every
future EA drop.
Honest takeaway
Under JEP 401 EA, Valhalla hands japes a real ~3× speedup on
read-heavy iteration (the biggest single gain in the project), but
write-heavy integration loops (iterateWithWrite, NBody) run ~30%
faster on stock JDK 26, and Valhalla is still a net
regression on change-detection scenario benchmarks that exercise
setComponent heavily. Counter-intuitively, the explicit
flat-array opt-in (newNullRestrictedNonAtomicArray +
@LooselyConsistentValue) makes things worse today because the
EA JIT hasn't optimised the flat-access path yet; the real wins
come from the reference-array fallback where the JIT can
scalar-replace through escape analysis. Both code paths are
implemented and A/B-tested in the repo; the flat opt-in will
become the right default once the Valhalla JIT catches up.
"Just set the JVM to Valhalla" is not a free performance switch today, but the read-side numbers are very compelling, and the trajectory is clearly favourable.
Reproducing
```bash
./gradlew :benchmark:ecs-benchmark-valhalla:jmhJar
$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar
```
Opt-in flat-array A/B:
```bash
# flat OFF (default: reference arrays with value records)
$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar

# flat ON (experimental: flat null-restricted arrays)
$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -Dzzuegg.ecs.useFlatStorage=true \
  -Dzzuegg.ecs.debugFlat=true \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar
```
Relations scenario under Valhalla:
```bash
$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar \
  "PredatorPreyForEachPairBenchmarkValhalla" \
  -p predatorCount=100,500,1000 -p preyCount=500,2000,5000
```