Does Valhalla help? (JDK 27 EA, JEP 401 value records)

Every component in japes is a record, which means the backing component storage is a reference array: reading a Position is a pointer chase and writing one allocates a fresh heap record. Valhalla's JEP 401 promises flat layout for value records: the same backing array becomes, in effect, a flat float[], and loads become plain array indexing.
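
For concreteness, a minimal sketch of the two component declarations being compared (the Position fields here are assumptions; the value-record form matches the Valhalla module's @LooselyConsistentValue records described further down):

// stock japes: identity record, reference-array backing, reads pointer-chase
record Position(float x, float y) { }

// Valhalla port (JEP 401, --enable-preview): identity-free value record
@jdk.internal.vm.annotation.LooselyConsistentValue
value record Position(float x, float y) { }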

What this page measures

The ecs-benchmark-valhalla module ports every japes benchmark to value record components and runs them against a Valhalla EA build (openjdk 27-jep401ea3). The two runs use the same Java source, same --enable-preview, same JMH settings, same tier-1 generator; only the component declaration (record vs value record) and the runtime JVM differ.

Results

Numbers are µs/op, lower is better. japes-v is japes on the Valhalla EA JVM with value records. Copied verbatim from DEEP_DIVE.md.

benchmark               case       japes   japes-v   Δ
iterateSingleComponent  10k         2.43      1.06   2.29× real
iterateSingleComponent  100k        34.4      9.31   3.69× real
iterateTwoComponents    10k         4.33      1.85   2.34× real
iterateTwoComponents    100k        65.4      20.0   3.27× real
iterateWithWrite        10k         38.5      53.2   0.72× slower
iterateWithWrite        100k         377       536   0.70× slower
NBody simulateOneTick   1k             4      5.84   0.68× slower
NBody simulateOneTick   10k           41      57.3   0.72× slower
NBody simulateTenTicks  10k          399       577   0.69× slower
ParticleScenario tick   10k          107       180   0.59× slower
SparseDelta tick        10k         1.88      1.96   0.96× slower
RealisticTick tick      10k / st    5.86      11.9   0.49× slower
RealisticTick tick      10k / mt    10.3      17.8   0.58× slower

The reads tell the real story

The DCE trap and why this table has a "real" suffix

An earlier revision of the japes iteration benchmarks had empty system bodies (void iterate(@Read Position p) {}), which let the JIT prove the loads were unused and dead-code-eliminate the whole iteration; the previously reported "20–80×" speedups were measuring nothing. With a JMH Blackhole consumer on every read system (bh.consume(pos)), the JIT has to actually touch each element, and the real Valhalla number comes out: 2.3–3.7× faster on reads once you scale past 10k entities.
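
In JMH terms the fix is one consumer call (a sketch; bh is the injected org.openjdk.jmh.infra.Blackhole, and the @Read system shape is the one quoted above):

// before: empty body, the JIT proves the load dead and deletes the loop
void iterate(@Read Position pos) { }

// after: every read escapes into the Blackhole, forcing a real per-element load
void iterate(@Read Position pos) { bh.consume(pos); }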

That's JEP 401 value semantics paying off exactly where they should: sequential dense iteration over value-record component storage.

What the table actually shows:

  • Reads — big and real. At 100k entities Valhalla finishes iterateSingleComponent in ~27% of stock japes's time (3.69×), and iterateTwoComponents in ~31% (3.27×). Identity-free value records let the JIT treat each element load as plain data instead of a pointer chase plus a field load on a heap record, and the tier-1 generator's tight chunk loop inlines cleanly on top of that. This is the biggest cross-JVM number in the whole project.
  • Writes — stock is now faster. iterateWithWrite and NBody are ~30% faster on stock JDK 26 than under Valhalla EA. Writes still allocate new Position(...) (a flat value or a heap record, depending on the JVM), and the Valhalla EA JIT regresses on the store path: the reference-array store that stock JDK 26 has optimised for years currently beats the EA build's write path outright.
  • Scenarios — Valhalla regresses. ParticleScenario is 68% slower under Valhalla, RealisticTick st 103% slower, RealisticTick mt 73% slower. SparseDelta has tightened to 4% slower, down from the 40% gap seen in earlier rounds: the PR's ChangeTracker.swapRemove fix and the concurrent ArchetypeGraph cache trim Valhalla overhead disproportionately, because the EA JIT was amplifying the pre-fix hot paths. GC profiling still shows Valhalla allocating ~2× more per op on the scenario benchmarks than stock japes; the residual regression comes from value records crossing the erased Record parameter of World.setComponent, which forces the JVM to box each value into a heap wrapper even though the storage layer is value-aware (see the sketch below).
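
A sketch of that boxing boundary (the setComponent signature below is an assumption, inferred from the erased-Record description above; only the boxing behaviour is the point):

// hypothetical signature: T erases to the reference type java.lang.Record
<T extends Record> void setComponent(int entityId, T component);

// even with value-aware storage underneath, passing a value record through the
// erased reference parameter buffers it into its heap ("boxed") form at the call
world.setComponent(id, new Position(x, y));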

Does an explicit flat-array opt-in fix it?

JEP 401 EA exposes an experimental flat-array allocator, jdk.internal.value.ValueClass.newNullRestrictedNonAtomicArray(Class, int, Object), plus a class-level @jdk.internal.vm.annotation.LooselyConsistentValue opt-in. Both are wired into DefaultComponentStorage and the Valhalla benchmark records; the allocation lives in the DefaultComponentStorage static initialiser and is gated behind -Dzzuegg.ecs.useFlatStorage=true, so it's off by default. The resulting backing array genuinely is flat (ValueClass.isFlatArray(arr) == true, verified in-process), but in an A/B comparison on the same JVM it was measurably worse:

Numbers are again µs/op, lower is better.

benchmark                  flat OFF   flat ON   Δ
iterateTwoComponents 10k       1.79      6.18   3.4× slower
iterateTwoComponents 100k      18.4      64.3   3.5× slower
RealisticTick st               14.0      16.3   16% slower
SparseDelta                    2.57      2.49   noise

The EA JIT clearly hasn't yet emitted optimised get/set code for flat null-restricted arrays — the flat layout is in place but accessing it goes through a slower path than the reference-array fallback that the JIT has had longer to optimise. All the real Valhalla wins above (the 2–4× reads) come from the reference-array path, where the JIT scalar-replaces well and the value-record layout wins through escape analysis instead of through an explicit flat backing. The opt-in is there and correct; it'll become the right default once the Valhalla JIT's flat-array path catches up with its reference-array path.
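
For reference, a sketch of how the opt-in is wired (capacity and the zero-default Position are illustrative; the API names, the property gate and the flatness check are the ones described above):

static Object[] newBacking(int capacity) {
    if (Boolean.getBoolean("zzuegg.ecs.useFlatStorage")) {
        // experimental: flat, null-restricted, non-atomic backing array
        Object[] arr = jdk.internal.value.ValueClass.newNullRestrictedNonAtomicArray(
                Position.class, capacity, new Position(0f, 0f));
        assert jdk.internal.value.ValueClass.isFlatArray(arr); // the in-process check quoted above
        return arr;
    }
    // reference-array fallback: the default, and on this EA build the faster path
    return new Position[capacity];
}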

SparseDelta is within noise. The bottleneck is change-tracker bookkeeping, not component reads, so there's nothing for Valhalla to flatten.

Predator / prey under Valhalla

The relations scenario (PredatorPreyForEachPairBenchmarkValhalla, in the ecs-benchmark-valhalla module) ports the benchmark to @LooselyConsistentValue value records for Position, Velocity, Predator, and Prey. The Hunting relation payload stays a plain record because it lives in TargetSlice.values, an Object[] inside the relation store, not in a flat ComponentStorage — so there is nothing to flatten on the payload side. Same scheduler, same @ForEachPair dispatch, same tier-1 generator, same grid parameters as the stock benchmark.
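
Sketched out (the record fields are assumptions; the annotations and the plain-record Hunting follow the text):

@LooselyConsistentValue value record Position(float x, float y) { }
@LooselyConsistentValue value record Velocity(float dx, float dy) { }
@LooselyConsistentValue value record Predator() { }
@LooselyConsistentValue value record Prey() { }

// Hunting stays an identity record: its instances live in TargetSlice.values,
// an Object[] inside the relation store, so there is nothing to flatten here
record Hunting(float strength) { } // payload field is a placeholder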

Both Valhalla columns use value record components; they differ only in backing storage.

predators × prey   Stock JDK 26   Valhalla EA (ref arrays)   Valhalla EA (flat arrays)
100 × 500                6.3 µs            6.7 µs (+6 %)           18.6 µs (+195 %)
100 × 2000              14.0 µs            14.3 µs (+2 %)           25.8 µs (+85 %)
100 × 5000              26.4 µs            26.7 µs (+1 %)           37.7 µs (+43 %)
500 × 500               22.1 µs           25.0 µs (+13 %)           80.9 µs (+266 %)
500 × 2000              31.7 µs            33.9 µs (+7 %)           90.0 µs (+184 %)
500 × 5000              55.9 µs            57.9 µs (+4 %)          108.8 µs (+95 %)
1000 × 500              43.1 µs           48.9 µs (+13 %)          161.0 µs (+274 %)
1000 × 2000             55.3 µs           61.1 µs (+10 %)          169.3 µs (+206 %)
1000 × 5000             88.4 µs            93.1 µs (+5 %)          195.7 µs (+121 %)

Two things jump out.

Value-record + reference-array storage is essentially a tie with stock. Declaring Position / Velocity as value record with @LooselyConsistentValue while keeping the backing storage a plain reference array costs between 0 and 13% per cell, inside the JMH error bars for most cells. For this workload the value-record declaration alone gives no measurable win: pursuit's inner body is so tight (two component reads, one write, one payload read, one invokevirtual) that the tier-1 generator already lets the JIT scalar-replace short-lived Position / Velocity instances on both JVMs, leaving nothing for value semantics to recover.

Flat-array storage is a 1.4×–3.7× regression at every grid cell, matching the same warning already documented on the iteration micro-benchmarks. The absolute overhead scales with predator count, not with prey count:

predators   Δ at 500 prey   Δ at 2000 prey   Δ at 5000 prey
100              +12.3 µs         +11.8 µs         +11.3 µs
500              +58.8 µs         +58.0 µs         +52.9 µs
1000            +117.9 µs        +114.0 µs        +107.3 µs

That shape fingerprints the overhead as per-pair component access: ~predators × 3 pairs × (2 reads + 1 write) of flat-array I/O per tick, roughly +13 ns per access above the reference-array fast path. The unoptimised EA JIT code for flat get/set dominates everything the tier-1 pair runner was built to eliminate.
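
Worked through for the 1000 × 500 cell: 1000 predators × 3 pairs × 3 accesses ≈ 9 000 flat accesses per tick, and 9 000 × ~13 ns ≈ 117 µs, which is the +117.9 µs delta in the table above.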

The upshot is the same conclusion the earlier sections reach: value records themselves cost nothing, the value-record layout hasn't yet unlocked a new win on top of the existing tier-1 generator for short-lived component shapes, and flat-array storage remains gated behind -Dzzuegg.ecs.useFlatStorage=true until the Valhalla JIT matures. Filed as a re-benchmark target for every future EA drop.

Honest takeaway

Under JEP 401 EA, Valhalla hands japes a real ~3× speedup on read-heavy iteration (the biggest single gain in the project), runs ~30% slower on write-heavy and dense integration loops (iterateWithWrite, NBody), and is still a net regression on the change-detection scenario benchmarks that exercise setComponent heavily. Counter-intuitively, the explicit flat-array opt-in (newNullRestrictedNonAtomicArray + @LooselyConsistentValue) makes things worse today because the EA JIT hasn't optimised the flat-access path yet; the real wins come from the reference-array fallback, where the JIT can scalar-replace through escape analysis. Both code paths are implemented and A/B-tested in the repo; the flat opt-in will become the right default once the Valhalla JIT catches up.

"Just set the JVM to Valhalla" is not a free performance switch today but the read-side numbers are very compelling, and the trajectory is clearly favourable.

Reproducing

./gradlew :benchmark:ecs-benchmark-valhalla:jmhJar

$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar

Opt-in flat-array A/B:

# flat OFF (default — reference arrays with value records)
$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar

# flat ON (experimental — flat null-restricted arrays)
$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -Dzzuegg.ecs.useFlatStorage=true \
  -Dzzuegg.ecs.debugFlat=true \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar

Relations scenario under Valhalla:

$VALHALLA_HOME/bin/java --enable-preview \
  --add-exports java.base/jdk.internal.value=ALL-UNNAMED \
  --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED \
  -jar benchmark/ecs-benchmark-valhalla/build/libs/ecs-benchmark-valhalla-jmh.jar \
  "PredatorPreyForEachPairBenchmarkValhalla" \
  -p predatorCount=100,500,1000 -p preyCount=500,2000,5000