Iteration micro-benchmark

Tight per-entity loop, nothing else going on. The read paths measure raw query + iteration cost; iterateWithWrite writes back a mutated Position per entity.

What this workload measures

iterateSingleComponent — one @Read Position across every entity. Pure chunk-walk + component load.
iterateTwoComponents — @Read Position + @Read Velocity. Two loads per entity per chunk.
iterateWithWrite — @Read Velocity + @Write Mut<Position> with p.set(new Position(cur.x() + v.dx(), cur.y() + v.dy())) per entity. Stresses the write-path plus change-tracker bookkeeping.
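As a rough illustration of the three shapes listed above, the sketch below declares its own stand-in Read, Write, and Mut types so that it compiles without the library. Those stand-ins, the get() accessor and changed() flag on Mut, the checksum field, and the IterationSystems class name are assumptions made for illustration, not the library's actual definitions. Only the method names, the parameter annotations, and the p.set(...) expression come from the list above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-ins so the sketch is self-contained. The real library's
// annotations and Mut type may differ in package, shape, and semantics.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Read {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Write {}

// Hypothetical mutable handle: set() replaces the value and flags a change.
final class Mut<C> {
    private C value;
    private boolean changed;

    Mut(C initial) { this.value = initial; }

    C get() { return value; }

    void set(C next) {
        value = next;
        changed = true;
    }

    boolean changed() { return changed; }
}

record Position(float x, float y) {}
record Velocity(float dx, float dy) {}

final class IterationSystems {
    // Stand-in sink for the read paths; the benchmark itself uses a JMH Blackhole.
    float checksum;

    // One component loaded per entity.
    void iterateSingleComponent(@Read Position p) {
        checksum += p.x() + p.y();
    }

    // Two components loaded per entity.
    void iterateTwoComponents(@Read Position p, @Read Velocity v) {
        checksum += p.x() + p.y() + v.dx() + v.dy();
    }

    // Read the velocity, write back a new Position through the handle.
    void iterateWithWrite(@Read Velocity v, @Write Mut<Position> p) {
        Position cur = p.get(); // get() is an assumed accessor
        p.set(new Position(cur.x() + v.dx(), cur.y() + v.dy()));
    }
}
```

In the benchmark the scheduler calls each method once per entity. Here they would have to be called by hand, for example with a `new Mut<>(new Position(1f, 2f))` handle.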

Each sweeps 1 000 / 10 000 / 100 000 entities so readers can see whether the per-entity cost stays constant or bends upward as the entity count grows.

Results

Numbers are µs/op, lower is better. Copied verbatim from DEEP_DIVE.md.

benchmark	entityCount	bevy	japes	zayes	dominion	artemis
`iterateSingleComponent`	1000	0.259	0.302	2.75	0.780	0.470
`iterateSingleComponent`	10000	2.18	3.0	28.6	7.06	4.52
`iterateSingleComponent`	100000	21.3	30.8	383	79.4	166
`iterateTwoComponents`	1000	0.395	0.594	3.55	1.31	1.18
`iterateTwoComponents`	10000	3.70	7.2	38.6	12.4	11.6
`iterateTwoComponents`	100000	36.8	76.6	508	128	237
`iterateWithWrite`	1000	0.656	0.138	182	2.32	1.83
`iterateWithWrite`	10000	6.29	1.9	1818	22.5	18.2
`iterateWithWrite`	100000	63.7	31.9	17857	234	334
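To compare rows across entity counts, it can help to normalise each cell to nanoseconds per entity. The helper below is a small sketch of that arithmetic; the class and method names are made up for illustration, and the sample values are the japes and artemis cells of the iterateSingleComponent rows above.

```java
final class PerEntityCost {
    /** Converts a µs/op cell into nanoseconds per entity. */
    static double nsPerEntity(double microsPerOp, int entityCount) {
        return microsPerOp * 1_000.0 / entityCount;
    }

    public static void main(String[] args) {
        int[] counts = {1_000, 10_000, 100_000};
        // iterateSingleComponent cells copied from the table above.
        double[] japes = {0.302, 3.0, 30.8};
        double[] artemis = {0.470, 4.52, 166};

        for (int i = 0; i < counts.length; i++) {
            System.out.printf("japes   %7d entities: %.3f ns/entity%n",
                    counts[i], nsPerEntity(japes[i], counts[i]));
            System.out.printf("artemis %7d entities: %.3f ns/entity%n",
                    counts[i], nsPerEntity(artemis[i], counts[i]));
        }
    }
}
```

With these values japes stays near 0.30 ns per entity at all three sizes, while artemis goes from roughly 0.47 ns at 1 000 entities to 1.66 ns at 100 000.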

Methodology change: field-level blackhole on iteration micros

The japes numbers above use a field-level blackhole (bh.consume(p.x()); bh.consume(p.y())) instead of an object-level one (bh.consume(p)). This matches real game code, where systems compute with field values rather than storing references, and it lets the JIT scalar-replace records reconstructed from SoA arrays. The Bevy column still uses Criterion's black_box(p) on the whole struct, so the read rows are not directly comparable across languages. Write rows are unaffected (the write itself is the work). See One JIT to rule them all for the full escape-analysis (EA) story.

DCE-safety: why every read row has a Blackhole

All japes read rows consume the loaded component through a JMH Blackhole (ReadSystem.bh.consume(pos)). An earlier revision had empty system bodies (void iterate(@Read Position p) {}) which let the JIT escape-analyse the loaded record and delete the whole iteration loop — especially under Valhalla, where the "20–51× DCE artifact" rows came from. The numbers above are the real numbers; the Valhalla page covers the blackhole guard in detail.
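The difference between the two consumption styles can be sketched without JMH. The Sink class below is a crude stand-in for a Blackhole, not the real org.openjdk.jmh.infra.Blackhole, and the column arrays and class names are assumptions for illustration. The field-level loop only hands primitive values to the sink, so the rebuilt record need not outlive the iteration. The object-level loop hands over the reference, so the record has to exist as a heap object. Whether a given JIT actually scalar-replaces the record in the first loop depends on the JVM version and on inlining decisions.

```java
record Position(float x, float y) {}

// Stand-in sink: keeps consumed values observable so the loads are not dead.
final class Sink {
    private long bits;
    private Object last;

    void consume(float v) { bits += Float.floatToRawIntBits(v); }

    void consume(Object o) { last = o; } // the reference escapes here

    long bits() { return bits; }

    Object last() { return last; }
}

final class BlackholeStyles {
    // Field-level: only the primitive fields reach the sink.
    static void fieldLevel(float[] xs, float[] ys, Sink bh) {
        for (int i = 0; i < xs.length; i++) {
            Position p = new Position(xs[i], ys[i]); // rebuilt from SoA columns
            bh.consume(p.x());
            bh.consume(p.y());
        }
    }

    // Object-level: the record itself reaches the sink.
    static void objectLevel(float[] xs, float[] ys, Sink bh) {
        for (int i = 0; i < xs.length; i++) {
            bh.consume(new Position(xs[i], ys[i]));
        }
    }

    public static void main(String[] args) {
        float[] xs = {1f, 2f};
        float[] ys = {3f, 4f};
        Sink a = new Sink();
        Sink b = new Sink();
        fieldLevel(xs, ys, a);
        objectLevel(xs, ys, b);
        System.out.println(a.bits() + " " + b.last());
    }
}
```

An empty loop body would leave both sinks untouched, which is the situation the DCE note describes: nothing observes the loads, so the JIT is free to remove them.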

Analysis

Reading, japes is the fastest JVM ECS in this comparison. At 10k it takes about 1.4× Bevy's time on one component (3.0 vs 2.18 µs) and about 1.9× on two (7.2 vs 3.70 µs). Note that the japes read rows use a field-level blackhole (bh.consume(p.x())) while Bevy uses object-level black_box(p), so the read-row comparison is not apples-to-apples. The tier-1 GeneratedChunkProcessor is doing its job: each chunk becomes a tight loop that loads the raw SoA component arrays once and dispatches through invokevirtual with no MethodHandle or boxing.

Writing, japes is now faster than Bevy and every Java library on this benchmark. With SoA storage as the default, the new Position(...) record in the write path is scalar-replaced by the JIT: the record fields decompose into primitive fastore instructions on the backing float[] arrays, with zero per-entity heap allocation. At 10k, japes (1.9 µs) is 3.3× faster than Bevy (6.29 µs) and about 12× faster than Dominion (22.5 µs). This is the payoff from the SoA + escape-analysis story documented in One JIT to rule them all.

The write-path story (pre-SoA vs post-SoA)

Before SoA storage, japes's Object[] backing arrays forced every new Position(...) to be heap-allocated: the aastore into Object[] required a heap reference, which defeated escape analysis. That was the historical "write-path tax", 6.1× slower than Bevy.

With SoA storage (float[] x, float[] y, float[] z), the store decomposes into fastore instructions on primitive arrays. The JIT can now prove the Position record never escapes and scalar-replace it entirely. Combined with the tier-1 generator's invokevirtual inlining, the full chain from Mut.set(new Position(...)) through to the backing array store is register-only.
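The two storage shapes can be contrasted in a minimal sketch. This is not the library's generated code: the class name, method names, and the two-field Position (matching the benchmark's x and y) are assumptions for illustration. The first method stores a reference into an Object[], so the new record must live on the heap. The second stores only primitive fields into float[] columns, which is the pattern that gives escape analysis a chance to remove the allocation.

```java
record Position(float x, float y) {}
record Velocity(float dx, float dy) {}

final class WritePathSketch {
    // Pre-SoA shape: one reference slot per entity.
    static void writeObjectArray(Object[] positions, Velocity[] velocities) {
        for (int i = 0; i < positions.length; i++) {
            Position cur = (Position) positions[i];
            Velocity v = velocities[i];
            // Reference store (aastore): the new record has to exist on the heap.
            positions[i] = new Position(cur.x() + v.dx(), cur.y() + v.dy());
        }
    }

    // Post-SoA shape: one primitive column per record field.
    static void writeSoa(float[] px, float[] py, float[] vx, float[] vy) {
        for (int i = 0; i < px.length; i++) {
            Position cur = new Position(px[i], py[i]);
            Velocity v = new Velocity(vx[i], vy[i]);
            Position next = new Position(cur.x() + v.dx(), cur.y() + v.dy());
            // Primitive stores (fastore): no reference to `next` outlives
            // this iteration, so the JIT may scalar-replace all three records.
            px[i] = next.x();
            py[i] = next.y();
        }
    }

    public static void main(String[] args) {
        float[] px = {1f, 2f};
        float[] py = {3f, 4f};
        float[] vx = {0.5f, 0.5f};
        float[] vy = {0.25f, 0.25f};
        writeSoa(px, py, vx, vy);
        System.out.println(px[0] + ", " + py[0]);
    }
}
```

Both methods compute the same positions; only the kind of store at the end of each iteration differs.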

The record + Mut<C> API contract is unchanged — p.set(new Position(...)) still records a change so @Filter(Changed.class) observers can react automatically. The performance difference is purely in the storage layer.

Reproducing

./gradlew :benchmark:ecs-benchmark:jmhJar

java --enable-preview \
  -jar benchmark/ecs-benchmark/build/libs/ecs-benchmark-jmh.jar \
  "IterationBenchmark"

To pin a single cell (e.g. just iterateSingleComponent at 10k), JMH accepts the usual regex + parameter filters:

java --enable-preview \
  -jar benchmark/ecs-benchmark/build/libs/ecs-benchmark-jmh.jar \
  "IterationBenchmark.iterateSingleComponent" \
  -p entityCount=10000