Realistic multi-observer tick¶
RealisticTickBenchmark has the shape of a real game loop. It exists
to answer the obvious follow-up to the sparse-delta page: "but what
if I add more observers?"
What this workload measures¶
- Population. 10 000 (and, for the scaling cell, 100 000) entities carrying {Position, Velocity, Health, Mana}.
- Mutation. 1% turnover per tick — 100 sparse mutations per component, three rotating cursors so the three slices don't overlap.
- Observation. Three @Filter(Changed) observers, one per component, each accumulating a per-observer sum into a shared Stats resource.
- Executors. Two variants: st (single-threaded scheduler) and mt (japes MultiThreadedExecutor, ForkJoinPool-backed, parallelises disjoint systems automatically).
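The rotating-cursor mutation pattern can be sketched independently of any ECS. This is an illustrative model of the driver shape described above, not the actual benchmark code; the class name, `N`, and `DIRTY` are assumptions matching the stated workload (10k entities, 1% per component per tick):

```java
// Sketch of the 1% rotating-cursor mutation pattern (illustrative, not the
// real driver). Three cursors, one per mutated component, each advancing by
// DIRTY slots per tick and offset by N/3 so the three slices never overlap.
public class RotatingCursors {
    static final int N = 10_000;   // entity count (10k cell)
    static final int DIRTY = 100;  // 1% of N, per component, per tick

    final int[] cursors = {0, N / 3, 2 * N / 3};

    /** Returns the slot indices each of the three passes mutates this tick. */
    int[][] nextTick() {
        int[][] slices = new int[3][DIRTY];
        for (int c = 0; c < 3; c++) {
            for (int i = 0; i < DIRTY; i++) {
                slices[c][i] = (cursors[c] + i) % N;
            }
            cursors[c] = (cursors[c] + DIRTY) % N; // advance for next tick
        }
        return slices;
    }

    public static void main(String[] args) {
        int[][] t0 = new RotatingCursors().nextTick();
        // Slices start N/3 apart and advance in lockstep, so they stay disjoint.
        System.out.println(t0[0][0] + " " + t0[1][0] + " " + t0[2][0]); // 0 3333 6666
    }
}
```

Because all three cursors advance at the same rate, the 100-slot windows remain disjoint on every tick, which is what keeps the three observer passes conflict-free.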
Counterparts exist for every library in the comparison. Bevy and
Zay-ES use their native change-detection primitives (Changed<T>
query filter and EntitySet.getChangedEntities() respectively).
Dominion and Artemis have no built-in change detection, so their
observer passes do full iterations over every entity with the
component — the "lazy user" path. The mt variant in Dominion /
Artemis dispatches the three observer passes to a fixed
ExecutorService, exactly what japes does for you from the
declared system access metadata.
Results¶
100 dirty per component per tick, µs/op — lower is better. Copied
verbatim from DEEP_DIVE.md.
| library | 10k µs/op | 100k µs/op | scaling | cost model |
|---|---|---|---|---|
| japes st | 6.94 | 12.7 | 1.83× | dirty-list skip (scales with K) |
| zay-es | 15.4 | 19.6 | 1.27× | dirty-list skip (scales with K) |
| bevy (native Rust) | 8.81 | 76.9 | 8.73× | full archetype scan (scales w/ N) |
| artemis st | 24.5 | 279 | 11.4× | full archetype scan (no CD) |
| dominion st | 44.6 | 389 | 8.72× | full archetype scan (no CD) |
The libraries split into two cost-model camps, empirically:
- Dirty-list skip (japes, Zay-ES) — a per-archetype list of slot indices that were mutated since the last prune. @Filter(Changed) / EntitySet.getChangedEntities() walks only that list. Per-tick cost is O(K) where K is the dirty count, not O(N) where N is total entities. Scaling from 10k→100k costs ~83% more on japes (larger handle array for the driver's getComponent lookups) and ~27% more on Zay-ES.
- Full-archetype scan (Bevy, Dominion, Artemis) — observers iterate the full archetype and either tick-compare every entity (Bevy's Changed<T>) or walk every component regardless (Dominion's findEntitiesWith, Artemis's IteratingSystem with no filter). Per-tick cost is O(N) because that's the algorithmic shape. Scaling from 10k→100k costs ~8–11× more.
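The two camps reduce to a toy model. `dirtyListPass` and `tickScanPass` below are illustrative stand-ins for the two cost shapes, not the real japes or Bevy internals:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two observer cost shapes (not real library internals).
// dirtyListPass walks only the K mutated slots; tickScanPass compares a
// change-tick for all N slots and filters down to the same K.
public class CostModels {
    // Dirty-list skip: O(K) per tick, list pruned after the observer pass.
    static long dirtyListPass(double[] hp, List<Integer> dirty) {
        long sum = 0;
        for (int slot : dirty) sum += (long) hp[slot]; // K iterations
        return sum;
    }

    // Full-archetype tick scan: O(N) per tick regardless of K.
    static long tickScanPass(double[] hp, int[] changedTick, int lastRun) {
        long sum = 0;
        for (int i = 0; i < hp.length; i++) {          // N iterations
            if (changedTick[i] > lastRun) sum += (long) hp[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100_000, k = 100;
        double[] hp = new double[n];
        int[] ticks = new int[n];
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < k; i++) { hp[i] = 5; ticks[i] = 1; dirty.add(i); }
        // Same answer, different cost shape: O(K) loop vs O(N) loop.
        System.out.println(dirtyListPass(hp, dirty) == tickScanPass(hp, ticks, 0));
    }
}
```

Both passes produce the same sum; the difference is only how many slots each must visit to find the K changed ones, which is exactly the split the table above measures.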
At 10k entities japes beats Bevy by 1.27×. The gap looks modest because 10k is small enough that Bevy's tight cache-friendly tick scan only pays ~3 µs of pure scan cost. At 100k entities the gap on the same workload is 6.06×: Bevy pays ~64 µs extra to scan 90 000 more tick words that japes never touches.
Worth calling out: Zay-ES beats Bevy at 100k (19.6 vs 76.9).
Zay-ES has higher per-mutation overhead than japes (more allocations
on the driver side, per-set applyChanges() calls), but its
EntitySet.getChangedEntities() is a dirty-list skip, so it scales
with the same shape as japes. The two dirty-list libraries stay in
the same cost bucket at any entity count; the three scan libraries
scale out of it past ~50k.
How the two cost models separate¶
The per-additional-entity cost at the 10k → 100k step tells the whole story:
| library | Δ µs for Δ 90k entities | per-entity overhead |
|---|---|---|
| japes st | +5.76 | 64 ps / entity |
| zay-es | +4.20 | 47 ps / entity |
| bevy | +68.1 | 757 ps / entity |
| artemis st | +254 | 2.83 ns / entity |
| dominion st | +344 | 3.83 ns / entity |
japes's ~64 ps/entity is driver-side cost (the handle list grows, the archetype's chunk list grows, SoA field-array resizing). The observer side is ~flat because the dirty list is still 300 slots.
Bevy's ~757 ps/entity breaks down as 3 observers × ~252 ps each: every observer does roughly one tick-word load + compare + branch per entity, and ~0.25 ns/check × 100k entities × 3 observers ≈ 76 µs. Matches.
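The per-entity figures fall straight out of the table deltas. This check just re-derives them from the measured values quoted above (the class name is illustrative):

```java
// Re-derive the per-entity overhead from the 10k→100k deltas quoted above.
public class PerEntity {
    public static void main(String[] args) {
        double deltaEntities = 90_000;
        double bevyDeltaUs = 68.1;                // 76.9 - 8.81 µs
        double perEntityNs = bevyDeltaUs * 1_000 / deltaEntities;
        System.out.printf("bevy: %.3f ns/entity%n", perEntityNs);  // ~0.757
        // Cross-check: ~0.25 ns per tick-word check × 100k entities × 3 observers,
        // expressed in µs.
        double scanUs = 0.25e-9 * 100_000 * 3 * 1e6;
        System.out.printf("predicted scan cost: %.0f us%n", scanUs); // ~75
    }
}
```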
Dominion / Artemis pay more per entity because their full scans
happen in the user-facing benchmark driver too (each observer calls
findEntitiesWith / IteratingSystem.process which rebuilds its
iterator state), not just inside a tight Bevy-style Changed<T>
filter.
Why Bevy doesn't ship a dirty-slot list for Changed<T>¶
It's a deliberate API trade-off, not a missed optimisation. Tick-per-slot is cheaper per mutation (one store, no dedup, no append), which matters for Bevy's target workload: dense simulation where most components are touched every tick and a dirty list would contain most of the world. The dirty-list trade-off points the other way: it wins on sparse deltas and loses on dense ones.
japes pays ~5–10 ns extra per mutation for the dirty-list
maintenance, which is invisible at 300 mutations/tick (total
~3 µs) but would start to hurt at millions of mutations/tick.
Run japes on iterateWithWrite (every entity touched every
tick, K = N) and Bevy wins by ~6× — the opposite direction,
same cost model.
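A back-of-the-envelope break-even makes the crossover concrete. The only inputs are the ~5–10 ns/mutation maintenance figure from above and the ~0.25 ns/entity tick-word check; both per-operation costs are rough assumptions, not measurements from this sketch:

```java
// Rough break-even model for the two cost shapes:
//   dirty-list observer tick ≈ K × (maintenance + visit)
//   tick-scan observer tick  ≈ N × check
// At K = 300 the dirty list wins comfortably; at K = N it loses.
public class BreakEven {
    public static void main(String[] args) {
        double maintainNs = 7.5;  // assumed ~5-10 ns/mutation dirty-list upkeep
        double checkNs = 0.25;    // assumed ~one tick-word compare per entity
        int n = 100_000;
        for (int k : new int[]{300, n}) {
            double dirtyUs = k * maintainNs / 1_000;
            double scanUs = n * checkNs / 1_000;
            System.out.printf("K=%d: dirty-list %.2f us vs tick-scan %.2f us%n",
                    k, dirtyUs, scanUs);
        }
    }
}
```

With these constants the sparse case (K = 300) costs ~2 µs against a ~25 µs scan, and the dense case (K = N) inverts the ordering, which is the direction the iterateWithWrite result above shows.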
DCE safety¶
Before trusting these numbers, the obvious question is "are we
hitting a dead-code-elimination trap anywhere?" The Bevy observer
body writes into ResMut<RtStats> which is never read outside the
benchmark closure — if the compiler can prove the writes have no
observable effect, it's allowed to delete the observer bodies
entirely.
Explicitly checked:
- japes. The @Benchmark body calls bh.consume(stats.sumX) / sumHp / sumMana at the end of every tick. JMH's Blackhole.consume is opaque to the JIT, so the accumulation chain is preserved.
- Bevy. The b.iter(||) closure calls world.resource::<RtStats>() + std::hint::black_box(stats.sum_x) / sum_hp / sum_mana after schedule.run. black_box is rustc's equivalent of Blackhole.consume.
Re-ran Bevy after adding the black_box guards: result 8.81 µs at
10k (was 8.80 µs without the guard). Delta is pure measurement
noise, which means DCE wasn't happening even without the guard —
the cross-crate call chain
schedule.run → system fn pointer → observer body already defeats
rustc's DCE at the default cargo bench opt level (opt-level = 3,
no LTO). The guard is there as insurance for future readers.
Same-work audit — driver parity¶
Each library's driver does 300 sparse mutations per tick via three rotating cursors. The operation shapes differ slightly:
| library | operation per mutation | per-mutation alloc |
|---|---|---|
| japes | world.setComponent(e, new Position(...)) (new record) | allocates |
| zay-es | data.setComponent(id, new Position(...)) | allocates |
| bevy | world.get_mut::<Position>(e).x += 1.0 | in-place |
| dominion | e.get(Position.class).x += 1 (mutable POJO) | in-place |
| artemis | pm.get(e).x += 1 (mutable Component subclass) | in-place |
This is an asymmetry on the driver side: japes and Zay-ES allocate 300 record instances per tick that Bevy / Dominion / Artemis don't. The direction of the asymmetry favours Bevy / Dominion / Artemis: japes is paying extra allocation cost its comparison peers aren't, and still winning. If we fixed the asymmetry (either by making japes's driver mutate in place somehow, or by making Bevy's driver allocate new records), the 6.06× gap at 100k would widen further, not shrink.
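The asymmetry is just the difference between replacing an immutable record and mutating a field in place. A minimal sketch, where `Position` and `MutablePosition` are stand-ins for the benchmark's components rather than actual driver code:

```java
// The two mutation shapes from the table above, in isolation.
// japes/Zay-ES drivers allocate a fresh immutable record per mutation;
// Bevy/Dominion/Artemis drivers mutate a field in place.
public class MutationShapes {
    record Position(double x, double y) {}          // japes-style component
    static class MutablePosition { double x, y; }   // Dominion/Artemis-style

    public static void main(String[] args) {
        Position p = new Position(1.0, 2.0);
        p = new Position(p.x() + 1.0, p.y());        // allocates a new record
        MutablePosition m = new MutablePosition();
        m.x += 1.0;                                  // in-place, no allocation
        System.out.println(p.x() + " " + m.x);       // 2.0 1.0
    }
}
```

The record replacement is what the change-detection machinery sees as a `setComponent` call; the in-place write is invisible to anything that isn't already scanning or tick-stamping.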
Code comparison (multi-threaded path)¶
The japes observer is nine lines including the class declaration:
public static final class HealthObserver {
final Stats stats;
HealthObserver(Stats stats) { this.stats = stats; }
@System(stage = "PostUpdate")
@Filter(value = Changed.class, target = Health.class)
void observe(@Read Health h) {
stats.sumHp += h.hp();
}
}
// ... and the builder:
World.builder()
.executor(Executors.multiThreaded()) // <-- parallelism opt-in
.addSystem(new PositionObserver(stats))
.addSystem(new HealthObserver(stats))
.addSystem(new ManaObserver(stats))
.build();
// scheduler knows these observers read disjoint components
// and fans them out across the ForkJoinPool automatically.
And the Dominion / Artemis counterpart (in the mt path — the st
path is similar but without the executor):
ExecutorService pool = Executors.newFixedThreadPool(3);
private long observePositions() {
long sum = 0;
var rs = world.findEntitiesWith(Position.class);
for (var r : rs) sum += (long) r.comp().x;
return sum;
}
// ... and two more observer functions ...
public void tick() throws Exception { // Future.get throws checked exceptions
// ... sparse mutations ...
var f1 = pool.submit(this::observePositions);
var f2 = pool.submit(this::observeHealths);
var f3 = pool.submit(this::observeManas);
sumX += f1.get();
sumHp += f2.get();
sumMana += f3.get();
}
The Dominion / Artemis version has to know a priori that the three
observers don't conflict, know to dispatch them in parallel, own
the thread-pool lifecycle, and do all of this over again every time
an observer is added or removed. japes knows all of that from
@Read / @Write / @Filter annotations and the scheduler's DAG
builder.
Valhalla delta¶
| benchmark | case | japes | japes-v | Δ |
|---|---|---|---|---|
| RealisticTick tick | 10k / st | 6.94 | 11.9 | — |
| RealisticTick tick | 10k / mt | — | 17.8 | — |
The Valhalla numbers are from the pre-SoA sweep and are not directly comparable to the new stock numbers. A fresh Valhalla sweep with SoA storage is pending. See the Valhalla page for the previous breakdown.