Tiled VAE

Latent-space normalisation before decode

The key insight

Per-tile colour drift is not a VAE problem — it's a denoising problem. Different tile content produces latents with different per-channel energy distributions. Tiled VAE corrects those distributions before the VAE ever sees them, producing more consistent decoded colours with no model surgery.

The actual cause of colour drift

Each tile's content is different — a dark dungeon corner denoises very differently from a bright golden meadow. The denoiser builds up different energy distributions per tile, and when those get decoded to pixels, they come out slightly mismatched in brightness and saturation. Tiled VAE corrects those distributions before the VAE ever decodes them.

Where the drift actually lives

Different tile content leads the denoiser to produce latents with different per-channel energy distributions. The fixed BatchNorm (BN) step and the VAE decode then amplify those differences into visible brightness and saturation mismatches between tiles. The fix must therefore happen at the latent level, before decode.

The solution: normalise latents, then decode

After all tiles finish denoising but before any VAE decode, we rescale each tile's latent per-channel mean and std to match the wave-0 anchor tile — the highest-constraint-density tile, generated first. This is a simple linear transform: no hooks, no model patches, no new parameters.
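
The rescale described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the painter's actual code: each latent is represented as a list of channels held as flat lists of floats (a stand-in for a C×H×W tensor), and the names `match_channel_stats` and `EPS` are hypothetical.

```python
from statistics import fmean, pstdev

EPS = 1e-6  # hypothetical guard against division by zero on a flat channel


def match_channel_stats(tile, anchor):
    """Rescale each channel of `tile` to the anchor's mean and std.

    Both arguments are latents stored as a list of channels, each channel a
    flat list of floats (an assumed stand-in for a C x H x W tensor).
    """
    out = []
    for tile_ch, anchor_ch in zip(tile, anchor):
        t_mean, t_std = fmean(tile_ch), pstdev(tile_ch)
        a_mean, a_std = fmean(anchor_ch), pstdev(anchor_ch)
        # Linear transform: centre on the tile's mean, rescale to the
        # anchor's spread, then shift to the anchor's mean.
        scale = a_std / (t_std + EPS)
        out.append([(v - t_mean) * scale + a_mean for v in tile_ch])
    return out


# Toy two-channel latents; the values are illustrative only.
anchor = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.5, 11.0, 11.5]]
tile = [[4.0, 8.0, 12.0, 16.0], [-1.0, 0.0, 1.0, 2.0]]
normalised = match_channel_stats(tile, anchor)
```

Applied to the anchor itself, the transform is an identity up to `EPS`, so the wave-0 tile passes through effectively unchanged.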

Pipeline flow with --tiled-vae

Tile 0 denoises → Tile 1 denoises → … → Tile 8 denoises → normalise latents → batch VAE decode → Gaussian blend → composite

Wave-0: the normalisation anchor

The wave-0 tile is the one with the highest constraint density — the most painted features per area. It's generated first, and all tiles downstream inherit its anchor edges. It's also the natural reference for colour normalisation: every other tile is rescaled to match wave-0's per-channel statistics before decode.
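
Picking the anchor can be illustrated as a simple maximum over tiles. The sketch below assumes constraint density is measured as painted pixels divided by tile area; that measure, the `Tile` fields, and the 1024×1024 tile size are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    index: int
    painted_pixels: int  # hypothetical: pixels covered by painted features
    area: int            # tile area in pixels


def constraint_density(tile: Tile) -> float:
    """Painted features per unit area (an assumed density measure)."""
    return tile.painted_pixels / tile.area


def pick_anchor(tiles: list[Tile]) -> Tile:
    """Return the tile with the highest constraint density (wave-0)."""
    return max(tiles, key=constraint_density)


# Illustrative values only.
tiles = [
    Tile(0, 12_000, 1024 * 1024),
    Tile(1, 410_000, 1024 * 1024),
    Tile(2, 95_000, 1024 * 1024),
]
anchor = pick_anchor(tiles)
```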

The counterintuitive finding

The metric lies — visually it's the winner

The tiled-vae run has 12 of 12 seams flagged with a mean ΔE of 43.6, against 4 of 12 flagged at ΔE 11.5 for palette-match. Yet the tiled-vae composites are visually superior: fewer visible seam lines and more consistent colour across the full map.

The seam ΔE metric measures pre-blend tile divergence — the colour difference between adjacent tiles before the Gaussian blend runs. At 256px overlap, the blend fully resolves those differences. The metric was designed for before-blend quality; it is not a measure of the final composite. Metric and visual quality are decoupled at this overlap size.
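
To make the pre-blend metric concrete, here is a minimal sketch of a seam report. It assumes the CIE76 formula (Euclidean distance in CIELAB) and a flag threshold of 10.0; the page states neither, so both are assumptions, as are the function names and the sample colours.

```python
import math

FLAG_THRESHOLD = 10.0  # hypothetical cut-off; the source does not state one


def delta_e76(lab_a, lab_b):
    """CIE76 colour difference: Euclidean distance in CIELAB."""
    return math.dist(lab_a, lab_b)


def seam_report(seams):
    """Return (flagged count, mean delta-E) for a list of seams.

    Each seam is a pair of mean Lab colours sampled on either side of the
    tile boundary, before any blending has been applied.
    """
    scores = [delta_e76(a, b) for a, b in seams]
    flagged = sum(score > FLAG_THRESHOLD for score in scores)
    return flagged, sum(scores) / len(scores)


# Illustrative Lab values only.
seams = [
    ((50.0, 0.0, 0.0), (53.0, 4.0, 0.0)),      # ΔE = 5.0
    ((60.0, 10.0, 10.0), (60.0, 22.0, 26.0)),  # ΔE = 20.0
]
flagged, mean_de = seam_report(seams)
```

Because the inputs are sampled before the Gaussian blend runs, a report like this says nothing about how the seam looks in the final composite.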

Results — all three constraint maps

[Figure: Tiled VAE composite, Field & Forest (forest · field · river)]

[Figure: Tiled VAE composite, Dense Forest (forest · river · paths)]

[Figure: Tiled VAE composite, Field, Forest & Mountain (field · forest · mountain · river)]

All runs: 12 steps · TeaCache · stencil 0.85 · 256px Gaussian overlap · seed 42

How it's wired in the painter

Tiled VAE replaces pixel-level palette matching with latent-space normalisation, so when it's enabled the painter automatically skips the post-process histogram match. The Modal backend ships with it on by default; the Cloudflare Workers AI backend uses pixel-level palette match instead, since the per-tile latents aren't accessible from Workers AI.
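
The selection logic described above could look roughly like the following. This is a sketch, not the painter's real wiring: the backend identifiers, the `LATENT_ACCESS` table, and the `colour_strategy` function are hypothetical, and the `tiled_vae` argument stands in for the `--tiled-vae` flag.

```python
from typing import Optional

# Hypothetical backend identifiers: whether per-tile latents are reachable.
LATENT_ACCESS = {"modal": True, "workers-ai": False}


def colour_strategy(backend: str, tiled_vae: Optional[bool] = None) -> str:
    """Choose how colour consistency is enforced for a run."""
    can_normalise = LATENT_ACCESS[backend]
    if tiled_vae is None:
        # Assumed default: on wherever the latents are accessible.
        tiled_vae = can_normalise
    if tiled_vae and can_normalise:
        # Latent normalisation replaces the post-process histogram match.
        return "latent-normalisation"
    return "palette-match"


modal_default = colour_strategy("modal")
workers_default = colour_strategy("workers-ai")
modal_opt_out = colour_strategy("modal", tiled_vae=False)
```

Keeping the two strategies mutually exclusive mirrors the behaviour described here: a run uses either latent-space normalisation or the pixel-level match, never both.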

Want the full math?

BatchNorm2d internals, the normalisation algorithm, and memory calculations.

Latent Normalisation →