The wrong way: anchor inside the tile
The intuitive approach is to paint a strip of neighbour content into the left edge of the tile and ask the inpainter to "fill in around it." This doesn't work. A standard inpainter doesn't treat anchor pixels inside its mask as fixed: it sees them as a starting point and paints right over them. The seam moves a few pixels to the right, the anchor disappears, and the tile looks unrelated to its neighbour.
This was the technique's first iteration, and it scored 0.17/10 on the VLM evaluation: worse than the baseline with no conditioning at all.
Anchor inside the tile area. The diffusion process repaints over it and the anchor's continuity information is lost.
Anchor in an exterior margin. The tile area is painted from scratch with the anchor as visible context. The margin is cropped off at the end.
The right way: anchor outside the tile
Extend the canvas outward by EXT pixels along the relevant edge (or on all four sides for an interior tile). Paste the neighbour's edge strip into that exterior margin. Mark the entire tile region as fully unknown. The painter sees the neighbour as context, paints inward to fill the unknown region, and the margin is cropped off after generation.
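The layout step can be sketched in Python with NumPy. This is a minimal sketch, not the pipeline's code: build_exterior_canvas and crop_tile are hypothetical names, each strip is assumed to arrive already cut from the facing edge of the neighbour, and the painter call itself is left out. Corner regions of an interior-tile canvas are simply left at zero here, since the text doesn't say how the pipeline fills them.

```python
import numpy as np

EDGES = ("top", "bottom", "left", "right")


def build_exterior_canvas(tile_h, tile_w, ext, neighbour_strips):
    """Hypothetical helper: lay out the extended canvas and mask for one tile.

    neighbour_strips maps an edge name to an RGB strip from the neighbour on
    that side: left/right strips are (tile_h, ext, 3), top/bottom are
    (ext, tile_w, 3). Returns (canvas, mask, box); mask == 1 means "paint".
    """
    # Extend only along edges that have a neighbour (all four for an interior tile).
    pad = {e: ext if e in neighbour_strips else 0 for e in EDGES}
    h = tile_h + pad["top"] + pad["bottom"]
    w = tile_w + pad["left"] + pad["right"]
    canvas = np.zeros((h, w, 3), dtype=np.float32)
    mask = np.zeros((h, w), dtype=np.float32)

    # Inner box where the tile itself lives.
    y0, x0 = pad["top"], pad["left"]
    y1, x1 = y0 + tile_h, x0 + tile_w

    # Paste each neighbour strip into its exterior margin (context, never painted).
    margins = {
        "top": (slice(0, y0), slice(x0, x1)),
        "bottom": (slice(y1, h), slice(x0, x1)),
        "left": (slice(y0, y1), slice(0, x0)),
        "right": (slice(y0, y1), slice(x1, w)),
    }
    for edge, strip in neighbour_strips.items():
        canvas[margins[edge]] = strip

    # The entire tile region is fully unknown; the margin stays unmasked.
    mask[y0:y1, x0:x1] = 1.0
    return canvas, mask, (y0, y1, x0, x1)


def crop_tile(generated, box):
    """Drop the exterior margin and keep only the published tile."""
    y0, y1, x0, x1 = box
    return generated[y0:y1, x0:x1]


if __name__ == "__main__":
    left_strip = np.ones((64, 16, 3), dtype=np.float32)
    canvas, mask, box = build_exterior_canvas(64, 64, 16, {"left": left_strip})
    # `canvas` stands in for the inpainter's output here.
    tile = crop_tile(canvas, box)
    print(canvas.shape, int(mask.sum()), tile.shape)
```

In practice the canvas and mask would go to the inpainter, and its output would be cropped with the same box.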
Because the anchor lives outside the region being painted, it never gets repainted. The painter has no choice but to treat it as a hard boundary condition, which is exactly what we want.
This pattern comes from the exterior boundary inpainting of Sartor et al. (2024). The key claim: an inpainter can't preserve content inside its mask, but it can condition on content outside it. So move the continuity guarantee from "preserve these pixels" (which doesn't hold) to "look at these pixels while painting" (which does).
How wide is wide enough?
EXT controls how much neighbour context the painter sees. Too thin, and the model treats the margin as noise. Too thick, and the margin starts to dominate the prompt budget. A sweep in an earlier iteration landed on 512 px:
At EXT=256 the results hit a wall: the VLM still flagged content_unrelated on most tiles. Doubling to 512 px gave the painter enough neighbourhood signal to commit to the right scene. Margins wider than 512 px didn't help noticeably and began to eat into latent-space efficiency, so the pipeline ships at EXT=512.
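To see why wider margins cost more, here is some back-of-the-envelope arithmetic for an interior tile with a margin on all four sides. The 1024 px tile size and the 8x VAE downsampling factor are assumptions chosen for illustration (8x is typical of Stable Diffusion-style VAEs), not figures from this pipeline; canvas_cost is a hypothetical helper.

```python
def canvas_cost(tile, ext, vae_factor=8):
    """Hypothetical cost model for a square interior tile.

    Assumes an EXT-pixel margin on all four sides and a VAE that
    downsamples by `vae_factor` along each axis.
    """
    side = tile + 2 * ext                    # canvas side in pixels
    latent_side = side // vae_factor         # latent side the denoiser works on
    overhead = (side * side) / (tile * tile)  # canvas pixels per published pixel
    return side, latent_side, overhead


for ext in (256, 512, 768):
    side, latent, overhead = canvas_cost(1024, ext)
    print(f"EXT={ext}: canvas {side}x{side}, latent {latent}x{latent}, {overhead:.2f}x pixels")
```

Under these assumptions, EXT=512 already means the painter processes 4x the pixels it publishes (a 2048x2048 canvas, 256x256 latent), and each further 256 px of margin costs more than the last, because canvas area grows quadratically with its side.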
Why this works
- The painter can't repaint pixels that aren't in its mask. Putting the anchor outside the mask makes it inviolable by construction.
- The painter does attend to outside-mask pixels through the encoder's receptive field. Those pixels reach the latent attention layers as context, which is exactly what's needed for continuation.
- Cropping the margin after generation costs nothing. The exterior pixels were a means to an end; the published tile is just the inner region.
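The first two points can be illustrated with a toy RePaint-style sampler. This is a swapped-in technique used only for illustration; the text doesn't say which sampler the painter uses, and blended_inpaint and the lambda denoiser are hypothetical stand-ins. At every step the model update sees the whole canvas, context included, and then everything outside the mask is overwritten with the known pixels, re-noised to the next step's level under a simplified linear schedule. On the final step that level is zero, so outside-mask pixels come out exactly equal to the input.

```python
import numpy as np


def blended_inpaint(known, mask, denoise_step, steps=10, seed=0):
    """Toy RePaint-style sampler (illustrative stand-in, not the pipeline's painter).

    known: context image; mask: 1.0 where the painter may write, 0.0 to keep.
    denoise_step(x, t): stand-in for one model update. It receives the whole
    canvas, which is how a real model would see the context outside the mask.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(known.shape)  # start the sample from pure noise
    for t in range(steps, 0, -1):
        x = denoise_step(x, t)
        # Simplified linear schedule: the noise level reaches 0 on the final step.
        level = (t - 1) / steps
        known_t = known + level * rng.standard_normal(known.shape)
        # Outside the mask, discard the model's output and restore the known pixels.
        x = mask * x + (1.0 - mask) * known_t
    return x


if __name__ == "__main__":
    known = np.arange(16, dtype=float).reshape(4, 4)
    mask = np.zeros((4, 4))
    mask[:, 2:] = 1.0  # paint the right half, keep the left half as context
    out = blended_inpaint(known, mask, lambda x, t: 0.9 * x, steps=5)
    print(np.array_equal(out[:, :2], known[:, :2]))
```

The overwrite step is what makes outside-mask pixels inviolable by construction, while passing the full canvas to the update is what lets them act as context.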
Where this fits
The exterior margin, the differential-diffusion soft mask (see painted seams) and the seed strips that fill the margin (see bootstrapped seeds) all come together in the painter's tile worker before each generation call. They're the three pieces that make the exterior anchor work.