Differential Diffusion

Per-pixel timestep scheduling for soft inpainting

The key insight

Standard inpainting has two states: a pixel is locked or it's free. That binary split produces visible cuts at tile boundaries. Differential diffusion replaces the binary mask with a change map μ ∈ [0, 1], where each pixel's value sets how many denoising steps the model is allowed to change it. The seam blends instead of butting against an edge. In the constraint painter this is how painted rivers and paths stay on-curve across tile boundaries.

The limitation it fixes in Klein

Klein binarises the mask at 0.5

Flux2KleinInpaintPipeline converts the inpaint mask to binary at threshold 0.5. Any zone painted as MASK_FEATURE (grey = 128/255 ≈ 0.50) collapses to MASK_GENERATE — fully regenerated. The stencil colour workaround (painting a blue stripe into the seed canvas for rivers) compensates by biasing the model's starting image — a hint, not a constraint. Klein can still drift off the painted curve, especially on long river segments that cross two tiles.
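A minimal sketch of why the grey feature value cannot survive a 0.5 threshold. The `binarise` helper is a hypothetical stand-in for the thresholding step, and the `>=` comparison is an assumption chosen to match the `>= 0.5` test applied to `init_mask` in the callback code on this page; only the 128/255 feature value and the 0.5 threshold come from the text above.

```python
# Illustrative only: `binarise` is a hypothetical stand-in for the thresholding
# step inside the inpainting pipeline, not the pipeline's actual code.
THRESHOLD = 0.5


def binarise(mask_value_8bit: int, threshold: float = THRESHOLD) -> bool:
    """Return True (regenerate) when the normalised mask value reaches the threshold."""
    return mask_value_8bit / 255 >= threshold


MASK_FEATURE = 128  # grey feature zone, from the text above

print(MASK_FEATURE / 255)      # slightly above 0.5
print(binarise(MASK_FEATURE))  # True: treated exactly like a fully generated pixel
print(binarise(127))           # False: one grey level lower would be locked instead
```

Because the feature grey sits just above the threshold, no intermediate "partly held" state reaches the denoising loop, which is the gap the change map is meant to fill.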

Differential diffusion bypasses the binary mask entirely. Rather than asking Klein to honour a feature zone, the change map controls exactly how many of the 12 denoising steps each pixel is allowed to evolve — the painted stencil colour is in the source canvas, and the change map decides how much time the model has to change it.

Binary vs soft, side by side

Top row: binary mask — pixels are either fully locked or fully free. The teal box marks the seam region. Bottom row: soft mask — the same region transitions gradually, producing a continuous result rather than a step function.

The mechanism — per-timestep gate

The technique is based on Levin & Fried 2023 (arXiv:2306.00950). At each denoising step i of T total steps, a per-pixel gate decides whether a pixel is copied from the source canvas or left to the model:

Core formula

keep_map  = 1.0 - change_map       # 0 = generate, 1 = keep
dd_mask_i = keep_map > (i / T)     # True = inject source, False = generate
latents   = dd_mask_i * source_t + (1 - dd_mask_i) * latents

A pixel with μ = 0.35 has keep_map = 0.65. The gate is True (inject source) while i/T < 0.65, i.e. i < 7.8, which for T = 12 covers steps i = 0 through 7. So the source canvas is injected for 8 of 12 steps (the global-structure steps), and the model has only 4 steps to paint texture over the stencil colour.
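The gate can be traced on a toy example. The sketch below applies the core formula to three stand-in pixels in plain Python; `dd_step` and the string placeholders are hypothetical simplifications (real latents are tensors, and `source_t` is presumably the source canvas noised to the current timestep).

```python
def dd_step(latents, source_t, change_map, step, total_steps):
    """Apply one differential-diffusion gate to a flat list of pixel values."""
    blended = []
    for latent, source, mu in zip(latents, source_t, change_map):
        keep = 1.0 - mu                     # 0 = generate, 1 = keep
        inject = keep > step / total_steps  # True = copy from source
        blended.append(source if inject else latent)
    return blended


T = 12
change_map = [0.00, 0.35, 1.00]   # anchor, painted feature, open terrain
source_t = ["src", "src", "src"]  # placeholders for source values
latents = ["gen", "gen", "gen"]   # placeholders for model output

print(dd_step(latents, source_t, change_map, step=7, total_steps=T))
# ['src', 'src', 'gen']  (the mu = 0.35 pixel is still held at step 7)
print(dd_step(latents, source_t, change_map, step=8, total_steps=T))
# ['src', 'gen', 'gen']  (from step 8 onward it is released to the model)
```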

The Klein port: Flux2KleinDifferentialPipeline

The original differential diffusion pipeline targets FLUX.1 [dev] (FluxDifferentialImg2ImgPipeline, PR #9268). Klein uses a different inpainting pipeline with its own denoising loop. Rather than patching that loop directly, the Klein port is implemented as a callback hook registered on the standard Flux2KleinInpaintPipeline. At each step, the callback computes the per-pixel gate and corrects pixels that should be held to the source:

Callback correction logic (the differential-diffusion pipeline)

import torch

class Flux2KleinDifferentialPipeline(Flux2KleinInpaintPipeline):

    def __call__(self, *args, change_map=None, **kwargs):
        if change_map is None:
            return super().__call__(*args, **kwargs)

        # T = total denoising steps. Pinning it in kwargs (assumed default: the
        # 12 steps used in the runs here) keeps the gates and the parent loop in sync.
        T = kwargs.setdefault("num_inference_steps", 12)

        # Build per-step gates from the change_map PIL image.
        # change_map_tensor is that image as a float tensor in [0, 1] at
        # latent resolution (the conversion is omitted in this excerpt).
        keep_map = 1.0 - change_map_tensor        # (1, H/8, W/8)
        dd_masks = [keep_map > (i / T) for i in range(T)]

        def _dd_callback(pipe, step_i, timestep, cb_kwargs):
            latents = cb_kwargs["latents"]
            gate    = dd_masks[step_i].to(latents.device)
            source  = cb_kwargs["init_latents_proper"]
            inpaint = cb_kwargs["init_mask"] >= 0.5
            # Correct pixels that dd says hold AND inpaint mask says generate
            needs_correction = inpaint & gate
            cb_kwargs["latents"] = torch.where(needs_correction, source, latents)
            return cb_kwargs

        return super().__call__(*args, callback_on_step_end=_dd_callback, **kwargs)

Zone map — what each region gets

The change map is built from the same zone system used for the seed canvas and inpaint mask. Each zone type maps to a fixed μ value:

Zone                         μ (change)  Keep steps (12-step run)  Effect
Anchor strip (outer 256 px)  0.00        12 / 12                   Neighbour edge never deviates; hard copy from source.
Inner soft band (64 px)      0.15        11 / 12                   Smooth transition, almost entirely held; tiny seam softening.
Painted river / path         0.35        8 / 12                    Follows stencil tightly; only 4 steps to drift from the painted colour.
Open terrain (gen zone)      1.00        0 / 12                    Full creative freedom; model generates from scratch.

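The keep-step counts for each zone follow directly from the gate rule. A minimal sketch that recomputes them from the μ values; the `ZONE_MU` dictionary, its key names, `keep_steps`, and `change_map_from_zones` are hypothetical names for illustration, not the painter's actual identifiers.

```python
# Hypothetical zone names; the mu values are the ones listed for each zone.
ZONE_MU = {
    "anchor_strip": 0.00,
    "inner_soft_band": 0.15,
    "painted_feature": 0.35,
    "open_terrain": 1.00,
}


def keep_steps(mu, total_steps=12):
    """Count the steps on which the gate injects the source for a given mu."""
    keep = 1.0 - mu
    return sum(keep > i / total_steps for i in range(total_steps))


def change_map_from_zones(zone_grid):
    """Translate a grid of zone names into a grid of mu values."""
    return [[ZONE_MU[zone] for zone in row] for row in zone_grid]


for zone, mu in ZONE_MU.items():
    print(f"{zone:16s} mu={mu:.2f} keep={keep_steps(mu)}/12")

print(change_map_from_zones([["anchor_strip", "painted_feature", "open_terrain"]]))
# [[0.0, 0.35, 1.0]]
```

Run with the four zone values, this gives 12, 11, 8 and 0 held steps respectively.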
Stencil colours + differential diffusion work at different levels

Stencil colours (added in Phase 3.7) paint intent directly into the seed canvas pixels — a blue stripe for a river, tan for a path. This is what the model sees at the start of denoising. Differential diffusion controls how many of the 12 steps the model has to deviate from that starting point.

Together: the stencil says "here is a blue river", the change map says "you only have 4 steps to change it". Both signals reinforce the same painted intent at complementary stages of the pipeline — input conditioning and step-level gating.

Experimental results vs baseline

Config                              Flagged seams  Mean ΔE  Visual
Stencil-only (production baseline)  4 / 12         11.46    Best seams, vibrant colours
Diff-diff μ=0.35 + stencil 0.85     5 / 12         11.95    Muted/flat; too few free steps to render texture
Diff-diff μ=0.65 + stencil 0.85     6 / 12         13.36    Better texture quality, seams worsen

Runs: field-forest constraint map · 12 steps · TeaCache · stencil 0.85 · 256px Gaussian overlap · seed 42

The μ tuning tension — routing vs texture quality

At μ=0.35 the river follows the painted U-curve more tightly (routing fidelity ↑) but the visual result is flattened — the model has only 4 of 12 steps to paint realistic water texture over vivid blue stencil paint. At μ=0.65 texture quality recovers but seams worsen: adjacent tiles generate their shared 256px overlap zone with different injection histories → slightly different river positions where they meet → higher pre-blend ΔE.

The planned fix is an overlap-zone carve-out: set μ=1.0 for feature pixels that fall inside the anchor band (fully free there, so both tiles generate the same unbiased content in the overlap), while keeping μ=0.35 outside the anchor. This hasn't been tested yet. Current production config is stencil-only.
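One way the carve-out could be expressed, as an untested sketch in the same spirit as the plan above. The function name, the boolean-grid representation of the feature and anchor zones, and the nested-list change map are all assumptions; the painter's real data structures are not shown on this page.

```python
def apply_overlap_carve_out(change_map, feature_mask, anchor_mask, free_mu=1.0):
    """Return a copy of change_map with feature pixels inside the anchor band set free."""
    carved = []
    for mu_row, feat_row, anchor_row in zip(change_map, feature_mask, anchor_mask):
        carved.append([
            free_mu if (is_feature and in_anchor) else mu
            for mu, is_feature, in_anchor in zip(mu_row, feat_row, anchor_row)
        ])
    return carved


# One row crossing from the anchor band (left two pixels) into the tile interior.
change_map = [[0.00, 0.35, 0.35, 1.00]]
feature_mask = [[False, True, True, False]]
anchor_mask = [[True, True, False, False]]

print(apply_overlap_carve_out(change_map, feature_mask, anchor_mask))
# [[0.0, 1.0, 0.35, 1.0]]
```

Only the pixel that is both a feature and inside the anchor band changes; non-feature anchor pixels stay at μ = 0.00 and interior feature pixels stay at μ = 0.35.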

How it's wired in the painter

Differential diffusion ships in the Modal backend (Flux2KleinDifferentialPipeline) and is the path the painter takes whenever a run includes painted seams. The feature-change strength (μ) defaults to 0.35 for tighter routing fidelity and can be raised to around 0.65 for better texture inside painted zones. The Cloudflare Workers AI backend falls back to plain inpainting, because differential diffusion needs callback access to the denoising loop, which Workers AI doesn't expose.

Earlier iteration: FLUX.1 [dev] vendored pipeline

The technique was first evaluated in an earlier iteration using the community-contributed FluxDifferentialImg2ImgPipeline (PR #9268, Apache 2.0, @ryanlyn). That pipeline targets FLUX.1 [dev] img2img, not Klein's inpainting pipeline.

Earlier iterations on this stack moved the VLM score from 0.17/10 (plain inpaint) to 3.50/10 (exterior anchor + diff-diff, roughly a 20× jump) to 5.56/10 (with bootstrapped seeds added). The Klein port carries over the structural insight (soft gating) without the seed drift problem, since the entire pipeline stays within Klein.