The limitation it fixes in Klein
Flux2KleinInpaintPipeline converts the inpaint mask to binary at threshold 0.5. Any zone painted as MASK_FEATURE (grey = 128/255 ≈ 0.502, just above the threshold) collapses to MASK_GENERATE and is fully regenerated. The stencil colour workaround (painting a blue stripe into the seed canvas for rivers) compensates by biasing the model's starting image, which is a hint, not a constraint. Klein can still drift off the painted curve, especially on long river segments that cross two tiles.
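The collapse can be reproduced in a few lines. This is a minimal sketch, not the pipeline's actual code: the constant values for MASK_KEEP and MASK_GENERATE, the `binarize` helper, and the use of `>=` for the comparison are assumptions made for illustration; only the grey value 128 and the 0.5 threshold come from the text.

```python
# Hypothetical 8-bit zone values; only MASK_FEATURE = 128 is stated in the text.
MASK_KEEP = 0
MASK_FEATURE = 128
MASK_GENERATE = 255


def binarize(value_8bit: int, threshold: float = 0.5) -> int:
    """Return 1 (regenerate) or 0 (keep) for an 8-bit mask value.

    Assumes the comparison is `>=`; 128/255 is above 0.5 either way.
    """
    return 1 if value_8bit / 255 >= threshold else 0


collapsed = {
    "keep": binarize(MASK_KEEP),
    "feature": binarize(MASK_FEATURE),
    "generate": binarize(MASK_GENERATE),
}
print(collapsed)
```

After thresholding, the feature zone and the generate zone carry the same value, so the pipeline has no way to treat them differently.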
Differential diffusion bypasses the binary mask entirely. Rather than asking Klein to honour a feature zone, the change map controls exactly how many of the 12 denoising steps each pixel is allowed to evolve — the painted stencil colour is in the source canvas, and the change map decides how much time the model has to change it.
Binary vs soft, side by side
Top row: binary mask — pixels are either fully locked or fully free. The teal box marks the seam region. Bottom row: soft mask — the same region transitions gradually, producing a continuous result rather than a step function.
The mechanism — per-timestep gate
The technique is based on Levin & Fried 2023 (arXiv:2306.00950). At each denoising step i of T total steps, a per-pixel gate decides whether a pixel is copied from the source canvas or left to the model:
Core formula
keep_map = 1.0 - change_map # 0 = generate, 1 = keep
dd_mask_i = keep_map > (i / T) # True = inject source, False = generate
latents = dd_mask_i * source_t + (1 - dd_mask_i) * latents
A pixel with μ = 0.35 → keep_map = 0.65. The gate is True (inject source) while i/T < 0.65, which for T = 12 means i < 7.8, i.e. steps 0 to 7. So the source canvas is injected for 8 of 12 steps (the global-structure steps), and the model has only 4 steps to paint texture over the stencil colour.
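The arithmetic above can be checked with a small pure-Python sketch of the gate for a single pixel. The function names `dd_gate` and `keep_steps` are hypothetical; the real pipeline applies the same comparison to whole latent tensors rather than scalars.

```python
def dd_gate(change: float, step: int, total_steps: int) -> bool:
    """True while the source canvas is injected for this pixel at this step."""
    keep = 1.0 - change  # 0 = generate, 1 = keep
    return keep > step / total_steps


def keep_steps(change: float, total_steps: int = 12) -> int:
    """Number of steps during which the pixel is held to the source."""
    return sum(dd_gate(change, i, total_steps) for i in range(total_steps))


# Per-step gate for a pixel with mu = 0.35 over a 12-step run.
schedule = [dd_gate(0.35, i, 12) for i in range(12)]
print(schedule)
print(keep_steps(0.35), "held steps,", 12 - keep_steps(0.35), "free steps")
```

The gate is monotone: once a pixel is released it stays released, so the held steps are always the earliest ones.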
The Klein port: Flux2KleinDifferentialPipeline
The original differential diffusion pipeline targets FLUX.1 [dev] (FluxDifferentialImg2ImgPipeline, PR #9268). Klein uses a different inpainting pipeline with its own denoising loop. Rather than patching that loop directly, the Klein port is implemented as a callback hook registered on the standard Flux2KleinInpaintPipeline. At each step, the callback computes the per-pixel gate and corrects pixels that should be held to the source:
Callback correction logic (the differential-diffusion pipeline)
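import torch

# Flux2KleinInpaintPipeline is the existing Klein inpaint pipeline (import omitted here).
class Flux2KleinDifferentialPipeline(Flux2KleinInpaintPipeline):
    def __call__(self, *args, change_map=None, **kwargs):
        if change_map is None:
            return super().__call__(*args, **kwargs)

        # Build per-step gates from the change_map PIL image.
        # change_map_tensor stands for change_map as a float tensor in [0, 1] at
        # latent resolution; that conversion is not shown in this excerpt.
        # Assumption: the step count arrives as the num_inference_steps keyword.
        T = kwargs.get("num_inference_steps", 12)
        keep_map = 1.0 - change_map_tensor  # (1, H/8, W/8); 0 = generate, 1 = keep
        dd_masks = [keep_map > (i / T) for i in range(T)]

        def _dd_callback(pipe, step_i, timestep, cb_kwargs):
            # cb_kwargs must expose latents, init_latents_proper and init_mask.
            latents = cb_kwargs["latents"]
            gate = dd_masks[step_i].to(latents.device)
            source = cb_kwargs["init_latents_proper"]
            inpaint = cb_kwargs["init_mask"] >= 0.5
            # Correct pixels that dd says hold AND inpaint mask says generate
            needs_correction = inpaint & gate
            cb_kwargs["latents"] = torch.where(needs_correction, source, latents)
            return cb_kwargs

        return super().__call__(*args, callback_on_step_end=_dd_callback, **kwargs)

Zone map: what each region gets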
The change map is built from the same zone system used for the seed canvas and inpaint mask. Each zone type maps to a fixed μ value:
| Zone | μ (change) | Keep steps (12-step run) | Effect |
|---|---|---|---|
| Anchor strip (outer 256 px) | 0.00 | 12 / 12 | Neighbour edge never deviates: hard copy from source. |
| Inner soft band (64 px) | 0.15 | 11 / 12 | Smooth transition, almost entirely held; tiny seam softening. |
| Painted river / path | 0.35 | 8 / 12 | Follows stencil tightly: only 4 steps to drift from the painted colour. |
| Open terrain (gen zone) | 1.00 | 0 / 12 | Full creative freedom: model generates from scratch. |
Stencil colours (added in Phase 3.7) paint intent directly into the seed canvas pixels — a blue stripe for a river, tan for a path. This is what the model sees at the start of denoising. Differential diffusion controls how many of the 12 steps the model has to deviate from that starting point.
Together: the stencil says "here is a blue river", the change map says "you only have 4 steps to change it". Both signals reinforce the same painted intent at complementary stages of the pipeline — input conditioning and step-level gating.
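As a rough illustration of how a zone layout becomes a change map, the sketch below maps zone labels to the μ values from the table and recomputes the "keep steps" column from the gate rule. The zone label strings, `ZONE_MU`, `build_change_map`, and the tiny 2×4 grid are all hypothetical; the real zone system works on full-resolution images.

```python
# mu per zone, taken from the zone table; the label strings are illustrative.
ZONE_MU = {"anchor": 0.00, "soft_band": 0.15, "feature": 0.35, "open": 1.00}


def build_change_map(zone_grid):
    """Replace each zone label with its mu value."""
    return [[ZONE_MU[zone] for zone in row] for row in zone_grid]


def keep_steps(mu: float, total_steps: int = 12) -> int:
    """Steps during which the gate keep_map > i / T holds the pixel to the source."""
    keep = 1.0 - mu
    return sum(1 for i in range(total_steps) if keep > i / total_steps)


# A toy 2x4 tile: anchor strip on the left, a feature pixel in the second row.
zone_grid = [
    ["anchor", "soft_band", "open", "open"],
    ["anchor", "soft_band", "feature", "open"],
]
change_map = build_change_map(zone_grid)
table = {zone: keep_steps(mu) for zone, mu in ZONE_MU.items()}
print(change_map)
print(table)
```

Deriving the keep-step counts from μ rather than hard-coding them keeps the table consistent if the step count changes.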
Experimental results vs baseline
| Config | Flagged seams | Mean ΔE | Visual |
|---|---|---|---|
| Stencil-only (production baseline) | 4 / 12 | 11.46 | Best seams, vibrant colours |
| Diff-diff μ=0.35 + stencil 0.85 | 5 / 12 | 11.95 | Muted/flat — too few free steps to render texture |
| Diff-diff μ=0.65 + stencil 0.85 | 6 / 12 | 13.36 | Better texture quality, seams worsen |
Runs: field-forest constraint map · 12 steps · TeaCache · stencil 0.85 · 256px Gaussian overlap · seed 42
At μ=0.35 the river follows the painted U-curve more tightly (routing fidelity ↑) but the visual result is flattened — the model has only 4 of 12 steps to paint realistic water texture over vivid blue stencil paint. At μ=0.65 texture quality recovers but seams worsen: adjacent tiles generate their shared 256px overlap zone with different injection histories → slightly different river positions where they meet → higher pre-blend ΔE.
The planned fix is an overlap-zone carve-out: set μ=1.0 for feature pixels that fall inside the anchor band (fully free there, so both tiles generate the same unbiased content in the overlap), while keeping μ=0.35 outside the anchor. This hasn't been tested yet. Current production config is stencil-only.
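One way the carve-out could look, as an untested sketch that mirrors the description above: wherever a pixel is both a feature pixel and inside the overlap band, its μ is overridden to 1.0, and everything else is left alone. The `carve_out` function, the boolean mask inputs, and the single-row example are assumptions for illustration, not the project's implementation.

```python
def carve_out(change_map, feature_mask, overlap_mask, free_mu=1.0):
    """Return a new change map with feature pixels inside the overlap set to free_mu."""
    return [
        [free_mu if (is_feature and in_overlap) else mu
         for mu, is_feature, in_overlap in zip(mu_row, feature_row, overlap_row)]
        for mu_row, feature_row, overlap_row in zip(change_map, feature_mask, overlap_mask)
    ]


# One row of a tile: columns 0-1 lie in the overlap band, columns 1-2 are river.
change_map = [[0.0, 0.35, 0.35, 1.0]]
feature_mask = [[False, True, True, False]]
overlap_mask = [[True, True, False, False]]

carved = carve_out(change_map, feature_mask, overlap_mask)
print(carved)
```

Only the river pixel inside the overlap changes; the river pixel outside it keeps μ = 0.35 and the non-feature anchor pixel stays fully held.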
How it's wired in the painter
Differential diffusion ships in the Modal backend (Flux2KleinDifferentialPipeline) and is the path the painter takes whenever a run includes painted seams. The feature-change strength (μ) defaults to 0.35, which favours routing fidelity, and can be raised to around 0.65 for better texture inside painted zones. The Cloudflare Workers AI backend falls back to plain inpainting, because differential diffusion needs callback access to the denoising loop, which Workers AI doesn't expose.
Earlier iteration: FLUX.1 [dev] vendored pipeline
The technique was first evaluated in an earlier iteration using the community-contributed FluxDifferentialImg2ImgPipeline (PR #9268, Apache 2.0, @ryanlyn). That pipeline targets FLUX.1 [dev] img2img, not Klein's inpainting pipeline.
Earlier iterations on this stack moved the VLM score from 0.17/10 (plain inpaint) to 3.50/10 (exterior anchor + diff-diff, roughly a 20× jump) to 5.56/10 (with bootstrapped seeds added). The Klein port carries over the structural insight (soft gating) without the seed drift problem, since the entire pipeline stays within Klein.