Inpainting-Driven Whole-Slide Synthesis with Latent Diffusion Models

Patrik Kozlík

Supervisor(s): prof. Ing. Vanda Benešová, PhD.

Slovak Technical University

Abstract: Synthesizing large histology regions from a known multi-class semantic mask supports controlled validation of pathology AI, because rare spatial configurations are hard to curate from real whole-slide images. We present a three-stage pipeline. First, a VQ autoencoder fitted to histology images defines the latent space. Second, a latent diffusion model conditioned on the semantic mask generates RGB patches. Third, a RePaint-style seam inpainting stage fuses the patch grid into a seamless megapixel canvas, using band-shaped masks together with noise and color harmonization; a subsequent downsample-inpaint-upsample pass reaches 4096×4096 fields within practical GPU budgets. On the BCSS dataset, the outputs preserve the mask-defined layout at scale. Among the evaluated samplers, DDIM with 50 steps is the most efficient in wall-clock time while remaining competitive in Fréchet Inception Distance (FID) computed on 10 000 patches. Quantitative and qualitative seam analyses show large improvements over naive tiling.
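To illustrate the band-shaped masks mentioned in the abstract, the following is a minimal sketch, assuming a regular grid of square patches and an even band width. The function name `seam_band_mask` and its parameters are hypothetical and are not taken from the thesis; the sketch only marks which pixels around internal patch borders an inpainting stage would regenerate, and it does not perform any diffusion.

```python
import numpy as np


def seam_band_mask(grid_rows, grid_cols, patch_size, band_width):
    """Boolean (H, W) mask that is True inside bands centred on internal seams.

    Hypothetical helper: assumes square patches of side `patch_size` laid out
    on a `grid_rows` x `grid_cols` grid, and an even `band_width` in pixels.
    """
    height = grid_rows * patch_size
    width = grid_cols * patch_size
    mask = np.zeros((height, width), dtype=bool)
    half = band_width // 2

    # Horizontal bands around the borders between vertically adjacent patches.
    for r in range(1, grid_rows):
        y = r * patch_size
        mask[y - half : y + half, :] = True

    # Vertical bands around the borders between horizontally adjacent patches.
    for c in range(1, grid_cols):
        x = c * patch_size
        mask[:, x - half : x + half] = True

    return mask


# Example: a 4x4 grid of 256-pixel patches with 32-pixel seam bands.
mask = seam_band_mask(grid_rows=4, grid_cols=4, patch_size=256, band_width=32)
```

In a RePaint-style setting, pixels where the mask is True would be resampled while the remaining pixels are kept as known content; the band width trades the amount of regenerated area against how much context is available to hide the seam.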
Keywords: Computer Vision, Image Processing
Year: 2026