Inpainting-Driven Whole-Slide Synthesis with Latent Diffusion Models
Patrik Kozlík
Supervisor(s): prof. Ing. Vanda Benešová, PhD.
Slovak Technical University
Abstract: Synthesizing large histology regions from a known multi-class semantic mask supports controlled validation of pathology AI, where rare spatial configurations are hard to curate from real whole slides. We present a three-stage pipeline: a histology-fitted VQ autoencoder defines the latent space; a semantic-mask-conditioned latent diffusion model generates RGB patches; and a RePaint-style seam inpainting stage fuses a patch grid into a seamless megapixel canvas using band-shaped masks together with noise and color harmonization, followed by a downsample-inpaint-upsample pass to reach 4096×4096 fields within practical GPU budgets. On the BCSS dataset, outputs preserve mask-defined layout at scale; among evaluated samplers, DDIM with 50 steps delivers the strongest wall-clock efficiency with competitive Fréchet Inception Distance on 10 000 patches, and quantitative and qualitative seam analyses show large improvements over naive tiling.
Keywords: Computer Vision, Image Processing
Year: 2026
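The band-shaped seam masks mentioned in the abstract can be illustrated with a minimal sketch. It only builds a binary mask marking bands centred on the interior borders of a regular patch grid, i.e. the region a RePaint-style inpainting stage would regenerate; it does not perform any diffusion sampling. The function name seam_band_mask, the canvas size of 1024, the patch size of 256, and the band width of 32 pixels are all assumptions chosen for illustration and are not taken from the work itself.

```python
import numpy as np


def seam_band_mask(canvas_size: int, patch_size: int, band_width: int) -> np.ndarray:
    """Return a boolean mask that is True inside bands centred on interior patch borders.

    Hypothetical helper for illustration only; the sizes used below are assumed.
    """
    if canvas_size % patch_size != 0:
        raise ValueError("canvas_size must be a multiple of patch_size")
    if not 0 < band_width <= patch_size:
        raise ValueError("band_width must be in (0, patch_size]")

    mask = np.zeros((canvas_size, canvas_size), dtype=bool)
    half = band_width // 2
    # Interior borders sit at multiples of patch_size, excluding the canvas edges.
    for edge in range(patch_size, canvas_size, patch_size):
        lo, hi = edge - half, edge + (band_width - half)
        mask[lo:hi, :] = True  # horizontal seam band
        mask[:, lo:hi] = True  # vertical seam band
    return mask


# Assumed example: a 4x4 grid of 256-pixel patches with 32-pixel seam bands.
mask = seam_band_mask(canvas_size=1024, patch_size=256, band_width=32)
```

In such a setup, pixels where the mask is True would be re-synthesized while the rest of the canvas stays fixed as known context; a wider band gives the inpainting stage more room to blend neighbouring patches at the cost of regenerating more of the image.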