Enhancing Histology Datasets With Synthetic Data for Semantic Segmentation
Tomas Tánczos
Supervisor(s): prof. Ing. Vanda Benešová, PhD.
Slovak Technical University
Abstract: Analyzing digital histopathological images is crucial in medical diagnostics; however, obtaining large, well-annotated datasets is challenging. This work focuses on augmenting histopathological datasets using generative neural networks and evaluating the influence of the new data on a deep learning-based segmentation model. The analysis examines current methods for generating synthetic images and compares them to identify those that best meet our requirements. Based on this evaluation, we prioritize denoising diffusion probabilistic models over generative adversarial networks because they can perform both image synthesis and inpainting. Our solution combines image synthesis from noise with image inpainting and leverages both for dataset augmentation. We evaluate the proposed solution on in-house histopathological image datasets of heart tissue, in which previous research found the blood vessel class to be significantly underrepresented. We experimented with image synthesis in both pixel space and latent space, using the same model architecture, to test whether a latent representation captures better features. The paper evaluates the quality of the generated images using quantitative metrics and visual analysis. A high-level overview of our work is shown in Figure 1. Adding our synthetic dataset improved the performance of the segmentation model by 2%.
Keywords: Computer Vision
Year: 2025