Segmentation of Whole-Slide Images with Context-Aware Vision Transformers

Michal Franczel

Supervisor(s): Lukáš Hudec

Slovak Technical University

Abstract: Histological examination is a crucial component of breast cancer diagnostics. Analysis of whole-slide images (WSIs) is time-consuming because of their hierarchical nature and size, which results both in slower diagnostics and in a lack of annotations. Recent advances in vision transformers have demonstrated potential in the field of computer vision. However, their properties on hierarchical gigapixel images, where contextual information is crucial, remain underexplored. In this paper, we build on these advances and propose a semi-supervised solution based on a self-supervised pretraining and supervised fine-tuning paradigm. Our approach modifies the vision transformer encoders within the segmentation network to incorporate contextual information from lower magnification levels through late feature fusion. The multi-scale model variant outperforms its single-scale counterpart, improving the Dice score by 6.2%. Furthermore, we examine the properties of features learned by masked image modeling (MIM) and establish that vision transformers trained with MIM can effectively learn morphological phenotypes from unlabeled histopathological images, thereby validating MIM as a pretraining technique in this domain.
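To make "late feature fusion" of lower-magnification context more concrete, the following is a minimal, hypothetical sketch, not the authors' implementation. It assumes that a high-magnification target tile and a lower-magnification context tile have the same pixel size and share the same centre. It also assumes that each tile has already been encoded by its own transformer encoder into a patch-feature grid of equal size, and that fusion is plain channel-wise concatenation. Under these assumptions, the target tile corresponds to the central 1/`factor` of the context grid along each axis. That region is cropped, upsampled back to the target grid (nearest neighbour), and concatenated per patch. The names `crop_center`, `upsample_nearest`, and `late_fuse` are illustrative only, and nested lists stand in for the tensors a real model would use.

```python
from typing import List

# Hypothetical feature-map layout: fmap[row][col] is a list of channel values.
FeatureMap = List[List[List[float]]]


def crop_center(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Keep the central 1/factor of the grid along each axis.

    Assumes the context tile is centred on the target tile and that the
    grid height and width are divisible by `factor`.
    """
    h, w = len(fmap), len(fmap[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [[list(cell) for cell in row[left:left + cw]] for row in fmap[top:top + ch]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling: repeat every cell factor x factor times."""
    out: FeatureMap = []
    for row in fmap:
        wide = [list(cell) for cell in row for _ in range(factor)]
        for _ in range(factor):
            out.append([list(cell) for cell in wide])
    return out


def late_fuse(target: FeatureMap, context: FeatureMap, factor: int) -> FeatureMap:
    """Concatenate aligned context features onto each target patch feature.

    `factor` is the assumed ratio between the target and context magnifications
    (e.g. 2 for 20x target vs 10x context).
    """
    aligned = upsample_nearest(crop_center(context, factor), factor)
    if len(aligned) != len(target) or len(aligned[0]) != len(target[0]):
        raise ValueError("context grid does not align with target grid")
    # Channel-wise concatenation: target channels first, then context channels.
    return [
        [t_cell + c_cell for t_cell, c_cell in zip(t_row, c_row)]
        for t_row, c_row in zip(target, aligned)
    ]


if __name__ == "__main__":
    target = [[[0.0, 1.0] for _ in range(4)] for _ in range(4)]  # 4x4 grid, 2 channels
    context = [[[float(r * 10 + c)] for c in range(4)] for r in range(4)]  # 4x4, 1 channel
    fused = late_fuse(target, context, factor=2)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # grid size and channel count
```

Because fusion happens only after both encoders have run, each encoder can be pretrained (e.g. with MIM) and reused independently. In this sketch, the only coupling between scales is the spatial alignment step.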
Keywords: Computer Vision, Image Processing
Year: 2023