Michal Franczel
Supervisor(s): Lukáš Hudec
Slovak Technical University
Abstract: Histological examination is a crucial component of breast
cancer diagnostics. Analysis of whole-slide images (WSI) is a
time-consuming process due to their hierarchical nature and size,
resulting in both slower diagnostics and a lack of annotations. Recent
advances in vision transformers have demonstrated potential within the
field of computer vision. However, their properties on hierarchical
gigapixel images, where contextual information is crucial, remain
underexplored. In this paper, we propose a solution employing
semi-supervised learning based on a self-supervised pretraining and
supervised fine-tuning paradigm, utilizing these advancements. Our
approach modifies vision transformer encoders within the segmentation
network to incorporate contextual information from lower magnification
levels through late feature fusion. The multi-scale model variant
outperforms its single-scale counterpart, improving the Dice score by
6.2%. Furthermore, we examine the properties of features learned by
masked image modeling (MIM) and establish that vision transformers
trained with MIM can effectively learn morphological phenotypes from
unlabeled histopathological images, thereby validating MIM as a
pretraining technique in this domain.
Keywords: Computer Vision, Image Processing
Year: 2023
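
For readers unfamiliar with the metric reported in the abstract, the Dice score measures the overlap between a predicted segmentation mask and the ground-truth mask as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition for binary masks, not the evaluation code used in the paper. The function name dice_score, the flat 0/1 list representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked positive in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total number of positive pixels across both masks.
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 2 overlapping positives, 3 positives in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_score(pred, target))
```

In this toy example the score is 2 * 2 / (3 + 3), i.e. two thirds. Identical masks give 1.0 and disjoint masks give 0.0.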