Self-supervised Representation Learning for Ultrasound Video


Recent advances in deep learning have achieved promising performance in medical image analysis, but in most cases ground-truth annotations from human experts are needed to train the model. In practice, such annotations are expensive to collect and can be scarce for medical imaging applications. There is therefore significant interest in learning representations from unlabelled raw data. In this paper, we propose a self-supervised learning approach that learns meaningful and transferable representations from medical imaging video without any human annotation. We assume that, to learn such a representation, the model should identify anatomical structures in the unlabelled data. We therefore train the model on anatomy-aware tasks whose supervision comes for free from the data itself. Specifically, the model is designed to correct the order of a reshuffled video clip and, at the same time, predict the geometric transformation applied to the clip. Experiments on fetal ultrasound video show that the proposed approach learns meaningful and strong representations that transfer well to downstream tasks such as standard plane detection and saliency prediction.
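The two pretext tasks named in the abstract can be illustrated with a small, self-contained sketch of how training samples and their free labels might be generated. This is not the authors' implementation: the clip length of 4, the use of rotations by multiples of 90 degrees as the geometric transformation, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import itertools
import random

# Hypothetical settings; the abstract does not specify clip length
# or which geometric transformations are used.
CLIP_LEN = 4
PERMUTATIONS = list(itertools.permutations(range(CLIP_LEN)))  # 24 order classes
NUM_ROTATIONS = 4  # assumed transformation set: 0, 90, 180, 270 degrees


def rotate90(frame, k):
    """Rotate a 2D frame (list of rows) clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        frame = [list(row) for row in zip(*frame[::-1])]
    return frame


def make_pretext_sample(clip, rng):
    """Shuffle and rotate a clip; return it with its two free labels."""
    if len(clip) != CLIP_LEN:
        raise ValueError(f"expected {CLIP_LEN} frames, got {len(clip)}")
    order_label = rng.randrange(len(PERMUTATIONS))
    rotation_label = rng.randrange(NUM_ROTATIONS)
    perm = PERMUTATIONS[order_label]
    shuffled = [clip[i] for i in perm]  # position p holds original frame perm[p]
    transformed = [rotate90(f, rotation_label) for f in shuffled]
    return transformed, order_label, rotation_label


def restore_clip(transformed, order_label, rotation_label):
    """Invert make_pretext_sample given the correct labels."""
    perm = PERMUTATIONS[order_label]
    unrotated = [rotate90(f, -rotation_label) for f in transformed]
    restored = [None] * len(perm)
    for position, source_index in enumerate(perm):
        restored[source_index] = unrotated[position]
    return restored


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2x2 "frames" standing in for ultrasound images.
    clip = [[[t, t + 1], [t + 2, t + 3]] for t in range(CLIP_LEN)]
    sample, order_label, rotation_label = make_pretext_sample(clip, rng)
    assert restore_clip(sample, order_label, rotation_label) == clip
```

In a setup like this, a network would receive the transformed clip and be trained to predict both labels; `restore_clip` is included only to show that the labels fully determine the original clip, which is what makes the supervision free.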

IEEE International Symposium on Biomedical Imaging (ISBI) 2020

PDF and paper summary coming soon!


  author = {Jianbo Jiao and Richard Droste and Lior Drukker and Aris Papageorghiou and Alison Noble},
  title = {Self-supervised Representation Learning for Ultrasound Video},
  booktitle = {IEEE International Symposium on Biomedical Imaging},
  year = {2020}


This work is supported by the ERC (ERC-ADG-2015 694581, project PULSE) and the EPSRC (EP/G036861/1 and EP/M013774/1). AP is funded by the NIHR Oxford Biomedical Research Centre.