Poster

A Rotation-invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images

Tianyi Liu · Shuaishuai S Zhuang · Jiacheng Nie · Geng Chen · Yusheng Guo · Guangquan Zhou · Jean-Louis Coatrieux · Yang Chen

Strong Double Blind: This paper was not made available on public preprint services during the review process.

2024 Poster


Abstract

Endoscopic ultrasound (EUS) is well suited to perceiving hierarchical changes in the esophageal wall, which makes it valuable for diagnosing submucosal tumors. However, lesions often disrupt the structural integrity and fine-grained texture of the esophageal wall layers, impeding accurate diagnosis. Moreover, because of how EUS images are formed, lesions can appear at any radial position, which makes diagnosis even harder. In this study, we propose an automatic classification model that equips the Vision Transformer (ViT), a recent state-of-the-art architecture, with a novel statistical rotation-invariant reinforcement mechanism (SRRM); we call the resulting model SRRM-ViT. Specifically, the model adaptively selects crucial regions to avoid interference from irrelevant information in the image. It also integrates rotation-invariant histogram statistical features into the self-attention mechanism, so that fine-grained lesion information is captured without bias toward any particular radial position. Validated on in-house clinical data and public data, SRRM-ViT shows clear performance improvements, demonstrating the efficacy and potential of our approach for EUS image classification.

Keywords: Fine-Grained Visual Classification (FGVC), Endoscopic Ultrasound (EUS), Rotation Invariant, Token Selection.
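The abstract names two ideas, histogram statistics that do not change under rotation and attention-based token selection, without giving the exact formulation. The sketch below is a minimal illustration of why such a combination can be rotation-invariant, not the authors' SRRM. Everything in it is an assumption chosen for illustration: the hypothetical helpers `patch_histograms`, `attention_weights`, `pooled_descriptor` and `select_top_k`, the patch size and bin count, the use of raw histograms as queries, keys and values (no learned projections), mean pooling, and ranking tokens by the attention they receive.

```python
import numpy as np


def patch_histograms(image, patch=16, bins=16):
    """Split a grayscale image with values in [0, 1] into non-overlapping
    square patches and return one normalized intensity histogram per patch.
    Each row of the result plays the role of one token (hypothetical setup)."""
    h, w = image.shape
    rows, cols = h // patch, w // patch
    feats = np.empty((rows * cols, bins))
    for r in range(rows):
        for c in range(cols):
            block = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            # A histogram ignores where pixels sit inside the patch, so
            # rotating the patch contents leaves it unchanged.
            feats[r * cols + c] = hist / block.size
    return feats


def attention_weights(feats):
    """Single-head scaled dot-product attention weights, using the histogram
    tokens directly as queries and keys (an assumption: no learned Q/K)."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def pooled_descriptor(feats):
    """Self-attention over histogram tokens followed by mean pooling.
    Attention is permutation-equivariant and the mean is permutation-invariant,
    so reordering the tokens (e.g. after a 90-degree rotation) gives the same
    pooled vector up to floating-point rounding."""
    out = attention_weights(feats) @ feats
    return out.mean(axis=0)


def select_top_k(feats, k):
    """Hypothetical token selection: keep the k tokens that receive the most
    attention on average, discarding the rest as less informative."""
    received = attention_weights(feats).mean(axis=0)
    return np.argsort(received)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))
    f0 = patch_histograms(img)
    f90 = patch_histograms(np.rot90(img))
    # Same multiset of per-patch histograms before and after rotation.
    print(sorted(map(tuple, f0)) == sorted(map(tuple, f90)))
    print(np.allclose(pooled_descriptor(f0), pooled_descriptor(f90)))
    print(select_top_k(f0, 4))
```

The invariance in this sketch is exact only for rotations by multiples of 90 degrees on a square image whose side is a multiple of the patch size: there, rotation just permutes the patches and rearranges pixels within each one. Arbitrary angles involve interpolation and patches that straddle the grid, so the histograms are then only approximately preserved.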
