Skip to yearly menu bar Skip to main content


Poster

DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution

Shrey Singh · Prateek Keserwani · Masakazu Iwamura · Partha Pratim Roy

Strong blind review: This paper was not made available on public preprint services during the review process Strong Double Blind
[ ]
Wed 2 Oct 1:30 a.m. PDT — 3:30 a.m. PDT

Abstract:

Severe blurring of scene text images, resulting in the loss of critical strokes and textual information, has a profound impact on text readability and recognizability. Therefore, scene text image super-resolution, aiming to enhance text resolution and legibility in low-resolution images, is a crucial task. In this paper, we introduce a novel generative model for scene text super-resolution called ``\textit{Diffusion-Conditioned-Diffusion Model} (DCDM).'' The model is designed to learn the distribution of high-resolution images via two conditions: 1) the low-resolution image and 2) the character-level text embedding generated by a latent diffusion text model. The latent diffusion text module is specifically designed to generate character-level text embedding space from the latent space of low-resolution images. Additionally, the character-level CLIP module has been used to align the high-resolution character-level text embeddings with low-resolution embeddings. This ensures visual alignment with the semantics of scene text image characters. Our experiments on the TextZoom dataset demonstrate the superiority of the proposed method to state-of-the-art methods.

Live content is unavailable. Log in and register to view live content