Poster

DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution

Shrey Singh ⋅ Prateek Keserwani ⋅ Masakazu Iwamura ⋅ Partha Pratim Roy

Strong blind review: This paper was not made available on public preprint services during the review process

2024 Poster

Abstract

Severe blurring in scene text images destroys critical strokes and textual information, which degrades both the readability and the recognizability of the text. Scene text image super-resolution, which aims to enhance text resolution and legibility in low-resolution images, is therefore a crucial task. In this paper, we introduce a novel generative model for scene text super-resolution called the "Diffusion-Conditioned-Diffusion Model" (DCDM). The model is designed to learn the distribution of high-resolution images under two conditions: 1) the low-resolution image and 2) the character-level text embedding generated by a latent diffusion text module. This module is specifically designed to generate character-level text embeddings from the latent space of low-resolution images. Additionally, a character-level CLIP module is used to align the high-resolution character-level text embeddings with the low-resolution embeddings, which ensures visual alignment with the semantics of the scene text image characters. Our experiments on the TextZoom dataset demonstrate the superiority of the proposed method over state-of-the-art methods.
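The abstract does not specify how the character-level CLIP module performs the alignment. As a rough illustration only, the sketch below assumes a standard CLIP-style symmetric contrastive loss between paired high-resolution and low-resolution character embeddings. The function name `clip_alignment_loss`, the temperature value, and the embedding shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_alignment_loss(hr_emb, lr_emb, temperature=0.07):
    """Symmetric contrastive loss between paired embeddings.

    hr_emb, lr_emb: arrays of shape (N, D); row i of each is assumed to
    describe the same character. This is an assumed CLIP-style objective,
    not the loss reported by the DCDM authors.
    """
    # L2-normalise so the dot product is a cosine similarity.
    hr = hr_emb / np.linalg.norm(hr_emb, axis=1, keepdims=True)
    lr = lr_emb / np.linalg.norm(lr_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = hr @ lr.T / temperature
    idx = np.arange(len(hr))

    # Cross-entropy in both directions; matching pairs lie on the diagonal.
    loss_hr_to_lr = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_lr_to_hr = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_hr_to_lr + loss_lr_to_hr)


# Usage: correctly paired embeddings should score lower than shuffled ones.
rng = np.random.default_rng(0)
hr_embeddings = rng.normal(size=(8, 16))
aligned_loss = clip_alignment_loss(hr_embeddings, hr_embeddings)
shuffled_loss = clip_alignment_loss(hr_embeddings, np.roll(hr_embeddings, 1, axis=0))
```

Under this assumed objective, minimising the loss pulls each low-resolution character embedding toward its high-resolution counterpart and pushes it away from the embeddings of other characters in the batch.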
