Handwritten text generation (HTG) conditioned on content and style is a challenging task because writing characteristics vary widely between users and because characters can be combined without limit into new words unseen during training. Diffusion Models have recently shown promising results in HTG, but they remain under-explored. We present DiffusionPen (DiffPen), a 5-shot style handwritten text generation approach based on Latent Diffusion Models. Using a hybrid style extractor that combines metric learning and classification, our approach captures both the textual and the stylistic characteristics of seen and unseen words and styles and generates realistic handwritten samples. We perform experiments on the IAM offline handwriting database to evaluate the generated style and content and compare them with other state-of-the-art (SotA) methods. Our method outperforms other methods both qualitatively and quantitatively, and additional data generated by our method can improve the performance of a Handwriting Text Recognition (HTR) system. The code is available at: (code repository will be released in case of acceptance to maintain anonymity).
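As a rough illustration of how a style objective might combine metric learning with classification, the sketch below adds a triplet margin loss on style embeddings to a cross-entropy loss over writer labels, and averages several reference embeddings into one style vector for the few-shot setting. The specific losses, the averaging step, the weighting, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def mean_embedding(embeddings):
    """Average equal-length style vectors (assumed: one per reference sample)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Metric-learning term: pull same-writer samples together and
    push different-writer samples at least `margin` further away."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

def cross_entropy(logits, target):
    """Classification term: negative log-softmax of the target writer class."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

def hybrid_style_loss(anchor, positive, negative, logits, writer_id,
                      weight=1.0, margin=1.0):
    """Hypothetical hybrid objective: classification plus weighted triplet term."""
    metric_term = triplet_loss(anchor, positive, negative, margin)
    return cross_entropy(logits, writer_id) + weight * metric_term

# Five toy reference embeddings from one writer, averaged into a style vector.
references = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]]
style_vector = mean_embedding(references)
loss = hybrid_style_loss(style_vector, [0.1, 0.9], [1.0, 0.0],
                         logits=[2.0, 0.5, 0.1], writer_id=0)
```

In this sketch the `weight` parameter trades off the two terms; how they are actually balanced, if they are combined this way at all, is not stated in the abstract.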