
Poster

Synthesizing Environment-Specific People in Photographs

Mirela Ostrek · Carol O'Sullivan · Michael J. Black · Justus Thies

Thu 3 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Despite significant progress in generative image synthesis, and full-body generation in particular, state-of-the-art methods are either context-independent, overly reliant on text prompts, or bound to specific, curated training datasets, such as fashion images with monotonous backgrounds. Here, our goal is to generate people wearing clothing that is semantically appropriate for a given scene. To this end, we present ESP, a novel method for context-aware full-body generation that enables photo-realistic inpainting of people into existing “in-the-wild” photographs. ESP is conditioned on a 2D pose and on contextual cues that are extracted from the photograph of the scene and integrated into the generation process, where clothing is modeled explicitly with human parsing masks (HPMs). The generated HPMs are then used as tight guiding masks for inpainting, so no changes are made to the original background. Our models are trained on a dataset of in-the-wild photographs of people covering a wide range of environments. The method is analyzed quantitatively and qualitatively, and we show that ESP outperforms the state of the art on the task of contextual full-body generation. Code will be made public for research.
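The ESP implementation is not included here, but the mask-guided inpainting step the abstract describes can be illustrated with a rough sketch: a human parsing mask is binarized into a tight guiding mask, generation is restricted to that region, and the result is re-composited so the background stays untouched. The sketch below uses an off-the-shelf Stable Diffusion inpainting pipeline as a stand-in for the ESP generator, and a plain text prompt as a placeholder for ESP's pose and scene-context conditioning; the function name and HPM format are assumptions, not the authors' API.

```python
# Hypothetical sketch of HPM-guided inpainting with an untouched background.
# Stand-in generator: Stable Diffusion inpainting (not the authors' ESP model).
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline


def inpaint_person(scene: Image.Image, hpm: np.ndarray, prompt: str) -> Image.Image:
    """Inpaint a person into `scene`, strictly inside the HPM-derived mask.

    hpm: per-pixel human-parsing labels (0 = background); every nonzero label
    (hair, skin, clothing parts) contributes to the tight guiding mask.
    """
    # Binarize the human parsing mask into a single tight inpainting mask.
    mask = Image.fromarray(((hpm > 0) * 255).astype(np.uint8)).resize(scene.size)

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    out = pipe(prompt=prompt, image=scene, mask_image=mask).images[0]

    # Re-composite so pixels outside the mask are exact copies of the original
    # photograph, mirroring the "no changes to the background" property.
    m = np.array(mask).astype(np.float32)[..., None] / 255.0
    blended = np.array(out.resize(scene.size)) * m + np.array(scene) * (1.0 - m)
    return Image.fromarray(blended.astype(np.uint8))
```

In ESP itself the mask is generated (conditioned on pose and scene context) rather than given; the sketch only shows how a tight HPM mask confines synthesis to the person region.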
