

Poster

Text2Place: Affordance-aware Text Guided Human Placement

Rishubh Parihar · Harsh Gupta · Sachidanand VS · Venkatesh Babu RADHAKRISHNAN

Strong Double Blind: This paper was not made available on public preprint services during the review process.
Thu 3 Oct 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

For a given scene, humans can easily reason about the locations and poses in which to place objects. Designing a computational model that mirrors this intuitive affordance reasoning is a significant challenge. This work tackles the problem of realistic human insertion into a given background scene, termed Semantic Human Placement. The task is extremely challenging given the diversity of backgrounds, the scale and pose of the generated person, and the need to preserve the person's identity. We divide the problem into two stages: i) learning semantic masks using text guidance to localize regions in the image where humans can be placed, and ii) subject-conditioned inpainting to place a given subject within the semantic masks while adhering to the scene affordances. To learn the semantic masks, we leverage rich object-scene priors learned by text-to-image generative models and optimize a novel parameterization of the semantic mask, eliminating the need for large-scale training. To the best of our knowledge, we are the first to provide an effective solution for realistic human placement in diverse real-world scenes. The proposed method generates highly realistic scene compositions while preserving the background and subject identity. Further, we present results for several downstream tasks: scene hallucination from one or more generated persons, and text-based attribute editing. Through extensive comparisons against strong baselines, we show the superiority of our method in realistic human placement.
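The two-stage pipeline described above can be sketched in miniature. This is a hedged toy illustration, not the paper's method: the actual mask parameterization and the text-guided objective from the text-to-image prior are not specified in the abstract, so here the mask is assumed to be a single 2D Gaussian blob whose center and scale are optimized against a placeholder "affordance score" (overlap with a toy placeable-region map), and stage two is represented only by the optimized mask handed to a downstream inpainter.

```python
# Toy sketch of stage (i): optimize a parameterized semantic mask against an
# affordance score. All names and the Gaussian parameterization are
# illustrative assumptions, not the paper's actual formulation.
import numpy as np

def gaussian_mask(h, w, cy, cx, sigma):
    """Soft semantic mask in [0, 1], parameterized by center (cy, cx) and scale."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def affordance_score(mask, target):
    """Placeholder for a text-guided score from a generative prior: here,
    fractional overlap of the mask with a toy 'placeable' region map."""
    return float((mask * target).sum() / (mask.sum() + 1e-8))

def optimize_mask(target, h=64, w=64, steps=200, lr=2.0, eps=1.0):
    """Finite-difference ascent on (cy, cx, sigma) -- a stand-in for
    gradient-based optimization through the generative model."""
    cy, cx, sigma = h / 2.0, w / 2.0, 10.0
    for _ in range(steps):
        grads = []
        for i in range(3):
            p_hi = [cy, cx, sigma]; p_hi[i] += eps
            p_lo = [cy, cx, sigma]; p_lo[i] -= eps
            s_hi = affordance_score(gaussian_mask(h, w, *p_hi), target)
            s_lo = affordance_score(gaussian_mask(h, w, *p_lo), target)
            grads.append((s_hi - s_lo) / (2.0 * eps))
        cy += lr * grads[0]
        cx += lr * grads[1]
        sigma = max(2.0, sigma + lr * grads[2])  # keep the mask non-degenerate
    return gaussian_mask(h, w, cy, cx, sigma), (cy, cx, sigma)

# Toy scene: the "placeable" region is the bottom half of the image
# (e.g. the ground plane); the mask should drift downward.
target = np.zeros((64, 64))
target[32:, :] = 1.0
mask, (cy, cx, sigma) = optimize_mask(target)
```

In this sketch the mask center moves into the placeable region and the score improves; in the actual method the supervision would instead come from the text-to-image prior, and the resulting mask would condition the subject-aware inpainting stage.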
