Image generation: position objects using bounding box coordinates in prompts

Posted: Monday 17th July 2023

According to Meta, CM3leon is a generative AI model that can perform both text-to-image and image-to-text generation.

Below is an example of one of the features from their research: specific objects are positioned in an image by describing, in the text prompt, the bounding box each object should occupy.

Generate high quality image of "a room that has a sink and a mirror in it" with bottle at location (199, 130) -> (204, 150) and with a sink at location (149,133) -> (190, 154) and with bed at location (0,169) -> (67, 255)

Source: Meta’s research
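
The prompt above follows a regular pattern: a quoted caption, then one "with <object> at location (x1, y1) -> (x2, y2)" clause per object, joined by "and". Below is a minimal Python sketch of how such a prompt could be assembled from a list of boxes. The names `Box` and `build_layout_prompt` are hypothetical, the wording is inferred only from the single example above, and reading the two coordinate pairs as opposite corners of the box is an assumption. This is not an official CM3leon API.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One object and its bounding box (hypothetical helper type)."""
    label: str  # object name exactly as it should appear in the prompt
    x1: int     # first corner (assumed to be top-left)
    y1: int
    x2: int     # second corner (assumed to be bottom-right)
    y2: int


def build_layout_prompt(caption: str, boxes: list[Box]) -> str:
    """Build a prompt in the style of the example above.

    The phrasing mirrors the one published example; it is not a
    documented prompt format.
    """
    prompt = f'Generate high quality image of "{caption}"'
    clauses = [
        f"with {b.label} at location ({b.x1}, {b.y1}) -> ({b.x2}, {b.y2})"
        for b in boxes
    ]
    if clauses:
        # The first clause follows the caption directly; the rest are joined by "and".
        prompt += " " + " and ".join(clauses)
    return prompt


if __name__ == "__main__":
    boxes = [
        Box("bottle", 199, 130, 204, 150),
        Box("a sink", 149, 133, 190, 154),
        Box("bed", 0, 169, 67, 255),
    ]
    print(build_layout_prompt("a room that has a sink and a mirror in it", boxes))
```

Run as a script, this prints the same prompt as the example above, with the spacing inside the coordinate pairs normalised.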

Read about the other features of CM3leon here.

