Meta claims CM3leon is a generative AI model that can perform both text-to-image and image-to-text generation.
Below is an example of one of the features from their research, in which specific objects are positioned in an image by describing each object's bounding box in the text prompt.
Generate high quality image of "a room that has a sink and a mirror in it" with bottle at location (199, 130) -> (204, 150) and with a sink at location (149,133) -> (190, 154) and with bed at location (0,169) -> (67, 255)

Source: Meta’s research
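
To make the structure of that prompt easier to see, here is a minimal Python sketch that assembles the same kind of string from a caption and a list of labelled boxes. It only builds text and does not call any model. The names `Box` and `build_layout_prompt` are hypothetical, and the sketch assumes that each pair of points gives the top-left and bottom-right corners of a box in pixel coordinates, which the example suggests but does not state.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """One labelled bounding box (hypothetical helper type, not from Meta)."""
    label: str
    x0: int  # assumed: left edge, in pixels
    y0: int  # assumed: top edge, in pixels
    x1: int  # assumed: right edge, in pixels
    y1: int  # assumed: bottom edge, in pixels


def build_layout_prompt(caption: str, boxes: Sequence[Box]) -> str:
    """Assemble a prompt in the style of the example above.

    The wording of the template is copied from the example prompt;
    whether a model requires exactly this wording is an assumption.
    """
    head = f'Generate high quality image of "{caption}"'
    clauses = []
    for box in boxes:
        # Reject boxes whose second corner lies above or left of the first.
        if box.x1 < box.x0 or box.y1 < box.y0:
            raise ValueError(f"box for {box.label!r} has inverted corners")
        clauses.append(
            f"with {box.label} at location "
            f"({box.x0}, {box.y0}) -> ({box.x1}, {box.y1})"
        )
    if not clauses:
        return head
    return head + " " + " and ".join(clauses)


# Rebuild the example prompt from structured data.
example_boxes = [
    Box("bottle", 199, 130, 204, 150),
    Box("a sink", 149, 133, 190, 154),
    Box("bed", 0, 169, 67, 255),
]
print(build_layout_prompt("a room that has a sink and a mirror in it", example_boxes))
```

Keeping the boxes as structured data rather than hand-typed text makes it harder to mistype a coordinate and lets the corner check run before the prompt is sent anywhere.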