OmniSearch shows how a multimodal embedding model builds a single vector space that holds different media types together. Because these four objects all describe the Sahara, they carry similar semantic meaning and therefore end up close to each other in that space.
text
“The Sahara is a desert spanning North Africa. With an area of 9,200,000 km², it is the largest hot desert in the world.”
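To make "close to each other" concrete, here is a minimal sketch of how closeness in an embedding space is commonly measured, using cosine similarity. The three-dimensional vectors and their names (`sahara_text`, `sahara_image`, `ocean_audio`) are hand-picked, hypothetical stand-ins, not outputs of OmniSearch or of any real model. Real embeddings typically have hundreds of dimensions, and the metric a given system uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumed values, for illustration only).
sahara_text = [0.90, 0.10, 0.20]   # the text passage about the Sahara
sahara_image = [0.85, 0.15, 0.25]  # a photo of the Sahara
ocean_audio = [0.10, 0.90, 0.30]   # an unrelated clip, e.g. ocean waves

# Objects about the same subject should score higher than unrelated ones.
sim_same_topic = cosine_similarity(sahara_text, sahara_image)
sim_diff_topic = cosine_similarity(sahara_text, ocean_audio)

print(f"text vs. image (same topic):      {sim_same_topic:.3f}")
print(f"text vs. audio (different topic): {sim_diff_topic:.3f}")
```

With these made-up vectors, the text and image about the Sahara score much higher than the text and the unrelated audio clip, which is the behavior a shared multimodal space is meant to produce regardless of media type.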