The hype surrounding generative AI’s ability to analyze long texts may be overstated, according to two recent studies. The research finds that these models struggle to make sense of book-length works and to answer questions about videos, which points to significant limits on their capabilities. One study found that even Google’s Gemini models, promoted for their ability to process large amounts of data, falter when they have to understand context across lengthy written works. For instance, Gemini 1.5 Pro correctly judged true/false statements about a 520-page book only 46.7% of the time, while Gemini 1.5 Flash managed a mere 20%. A separate study showed that vision language models, including Gemini 1.5 Flash, struggle to answer questions about videos and are often thrown off by irrelevant information. These findings should serve as a wake-up call for companies looking to bring generative AI into their operations, since the models may be far less capable than their marketing suggests.

AI’s Blind Spot – Generative Models Struggle with Long-Form Context
While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content.