Experts in artificial intelligence have gotten quite good at creating computers that can “see” the world around them—recognizing objects, animals, and activities within their purview. These have become the foundational technologies for autonomous cars, planes, and security systems of the future.
But now a team of researchers is working to teach computers to recognize not just what objects are in an image, but how that image makes people feel: in other words, algorithms with emotional intelligence.
“This ability will be key to making artificial intelligence not just more intelligent, but more human, so to speak,” says Panos Achlioptas, a doctoral candidate in computer science at Stanford University who worked with collaborators in France and Saudi Arabia.
To get to this goal, Achlioptas and his team collected a new dataset, called ArtEmis, which was recently published in an arXiv pre-print. The dataset is based on 81,000 paintings from WikiArt and consists of 440,000 written responses from over 6,500 humans indicating how a painting makes them feel, including explanations of why they chose a certain emotion. Using those responses, Achlioptas and team, headed by Stanford engineering professor Leonidas Guibas, trained neural speakers (AI that responds in written words) that allow computers to generate emotional responses to visual art and justify those emotions in language.
The researchers chose to use art specifically, as an artist’s goal is to elicit emotion in the viewer. ArtEmis works regardless of the subject matter, from still life to human portraits to abstraction.
The work is a new approach in computer vision, notes Guibas, a faculty member of the AI lab and the Stanford Institute for Human-Centered Artificial Intelligence. “Classical computer vision capturing work has been about literal content,” Guibas says. “There are three dogs in the image, or someone is drinking coffee from a cup. Instead, we needed descriptions that defined emotional content.”