A growing number of experts have called for these tests to be ditched, saying they boost AI hype and create "the illusion that [AI language models] have greater capabilities than what actually exists." Read the full story here.
What stood out to me in Will's story is that we know remarkably little about how AI language models work and why they generate the things they do. With these tests, we're trying to measure and glorify their "intelligence" based on their outputs, without fully understanding how they function under the hood.
Our tendency to anthropomorphize makes this messy: "People have been giving human intelligence tests (IQ tests and so on) to machines since the very beginning of AI," says Melanie Mitchell, an artificial-intelligence researcher at the Santa Fe Institute in New Mexico. "The issue throughout has been what it means when you test a machine like this. It doesn't mean the same thing that it means for a human."
Kids vs. GPT-3: Researchers at the University of California, Los Angeles, gave GPT-3 a story about a magical genie transferring jewels between two bottles and then asked it how to transfer gumballs from one bowl to another, using objects such as a posterboard and a cardboard tube. The idea is that the story hints at ways to solve the problem. GPT-3 proposed elaborate but mechanically nonsensical solutions. "This is the sort of thing that children can easily solve," says Taylor Webb, one of the researchers.
AI language models are not humans: "With large language models producing text that seems so human-like, it is tempting to assume that human psychology tests will be useful for evaluating them. But that's not true: human psychology tests rely on many assumptions that may not hold for large language models," says Laura Weidinger, a senior research scientist at Google DeepMind.
Lessons from the animal kingdom: Lucy Cheke, a psychologist at the University of Cambridge, UK, suggests AI researchers could adapt techniques used to study animals, which have been developed to avoid jumping to conclusions based on human bias.
Nobody knows how language models work: "I think that the fundamental problem is that we keep focusing on test results rather than how you pass the tests," says Tomer Ullman, a cognitive scientist at Harvard University.