In the 1980s, Andrew Barto and Rich Sutton were considered eccentric devotees of an elegant but ultimately doomed idea: having machines learn, as humans and animals do, from experience.

Decades on, with the technique they pioneered now increasingly critical to modern artificial intelligence and programs like ChatGPT, Barto and Sutton have been awarded the Turing Award, the highest honor in the field of computer science.

Barto, a professor emeritus at the University of Massachusetts Amherst, and Sutton, a professor at the University of Alberta, trailblazed a technique known as reinforcement learning, which involves coaxing a computer to perform tasks through experimentation combined with either positive or negative feedback.

“When this work started for me, it was extremely unfashionable,” Barto recalls with a smile, speaking over Zoom from his home in Massachusetts. “It’s been remarkable that [it has] achieved some influence and some attention,” he adds.

Reinforcement learning was perhaps most famously used by Google DeepMind in 2016 to build AlphaGo, a program that learned for itself how to play the incredibly complex and subtle board game Go to an expert level. This demonstration sparked new interest in the technique, which has gone on to be used in advertising, data-center energy optimization, finance, and chip design. The approach also has a long history in robotics, where it can help machines learn to perform physical tasks through trial and error.

More recently, reinforcement learning has been crucial to guiding the output of large language models (LLMs) and producing extraordinarily capable chatbot programs. The same method is also being used to train AI models to mimic human reasoning and to build more capable AI agents.

Sutton notes, however, that the methods used to guide LLMs involve humans providing goals rather than an algorithm learning purely through its own exploration. He says having machines learn entirely on their own may ultimately be more fruitful. “The big division is whether [AI is] learning from people or whether it’s learning from its own experience,” he says.

Barto and Sutton’s “work has been a lynchpin of progress in AI over the last several decades,” Jeff Dean, a senior vice president at Google, said in a statement released by the Association for Computing Machinery (ACM), which hands out the Turing Award annually. “The tools they developed remain a central pillar of the AI boom and have rendered major advances.”

Reinforcement learning has a long and checkered history within AI. It was there at the dawn of the field, when Alan Turing suggested that machines could learn through experience and feedback in his famous 1950 paper “Computing Machinery and Intelligence,” which examines the notion that a machine might someday think like a human. Arthur Samuel, an AI pioneer, used reinforcement learning to build one of the first machine learning programs, a system capable of playing checkers, in 1955.
