It sounds like the premise to a blockbuster sci-fi film – computers that “see” like humans. Far-fetched, but you may be seeing it (and it may be seeing you) just around the corner. While we’re not quite there yet, a recently released study from a team of MIT neuroscientists suggests that new computer models behave in an eerily similar fashion to primate brains when perceiving and sorting images.
New Strides in Technology
Computers, as you may know, have classically had poor vision. While properly equipped devices can receive visual input, the high-level processing seen in humans and other animals has proven far from easy to replicate. Much of this comes down to just how good the human brain is at certain visual tasks.
Our ability to identify faces, for example, is phenomenal. Humans can recognize a face regardless of lighting, expression, perspective, and many, many other factors that can easily confound a computer. Additionally, we have a near limitless ability to remember the faces we see. All told, even the simplest of visual tasks for a human can pose a major challenge to a computer.
But scientists have started to close that gap. Recent jumps forward in technology and theory have led to major breakthroughs in the ways computers perceive their environments. The famous Google self-driving cars, for instance, use lasers to generate a 3D map of their surroundings. And similar advances have been made in the tricky field of creating computer models that approach the precision of animal visual systems.
An awful lot of that progress comes courtesy of the innovation at the heart of the MIT study – the neural network.
Surfing the Neural Net(work)
Artificial neural networks are one of the more intriguing products of the machine learning field. Designed to mimic biological neural networks, ANNs are composed of systems of connected “neurons” that react to different stimuli. Here’s what Wikipedia offers as an example of artificial neural networks:
“a neural network for handwriting recognition is defined by a set of input neurons which may be activated by the pixels of an input image. After being weighted and transformed by a function (determined by the network’s designer), the activations of these neurons are then passed on to other neurons. This process is repeated until, finally, an output neuron is activated. This determines which character was read.”
In short, ANNs are composed of multiple levels. Each level processes, transforms, and transmits information to the next. ANNs have shown their true value when tasked with challenges that traditional, rules-based programming falls short on, such as recognizing speech or classifying visual objects.
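To make those levels concrete, here is a minimal sketch of a toy feedforward network in plain Python. Everything in it is invented for illustration – the weights are random and the four input “pixels” are made-up numbers, so it classifies nothing real – but it shows the basic flow Wikipedia describes: inputs are weighted, transformed by a function, and passed on level by level until an output neuron wins.

```python
import random

def layer(inputs, weights, biases):
    """One level of the network: each neuron weights its inputs,
    adds a bias, and applies a nonlinearity (ReLU here) before
    passing its activation on to the next level."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU: keep only positive signal
    return outputs

def random_weights(n_out, n_in):
    """Randomly initialized weights -- a real network would learn these."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

random.seed(0)

# A toy 4-"pixel" input image (values invented for the example).
pixels = [0.2, 0.9, 0.1, 0.7]

# Two hidden levels, then an output level with one neuron per class.
h1 = layer(pixels, random_weights(5, 4), [0.0] * 5)
h2 = layer(h1, random_weights(3, 5), [0.0] * 3)
scores = layer(h2, random_weights(2, 3), [0.0] * 2)

# The most strongly activated output neuron is the network's answer.
predicted = scores.index(max(scores))
print("predicted class:", predicted)
```

In a trained network the weights wouldn’t be random – they would be tuned on labeled examples so that the right output neuron fires – but the forward pass through the levels works just like this.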
The specific ANNs used in the study were a new generation known as deep neural networks. DNNs behave much like the ANN described above, and rely on a series of processing levels to eventually model high-level abstractions, making them the perfect subject for a team attempting to emulate the neural network of a primate.
The neural network that sparked the MIT study was constructed by a team from New York University. In trials, it proved to be a match for a macaque brain when tasked with identifying and classifying different objects.
During the study, both the DNNs and the macaques they were matched against were shown a brief glimpse of an image. The images were drawn from one of seven categories – which included cars, faces, and fruit – and were then placed in front of a random background. The macaques and DNNs would then categorize the objects accordingly.
That the DNNs could accurately classify the objects, regardless of background and across several different objects from each category, is remarkable enough, but the work of the MIT team proved something perhaps even more interesting: the ways in which the DNNs and the primate brains were recording information were strikingly similar. Both grouped objects similarly in “representational space,” with kindred objects clustered close together and dissimilar objects placed further apart.
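The idea of “representational space” can be sketched with a few invented feature vectors – the numbers below are made up purely for illustration, not drawn from the study. Each object gets a point in space (here, three dimensions standing in for neural activations), and similarity is just distance between points.

```python
import math

# Hypothetical feature vectors for four objects -- invented numbers,
# standing in for the activations a brain or DNN might produce.
representations = {
    "car":   [0.9, 0.1, 0.2],
    "truck": [0.8, 0.2, 0.3],  # kindred to "car"
    "face":  [0.1, 0.9, 0.7],
    "apple": [0.2, 0.3, 0.9],
}

def distance(a, b):
    """Euclidean distance between two points in representational space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Kindred objects (car/truck) sit closer together than dissimilar ones.
near = distance(representations["car"], representations["truck"])
far = distance(representations["car"], representations["face"])
print("car-truck:", round(near, 3))  # small
print("car-face:", round(far, 3))    # large
```

Comparing how two systems arrange the same objects in their respective spaces is one way researchers can ask whether a model and a brain “see” the world similarly.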
The success of the DNN was largely attributed to new advances in processing power – researchers have been adopting the graphics processors (GPUs) built to handle the enormous visual load of video games – and to recently available, massive image banks containing millions of human-labeled images that help train the DNNs.
The MIT study is also a crucial sign that researchers and programmers are using models that at least somewhat mimic the ways in which visual systems are organized in nature. These successes bring us one step closer to more widespread use of computer visual systems, but also give vital insight into the way the human visual system functions.
Time will tell if ANNs and their descendants will ever come into their own as a means to treat vision disorders in humans, but at the moment, things are looking bright.