The naive view of how humans "see" what they look at assumes that what goes into our eyes, which is basically a map of hue and value areas captured from the outside world on our retinas, gets dumped into our brains pretty much as is. Philosophers once imagined a situation like the one in the first Men in Black movie, which had a "little guy in the big guy's head." That, of course, sets up an endless regress of guys within guys that quickly fails via a reductio ad absurdum argument transparent to even the most stone-headed of stone-age thinkers.
That ain't what goes on. The hue and value map that reaches our retinas undergoes massive image processing before what's left of the image ever gets out of the eyeball. What goes down the optic nerve is information extracted from the retinal image that our brains can use to identify objects and fix their positions in three-space.
Look is an intransitive verb meaning to open our eyes, and aim them in a certain direction. See is a transitive verb meaning to identify the thing we're looking at.
Current research in machine vision aims at bridging the gap between looking and seeing for automated systems. Current systems appear to return false positives at a rate of 30 percent to 40 percent. That means the systems don't just get stumped, they actually come up with a wrong answer.
That ain't good.
Human perceptions err, too, but much less often and always with a conservative bias. We've all had the experience of finding human faces emerging from chaotic patterns, such as the grain in a piece of wood. That's caused by our built-in bias to hypothesize people in any as-yet-unidentified visual scene. It's the first step in how babies find their mothers!
Current systems go whole HOG (Haw! Haw!) with something called a Histogram of Oriented Gradients. These systems construct a database from measurements of strength and orientation of gradients in small patches pulled out of images, and compare them with similar HOGs of previously analyzed images. So far, so good, except that the systems get it badly wrong one third of the time.
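For the curious, here's a bare-bones sketch of what "measuring strength and orientation of gradients in a small patch" might look like. It is not taken from any particular system: the function names (`gradient_histogram`, `similarity`), the nine-bin layout, and the cosine comparison are all assumptions made for illustration, and real HOG implementations add refinements such as overlapping cells and block normalization that are left out here.

```python
import math

def gradient_histogram(patch, bins=9):
    """Build an orientation histogram for one grayscale patch.

    `patch` is a list of rows of pixel values. Each interior pixel votes
    for the bin matching its gradient direction (0 to 180 degrees,
    ignoring sign), weighted by its gradient strength.
    """
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the horizontal and vertical gradients.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, nothing to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

def similarity(hist_a, hist_b):
    """Cosine similarity between two histograms (an assumed comparison rule)."""
    norm_a = math.sqrt(sum(v * v for v in hist_a))
    norm_b = math.sqrt(sum(v * v for v in hist_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(a * b for a, b in zip(hist_a, hist_b)) / (norm_a * norm_b)

# Two toy 8x8 patches: one with a vertical edge, one with a horizontal edge.
vertical_edge = [[0] * 4 + [255] * 4 for _ in range(8)]
horizontal_edge = [[0] * 8 for _ in range(4)] + [[255] * 8 for _ in range(4)]

print(similarity(gradient_histogram(vertical_edge),
                 gradient_histogram(vertical_edge)))
print(similarity(gradient_histogram(vertical_edge),
                 gradient_histogram(horizontal_edge)))
```

A patch compared with itself scores as a perfect match, while the two edges, whose gradients point in perpendicular directions, land in different bins and score as unrelated. Notice what the histogram throws away: it records which way the brightness changes and how sharply, but not where in the patch that happens.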
In an effort to figure out why, researchers at the Massachusetts Institute of Technology (MIT) turned the contents of HOG databases back into images and presented these images to human analysts for interpretation. The analysts not only had a similar success rate, but also made mistakes similar to the ones HOG-based object-recognition systems make.
The researchers hope that their technique will help machine-vision developers figure out what their machine-vision systems are doing wrong, and find a way to do it right.
C.G. Masi has been blogging about technology and society since 2006. In a career spanning more than a quarter century, he has written more than 400 articles for scholarly and technical journals, and six novels dealing with automation's place in technically advanced society. For more information, visit www.cgmasi.com.