Microsoft is trying to prove its artificial intelligence can do more than guess your age or win at Pac-Man.
On Wednesday, the company released Seeing AI, a "talking camera" app for iOS meant to help the visually impaired navigate their surroundings.
Its list of features sounds almost too good to be true: it will scan documents and read you their contents, tell you which denomination of bills you have in your hands, scan barcodes to let you know exactly what you’re holding, learn the faces of people you know, read their expressions, and even attempt to describe whatever’s in front of you at any given moment.
In practice, though, it’s clear that Microsoft still has a lot of work to do before Seeing AI will actually be a useful tool for the visually impaired. (The company is well aware of this, too: the first thing that appears when you launch the app is a warning that "Seeing AI is not always accurate.") Here’s how it fared in our testing.
Text and barcodes
There are already plenty of apps out there that can competently analyze text and barcodes, so it’s not surprising that’s what the app did best. It was able to read just about every snippet of text we threw at it and identified most barcodes easily.
While these capabilities are fairly common in other apps, Seeing AI’s implementation is notable in that it helps you line up the in-app camera with whatever you’re trying to scan. With barcodes, it will beep as you get closer to the barcode; with text, it will guide you to line up the camera with the edges of the document.
Its readouts of printed text were a bit awkward but accurate, and it scanned nearly every barcode with ease (a jar of powdered peanut butter tripped it up for some reason).
People and "scenes"
This is where things get more interesting. Text and barcodes aside, Seeing AI uses a ton of Microsoft’s AI algorithms to identify objects and people in your surroundings. These were the most impressive parts of Microsoft’s early demos of the app, but things didn’t go as smoothly in my testing.
I started off easy: my desk and a couch. The app easily recognized both, which maybe set my expectations too high because it only got worse from there.
Image: Seeing AI
Next, I tried a stack of books, which the app misidentified as "a stack of flyers [sic]," and a mini fridge with some games on top. Seeing AI came back with "it seems to be floor, indoor, desk," which, though not technically wrong since all those items were present, wouldn’t have been helpful had I actually been trying to figure out what was in front of me.
Next up, I tried the "people" feature. It purports not only to tell you how many people are around you but also to estimate their ages and emotional states (based on their facial expressions).
It seems Microsoft has improved its age-guessing tech quite a bit since it first came out, but it still has some work to do. While the ages came up reasonably close (my colleague whose age was overestimated by five years may disagree), the expression detection seemed less reliable.
The feature is also unable to detect people who are not looking straight into the camera, which seems like a pretty serious limitation.
There were other fails as well: a cardboard cutout of Chewbacca was identified as "person standing in front of a mirror posing for the camera," which is both hilariously wrong and weirdly specific, while one of Kylo Ren came back as "it seems to be wall, indoor [sic]."
Of course, none of this is entirely unexpected. Apps like this require a huge amount of training data and algorithm tweaking before they can be anything close to reliable.
And while it’s easy to laugh at more #AIfails, the reality is that this type of technology really could be life-changing for someone who’s visually impaired if it does give them the ability to navigate their surroundings more confidently.
But after spending some time trying out Seeing AI, it’s clear Microsoft still has some work to do.