
Whatever Happened to Voice Recognition?

21 Jun

Remember that scene in Star Trek IV where Scotty tried to use a Mac Plus?

[Image: Star Trek IV, Scotty and the Mac Plus]

Using a mouse or keyboard to control a computer? Don't be silly. In the future, clearly there's only one way computers will be controlled: by speaking to them.

There's only one teeny-tiny problem with this magical future world of computers we control with our voices.

[Chart: voice recognition accuracy rate over time]

It doesn't work.

Despite ridiculous, order-of-magnitude increases in computing power over the last decade, we can't figure out how to get speech recognition accuracy above 80% -- when the baseline human voice transcription accuracy rate is anywhere from 96% to 98%!

In 2001 recognition accuracy topped out at 80%, far short of HAL-like levels of comprehension. Adding data or computing power made no difference. Researchers at Carnegie Mellon University checked again in 2006 and found the situation unchanged. With human discrimination as high as 98%, the unclosed gap left little basis for conversation. But sticking to a few topics, like numbers, helped. Saying “one” into the phone works about as well as pressing a button, approaching 100% accuracy. But loosen the vocabulary constraint and recognition begins to drift, turning to vertigo in the wide-open vastness of linguistic space.

As Robert Fortner explained in Rest in Peas: The Unrecognized Death of Speech Recognition, after all these years, we're desperately far away from any sort of universal speech recognition that's useful or practical.

Now, we do have to clarify that we're talking about universal recognition: saying anything to a computer, and having it reliably convert that into a valid, accurate text representation. When you constrain the voice input to a more limited vocabulary -- say, just numbers, or only the names that happen to be in your telephone's address book -- it's not unreasonable to expect a high level of accuracy. I tend to think of this as "voice control" rather than "voice recognition".

Still, I think we're avoiding the real question: is voice control, even hypothetically perfect voice control, more effective than the lower tech alternatives? In my experience, speech is one of the least effective, least efficient forms of communicating with other human beings. By that, I mean ...

  • typical spoken communication tends to be off-the-cuff and ad-hoc. Unless you're extremely disciplined, on average you will be unclear, rambling, and excessively verbose.
  • people tend to hear about half of what you say at any given time. If you're lucky.
  • spoken communication puts a highly disproportionate burden on the listener. Compare the time it takes to process a voicemail versus the time it takes to read an email.

I am by no means against talking with my fellow human beings. I have a very deep respect for those rare few who are great communicators in the challenging medium of conversational speech. Though we've all been trained literally from birth how to use our voices to communicate, voice communication remains filled with pitfalls and misunderstandings. Even in the best of conditions.

So why in the world -- outside of a disability -- would I want to extend the creaky, rickety old bridge of voice communication to controlling my computer? Isn't there a better way?

Robert's post contains some examples in the comments from voice control enthusiasts:

in addition to extremely accurate voice dictation, there are those really cool commands, like being able to say something like "search Google for Balloon Boy" or something like that and having it automatically open up your browser and enter the search term -- something like this is accomplished many times faster than a human could do it. Or, being able to total up a column of numbers in Microsoft Excel by saying simply "total this column" and seeing the results in a blink of an eye, literally.

That's funny, because I just fired up the Google app on my iPhone, said "balloon boy" into it, and got .. a search for "blue boy". I am not making this up. As for the Excel example, total which column? Let's assume you've dealt with the tricky problem of selecting what column you're talking about with only your voice. (I'm sorry, was it D5? B5?) Wouldn't it be many times faster to click the toolbar icon with your mouse, or press the keyboard command equivalent, to sum the column -- rather than methodically and tediously saying the words "sum this column" out loud?

I'm also trying to imagine a room full of people controlling their computers or phones using their voices. It's difficult enough to get work done in today's chatty work environments without the added burden of a floor full of people saying "zoom ... enhance" to their computers all day long. Wouldn't we all end up hoarse and deaf?

Let's look at another practical example -- YouTube's automatic speech recognition feature. I clicked through to the first UC Berkeley video with this feature, clicked the CC (closed caption) icon, and immediately got .. this.

[Screenshot: UC Berkeley physics lecture with YouTube's automatic captions]

"Light exerts force on matter". But according to Google's automatic speech recognition, it's "like the search for some matter". Unsurprisingly, it does not get better from there. You'd be way more confused than educated if you had to learn this lecture from the automatic transcription.

Back when Joel Spolsky and I had a podcast together, a helpful listener suggested using speech recognition to get a basic podcast transcript going. Everything I knew about voice recognition told me this wouldn't help, but harm. What's worse: transcribing everything by hand, from scratch -- or correcting every third or fourth word in an auto-generated machine transcript? Maybe it's just me, but the friction of the huge error rate inherent in the machine transcript seems far more intimidating than a blank slate human transcription. The humans may not be particularly efficient, but they all add value along the way -- collective human judgment can editorially improve the transcript, by removing all the duplication, repetition, and "ums" of a literal, by-the-book transcription.

In 2004, Mike Bliss composed a poem about voice recognition. He then read it to voice recognition software on his PC, and rewrote it as recognized.

a poem by Mike Bliss

like a baby, it listens
it can't discriminate
it tries to understand
it reflects what it thinks you say
it gets it wrong... sometimes
sometimes it gets it right.
One day it will grow up,
like a baby, it has potential
will it go to work?
will it turn to crime?
you look at it indulgently.
you can't help loving it, can you?

a poem by like myth

like a baby, it nuisance
it can't discriminate
it tries to oven
it reflects lot it things you say
it gets it run sometimes
sometimes it gets it right
won't day it will grow bop
Ninth a baby, it has provincial
will it both to look?
will it the two crime?
you move at it inevitably
you can't help loving it, cannot you?

The real punchline here is that Mike re-ran the experiment in 2008, and after 5 minutes of voice training, the voice recognition got all but 2 words of the original poem correct!
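Neither the post nor the passage it quotes says exactly how those accuracy percentages were measured. A common yardstick for transcripts is word error rate (WER): the minimum number of word substitutions, insertions and deletions needed to turn the machine's output into the reference text, divided by the number of reference words. Here is a minimal Python sketch; the punctuation handling is our own simplification, and reading "accuracy" as roughly 1 - WER is an assumption, not something the sources state.

```python
import string


def _words(text: str) -> list[str]:
    """Lower-case, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + insertions + deletions)
    divided by the number of words in the reference."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("like a baby, it listens", "like a baby, it nuisance"))  # 0.2
    print(word_error_rate("balloon boy", "blue boy"))  # 0.5
    print(word_error_rate("light exerts force on matter",
                          "like the search for some matter"))  # 1.0
```

By this measure the lecture caption scores a WER of 1.0: only "matter" survives, so five edits are needed for five reference words. Because insertions count too, WER can even exceed 1.0 on a bad enough transcript.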

I suspect that's still not good enough in the face of the existing simpler alternatives. Remember handwriting recognition? It was all the rage in the era of the Apple Newton.

[Comic: Doonesbury on the Apple Newton]

It wasn't as bad as Doonesbury made it out to be. I learned Palm's Graffiti handwriting recognition language and got fairly proficient with it. More than ten years later, you'd expect to see massively improved handwriting recognition of some sort in today's iPads and iPhones and iOthers, right? Well, maybe, if by "massively improved" you mean "nonexistent".

While it still surely has its niche uses, I personally don't miss handwriting recognition. Not even a little. And I can't help wondering if voice recognition will go the same way.


 
 

Bacon Pancakes

21 Jun

Now this is the way to start a Monday! Has anyone here ever tried bacon pancakes made like this?

 
 


100+ amazing pieces of Star Wars concept art [Concept Art]

20 Jun
You can't celebrate the awesomeness of concept art without paying special tribute to Star Wars. Ralph McQuarrie's paintings for the original trilogy inspired every concept artist today, and the prequels and video games' art was eye-popping. Here are our favorites.
 
 

Real-Life Ninja Turtle [Daily Dose of Cute]

17 Jun

Introducing Raphael's little cousin...

Via Geekologie.

 
 

My Favorite Web Applications for Designers

17 Jun

As we all know, there are many web tools available to designers online, so instead of listing ALL of them, I've decided to share my favorites that I use on an everyday basis. FYI, this doesn't include Mac apps or programs, just online web tools and a couple of Firefox plug-ins. Enjoy!

 
 

Amazon Patents Social Networking System, Winks at Facebook

17 Jun

The United States Patent and Trademark Office awarded Amazon a patent for a “Social Networking System.” Amazingly enough, the description of the patent sounds, well, pretty much like any social network we’ve seen over the years, including Facebook.

The description of the patent is as follows:

“A networked computer system provides various services for assisting users in locating, and establishing contact relationships with, other users. For example, in one embodiment, users can identify other users based on their affiliations with particular schools or other organizations. The system also provides a mechanism for a user to selectively establish contact relationships or connections with other users, and to grant permissions for such other users to view personal information of the user. The system may also include features for enabling users to identify contacts of their respective contacts. In addition, the system may automatically notify users of personal information updates made by their respective contacts.”

Replacing the word “system” in the paragraph above with “Facebook” reveals, once again, how flawed the U.S. patent system is. The patent’s inventors are Brian Robertson and Warren Adams, the same two guys who founded social networking service PlanetAll, which Amazon acquired in 1998. Alas, not seeing potential in the service, Amazon shut it down in 2000, but decided to revive it by filing a patent application in May 2008.

Now, let’s look at some of the other social networking-related patents. Four years ago, Friendster patented — you guessed it — “social networking.” It was described as a “system, method, and apparatus for connecting users in an online computer system based on their relationships within social networks.”

Friendster was awarded several more patents for certain aspects of social networking over the years, Facebook patented the newsfeed three months ago, and we’ve seen many other patents that supposedly cover the fundamentals of social networking.

So, after all this, who owns the patent for the social network? You tell us, because we have no idea.

[via TechFlash]

image courtesy of iStockphoto, drflet




Tags: amazon, facebook, patent, social networking

 


Mona Lisa replicated in software “using only 50 semi transparent polygons”

09 Jun
[Image: Mona Lisa]

Roger Alsing created a small program that keeps a string of DNA for polygon rendering. He explains the procedure:

0) Set up a random DNA string (application start)
1) Copy the current DNA sequence and mutate it slightly
2) Use the new DNA to render polygons onto a canvas
3) Compare the canvas to the source image
4) If the new painting looks more like the source image than the previous painting did, then overwrite the current DNA with the new DNA
5) Repeat from 1

Now to the interesting part. Could you paint a replica of the Mona Lisa using only 50 semi transparent polygons? That is the challenge I decided to put my application up to.

You can see the whole unfolding here. It's pretty cool!
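Alsing's steps describe a simple hill climber: one mutated candidate per round, kept only if it scores better. Below is a minimal pure-Python sketch of that loop. Everything concrete in it is our assumption rather than Alsing's code: the "polygons" are triangles, colour is reduced to a single grey level, the canvas is a tiny 12 x 12 grid, the score is the sum of squared pixel differences, and the mutation rule is made up for illustration.

```python
import random

W, H = 12, 12  # assumption: a tiny greyscale canvas so pure Python stays fast


def random_polygon(rng: random.Random) -> dict:
    """One 'gene': a triangle plus a grey level and a partial opacity."""
    return {
        "pts": [(rng.uniform(0, W), rng.uniform(0, H)) for _ in range(3)],
        "grey": rng.random(),
        "alpha": rng.uniform(0.1, 0.5),  # semi-transparent
    }


def inside(px: float, py: float, pts: list) -> bool:
    """Even-odd (ray casting) point-in-polygon test."""
    hit = False
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                hit = not hit
    return hit


def render(dna: list) -> list:
    """Step 2: alpha-blend each polygon, in order, onto a black canvas."""
    canvas = [[0.0] * W for _ in range(H)]
    for poly in dna:
        a = poly["alpha"]
        for y in range(H):
            for x in range(W):
                if inside(x + 0.5, y + 0.5, poly["pts"]):  # sample pixel centres
                    canvas[y][x] = (1 - a) * canvas[y][x] + a * poly["grey"]
    return canvas


def distance(canvas: list, target: list) -> float:
    """Step 3: sum of squared per-pixel differences (lower = more alike)."""
    return sum((c - t) ** 2
               for c_row, t_row in zip(canvas, target)
               for c, t in zip(c_row, t_row))


def mutate(dna: list, rng: random.Random) -> list:
    """Step 1: copy the DNA, then change one attribute of one polygon."""
    child = [dict(p, pts=list(p["pts"])) for p in dna]  # parent stays untouched
    poly = rng.choice(child)
    what = rng.randrange(3)
    if what == 0:
        poly["pts"][rng.randrange(3)] = (rng.uniform(0, W), rng.uniform(0, H))
    elif what == 1:
        poly["grey"] = rng.random()
    else:
        poly["alpha"] = rng.uniform(0.1, 0.5)
    return child


def evolve(target: list, n_polygons: int = 50, steps: int = 200, seed: int = 0):
    """Run the loop; return the final DNA and the best score after each round."""
    rng = random.Random(seed)
    dna = [random_polygon(rng) for _ in range(n_polygons)]   # step 0
    best = distance(render(dna), target)
    history = [best]
    for _ in range(steps):                                   # step 5: repeat
        candidate = mutate(dna, rng)                         # step 1
        score = distance(render(candidate), target)          # steps 2 and 3
        if score < best:                                     # step 4: keep if closer
            dna, best = candidate, score
        history.append(best)
    return dna, history


if __name__ == "__main__":
    # A left-to-right grey ramp stands in for the source image.
    target = [[x / (W - 1) for x in range(W)] for _ in range(H)]
    dna, history = evolve(target)
    print(f"start {history[0]:.2f} -> end {history[-1]:.2f}")
```

Because a candidate is only accepted when it lowers the score, the best score never gets worse from one round to the next; it can only improve or stall. Real colour images and polygons with more vertices change the data, not the shape of the loop.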

 
 

Gallery: Digitizing the past and present at the Library of Congress

09 Jun

The Library of Congress has nearly 150 million items in its collection, including at least 21 million books, 5 million maps, 12.5 million photos and 100,000 posters. The largest library in the world, it pioneers both preservation of the oldest artifacts and digitization of the most recent--so that all of it remains available to future generations.

I recently took a tour of two LoC departments that exemplify this mission: the Preservation Research and Testing Division in Washington, D.C., and the National Audio-Visual Conservation Center in Culpeper, Va.


The library's preservation specialists use the latest technology to study and scan ancient books, maps and other historical artifacts.

One process, called scanning electron microscopy, allows them to create elemental maps of manuscripts, identifying the chemical nature of inks and pigments, or the paper itself. Imperceptible changes made by artists appear plain as day when viewed using x-rays.

X-rays, however, aren't easy to work around. One new technique, hyperspectral imaging, offers similarly revelatory results in the darkroom: ultra-high resolution scans of documents, imaged under sharply restricted wavelengths of light, show details denied to the naked eye. Viewed at sharp angles, old documents even reveal data about the woodblocks used to impress them onto the page.

It's not all about moldy maps and tomes, either: thanks to the poor quality of consumer media, techniques are already being developed to recover information from damaged examples. Researchers already understand, for example, why using sticky labels increases the likelihood of failure in CDs and DVDs. (LightScribe etching has no apparent negative effects.) So when the work of today's unheralded geniuses ends up as priceless, rotting museum pieces, the preservers will be ready.


An ancient book presents the typical problem for archivists: how to better understand something that may be destroyed simply by the act of examining it? Researchers have adopted policies which forbid sacrificing part of an item in the hope of learning more about it.

"We can't afford any damage to anything," said Eric Hansen, chief of the Preservation Research and Testing Division. "Never take a sample; be completely nondestructive. ... We know there will be advances in technology and that current techniques will become outmoded."


The LoC's Jennifer Wade scans a centuries-old but well-preserved copy of Platina's The Lives of the Popes. "We can map the elements, the chemical components," Wade said. "We can simulate changes in heat, cold, and humidity. [But] all we do is provide information about treatment. Others make the restoration decisions."

Fenella France, a research chemist with the Preservation Research and Testing Division, uses a 39-megapixel camera to take high-resolution images of documents ranging from Renaissance-era maps to American state papers.

"We don't filter at the camera, we illuminate with small wavelengths," France said. "We're creating a reference set of samples. We can't take samples of the documents themselves--it's just not going to happen."

This technique creates a set of images like a 'stack of cards,' all identically framed but revealing a different spectral face of the subject.

On the plan for the city of Washington designed in 1791 by Pierre L'Enfant, a hidden street plan emerges under IR light. A design for a circle emerges on 16th and K.
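The "stack of cards" description maps naturally onto a three-dimensional array indexed by band, row and column. The sketch below is purely illustrative and is not the Library's tooling: `pixel_spectrum` and `band_difference` are hypothetical helpers, values are assumed to be greyscale intensities, and subtracting one band from another is just one simple way to make a feature that appears in only one band (such as the street plan under IR light) stand out.

```python
# cube[band][y][x]: one identically framed greyscale image ("card") per waveband
Cube = list[list[list[float]]]


def pixel_spectrum(cube: Cube, x: int, y: int) -> list[float]:
    """Read one pixel straight down through the stack: its value in every band."""
    return [band[y][x] for band in cube]


def band_difference(cube: Cube, a: int, b: int) -> list[list[float]]:
    """Band a minus band b, pixel by pixel. Marks that appear in only one of
    the two bands (say, infrared but not visible light) show up as non-zero."""
    return [[pa - pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(cube[a], cube[b])]


if __name__ == "__main__":
    visible = [[0.25, 0.25], [0.25, 0.25]]
    infrared = [[0.25, 0.75], [0.25, 0.25]]  # a mark only the IR band shows
    cube = [visible, infrared]
    print(pixel_spectrum(cube, 1, 0))   # [0.25, 0.75]
    print(band_difference(cube, 1, 0))  # [[0.0, 0.5], [0.0, 0.0]]
```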



"It's incredible, it's humbling. It might be 6 p.m. and I'll be exhausted but I think, 'I can't complain--I'm working with the Gettysburg Address!'"

The Gettysburg Address exists on her computer as 8 different documents, each representing a different waveband in the visible spectrum. But only some show the mysterious fingerprint residue that may be Lincoln's own.

"In the next 5-10 years, I wouldn't be surprised if they could pull residual genetic information from the documents. [This is why] one of our foci is making sure that we don't interfere with future research."


One machine used to examine the book is an x-ray fluorescence spectrometer. "The clasp's corroding, degrading, so we're trying to figure out exactly what the corrosion material is," said Wade. "What is it caused by? What could stop it? Interpretation is important."



 


Among the finds: tracings of an earlier document on a Marco Polo map that dates to 1480. Lost text, revealing the cartographer, on 1516's Carta Marina. James Madison's debate papers, it turns out, contain hidden revisions.

"If it's fragile, even researchers have trouble with it," France said. "I want to make it accessible."




Hansen stands by a collection of badly-damaged audio recordings that may yet be recoverable using new technology: "You can learn about a culture from how it builds and stores things."



A visitor stands before the Waldseemüller world map.




Fenella France stands beside the unique, 400-liter environmental chamber used to publicly house the map. Hurricane-proof glass and a high-tech aluminum enclosure ensure that it is kept at the perfect temperature and humidity; tests had to be performed to ensure the weight would not pose a structural problem for the Library.

"We pretty much know that the Vinland Map contains titanium dioxide in a form that didn't exist until modern times."
- Eric Hansen




Printed by Martin Waldseemüller in 1507, the Universalis Cosmographia was the first world map to use the name "America" to identify the New World. The only copy of it is at the Library of Congress.


Far from the bustle and majesty of Capitol Hill, a former nuclear bunker has become home to an unprecedented effort to catalog the nation's creative works. And while the media is more recent than that dealt with in D.C.'s basement labs, plenty of technical challenges remain.

The National Audio Visual Conservation Center, near Culpeper, Virginia, once contained billions in cash, squirrelled away to kickstart the economy in the event of an atomic apocalypse. Beautifully renovated, it now has 175,000 square feet of offices and laboratories, 135,000 square feet of collections storage, and 55,000 square feet dedicated to storing dangerous nitrate film in optimal conditions. There are more than a million films, television shows, DVDs and games already in its collection.

And it grows, day in, day out. Delivered to loading docks, thousands of items make their way through processing areas until finding a permanent home in the vaults.

Gregory Lukow, chief of the motion picture, broadcasting and recorded sound division at the campus, said that it was staffed by about 100 techs, engineers and other workers. Many items are digitized to ensure their preservation, and to allow researchers to view them remotely in D.C. reading rooms. They also host public screenings of classic movies at the in-house cinema.


As the copyright office did not register celluloid prints until 1912, early movie makers created prints of the entire reel on opaque photographic paper. "It's an iconic image in American cinema, that cowboy shooting his gun at the camera, at the audience, at the end of The Great Train Robbery," said Gregory Lukow. "The quality of prints recovered from the paper is shockingly good."


Most of the collections arrive via the copyright registration process. Though works receive copyright protection at the moment of creation, registration provides more legal options in court disputes, ensuring what Lukow called "a tidal wave of material" for the campus to process. But a lot of the material is old -- and not all of it is in good nick.




"The late 1970s is one of the worst times for video longevity," Lukow said. "Magnetic tape is our largest preservation problem."



Gregory Lukow of the Library of Congress shows off the intake bins at their audiovisual campus in Culpeper, Va., packed with the cultural output of a nation. Millions of items are added every year to LoC collections. Highly sensitive items, such as digital prints of movies playing in theaters, often arrive under assumed titles to reduce the likelihood of interception.




The distinctive round-rect casing of RCA Selectavision disks was briefly commonplace in the U.S. Now, the analog video format is a rarity.




There is an entire room at the campus dedicated to rewinding things. Almost every room, however, has cutting facilities of one kind or another.




"We don't want videotape coming in in 5 or 10 years' time. Magnetic media is a losing proposition."
- Gregory Lukow


Into the Nitrate Film Storage Vaults: maintained at 39°F and 30 percent relative humidity, nitrate film is divided into 124 individually fireproofed chambers, each able to hold about 1,000 cans. Each is designed so that even if a particular reel goes up in flames, it can only damage those in the same insulated cubbyhole. Total capacity: 145,056 cans. Films removed from the vaults must first go through an acclimation chamber before being exposed to normal temperatures and humidity.


The Tony Schwartz collection has an astounding number of field recordings of commercials and other publicly-broadcast media. Passed to the Library after Schwartz's death in 2008, the archive currently fills several large walls. "It's immense," said the Library's Matt Barton. "Thousands of reels of tape, film, video. And I don't know how much correspondence." Schwartz is famous to many as the creator of the "Daisy" campaign ad.






Gregory Lukow describes RCA Selectavision, a video format so homely it is denied even the ironic contemporary cachet enjoyed by LaserDisc and 8-track.




Matt Barton of the Library of Congress's National Audio Visual Conservation Center.




Not everything that the Library of Congress uses to examine its collections is high-tech.

Gregory Lukow explains the workings of one of the Library's basic tools: a flatbed film viewer designed to let staff play fragile films without the use of projectors and potentially damaging bulbs.




IRENE--image, reconstruct, erase noise, etcetera--is a system that creates a high-resolution digital map of a record's surface without touching it. Recordings on warped and damaged vinyl can be recovered and restored, then played back by a computer program that emulates the movements of a stylus passing over the modeled grooves. Some records, however, are too badly damaged even for IRENE.




Banks of reel-to-reel tape machines stand in one of the conservation center's digitization rooms. Nearby, a robot-operated VCR works through dozens of tapes automatically.




Scott Rife, senior system administrator, explains the library's digital storage system in this video clip: a tape library with 37,500 slots, each able to store 1TB of data. "That's 37 petabytes. As far as we know, this is the largest digital preservation operation in the world." Even so, they remain committed to preserving film as film: "We wouldn't preserve 35mm as digital right now."
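Rife's capacity figure is easy to check. Assuming each 1TB slot is a decimal terabyte, so that 1 PB = 1,000 TB (the convention tape capacities are usually quoted in), the arithmetic works out as follows:

```python
SLOTS = 37_500        # tape slots in the library, per the quote
TB_PER_SLOT = 1       # capacity of one slot, per the quote
TB_PER_PB = 1_000     # assumption: decimal (SI) units

total_tb = SLOTS * TB_PER_SLOT
total_pb = total_tb / TB_PER_PB
print(f"{total_tb:,} TB = {total_pb} PB")  # 37,500 TB = 37.5 PB
```

So the "37 petabytes" in the quote is the 37.5 PB total, rounded down.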

James Snyder, senior systems administrator, explains the challenges involved in capturing hundreds of channels of archivable broadcast material. When completed, the Packard Campus's "Live Capture" room will grab 120 video streams from satellite and FM television, 90 DirecTV channels and 20 DISH Network channels. 72 Mac Minis will capture the output of 42 internet radio stations, 10 FM radio stations, and much of what's played on the XM/Sirius satellite radio service. Each machine is able to capture two sources at once: if an individual capture station fails, another picks up the load. Playlists, as cultural snapshots, are themselves important artifacts.


A small museum is set aside at the campus for the most beautiful film and broadcasting equipment in its stores. But it's not just for show: old media often needs old equipment to play it. The LoC has little interest in DRM, due to the inherent likelihood that decryption methods will fail or fade away as time passes. "We don't want to have to hack anything," Lukow said.

Welcome to the Critical Listening Room. James Smetanick describes the work of an audio engineer tasked with preserving sound recordings. The environment is perfect: non-parallel walls and deeply-pocked paneling kill standing waves and reflections. A custom-made Simon Yorke turntable is good enough for government work: maple knobs not required. "I can't complain about coming in each day," Smetanick said.




Michael Hinton, a staffer at the Library of Congress' NAVCC, works in a spartan room housing an enormous film-processing machine.

The Packard campus contains a huge variety of old and obsolete machines used to view, cut or otherwise manipulate media. It's not just for show, either: obscure formats will become unreadable if the vintage tech used to play them isn't maintained.