New Norm | The Rise of Audio


Audio’s day is coming.

Like battles in geopolitics, operating systems, and hemline length, the mediums of text, images, and audio have each been in and out of favor.

Text took an early – and admittedly long – lead with the Gutenberg printing press in 1436. The press allowed not only for the storage of text, but also for easy sharing in printed form. Its impact on religion, politics, thought, education, and the world has been nothing less than transformative. Not a single area of society escaped the impact of simply storing, reproducing, and distributing text.

In addition to storing the original, print allowed for standardized translations and for search via a table of contents and index, and it elevated text into an art form with different fonts, colors, and layouts.

It was close to 400 years before images gained the same storage and sharing ability with camera photography and, as with printing, several more decades before it became widespread. Finally, in 1877, audio joined the storage and sharing race with Thomas Edison’s phonograph cylinder.

With the birth of computers in the 1940s, a new race began. Given the complexity and size of image and audio files, it is not surprising that text (again) took an early lead in this realm. While mice and trackballs were added as part of the GUI, text became the default input and output medium.

Digital audio gained a short lead over digital photography, and the two grew steadily through the 1980s and 1990s. Growth has only accelerated since: it is now estimated that 10% of all pictures ever taken were taken in the last 12 months. With 7 billion people on the planet speaking all the time, imagine the scope and scale of audio compared to images.

Yet behind most of the searching, sorting, and organizing of audio and image files is text, in the form of metatags, indexing terms, and the like. This limits the ability to use audio as an input device or to truly search within the file itself.

This is all changing. Just ask Siri or Google Now.

Audio is now being used as a control device in place of keyboards. It can be searched down to the spoken word within recordings and videos, and it can sync content across multiple screens. Imagine a world with no keyboards, searchable audio, and instantaneous translation.

TVPlus [+] is an interactive television application you use while watching your favorite programs on TV. It syncs your second-screen device to your television and delivers interesting, relevant, contextual content and social activity about each scene of the show, including actor bios, music, photo galleries, behind-the-scenes facts, and much more.

MAVIS is Microsoft’s Audio Video Indexing Service, which uses state-of-the-art speech recognition technology developed at Microsoft Research to enable searching of digitized spoken content, whether from meetings, conference calls, voice mails, presentations, online lectures, or even Internet video.

Shazam, SoundHound, and Tuneup listen for music or audio from commercials and bring you to a web page, URL, or special content. SayHi and T-Translator translate spoken words in real time on handheld devices.

Even one of the backbones of image tagging – the barcode – is being reworked into acoustic barcodes, which turn the spacing of the bars into unique audio patterns that can be recognized. Chirp is using unique sounds for sharing between devices. And Gocen is converting written music to audio in real time.

We are well past the fringe of the rise of audio, yet we still have a long way to go.

The movies, always a good place to look for signs of new norms, show everything from audio-activated spaceships in Prometheus to voice-interactive videos in the new Total Recall. And in the real world, SayHi exceeded 10 million translations back in July, Shazam 5 billion songs in August, and specific conversation assistants like Winston are delivering social updates and personalized news in a narrated broadcast format.

Have you started thinking about the voice and personality of your experience or corporate audio? About swapping bar and QR codes for unique audio tags? Are you adding a voice interface to your event mobile app?


Note: As always, the desire of Janus Dialogs is not to adjudicate the appropriateness of any trend, but to bring it to the forefront for consideration by the caretakers for the shared moments in time we call experience marketing.




  1. When I worked for Dick Clark, Dick was fond of saying that no one understood the power of radio. It’s still an extraordinarily efficient media buy.

    Digital audiobooks have long taken advantage of the modest bandwidth and storage size of audio files. Now podcasts are beginning to take advantage of the same things that make AM/FM so appealing: low-tech players, multi-taskable listening (drive and listen), and inexpensive production. Add indexing as you suggest and audio will find new applications in corporate communications, training, and HR.

  2. Anonymous says:

    7 basic things a computer has to do to understand speech. #Newnorm #audio @JanusDialogs

  3. Anonymous says:

    All well and good…but would really like a return to high quality audio.