Joshua Howe: Styling the Sound of Text using CSS

It’s not often that an idea comes to you that feels revolutionary, and with such clarity that you have to sit and write it down in detail. The other day I had just such an idea, but after spending an hour and a half writing it out, I decided to Google it. With mixed emotions, I discovered that much of what I’d thought of has already been worked through in detail.

The idea is that a website developer could style the sound of text much like they style the font, color and size of text now. When a user visits a site with technology that can interpret this code, it would translate the content of the site into the voice, sex, accent, pitch, volume and speed that they desire. I’m excited that there are others out there thinking about and working on these issues, and that ideas such as this could become a reality.

Speech to Text

Speech to text (STT) has gained greater popularity with the general public in recent years. Users can speak a phrase or paragraph and the computer will convert it to written text that can then be edited. Products such as Nuance’s Dragon NaturallySpeaking and Dragon Dictate, along with a host of STT-enabled applications, have taken STT from software used exclusively by individuals with disabilities and made it a productivity tool. Nuance has released versions of its software explicitly for the medical and legal professions, as well as professional and home editions.

Text to Speech

Text to speech (TTS), on the other hand, is the process of having software read text aloud to the user. The text could come from a website, document or other electronic format. TTS is most frequently used by individuals who are blind or visually impaired, or who have learning disorders such as dyslexia or other print disabilities. It allows a website or document to be read to the user so long as the information is accessible, that is, in a format the software can understand. Users can control the speed at which information is read, its tone and pitch and, in some software, even the gender and accent of the speech.

TTS is also gaining popularity among people without disabilities. Various applications will read out your text messages, Kindle books or other information from your computer, smartphone or tablet.

“There are also circumstances in which listening to content (as opposed to reading) is preferred, or sometimes even required, irrespective of a person's physical ability to access information. For instance: playing an e-book whilst driving a vehicle, learning how to manipulate industrial and medical devices, interacting with home entertainment systems, teaching young children how to read.” (CSS Speech Module specification)

Styling the Look of Text

Website developers are currently able to take a piece or block of text and specify various attributes of that text. These include (but are not limited to) the font, the size of the font, colors, and emphasis such as bold and italics. Developers can also assign an importance to a piece of text by specifying what information is a headline (header), sub header or body text.

When building the page, a developer will often use something called CSS, or Cascading Style Sheets, to specify the characteristics of each level. So a Header 1 will always be 18 point, bold, in a particular font. The Header 2 will be 16 point, red italics in a particular font. This way, throughout their website, they can mark text as Header 1 and it will pull in the styling from the style sheet (and the developer doesn’t have to remember to style every one individually).
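
As a rough sketch, the two header styles described above might look like this in a style sheet. It assumes that Header 1 and Header 2 correspond to the HTML `h1` and `h2` elements, and the font family is a made-up stand-in for the "particular font":

```css
/* Illustrative only: Georgia is an assumed example font,
   not one named in this post. */
h1 {
  font-family: Georgia, serif;
  font-size: 18pt;
  font-weight: bold;
}

h2 {
  font-family: Georgia, serif;
  font-size: 16pt;
  font-style: italic;
  color: red;
}
```

Every `h1` and `h2` on any page that loads this style sheet would pick up these rules without being styled individually.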

This approach has two benefits. First, if the developer changes their mind and wants to make all Header 1’s blue, the change can be made once in the style sheet and every page of the site will update, rather than having to change each occurrence of Header 1 individually. Second, marking text as a header tells search engines what information is most important on the page. The headline (Header 1) will be considered more important than the body or paragraph text, and a search engine will interpret it as describing what the body is about.

When a user goes to a website, the site is loaded into their browser, which renders the text based on the styling information it receives from the site. Users can override this by changing settings in their browser so that text appears according to their personal preferences. This includes eliminating background images and changing font styles and colors.

Styling the Sound of Content

The CSS Speech module specifies how developers will be able to control not only how text looks, but how it will sound when a user is using text to speech software. Text would be marked to be synthesized in a particular way, which could include the gender, age, speed, pitch and duration of the voice. The current specification, though not yet an approved standard, also includes the ability to

identify how flat or variable the voice is, that is, how much range it has
emphasize particular words or sentences (voice-stress)
set the volume of a selection
set the left / right balance of a selection
identify an audio clip or file to be played at a particular point in the page
insert pauses or gaps between words or sentences
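
To make the list above concrete, here is a hedged sketch using property names from the CSS Speech Module draft. The selectors and the `chime.wav` file are hypothetical, and a given browser or screen reader may not honor these properties at all, so treat it as an illustration of the syntax rather than something to rely on:

```css
/* Sketch only: property names come from the CSS Speech Module draft;
   chime.wav is a made-up file name. */
h1 {
  voice-family: female;         /* gender of the synthesized voice */
  voice-rate: slow;             /* speaking speed */
  voice-pitch: low;             /* baseline pitch */
  voice-range: high;            /* how variable (not flat) the voice is */
  voice-volume: loud;           /* volume of the selection */
  voice-balance: left;          /* left / right balance */
  cue-before: url(chime.wav);   /* audio clip played before the heading */
  pause-after: strong;          /* gap after the heading is spoken */
}

em {
  voice-stress: strong;         /* emphasize particular words */
}
```

A speech-capable user agent that supported these rules would read headings slowly in a low, expressive female voice from the left channel, play the clip first, and pause afterwards.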

Combining the auditory with the visual text of a site offers an opportunity to add new depth, richness and texture to sites that until now have been designed exclusively in the visual realm.

See also

Part II: The Sound of Text

CSS Speech Module
Appendix A Aural Style Sheets (part of Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification June 2011)


Thursday, October 4, 2012
