Monday, October 15, 2012

The Sound of Text

In Part I, Styling the Sound of Text, I covered how website developers can currently style the look of text, whether it’s the font, size, color or other features. The W3C also has draft specifications, not yet approved, that would let developers style the sound of text on a page.

Here in Part II, I'll cover some of the main benefits and assets of using aural style sheets with a focus on usability, accessibility and engagement.


One of the reasons I get excited about this idea is that, in order to take advantage of it, developers will have to write their code so that the browser can read and interpret the words rather than purely display them on the page. So rather than presenting a menu or site content as an image, they'd need to code it as real text that the browser can read. Text coded this way has a greater probability of being accessible to people with disabilities who rely on technology to read the page to them, such as individuals who are blind or visually impaired or who have a learning disability.

Text would also have to be arranged in a logical reading order, so that the browser reads information in sequence rather than relying on current conventions, which assume the reader is scanning the page visually.

However, this isn’t to say that the rest of the page will be coded in a way that allows individuals with disabilities to navigate the page or access other content. For example, users would still need to be able to navigate without a mouse, as individuals who are blind do, and controls for rich media such as video would still need to be made accessible.


As for use of the aural properties in a web page, there are a number of ways a developer might use the opportunity to add richness and texture to their content.   They could alternate two voices on the page to simulate a conversation, whether it’s male / female or two of the same sex.  The specification mentions the ability to specify different versions of the same voice, such as adult male.   There is also the possibility to have the voice be of a child.  
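As a rough sketch of the two-voice idea, the CSS Speech module's voice-family property takes an optional age (child, young, old), a gender (male, female, neutral) and an optional variant number. The class names below are hypothetical, and whether a given browser or screen reader honors these rules is an assumption rather than something to count on.

```css
/* Hypothetical class names: the markup is assumed to tag each speaker's lines. */
.speaker-a {
  voice-family: female;       /* generic female voice */
}

.speaker-b {
  voice-family: male;         /* generic male voice */
}

/* A second, different male voice, selected by variant number */
.speaker-c {
  voice-family: male 2;
}

/* A child's voice: age keyword first, then gender */
.young-reader {
  voice-family: child female;
}
```

Alternating paragraphs marked speaker-a and speaker-b would then be read as a back-and-forth conversation by software that supports the module.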

Additionally, they could have each voice come solely or primarily from either the right or left channel, which would enhance the effect for someone using headphones. This is what the specification refers to as “voice balance”. Anyone who has listened to Whole Lotta Love by Led Zeppelin can testify to the impact that sound switching between channels, from one ear to the other, can have.
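A minimal sketch of that channel separation, using the voice-balance property from the CSS Speech module. It accepts the keywords left, center and right, or a number from -100 (fully left) to 100 (fully right). The class names are hypothetical.

```css
/* Hypothetical class names for the two sides of a conversation */
.speaker-a {
  voice-family: female;
  voice-balance: left;    /* heard in the left ear */
}

.speaker-b {
  voice-family: male;
  voice-balance: right;   /* heard in the right ear */
}

/* A numeric value places the voice only partly to one side */
.aside {
  voice-balance: -40;     /* somewhat left of center */
}
```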

A developer could also use a voice that matches the target audience, such as having a woman speak to women or, conversely, a woman sell products to men (as is currently done when models are used to sell men’s hygiene products or jewelry).

An alternative would be to modify the speed or volume of a voice, or add a dramatic pause as the message becomes more poignant. The older CSS 2.1 aural appendix expressed volume as a number from 0 to 100, while the CSS Speech module uses keywords such as soft and loud; both allow control over the speed of speech and over where it pauses and for how long. A developer could subtly increase the speed and volume as the message increases in intensity, much like the background music does in a movie.
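The build-up described above might look something like this with the CSS Speech module's voice-rate, voice-volume and pause-before properties. The class names are hypothetical and the specific values are only illustrative.

```css
/* Hypothetical class names marking stages of a message */
p.setup {
  voice-rate: medium;
  voice-volume: medium;
}

p.build-up {
  voice-rate: fast;       /* pick up the pace */
  voice-volume: loud;     /* and the volume */
}

p.punchline {
  pause-before: 1s;       /* dramatic pause before the key sentence */
  voice-rate: slow;
  voice-volume: x-loud;
}
```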

Search Engine Optimization (SEO)

As the web evolves, search engines continue to find ways to gather meaning from the signals and content out there. In addition to down-ranking "black hat" SEO techniques, Google looks for additional signals from pages to determine what your content may be, so as to bring better, more relevant results. To this end, they've been using image file names as a signal of image content. So rather than keeping a default file name assigned by the camera, which is usually a combination of letters and numbers such as 3242r3.jpg, developers are encouraged to use meaningful file names that represent the content of the image: cat.jpg, ford_mustang.jpg, etc.

Once text can be styled by sound, Google and other search engines will be able to derive greater understanding of content based on the tone and feel of the text. Using the voices chosen, speed, pauses and other information, they'll be able to identify not only content, but perhaps emotion.

The web remains a young animal and is changing and adapting. We continue to add tools which allow us to add texture and relevancy to our content. It's how we visually and viscerally engage readers.


Thursday, October 4, 2012

Styling the Sound of Text using CSS

It’s not often that an idea comes to you that feels revolutionary, and with such clarity that you have to sit and write it down in detail. The other day, I had just such an idea, but after spending an hour and a half writing it out, I decided to Google it. With mixed emotions, I discovered that much of what I’d thought of has already been worked through in detail.

[Image: speaker icons formatted like HTML code, with the words "your text here" between them]

The idea is that a website developer could style the sound of text much like they style the font, color and size of text now. When a user visits a site with technology that can interpret this code, it would render the content of the site in the voice, sex, accent, pitch, volume and speed that the developer specified. I’m excited by the idea that there are others out there thinking and working on these issues and that ideas such as this could become a reality.

Speech to Text

Speech to text (STT) has gained greater popularity with the general public in recent years. Users can speak a phrase or paragraph and the computer will convert it to written text that can then be edited. Products such as Nuance’s Dragon NaturallySpeaking and Dragon Dictate, along with a host of STT-enabled applications, have taken STT from software used exclusively by individuals with disabilities and made it a productivity tool. Nuance has released versions of Dragon NaturallySpeaking explicitly for the medical and legal professions, as well as professional and home editions.

Text to Speech 

Text to Speech (TTS), on the other hand, is the process of having software read text to the user. The text could be a website, document or other electronic format. TTS is most frequently used by individuals who are blind or visually impaired, or who have learning disorders such as dyslexia or other print disabilities. It allows a website or document to be read to the user so long as the information is in a format the software can understand, that is, so long as it is accessible. Users can control the speed at which information is read, the tone and pitch of it and, in some software, even the gender and accent of the speech.

TTS is also gaining popularity with people other than those with disabilities. Various applications will read out your text messages, Kindle books or other information from your computer, smartphone or tablet.
“There are also circumstances in which listening to content (as opposed to reading) is preferred, or sometimes even required, irrespective of a person's physical ability to access information. For instance: playing an e-book whilst driving a vehicle, learning how to manipulate industrial and medical devices, interacting with home entertainment systems, teaching young children how to read.” (CSS Speech Module specification)

Styling the Look of Text

Website developers are currently able to take a piece or block of text and specify various attributes of that text. These include (but are not limited to) the font, the size of the font, colors, and emphasis such as bold and italics. Developers can also indicate the importance of a piece of text by specifying what is a headline (header), sub header or body text.
When building the page, a developer will often use something called CSS, or a cascading style sheet, to specify the characteristics of each level. So a Header 1 will always be 18 point, bold, in a particular font. The Header 2 will be 16 point, red italics in a particular font. This way, throughout their website, they can mark text as Header 1 and it will pull in the styling from the style sheet (and the developer doesn’t have to remember to style every one individually).
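The Header 1 and Header 2 rules described above could be written roughly as follows. The font choice is an assumption for illustration, since the text only says "a particular font".

```css
/* Header 1: 18 point, bold (font family is an illustrative assumption) */
h1 {
  font-family: Georgia, serif;
  font-size: 18pt;
  font-weight: bold;
}

/* Header 2: 16 point, red italics */
h2 {
  font-family: Georgia, serif;
  font-size: 16pt;
  font-style: italic;
  color: red;
}
```

Every h1 and h2 element on every page that links to this style sheet picks up these rules, so adding a single color: blue; line to the h1 rule would recolor all Header 1 text across the site.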

There are two benefits to this approach. First, if the developer changes their mind and wants to make all Header 1’s blue, it can be done in the style sheet and every page of the site will change, rather than the change having to be made individually to each occurrence of the Header 1. Second, marking up headers tells search engines what information is most important on the page. The headline (Header 1) will be considered more important than the body or paragraph text, and a search engine will interpret it as describing what the body is about.

When a user goes to a website, the site is loaded into their browser and it converts the look of the text based on the information it receives from the site.   Users can override this by changing settings in their browser to change how the text appears based on their personal preferences.   This includes eliminating background images, changing font styles as well as colors.  

Styling the Sound of Content

The CSS Speech module specifies how developers will be able to control not only how text looks, but how it sounds when a user is using text to speech software. Text would be marked to be synthesized in a particular way, which could include the gender, age, speed, pitch and duration of the voice. The current specification, though not yet an approved standard, also includes the ability to
  • set how flat or variable the voice is, that is, how much range it has
  • emphasize particular words or sentences (voice-stress)
  • set the volume of a selection
  • set the left / right balance of a selection
  • identify an audio clip or file to be played at a particular point in the page
  • insert pauses or gaps between words or sentences
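To make the list above more concrete, here is a hedged sketch that applies one CSS Speech property per item to a single rule. The selector and the audio file name are hypothetical, and support in any particular browser or screen reader should not be assumed.

```css
/* Hypothetical selector and audio file; values are illustrative only */
blockquote.testimonial {
  voice-range: high;               /* a more animated, variable voice */
  voice-stress: strong;            /* emphasize the selection */
  voice-volume: soft;              /* volume of the selection */
  voice-balance: center;           /* left / right balance */
  cue-before: url("chime.wav");    /* audio clip played before the text */
  pause-after: strong;             /* gap after the selection is read */
}
```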

Combining the auditory with the visual text of a site offers an opportunity to create new depths of richness and texture to a site that we previously developed exclusively in the visual realm. 

See also

CSS Speech Module
Appendix A Aural Style Sheets (part of Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification June 2011)

Cleanweb UK

I love this image from Cleanweb UK. It nicely combines technology, with the circuit board as the trunk and roots of the tree, and easily recognizable representations of different aspects of sustainable culture.

Cleanweb UK Manifesto

We know that as a species, we are hitting the limits, in resources, pollution, and our impact on the natural world. We know that exponential growth in a closed system is dangerous. We know that we must reduce the impact of our society immediately or face widespread systemic failure.
As makers and entrepreneurs, our task is to make these constraints work for us, and use our creativity to deliver progress without the costs we previously accepted as a side effect of our work.
We have very little time, but we have an incredibly powerful tool at our disposal. We have to apply the power of the web to make change happen at all levels of society, transforming businesses, governments, and citizens on a massive scale.
Beginning today, we will dedicate ourselves to this mission. We will work on projects with true meaning, that make the future a better place to be, rather than creating illusory short-term value.
We hope you will join us. We have a lot of work to do.