
Wednesday, February 13, 2013

WTF OMG It’s Dictionary Maintenance

close up of words in russian to english dictionary

It’s easy to ignore the thousands of decisions that go into a piece of software or a website. Beyond what each page says and how it’s arranged, there is everything from how many menu items there are and how they’re organized to the size of a font and its degree of contrast with the background. These are things that most people never think about but encounter virtually every day.

Noah Sussman wrote a post called Falsehoods programmers believe about time. It illustrates the many ways a programmer can screw up handling time in an application. For example:
  • There are always 24 hours in a day.
  • Months have either 30 or 31 days.
  • Years have 365 days.
  • February is always 28 days long.
  • Any 24-hour period will always begin and end in the same day (or week, or month).
In it he gives a hat tip to Patrick McKenzie’s Falsehoods programmers believe about names, which gives a litany of ways to handle names incorrectly in code.
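
A few of the time falsehoods listed above are easy to check with nothing but Python's standard library. The following is a minimal sketch, not anything taken from Sussman's post; the function names are made up for illustration.

```python
import calendar
from datetime import datetime, timedelta


def days_in_february(year: int) -> int:
    """Length of February in the given year (28, or 29 in a leap year)."""
    # monthrange returns (weekday of the first day, number of days in month)
    return calendar.monthrange(year, 2)[1]


def days_in_year(year: int) -> int:
    """Length of the given year in days (365, or 366 in a leap year)."""
    return 366 if calendar.isleap(year) else 365


def crosses_day_boundary(start: datetime, hours: int = 24) -> bool:
    """True if a period of `hours` starting at `start` ends on another date."""
    end = start + timedelta(hours=hours)
    return end.date() != start.date()


def crosses_month_boundary(start: datetime, hours: int = 24) -> bool:
    """True if a period of `hours` starting at `start` ends in another month."""
    end = start + timedelta(hours=hours)
    return (end.year, end.month) != (start.year, start.month)
```

Calling `days_in_february(2012)` and `days_in_february(2013)` gives different answers, and a 24-hour period starting at noon on February 28, 2013 ends in March. The sketch uses naive datetimes, so it does not touch the "24 hours in a day" falsehood, which needs time zone and daylight saving data.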

I am reminded of these as I manage a system dictionary that serves as the spell check for the application. When you search for “dictionary maintenance” you get a full page of definitions of maintenance (the last being the Urban Dictionary definition of “high maintenance”), but I could find no resources that aid in maintaining a system dictionary.

The reason for this is that the dictionary is, well, the dictionary. It’s given to us and there’s one right way to write a word, right?

Aside from the obvious misspellings and slang that people frequently recommend be added to the dictionary, there are a host of other issues you encounter when you dive into the maintenance work of adding or rejecting items for the dictionary.

Names in general pose a variety of issues. Should proper nouns / proper names be in the dictionary? The editors at Wikipedia wish to make a distinction between the two:
A distinction is normally made in current linguistics between proper nouns and proper names. By this strict distinction, because the term noun is used for a class of single words (tree, beauty), only single-word proper names are proper nouns: Peter and Africa are both proper names and proper nouns; but Peter the Great and South Africa, while they are proper names, are not proper nouns
For my purposes they’re largely interchangeable. Individuals’ names do not need to be included in the dictionary, while business names (especially those which are unique or spelled incorrectly, à la Dunkin’ Donuts) should be. Location and geographic names such as towns, cities, counties, states, lakes and rivers should also be included.

Initialisms and acronyms are always fun. The two are extremely similar, but there are slight differences.
Initialisms are abbreviations which consist of the initial (i.e. first) letters of words and which are pronounced as separate letters when they are spoken. - Oxford Dictionaries
Examples include the BBC (British Broadcasting Corporation), the UN (United Nations) and the text lingo WTF (What the Fuck) and OMG (Oh My God). On the other hand,
Acronyms are words formed from the initial letters of other words and pronounced as they are spelled, not as separate letters. - Oxford Dictionaries
These are things such as FIAT, IKEA and SCUBA. (Some might also call these neologisms, as they’ve become words in themselves and most people couldn’t tell you what they stand for, with the possible exception of SCUBA.)

Another issue with names is shortening them. While most Europeans wouldn't dream of shortening Katherine to Kathy without explicit permission, in America we frequently do just that. Rather than putting in the full name of a company, people shorten it because it’s familiar. For example, L.L. Bean is shortened to Bean’s / Beans (L.L. Bean also falls into the category of names with punctuation).

Abbreviations: Similar to acronyms, abbreviations are shorthand for longer words. So rather than writing out the full word such as “evaluation”, the user writes “eval”. Client becomes clt. (or pt.). Many words may have multiple abbreviations, such as appointment. Would you abbreviate it as appt? apt? If you schedule several, is it appts? What if they live in an apartment (apt.)?
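
One way to keep track of this is to record every expansion an abbreviation might stand for and flag the ones with more than one. The following is a rough sketch of that idea, not how any particular spell checker works; the table holds only the examples from this post, and the names `ABBREVIATIONS`, `expansions` and `is_ambiguous` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical review table: (abbreviation, word it stands for).
# The entries are only the examples mentioned in this post.
ABBREVIATIONS = [
    ("eval", "evaluation"),
    ("clt", "client"),
    ("appt", "appointment"),
    ("apt", "appointment"),
    ("apt", "apartment"),
    ("appts", "appointments"),
]


def normalize(token: str) -> str:
    """Lowercase the token and drop surrounding spaces and trailing periods."""
    return token.strip().rstrip(".").lower()


def build_index(pairs):
    """Map each normalized abbreviation to the set of words it may mean."""
    index = defaultdict(set)
    for abbreviation, expansion in pairs:
        index[normalize(abbreviation)].add(expansion)
    return index


INDEX = build_index(ABBREVIATIONS)


def expansions(token: str, index=INDEX) -> list:
    """All known expansions for a token, sorted; empty if it is unknown."""
    return sorted(index.get(normalize(token), set()))


def is_ambiguous(token: str, index=INDEX) -> bool:
    """True when a token could stand for more than one word."""
    return len(expansions(token, index)) > 1
```

With this table, `is_ambiguous("apt.")` is true because the token could mean appointment or apartment, while `appt` has a single expansion. An ambiguous abbreviation is probably one to reject, or at least to think hard about before adding.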

There are also names with punctuation, which add a twist. I always ensure that a business which uses punctuation has it correctly in the dictionary, but this doesn’t prevent people from recommending variations (especially names that seem like they should have hyphens, such as Walmart / Wal-Mart). Other names with punctuation include O*Net, Asperger’s / Asperger syndrome (both seem to be acceptable) and L.L. Bean.
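
A recommended word can be checked against the existing entries by comparing both with the punctuation, spaces and capitalization stripped out. The following is a minimal sketch of that check; the entry list and the choice of which spelling counts as canonical are assumptions for illustration, as are the names `punctuation_key` and `review_suggestion`.

```python
import re

# Assumed canonical spellings already in the dictionary (illustrative only).
CANONICAL_ENTRIES = ["Walmart", "O*Net", "L.L. Bean", "Asperger's"]


def punctuation_key(term: str) -> str:
    """Reduce a term to lowercase ASCII letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


# Look-up table from stripped key to the canonical spelling.
CANONICAL_BY_KEY = {punctuation_key(entry): entry for entry in CANONICAL_ENTRIES}


def review_suggestion(suggestion: str):
    """Classify a suggested word against the canonical entries.

    Returns a (status, canonical) pair where status is one of
    "already present", "variant" or "new".
    """
    canonical = CANONICAL_BY_KEY.get(punctuation_key(suggestion))
    if canonical is None:
        return ("new", None)
    if canonical == suggestion:
        return ("already present", canonical)
    return ("variant", canonical)
```

Here `review_suggestion("Wal-Mart")` is reported as a variant of Walmart, so the reviewer can decide whether to accept both spellings or only one. The check only catches differences in punctuation, spacing and case: "Asperger" without the possessive still comes back as new.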

Lastly, you have those which hit the jackpot of issues: Lou Gehrig’s disease (proper name and punctuation), or should that be ALS (initialism) or Amyotrophic Lateral Sclerosis?


Photo courtesy of Perosha