Quite a bit of research went into developing Articulus into what it is now.

Continue reading to learn more about the research involved in selecting a
readability metric and automating synonym selection.

Readability Metrics

Readability is the problem of matching a text to the reader’s ability to comprehend it. Since the 1920s, there has been interest in using quantification to develop reading curricula [x]. The study of readability emerged as the search for factors in reading materials that could be easily and objectively counted, and since the 1970s readability has been expressed as a reading grade level. According to research conducted by Dubay (2004), factors that affect readability include familiarity of words, correct grammar, punctuation, spelling, simplicity of sentences, word tense, and visualization of text (bullet points, graphic elements, etc.).

Flesch-Kincaid (1975) and Coleman-Liau (1975) pointed primarily to word and sentence complexity as the best way to measure readability. Both Flesch and Coleman defined sentence complexity by sentence length, or words per sentence (Flesch, Coleman). Word complexity can be defined in two ways, letters per word or syllables per word, and which of the two is more accurate has been disputed. Coleman-Liau argues that counting syllables can be difficult and inaccurate because it is typically computed using vowels. Furthermore, it is easier for a computer to determine the length of a word than the content of a word (Coleman). Flesch-Kincaid argues that counting letters is an inaccurate way to measure word complexity, as a long word is not necessarily a difficult word (Flesch).

In addition, several paradigms have influenced the creation of metrics. The positivist paradigm [x] characterizes readability by the belief that reality can be measured objectively, which implies that reading level is related to features of the reading material itself. A different paradigm, called the interpretive paradigm [x], separates comprehension from reading level and places the emphasis for comprehension on the psychological and sociological dimensions of the reading process. Mental processes and socio-cultural influences feature heavily in comprehension.

Beyond measuring readability based on the content of the reading, the readability of online webpages poses a completely different challenge, as many online articles contain different fonts, images, videos, and advertisements that detract from a reader’s ability to extract meaning from an article. Because online articles offer ways of interacting with elements that traditional reading does not, students can find that contextually unimportant material distracts them from the important material. However, these elements, when contextually relevant, may offer support that the reader wouldn’t have had in a traditional reading format, such as a textbook or novel (Corio). For Web pages hosted on the Internet, there seems to be no end to anecdotal suggestions for improving readability, but very few research-based measures exist. The standard measures of readability can apply, but Web pages are complicated by their free-form, magazine-style format.


To account for both letters and syllables when computing word complexity, the researchers derived the Carnahan-Gelbaugh metric, which computes the average of the Flesch-Kincaid Grade Level score and the Coleman-Liau Index score. Flesch-Kincaid typically underestimated lower grade levels and overestimated higher grade levels, but Coleman almost always overestimated to a lower degree (Coleman). Therefore, given the variability of the Flesch-Kincaid metric, the Carnahan-Gelbaugh metric weights the Coleman-Liau calculation more heavily than the Flesch-Kincaid Grade Level. Martin Cutts’ Oxford Guide to Plain English recommends that the average sentence length of a document should be 15-20 words, 25-33 syllables, and 75-100 characters. Long sentences and sentences with many polysyllabic words make a portion of text harder to comprehend, as readers must retain all of the information gained throughout the sentence until its end (Nirmaldason).

The Carnahan-Gelbaugh formula also takes some inspiration from Coh-Metrix, which relies more heavily on cognitively based indices for its formula. According to research conducted by Crossley et al., linguistic variables impacted readability more than “surface” variables, such as the number of letters and syllables used in “traditional formulas” (Crossley, 475). Coh-Metrix looked at syntactic complexity and word frequency as contributing to the complexity of a reading, which the Carnahan-Gelbaugh formula tries to factor into readability grade levels by computing the number of complex sentences and the number of complex words, meaning words that have more than 7 letters or more than 2 syllables (Crossley).
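As a rough illustration of the two base formulas, the Python sketch below computes the Flesch-Kincaid Grade Level and the Coleman-Liau Index using their standard published coefficients and then takes a plain mean of the two. It is not the Articulus implementation: the function names and the vowel-group syllable counter are assumptions made for this example, and because the text does not state how much extra weight Coleman-Liau receives, an unweighted average is shown.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a heuristic, assumed here)."""
    lower = word.lower()
    count = len(re.findall(r"[aeiouy]+", lower))
    # Treat a trailing silent "e" (but not "-le") as non-syllabic.
    if lower.endswith("e") and not lower.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str):
    """Return (sentences, words, letters, syllables) for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    letters = sum(ch.isalpha() for w in words for ch in w)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), letters, syllables


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: words per sentence and syllables per word."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(sentences: int, words: int, letters: int) -> float:
    """Coleman-Liau Index: letters and sentences per 100 words."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def average_grade(text: str) -> float:
    """Unweighted mean of the two scores (the weighting is an assumption)."""
    sentences, words, letters, syllables = text_stats(text)
    if sentences == 0 or words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    fk = flesch_kincaid_grade(sentences, words, syllables)
    cli = coleman_liau_index(sentences, words, letters)
    return (fk + cli) / 2
```

Very short, simple passages can produce negative scores with both formulas, which is expected behavior for these regressions rather than a bug in the sketch.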

Therefore, the Carnahan-Gelbaugh metric defines long words as words with more than five letters, complex words as words with more than two syllables, and complex sentences as sentences with more than 35 syllables and more than 60 letters. The formula finds the total complexity score by adding the results of these three calculations and dividing the sum by three to get an average complexity score. Typically, this results in a complexity score between 0 and 3, with a higher score corresponding to a higher grade level; extremely difficult passages, above a twelfth-grade reading level, may score close to 3. This complexity score is then added to the average of the Flesch-Kincaid Grade Level and Coleman-Liau Index scores, because that average tended to fall below the expected reading level by approximately the complexity score.
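The thresholds and the final adjustment can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors’ code: the function names and the vowel-group syllable counter are hypothetical, and because the text does not say how the three counts are normalized into a 0 to 3 score, the sketch only tallies the raw counts and takes the complexity score as an input to the final step.

```python
import re

# Thresholds as stated in the text.
LONG_WORD_LETTERS = 5
COMPLEX_WORD_SYLLABLES = 2
COMPLEX_SENTENCE_SYLLABLES = 35
COMPLEX_SENTENCE_LETTERS = 60


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a heuristic, assumed here)."""
    lower = word.lower()
    count = len(re.findall(r"[aeiouy]+", lower))
    if lower.endswith("e") and not lower.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def complexity_counts(text: str) -> dict:
    """Tally long words, complex words, and complex sentences in a passage."""
    counts = {"long_words": 0, "complex_words": 0, "complex_sentences": 0}
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z]+", sentence)
        if not words:
            continue
        syllables = [count_syllables(w) for w in words]
        counts["long_words"] += sum(len(w) > LONG_WORD_LETTERS for w in words)
        counts["complex_words"] += sum(s > COMPLEX_WORD_SYLLABLES for s in syllables)
        # A sentence is complex only when it exceeds both thresholds.
        if (sum(syllables) > COMPLEX_SENTENCE_SYLLABLES
                and sum(len(w) for w in words) > COMPLEX_SENTENCE_LETTERS):
            counts["complex_sentences"] += 1
    return counts


def combined_grade(fk_grade: float, cl_index: float, complexity_score: float) -> float:
    """Average the two base scores, then add the complexity score."""
    return (fk_grade + cl_index) / 2 + complexity_score
```

For example, `combined_grade(8.0, 10.0, 1.5)` averages the two base scores to 9.0 and adds the complexity score, giving 10.5.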

Synonym Selection

While adjusting readability involves analyzing the page to determine its level, it also involves accurately selecting synonyms that are easier for the reader to understand. Since the 1940s, there has been extensive research on accurate synonym selection, given the ambiguity of the English language and the difficulty a computer has in understanding linguistics, that is, in performing natural language processing. The study of discerning the meaning of a word in a given context is typically referred to as word sense disambiguation, which can be applied to accurately picking synonyms in the context of this research (Navigli). Word sense disambiguation has been widely studied and several solutions exist, including knowledge-based solutions that use dictionaries and thesauri, supervised machine learning, and unsupervised methods that do not rely on a manually sense-tagged corpus (Navigli). Which method is “best” remains contested, since the numerous available solutions are often difficult to implement. As a result, different solutions are typically used for specific scenarios. According to a study conducted by Navigli (2009), the task of word sense disambiguation involves the selection of word senses, where a word’s sense is defined as a commonly accepted meaning given the context of the word (Navigli). Selecting word senses is error-prone, and approaches to the problem range from manually defining each individual word to mapping words as vectors in a complex graph, which has led to significant discoveries in artificial intelligence. Despite these breakthroughs, there is still much more to research regarding how to accurately compute a correct synonym for a word in a given context.

The researchers decided to use external knowledge sources, specifically the Collins thesaurus and grade level sight word lists, to determine synonyms. Grade level sight word lists are lists of words for each grade level up to grade nine that should be immediately recognized and understood by students in that particular grade (Pinnel). If the word to be replaced is found in the sight word list for the grade level specified by the user, it is not altered. Otherwise, the word is looked up in the Collins thesaurus. The Collins thesaurus was chosen because synonyms were easy to pull from its HTML paragraph elements. The researchers populate an array with the synonyms pulled from the paragraph elements and then narrow the array down to words that have no more letters and no more syllables than the original word being replaced. If any of the synonym options are found in the sight word list for the user’s specified grade level, that synonym is chosen. The final word is then sent through a grammar check method that determines whether the new word has to be made plural or, if the original word was a verb, made to match its tense.

However, the researchers found that despite the checks to ensure that the thesaurus didn’t substitute a more difficult word, it often picked synonyms that didn’t match the context of the sentence. Furthermore, shorter words are not always simpler, nor are all long words complex. For this reason, the researchers compiled their own list of synonyms for 3,000 of the most common English words, according to Education First, an online resource for English and language education (Education First). The researchers then tested synonym choices with the Twinword Language Scoring API, which can evaluate the difficulty level of a word, sentence, or passage. Twinword uses word frequency and exam occurrences to determine word complexity (Twinword Inc.). The API scored words between one and ten to match typical grade level assignments. If the synonym that the Collins thesaurus originally selected scored above 1 or above the original word, the researchers manually found a replacement synonym that scored lower than the original word. The result was a list of common words and easier synonyms that the researchers named the developer synonym list. The developer synonym list is the last list checked before the word is sent to the Collins thesaurus.

In addition to contextual accuracy, the researchers faced the problem of grammatical accuracy. To account for it, the researchers defined several grammatical rules that the English language follows strictly and built an algorithm around them. First, methods determine whether the original word was an adjective, adverb, or preposition. These methods check the original word’s ending against common adjective or adverb endings, or the entire word against a list of common prepositions. If any of these cases is true, the new word does not go through any grammatical alterations. Otherwise, there are three main grammar checks that must happen. The first is whether the original word ends with -s. If so, the word could be a plural noun or a present tense verb, and the new word would have to be made plural or made present tense, respectively. The original word could also end with -ed, which most likely means it is a past tense verb, so the new word would have to be changed to match the original tense. However, there are many irregular verbs, so the fix is not as easy as adding -ed to the end of the new word. The researchers’ checks look at the ending of the new word to determine the best way to make it past tense. These checks only occur if the original word is passed through the Collins thesaurus, as all of the words within the user synonym, developer synonym, and grade sight lists are already formatted for grammatical accuracy.
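The lookup order and the ending-based grammar checks described above can be sketched in Python roughly as follows. This is a minimal illustration, not the Articulus code: every name is hypothetical, the thesaurus is passed in as a plain function rather than scraped from a web page, the endings and prepositions are small sample lists, and preferring sight words among the candidates that already passed the length filter is one reading of the text. Irregular verbs and consonant doubling are not handled.

```python
import re

ADJ_ADV_ENDINGS = ("ly", "ous", "ful", "able", "ible", "ive")  # sample endings (assumed)
PREPOSITIONS = {"in", "on", "at", "by", "with", "from", "about"}  # sample list (assumed)


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a heuristic, assumed here)."""
    lower = word.lower()
    count = len(re.findall(r"[aeiouy]+", lower))
    if lower.endswith("e") and not lower.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def add_s(word: str) -> str:
    """Make a plural noun or present tense verb based on the word's ending."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ies"
    return word + "s"


def to_past(word: str) -> str:
    """Make a regular past tense form based on the word's ending."""
    if word.endswith("e"):
        return word + "d"
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ied"
    return word + "ed"


def match_form(original: str, replacement: str) -> str:
    """Inflect the replacement so it matches the original word's form."""
    lower = original.lower()
    # Adjectives, adverbs, and prepositions are left unaltered.
    if lower in PREPOSITIONS or lower.endswith(ADJ_ADV_ENDINGS):
        return replacement
    if lower.endswith("ed"):
        return to_past(replacement)
    if lower.endswith("s") and not lower.endswith("ss"):
        return add_s(replacement)
    return replacement


def choose_synonym(word, sight_words, developer_synonyms, thesaurus_lookup):
    """Check the sight list, then the developer list, then the thesaurus."""
    lower = word.lower()
    if lower in sight_words:
        return word  # already at the reader's grade level
    if lower in developer_synonyms:
        return developer_synonyms[lower]  # list entries need no grammar fix
    candidates = [c for c in thesaurus_lookup(lower)
                  if len(c) <= len(lower)
                  and count_syllables(c) <= count_syllables(lower)]
    if not candidates:
        return word
    preferred = next((c for c in candidates if c in sight_words), candidates[0])
    return match_form(word, preferred)
```

With a toy thesaurus that maps "assists" to "help" and "aid", and a sight word list containing "help", `choose_synonym` returns "helps", because the original word ends with -s and the replacement is inflected to match.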

Click here to view the Developer Synonym List or the Developer Ignore List.