Judging and Adapting Style-Analysis Software

8(2), April 1991, pages 17-30

Gordon P. Thomas and Dene Kay Thomas

A naive observer considering the amount of time that writing instructors spend grading papers might conclude that computer programs that analyze texts, usually referred to as style-analysis software, could relieve much of the drudgery caused by surface problems. As Wallraff (1988) describes, specialized style-analysis programs have become increasingly popular in business. The software itself seems highly developed: for computers running MS-DOS, there are PUNCTUATION & STYLE and GRAMMATIK III (see Crew, 1988, for a review of GRAMMATIK); for UNIX-based systems there are STYLE (see Smye, 1988) and WRITER'S WORKBENCH (Day, 1988; Reid & Findlay, 1986; Kiefer & Smith, 1983, 1984). Most programs consist of several subprograms that work either by gathering statistical data about a text or by flagging certain phrases and making a comment designed to help the writer edit the text.

It is the second type of program, specifically an adapted version of PUNCTUATION & STYLE, that we explore. Day (1988) describes Diction, the subprogram of this type used in WRITER'S WORKBENCH, as follows:

Diction locates and displays certain incorrect, wordy, commonly misused, and possibly sexist expressions and displays each sentence in which they occur; Diction also displays a "Table of Substitutions" suggesting alternative expressions. (p. 66)

To avoid confusion, we will refer to these expressions as "phrases" and to the alternative expressions associated with each of them as "suggestions." We will call the part of the program where these phrases and suggestions are stored the "phrase dictionary." Most programs make it possible to expand the list of phrases that the program searches for and to edit its suggestions.

This study has a dual purpose: (1) to distinguish and analyze the differences between teacher and student perceptions of the comments made on student texts and (2) to test and tailor the adapted version of PUNCTUATION & STYLE. It is important to explore whether the students actually perceive the comments generated by this specially adapted version to be helpful in the context of their own writing. Because teachers are the primary source of writing instruction, and because a style-analysis program is supposed to supplement the interaction between student and teacher, it is also important to explore teachers' perceptions and to compare them to students' perceptions. In Lev Vygotsky's (1962) language, teachers are working with students in their "zone of proximal development" (p. 103)--the level they are able to reach with some assistance--which is beyond the level they are capable of reaching on their own, and the computer program attempts to supplement the process of the teachers assisting the students. In other words, to what extent can a computer program fit in with teachers' efforts to work in Vygotsky's "zone of proximal development," and give what both teachers and students would call helpful advice? Where differences exist, what are the reasons for those differences? From this kind of analysis we can move to the second part of the study, tailoring a style-analysis program to the writers in introductory composition courses.

One assumption underlying the study is that the better adapted such a program is to the kind of writing in introductory composition courses, the more useful the students and teachers will find the feedback from the program. For this reason, the second part of the study explores those types of suggestions which seem to be the most helpful and those which seem the least helpful. Knowing these characteristics will make it possible to write a phrase dictionary that will offer genuinely useful editing advice to student writers. Thus, this study seeks to demonstrate how style-analysis programs should be tested before being used by hundreds of students.

Although some style-analysis programs can detect such contextual details as whether the phrase begins or ends a sentence, the programs essentially work by blindly matching phrases in the phrase dictionary with phrases in the student's text. The only advantages that these programs have over capable teachers are speed and stamina. These advantages pale, however, when one considers that the comments the program makes can often be confusing or misleading.
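The blind matching described above can be illustrated with a minimal Python sketch. It is not the code of PUNCTUATION & STYLE or of any other program named here; the function name `flag_phrases` and the three-entry dictionary are hypothetical, and only the suggestion wording is taken from Figures 1 and 3.

```python
import re

# Hypothetical three-entry phrase dictionary; suggestion text is quoted
# from Figures 1 and 3, everything else is an illustrative assumption.
PHRASE_DICTIONARY = {
    "ALL OF": "Can you simply use ALL? Or try deleting the phrase.",
    "VERY": "Can often be avoided.",
    "FOUND TO BE": "A very passive construction. Can you rephrase?",
}


def flag_phrases(text, dictionary):
    """Return (offset, phrase, suggestion) for each whole-word match, in text order."""
    hits = []
    for phrase, suggestion in dictionary.items():
        # \b keeps "VERY" from matching inside "every"; case is ignored.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), phrase, suggestion))
    return sorted(hits)


sample = "All of the results were found to be very clear to every reader."
for offset, phrase, suggestion in flag_phrases(sample, PHRASE_DICTIONARY):
    print(f"{offset:3d}  {phrase}: {suggestion}")
```

The matcher knows nothing about grammar or meaning, so every occurrence is flagged whether or not the suggestion applies in context.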

To point out the absurdity of programs that confuse, mislead, or give wrong advice, Dobrin (1986) says that it would be just as efficient to create a program (he suggests the name RANDOM) that would randomly mark sentences throughout a student's text and tell the student to check that sentence again. Frequently (Dobrin guesses 40% of the time), the program would be correct in saying there was something wrong with the sentence. Dobrin's criticism ultimately is not that style-analysis programs are incapable of ever providing helpful advice, but that even when they are working well, they are dangerous because they serve to distract novice writers and their instructors from the more pressing task of considering "whether the thoughts make sense or are worth saying, or whether they are expressed well" (p. 30). For style-analysis programs to be worth using, they must be used in a way that is clear to both teachers and students in the context of the class and of the student's writing; and they must give helpful, appropriate advice as defined by both teachers and students.

Methods

Procedure
To create a phrase dictionary especially for students in our introductory composition courses, we modified PUNCTUATION & STYLE by adding phrases, discarding phrases, or altering the wording of the accompanying suggestions to make them clearer. First we collected on disk over 400 texts from students in introductory writing courses. We then ran these texts through the Phrase subprogram in PUNCTUATION & STYLE, which searched the texts for each one of the 733 phrases in its phrase dictionary. Upon finding what it considers an infelicitous phrase, the program would mark the phrase and suggest a change. Trying to put ourselves in the student's position, we considered the tone of the suggestion. Because PUNCTUATION & STYLE was originally intended for technical writers or business people, many phrases and suggestions seemed inappropriate for students in introductory composition courses. In each case, we decided whether to keep the phrase, discard it, or alter the tone or substance of the accompanying suggestion. Figure 1 lists some examples.

Phrase or word in student's text	Original Version	Modified Version

AS TO	ABOUT <OR> ON	Substitute ABOUT or ON or some other preposition. OR rephrase.
FOUND TO BE	<OMIT>	A very passive construction. Can you rephrase?
IN A MANNER WHICH	<AVOID>	Can often be avoided.
ULTIMATE	LAST	Are you sure you don't mean LAST or FINAL?

Figure 1. Examples of Computer Suggestions and Possible Revisions

We also read the student texts for word and phrase problems that PUNCTUATION & STYLE did not address and added 150 phrases that occur frequently in our introductory writing courses. These phrases represent our attempt to tailor the program to the needs of our students. (See Figure 2.)

AWESOME	Slang that is really too vague. Can you think of a more descriptive word?
HOWEVER	When substituting for BUT (at beginning of clause), semicolon usually precedes and comma follows. OK at beginning of sentence, but must be followed with comma.
OK	This is OK, but you might use ALL RIGHT for a slightly more formal tone.
SIZED	This should be hyphenated when used with another word before a noun.
WHICH IS	When used at beginning of sentence that is not a question, will always result in sentence frag. Should be joined to previous sentence with a comma.

Figure 2. Examples of Revisions to Computer Suggestions

After these revisions, the original phrase dictionary containing 733 items had been expanded to 830 items. Almost all the phrases were expanded slightly to be more readable; 150 of the phrases contained a fairly significant change from the original. We also eliminated a few inappropriate phrases.

The next semester 86 subjects from 15 different sections of introductory writing courses volunteered to participate in our study. Each of them submitted at least one text on disk. Seventeen of these students, whom we will refer to as the "trained" group, were in a section that had used the style-analysis program on two previous papers. The program was new for the others, whom we will refer to as "untrained."

These 86 papers were run through the modified Phrase subprogram. We then asked the students and teachers to rate each comment independently as helpful, unhelpful, or confusing.

Judging of Comments
The students rated each of the comments made on their own writing. Each comment also received independent ratings from three of the participating teachers. From these ratings, we derived a single teacher opinion, which was the opinion of at least two of the three teachers. We then compared the teacher opinion with the student opinion for each comment to determine the percentage of agreement between all the teacher opinions and all the student opinions.
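The two steps in this paragraph, reducing three teacher ratings to one opinion and then computing agreement with the student, can be sketched in Python as follows. The function names and the sample ratings are invented for illustration. The text does not say how a comment was handled when all three teachers disagreed, so skipping such comments is an assumption of this sketch.

```python
from collections import Counter


def teacher_opinion(ratings):
    """Return the rating shared by at least two of three teachers, or None if all differ."""
    label, count = Counter(ratings).most_common(1)[0]
    return label if count >= 2 else None


def percent_agreement(student_ratings, teacher_ratings):
    """Percentage of comments where the student rating equals the teacher opinion."""
    pairs = [
        (s, teacher_opinion(t)) for s, t in zip(student_ratings, teacher_ratings)
    ]
    # Assumption: comments with no two-teacher majority are left out.
    scored = [(s, t) for s, t in pairs if t is not None]
    if not scored:
        return 0.0
    matches = sum(1 for s, t in scored if s == t)
    return 100.0 * matches / len(scored)


# Invented ratings for four comments, for illustration only.
students = ["helpful", "unhelpful", "confusing", "helpful"]
teachers = [
    ("helpful", "helpful", "unhelpful"),
    ("helpful", "helpful", "helpful"),
    ("unhelpful", "unhelpful", "confusing"),
    ("helpful", "unhelpful", "confusing"),  # no majority, so skipped
]
print(percent_agreement(students, teachers))
```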

To determine which of the 830 phrases in the phrase dictionary were used the most frequently, each phrase and the accompanying suggestion in the phrase dictionary were assigned a number between 1 and 830, which we called the phrase number. When the suggestions were used to make comments on a particular paper, the comments were numbered according to the order they appeared in the paper. Because each paper had a set of comment numbers and a set of corresponding phrase numbers it was possible to determine how many times a particular suggestion occurred in the comments. It was also possible to determine how both the students and the teachers evaluated each suggestion in the different contexts that it appeared.

To explore the nature of helpful suggestions, we selected the 15 phrases that had occurred the most frequently. Each of these was evaluated at least 13 times (and thus had received at least 52 evaluations--13 from students and 39 from teachers). We then ranked these suggestions by their level of helpfulness and looked for corresponding patterns.

Results and Discussion

Taken together, the students wrote a total of 68,365 words, an average of 785 words per paper. The program made 1,299 comments, an average of 15 per paper. Of the 830 phrases in the phrase dictionary, only 192 were used. The top 10% of the phrases used most frequently accounted for 65% of the total number of comments; the top 20% accounted for 77% of the comments.
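A concentration figure of this kind can be computed from a log that records one phrase number per comment. The short Python sketch below shows one way to do so; the function name `share_of_top` and the sample log are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def share_of_top(phrase_numbers, fraction):
    """Share of all comments produced by the most frequent `fraction` of the phrases used."""
    counts = sorted(Counter(phrase_numbers).values(), reverse=True)
    if not counts:
        return 0.0
    # Always include at least one phrase, even for very small fractions.
    top_n = max(1, round(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


# Invented log: 20 comments spread over 10 distinct phrase numbers.
log = [1] * 8 + [2] * 4 + [3, 4, 5, 6, 7, 8, 9, 10]
print(share_of_top(log, 0.10))  # share held by the top 1 of 10 phrases
print(share_of_top(log, 0.20))  # share held by the top 2 of 10 phrases
```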

The teachers were more positive about the computer-generated comments than the students were. The teachers classified 67.6% of the comments as helpful, 31.8% as unhelpful, and 0.6% as confusing, while the students classified 54.4% as helpful, 36.7% as unhelpful, and 8.9% as confusing. In other words, the teachers found the comments to be helpful two-thirds of the time and almost never confusing, while the students found the comments to be helpful closer to half the time and confusing 9% of the time. Students familiar with the program were more critical of the comments, but they also received fewer comments on their papers--10.8 vs. 15.3.

Because simply tabulating student opinion and comparing it to teacher opinion does not indicate the true extent of student-teacher agreement, we also calculated the percentage of agreement on a comment-by-comment basis. The students agreed with the evaluation of the teachers in 60.5% of the cases, with little variation among them. Students and teachers each differed more with the other group than they did within their own group.

Although the program made 842 suggestions in the 59 texts, many of them addressed the same phrases. In fact, 491 or 58.3% of these comments were made using the same 15 phrases. Figure 3 lists these 15 phrases and the accompanying suggestions.

We closely analyzed these 15 phrases because they recurred the most often, not because they were the most helpful. In fact, teachers and students both ranked nine of the suggestions accompanying these phrases as helpful in more than half of the occurrences (82% by the teachers and 59% by the students). Six of the suggestions were ranked as helpful less than half of the time (21% by the teachers and 38% by the students). Figure 3 lists the phrases and suggestions in the order that the teachers ranked them. With two exceptions, the teachers and students found the same phrases and suggestions to be helpful (the students usually found "BIG" to be helpful when the teachers did not, while the teachers rated "THERE ARE" as helpful when the students did not). Only four of these 15 phrases ("ALL OF," "VERY," "THAT IS," "ONE OF THE") were in the original phrase dictionary of PUNCTUATION & STYLE; we added the others to better suit first-year writing students.

PHRASE	COMMENT

ALL OF	Can you simply use ALL? Or try deleting the phrase.
BEING	Your writing will usually improve if you re-phrase to avoid this word.
VERY	Can often be avoided.
THERE ARE	Can you rephrase to have a more active verb? Try substituting ARE.
THERE IS	Can you rephrase to have a more active verb? Try substituting IS.
THAT IS	Can often be avoided.
FUN	Re-phrase to be more precise.
ONE OF THE	Can you substitute ONE or A or THE?
SUCH AS	COMMA should usually precede. NO punctuation mark should follow. When used at beginning of sentence, will almost always result in sentence frag. In such a case, needs to be joined to previous sentence with a comma.
BIG	LARGE will make your writing sound better.
WELL	Should be hyphenated if followed by adjective before a noun ("well-known man"). Best to avoid at beginning of sentence, where it makes your writing sound too chatty and casual.
ITS	ITS is the possessive form. IT'S means IT IS. Are you using this correctly?
IT'S	ITS is the possessive form. IT'S means IT IS. Are you using this correctly?
HOWEVER	When substituting for BUT (at beginning of clause), semicolon usually precedes and comma follows. OK at beginning of sentence, but must be followed with comma.
WHICH IS	When used at beginning of sentence that is not a question, will always result in sentence frag. Should be joined to previous sentence with a comma.

Figure 3. The 15 Most Frequently Used Phrases

Our adapted version of PUNCTUATION & STYLE gave good advice two-thirds of the time according to teacher judgments and over half of the time according to student judgments. Although there are obvious limitations, the program is providing advice that is better than the simply random advice of which David Dobrin speaks. This suggests that a computer program can offer helpful advice with greater frequency as the phrase dictionaries are more carefully constructed. From this general finding, we will discuss each of our initial explorations in turn.

What are the differences between teacher and student perceptions of the comments made on student texts and what do those differences suggest?
Critics of style-analysis programs suggest that students may tend to accept uncritically the feedback from a computer program. People have a natural tendency, this argument goes, to believe that an impressive program is more intelligent than it really is. Certainly, some programming practices, such as having the program speak to the user in the first person or address the user by name--practices that are intended to make the program more user-friendly--encourage such misconceptions. This study suggests that such reactions are not necessarily natural tendencies, but are responses encouraged by the computer itself. The students who participated in this study were not sophisticated about computers, but they classified a substantial 45% of the comments as either unhelpful or confusing, 13 percentage points higher than did the teachers, who were very likely more sophisticated about computer technology than the students.
It is possible that the design of the study and the form in which the students received the data may have diminished somewhat the authority with which they would otherwise have invested the computer. But these results suggest that the concern over students' uncritical acceptance of computer-generated advice is not merited and, in fact, the greater concern may be whether students are willing to recognize good advice and use it in their writing. We suggest that there is a connection between students' overall perceptions of the helpfulness of these comments, as compared to the teachers' perceptions, and students' reluctance to revise their writing due to their sense of ownership of what they have written.
The teachers' overall stronger perceptions of the helpfulness of the comments also suggest that the teachers are relying more on their sense of larger patterns of what constitutes good writing, while the students, less experienced in the principles of good writing, are reacting more to the specific situation. But we need to be careful not to write off student perceptions simply because of their lack of instruction. Enough trouble with a specific comment may point to problems with the wording of the advice the program gives or even with the level of the comment. If students find a comment too troublesome, it is not benefiting them and may indeed be counteracting the helpfulness of the pattern the teacher sees. Good advice that is not communicated and understood is not really good advice. In evaluating style-analysis programs, we need to rely both on the teachers' general views, valuing their expertise, and on the students' specific views, valuing what they are telling us about our failure to communicate.
We also need to recognize the expertise of the teacher and to focus on providing the student with that expertise in the form of specific instruction in how to edit their writing according to the suggestions from the program. This need points to a limitation of our study, where the majority of students and teachers were first-time users of the program. We did use the program late in the writing process, when it could appropriately aid in editing, but we did not provide for nor control for specific instruction in how the program was to be used.
Fourteen of the 59 texts in this study were written by students who had used the feedback from this program on two previous papers. It seems likely that the program was by this point affecting their writing. The main difference between the trained and the untrained groups of students was in the teachers' rankings of helpful comments: 53% for the trained students and 71% for the others. This result suggests that the more students use such a program, the more they incorporate the advice and the less helpful the advice becomes. Such a conclusion is also borne out by the fact that the average number of comments on the papers of the trained students was 28% lower than the number of comments on the papers of the others (10.8 instead of 15.3). We suggest that this is a pattern that users of style-analysis programs should seek. As students recognize and internalize good advice on style, it should be reflected in fewer comments from the style-analysis program, and this is what happened.

What do the combined teacher and student comments tell us about our adapted version of PUNCTUATION & STYLE?

First, we learned that it is indeed important to adapt a program to the needs of a specific group of users, considering their level, needs, and ability to assimilate. Of the 15 most frequently used phrases, 11 were phrases we added from our initial analysis of frequent problems in introductory writing courses. Second, we learned some of the features of a helpful phrase and its accompanying suggestion, features such as giving definite directives and mentioning specific alternatives whenever possible. And, through our detailed analysis of teacher and student reactions to specific comments, we were reinforced in our initial assumption that it was important to seek advice from both teachers and students in designing a style-analysis program.
The 15 most frequently used phrases provide good examples of what makes a phrase and its accompanying suggestion likely to be rated as helpful. The phrase "ALL OF," used 23 times in this study, was one such phrase; others that the program included were "THEIR ARE" and "COULD OF." However, the group of phrases that will be helpful all the time is not large. Furthermore, in the 59 essays examined here, these phrases did not occur as frequently as the irritation they cause writing teachers would seem to suggest.
Another type of phrase gives information that clearly does or does not apply to the situation; one would expect student and teacher opinion to be nearly identical for these cases. The words "ITS" and "IT'S" are two such cases, but student and teacher opinion differed considerably (23% vs. 39% and 21% vs. 39%, respectively). The students and teachers were all told to classify a comment as helpful only if taking the advice would result in an improved sentence.
For suggestions like the ones that accompanied these two phrases ("ITS is the possessive form. IT'S means IT IS. Are you using this correctly?"), it was unclear what advice the program was giving. By checking the contexts in which this suggestion occurred, we determined that the teachers were marking this suggestion as unhelpful if the student was already using the phrase correctly. The students, on the other hand, seemed to compare the information in the suggestion not with what was on the paper, but what was in their heads. If they already knew the difference between "it's" and "its," they would mark the comment as unhelpful; if not, they marked it as helpful.
This tendency is shown even more dramatically in the ratings for "HOWEVER." The suggestion apparently applied in very few cases in the student writing because the teacher opinion found the suggestion helpful only 15% of the time. The students, though, found it helpful in 46% of the occurrences, most likely because it told them something they did not know. The question then arises as to whether suggestions in this category should be eliminated from the phrase dictionary because they were not often perceived as helpful. Apparently, some students found the suggestion for "ITS" or "IT'S" helpful because they were not sure they had used the form correctly, even if they had.
Perhaps the largest category of phrases and words that can be included in a phrase dictionary is designed to help students with straightforward diction problems. The majority of words and phrases in most phrase dictionaries consist of examples of inflated and pompous diction, the type of language that writers working in large institutions or bureaucracies fall into using. Writing students have some of these problems, too, but the instances are so rare that it appears students develop these problems only after they have had some experience with bureaucratic or institutional discourse. A phrase dictionary can also contain slang or informal expressions that can prevent a student's essay from establishing the tone suitable to academic writing. This study seems to indicate that students in introductory writing classes are likely to have more problems with expressions or phrases that are too informal than they are with inflated or pompous diction. The group of the 15 most frequently used phrases included "FUN" and "BIG," but it did not include any instances of inflated or pompous diction except for possibly "ONE OF THE."
Determining which of these informal phrases to include can cause difficulties. Teacher opinion classified the suggestion accompanying "FUN" as helpful 70% of the time (as opposed to the students' 51%), but it classified the suggestion with "BIG" as helpful only 37% of the time, while the students ranked it at 56%. It may have been that students found the suggestion for "FUN" ("Re-phrase to be more precise.") too vague to be specifically helpful, while they found it much easier to respond to the more specific suggestion accompanying "BIG" ("LARGE will make your writing sound better.").
In the difference between the content of these two suggestions lies part of the explanation for what distinguishes the two groups of suggestions: specific suggestions versus general suggestions that ask the student to rephrase. Four of the nine most helpful suggestions (accompanying the phrases "BEING," "THERE ARE," "THERE IS," and "FUN") ask the student to rephrase the sentence to avoid the expression (see Figure 3). If the students lack the syntactic fluency to rephrase their sentences and alter the construction, they will naturally find the phrase unhelpful or confusing. The teachers, on the other hand, can probably think of several ways of rephrasing the sentences to improve them, so they are more likely to find the suggestion helpful.
Given the present state of computer technology, we are left with some limitations. On the one hand, we can work to make suggestions as specific as technology permits and omit general suggestions that simply ask students to re-phrase. On the other hand, we can reinforce the connection between teacher and student working together to help students develop strategies for rephrasing problem sentences. We suggest that reinforcing the teacher/student connection in such cases is the better approach. Specific suggestions seem less likely to be ranked as helpful by both teachers and students.
Another part of the explanation for the differences in the suggestions in these two groups is that the advice for three of the suggestions in the least helpful group ("WELL," "HOWEVER," and "WHICH IS") is valid only if the phrase occurs at the beginning of the sentence; two of these suggestions ("WELL" and "HOWEVER") attempt to anticipate two contexts. In evaluating whether to follow these suggestions, the reader must first ascertain whether the phrase is used in the particular context required by the suggestion. The suggestions accompanying "ITS" and "IT'S" also require a particular context. Consequently, the type of suggestion that accompanies five of these six phrases appears to result in too many false alarms to be useful. One suggestion of this nature occurs in the most helpful group ("SUCH AS"), but it is ranked last in that group. A more sophisticated style-analysis program could of course determine whether the phrase was at the beginning of the sentence; if PUNCTUATION & STYLE had that capability, it would have been possible to simplify some of these suggestions, and their helpfulness ratings would likely have improved.
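A sentence-initial check of the kind just mentioned could be approximated as in the Python sketch below. This illustrates the idea only and is not a feature of PUNCTUATION & STYLE; the function name `sentence_initial_hits` is hypothetical, and the rule used (start of text, or whitespace after a period, question mark, or exclamation point) is a simplifying assumption that will misfire on abbreviations such as "e.g."

```python
import re


def sentence_initial_hits(text, phrase):
    """Return offsets where `phrase` opens a sentence.

    A sentence start is approximated as the start of the text or
    whitespace that follows ".", "!", or "?" (an assumption of this sketch).
    """
    pattern = re.compile(
        r"(?:^|(?<=[.!?])\s+)(" + re.escape(phrase) + r")\b", re.IGNORECASE
    )
    return [m.start(1) for m in pattern.finditer(text)]


sample = "The rule is simple. Which is why we use it. He has a car, which is red."
print(sentence_initial_hits(sample, "WHICH IS"))
```

With such a check, the "WHICH IS" suggestion would fire only on the second sentence of the sample and stay silent for the mid-sentence use after the comma.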
To conclude the explanation of what distinguishes these two groups of suggestions, we should note that two of the suggestions in the most helpful group suggest simply deleting the phrase ("VERY" and "THAT IS"), and two others suggest substitutions ("ALL OF" and "ONE OF THE"). Only one of the six least helpful phrases ("BIG") suggests a substitution, and it was ranked at the top of its group. It appears that suggestions most likely to be rated as helpful involve either a simple substitution or a deletion, or they encourage the student to rephrase altogether. The least successful suggestions are valid only in a particular context.
The good suggestions in a phrase dictionary are similar stylistically to what an instructor might write on a student's paper. A small class of suggestions will be easy to design and will nearly always provide useful advice. Another class of suggestions may involve advice that is correct when considered in isolation, but produces incorrect or awkward constructions in some situations (at the beginning of a sentence or after certain marks of punctuation, for example). Unless the program can detect these contexts, it is better to avoid such phrases. Most of the suggestions should consist of advice to rephrase the sentence to avoid a certain construction.

Implications
This study did not attempt to assess the overall quality of the students' writing. We considered comparing the quality of the original texts with the quality of those texts after the students had made their own editing changes. It would have been possible to compare those texts to the same texts that had been altered in ways that the comments classified by expert opinion as helpful suggested they should. However, a cursory glance at the program's comments on one student text suggests that even if a student were to make all the suggested editing changes, he or she would not produce a large change in the overall quality of the text.

Paradoxically, it is precisely because the comments are so trivial that we would argue that such programs can be helpful in teaching writing. When used intelligently, such programs can make these comments so that instructors can concentrate on more important aspects of the writing. This does not mean that instructors can require students to use the programs and expect that confusion between, say, it's and its will clear up. The instructors may still need to discuss these small points with students (taking care, of course, not to overemphasize their importance), but the computer program can save them the time involved in marking them on student papers.

Teachers may be able to induce a kind of halo effect from the computer-generated comments: the attention that students devote to the portion of the text affected by the computer-generated comment may be carried over to other portions of their texts as well. In our use of style-analysis programs, we have noticed this effect often enough to feel encouraged by it. Used carefully and intelligently, with programs specifically designed for their users and with specific instruction in how to respond to suggestions, we feel that computerized style-analysis programs will enhance the teaching of writing. Without the care we describe, we see the capacity for harm. We need to maintain vigilance to keep the computer from controlling composition.

Gordon P. Thomas and Dene Kay Thomas teach in the Department of English at the University of Idaho.

References

Crew, L. (1988). The style-checker as tonic, not tranquilizer. Journal of Advanced Composition, 8, 66-70.

Day, J. T. (1988). WRITER'S WORKBENCH: A useful aid, but not a cure-all. Computers and Composition, 6(1), 63-78.

Dobrin, D. N. (1986). Style analyzers once more. Computers and Composition, 3(3), 22-32.

Kiefer, K. E., & Smith, C. R. (1983). Textual analysis with computers: Tests of Bell Laboratories' computer software. Research in the Teaching of English, 17, 201-214.

Kiefer, K. E., & Smith, C. R. (1984). Improving students' revising and editing: The WRITER'S WORKBENCH system. In W. Wresch (Ed.), The computer in composition instruction: A writer's tool (pp. 65-82). Urbana, IL: National Council of Teachers of English.

Reid, S., & Findlay, G. (1986). WRITER'S WORKBENCH analysis of holistically scored essays. Computers and Composition, 3(2), 6-32.

Smye, R. (1988). Style and usage software: Mentor not judge. Computers and Composition, 6(1), 47-61.

Vygotsky, L. (1962). Thought and language. Cambridge, MA: The Massachusetts Institute of Technology Press.

Wallraff, B. (1988, January). The literate computer. The Atlantic, 61, 64-71.