Increased attention to the forms and purposes of writing assessment
has roughly paralleled a rapid expansion in the use of computers
in writing instruction. However, these trends have not resulted
in investigations of computers and writing assessment operating
together. Research and publication on writing assessment have tended
to deal with the important questions of the nature of assessment
and the purposes and legitimacy of various testing methods. The
emphasis in research in computing and composition instruction
has centered on the machine as a "partner" in the writing
process, as an "instructor" in CAI, as a "tool"
in creating and processing text, and as an "environment"
fomenting collaborative learning (see Hawisher, 1989). Because
so much interest has been generated in both computing and assessment,
and because computers are already involved in test delivery and
scoring, I think it is time to consider whether computers can
make real contributions to the broad enterprise of writing assessment.
I believe they can, both as instruments of some forms of traditional
language-skills assessment and as components of an environment
that enables us to broaden and deepen the concepts and practices
of writing assessment.
It may be helpful to begin with a clarification of what is usually meant by "assessment," followed by a definition of terms and a look at the current state of writing assessment. In its narrowest sense, assessment means testing, which may be thought of as the evaluation of some product (in this case, a sample of writing or a set of answers to questions) in order to assess the writer's level of knowledge about or skill in composing. The major purposes of testing writing have been for predicting success in future courses [for example, the College Board's Test of Standard Written English (TSWE)], for placing incoming students into appropriate courses, for evaluating (and hence justifying) writing programs, for certifying minimum competency ("exit" exams), and for evaluating a student's general proficiency or progress in a course or courses. In its broader sense, assessment can embrace the evaluation of processes as well as products, with the goal often being to discover how the writer goes about writing and what cognitive strategies or patterns may be involved. The most sophisticated and ambitious model of assessment examines both processes and products, as we shall see in the following pages.
Several other terms are relevant to understanding some of the main features of assessment: direct and indirect testing, norm- and criterion-referencing, and test validity and reliability.
The modern era of testing writing, and of controversy about testing writing, began with the decision of Harvard University in 1873-74 to require standardized essay-based entrance examinations. Behind this move was the desire to make the admissions process more "objective" and, hence, more fair to the candidates, as well as easier for the university to administer (Lunsford, 1986). Harvard's decision was severely criticized by many educators (especially by the University of Michigan's Fred Newton Scott, founder of the National Council of Teachers of English) who believed that standardized tests mandated by powerful institutions were not the best way to achieve educational reform; nevertheless, the Harvard program "set off a chain of events that led inevitably to increased standardization and reflected the growing desire to find more and more objective means of assessing student performance" (Lunsford, 1986, p. 1). Assessment by means of writing samples (i.e., "direct" testing) seemed the most appropriate and objective means at that time because it contrasted sharply--in terms of reliability--with oral examinations, which had been the principal evaluative mode in Western higher education since the middle ages (Lunsford, 1986). Moreover, written examinations were favored--and still are--because their direct skill sampling lends legitimacy to skill in composing, a skill on which the newly-fledged discipline of English was founded.
After the Second World War, as the number of students rose dramatically, still more objective and efficient means of testing presented themselves in indirect instruments that were usually multiple-choice in format and easily scored, especially by machine. These "objective" tests (i.e., tests that are objectively scored) were driven by psychological and statistical theories that indirect measurement of a skill could measure the skill itself with validity and reliability (for a summary of recent theory on testing, see Weiss, 1983); this view is represented in writing testing, for example, by having test takers identify grammar and usage errors or syntactical choices in multiple-choice questions. In an era of increasing specialization and scientific advancement, testing of all kinds became the province of experts in "educational measurement." Such indirect testing became popular, especially with schools and college entrance bodies, where relatively cheap, easy-to-administer tests further offered the advantages of reliability and national norming under an aura of scientific accuracy.
But while indirect tests and measurement specialists held sway over most of the assessment territory, composition teachers were attempting to reclaim the high ground and establish writing assessment as a function of writing instruction. They convinced the College Board in the 1950s to reinstate a writing sample in the Entrance Examination battery so that the tests' results relating to English composition would have more face validity among English teachers (Godshalk, Swineford, & Coffman, 1966).
Since the 1960s, a series of influential proponents of direct, essay testing of writing have been engaged in a broad offensive. The evaluation of writing samples employing various scoring philosophies and techniques--analytic scoring (e.g., Diederich, 1966), holistic scoring (e.g., Cooper, 1977; White, 1985, 1989), primary-trait scoring (e.g., Lloyd-Jones, 1977), and performative assessment (Faigley et al., 1985)--has been established as a valid, reliable, and even cost-effective means of assessing writing.
Computers have been used in testing for many years, usually in the service of indirect methods. Standard objective test designs have proven adaptable to computer-delivery without substantial alteration; having question responses entered in digital form has meant quick scoring and efficient storage and computation of results (although care must be taken to preserve the statistical reliability of such tests; see Sarvela & Noonan, 1988). But the potential of the computer as a more intrinsic evaluation instrument was apparent to some researchers early on. In efforts such as Project Essay Grade conducted by Ellis Page and Dieter Paulus (1968) at the University of Connecticut, and MIT's General Inquirer content analysis project by Philip Stone and colleagues (1968), computer technology was applied to the comprehensive evaluation of essays and other texts. By using statistical or "actuarial" techniques to analyze syntactic, semantic, and mechanical features of high school students' essays, Page and Paulus achieved a good deal of success in predicting the grades human raters would likely attach to the texts. These researchers assumed that the theoretical and practical groundwork for computer grading of essays had been laid. Believing that he was on to something big, Page wrote at the end of his report, "Surely the computer analysis of language will become a permanent feature of the educational scene" (p. 198).
Why it has not is an important question. One reason arose from political factors outside the vision of Page and Paulus and other pioneers. While educational measurement specialists (such as Page and the staff of the Educational Testing Service) pursued their statistically-based assessment agendas through the sixties and seventies, the discipline of composition developed and assumed an important position in the national effort to improve student writing. Compositionists fought for recognition of the importance and complexity of writing as a cognitive tool and a means for individual expression; such values naturally came into conflict with the practice of standardized, indirect, multiple-choice testing, and led to the movement for direct, writing-sample testing described earlier. The extensive involvement of computers in standardized testing tended to tar them with the same brush that blackened the whole objective-testing enterprise.
A more serious problem with using computers for actuarial writing assessment grows out of the differing goals sought by test specialists and writing teachers. Test specialists want the big picture that statistics can give them; their goal is to look at large test populations and discover how individuals and classes fit in. These specialists also want to show tendencies within these populations and frequently wish to verify that the group has made progress because instruction has been effective. In testing, assessment results in the form of a graded or scaled rating can be used to achieve these goals. Writing teachers, on the other hand, tend to use assessment as a means of teaching individuals how to improve their personal writing process. Thus, even given the likelihood, for example, that Page's essay-grading program can assign grades as accurately as human readers, grading itself, by machines or humans, is not the main goal of most teachers. Moreover, the single grade or rating produced by tests tends to become a driving force behind instruction, and teachers have usually resented and resisted the pressure to "teach to the test."
Finally, a major limitation in direct computer assessment of language has been that programs have tended to deal with aspects of language that can be processed algorithmically, and these tend to be surface features such as spelling, punctuation, subject-verb agreement, and other aspects of text-as-product. Such surface features do not yield much information useful to writing teachers who have become increasingly interested in evaluating deeper components of writing such as purpose, audience, and revision processes. Researchers like Page and Stone, with large expense of time and money, have indeed made valiant if misdirected efforts to elevate computer technology toward a simulation of general literacy; but the most common uses of computers in writing assessment have remained the administration and scoring of traditional kinds of indirect tests, usually multiple choice or fill-in-the-blank, rather than the support of direct evaluation of writing samples.
It is unfortunate that computers have been burdened with such negative associations among writing teachers, because these powerful tools do, I believe, have a valuable contribution to make to testing when they are applied within a responsible, educationally sound context. Computer-based tests that have limited, local purposes seem most likely to be well integrated with programmatic and instructional goals. One such purpose is placement of new students into an existing curriculum. Test program algorithms that have been developed to estimate a level of expertise in a subject or skill can be quite useful here. For example, the College Board now offers a placement-test program which uses microcomputers as the delivery system and adaptive testing as the main assessment strategy. Adaptive testing, or "tailored testing," is a method of presenting test questions based on the test taker's pattern of response. That is, when questions are answered correctly, the testing program follows up with more difficult items; if questions are answered incorrectly, the program offers easier ones. The test continues until the test taker cannot answer any more questions, at which point the final question theoretically represents his or her ability level within the total range of difficulty. Programs of this type can allow the local administrators to set the parameters for difficulty and to interpret the results according to their curriculum. Obviously, since each item must be scored before the test can proceed, the delivery of an adaptive test is nearly impossible without the aid of computers.
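The branching behavior of such a tailored test can be sketched in a few lines. Everything here (the level "ladder," the stopping rule, the names) is a hypothetical simplification for illustration, not the College Board's method, which rests on psychometric item-selection models:

```python
def adaptive_test(bank, respond, start=3, n_items=6):
    """Tailored-testing ladder: 'bank' maps difficulty levels (ints)
    to lists of items; 'respond' is a callable item -> bool standing
    in for the test taker. A correct answer raises the level, an
    incorrect one lowers it; the level reached is taken as the
    ability estimate within the bank's range of difficulty."""
    level = start
    lo, hi = min(bank), max(bank)
    for _ in range(n_items):
        if not bank[level]:           # no items left at this level
            break
        item = bank[level].pop(0)     # deliver the next item
        level = min(hi, level + 1) if respond(item) else max(lo, level - 1)
    return level
```

A production adaptive test would estimate ability with item-response theory and richer stopping rules; this sketch captures only the branching described above, and it shows why each item must be scored immediately, which is what makes computer delivery all but indispensable.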
Another way that computers can be used to make objective tests
more responsive to teachers' purposes would be through "nondichotomous"
judging of phrase- or sentence-level input. Most indirect tests
pose questions so that their answers are "dichotomous"--either
right or wrong with no intermediate possibilities. Many composition
teachers have justifiably felt that this practice does an injustice
to normal language use where several equally correct stylistic
or grammatical alternatives exist for any given message. A new
method of computerized testing could match the pattern of an input
answer with possible alternatives or "target strings"
built into the program, allowing for various alternatives to be
given credit and possibly even for different levels of credit
to be assigned according to the choice made by the test taker.
Answers might be entered as whole sentences because the program
could exclude all words or character strings except those necessary
for attempted matching with the target strings.
At Ball State University, where I teach, I have been working on a language-skills test that incorporates a limited version of the above strategy within an experimental test based on sentence combining. The test's purpose is to support our basic writing program by providing one means to judge students' pre- and post-levels of achievement as well as the semester's total average change in skill level across the basic writing population. Teachers use a variety of measures of students' ability to arrive at the credit or no-credit course grade, including portfolio-based essay evaluation. The experimental strategy described below is coupled with a longer computer-delivered, multiple-choice test that was generated from sentences composed from previous generations of Ball State students. Thus, our total computer testing component has been conceived and developed to serve our local basic writing program as part of a whole evaluation process.
In the experimental test, students are given a pair of sentences whose subject matter is related in either time, order, causation, or opposition; one of these relationships is most strongly implied. Students first identify the primary relationship between the sentences, then write a compound sentence that joins the two with the most appropriate conjunction or conjunctive phrase. For example, one sentence pair is
Ron had nearly completed his work. Melissa hadn't even begun.
Because the primary conceptual relationship between the two is opposition or contrast, answer routines search for significant connecting target strings such as but, although, or yet. Errors and variant or optional structures in the contextual sentence are ignored (such as finished and the omission of nearly). The following sentences would be given credit:
Ron had completed his work, but Melissa hadn't even begun.
Ron had nearly completed his work although Melissa hadn't even begun.
Although Melissa hadn't begun, Ron had completed his work.
Ron had finished his work, however, Melissa hadn't even begun.
but these would not:
Ron had completed his work and Melissa hadn't even begun.
(and does not indicate opposition)
Ron had completed his work since Melissa hadn't even begun.
(since suggests causation, not opposition)
Ron had completed his work when Melissa hadn't even begun.
(when shows a time-order relationship, which is possible but not primary)
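A minimal version of the answer-matching routine for this item might look like the following. The word lists stand in for the "target strings" discussed above; they are my illustration, not the actual Ball State test code:

```python
import re

# Connecting "target strings" for the Ron/Melissa pair: the primary
# relationship is opposition, so contrast connectives earn credit.
CONTRAST = {"but", "although", "though", "yet", "however"}
WRONG_RELATION = {"and": "addition", "since": "causation",
                  "because": "causation", "when": "time-order"}

def score_combined_sentence(answer: str):
    """Nondichotomous check: ignore optional wording in the contextual
    sentences and look only for a connective signaling the target
    relationship. Returns (credit, explanatory note)."""
    words = set(re.findall(r"[a-z']+", answer.lower()))
    if words & CONTRAST:
        return True, "contrast connective found"
    for w, relation in WRONG_RELATION.items():
        if w in words:
            return False, f"'{w}' suggests {relation}, not opposition"
    return False, "no recognized connective"
```

Because the routine discards everything except candidate connectives, variants such as "finished" for "completed" or the omission of "nearly" do not affect the credit assigned, just as described above; a fuller version could also return different levels of credit for different acceptable connectives.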
With the above type of testing strategy, the student gets practice in thinking about language relationships, and the value of composing skill, albeit still at the sentence level, is reinforced. The aims of the test are limited but attainable and supportive of the larger process of composing in ways that simple multiple-choice tests are not. I believe that applications such as this are indeed a responsible way to incorporate computers into testing, and to exploit some of their strengths--such as efficiency of scoring, statistical-processing power, and data-storage capability--while developing more sophisticated and useful methodologies and programs for writing assessment.
The preceding discussion has attempted to bring into focus the emphasis placed on assessment as grading and testing, and the roles computers have played. But this conventional understanding of assessment limits its usefulness in teaching by directing attention to products rather than processes, and by promoting grading and ranking as a primary goal of education. Sometimes such assessment is necessary and/or desirable; however, when it is more broadly conceived as a means to understand the larger, multi-dimensional learning process, "assessment" takes on meaning beyond testing, grading, and ranking. This expansion of the assessment concept also suggests that the applications for computers should extend to a wide range of functions beyond the delivery and scoring of tests.
A promising model that diversifies the concept of writing assessment has been advanced by researchers at the University of Texas at Austin and elaborated in Assessing Writers' Knowledge and Processes of Composing (Faigley et al., 1985). Faigley and colleagues offer a technique called "performative assessment," which seeks to evaluate students' written products with criterion-referenced essay scoring, conducted by trained raters and guided by a set of rubrics specifically related to the rhetorical task demanded in the assignment. These tasks include argumentation using induction, deduction, and classification; they call for constructing a hypothesis and addressing a specific audience. The use of rubrics in criterion-referenced scoring makes the results of the assessment helpful in judging individuals' weaknesses in composing as well as their relation to other writers in the assessment population. Moreover, performative assessment is designed to take place in "portfolio" mode over the course of a semester or longer, avoiding the problem of one-time, single-draft sampling that has been argued to be a weakness in validity for holistic assessment and essay testing generally (Conlon, 1986; Godshalk, Swineford, & Coffman, 1966).
At the base of the performative assessment model is the assumption that the composition classroom defines a discourse community which provides a complete context for writing. Within it,
students learn how to read and write and how to talk about reading and writing. They learn to use codes that are intelligible for discourse communities beyond the classroom. (Faigley et al., pp. 90-91)
They learn to produce messages with definite forms that move through accepted channels and that say something about something.
The class-as-discourse-community model of Faigley and others (see Mosenthal, 1983) not only provides the rationale for assessing the various products and processes that constitute it, it is itself the environment for conducting assessment. Not only essays and composing strategies, but the whole culture of the classroom needs to be studied and assessed. Thus, Faigley and his colleagues set out an assessment model that takes the whole class environment into account, responding to the three main approaches to writing generally acknowledged among teachers and researchers: literary (or text-based), cognitive, and social. Corresponding paths along which assessment might proceed--that is, evaluation of texts, assessment of individuals' processes, and ethnographic investigation of the class as a community--accommodate these theories and unify current practices toward a common goal. The following discussion of each of these areas provides perspective on the use of computers to assess students' skills and knowledge on the textual, cognitive, and community levels.
Although the application of computer programs to textual assessment is widespread, it is fraught with controversy. Writers almost universally rely on spelling checkers and frequently on programs that flag "problems" in grammar and style, but teachers are uneasy about these programs' facile and often inaccurate treatment of surface features of the text. As David Dobrin (1989) has recently emphasized, computer-based analysis of text is full of traps that might ensnare the careless user. Dobrin comes down particularly hard on the Style and Diction sections of WRITER'S WORKBENCH because they mislead students into thinking that the computer is really evaluating their style and diction when, really, it is only manipulating symbols according to a simplistic scheme programmed by humans who know little about these aspects of student texts. Moreover, Style, Diction, and other "linguistic aids" assume the user has the knowledge to correct the flagged "errors" or to choose the "appropriate" words when almost always he or she does not.
Dobrin's criticisms of the inadequacies of WRITER'S WORKBENCH and similar programs are well taken; no sane person should rely totally on style checkers for the assessment of text. But I think Dobrin misses the heuristic value of such programs when they are used in appropriate instructional contexts. Kiefer and Smith's (1983) initial use of WRITER'S WORKBENCH in a composition course placed it firmly within the course's context of sound teaching. Teachers who supply concepts of statistical text analysis along with numerical ranges of performance that might be expected, or even actual patterns from the class's writings, allow their students to conduct self-assessment and provide them with goals for improvement. This strategy may be most applicable in advanced composition courses; at the first-year or basic writing level, teacher or peer tutoring can be assisted by the use of text analysis programs. In this context, tutors can act as interpreters of the program's results, explaining more fully the often cryptic remarks programs make to justify their flagging of an error or questioning of word choice. I have used the checking routines that drive the VAX GRAMMAR CHECKER and micro-based CORRECT GRAMMAR programs to tutor students in the identification of sentence boundaries. Unlike WRITER'S WORKBENCH, these programs flag possible fragmentary or run-on sentences. Although they seemed to be "correct" only about half the time, I could discuss with the student each sentence flagged and reinforce or increase his or her knowledge of standard sentence patterns in spite of the program's lack of linguistic sophistication.
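To suggest the kind of boundary checks these programs attempt (and why they are "correct" only about half the time), here is a toy heuristic; the cue lists are my own illustration, not the logic of GRAMMAR CHECKER or CORRECT GRAMMAR:

```python
import re

PRONOUNS = {"i", "he", "she", "it", "we", "they", "you"}
SUBORDINATORS = {"because", "although", "when", "since", "while", "if"}

def flag_boundary_problems(s: str):
    """Return a list of (possibly wrong) diagnoses, in the spirit of
    the sentence-boundary checkers described above."""
    flags = []
    words = re.findall(r"[a-z']+", s.lower())
    # A bare comma followed by a pronoun subject often signals a splice.
    for m in re.finditer(r",\s*([A-Za-z']+)", s):
        if m.group(1).lower() in PRONOUNS:
            flags.append("possible comma splice at: " + m.group(0).strip())
    # A lone subordinate clause with no main clause may be a fragment.
    if words and words[0] in SUBORDINATORS and "," not in s:
        flags.append("possible sentence fragment")
    return flags
```

Such pattern-based cues inevitably misfire on legitimate constructions, which is exactly why a tutor's interpretation of each flag, as described above, is where the instructional value lies.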
Why must these kinds of text assessment take place on a computer? They can of course be done with paper and pencil, but that would fail to take advantage of the power of the computer as heuristic tool. In addition, the opportunity for cooperation between student and mentor would be missed. Dobrin (1989) acknowledges, and many more of us have recognized, that the supportive relationship formed between teacher and student as they both work on displayed text is one of the most important advantages of computer-mediated instruction. As better programs become available--especially in the area of well-informed and helpful feedback to users--text-analysis software will continue to make a contribution to the text-based dimension of writing assessment (see Collins, 1989, for a discussion of several of these programs and the description of a teaching-oriented style checking program under development at SUNY-Buffalo, WRITING TEACHER'S TOOLBOX).
Another promising use of computers for comprehensive writing evaluation can be seen in programs that enable researchers to gather information in order to study how individuals compose. This type of investigation has been popular since the ground-breaking work of Linda Flower and John Hayes theoretically described fundamental cognitive processes by using protocols that captured writers' thoughts expressed aloud on tape as they composed. The gathering of records on writers composing by computer, however, offers a further advantage in that the writer's process is uninterrupted by the oral expression of thought. Thus, as software is available to preserve and reproduce each keystroke, the researcher can analyze how the text came into existence and the writer can see and re-experience his or her composing.
Such an application of computers to writing research and evaluation has been attempted in several recent and ongoing research projects (Hawisher, 1989). With assistance from their computer center staff at the University of Minnesota, Sirc and Bridwell-Bowles (1988) designed and implemented a terminate-and-stay-resident program to capture all keystrokes from a WORDSTAR editing session, including time intervals between strokes. The record was then put to several uses: played back for the writer, it served as the basis for a retrospective oral protocol without having interfered with the original writing process at all; it allowed researchers to study what actually happened during the emergence of a text through subsequent stages; and it allowed the development of a teaching strategy that modeled revision dynamically by showing it happen on the screen.
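The essential data structure behind such a capture program is easy to suggest in outline. This in-process sketch is only an analogy to Sirc and Bridwell-Bowles's terminate-and-stay-resident tool, which hooked the keyboard at the operating-system level beneath WORDSTAR; the class and method names are mine:

```python
import time

class KeystrokeRecorder:
    """Log every keystroke with the interval since the previous one,
    so a session can later be replayed for a retrospective protocol
    or for dynamic modeling of revision on screen."""
    def __init__(self):
        self.log = []          # (char, seconds since previous stroke)
        self._last = None

    def key(self, ch, now=None):
        """Record one keystroke; 'now' may be injected for testing."""
        now = time.monotonic() if now is None else now
        delta = 0.0 if self._last is None else now - self._last
        self.log.append((ch, delta))
        self._last = now

    def replay(self):
        """Reconstruct the emerging text after each stroke (backspace
        handling kept deliberately simple)."""
        text, states = "", []
        for ch, _ in self.log:
            text = text[:-1] if ch == "\b" else text + ch
            states.append(text)
        return states
```

Because the log preserves timing as well as characters, a replay can show not just what changed but the pauses around each change, which is what makes the record usable both for research and for prompting the writer's own retrospective commentary.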
Another, more powerful scheme for studying and evaluating writers' cognitive processes is under development by John Smith and colleagues at the University of North Carolina (Smith & Lansman, 1989). Smith is developing three programs as part of his WRITING ENVIRONMENT (WE) system that keep a record of each action the writer takes in planning, organizing, writing, or editing a text. One program gathers the raw protocols. Another replays the session to collect retrospective protocols from the writer. The third parses all protocols with a "grammar" that categorizes them in terms of a symbol system describing the writer's activities in five levels of abstraction, ranging from keystrokes at the lowest level to broad cognitive functions (for example, exploring or organizing) at the top. The parsing program is especially promising as a tool for analyzing process because it reduces the large volume of collected data to manageable proportions. It also enables consistent logging of a whole group's data and study of individual writers over time.
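The bottom layer of such a parsing program, mapping raw session events to broader cognitive categories so that a long log collapses to manageable proportions, might be suggested like this; the event names and category table are placeholders of my own, not Smith and Lansman's actual grammar:

```python
from collections import Counter

# Hypothetical mapping from low-level logged actions to broad
# cognitive functions of the kind the WE parser assigns.
CATEGORY = {
    "keystroke": "writing",
    "delete": "editing",
    "move_node": "organizing",
    "create_node": "exploring",
    "open_outline": "planning",
}

def summarize_protocol(events):
    """Collapse a list of low-level event names into counts per
    broad cognitive function (unrecognized events go to 'other')."""
    return Counter(CATEGORY.get(e, "other") for e in events)
```

A summary of this kind also makes it straightforward to log a whole group consistently and to compare an individual writer's profiles over time, as noted above.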
These recording techniques have significant potential for building our understanding of writing by giving researchers access to a wide range of processes. Right now, this use of computers can empower individuals to assess their own writing, with or without the aid of teacher or tutor, through means that go beyond mere text analysis. Eventually, an individual's electronic composing record may serve as the basis for an assessment, at least in part, of her achievement in a writing task.
Thus far, we have seen that computers are being used to assess writers' knowledge by focusing on individuals' texts and the cognitive processes that produce them. The third dimension of the model proposed by Faigley et al. (1985) involves assessment of the whole class as a social context for writing. In order to discover more about how writing classes work as discourse communities, Faigley endorses a research approach based on ethnographic study, that is, the open-ended collection and study of information about a group of writers as a "culture" in order to discover how and why its members communicate. Understanding how writers express and build meaning within their group amounts to assessing a variety of texts in many ways, with a broadly descriptive purpose. Ultimately, Faigley's whole model aims at constructing a teaching approach out of ongoing assessment of the classroom community. This kind of research-based teaching is strongly supported by a networked computer writing environment, one of the most recent and promising developments in computer technology applied to writing instruction. Networks assist this process in several ways.
In research-oriented assessment, ethnographic and textual data
such as journals, questionnaires, and essay drafts can be easily
gathered through the network's facilities for file transfer and
communication, especially e-mail (asynchronous communication).
In networks fully featured with real-time conferencing systems
(synchronous communication), group interactions can be studied
both first hand, as group conferencing is taking place, and textually,
through the recorded transcripts of conferences. A major difference
in the networked classroom is that the researcher would be able
to observe unobtrusively, as a participant on the network, in
a classroom where much of the activity takes place on-line. Also,
most of the student data from the class will be in text rather
than oral form, which both encourages written expression and does
away with the need for cumbersome transcriptions.
In teaching-centered assessment, full textual communication via network means that students' writing is readily available to others for any purpose, such as reviewing and commenting on topics, prewriting, and drafts. My own general composition students have used comments made by peers and by me for assessing the success of their drafts and for planning revision; network-based handling of texts has made that process more efficient and attractive than exchanging handwritten drafts. Synchronous conferencing adds another dimension of assessment options. For example, using synchronous conferencing as a medium for class discussions of short stories in literature-based composition classes, I have been able to gather a much better impression of the extent of the whole class's understanding of characters and themes; all students generally participate in on-line discussions, in contrast with typical literature discussions in conventional settings where only a few--and the same--students are normally heard from. In addition, by virtue of the Daedalus system's functions, I have a printed transcript of the network session, which allows me to later review a particular discussion in detail.
Other conferencing activities provide various assessment opportunities.
Sentence combining exercises, which my classes have done in small
subgroups, let me see students' syntactical strengths or weaknesses
while the INTERCHANGE program affords students practice as they
generate sentence forms on-line. "Collaborative argumentation,"
that is, the collaborative invention and negotiation of arguments
for or against a proposition, allows me to find out quickly which
students are grasping the process and which are not, so that I
can work individually with those needing help.
Perhaps the most novel and exciting dimension of networking in writing instruction is the encouragement it gives to self-assessment. When writing for a teacher alone, the habitual goal of student writers (especially first-year students) has been to produce texts free from spelling and grammar mistakes and imbued with that ineffable quality that strikes just the right chord with the teacher: the infamous "what the teacher wants" ingredient. No wonder students are generally mystified at how their writing is assessed and cannot appropriate the process for themselves in effective revision. But when they get a sense of writing as communicating with people like themselves, which is fostered by network-based activities, they begin to be able to assess their writing more realistically and successfully. They can see, immediately in some cases, the effect a statement, a paragraph, or an essay has had on others and can adjust their ideas and words to improve the effect. Teachers can stay involved with this self-assessment and/or revision process, but in network contexts--especially through synchronous conferencing--we can be a much more real audience because we are in a dynamic, peer-like relationship with all participants. Our presence is first known as a fellow discussant or a kind of electronic coach, not as a "corrector" and "grader." This kind of self-evaluating, audience-based revision process is one that many of us already try to develop in our students, but I have found that a networked computer-writing environment strongly supports it by enabling writers to encounter more vital audiences while downplaying the false audience of the teacher-assessor.
Finally, one of the most significant aspects of computer networks for assessing and teaching is their comprehensiveness. Individual machines, or "nodes," on the classroom network can still operate as they have done, running programs for individual users, or they can all run the same program, putting all nodes in touch with each other. For assessment, this implies that all dimensions of the model supported by computer-based techniques--the text-based, the cognitive, and the social--can work together in optimum fashion in the full-featured network environment. Characteristics of individual texts, protocols of cognitive processes, and records of class interactions can all be generated, studied, and assessed; only the organizing purpose and plans are needed to coordinate and optimize teaching and assessment activities.
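To make "characteristics of individual texts" concrete, the sketch below is a hypothetical illustration in Python (it is not any of the programs cited in this article, such as the Bell Laboratories software). It computes the sort of surface features--word and sentence counts, average sentence length, and vocabulary variety--that early computerized text-analysis programs reported back to writers:

```python
import re

def surface_stats(text):
    """Compute simple surface features of a text: word count,
    sentence count, average sentence length, and type-token
    ratio (a rough measure of vocabulary variety)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }

sample = "The cat sat. The cat slept. Then the dog barked loudly."
print(surface_stats(sample))
```

Features like these say nothing by themselves about the quality of a draft, which is why such figures are best treated as one data source among the textual, cognitive, and social records the network makes available.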
We have seen, then, that computers already support aspects of
the writing assessment enterprise, although the technology might
be applied far more extensively as the concept of assessment expands.
Computers are pervasive in indirect testing, where distributing,
collecting, scoring, and keeping track of large amounts of data
depend on their speed and capacity. But in spite of advancements
in computer-based testing techniques, the kind of testing computers
can do, within a multiple-choice or short-answer format, has limited
usefulness or appeal for most writing teachers and students. Current
trends in writing assessment, responding to the awareness of the
importance of process in composing, have begun to supply us with
assessment techniques and purposes more congruent with our teaching
goals; we are now better able to reinvest the results of assessment
back into teaching and research. Through text analysis, electronic
protocols, networking, and other functions, computer technology
has begun to supply us with significant tools to broaden and deepen
the practice of writing assessment as its theory continues to
develop.
W. Webster Newbold teaches in the Department of English at Ball State University in Muncie, Indiana.
Collins, J. L. (1989). Computerized text analysis and the teaching
of writing. In G. E. Hawisher & C. L. Selfe (Eds.), Critical
perspectives on computers and composition instruction. New
York: Teachers College Press.
Conlon, G. (1986). "Objective" measures of writing ability.
In K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.),
Writing assessment: Issues and strategies (pp. 109-125).
New York: Longman.
Cooper, C. R. (1977). Holistic evaluation of writing. In C. R.
Cooper & L. Odell (Eds.), Evaluating writing: Describing,
measuring, judging (pp. 3-30). Urbana, IL: NCTE.
Diederich, P. B. (1966). How to measure growth in writing ability.
English Journal, 55, 435-449.
Dobrin, D. (1989). Writing and technique. Urbana, IL: National
Council of Teachers of English.
Faigley, L., Cherry, R. D., Jolliffe, D. A., & Skinner, A.
M. (1985). Assessing writers' knowledge and processes of composition.
Norwood, NJ: Ablex.
Godshalk, F. I., Swineford, F., & Coffman, W. E. (1966). The
measurement of writing ability. New York: College Entrance
Examination Board.
Hawisher, G. E. (1989). Research and recommendations on computers
and composition. In G. E. Hawisher & C. L. Selfe (Eds.), Critical
perspectives on computers and composition instruction. New
York: Teachers College Press.
Kiefer, K. E., & Smith, C. R. (1983). Textual analysis with
computers: Tests of Bell Laboratories' computer software. Research
in the Teaching of English, 17, 201-214.
Lloyd-Jones, R. (1977). Primary trait scoring. In C. R. Cooper
& L. Odell (Eds.), Evaluating writing: Describing, measuring,
judging (pp. 33-68). Urbana, IL: NCTE.
Lunsford, A. A. (1986). The past--and future--of writing assessment.
In K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.),
Writing assessment: Issues and strategies. New York: Longman.
Mosenthal, P. (1983). On defining writing and classroom writing
competence. In P. Mosenthal, L. Tamor, & S. Walmsley (Eds.),
Research on writing: Principles and methods (pp. 26-71).
New York: Longman.
Page, E. B. & Paulus, D. H. (1968). The analysis of essays
by computer. Washington, DC: U.S. Department of Health,
Education, and Welfare, Office of Education, Bureau of Research.
Sarvela, P. D., & Noonan, J. V. (1988, May). Testing and computer-based
instruction: Psychometric considerations. Instructional Technology,
Sirc, G., & Bridwell-Bowles, L. (1988, May). A computer tool
for analyzing the composing process. Collegiate Microcomputer,
Smith, J. B. & Lansman, M. (1989). A cognitive basis for a
computer writing environment. In B. K. Britton & S. M. Glynn,
(Eds.), Computer writing environments: Theory, research, &
design (pp. 17-56). Hillsdale, NJ: Lawrence Erlbaum.
Stone, P. J., Dunphy, D. C., Smith, M. S., & Ogilvie, D. M.
(1966). The General Inquirer: A computer approach to content
analysis. Cambridge, MA: MIT Press.
Weiss, D. J. (Ed.). (1983). New horizons in testing: Latent
trait test theory and computerized adaptive testing. New
York: Academic Press.
White, E. M. (1985). Teaching and assessing writing: Recent
advances in understanding, evaluating, and improving student performance.
San Francisco: Jossey-Bass.
White, E. M. (1989). Developing successful college writing
programs. San Francisco: Jossey-Bass.