In a 1983 survey by the National Testing Network in Writing (NTNW),
more than two-thirds of responding institutions used writing samples
to assess their incoming students' proficiency in composition.
Half of these tests (53%) relied solely on writing samples.
Researchers believe grades and multiple-choice writing tests fail
to measure writing competence, arguing that,
Multiple-choice tests cannot measure the skills that most writing teachers identify as the domain of composition: inventing, revising, and editing ideas to fit purpose and audience within the context of suitable linguistic, syntactic, and grammatical forms. (Greenberg, 1986, p. xiv)
However, shortcomings have plagued placement testing (White, 1986):
Tests of writing are volatile, subjective instruments: Writers have little time or help with prewriting, composing, and revising. Students may need special knowledge or be required to analyze a text. The tests seldom give cues to help students shape their responses (Ruth & Murphy, 1988). In addition, even well-trained readers fatigue rapidly (Fader, 1986, p. 86). Many readers disagree sharply regarding student uses of diction, convention, or genre (Hake, 1986, p. 158). Finally, both the human and administrative costs of placement tests are high (Greenberg, 1986, p. xi). Whatever their acknowledged virtues, then, even rigorously controlled placement exams achieve dubious reliability and have substantial problems.
As these shortcomings indicate, present methods of rating placement tests fit H. A. Simon's (1960) definition of an unstructured task. Trained readers must rate a 200- to 400-word essay in two or three minutes. Of necessity, they rely heavily on heuristics, logic, trial and error, intuition, and common sense for this task.
To date, only two studies, both of which were conducted at Colorado State University, have used a computerized style checker, WRITER'S WORKBENCH (WWB), to measure essay quality. Stephen Reid and Gilbert Findlay correlated WWB analyses with essay quality, as measured by graders' holistic scores (1986). James Garvey and David Lindstrom later used WWB analyses to compare student writing with professional prose (1989). A year later, at the University of South Carolina-Aiken, I demonstrated how RIGHTWRITER (RW), a computerized style checker, could improve both the students' and my performance in the classroom. My study indicates that, by using RW's comments and my own "macro messages" to guide revisions of their papers, students improved their awareness of genre, topic, and purpose. This improvement is reflected in better RW indexes for readability, sentence length, and strength.
I decided to use RW to design a computerized procedure that simulated the ratings of placement tests by trained readers. The theory of Management Information Systems (MIS) labels this technique a Structured Decision System (SDS). My purpose was to design a prototype SDS to replace impressionistic human evaluations of placement tests with automatic, reliable, and inexpensive ratings. Once created and validated, an SDS carries out decision-making processes with no role for human discretion or error.
My design of a prototype SDS had a specific objective related
to the use of a computerized style checker in a college composition
course. By using RW's stylistic indexes to rate placement tests,
my SDS could place students in appropriate writing courses more
accurately and efficiently than trained readers could. College
writing programs could use the SDS to rate placement samples automatically
without subjecting their faculty to onerous training and grading
exercises. The RW analyses cannot interpret essay content, of
course, or evaluate an essay's general effectiveness. However,
RW's blindness to content cancels reader bias, and its precise
measurement of stylistic flaws and virtues can be linked to traits
such as fluency, completeness, and coherence.
Initially, I obtained a representative sample of 46 placement essays written by incoming first-year students at the University of Utah in the fall of 1990. The writing program at the University of Utah provides its placement test readers with written Criteria for Rating Placement Essays. These criteria base ratings of quality on the sensitivity with which student writers respond to audience, topic and purpose. Each fall, the Utah writing program asks about 2,000 incoming first-year students to respond to a set topic meant to assess their writing competence. Students are asked to write a division-and-classification essay supported by examples and reasons. They are given 45 minutes to describe a situation that disturbs them; they are also told to explain what changes they want to see made, and then draw conclusions about how people respond generally to unpleasant situations. They may use a dictionary and handbook.
Readers then score the essays on a 4-point scale, placing students
in basic remedial, remedial, regular composition, and advanced
composition courses. The Criteria define holistic standards
for each placement level: the ability to link the topic to readers;
to control the subject to support a point; and to address structure,
syntax, diction, and mechanics. The readers, mostly college faculty
and some high school writing teachers, use the Criteria
and a set of ranked anchor essays to standardize their ratings.
In addition, the Utah writing program has learned to estimate
the percentages within which the ratings will fall, as shown in Table 1.
I typed both the anchor essays and the representative samples
into WORDPERFECT 5.0, analyzed them all with RW, and entered the
RW counts and indexes into a specially designed QUATTRO spreadsheet.
Because I initially wanted to find out which stylistic traits
correlated most closely with the ratings of the Utah anchor exams,
I checked for all eight RW counts and indexes:
All eight RW analyses correlate wholly or partly with the 1-4 ratings of the Utah anchor exams, as charted in Table 2. Two of the RW stylistic measurements closely parallel all four ratings. As the syllable count rises with the 1-4 ratings, the percentage of unique words falls. That is, these traits correlate positively and negatively, respectively, with the quality of anchor papers. The accompanying graph illustrates these trends (see Figure 1).
Table 2. RW Counts and Indexes for the 1-4 Rated Utah Anchor Exams. (Percentage of unique words by rating, 1 through 4: 66.1, 51.4, 48.6, 31.6.)
Figure 1. Global Indicators in Utah Placement Exams.
Figure 2. Paper Length in Utah Placement Exams.
Four other RW stylistic measurements track the most important 2-4 ratings. Total words and percentage of prepositions both rise in step with them, a positive correlation. The accompanying graph reveals the close link between paper length and quality of anchor exams (see Figure 2). On the other hand, average sentence length and RW's "descriptiveness" index both drop as ratings rise, a negative correlation. In addition, high and low readability levels set apart the best "4" rated samples and the worst "1" rated samples from the anchor pool. The advanced composition paper achieves nearly tenth-grade readability (9.7) on the Flesch-Kincaid scale. The basic remedial sample barely reaches a fourth-grade level (4.44). Finally, the "1" and "3" rated papers register higher "strength" ratings than the "2" and "4" rated papers, respectively, limiting the scope of this stylistic indicator.
Therefore, these correlations pass the "black box" test
(Murdick, 1986, p. 61), in that, regardless of any theoretical
link between stylistic traits and the holistic scores of anchor
exams, the outputs change regularly and predictably with the inputs.
That is, the dependent variables (the holistic ratings) closely
parallel the independent variables (RW's stylistic measurements).
Although these correlations can help rank samples, they cannot
reliably sort them: The RW stylistic measurements appear as points
on scales. They provide no ceilings or floors on numerical scales
to divide different ratings from one another.
To rank the 46 representative Utah placement exams by quality, I drew upon rhetorical and readability research, both experimental and theoretical. To sort papers by assigning ratings, I used Utah's apportionment of representative exams by percentage (as presented in Table 1). I based my quality ranking of the Utah placement essays on essay length. This choice is strongly confirmed by the link between this measurement and the holistic rankings of the 2-4 Utah anchor exams. Experimental and theoretical research also singles out fluency as the primary indicator of writing excellence, particularly in timed exams.
In their correlation of WWB analyses with holistic ratings, Reid
and Findlay linked essay length most closely to the quality of
writing, both statistically and rhetorically:
The longer essays correlate significantly with quality writing because they demonstrate development within paragraphs, structural completeness, and scribal fluency (the skill of keeping the pen on the page, keeping the flow of prose going). (1986, p.12)
Writing teachers intuitively share Ruth and Murphy's view that
"the primary aim of the testing . . . is to see not only
how much but how well the student can write in response to the
topic" (1988, p.278). However, Thomas and Donlan (1982)
identified the number of words as the variable most highly correlated
with essay quality, regardless of the grade level of the writers.
Gordon Brossell has also found that "the longer an essay
was in a 200- to 400-word range, the likelier it was to get a
higher score [mostly because of] the amount of information presented
in the topics" (1986, p. 173).
To rate (or sort) the 46 Utah samples, I applied Frederick Taylor's theory of management by exception: To set standards, look first at exceptional (very good or very bad) performance. The "3" rated papers, which assign writers to regular first-year composition, comprise by far the greatest segment in any representative sample: about 60%. Therefore, I first extracted the two weakest papers (3% of the sample), those requiring basic remedial writing. Attachment B charts the ranking and sorting of all 46 samples. The papers coded UTAH18 and UTAH44 managed a meager 112 and 150 total words, respectively. By comparison, the shortest "2" rated essay achieved 167 words. The dividing line between basic and regular remedial essays falls neatly between 150 and 167 total words: ≤ 160 words.
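This first "management by exception" pass can be sketched in a few lines of Python. The counts for UTAH18 and UTAH44 are those just cited; the other file names and counts are illustrative, not the actual Utah data.

```python
# Extract basic remedial ("1" rated) papers on word count alone.
# UTAH18 and UTAH44 counts come from the article; the rest are made up.
essays = {"UTAH18": 112, "UTAH44": 150, "UTAH07": 167, "UTAH23": 412}

BASIC_REMEDIAL_CEILING = 160  # dividing line: 160 words or fewer

basic_remedial = [name for name, words in essays.items()
                  if words <= BASIC_REMEDIAL_CEILING]
remaining = {name: words for name, words in essays.items()
             if words > BASIC_REMEDIAL_CEILING}

print(basic_remedial)  # ['UTAH18', 'UTAH44']
```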
Moving from the extremely weak to the extremely strong end of the quality scale, I extracted seven papers for tentative "4" ratings (15% of the sample). However, a fluency floor of ≥ 499 words is suspect. At this point, a meager 3-word margin separates the shortest "4" paper from the longest "3" paper (at 496 words). This negligible distinction requires other stylistic measurements to confirm or revise the use of number of words to sort the strongest "4" samples from the weakest "3" samples.
To confirm or override fluency as the sole dividing line between the "3" and the "4" rating, I adopted two other RW stylistic measurements: high syllable count and low percentage of unique words. To confirm a "4" rating, I placed the floor for average syllables per word at ≥ 1.45; the ceiling for unique words rises no higher than ≤ 50%. On this basis, one paper on each side of the fluency dividing line exchanged ratings: UTAH19's rating rises to a "4," while UTAH10's rating drops to a "3." This adjustment also lowers the initial dividing line based on fluency by three words: A floor of ≥ 496 words now separates the advanced composition papers from those of writers assigned to regular composition.
Like the ranking criterion--essay length--use of these sorting criteria is warranted by both experimental and theoretical research. However, the significance of RW's stylistic measurements--syllable length and percentage of unique words--needs an explanation. Readability theory calls short, Anglo-Saxon terms "function" words. Users pick up these function words by speaking and hearing them often. On the other hand, readability theory applies the term "content" words to long, polysyllabic terms derived from Latin and Greek. These content words convey meaning, and users learn them systematically, usually from print (Gilliland, 1972, pp. 60-61).
Thus, as a writer increasingly relies on a more learned vocabulary, the percentage of unique "function" words drops, and the number of syllables per word rises. This change in style reflects an increasingly rich vocabulary derived more from reading than from speech. Therefore, measurements of syllable count and percentages of unique words are partly redundant. The following QUATTRO formula automatically extracts "4" rated papers from a representative group: [CELL]>=496#AND#[CELL]>=1.45#AND#[CELL]<=.50. The formula's three conditions test fluency, average number of syllables, and percentage of unique words, respectively. It extracts seven "4" rated papers from the representative essays, well within Utah's 12-16% range for placement in advanced composition.
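For readers without QUATTRO, the same extraction rule can be written as a Python predicate. The three thresholds are those derived above; the sample measurements passed to it are invented for illustration.

```python
# Python equivalent of the QUATTRO "4" extraction formula:
# fluency AND syllable floor AND unique-word ceiling must all hold.
def rates_four(total_words, syllables_per_word, pct_unique):
    """True if an essay meets all three criteria for a "4" (advanced) rating."""
    return (total_words >= 496          # fluency floor
            and syllables_per_word >= 1.45   # syllable floor
            and pct_unique <= 0.50)          # unique-word ceiling

print(rates_four(520, 1.51, 0.47))  # True: all three criteria met
print(rates_four(520, 1.41, 0.47))  # False: fails the syllable floor
```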
Reid and Findlay's study finds significant correlations between
these two indexes and holistic ratings, especially for the best writers:
[In] impromptu essays, the overall weight of longer words, indicating a mature lexicon, increases essay quality . . . [At the same time] these middle- and upper-range writers can manipulate abstractions better than can the more basic writers. Conversely, the vague word percentage . . . is highest for the low group and lowest for the high group. (1986, p. 14, 17)
As paper length rises, the proportion of unique words falls. The correlation between the paper length of the Utah samples and their percentage of unique words is a high 59.6%. Moreover, the Utah Criteria strongly favor a literate style. They endorse an awareness of "differences between the requirements for oral and written language" (p. 2). They also favor a polysyllabic style which "moves toward precision and abstraction" (p. 5).
After these extractions, a pool of 37 papers remains, ranked by fluency. Setting the ceiling at ≤ 284 words reliably extracts eight "2" rated essays (16% of the total group of samples). No other stylistic measurements need be applied. By default, the remaining 29 papers rated "3" (for regular composition) comprise 63% of the total sample group. This percentage closely approaches Utah's estimate for regular composition placements (60%).
The length of the "3" rated papers ranges from a low
of 290 words to a high of 506; they average 390 words with a standard
deviation of 67.6 words. By comparison, the advanced composition
papers use longer words on the average than the regular composition
papers: 1.53 compared with 1.43. The advanced composition samples
also use a smaller average percentage of unique words, 46.96%
as opposed to 48.48% for the regular composition essays. To the
reader these differences appear slight. However, a holistic scorer
would notice writers' uses of precise or unusual diction. The
advanced composition samples also achieve an average readability
level nearly a grade and a half higher than the regular composition
papers. And the best writers use a higher average sentence length--by
nearly a word per sentence. Reid and Findlay found high correlations
between both these measures and holistic scores. Readability--though
not sentence length--also sets the weaker "3" rated
Utah anchor exams apart from the stronger "4" rated anchor exams.
This prototype SDS reliably ranks and sorts all 46 representative Utah samples on the basis of paper length alone--with the exception of the "4" rated advanced composition placements. Minor adjustments must be made to placements on either side of the dividing line between the "3" rating and the "4" rating. I applied two other stylistic measurements: high syllable counts and a low percentage of unique words. When this SDS is applied to the Utah anchor exams, the SDS's sorting criteria reliably match the holistic ratings of the "2," "3," and "4" rated anchor papers. However, one further set of adjustments is required--to the extreme low end of the quality scale. At 287 words, the "1" rated anchor exam far exceeds the ≤ 160-word ceiling separating basic remedial papers from the representative pool. Yet this weak anchor paper uses very short words and a very high percentage of unique words--1.2 average syllables and 66.1% unique words, respectively. These traits group the weak anchor paper with the two basic remedial papers extracted from the representative pool. Thus, these two additional traits must be incorporated in the criteria for extracting basic remedial papers.
To sum up, this prototype SDS evolves rather elegantly from Taylor's
theory of management by exception. Fluency alone cannot extract
papers with extremely low and high ratings: Adjustments must
be made. For the weakest basic remedial papers, low syllable
averages and high use of unique words must be brought into
play. To identify the strongest advanced composition papers,
high syllable averages and low use of unique words confirm
the sorting. The formulas below are listed
in the order of their application:
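Applied in order, the rules can be sketched as a single Python function. The thresholds are those worked out in the preceding sections; the essay measurements in the sample calls are invented for illustration.

```python
# The prototype SDS as an ordered rule sequence. Thresholds come from the
# article; the sample measurements below are illustrative only.
def place(words, syllables, pct_unique):
    """Return a 1-4 placement rating for one essay's RW measurements."""
    # 1. Basic remedial: very short, very plain, or very "oral" vocabulary.
    if words <= 160 or syllables <= 1.2 or pct_unique >= 66.0:
        return 1
    # 2. Advanced composition: long, polysyllabic, low share of unique words.
    if words >= 496 and syllables >= 1.45 and pct_unique <= 50.0:
        return 4
    # 3. Remedial vs. regular composition on fluency alone.
    return 2 if words <= 284 else 3

print(place(112, 1.20, 66.1))  # 1: basic remedial
print(place(250, 1.35, 55.0))  # 2: remedial
print(place(400, 1.43, 48.5))  # 3: regular composition
print(place(510, 1.53, 47.0))  # 4: advanced composition
```

Because the basic remedial rule runs first and joins its three tests with OR, it also catches the short-worded, high-unique-word anchor paper that length alone would misplace.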
The ratings assigned by this prototype SDS to the Utah placement exams reflect face and construct validity. That is, the ratings of the representative group of samples are consistent with holistic ratings of the Utah anchor exams. A growing body of experimental and theoretical research also confirms their validity. However, the concurrent validity of this SDS needs to be established. How consistent are its ratings with test scores--the verbal sections of the ACT or SAT, for example? Second, how much predictive validity does this SDS have? That is, how well do its scores predict the grades earned in first-year writing courses? Finally, this prototype needs to be tested by rating all placement exams administered for a given period and school.
An SDS for rating placement exams offers significant gains, both monetary and non-monetary. Its monetary gains can be easily estimated. The greatest expense involves typing handwritten essays into a word processing package. A typist capable of 120 words a minute could type a 45-minute placement exam in 2-5 minutes; analyzing a sample with RW takes no more than a minute or two and could easily be automated. Thus, creating each style-checked file need cost no more than $.90 to $1.00. Permitting or requiring students to write placement exams on computer could reduce this expense significantly. The rest of the work--transferring data to a spreadsheet, ranking, and sorting--can be fully computerized.
By comparison, one administration of the English Composition Test asked 85,000 students to each write 20-minute essays; each essay was scored twice. Gertrude Conlan estimates the cost of scoring the batch of essays at $500,000. Thus, rating each paper costs $5.88 (1986, p. 111). Conlan's estimate leaves out the administrative costs of recording grades, etc. Longer placement exams would drive the grading costs much higher.
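Conlan's per-paper figure follows directly from her totals:

```python
# Per-paper scoring cost, from Conlan's estimates for one administration
# of the English Composition Test.
total_cost = 500_000   # estimated scoring cost in dollars
papers = 85_000        # essays scored (each read twice)

print(round(total_cost / papers, 2))  # 5.88
```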
In addition, whatever database is used, ranking and sorting the RW data involves some initial one-time programming. However, most colleges and universities have already committed themselves to computerization for research and classroom purposes--word processing for writing classes and spreadsheet programs for accounting, for example. The development and operation of an SDS for placement testing require only modest additional outlays. Although such programming is expensive, it is straightforward, uses existing hardware and software, and need be done only once. In MIS terms, the one-time programming costs are limited. Data transfers easily between programs, and the process can rely on inexpensive batch processing rather than costlier online data processing. Then too, operators can run batch processing with minimal training. A computerized SDS justifies itself by cheaply replicating ratings reached by trained readers.
However, a placement-rating SDS offers more than improved accuracy and economy. It offers significant non-monetary gains for future college students and their high school teachers, for the instructors of college writing courses, and for researchers. It does so by increasing the value of placement exams as information. As Erika Lindemann points out, placement exams are now valued for administrative use only (1987, p. 203). Neither the writers nor their instructors get any feedback from them. A fully validated SDS could shift high school writing instruction away from impressionistic, localized criteria of writing quality toward more precise measures.
In addition, SDS data could give college writing teachers accurate profiles of their incoming students' writing strengths and weaknesses, individually and by class. My article, "A Decision Support System for Improving First-Year Writing Courses," suggests ways instructors might use stylistic analyses of placement exams in their writing courses. Reliable and valid ratings of their incoming students' initial writing skills would give colleges a base point to assess their students' writing ability as they move toward graduation.
Finally, the quantified indexes of the quality of students' writing
products would help researchers in composition. No longer would
student essays . . . [sit] in millions of computers [while] many thousands of teachers are trying to determine how best to take advantage of this opportunity for writing analysis. (Wresch, 1988, p. 16)
However, the significant benefits of a placement exam SDS must
be balanced against faculty resistance to technological change.
Before committing themselves to computerized evaluation of placement
exams, a writing faculty must make several difficult decisions:
They must agree on the type of exam, criteria for a rating system,
representative samples, the apportionment of students among writing
course levels, and other matters. The scarcity of working models
of such programs reflects the difficulties of these tasks. A
writing faculty needs to be experienced and comfortable with a
range of computerized word processing, error and style checkers,
reformatters, and grading utilities. Only then could they realistically
expect to make the conceptual leaps required by such a program.
Emil Roy teaches in the Department of English,
University of South Carolina-Aiken.
Brossell, G. (1986). Current research and unanswered questions
in writing assessment. In K. L. Greenberg, H. S. Wiener, &
R. A. Donovan (Eds.), Writing assessment: Issues and strategies
(pp. 168-182). New York: Longman.
Conlan, G. (1986). "Objective" measures of writing
ability. In K. L. Greenberg, H. S. Wiener, & R. A. Donovan
(Eds.), Writing assessment: Issues and strategies (pp.
109-125). New York: Longman.
Cooper, C. R. & Odell, L. (Eds.). (1977). Evaluating
writing: Describing, measuring, judging. Urbana, IL: National
Council of Teachers of English.
Fader, D. (1986). Writing samples and virtues. In K. L. Greenberg,
H. S. Wiener, & R. A. Donovan (Eds.), Writing assessment:
Issues and strategies (pp. 79-92). New York: Longman.
Garvey, J. J. & Lindstrom, D. H. (1989). Pro's prose meets
writer's workbench: Analysis of typical models for first-year
writing courses. Computers and Composition, 6(2),
Gilliland, J. (1972). Readability. London: Hodder & Stoughton.
Hake, R. (1986). How do we judge what they write? In K. L.
Greenberg, H. S. Wiener, & R. A. Donovan (Eds.), Writing
assessment: Issues and strategies (pp. 153-167). New York: Longman.
Harrison, C. (1980). Readability in the classroom. Cambridge,
England: Cambridge University Press.
Klare, G. R. (1963). The measurement of readability.
Ames, IA: Iowa State University Press.
Lindemann, E. (1987). A rhetoric for writing teachers
(2nd ed.). New York: Oxford University Press.
Murdick, R. C. (1986). MIS concepts and design (2nd ed.).
Englewood Cliffs, NJ: Prentice Hall.
Neilsen, L. & Piche, G. (1981). The influence of headed
nominal complexity and lexical choice on teachers' evaluation
of writing. Research in the Teaching of English, 15,
Nold, E. & Freedman, S. (1977). An analysis of readers'
responses to essays. Research in the Teaching of English,
Reid, S. & Findlay, G. (1986). Writer's workbench analysis
of holistically scored essays. Computers and Composition,
Roy, E. A. (19??). Decision support system for improving first-year
writing courses. Computer-Assisted Composition Journal,
Ruth, L. & Murphy, S. (1988). Designing writing tasks
for the assessment of writing. Norwood, NJ: Ablex.
Simon, H. A. (1960). The new science of management decisions.
New York: Harper & Row.
Thomas, D. & Donlan, D. (1982). Correlations between
holistic and quantitative methods of evaluating student writing,
grades 4-12. Washington, DC: GPO (ERIC Document Reproduction
Service No. ED 211 976).
University Writing Program (1989). Criteria for rating placement
essays. Unpublished manuscript. University of Utah, University
Writing Program, Salt Lake City.
White, E. M. (1986). Pitfalls in the testing of writing. In
K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.), Writing
assessment: Issues and strategies (pp. 53-78). New York: Longman.
Wresch, W. (1988). Six directions for computer analysis of student
writing. Computing Teacher, 15(7), 13-16.
|Note: Standard: <=160 #OR# <=1.2 #OR# >=66%|
|% of total||17.39%|
|Note: Standard: <=284 (Sort "2"-Rated from "3"-Rated)|
|% of total|
|Note: Standard: >=285 (Sort "3"-Rated from "2"-Rated)|
|% of total||15.22%|
|Note: Standard: >=496 #AND# >=1.45 #AND# <=50% (Sort "4"-Rated from "3"-Rated)|