Friday, 3 October 2014


A.    Background
The five principles of practicality, reliability, validity, authenticity, and washback go a long way toward providing useful guidelines for both evaluating an existing assessment procedure and designing one on your own. Quizzes, tests, final exams, and standardized proficiency tests can all be scrutinized through these five lenses.
Are there other principles that should be invoked in evaluating and designing assessments? The answer, of course, is yes. Language assessment is an extraordinarily broad discipline with many branches, interest areas, and issues. The process of designing effective assessment instruments is far too complex to be reduced to five principles. Good test construction, for example, is governed by research-based rules of test preparation, sampling of tasks, item design and construction, scoring responses, ethical standards, and so on. But the five principles cited here serve as an excellent foundation on which to evaluate existing instruments and to build your own.
We will look at how to design tests in Chapter 3 and at standardized tests in Chapter 4. The questions that follow here, indexed by the five principles, will help you evaluate existing tests for your own classroom. It is important for you to remember, however, that the sequence of these questions does not imply a priority order. Validity, for example, is certainly the most significant of the cardinal principles of assessment evaluation. Practicality may be a secondary issue in classroom testing. Or, for a particular test, you may need to place authenticity as your primary consideration. When all is said and done, however, if validity is not substantiated, all other considerations may be rendered useless.

B.     Problem Formulation
1.      Are The Test Procedures Practical?
2.      Is The Test Reliable? 
3.      Does The Procedure Demonstrate Content Validity?
4.      Is The Procedure Face Valid And “Biased For Best”?
5.      Are The Test Tasks As Authentic As Possible?
6.      Does The Test Offer Beneficial Washback To The Learner?

A.    Are The Test Procedures Practical?
Practicality is determined by the teacher’s (and the student’s) time constraints, costs, and administrative details, and to some extent by what occurs before and after the test. To determine whether a test is practical for your needs, you may want to use the checklist below.
Practicality checklist:
1.      Are administrative details clearly established before the test?
2.      Can students complete the test reasonably within the set time frame?
3.      Can the test be administered smoothly, without procedural “glitches”?
4.      Are all materials and equipment ready?
5.      Is the cost of the test within budgeted limits?
6.      Is the scoring/evaluation system feasible in the teacher’s time frame?
7.      Are methods for reporting results determined in advance?
As the checklist suggests, after you account for the administrative details of giving a test, you need to think about the practicality of your plans for scoring it. In teachers’ busy lives, time often emerges as the most important factor, one that overrides other considerations in evaluating an assessment. If you need to tailor a test to fit your own time frame, as teachers frequently do, you need to accomplish this without damaging the test’s validity and washback. Teachers should, for example, avoid the temptation to offer only quickly scored multiple-choice selection items that may be neither appropriate nor well designed. Everyone knows teachers secretly hate to grade tests (almost as much as students hate to take them!) and will do almost anything to get through that task as quickly and effortlessly as possible. Yet good teaching almost always implies an investment of the teacher’s time in giving feedback (comments and suggestions) to students on their tests.

B.     Is The Test Reliable? 
Reliability applies to both the test and the teacher, and at least four sources of unreliability must be guarded against, as noted in the second section of this chapter. Test and test administration reliability can be achieved by making sure that all students receive the same quality of input, whether written or auditory. Part of achieving test reliability depends on the physical context, making sure, for example, that:
*      every student has a cleanly photocopied test sheet,
*      sound amplification is clearly audible to everyone in the room,
*      video input is equally visible to all,
*      lighting, temperature, extraneous noise, and other classroom conditions are equal (and optimal) for all students, and
*      objective scoring procedures leave little debate about the correctness of an answer.
Rater reliability, another common issue in assessments, may be more difficult to achieve, perhaps because we too often overlook it as an issue. Since classroom tests rarely involve two scorers, inter-rater reliability is seldom an issue. Instead, intra-rater reliability is of constant concern to teachers: what happens to our fallible concentration and stamina over the period of time during which we are evaluating a test? Teachers need to find ways to maintain their concentration and stamina over the time it takes to score assessments. In open-ended response tests, this issue is of paramount importance, because it is easy to let mentally established standards erode over the hours you require to evaluate the test. Intra-rater reliability for open-ended responses may be enhanced by the following guidelines:
*      Use consistent sets of criteria for a correct response.
*      Give uniform attention to those sets throughout the evaluation time.
*      Read through tests at least twice to check for your consistency.
*      If you have made “mid-stream” modifications of what you consider a correct response, go back and apply the same standards to all.
*      Avoid fatigue by reading the tests in several sittings, especially if the time requirement is a matter of several hours.

C.    Does The Procedure Demonstrate Content Validity?
The major source of validity in a classroom test is content validity: the extent to which the assessment requires students to perform tasks that were included in the previous classroom lessons and that directly represent the objectives of the unit on which the assessment is based. If you have been teaching an English language class to fifth graders who have been reading, summarizing, and responding to short passages, and if your assessment is based on this work, then, to be content valid, the test needs to include performance in those skills. There are two steps to evaluating the content validity of a classroom test.
1. Are classroom objectives identified and appropriately framed?
Underlying every good classroom test are the objectives of the lesson, module, or unit of the course in question. So the first measure of an effective classroom test is the identification of objectives. Sometimes this is easier said than done. Too often teachers work through lessons day after day with little or no awareness of the objectives they seek to fulfill. Or perhaps those objectives are so poorly framed that it is impossible to determine whether or not they were accomplished. Consider the following objectives for lessons, all of which appeared on lesson plans designed by students in teacher preparation programs:
a.       Students should be able to demonstrate some reading comprehension.
b.      To practice vocabulary in context.
c.       Students will have fun through a relaxed activity and thus enjoy their learning.
d.      To give students a drill on the /i/-/ɪ/ contrast.
e.       Students will produce yes / no questions with final rising intonation.
Only the last objective is framed in a form that lends itself to assessment. In (a), the modal should is ambiguous and the expected performance is not stated. In (b), anyone can fulfill the act of “practicing”; no standards are stated or implied. For obvious reasons, (c) cannot be assessed. And (d) is really just a teacher’s note on the type of activity to be used. Objective (e), on the other hand, includes a performance verb and a specific linguistic target. Once acceptable and unacceptable levels of performance are specified, the goal can be tested. An appropriate test would elicit an adequate number of samples of student performance, have a clearly framed set of standards for evaluating the performance (say, on a scale of 1 to 5), and provide some sort of feedback to the student.
2. Are lesson objectives represented in the form of test specifications?
The next content-validity issue that can be applied to the classroom test centers on the concept of test specifications. Don’t let this term scare you. It simply means that a test should have a structure that follows logically from the lesson or unit you are testing. Many tests have a design that divides them into a number of sections (corresponding, perhaps, to the objectives that are being assessed), offers students a variety of item types, and gives an appropriate relative weight to each section.
Some tests, of course, do not lend themselves to this kind of structure. A test in a course in academic writing at the university level might justifiably consist of an in-class written essay on a given topic: only one “item” and one response, in a manner of speaking. But in this case the specs (specifications) would be embedded in the prompt itself and in the scoring or evaluation rubric used to grade it and give feedback. We will return to the concept of test specs in the next chapter.
The content validity of an existing classroom test should be apparent in how the objectives of the unit being tested are represented in the content of items, clusters of items, and item types. Do you clearly perceive the performance of test-takers as reflective of the classroom objectives? If so, and you can argue this, content validity has probably been achieved.

D.    Is The Procedure Face Valid And “Biased For Best”?
This question integrates the concept of face validity with the importance of structuring an assessment procedure to elicit the optimal performance of the student. Students will generally judge a test to be face valid if:
*      directions are clear,
*      the structure of the test is organized logically,
*      its difficulty level is appropriately pitched,
*      the test has no “surprises”, and
*      timing is appropriate.
A phrase that has come to be associated with face validity is “biased for best”, a term that goes a little beyond how the student views the test to a degree of strategic involvement on the part of student and teacher in preparing for, setting up, and following up on the test itself. According to Swain (1984), to give an assessment procedure that is “biased for best”, a teacher offers students appropriate review and preparation for the test, suggests strategies that will be beneficial, and structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.
It’s easy for teachers to forget how challenging some tests can be, and so a well-planned testing experience will include some strategic suggestions on how students might optimize their performance. In evaluating a classroom test, consider the extent to which before-, during-, and after-test options are fulfilled.
Test-taking strategies
Before the Test:
1)      Give students all the information you can about the test: Exactly what will the test cover? Which topics will be most important? What kind of items will be on it? How long will it be?
2)      Encourage students to do a systematic review of material. For example, they should skim the textbook and other material, outline major points, and write down examples.
3)      Give them practice tests or exercises, if available.
4)      Facilitate formation of a study group, if possible.
5)      Caution students to get a good night’s rest before the test.
6)      Remind students to get to the classroom early.

During the Test:
1)      After the test is distributed, tell students to look over the whole test quickly in order to get a good grasp of its different parts.
2)      Remind them to mentally figure out how much time they will need for each part.
3)      Advise them to concentrate as carefully as possible.
4)      Warn students a few minutes before the end of the class period so that they can finish on time, proofread their answers, and catch careless errors.

After the Test:
1)      When you return the test, include feedback on specific things the student did well, what he or she did not do well, and, if possible, the reasons for your comments.
2)      Advise students to pay careful attention in class to whatever you say about the test results.
3)      Encourage questions from students.
4)      Advise students to pay special attention in the future to points on which they are weak.
Keep in mind that what comes before and after the test also contributes to its face validity. Good class preparation will give students a comfort level with the test, and good feedback (washback) will allow them to learn from it.

E.     Are The Test Tasks As Authentic As Possible?
Evaluate the extent to which a test is authentic by asking the following questions:
*      Is the language in the test as natural as possible?
*      Are items as contextualized as possible rather than isolated?
*      Are topics and situations interesting, enjoyable, and/or humorous?
*      Is some thematic organization provided, such as through a story line or episode?
*      Do tasks represent, or closely approximate, real-world tasks?
Consider the following two excerpts from tests, and the concept of authenticity may become a little clearer.
The sequence of items in the contextualized tasks achieves a modicum of authenticity by contextualizing all the items in a story line. The conversation is one that might occur in the real world, even if with a little less formality. The sequence of items in the decontextualized tasks takes the test-taker into five different topic areas with no context for any. Each sentence is likely to be written or spoken in the real world, but not in that sequence. Given the constraints of a multiple-choice format, on a measure of authenticity I would say the first excerpt is “good” and the second excerpt is only “fair”.

F.     Does The Test Offer Beneficial Washback To The Learner?
The design of an effective test should point the way to beneficial washback. A test that achieves content validity demonstrates relevance to the curriculum in question and thereby sets the stage for washback. When test items represent the various objectives of a unit, and/or when sections of a test clearly focus on major topics of the unit, classroom tests can serve in a diagnostic capacity even if they aren’t specifically labelled as such.
Other evidence of washback may be less visible from an examination of the test itself. Here again, what happens before and after the test is critical. Preparation time before the test can contribute to washback, since the learner is reviewing and focusing in a potentially broader way on the objectives in question. By spending classroom time after the test reviewing the content, students discover their areas of strength and weakness. Teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort. The key is to play down the “Whew, I’m glad that’s over” feeling that students are likely to have, and play up the learning that can now take place from their knowledge of the results.
Some of the “alternatives” in assessment referred to in Chapter 1 may also enhance washback from tests. (See also Chapter 10.) Self-assessment may sometimes be an appropriate way to challenge students to discover their own mistakes. This can be particularly effective for writing performance: once the pressure of assessment has come and gone, students may be able to look back on their written work with a fresh eye. Peer discussion of the test results may also be an alternative to simply listening to the teacher tell everyone what they got right and wrong and why. Journal writing may offer students a specific place to record their feelings, what they learned, and their resolutions for future effort.
The five basic principles of language assessment were expanded here into six essential questions you might ask yourself about an assessment. As you use the principles and the guidelines to evaluate various forms of tests and procedures, be sure to allow each one of the five to take on greater or lesser importance, depending on the context. In large-scale standardized testing, for example, practicality is usually more important than washback, but the reverse may be true of a number of classroom tests. Validity is, of course, always the final arbiter. And remember, too, that these principles, important as they are, are not the only considerations in evaluating or making an effective test. Leave some space for other factors to enter in.
In the next chapter, the focus is on how to design a test. These same five principles underlie test construction as well as test evaluation, along with some new facets that will expand your ability to apply principles to the practicalities of language assessment in your own classroom.


Content validity: assessment requires students to perform tasks that were included in previous lessons and that represent the objectives of the unit on which the assessment is based.
Face validity: students will judge a test to be face valid if:
*      directions are clear
*      the structure of the test is organized logically
*      its difficulty level is appropriately pitched
*      the test has no “surprises”
*      timing is appropriate
Biased for best: a term that goes a little beyond how the student views the test to a degree of strategic involvement on the part of student and teacher in preparing for, setting up, and following up on the test.
Washback: the design of an effective test should point the way to beneficial washback. A test that achieves content validity demonstrates relevance to the curriculum in question and sets the stage for washback.

