This is the author's personal reflection on the characteristics of a good test. It can help teachers recognize some important points that need to be considered when constructing tests for their students.
2016. Meaning of key terms: A test is an assessment intended to measure respondents' knowledge or other abilities. Classroom tests provide teachers with essential information used to make decisions about instruction and student grades (Oncu 1994). A table of specification (TOS) can be used to help teachers frame the decision-making process of test construction and improve the validity of teachers' evaluations based on tests constructed for classroom use. Reliability is the property of a set of test scores that indicates the amount of measurement error associated with the scores. Teachers need to know about reliability so that they can use test scores to make appropriate decisions about their students (Oncu 1994). Reliability is synonymous with consistency: it is the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications (Tekin 1977). Test reliability refers to the consistency of scores students would receive on alternate forms of the same test. Because of differences in the exact content assessed on the alternate forms, environmental variables such as fatigue or lighting, and student error in responding, no two tests will consistently produce identical results (Gay 1985). This is true regardless of how similar the two tests are. In fact, even the same test administered to the same group of students a day later will result in two sets of scores that do not perfectly coincide. Obviously, when we administer…
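To make the notion of reliability concrete, here is a minimal sketch (not taken from the excerpt above) of two common reliability estimates: a test-retest correlation between two administrations of the same test, and Cronbach's alpha for a single administration. All scores, student counts, and item counts are hypothetical.

```python
import numpy as np

# Hypothetical scores for 5 students on the same test given twice (test-retest).
first = np.array([78, 85, 62, 90, 70], dtype=float)
second = np.array([75, 88, 60, 87, 74], dtype=float)

# Test-retest reliability: Pearson correlation between the two administrations.
# Values near 1.0 indicate consistent scores; they never coincide perfectly.
retest_r = np.corrcoef(first, second)[0, 1]

# Internal-consistency reliability (Cronbach's alpha) for one administration:
# rows are students, columns are items (here, 4 items each scored 0-5).
items = np.array([
    [4, 5, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
], dtype=float)
k = items.shape[1]
item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
alpha = (k / (k - 1)) * (1 - item_var / total_var)

print(f"test-retest r = {retest_r:.2f}, Cronbach's alpha = {alpha:.2f}")
```

Both statistics range (in practice) from 0 to 1, with higher values indicating less measurement error in the scores.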
Assessment is a crucial aspect of the teaching and learning processes. According to Nunan (1999), “assessment refers to the tools, techniques, and procedures for collecting and interpreting information about what learners can and cannot do”. Based on that statement, this paper focuses its attention on tests. They are useful tools to evaluate students provided that they are implemented appropriately; that is why their analysis is a critical factor in the assessment process, given that it informs teachers about their instruction, students’ performance, and the curriculum, amongst others. Bailey (1998) states that “when we talk about a test being ‘appropriate,’ the issue is partly whether the test provides us with the information we need to gain about the students we serve.” It is paramount to identify the goals we want our students to reach, where they stand in relation to those goals, and the tools we need to provide them with in order to help them reach higher achievement. Test analysis helps us identify those aspects and adjust our teaching practices so that students succeed. Accordingly, the effectiveness of three subtests will be analyzed, and their validity, reliability, practicality and positive washback (Bailey, 1998, p. 3) will be determined as well. These subtests are part of a single language test that was administered to eleventh graders at a public school. It combines three different constructs, taking into account that students at this grade need to be prepared to face a standardized test, called the Saber Test, which evaluates all learners in the same way. It assesses learners’ English skills, such as their pragmatic, lexical, communicative and grammatical knowledge, and their reading comprehension (ICFES, 2014). The test discussed in this report is a progress test whose stimulus material is based on mythology. This topic was studied in depth during the second term of the academic year, and learners have received complementary knowledge in other classes, such as Social Studies and Spanish. My purpose is to discover whether the test I designed fulfills my students’ needs, expectations and English level or whether, on the contrary, it needs to be modified in order to obtain reliable information about my pupils’ learning process. Additionally, how well the test measures the constructs, and how the constructs relate to each other, will be analyzed; as Brown (2005) states, validity “is the degree to which a test measures what it claims, or purports, to be measuring”.
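As a companion to the analysis described above, the following sketch shows one numerical way the relationship between subtests could be examined: a correlation matrix over subtest scores. The scores, the number of students, and the subtest labels are illustrative assumptions, not data from the actual administration.

```python
import numpy as np

# Hypothetical scores for 6 students on three subtests (columns: reading,
# lexical/grammatical, communicative). Real values would come from the
# actual administration to the eleventh graders.
scores = np.array([
    [14, 12, 13],
    [10,  9, 11],
    [16, 15, 14],
    [ 8,  7,  9],
    [12, 13, 12],
    [15, 14, 16],
], dtype=float)

# Correlations between subtests: moderate-to-high positive values suggest the
# subtests tap related constructs; very low or negative values may signal a
# problem with how one subtest measures its construct.
corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 2))
```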
Giving regular tests to the class is important both to teachers and to students. Tests enable teachers to know what their students can or can't do and, at the same time, give them an idea of how successful their teaching has been. This paves the way for decisions about future class practices. As for students, tests enable them to know whether they are progressing and to "gauge" how well they are progressing. They can discover their own needs as learners and make decisions about their future learning. Different types of tests take place during the period of instruction. These tests and their frequency vary according to the general aim behind administering them. However, particular aims behind testing language components and/or language use may be another raison d'être for specific tests. For a test to be rated "good", it should be valid (content validity; construct validity; face validity), reliable (test reliability; scorer reliability), and it should have an effect on the teaching program(s) that lead(s) to it (washback or backwash effect). When constructing tests, test makers respect the current educational approach(es) adopted in the curricula: if the act of teaching happens communicatively, the act of testing also takes place communicatively. Within this framework, test constructors present contextualized items and authentic language/material based on the principle of interaction, accompanied by instructions and tasks within the reach of the intended population. A learner-centered test, concerned with outcomes and processes and valuing the creative side of the learner's performance (accuracy and fluency), is therefore designed. Here is an evaluative checklist that a teacher can complete after administering a test:
Based on behavioural educational theories, higher learning institutions have been using assessment to measure the quality or success of a taught course and to evaluate whether students have achieved the minimum standard acceptable for being awarded the degree (Ellery, 2008). An assessment can be conducted by means of paper and pencil, presentations, lab work, case studies, essays, multiple choice questions, true/false statements, short essays, etc. During the semester, students may be tested to improve their learning experience; this is called a formative test (continuous assessment), whereas a summative test (final assessment) is done at the end or completion of the course or program. A test can be used to measure students' ability or to determine the basic mastery of skills or competencies acquired during a course. There are several types of tests, such as the placement test, diagnostic test, progress test, achievement test, and aptitude test. A placement test is done to place students in teaching groups or classes so that they are within the same level of ability or competency. A diagnostic test is done to identify students' strengths and weaknesses in a particular course. A progress test is done during the semester to measure the progress of students in acquiring the subject taught. An achievement test is done to determine students' mastery of a particular subject at the end of the semester. An aptitude test, by contrast, is done to determine students' ability to learn new skills or their potential to succeed in a particular academic program. A good assessment should be valid, reliable, and practical. In terms of validity, an assessment should test what it is intended to measure. For example, content validity is when the test items adequately cover the syllabus. A valid assessment measures achievement of the course learning outcomes. In terms of reliability, does the assessment allow the examiners to evaluate it consistently and differentiate between varying levels of performance? In terms of practicality, we need to ensure that the time given to students for their assessments is appropriate. There are two types of tests, objective and subjective. For objective tests, we can choose multiple choice questions, true/false, or fill in the blanks; for subjective tests we can choose either short or long essays. Although there are both objective and subjective tests, I would like to focus on subjective tests (essays) because we use this type most often, especially in final exams. When constructing an assessment, we need to bear in mind the learning objectives of a particular course. Specifically, we need to refer to the course information for the course learning outcomes before constructing the exam questions. In addition, we need to understand Bloom's Taxonomy, or classifications of objectives. The three classifications are cognitive, affective, and psychomotor. The six levels of the cognitive domain are knowledge, comprehension, application, analysis, synthesis, and evaluation. The levels of the affective domain are receiving, responding, valuing, organizing, and characterizing. The psychomotor levels are imitation, manipulation, precision, articulation, and naturalization. I discussed the levels of each domain in detail in the previous issue; thus, in this issue I would like to discuss the cognitive domain, because it is the most frequently used in final exams and we are quite familiar with it.
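To illustrate how course content and the cognitive levels of Bloom's Taxonomy can be combined when planning an exam, here is a minimal sketch of a table of specification (test blueprint). The topics, teaching hours, total marks, and level weights are all assumed for illustration and would normally be taken from the course learning outcomes.

```python
# A minimal table-of-specification sketch, assuming hypothetical topics,
# teaching hours, and a 60-mark final exam. Marks per topic are allocated in
# proportion to instructional time, then split across the targeted cognitive
# levels according to assumed emphasis weights.
topics = {            # topic: hours taught
    "Topic A": 6,
    "Topic B": 9,
    "Topic C": 5,
}
levels = ["knowledge", "comprehension", "application", "analysis"]
level_weights = [0.2, 0.3, 0.3, 0.2]   # assumed emphasis across cognitive levels

total_marks = 60
total_hours = sum(topics.values())

for topic, hours in topics.items():
    topic_marks = round(total_marks * hours / total_hours)
    split = [round(topic_marks * w) for w in level_weights]
    # Rounding may leave a mark or two to redistribute by hand.
    print(topic, topic_marks, dict(zip(levels, split)))
```

A blueprint like this helps ensure the exam's content validity: the items cover the syllabus in proportion to how it was taught and at the intended cognitive levels.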