learnt by tremendous efforts, but not elaboration of the exact actual
knowledge of the student (that, unfortunately, does not exist at all).
Moreover, there is an even more disastrous case, when the student has
cheated and copied his/her neighbour’s work. Apart from the above, there
are other factors that can lead to a poor performance on the test (a
sleepless night, various personal and health problems, etc.).
However, very often the test itself can cause the students to fail it.
Following linguists such as Hughes (1989) and Alderson (1996), we can
state that there are two main causes of a test being inaccurate:
. Test content and techniques;
. Lack of reliability.
The first one means that the test’s design should correspond to what is
being tested. First, the test must contain the exact material that is to be
tested. Second, the activities, or techniques, used in the test should be
adequate and relevant to what is being tested. This means they should not
frustrate the learners but, on the contrary, help the students write the
test successfully.
The second one means that one and the same test given at different times
must yield the same scores. The results should not differ merely because of
the shift in time. For example, the test cannot be called reliable if the
scores obtained the first time the students took the test differ from those
obtained the second time, although the knowledge of the learners has not
changed at all. Furthermore, reliability can suffer from the improper
design of a test (unclear instructions and questions, etc.) and from the
way it is scored. The teacher may evaluate various students differently,
taking different aspects into consideration (the level of the students,
participation, effort, and even personal preferences). If there are two
markers, there will almost certainly be two different evaluations, for each
marker will apply his/her own criteria when marking and evaluating one and
the same work. For example, consider testing speaking skills. Here one of
the markers will probably treat grammar as the most significant point to be
evaluated, whereas the other will place more emphasis on fluency. Sometimes
this could lead to arguments between the markers; nevertheless, we should
never forget that the main figure we have to deal with is still the
student.
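The two sources of unreliability described above, scores that change
between sittings and markers who disagree, can be illustrated with a small
sketch. It is only an illustration under stated assumptions: the Pearson
correlation is one common way to compare two sittings and the mean absolute
difference is one simple way to compare two markers, but neither is
prescribed by Hughes or Alderson, and all names and scores below are
invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to exist")
    return cov / sqrt(var_x * var_y)


def mean_abs_difference(marks_a, marks_b):
    """Average gap between two markers' marks for the same pieces of work."""
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("need two non-empty lists of the same length")
    return sum(abs(a - b) for a, b in zip(marks_a, marks_b)) / len(marks_a)


# Hypothetical data: the same five students sit the same test twice.
first_sitting = [12, 15, 9, 18, 14]
second_sitting = [13, 15, 10, 17, 14]

# Hypothetical data: two markers grade the same four speaking performances.
marker_a = [7, 5, 8, 6]
marker_b = [5, 6, 8, 4]

# A value close to 1 suggests the two sittings rank the students alike.
test_retest = pearson(first_sitting, second_sitting)

# A large average gap suggests the markers apply different criteria.
marker_gap = mean_abs_difference(marker_a, marker_b)
```

A correlation near 1 between the two sittings would be consistent with a
reliable test, while a large average gap between the markers would suggest
that they need to agree on common marking criteria before scoring.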
2.2. Validity
Now we can come to one of the important aspects of testing: validity.
According to Hughes, every test should be reliable as well as valid. Both
notions are crucial elements of testing. However, according to Moss (1994),
there can be validity without reliability, and sometimes the border between
these two notions can blur. Apart from these elements, a good test should
be efficient as well.
According to Bynom (Forum, 2001), validity deals with what is tested and
the degree to which a test measures what it is supposed to measure (Longman
Dictionary, LTAL). For example, if we test the students’ writing skills by
giving them a composition test on Ways of Cooking, we cannot call such a
test valid, for it can be argued that it tests not their ability to write
but their knowledge of cooking. It is certainly very difficult to design a
test with good validity; therefore, the author of the paper believes that
it is essential for the teacher to know and understand what validity
really is.
According to Weir (1990:22), there are five types of validity:
. Construct validity;
. Content validity;
. Face validity;
. Washback validity;
. Criterion-related validity.
Weir (ibid.) states that construct validity is a theoretical concept that
involves other types of validity. Further, quoting Cronbach (1971), Weir
writes that to construct or plan a test one should research the testee’s
behaviour and mental organisation. This is the ground on which the test is
based; it is the starting point for constructing test tasks. In addition,
Weir presents Kelly’s idea (1978) that test design requires some theory,
even if the exposure to it is only indirect. Moreover, if we are able to
define the theoretical construct at the beginning of the test design, we
will be able to use it when dealing with the results of the test. The
author of the paper assumes that a test constructed appropriately from the
beginning will not cause any difficulties in its administration and scoring
later.
Another type of validity is content validity. Weir (ibid.) suggests that
content validity and construct validity are closely bound and sometimes
even overlap. Speaking about content validity, we should emphasise that it
is an indispensable element of a good test. What is meant is that the
duration of the classes or the test time is usually rather limited, and if
we teach a rather broad topic such as “computers”, we cannot design a test
that would cover all aspects of that topic. Therefore, to check the
students’ knowledge we have to choose from what was taught, whether it was
specific vocabulary or various texts connected with the topic, for it is
impossible to test the whole material. The teacher should not pick out
tricky items that were either mentioned only once or not discussed in the
classroom at all, even though they belong to the topic. S/he should not
forget that the test is not a punishment or an opportunity for the teacher
to show the students that they are less clever. Hence, we can state that
content validity is closely connected with the specific material that was
taught and is supposed to be tested.
Face validity, according to Weir (ibid.), does not concern theory or
sample design. It is how the examinees and administrative staff see the
test: whether it appears construct and content valid or not. This will
certainly involve debates and discussions about a test; it will involve the
teachers’ cooperation and the exchange of their ideas and experience.
Another type of validity to be discussed is washback validity, or
backwash. According to Hughes (1989:1), backwash is the effect of testing
on the teaching and learning process. It can be either negative or
positive. Hughes believes that if the test is considered to be a
significant element, then preparation for it will occupy most of the time
and other teaching and learning activities will be ignored. As far as the
author of the paper is concerned, this is already a habitual situation in
the schools of our country, for our teachers are faced with centralised
exams and all they have to do is prepare their students for them. Thus, the
teacher starts concentrating purely on the material that could be
encountered in the exam papers, drawing on examples taken from past exams.
Therefore, numerous interesting activities are left behind; the teachers
are concerned just with the result and forget about different techniques
that could be introduced and later used by their students to make dealing
with the exam tasks easier, such as guessing from the context, applying
schemata, etc.
The problem arises when the objectives of the course taught during the
study year differ from the objectives of the test. As a result we will have
negative backwash, e.g. the students were taught to write a film review,
but during the test they are asked to write a letter of complaint, which,
unfortunately, the teacher has neither planned nor taught.
Negative backwash is often caused by inappropriate test design. Further in
his book, Hughes speaks about multiple-choice activities that are designed
to check the writing skills of the students. The author of the paper is
puzzled by that, for it is hard to imagine how essay writing could be
tested with the help of multiple-choice items. When testing an essay, the
teacher is first of all interested in the students’ ability to express
their ideas in writing: how it has been done, what language has been used,
whether the ideas are supported and discussed, etc. For this purpose the
multiple-choice technique is highly inappropriate.
Nevertheless, according to Hughes, apart from the negative side of
backwash there is positive backwash as well. It could be the creation of an
entirely new course designed especially to help the students pass their
final exams. The test given in the form of final exams compels the teacher
to re-organise the course and choose appropriate books and activities to
achieve the set goal: passing the exam. Further, he emphasises the
importance of partnership between teaching and testing. Teaching should
meet the needs of testing. This could be understood to mean that teaching
should correspond to the demands of the test. However, this is rather
complicated work, for, to the knowledge of the author of the paper, the
teachers in our schools are not supplied with specially designed materials
that could assist them in preparing the students for the exams. The
teachers are just given vague instructions and are free to act on their
own.
The last type to be discussed is criterion-related validity. Weir
(1990:22) holds that it concerns the link between the test scores and a
second measure of performance: either an older established test or a
future criterion performance. The author of the paper considers that this
type of validity is closely connected with criterion and evaluation the