Interactive Assessment Overview

An interactive test is one in which interaction takes place between the examinee and the examiner. This can be direct face-to-face interaction, as described by Vygotsky (1986) in his procedure for determining the Zone of Proximal Development, or by contemporary researchers such as Guthke (1993) or Hamers and Resing (1993) in outlining their Learning Potential Tests. Active Arithmetic Interactive Assessment Software is designed to work on the same principles, but the interaction is with a computer.

Active Math Components

Active Math focuses on the Number Strand of the Western Australian Math Curriculum. Students select an activity from an extensive tabbed menu, type in their name (or select it from a drop-down list) and click a large button to start the test. When the test begins, a cursor blinks in the answer field, ready for the student to enter the answer from the keyboard.

A student who knows the answer enters it, and either clicks the Check Answer button with the mouse or presses the Return key on the keyboard. A student who does not know the answer might click the Help button and receive a lesson relating to the question on the screen. Alternatively, they might enter a wrong answer and then click Check Answer. In that case the computer clicks the Help button on behalf of the student, thus forcing a lesson. After the lesson, the student is given a second attempt at the same question.

Item difficulty is matched to the skill level of the student, within predefined boundaries. A series of correct answers raises the difficulty level. An incorrect answer lowers the difficulty level.

The computer records the name and age of the student sitting the test, the heading of the test, the components of each item, the response to each item, the time taken to respond to each item, and the time taken in lessons if the Help button is used. In this study, item responses and the use of help were used to calculate raw scores.
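The difficulty adjustment described above can be sketched as follows. This is a minimal illustration and not the Active Math implementation: the class name, the level boundaries, and the number of consecutive correct answers needed to move up (the text says only "a series") are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DifficultyTracker:
    """Hypothetical tracker for the difficulty level of the next item."""
    level: int = 1
    min_level: int = 1      # assumed lower boundary
    max_level: int = 10     # assumed upper boundary
    streak_needed: int = 3  # assumed length of "a series of correct answers"
    streak: int = 0

    def record(self, correct: bool) -> int:
        """Update the level after one response and return the new level."""
        if correct:
            self.streak += 1
            if self.streak >= self.streak_needed:
                # A full series of correct answers raises the level,
                # but never beyond the upper boundary.
                self.level = min(self.level + 1, self.max_level)
                self.streak = 0
        else:
            # A single incorrect answer lowers the level,
            # but never below the lower boundary.
            self.streak = 0
            self.level = max(self.level - 1, self.min_level)
        return self.level


tracker = DifficultyTracker(level=5)
for answer_was_correct in (True, True, True, False):
    tracker.record(answer_was_correct)
```

Clamping with `min` and `max` keeps the level inside the predefined boundaries however long a run of correct or incorrect answers becomes.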
Scoring rates were computed by dividing raw scores by the total time taken. Times were rounded to the nearest second, and the scoring rate was reported as the number of correct answers submitted per minute.

A practical application of the software

The eight links given below are to charts showing distribution curves for the raw scores and scoring rates for the eight subtests of the interactive test used in a study which applied the Active Math software. In the study, primary school students aged 8 to 11 were invited to use the software, and their raw scores and scoring rates were recorded. The results have been written up formally elsewhere. This is just a sketch for a casual reader, who is assumed to be familiar with the basic principles of educational psychometrics.

Each chart shows a similar and rather interesting pattern. The raw score distributions are all negatively skewed. This would normally indicate a test too easy for the population being examined. The scoring rate distributions all show a positive skew. A positive skew in a raw score distribution would indicate a test too difficult for the population being examined. It is therefore quite interesting that tests which generate a negative skew in their raw score distribution generate a positive skew in their scoring rate distribution.

The reason for this apparent contradiction lies in the difference between the variability of the raw scores and that of the scoring rates. Scoring rates are highly variable. Raw scores are not. For a single item the raw score is a dichotomous or Boolean variable: it is either 1 or 0. If a single item is used to assess the abilities of a group of children, the raw score will only divide the children into two groups: those who knew the answer, and those who did not. The scoring rate, on the other hand, will generate a whole spectrum of results for the children who entered a correct answer, ranging from those who did so very quickly to those who did so slowly.
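The scoring-rate calculation described above (raw score divided by total time, with time rounded to the nearest second and the result expressed in correct answers per minute) can be sketched as below. The function name is hypothetical, and rounding exact half-seconds upward is an assumption, since the text does not say how ties were handled.

```python
import math


def scoring_rate(raw_score: int, total_seconds: float) -> float:
    """Return correct answers per minute for a given raw score and time.

    The time is rounded to the nearest whole second first
    (halves round up, which is an assumption).
    """
    seconds = math.floor(total_seconds + 0.5)
    if seconds <= 0:
        raise ValueError("total time must round to at least one second")
    return raw_score * 60 / seconds


# One correct answer entered in two seconds:
print(scoring_rate(1, 2))  # 30.0
```

On a single item the raw score can only be 0 or 1, whereas this function returns a different value for every distinct response time on a correct answer.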
(This idea was discussed more thoroughly in a paper presented to a forum of the Western Australian Institute of Education Research, WAIER.)

By way of illustration, consider an item which requires that 12 be added to 24. A Year 3 student might do the sum on his or her fingers, and take a minute or two to perform the calculation. A Year 5 student might grab a scrap of paper to rewrite the question in vertical format, and come up with an answer in 10 or 20 seconds. A Year 7 student might recognize 24 as the second item in the 12 times table and key in the answer in a couple of seconds. All three students arrive at the answer. The raw score for all three students on the item is 1, and if this is all that is available, as it is in most written tests, then the three students are indistinguishable. Yet the scoring rate generates a clear ordering. The Year 3 student achieved a scoring rate of 0.5 to 1 correct answers per minute (capm). The Year 5 student was in the range of 3 to 6 capm. The Year 7 student achieved 30 capm.

The traditional requirement for a test to include items with a broad range of difficulty levels, and for the difficulty range of a test to match the ability range of the students sitting the test, derives from the limited variability of raw scores. If you are restricted to a measuring device with a Boolean scale, then you need a very large array of such devices to make an accurate measurement, and you need some which closely match what you are trying to measure.

By way of physical analogy, if you want to measure the height of a child and your only measuring device is a collection of sticks with length labels, then you need an array of sticks whose lengths are close to the height of the child in order to make an accurate measurement. A collection of very short sticks would be of limited use. You could only say the child is taller than all of the sticks. A collection of very long sticks would also be of limited use. You could only say the child is shorter than all of the sticks.
To estimate the height of the child you need a collection of sticks, some of which are longer than the child and some of which are shorter.

This restriction does not apply to the scoring rate. If the raw score on a single item is like a measuring stick (with no graduations), the scoring rate is more like a tape measure. When the scoring rate is used to estimate the ability of a group of students, there is no need for the difficulty of the items to match the ability of the students. In fact there is an advantage if all the items are within the ability range of the students, because the continuity of the scoring rate scale stops when an item is answered incorrectly. At an intuitive level, the time taken to compute an answer correctly seems more interesting than the time taken to enter an incorrect answer, and for this study scoring rate has been defined as the number of correct answers entered per minute. The scoring rate is zero for incorrect items, and no distinction is made between students on items answered incorrectly.

Because of the variability of scoring rate, if scoring rate can be used as an indicator of ability, we should expect it to be a more reliable metric than raw scores. And because scoring rate loses its variability on items which are answered incorrectly, we should expect the best results on tests where the items are well within the ability range of participating students. Details included in the linked charts confirm this to be the case. Cronbach's alpha for the scoring rate was .96 for the addition subtest, which the chart suggests is well within the ability range of the participating students. Cronbach's alpha for the scoring rate for the subtraction subtest was also .96. Cronbach's alphas for the scoring rates from the three multiplication subtests were .94, .96 and .96 respectively. Cronbach's alphas for the scoring rates from the three division subtests were .95, .96 and .92.
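For readers who want to see how a reliability coefficient of this kind is obtained, the sketch below applies the standard formula for Cronbach's alpha, k / (k - 1) × (1 - sum of item variances / variance of total scores), to a table of per-item scores. It is not the study's own analysis code: the function name and the sample data are hypothetical, and treating each item's scoring rate (zero for an incorrect answer) as the per-item score is an assumption about how the reported values were computed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one row per student, one column per item.
    Uses sample variances throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")
    # Variance of each item (column) across students.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Variance of each student's total across items.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical per-item scoring rates (capm) for four students on three items;
# a zero marks an item answered incorrectly.
rates = [
    [2.0, 1.5, 0.0],
    [6.0, 5.0, 4.0],
    [12.0, 10.0, 15.0],
    [30.0, 20.0, 25.0],
]
alpha = cronbach_alpha(rates)
```

Alpha approaches 1 when students who are fast on one item tend to be fast on the others, which is the pattern the reported coefficients suggest.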
Distribution Curves

Explanatory Note

These charts show distributions of raw scores calculated by two methods. In Method 1 we ignored the use of the interactive test (IT) help facility. That is to say, we gave full credit for all correct answers, regardless of whether or not the examinee had needed help to arrive at the answer. In Method 2 we gave no credit for correct answers submitted after the voluntary use of help. When an examinee makes a mistake, the IT forces the use of help before offering a second attempt at the same question. Method 2 does give credit for correct answers given on that second attempt.

Try out an interactive test

Sorry, this has been temporarily removed for revision (as at 5 September 2008). It will be reposted shortly.