

Title

 

Looking at data from an interactive arithmetic test from the perspective of a probabilistic model

 

Introduction

 

Since the publication of Probabilistic Models (Rasch, 1960), a growing number of researchers have advocated the use of probabilistic or stochastic models in the analysis of data from psychometric tests (Sijtsma, 1993).  Such researchers usually focus on data from untimed, static tests.  The analysis of data from speeded tests usually follows the traditional deterministic methodology (Furneaux, 1960; White, 1982; Roskam, 1987).

 

This paper refers to the chapters in Probabilistic Models which deal with the rate at which events occur.  It examines whether a similar analysis might be applied to the scoring rate from an interactive arithmetic test (IT), and it follows the reasoning of Rasch in testing whether actual data from the IT fit the model he designed for the analysis of timed oral reading rates.

 

Theory

 

A problem in telephony

 

Rasch (1960) introduces the model by describing a problem in telephony.  At the beginning of the twentieth century, when the first telephone exchanges were being constructed, designers wanted a model which would predict, within a certain confidence range, the maximum number of calls which would be made during any period.  According to Rasch, it was a Danish mathematician, A. K. Erlang, who first devised such a model in 1917.

 

The model assumes that calls are initiated at random and independently of one another.  It defines the number of calls initiated per unit of time as the calls intensity.  The model divides time into very short periods, so short that no more than one call would ever be initiated during any one of them.  Over N such periods, the calls intensity is assumed to remain constant.

 

The probability of a call being initiated during one of the short time periods is defined as q.  The probability of a particular number of calls, a, being initiated during N short periods is given by the binomial distribution:

                                        p(a)   =       [N!/(a!(N-a)!)] q^a (1-q)^(N-a)

Rasch argues that since the intervals are very short, so that N is very large and q is very small, the binomial distribution may be replaced by the Poisson distribution:

                                        p(a)   =       (Nq)^a exp(-Nq)/a!

Rasch defines the total time taken during N short periods as t, during which the average number of calls initiated will be Nq.  Then from the definition of calls intensity l:

 

                                        l      =       Nq/t

or

                                        lt     =       Nq

 

Substituting lt for Nq in the formula for the Poisson distribution gives:

                                        p(a)   =       (lt)^a exp(-lt)/a!

So the probability of any particular number of calls being initiated during time t becomes a function of the average call intensity l and the time t, while the length and number of short intervals have dropped out of the equation.
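As a rough numerical check of this argument, the following Python sketch compares the binomial probability of a calls in N short intervals with the Poisson probability expressed in terms of l and t.  The intensity of 2 calls per minute, the period of 3 minutes and the function names are illustrative assumptions, not values taken from Rasch.

```python
import math


def binomial_pmf(a, n, q):
    """Probability of exactly a calls in n short intervals, each with call probability q."""
    return math.comb(n, a) * q**a * (1 - q) ** (n - a)


def poisson_pmf(a, rate, t):
    """Probability of exactly a calls in time t at a constant calls intensity `rate`."""
    mean = rate * t
    return mean**a * math.exp(-mean) / math.factorial(a)


rate, t = 2.0, 3.0  # assumed: 2 calls per minute, observed for 3 minutes

# Divide the same period into ever shorter intervals; q shrinks as n grows.
for n in (10, 100, 10_000):
    q = rate * t / n
    print(n, round(binomial_pmf(4, n, q), 4))

print("Poisson", round(poisson_pmf(4, rate, t), 4))
```

As the number of intervals grows, the binomial figures should settle towards the Poisson figure, which depends only on the product of l and t.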

 

Application of the model to a speeded oral reading test

 

Rasch applies essentially the same model to the oral reading of a text of 200 words.  He assumes the text to be homogeneous, in the sense that difficult and easy words are distributed evenly through the text.  He also assumes that students read at the same average speed throughout; they do not slow down as a result of tiredness towards the end.  Then the probability that the student reads exactly a words in time t is given by the formula for the Poisson distribution, and is a function of the average words per minute or reading rate l, and t.

 

Application of the model to an interactive arithmetic test

 

A detailed description of the interactive arithmetic test (IT) which forms the focus of this paper is not necessary to the argument.  All that needs to be said is that it is possible to design a test which generates a homogeneous stream of questions and to find a student who tackles the questions in a consistent way.  There is, however, one fundamental difference between the metric of the test and Rasch's calls intensity and reading rate.  The calls intensity relates to the initiation of a call; nothing is said about the content of the call.  The reading rate relates to the reading of a word; nothing is said about the accuracy with which the word is read.  The metric of the IT is the scoring rate; this relates to the delivery of a correct answer.  It is a momentary event, just like the initiation of a call or the reading of a word, but it might be regarded as the compound of two other events: tackling a question, and delivering the correct answer to a question tackled.

 

There are two reasons why this should not affect the argument.  In the first place, for many students using the IT, it took on the characteristics of what was once called a speed test (Furneaux, 1960).  For these students a correct answer was submitted for almost every question attempted, so the scoring rate was almost the same as the answer submission rate.  In the second place, and more importantly, the two events may quite correctly be regarded as a single event which reflects a compound probability.  In any short interval of time there is a probability that a question may be answered, and for every question answered there is a probability that the answer will be correct.  Combining the two will give the probability that a correct answer will be submitted in any short interval of time.
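The compounding of the two probabilities can be sketched in a few lines of Python.  The figures of 12 answers per minute and 95% accuracy, the one-second interval and the function names are assumptions made purely for illustration.

```python
def correct_answer_probability(p_answer, p_correct):
    """Probability that a correct answer is submitted in one short interval."""
    return p_answer * p_correct


def scoring_rate(answer_rate, accuracy):
    """Correct answers per unit time for a given answer rate and accuracy."""
    return answer_rate * accuracy


# Assumed figures: 12 answers per minute, 95% of answers correct.
print(scoring_rate(12.0, 0.95))

# With one-second intervals the chance of an answer in any interval is 12/60.
print(correct_answer_probability(12.0 / 60.0, 0.95))
```

When accuracy is close to 1, as in a speed test, the scoring rate is almost the same as the answer rate.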

 

So we might imagine an IT comprising 50 homogeneous questions being tackled by a student in a consistent way.  We may assume that the student works at the same average speed throughout and that his or her accuracy does not change during the test.  Then the probability that the student submits exactly a correct answers in time t is given by the formula for the Poisson distribution, and is a function of the average scoring rate l, and t.

 

 

Personal and test parameters

 

Suppose two students, A and B, sit the same test and suppose student A scores at twice the rate of student B in a test labelled Test 1, then:

 

                                        lA1   =       2lB1

 

Now suppose the same students sit a whole series of tests and that student A scores at twice the rate of student B in every subtest.  We can then say generally:

 

                                        lAi    =       2lBi

Dividing:

                                lAi/lA1      =       lBi/lB1

 

In other words the ratio of scoring rates on the two subtests has a value which is independent of the students used to compute the ratio.  If v is any student:

 

                                lvi/lv1       =       ei

or

                                        lvi    =       lv1.ei

 

Rasch defines the term ei as the relative easiness of Test i, and in doing so he is implicitly defining Test 1 as the base or benchmark test.  He also defines lv1, the scoring rate of student v in the benchmark test, as xv, the ability of student v.  He then defines:

 

                                        di      =       1/ei

 

as the difficulty of the test, so:

 

                                        lvi    =       xv/di

 

Thus the scoring rate of student v in test i is proportional to the ability of student v and inversely proportional to the difficulty of test i.  On taking logarithms the relation becomes additive (log lvi = log xv - log di), which is in accordance with the Rasch property of additivity.
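The definitions of easiness, difficulty and ability can be illustrated with a small Python sketch.  The scoring rates below are invented for two hypothetical students, A and B, with A scoring at twice the rate of B on both tests; the function names are likewise assumptions.

```python
# Hypothetical scoring rates (correct answers per minute); "test1" is the benchmark.
rates = {
    "A": {"test1": 20.0, "test2": 10.0},
    "B": {"test1": 10.0, "test2": 5.0},
}


def easiness(student_rates, test, benchmark="test1"):
    """Relative easiness ei = lvi / lv1, computed from one student's rates."""
    return student_rates[test] / student_rates[benchmark]


def predicted_rate(ability, difficulty):
    """Scoring rate lvi = xv / di."""
    return ability / difficulty


# The easiness of test 2 should not depend on which student is used.
e2 = {name: easiness(r, "test2") for name, r in rates.items()}
d2 = 1.0 / e2["A"]         # difficulty of test 2
x_b = rates["B"]["test1"]  # ability of student B = rate on the benchmark test
print(e2, d2, predicted_rate(x_b, d2))
```

With real data the ratios would agree only approximately, and the spread between students gives an informal indication of how well the model holds.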

 

Testing data against the model

 

At an informal level, Rasch plots the reading rates measured from one test (i) against those from another (j) and observes a dispersion of points around a straight line which passes through the origin.  Rasch says very little about the parameters of this chart, which is surprising, because they relate directly to the preceding discussion.  For example, the slope of the straight line is the ratio of the modal reading rates in the two tests.  If lvi is the reading rate of student v in test i and lvj that of student v in test j, then since lvi/lvj has been defined as a constant, the (x,y) coordinates of any point on the line will be given by:

 

                                        y       =       (lvi/lvj) x

or

                                        y       =       (ei/ej) x

or

                                        y       =       (dj/di) x

 

In other words the slope of the line is given by the easiness of test i relative to test j, or the reciprocal of its relative difficulty.  This line defines the relativity between the test parameters.  The actual reading rates will be distributed around this line, but from the definitions given above, the points along the line should be regarded as the modal intersection of two perpendicular distributions.  Associated with any reading rate in test j there should be a distribution of reading rates in test i, but the modal reading rate should lie on the line.  Similarly, associated with any reading rate in test i there should be a distribution of reading rates in test j, but again the modal reading rate should lie on the line.

 

In his formal analysis Rasch does not set out to define the distribution of points in this chart.  Rather he divides the data into two sets for analysis, and discards a third.  He uses the data from the students who did not finish either test within the time limit and from those who finished both tests within the time limit.  He discards the data from those who finished one but not the other.  For those who finished the test he focuses on the time taken.

 

For the purpose of the present paper we shall focus on the conclusions and implications of Rasch's analysis of the results of the students who completed the reading test.  This forces us to switch our focus from the scoring rate in the IT to the question answering rate, and to convert this into a completion time.  The details of the mathematical argument will be omitted.  For anyone interested it is given in Chapter IX Sections 2-5 of the original text (Rasch 1960).

 

To begin, we shall consider the graphical implications of converting question answering rate data into completion time data.  For any question answering rate l the completion time t will be given by:

 

                                        t       =       N/l

 

where N is the number of questions in the test.  If both tests under analysis have the same number of questions, the modal line will be given by:

                                        y       =       (lvj/lvi) x

or

                                        y       =       (di/dj) x

 

This is another straight line passing through the origin, the slope of which is the reciprocal of that comparing the question answering rates of the same tests.
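A short Python sketch makes the reciprocal relationship concrete.  The test length of 50 questions and the answering rates of 10 and 5 questions per minute are assumed values chosen only to illustrate the conversion.

```python
def completion_time(n_questions, answer_rate):
    """Completion time t = N / l."""
    return n_questions / answer_rate


n = 50                      # assumed number of questions in each test
rate_i, rate_j = 10.0, 5.0  # assumed answers per minute on tests i and j

# Slope of the modal line when rates on test i are plotted against test j.
rate_slope = rate_i / rate_j

# Slope of the modal line when completion times are plotted instead.
time_slope = completion_time(n, rate_i) / completion_time(n, rate_j)

print(rate_slope, time_slope)
```

Because N cancels in the ratio of the two completion times, the slope of the time chart depends only on the two rates.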

 

Rasch goes on to take the natural log of the completion times.  Here each point on the chart (tvj,tvi) will be replaced by (log tvj,log tvi).  Since tvi/tvj = lvj/lvi = di/dj, taking logs turns the modal line into a line of unit slope:

                                        y       =       x + log(di/dj)

 

From the mathematical argument it can be shown that the vertical and horizontal distribution of data points around this line approximates to the normal distribution and that the standard deviation is 1/√N.  From this it follows that 95% of the actual data points should be within 1.96/√N of the line.  This hypothesis is very easily tested with real data.
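One way such a test might be carried out is sketched below in Python.  The sketch assumes that the modal line in the log-time chart has unit slope with intercept log(di/dj), which follows from tvi/tvj = di/dj and reduces to y = x when the same test is sat twice.  The completion times are made up, and the function name and its parameters are hypothetical.

```python
import math


def tram_line_counts(times_j, times_i, n_questions, log_difficulty_ratio=0.0):
    """Count points above, within and below the band y = x + c +/- 1.96/sqrt(N).

    x is the log completion time on test j, y the log completion time on
    test i, and c = log(di/dj), which is 0 for two sittings of the same test.
    """
    half_width = 1.96 / math.sqrt(n_questions)
    above = within = below = 0
    for t_j, t_i in zip(times_j, times_i):
        residual = math.log(t_i) - math.log(t_j) - log_difficulty_ratio
        if residual > half_width:
            above += 1
        elif residual < -half_width:
            below += 1
        else:
            within += 1
    return above, within, below


# Made-up completion times in seconds for four students, N = 50 questions.
first_sitting = [100.0, 120.0, 150.0, 200.0]
second_sitting = [105.0, 80.0, 155.0, 300.0]
print(tram_line_counts(first_sitting, second_sitting, 50))
```

Dividing each count by the number of students gives the percentages above and below the band, which can then be compared with the 5% expected to fall outside it.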

 

 

Method

 

Participants

 

The sample comprised 331 primary school children from Years 3 through 7.  The ages of the children ranged from 7 to 12 years at the start of the study, which ran for the whole of the 1996 (Australian) school year.  Whole mixed classes were used to ensure a random selection of children and an even gender distribution.  There were 165 girls in the sample and 166 boys.  The school years were less evenly distributed, with 109 children from Year 3, 63 from Year 4, 103 from Year 5, 42 from Year 6 and 14 from Year 7.

 


Settings

 

Of the five participating schools, two were in the Perth metropolitan area, one located in a high socio-economic status (SES) area and the other in a low to middle SES area.  Three were in the regional city of Kalgoorlie-Boulder, with one located in a high SES area and the other two in low to middle SES areas.

 

In each participating class the IT was administered individually using a single Macintosh computer located either at the back of the classroom, or in a utility area nearby.  In either case, students sitting the IT were subjected to some level of distraction.  Sometimes a break would interrupt a test and the student would continue after the break.  This informality was tolerated in order to maintain goodwill with the teachers and to encourage the maximum possible volume of data from each participating school.  It should be taken into account when interpreting the results.

 

Instrumentation

 

The IT was created specially for this project using HyperCard and saved as a stand-alone application.  It generates a stream of questions of a preset standard, and both records and times student responses to the questions.  The IT comprises eight tests, designed to measure fluency in elementary addition, subtraction, multiplication and division.  To save time, only data from the first two tests are discussed in the current paper.

 

Procedure

 

The researcher was introduced to the class and explained in general terms the purpose and mechanics of the IT.  The researcher then explained in detail to one or two children how to use the IT.  A system of peer tuition was then used to teach the rest.  From this point the IT was administered by the class teacher.  The researcher returned once a week to discuss any problems or possible design improvements with the class teacher, and to collect data.  In some schools the IT was used as part of regular class activities for the entire academic year.  In others a project was run for just a few weeks.

 

 

Results and Discussion

 

Time taken to complete the test

 

We begin by assessing the data from two sittings of the addition test.  In this case both N and l should be the same for both tests.  The modal line should be the line of equality through the origin, and 95% of the data should lie between the tram lines given by:

 

y = x ± 1.96/√N

 

Figure 1 shows a scatter plot of 165 data points collected from two sittings of the addition subtest of the IT.  Analysis of the table behind the chart reveals that 4% of the data points lie above the upper tram line and 16% lie below the lower one, so that 80% lie between the tram lines.  Given the informality of the study described above, this represents a fairly good fit to the model.  Some of the points outside the tram lines are clearly the result of distractions during one of the sittings, but they were left in to preserve the integrity of the data set.

 

Figure 1     Completion times for two sittings of the addition test

 

 

Figure 2 shows a scatter plot of 125 data points collected from two sittings of the subtraction subtest of the IT.  The fit here is excellent.  Just 11 of the 125 participants (9%) fell below the lower line and 3 (2%) came out above the upper line.  Subtraction was the second test in the group, so there was greater familiarity with the test format and perhaps a lower tendency to be distracted.  This test also included a lower proportion of the youngest children, who are more easily distracted and, because they are slower, more likely to be interrupted by a long break such as lunch.

 

Figure 2     Completion times for two sittings of the subtraction test

 

 

 

Conclusion

 

The purpose of this paper was to argue that a mathematical model, developed in the early part of the twentieth century to assist in the design of telephone exchanges and adapted by Rasch for the interpretation of data from timed oral reading tests, has applications in the interpretation of data from an interactive arithmetic test (IT) developed and tested over the last two years.  In the theory section it was shown that the scoring rate from the IT should be expected to fit the Rasch model, and in the practical section it was shown that, in spite of relatively informal conditions, the completion times of primary school children in two sittings of two tests showed a reasonable fit to the model.

 

This opens the way for a formal analysis of scoring rate data in accordance with the Rasch model and the development of a difficulty scale for the components of the IT, using data from the performance of primary school students timed to a level of detail which was not possible without the computer technology applied in the current study.

 

 

References

 

Furneaux WD (1960) Intellectual abilities and problem solving behaviour.  In Eysenck (Ed.) Handbook of abnormal psychology.  New York

Rasch G (1960) Probabilistic models for some intelligence and attainment tests.  University of Chicago Press, Chicago

Roskam EE (1987) Towards a psychometric theory of intelligence.  In Roskam & Suck (Eds) Progress in mathematical psychology.  North Holland

Sijtsma K (1993) Current trends in theories and assessment of intelligence.  In Hamers JHM, Sijtsma K, Ruijssenaars JM (Eds) Learning potential assessment: Theoretical, methodological and practical issues.  Swets & Zeitlinger, Amsterdam

White PO (1982) Some major components of intelligence.  In Eysenck (Ed.) A model for intelligence.  Springer Verlag, Berlin

 

 

 
