Identifying potential strength and weakness in key learning areas using data from NAPLAN tests
This report was originally published 10 May 2019.
Federal, state and territory education ministers have agreed that NAPLAN will move online from 2018, with online tests also adopting an adaptive testing format rather than the traditional static testing format. Adaptive testing means that the test automatically adapts to a student’s performance and presents questions that are appropriate for the student’s achievement level. Adaptive testing provides more reliable and accurate information about high and low ability students, as items can be better targeted to challenge and engage students throughout tests, soliciting performance that more accurately reflects students’ underlying abilities. Once all schools have transitioned to online testing, it is also expected that results will be delivered to schools more quickly than with the current paper tests, which means teachers potentially have more relevant test data to tailor their teaching specifically to student needs (ACARA, 2016).
The change of testing mode and format increases the complexity involved in teachers and school leaders appropriately using and interpreting NAPLAN data. There are three main challenges:
- When using results from traditional paper-based tests, NSW teachers and school leaders often gauge the strengths and weaknesses in student performance by comparing the proportion of correct answers for a test item for a school or a class to the average in the state/system, or to that in similar schools. When using results from the adaptive online tests, not all students are exposed to an item, meaning that the proportion of correct answers for an individual item will be less straightforward to interpret.
- Online testing will result in each item being exposed to fewer students in a class; thus, focusing on students’ performance on an individual item is likely to produce less reliable information than previous static tests.
- Item content for the majority of online test items will not be released to teachers and school leaders.1 Lack of visibility into the item content prevents deep analysis of item performance (e.g. distractor analysis2) that is traditionally undertaken by teachers in NSW when they receive NAPLAN data.
This paper details a new method of using NAPLAN test item data to inform teaching and learning. While new for NAPLAN, this method is similar to that used for analysing student performance patterns in Programme for International Student Assessment (PISA) (Yildirim, Yildirim & Verhelst, 2014).
The method represents a shift in the focus of test analysis from an individual item to a learning area or a skillset that is commonly assessed by a group of items. For each test (e.g. Year 3 reading, Year 5 numeracy), the process entails:
- first grouping all test items by the skillsets (or learning areas) the items assess, and then
- examining how students perform on a group of items assessing one common skillset, relative to students’ overall performance in the domain.
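The two steps above can be sketched with toy data. The item IDs, skillset labels and scores below are illustrative, not actual NAPLAN item metadata, and the comparison uses raw proportions only to show the shape of the analysis:

```python
from collections import defaultdict

# Hypothetical item metadata: each test item is tagged with the
# skillset (learning area) it assesses.
item_skillset = {
    "Q1": "spelling", "Q2": "spelling",
    "Q3": "punctuation", "Q4": "punctuation",
    "Q5": "sentence_structure",
}

# Step 1: group all test items by the skillset they assess.
skillset_items = defaultdict(list)
for item, skillset in item_skillset.items():
    skillset_items[skillset].append(item)

# Step 2: compare the group's performance on each skillset's items
# with the group's performance across the whole test. In an adaptive
# test a student only has entries for the items they were served.
scores = {
    "s1": {"Q1": 1, "Q3": 0, "Q5": 1},
    "s2": {"Q2": 1, "Q4": 1, "Q5": 0},
    "s3": {"Q1": 0, "Q3": 1, "Q4": 1},
}

all_scores = [v for s in scores.values() for v in s.values()]
overall_rate = sum(all_scores) / len(all_scores)

for skillset, items in sorted(skillset_items.items()):
    subset = [v for s in scores.values()
              for item, v in s.items() if item in items]
    rate = sum(subset) / len(subset)
    print(f"{skillset}: {rate:.2f} vs overall {overall_rate:.2f}")
```

In the full method the comparison baseline is the model-based expectation given each student's overall ability estimate, not the raw overall correct rate; this sketch shows only the grouping logic and the form of the skillset-level comparison.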
Once we have identified particular skillsets where a group of students performs better or worse than expected under robust measurement models, this information can then be provided to schools to help them identify teaching program strengths or weaknesses. By shifting the analysis focus from individual students on individual items to the performance of a group of students on a set of items, the insight gained from the analysis will be more reliable and accurate. This analysis is referred to below as ‘skillset analysis’.
We use a generalised differential item functioning analysis approach to identify patterns of interest.
The analysis first fits a Rasch model to obtain person and item parameters, and then applies a statistical process to evaluate whether the collection of responses (i.e. item scores) achieved by a group of students on a set of items assessing a common skillset matches what would be expected from that group’s overall ability estimates for the test domain.
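The core comparison step might be sketched as follows. The ability and difficulty values are illustrative placeholders, not calibrated NAPLAN parameters, and the simple standardised residual stands in for the formal generalised DIF test used in the actual analysis:

```python
import math

def rasch_p(theta, delta):
    """Rasch probability that a student of ability theta answers an
    item of difficulty delta correctly (dichotomous model)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# Illustrative parameters (logits): abilities estimated from the
# whole test domain, difficulties for the items in one skillset.
abilities = {"s1": 0.8, "s2": -0.3, "s3": 1.5}
skillset_item_difficulty = {"Q3": 0.2, "Q4": -0.5}

# Observed scores of the group on the skillset's items (each student
# only has the items they were actually served, as in adaptive tests).
observed = {
    "s1": {"Q3": 0},
    "s2": {"Q4": 1},
    "s3": {"Q3": 1, "Q4": 1},
}

obs_total = 0
exp_total = 0.0
var_total = 0.0
for student, items in observed.items():
    for item, score in items.items():
        p = rasch_p(abilities[student], skillset_item_difficulty[item])
        obs_total += score
        exp_total += p          # model-expected score
        var_total += p * (1 - p)  # Bernoulli variance of each response

# Standardised residual: how far the group's observed skillset score
# sits from the Rasch expectation, in standard-deviation units.
z = (obs_total - exp_total) / math.sqrt(var_total)
```

A clearly positive z suggests the group performs better on this skillset than their overall ability estimates predict, and a clearly negative z the reverse; in practice, appropriate significance thresholds and a formal test would replace this simple residual.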
Rasch modelling (Rasch, 1960, 1980) is selected as it is underpinned by robust measurement principles and is commonly used for evaluating test validity. It is also used by the Australian Curriculum, Assessment and Reporting Authority (ACARA) for item evaluation and scale score computation for NAPLAN tests.
This report explains the methodology, and the findings after applying this methodology, using writing test data as an example. Writing is an ideal domain to start with since NAPLAN writing is assessed using an analytic rubric across ten writing traits (or skillsets - see appendix A for the marking rubric used for NAPLAN writing). Each writing trait can readily be perceived as a conceptually distinct aspect of writing skill, and together the ten traits define the essential skills and knowledge required to produce an effective piece of writing.
The following sections illustrate how we may identify the areas or skillsets where students perform unexpectedly better or worse based on their overall performance in the whole domain.
Whilst this report uses writing as an example to illustrate the application of the proposed method, the method can be adapted to analyse test data from other domains such as reading and numeracy. Where appropriate, the report notes the adaptations required in the illustrated methodology for other domains.
1 This is due to a heightened level of test security because online adaptive testing requires a much larger item bank than paper tests and therefore questions are more likely to be used across years. However, it is noted that the link between each item and the Australian Curriculum content areas will be provided to education systems, making the grouping of items by skillsets, and thus the application of the proposed methodology documented in this report, possible.
2 Distractor analysis is the analysis of response patterns for incorrect options, for each item. It includes the investigation of possible reasons why students chose an incorrect answer.