A glossary of data and evaluation terms.


Activities – In evaluation, activities are the tasks to be done to achieve the project outcome.

Aggregate – Aggregate information or charts contain high-level information summarised for groups of data units. They do not tend to reveal personally identifiable information, and may instead provide an overview of examination results, ages, value-added information or statistical information relating to communities.


Bar charts (also known as column charts) – A graph containing horizontal or vertical bars that represent variables.

Baseline – Initial collection of data which serves as a basis for comparison with subsequently acquired data.

Benefit cost ratio – A way to show the relationship between benefits and costs. A benefit cost ratio of 1.5 means that, for every $1 invested in the program, $1.50 worth of benefit is produced.
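
The arithmetic behind the ratio can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name and the dollar figures are hypothetical, chosen only to reproduce the 1.5 example above.

```python
def benefit_cost_ratio(total_benefits: float, total_costs: float) -> float:
    """Return total benefits divided by total costs (same currency for both)."""
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return total_benefits / total_costs


# Hypothetical program: $100,000 invested, $150,000 worth of benefit produced.
ratio = benefit_cost_ratio(total_benefits=150_000, total_costs=100_000)
print(ratio)  # 1.5
```

A ratio above 1 means the estimated benefits exceed the costs; a ratio below 1 means they do not.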

Bias (statistics) – A systematic distortion of results away from the true value.

Boxplots (or box and whisker plots) – A type of chart that may contain information on quartiles, mean, median, minimum and maximum.

Bubble charts – A chart that plots data in the form of bubbles, the size of which reflects a third dimension of the data (for example, volume of students within a NAPLAN band).


Causal – Evidence that an effect is the direct consequence of an intervention.

Cognitive bias – When the way someone thinks affects their judgement or decision-making.

Column charts – Refer to bar charts.

Confidence interval – A range of values in which it is estimated the true value lies.

Control group – The group of participants in a study that does not receive the intervention or treatment. This group is then compared to an 'intervention' or 'treatment' group, which has received the intervention or treatment. This allows the researcher to test whether the treatment is having an impact.

Correlation – A measure of the strength of the relationship between two variables, expressed as a number (for example, the strength of the relationship between attendance at school and achievement). A correlation between two variables does not necessarily imply that one causes the other.
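
One common way to express a correlation as a number is the Pearson correlation coefficient, which ranges from -1 to 1. The glossary does not name a specific coefficient, so the choice of Pearson here is an assumption, and the attendance and score figures are hypothetical.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from each mean.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical data: attendance rate (%) and test score for five students.
attendance = [80, 85, 90, 95, 100]
scores = [60, 65, 70, 72, 80]
r = pearson_correlation(attendance, scores)
print(round(r, 2))
```

A value near 1 indicates that the two variables tend to rise together. On its own it does not show that attendance causes higher scores.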

Cost benefit analysis – A process of measuring whether the benefits of something are worth the cost of implementing it. It is most useful when the benefits can be reasonably quantified in terms of money. Also refer to Benefit cost ratio.

Cost effectiveness analysis – A process to compare the relative costs and effectiveness of different programs, when the benefits cannot be easily quantified in monetary terms.


Data – Measurements or observations that are collected as a source of information.

Data item – A characteristic of a data unit which can be measured or classified (such as gender or exam scores). It is also known as a variable, because the characteristics may vary.

Data unit – A single entity (such as a student, staff member or school). It may also be referred to as a data record, case or a row in a spreadsheet.

Dataset – A collection of observations.

Denominator – The number on the bottom of a fraction (for example, the 2 in ½). It is important to define what the denominator is when calculating percentages.

Design – The method by which intervention and control groups are determined.

Dispersion (or distribution) – The spread or distribution of values for a numerical data item (for example, the distribution of exam scores across a class) or the distribution of data units across categories (for example, percentage of students across NAPLAN bands).


Economic evaluation – An evaluation that measures the economic costs and benefits of a program.

Effect – A difference in the value of one variable (often an outcome you are interested in) that is associated with a change in one or more other variables.

Effect size – A commonly used measure of the strength of a phenomenon. For example, the degree of change in an average student’s performance in response to an intervention.

Empirical evidence – Knowledge acquired by observation or experiment.

Error (data) – A generic term that describes a variety of ways in which data can vary from its true value. Different types of error include sampling error and measurement error.

Estimate – A summary statistic calculated from the data collected (for example, school mean for Year 3 reading scores).

Evaluation – A systematic and objective process to make judgements about the merit or worth of one or more programs, usually in relation to their effectiveness, efficiency and/or appropriateness.

Evaluative thinking – A disciplined approach to inquiry and reflective practice that helps us make sound judgements using good evidence, as a matter of habit.

Evidence base – Collection of robust information and data available on a topic.

Evidence-based practice – Teaching strategies and behaviours that are based on and backed up by the best available evidence.

Evidence hierarchy – A tool to classify evidence according to its quality.

Expert opinion – The views of a person generally considered to be very knowledgeable in a particular field.

External validity – The extent to which the results of a study are transferable to other students, other schools, or other countries.


Formative evaluation – An evaluation that takes place during a program, or while a program is developing, focused on improvement.


Gold standard – The highest level of quality on our evidence hierarchy.

Granular – Granular data/charts contain information on individuals which may include identifiable information.


Hierarchy of evidence – See evidence hierarchy.


Index (or indexed number) – The numeric rank of a percentile.

Inputs – In evaluation, inputs are the financial, human and material resources provided for a program.

Internal validity – The rigour with which a study is conducted, and the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore.

Interquartile range – The calculated difference between the upper quartile and the lower quartile, describing the range of the middle 50% of values when ordered from lowest to highest.
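
The calculation can be sketched with Python's standard `statistics` module. Several conventions exist for locating quartiles and they can give slightly different answers; the `method="inclusive"` setting used here is an assumption, and the scores are hypothetical.

```python
from statistics import quantiles


def interquartile_range(values):
    """Upper quartile minus lower quartile, using the 'inclusive' convention."""
    lower_q, _median, upper_q = quantiles(values, n=4, method="inclusive")
    return upper_q - lower_q


# Hypothetical scores for nine students (order does not matter).
scores = [2, 4, 6, 8, 10, 12, 14, 16, 18]
print(interquartile_range(scores))  # 8.0 (upper quartile 14.0, lower quartile 6.0)
```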

Intervention – A program, product, practice, or policy aimed at improving outcomes.

Intervention group – See 'treatment group'.


Line chart – A chart which displays information as a series of data points called ‘markers’ connected by straight lines.

Logic model – A design tool that presents the layout of a program in a diagram, to help get a clear line of sight between needs, inputs, activities and outcomes.


Measures of central tendency – Statistical ways to represent the centre of a distribution of values. They are:

mean – (also known as the arithmetic average) is the sum of the values of each observation in a dataset, divided by the number of observations.

median – the middle value in a distribution when the values are arranged in ascending or descending order. The median is the 50th percentile.

mode – the value that appears most often in a dataset.
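
The three measures above can be compared on one small dataset using Python's standard `statistics` module. The scores are hypothetical, chosen so that a single high value pulls the mean above the median.

```python
from statistics import mean, median, mode

# Hypothetical scores for five students; 11 is much higher than the rest.
scores = [3, 5, 5, 6, 11]

print(mean(scores))    # 6: the sum (30) divided by the number of observations (5)
print(median(scores))  # 5: the middle value once the scores are ordered
print(mode(scores))    # 5: the value that appears most often
```

Because the mean uses every value, it is more sensitive to extreme values than the median or the mode.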

Meta-analysis – A synthesis of multiple studies. The term usually refers to the process the authors undertook to contrast and combine the results from these studies. The purpose of a meta-analysis is to identify patterns that only come to light by looking at multiple study results. Provided the authors adhere to strict guidelines when choosing studies to include, a meta-analysis of multiple randomised controlled trials, for example, would be more reliable than a single randomised controlled trial.

Metadata – The information that defines and describes data. For example, the definition and description of a population, the source of the data, or the methodology used in a study.

Methodology – The way in which a study has been designed.


Negative effects – Evidence that an intervention harmed participants’ outcomes relative to doing nothing.

Numerator – The number on the top of a fraction (for example, the 1 in ½). It is important to define what the numerator is when calculating percentages.

Numerical data – Data measured or identified on a numeric scale. The primary methods of reporting this data are measures of central tendency (mean and median) and dispersion (standard deviation and interquartile range).


Observation – The unit that is being examined in the study. For example, students, schools or classes. A study examining students might note that N=300. This means there were 300 students in the study.

Opportunity cost – The value of the alternative activities that could have been implemented using those resources, if we had taken a different path.

Other evidence – The lowest level of quality on our evidence hierarchy.

Outcome evaluation – An evaluation that assesses whether a program is achieving its intended goals.

Outliers – Extreme or atypical data value/s that are notably different from the rest of the data.


Percentage – A number or ratio expressed as a fraction of 100.

Percentile – The value below (or above) which a percentage of data falls. For example, the median is the 50th percentile.

Pie chart – A circular statistical graph, divided into slices to illustrate numerical proportions.

Population – A complete group with at least one characteristic in common.

Positive effects – Evidence that an intervention improved participants’ outcomes relative to doing nothing.

Pre-post comparison – Differences between the performances of an intervention group before and after the intervention takes place.

Process evaluation – An evaluation that focuses on how, and how well, a program is implemented.


Qualitative data – Non-numerical data. It may come from open-ended questions, observations, work samples, pictures, audio or other sources.

Quantitative data – Any information that can be reduced to a set of numbers, for example, where something is counted, measured or assessed.

Quartiles – The values that divide a list of ordered values into four groups of equal size.

Quasi-experiment – A study that compares the outcomes of an intervention group with the outcomes of a control group that has not been chosen through randomisation. For example, a comparison of students in an intervention group at one school with a control group comprising students in neighbouring schools who have similar demographic characteristics (e.g. age, gender, race, socioeconomic status) and educational achievement levels.


Radar chart – A graphical representation of data in the form of a web-type chart. Each spoke represents a different data item (variable).

Randomised controlled trial (RCT) – A study that measures the effect of an intervention by randomly assigning individuals to a treatment group or a control group and then comparing the achievement of the groups over time. These are widely considered to be the gold standard of evidence.

Reliability – The extent to which an experiment, test, or any measuring procedure yields the same result on repeated trials.


Sample – A subset of a larger group.

Sampling – The process of selecting a suitable portion (or sample) of respondents, either to represent a larger population, or to represent the group of people you are trying to learn about.

Sankey diagram – A specific type of flow diagram, in which the width of each arm is proportional to the size of the flow quantity.

Scatter plot – A type of chart which plots data points based on their values against two numeric variables.

Scope – The boundaries on what will and what will not be included in an evaluation.

Selection – The method that is used to decide which people in the population will be included in the study.

Selection bias – Error that can arise when allocating individuals to control and intervention groups.

Self-report data – An individual's own report of their experiences, opinions, attitudes and motivations.

Silver standard – Middle level of quality on our evidence hierarchy.

Standard error – An estimate of the error around a mean or other statistical estimate. It is calculated from the standard deviation of the data and the number of data units (records) on which the estimate is based.
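
For the most common case, the standard error of a mean, the calculation is the sample standard deviation divided by the square root of the number of data units. The sketch below assumes that case only; the function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the sample size."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data units are needed")
    return stdev(values) / sqrt(n)


# Hypothetical scores for eight students.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(scores), 3))
```

With the same spread of scores, a larger number of data units gives a smaller standard error, which is why estimates based on more records are usually more precise.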

Standard deviation – A measure of variation across observations in a sample.

Statistical significance – A finding that the results are likely to be due to a real difference rather than chance.

Summative evaluation – An evaluation to provide information about the worth of a program when it is complete, or has been operating for enough time to show reliable results.

Synthesis – A summary of many other studies.


Theoretical saturation – A stage of data analysis where further analysis of additional data would provide no new insight into the issues being investigated.

Treatment group (also known as intervention group) – The group of participants in a study that receives special treatment or an intervention. This group is then compared to a ‘control’ group, which has not received the intervention or treatment. This allows the researcher to test whether the treatment is having an impact.

Triangulation – Using multiple sources of data to more fully understand the evidence.

True value – The value that is obtained if a study has been conducted without any errors or biases.


Validity – The degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure.

Value added – The contribution over and above other explanatory variables.

Variable – A characteristic, number, or quantity that can be measured or counted.

Variance – The square of the standard deviation. It measures the spread or dispersion of the data and is the average squared deviation from the mean.
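
The link between variance and standard deviation can be checked with Python's standard `statistics` module. This sketch treats the hypothetical scores as a whole population (dividing by the number of values); for a sample, `statistics.variance` and `statistics.stdev` divide by one less than the number of values instead.

```python
from statistics import pstdev, pvariance

# Hypothetical scores for eight students, treated as a complete population.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(scores)        # average squared deviation from the mean (5)
standard_deviation = pstdev(scores)

print(variance)            # 4
print(standard_deviation)  # 2.0
```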


Business Unit:

  • Centre for Education Statistics and Evaluation