# Transcript of Stage 5 statistics and probability

Speaker 1: All right good afternoon ladies and gentlemen. Welcome to the fourth and the last in this series of the SyllabusPLUS Adobe Connect Series. We will be gathering some more information from you about what your needs are. It's nice to see that a lot of you have completed the poll questions in the lobby, and we'll certainly do our best to accommodate your desires, wishes, there in the coming two terms.

Today's session. Are you there Nicola? You've got your video on? Today's session is on statistics and probability and of course if you want to interact with the other participants then you can do so through the chat function. So it's over to you Nicola.

Nicola: Welcome everyone. Lovely to have you with us today. We're going to be looking at stage 5 statistics and probability. And today is our final session for this small series on Adobe Connect, but we'll be hoping to get your feedback today from the polls, and maybe prepare a few more for later on in the year. Okay, let's get going.

Like I said today we'll be looking at stage 5 statistics and probability strand. We'll be looking at single variable data analysis, identifying skewed data, comparing datasets, and statistical reports in the media, developing critical and creative thinking and reasoning, looking at bivariate data analysis, investigations, and relationships between two statistical variables, and their relationship over time.

All right, before we begin we'll have a look at the continuum of learning in Maths K-10 for statistics and probability. And as you can see we have at stage 5.1 single variable data analysis going right across to stage 5.3, bivariate data analysis starts at stage 5.2 and continues in to stage 5.3, and probability is in stage 5.1 and 5.2. And the difference this time with probability is that we're actually calculating probabilities from Venn diagrams and two-way tables, which is just a little bit different from before.

Okay, key ideas. This gives you a little bit more information than the outcome does, in the sense of what you'll actually be teaching in a little bit more detail. Single variable data analysis. They'll be constructing and interpreting back-to-back stem-and-leaf plots, describing data using the terms including skewed, symmetric, bi-modal. Comparing two sets of numerical data in a display using mean, median, and range. Interpreting and critically evaluating reports in the media and elsewhere. That link claims ... Hang on, I'll just stop.

Chris: [inaudible 03:22] I think there's a problem with [inaudible 00:03:20] page, okay. It's frozen, I think, okay.

Nicola: Me too.

Chris: All right, it may be a bit of a problem just going by the chat, okay [I'll need 00:03:45] presentation.

Nicola: Yes.

Chris: Right. Okay, I think that's in there.

Nicola: Is it running okay now?

Chris: You should now be able to see, yes everything's okay now.

Nicola: We're all good, okay.

All right, back to our continuum of key ideas. So with Stage 5.1, the new content that we'll be looking at that we haven't currently been teaching with our current syllabus is describing data using skewed, symmetrical and bi-modal. At the Stage 5.2 level, it's looking at bivariate data analysis, in particular looking at scatter plots, and in Stage 5.3 for the bivariate data analysis, all of it's new. In probability, we have calculating relative frequencies to estimate probabilities, calculate probabilities from Venn diagrams and two-way tables, so that's new, in Stage 5.1, and for Stage 5.2 probability, calculate probabilities of simple and compound events in two- and three-step chance experiments with and without replacement, distinguish between independent and dependent events informally, and calculate probabilities of events where a condition is given that restricts the sample space, as well as critically evaluating conditional statements in chance situations.
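As an illustration (not part of the recorded session), here is a minimal Python sketch of calculating probabilities from a two-way table, including one where a condition restricts the sample space. The table values and the names `probability` and `conditional` are hypothetical and chosen only for the example.

```python
from fractions import Fraction

# Hypothetical two-way table: rows are year groups, columns are sport preference.
table = {
    "Year 9": {"plays sport": 18, "no sport": 12},
    "Year 10": {"plays sport": 15, "no sport": 15},
}


def total(table):
    """Total number of students counted in the table."""
    return sum(sum(row.values()) for row in table.values())


def probability(table, row=None, column=None):
    """P(row and column); leave either as None to get a marginal probability."""
    count = sum(
        value
        for r, cols in table.items()
        for c, value in cols.items()
        if (row is None or r == row) and (column is None or c == column)
    )
    return Fraction(count, total(table))


def conditional(table, column, given_row):
    """P(column | row): the condition restricts the sample space to one row."""
    row_total = sum(table[given_row].values())
    return Fraction(table[given_row][column], row_total)


p_sport = probability(table, column="plays sport")
p_sport_given_year9 = conditional(table, "plays sport", "Year 9")
print(p_sport, p_sport_given_year9)
```

Exact fractions are used so the answers match what students would write by hand from the table.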

Okay, you'll actually see now that in statistics we have outcomes and key ideas at Stage 5.3 level. For your current syllabus it's Stage 5.1 and Stage 5.2, so that's a little bit different as well.

All right, so let's move on into a little bit more detail. Currently, in Stage 5.1 mathematics, when we're looking at statistics, we ask students to construct cumulative frequency tables, cumulative frequency histograms and polygons, and to find the median. They group data into class intervals and construct a frequency table for grouped data, and construct a histogram for grouped data, finding the mean using the class centre and finding the modal class.

It's quite different now you'll see in the current syllabus. What they're asked to do now is quite different, we don't have the grouped data anymore. The first teaching point is identifying everyday questions and issues involving at least one numerical, and at least one categorical variable and collect data directly and from secondary sources. Students are asked to construct back-to-back stem-and-leaf plots and histograms, and describe data using terms including 'skewed', 'symmetrical' and 'bi-modal'.

Now students are required to recognise the general shape and lack of symmetry in skewed distributions. They still construct frequency histograms and polygons from a frequency distribution table, they have to use the terms 'positively skewed', 'negatively skewed', 'symmetric', 'bimodal' to describe the shape of the data, and construct back-to-back stem-and-leaf plots to display and compare two like sets of numerical data, and describe the difference in the shapes of the distribution of the two sets of like data.

So it is quite different to what we're currently doing. They calculate and compare means, medians and ranges of two sets of numerical data displayed in back-to-back stem-and-leaf plots, parallel dot plots and histograms, okay, and make comparisons between the two sets by referring to the mean, by referring to the median, by referring to the range. They're asked to evaluate statistical reports in the media and other places by linking claims to displays, statistics and representative data.

So we want them to interpret media reports, and advertising that quote various statistics, that could be media ratings, house prices, sports results, environmental data. Analyse graphical displays to recognise features that may have been manipulated to cause a misleading interpretation and/or support a particular point of view. Critically review claims linked to data displays in the media and elsewhere, and consider, informally, the reliability of conclusions from statistical investigations, taking into account issues such as factors that may have masked the results. So, accuracy of measurements taken, and whether the results can be generalised to other situations.

Just before we continue, in your glossary of your syllabus, there's a definition for each of these different types of graphs, so it's really worth going to and just having a look and make sure that, you know, the definition that you know of is quite similar to the one in the syllabus.

Here's an example of a stem-and-leaf plot. You can see that you've got the unit stem here, and then the leaf of the numbers on the right-hand side of the line. So if I grab my pointer, there's your stem unit, and this is your leaf unit.

Okay, with your back-to-back stem-and-leaf plot, it's a method for comparing two data distributions by attaching two sets of leaves to the same stem. So you've got the one stem, but then you've got one set of leaves and one set of leaves which represent the data from two different sets that you can compare. So in this case you've got an example of 19 students and this is their pulse rate before exercise and after exercise.
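As an illustration (not part of the recorded session), a back-to-back stem-and-leaf plot can be sketched in a few lines of Python. The pulse rates below are made up, not the data from the slide, and `back_to_back` is a hypothetical helper name. Tens form the stem and units form the leaf.

```python
from collections import defaultdict


def back_to_back(left, right):
    """Return the rows of a back-to-back stem-and-leaf plot (stem = tens, leaf = units)."""
    stems = defaultdict(lambda: ([], []))
    for value in left:
        stems[value // 10][0].append(value % 10)
    for value in right:
        stems[value // 10][1].append(value % 10)

    rows = []
    for stem in range(min(stems), max(stems) + 1):
        left_leaves, right_leaves = stems[stem]
        # Left-hand leaves read outwards from the stem, so the largest is written first.
        left_text = " ".join(str(d) for d in sorted(left_leaves, reverse=True))
        right_text = " ".join(str(d) for d in sorted(right_leaves))
        rows.append(f"{left_text:>12} | {stem:>2} | {right_text}")
    return rows


# Hypothetical pulse rates (beats per minute) before and after exercise.
before = [61, 65, 68, 72, 74, 79]
after = [72, 78, 85, 88, 93, 96]
for row in back_to_back(before, after):
    print(row)
```

Stems with no leaves on either side are still printed, so gaps in the distribution stay visible.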

Parallel dot plots, so we're talking about comparing two like sets of data, same scale, by referring to the mean, median and range of numerical data displayed in parallel dot plots. So this is a good example, you've got the one class' quiz scores, the boys' results, the girls' results.

Skewed data. So negatively skewed looks very much like this, positively skewed, symmetrical. So we want students to be able to recognise the shape of the data and be able to describe the data as negatively or positively skewed, symmetric, bimodal, etc. So quite interesting there for kids to be doing that at Stage 5.1 level.
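As an illustration (not part of the recorded session), the sign of a moment-based skewness coefficient gives a rough numerical check on the shape students describe by eye. This is a sketch only. The `tolerance` cut-off of 0.5 is an arbitrary assumption, the function names are hypothetical, and the check does not detect bimodal data.

```python
from statistics import mean, pstdev


def skewness(data):
    """Moment-based skewness: average cubed deviation divided by the cubed standard deviation."""
    m = mean(data)
    s = pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)


def describe_shape(data, tolerance=0.5):
    """Label a data set using the sign of its skewness (tolerance is an assumed cut-off)."""
    g = skewness(data)
    if g > tolerance:
        return "positively skewed"
    if g < -tolerance:
        return "negatively skewed"
    return "roughly symmetric"


tail_to_right = [1, 1, 2, 2, 2, 3, 3, 4, 9]      # long tail of high scores
tail_to_left = [10 - x for x in tail_to_right]   # mirror image of the same data
balanced = [1, 2, 3, 4, 5]

for data in (tail_to_right, tail_to_left, balanced):
    print(describe_shape(data))
```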

We'll be looking at box plots, or box and whisker plots, which we've been seeing more of, and there's your definition for a box plot and an example.

Parallel box and whisker plot, that means you've got two sets, so you're visually comparing the five number summaries of two or more data sets on the same scale. So as you can see here, this is the pulse rate again, this time the data is displayed as ... There's the pulse before exercise, and then the pulse after exercise. You have your outliers, so I've got one outlier here etc., and then the extreme points, median, etc.
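As an illustration (not part of the recorded session), the five-number summary behind a box plot can be computed as below. Two assumptions are built in: quartiles are taken as the medians of the lower and upper halves (excluding the middle score when the count is odd), and outliers are flagged with the common 1.5 × interquartile range rule, which the session does not specify. The pulse rates are made up.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For an odd number of scores the middle score belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


def outliers(data):
    """Scores more than 1.5 interquartile ranges beyond the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Hypothetical pulse rates before exercise, with one unusually high reading.
pulse_before = [60, 62, 64, 66, 68, 70, 72, 74, 110]
print(five_number_summary(pulse_before))
print(outliers(pulse_before))
```

Running the same two functions on a second data set gives the numbers needed to draw parallel box plots on one scale.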

You've got terminology in there like bi-modal data is new, bivariate data is new, bivariate numerical data is new. Bi-modal data is data whose distribution has two modes. Bivariate data is data relating to two variables. For example, the arm spans and heights of 16-year-olds, the sex of primary school kids and their attitude to playing sport. Bivariate numerical data is data relating to two numerical variables, for example height and weight.

Okay, so what are we currently teaching at Stage 5.2, and let's look at what we'll be teaching next year. So currently we're looking at determining upper and lower quartiles for a set of scores, constructing box-and-whisker plots using the median, the upper and lower quartiles and the extreme values, so you have a five number summary basically or "five-point-summary", finding standard deviation for a set of scores, using the mean and standard deviation to compare two sets of data, and comparing the relative merits of measures of spread (range, interquartile range and standard deviation). Then using the terms "skewed" or "symmetrical" to describe the shape of the data.

So, what we'll be doing from next year, still [inaudible 00:11:48] interquartile ranges and quartile ranges for box-and-whisker plots, so it uses quartiles and box plots to compare sets of data and evaluate sources of data. Determine the upper and lower extremes, median, upper and lower quartiles for sets of numerical data, etc. Construct and interpret box plots, and use them to compare data sets, so just as before.

Compare shapes of box plots to corresponding histograms and dot plots, so they've actually got to determine quartiles from data displayed in histograms and dot plots, and use these to draw a box plot to represent the same set of data. Compare the relative merits of a box plot with its corresponding histogram or dot plot, and identify skewed and symmetrical sets of data displayed in histograms and dot plots, and describe the features of the corresponding box plot for such sets of data.

We'll be investigating reports of surveys in digital media and elsewhere on how data was obtained to estimate population means and medians. Investigate survey data reported in the digital media and elsewhere to critically evaluate the reliability and validity of the source of data and the usefulness of the data, and make predictions from a sample that may apply to the whole population.

Okay at Stage 5.2, they'll also be looking at investigating relationships between two statistical variables, including their relationship over time.

So investigate and describe bivariate numerical data where the independent variable is time. So recognise the difference between an independent variable and its dependent variable, distinguish bivariate data from single variable data. Investigate a matter of interest, representing the dependent numerical variable against the independent variable, time, in an appropriate graphical form. So they're gonna determine and explain why line graphs are the most appropriate method for representing data collected over time. Describe changes in the dependent variable over time, e.g. changes in carbon pollution over time. Suggest reasons for changes in the dependent variable over time with reference to relevant world or national events, and you can see that you have the sustainability learning across the curriculum, etc.

The Aboriginal [inaudible 00:14:06] Islander, critical and creative thinking, your little cogs. You've got your little book here, which is your literacy. The cogs for critical and creative thinking, the leaf for sustainability again, etc. So make sure you're referring to these areas as well with your teaching, with the learning across the curriculum areas and catering for that.

We'll be using scatter plots to investigate and comment on relationships between two numerical variables, and you can use digital technology or do it without digital technology. So investigate a matter of interest involving two numerical variables, and construct a scatter plot, to determine and comment on the relationship between them. So for example, height vs. arm span, reaction time vs. hours of sleep. Describe informally the strength and direction of the relationship between two variables displayed in a scatter plot, so strong positive relationship, weak negative relationship, and you can see now the connection with the Stage 6 General Mathematics happening here now. Make predictions from a given scatter plot or other graph.

Okay, Stage 5.3 Level. Now previously in our current syllabus we don't have anything listed under Stage 5.3, we teach the Stage 5.1 and 5.2 content. Now we actually have specific outcomes for statistics at Stage 5.3, so use the standard deviation to analyse data. Calculate and interpret the mean and standard deviation of data, and use these to compare data sets. So we want to investigate the meaning and calculation of standard deviation using a small set of data, find the standard deviation of a set of data using technology. Investigate and describe the effect, if any, on the standard deviation of altering all of the data values in the set by operations such as doubling all data values, or adding a constant to all data values. Investigate and describe the effect, if any, on the standard deviation of adding a data value to the set of data, for example adding a data value equivalent to the mean, or adding a data value more or less than one standard deviation from the mean.
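As an illustration (not part of the recorded session), the investigations just listed can be tried on a small set of scores with Python's standard library. The scores are made up, and the sketch assumes the population standard deviation (`pstdev`) rather than the sample version.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical small data set with mean 5


def spread(data):
    """Population standard deviation, rounded for display."""
    return round(pstdev(data), 3)


original = spread(scores)
doubled = spread([2 * x for x in scores])           # every value doubled
shifted = spread([x + 10 for x in scores])          # a constant added to every value
with_mean_added = spread(scores + [mean(scores)])   # one extra value equal to the mean
# One extra value three standard deviations above the mean (an arbitrary choice).
with_far_value_added = spread(scores + [mean(scores) + 3 * pstdev(scores)])

print(original, doubled, shifted, with_mean_added, with_far_value_added)
```

Doubling every value doubles the standard deviation and adding a constant leaves it unchanged. An extra value at the mean pulls it down, and an extra value far from the mean pushes it up.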

Use the mean and standard deviation to compare two sets of data. So compare and describe the spread of sets of data with the same mean but different standard deviation, compare and describe the spread of sets of data with different means by referring to standard deviation.

So there's quite a bit there, and it continues with investigate the relationship between numerical variables using lines of best fit, and explore how data is used to inform decision-making processes.

So here we're going to use technology to actually construct a line of best fit for bivariate numerical data, and investigate the different methods of constructing a line of best fit using digital technologies. Use lines of best fit to predict what might happen between known data values, and predict what might happen beyond known data values. So interpolation and extrapolation, and compare predictions obtained from different lines of best fit.
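As an illustration (not part of the recorded session), the sketch below fits a line to made-up height and arm span data and labels each prediction as interpolation or extrapolation. Least squares is assumed as the fitting method here. The helper names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Slope and intercept of a least squares line of best fit (assumed method)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict(xs, ys, x_new):
    """Predict y at x_new and say whether x_new lies inside the known data values."""
    slope, intercept = fit_line(xs, ys)
    kind = "interpolation" if min(xs) <= x_new <= max(xs) else "extrapolation"
    return slope * x_new + intercept, kind


# Hypothetical heights and arm spans in centimetres.
heights = [150, 160, 170, 180]
arm_spans = [148, 161, 169, 182]

for height in (165, 200):
    estimate, kind = predict(heights, arm_spans, height)
    print(height, round(estimate, 1), kind)
```

Predictions labelled as extrapolation rest on the pattern continuing beyond the data, which is a useful point for class discussion.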

Investigate reports of studies in digital media and elsewhere for information on their planning and implementation. So investigate and evaluate the appropriateness of sampling methods and sample size used in reports where statements about a population are based on a sample. And critically review surveys, polls and media reports, and investigate the use of statistics and associated probabilities in shaping decisions made by governments and companies. For example, setting of insurance premiums, the use of demographic data to determine where and when various facilities may be built.

So, to support you in all this, we have a Building Capacity resource on Stage 5 statistics, and this is one of the lesson plans that's there, and we've attached it to the end of the presentation. There is a lot to do in statistics at this level, and it's quite interesting stuff, but it links beautifully to your general mathematics, so it's worth doing well.

In this teaching lesson, we're actually looking at comparing measures of spread, so you're looking at outcome MA5.3-18SP, statistics and probability, so use the standard deviation to analyse data. So, in this lesson, the students can compare the relative merits of the range, interquartile range and standard deviation. They calculate the standard deviation of a set of data, and investigate and describe the effect on the standard deviation of adding or altering the scores.

So you're given the whole lesson plan. What to give students to do, we want them to calculate and interpret the mean and standard deviation of data, and use these to compare data sets. Now this is the resource we're using, it's called 'Measures Of Spread', it's a little learning object, and it's on the next slide, so I'll show it to you in a minute, and they have two worksheets that go with it. And what they're asked to do, and I'll just show you the slide, it looks like this and it's quite nice. You can move the data, create more data, and as you arrange the data, the mean adjusts, the standard deviation adjusts, the interquartile range adjusts and so does the range and you can reset it. So you drag the dots to create your own dot plot, and it comes out quite nice.

So students are asked to investigate the meaning and calculation of standard deviation using a small set of data, find the standard deviation of a set of data using technology, and investigate and describe the effect, if any, on the standard deviation by adding a value to the set of data and then looking at the mean, mode, etc., or by changing the whole data. So the effect of, for example, altering the data values in the set by doubling, or adding a constant to all data values, so describing that effect as well. And they're asked to compare the relative merits of the range, interquartile range and standard deviation. So students usually work in pairs for this, there are two worksheets and then actually contribute to class discussion, as the teacher talks them through the lesson.

This is the learning object and the link's at the bottom here, so that will take you straight to the learning object through this link, but it's also on the Adobe file that we've attached at the end, so you'll find an attachment there as well.

Okay, so students are asked to actually look at the first worksheet. Have a look at the different distributions, and as you can see, teaching note: The distribution is a symmetrical 'bell' shape. The mode, the median and the mean are therefore equal and are located in the centre of the distribution, having those discussions.

Going on, what do you expect will happen in the measures of spread as the scores become more tightly clustered around the mean in a symmetrical distribution? And you can have all these conversations, but while you're having these conversations if you actually move the dots, the figures come up to give you the answer, so it is quite an interesting one, it's a good one to do.

There's the actual worksheet that kids look at, and you can see the questions here. How would you describe the shape of the distribution? Is it skewed to the left, right, symmetrical, bimodal? Determine the mean, the median, the mode of the data. Calculate the range and interquartile range of the data set.

Then we ask teachers to actually create these different data sets that are in front of you now by manoeuvring the data, and then the students have to calculate the mean and standard deviation, interquartile range, etc., and then have the discussion about it and what's happened.

You can create this distribution as well, and ask the students what's actually happening here. They should also know that the interquartile range here is unchanged, so when you cluster them heavily around the mean, or the scores, this is what's happening, and have that discussion with students. You know, which of the measures of spread have been sensitive to the changes so far, why do you think the interquartile range has not changed, etc. Keeping in mind the mean, standard deviation, interquartile range and the range will automatically adjust as you add data and move the data around, so it's a nice one ready to go and it's attached at the end.

Again reproducing another set of scores, this one here, and going through the questions about standard deviation, variance, you know, which of the three measures of spread have been most sensitive to the change in distribution of the scores when you reproduce the next distribution and why do you think this is the case? Get students to define variance and define standard deviation, and what kind of measures of spread they actually are. Well the variance measures the average squared deviation about the mean, and the standard deviation is the square root of the variance. So get them to calculate the variance and standard deviation, etc., and see what they get, and they should come up with their results, have a nice conversation about all that and generate the discussion we need to give them a deeper understanding of what we're doing here in terms of analysing statistical data.
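As an illustration (not part of the recorded session), the two definitions just given can be written in symbols, assuming the population form with $n$ scores $x_1, \dots, x_n$ and mean $\bar{x}$:

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad \sigma = \sqrt{\sigma^2}
$$

For a made-up set of scores 2, 4, 4, 4, 5, 5, 7, 9 the mean is $\bar{x} = 5$, the squared deviations are 9, 1, 1, 1, 0, 0, 4, 16, so

$$
\sigma^2 = \frac{32}{8} = 4, \qquad \sigma = \sqrt{4} = 2
$$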

Again, you can create this type of distribution, or the next one, and go through the question with the students. It's all there for you to go through, we've put all the slides up.

This is another lesson that's up as well on the Building Capacity Resource for Stage 5 statistics, and this is a great one. This is a GeoGebra one, it takes you through a tutorial to show you how to use the spreadsheet in the GeoGebra View, so there's a file with data in it, and once you copy and paste the data out of the Excel spreadsheet and take it into GeoGebra spreadsheet and paste it, there's a random number generator where you can generate a whole heap of numbers, and the random number generator is from a normal distribution with mean=50 and standard deviation=5, and that gets pasted in here for you to start the statistical analysis and actually generate graphs.
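As an illustration (not part of the recorded session), a comparable data set can be generated outside GeoGebra with Python's standard library. This is a stand-in for the random number generator in the resource, not a copy of it. The seed and the sample size of 200 are arbitrary assumptions.

```python
import random
from statistics import mean, pstdev

rng = random.Random(2013)  # fixed (arbitrary) seed so every run gives the same scores

# 200 scores drawn from a normal distribution with mean 50 and standard deviation 5.
sample = [round(rng.gauss(50, 5), 1) for _ in range(200)]

sample_mean = mean(sample)
sample_sd = pstdev(sample)
print(round(sample_mean, 2), round(sample_sd, 2))
```

The sample mean and standard deviation land near 50 and 5 without matching them exactly, which is worth pointing out to students before they start graphing.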

So as you see you fill it, comes up, just [inaudible 00:24:28] down on the spreadsheet, and then you can start creating the graphs that you want to create. So for single variable data analysis, students are required to use a range of statistical displays and measures to describe, compare and analyse data sets containing one variable. So we go to our little icon, which I've pointed to here, and that's one variable data analysis, and it'll create a histogram, a frequency histogram, and you can have discussions about the frequency histograms. You can then create the frequency polygon, and have your discussions about the frequency polygon.

Moving forward you can then change the display to a box-and-whisker plot, it's just a drop down menu and you select it, and it converts it. And you can have all the conversation you need to have about the outliers, so at Stage 5.2 students are required to construct a box plot using the median, the upper and lower quartiles and the extreme values, which is what's happening up here. And then they can look at the effect of outliers which is really very useful in terms of generating that discussion.

Stage 5.2, students are required to compare the shapes of box plots to corresponding histograms and dot plots, and the software does that for you, so you can actually compare them and see what they look like next to each other.

Okay, then you can plot two box plots, so parallel box plots drawn on the same scale, and that's Stage 5.2, where students are required to compare two or more sets of data using parallel box plots, and it's really simple to do, just follow these instructions.

Stage 5.3, students are required to find the standard deviation of the set using digital technologies and also investigate and describe the effect like we said earlier on the standard deviation by adding a data value to a set of data or altering all the data values by doubling all the values, or adding a constant to all the values, etc., so you can do that and generate the parallel box-and-whisker plots, and see what's happening on the same scale.

Okay, bivariate data analysis. In Stage 5.2 and Stage 5.3, in Stage 5.2 students investigate relationships between two statistical variables, including their relationship over time. And in Stage 5.3, students investigate the relationship between numerical variables using lines of best fit, so they're actually going to draw lines of best fit. So if we have a look here, the software actually allows you to put the data in as a scatter plot first, so it's all displayed as a scatter plot, and at Stage 5.2, students use scatter plots to investigate and comment on relationships between the two numerical variables. They can start having discussions about the scatter plot that's presented in front of you using the technology, and then you can select the regression model from the drop down list, and select a linear model, so then it draws a line of best fit in red.

And students in Stage 5.3 are required to use technology to construct a line of best fit for bivariate numerical data in a spreadsheet. They can also do it by hand, but the option is there to use the technology which I think is a great idea. And at Stage 5.3, students use lines of best fit to estimate what might happen between known values, and predict what might happen beyond known values, so you can have that discussion with the students as well, and there's a lot of critical and creative thinking that will be happening there I'm sure in the classroom.

Okay, then we've got the wealth vs. health lesson. So this is the third one, and this is again Stage 5.3 data, and it's got some great stuff, and I might actually get Chris to jump in at any point and have a chat about it, because he was part of the team that [inaudible 00:28:26] ...

Chris: Okay. [inaudible 00:29:04]

Nicola: Can you hear Chris?

Chris: Uh, maybe if you unmute yours. Okay. Right, I think just going to let [inaudible 00:29:14]. All good, great.

Yes, as I was saying, the focus of this unit of work, or of the Building Capacity Resource was to obviously make use of tools that are in our hands already, and I'm sure you know lots about the capacity of GeoGebra in terms of the dynamic geometry, but the statistics package is something new. So that was very much the focus of the building capacity sample lessons that I've done there. You can also of course use Excel, if you prefer Excel. I think GeoGebra is sort of along the path there, but not quite there in a lot of ways, but in many other ways it's much easier to use for students I think.

Look, the wealth vs. health sample lesson is obviously looking at the relationship between two variables, and it gives the students the opportunity to work with some real data. Some of you would be aware of the Gapminder website, which is quite an interesting project in itself, and another great source of some really interesting numbers for you to use in your classroom, so I encourage you to have a look at the end of this presentation or sometime soon, to go and have a look at what's available there, and also just the really amazing graphics that they use to present statistics.

Yeah, in this one, there's a couple of GeoGebra files in the resource ready for you to use, and the students go about producing a line of best fit using real data, and using the GeoGebra tools to do that, okay, as you can see there. We're using the linear least squares regression, and of course that's what most people refer to as regression, or linear regression, or least squares to fit a model to their data, and the least squares regression is the line which produces the smallest value of the sum of the squares of the residuals, and the residual is the vertical distance from a point on a scatter diagram to the line of best fit.
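As an illustration (not part of the recorded session), the least squares idea can be checked numerically: fit the line, add up the squared residuals, then nudge the line and see the total grow. The income and life expectancy figures below are invented for the sketch and are not Gapminder data. The function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope and intercept that minimise the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def sum_squared_residuals(xs, ys, slope, intercept):
    """Each residual is the vertical distance from a point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Invented data: income per person (thousands of dollars) and life expectancy (years).
income = [1, 2, 4, 8, 16, 32]
life_expectancy = [52, 58, 63, 69, 74, 80]

slope, intercept = least_squares_line(income, life_expectancy)
best = sum_squared_residuals(income, life_expectancy, slope, intercept)
steeper = sum_squared_residuals(income, life_expectancy, slope + 0.2, intercept)
higher = sum_squared_residuals(income, life_expectancy, slope, intercept + 2)

print(round(best, 2), round(steeper, 2), round(higher, 2))
```

Any other slope or intercept gives a larger total than the fitted line, which is what "least squares" means.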

Okay, so the conclusion of the lesson, it gives some really good opportunities to talk about the difference between causality and correlation, and you know, one of those great teaching points that this sort of data allows you to explore. There's also a couple of really nice videos around that idea, and Hans Rosling's presentation, you know, '200 countries, 200 years, 4 minutes', if you haven't already seen that one that would be a great thing to share with your students, you know just to set the [same 00:32:23] for the data, and he does it in a really dynamic way, and is just a really wonderful presenter.

I think we're getting towards the end of the presentation are we? Nicola?

Nicola: Absolutely. Just wanted to remind you all that if you go to the Australian Curriculum website on MyPortal, you will find these Adobe interactive documents with all the lesson plans, smart notebook files, the videos, the links to YouTube are all there ...