MDM4U - PROJECT BY SECTIONS
Normal Probability Distribution

The focus of the section on the Normal Probability Distribution is to identify
situations that give rise to Normal distributions by demonstrating an ability to
understand the properties of the Normal distribution, use these properties to
solve problems and make probability statements about Normal distributions.

The Normal Probability Distribution is a fundamental building block for most
elementary parametric Statistics. Two important applications are:

1. It is the theoretical distribution of random error when the scale is
    continuous. For example, if every student in your school were given
    a ten-centimetre ruler to measure the length of the gym, the
    resulting distribution of their measurements would approximately
    follow a Normal distribution.
2. It is the theoretical distribution for the means of randomly drawn
    large samples. For example, if everyone in the class rolled a die
    50 times and computed the mean of their results, the distribution
    of the means from the class would approximately follow a Normal
    distribution.
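The die-rolling example can be tried out with a short simulation. This is a minimal sketch only: the class size of 30, the seed and the function name class_die_means are assumptions made for illustration and do not come from the project.

```python
import random
import statistics

def class_die_means(num_students=30, rolls_per_student=50, seed=1):
    """Return one mean per student, each based on rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_students):
        rolls = [rng.randint(1, 6) for _ in range(rolls_per_student)]
        means.append(statistics.mean(rolls))
    return means

means = class_die_means()  # hypothetical class of 30 students
print("mean of the class means:", round(statistics.mean(means), 3))
print("spread of the class means:", round(statistics.stdev(means), 3))
```

A dot plot or histogram of the values in means should pile up around 3.5, the expected value of a single roll, with fewer values further away on either side.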

Area of Interest
Driver injuries and age group.

Refined Question
Which age group has the most driver fatalities? The most serious injuries?

Data
The data used for this section come from Canadian Motor Vehicle Traffic Collision Statistics 2000 (the website given in the PDF file was http://www.tc.gc.ca/roadsafety).

Analysis
I have created a data chart containing data from the Canadian Motor Vehicle Traffic Collision Statistics 2000.

We can see that 25-34 year olds have the highest percentage of driver fatalities.
They also have the highest percentage of serious injuries.

We can see that the above dot plots roughly follow a bell-shaped curve. The high
percentage value for 65+ is explained by the larger size of that age group. Notice
also that the grouping for ages 15-24 is divided into 15-19 and 20-24, and that
the age groupings do not all have the same range. I am accepting these
limitations because I could not find better normally distributed attributes for
this area of transportation, with the understanding that they will affect the
probabilities we compute later. We will assume that these values represent a
population that has a Normal distribution.

To calculate the mean, µ, and the standard deviation, σ, we use the formulas:


Note: We will restrict our data to ages 5-64 to avoid an overly high mean due to
the jump in values at 65+. We will only be using µ and σ to create a hypothetical
set of values to use for problem solving.
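One common way to get a mean and standard deviation from grouped data is to weight each group's midpoint by its percentage. The sketch below assumes that approach. The midpoints and percentages in it are placeholders made up for illustration, not the Transport Canada figures, and the function name grouped_mean_sd is hypothetical.

```python
import math

def grouped_mean_sd(midpoints, weights):
    """Weighted population mean and standard deviation for grouped data."""
    total = sum(weights)
    mu = sum(x * w for x, w in zip(midpoints, weights)) / total
    variance = sum(w * (x - mu) ** 2 for x, w in zip(midpoints, weights)) / total
    return mu, math.sqrt(variance)

# Placeholder values only, NOT the data from the collision statistics report
midpoints = [17.5, 22.5, 30.0, 40.0, 50.0, 60.0]   # assumed age-group midpoints
weights = [10.0, 12.0, 22.0, 20.0, 16.0, 10.0]      # assumed percentages per group

mu, sigma = grouped_mean_sd(midpoints, weights)
print(f"mean = {mu:.1f}, standard deviation = {sigma:.1f}")
```

Because the groups have unequal widths, the choice of midpoint for each group affects both results, which is one reason the later probabilities should be read as approximate.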

Driver Fatalities:

Drivers with Serious Injuries:

Let's assume that the ages of driver fatalities are normally distributed with mean
30.5 and standard deviation 13.3, and that the ages of drivers with serious
injuries are normally distributed with mean 31.2 and standard deviation 12.8. I
will now use Fathom to create a sample of 250 values for age of fatalities and
250 values for age of serious injuries.
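For readers without Fathom, a similar sample can be drawn in Python. This is a sketch under the stated parameters only: the seeds and the function name simulate_ages are assumptions, and the values it produces will not be the same as the Fathom sample used in this project.

```python
import random
import statistics

def simulate_ages(mu, sigma, n=250, seed=0):
    """Draw n ages from a Normal distribution with the given mean and SD."""
    rng = random.Random(seed)  # seed chosen arbitrarily for repeatability
    return [rng.gauss(mu, sigma) for _ in range(n)]

fatal_ages = simulate_ages(30.5, 13.3, seed=1)    # driver fatalities
injury_ages = simulate_ages(31.2, 12.8, seed=2)   # serious injuries

for label, ages in (("fatalities", fatal_ages), ("serious injuries", injury_ages)):
    print(label, round(statistics.mean(ages), 1), round(statistics.stdev(ages), 1))
```

The sample mean and standard deviation will differ a little from the population values each time, which is why the sample standard deviation s is only approximately equal to σ.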

If the population size is 250 (let s ≈ σ), what is the probability that a
driver who is killed is between 16 and 24 years old?

Therefore the probability a driver who is killed is between 16 and 24 years
of age is 17.64%.
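A probability of this kind is the area under the Normal curve between the two ages. The sketch below computes that area with Python's standard library, assuming the rounded parameters µ = 30.5 and σ = 13.3 stated earlier. Its result depends on those inputs, so it is not guaranteed to reproduce the percentage quoted above, which may rest on the statistics of the Fathom sample. The function name prob_between is hypothetical.

```python
from statistics import NormalDist

def prob_between(mu, sigma, low, high):
    """P(low < X < high) for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Assumed inputs: the rounded population parameters for driver fatalities
p = prob_between(30.5, 13.3, 16, 24)
print(f"P(16 < X < 24) = {p:.4f}")
```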

If we only have a sample of 250 people, what is the probability that a driver who
is seriously injured is between 30 and 32 years old?

Therefore the probability that a driver who is seriously injured is between 30
and 32 years old is 22.1%.
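The same kind of question can be worked by hand with z-scores: convert each age to z = (x - µ) / σ, look up the area to the left of each z, and subtract. The sketch below follows those steps, using math.erf in place of a printed z-table. It assumes the rounded parameters µ = 31.2 and σ = 12.8, so its result depends on those inputs and need not match the percentage quoted above. The helper names phi and z_score are hypothetical.

```python
import math

def phi(z):
    """Area to the left of z under the standard Normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 31.2, 12.8           # assumed parameters for serious injuries
z_low = z_score(30, mu, sigma)
z_high = z_score(32, mu, sigma)
p = phi(z_high) - phi(z_low)     # area between the two z-scores

print(f"z_low = {z_low:.4f}, z_high = {z_high:.4f}, P = {p:.4f}")
```

Since the interval from 30 to 32 is narrow compared with σ, the area between the two z-scores is small.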

Between what ages do 95% of driver fatalities occur?

Therefore 95% of driver fatalities occur between ages 17.7 and 43.3.
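A central 95% interval leaves 2.5% of the area in each tail, so its endpoints are the 2.5th and 97.5th percentiles of the distribution. The sketch below finds them for the rounded parameters µ = 30.5 and σ = 13.3, which are taken here as assumptions. Its output depends on those inputs and need not match the bounds quoted above. The function name central_interval is hypothetical.

```python
from statistics import NormalDist

def central_interval(mu, sigma, coverage=0.95):
    """Endpoints of the central interval holding `coverage` of the area."""
    dist = NormalDist(mu, sigma)
    tail = (1.0 - coverage) / 2.0          # area left in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

low, high = central_interval(30.5, 13.3)   # assumed fatality parameters
print(f"central 95% interval: {low:.1f} to {high:.1f}")
```

The interval is symmetric about the mean, and its half-width is about 1.96 standard deviations.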

Consider the youngest 63% of drivers with serious injuries. How old is the oldest
driver in this group?
We must find the z value corresponding to a cumulative probability of 0.63.
z = 0.33

x = µ + zσ = 31.2 + 0.33(12.8) = 35.424

Therefore, concerning the youngest 63% of drivers with serious injuries, the
oldest would be 35.424 years old.
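The table lookup can be checked in Python. The sketch below repeats the calculation with the rounded table value z = 0.33 and then with the unrounded z that the standard library returns for a cumulative probability of 0.63. It assumes the parameters µ = 31.2 and σ = 12.8 stated earlier.

```python
from statistics import NormalDist

mu, sigma = 31.2, 12.8                    # serious-injury parameters from the text

# Table-based answer: 31.2 + 0.33 * 12.8 = 35.424, the value quoted in the text
age_table = mu + 0.33 * sigma

# Unrounded z for a cumulative probability of 0.63
z_exact = NormalDist().inv_cdf(0.63)
age_exact = NormalDist(mu, sigma).inv_cdf(0.63)

print(f"table z = 0.33  -> age {age_table:.3f}")
print(f"exact z = {z_exact:.4f} -> age {age_exact:.3f}")
```

The two answers differ only in the second decimal place, so rounding z to two places has little effect here.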

Conclusion
I found that by using a modified x-axis (age group) I could use the data (mean and
standard deviation) from the Motor Vehicle Statistics, whose graph formed a bell
curve, to generate a hypothetical Normal distribution to compare ages with
serious injuries and fatalities in automobile accidents. I then used the
properties of the Normal distribution to solve problems relating to probabilities
concerning age and injuries, including death.

Exploration
The Fathom Workshop Guide that I used in my exploration of simulation suggests
that the distribution of the mean proportion of males in my samples follows a
Normal distribution.

Plan and Data
I plan to use the data generated by the simulation and to analyse its
distribution with the tools supplied by Fathom.

Analysis

If the simulation data follow a Normal distribution, I would expect the cumulative
distribution to look like an S-shaped curve, very similar to the one I obtained. I can
get better visual confirmation that the data follow a Normal distribution by
looking at the Normal Quantile Plot provided by Fathom.

From the Fathom Help file: A Normal Quantile Plot shows the distribution of
continuous (numeric) data. It plots the z-scores associated with the percentile of
each case if the data were normally distributed. Therefore, if the data are
Normal, the plot should show a straight line. My simulation data are very close to
the straight line shown on the plot.
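The idea behind this kind of plot can be sketched in Python: sort the data, give each value a percentile, convert the percentile to a z-score, and see how close the resulting points are to a straight line. This is an illustration only. The plotting position (i + 0.5) / n is one common convention and is not claimed to be the rule Fathom uses. The data are simulated stand-ins, not my simulation results, and the function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each sorted value with the z-score of its percentile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # (i + 0.5) / n is an assumed plotting position, not necessarily Fathom's
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

def pearson_r(points):
    """Correlation between z-scores and data; near 1 means nearly a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(4)                                   # arbitrary seed
stand_in = [rng.gauss(0.531, 0.0497) for _ in range(200)]  # simulated stand-in data
points = normal_quantile_points(stand_in)
print("straightness (r):", round(pearson_r(points), 4))
```

A correlation close to 1 corresponds to points lying close to a straight line, which is what the plot shows visually.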

Finally, I can check whether the distribution of my simulation data
has the following properties of the Normal Probability distribution:
    50% of the data fall on each side of the mean
    About 68% of the data fall within one standard deviation of the mean
    About 95% of the data fall within two standard deviations of the mean
    About 99.7% of the data fall within three standard deviations of the mean
For my simulation data the mean proportion is 0.531, the standard deviation is 0.0497 and the Dot Plot is

where each dot represents two data values. I counted the data in each interval and
divided by 200 to get the percentage within each interval. I found that the
percentages of data on either side of the mean are 49.5% and 50.5%,
which is about 50% on each side of the mean.
The percentage of the data that falls within one standard deviation (0.48 - 0.58)
of the mean is 73.5%, which is larger than the predicted 68%.
The percentage of the data that falls within two standard deviations (0.43 - 0.63)
of the mean is 97%, which is larger than the predicted 95%.
The percentage of the data that falls within three standard deviations (0.38 -
0.68) of the mean is 100%.
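The counting described above can also be done from raw values rather than from the Dot Plot. The sketch below shows one way. The simulated proportions are a stand-in for my data: the sample size of 100, the probability 0.5 and the seed are assumptions, and the function names are hypothetical.

```python
import random
import statistics

def share_within(data, k):
    """Fraction of values within k standard deviations of the mean."""
    mu = statistics.mean(data)
    sd = statistics.stdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sd)
    return inside / len(data)

def simulate_proportions(num_samples=200, sample_size=100, p=0.5, seed=3):
    """Proportion of 'males' in each sample; sample_size and p are assumed."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(sample_size)) / sample_size
            for _ in range(num_samples)]

props = simulate_proportions()
for k, expected in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    print(f"within {k} SD: {share_within(props, k):.1%} (Normal model: about {expected})")
```

Because proportions from samples of a fixed size can only take a limited set of values, several data values can sit exactly on an interval boundary, so the observed percentages can come out a little above or below the Normal predictions.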

I did not have time to explore whether the approximations that I did (rounding off
the mean and standard deviation, and counting from the Dot Plot rather than
using the raw data) biased these results.

Conclusion
The work that I did suggests that the distribution of the data I obtained from the
simulation could be modelled by the Normal probability distribution.

Checklist
Student Name : _______________________________
Area Of Interest : ______________________________


 
                                                         | Very Limited | Adequate | Competent | Exceptional
Research and Comparison of Data to Normal Distribution   |              |          |           |
Understands Properties of Normal Distribution            |              |          |           |
Problems Solved Using Normal Distribution                |              |          |           |
Probability Statements Made                              |              |          |           |