My hypothesis is that men score more points in snowboarding than women.
I am going to collect my data from http://www.fis-ski.com/. I am going to take a random sample so that every piece of data has an equal chance of being picked. To make the investigation fair, I am going to control some variables. The first is the year I collect my data from, because the equipment the athletes use has changed over time. I am going to take all my data from the most recent years, but not from 2015, because the 2015 season is not finished yet. The second is the discipline, because some disciplines might be easier than others or might have a different ranking system. I am going to control these variables for both men and women so that the two sets of data are fair and comparable.
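The controlled variables and the random sample described above can be sketched in code. This is a minimal sketch under stated assumptions: the `Result` record and its field names, and the helper functions `controlled_pool` and `simple_random_sample`, are hypothetical and do not reflect how the FIS website actually stores its results. Filtering first keeps the year and discipline the same for men and women; sampling without replacement then gives every remaining result an equal chance of being picked.

```python
import random
from dataclasses import dataclass


@dataclass
class Result:
    """One competitor's result. Field names are hypothetical, not the FIS website's format."""
    gender: str      # "M" or "W"
    season: int      # year of the competition
    discipline: str  # e.g. "Halfpipe"
    score: float


def controlled_pool(results, gender, seasons, discipline):
    """Keep only the results that match the controlled variables (years and discipline)."""
    return [
        r for r in results
        if r.gender == gender and r.season in seasons and r.discipline == discipline
    ]


def simple_random_sample(pool, n, seed=None):
    """Draw n results without replacement, so every result has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(pool, n)
```

The same controls would then be applied to both genders, for example `controlled_pool(all_results, "M", {2010, 2014}, "Halfpipe")` and the matching call with `"W"`, where `all_results` is a hypothetical list of every downloaded result.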
The calculations I am going to use are: the standard deviation, which is a better measure of spread than the range because it uses every value; the interquartile range, which measures the spread of the middle 50% of the data and so is not affected by outliers; the mean, an average that includes every piece of data; the median, an average that is not affected by extremely high or low values; and Pearson's coefficient. I am going to group the data so that I can draw histograms. The graphs I am going to use are histograms, to compare the skew of both sets of data, and cumulative frequency graphs with box plots, to compare the data easily.
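The calculations listed above can be checked with a short script. This is a sketch, not the method used in this investigation: it assumes that "Pearson's coefficient" means Pearson's second coefficient of skewness, 3(mean − median) / standard deviation, which fits the plan to compare skew, and it uses the population standard deviation (dividing by n). The quartiles come from Python's `statistics.quantiles` with its default method, so they may differ slightly from quartiles read off a cumulative frequency graph. The function names `summarise` and `frequency_densities` are hypothetical.

```python
import statistics


def summarise(scores):
    """Summary statistics for one set of scores (e.g. the 50 men's best scores)."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    sd = statistics.pstdev(scores)  # population standard deviation (divides by n)
    # Quartiles with the library's default ("exclusive") method; hand calculations
    # using a different textbook convention can give slightly different values.
    q1, _, q3 = statistics.quantiles(scores, n=4)
    # Pearson's second coefficient of skewness: below 0 suggests negative skew,
    # above 0 suggests positive skew. If every score is equal there is no spread,
    # so the skewness is reported as 0.
    skewness = 3 * (mean - median) / sd if sd else 0.0
    return {
        "mean": mean,
        "median": median,
        "sd": sd,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "skewness": skewness,
    }


def frequency_densities(scores, boundaries):
    """Group scores into classes for a histogram.

    Boundaries such as [0, 20, 40, 60, 80, 100] define the classes
    0 <= x < 20, 20 <= x < 40, and so on; the last class also includes its
    upper boundary. Returns (lower, upper, frequency, frequency density) for
    each class, where frequency density = frequency / class width, which is
    the height of the bar on a histogram.
    """
    rows = []
    last = len(boundaries) - 2
    for i, (lower, upper) in enumerate(zip(boundaries, boundaries[1:])):
        freq = sum(
            1 for s in scores
            if lower <= s < upper or (i == last and s == upper)
        )
        rows.append((lower, upper, freq, freq / (upper - lower)))
    return rows
```

Running `summarise` on the men's and women's samples separately gives directly comparable values; a negative `skewness` for the men's scores would agree with a negatively skewed histogram.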
Selection and collection
I collected my data for both men and women from http://data.fis-ski.com/snowboard/results.html?place_search=&seasoncode_search=20=50. The data is reliable because I used only one website for all of it: different websites could give slightly different results, as some might be missing particular results or might have rounded the scores differently. My sample size is 50, because I thought this was large enough to represent the whole population. I took a random sample so that every piece of data had an equal chance of being picked.

The website listed 22 different snowboarding competitions, so I used a random number generator to generate a number between 1 and 22, with each number representing the corresponding competition. It generated 1, which represented the Olympic Winter Games. I took the data from the 2014 and 2010 Games because they were the most recent, and results from 2014 alone would not have given me enough data. To choose the discipline, I used the random number generator to generate a number from 1 to 5; it generated 2, which represented Halfpipe. I then clicked on the 'result final' option for men in 2014, which also showed the semifinal and qualification results. I gave each result a number and used the random number generator to select only 25 of them, since I was going to take the other 25 from 2010 to make the investigation fair. I did the same for 2010, and then repeated the whole process for women. Every competitor had two scores, so I took their best score, since that was the one that counted in the competition.
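The selection procedure above (a random competition out of 22, a random discipline out of 5, then 25 numbered results from each Games, keeping each competitor's best score) can be sketched as follows. This is an illustration under assumptions, not the exact process used: each Games' results are assumed to be a list of two-score tuples, one per competitor, and the helper names `pick_number`, `best_scores`, `sample_one_games` and `sample_both_games` are hypothetical. Python's `random` module stands in for the online random number generator.

```python
import random


def pick_number(rng, count):
    """Random whole number from 1 to count inclusive, like the online generator."""
    return rng.randint(1, count)


def best_scores(competitors):
    """Each competitor has two run scores; keep the better one, as that is the one that counted."""
    return [max(runs) for runs in competitors]


def sample_one_games(competitors, k, rng):
    """Randomly choose k numbered results without replacement and keep each best score."""
    chosen = rng.sample(competitors, k)
    return best_scores(chosen)


def sample_both_games(games_2014, games_2010, rng, per_games=25):
    """25 results from 2014 plus 25 from 2010 gives the sample of 50 for one gender."""
    return (sample_one_games(games_2014, per_games, rng)
            + sample_one_games(games_2010, per_games, rng))


rng = random.Random()  # pass a seed, e.g. random.Random(1), to make a run repeatable
competition = pick_number(rng, 22)  # 22 competitions listed on the website
discipline = pick_number(rng, 5)    # 5 disciplines to choose from
```

Calling `sample_both_games` once with the men's results and once with the women's results gives two samples of 50 built in the same way, which is what keeps the comparison fair.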
My histogram for the men's data shows a negative skew. This usually means that the median is greater than the mean and that more of the data lies towards the higher end, so more men got a high score than a low score. Despite the negative skew, the histogram also shows that a lot of men got a low score (less than 60). Although the bar heights vary somewhat, the negative skew is clearly visible. This partly supports my hypothesis, as a lot of men got a good score, but it does not fully support it because