1. Checking the values of R2 and adjusted R2
For this coursework, I randomly selected 20 countries.
I entered the data into Stata. The screenshot below is from the Stata Data Editor.
Birth = Life Expectancy at birth, total (years)
Gni = GNI per capita, PPP (current international $)
Water = Improved water source, rural (% of rural population with access)
From the regression results, R-squared is 0.8692 and adjusted R-squared is 0.8538.
R2 is defined as R2 = ESS/TSS, where
ESS (explained sum of squares) = 763.212265
TSS (total sum of squares) = 878.039867
Hence R2 = ESS/TSS = 763.212265/878.039867 = 0.8692.
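The arithmetic above can be reproduced in a few lines. This is a minimal sketch that simply plugs in the sums of squares from the Stata output; it also uses the residual sum of squares (114.827602, reported in the F-test section) to confirm that TSS = ESS + RSS, which holds for OLS with an intercept.

```python
# Sums of squares from the Stata regression output
ESS = 763.212265   # explained (model) sum of squares
TSS = 878.039867   # total sum of squares
RSS = 114.827602   # residual sum of squares

# For OLS with an intercept, TSS decomposes exactly into ESS + RSS
assert abs(TSS - (ESS + RSS)) < 1e-6

r_squared = ESS / TSS
r_squared_alt = 1 - RSS / TSS  # equivalent form of R2

print(f"R2 = {r_squared:.4f}")
```

Either form gives the same value, because the two sums of squares add up to TSS.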
Adjusted R2 is defined as adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k), where n (the number of observations) is 20 and k (the number of parameters to be estimated, including the intercept) is 3.
Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k) = 1 - (1 - 0.8692)(20 - 1)/(20 - 3) = 0.8538
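As a quick check of this calculation, the formula can be written as a small function. The function name `adjusted_r_squared` is purely illustrative; `k` counts every estimated parameter, including the intercept.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k).

    n: number of observations
    k: number of estimated parameters, including the intercept
    """
    return 1 - (1 - r2) * (n - 1) / (n - k)


adj = adjusted_r_squared(0.8692, n=20, k=3)
print(f"Adjusted R2 = {adj:.4f}")
```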
In statistics, R2 is a descriptive statistic that measures the strength of the linear relationship between the independent variables (the Xs) and the dependent variable, Y. In regression, the coefficient of determination R2 measures how well the fitted regression approximates the observed data points. R2 = 1 indicates that the fitted model explains all of the variability in Y, while R2 = 0 indicates no linear relationship between the response variable and the regressors. The R2 of this regression is 0.8692, which is close to 1, so the model fits the data well: about 87 percent of the variation in life expectancy at birth can be explained by GNI per capita and improved water source. The remaining 13 percent can be attributed to omitted (lurking) variables or inherent random variability.
Adjusted R2 corrects for the fact that adding an independent variable, even an irrelevant one, never increases and usually slightly reduces the residual (error) sum of squares, so R2 never falls when a regressor is added. Adjusted R2 penalizes each additional parameter through the factor (n - 1)/(n - k), so it provides a better basis for comparing regression models with different numbers of independent variables. In this case, the difference between R2 (0.8692) and adjusted R2 (0.8538) is small, so the good fit is not merely a by-product of the number of regressors, and the model fits the data well.
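The claim that an extra regressor cannot lower R2 can be illustrated with a small pure-Python OLS fit. This is a sketch on synthetic, hypothetical data (not the coursework data): `y` depends on `x1` and `x2`, while `irrelevant` is random noise unrelated to `y`. The helper names `ols_r2` and `adjusted_r2` are illustrative, and the normal equations are solved by Gauss-Jordan elimination rather than by whatever routine Stata uses internally.

```python
import random


def ols_r2(columns, y):
    """Fit y on an intercept plus the given regressors by OLS; return (R2, k)."""
    n = len(y)
    # Design matrix: a column of ones (intercept) followed by each regressor
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], a k x (k + 1) matrix
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * pivot for a, pivot in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]  # OLS coefficients
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss, k


def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k)


random.seed(1)
n = 20
x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [random.uniform(0, 10) for _ in range(n)]
irrelevant = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [50 + 1.5 * a + 0.8 * b + random.gauss(0, 2) for a, b in zip(x1, x2)]

r2_small, k_small = ols_r2([x1, x2], y)
r2_big, k_big = ols_r2([x1, x2, irrelevant], y)

# R2 can only stay the same or rise with the extra regressor;
# adjusted R2 applies a penalty for the extra parameter and can fall.
for label, r2, k in [("2 regressors", r2_small, k_small), ("3 regressors", r2_big, k_big)]:
    print(f"{label}: R2 = {r2:.4f}, adjusted R2 = {adjusted_r2(r2, n, k):.4f}")
```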
2. Performing F-tests to examine the suitability of the model
The regression model is
$Birth = \beta_1 + \beta_2 Gni + \beta_3 Water + u$
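The hypotheses are $H_0: \beta_2 = \beta_3 = 0$ (the coefficients on Gni and Water are both zero) against $H_1$: at least one of $\beta_2$ and $\beta_3$ is non-zero. This is a one-tailed test: $H_0$ is rejected only when F is large.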
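The test statistic F is calculated as follows:
$F = \frac{(RSS_R - RSS_U)/m}{RSS_U/(n - k)}$
Here RSS_R is the residual sum of squares of the restricted model; because the restricted model contains only an intercept, RSS_R equals the total sum of squares (TSS). RSS_U is the residual sum of squares of the unrestricted model, n is the number of observations, m is the number of restrictions and k is the number of parameters estimated in the unrestricted model.
Hence RSS_R = TSS = 878.039867, RSS_U = RSS = 114.827602, m = 2, n = 20 and k = 3, so
F = [(878.039867 - 114.827602)/2] / [114.827602/(20 - 3)] = 381.6061/6.7546 = 56.50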
The critical value of F(2, 17) at the 5% significance level is 3.59.
As F = 56.50 > 3.59, reject $H_0$.
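The F-test can be recomputed from the sums of squares as a check. This sketch hard-codes the 5% critical value of F(2, 17) as 3.59, taken from standard F tables rather than computed, so no statistics library is assumed.

```python
# Inputs from the regression output
RSS_R = 878.039867   # restricted model (intercept only): RSS_R = TSS
RSS_U = 114.827602   # unrestricted model: residual sum of squares
m, n, k = 2, 20, 3   # restrictions, observations, parameters

F = ((RSS_R - RSS_U) / m) / (RSS_U / (n - k))

# 5% critical value of F(2, 17) from standard F tables (assumed, not computed here)
F_CRIT = 3.59

print(f"F({m}, {n - k}) = {F:.2f}")
print("reject H0" if F > F_CRIT else "do not reject H0")
```

Because the restricted model has only an intercept, the same statistic can also be written as (R2/m)/((1 - R2)/(n - k)).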
Since the computed value of F exceeds the critical value, we reject the null hypothesis that GNI per capita and improved water source are jointly insignificant. Taken together, these two variables significantly improve the model's ability to explain life expectancy at birth, so both should be included in the model.
3. Performing t-tests to check whether each of the components has any significant effect (in the presence of other components) on the Life Expectancy value
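The t-tests for GNI per capita and improved water source are as follows:
For GNI per capita (two-tailed test): $H_0: \beta_2 = 0$ (GNI per capita has no effect on life expectancy, holding improved water source constant) against $H_1: \beta_2 \neq 0$.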
The test statistic is t = (coefficient - 0)/standard error = 0.0002669/0.0000548 = 4.87.
The critical value is t = 2.11 at the 5% significance level with 17 degrees of freedom.
Since t = 4.87 > critical t = 2.11, reject $H_0$.
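The t-test above can be checked the same way. This sketch uses the coefficient and standard error from the regression output; the two-tailed 5% critical value for 17 degrees of freedom (2.11) is taken from t tables and hard-coded, not computed.

```python
coef = 0.0002669   # estimated coefficient on GNI per capita
se = 0.0000548     # its standard error

t_stat = (coef - 0) / se   # test of H0: coefficient = 0

# Two-tailed 5% critical value, t distribution with 17 df (from tables)
T_CRIT = 2.11

print(f"t = {t_stat:.2f}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```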
So GNI does