Monday, May 4, 2015

Assignment 5: Regression Analysis

Part 1: Crime Rates and Lunches

y=21.819+1.685x
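
The fitted equation above can be evaluated for any value of x. The snippet below is a minimal sketch of that arithmetic; the function name `predict` and the input value are hypothetical, chosen only for illustration, and are not taken from the assignment data.

```python
# Coefficients taken from the fitted equation y = 21.819 + 1.685x.
INTERCEPT = 21.819
SLOPE = 1.685


def predict(x: float) -> float:
    """Return the value of y on the fitted line for a given x."""
    return INTERCEPT + SLOPE * x


# Hypothetical input, used only to illustrate the calculation.
example_x = 50.0
print(round(predict(example_x), 3))
```

Each one-unit increase in x raises the predicted y by the slope, 1.685.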

Null hypothesis: There is no linear relationship between crime rates and lunches.
Alternative hypothesis: There is a linear relationship between crime rates and lunches.
Reject the null hypothesis at a significance level of .05 (a 95% confidence level). If crime rates (X) were at 79.7, free lunches (Y) would be 2,930.

The data suggests that the two variables have a linear relationship, but the study does not mention why people are receiving free lunches. People who receive free lunches do not have enough money to pay for them; they have most likely fallen below the income bracket that qualifies them for a free lunch. If a school is giving out more free lunches, it has more students who cannot pay for their food. One cannot automatically assume that receiving free lunches causes people to break laws. It would be more informative to compare household income data with crime rates, because the number of free lunches is only an indirect measure of income.


Part 2:

Introduction

The purpose of the assignment is to spatially analyze enrollment data from the UW System. Two UW schools, Eau Claire and Madison, were chosen for the analysis. Although the reasons that an individual attends a given college are endless, this analysis looks at overall trends based on population, household income, and number of bachelor degrees.

Methods

Data was obtained for Wisconsin counties, including the number of bachelor degrees, population normalized by distance to the university, and household income. The data was opened in SPSS and run through a linear regression. Regression analysis is a statistical tool for investigating the relationship between two variables. It estimates how much one variable changes as the other changes, although a significant relationship does not by itself prove causation. The two variables are the independent and dependent variables. The independent variable is plotted on the horizontal axis (X) and is what explains the dependent variable. The dependent variable is plotted on the vertical axis (Y) and is what is explained by the independent variable.
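
As a rough illustration of what a simple linear regression computes, the sketch below fits a least-squares line and reports R square in plain Python. It is not the SPSS procedure used in this assignment; the function name `fit_line` and the sample numbers are hypothetical and are not the county data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a single predictor, R square is the squared correlation of x and y.
    r_squared = (sxy ** 2) / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical values, not the assignment's county data.
independent = [10.0, 20.0, 30.0, 40.0, 50.0]
dependent = [12.0, 19.0, 33.0, 38.0, 52.0]
intercept, slope, r_squared = fit_line(independent, dependent)
print(f"y = {intercept:.3f} + {slope:.3f}x, R^2 = {r_squared:.3f}")
```

An R square near 1 means the line explains most of the variation in the dependent variable; a value near 0 means it explains very little.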

For the analysis, enrollment at Eau Claire and enrollment at Madison are the dependent variables. Each is run through three separate linear regressions, one for each independent variable: the number of bachelor degrees, population normalized by distance to the university, and household income. Regressions that showed a significant linear relationship were run again to save the standardized residuals. A residual is the deviation of a point from the line of best fit; it is the difference between the actual and predicted value of y. The residuals were then opened in ArcMap and mapped using natural breaks.
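
The residual calculation described above can be sketched in plain Python. This assumes that a standardized residual is the raw residual divided by the standard error of the estimate, sqrt(SSE / (n - 2)); that is an assumption about the saved SPSS values, and the function name and sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def standardized_residuals(xs, ys):
    """Fit y = intercept + slope * x and return
    (standard_error_of_estimate, list_of_standardized_residuals)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Residual = actual y minus the y predicted by the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Standard error of the estimate, with n - 2 degrees of freedom.
    see = sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))
    # Assumption: standardized residual = residual / standard error.
    return see, [e / see for e in residuals]


# Hypothetical values, not the assignment's county data.
see, z = standardized_residuals([1, 2, 3, 4], [2, 4, 5, 9])
print(round(see, 3), [round(v, 3) for v in z])
```

A point with a large positive standardized residual has a higher actual value than the line predicts, which is how a county appears as a positive outlier on the residual maps.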


Results

Null hypothesis: There is no linear relationship between percent of bachelor degrees and Eau Claire enrollment.

Alternative hypothesis: There is a linear relationship between percent of bachelor degrees and Eau Claire enrollment.

Reject the null hypothesis with a significance value of .003. The R square value, however, shows that the linear relationship is weak. The standard error of the estimate is 209.611, which is very high and suggests that there are outliers. The map shows that the biggest outlier is Eau Claire County. This is likely because Eau Claire is a regional university: many of its students come from Eau Claire County itself.


Null hypothesis: There is no linear relationship between population by county and Eau Claire enrollment.

Alternative hypothesis: There is a linear relationship between population by county and Eau Claire enrollment.

Reject the null hypothesis with a significance value of .000 (that is, p < .001). The R square value shows that there is a strong linear relationship. The map shows mostly flat residual values, with a couple of counties that have higher enrollment than expected.

Null hypothesis: There is no linear relationship between percent of bachelor degrees and Madison enrollment.

Alternative hypothesis: There is a linear relationship between percent of bachelor degrees and Madison enrollment.

Reject the null hypothesis with a significance value of .000 (that is, p < .001). The R square value of .363 shows that there is a weak linear relationship. The map shows Dane County as another very large outlier, because many people from the area attend the school.



Null hypothesis: There is no linear relationship between household income and Madison enrollment.

Alternative hypothesis: There is a linear relationship between household income and Madison enrollment.

Reject the null hypothesis with a significance value of .001. The R square value is very weak at only .154, and the standard error of the estimate is high at 810.123. The map shows Dane County to be a large positive outlier and the counties around it to be negative outliers. This means that more people in Dane County attend Madison than their income levels predict.

Null hypothesis: There is no linear relationship between population and Madison enrollment.

Alternative hypothesis: There is a linear relationship between population and Madison enrollment.

Reject the null hypothesis with a significance value of .000 (that is, p < .001). The R square value of .902 shows a strong positive linear relationship.

Conclusion

The most significant variable for both Madison and Eau Claire enrollment was population normalized by distance. All other variables were significant but showed weak linear relationships because they had many outliers. It is interesting that the most significant variable is also the only normalized one. Proximity to the school plays a large role in who attends: the UW System has created many regional universities, and they have become quite popular within their regions.