Monday, May 4, 2015

Assignment 5: Regression Analysis

Part 1: Crime Rates and Lunches

y=21.819+1.685x

Null hypothesis:
There is no linear relationship between crime rates and lunches.
Alternative hypothesis: There is a linear relationship between crime rates and lunches.
Reject the null hypothesis at a .05 significance level (95% confidence level). If crime rates (X) were at 79.7, free lunches (Y) would be 2,930.

The data suggest that the two variables have a linear relationship, but the study does not mention why people are receiving free lunches. People who receive free lunches do not have enough money to pay for them; they have most likely fallen below an income threshold that qualifies them for a free lunch. If a school is giving more free lunches, it has more people who cannot pay for their food. One cannot assume that receiving free lunches causes people to break laws. It would be interesting to look at household income data and crime rates directly, because the number of free lunches is an indirect way of looking at income.


Part 2:

Introduction

The purpose of the assignment is to spatially analyze enrollment data from the UW System. Two UW schools, Eau Claire and Madison, were chosen for the analysis. Although the reasons that an individual attends a given college are endless, this analysis looks at overall trends based on population, household income, and number of bachelor's degrees.

Methods

Data was obtained for Wisconsin counties, including the number of bachelor's degrees, population normalized by distance to the university, and household income. The data was opened in SPSS and run through a linear regression. Regression analysis is a statistical tool for investigating the relationship between two variables: it predicts the effect of one variable on another, although a significant relationship on its own does not prove causation. The two variables are the independent and dependent variables. The independent variable is plotted on the horizontal (x) axis and is what explains the dependent variable. The dependent variable is plotted on the vertical (y) axis and is what is explained by the independent variable.
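The least-squares line that SPSS fits can be sketched in plain Python. This is a minimal illustration, not SPSS's implementation, and the five x/y pairs are hypothetical placeholders rather than the county data; the function returns the intercept, the slope, and R Square.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R Square: share of the variation in y that the line explains.
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical example: independent variable (x) and enrollment (y) for five counties.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b, r2 = fit_line(x, y)
print(f"y = {a:.3f} + {b:.3f}x, R Square = {r2:.3f}")
```

The same fitted line can then be used for prediction by plugging a new x value into a + b*x.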

For the analysis, enrollment at Eau Claire and enrollment at Madison are the dependent variables. Each was run through three separate linear regressions, one for each independent variable: the number of bachelor's degrees, population normalized by distance to the university, and household income. Regressions with significant linear relationships were run again to save the standardized residuals. A residual is the deviation of a point from the line of best fit: the difference between the actual and predicted value of y. A standardized residual divides that difference by its standard deviation, so values far from zero mark counties the line fits poorly. The residuals were then opened in ArcMap and mapped using natural breaks.
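Standardized residuals can be sketched the same way. This assumes the SPSS standardized residual is the raw residual divided by the standard error of the estimate (the square root of the residual sum of squares over n - 2); the county labels and values are hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Fit y = a + b*x by least squares and return each residual divided by
    the standard error of the estimate (assumed to match SPSS's version)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    # Residual = actual y minus predicted y.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate: n - 2 degrees of freedom (two fitted parameters).
    see = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return [r / see for r in residuals]

# Hypothetical counties: the last one enrolls far more students than the line predicts.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, 12.0, 14.0, 16.0, 18.0, 40.0]
for county, z in zip("ABCDEF", standardized_residuals(x, y)):
    print(county, round(z, 2))
```

The hypothetical county F gets the largest positive standardized residual, the kind of value that stands out as an outlier on a natural-breaks map, much like the county outliers described in the results.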


Results

Null hypothesis: There is no linear relationship between percent of bachelor's degrees and Eau Claire enrollment.

Alternative hypothesis: There is a linear relationship between percent of bachelor's degrees and Eau Claire enrollment.

Reject the null hypothesis with a significance of .003. The R Square, however, shows that there is a weak linear relationship. The standard error of the estimate is 209.611, which is very high: the points are widely scattered around the line, which suggests there are outliers. The map shows that the biggest outlier is Eau Claire County. This could be explained by Eau Claire being a regional university: many people who go to Eau Claire are from Eau Claire, so Eau Claire County is a large outlier.


Null hypothesis: There is no linear relationship between population by county and Eau Claire enrollment.

Alternative hypothesis: There is a linear relationship between population by county and Eau Claire enrollment.

Reject the null hypothesis with a significance level of .000. The R Square value shows that there is a strong linear relationship. The map shows residual values close to zero for most counties, with a couple of counties that have higher enrollment than expected.

Null hypothesis: There is no linear relationship between percent of bachelor's degrees and Madison enrollment.

Alternative hypothesis: There is a linear relationship between percent of bachelor's degrees and Madison enrollment.

Reject the null hypothesis with a significance level of .000. The R Square value of .363 shows that there is a weak linear relationship. The map shows Dane County as another very large outlier, because many people from the area attend the school.



Null hypothesis: There is no linear relationship between household income and Madison enrollment.

Alternative hypothesis: There is a linear relationship between household income and Madison enrollment.

Reject the null hypothesis with a significance value of .001. The R Square value is very weak at only .154, and the standard error of the estimate is high at 810.123. The map shows Dane County to be a large positive outlier and the counties around it to be negative outliers. This means that more people in Dane County attend Madison than their income levels predict.

Null hypothesis: There is no linear relationship between population and Madison enrollment.

Alternative hypothesis: There is a linear relationship between population and Madison enrollment.

Reject the null hypothesis with a significance value of .000. The R Square value of .902 shows a strong positive linear relationship.

Conclusion:

The most significant variable for both Madison and Eau Claire enrollment was population normalized by distance. All other variables were significant, but their linear relationships were weak because they had many outliers. It is very interesting that the most significant variables are also the only normalized numbers. Proximity to the school plays a large role in who attends: the UW System has created many regional universities, and they have become quite popular in their regional areas.


Friday, April 10, 2015

Assignment 4

Part 1



Null Hypothesis: Distance and sound level are not correlated.
Alternative Hypothesis: Distance and sound level are correlated.

Reject the null hypothesis because the significance is .000, which is less than .05 at a 95% confidence level.

Distance vs. Sound level has a strong negative correlation and the points are situated closely around the best fit line. 
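The correlation and its significance test can be sketched as follows. The distance and sound readings are hypothetical (the original data are not reproduced here); the t statistic with n - 2 degrees of freedom is the standard test of whether Pearson's r differs from zero, and the significance SPSS reports is the p-value for that t.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical readings: sound level (dB) falling off with distance (m).
distance = [10, 20, 30, 40, 50, 60]
sound = [80, 74, 70, 65, 61, 58]
r = pearson_r(distance, sound)
print(round(r, 3), round(t_statistic(r, len(distance)), 2))
```

A large negative r with a large negative t corresponds to the strong negative correlation and near-zero significance described above.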

Part 2
Some of the patterns I have noticed involve bachelor's degrees. There is a negative correlation between bachelor's degrees and percent black, percent with no high school diploma, and percent below poverty. There is also a positive correlation between bachelor's degrees and percent white; percent white is the only one of these variables with a strong positive correlation. Because correlation is symmetric, the reverse also holds: where the percent Hispanic, black, or below poverty is high, the percent with bachelor's degrees tends to be low.

There is also a racial divide: no race has a positive correlation with another race. This means that

Part 3
Introduction

The Texas Election Commission is interested in an analysis comparing the 1980 and 2008 presidential elections by county. They have provided all of the data necessary for both years, including voter turnout, percent Democratic vote, and percent Hispanic population.

The purpose of this study is to determine whether there is spatial autocorrelation in the data. If there is clustering, where does it occur, and how do the clusters of the different variables relate to one another?

Methods

A shapefile of Texas counties was obtained from the American FactFinder. Hispanic county data was also obtained and added to the Texas data sheet. This data sheet was joined to the Texas counties and exported as a shapefile. 

The data was then ready to be opened in GeoDa to perform spatial autocorrelation tests. Spatial autocorrelation is the correlation of a variable with itself through space. First, a Moran's I test was performed on each variable.



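Moran's I is a spatial autocorrelation test that compares the value of a variable at each location with the values at other locations, weighted by a spatial weights matrix that defines which locations are neighbors. The Moran's I scatterplot has four quadrants of comparison: high-high, low-low, high-low, and low-high.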
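A minimal sketch of global Moran's I, assuming simple binary (0/1) contiguity weights. GeoDa's own calculation typically row-standardizes the weights, so its values on real data will differ from this sketch; the four-area example is hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    values: attribute value for each area.
    neighbors: dict mapping an area index to the indices of its neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all weights: each neighbor link counts once per direction.
    w_sum = sum(len(nbrs) for nbrs in neighbors.values())
    # Cross-products of deviations between each area and its neighbors.
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / w_sum) * cross / sum(d ** 2 for d in dev)

# Hypothetical row of four counties: 0-1-2-3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1, 1, 5, 5], line))  # similar values side by side: positive
print(morans_i([1, 5, 1, 5], line))  # alternating values: negative
```

Values near +1 indicate clustering of similar values, values near -1 a checkerboard pattern, and values near -1/(n - 1) no spatial autocorrelation.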

Next, Local Indicators of Spatial Autocorrelation (LISA) maps were made for each variable. While Moran's I gives a single value for the whole study area, LISA computes a local statistic for each county, adding a spatial component that shows where clustering occurs. It uses spatial weights to map significant clusters.
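The LISA classification can be sketched as well. This assumes row-standardized weights and omits the permutation test GeoDa uses to decide which local values are significant, so every area gets a quadrant label here, whereas a LISA map shades only the significant ones. The two-group layout is hypothetical.

```python
def lisa(values, neighbors):
    """Local Moran's I and Moran scatterplot quadrant for each area.

    values: attribute value per area; neighbors: dict of area index -> neighbor indices.
    Weights are row-standardized (each of area i's k neighbors gets weight 1/k).
    No permutation test is run, so significance is NOT assessed here.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]       # deviations from the mean
    m2 = sum(d ** 2 for d in z) / n      # variance used to scale local I
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # average neighbor deviation
        local_i = z[i] * lag / m2
        if z[i] >= 0:
            quadrant = "High-High" if lag >= 0 else "High-Low"
        else:
            quadrant = "Low-High" if lag >= 0 else "Low-Low"
        results.append((local_i, quadrant))
    return results

# Hypothetical counties: 0-2 form a low-value group, 3-5 a high-value group.
values = [1, 1, 1, 5, 5, 5]
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
for county, (local_i, quadrant) in enumerate(lisa(values, nbrs)):
    print(county, round(local_i, 3), quadrant)
```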

Results

For percent Hispanic, there are two types of significant clusters, low-low and high-high. The high-high clusters are close to the border with Mexico, and the low-low clusters are farthest away from the border. The Moran's I value of .7787 shows strong positive spatial autocorrelation.

For the percent Democratic vote in 2008, there is a significant high-high cluster close to the border with Mexico and a significant low-low cluster along the northern border of the state. The Moran's I value of .6957 shows high positive spatial autocorrelation.

For the percent Democratic vote in 1980, there are two significant high-high clusters, one close to the southern border between Texas and Mexico and one near the northeastern border. There is a significant low-low cluster close to the northwestern border. The Moran's I value of .5752 shows strong positive spatial autocorrelation.

Voter turnout in 1980 shows a significant low-low cluster close to the southern border between Texas and Mexico. There is a significant high-high cluster in the northern part of the state, around both Dallas and Austin, TX. The Moran's I value is .3634, indicating moderate positive spatial autocorrelation.

Voter turnout in 2008 shows a significant low-low cluster close to the southern border between Texas and Mexico. There is a significant high-high cluster near the northern border and another around Austin, TX. The Moran's I value is .4681, indicating moderate positive spatial autocorrelation.

Figure 1



















Conclusion

From the comparison of the 1980 and 2008 elections, some interesting patterns have revealed themselves through both the LISA and Moran's I spatial autocorrelation tests. There is strong clustering of high Hispanic populations near the southern border between Texas and Mexico. Along this same border, there are clusters of low voter turnout and of high Democratic vote. This means that areas with clusters of high Hispanic population also tend to be clusters of low voter turnout and high Democratic vote.

Low-low Hispanic clusters also appear in the northern part of the state, excluding the Dallas, TX area. The Dallas area shows high voter turnout but no significant clustering of Democratic vote. The northwestern part of the state shows no significant Hispanic clustering, but significant clusters of high voter turnout and low Democratic vote.

If the TEC is trying to increase voting in the state, they should focus on the southern border between Texas and Mexico. There is a pattern of low voter turnout there, so getting these areas to vote would be beneficial, and the turnout there is mostly Democratic. This area also has clustering of Hispanic populations, which should also be taken into consideration.



Monday, March 16, 2015

Quantitative Methods- Assignment 3

1.

2.

Asian Long Horned Beetles
Null hypothesis: the number of this invasive species in a Bucks County sample does not differ from the Pennsylvania state average.

Alternative Hypothesis: the number of this invasive species in Bucks County differs from the Pennsylvania state average.

I reject the null hypothesis that there is no difference in the number of this invasive species between the Bucks County sample and the Pennsylvania state average. This is because the Z-score of the sample, -7.7519, falls outside the critical values of +/- 1.96.

Emerald Ash Borer Beetle 
Null hypothesis: the number of this invasive species in a Bucks County sample does not differ from the Pennsylvania state average.

Alternative Hypothesis: the number of this invasive species in Bucks County differs from the Pennsylvania state average.

I reject the null hypothesis that there is no difference in the number of this invasive species between the Bucks County sample and the Pennsylvania state average. This is because the Z-score of the sample, 9.249, falls outside the critical values of +/- 1.96.

Golden Nematode
Null hypothesis: the number of this invasive species in a Bucks County sample does not differ from the Pennsylvania state average.

Alternative Hypothesis: the number of this invasive species in Bucks County differs from the Pennsylvania state average.

I reject the null hypothesis that there is no difference in the number of this invasive species between the Bucks County sample and the Pennsylvania state average. This is because the Z-score of the sample, 2.47, falls outside the critical values of +/- 1.96.

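In conclusion, all three samples lead to rejecting the null hypothesis. The signs of the Z-scores show the direction of each difference: Bucks County has significantly fewer Asian Long Horned Beetles than the state average (Z = -7.7519), but significantly more Emerald Ash Borers (Z = 9.249) and Golden Nematodes (Z = 2.47). Something about Bucks County appears to make it less habitable for the Asian Long Horned Beetle but more habitable for the other two species.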
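The decision rule used for all three species can be sketched in Python. The z_score helper assumes the standard one-sample form, (sample mean - population mean) / (population standard deviation / sqrt(n)); the only real numbers below are the three Z-scores reported above.

```python
import math

def z_score(sample_mean, population_mean, population_sd, n):
    """One-sample z-score: how many standard errors the sample mean lies
    from the population mean (assumed form)."""
    return (sample_mean - population_mean) / (population_sd / math.sqrt(n))

def decide(z, critical=1.96):
    """Two-tailed decision at the 95% confidence level."""
    return "reject H0" if abs(z) > critical else "fail to reject H0"

# The Z-scores reported above for the three Bucks County samples.
for species, z in [("Asian Long Horned Beetle", -7.7519),
                   ("Emerald Ash Borer", 9.249),
                   ("Golden Nematode", 2.47)]:
    direction = "below" if z < 0 else "above"
    print(f"{species}: {decide(z)} ({direction} the state average)")
```

The sign of z carries the direction of the difference, which is why a two-tailed rejection alone does not say whether a county is more or less habitable.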


3.
Null hypothesis: The number of people per party did not differ in the intervening years.
Alternative hypothesis: The number of people per party differed in the intervening years.

t-score: 4.92

The critical t value for a one-tailed test at the 95% confidence level is 1.711. Because the t-score of 4.92 is greater than 1.711, the null hypothesis is rejected.



4.

Introduction

In this assignment I have been hired by the tourism board of Wisconsin to analyze the concept of "Up-North." Northern Wisconsin is home to many cabins and is where many people go to vacation for the summer. Understanding how tourism differs between northern and southern Wisconsin could lead to better marketing and planning for such activities.
   Fishing is the focus of this analysis. Many people go fishing, and northern and southern Wisconsin may differ in who is fishing there and how many people are.

Methods

The State of Wisconsin provided a broad dataset (SCORP) from which three variables were to be chosen. The chosen variables are state fishery areas, non-resident fishing licenses, and resident fishing licenses.
A shapefile of Wisconsin was obtained from the U.S. Census FactFinder and joined with the given dataset table. The three variables were broken down using natural breaks into four classes for statistical analysis and mapping.

These classes were added as another field and exported as a dBASE table for use with SPSS.

SPSS was used to run a chi-square analysis of each of the three variables against the northern vs. southern division. A chi-square test checks whether observed counts differ significantly from the counts expected if the two variables were independent. All three variables were tested at a 95% confidence level to determine significance.
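A sketch of the Pearson chi-square statistic that SPSS reports for a region-by-class table. The counts are hypothetical, and rather than computing an exact p-value the sketch compares the statistic with 7.815, the critical value for 3 degrees of freedom at the .05 level (a 2 x 4 table of north/south by four natural-breaks classes has (2 - 1)(4 - 1) = 3 degrees of freedom).

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts (e.g. rows = north/south,
    columns = natural-breaks classes).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected count if region and class were independent.
            expected = row_totals[r] * col_totals[c] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts of counties per class (not the SCORP data).
north = [6, 8, 7, 5]
south = [12, 10, 9, 15]
stat, df = chi_square([north, south])
# 7.815 is the chi-square critical value for df = 3 at the .05 level.
print(round(stat, 3), df, "significant" if stat > 7.815 else "not significant")
```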


Results

Tourism in northern Wisconsin (Figure 1) proves not to be different from tourism in the south, except for resident fishing licenses. State Fishery Areas (Figure 2) show that there are a few hot spots for fishing around Wisconsin, but they are not limited to the north. This is backed up by a chi-square test that fails to reject the null hypothesis that there is no difference between the north and south. The significance value of .192 is greater than .05, so there is no significant difference between the northern and southern acres of Wisconsin State Fishery Area locations.

The number of non-resident fishing licenses (Figure 3) shows popularity in both northern and southern counties. While the north may look like it has much more non-resident fishing, the difference is not significant. The chi-square test fails to reject the null hypothesis that there is no difference between non-resident fishing licenses in northern and southern Wisconsin. At a 95% confidence level, the significance value is .144. This number is greater than .05, so there is no significant difference.

The number of resident fishing licenses per county tells a different story (Figure 4). The map looks as though there is more resident fishing in southern counties. This is supported by a chi-square test that rejects the null hypothesis that there is no difference between northern and southern counties. Tested at a 95% confidence level, the significance value is just under .05 at .049. This means that there is a significant difference in resident fishing licenses, and the map suggests there are more in the south.


Figure 1
Figure 2
Figure 3

Figure 4


Conclusions

Fishing tourism in Wisconsin does not differ significantly between the north and south; only resident fishing does. This could be because there is no significant difference between State Fishery Areas. If I could investigate this further, I would test whether population per county and resident fishing licenses are correlated.

Thursday, February 26, 2015

Quantitative Methods- Assignment 2


Introduction

     In this assignment, I have been hired by an independent research consortium to study the geography of tornados in Kansas and Oklahoma. This topic is of interest because tornados are very common in these states. If there is a spatial pattern to where tornados touch down and how destructive they are in a given area, safety measures can be implemented in the places that need them most.
     This analysis compares two periods of time: 1995-2006 and 2007-2012. Some people argue that tornado patterns have not changed over the years, so places where tornados have always occurred should be required to build shelters. Others disagree, saying that because not every place sees tornados, shelters are a waste of time and money. This project looks at whether tornado patterns change over time and whether there are recurring patterns in touchdown locations and tornado size across the two states. This review will help answer whether storm shelters are a necessary precaution.

Methodology

      Two datasets of tornado locations and widths were received, for the years 1995-2006 and 2007-2012, along with a county-level shapefile of Kansas and Oklahoma combined. The first spatial statistical tool used is the mean center. The mean center is the average location of a set of points, calculated from the average of the x values and the average of the y values. A weighted mean center was also used; it is a mean center in which each point is weighted by a value, such as the frequency of grouped data. The mean center was found for both 1995-2006 and 2007-2012. The weighted mean center was also found for both datasets, weighted by tornado width. This study assumes that wider tornados are more destructive.
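The mean center and weighted mean center can be sketched in a few lines. This assumes projected x/y coordinates (averaging latitude and longitude directly would distort distances), and the four points and widths are hypothetical.

```python
def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def weighted_mean_center(points, weights):
    """Mean center in which each point counts in proportion to its weight
    (here, tornado width)."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y

# Hypothetical projected coordinates (m) and widths for four tornados;
# the two wide ones are at the southern edge (y = 0).
pts = [(0, 0), (0, 100), (100, 0), (100, 100)]
widths = [300, 50, 300, 50]
print(mean_center(pts))                   # (50.0, 50.0)
print(weighted_mean_center(pts, widths))  # pulled south toward the wide tornados
```

This is the same effect described in the results: wide southern tornados pull the weighted mean center south of the unweighted one.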
      The next spatial statistical tool used is standard distance. Standard distance is the spatial equivalent of the standard deviation. It measures the degree to which features are concentrated or dispersed around the mean center and is expressed as the radius of a circle. It is calculated around a mean center, which here is the weighted mean center. Standard distance was found for 1995-2006 and 2007-2012 at 1 standard deviation, both weighted by tornado width.
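Standard distance can be sketched the same way: it is the square root of the (weighted) average squared distance of each point from the (weighted) mean center. This follows the usual standard distance formula under the assumption of projected coordinates; the point sets are hypothetical.

```python
import math

def standard_distance(points, weights=None):
    """(Weighted) standard distance: radius of the circle summarizing how
    dispersed points are around their (weighted) mean center."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / total
    my = sum(w * y for (_, y), w in zip(points, weights)) / total
    # Weighted average squared distance from the mean center, then square root.
    var = sum(w * ((x - mx) ** 2 + (y - my) ** 2)
              for (x, y), w in zip(points, weights)) / total
    return math.sqrt(var)

tight = [(0, 0), (0, 10), (10, 0), (10, 10)]      # hypothetical concentrated tornados
spread = [(0, 0), (0, 100), (100, 0), (100, 100)]  # hypothetical dispersed tornados
print(standard_distance(tight), standard_distance(spread))
```

A more dispersed point set gives a larger radius, which is why the more spread-out 1995-2006 tornados produce the bigger circle.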
     Lastly, the standard deviation of tornado occurrences by county was found. The standard deviation measures how spread out values are around the mean. Classifying each county by how many standard deviations it lies from the mean shows which counties have many more tornados than average (far above the mean) and which have many fewer (far below the mean).

Results

The mean center and weighted mean center of the 1995-2006 data show that the mean center is farther north than the weighted mean center (Figure 1). This means that there is a tendency for larger tornados in more southerly locations. For 2007-2012, the weighted mean center is also farther south, and farther east, than the mean center (Figure 2). This means that there were larger tornados in the south and east, pulling the weighted mean center in that direction compared to the unweighted mean center of tornado locations. Comparing 1995-2006 and 2007-2012, both weighted mean centers are the farthest south (Figure 3). The mean center for 2007-2012 is also farther north than all of the other weighted and unweighted mean centers, meaning that tornados occurred more frequently farther north in 2007-2012, but they were not as big.



Figure 1
Figure 2
Figure 3

The standard distance was calculated for the two time periods, 1995-2006 (Figure 4) and 2007-2012 (Figure 5). These maps show 1 standard deviation around the mean center weighted by tornado width. Comparing the two standard distances (Figure 6) shows that 2007-2012 has a smaller radius than 1995-2006. This means that the 2007-2012 data is more concentrated around the weighted mean center than the 1995-2006 data. In 2007-2012, the widths and locations of tornados show two concentrations: one starting in the north and running through the weighted mean center, and one running through the south side of the standard distance. These concentrations are both near the weighted mean center, and there are not many tornados outside of the standard distance. In 1995-2006 there are many more tornados farther away from the mean center; they are spread much more widely across the states. The standard distance has to be bigger for 1995-2006 to account for the larger number of tornados occurring near the edges of the states.



Figure 4

Figure 5

Figure 6
The standard deviation map for 2007-2012 was also made (Figure 7). It shows where each county falls within a normal distribution, revealing patterns of counties with more or fewer tornados than average.



Figure 7

    Statistics of the data were also calculated. The Z-scores based on the number of tornadoes per county are 4.88 for Russell County, KS, 2.09 for Caddo County, OK, and .23 for Alfalfa County, OK. The average number of tornadoes per county is 4 and the standard deviation is 4.3. Russell County's very high Z-score of 4.88 means that it is 4.88 standard deviations above the mean; that county has many more tornados than average. Alfalfa County, on the other hand, with a Z-score of .23, is close to the average number of tornados because it is within 1 standard deviation of the mean.
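The Z-scores can be checked with a short script. The tornado counts for the three counties are not given in the text; the values 25, 13, and 5 below are back-calculated from the reported Z-scores, mean, and standard deviation, so treat them as assumptions.

```python
def z_score(count, mean, sd):
    """How many standard deviations a county's tornado count lies from the mean."""
    return (count - mean) / sd

MEAN, SD = 4, 4.3  # reported mean and standard deviation of tornados per county
# Counts back-calculated from the reported Z-scores (an assumption, not source data).
counties = {"Russell County, KS": 25, "Caddo County, OK": 13, "Alfalfa County, OK": 5}
for name, count in counties.items():
    print(name, round(z_score(count, MEAN, SD), 2))
```

Rounded to two decimals this reproduces 4.88, 2.09, and 0.23.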
    If the patterns hold true over the next five years in OK and KS, the number of tornados per county that will be exceeded 70% of the time is 1.764 (z = -0.52), and the number that will be exceeded only 20% of the time is 7.612 (z = 0.84).
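The exceedance values come from the inverse of the normal distribution. This sketch assumes, as the assignment does, that tornado counts per county are normally distributed with mean 4 and standard deviation 4.3. It uses exact z values from Python's statistics.NormalDist, so it gives about 1.745 and 7.619; rounding z to -0.52 and 0.84, as a printed z-table does, gives 1.764 and 7.612.

```python
from statistics import NormalDist

MEAN, SD = 4, 4.3  # tornados per county, as reported

def count_exceeded(fraction, mean=MEAN, sd=SD):
    """Tornado count per county exceeded `fraction` of the time, assuming
    counts are normally distributed."""
    z = NormalDist().inv_cdf(1 - fraction)  # z with `fraction` of the area above it
    return z, mean + z * sd

for p in (0.70, 0.20):
    z, count = count_exceeded(p)
    print(f"exceeded {p:.0%} of the time: z = {z:.2f}, count = {count:.3f}")
```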

Conclusions

The weighted mean centers for both time periods shift to the south, which shows that width plays a role: more southern locations have larger tornados. There is a larger standard distance radius for 1995-2006 because more tornados are spread along the outer edges of the states, whereas in 2007-2012 tornados are more concentrated around the weighted mean center. The standard deviations of counties show that there are patterns of more and fewer tornado occurrences by county. The z-scores show a large difference between counties in the frequency of tornados, which means that some counties would benefit more from shelters than others. Both time periods lean toward the south for larger tornados, so if shelters were to be put in, Oklahoma would benefit the most. Looking at the graduated symbols of tornados across both states, a large number of tornados happen almost everywhere, so for safety I would suggest that shelters are a necessity, especially around the weighted mean center.




Sunday, December 14, 2014

Lab 5 GIS 1

Goal
The goal of this lab was to solve a spatial problem of my own choosing. The spatial problem I chose was where I should look to live in Portland, Oregon.

Background
This fall I went to Oregon with one of my other geography classes. I fell in love with Portland and have talked about moving there after graduation. I chose some parameters for where I would live in Portland: proximity to bike lanes, average income of census tracts, not being near an airport, and living within Portland city limits. I do not have a driver's license because biking is my main mode of transportation, so being near bike infrastructure is vital for me. I found data for bike lanes on Portland's public data website. For income, I found data from the U.S. Census FactFinder. I also looked up the average income for a GIS technician on indeed.com and found that it was $37,000. Avoiding airports was for personal reasons; I have lived near one before and would rather not do that again. I found the data for counties in the MGIS folder on our class share drive.

Methods

Objective 1: Getting started
First I created a file geodatabase for the project called MultnomahCounty. I went to the websites above and downloaded all the data needed, including the bike lane shapefile, Multnomah County census tract shapefiles, the Portland city limits shapefile, and an ACS 5-year estimate census data table. Airport and county data were pulled from the Oregon folder in MGIS. I exported all of this data into the MultnomahCounty.gdb geodatabase I created. I then used the Project tool to change all of the files to the NAD_1983_UTM_Zone_10N projected coordinate system. Next, I joined the ACS 5-year estimate income data to the census tracts, added a field, used the Field Calculator to load median household income into that field, and then removed the join.

Objective 2: Beginning of Analysis

I then used Select By Attributes on the census tracts and queried for tracts with a median household income of less than $40,000, because the average income of a GIS technician is $37,000. I exported the selected data into a new layer. I then added a 100 meter buffer to the bike lanes. I used Intersect on the buffered bike lanes and the queried income data and called this layer WhereToLive. I then set a 4 mile buffer around the airport in the county and used the Erase tool to erase the airport buffer from WhereToLive. Finally, I used the Clip tool with the city limits to keep only the parts of WhereToLive inside the city limits of Portland. This gave me the areas where I should look to live, pictured in the map below.
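The Intersect, Erase, and Clip sequence is, geometrically, set logic on areas. As a simplified stand-in (not how ArcMap processes vector polygons), the sketch below models each layer as a set of hypothetical grid cells: Intersect keeps cells in both layers, Erase removes cells, and Clip keeps only cells inside the city limits. All layer names and extents are hypothetical.

```python
def cells(rows, cols):
    """All (row, col) grid cells in a rectangular block."""
    return {(r, c) for r in rows for c in cols}

# Hypothetical layers, each modeled as the set of grid cells it covers.
low_income_tracts = cells(range(0, 6), range(0, 6))  # tracts under $40,000
bike_lane_buffer = cells(range(2, 8), range(0, 10))  # within 100 m of a bike lane
airport_buffer = cells(range(0, 4), range(0, 3))     # within 4 miles of the airport
city_limits = cells(range(0, 10), range(0, 5))       # inside Portland

where_to_live = low_income_tracts & bike_lane_buffer  # Intersect
where_to_live -= airport_buffer                       # Erase
where_to_live &= city_limits                          # Clip (geometric intersection)
print(len(where_to_live), "candidate cells")
```

Because each step only narrows the candidate area, the order of Erase and Clip does not change the final result here; it matters only for run time on large layers.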

Objective 3: Create a Cartographically Pleasing Map

I exported the ArcMap file to an .AI document twice, once at a larger scale and once at a smaller scale. I then opened them in Adobe Illustrator and began my work. I changed the colors to make the selected areas stand out, using a bright orange against pastels. I also added an indicator map and a close-up of the data, which give the viewer some context for what they are looking at. I chose a basic sans-serif font because I intend this map to be viewed on a computer. My main goal when making a map is for it to be simple, clean, and understandable, which I achieved in this map.

Results
This map shows the areas where I should look to live. There is a large cluster of orange in downtown Portland. This seems to be a good area for me to live because it has many bike lanes, which would make getting around across many tracts easy.

Figures
This is the flow chart explained under methods.
Conclusion:

This project was a great way to end the semester. The goal is an easily understandable end result: a map that looks simple, clean, and not very complex to someone who does not know about GIS. Little do many know that the road to the end result is long and full of problems. My project may have seemed simple in the beginning, but it proved to be very complex. I was pulling and downloading data from many different sources. Overall, this project was a fun way to pull together all of the things I have learned this semester.

Sources

U.S. Census FactFinder: Tracts, Income Data, City Limits
Civicapps.org: Bike Lanes
ESRI Software: Counties, Airports


Wednesday, December 3, 2014

Lab 4 GIS 335


Goal
The goal of this lab was to map suitable habitats of bears in Marquette County, MI using a variety of tools in ArcMap.
Background
In this scenario, the DNR wanted to find suitable habitats and management areas for bears. Data was downloaded from the Michigan Center for Geographic Information: landcover, DNR management units, and streams.

Methods
I started by adding the Marquette bear study locations from a non-spatial database using X,Y coordinates (File > Add Data > Add XY Data). Once they were mapped, I exported the data into a feature class named bear_locations. I then added all the other shapefiles I would need, including streams and landcover. I performed a spatial join between bear locations and landcover, with bear_locations as the destination and landcover as the source, and named the new feature class bear_cover. Next, I figured out how many bear locations were within 500 m of streams by using Select By Location with streams and bear_locations and a distance of 500 m. Over 70% of the bear locations were within 500 m of streams, which is a large share. Bears live near streams for necessities such as food and water, so land within 500 m of streams is considered suitable. To find land suitable for bears, I added a 500 m buffer to the streams. I then intersected the buffered streams with an exported shapefile of the top 3 landcover types for bear locations. After that, I dissolved the results to remove internal boundaries. This gave me Suitable_habitats. I used the Clip tool to cut out the DNR management areas outside of the study_area. Finally, I intersected Suitable_habitats with the clipped DNR_mgmt boundaries to find the suitable habitats that the DNR can manage in its jurisdiction.
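Select By Location with a 500 m search distance can be sketched as a point-to-segment distance check. This assumes projected coordinates in meters (for example UTM) and straight stream segments; the stream and bear locations are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (projected units)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def share_within(points, segments, distance):
    """Fraction of points within `distance` of any stream segment."""
    near = sum(1 for p in points
               if any(point_segment_distance(p, a, b) <= distance for a, b in segments))
    return near / len(points)

# Hypothetical UTM coordinates (m): one straight stream and four bear locations.
stream = [((0, 0), (2000, 0))]
bears = [(100, 300), (1500, -450), (1000, 800), (2400, 0)]
print(share_within(bears, stream, 500))
```

Here three of the four hypothetical bears fall within 500 m, including the one just past the stream's end, which is why the distance is measured to the segment rather than to its endpoints or its infinite line.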

The DNR in this scenario liked the results but wanted more: they wanted to look at management areas at least 5 km away from urban and built-up areas. I went back to my data, selected by attribute all urban and built-up areas, and exported them to their own shapefile. I applied a 5 km buffer to this area and dissolved the internal boundaries. I used the Erase tool to remove the buffered urban and built-up areas from the DNR-manageable habitats I had found earlier. This gave me the final areas the DNR wants to look at.





Results



Suitable habitats are areas where the top 3 landcover types and 500 m proximity to streams intersect. The bear management areas are where suitable habitats and DNR management areas intersect, at least 5 km away from urban and built-up areas. What I have found potentially problematic is that almost no bears are located in the management areas. The management areas are suitable habitats, but for whatever reason bears have not chosen those specific habitats to live in. This will make studying these areas difficult because there are almost no bears to study. Perhaps by expanding the study area to all of Marquette County, the DNR would be able to cover more management areas that have bears to study.

Figures


Sources
Data was downloaded from the Michigan Center for Geographic Information: landcover, DNR management units, and streams.

Tuesday, October 28, 2014

Lab 3 GIS 1


Introduction

The goal of this lab was to create a map from found data. I used data from the U.S. Census. The lab walked me through making the first map, and the second map let me try it out for myself with data of my choice to compare with the first map.

Methods

      I went to the U.S. Census Data Finder and found Wisconsin county-level population data from 2010. I opened the metadata in Excel and saved it as a new .mlx file so it could be used in ArcMap. I loaded the metadata and the census data into ArcMap and joined the data together. I then changed the colors of the map to be more aesthetically pleasing and to represent the data better.
    After creating this map, I found more census data to work with: U.S. rural and urban housing data. I downloaded the data, which contained 6 different datasets I could map, and chose the population of urban housing in Wisconsin by county. I went through the same steps as for the first map, which created the map on the right.
    I went into layout view, where I added the legends, scale bars, and north arrows. I then exported the map to an .ai file and opened it in Adobe Illustrator. This is where I organized the legends and scale bars and added boxes behind the data and title text to make a presentable map that is easy to understand.

Results


The trends between urban housing and population seem to be similar. Madison and Milwaukee have the darkest areas on both maps; the only difference is that there is less urban housing in the counties surrounding these cities. This means that the urban sprawl of Madison and Milwaukee does not extend very far. There may be more people in these areas, but their housing is not considered urban.