Thursday, April 30, 2015

Regression Analysis

Part I:
The regression between crime rate and percentage of students getting free lunch has a significance level of .005, indicating that the relationship is statistically significant. The regression predicts that 48.5% of students will get free lunches when the crime rate is 79.7 per 100,000 people. I'm confident that there is a relationship; I just don't think that relationship is quite as strong as the local news station might be implying.

Part 1 












Part II:
Introduction:
The UW System wants to know why students choose the schools they attend. To analyze this, spatial regression analysis is performed on data on university enrollment and county population.

Methods:
Three separate regressions were run for each of two schools, Eau Claire and Milwaukee. For each regression, the null hypothesis states that there is no relationship between the two variables, and the alternate hypothesis states that there is a relationship between the two variables. The variables we tested against the number of students attending each school were: population divided by distance, percentage of the county population with a bachelor’s degree, and median household income per county.
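Each of these tests is a simple linear regression with one explanatory variable. As a rough sketch of what the software is doing (this is not the output or the method of the actual stats package used for the lab), the slope, intercept, and r² can be computed by hand. The function name and the sample numbers are made up for illustration only.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares and return (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical counties (NOT the lab data): population, distance to campus, students enrolled
population = [100000, 50000, 20000, 80000]
distance = [10, 50, 100, 200]
students = [900, 110, 30, 50]

# The "population divided by distance" explanatory variable
pop_over_dist = [p / d for p, d in zip(population, distance)]
slope, intercept, r_squared = simple_regression(pop_over_dist, students)
print(slope, intercept, r_squared)
```

An r² close to 1 means the explanatory variable accounts for most of the variation in enrollment, while an r² close to 0 means it accounts for very little, even if the test is significant.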

Results:
Figure 1
Figure 2
For both schools, only two of the three tests were significant: population divided by distance and percentage of the county population with a bachelor’s degree. Because the significance level of each of these tests was .005 or smaller, for each of these we REJECT THE NULL! The Eau Claire population divided by distance to student regression (Figure 1) has a significance level of .000 and an r² of .945, showing this relationship to be strong. The Eau Claire bachelor’s degree to student regression (Figure 2) has a significance level of .003 and an r² of .121, showing this relationship to be very weak. The Milwaukee population divided by distance to student regression (Figure 3) has a significance level of .000 and an r² of .922, showing this relationship to be strong. The Milwaukee bachelor’s degree to student regression (Figure 4) has a significance level of .001 and an r² of .160, showing this relationship to be weak.

The regressions of the number of students attending against median household income had significance values of .104 and .027 for Eau Claire (Figure 5) and Milwaukee (Figure 6), respectively. Because both of these significance levels are greater than .005, both FAIL TO REJECT THE NULL!
Figure 3


Figure 4
Figure 5
Figure 6
When looking at Residual Map 1, it can be seen that areas with larger populations (other than Milwaukee) have higher numbers of students attending Eau Claire than the regression would predict; however, most of the state closely follows the predicted regression. When looking at Residual Map 2, it seems that closer counties fall further above the regression than counties further away. Regardless of what the symbology of Residual Map 3 seems to indicate, the map shows that areas with higher populations (other than Milwaukee) fall further above the regression than closer areas with lower populations. Residual Map 4 shows small rural counties with smaller populations, along with counties closer to Milwaukee, falling above the regression. Across all of the maps, distance is the most common influence on school selection throughout the state. Percentage of the population with a bachelor’s degree has some influence, but would perhaps be more indicative if it were weighted by distance as well.
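The residual maps show how far each county sits above or below the regression line. One common way to put those differences on a comparable scale is to divide each residual by the standard error of the estimate. The sketch below assumes that is roughly what the maps display (the exact classification used by the mapping software is not stated here), and the numbers are made up for illustration.

```python
from math import sqrt


def standardized_residuals(x, y):
    """Fit a least-squares line and return each residual divided by the standard error of the estimate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted by the line
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate uses n - 2 degrees of freedom
    std_error = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    if std_error == 0:
        return [0.0 for _ in residuals]  # perfect fit: nothing deviates
    return [e / std_error for e in residuals]


# Hypothetical values (NOT the lab data)
x = [1, 2, 3, 4]
y = [2, 4, 6, 9]
z = standardized_residuals(x, y)
print(z)
```

Positive values are counties that send more students than the regression predicts, and negative values are counties that send fewer.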
Residual Map 1

Residual Map 2

Residual Map 3

Residual Map 4

Friday, April 10, 2015

Correlation and Spatial Autocorrelation

Part I:

1. 
Hypotheses:
Null hypothesis: there is no linear association between distance in feet and sound level in decibels (r = 0)
Alternate hypothesis: there is a linear association between distance in feet and sound level in decibels (r≠ 0)

Question 1
The Pearson correlation for distance and sound level is -.896. The magnitude of .896 indicates that the variables are strongly correlated, and the negative sign indicates that as distance increases the sound level decreases. The critical value at 8 degrees of freedom for a 95% significance level is 1.860, and the t-score is -5.71. Because the absolute value of the t-score is larger than the critical value, the null is rejected.
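The t-score above can be checked by hand with the usual formula for testing a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r²). The sketch below assumes n = 10 observations, which is what 8 degrees of freedom implies. The 1.860 critical value is the one quoted above (it matches a one-tailed t table at .05); the two-tailed value for df = 8, 2.306, is included as well since the alternate hypothesis is r ≠ 0. The null is rejected either way.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = -0.896
n = 10  # assumed: 8 degrees of freedom means n - 2 = 8

t = t_from_r(r, n)

critical_quoted = 1.860      # value used in the write-up (one-tailed, .05, df = 8)
critical_two_tailed = 2.306  # two-tailed, .05, df = 8

print(round(t, 2))
print(abs(t) > critical_quoted, abs(t) > critical_two_tailed)
```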


2.
The findings from the correlations show several patterns. The strong negative correlation between percent white and percent black of -.887 is one of the major reasons why Milwaukee is seen as one of the most segregated cities in North America. Most of the neighborhoods that have white residents have no black residents whatsoever, and there is a large separation between the two groups. The differences between the two groups become even more pronounced when the correlation between percent white and percent with a bachelor's degree is compared to the correlation between percent black and percent with a bachelor's degree, as they are both moderately strong, yet in opposite directions. Neighborhoods with a higher percentage of white population typically have less of their population living below the poverty line, as there is a -.767 correlation between the two. Unfortunately, the opposite holds true for percent black and population living below the poverty line, as there is a moderately strong positive correlation of .668 between the two. The correlation between percent white and percent Hispanic is almost identical to the correlation between percent black and percent Hispanic, at -.218 and -.246 respectively. It seems that almost every demographic group is just as likely to walk to work as the others, with the only notable correlation being a .354 positive correlation between the percent below the poverty level and the percent that walk to work. The correlation of percentage with no high school diploma varies across the groups, from a moderate negative correlation with percent white and percent with a bachelor's degree, to a high positive correlation with percent Hispanic and a moderate positive correlation with percent below the poverty line. This suggests that in the Hispanic neighborhoods, the percentage of the population with a diploma is lower than in other neighborhoods.
Question 2
Part II:

Introduction: 
The Texas election commission is analyzing the patterns of elections and wants to see if any of the election patterns are clustered. Furthermore, they want to determine if election patterns have changed over 20 years. I am to analyze the data, determine if there is spatial autocorrelation in the voting results, and, if the populations are indeed clustered, determine whether there are any correlations among the variables.

Methods: 
In order to test clustering in the election data, I used GeoDa to create several LISA maps and to calculate Moran's I for the variables. Next I used SPSS to create a correlation matrix of the variables.
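For reference, Moran's I compares each area's deviation from the mean to the deviations of its neighbors. The sketch below uses simple binary (neighbor or not) weights and made-up values. It is only meant to show the idea and is not how GeoDa is implemented; GeoDa's weights file and standardization may differ, so the numbers would not necessarily match. The function name and the toy layout of four areas in a row are assumptions for illustration.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    values: list of observations, one per area
    neighbors: dict mapping each area's index to a list of neighboring indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Total weight: every listed neighbor link counts as 1
    total_weight = sum(len(nbrs) for nbrs in neighbors.values())
    # Cross-products of deviations for every pair of neighbors
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    denom = sum(d * d for d in dev)
    return (n / total_weight) * (cross / denom)


# Four hypothetical areas in a row: 0-1-2-3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 1, 5, 5], neighbors)    # similar values sit next to each other
alternating = morans_i([1, 5, 1, 5], neighbors)  # high and low values alternate

print(clustered, alternating)
```

Values near 1 indicate that similar values sit next to each other (clustering), values near 0 indicate a random pattern, and negative values indicate that high and low values alternate.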

Results: 
My analysis has determined that all of the data is clustered, but not all of it is clustered to the same extent. The most clustered of all of the data was the percentage of Hispanic persons throughout Texas, with a Moran's I value of 0.7787 (graph 1, map 1). The percent of the population that voted Democratic in 2008 was the second most clustered data set, with a Moran's I of 0.6957 (graph 2, map 2). The voting turnout in 2008 was less clustered than the percentage that voted Democratic, with a Moran's I of only 0.3634 (graph 3, map 3). The 80s voting data differed from the 2008 voting data, but only slightly, with the percent Democratic having a Moran's I of 0.5752 (graph 4, map 4) and the voting turnout having a Moran's I of 0.4681 (graph 5, map 5).

When looking at the correlation matrix (graph 6) and the LISA maps, further results may be observed. The percent that voted Democratic in the 80s and the voting turnout percentage of a county in the 80s had a correlation of -.612, indicating that areas with a higher percentage of the population voting Democratic had fewer people actually vote. In 2008 the same comparison resulted in a correlation of -.604, suggesting that this trend hasn't changed much in the last 20 years. When comparing the Democratic voting percentages from the 80s to 2008, the resulting correlation is .540, suggesting that areas that were more Democratic in the 80s tended to still be more Democratic in 2008. When comparing the 2008 voting data to the percentage of the population that self-identifies as Hispanic, there is a correlation of .669 between percent Hispanic and percent that voted Democratic, suggesting that the Hispanic members of the population greatly favor the Democratic Party. When comparing the same Hispanic percentage to the voting turnout from 2008, there is a correlation of -.668, suggesting that while the Hispanic population favors the Democratic Party, they are less likely to vote.

Map 1
Graph 1



Map 2

Graph 2

Map 3

Graph 3

Map 4

Graph 4

Map 5

Graph 5

Graph 6

Conclusion: 
The LISA maps show that the voting patterns are indeed clustered, and lend some insight into where these patterns have changed in the last 20 years. When looking at maps 2 and 4, the viewer can see that Democratic voting was more clustered in 2008 than it was in the 80s. When looking at maps 3 and 5, the viewer can see that voting turnout has gotten less clustered in the last 20 years. When comparing map 1 to maps 2 and 3, it is clear how the areas of high Hispanic population correspond with areas of high Democratic voting percentages and low voter turnout.