For this week’s lab we compared four data classification methods and identified the most appropriate one for visualizing the spatial data provided. Our data source was the 2010 US Census tract data for Miami-Dade County, obtained from FGDL, and we were tasked with representing the population of seniors (65 and up) in two ways: as a percentage of each tract’s population, and normalized by area to show the number of senior citizens per square mile. To do this we created two maps, each comparing four classification methods: Natural Breaks, Equal Interval, Quantile, and Standard Deviation.
We were asked to explain how each classification
method differs, as well as which way of presenting the data (by percentage or per
square mile) gave a more accurate representation of the distribution of senior
citizens. I concluded that, since census tracts are designed to have roughly
uniform populations, the percentage of senior citizens was the best choice
for representing this data set. Normalizing by square mile favors
smaller census tracts, even when their senior population is significantly smaller
than that of a larger tract. To obtain a more accurate “per square mile”
comparison, I proposed dividing the county into units of equal area, which would
give a truer picture of population density.
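To make the difference between the two measures concrete, here is a minimal sketch in Python. The two tracts and all of their numbers are made up for illustration and are not taken from the census data; the function names are my own as well.

```python
# Hypothetical tracts: invented numbers, NOT from the Miami-Dade data.
tracts = [
    {"name": "small tract", "seniors": 300, "population": 4000, "sq_miles": 0.25},
    {"name": "large tract", "seniors": 900, "population": 4500, "sq_miles": 6.0},
]


def percent_seniors(tract):
    """Seniors as a percentage of the tract's total population."""
    return 100.0 * tract["seniors"] / tract["population"]


def seniors_per_sq_mile(tract):
    """Seniors normalized by the tract's area."""
    return tract["seniors"] / tract["sq_miles"]


for tract in tracts:
    print(
        tract["name"],
        f"{percent_seniors(tract):.1f}% seniors,",
        f"{seniors_per_sq_mile(tract):.0f} seniors per sq mile",
    )
```

In this made-up case the large tract has three times as many seniors and a much higher senior share, yet the small tract ranks far higher on the per square mile measure simply because its area is small.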
Of the four classification methods,
I believe that Natural Breaks gives the most accurate representation of the
data. The data contains an outlying tract in which 79% of the population
is 65 or older, which skews the distribution slightly to the right. The Natural
Breaks method accounted for this outlier while still dividing the remaining
tracts into classes that represent the data cleanly. The Equal Interval
method produced an empty class because of the outlier and grouped
the majority of observations in the lower classes. It failed to provide insight
into which tracts, other than the outlier, contained a higher concentration of
senior citizens. The Quantile method grouped the outlier with tracts that had
significantly lower values, which masked its status as an outlier and might lead
to inaccurate assumptions about the population in those tracts. Standard
Deviation is not the best choice because of the skewness of the data and
because of the presumed audience.
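The behavior described above can be reproduced on a small scale. The sketch below is my own illustration, not the algorithm the GIS software uses: the percentages are invented (with one outlier at 79 to mimic the outlying tract), the choice of four classes is an assumption, and the natural breaks function is a simple dynamic-programming version of the idea (minimize the squared deviation within each class). Standard Deviation is left out to keep it short.

```python
# Invented percentages of seniors per tract, with one outlier at 79.
sample = [5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22, 25, 28, 79]
K = 4  # number of classes: an assumption for this sketch


def equal_interval_classes(values, k):
    """Split the value range into k classes of equal width."""
    xs = sorted(values)
    lo, hi = xs[0], xs[-1]
    width = (hi - lo) / k
    groups = [[] for _ in range(k)]
    for x in xs:
        idx = min(int((x - lo) / width), k - 1)  # the maximum goes in the last class
        groups[idx].append(x)
    return groups


def quantile_classes(values, k):
    """Split the sorted values into k classes with (nearly) equal counts."""
    xs = sorted(values)
    n = len(xs)
    groups = [[] for _ in range(k)]
    for rank, x in enumerate(xs):
        groups[rank * k // n].append(x)
    return groups


def natural_breaks_classes(values, k):
    """Split sorted values into k contiguous classes that minimize the
    total squared deviation from each class mean (simple DP sketch)."""
    xs = sorted(values)
    n = len(xs)
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x squared
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations of xs[i:j] around its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    cut[c][j] = i
    groups, j = [], n
    for c in range(k, 0, -1):  # walk the stored cut points backwards
        i = cut[c][j]
        groups.append(xs[i:j])
        j = i
    return groups[::-1]


for name, fn in [("equal interval", equal_interval_classes),
                 ("quantile", quantile_classes),
                 ("natural breaks", natural_breaks_classes)]:
    print(name, fn(sample, K))
```

On this invented sample, equal interval leaves its third class empty, quantile places 79 in the same class as 22, 25, and 28, and the natural breaks sketch puts 79 in a class of its own.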
I found this exercise very helpful in solidifying my understanding of the lecture material, how each classification method works, and the circumstances under which each is best suited for creating an accurate representation of spatial data.
