Exploratory Analysis
The exploratory analysis consisted of visualizing 311 data from the City of Chicago Data Portal in order to understand it.
The data is aggregated at the Community Area level. The script munging/gen_comm_area_data.py reads in the JSON databases of 311 requests and generates a comma-separated file that lists the total number of requests per type for each of the 77 Chicago Community Areas. Additionally, this module uses the demographic information aggregated by Community Area.
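The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the actual contents of munging/gen_comm_area_data.py: the field names `community_area` and `type`, the helper names, and the sample records are all assumptions made for the example.

```python
import csv
import io
import json
from collections import Counter, defaultdict


def count_requests_by_area(records):
    """Return {community_area: Counter({request_type: count})}."""
    counts = defaultdict(Counter)
    for rec in records:
        area = rec.get("community_area")   # hypothetical field name
        req_type = rec.get("type")         # hypothetical field name
        if area is None or req_type is None:
            continue  # skip records missing either field
        counts[int(area)][req_type] += 1
    return counts


def write_counts_csv(counts, out):
    """Write one row per community area and one column per request type."""
    types = sorted({t for c in counts.values() for t in c})
    writer = csv.writer(out)
    writer.writerow(["community_area"] + types)
    for area in sorted(counts):
        writer.writerow([area] + [counts[area][t] for t in types])


# Made-up sample standing in for the JSON databases of 311 requests.
sample = json.loads("""[
  {"community_area": 1, "type": "pothole"},
  {"community_area": 1, "type": "graffiti"},
  {"community_area": 1, "type": "pothole"},
  {"community_area": 2, "type": "graffiti"}
]""")

counts = count_requests_by_area(sample)
buffer = io.StringIO()
write_counts_csv(counts, buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; a real run would pass an open file instead.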
First, we explored how the volume of requests varies over time. Some request types are clearly seasonal. Potholes are a good example; their monthly request volume is plotted in the figure below.
The time of the year clearly influences how many potholes are reported. More potholes are reported in the early spring, which could be explained by melting ice that penetrated street surfaces as winter ends. Other types of 311 requests show similar seasonal patterns.
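A monthly volume series like the one discussed here could be computed along these lines. This is only a sketch: the function name and the sample dates are invented for illustration, and it assumes each request carries a creation date.

```python
from collections import Counter
from datetime import date


def monthly_volume(request_dates):
    """Count requests per (year, month), sorted chronologically."""
    counts = Counter((d.year, d.month) for d in request_dates)
    return dict(sorted(counts.items()))


# Made-up creation dates for a handful of pothole requests.
sample_dates = [
    date(2011, 3, 4),
    date(2011, 3, 18),
    date(2011, 7, 1),
    date(2012, 3, 9),
    date(2012, 12, 24),
]

volume = monthly_volume(sample_dates)
# The month with the most requests in this sample.
peak = max(volume, key=volume.get)
```

Keying on (year, month) rather than on the month alone keeps separate years apart, so a seasonal peak has to recur in each year to show up as a pattern.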
Time of the week also influences request levels. For virtually all 12 request types, the volume of requests submitted during weekends is much lower than during weekdays. Again, we take potholes as an example:
This could be due to several factors. Citizens might be less concerned with reporting street problems during weekends. In addition, most of the City's work crews have weekends off, so they don't submit work orders on those days. We also know that some requests come from citizens through aldermanic offices, which submit them in batches during weekday business hours.
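One way to quantify the weekday versus weekend gap is to compare the average number of requests per day in each group. The sketch below assumes each request has a creation date; the function name and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def weekday_weekend_means(request_dates):
    """Return (mean requests per weekday, mean requests per weekend day).

    Only days with at least one request are counted, which is a
    simplification: days with zero requests are ignored.
    """
    per_day = Counter(request_dates)
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(weekday), mean(weekend)


# Made-up sample: 5 March 2011 was a Saturday, 7 March a Monday.
sample_dates = [
    date(2011, 3, 5),
    date(2011, 3, 7),
    date(2011, 3, 7),
    date(2011, 3, 7),
    date(2011, 3, 8),
]

weekday_mean, weekend_mean = weekday_weekend_means(sample_dates)
```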
In order to see whether the 311 data tells us something about Chicago, we carried out a visual analysis to check whether there is a correlation between income, race, and volume of requests by type. Below, we illustrate some significant results.
In the scatter plots above, each dot represents one of the 77 Chicago community areas. The x-axis is the median household income, and the y-axis is the total volume of 311 requests since January 2011, per 10,000 residents. The colors represent the race and ethnicity of the community area, as explained in the legend.
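The y-axis normalization is a simple rate calculation. As a small sketch (the function name and the numbers are made up for illustration):

```python
def requests_per_10k(total_requests, population):
    """Scale a raw request count to a rate per 10,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_requests * 10_000 / population


# Hypothetical community area: 1,500 requests and 30,000 residents.
rate = requests_per_10k(1500, 30000)
```

Normalizing by population makes community areas of very different sizes comparable on the same axis.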
We see a clear correlation in the graffiti plot: all Hispanic neighborhoods are contained in a narrow interval of income levels, and most of them present a remarkably high volume of graffiti removal requests. This could be due to several factors. According to a crime expert we talked to, predominantly Hispanic neighborhoods have elevated gang activity, and these groups tend to tag their territory by spray-painting walls. Moreover, Hispanic communities are known to be very tight-knit and interested in the state of their neighborhood, which might lead to an increased rate of clean-up requests.
Sanitation code complaints and vacant building reports show comparable trends: call rates are high and vary widely in poorer, predominantly Black neighborhoods, and are lower in wealthier ones. For similar income levels, predominantly Black neighborhoods have a higher rate of reported vacant buildings than Hispanic ones. This probably reflects a higher population density in the latter.
Requests for street lights out don't appear to be correlated with income. Nevertheless, neighborhoods with the same predominant race tend to cluster with each other, a trend that calls for more investigation.
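The analysis above was visual, but a numeric check could complement it. The sketch below computes a Pearson correlation coefficient between income and request rate; it is not part of the original analysis, and the function name and sample values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: median household income and requests per 10,000 residents.
incomes = [30000, 40000, 50000, 60000]
falling_rates = [120, 100, 80, 60]   # rate drops steadily as income rises
unrelated_rates = [10, 20, 20, 10]   # no linear relation with income

r_falling = pearson_r(incomes, falling_rates)
r_unrelated = pearson_r(incomes, unrelated_rates)
```

A coefficient near zero, as in the second case, would be consistent with a request type that is not linearly related to income, though it says nothing about clustering by other attributes.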