New York City Collisions Data-IPython Notebook and Plotting in CartoDB

I missed last Thursday’s STAT W4701 class due to being under the weather but was able to check it out via Columbia Video Network. Dr. Malecki posted a mini assignment where we could use NYC Crash Data. I’ve used this data before for mapping pedestrian and bicycle injuries and fatalities. This data used to be available via Transportation Alternatives: CrashStat but is now delivered via NYC Open Data.

Some Python Pandas in IPython Notebook

Converted to Markdown (of course). See my old blog post Using IPython Notebook with Pandas and exporting to Markdown.

Importing and cleaning the CSV

import pandas as pd

#csv_path = 'https://raw.githubusercontent.com/nygeog/data/master/nyc_crashes/data/NYPD_Motor_Vehicle_Collisions.csv'
inCSV = 'data/NYPD_Motor_Vehicle_Collisions.csv'
ouCSV = 'data/nypd_mv_collisions.csv'

df = pd.read_csv(inCSV).rename(columns=lambda x: x.lower())

#drop ones w/out valid lat #super lazy, just grabbing lat's above 35
df = df[(df.latitude > 35)]
#print df.dtypes 
print len(df.index)
df.head(5)

449644 #count of valid lat's records 

The original CSV data (first 5 records)

	date	time	borough	zip code	latitude	longitude	location	on street name	cross street name	off street name	...	contributing factor vehicle 2	contributing factor vehicle 3	contributing factor vehicle 4	contributing factor vehicle 5	unique key	vehicle type code 1	vehicle type code 2	vehicle type code 3	vehicle type code 4	vehicle type code 5
1	02/13/2015	21:45	MANHATTAN	10002	40.715622	-73.994275	(40.7156221, -73.9942752)	FORSYTH STREET	CANAL STREET	NaN	...	NaN	NaN	NaN	NaN	3168577	PASSENGER VEHICLE	UNKNOWN	NaN	NaN	NaN
2	02/13/2015	21:45	MANHATTAN	10001	40.747535	-73.988307	(40.7475349, -73.9883068)	WEST 31 STREET	BROADWAY	NaN	...	Fatigued/Drowsy	NaN	NaN	NaN	3169163	TAXI	TAXI	NaN	NaN	NaN
3	02/13/2015	21:45	BRONX	10462	40.833558	-73.857732	(40.8335582, -73.8577325)	WESTCHESTER AVENUE	PUGSLEY AVENUE	NaN	...	Unspecified	NaN	NaN	NaN	3169251	PASSENGER VEHICLE	SPORT UTILITY / STATION WAGON	NaN	NaN	NaN
4	02/13/2015	21:44	MANHATTAN	10017	40.748800	-73.969846	(40.7487997, -73.969846)	EAST 42 STREET	1 AVENUE	NaN	...	Other Vehicular	NaN	NaN	NaN	3169176	PASSENGER VEHICLE	PASSENGER VEHICLE	NaN	NaN	NaN
5	02/13/2015	21:40	STATEN ISLAND	10304	40.617295	-74.080479	(40.6172954, -74.0804791)	PARK HILL AVENUE	OSGOOD AVENUE	NaN	...	Unspecified	NaN	NaN	NaN	3169614	PASSENGER VEHICLE	PASSENGER VEHICLE	NaN	NaN	NaN

5 rows × 29 columns

Mergin’ the date (day) and time (hours)

#create datetime http://stackoverflow.com/questions/17978092/combine-date-and-time-columns-using-python-pandas
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])


df = df[['datetime','latitude','longitude']]
df = df[(df.datetime > '2014-01-01 00:00:01')] #query out only data from 2014 onward
df = df.sort('datetime')
df.to_csv(ouCSV,index=False)
print len(df.index)

192324 #count of records 2014+

df.head(5)

Clean and small data to import to CartoDB

	datetime	latitude	longitude
228854	2014-01-01 00:01:00	40.725432	-73.996771
228850	2014-01-01 00:01:00	40.767889	-73.981512
228855	2014-01-01 00:01:00	40.750844	-73.978608
229144	2014-01-01 00:15:00	40.588646	-73.992452
228849	2014-01-01 00:20:00	40.689019	-73.986157

The table on CartoDB’s PostGIS Server

IPython Notebook to Markdown

IPython nbconvert 01-read-crash-data.ipynb --to markdown

Maps in CartoDB

PostGIS SQL Statement to grab 2014 from the 2014-2015/02/13 Table

SELECT * FROM nypd_mv_collisions_2014_20150213 WHERE datetime BETWEEN '2014-01-01 00:00:00' and '2015-01-01 00:00:00'

Density Hex Bins of 2014 Collisions

I’m not totally sure if when creating the classes if it ignores hex bins with no collisions or what. Need to look into that.

Density of 2014 Collisions over time (using Torque)

Animation of Density of Collisions. I think these are Kernel Densities but not sure, need to check CartoDB documentation.

Valentine’s Day 2014 Collisions (Clickbait)

I hate Clickbait. :)

Parting Note:

While informative, crash data doesn’t really tell us anything interesting about street design and whether specific places are more prone to crashes as without a denominator (such as traffic and/or pedestrian and/or cyclists) its difficult to discern if there’s just more targets for collisions (like counting homicides vs. homicides per 10,000 people) or if street design (or in the case of homicides, higher homicide rate) is actually to blame.

New York City has some traffic estimates but nothing we can attribute to every street over the whole city. There may be something that NYC DOT or someone else has, but until its FOIL’ed we can’t really create that denominator without more reliable data.