For work, a while back, I used NAICS codes to compute commercial business density by Census Tract (2010) and trained a Decision Tree Regressor model to predict the low desk price at WeWork locations (see map below). My colleague Stuart Lynn helped review my code and suggested the ML method. WeWork has opened some new offices since I did this about a year ago, so I'm curious how closely my predicted prices match the new locations. More on that later, hopefully.
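
Rolling NAICS codes up into business density per tract is the core feature-engineering step here. Below is a minimal Python sketch, not the code behind the WeWork model: the business records, tract GEOIDs and land areas are hypothetical, and it assumes each business already carries a tract ID and a 6-digit NAICS code, which it truncates to the 2-digit sector.

```python
from collections import defaultdict

# Hypothetical input: one record per business, tagged with a tract GEOID and a NAICS code.
businesses = [
    {"tract": "36061000100", "naics": "722511"},  # full-service restaurant
    {"tract": "36061000100", "naics": "445110"},  # supermarket / grocery
    {"tract": "36061000100", "naics": "722513"},  # limited-service restaurant
    {"tract": "36061000201", "naics": "541511"},  # custom computer programming
]

# Hypothetical land areas in square miles, keyed by tract GEOID.
tract_area_sq_mi = {"36061000100": 0.5, "36061000201": 0.25}


def sector_density(records, areas, digits=2):
    """Count businesses per (tract, NAICS prefix) and divide by tract land area."""
    counts = defaultdict(int)
    for rec in records:
        counts[(rec["tract"], rec["naics"][:digits])] += 1
    return {key: n / areas[key[0]] for key, n in counts.items()}


density = sector_density(businesses, tract_area_sq_mi)
print(density)
```

Each (tract, sector) density then becomes one column in the model's feature table; with these sample records, the food-service sector (72) in the first tract works out to 4 businesses per square mile.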

The Proposal

I’d like to train a Decision Tree Regressor to predict walkability in New York City from OpenStreetMap-derived variables, including POI feature tags, street form and density, with results that closely match the BEH Neighborhood Walkability Index. After that, I’d like to apply the model to the rest of the United States and see whether it can scale, so that walkability measures could be applied globally wherever OpenStreetMap data is adequate.
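
scikit-learn's DecisionTreeRegressor handles the training in a couple of calls, but the idea is simple enough to sketch from scratch: repeatedly split tracts on the feature threshold that most reduces the squared error of the target, then predict the mean target of whichever leaf a tract falls into. The features (POI density, intersection density) and all values below are hypothetical, and this is a simplified stand-in rather than scikit-learn's implementation (which, for example, places thresholds at midpoints between observed values).

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Find the (feature index, threshold) pair that most reduces total SSE, or None."""
    best, best_score = None, sse(y)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (j, threshold), score
    return best


def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2):
    """Recursively grow a regression tree; leaves store the mean target value."""
    if depth >= max_depth or len(y) < min_samples_split:
        return mean(y)
    split = best_split(X, y)
    if split is None:
        return mean(y)
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (
        j,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_samples_split),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_samples_split),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return that leaf's mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Hypothetical tract features: [POI density, intersection density]
X = [[10, 50], [12, 55], [80, 200], [90, 220], [40, 120], [45, 110]]
# Hypothetical walkability scores for the same tracts
y = [-1.0, -0.8, 1.5, 1.7, 0.3, 0.2]

tree = build_tree(X, y, max_depth=2)
print(predict(tree, [85, 210]))
```

Capping `max_depth` is the main guard against a tree that memorizes individual tracts; in practice it would be tuned against held-out tracts rather than fixed by hand.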

Predicting WeWork Desk Prices

WeWork Locations - Predicted (low) Desk Price - DecisionTreeRegressor


Below is some text from the BEH Neighborhood Walkability Index page on the BEH website. I worked for this research group at Columbia University for 6.5 years, so I think this would be a fun exercise to apply machine learning to.

We have conducted a series of analyses investigating whether “neighborhood walkability” is associated with lower body mass index (BMI), greater levels of physical activity and more pedestrian activity among residents of New York City.

Urban planners refer to “neighborhood walkability” as the extent to which neighborhood design supports walking, and they describe neighborhood walkability in terms of “the D’s” – density, diversity, design, destination accessibility and distance to transit. Density refers to attributes of interest per geographic area, diversity refers to the mix of land uses, design pertains to the layout of the street grid, destination accessibility is the availability of destinations to travel to such as stores, and distance to transit is the physical distance to public transportation. Additional neighborhood characteristics such as aesthetics and safety can also promote walking and are often described as being part of neighborhood walkability.

Our initial studies funded by NIEHS showed that indicators of neighborhood walkability described in the urban planning literature – population density, land use mix and access to public transit – were associated with lower BMI among adults and higher levels of physical activity in children [1, 2, 3]. However, further analyses showed that associations between these indicators of neighborhood walkability and BMI were only apparent in more socioeconomically advantaged individuals [4], a finding that is fairly consistent across the literature [5]. We were also able to show that measures of neighborhood aesthetic qualities were associated with lower BMI.
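
Two of the D's above translate directly into numbers. A common way to quantify diversity is a normalized Shannon entropy of land-use shares, LUM = -Σ p_i ln(p_i) / ln(k) for k categories, which is 0 for a single use and 1 for an even mix; design is often proxied by intersection density, counting street-network nodes where three or more segments meet. The Python sketch below assumes hypothetical inputs (category counts per tract, and a street network given as undirected node-pair edges) and is not necessarily how the BEH index itself defines these measures.

```python
import math
from collections import Counter


def land_use_mix(counts, k=None):
    """Normalized Shannon entropy of land-use counts: 0 = single use, 1 = even mix.

    k is the number of land-use categories considered; it defaults to the
    number of keys in `counts` (zero-count categories included).
    """
    k = len(counts) if k is None else k
    total = sum(counts.values())
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)
    return entropy / math.log(k)


def intersection_density(edges, area, min_degree=3):
    """Street-network nodes where at least `min_degree` segments meet, per unit area."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for d in degree.values() if d >= min_degree) / area


# Hypothetical tract with equal residential and commercial counts (an even mix)
print(land_use_mix({"residential": 5, "commercial": 5}))
# Hypothetical street network: one four-way intersection (node 0) plus four dead ends, 0.5 km^2
print(intersection_density([(0, 1), (0, 2), (0, 3), (0, 4)], area=0.5))
```

Dead ends and simple bends (degree 1 or 2) are deliberately excluded, since they add street length without adding route choices for a pedestrian.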

Dr. Rundle spoke at an ISBNPA webinar this past March, and his talk offers useful background on walkability in New York City.


I keep looking for useful applications of OpenStreetMap data, and I think that by using tagged POI and street form features it may be possible to accomplish this.
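
As a rough illustration of the POI side, OpenStreetMap features carry key=value tags (for example amenity=cafe or shop=bakery) that can be bucketed into destination categories and counted per tract. The category names and the tag-to-category mapping below are hypothetical choices for illustration, not a standard taxonomy, and the sketch assumes each feature has already been joined to a tract.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from OSM tag keys/values to destination categories.
# A value of None means "any value of this key".
CATEGORY_RULES = [
    ("food", {"amenity": {"restaurant", "cafe", "fast_food", "bar"}}),
    ("retail", {"shop": None}),
    ("transit", {
        "railway": {"station", "subway_entrance"},
        "public_transport": {"station", "platform"},
        "highway": {"bus_stop"},
    }),
    ("recreation", {"leisure": {"park", "playground", "sports_centre"}}),
]


def classify(tags):
    """Return the first matching category for an OSM tag dict, or None."""
    for category, rules in CATEGORY_RULES:
        for key, allowed in rules.items():
            value = tags.get(key)
            if value is not None and (allowed is None or value in allowed):
                return category
    return None


def count_by_tract(features):
    """Count categorized POIs per tract; features are (tract_id, tags) pairs."""
    counts = defaultdict(Counter)
    for tract, tags in features:
        category = classify(tags)
        if category is not None:
            counts[tract][category] += 1
    return counts


# Hypothetical features already joined to tracts
features = [
    ("tract_a", {"amenity": "cafe", "name": "Corner Cafe"}),
    ("tract_a", {"shop": "bakery"}),
    ("tract_a", {"highway": "bus_stop"}),
    ("tract_a", {"highway": "residential"}),  # a street, not a POI: ignored
    ("tract_b", {"leisure": "park"}),
]
print(dict(count_by_tract(features)))
```

Rule order matters: a feature matching several rules lands in the first category listed, so the ordering of `CATEGORY_RULES` is itself a modeling decision.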

To be continued…

In hopes of getting back into blogging and doing more ad hoc data research for fun, I plan to continue this project and document it on this blog. Thanks!