How we did the Storefront Index

We’ve received many questions on how we did the analysis behind our Storefront Index. This post will describe our dataset, our method, and how we created our visualizations. We hope that this will spur future research and new forms of visualizations, similar to the way in which the release of our Lost In Place data led to amazing reinterpretations of the dataset.


We used a database of businesses from Custom Lists American Business Directory. The Directory contains 2014 records on US businesses, including their industry classification and address. Our aim was to understand how clusters of these quasi-private storefront spaces contribute to active streetscapes and generate steady flows of people, so we filtered our business dataset using three criteria: 1) businesses in the largest metro areas; 2) businesses that have storefronts; and 3) a spatial filter based on clustering.

For the first filter, we simply chose businesses in the largest metro areas based on the 2013 definitions of CBSAs. For the second filter, we selected businesses that fit into one of 44 industry classifications that would typically have a customer-serving storefront. These include businesses like grocers, bookstores, and salons. A full list of our categories can be found on page 18 of the report.

Armed with a storefront business dataset, we next sought to find clusters of storefronts. Thus, for each business with a storefront, we needed to know the distance to the next closest storefront. Our first task was to geocode the addresses, turning the address into a latitude and longitude that we could map. (Luckily, we performed this geocoding in ArcMap just before they cut off access to their geocoding API.) We then used the NEAR function in GIS software, allowing us to calculate the distance to the next closest storefront in meters. To apply our filter, we then chose only storefronts that had another storefront within 100 meters, allowing us to identify clusters of destinations that would be easily walkable.
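The clustering filter can be sketched in a few lines of Python. This is only an illustration of the idea, not the ArcMap workflow we ran: it assumes the addresses are already geocoded, it substitutes a brute-force great-circle (haversine) distance for the GIS NEAR function, and the coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_distance_m(index, points):
    """Distance from points[index] to the closest other point (brute force)."""
    return min(
        (haversine_m(points[index], p) for j, p in enumerate(points) if j != index),
        default=math.inf,  # a lone point has no neighbor
    )


def clustered(points, threshold_m=100.0):
    """Keep only points that have another point within threshold_m meters."""
    return [p for i, p in enumerate(points) if nearest_distance_m(i, points) <= threshold_m]


# Hypothetical storefront coordinates (lat, lon), not taken from our dataset.
storefronts = [
    (45.5200, -122.6800),
    (45.5205, -122.6800),  # a short walk north of the first point
    (45.5400, -122.6500),  # isolated, more than 2 km from the others
]
kept = clustered(storefronts)  # the isolated third point is dropped
```

The brute-force comparison is quadratic in the number of storefronts, so a real run over a whole metro area would want a spatial index; GIS tools handle that internally.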

With our three filters applied, we created a set of map images (for the report) and an interactive map. We used a 3-mile buffer around the central business district of our metro areas of interest. For the images, we highlighted these buffers in white, and used only the points of our clustered storefront locations and the US Census Bureau’s Roads shapefile. For the interactive web map, we used circles to represent the 3-mile buffers, the points of our clustered storefront locations, and the Mapbox library with the Stamen Toner basemap.

Our list of industry classifications (using Standard Industry Classifications) can be found here and GeoJSON shapefiles for each metro area (using FIPS codes for each metro) can be found here.

Tracking Neighborhood Change: How we made “Lost In Place”

In this post, we’ll go over the data and mapping steps that were used to create our Lost In Place report on the concentration of poverty and the interactive web map. This post is one of several commentary posts that accompany the report, including an examination of how poverty has deepened.

Data for our report is provided at the Census Tract geography for US Metropolitan Statistical Areas (MSAs) with a 2010 population of over 1 million people (51 in total). Our online map and report are based on two reported data points across five Census years: population and poverty levels in 1970, 1980, 1990, 2000 and 2010. Unfortunately for data analysis, Census Tract boundaries changed each census year between 1970 and 2010 as the geography of people changed. Fortunately, John Logan of Brown University and his colleagues released the Longitudinal Tract Database (LTDB), in which they estimated tract-level Census counts from historical Census data from 1970 through 2010 using Census 2010 tract boundaries.

Two additional steps were necessary. We needed to determine which Census Tracts were part of our MSAs of interest, and, in order to create maps and determine which tracts are within 10 miles of each MSA’s central business district (CBD), we needed to merge the Census tract data with the corresponding Census tract polygons. First, each 2010 Census Tract number is composed of a 2-digit state identifier, a 3-digit county identifier, and a 6-digit tract identifier. For example, Census Tract number 41051010600 can be decomposed into State 41 (Oregon), County 051 (Multnomah) and Tract 010600. Using the Census Metropolitan Statistical Area Definition Files, we see that any Census Tract that starts with 41051 (often referred to as the FIPS State-County code) is within MSA 38900 (Portland-Vancouver-Hillsboro, OR-WA). Using this information, we filtered our tract list to only those within our MSAs of interest.
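As a rough illustration of the tract-number decomposition and the MSA filter, here is a minimal Python sketch. The lookup table is a hypothetical one-entry excerpt (only the 41051 to 38900 mapping comes from the example above), the function names are ours for illustration, and the second tract ID is included only as a non-matching case.

```python
def split_tract_id(tract_id):
    """Split an 11-digit 2010 tract number into (state, county, tract) strings."""
    # Restore leading zeros that are lost if the ID was read as a number.
    tract_id = str(tract_id).zfill(11)
    if len(tract_id) != 11 or not tract_id.isdigit():
        raise ValueError(f"expected an 11-digit tract number, got {tract_id!r}")
    return tract_id[:2], tract_id[2:5], tract_id[5:]


# Hypothetical excerpt of a State-County to MSA lookup; a real table would be
# built from the Census MSA definition files.
COUNTY_TO_MSA = {"41051": "38900"}


def msa_for_tract(tract_id):
    """Return the MSA code for a tract, or None if its county is not listed."""
    state, county, _ = split_tract_id(tract_id)
    return COUNTY_TO_MSA.get(state + county)


tracts = ["41051010600", "41003000100"]  # second ID is only a non-matching example
portland_tracts = [t for t in tracts if msa_for_tract(t) == "38900"]
```

Keeping the identifiers as strings matters: states with codes below 10 start with a zero, which disappears if the column is parsed as an integer.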

Second, using only this filtered set of Census Tracts, we matched the tract numbers in our data to the tract numbers in the Census 2010 tract shapefiles. Using a list of CBDs for each of the metro areas, we calculated the distance from the CBD to the nearest point of each Census Tract polygon in that MSA. Once this spatial relationship was calculated, we could calculate totals for the core of each MSA and for the MSA as a whole.
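The distance step can be sketched as follows. This is a simplified stand-in for what GIS software does, under several assumptions: coordinates are already projected to a planar system in meters, each tract is a single ring without holes, and the tract names and shapes are made up for the example.

```python
import math

MILE_M = 1609.344  # meters per mile


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def contains(ring, p):
    """Ray-casting test: is point p inside the polygon ring?"""
    px, py = p
    inside = False
    for (ax, ay), (bx, by) in zip(ring, ring[1:] + ring[:1]):
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def distance_to_polygon(ring, p):
    """Distance from p to the nearest point of the polygon (0 if p is inside)."""
    if contains(ring, p):
        return 0.0
    return min(point_segment_distance(p, a, b) for a, b in zip(ring, ring[1:] + ring[:1]))


cbd = (0.0, 0.0)  # hypothetical CBD location in projected meters
tracts = {
    "downtown": [(-500, -500), (500, -500), (500, 500), (-500, 500)],
    "near": [(1000, 0), (2000, 0), (2000, 1000), (1000, 1000)],
    "far": [(20000, 0), (21000, 0), (21000, 1000), (20000, 1000)],
}
core = [name for name, ring in tracts.items() if distance_to_polygon(ring, cbd) <= 10 * MILE_M]
```

Measuring to the nearest point of the polygon, rather than to its centroid, means a tract counts as part of the core as soon as any part of it falls within the 10-mile radius.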

Using the existing literature, we developed the typology of tracts in poverty featured in the report. We used QGIS to create GeoJSON-based shapefiles with the geographic data in them. This allows us to host the files on GitHub, making them available for download. But first, we needed to shrink the size of the shapefiles in order to serve an entire metro area’s tract files quickly. We did this by simplifying the geometry using rgeos in R. Additionally, by having a separate shapefile for each metropolitan area, we can load each metro area’s shapefile quickly and dynamically. To create the interactive maps for each metro, we used a combination of Mapbox.js and the Stamen Design basemaps.
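To give a sense of what geometry simplification does, here is a small Python sketch of the Douglas-Peucker algorithm, a common line-simplification method. It is a stand-in for illustration only: our actual simplification was done with rgeos in R, and whether this sketch matches its behavior and tolerance settings exactly is an assumption. The sample coordinates are made up.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:  # a and b coincide, e.g. a closed ring
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplify(points, tolerance):
    """Drop vertices that deviate from the overall shape by at most tolerance."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the endpoints.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [points[0], points[-1]]
    # Keep that vertex and simplify the two halves on either side of it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical boundary fragment: the small wiggle at (1, 0.05) is removed.
boundary = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = simplify(boundary, tolerance=0.1)
```

A larger tolerance removes more vertices and produces smaller files, at the cost of boundaries that follow the original tract shapes less closely.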

We hope that both the data and the analysis that we develop at City Observatory help advance the understanding of cities.  Please feel free to contact us with your questions and comments.

Our dataset can be downloaded here.


At City Observatory, we strive to make data the driving force behind our operations. We know that many of you share our keen interest in digging through the data, and we strongly believe that everyone benefits when data sources and methods are as transparent as possible. In the spirit of open data, we’ve created this page as a one-stop shop for the data we’ve used to generate our CityReports. We invite you to download and use this data in your city to further explore the factors that drive city success.

If you have any further questions, please don’t hesitate to email

Young and Restless

Our Young and Restless report provides data on the number of four-year college graduates aged 25-34, and aged 25 and older, for the nation’s 51 largest metropolitan areas and for close-in urban neighborhoods in those metros. Data are from Census 2000 and the American Community Survey. Data can be downloaded here.

Posts: The report is here, and the overview blog post is here.


Lost in Place

Our Lost in Place data is a subset of the Brown University US 2020 Longitudinal Tract Data Base.  We present tract level data on population and poverty for 1970, 1980, 1990, and 2000 for areas within 10 miles of the center of the nation’s 51 largest metropolitan areas.

Data can be downloaded here.

Posts: The report is here, and the overview blog post is here. An individual city dashboard is featured here, and maps for each metro are available here.

Other content: A post explaining how we did our analysis is here, our technical appendix here, and a deeper dive into the data is here and here.


Surging Center City Growth

We used the Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) dataset to compile employment statistics for 41 of the nation’s 51 largest metropolitan areas for the years 2002, 2007, and 2011. Here we report data for the city center of each metro (the area within a 3-mile radius of the center of the region’s major central business district). Our techniques and methodology are spelled out in the appendix to this report.

Data can be downloaded here.

Posts: The report is here, the overview blog post is here, and the city dashboard (comparing individual cities to the whole sample) is here.

List of Companies Moving to the City Center:

Metro: Companies
Atlanta: Coca-Cola, NCR
Austin: Cirrus Logic
Boston: Acquia, Biogen/IDEC
Chicago: Archer Daniels Midland, Motorola, Hillshire Brands, United
Cincinnati: Omnicare
Dallas: Active Network
Detroit: Quicken Loans, Blue Cross Blue Shield, Fifth Third Bank
Kansas City: MindMixer
Las Vegas: Zappos
Nashville: Bridgestone
New York: UBS, Hugo Boss
Pittsburgh: Jawbone, Michael Baker, True Fit
San Diego: Bumble Bee Seafoods
San Francisco: Pinterest, VISA, Yahoo
Seattle: Amazon, Tableau, Weyerhaeuser