Using county-level data, it depends on who’s classification system you use
Counties may not be the right basis for diagnosing the contributors to Covid.
One of the oft-repeated claims in the pandemic is the notion that cities and density are significant contributors to the risk of being infected with the Covid-19 virus. Some of this, we have argued, is based on a deep-seated (and wrong-headed) prejudice associating cities with communicable diseases (the “teeming tenement” meme). But beyond base beliefs, what does the data show?
Because in the United States, the public health function is administered chiefly through counties, the nationwide data on the prevalence of Covid-19 cases and deaths is reported county-by-county. And analysts (ourselves included) have used this county-level data to plot the prevalence of the disease in different parts of the country. Our approach has been to aggregate data to the metropolitan level for the nation’s largest metro areas, based on the understanding that county units vary widely across the nation, and that the the labor, commuting and economic markets formed by metro areas are probably a more robust basis for comparing the extent of the disease nationwide.
As we noted earlier, several very good analysts, including Bill Frey at the Brookings Institution, Jed Kolko of Indeed and Bill Bishop at the Daily Yonder, have used the county-level data to look at the comparative prevalence of the disease in cities as compared to suburbs. The style of their analyses is similar: They look at county-level data, and classify counties as either urban, or some flavor of suburban or exurban. They then aggregate the data for all the similarly classified counties across the nation, and compute prevalence rates (reported cases or deaths divided by population). Jed Kolko’s analysis provides a representative example.
Translating with the Rosetta Stone
The challenge in interpreting their results comes from the fact that each of the three analysts uses a different system for classifying counties based on “urbanness,” as we explored in our earlier commentary on this subject. We concluded that there was no right or ideal way to provide decide such a classification of counties, and noted that the three methods differ substantially in both the number of places (and people) classified as “urban” and also in which counties fall into which bins.
To illustrate the practical differences between the different definitions, we created a kind of Rosetta Stone (with data graciously provided by each of the three authors). Let’s take a look at the city/suburb split using each of the three definitions. For this exercise, we use the USA data county level data on Covid Cases as of May 12. As we usually do at City Observatory, our focus is on the 53 metropolitan areas in the US with a population of one million or more.
Covid-19 prevalence
The following table illustrates Covid-19 prevalence aggregated by each of the three classification systems. The table shows the total population living in each county classification, the total number of reported cases in those counties, and computes the rate of cases (per 100,000) population.
Urban/Suburban Population, Covid-19 Cases, and Rate per 100,000, May 12, 2020, (Metropolitan Areas with 1 million or more population).
Population
Cases
Rate
Brookings
1-Urban Core
99,667,019
780,762
783
2-Mature Suburb
61,943,681
178,381
288
3-Emerging Suburb
14,308,473
27,413
192
4-Exurb
5,253,680
11,915
227
NonMetro
118,497
259
219
Total, Large Metros
181,291,350
998,730
551
Kolko
1-Urban
75,949,521
622,889
820
2-SuburbanHigh
69,104,197
274,372
397
3-SuburbanLow
36,237,632
101,469
280
Total, Large Metros
181,291,350
998,730
551
Yonder
1-Central counties
90,665,117
439,086
484
2-Suburban Counties
86,288,760
551,678
639
3-Exurban
4,218,976
7,707
183
4-Rural Adjacent to Large MSA
118,497
259
219
Total, Large Metros
181,291,350
998,730
551
As you can see, one gets a very different impression of whether city or suburb rates are higher depending on which classification one uses. Overall, the prevalence rate across categories is 551 cases per 100,000. But the Kolko and Brookings classifications imply that the average prevalence is about twice as high in cities as in suburbs, while the Yonder classification implies the reverse, that the rate is about 30 percent higher in suburbs than in cities.
Outside of New York
The New York City metropolitan area has been the epicenter of the pandemic, and has accounted for a disproportionate share of reported cases and deaths. Because of that concentration of cases, and the region’s large (nearly 20 million) population), it could be that this single metro area skews the totals. In addition, there are significant differences in how the three typologies classify counties in the New York metropolitan area. For example, the Brookings and Kolko methods classify Westchester and Nassau counties as “urban” while Yonder classifies those counties, and also Queens County, as suburban.
To filter out the effects of New York’s direct contribution to the pandemic, and to sidestep the disagreements about how to classify counties there, we construct a second table aggregating the data for the remaining 52 large metropolitan areas. First, its worth noting that the aggregate rate of reported cases per 100,000 population drops about 40 percent, to about 354 cases per 100,000.
Urban/Suburban Population, Covid-19 Cases, and Rate per 100,000, May 12, 2020, (Metropolitan Areas with 1 million or more population, excluding New York metro)
Population
Cases
Rate
Brookings
1-Urban Core
82,235,306
376,890
458
2-Mature Suburb
60,420,138
159,745
264
3-Emerging Suburb
14,042,865
25,657
183
4-Exurb
5,197,900
11,482
221
NonMetro
118,497
259
219
Total, Large Metros
162,014,706
574,033
354
Kolko
1-Urban
61,770,791
289,436
469
2-SuburbanHigh
65,521,033
199,860
305
3-SuburbanLow
34,722,882
84,737
244
Total, Large Metros
162,014,706
574,033
354
Yonder
1-Central counties
84,227,331
308,054
366
2-Suburban Counties
73,505,682
258,446
352
3-Exurban
4,163,196
7,274
175
4-Rural Adjacent to Large MSA
118,497
259
219
Total, Large Metros
162,014,706
574,033
354
Excluding New York shifts the apparent city/suburb balance in each of the three classification systems. Brookings reports essentially the same relative gap between city and suburban rates (with city rates about double those in mature suburbs). In the data that exclude the New York metro, Kolko still reports a higher prevalence of Covid in suburbs than in cities, although by a smaller margin (50 percent higher in cities, rather than nearly double). The Yonder tabulation now reports a higher rate of prevalence in urban counties than suburban ones, although by a relatively small margin (366 in cities vs 352 in suburbs).
In the end, we’d recommend that anyone interested in understanding the geography of the pandemic closely read the work of Kolko, Frey and Bishop–they’re all smart analysts who paint vivid pictures with the data. But as this analysis makes clear, drawing clear lines between urban and suburban counties is more of an art than a science, and its useful to understand the different definitions in order to be able to make sense of the conclusions. Even with the best efforts, it is hard to use highly aggregated county data to make a clear cases about whether the pandemic is much worse in cities than in suburbs. The devil is very much in the definitional details, as the range of estimates presented here illustrates.
In addition, it may help to look at the issue from different perspectives. As our own analysis comparing city and suburban rates within metropolitan areas shows, it matters a great deal more whether you are in a metro area with a high rate of infections, than whether you are in a city or suburb of any given metro area. Its also clearly the case that within metros, city and suburban rates are highly correlated, and that when the virus was spreading rapidly, the average suburb was only about six days or so behind its central city in the prevalence of the virus. Plus, when more detailed data becomes available, such as zip code level tabulations of cases and deaths, we may be able to more carefully discern the differences between cities and their suburbs.
Is the pandemic worse in cities or suburbs?
Using county-level data, it depends on who’s classification system you use
Counties may not be the right basis for diagnosing the contributors to Covid.
One of the oft-repeated claims in the pandemic is the notion that cities and density are significant contributors to the risk of being infected with the Covid-19 virus. Some of this, we have argued, is based on a deep-seated (and wrong-headed) prejudice associating cities with communicable diseases (the “teeming tenement” meme). But beyond base beliefs, what does the data show?
Because in the United States, the public health function is administered chiefly through counties, the nationwide data on the prevalence of Covid-19 cases and deaths is reported county-by-county. And analysts (ourselves included) have used this county-level data to plot the prevalence of the disease in different parts of the country. Our approach has been to aggregate data to the metropolitan level for the nation’s largest metro areas, based on the understanding that county units vary widely across the nation, and that the the labor, commuting and economic markets formed by metro areas are probably a more robust basis for comparing the extent of the disease nationwide.
As we noted earlier, several very good analysts, including Bill Frey at the Brookings Institution, Jed Kolko of Indeed and Bill Bishop at the Daily Yonder, have used the county-level data to look at the comparative prevalence of the disease in cities as compared to suburbs. The style of their analyses is similar: They look at county-level data, and classify counties as either urban, or some flavor of suburban or exurban. They then aggregate the data for all the similarly classified counties across the nation, and compute prevalence rates (reported cases or deaths divided by population). Jed Kolko’s analysis provides a representative example.
Translating with the Rosetta Stone
The challenge in interpreting their results comes from the fact that each of the three analysts uses a different system for classifying counties based on “urbanness,” as we explored in our earlier commentary on this subject. We concluded that there was no right or ideal way to provide decide such a classification of counties, and noted that the three methods differ substantially in both the number of places (and people) classified as “urban” and also in which counties fall into which bins.
To illustrate the practical differences between the different definitions, we created a kind of Rosetta Stone (with data graciously provided by each of the three authors). Let’s take a look at the city/suburb split using each of the three definitions. For this exercise, we use the USA data county level data on Covid Cases as of May 12. As we usually do at City Observatory, our focus is on the 53 metropolitan areas in the US with a population of one million or more.
Covid-19 prevalence
The following table illustrates Covid-19 prevalence aggregated by each of the three classification systems. The table shows the total population living in each county classification, the total number of reported cases in those counties, and computes the rate of cases (per 100,000) population.
Urban/Suburban Population, Covid-19 Cases, and Rate per 100,000,
May 12, 2020, (Metropolitan Areas with 1 million or more population).
As you can see, one gets a very different impression of whether city or suburb rates are higher depending on which classification one uses. Overall, the prevalence rate across categories is 551 cases per 100,000. But the Kolko and Brookings classifications imply that the average prevalence is about twice as high in cities as in suburbs, while the Yonder classification implies the reverse, that the rate is about 30 percent higher in suburbs than in cities.
Outside of New York
The New York City metropolitan area has been the epicenter of the pandemic, and has accounted for a disproportionate share of reported cases and deaths. Because of that concentration of cases, and the region’s large (nearly 20 million) population), it could be that this single metro area skews the totals. In addition, there are significant differences in how the three typologies classify counties in the New York metropolitan area. For example, the Brookings and Kolko methods classify Westchester and Nassau counties as “urban” while Yonder classifies those counties, and also Queens County, as suburban.
To filter out the effects of New York’s direct contribution to the pandemic, and to sidestep the disagreements about how to classify counties there, we construct a second table aggregating the data for the remaining 52 large metropolitan areas. First, its worth noting that the aggregate rate of reported cases per 100,000 population drops about 40 percent, to about 354 cases per 100,000.
Urban/Suburban Population, Covid-19 Cases, and Rate per 100,000,
May 12, 2020, (Metropolitan Areas with 1 million or more population, excluding New York metro)
Excluding New York shifts the apparent city/suburb balance in each of the three classification systems. Brookings reports essentially the same relative gap between city and suburban rates (with city rates about double those in mature suburbs). In the data that exclude the New York metro, Kolko still reports a higher prevalence of Covid in suburbs than in cities, although by a smaller margin (50 percent higher in cities, rather than nearly double). The Yonder tabulation now reports a higher rate of prevalence in urban counties than suburban ones, although by a relatively small margin (366 in cities vs 352 in suburbs).
In the end, we’d recommend that anyone interested in understanding the geography of the pandemic closely read the work of Kolko, Frey and Bishop–they’re all smart analysts who paint vivid pictures with the data. But as this analysis makes clear, drawing clear lines between urban and suburban counties is more of an art than a science, and its useful to understand the different definitions in order to be able to make sense of the conclusions. Even with the best efforts, it is hard to use highly aggregated county data to make a clear cases about whether the pandemic is much worse in cities than in suburbs. The devil is very much in the definitional details, as the range of estimates presented here illustrates.
In addition, it may help to look at the issue from different perspectives. As our own analysis comparing city and suburban rates within metropolitan areas shows, it matters a great deal more whether you are in a metro area with a high rate of infections, than whether you are in a city or suburb of any given metro area. Its also clearly the case that within metros, city and suburban rates are highly correlated, and that when the virus was spreading rapidly, the average suburb was only about six days or so behind its central city in the prevalence of the virus. Plus, when more detailed data becomes available, such as zip code level tabulations of cases and deaths, we may be able to more carefully discern the differences between cities and their suburbs.
Related Commentary