"From manufacturing belt, to rust belt, to college country"

A Visual Narrative of US Urban Growth

[Dani Arribas-Bel & Michiel Gerritse]

Results and code

Data preparation

This page describes the sources and the preparation required to obtain the dataset we visualize in the paper. We used Stata to process the data, so all the code shown here is written in that language.

The eventual analysis relies heavily on a dataset from the Integrated Public Use Microdata Series (IPUMS, ipums.org). Access is free after registration. Reproducing these results requires the decennial census data, in particular the following variables:

Type  Variable             Label
H     YEAR                 Census year
H     DATANUM              Data set number
H     SERIAL               Household serial number
H     HHWT                 Household weight
H     STATEFIP             State (FIPS code)
H     COUNTY               County
H     METAREA (general)    Metropolitan area [general version]
H     METAREAD (detailed)  Metropolitan area [detailed version]
H     GQ                   Group quarters status
H     VALUEH               House value
H     ROOMS                Number of rooms
H     BUILTYR              Age of structure
H     BUILTYR2             Age of structure, decade
H     UNITSSTR             Units in structure
P     PERNUM               Person number in sample unit
P     PERWT                Person weight
P     SEX                  Sex
P     AGE                  Age
P     RACE (general)       Race [general version]
P     RACED (detailed)     Race [detailed version]
P     EDUC (general)       Educational attainment [general version]
P     EDUCD (detailed)     Educational attainment [detailed version]
P     IND                  Industry
P     INCWAGE              Wage and salary income

These variables are drawn from the following samples:

Sample             Density
1940 1%            1.0%
1950 1%            1.0%
1960 1%            1.0%
1970 1% metro fm1  1.0%
1970 1% metro fm2  1.0%
1980 5% state      5.0%
1990 5%            5.0%
2000 5%            5.0%
2010 ACS           1.0%

This yields a dataset of around 4 GB. Some of the variables are only used to run hedonic land price and Mincerian wage regressions, so you may drop variables such as rooms, structure age, sex, age, and race, but not industry.

Running these files requires replacing the [folder] placeholders with the locations the dataset is loaded from and the results are saved to. To construct the relevant variables, we take the log of the person-weighted average wages and house values:

use "[folder]\microdata.dta", clear
keep incwage valueh perwt year metarea
* Recode IPUMS missing codes and zeros to Stata missing
replace incwage=. if incwage==999999
replace incwage=. if incwage==0
replace valueh=. if valueh==9999999
replace valueh=. if valueh==0
* Person-weighted averages of wages and house values by year and metro area
egen yearweightwage=total(perwt) if incwage!=., by(year metarea)
gen wage=perwt/yearweightwage*incwage
egen yearweightrent=total(perwt) if valueh!=., by(year metarea)
gen rent=perwt/yearweightrent*valueh
collapse (sum) wage rent, by(year metarea)
* Take logs of the weighted averages
gen lwage=log(wage)
gen lrent=log(rent)
keep year metarea lwage lrent
save "[folder]\metarealevel.dta", replace
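The weighted-average construction above can be sketched in pandas as follows. This is an illustrative equivalent, not the pipeline actually used; the toy numbers and column names (mirroring the IPUMS variables) are made up.

```python
import numpy as np
import pandas as pd

# Toy microdata; 999999 (wages) and 9999999/0 are IPUMS missing codes
df = pd.DataFrame({
    "year":    [2000, 2000, 2000, 2000],
    "metarea": [1, 1, 2, 2],
    "perwt":   [1.0, 3.0, 2.0, 2.0],
    "incwage": [40000.0, 20000.0, 999999.0, 30000.0],
    "valueh":  [100000.0, 0.0, 200000.0, 300000.0],
})

# Recode missing codes to NaN, as in the Stata script
df.loc[df["incwage"].isin([999999, 0]), "incwage"] = np.nan
df.loc[df["valueh"].isin([9999999, 0]), "valueh"] = np.nan

def wmean(sub, col):
    """Weighted mean of `col` using person weights, ignoring missings."""
    m = sub[col].notna()
    return np.average(sub.loc[m, col], weights=sub.loc[m, "perwt"])

# Log of the person-weighted average by year and metro area
out = df.groupby(["year", "metarea"]).apply(
    lambda g: pd.Series({"lwage": np.log(wmean(g, "incwage")),
                         "lrent": np.log(wmean(g, "valueh"))})
).reset_index()
```

Note that observations with a missing wage still contribute their house value (and vice versa), because each weighted total is computed over the non-missing observations of its own variable.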

Next, we compute the share of all workers classified as manufacturing or service workers according to their industry codes. We also define highly educated workers as those with a college education or higher:

use "[folder]\microdata.dta", clear
keep year metarea statefip county ind educ perwt
drop if metarea==0
* Log metro population from person weights
egen pop=total(perwt), by(year metarea)
replace pop=log(pop)
* Sector shares: weighted workers in an industry-code band over all workers
egen workers=total(perwt) if ind>1 & ind<994, by(year metarea)
egen workerman=total(perwt) if ind>300 & ind<500, by(year metarea)
egen workerserv=total(perwt) if ind>700 & ind<900, by(year metarea)
egen workergov=total(perwt) if ind>900 & ind<1000, by(year metarea) /* public administration; code band assumed */
gen manufacturing=workerman/workers
gen service=workerserv/workers
gen government=workergov/workers
drop workers*
* Share of highly educated workers (college or above)
egen obseduc=total(perwt) if educ>0, by(year metarea)
egen higheduc=total(perwt) if educ>6, by(year metarea)
gen college=higheduc/obseduc
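The share construction can be illustrated the same way. A hedged sketch with toy data; the industry-code bands follow the Stata script above:

```python
import pandas as pd

# Toy microdata for a single (year, metarea) cell; values are made up
df = pd.DataFrame({
    "perwt": [2.0, 1.0, 1.0, 1.0],
    "ind":   [350, 750, 120, 996],  # 996 falls outside the worker band
    "educ":  [7, 3, 6, 8],
})

# Weighted worker counts by industry-code band
workers = df.loc[df["ind"].between(2, 993), "perwt"].sum()    # all workers
manuf   = df.loc[df["ind"].between(301, 499), "perwt"].sum()  # manufacturing
service = df.loc[df["ind"].between(701, 899), "perwt"].sum()  # services

manuf_share   = manuf / workers
service_share = service / workers

# College share among workers with a reported education level
college = (df.loc[df["educ"] > 6, "perwt"].sum()
           / df.loc[df["educ"] > 0, "perwt"].sum())
```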

Finally, we construct the specialization index (the Krugman K index). We preserve and restore the data so that we can collapse separately for the specialization index and for the sector and education shares:

preserve
* Keep classifiable industries only
drop if ind>900 & ind<1000
drop if ind>9000
drop if ind==0
* Harmonize industry codes to a common digit level across census years
bysort year: egen max=max(ind)
gen ind2=int(ind/10) if max<1000
replace ind2=int(ind/1000) if max>1000
bysort metarea year ind2: egen empl=total(perwt)
collapse (mean) empl, by(metarea year ind2)
bysort year metarea: egen emplmet=total(empl)
bysort year: egen emplnat=total(empl)
bysort year ind2: egen emplind=total(empl)
* Krugman index: sum of absolute deviations of local from national shares
gen b=abs(empl/emplmet-emplind/emplnat)
bysort metarea year: egen k=total(b)
collapse (mean) k, by(year metarea)
sort metarea year
save "[folder]\specialization.dta", replace
restore
collapse (mean) pop manufacturing service government college, by(year metarea)
save "[folder]\jobsdata.dta", replace
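The index computed above is K_m = sum_i |s_im - s_i|, where s_im is industry i's employment share in metro m and s_i its share in national employment. A minimal sketch with toy numbers (two metros, two industries):

```python
import pandas as pd

# Toy employment counts by metro and two-digit industry (made-up values)
empl = pd.DataFrame({
    "metarea": [1, 1, 2, 2],
    "ind2":    [1, 2, 1, 2],
    "empl":    [30.0, 10.0, 10.0, 30.0],
})

# Local share of each industry within its metro
empl["s_im"] = empl["empl"] / empl.groupby("metarea")["empl"].transform("sum")
# National share of each industry
empl["s_i"] = empl.groupby("ind2")["empl"].transform("sum") / empl["empl"].sum()
# K index: sum of absolute deviations by metro
k = (empl["s_im"] - empl["s_i"]).abs().groupby(empl["metarea"]).sum()
```

A metro whose industry mix matches the national mix gets K = 0; complete specialization in industries absent elsewhere approaches K = 2.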

We can now merge the metro-level wages and rents with the weather data, job shares, and specialization index, and run the main regressions:

use "[folder]\metarealevel.dta", clear
drop if metarea==0 | metarea==.
sort metarea year
* Merge in the weather station data, job shares, and specialization index
merge m:1 metarea using weatherdatastation.dta
rm weatherdatastation.dta
drop _merge
merge 1:1 metarea year using "[folder]\jobsdata.dta"
drop _merge
merge 1:1 metarea year using "[folder]\specialization.dta"
* Year-by-year SUR of log wages and log rents on each characteristic;
* the effect is the estimated coefficient times the characteristic's value
foreach var in manufacturing service college k {
    gen we`var'=.
    gen re`var'=.
    foreach x in 1940 1970 1980 1990 2000 2010 { /*1950 has no rents*/
        cap sureg (lwage `var') (lrent `var') if year==`x'
        cap replace we`var'=[lwage]`var'*`var' if year==`x'
        cap replace re`var'=[lrent]`var'*`var' if year==`x'
    }
}

Here, sureg performs seemingly unrelated regressions, which allow the unobservables to be correlated across the wage and rent equations for the same metropolitan area in a given year. The output, exported in .csv form, serves as the input for the visualization exercise.

keep year metarea we* re*
destring metarea, replace
outsheet using "[folder]\robackeffects.csv", comma nolabel replace
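Since both equations share the same single regressor, sureg's point estimates coincide with equation-by-equation OLS (SUR only gains efficiency when regressors differ across equations). The effect construction can therefore be sketched with plain least squares; the data below are made up and exactly linear so the slopes are recovered without noise:

```python
import numpy as np

# Toy per-metro data for one year (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0])  # e.g. the manufacturing share
lwage = 0.3 + 0.5 * x               # log wages
lrent = 1.0 - 0.2 * x               # log rents

def slope(y, x):
    """OLS slope coefficient, with an intercept, via least squares."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Effect of the characteristic: coefficient times its value per observation,
# mirroring we`var' and re`var' in the Stata loop
we = slope(lwage, x) * x
re = slope(lrent, x) * x
```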