The following shows the steps, as well as the Python code, required to reproduce the results in the paper. It has been prepared as an IPython notebook. You can read through the page below, download the source file, or download a PDF version.
To reproduce the code below, you will need to have the following requirements installed:
The notebook behind this page is also available for download in the repository.
%matplotlib inline
import os
import pandas as pd
import employment_centers_tools as tools
Employment data for 1990 and 2000 come from the “Census 2000 Special Tabulation Product 64” (stp64), which offers tract-to-tract commuting flows for the whole country. We consider only tracts within the boundaries of the MSAs studied and sum over inflows to obtain employment counts at the tract level. For 2010, there is no stp64, so we use the Census Transportation Planning Products (CTPP) 5-year small-area data, which are based on the 2006-2010 ACS and provide employment counts at the tract level for 2010.
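To make the aggregation concrete, the sketch below illustrates the logic of summing commuting inflows by workplace tract; the file name and the column names ('res_tract', 'work_tract', 'flow') are our own placeholders, not the actual stp64 layout.
# Illustrative only: obtain tract-level employment by summing commuting inflows
# by workplace tract. File and column names are hypothetical placeholders.
flows = pd.read_csv('stp64_flows.csv')
msa_tract_ids = []  # placeholder: fill with the ids of tracts inside the MSAs studied
flows = flows[flows['work_tract'].isin(msa_tract_ids)]
emp_by_tract = flows.groupby('work_tract')['flow'].sum()  # total inflows = tract employment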
Since it is not clear to the authors whether it is legal to redistribute original Census data (particularly when the stp64 is not available as a free download), the raw dataset used for this paper is not included. However, to illustrate how the process was carried out, we show below the structure of the data and how it is fed into the code that identifies employment centers. This should allow users who have obtained the original data from the Census to replicate our results. We do provide the output of the center identification in the form of shapefiles (one per MSA per year); these can be found in the repository where this notebook and the rest of the code are hosted.
We store employment data in csv files, using a separate one for each MSA and year. The raw format of these files is as follows, where the top of the file for MSA 10180 in 1990 is shown:
!head /Users/dani/AAA/LargeData/T-CentersData/attributes/3-empDen/empDen1990/10180.csv
These files can be loaded and combined for 1990 in one DataFrame with the following code:
empF90 = '/Users/dani/AAA/LargeData/T-CentersData/attributes/3-empDen/empDen1990/'
emp90 = pd.concat([tools.load_msa_data(empF90+f, y90=True) for f in os.listdir(empF90)])
emp90.info()
emp90.head()
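For orientation, a minimal sketch of what such a loader might look like is given below. It assumes each per-MSA file carries at least a GISJOIN tract identifier, an employment count (emp) and the tract area (Shape_area), and that the MSA code is taken from the file name; these are our assumptions, not a description of the actual tools.load_msa_data helper.
def load_msa_data_sketch(path, y90=False):
    # Hypothetical stand-in for tools.load_msa_data; column names and the
    # convention of reading the MSA code from the file name are assumptions
    df = pd.read_csv(path, index_col='GISJOIN')[['emp', 'Shape_area']]
    df['msa'] = 'm' + os.path.basename(path).replace('.csv', '')
    # y90 would flag any format differences specific to the 1990 files
    return df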
Similar steps can be taken for 2000:
empF00 = '/Users/dani/AAA/LargeData/T-CentersData/attributes/3-empDen/empDen2000/'
emp00 = pd.concat([tools.load_msa_data(empF00+f) for f in os.listdir(empF00)])
Steps for 2010 are slightly different, as the data are collected from a different source. We begin with a csv that looks as follows:
!head /Users/dani/AAA/LargeData/ctpp/data2010/tractPOW_all.csv
emp10_link = '/Users/dani/AAA/LargeData/ctpp/data2010/tractPOW_all.csv'
shp_link = '/Users/dani/AAA/LargeData/nhgis/shapeFiles/nhgis0019_shapefile_tl2010_us_tract_2010/US_tract_2010.shp'
cty2msa = pd.read_csv('cty2msa.csv', names=['cty', 'msa'], index_col=0, squeeze=True)
shpW_out = '/Users/dani/AAA/LargeData/nhgis/shapeFiles/nhgis0019_shapefile_tl2010_us_tract_2010/msa_shapes_ws/'
emp10 = pd.read_csv(emp10_link)[['GISJOIN', 'emp', 'Shape_area']]
emp10['Shape_area'] = emp10['Shape_area'] * 0.000001 #Area in Km2
emp10['msa'] = emp10['GISJOIN'].apply(lambda x: \
tools.msafy('c'+x[1:3]+x[4:7], cty2msa))
emp10 = emp10.dropna() # Only tracts in MSA && w/ employment
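The helper tools.msafy is part of the companion module and is not reproduced here; conceptually, it maps a county identifier onto its MSA code and returns a missing value when the county does not belong to any of the MSAs studied, which is why the dropna call above keeps only tracts inside an MSA. A minimal sketch under that assumption:
def msafy_sketch(cty, cty2msa):
    # Hypothetical stand-in for tools.msafy: look up the MSA of a county code
    # and return a missing value when the county is outside the MSAs studied,
    # so that the dropna call above filters those tracts out
    return cty2msa.get(cty)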
Once all the data are loaded, it is very straightforward to replicate Table 1 in the paper:
table = pd.DataFrame()
for year, emp in [(1990, emp90), (2000, emp00), (2010, emp10)]:
    minTpMSA = emp.groupby('msa').count()['emp'].min()
    maxTpMSA = emp.groupby('msa').count()['emp'].max()
    meanTpMSA = emp.groupby('msa').count()['emp'].mean()
    summary = pd.Series({'N. MSAs': len(emp['msa'].unique()), \
                         'N. Tracts': len(emp), \
                         'Min. N. Tracts/MSA': minTpMSA, \
                         'Max. N. Tracts/MSA': maxTpMSA, \
                         'Average. N. Tracts/MSA': meanTpMSA})
    table[year] = summary
table.reindex(['N. MSAs', 'N. Tracts', 'Min. N. Tracts/MSA', \
'Max. N. Tracts/MSA', 'Average. N. Tracts/MSA'])
We use the data we have just loaded and combine them with geographical data to derive the spatial relationships needed to run the main identification algorithm. We use shapefiles of the tracts for each year, downloaded from NHGIS and projected to state plane so areas can be calculated (again not included in the repository, for the same redistribution reasons as the employment data).
Since the data are not shipped in the repository, the code below will only run if you provide them separately. This means you need to point cent_inYY (replacing YY with the year) to the folder where you keep a separate shapefile for each MSA. Once you run the code below, shapefiles and .gal files will be created in the cent_outYY directory.
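The heavy lifting below is delegated to tools.act_on_msa. As a point of reference only, spatial relationships for a single MSA can be derived from its tract shapefile and stored as a .gal file with PySAL along these lines; the file names are placeholders and this is not necessarily what act_on_msa does internally.
import pysal as ps
# Illustrative only: build queen contiguity weights for one MSA's tract shapefile
# and store them as a .gal file; the paths are placeholders
w = ps.queen_from_shapefile('tracts_10180.shp', idVariable='GISJOIN')
gal = ps.open('tracts_10180.gal', 'w')
gal.write(w)
gal.close()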
import numpy as np
import multiprocessing as mp
pool = mp.Pool(mp.cpu_count()) # required by the pool.map calls below
np.random.seed(1234)
For 1990, this would identify the centers (make sure to modify cent_in90 and cent_out90 to fit your setup):
cent_in90 = '/Users/dani/AAA/LargeData/T-CentersData/shapes/msaTracts1990polygonsSP/'
cent_out90 = '/Users/dani/Desktop/test90/'
emp90['dens_raw'] = (emp90['emp'] * 1.) / emp90['Shape_area']
emp90['GISJOIN'] = emp90.index
pars = [(emp90[emp90['msa']==msa], cent_in90+msa[1:]+'.shp', cent_out90) \
for msa in emp90['msa'].unique()]
out = pool.map(tools.act_on_msa, pars[:2]) # Note: pars[:2] runs only the first two MSAs; pass pars to process them all
For 2000, this would identify the centers (make sure to modify cent_in00 and cent_out00 to fit your setup):
cent_in00 = '/Users/dani/AAA/LargeData/T-CentersData/shapes/msaTracts2000polygonsSP/'
cent_out00 = '/Users/dani/Desktop/test00/'
emp00['dens_raw'] = (emp00['emp'] * 1.) / emp00['Shape_area']
emp00['GISJOIN'] = emp00.index
pars = [(emp00[emp00['msa']==msa], cent_in00+msa[1:]+'.shp', cent_out00) \
for msa in emp00['msa'].unique()]
out = pool.map(tools.act_on_msa, pars[:2]) # Note: pars[:2] runs only the first two MSAs; pass pars to process them all
For 2010, this would identify the centers (make sure to modify shp_link and cent_out10 to fit your setup). Note that in this case the shapefile is passed slightly differently, because we are using 2010 data that we sourced differently.
cent_out10 = '/Users/dani/Desktop/test10/'
shp_link = '/Users/dani/AAA/LargeData/nhgis/shapeFiles/nhgis0019_shapefile_tl2010_us_tract_2010/US_tract_2010.shp'
emp10['dens_raw'] = (emp10['emp'] * 1.) / emp10['Shape_area']
emp10['GISJOIN'] = emp10.index
pars = [(emp10[emp10['msa']==msa], shp_link, cent_out10) \
for msa in emp10['msa'].unique()]
out = pool.map(tools.act_on_msa, pars[:2]) # Note: pars[:2] runs only the first two MSAs; pass pars to process them all
Once this has run, we have created three folders that contain shapefiles with the identified centers for each MSA in each period. A compilation of these can be found in the repository in geojson format, one file per year. The output can be reloaded and used to recreate Tables 2-4 in the paper.
cent_out90 = '/Users/dani/Desktop/test90/'
cent_out00 = '/Users/dani/Desktop/test00/'
cent_out10 = '/Users/dani/Desktop/test10/'
# Collect all results
years = [1990, 2000, 2010]
links = [cent_out90, cent_out00, cent_out10]
db = []
for year, link in zip(years, links):
    temp = pd.concat(map(lambda l: pd.read_csv(l, index_col=0), \
                         [link+l for l in os.listdir(link) if l[-4:]=='.csv']))
    temp['year'] = year
    db.append(temp)
db = pd.concat(db)
tab = tools.evol_tab(db)
tab
We can re-format this output to make it more readable:
print "Table 2"
tab2 = tab.groupby(level=[0, 1]).sum().unstack().fillna(0)
tab2['Total 1990'] = tab2.sum(axis=1)
tab2.loc['Total 2000', :] = tab2.sum(axis=0)
tab2
print "Table 3"
tab3 = tab.groupby(level=[1, 2]).sum().unstack().fillna(0)
tab3['Total 2000'] = tab3.sum(axis=1)
tab3.loc['Total 2010', :] = tab3.sum(axis=0)
tab3
print "Table 4"
tab4 = tab.groupby(level=[0, 2]).sum().unstack().fillna(0)
tab4['Total 1990'] = tab4.sum(axis=1)
tab4.loc['Total 2010', :] = tab4.sum(axis=0)
tab4
For this section, we also rely on two additional layers. One is the set of MSA centroids, derived from the previous shapefiles by a simple GIS operation and projected to EPSG:2163. The second is an additional shapefile, also downloaded from NHGIS, containing the polygons of the continental (lower 48) US states. Since this layer is very common and is used only for aesthetic purposes, it is not provided either.
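As an illustration of that GIS operation, MSA centroids can be derived and reprojected with geopandas along the following lines; geopandas is not used elsewhere in this notebook and the paths are placeholders, so this is only a sketch of one possible workflow.
import geopandas as gpd
# Illustrative only: reproject MSA polygons to EPSG:2163, replace geometries
# with their centroids and write the result out; paths are placeholders
msas_poly = gpd.read_file('msa_polygons.shp').to_crs(epsg=2163)
msas_pts = msas_poly.copy()
msas_pts['geometry'] = msas_pts.geometry.centroid
msas_pts.to_file('msas_points2163.shp')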
For this illustration, we use only 99 random permutations to build the empirical distribution of both the global and local versions of Moran's I. The results presented in the paper, however, are based on 99,999 permutations and are thus more reliable (although they take much longer to run).
import pysal as ps
msa_pts_link = '/Users/dani/Desktop/msas_points2163.shp'
states_link = '/Users/dani/Desktop/states2163.shp'
years = [1990, 2000, 2010]
perms = 99
msas = db.groupby(['msa', 'year']).apply(\
lambda x: x.groupby('center_id').ngroups).unstack()
msa_order = ['m'+i for i in ps.open(msa_pts_link.replace('.shp', '.dbf')).by_col('CBSA')]
msas = msas.reindex(msa_order)
Next, we compute the Moran's I statistics used in the paper to justify further spatial analysis. The idea is to explore whether the spatial arrangement of polycentricity follows any specific pattern that can be distinguished from spatial randomness; rejecting the null hypothesis points to such a pattern.
morans = []
for k in [9]:
    w = ps.knnW_from_shapefile(msa_pts_link, k=k)
    w.transform = 'R'
    m_90 = ps.Moran(msas[1990].astype(float), w, permutations=perms)
    m_00 = ps.Moran(msas[2000].astype(float), w, permutations=perms)
    m_10 = ps.Moran(msas[2010].astype(float), w, permutations=perms)
    t90_00 = (msas[2000] - msas[1990]).astype(float)
    m_90_00 = ps.Moran(t90_00, w, permutations=perms)
    t00_10 = (msas[2010] - msas[2000]).astype(float)
    m_00_10 = ps.Moran(t00_10, w, permutations=perms)
    out = pd.DataFrame({'I': [m_90.I, m_00.I, m_10.I, m_90_00.I, m_00_10.I], \
                        'p': [m_90.p_sim, m_00.p_sim, m_10.p_sim, m_90_00.p_sim, m_00_10.p_sim], \
                        'var': ['m_90', 'm_00', 'm_10', 'm_90_00', 'm_00_10']})
    out['w'] = 'w' + str(k)
    morans.append(out)
morans = pd.concat(morans).pivot('var', 'w')
morans
# Within a loop in case the user wants to explore different values for k
for k in [9]:
    w = ps.knnW_from_shapefile(msa_pts_link, k=k)
    w.transform = 'R'
    m_90 = ps.Moran_Local(msas[1990].astype(float).values, w, permutations=perms)
    p = tools.plot_lisa(m_90, st=states_link, msa=msa_pts_link, title='Centers in 1990')
    m_00 = ps.Moran_Local(msas[2000].astype(float).values, w, permutations=perms)
    p = tools.plot_lisa(m_00, st=states_link, msa=msa_pts_link, title='Centers in 2000 | Map 1')
    m_10 = ps.Moran_Local(msas[2010].astype(float).values, w, permutations=perms)
    p = tools.plot_lisa(m_10, st=states_link, msa=msa_pts_link, title='Centers in 2010')
    t90_00 = (msas[2000] - msas[1990]).astype(float).values
    m_90_00 = ps.Moran_Local(t90_00, w, permutations=perms)
    p = tools.plot_lisa(m_90_00, st=states_link, msa=msa_pts_link, \
                        title='Change in centers 1990-2000')
    t00_10 = (msas[2010] - msas[2000]).astype(float).values
    m_00_10 = ps.Moran_Local(t00_10, w, permutations=perms)
    p = tools.plot_lisa(m_00_10, st=states_link, msa=msa_pts_link, \
                        title='Change in centers 2000-2010 | Map 2')
Below are the results of the random labeling analysis performed in Section 6 and presented in Tables 5 to 8 of the paper. This calculates average values for several variables by type of MSA (no centers, monocentric, polycentric) and tests for statistically significant differences between the averages using the random labeling technique borrowed from Rey & Sastré-Gutiérrez (2010). The variables employed are total population, employment density, income per capita, and the percentage of people below the poverty line.
Population, income and poverty are also obtained from NHGIS and are not redistributed here. Employment and density are calculated from the data shown in the sections above.
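The test itself is implemented in tools.do_rl. For reference, a conceptual sketch of the random labeling idea is shown below; it is our own simplification, with hypothetical labels 'monocentric' and 'polycentric', not the actual implementation.
def random_labeling_sketch(values, labels, perms=99):
    # Illustrative permutation test: is the observed difference in group means
    # larger than what random re-assignment of the labels would produce?
    # The group labels used here ('polycentric', 'monocentric') are assumptions.
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    obs = values[labels == 'polycentric'].mean() - values[labels == 'monocentric'].mean()
    count = 0
    for _ in range(perms):
        shuffled = np.random.permutation(labels)
        diff = values[shuffled == 'polycentric'].mean() - values[shuffled == 'monocentric'].mean()
        if abs(diff) >= abs(obs):
            count += 1
    p_sim = (count + 1.) / (perms + 1.)  # pseudo p-value from the permutation distribution
    return obs, p_sim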
pop00F='/Users/dani/AAA/LargeData/T-CentersData/attributes/5-pop/pop2000/'
pop00 = pd.concat([tools.load_soc_ec(pop00F+l) for l in os.listdir(pop00F)])
pop00 = pop00.groupby('msa').sum()
sec00F='/Users/dani/AAA/LargeData/T-CentersData/attributes/7-socEc/socEc2000/'
sec00 = pd.concat([tools.load_soc_ec(sec00F+l) for l in os.listdir(sec00F)])
sec00 = sec00.groupby('msa').sum().drop('incpc', axis=1)
msas_soec = db.groupby(['msa', 'year']).apply(\
lambda x: x.groupby('center_id').ngroups)\
.apply(tools._monopoly).unstack()
msas_soec = msas_soec.join(pop00).join(sec00)
msas_soec = msas_soec.join(db[db['year']==2000][['emp', 'Shape_area', 'msa']]\
.groupby('msa').sum())
msas_soec['emp_dens'] = msas_soec['emp'] * 1. / msas_soec['Shape_area']
msas_soec['inc_pc'] = msas_soec['inc'] * 1. / msas_soec['pop']
msas_soec['pct_below'] = msas_soec['below'] * 100. / msas_soec['pop']
msas_soec = msas_soec[[1990, 2000, 2010, \
'pop', 'emp_dens', 'inc_pc', 'pct_below']]
perms = 99
rl90_00 = tools.do_rl(msas_soec, [1990, 2000], perms)
for var in rl90_00:
    print "\n\t%s"%var
    print rl90_00[var].unstack().fillna('')
    print '-------------------------------------------------------------------'
rl_tables = []
rl00_10 = tools.do_rl(msas_soec, [2000, 2010], perms)
for var in rl00_10:
    print "\n\t%s"%var
    tab = rl00_10[var].unstack().fillna('')
    print tab
    rl_tables.append(tab)
These tables can be presented in a more pleasant way, adding the global averages, as they are shown in the paper:
print "Table 5: Size"
print "Global average: %.1f"%msas_soec['pop'].mean()
rl_tables[0]
print "Table 6: Density"
print "Global average: %.1f"%msas_soec['emp_dens'].mean()
rl_tables[1]
print "Table 7: Income per capita"
print "Global average: %.1f"%msas_soec['inc_pc'].mean()
rl_tables[2]
print "Table 8: Poverty (in %)"
print "Global average: %.1f"%msas_soec['pct_below'].mean()
rl_tables[3]