Spatial Weights
Spatial weights are mathematical structures used to represent spatial relationships. Many spatial analytics, such as spatial autocorrelation statistics and regionalization algorithms rely on spatial weights. Generally speaking, a spatial weight $w_{i,j}$ expresses the notion of a geographical relationship between locations $i$ and $j$. These relationships can be based on a number of criteria including contiguity, geospatial distance and general distances.
PySAL offers functionality for the construction, manipulation, analysis, and conversion of a wide array of spatial weights.
We begin with construction of weights from common spatial data formats.
import pysal as ps
import numpy as np
There are functions to construct weights directly from a file path.
shp_path = "../data/texas.shp"
Weight Types
Contiguity:
Queen Weights
A commonly-used type of weight is a queen contigutiy weight, which reflects adjacency relationships as a binary indicator variable denoting whether or not a polygon shares an edge or a vertex with another polygon. These weights are symmetric, in that when polygon $A$ neighbors polygon $B$, both $w{AB} = 1$ and $w{BA} = 1$.
To construct queen weights from a shapefile, use the queen_from_shapefile
function:
qW = ps.queen_from_shapefile(shp_path)
dataframe = ps.pdio.read_files(shp_path)
qW
<pysal.weights.weights.W at 0x104142860>
All weights objects have a few traits that you can use to work with the weights object, as well as to get information about the weights object.
To get the neighbors & weights around an observation, use the observation's index on the weights object, like a dictionary:
qW[4] #neighbors & weights of the 5th observation (0-index remember)
{0: 1.0, 3: 1.0, 5: 1.0, 6: 1.0, 7: 1.0}
By default, the weights and the pandas dataframe will use the same index. So, we can view the observation and its neighbors in the dataframe by putting the observation's index and its neighbors' indexes together in one list:
self_and_neighbors = [4]
self_and_neighbors.extend(qW.neighbors[4])
print(self_and_neighbors)
[4, 0, 3, 5, 6, 7]
and grabbing those elements from the dataframe:
dataframe.loc[self_and_neighbors]
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.000000 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x10d91efd0> |
0 | Lipscomb | Texas | 48 | 295 | 48295 | 48 | 295 | 48295 | 1 | 0.000000 | ... | 0.031817 | 0.286929 | 0.378219 | 0.407005 | 0.373005 | 6.724512 | 4.5 | 3.835360 | 6.093580 | <pysal.cg.shapes.Polygon object at 0x10d91ee10> |
3 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.000000 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x10d91ef60> |
5 | Roberts | Texas | 48 | 393 | 48393 | 48 | 393 | 48393 | 1 | 0.000000 | ... | 0.000000 | 0.320275 | 0.318656 | 0.398681 | 0.339626 | 5.762712 | 5.3 | 6.231454 | 4.885993 | <pysal.cg.shapes.Polygon object at 0x10d91b080> |
6 | Hemphill | Texas | 48 | 211 | 48211 | 48 | 211 | 48211 | 1 | 0.000000 | ... | 0.188172 | 0.286707 | 0.385605 | 0.352996 | 0.346318 | 6.300115 | 6.9 | 6.451613 | 6.330366 | <pysal.cg.shapes.Polygon object at 0x10d91b0f0> |
7 | Hutchinson | Texas | 48 | 233 | 48233 | 48 | 233 | 48233 | 1 | 1.936915 | ... | 2.635369 | 0.199287 | 0.322579 | 0.323864 | 0.370207 | 5.387316 | 5.1 | 6.222679 | 8.484271 | <pysal.cg.shapes.Polygon object at 0x10d91b160> |
6 rows × 70 columns
A full, dense matrix describing all of the pairwise relationships is constructed using the .full
method, or when pysal.full
is called on a weights object:
Wmatrix, ids = qW.full()
#Wmatrix, ids = ps.full(qW)
Wmatrix
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 1., ..., 0., 0., 0.],
[ 0., 1., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 1., 1.],
[ 0., 0., 0., ..., 1., 0., 1.],
[ 0., 0., 0., ..., 1., 1., 0.]])
n_neighbors = Wmatrix.sum(axis=1) # how many neighbors each region has
n_neighbors[4]
5.0
qW.cardinalities[4]
5
Note that this matrix is binary, in that its elements are either zero or one, since an observation is either a neighbor or it is not a neighbor.
However, many common use cases of spatial weights require that the matrix is row-standardized. This is done simply in PySAL using the .transform
attribute
qW.transform = 'r'
Now, if we build a new full matrix, its rows should sum to one:
Wmatrix, ids = qW.full()
Wmatrix.sum(axis=1) #numpy axes are 0:column, 1:row, 2:facet, into higher dimensions
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1.])
Since weight matrices are typically very sparse, there is also a sparse weights matrix constructor:
qW.sparse
<254x254 sparse matrix of type '<class 'numpy.float64'>'
with 1460 stored elements in Compressed Sparse Row format>
qW.pct_nonzero #Percentage of nonzero neighbor counts
2.263004526009052
By default, PySAL assigns each observation an index according to the order in which the observation was read in. This means that, by default, all of the observations in the weights object are indexed by table order. If you have an alternative ID variable, you can pass that into the weights constructor.
For example, the texas.shp
dataset has a possible alternative ID Variable, a FIPS
code.
dataframe.head()
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lipscomb | Texas | 48 | 295 | 48295 | 48 | 295 | 48295 | 1 | 0.0 | ... | 0.031817 | 0.286929 | 0.378219 | 0.407005 | 0.373005 | 6.724512 | 4.5 | 3.835360 | 6.093580 | <pysal.cg.shapes.Polygon object at 0x10d91ee10> |
1 | Sherman | Texas | 48 | 421 | 48421 | 48 | 421 | 48421 | 1 | 0.0 | ... | 0.139958 | 0.288976 | 0.359377 | 0.415453 | 0.378041 | 5.665722 | 1.7 | 3.253796 | 3.869407 | <pysal.cg.shapes.Polygon object at 0x10d91ee80> |
2 | Dallam | Texas | 48 | 111 | 48111 | 48 | 111 | 48111 | 1 | 0.0 | ... | 2.050906 | 0.331667 | 0.385996 | 0.370037 | 0.376015 | 7.546049 | 7.2 | 9.471366 | 14.231738 | <pysal.cg.shapes.Polygon object at 0x10d91eef0> |
3 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.0 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x10d91ef60> |
4 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.0 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x10d91efd0> |
5 rows × 70 columns
The observation we were discussing above is in the fifth row: Ochiltree county, Texas. Note that its FIPS code is 48357.
Then, instead of indexing the weights and the dataframe just based on read-order, use the FIPS
code as an index:
qW = ps.queen_from_shapefile(shp_path, idVariable='FIPS')
qW[4] #fails, since no FIPS is 4.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-21-1d8a3009bc1e> in <module>()
----> 1 qW[4] #fails, since no FIPS is 4.
/Users/dani/anaconda/envs/gds-scipy16/lib/python3.5/site-packages/pysal/weights/weights.py in __getitem__(self, key)
504 {1: 1.0, 4: 1.0, 101: 1.0, 85: 1.0, 5: 1.0}
505 """
--> 506 return dict(list(zip(self.neighbors[key], self.weights[key])))
507
508 def __iter__(self):
KeyError: 4
Note that a KeyError
in Python usually means that some index, here 4
, was not found in the collection being searched, the IDs in the queen weights object. This makes sense, since we explicitly passed an idVariable
argument, and nothing has a FIPS
code of 4.
Instead, if we use the observation's FIPS
code:
qW['48357']
{'48195': 1.0, '48211': 1.0, '48233': 1.0, '48295': 1.0, '48393': 1.0}
We get what we need.
In addition, we have to now query the dataframe using the FIPS
code to find our neighbors. But, this is relatively easy to do, since pandas will parse the query by looking into python objects, if told to.
First, let us store the neighbors of our target county:
self_and_neighbors = ['48357']
self_and_neighbors.extend(qW.neighbors['48357'])
Then, we can use this list in .query
:
dataframe.query('FIPS in @self_and_neighbors')
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lipscomb | Texas | 48 | 295 | 48295 | 48 | 295 | 48295 | 1 | 0.000000 | ... | 0.031817 | 0.286929 | 0.378219 | 0.407005 | 0.373005 | 6.724512 | 4.5 | 3.835360 | 6.093580 | <pysal.cg.shapes.Polygon object at 0x10d91ee10> |
3 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.000000 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x10d91ef60> |
4 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.000000 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x10d91efd0> |
5 | Roberts | Texas | 48 | 393 | 48393 | 48 | 393 | 48393 | 1 | 0.000000 | ... | 0.000000 | 0.320275 | 0.318656 | 0.398681 | 0.339626 | 5.762712 | 5.3 | 6.231454 | 4.885993 | <pysal.cg.shapes.Polygon object at 0x10d91b080> |
6 | Hemphill | Texas | 48 | 211 | 48211 | 48 | 211 | 48211 | 1 | 0.000000 | ... | 0.188172 | 0.286707 | 0.385605 | 0.352996 | 0.346318 | 6.300115 | 6.9 | 6.451613 | 6.330366 | <pysal.cg.shapes.Polygon object at 0x10d91b0f0> |
7 | Hutchinson | Texas | 48 | 233 | 48233 | 48 | 233 | 48233 | 1 | 1.936915 | ... | 2.635369 | 0.199287 | 0.322579 | 0.323864 | 0.370207 | 5.387316 | 5.1 | 6.222679 | 8.484271 | <pysal.cg.shapes.Polygon object at 0x10d91b160> |
6 rows × 70 columns
Note that we have to use @
before the name in order to show that we're referring to a python object and not a column in the dataframe.
#dataframe.query('FIPS in self_and_neighbors') will fail because there is no column called 'self_and_neighbors'
Of course, we could also reindex the dataframe to use the same index as our weights:
fips_frame = dataframe.set_index(dataframe.FIPS)
fips_frame.head()
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FIPS | |||||||||||||||||||||
48295 | Lipscomb | Texas | 48 | 295 | 48295 | 48 | 295 | 48295 | 1 | 0.0 | ... | 0.031817 | 0.286929 | 0.378219 | 0.407005 | 0.373005 | 6.724512 | 4.5 | 3.835360 | 6.093580 | <pysal.cg.shapes.Polygon object at 0x10d91ee10> |
48421 | Sherman | Texas | 48 | 421 | 48421 | 48 | 421 | 48421 | 1 | 0.0 | ... | 0.139958 | 0.288976 | 0.359377 | 0.415453 | 0.378041 | 5.665722 | 1.7 | 3.253796 | 3.869407 | <pysal.cg.shapes.Polygon object at 0x10d91ee80> |
48111 | Dallam | Texas | 48 | 111 | 48111 | 48 | 111 | 48111 | 1 | 0.0 | ... | 2.050906 | 0.331667 | 0.385996 | 0.370037 | 0.376015 | 7.546049 | 7.2 | 9.471366 | 14.231738 | <pysal.cg.shapes.Polygon object at 0x10d91eef0> |
48195 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.0 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x10d91ef60> |
48357 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.0 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x10d91efd0> |
5 rows × 70 columns
Now that both are using the same weights, we can use the .loc
indexer again:
fips_frame.loc[self_and_neighbors]
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FIPS | |||||||||||||||||||||
48357 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.000000 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x10d91efd0> |
48295 | Lipscomb | Texas | 48 | 295 | 48295 | 48 | 295 | 48295 | 1 | 0.000000 | ... | 0.031817 | 0.286929 | 0.378219 | 0.407005 | 0.373005 | 6.724512 | 4.5 | 3.835360 | 6.093580 | <pysal.cg.shapes.Polygon object at 0x10d91ee10> |
48195 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.000000 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x10d91ef60> |
48393 | Roberts | Texas | 48 | 393 | 48393 | 48 | 393 | 48393 | 1 | 0.000000 | ... | 0.000000 | 0.320275 | 0.318656 | 0.398681 | 0.339626 | 5.762712 | 5.3 | 6.231454 | 4.885993 | <pysal.cg.shapes.Polygon object at 0x10d91b080> |
48211 | Hemphill | Texas | 48 | 211 | 48211 | 48 | 211 | 48211 | 1 | 0.000000 | ... | 0.188172 | 0.286707 | 0.385605 | 0.352996 | 0.346318 | 6.300115 | 6.9 | 6.451613 | 6.330366 | <pysal.cg.shapes.Polygon object at 0x10d91b0f0> |
48233 | Hutchinson | Texas | 48 | 233 | 48233 | 48 | 233 | 48233 | 1 | 1.936915 | ... | 2.635369 | 0.199287 | 0.322579 | 0.323864 | 0.370207 | 5.387316 | 5.1 | 6.222679 | 8.484271 | <pysal.cg.shapes.Polygon object at 0x10d91b160> |
6 rows × 70 columns
Rook Weights
Rook weights are another type of contiguity weight, but consider observations as neighboring only when they share an edge. The rook neighbors of an observation may be different than its queen neighbors, depending on how the observation and its nearby polygons are configured.
We can construct this in the same way as the queen weights, using the special rook_from_shapefile
function:
rW = ps.rook_from_shapefile(shp_path, idVariable='FIPS')
rW['48357']
{'48195': 1.0, '48295': 1.0, '48393': 1.0}
These weights function exactly like the Queen weights, and are only distinguished by what they consider "neighbors."
self_and_neighbors = ['48357']
self_and_neighbors.extend(rW.neighbors['48357'])
fips_frame.loc[self_and_neighbors]
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FIPS | |||||||||||||||||||||
48357 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.0 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x10d91efd0> |
48295 | Lipscomb | Texas | 48 | 295 | 48295 | 48 | 295 | 48295 | 1 | 0.0 | ... | 0.031817 | 0.286929 | 0.378219 | 0.407005 | 0.373005 | 6.724512 | 4.5 | 3.835360 | 6.093580 | <pysal.cg.shapes.Polygon object at 0x10d91ee10> |
48195 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.0 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x10d91ef60> |
48393 | Roberts | Texas | 48 | 393 | 48393 | 48 | 393 | 48393 | 1 | 0.0 | ... | 0.000000 | 0.320275 | 0.318656 | 0.398681 | 0.339626 | 5.762712 | 5.3 | 6.231454 | 4.885993 | <pysal.cg.shapes.Polygon object at 0x10d91b080> |
4 rows × 70 columns
Bishop Weights
In theory, a "Bishop" weighting scheme is one that arises when only polygons that share vertexes are considered to be neighboring. But, since Queen contiguigy requires either an edge or a vertex and Rook contiguity requires only shared edges, the following relationship is true:
where $\mathcal{Q}$ is the set of neighbor pairs via queen contiguity, $\mathcal{R}$ is the set of neighbor pairs via Rook contiguity, and $\mathcal{B}$ via Bishop contiguity. Thus:
Bishop weights entail all Queen neighbor pairs that are not also Rook neighbors.
PySAL does not have a dedicated bishop weights constructor, but you can construct very easily using the w_difference
function. This function is one of a family of tools to work with weights, all defined in ps.weights
, that conduct these types of set operations between weight objects.
bW = ps.w_difference(qW, rW, constrained=False, silent_island_warning=True) #silence because there will be a lot of warnings
bW.histogram
[(0, 161), (1, 48), (2, 33), (3, 8), (4, 4)]
Thus, the vast majority of counties have no bishop neighbors. But, a few do. A simple way to see these observations in the dataframe is to find all elements of the dataframe that are not "islands," the term for an observation with no neighbors:
islands = bW.islands
# Using `.head()` to limit the number of rows printed
dataframe.query('FIPS not in @islands').head()
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lipscomb | Texas | 48 | 295 | 48295 | 48 | 295 | 48295 | 1 | 0.0 | ... | 0.031817 | 0.286929 | 0.378219 | 0.407005 | 0.373005 | 6.724512 | 4.5 | 3.835360 | 6.093580 | <pysal.cg.shapes.Polygon object at 0x10d91ee10> |
1 | Sherman | Texas | 48 | 421 | 48421 | 48 | 421 | 48421 | 1 | 0.0 | ... | 0.139958 | 0.288976 | 0.359377 | 0.415453 | 0.378041 | 5.665722 | 1.7 | 3.253796 | 3.869407 | <pysal.cg.shapes.Polygon object at 0x10d91ee80> |
2 | Dallam | Texas | 48 | 111 | 48111 | 48 | 111 | 48111 | 1 | 0.0 | ... | 2.050906 | 0.331667 | 0.385996 | 0.370037 | 0.376015 | 7.546049 | 7.2 | 9.471366 | 14.231738 | <pysal.cg.shapes.Polygon object at 0x10d91eef0> |
3 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.0 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x10d91ef60> |
4 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.0 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x10d91efd0> |
5 rows × 70 columns
Distance
There are many other kinds of weighting functions in PySAL. Another separate type use a continuous measure of distance to define neighborhoods.
radius = ps.cg.sphere.RADIUS_EARTH_MILES
radius
3958.755865744055
#ps.min_threshold_dist_from_shapefile?
threshold = ps.min_threshold_dist_from_shapefile('../data/texas.shp',radius) # now in miles, maximum nearest neighbor distance between the n observations
threshold
60.47758554135752
knn defined weights
knn4_bad = ps.knnW_from_shapefile('../data/texas.shp', k=4) # ignore curvature of the earth
knn4_bad.histogram
[(4, 254)]
knn4 = ps.knnW_from_shapefile('../data/texas.shp', k=4, radius=radius)
knn4.histogram
[(4, 254)]
knn4[0]
{3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0}
knn4_bad[0]
{4: 1.0, 5: 1.0, 6: 1.0, 13: 1.0}
Kernel W
Kernel Weights are continuous distance-based weights that use kernel densities to define the neighbor relationship.
Typically, they estimate a bandwidth
, which is a parameter governing how far out observations should be considered neighboring. Then, using this bandwidth, they evaluate a continuous kernel function to provide a weight between 0 and 1.
Many different choices of kernel functions are supported, and bandwidths can either be fixed (constant over all units) or adaptive in function of unit density.
For example, if we want to use adaptive bandwidths for the map and weight according to a gaussian kernel:
kernelWa = ps.adaptive_kernelW_from_shapefile('../data/texas.shp', radius=radius)
kernelWa
<pysal.weights.Distance.Kernel at 0x7f8fe4cfe080>
dataframe.loc[kernelWa.neighbors[4] + [4]]
NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | FIPS | STFIPS | COFIPS | FIPSNO | SOUTH | HR60 | ... | BLK90 | GI59 | GI69 | GI79 | GI89 | FH60 | FH70 | FH80 | FH90 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.0 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x7f8fe4ee1... |
5 | Roberts | Texas | 48 | 393 | 48393 | 48 | 393 | 48393 | 1 | 0.0 | ... | 0.000000 | 0.320275 | 0.318656 | 0.398681 | 0.339626 | 5.762712 | 5.3 | 6.231454 | 4.885993 | <pysal.cg.shapes.Polygon object at 0x7f8fe4ee1... |
3 | Hansford | Texas | 48 | 195 | 48195 | 48 | 195 | 48195 | 1 | 0.0 | ... | 0.000000 | 0.253527 | 0.357813 | 0.393938 | 0.383924 | 7.591786 | 4.7 | 5.542986 | 7.125457 | <pysal.cg.shapes.Polygon object at 0x7f8fe4ee1... |
4 | Ochiltree | Texas | 48 | 357 | 48357 | 48 | 357 | 48357 | 1 | 0.0 | ... | 0.021911 | 0.236998 | 0.352940 | 0.343949 | 0.374461 | 5.172414 | 4.0 | 4.758392 | 9.159159 | <pysal.cg.shapes.Polygon object at 0x7f8fe4ee1... |
4 rows × 70 columns
kernelWa.bandwidth[0:7]
array([[ 30.30546757],
[ 30.05684855],
[ 39.14876899],
[ 29.96302462],
[ 29.96302462],
[ 30.21084447],
[ 30.23619029]])
kernelWa[4]
{3: 9.99999900663795e-08, 4: 1.0, 5: 0.002299013803371608}
kernelWa[2]
{1: 9.99999900663795e-08, 2: 1.0, 8: 0.23409571720488287}
Distance Thresholds
#ps.min_threshold_dist_from_shapefile?
# find the largest nearest neighbor distance between centroids
threshold = ps.min_threshold_dist_from_shapefile('../data/texas.shp', radius=radius) # decimal degrees
Wmind0 = ps.threshold_binaryW_from_shapefile('../data/texas.shp', radius=radius, threshold=threshold*.9)
WARNING: there are 2 disconnected observations
Island ids: [133, 181]
Wmind0.histogram
[(0, 2),
(1, 3),
(2, 5),
(3, 4),
(4, 10),
(5, 26),
(6, 16),
(7, 31),
(8, 70),
(9, 32),
(10, 29),
(11, 12),
(12, 5),
(13, 2),
(14, 5),
(15, 2)]
Wmind = ps.threshold_binaryW_from_shapefile('../data/texas.shp', radius=radius, threshold=threshold)
Wmind.histogram
[(1, 2),
(2, 3),
(3, 4),
(4, 8),
(5, 5),
(6, 20),
(7, 26),
(8, 9),
(9, 32),
(10, 31),
(11, 37),
(12, 33),
(13, 23),
(14, 6),
(15, 7),
(16, 2),
(17, 4),
(18, 2)]
centroids = np.array([list(poly.centroid) for poly in dataframe.geometry])
centroids[0:10]
array([[-100.27156111, 36.27508641],
[-101.8930971 , 36.27325425],
[-102.59590795, 36.27354996],
[-101.35351324, 36.27230422],
[-100.81561379, 36.27317803],
[-100.81482387, 35.8405153 ],
[-100.2694824 , 35.83996075],
[-101.35420366, 35.8408377 ],
[-102.59375964, 35.83958662],
[-101.89248229, 35.84058246]])
Wmind[0]
{3: 1, 4: 1, 5: 1, 6: 1, 13: 1}
knn4[0]
{3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0}
Visualization
%matplotlib inline
import matplotlib.pyplot as plt
from pylab import figure, scatter, show
wq = ps.queen_from_shapefile('../data/texas.shp')
wq[0]
{4: 1.0, 5: 1.0, 6: 1.0}
fig = figure(figsize=(9,9))
plt.plot(centroids[:,0], centroids[:,1],'.')
plt.ylim([25,37])
show()
wq.neighbors[0]
[4, 5, 6]
from pylab import figure, scatter, show
fig = figure(figsize=(9,9))
plt.plot(centroids[:,0], centroids[:,1],'.')
#plt.plot(s04[:,0], s04[:,1], '-')
plt.ylim([25,37])
for k,neighs in wq.neighbors.items():
#print(k,neighs)
origin = centroids[k]
for neigh in neighs:
segment = centroids[[k,neigh]]
plt.plot(segment[:,0], segment[:,1], '-')
plt.title('Queen Neighbor Graph')
show()
wr = ps.rook_from_shapefile('../data/texas.shp')
fig = figure(figsize=(9,9))
plt.plot(centroids[:,0], centroids[:,1],'.')
#plt.plot(s04[:,0], s04[:,1], '-')
plt.ylim([25,37])
for k,neighs in wr.neighbors.items():
#print(k,neighs)
origin = centroids[k]
for neigh in neighs:
segment = centroids[[k,neigh]]
plt.plot(segment[:,0], segment[:,1], '-')
plt.title('Rook Neighbor Graph')
show()
fig = figure(figsize=(9,9))
plt.plot(centroids[:,0], centroids[:,1],'.')
#plt.plot(s04[:,0], s04[:,1], '-')
plt.ylim([25,37])
for k,neighs in Wmind.neighbors.items():
origin = centroids[k]
for neigh in neighs:
segment = centroids[[k,neigh]]
plt.plot(segment[:,0], segment[:,1], '-')
plt.title('Minimum Distance Threshold Neighbor Graph')
show()
Wmind.pct_nonzero
3.8378076756153514
wr.pct_nonzero
2.0243040486080974
wq.pct_nonzero
2.263004526009052
Exercise
- Answer this question before writing any code: What spatial weights structure would be more dense, Texas counties based on rook contiguity or Texas counties based on knn with k=4?
- Why?
- Write code to see if you are correct.