# Planar Point Patterns in PySAL

**Authors: Serge Rey <sjsrey@gmail.com> and Wei Kang <weikang9009@gmail.com>**

## Introduction
This notebook introduces the basic PointPattern class in PySAL and covers the following:

* [What is a point pattern?](#What-is-a-point-pattern?)
* [Creating Point Patterns](#Creating-Point-Patterns)
* [Attributes of Point Patterns](#Attributes-of-PySAL-Point-Patterns)
* [Intensity Estimates](#Intensity-Estimates)
* [Next steps](#Next-steps)

## What is a point pattern?

We introduce basic terminology here and point the interested reader to more [detailed references](#References) on the underlying theory of the statistical analysis of point patterns.

### Points and Event Points

To start we consider a series of *point locations*, $(s_1, s_2, \ldots, s_n)$ in a study region $\Re$. We limit our focus here to a two-dimensional space so that $s_j = (x_j, y_j)$ is the spatial coordinate pair for point location $j$.

We will be interested in two different types of points.

#### Event Points

*Event Points* are locations where something of interest has occurred. The term *event* is very general here and could be used to represent a wide variety of phenomena. Some examples include:

* [locations of individual plants of a certain species](http://link.springer.com/chapter/10.1007/978-3-642-01976-0_7#page-1)
* [archeological sites](http://discovery.ucl.ac.uk/11345/)
* [addresses of disease cases](http://www.jstor.org/stable/622936)
* [locations of crimes](https://geodacenter.asu.edu/system/files/points.pdf)
* the [distribution of neurons](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2889688/)

among [many others](https://en.wikipedia.org/wiki/Point_process).

It is important to recognize that in the statistical analysis of point patterns the interest extends beyond the observed point pattern at hand.
The observed patterns are viewed as realizations from some underlying spatial stochastic process.


#### Arbitrary Points

The second type of point we consider is a location where the phenomenon of interest has not been observed. These go by various names such as "empty space" or "regular" points, and at first glance might seem less interesting to a spatial analyst. However, these types of points play a central role in a class of point pattern methods that we explore below.


### Point Pattern Analysis

The analysis of event points focuses on a number of different characteristics of the collective spatial pattern that is observed. Often the pattern is judged against the hypothesis of complete spatial randomness (CSR). Loosely speaking, CSR assumes that the point events arise independently of one another and with constant probability across $\Re$.
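To make the CSR idea concrete, here is a minimal sketch in plain Python that draws a fixed number of independent, uniformly distributed points in a rectangle. The function name `simulate_csr` and its parameters are hypothetical and are not part of PySAL. Fixing the number of points in advance is a simplifying assumption: under a Poisson process the count itself would also be random.

```python
import random

def simulate_csr(n, xmin, ymin, xmax, ymax, seed=None):
    """Draw n independent points, each uniform over the given rectangle.

    Illustrative helper only (not a PySAL function).
    """
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
            for _ in range(n)]

# 100 points in the unit square; every location is equally likely
# and no point influences the position of any other.
csr_points = simulate_csr(100, 0.0, 0.0, 1.0, 1.0, seed=42)
```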

Of course, many of the empirical point patterns we encounter do not appear to be generated from such a simple stochastic process. The departures from CSR can be due to two types of effects.

#### First order effects

For a point process, the first-order properties pertain to the intensity of the process across space. Whether and how the intensity of the point pattern varies within our study region are questions that assume center stage. Such variation in the intensity of the pattern of, say, addresses of individuals with a certain type of non-infectious disease may reflect the underlying population density. In other words, although the point pattern of disease cases may display variation in intensity in our study region, and thus violate the condition of constant event probability, that spatial drift in the pattern intensity could be driven by an underlying covariate.
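A first-order effect can be sketched by thinning uniform candidate points: each candidate is kept with a probability that depends on its location. The helper `simulate_thinned` and the weight function below are hypothetical illustrations, not PySAL functionality. The weight here grows linearly from west to east, standing in for a covariate such as population density.

```python
import random

def simulate_thinned(n_candidates, keep_probability, seed=None):
    """Keep each uniform candidate in the unit square with probability
    keep_probability(x, y), which must return a value in [0, 1].

    Illustrative helper only (not a PySAL function).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_candidates):
        x, y = rng.random(), rng.random()
        if rng.random() < keep_probability(x, y):
            kept.append((x, y))
    return kept

# Intensity increases with x, so the eastern half should hold more points.
east_heavy = simulate_thinned(2000, lambda x, y: x, seed=0)
west_count = sum(1 for x, _ in east_heavy if x < 0.5)
east_count = len(east_heavy) - west_count
```

The points are still independent of one another; only the intensity varies, which is what distinguishes a first-order effect from interaction between events.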



#### Second order effects

The second channel by which departures from CSR can arise is through interaction and dependence between events in space. The canonical example is a contagious disease, whereby the presence of an infected individual increases the probability of subsequent additional cases nearby.
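As a rough sketch of a second-order effect, the code below generates a simple parent and offspring pattern: parents are placed uniformly, and each parent spawns offspring scattered closely around it. The function names and parameter values are hypothetical and chosen only for illustration. Offspring are not clipped to the unit square, so a few may fall slightly outside it.

```python
import math
import random

def simulate_clustered(n_parents, n_offspring, spread, seed=None):
    """Place parents uniformly in the unit square and scatter offspring
    around each parent with Gaussian noise of standard deviation `spread`.

    Illustrative helper only (not a PySAL function).
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_parents):
        px, py = rng.random(), rng.random()
        for _ in range(n_offspring):
            points.append((rng.gauss(px, spread), rng.gauss(py, spread)))
    return points

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

clustered = simulate_clustered(5, 20, spread=0.02, seed=1)

# A uniform pattern with the same number of points, for comparison.
rng = random.Random(1)
uniform = [(rng.random(), rng.random()) for _ in range(len(clustered))]

nn_clustered = mean_nearest_neighbor_distance(clustered)
nn_uniform = mean_nearest_neighbor_distance(uniform)
```

Because events attract one another in the clustered pattern, its mean nearest-neighbor distance should be noticeably smaller than that of the uniform pattern.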


When a pattern departs from expectation under CSR, this suggests that the underlying process may have some spatial structure that merits further investigation. Thus methods for detecting deviations from CSR and testing for alternative processes have given rise to a large literature in point pattern statistics.


### Methods of Point Pattern Analysis in PySAL

The points module in PySAL implements basic methods of point pattern analysis organized into the following groups:

* Point Processing
* Centrography and Visualization
* Quadrat Based Methods
* Distance Based Methods

In the remainder of this notebook we shall focus on point processing.

In [1]:
import pysal.lib as ps
import numpy as np
from pysal.explore.pointpats import PointPattern

## Creating Point Patterns

### From lists

We can build a point pattern by using Python lists of coordinate pairs $(s_0, s_1,\ldots, s_m)$ as follows:

In [2]:
points = [[66.22, 32.54], [22.52, 22.39], [31.01, 81.21],
          [9.47, 31.02],  [30.78, 60.10], [75.21, 58.93],
          [79.26,  7.68], [8.23, 39.93],  [98.73, 77.17],
          [89.78, 42.53], [65.19, 92.08], [54.46, 8.48]]
p1 = PointPattern(points)

In [3]:
p1.mbb

array([ 8.23,  7.68, 98.73, 92.08])

With indexing starting at zero, $s_0 = (66.22, 32.54)$ and $s_{11} = (54.46, 8.48)$.

In [4]:
p1.summary()

Point Pattern
12 points
Bounding rectangle [(8.23,7.68), (98.73,92.08)]
Area of window: 7638.200000000002
Intensity estimate for window: 0.0015710507711240865
       x      y
0  66.22  32.54
1  22.52  22.39
2  31.01  81.21
3   9.47  31.02
4  30.78  60.10


In [5]:
type(p1.points)

pandas.core.frame.DataFrame

In [6]:
np.asarray(p1.points)

array([[66.22, 32.54],
       [22.52, 22.39],
       [31.01, 81.21],
       [ 9.47, 31.02],
       [30.78, 60.1 ],
       [75.21, 58.93],
       [79.26,  7.68],
       [ 8.23, 39.93],
       [98.73, 77.17],
       [89.78, 42.53],
       [65.19, 92.08],
       [54.46,  8.48]])

In [7]:
p1.mbb

array([ 8.23,  7.68, 98.73, 92.08])
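The `mbb` output above lists the minimum bounding box as `[min x, min y, max x, max y]`. The same four numbers can be recovered from the coordinate list with plain Python, which may help clarify what the attribute represents. This is an illustrative sketch and not how PySAL computes it internally.

```python
points = [[66.22, 32.54], [22.52, 22.39], [31.01, 81.21],
          [9.47, 31.02],  [30.78, 60.10], [75.21, 58.93],
          [79.26,  7.68], [8.23, 39.93],  [98.73, 77.17],
          [89.78, 42.53], [65.19, 92.08], [54.46, 8.48]]

xs = [x for x, _ in points]
ys = [y for _, y in points]

# Lower-left corner followed by upper-right corner.
mbb = [min(xs), min(ys), max(xs), max(ys)]
print(mbb)
```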

### From numpy arrays

In [8]:
points = np.asarray(points)
points

array([[66.22, 32.54],
       [22.52, 22.39],
       [31.01, 81.21],
       [ 9.47, 31.02],
       [30.78, 60.1 ],
       [75.21, 58.93],
       [79.26,  7.68],
       [ 8.23, 39.93],
       [98.73, 77.17],
       [89.78, 42.53],
       [65.19, 92.08],
       [54.46,  8.48]])

In [9]:
p1_np = PointPattern(points)
p1_np.summary()

Point Pattern
12 points
Bounding rectangle [(8.23,7.68), (98.73,92.08)]
Area of window: 7638.200000000002
Intensity estimate for window: 0.0015710507711240865
       x      y
0  66.22  32.54
1  22.52  22.39
2  31.01  81.21
3   9.47  31.02
4  30.78  60.10


### From shapefiles

This example uses 200 randomly distributed points within the counties of Virginia. Coordinates are for UTM zone 17 N.

In [10]:
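# Locate the example shapefile bundled with PySAL
f = ps.examples.get_path('vautm17n_points.shp')
fo = ps.io.open(f)
# Collect the point coordinates into an (n, 2) array
pp_va = PointPattern(np.asarray([pnt for pnt in fo]))
fo.close()
pp_va.summary()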

Point Pattern
200 points
Bounding rectangle [(273959.664381352,4049220.903414295), (972595.9895779632,4359604.85977962)]
Area of window: 216845506675.0557
Intensity estimate for window: 9.223156295311261e-10
               x             y
0  865322.486181  4.150317e+06
1  774479.213103  4.258993e+06
2  308048.692232  4.054700e+06
3  670711.529980  4.258864e+06
4  666254.475614  4.256514e+06


## Attributes of PySAL Point Patterns

In [11]:
pp_va.summary()

Point Pattern
200 points
Bounding rectangle [(273959.664381352,4049220.903414295), (972595.9895779632,4359604.85977962)]
Area of window: 216845506675.0557
Intensity estimate for window: 9.223156295311261e-10
               x             y
0  865322.486181  4.150317e+06
1  774479.213103  4.258993e+06
2  308048.692232  4.054700e+06
3  670711.529980  4.258864e+06
4  666254.475614  4.256514e+06


In [12]:
pp_va.points

               x             y
0  865322.486181  4.150317e+06
1  774479.213103  4.258993e+06
2  308048.692232  4.054700e+06
3  670711.529980  4.258864e+06
4  666254.475614  4.256514e+06
5  664464.571678  4.061242e+06
6  784718.209785  4.076109e+06
7  972595.989578  4.183781e+06
8  657798.357403  4.253278e+06
9  682259.020242  4.282441e+06


In [13]:
pp_va.head()

               x          y
0  865322.486181  4150317.0
1  774479.213103  4258993.0
2  308048.692232  4054700.0
3   670711.52998  4258864.0
4  666254.475614  4256514.0


In [14]:
pp_va.tail()

                 x          y
195  876485.065262  4148120.0
196    621600.1114  4177462.0
197  450246.610116  4106031.0
198  740919.375814  4359605.0
199  797522.610898  4208606.0


### Intensity Estimates

The intensity of a point process at point $s_j$ can be defined as:

$$\lambda(s_j) = \lim \limits_{|\mathbf{A}s_j| \to 0} \left \{ \frac{E(Y(\mathbf{A}s_j))}{|\mathbf{A}s_j|} \right \}   $$

where $\mathbf{A}s_j$ is a small region surrounding location $s_j$ with area $|\mathbf{A}s_j|$, and $E(Y(\mathbf{A}s_j))$ is the expected number of event points in $\mathbf{A}s_j$.

The intensity is the mean number of event points per unit of area at point $s_j$. 



Recall that one of the implications of CSR is that the intensity of the point process is constant in our study area $\Re$. In other words, $\lambda(s_j) = \lambda(s_{j+1}) = \ldots = \lambda(s_n) = \lambda \ \forall s_j \in \Re$. Thus, if the area of $\Re$ is $|\Re|$, the expected number of event points in the study region is: $E(Y(\Re)) = \lambda |\Re|.$
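As a worked example of $E(Y(\Re)) = \lambda |\Re|$, the sketch below plugs in the window area and intensity estimate reported by `pp_va.summary()` earlier in this notebook. The 100 km by 100 km square is a hypothetical sub-region introduced only for illustration, and it is assumed to lie entirely inside the window. UTM coordinates are in metres, so areas are in square metres.

```python
# Values copied from the pp_va.summary() output above.
lam = 9.223156295311261e-10        # points per square metre
window_area = 216845506675.0557    # square metres

# Expected count over the whole window recovers the 200 observed points.
expected_in_window = lam * window_area

# Under constant intensity, any region of area |A| expects lam * |A| points.
square_area = 100_000.0 * 100_000.0   # hypothetical 100 km x 100 km square
expected_in_square = lam * square_area

print(round(expected_in_window, 6), round(expected_in_square, 3))
```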

In PySAL, the intensity is estimated by using a geometric object to encode the study region. We refer to this as the window, $W$. The reason for distinguishing between $\Re$ and $W$ is that the latter permits alternative definitions of the bounding object.

**Intensity estimates are based on the following:**
$$\hat{\lambda} = \frac{n}{|W|}$$

where $n$ is the number of points in the *window* $W$, and $|W|$ is the area of $W$.

**Intensity based on minimum bounding box:**
$$\hat{\lambda}_{mbb} = \frac{n}{|W_{mbb}|}$$

where $W_{mbb}$ is the minimum bounding box for the point pattern.

In [15]:
pp_va.lambda_mbb

9.223156295311263e-10
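The same estimator can be checked by hand on the small 12-point pattern used earlier. The sketch below uses only plain Python and is meant to mirror the formula $\hat{\lambda}_{mbb} = n / |W_{mbb}|$; it is not PySAL's implementation.

```python
points = [[66.22, 32.54], [22.52, 22.39], [31.01, 81.21],
          [9.47, 31.02],  [30.78, 60.10], [75.21, 58.93],
          [79.26,  7.68], [8.23, 39.93],  [98.73, 77.17],
          [89.78, 42.53], [65.19, 92.08], [54.46, 8.48]]

xs = [x for x, _ in points]
ys = [y for _, y in points]

# Area of the minimum bounding box: width times height.
mbb_area = (max(xs) - min(xs)) * (max(ys) - min(ys))

# Intensity estimate: number of points divided by the window area.
lambda_mbb = len(points) / mbb_area
print(mbb_area, lambda_mbb)
```

Up to floating-point rounding, these values agree with the "Area of window" and "Intensity estimate for window" lines printed by `p1.summary()`.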

**Intensity based on convex hull:**
$$\hat{\lambda}_{hull} = \frac{n}{|W_{hull}|}$$

where $W_{hull}$ is the convex hull for the point pattern.

In [16]:
pp_va.lambda_hull

1.5973789098179388e-09
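To see why the hull-based estimate is larger than the bounding-box estimate, the sketch below computes a convex hull and its area for the small 12-point pattern in plain Python. It uses Andrew's monotone chain algorithm and the shoelace formula, which are swapped in here for illustration; PySAL's own hull computation may differ. The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

points = [[66.22, 32.54], [22.52, 22.39], [31.01, 81.21],
          [9.47, 31.02],  [30.78, 60.10], [75.21, 58.93],
          [79.26,  7.68], [8.23, 39.93],  [98.73, 77.17],
          [89.78, 42.53], [65.19, 92.08], [54.46, 8.48]]

hull_area = polygon_area(convex_hull(points))
lambda_hull = len(points) / hull_area

xs = [x for x, _ in points]
ys = [y for _, y in points]
mbb_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
lambda_mbb = len(points) / mbb_area
```

The convex hull always fits inside the minimum bounding box, so its area is no larger and the resulting intensity estimate is no smaller.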

## Next steps


There is more to learn about point patterns in PySAL. 

The [centrographic notebook](centrography.ipynb) illustrates a number of spatial descriptive statistics and visualization of point patterns.

Clearly the window chosen will impact the intensity estimate. For more on **windows** see the [window notebook](window.ipynb).

To test whether your point pattern departs from complete spatial randomness, see the [distance statistics notebook](distance_statistics.ipynb) and the [quadrat statistics notebook](Quadrat_statistics.ipynb).


To simulate different types of point processes in various windows, see the [process notebook](process.ipynb).

If you have point pattern data with additional attributes associated with each point, see how to handle them in the [marks notebook](marks.ipynb).

