| author | Thibaut Horel <thibaut.horel@gmail.com> | 2015-09-16 00:08:42 -0400 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2015-09-16 00:08:42 -0400 |
| commit | 0ed0af6356b6ca962725d7a8bc4a07aad2daf437 (patch) | |
| tree | f0e23285a84dff894ecb083bb4bdb541cd7b1502 | |
| parent | 0aa80a77be642ed1cc8b8f6f0ccd46912e19f697 (diff) | |
| download | criminal_cascades-0ed0af6356b6ca962725d7a8bc4a07aad2daf437.tar.gz | |
First pass on supplements up to first paragraph of 1.2
| -rw-r--r-- | supplements/main.tex | 104 |
1 files changed, 64 insertions, 40 deletions
diff --git a/supplements/main.tex b/supplements/main.tex
index 72a0950..8b7d0f2 100644
--- a/supplements/main.tex
+++ b/supplements/main.tex
@@ -1,11 +1,15 @@
 \documentclass{article}
-\usepackage[utf8]{inputenc}
+\usepackage[utf8x]{inputenc}
 \usepackage{amsmath}
 \usepackage{algorithm}% http://ctan.org/pkg/algorithms
 \usepackage{algpseudocode}% http://ctan.org/pkg/algorithmicx
 \usepackage{graphicx}
+\usepackage{microtype}
+\usepackage{verbatim}
+
 \DeclareMathOperator*{\argmax}{argmax}
 \providecommand{\e}[1]{\ensuremath{\times 10^{#1}}}
+\newcommand{\given}{\,|\,}
 
 \title{Hawkes contagion model}
 \author{Ben Green \and Thibaut Horel \and Andrew Papachristos}
@@ -15,61 +19,76 @@
 
 \maketitle
 
-\section{Hawkes contagion model}
-We model the contagion of violence using a multidimensional Hawkes process over the co-offending network.
+\section{Contagion Model}
 
-\subsection{Background}
-We first develop the theory and notation behind the Hawkes process as it is traditionally presented.
+We model the contagion of violence using a multi-dimensional Hawkes process
+over the co-offending network. Section~\ref{sec:background} briefly presents
+the general definition of Hawkes Processes which is then instantiated and
+adapted to the contagion of gun violence in Section~\ref{sec:model}.
 
-The Hawkes process models the instantaneous propensity to become infected at time $t$ based on both exogenous and endogenous factors: a background rate $\mu$ that captures infections unrelated to social interactions and a peer contagion component that considers the social influence of previous infections. The instantaneous infection rate $\lambda(t)$ (also known as the ``hazard function" and ``conditional intensity function") is typically defined as follows:
-\begin{equation}
-\lambda(t) = \mu + \alpha\sum_{t_i < t}e^{-\beta(t-t_i)}
-\end{equation}
+\subsection{Hawkes Processes}
+\label{sec:background}
 
-More generally, the Hawkes process can be written as
-\begin{equation}\label{eq:rate}
-\lambda(t) = \mu + \sum_{t_u < t_v}p(u,v)
-\end{equation}
-where in the typical case $p(u,v)=\alpha e^{-\beta(t_v-t_u)}$.
+Hawkes processes are a class of multivariate self-exciting temporal point
+processes originally introduced by Alan G. Hawkes in the early 1970s (CITE) and
+have since been used to model a wide range of phenomena ranging from seismic
+events to information spread in social networks to stock market trading
+dynamics.
 
-Based on the instantaneous infection rate $\lambda(t)$ we can define the probability of certain events. Doing so relies on the following functions:
-\begin{itemize}
-\item Conditional density function $f$: the probability that an infection will occur at a given time.
-\item Cumulative distribution function $F$: the probability that an infection will occur before the current time $t$.
-\item Survival function $S$: the probability that an infection will not have occurred before the current time $t$. This implies that $S = 1 - F$.
-\end{itemize}
+In a Hawkes process, the conditional intensity function at any given time $t$
+(\emph{i.e} the instantaneous probability of occurrence of an event) can be
+written as the sum of a constant exogenous intensity and endogenous
+time-varying intensities for the events preceding time $t$.
 
-We can define the probability of infection at a given time $t$ as the instantaneous infection rate at $t$ multiplied by the probability that the item survived uninfected up until $t$.
+Formally, for a $D$ dimensional Hawkes process, let us introduce the set of
+events $\{(t_i, k_i)\}_{1\leq i \leq N}$ where $t_i$ denotes the time of event
+$i$ and $k_i$ the dimension (or coordinate) on which it occurs. The conditional
+intensity function is defined as follows:
 \begin{equation}
-f(t) = \lambda(t) S(t)
+    \label{eq:hawkes}
+    \lambda_k(t) = \mu_k + \sum_{i=1}^N \phi_{k_i, k}(t-t_i),
+    \quad 1\leq k\leq D
 \end{equation}
+where $M = (\mu_k)_{1\leq k\leq D}$ is the vector of exogenous intensities and
+the functions $\Phi = (\phi_{i,j})_{1\leq i, j\leq D}$ is the matrix of kernel
+functions (also known as exciting functions). For a pair of coordinates $(i,
+j)$, $\phi_{i,j}$ models the temporal variations of the influence of coordinate
+$i$ over coordinate $j$. The kernel functions are \emph{(i)} positive:
+$\phi_{i,j}(t)\geq 0$ and \emph{(ii)} causal: $\phi_{i,j}(t) = 0$ whenever
+$t<0$. In particular, this implies that the summation in \eqref{eq:hawkes}
+is only over the indices $i$ such that $t_i< t$.
 
-The survival function is given by
+We refer the reader to (CITE) for a formal definition of the conditional
+intensity function. We will simply use the following formula for the
+log-likelihood of events $\mathcal{E} = \{(t_i, k_i)\}_{1\leq i\leq N}$ given
+$M$ and $\Phi$ and observation period $[0, T]$:
 \begin{equation}
-S(t) = \exp\left(-\int_{t_{last}}^t \lambda(s) ds\right)
+    \label{eq:likelihood}
+    \mathcal{L}(\mathcal{E}\given M, \Phi) = \sum_{i=1}^N \log\lambda_{k_i}(t_i)
+    - \sum_{k=1}^D\int_{0}^T \lambda_k(t)
 \end{equation}
-where $t_{last}$ is the time of the most recent infection before $t$.
 
-Now we can define
-\begin{equation}
-f(t) = \lambda(t) \exp\left(-\int_{t_{last}}^t \lambda(s) ds\right)
-\end{equation}
+\subsection{Contagion of Gun Violence as a Hawkes Process}
+\label{sec:model}
 
-The likelihood is given by the density function for each observed infection and the survival function for all times without an infection. For a Hawkes process over the period $[0,T]$ with $n$ infections, the likelihood is defined as
-\begin{equation}
-L = \left[ f(t_1) \ldots f(t_n) \right] S(T) = \left[\prod_{i=1}^{n} \lambda(t_i) \right] \exp\left(-\int_{0}^{T} \lambda(s) ds \right)
-\end{equation}
+We model the contagion of gun violence as a Hawkes Process by making the
+following identifications: each network vertex (\emph{i.e} each individual) is
+a coordinate of the Hawkes Process and each gunshot injury is an event of the
+process occurring on a coordinate of the process, the victim of the injury.
 
-\subsection{Our model}
-We extend the single-dimensional Hawkes process to a multi-dimensional variant that can properly describe the contagion process over our network. We utilize a multidimensional approach where each network vertex (i.e. each individual) occupies its own dimension. This allows us to specify the unique set of social influences that each person encounters vis-a-vis his or her pattern of co-offending ties.
+\paragraph{Exogenous intensity.}
 
-\subsubsection{Infection rate}
+The background rate $\mu(t)$ captures the seasonal rates of violence observed in the data. Given the regularity with which broad rates of violence fluctuate, we assume this process occurs exogenously and is not solely driven by peer contagion. We fit a time-varying function to the data, as described in Section SX.X).
+
+\paragraph{Exciting functions.}
+
+\begin{comment}
 We define an instantaneous infection rate that is a variant of the traditional one presented in Equation~\ref{eq:rate}. In particular, we define a unique infection rate for each network vertex $v$.
 \begin{equation}
 \lambda_v(t) = \underbrace{\mu(t)}_\text{background} + \underbrace{\sum_{u \in V} \Lambda_{uv}(t)}_\text{peer infection}
 \end{equation}
+\end{comment}
 
-The background rate $\mu(t)$ captures the seasonal rates of violence observed in the data. Given the regularity with which broad rates of violence fluctuate, we assume this process occurs exogenously and is not solely driven by peer contagion. We fit a time-varying function to the data, as described in Section SX.X).
 The infection intensity function $\Lambda_{uv}(t)$ models the effect of person $u$ on person $v$ at time $t$. It is based on two common assumptions regarding the spread of contagions.
 \begin{enumerate}
@@ -92,7 +111,10 @@ With our infection rate fully-defined, we can now formulate the likelihood funct
 \label{fig:hawkes-diagram}
 \end{figure}
 
-\subsection{Finding the background rate}
+\section{Model Inference}
+
+\subsection{Background rate}
+
 Because the seasonal variations in gunshot rates (Figure SX) remain consistent throughout the study period, we assume these are inherent to the infection process and not purely driven by noise or social contagion. Instead of having a constant background rate, we capture seasonal variations as a periodic sinusoidal function. We first compute the aggregate background rate of all the nodes, based on the number of infections each day.
 \begin{equation}
@@ -141,7 +163,8 @@ This yields the result
 \end{align}
 where $\mu_0=\mu_0'/|V|$.
 
-\subsection{Learning the parameters [THIBAUT FILL IN THIS SECTION]}
+\subsection{Kernel function parameters}
+
 We learn parameters using
 $\mu_0 = 1.1845e-05$, $\alpha = 0.00317$, and $\beta = 0.0039$.
@@ -150,8 +173,9 @@ $\mu_0 = 1.1845e-05$, $\alpha = 0.00317$, and $\beta = 0.0039$.
 \lambda_v(t) = 1.1845\e{-5} \left[1 + 0.43 \sin\left(\frac{2\pi}{365.24} t + 4.36\right) \right] + \sum_{u \in V} \frac{0.00317}{\text{dist}(u,v)} 0.0039 e^{-0.0039(t-t_u)}
 \end{equation}
 
-\subsection{Inferring infections}
+\section{Inferring infections}
 [how we determine background vs peer infection]
+
 We can estimate if a person was primarily infected via peer contagion by comparing the contributions from the background rate and from his or her peers. We take this approach one step further to determine the person most responsible for infecting each of these 7,016 individuals infected by social contagion.
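
The conditional intensity (`eq:hawkes`) and the log-likelihood (`eq:likelihood`) introduced by this patch can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a constant exogenous intensity per coordinate and an exponential kernel `phi[i][j](t) = A[i][j] * beta * exp(-beta * t)` for `t > 0`, which has the same exponential shape as the fitted rate in the last hunk. The function names `intensity` and `log_likelihood`, the weight matrix `A`, and the demo values are all hypothetical.

```python
import math


def intensity(k, t, events, mu, A, beta):
    """Conditional intensity lambda_k(t) of coordinate k at time t.

    events: list of (t_i, k_i) pairs (event time, coordinate index).
    mu:     list of constant exogenous intensities (assumption: no seasonality).
    A:      A[i][j] is the assumed weight of coordinate i's influence on j.
    beta:   assumed decay rate of the exponential kernel.
    """
    rate = mu[k]
    for t_i, k_i in events:
        if t_i < t:  # causal kernel: only strictly earlier events contribute
            rate += A[k_i][k] * beta * math.exp(-beta * (t - t_i))
    return rate


def log_likelihood(events, mu, A, beta, T):
    """Log-likelihood of the events over the observation period [0, T].

    Computes sum_i log lambda_{k_i}(t_i) minus sum_k of the integral of
    lambda_k over [0, T]. The integral is evaluated in closed form, which
    is possible here only because the kernel is assumed exponential.
    """
    D = len(mu)
    log_term = sum(
        math.log(intensity(k_i, t_i, events, mu, A, beta))
        for t_i, k_i in events
    )
    # Integral of the exogenous part: each mu_k contributes mu_k * T.
    integral = sum(mu) * T
    # Integral of each event's kernel from t_i to T, summed over targets k:
    # int_{t_i}^{T} a * beta * exp(-beta (t - t_i)) dt = a * (1 - exp(-beta (T - t_i))).
    for t_i, k_i in events:
        total_weight = sum(A[k_i][k] for k in range(D))
        integral += total_weight * (1.0 - math.exp(-beta * (T - t_i)))
    return log_term - integral


if __name__ == "__main__":
    # Hypothetical two-coordinate example with three events.
    demo_events = [(1.0, 0), (2.5, 1), (4.0, 0)]
    demo_mu = [0.2, 0.2]
    demo_A = [[0.0, 0.5], [0.5, 0.0]]
    print(log_likelihood(demo_events, demo_mu, demo_A, beta=1.0, T=5.0))
```

The quadratic-time evaluation of the log term is kept for readability; for the exponential kernel a recursive update over time-sorted events would avoid recomputing each sum from scratch.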
