author Thibaut Horel <thibaut.horel@gmail.com> 2015-09-17 17:42:37 -0400
committer Thibaut Horel <thibaut.horel@gmail.com> 2015-09-17 17:42:37 -0400
commit 4b8dd88bc89a5b71ec7e78b8977087816226b68e
tree b2fef777e6e3d857643cd992005488dcd7fa21ab
parent 8fadc0e29d96b4394a787e730c732ba21252147a
download criminal_cascades-4b8dd88bc89a5b71ec7e78b8977087816226b68e.tar.gz
Supplement, end of the model section
-rw-r--r-- supplements/main.tex 100
1 files changed, 78 insertions, 22 deletions
diff --git a/supplements/main.tex b/supplements/main.tex
index 8b7d0f2..46bd105 100644
--- a/supplements/main.tex
+++ b/supplements/main.tex
@@ -65,45 +65,100 @@ $M$ and $\Phi$ and observation period $[0, T]$:
\begin{equation}
\label{eq:likelihood}
\mathcal{L}(\mathcal{E}\given M, \Phi) = \sum_{i=1}^N \log\lambda_{k_i}(t_i)
- - \sum_{k=1}^D\int_{0}^T \lambda_k(t)
+ - \sum_{k=1}^D\int_{0}^T \lambda_k(t) dt
\end{equation}
\subsection{Contagion of Gun Violence as a Hawkes Process}
\label{sec:model}
We model the contagion of gun violence as a Hawkes Process by making the
-following identifications: each network vertex (\emph{i.e} each individual) is
-a coordinate of the Hawkes Process and each gunshot injury is an event of the
-process occurring on a coordinate of the process, the victim of the injury.
+following identifications: each network vertex (\emph{i.e.}, each
+individual) is a coordinate of the Hawkes Process and each gunshot injury is an
+event of the process occurring on the coordinate corresponding to the victim
+of the injury.
\paragraph{Exogenous intensity.}
-The background rate $\mu(t)$ captures the seasonal rates of violence observed in the data. Given the regularity with which broad rates of violence fluctuate, we assume this process occurs exogenously and is not solely driven by peer contagion. We fit a time-varying function to the data, as described in Section SX.X).
+We assume that the exogenous intensity is the same for all individuals in
+the network. We attribute the regular fluctuations in the observed rate of
+violence to a seasonal effect that is independent of peer contagion. For this
+reason, we fit a time-varying function $\mu(t)$ to the data and use it as the
+common exogenous intensity (see Section~\ref{sec:background}).
-\paragraph{Exciting functions.}
+\paragraph{Exciting functions.} The exciting function $\phi_{u, v}(t)$ models
+the effect of person $u$ on person $v$ at time $t$ and captures two common
+assumptions regarding the spread of contagions.
+\begin{itemize}
+ \item \emph{time:} consistent with previous models used to infer the spread
+ of contagions over social networks (4, 5), we assume that the impact of
+ earlier infections on future events decays as the time passed since the
+ original infection increases. Additionally, influence can only travel
+ forward in time: an infection has no impact on those that came before
+ it. As is commonly done with Hawkes processes, we assume an exponential
+ decay, obtaining the following formula for the temporal component of
+ the exciting functions:
+ \begin{displaymath}
+ f_\beta(t) = \begin{cases}
+ \beta e^{-\beta t} & \text{if $t\geq 0$}\\
+ 0 & \text{otherwise}
+ \end{cases}
+ \end{displaymath}
+ \item \emph{network structure:} epidemiologists commonly assume that
+ contagious events are localized and that the transmission probability
+ increases closer to the source (CITE). In our case, we assume that
+ violence is more likely to spread between people who are more closely
+ linked in the network and measure the distance between individuals
+ based on network topology. Based on previous studies of violence in
+ social networks, we assume that infections are able to occur across
+ a distance of up to three degrees of separation (6); people who are
+ further away in the network have no effect on one another. For two
+ vertices $u$ and $v$ whose network distance $d(u, v)$ is less than or
+ equal to 3, we assume a decay of the form $\frac{\alpha}{d(u,v)^2}$.
+ Hence, we obtain the following formula for the structural component:
+ \begin{displaymath}
+ g_\alpha(u,v) = \begin{cases}
+ \frac{\alpha}{d(u,v)^2} & \text{if $d(u,v)\leq 3$}\\
+ 0 & \text{otherwise}
+ \end{cases}
+ \end{displaymath}
+\end{itemize}
+Finally, we combine the above two components by multiplying them to obtain the
+exciting function: $\phi_{u,v}(t) = f_\beta(t)g_\alpha(u,v)$.
-\begin{comment}
-We define an instantaneous infection rate that is a variant of the traditional one presented in Equation~\ref{eq:rate}. In particular, we define a unique infection rate for each network vertex $v$.
-\begin{equation}
-\lambda_v(t) = \underbrace{\mu(t)}_\text{background} + \underbrace{\sum_{u \in V} \Lambda_{uv}(t)}_\text{peer infection}
-\end{equation}
-\end{comment}
+\subsection{Likelihood}
+Using Equation~\eqref{eq:likelihood} and the model presented in
+Section~\ref{sec:model}, we can now write the log-likelihood of observed
+infection events $\mathcal{E} = \{(t_i, u_i)\}_{1\leq i\leq N}$ where $t_i$ is
+the time of infection $i$ and $u_i$ is the vertex infected at time $t_i$. We
+denote by $V$ the set of vertices in the network, and by $[0, T]$ the study
+period.
-The infection intensity function $\Lambda_{uv}(t)$ models the effect of person $u$ on person $v$ at time $t$. It is based on two common assumptions regarding the spread of contagions.
-\begin{enumerate}
-\item Time: Consistent with previous models used to infer the spread of contagions over social networks (4, 5), we assume that the impact of earlier infections on future events decays as the time passed since the original infection increases. Additionally, influence can only travel forward in time: an infection has no impact on those that came before it. We assume that influence decays over time based on the distribution $p_t(u,v)=e^{-\beta(t_v-t_u)}$.
-\item Network Structure: Epidemiologists commonly assume that contagious events are localized and that the transmission probability increases closer to the source (CITE). In our case, we assume that violence is more likely to spread between people who are more closely linked in the network and measure the distance between individuals based on network topology. Based on previous studies of violence in social networks, we assume that infections are able to occur across a distance of up to three degrees (6); people who are further away in the network have no effect on one another. We assume that influence decays over the network based on the distribution $p_s(u,v)=e^{-\alpha \cdot \text{dist}(u,v)}$.
-\end{enumerate}
+Furthermore, since some individuals died during the study period, the
+conditional intensity function only needs to be integrated until their time of
+death in the second summand of Equation~\eqref{eq:likelihood}. Denoting by
+$T_u$ the time of death of vertex $u$ ($T_u = T$ if the individual did not die
+during the study period), we obtain:
+\begin{displaymath}
+ \mathcal{L}(\mathcal{E}\given \mu, \alpha, \beta) = \sum_{i=1}^N \log\lambda_{u_i}(t_i)
+ - \sum_{v\in V}\int_{0}^{T_v} \lambda_v(t) dt
+\end{displaymath}
-Combining these two components, we
+Using \eqref{eq:hawkes} and the explicit formula for $\phi_{u,v}(t)$, this can
+be expanded to:
\begin{equation}
-\Lambda_{uv}(t) = p_s(u,v) p_t(u,v)= \frac{\alpha}{\text{dist}(u,v)} e^{-\beta(t_v-t_u)}
+ \label{eq:final-likelihood}
+ \begin{split}
+ \mathcal{L}(\mathcal{E}\given \mu, \alpha, \beta) =&
+ \sum_{i=1}^N \log\left(\mu(t_i)
+ + \sum_{j:t_j< t_i} g_\alpha(u_j, u_i)\beta e^{-\beta (t_i - t_j)}\right)
+ \\
+ &- \sum_{v\in V}\int_{0}^{T_v} \mu(t) dt
+ - \sum_{v\in V}\sum_{i: t_i < T_v} g_\alpha(u_i, v)
+ \left(1-e^{-\beta(T_v-t_i)}\right)
+\end{split}
\end{equation}
-\subsubsection{Likelihood}
-With our infection rate fully-defined, we can now formulate the likelihood function
-
\begin{figure}
\centering
\includegraphics{hawkes-diagram}
@@ -114,6 +169,7 @@ With our infection rate fully-defined, we can now formulate the likelihood funct
\section{Model Inference}
\subsection{Background rate}
+\label{sec:background}
Because the seasonal variations in gunshot rates (Figure SX) remain consistent throughout the study period, we assume these are inherent to the infection process and not purely driven by noise or social contagion. Instead of having a constant background rate, we capture seasonal variations as a periodic sinusoidal function. We first compute the aggregate background rate of all the nodes, based on the number of infections each day.
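
The expanded log-likelihood labelled eq:final-likelihood in the diff above can be evaluated numerically. What follows is only a minimal sketch under stated assumptions, not the authors' implementation. The names `dist`, `end_time`, `mu` and `mu_integral` are hypothetical inputs supplied by the caller. The sketch also assumes that a vertex does not excite itself, since $\alpha/d(u,v)^2$ is undefined at distance zero and the text does not cover that case.

```python
import math


def g_alpha(d, alpha):
    """Structural component: alpha / d^2 within three degrees of separation."""
    # ASSUMPTION: d == 0 (a vertex and itself) contributes nothing; the
    # formula alpha / d^2 is undefined there and the text leaves it open.
    if 1 <= d <= 3:
        return alpha / d ** 2
    return 0.0


def log_likelihood(events, dist, end_time, mu, mu_integral, alpha, beta):
    """Evaluate the expanded log-likelihood for a list of infection events.

    events      -- list of (t_i, u_i) pairs
    dist        -- hypothetical callable dist(u, v) giving network distance
    end_time    -- dict mapping each vertex v to T_v (death time, or T)
    mu          -- callable background rate mu(t)
    mu_integral -- callable returning the integral of mu over [0, T_v]
    """
    ll = 0.0
    # First summand: log-intensity at each observed event.
    for t_i, u_i in events:
        excitation = sum(
            g_alpha(dist(u_j, u_i), alpha) * beta * math.exp(-beta * (t_i - t_j))
            for t_j, u_j in events
            if t_j < t_i
        )
        ll += math.log(mu(t_i) + excitation)
    # Second summand: integrated intensity of every vertex up to T_v.
    for v, t_end in end_time.items():
        ll -= mu_integral(t_end)
        for t_i, u_i in events:
            if t_i < t_end:
                decay = 1.0 - math.exp(-beta * (t_end - t_i))
                ll -= g_alpha(dist(u_i, v), alpha) * decay
    return ll
```

The double loop over events is quadratic in the number of events, which keeps the sketch close to the formula; a practical implementation would likely restrict the inner sum to pairs within three degrees of separation.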