diff options
Diffstat (limited to 'poster/poster.tex')
| -rw-r--r-- | poster/poster.tex | 140 |
1 files changed, 0 insertions, 140 deletions
diff --git a/poster/poster.tex b/poster/poster.tex deleted file mode 100644 index 27f404a..0000000 --- a/poster/poster.tex +++ /dev/null @@ -1,140 +0,0 @@ -%% Use the hmcposter class with the thesis document-class option. -\documentclass[thesis]{hmcposter} -\usepackage{algpseudocode} -\usepackage{algorithm} -\usepackage{graphicx} -\usepackage{natbib} -\usepackage{booktabs} -\usepackage{subfig} -\usepackage{amsmath} -\usepackage{textcomp} -\usepackage{url} - -\author{Ben Green and Thibaut Horel} - -\title{Identifying Cascades of Violence} - -\class{CS 284r: Social Data Mining} - -\advisor{Yaron Singer \and Andrew Papachristos} - -%% Define the \BibTeX command, used in our example document. -\providecommand{\bibtex}{{\rmfamily B\kern-.05em% - \textsc{i\kern-.025em b}\kern-.08em% - T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}} - - -\pagestyle{fancy} - -\begin{document} - -\begin{poster} - -\section{Introduction} - -Most discussions of violence evaluate crime in broad terms: this neighborhood is dangerous, that population is at risk. Yet recent studies of of criminal behavior have shown that crime and violence are highly concentrated in social networks rather than neighborhoods or demographic groups. - -\textbf{What does data on criminal behavior reveal about how violence spreads in a social network?} - -\begin{figure} -\begin{center} -\fbox{\includegraphics[width=13in]{lcc-newark}} -\caption{A co-offending criminal network with labeled victims.} -\vspace{-2em} -\end{center} -\end{figure} - -\section{Data} - -Our data consists of police records from the City of Chicago, IL between 2006 and 2014 and contains: - -\begin{itemize} -\item Arrest records -\item Demographic data -\item Victims and dates of non-fatal gunshots -\end{itemize} - -Every person arrested during the study period represents a node. We generate a co-offending network by assigning edges between all pairs of individuals who have been arrested together. We define infection as being the victim of a gunshot. - -The co-offending network contains 396,042 nodes and 415,454 edges. The largest connected component has 123,506 nodes and 366,514 edges, and is a small-world network. It contains 8,237 of the whole network's 11,206 victim nodes. - -\columnbreak - -\section{Simulating Cascades} - -We tested for evidence of violence cascades by simulating the data given independent infection time for all victims. Using the same set of victim nodes as the data, we sampled for each a random date from the study period using a uniform distribution. - -The plot below compares how many pairs of infected neighbors were infected within 100 days of each other for the simulations and data. The data has more such pairs of nodes than any simulation, implying that the observed data could not have been generated by independently occurring infections. - -\begin{figure} -\begin{center} -\fbox{\includegraphics[width=10in]{mean-100-sim2}} -\caption{Simulated infections (blue) versus the data (red).} -\end{center} -\end{figure} - -\section{Model} - -The probability of infection between $u$ (infected as $t_u$) and $v$ infected at $t_v$ is modeled as $p_e = p_s\cdot p_t$: -\begin{itemize} -\item structural component $p_s \propto \gamma^{d_{u,v}}$ where $d_{u,v}$ is the distance between $u$ and $v$ in the underlying network. -\item temporal component $p_t \propto \frac{1}{1 + \alpha (t_v - t_u)}$ -\end{itemize} -These probabilities define a directed acyclic graph $D$ along which the spread of violence can occur. - -Then, the cascades are modeled as a set of disjoint trees $\mathcal{T} = (T_1, \ldots T_m)$ whose probability of occuring is given by: -\begin{displaymath} -p(\mathcal{T}\;|\;\alpha, \beta) = \prod_{i=1}^m \beta(1-\beta)^{|T_1|}\prod_{e\in T_i} p_e\prod_{e\in\delta(T_i)} (1- p_s) -\end{displaymath} -\begin{itemize} -\item $\beta$ is the probability of a node becoming the root of a new cascade. -\item $\delta(T_i)$ is the set of edges from $T_i$ to non-victim nodes. -\end{itemize} -\columnbreak - -\section{Recovering cascades} - -Given our probabilistic model, we can recover the unobserved cascades by \emph{Maximum Likelihood Estimation}. - -For fixed values of $\alpha$ and $\beta$, the most likely cascades can be found by a simple algorithm: - -\vspace{0.5em} - -\begin{figure} -\begin{algorithmic}[1] - \ForAll{victim node $v$} - \State Consider its most likely parent $u$ - \If{$p_{u,v} > \text{threshold}$} - \State Include edge $(u,v)$ in the solution. - \Else - \State Consider $u$ to be the root of a new cascade - \EndIf - \EndFor -\end{algorithmic} -\vspace{1em} -\caption{Algorithm for cascade recovery by Maximum Likelihood Estimation} -\end{figure} - -\vspace{0.5em} - -where the threshold can be defined analytically as a function of $\beta$. Note that this algorithm enforces that all cascades are trees. - -This gives a function of $\alpha$ and $\beta$ whose maximum can then be found using a generic optimizer. - - - -\section{Analyzing Cascades} - -The algorithm described above returns a set of disjoint trees, whose union is all of the network's infected nodes. Each tree represents a distinct cascade. Below are two cascades inferred from the data. Further analysis of the cascades is forthcoming. - -\begin{figure} -\begin{center} -\fbox{\includegraphics[width=13in]{cascades}} -\caption{Sample cascades returned by our algorithm.} -\end{center} -\end{figure} - - -\end{poster} - -\end{document}
\ No newline at end of file |
