aboutsummaryrefslogtreecommitdiffstats
path: root/notes
diff options
context:
space:
mode:
authorThibaut Horel <thibaut.horel@gmail.com>2014-11-23 03:33:10 -0500
committerThibaut Horel <thibaut.horel@gmail.com>2014-11-23 03:33:10 -0500
commitf96af5373ee10f846cc6a3c5e59ea4e5473fe583 (patch)
treefb90dbacd60b7a0d8bc06c939a74d97602f1d0ff /notes
parentb6d9207a03a8babea55b1e97493633c36744dd72 (diff)
downloadcascades-f96af5373ee10f846cc6a3c5e59ea4e5473fe583.tar.gz
Some corrections in the ICM section
I was confused by the active/infected distinction. I think it is clearer now.
Diffstat (limited to 'notes')
-rw-r--r--notes/formalisation.pdfbin194430 -> 194405 bytes
-rw-r--r--notes/formalisation.tex29
2 files changed, 24 insertions, 5 deletions
diff --git a/notes/formalisation.pdf b/notes/formalisation.pdf
index 75d5480..2bdcb11 100644
--- a/notes/formalisation.pdf
+++ b/notes/formalisation.pdf
Binary files differ
diff --git a/notes/formalisation.tex b/notes/formalisation.tex
index 72e7cb9..871fe50 100644
--- a/notes/formalisation.tex
+++ b/notes/formalisation.tex
@@ -109,24 +109,43 @@ Recap on what the model is
\subsection{Results under isotropic condition}
\section{Independent Cascade Model}
+
\subsection{The Model}
-Recall the independent cascade model, where nodes have 3 states: susceptible, active, and inactive. All nodes start either as susceptible or active. At each time step t, for every (active, susceptible)-pair $(j,i)$, $j$ infects $i$ with probability $p_{j,i}$. At t+1, $j$ becomes inactive and $i$ becomes active. The cascade continues until no active nodes remain.
-Consider a node i. If $p_{j,i}$ is the probability that node $j$ infects $i$ at the next step, let $\theta_{i,j} := \log (1 - p_{j,i})$. Note that $\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that note that
+Recall the independent cascade model, where nodes have 3 states: susceptible,
+active, and inactive. All nodes start either as susceptible or active. At each
+time step $t$, for every (active, susceptible)-pair $(j,i)$, $j$ attempts to
+infect $i$ with probability $p_{j,i}$ and become inactive. If $j$ succeeds, $i$
+will become active at time step $t+1$. The cascade continues until no active
+nodes remain.
+
+Consider a node $i$. If $p_{j,i}$ is the probability that node $j$ infects $i$
+at the next step, let $\theta_{i,j} := \log (1 - p_{j,i})$. Note that
+$\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that:
$$\sum_{j\in S} \theta_{j,i} = \log \prod_{j \in S} (1 - p_{j,i}) = \log \mathbb{P}(\text{i not infected by any j} \in S)$$
Our objective is to recover the vector $\theta_{*,i}$ for every node $i$.
\subsection{Formulating the sparse recovery problem}
+
The idea behind what follows is the assumption that $p_{*,i}$ and therefore $\theta_{*,i}$ are sparse vectors: a node has few neighbors (parents in the directed graph) compared to the size of the graph.
-Let $\mathbbm{1}_S$ be the indicator variable for the nodes that are active at a certain time step t of a certain cascade. Under the previous assumption, we have: $$< \mathbbm{1}_S, \theta_{*,i}> = \log \mathbb{P}( \text{i not infected at } t+1| S \text{ active at } t)$$
+Let $\mathbbm{1}_S$ be the indicator variable for the nodes that are active at
+a certain time step $t$ of a certain cascade. Under the previous assumption, we
+have:
+$$\inprod{\mathbbm{1}_S, \theta_{*,i}} = \log \mathbb{P}( \text{i not infected at } t+1| S \text{ active at } t)$$
-Suppose we observe a series of cascades $(I_k^t)$, where $I_k^t$ are the nodes infected in cascade $k$ at time $t$. Recall that once a node has been infected (becomes active), it cannot be infected in the future (is inactive). Let us therefore look at all the sets $(I_k^t)_{t \leq t_i^k} $ such that for every cascade $k$, $t_i^k > t$, where $t_i^k = + \infty$ if $i$ was not infected in that cascade. If we concatenate these rows $\mathbbm{1}_{I_K^t}$ into a matrix $M$ (for measurement), we have:
+Suppose we observe a series of cascades $(A_k^t)$, where $A_k^t$ are the nodes
+active in cascade $k$ at time $t$ (these are exactly the nodes which became
+infected at time $t-1$). Recall that once a node has been infected (becomes
+active), it cannot be infected in the future (is inactive). Let us therefore
+look at all the sets $(A_k^t)_{t \leq t_i^k} $ such that for every cascade $k$,
+$t_i^k > t$, where $t_i^k = + \infty$ if $i$ was not infected in that cascade.
+If we concatenate these rows $\mathbbm{1}_{A_k^t}$ into a matrix $M$ (for
+measurement), we have:
$$M \theta_{*,i} = b$$
-
where $b$ is a column vector corresponding to $\log \mathbb{P}(\text{i was not infected at } t+1| S \text{ active at } t)$. Note that even if we had access to $b$ directly but $M$ does not have full column rank, then solving for $x$ is not trivial! However, under sparsity assumptions on $x$, we could apply results for the sparse recovery/sparse coding/compressed sensing literature
In reality, we do not have access to $b$. We have access to the realization vector $\vec w$ of $\{i$ was infected at $t+1 | S$ active at $t\}$: