Some corrections in the ICM section

I was confused by the active/infected distinction. I think it is clearer now.
author: Thibaut Horel <thibaut.horel@gmail.com> 2014-11-23 03:33:10 -0500
committer: Thibaut Horel <thibaut.horel@gmail.com> 2014-11-23 03:33:10 -0500
commit: f96af5373ee10f846cc6a3c5e59ea4e5473fe583 (patch)
tree: fb90dbacd60b7a0d8bc06c939a74d97602f1d0ff /notes
parent: b6d9207a03a8babea55b1e97493633c36744dd72 (diff)
download: cascades-f96af5373ee10f846cc6a3c5e59ea4e5473fe583.tar.gz
2 files changed, 24 insertions, 5 deletions
diff --git a/notes/formalisation.pdf b/notes/formalisation.pdf
index 75d5480..2bdcb11 100644
--- a/notes/formalisation.pdf
+++ b/notes/formalisation.pdf
diff --git a/notes/formalisation.tex b/notes/formalisation.tex
index 72e7cb9..871fe50 100644
--- a/notes/formalisation.tex
+++ b/notes/formalisation.tex
@@ -109,24 +109,43 @@ Recap on what the model is
 \subsection{Results under isotropic condition}
 
 \section{Independent Cascade Model}
+
 \subsection{The Model}
-Recall the independent cascade model, where nodes have 3 states: susceptible, active, and inactive. All nodes start either as susceptible or active. At each time step t, for every (active, susceptible)-pair $(j,i)$, $j$ infects $i$ with probability $p_{j,i}$. At t+1, $j$ becomes inactive and $i$ becomes active. The cascade continues until no active nodes remain.
 
-Consider a node i. If $p_{j,i}$ is the probability that node $j$ infects $i$ at the next step, let $\theta_{i,j} := \log (1 - p_{j,i})$. Note that $\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that note that 
+Recall the independent cascade model, where nodes have 3 states: susceptible,
+active, and inactive. All nodes start either as susceptible or active. At each
+time step $t$, for every (active, susceptible)-pair $(j,i)$, $j$ attempts to
+infect $i$ with probability $p_{j,i}$ and become inactive. If $j$ succeeds, $i$
+will become active at time step $t+1$. The cascade continues until no active
+nodes remain.
+
+Consider a node $i$. If $p_{j,i}$ is the probability that node $j$ infects $i$
+at the next step, let $\theta_{i,j} := \log (1 - p_{j,i})$. Note that
+$\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that:
 
 $$\sum_{j\in S} \theta_{j,i} = \log \prod_{j \in S} (1 - p_{j,i}) = \log \mathbb{P}(\text{i not infected by any j} \in S)$$
 
 Our objective is to recover the vector $\theta_{*,i}$ for every node $i$.
 
 \subsection{Formulating the sparse recovery problem}
+
 The idea behind what follows is the assumption that $p_{*,i}$ and therefore $\theta_{*,i}$ are sparse vectors: a node has few neighbors (parents in the directed graph) compared to the size of the graph.
 
-Let $\mathbbm{1}_S$ be the indicator variable for the nodes that are active at a certain time step t of a certain cascade. Under the previous assumption, we have: $$< \mathbbm{1}_S, \theta_{*,i}> = \log \mathbb{P}( \text{i not infected at } t+1| S \text{ active at } t)$$
+Let $\mathbbm{1}_S$ be the indicator variable for the nodes that are active at
+a certain time step $t$ of a certain cascade. Under the previous assumption, we
+have:
+$$\inprod{\mathbbm{1}_S, \theta_{*,i}} = \log \mathbb{P}( \text{i not infected at } t+1| S \text{ active at } t)$$
 
-Suppose we observe a series of cascades $(I_k^t)$, where $I_k^t$ are the nodes infected in cascade $k$ at time $t$. Recall that once a node has been infected (becomes active), it cannot be infected in the future (is inactive). Let us therefore look at all the sets $(I_k^t)_{t \leq t_i^k} $ such that for every cascade $k$, $t_i^k > t$, where $t_i^k = + \infty$ if $i$ was not infected in that cascade. If we concatenate these rows $\mathbbm{1}_{I_K^t}$ into a matrix $M$ (for measurement), we have:
+Suppose we observe a series of cascades $(A_k^t)$, where $A_k^t$ are the nodes
+active in cascade $k$ at time $t$ (these are exactly the nodes which became
+infected at time $t-1$). Recall that once a node has been infected (becomes
+active), it cannot be infected in the future (is inactive). Let us therefore
+look at all the sets $(A_k^t)_{t \leq t_i^k} $ such that for every cascade $k$,
+$t_i^k > t$, where $t_i^k = + \infty$ if $i$ was not infected in that cascade.
+If we concatenate these rows $\mathbbm{1}_{A_k^t}$ into a matrix $M$ (for
+measurement), we have:
 
 $$M \theta_{*,i} = b$$
-
 where $b$ is a column vector corresponding to $\log \mathbb{P}(\text{i was not infected at } t+1| S \text{ active at } t)$. Note that even if we had access to $b$ directly but $M$ does not have full column rank, then solving for $x$ is not trivial! However, under sparsity assumptions on $x$, we could apply results for the sparse recovery/sparse coding/compressed sensing literature 
 
 In reality, we do not have access to $b$. We have access to the realization vector $\vec w$ of $\{i$ was infected at $t+1 | S$ active at $t\}$:
author	Thibaut Horel <thibaut.horel@gmail.com>	2014-11-23 03:33:10 -0500
committer	Thibaut Horel <thibaut.horel@gmail.com>	2014-11-23 03:33:10 -0500
commit	f96af5373ee10f846cc6a3c5e59ea4e5473fe583 (patch)
tree	fb90dbacd60b7a0d8bc06c939a74d97602f1d0ff /notes
parent	b6d9207a03a8babea55b1e97493633c36744dd72 (diff)
download	cascades-f96af5373ee10f846cc6a3c5e59ea4e5473fe583.tar.gz