diff options
Diffstat (limited to 'notes')
| -rw-r--r-- | notes/formalisation.pdf | bin | 194430 -> 194405 bytes | |||
| -rw-r--r-- | notes/formalisation.tex | 29 |
2 files changed, 24 insertions, 5 deletions
diff --git a/notes/formalisation.pdf b/notes/formalisation.pdf Binary files differindex 75d5480..2bdcb11 100644 --- a/notes/formalisation.pdf +++ b/notes/formalisation.pdf diff --git a/notes/formalisation.tex b/notes/formalisation.tex index 72e7cb9..871fe50 100644 --- a/notes/formalisation.tex +++ b/notes/formalisation.tex @@ -109,24 +109,43 @@ Recap on what the model is \subsection{Results under isotropic condition} \section{Independent Cascade Model} + \subsection{The Model} -Recall the independent cascade model, where nodes have 3 states: susceptible, active, and inactive. All nodes start either as susceptible or active. At each time step t, for every (active, susceptible)-pair $(j,i)$, $j$ infects $i$ with probability $p_{j,i}$. At t+1, $j$ becomes inactive and $i$ becomes active. The cascade continues until no active nodes remain. -Consider a node i. If $p_{j,i}$ is the probability that node $j$ infects $i$ at the next step, let $\theta_{i,j} := \log (1 - p_{j,i})$. Note that $\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that note that +Recall the independent cascade model, where nodes have 3 states: susceptible, +active, and inactive. All nodes start either as susceptible or active. At each +time step $t$, for every (active, susceptible)-pair $(j,i)$, $j$ attempts to +infect $i$ with probability $p_{j,i}$ and become inactive. If $j$ succeeds, $i$ +will become active at time step $t+1$. The cascade continues until no active +nodes remain. + +Consider a node $i$. If $p_{j,i}$ is the probability that node $j$ infects $i$ +at the next step, let $\theta_{i,j} := \log (1 - p_{j,i})$. Note that +$\theta_{j,i} = 0 \Leftrightarrow p_{j,i} = 0$. Also note that: $$\sum_{j\in S} \theta_{j,i} = \log \prod_{j \in S} (1 - p_{j,i}) = \log \mathbb{P}(\text{i not infected by any j} \in S)$$ Our objective is to recover the vector $\theta_{*,i}$ for every node $i$. \subsection{Formulating the sparse recovery problem} + The idea behind what follows is the assumption that $p_{*,i}$ and therefore $\theta_{*,i}$ are sparse vectors: a node has few neighbors (parents in the directed graph) compared to the size of the graph. -Let $\mathbbm{1}_S$ be the indicator variable for the nodes that are active at a certain time step t of a certain cascade. Under the previous assumption, we have: $$< \mathbbm{1}_S, \theta_{*,i}> = \log \mathbb{P}( \text{i not infected at } t+1| S \text{ active at } t)$$ +Let $\mathbbm{1}_S$ be the indicator variable for the nodes that are active at +a certain time step $t$ of a certain cascade. Under the previous assumption, we +have: +$$\inprod{\mathbbm{1}_S, \theta_{*,i}} = \log \mathbb{P}( \text{i not infected at } t+1| S \text{ active at } t)$$ -Suppose we observe a series of cascades $(I_k^t)$, where $I_k^t$ are the nodes infected in cascade $k$ at time $t$. Recall that once a node has been infected (becomes active), it cannot be infected in the future (is inactive). Let us therefore look at all the sets $(I_k^t)_{t \leq t_i^k} $ such that for every cascade $k$, $t_i^k > t$, where $t_i^k = + \infty$ if $i$ was not infected in that cascade. If we concatenate these rows $\mathbbm{1}_{I_K^t}$ into a matrix $M$ (for measurement), we have: +Suppose we observe a series of cascades $(A_k^t)$, where $A_k^t$ are the nodes +active in cascade $k$ at time $t$ (these are exactly the nodes which became +infected at time $t-1$). Recall that once a node has been infected (becomes +active), it cannot be infected in the future (is inactive). Let us therefore +look at all the sets $(A_k^t)_{t \leq t_i^k} $ such that for every cascade $k$, +$t_i^k > t$, where $t_i^k = + \infty$ if $i$ was not infected in that cascade. +If we concatenate these rows $\mathbbm{1}_{A_k^t}$ into a matrix $M$ (for +measurement), we have: $$M \theta_{*,i} = b$$ - where $b$ is a column vector corresponding to $\log \mathbb{P}(\text{i was not infected at } t+1| S \text{ active at } t)$. Note that even if we had access to $b$ directly but $M$ does not have full column rank, then solving for $x$ is not trivial! However, under sparsity assumptions on $x$, we could apply results for the sparse recovery/sparse coding/compressed sensing literature In reality, we do not have access to $b$. We have access to the realization vector $\vec w$ of $\{i$ was infected at $t+1 | S$ active at $t\}$: |
