Diffstat (limited to 'paper')
-rw-r--r--   paper/sections/discussion.tex    13
-rw-r--r--   paper/sections/experiments.tex   13
-rw-r--r--   paper/sections/model.tex          5
-rw-r--r--   paper/sections/results.tex        2
4 files changed, 18 insertions, 15 deletions
diff --git a/paper/sections/discussion.tex b/paper/sections/discussion.tex
index 9d7f491..10aa50b 100644
--- a/paper/sections/discussion.tex
+++ b/paper/sections/discussion.tex
@@ -1,6 +1,3 @@
-
-
-\paragraph{Future Work}
 Solving the Graph Inference problem with sparse recovery techniques opens new
 avenues for future work. Firstly, the sparse recovery literature has already
 studied regularization patterns beyond the $\ell_1$-norm, notably the
@@ -12,16 +9,14 @@ has been obtained for the Lasso in the recent series of papers
 Finally, the linear threshold model is a commonly studied diffusion process
 and can also be cast as a \emph{generalized linear cascade} with inverse link
 function $z \mapsto \mathbbm{1}_{z > 0}$:
 $ \label{eq:lt}
-  \tag{LT} X^{t+1}_j = \text{sign} \left(\inprod{\theta_j}{X^t} - t_j \right)
-  $
-
-This model therefore falls into the 1-bit compressed sensing model
-\cite{Boufounos:2008} framework. Several recent papers study the theoretical
+  $.
+This model therefore falls into the 1-bit compressed sensing framework
+\cite{Boufounos:2008}. Several recent papers study the theoretical
 guarantees obtained for 1-bit compressed sensing with specific measurements
 \cite{Gupta:2010, Plan:2014}. Whilst they obtained bounds of the order
 ${\cal O}(n \log \frac{n}{s})$, no current theory exists for recovering
-positive bounded signals from bernoulli hyperplanes. This research direction
+positive bounded signals from binary measurements. This research direction
 may provide the first clues to solve the ``adaptive learning'' problem: if we
 are allowed to adaptively \emph{choose} the source nodes at the beginning of
 each cascade, how much can we improve the current results?
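The linear threshold remark in the discussion.tex hunk above can be read operationally: each observed activation is a one-bit (sign) measurement of the unknown weight vector $\theta_j$, which is what connects the LT model to 1-bit compressed sensing. Below is a minimal illustrative sketch of that reading in Python/NumPy; the names (theta_j, t_j, lt_step) and the toy dimensions are assumptions for illustration, not code from the paper.

import numpy as np

rng = np.random.default_rng(0)

n = 50                       # number of nodes (illustrative)
theta_j = np.zeros(n)        # incoming weights of node j: sparse, non-negative
support = rng.choice(n, size=5, replace=False)
theta_j[support] = rng.uniform(0.2, 0.8, size=5)
t_j = 1.0                    # activation threshold of node j

def lt_step(x_t):
    # One linear-threshold update for node j. Only the sign of
    # <theta_j, x_t> - t_j is observed, i.e. a 1-bit measurement of theta_j.
    return int(theta_j @ x_t - t_j > 0)

# One cascade step from a random source set yields one binary measurement.
x0 = (rng.random(n) < 0.3).astype(float)
print(lt_step(x0))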
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index 1b72753..2369b11 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -45,10 +45,19 @@ We did not benchmark against other known algorithms (\textsc{netrate} \cite{gome
 In the case of the \textsc{lasso}, \textsc{mle} and \textsc{sparse mle} algorithms, we construct the edges of $\hat {\cal G} : \cup_{j \in V} \{i : \Theta_{ij} > 0.1\}$, \emph{i.e.} by thresholding.
 The true positives are the edges which appear both in ${\cal G}$ and $\hat {\cal G}$ and the true negatives are the edges which appear in neither.
 Finally, we report the F1-score$=2 \text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$, which combines the number of true edges recovered by the algorithm over the total number of edges returned by the algorithm (\emph{precision}) with the number of true edges recovered by the algorithm over the total number of edges it should have recovered (\emph{recall}).
 Over all experiments, \textsc{sparse mle} achieves higher rates of precision,
-recall, and f1-score. Interestingly, both \textsc{mle} and \textsc{sparse mle} perform exceptionally well on the Watts-Strogatz graph. The recovery rate converges at
+recall, and F1-score. Interestingly, both \textsc{mle} and \textsc{sparse mle} perform exceptionally well on the Watts-Strogatz graph.
+\begin{comment}
+  The recovery rate converges at
 around $5000$ cascades, which is more than $15$ times the number of nodes. By
 contrast, \textsc{sparse mle} achieves a reasonable F$1$-score of $.75$ for
 roughly $500$ observed cascades.
+\end{comment}
 
 \paragraph{Quantifying robustness}
-The previous experiments only considered graphs with strong edges. To test the algorithms in the approximately sparse case, we add sparse edges to the previous graphs according to a bernoulli variable of parameter $1/3$ for every non-edge, and drawing a weight uniformly from $[0,0.1]$. The results are reported in Figure~\ref{fig:four_figs} (d)-(e) by plotting the convergence of the $\ell2$-norm error, and show that both the \textsc{lasso}, followed by \textsc{sparse mle} are the most robust to noise.
+The previous experiments only considered graphs with strong edges. To test the
+algorithms in the approximately sparse case, we add sparse edges to the
+previous graphs: every non-edge becomes an edge with probability $1/3$, with a
+weight drawn uniformly from $[0,0.1]$. The non-sparse case is compared to the
+sparse case in Figure~\ref{fig:four_figs} (d)--(e) in terms of the
+$\ell_2$-norm error, showing that the \textsc{lasso}, followed by
+\textsc{sparse mle}, is the most robust to noise.
diff --git a/paper/sections/model.tex b/paper/sections/model.tex
index 36829f2..d8706bc 100644
--- a/paper/sections/model.tex
+++ b/paper/sections/model.tex
@@ -12,8 +12,7 @@ state space $\{0, 1, \dots, K-1\}^V$ with the following properties:
   that all transition probabilities of the Markov process are either constant
   or can be expressed as a function of the graph parameters $\Theta$ and the
   set of ``contagious nodes'' at the previous time step.
-\item The initial probability over $\{0, 1, \dots, K-1\}^V$ of the Cascade
-  Model is such that all nodes can eventually reach a \emph{contagious state}
+\item The initial probability over $\{0, 1, \dots, K-1\}^V$ is such that all nodes can eventually reach a \emph{contagious state}
   with non-zero probability. The ``contagious'' nodes at $t=0$ are called
   \emph{source nodes}.
 \end{enumerate}
@@ -26,7 +25,7 @@ the ``single source'' assumption made in \cite{Daneshmand:2014} and
 \cite{Abrahao:13} as well as the ``uniformly chosen source set'' assumption
 made in \cite{Netrapalli:2012} verify condition 3.
 
-In the context of Graph Inference, \cite{Netrapalli:2012} focuses specifically
+In the context of Graph Inference, \cite{Netrapalli:2012} focuses
 on the well-known discrete-time independent cascade model recalled below, which
 \cite{Abrahao:13} and \cite{Daneshmand:2014} generalize to continuous time. We
 extend the independent cascade model in a different direction by considering
diff --git a/paper/sections/results.tex b/paper/sections/results.tex
index 8a7b4f9..25f540a 100644
--- a/paper/sections/results.tex
+++ b/paper/sections/results.tex
@@ -203,7 +203,7 @@ with probability $1-\frac{1}{m}$.
 If $\theta_i = 0$ and $\hat \theta_i > \eta$, then $\|\hat \theta - \theta\|_2
 \geq |\hat \theta_i - \theta_i| > \eta$, which is a contradiction. Therefore we
 get no false positives. If $\theta_i \geq \eta + \epsilon$, then
 $|\hat{\theta}_i - \theta_i| < \epsilon \implies \hat{\theta}_i
-> \eta$. Therefore, we get all strong parents.
+> \eta$ and we get all strong parents.
 \end{proof}
 Assuming we know a lower bound $\alpha$ on $\Theta_{i,j}$,
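For the evaluation protocol in the experiments.tex hunk above (threshold the estimated weights at $0.1$, then report precision, recall and F1 over the recovered edge set), the following is a minimal sketch of the metric; Theta_true and Theta_hat are hypothetical weight matrices, and NumPy is an assumed dependency rather than something stated in the diff.

import numpy as np

def edge_f1(Theta_true, Theta_hat, tau=0.1):
    # Predicted edges: entries of the estimate above the threshold tau.
    pred = Theta_hat > tau
    # True edges: the support of the ground-truth weight matrix.
    true = Theta_true > 0
    tp = np.logical_and(pred, true).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(true.sum(), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1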

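The robustness setup in the same experiments.tex hunk (every non-edge becomes a weak edge with probability $1/3$, with weight drawn uniformly from $[0,0.1]$) can be sketched the same way; add_weak_edges and its arguments are hypothetical names, and this is only a plausible reading of the construction described in the diff, not the authors' code.

import numpy as np

def add_weak_edges(Theta, p=1/3, w_max=0.1, seed=0):
    # Turn each off-diagonal non-edge into a weak edge with probability p,
    # drawing its weight uniformly from [0, w_max].
    rng = np.random.default_rng(seed)
    n = Theta.shape[0]
    noisy = Theta.astype(float)
    non_edge = (Theta == 0) & ~np.eye(n, dtype=bool)
    flip = non_edge & (rng.random((n, n)) < p)
    noisy[flip] = rng.uniform(0.0, w_max, size=flip.sum())
    return noisy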