Diffstat (limited to 'paper/sections/experiments.tex')
-rw-r--r--  paper/sections/experiments.tex  4
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/paper/sections/experiments.tex b/paper/sections/experiments.tex
index 80f95fa..9122e7d 100644
--- a/paper/sections/experiments.tex
+++ b/paper/sections/experiments.tex
@@ -28,7 +28,7 @@ $\ell_2$-norm $\|\hat \Theta - \Theta\|_2$ for a Kronecker graph which is: (d) e
\label{fig:four_figs}
\end{table*}
-In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for for varying levels of sparsity and different initializations of parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$), where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
+In this section, we validate empirically the results and assumptions of Section~\ref{sec:results} for varying levels of sparsity and different initializations of parameters ($n$, $m$, $\lambda$, $p_{\text{init}}$), where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012}. As an extra benchmark, we also introduce a new algorithm \textsc{lasso}, which approximates our \textsc{sparse mle} algorithm.
\paragraph{Experimental setup}
We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph ($300$ nodes, $4500$ edges) \cite{watts:1998}, a Barabasi-Albert graph ($300$ nodes, $16200$ edges) \cite{barabasi:2001}, a Holme-Kim power-law graph ($200$ nodes, $9772$ edges) \cite{Holme:2002}, and the recently introduced Kronecker graph ($256$ nodes, $10000$ edges) \cite{Leskovec:2010}. Undirected graphs are converted to directed graphs by replacing each undirected edge with two directed edges, one in each direction.
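As a rough illustration of this setup, the Python sketch below (using networkx) generates graphs of comparable size and converts them to directed graphs. The generator parameters (k, m, p) are assumptions chosen only to give graphs of roughly the reported sizes; the paper reports only node and edge counts. The Kronecker graph is omitted because networkx has no built-in generator for it.

import networkx as nx

def double_edges(g):
    # Convert an undirected graph to a directed one by replacing every
    # undirected edge with two opposite directed edges.
    return g.to_directed()

# Generator parameters below are assumed, not taken from the paper.
ws = nx.watts_strogatz_graph(n=300, k=30, p=0.1)    # Watts-Strogatz
ba = nx.barabasi_albert_graph(n=300, m=30)          # Barabasi-Albert
hk = nx.powerlaw_cluster_graph(n=200, m=28, p=0.1)  # Holme-Kim power-law

for name, g in [("watts-strogatz", ws), ("barabasi-albert", ba), ("holme-kim", hk)]:
    dg = double_edges(g)
    print(f"{name}: {dg.number_of_nodes()} nodes, {dg.number_of_edges()} directed edges")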
@@ -38,7 +38,7 @@ from the (IC) model for $n \in \{100, 500, 1000, 2000, 5000\}$. We compare for e
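For context on how such cascades can be drawn from the independent-cascade (IC) model, here is a minimal Python sketch in which each node is initially active with probability $p_{\text{init}}$ and every newly activated node gets one chance to activate each neighbor. The influence matrix and all parameters below are illustrative placeholders, not the paper's actual instances.

import numpy as np

def draw_cascade(theta, p_init, rng, max_steps=50):
    # theta: (d, d) matrix with theta[i, j] = probability that node i activates node j.
    d = theta.shape[0]
    active = rng.random(d) < p_init          # x^0: initial source nodes
    newly_active = active.copy()
    states = [active.copy()]
    for _ in range(max_steps):
        if not newly_active.any():
            break
        # Node j becomes active if at least one newly active node i succeeds
        # in an independent Bernoulli(theta[i, j]) trial.
        trials = rng.random((d, d)) < theta
        activated = (trials & newly_active[:, None]).any(axis=0)
        newly_active = activated & ~active
        active |= newly_active
        states.append(active.copy())
    return np.array(states, dtype=float)     # rows are the snapshots x^0, x^1, ...

rng = np.random.default_rng(0)
d = 20
theta = (rng.random((d, d)) < 0.1) * 0.3     # sparse random influence matrix (assumed)
np.fill_diagonal(theta, 0.0)
cascade = draw_cascade(theta, p_init=0.05, rng=rng)
print("cascade length:", len(cascade))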
\paragraph{Benchmarks}
-We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell1$-norm penalization. \textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
+We compare our \textsc{sparse mle} algorithm to 3 benchmarks: \textsc{greedy} and \textsc{mle} from \cite{Netrapalli:2012} and \textsc{lasso}. The \textsc{mle} algorithm is a maximum-likelihood estimator without $\ell_1$-norm penalization. \textsc{greedy} is an iterative algorithm. We introduced the \textsc{lasso} algorithm in our experiments to achieve faster computation time:
$$\hat \theta_i \in \arg \min_{\theta_i} \sum_{t \in {\cal T}} |f(\theta_i\cdot x^t) - x_i^{t+1}|^2 + \lambda \|\theta_i\|_1$$
\textsc{Lasso} has the merit of being both easier and faster to optimize numerically than the other convex-optimization-based algorithms. It approximates the \textsc{sparse mle} algorithm by assuming that the observations $x_i^{t+1}$ are of the form $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise. This assumption is not valid in theory, since $\epsilon$ \emph{depends on} $f(\theta_i\cdot x^t)$; in practice, however, the approximation is validated.
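A minimal sketch of how this per-node \textsc{lasso} objective could be minimized with proximal gradient descent (ISTA) follows. The link function f, the step size, and the toy data are assumptions made for illustration, not the paper's implementation; the objective is rescaled by 1/T, which is equivalent up to rescaling lambda.

import numpy as np

def f(z):
    # Placeholder link function (an assumption; substitute the model's actual f).
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_node(X, y, lam, step=0.1, n_iter=2000):
    # X: (T, d) observed states x^t; y: (T,) next-step observations x_i^{t+1}.
    # Minimizes the mean of (f(theta . x^t) - y_t)^2 plus lam * ||theta||_1.
    T, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        z = X @ theta
        grad = (2.0 / T) * (X.T @ ((f(z) - y) * f_prime(z)))
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Toy usage with random binary snapshots (illustration only).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 50)).astype(float)
y = rng.integers(0, 2, size=500).astype(float)
theta_hat = lasso_node(X, y, lam=0.01)
print("nonzero coefficients:", np.count_nonzero(theta_hat))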