Diffstat (limited to 'paper/sections')
-rw-r--r--  paper/sections/appendix.tex | 6
1 files changed, 6 insertions, 0 deletions
diff --git a/paper/sections/appendix.tex b/paper/sections/appendix.tex
index aa0ef1d..c49e8cc 100644
--- a/paper/sections/appendix.tex
+++ b/paper/sections/appendix.tex
@@ -1,4 +1,5 @@
\section{Adaptivity proofs}
+\label{sec:ad-proofs}
\begin{proof}[of Proposition~\ref{prop:gap}]
We will first show that the optimal adaptive policy can be interpreted as
a non-adaptive policy. Let $S$ be the optimal adaptive
@@ -75,12 +76,14 @@ $(S, \mathbf{q})$ is an $\alpha$-approximate non adaptive solution, hence $F(\ma
\end{proof}
\section{Algorithm Proofs}
+\label{sec:alg-proofs}
We first discuss the NP-hardness of the problem.
\noindent \textbf{\textsc{NP}-Hardness.} In contrast to standard influence maximization, adaptive seeding is already \textsc{NP}-hard even in the simplest cases. When $f(S)=|S|$ and all probabilities equal one, the decision problem is whether, given a budget $k$ and a target value $\ell$, there exists a subset of $X$ of size $k-t$ that yields a solution with expected value at least $\ell$ using $t$ nodes in $\mathcal{N}(X)$. This is equivalent to deciding whether there are $k-t$ nodes in $X$ that have $t$ neighbors in $\mathcal{N}(X)$. To see that this is \textsc{NP}-hard, consider a reduction from \textsc{Set-Cover}: there is one node $i$ for each input set $T_i$, $1\leq i\leq n$, with $\neigh{i}= T_i$, together with integers $k,\ell$, and the output is ``yes'' if there is a family of $k$ sets in the input that covers at least $\ell$ elements, and ``no'' otherwise.
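To make the reduction concrete, here is a minimal sketch (ours, not part of the paper; the helper names and the toy instance are hypothetical). It builds the bipartite adaptive-seeding instance from a coverage instance, one node per set with $\neigh{i}=T_i$, and checks on a toy example that the two decision questions coincide.

# Illustrative sketch of the reduction: X has one node per input set T_i,
# and node i's neighborhood is exactly T_i.
from itertools import combinations

def build_instance(sets):
    """X = set indices; neighbors[i] = elements of T_i."""
    X = list(range(len(sets)))
    neighbors = {i: frozenset(T) for i, T in enumerate(sets)}
    return X, neighbors

def covers_at_least(sets, k, ell):
    """Coverage decision: do some k sets cover at least ell elements?"""
    return any(len(frozenset().union(*(sets[i] for i in picked))) >= ell
               for picked in combinations(range(len(sets)), k))

def seeding_decision(X, neighbors, k, ell):
    """With f(S)=|S| and unit probabilities: do some k first-stage nodes
    have at least ell distinct neighbors to spend the remaining budget on?"""
    return any(len(frozenset().union(*(neighbors[i] for i in picked))) >= ell
               for picked in combinations(X, k))

# Tiny hypothetical instance: T_1={a,b}, T_2={b,c}, T_3={c,d}.
sets = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
X, neighbors = build_instance(sets)
assert covers_at_least(sets, 2, 4) == seeding_decision(X, neighbors, 2, 4)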
\subsection{LP-based approach}
+\label{sec:lp-proofs}
In the LP-based approach we rounded the solution using the pipage rounding method. We discuss this in greater detail here.
\noindent \textbf{Pipage Rounding.}
@@ -89,6 +92,7 @@ We make the following key observation: for any given $\textbf{q}$, one can remov
Our rounding procedure can therefore be described as follows: given a solution $(\boldsymbol\lambda,\textbf{q})$, we remove all nodes $u \in \mathcal{N}(X)$ for which $q_{u}=0$. This leaves us with a fractional solution to a (weighted) version of the \textsc{Max-K-Cover} problem in which the nodes in $X$ are the sets and the universe is the set of weighted nodes in $\mathcal{N}(X)$ that were not removed. We can therefore apply pipage rounding and lose only a factor of $(1-1/e)$ in the quality of the solution.
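For readers who want to see the mechanics, the following is a minimal pipage-rounding sketch for a weighted coverage objective (our own illustration under simplifying assumptions, not the paper's implementation): `weights[u]` plays the role of the element weight, `covers[u]` lists the sets containing $u$, the only constraint is the fixed sum $\sum_i \lambda_i$, and that sum is assumed to be an integer.

# Pipage rounding sketch for F(lam) = sum_u w_u * (1 - prod_{i covering u} (1 - lam_i)).
def coverage(weights, covers, lam):
    total = 0.0
    for u, w in weights.items():
        miss = 1.0
        for i in covers[u]:
            miss *= 1.0 - lam[i]
        total += w * (1.0 - miss)
    return total

def pipage_round(weights, covers, lam, eps=1e-9):
    """Round a fractional lam (with integral sum) to an integral vector
    without decreasing the coverage objective at any step."""
    lam = list(lam)
    while True:
        frac = [i for i, v in enumerate(lam) if eps < v < 1.0 - eps]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        # Shift mass between coordinates i and j until one hits 0 or 1; since F is
        # convex along this direction, one of the two endpoints is no worse.
        up = min(1.0 - lam[i], lam[j])        # raise lam[i], lower lam[j]
        down = min(lam[i], 1.0 - lam[j])      # lower lam[i], raise lam[j]
        cand_a = list(lam); cand_a[i] += up;   cand_a[j] -= up
        cand_b = list(lam); cand_b[i] -= down; cand_b[j] += down
        lam = max(cand_a, cand_b, key=lambda v: coverage(weights, covers, v))
    return [int(round(v)) for v in lam]

Each iteration makes at least one more coordinate integral, so the procedure terminates after at most $|X|$ steps.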
\subsection{Combinatorial Algorithm}
+\label{sec:comb-proofs}
We include the proofs missing from the combinatorial algorithm section. Scalability and the implementation in MapReduce are discussed in this section as well.
\begin{proof}[of Lemma~\ref{lemma:nd}]
@@ -249,11 +253,13 @@ which concludes the proof of the lemma.
\end{proof}
\subsection{Parallelization}
+\label{sec:para}
As discussed in the body of the paper, the algorithm can be parallelized across $k$ different machines, each one computing an approximation for a fixed budget $k-t$ in the first stage and $t$ in the second.
A slightly more sophisticated approach is to consider only $\log n$ splits, doubling the first-stage budget each time: $(1,k-1),(2,k-2),(4,k-4),\ldots,(2^{\lfloor \log n \rfloor},1)$, and then select the best solution from this set. Compared to the previous approach, this reduces the approximation guarantee by a factor of at most $2$: if the optimal solution spends $t$ in the first stage and $k-t$ in the second, then since $t \leq 2\cdot2^{\lfloor \log t \rfloor}$, the solution computed for the split $(2^{\lfloor \log t \rfloor}, k - 2^{\lfloor \log t \rfloor})$ has at least half that value.
More generally, for any $\epsilon>0$ one can parallelize across $\log_{1+\epsilon}n$ machines at the cost of losing a factor of $(1+\epsilon)$ in the approximation guarantee.
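A compact way to see the two variants side by side is the following sketch (ours; `solve_split` is a hypothetical stand-in for the fixed-split subroutine and is assumed to return a (value, solution) pair, with $k \geq 2$).

import math

def best_over_splits(k, solve_split, eps=0.0):
    """Evaluate one budget split per 'machine' and keep the best solution.
    With eps=0 the first-stage budgets are the powers of two 1, 2, 4, ...;
    with eps>0 they grow geometrically by a factor (1+eps), never by less than one."""
    base = 2.0 if eps == 0.0 else 1.0 + eps
    budgets, t = [], 1
    while t < k:
        budgets.append(t)
        t = max(t + 1, int(math.floor(t * base)))
    # Each call below could run on its own machine; here we evaluate them in turn.
    return max((solve_split(t, k - t) for t in budgets), key=lambda vs: vs[0])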
\subsection{Implementation in MapReduce}
+\label{sec:mr}
As noted in Section~\ref{sec:comb}, lines 4 to 7 of Algorithm~\ref{alg:comb}
correspond to the greedy heuristic of \cite{nemhauser} applied to the