Section~\ref{sec:adaptivity} shows that the adaptive seeding problem reduces to the non-adaptive problem. We now discuss two approaches for constructing approximate non-adaptive solutions: the first is LP-based, the second is a combinatorial algorithm. Both achieve a $(1-1/e)$ approximation ratio, which translates into a $(1-1/e)$ approximation ratio for the adaptive seeding problem~\eqref{eq:problem} via Proposition~\ref{prop:cr}. As we will show in Section~\ref{sec:experiments}, both algorithms have advantages and disadvantages in terms of scalability.

\subsection{An LP-Based Approach}
\label{sec:lp}

By linearity of expectation, for a linear function $f$ of the form given by \eqref{eq:voter} we have:
\begin{equation}\label{eq:multi-voter}
\begin{split}
F(\textbf{p}) &=\mathbb{E}_{R}\big[f(R)\big] =\mathbb{E}_{R}\Bigg[\sum_{u\in\neigh{X}}w_u\mathbf{1}_{\{u\in R\}}\Bigg]\\
&=\sum_{u\in\neigh{X}}w_u\mathbb{P}[u\in R] =\sum_{u\in\neigh{X}}p_uw_u
\end{split}
\end{equation}
Thus, the non-adaptive optimization problem \eqref{eq:relaxed} can be written as:
\begin{displaymath}
\begin{split}
\max_{\substack{S\subseteq X\\\mathbf{q}\in[0,1]^n} } & \sum_{u\in\neigh{X}}p_uq_uw_u\\
\text{s.t. } & |S|+ \textbf{p}^T\textbf{q} \leq k,\; q_u \leq \mathbf{1}\{u\in\neigh{S}\}
\end{split}
\end{displaymath}
The choice of the set $S$ can be relaxed by introducing a variable $\lambda_v\in[0,1]$ for each $v\in X$. We obtain the following LP for the adaptive seeding problem:
\begin{equation}\label{eq:lp}
\begin{split}
\max_{\substack{\mathbf{q}\in[0,1]^n\\\boldsymbol\lambda\in[0,1]^m}} & \;\sum_{u\in\neigh{X}}p_uq_uw_u\\
\text{s.t. } & \sum_{v\in X}\lambda_v+\textbf{p}^T\textbf{q} \leq k,\; q_u \leq \sum_{v\in\neigh{u}} \lambda_v
\end{split}
\end{equation}
An optimal solution to this LP can be found in polynomial time using standard LP solvers.
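In concrete terms, the LP \eqref{eq:lp} can be assembled and handed to any off-the-shelf solver. The sketch below is illustrative only, not the implementation evaluated in Section~\ref{sec:experiments}; the function name and the data layout (an adjacency list \texttt{neighbors} over $X$, indexing the nodes of $\neigh{X}$) are our own assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def adaptive_seeding_lp(neighbors, p, w, k):
    """Solve the LP relaxation (eq:lp): variables are lambda_v for v in X
    and q_u for u in N(X); maximize sum_u p_u * w_u * q_u subject to
    sum_v lambda_v + p^T q <= k and q_u <= sum_{v in N(u)} lambda_v."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    m, n = len(neighbors), len(p)
    # variable vector: [lambda_1, ..., lambda_m, q_1, ..., q_n]
    c = np.concatenate([np.zeros(m), -p * w])   # negate: linprog minimizes
    rows = [np.concatenate([np.ones(m), p])]    # budget constraint
    rhs = [float(k)]
    for u in range(n):                          # coverage constraints
        row = np.zeros(m + n)
        row[m + u] = 1.0                        # q_u ...
        for v in range(m):
            if u in neighbors[v]:
                row[v] = -1.0                   # ... minus reachable lambda_v
        rows.append(row)
        rhs.append(0.0)
    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.array(rhs),
                  bounds=[(0.0, 1.0)] * (m + n))
    return -res.fun, res.x[:m], res.x[m:]
```

On a toy instance with two first-stage nodes each reaching a distinct neighbor, the solver concentrates the budget on the neighbor with the larger weight, as expected.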
The solution returned by the LP is \emph{fractional} and requires a rounding procedure to yield a feasible solution to our problem, in which $S$ is integral. To round the solution we use the pipage rounding method~\cite{pipage}. We defer the details to the full version of the paper~\cite{full}.
\begin{lemma}
For \mbox{\textsc{AdaptiveSeeding-LP}} defined in \eqref{eq:lp}, any fractional solution $(\boldsymbol\lambda, \mathbf{q})\in[0,1]^m\times[0,1]^n$ can be rounded to an integral solution $\bar{\boldsymbol\lambda} \in \{0,1\}^{m}$ s.t. $(1-1/e) F(\mathbf{p}\circ\mathbf{q}) \leq A(\bar{\boldsymbol\lambda})$ in $O(m + n)$ steps.
\end{lemma}

\subsection{A Combinatorial Algorithm}
\label{sec:comb}

In this section, we introduce a combinatorial algorithm with the same approximation guarantee as the LP-based approach. However, its running time, stated in Proposition~\ref{prop:running_time}, can be better than that of LP solvers, depending on the relative sizes of the budget and the number of nodes in the graph. Furthermore, as we discuss at the end of this section, the algorithm is amenable to parallelization. The main idea is to reduce the problem to a monotone submodular maximization problem and apply a variant of the celebrated greedy algorithm~\cite{nemhauser}. In contrast to standard influence maximization, the submodularity of the non-adaptive seeding problem is not simply a consequence of properties of the influence function; it also strongly relies on the combinatorial structure of the two-stage optimization.
Intuitively, we can think of our problem as finding a set $S$ in the first stage for which the nodes that can be seeded in the second stage have the largest possible value. To formalize this, for a budget $b\in\reals^+$ used in the second stage and a set of neighbors $T\subseteq\mathcal{N}(X)$, we use $\mathcal{O}(T,b)$ to denote the optimal value of:
\begin{equation}\label{eq:knap}
\begin{split}
\mathcal{O}(T,b)\defeq \max_{\textbf{q}\in[0,1]^n} & \sum_{u\in\neigh{X} \cap T} p_uq_uw_u\\
\text{s.t. } & \mathbf{p}^T\mathbf{q}\leq b
\end{split}
\end{equation}
The optimization problem \eqref{eq:relaxed} for non-adaptive policies can now be written as:
\begin{equation}\label{eq:sub}
\max_{S\subseteq X} \; \mathcal{O}\big(\neigh{S},k-|S|\big) \quad \text{s.t. } |S|\leq k
\end{equation}
We start by proving in Proposition~\ref{prop:sub} that for a fixed budget $b$, $\mathcal{O}(\neigh{\cdot}, b)$ is submodular. This proposition relies on Lemmas~\ref{lemma:nd} and~\ref{lemma:sub} about the properties of $\mathcal{O}(T,b)$.
\begin{lemma}\label{lemma:nd}
Let $T \subseteq \mathcal{N}(X)$ and $x \in \mathcal{N}(X)$, then $\mathcal{O}(T\cup\{x\},b)-\mathcal{O}(T,b)$ is non-decreasing in $b$.
\end{lemma}
The proof of this lemma can be found in the full version of the paper~\cite{full}. The main idea is to write:
\begin{displaymath}
\mathcal{O}(T\cup\{x\},c)-\mathcal{O}(T\cup\{x\},b)=\int_b^c\partial_+\mathcal{O}_{T\cup\{x\}}(t)\,dt
\end{displaymath}
where $\partial_+\mathcal{O}_T$ denotes the right derivative of $\mathcal{O}(T,\cdot)$. For fixed $T$ and $b$, $\mathcal{O}(T,b)$ defines a fractional knapsack problem over the set $T$.
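Under the substitution $z_u = p_uq_u$, problem \eqref{eq:knap} becomes $\max \sum_u w_uz_u$ subject to $\sum_u z_u\leq b$ and $0\leq z_u\leq p_u$, which the classical greedy rule solves exactly by filling the highest-weight items first. A minimal sketch (the function and variable names are ours):

```python
def knapsack_value(T, p, w, b):
    """Compute O(T, b) of eq:knap. With z_u = p_u * q_u the problem becomes
    max sum_u w_u z_u  s.t.  sum_u z_u <= b, 0 <= z_u <= p_u,
    solved exactly by filling the highest-weight items first."""
    value = 0.0
    for u in sorted(T, key=lambda u: w[u], reverse=True):
        if b <= 0:
            break
        z = min(p[u], b)   # fractional take, capped by remaining budget
        value += w[u] * z
        b -= z
    return value
```

For instance, with $p=(0.5,1)$, $w=(3,1)$, and $b=1$, the greedy takes all of the first item ($z_0=0.5$) and half of the second, for a value of $2$.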
Knowing the form of the optimal fractional solution, we can verify that $\partial_+\mathcal{O}_{T\cup\{x\}}\geq\partial_+\mathcal{O}_T$ and obtain:
\begin{displaymath}
\mathcal{O}(T\cup\{x\},c)-\mathcal{O}(T\cup\{x\},b)\geq \mathcal{O}(T,c)-\mathcal{O}(T,b)
\end{displaymath}
\begin{lemma}\label{lemma:sub}
For any $b\in\reals^+$, $\mathcal{O}(T,b)$ is submodular in $T$, $T\subseteq\neigh{X}$.
\end{lemma}
The proof of this lemma is more technical. For $T\subseteq\neigh{X}$ and $x, y\in\neigh{X}\setminus T$, we need to show that:
\begin{displaymath}
\mathcal{O}(T\cup\{x\},b)-\mathcal{O}(T,b)\geq \mathcal{O}(T\cup\{y, x\},b)-\mathcal{O}(T\cup\{y\},b)
\end{displaymath}
This can be done by partitioning the set $T$ into ``high-value items'' (those with weight greater than $w_x$) and ``low-value items'' and carefully applying Lemma~\ref{lemma:nd} to the associated subproblems. The proof is in the full version of the paper~\cite{full}. Finally, Lemma~\ref{lemma:sub} can be used to show Proposition~\ref{prop:sub}, whose proof can be found in the full version~\cite{full}.
\begin{proposition}\label{prop:sub}
Let $b\in\reals^+$, then $\mathcal{O}(\neigh{S},b)$ is monotone and submodular in $S$, $S\subseteq X$.
\end{proposition}
We can now use Proposition~\ref{prop:sub} to reduce \eqref{eq:sub} to a monotone submodular maximization problem. First, we note that~\eqref{eq:sub} can be rewritten as:
\begin{equation}\label{eq:sub-mod}
\max_{\substack{S\subseteq X\\ t \in \mathbb{N}}} \; \mathcal{O}\big(\neigh{S},t\big) \quad\text{s.t. } |S| + t\leq k
\end{equation}
For each fixed value of $t$, the maximization above is a monotone submodular maximization problem under the cardinality constraint $|S|\leq k-t$; we then optimize over the value of $t$. Combining this observation with the greedy algorithm for monotone submodular maximization~\cite{nemhauser}, we obtain Algorithm~\ref{alg:comb}, whose performance guarantee is summarized in Proposition~\ref{prop:main_result}.
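Proposition~\ref{prop:sub} can also be sanity-checked numerically on small random bipartite instances by testing the diminishing-returns inequality directly. The sketch below is ours (helper names are not from the paper); the greedy solver computes the fractional knapsack optimum $\mathcal{O}(T,b)$ exactly.

```python
import itertools
import random

def knap(T, p, w, b):
    """Exact greedy solution of the fractional knapsack O(T, b):
    fill z_u = p_u * q_u in decreasing order of w_u until b runs out."""
    val = 0.0
    for u in sorted(T, key=lambda u: w[u], reverse=True):
        if b <= 0:
            break
        z = min(p[u], b)
        val += w[u] * z
        b -= z
    return val

def neigh(S, adj):
    """N(S): union of the neighbor sets of the first-stage nodes in S."""
    return set().union(*[adj[v] for v in S]) if S else set()

random.seed(0)
m, n, b = 4, 6, 2.0
adj = [set(random.sample(range(n), 3)) for _ in range(m)]
p = [random.uniform(0.2, 1.0) for _ in range(n)]
w = [random.uniform(0.5, 5.0) for _ in range(n)]
f = lambda S: knap(neigh(S, adj), p, w, b)

# diminishing returns: f(S + x) - f(S) >= f(T + x) - f(T) for all S subset T
subsets = [set(c) for r in range(m) for c in itertools.combinations(range(m), r)]
for T in subsets:
    for S in (s for s in subsets if s <= T):
        for x in set(range(m)) - T:
            assert f(S | {x}) - f(S) >= f(T | {x}) - f(T) - 1e-9
```

The script raises no assertion error, consistent with monotonicity and submodularity of $S\mapsto\mathcal{O}(\neigh{S},b)$.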
\begin{algorithm}
\caption{Combinatorial algorithm}
\label{alg:comb}
\algsetup{indent=2em}
\begin{algorithmic}[1]
\STATE $S\leftarrow \emptyset$
\FOR{$t=1$ \TO $k-1$}
\STATE $S_t\leftarrow \emptyset$
\FOR{$i=1$ \TO $k-t$}
\STATE $x^*\leftarrow\argmax_{x\in X\setminus S_t}\mathcal{O}(\neigh{S_t\cup\{x\}},t) -\mathcal{O}(\neigh{S_t},t)$\label{line:argmax}
\STATE $S_t\leftarrow S_t\cup\{x^*\}$
\ENDFOR
\IF{$\mathcal{O}(\neigh{S_t},t)>\mathcal{O}(\neigh{S},k-|S|)$}
\STATE $S\leftarrow S_t$
\ENDIF
\ENDFOR
\RETURN $S$
\end{algorithmic}
\end{algorithm}
\begin{proposition}\label{prop:main_result}
Let $S$ be the set computed by Algorithm~\ref{alg:comb} and let $\mathrm{A}(S)$ denote the value of the adaptive policy selecting $S$ in the first stage. Then $\mathrm{A}(S) \geq (1-1/e)\mathrm{OPT}_A$.
\end{proposition}
\noindent\textbf{Parallelization.} The algorithm described above considers all possible ways to split the seeding budget between the first and the second stage. For each possible split $\{(k-t,t)\}_{t=1,\ldots,k-1}$, the algorithm computes an approximation to the optimal non-adaptive solution that selects $k-t$ nodes in the first stage and spends budget $t$ in the second stage, and returns the solution for the split with the highest value (breaking ties arbitrarily). This process can be trivially parallelized across $k-1$ machines, each performing the computation for a single split. With slightly more effort, for any $\epsilon>0$ one can parallelize over $\log_{1+\epsilon}n$ machines at the cost of losing a factor of $(1+\epsilon)$ in the approximation guarantee (see the full version of the paper~\cite{full} for details).\newline
\noindent \textbf{Implementation in MapReduce.} While the previous paragraph describes how to parallelize the outer \texttt{for} loop of Algorithm~\ref{alg:comb}, we note that its inner loop can also be parallelized in the MapReduce framework.
Indeed, it corresponds to the greedy algorithm applied to the function $\mathcal{O}\left(\neigh{\cdot}, t\right)$. The \textsc{Sample\&Prune} approach successfully applied in \cite{mr} to obtain MapReduce algorithms for various submodular maximization problems can also be applied to Algorithm~\ref{alg:comb} to cast it in the MapReduce framework. The details of the algorithm can be found in the full version of the paper~\cite{full}. \newline
\noindent \textbf{Algorithmic speedups.} To implement Algorithm~\ref{alg:comb} efficiently, the computation of the $\argmax$ on line~\ref{line:argmax} must be dealt with carefully. $\mathcal{O}(\neigh{S_t\cup\{x\}},t)$ is the optimal value of the fractional knapsack problem~\eqref{eq:knap} with budget $t$ and can be computed in time $\min(\frac{t}{p_\text{min}},n)$ by iterating over the nodes of $\neigh{S_t\cup\{x\}}$ in decreasing order of their degrees. This decreasing order of $\neigh{S_t}$ can be maintained throughout the greedy construction of $S_t$ by:
\begin{itemize}
\item ordering the list of neighbors of nodes in $X$ by decreasing order of the degrees when initially constructing the graph. This is responsible for an $O(n\log n)$ pre-processing time.
\item when adding node $x$ to $S_t$, observe that $\neigh{S_t\cup\{x\}} = \neigh{S_t}\cup\neigh{\{x\}}$. Hence, if $\neigh{S_t}$ and $\neigh{\{x\}}$ are sorted lists, then $\mathcal{O}(\neigh{S_t\cup\{x\}},t)$ can be computed in a single iteration of length $\min(\frac{t}{p_\text{min}},n)$ in which the two sorted lists are merged on the fly.
\end{itemize}
As a consequence, the running time of line~\ref{line:argmax} is bounded above by $m\min(\frac{t}{p_\text{min}},n)$. The two nested \texttt{for} loops are responsible for the additional $k^2$ factor. The running time of Algorithm~\ref{alg:comb} is summarized in Proposition~\ref{prop:running_time}.
\begin{proposition}\label{prop:running_time}
Let $p_\text{min} =\min\{p_u : u\in\neigh{X}\}$, then Algorithm~\ref{alg:comb} runs in time ${O\big(n\log n + k^2 m \min(\frac{k}{p_\text{min}},n)\big)}$.
\end{proposition}
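For reference, Algorithm~\ref{alg:comb} can be prototyped in a few lines without the sorted-merge speedup described above, recomputing the fractional knapsack from scratch at every greedy step. This is an illustrative sketch under our own naming conventions, not the implementation evaluated in Section~\ref{sec:experiments}.

```python
def knap(T, p, w, b):
    """Exact greedy solution of the fractional knapsack O(T, b) (eq:knap)."""
    val = 0.0
    for u in sorted(T, key=lambda u: w[u], reverse=True):
        if b <= 0:
            break
        z = min(p[u], b)  # z_u = p_u * q_u, capped by the remaining budget
        val += w[u] * z
        b -= z
    return val

def neigh(S, adj):
    """N(S): union of the neighbor sets of the first-stage nodes in S."""
    return set().union(*[adj[v] for v in S]) if S else set()

def combinatorial_greedy(adj, p, w, k):
    """Algorithm 1: for every split (k - t, t) of the budget, greedily build
    a first-stage set S_t of size k - t for the objective O(N(.), t), then
    return the best S_t across splits.  Since O(N(S_t), t) does not depend
    on x, maximizing the total value equals maximizing the marginal gain."""
    m = len(adj)
    best_S, best_val = set(), 0.0
    for t in range(1, k):
        S_t = set()
        for _ in range(min(k - t, m)):
            x_star = max(set(range(m)) - S_t,
                         key=lambda x: knap(neigh(S_t | {x}, adj), p, w, t))
            S_t = S_t | {x_star}
        val = knap(neigh(S_t, adj), p, w, t)
        if val > best_val:
            best_S, best_val = S_t, val
    return best_S, best_val
```

On a toy instance with $X=\{0,1\}$, where node $1$ reaches a single high-weight neighbor, the unique split $(1,1)$ selects node $1$ in the first stage.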