diff options
Diffstat (limited to 'paper/sections/model.tex')
| -rw-r--r-- | paper/sections/model.tex | 67 |
1 files changed, 67 insertions, 0 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex new file mode 100644 index 0000000..10c34d0 --- /dev/null +++ b/paper/sections/model.tex @@ -0,0 +1,67 @@ +%\subsection{Problem and notations} +Let us begin by introducing the following notation. +%We will use the following notation to formally discuss the model. +Given a graph $G=(V,E)$, for a node $u\in V$ we denote by $\neigh{u}$ +the neighborhood of $u$. By extension, for any subset of nodes $S\subseteq V$, +$\neigh{S}\defeq \bigcup_{u\in S}\neigh{u}$ will denote the neighborhood of +$S$. The notion of influence in the graph is captured by a function +$f:2^{|V|}\rightarrow \reals_+$ mapping a subset of nodes to a non-negative +influence value.\newline + +\noindent \textbf{The adaptive seeding model.} +The input of the \emph{adaptive seeding} problem is a set of initial nodes +$X\subseteq V$ and for any node $u\in\neigh{X}$ a probability $p_u$ that $u$ +realizes if one of its neighbor in $X$ is seeded. We will write $m=|X|$ and $n=|\neigh{X}|$ the parameters controlling the input size. The seeding process +is the following: +\begin{enumerate} + \item \emph{Seeding:} the seeder selects a subset of nodes $S\subseteq + X$ among the initial nodes. + \item \emph{Realization of the neighbors:} every node $u\in\neigh{S}$ + realizes independently with probability $p_u$. We denote by + $R\subseteq\neigh{S}$ the subset of nodes that is realized during this + stage. + \item \emph{Influence maximization:} the seeder selects the set of nodes + $T\subseteq R$ that maximizes the influence function $f$. +\end{enumerate} + +There is a budget constraint $k$ on the total number of nodes that can be +selected: $S$ and $T$ must satisfy $|S|+|T|\leq k$. The seeder chooses the set +$S$ before observing the realization $R$ and thus wishes to select optimally in +expectation over all such possible realizations. Formally, the objective can be stated +as: +\begin{equation}\label{eq:problem} + \begin{split} + &\max_{S\subseteq X} \sum_{R\subseteq\neigh{S}} p_R + \max_{\substack{T\subseteq R\\|T|\leq k-|S|}}f(T)\\ + &\text{s.t. }|S|\leq k + \end{split} +\end{equation} +where $p_R$ is the probability that the set $R$ realizes under the +probabilities $p_u,\,u\in\neigh{S}$: +\begin{displaymath} + p_R \defeq \prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u) +\end{displaymath} + +It is important to note that the process through which nodes arrive in the second stage is \emph{not} an influence process. The nodes in the second stage arrive if they are willing to spread information in exchange for a unit of the budget. Only when they have arrived does the influence process occur. This process is encoded in the influence function and occurs during the influence maximization stage without incentivizing nodes along the propagation path.\newline + + +\noindent \textbf{Influence functions.} +In this paper we focus on \emph{linear} (or additive) influence models: influence models for which the value of a subset of nodes can be expressed as a weighted sum of their individual influence. +One important example of such models is the \emph{voter model} \cite{richardson} used to represent the spread of opinions in a social network: at each time step, a node +adopts an opinion with a probability equal to the fraction of its neighbors +sharing this opinion at the previous time step. Formally, this can be written +as a discrete-time Markov chain over opinion configurations of the network. +In this model influence maximization amounts to ``converting'' the +optimal subset of nodes to a given opinion at the initial time so as to +maximize the number of converts after a given period of time. +Remarkably, a simple analysis shows that under this model, the influence +function $f$ is additive: +\begin{equation}\label{eq:voter} + \forall S\subseteq V,\; f(S) = \sum_{u\in S} w_u +\end{equation} +where $w_u, u\in V$ are weights which can be easily computed from the powers of +the transition matrix of the Markov chain. This observation led to the +development of fast algorithms for influence maximization under the voter +model~\cite{even-dar}.\newline + +\noindent \textbf{\textsc{NP}-Hardness.} In contrast to standard influence maximization, adaptive seeding is already \textsc{NP}-Hard even for the simplest influence functions. In the case when $f(S)=|S|$ and all probabilities equal one, the decision problem is whether given a budget $k$ and target value $\ell$ there exists a subset of $X$ of size $k-t$ which yields a solution with expected value of $\ell$ using $t$ nodes in $\mathcal{N}(X)$. This is equivalent to deciding whether there are $k-t$ nodes in $X$ that have $t$ neighbors in $\mathcal{N}(X)$. To see this is \textsc{NP}-hard, consider reducing from \textsc{Set-Cover} where there is one node $i$ for each input set $T_i$, $1\leq i\leq n$, with $\neigh{i}= T_i$ and integers $k,\ell$, and the output is ``yes'' if there is a family of $k$ sets in the input which cover at least $\ell$ elements, and ``no'' otherwise. |
