diff options
| author | Thibaut Horel <thibaut.horel@gmail.com> | 2014-11-22 21:08:41 -0500 |
|---|---|---|
| committer | Thibaut Horel <thibaut.horel@gmail.com> | 2014-11-22 21:08:41 -0500 |
| commit | 36eb1fee5492e57368846cbf4e107f1e4cb31589 (patch) | |
| tree | 6380028284779e10d01fb9ff51f3c561ae9ce57c /paper/sections/model.tex | |
| parent | 4f7d4804234f5515a4dded8b05d9568653b7ae3c (diff) | |
| download | fast-seeding-36eb1fee5492e57368846cbf4e107f1e4cb31589.tar.gz | |
WWW version
Diffstat (limited to 'paper/sections/model.tex')
| -rw-r--r-- | paper/sections/model.tex | 67 |
1 files changed, 41 insertions, 26 deletions
diff --git a/paper/sections/model.tex b/paper/sections/model.tex index 10c34d0..d26046a 100644 --- a/paper/sections/model.tex +++ b/paper/sections/model.tex @@ -1,21 +1,22 @@ %\subsection{Problem and notations} -Let us begin by introducing the following notation. +%Let us begin by introducing the following notation. %We will use the following notation to formally discuss the model. -Given a graph $G=(V,E)$, for a node $u\in V$ we denote by $\neigh{u}$ -the neighborhood of $u$. By extension, for any subset of nodes $S\subseteq V$, -$\neigh{S}\defeq \bigcup_{u\in S}\neigh{u}$ will denote the neighborhood of +Given a graph $G=(V,E)$, for a node $v\in V$ we denote by $\neigh{u}$ +the neighborhood of $v$. By extension, for any subset of nodes $S\subseteq V$, +$\neigh{S}\defeq \bigcup_{v\in S}\neigh{v}$ will denote the neighborhood of $S$. The notion of influence in the graph is captured by a function $f:2^{|V|}\rightarrow \reals_+$ mapping a subset of nodes to a non-negative influence value.\newline \noindent \textbf{The adaptive seeding model.} -The input of the \emph{adaptive seeding} problem is a set of initial nodes +The input of the \emph{adaptive seeding} problem is a \emph{core set} of nodes $X\subseteq V$ and for any node $u\in\neigh{X}$ a probability $p_u$ that $u$ -realizes if one of its neighbor in $X$ is seeded. We will write $m=|X|$ and $n=|\neigh{X}|$ the parameters controlling the input size. The seeding process +realizes if one of its neighbor in $X$ is seeded. We will write $m=|X|$ and +$n=|\neigh{X}|$ the parameters controlling the input size. The seeding process is the following: \begin{enumerate} \item \emph{Seeding:} the seeder selects a subset of nodes $S\subseteq - X$ among the initial nodes. + X$ in the core set. \item \emph{Realization of the neighbors:} every node $u\in\neigh{S}$ realizes independently with probability $p_u$. We denote by $R\subseteq\neigh{S}$ the subset of nodes that is realized during this @@ -27,8 +28,8 @@ is the following: There is a budget constraint $k$ on the total number of nodes that can be selected: $S$ and $T$ must satisfy $|S|+|T|\leq k$. The seeder chooses the set $S$ before observing the realization $R$ and thus wishes to select optimally in -expectation over all such possible realizations. Formally, the objective can be stated -as: +expectation over all such possible realizations. Formally, the objective can be +stated as: \begin{equation}\label{eq:problem} \begin{split} &\max_{S\subseteq X} \sum_{R\subseteq\neigh{S}} p_R @@ -36,26 +37,38 @@ as: &\text{s.t. }|S|\leq k \end{split} \end{equation} -where $p_R$ is the probability that the set $R$ realizes under the -probabilities $p_u,\,u\in\neigh{S}$: -\begin{displaymath} - p_R \defeq \prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u) -\end{displaymath} +where $p_R$ is the probability that the set $R$ realizes, +%\begin{displaymath} +$ p_R \defeq \prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u)$. +%\end{displaymath} -It is important to note that the process through which nodes arrive in the second stage is \emph{not} an influence process. The nodes in the second stage arrive if they are willing to spread information in exchange for a unit of the budget. Only when they have arrived does the influence process occur. This process is encoded in the influence function and occurs during the influence maximization stage without incentivizing nodes along the propagation path.\newline +It is important to note that the process through which nodes arrive in the +second stage is \emph{not} an influence process. The nodes in the second stage +arrive if they are willing to spread information in exchange for a unit of the +budget. Only when they have arrived does the influence process occur. This +process is encoded in the influence function and occurs after the influence +maximization stage without incentivizing nodes along the propagation path. In +general, the idea of a two-stage (or in general, multi-stage) approach is to +use the nodes who arrive in the first stage to recruit influential users who +can be incentivized to spread information. In standard influence maximization, +the nodes who are not in the core set do not receive incentives to propagate +information, and cascades tend to die off +quickly~\cite{YC10,BHMW11,GWG12,CADKL14}.\newline \noindent \textbf{Influence functions.} -In this paper we focus on \emph{linear} (or additive) influence models: influence models for which the value of a subset of nodes can be expressed as a weighted sum of their individual influence. -One important example of such models is the \emph{voter model} \cite{richardson} used to represent the spread of opinions in a social network: at each time step, a node -adopts an opinion with a probability equal to the fraction of its neighbors -sharing this opinion at the previous time step. Formally, this can be written -as a discrete-time Markov chain over opinion configurations of the network. -In this model influence maximization amounts to ``converting'' the -optimal subset of nodes to a given opinion at the initial time so as to -maximize the number of converts after a given period of time. -Remarkably, a simple analysis shows that under this model, the influence -function $f$ is additive: +In this paper we focus on \emph{linear} (or additive) influence models: +in these models the value of a subset of nodes can be expressed as +a weighted sum of their individual influence. One important example of such +models is the \emph{voter model} \cite{richardson} used to represent the spread +of opinions in a social network: at each time step, a node adopts an opinion +with a probability equal to the fraction of its neighbors sharing this opinion +at the previous time step. Formally, this can be written as a discrete-time +Markov chain over opinion configurations of the network. In this model +influence maximization amounts to ``converting'' the optimal subset of nodes to +a given opinion at the initial time so as to maximize the number of converts +after a given period of time. Remarkably, a simple analysis shows that under +this model, the influence function $f$ is additive: \begin{equation}\label{eq:voter} \forall S\subseteq V,\; f(S) = \sum_{u\in S} w_u \end{equation} @@ -64,4 +77,6 @@ the transition matrix of the Markov chain. This observation led to the development of fast algorithms for influence maximization under the voter model~\cite{even-dar}.\newline -\noindent \textbf{\textsc{NP}-Hardness.} In contrast to standard influence maximization, adaptive seeding is already \textsc{NP}-Hard even for the simplest influence functions. In the case when $f(S)=|S|$ and all probabilities equal one, the decision problem is whether given a budget $k$ and target value $\ell$ there exists a subset of $X$ of size $k-t$ which yields a solution with expected value of $\ell$ using $t$ nodes in $\mathcal{N}(X)$. This is equivalent to deciding whether there are $k-t$ nodes in $X$ that have $t$ neighbors in $\mathcal{N}(X)$. To see this is \textsc{NP}-hard, consider reducing from \textsc{Set-Cover} where there is one node $i$ for each input set $T_i$, $1\leq i\leq n$, with $\neigh{i}= T_i$ and integers $k,\ell$, and the output is ``yes'' if there is a family of $k$ sets in the input which cover at least $\ell$ elements, and ``no'' otherwise. +\noindent \textbf{\textsc{NP}-Hardness.} In contrast to standard influence maximization, adaptive seeding is already \textsc{NP}-Hard even for the simplest influence functions such as $f(S) = |S|$ and when all probabilities are one. We discuss this in the full version of the paper~\cite{full}. +%In the case when $f(S)=|S|$ and all probabilities equal one, the decision problem is whether given a budget $k$ and target value $\ell$ there exists a subset of $X$ of size $k-t$ which yields a solution with expected value of $\ell$ using $t$ nodes in $\mathcal{N}(X)$. It is easy to see that this problem is \textsc{NP}-hard by reduction from \textsc{Set-Cover}. +%This is equivalent to deciding whether there are $k-t$ nodes in $X$ that have $t$ neighbors in $\mathcal{N}(X)$. To see this is \textsc{NP}-hard, consider reducing from \textsc{Set-Cover} where there is one node $i$ for each input set $T_i$, $1\leq i\leq n$, with $\neigh{i}= T_i$ and integers $k,\ell$, and the output is ``yes'' if there is a family of $k$ sets in the input which cover at least $\ell$ elements, and ``no'' otherwise. |
