%\subsection{Problem and notations}
%Let us begin by introducing the following notation.
%We will use the following notation to formally discuss the model.  
Given a graph $G=(V,E)$, for a node $v\in V$ we denote by $\neigh{v}$
the neighborhood of $v$. By extension, for any subset of nodes $S\subseteq V$,
$\neigh{S}\defeq \bigcup_{v\in S}\neigh{v}$ will denote the neighborhood of
$S$. The notion of influence in the graph is captured by a function
$f:2^{V}\rightarrow \reals_+$ mapping a subset of nodes to a non-negative
influence value.\newline

\noindent \textbf{The adaptive seeding model.} 
The input of the \emph{adaptive seeding} problem is a \emph{core set} of nodes
$X\subseteq V$ and, for each node $u\in\neigh{X}$, a probability $p_u$ that $u$
realizes if one of its neighbors in $X$ is seeded. We write $m=|X|$ and
$n=|\neigh{X}|$ for the parameters controlling the input size. The seeding process
is the following:
\begin{enumerate}
    \item \emph{Seeding:} the seeder selects a subset of nodes $S\subseteq
    X$ in the core set.
    \item \emph{Realization of the neighbors:} every node $u\in\neigh{S}$
    realizes independently with probability $p_u$. We denote by
    $R\subseteq\neigh{S}$ the subset of nodes that is realized during this
    stage.
    \item \emph{Influence maximization:} the seeder selects the set of nodes
    $T\subseteq R$ that maximizes the influence function $f$.
\end{enumerate}

There is a budget constraint $k$ on the total number of nodes that can be
selected: $S$ and $T$ must satisfy $|S|+|T|\leq k$. The seeder chooses the set
$S$ before observing the realization $R$ and thus wishes to select optimally in
expectation over all such possible realizations. Formally, the objective can be
stated as:
\begin{equation}\label{eq:problem}
    \begin{split}
        &\max_{S\subseteq X} \sum_{R\subseteq\neigh{S}} p_R
        \max_{\substack{T\subseteq R\\|T|\leq k-|S|}}f(T)\\
        &\text{s.t. }|S|\leq k
    \end{split}
\end{equation}
where $p_R \defeq \prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u)$ is
the probability that the set $R$ realizes.
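For concreteness, when $f$ is additive (as in the voter model discussed below), the inner maximization in Equation~\eqref{eq:problem} simply keeps the highest-weight realized nodes within the remaining budget, and the outer expectation can be estimated by sampling realizations. The following sketch uses illustrative names (\texttt{neighbors}, \texttt{p}, \texttt{w}) and assumes $|S|\leq k$; it is not the algorithm studied in this paper, only a way to evaluate the objective on small instances.

```python
import random

def adaptive_value(neighbors, p, w, S, k, samples=5000, seed=0):
    """Monte Carlo estimate of the adaptive seeding objective for an
    additive influence function f(T) = sum_{u in T} w_u.

    neighbors: dict mapping each core node to its set of neighbors,
    p: realization probabilities p_u, w: node weights, S: seed set,
    k: total budget (assumes |S| <= k). Illustrative sketch only.
    """
    rng = random.Random(seed)
    NS = sorted(set().union(*(neighbors[v] for v in S)))  # N(S)
    budget = k - len(S)                                   # |T| <= k - |S|
    total = 0.0
    for _ in range(samples):
        # Each neighbor of S realizes independently with probability p_u.
        R = [u for u in NS if rng.random() < p[u]]
        # For additive f, the optimal T keeps the top-(k-|S|) weights in R.
        total += sum(sorted((w[u] for u in R), reverse=True)[:budget])
    return total / samples
```

When all $p_u=1$ the estimate is exact: every realization is the full neighborhood, and the objective reduces to the sum of the $k-|S|$ largest weights in $\neigh{S}$.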

It is important to note that the process through which nodes arrive in the
second stage is \emph{not} an influence process. Nodes in the second stage
arrive if they are willing to spread information in exchange for a unit of the
budget. Only once they have arrived does the influence process occur: it is
encoded in the influence function and unfolds after the influence maximization
stage, without incentivizing nodes along the propagation path. In
general, the idea of a two-stage (or in general, multi-stage) approach is to
use the nodes who arrive in the first stage to recruit influential users who
can be incentivized to spread information. In standard influence maximization,
the nodes who are not in the core set do not receive incentives to propagate
information, and cascades tend to die off
quickly~\cite{YC10,BHMW11,GWG12,CADKL14}.\newline


\noindent \textbf{Influence functions.}
In this paper we focus on \emph{linear} (or additive) influence models:
in these models the value of a subset of nodes can be expressed as
a weighted sum of their individual influence.  One important example of such
models is the \emph{voter model} \cite{richardson} used to represent the spread
of opinions in a social network: at each time step, a node adopts an opinion
with a probability equal to the fraction of its neighbors sharing this opinion
at the previous time step. Formally, this can be written as a discrete-time
Markov chain over opinion configurations of the network.  In this model
influence maximization amounts to ``converting'' the optimal subset of nodes to
a given opinion at the initial time so as to maximize the number of converts
after a given period of time.  Remarkably, a simple analysis shows that under
this model, the influence function $f$ is additive:
\begin{equation}\label{eq:voter}
    \forall S\subseteq V,\; f(S) = \sum_{u\in S} w_u
\end{equation}
where the weights $w_u$, $u\in V$, can be computed efficiently from the powers
of the transition matrix of the Markov chain. This observation led to the
development of fast algorithms for influence maximization under the voter
model~\cite{even-dar}.\newline
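To illustrate how the weights of Equation~\eqref{eq:voter} arise, note that with random-walk matrix $M$, where $M_{uv}=1/\deg(u)$ for each neighbor $v$ of $u$, the probability that node $u$ holds the seeded opinion after $t$ steps is $\sum_{v\in S}(M^t)_{uv}$; summing over $u$ gives $f(S)=\sum_{v\in S}w_v$ with $w_v$ the $v$-th column sum of $M^t$. A minimal NumPy sketch (the function name is illustrative; it assumes an undirected graph with no isolated nodes):

```python
import numpy as np

def voter_weights(adj, t):
    """Voter-model weights w_u after t time steps.

    adj: symmetric 0/1 adjacency matrix with no isolated nodes.
    Builds the row-stochastic random-walk matrix M and returns the
    column sums of M^t, which are the weights of the additive
    influence function f(S) = sum_{v in S} w_v.
    """
    deg = adj.sum(axis=1)
    M = adj / deg[:, None]            # M[u, v] = 1/deg(u) for neighbors v
    Mt = np.linalg.matrix_power(M, t)
    return Mt.sum(axis=0)             # w_v = v-th column sum of M^t
```

On a regular graph $M$ is doubly stochastic, so every weight equals one and $f(S)=|S|$: seeding any $k$ nodes converts $k$ nodes in expectation.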

\noindent \textbf{\textsc{NP}-Hardness.} In contrast to standard influence maximization, adaptive seeding is \textsc{NP}-hard even for the simplest influence functions, such as $f(S) = |S|$, and even when all probabilities are one. We discuss this in the full version of the paper~\cite{full}.
%In the case when $f(S)=|S|$ and all probabilities equal one, the decision problem is whether given a budget $k$ and target value $\ell$ there exists a subset of $X$ of size $k-t$ which yields a solution with expected value of $\ell$ using $t$ nodes in $\mathcal{N}(X)$.  It is easy to see that this problem is \textsc{NP}-hard by reduction from \textsc{Set-Cover}.
%This is equivalent to deciding whether there are $k-t$ nodes in $X$ that have $t$ neighbors in $\mathcal{N}(X)$.  To see this is \textsc{NP}-hard, consider reducing from \textsc{Set-Cover} where there is one node $i$ for each input set $T_i$, $1\leq i\leq n$, with $\neigh{i}= T_i$ and integers $k,\ell$, and the output is ``yes'' if there is a family of $k$ sets in the input which cover at least $\ell$ elements, and ``no'' otherwise.