The massive adoption of social networking services in recent years has created a unique platform for promoting ideas and spreading information. The joy and ease of communicating through online social networks leave traces of behavioral data which allow us to observe, predict, and even engineer processes of information diffusion. First posed by Domingos and Richardson~\cite{DR01,RD02} and elegantly formulated and further developed by Kempe, Kleinberg, and Tardos~\cite{KKT03}, \emph{influence maximization} is the algorithmic challenge of selecting individuals who can serve as early adopters of a new idea, product, or technology in a manner that triggers a large cascade in the social network. Since its inception, numerous techniques and improvements have been developed, ranging from sophisticated predictive models of influence~\cite{LAH06,RLK10,BHMW11,ACKS13,manuel2013icml,du13nips} to fast approximation methods~\cite{LKGFVG07,MR07,C08,KDD11,borgs2012influence}.
While there has been a great deal of progress on efficient algorithms for this problem and impressive methods for learning influence models from data, a fundamental problem has been largely overlooked. Due to the heavy-tailed degree distribution of social networks, influential users are rare, and thus the application of influence maximization techniques can often become ineffective.
For a concrete example, consider a scenario where the goal is to select influential users who visit an online store and reward them for promoting a product through their social network. In such a case, if the users who visit the online store are not influential, even the best techniques for identifying influential users would have poor performance. \newline
\begin{figure}
\centering
\includegraphics[scale=0.55]{images/dist.pdf}
\caption{CDF of the degree distribution of users who liked a post by Kiva on Facebook and that of their friends.}
\label{fig:para}
\end{figure}
\noindent \textbf{The power of random neighbors.} Remarkably, while a random individual in a social network is not likely to be influential, it turns out that she is likely to know someone who is. Intuitively, since influential nodes are well connected, many users know an influential user who can serve as an early adopter. Recent work shows that for any network with a power-law degree distribution and a small fraction of random edges, with constant probability there is an asymptotic gap between the influence (either the total number of neighbors or the sum of degrees) of a small sample of the network and that of its neighbors~\cite{LS13}.
The implication is that when considering the accessible users (e.g., those who visit the online store) as random samples from a social network, any algorithm that can use their neighbors as influencers can dramatically improve over the direct application of influence maximization to the accessible users.\newline
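This gap is easy to reproduce in simulation. The following minimal Python sketch compares the average degree of uniformly sampled nodes with that of a random neighbor of each sampled node; the Barab\'asi--Albert graph, its parameters, and degree as the influence proxy are our illustrative choices here, not the construction analyzed in~\cite{LS13}.
\begin{verbatim}
# Minimal simulation of the "power of random neighbors" effect.
# Assumptions: a Barabasi-Albert graph as a stand-in for a power-law
# network, and degree as the proxy for influence.
import random
import networkx as nx

G = nx.barabasi_albert_graph(n=100000, m=2, seed=42)

samples = random.sample(list(G.nodes()), 1000)  # "accessible" users
friends = [random.choice(list(G.neighbors(v))) for v in samples]

def avg_degree(nodes):
    return sum(G.degree(v) for v in nodes) / len(nodes)

print("avg degree, random users:   ", avg_degree(samples))
print("avg degree, their neighbors:", avg_degree(friends))
# On heavy-tailed graphs the second average is typically much larger.
\end{verbatim}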
\noindent \textbf{Adaptive Seeding.}
An alternative to spending the entire budget on the accessible users is a two-stage approach: in the first stage, a fraction of the budget is spent on the accessible users to invite their friends to participate in the campaign; in the second stage, the rest of the budget is spent on the friends who arrived. This approach, called \emph{adaptive seeding}, was recently formalized in~\cite{singer}. As in the standard formulation of influence maximization, the setup in adaptive seeding includes a network, a budget, and an influence function which encodes the expected number of nodes influenced as a result of selecting a subset of the network.
In addition, there is a subset of nodes $X$, the set of accessible users, and their neighbors $\mathcal{N}(X)$, where each neighbor has a probability indicating its likelihood of arriving if invited. The framework is a two-stage stochastic optimization problem: in the first stage, any fraction of the budget can be used to select a subset of nodes in $X$; each neighbor of this set is then realized independently, and in the second stage the remaining budget can be used to select from the set of neighbors that arrived. The goal is to select a subset of nodes in $X$ such that the influence function is maximized in expectation over all possible arrivals of the neighbors. The main result in~\cite{singer} is a constant-factor approximation algorithm for well-studied influence models such as independent cascade and linear threshold.
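In symbols (the notation is ours, for concreteness): if $k$ denotes the budget, $p_u$ the arrival probability of neighbor $u \in \mathcal{N}(X)$, and $f$ the influence function, adaptive seeding seeks
\[
\max_{S \subseteq X} \; \mathbb{E}_{R}\left[ \max_{T \subseteq R \cap \mathcal{N}(S),\; |S|+|T| \le k} f(T) \right],
\]
where $R \subseteq \mathcal{N}(X)$ is the random set of arriving neighbors, each $u$ included independently with probability $p_u$.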
\subsection{A scalable approach to adaptive seeding}
The constant-factor approximation guarantees for adaptive seeding are, by and large, a theoretical triumph. The algorithms that achieve them~\cite{singer}, however, rely on various forms of sampling, which create a significant blowup in the input size. While such techniques provide strong theoretical guarantees, they are inapplicable to social network data sets, which are often large or massive. The natural question is therefore:
\begin{center}
\textit{Is adaptive seeding scalable?}
\end{center}
In this paper our main goal is to develop scalable approaches to adaptive seeding and to provide compelling experimental evidence of the dramatic impact such approaches can have in various applications of influence maximization.
\newline
\noindent \textbf{Challenges.}
The challenges in designing adaptive seeding algorithms stem from both the combinatorial and the stochastic nature of the problem. The combinatorial aspect involves the selection of nodes in the first stage, which determines which nodes can be realized in the second stage. As we later show, this aspect makes the problem $\textsc{NP}$-hard even for the simplest objective functions. The stochastic aspect requires optimizing over potentially exponentially many realizations. A common approach in stochastic optimization is Sample Average Approximation (SAA). The main idea behind SAA is to sample realizations of the second stage and solve the optimization problem by taking a solution which is optimal on average over the sampled instances.
For the instances we are interested in here, which have over $N=10^5$ nodes, the required number of samples is on the order of $N^3$; since the algorithms can have running time quadratic in their input size, the total running time is then on the order of $(N^3)^2 = 10^{30}$ operations, and SAA is simply infeasible.\footnote{In \cite{singer} SAA is used for linear models. For models such as independent cascade and linear threshold, a different sampling method is introduced which requires a similar blowup in the input size to generate a concave relaxation of the optimization problem.}\newline
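For concreteness, a minimal sketch of an SAA evaluator for the second stage is given below, assuming a linear influence model in which each neighbor $u$ has weight $w_u$ and arrival probability $p_u$; the function name and the restriction to a second-stage-only budget are our simplifications for illustration.
\begin{verbatim}
# Sketch of Sample Average Approximation (SAA) for the second stage.
# Assumptions: linear influence (weight[u]), arrival probability p[u],
# and the full budget k spent in the second stage.
import random

def saa_value(neighbors, p, weight, k, num_samples):
    """Average, over sampled realizations of arrivals, the value of
    spending the budget on the heaviest arrived neighbors."""
    total = 0.0
    for _ in range(num_samples):
        arrived = [u for u in neighbors if random.random() < p[u]]
        best = sorted(arrived, key=lambda u: weight[u], reverse=True)[:k]
        total += sum(weight[u] for u in best)
    return total / num_samples
\end{verbatim}
The difficulty is not the per-sample computation but the number of samples required for the estimate to be accurate enough to optimize over.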
\noindent \textbf{Stochastic optimization sans sampling.}
In this paper we develop a framework for adaptive seeding which circumvents the use of sampling while maintaining desirable approximation guarantees. We say that a policy is \emph{adaptive} if it selects nodes in the first stage and, only after the realization of their neighbors, selects a subset of nodes with the remaining budget in the second stage. In contrast, a \emph{non-adaptive} policy makes all its decisions before the realizations occur.
The framework we develop in this work involves designing non-adaptive policies which can then be turned into adaptive ones. At a high level, the main idea is to use a particular class of non-adaptive policies whose optimal solution is an upper bound on that of the optimal adaptive policy. We then argue that a solution to the non-adaptive problem can be turned into a solution to the adaptive problem while losing almost no value. This reduces our problem to designing solutions to the non-adaptive problem we define, for which we develop specialized algorithms. The main advantage of the non-adaptive framework is that, unlike standard approaches in stochastic optimization, it avoids sampling. As we later discuss, this dramatically reduces the running time of the algorithms, both in theory and in practice.
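Schematically, writing $\textsc{OPT}_A$ and $\textsc{OPT}_{NA}$ for the values of the optimal adaptive and non-adaptive policies (the notation here is ours, for intuition only): the relaxation guarantees $\textsc{OPT}_A \le \textsc{OPT}_{NA}$, and the conversion to an adaptive policy loses at most an $\epsilon$ fraction of the value, so an $\alpha$-approximate non-adaptive solution yields an adaptive policy with value at least
\[
(1-\epsilon)\,\alpha\cdot \textsc{OPT}_{NA} \;\ge\; (1-\epsilon)\,\alpha\cdot \textsc{OPT}_A.
\]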
\subsection{Main results}
Our results are for linear models of influence, i.e., models in which the influence of a set can be expressed as the sum of the influences of its members. While this class does not include models such as independent cascade and linear threshold, it includes the well-studied \emph{voter model}~\cite{holley1975ergodic} as well as measures such as node degree and click-through rate, which serve as natural proxies of influence in many settings. In comparison to general submodular influence functions, the relative simplicity of linear models allows making substantial progress on this challenging problem.
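Concretely, in a linear model each node $u$ is associated with a weight $w_u$ (e.g., its degree or its click-through rate), and the influence of a set $S$ is
\[
f(S) \;=\; \sum_{u \in S} w_u .
\]
Under the voter model, for instance, the long-run influence of a node is known to be proportional to its degree, which makes degree a natural choice of weight.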
Using this framework, we develop two algorithms, both achieving an approximation ratio of $(1-1/e)$ for the adaptive problem. The first is implemented through a linear program and proves to be extremely efficient on instances with large budgets. The second is a combinatorial algorithm with the same approximation guarantee; it has good theoretical bounds on its running time and performs well on instances with smaller budgets.
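For intuition, the kind of linear program involved can be sketched as follows; this is our illustrative rendering of a non-adaptive relaxation for linear influence, not necessarily the exact program our algorithm solves. With fractional variables $x_v$ for $v \in X$ and $y_u$ for $u \in \mathcal{N}(X)$, and the second-stage budget charged in expectation:
\[
\begin{array}{lll}
\max & \displaystyle\sum_{u \in \mathcal{N}(X)} p_u w_u y_u & \\[6pt]
\mbox{s.t.} & \displaystyle\sum_{v \in X} x_v + \sum_{u \in \mathcal{N}(X)} p_u y_u \;\le\; k & \\[6pt]
 & y_u \;\le\; \displaystyle\sum_{v \in \mathcal{N}(u) \cap X} x_v & \forall u \in \mathcal{N}(X) \\[6pt]
 & 0 \le x_v, y_u \le 1 & \forall v \in X,\ u \in \mathcal{N}(X)
\end{array}
\]
The second constraint encodes that a neighbor can only be selected if at least one of its friends in $X$ was seeded.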
To show the applicability of these approaches, and the potential of adaptive seeding at large, we performed several experiments on real social network data sets. We collected publicly available data from Facebook on users who expressed interest in (``liked'') a post from a topic they follow, together with data on their friends. The premise is that such users mimic potential participants in a viral marketing campaign. The performance of our algorithms on these data sets shows that adaptive seeding can yield dramatic improvements over standard approaches to influence maximization.