\documentclass{sig-alternate-2013}
\pdfpagewidth=8.5in
\pdfpageheight=11in
\usepackage[utf8x]{inputenc}
\usepackage[english]{babel}
\usepackage{microtype}
\usepackage{caption}
\usepackage{subcaption}

\usepackage{amsmath, amsfonts, amssymb, bbm}
\usepackage{verbatim}
\newcommand{\reals}{\mathbb{R}}
\newcommand{\ints}{\mathbb{N}}
\renewcommand{\O}{\mathcal{O}}
\DeclareMathOperator{\E}{\mathbb{E}}
\let\P\relax
\DeclareMathOperator{\P}{\mathbb{P}}
\newcommand{\ex}[1]{\E\left[#1\right]}
\newcommand{\prob}[1]{\P\left[#1\right]}
\newcommand{\inprod}[2]{#1 \cdot #2}
\newcommand{\neigh}[1]{\mathcal{N}(#1)}
\newcommand{\defeq}{\equiv}
\DeclareMathOperator*{\argmax}{argmax}
\DeclareMathOperator*{\argmin}{argmin}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{proposition}{Proposition}
\newtheorem{definition}{Definition}

\permission{Permission to make digital or hard copies of part or all of this
work for personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage, and that copies
bear this notice and the full citation on the first page. Copyrights for
third-party components of this work must be honored. For all other uses,
contact the owner/author(s).  Copyright is held by the author/owner(s).}
\conferenceinfo{WWW 2015 Companion,}{May 18--22, 2015, Florence, Italy.}
\copyrightetc{ACM \the\acmcopyr}
\crdata{978-1-4503-3473-0/15/05. \\
http://dx.doi.org/10.1145/2740908.2744108}

\clubpenalty=10000
\widowpenalty = 10000

\title{Scalable Methods for Adaptively Seeding a Social Network\titlenote{The
full version of this work is available as~\cite{full}.}}
\numberofauthors{2}
\author{
\alignauthor
Thibaut Horel\\
       \affaddr{Harvard University}\\
       \email{thorel@seas.harvard.edu}
\alignauthor
Yaron Singer\\
       \affaddr{Harvard University}\\
       \email{yaron@seas.harvard.edu}
}

\begin{document}

\maketitle

\begin{abstract}
In many applications of influence maximization, one is restricted to select
influencers from a set of users who engaged with the topic being promoted, and
due to the structure of social networks, these users often rank low in terms of
their influence potential. To alleviate this issue, one can consider an
adaptive method which selects users in a manner which targets their influential
neighbors. The advantage of such an approach is that it leverages the
friendship paradox in social networks: while users are often not influential,
they often know someone who is.  

Despite the various complexities in such optimization problems, we show that
scalable adaptive seeding is achievable. To show the effectiveness of our
methods, we collected data from various verticals that social network users
follow and applied our methods to it.  Our experiments show that adaptive
seeding is scalable, and that it obtains dramatic improvements over standard
approaches to information dissemination.
\end{abstract}

\category{H.2.8}{Database Management}{Database Applications}[Data Mining]
\category{F.2.2}{Analysis of Algorithms and Problem Complexity}{Nonnumerical
Algorithms and Problems}

\section{Introduction}

Influence Maximization~\cite{DR01} is the algorithmic challenge of
selecting a fixed number of individuals who can serve as early adopters of
a new idea, product, or technology in a manner that will trigger a large
cascade in the social network.  In many applications of influence
maximization, one cannot select an arbitrary user in the network but is limited
to some subset of users. We call the set of users that an influence
maximization campaign can access the \emph{core set}.  When the goal is to
select influential users from the core set, the laws governing social
networks can lead to poor outcomes: due to the heavy-tailed degree
distribution of social networks, high degree nodes are rare, and since
influence maximization techniques often depend on the ability to select high
degree nodes, a naive application of influence maximization techniques to the
core set is ineffective.

\begin{figure}
    \centering
    \includegraphics[scale=0.55]{images/dist.pdf}
    \vspace{-20pt}
    \caption{\small{CDF of the degree distribution of users who liked a post by Kiva
    on Facebook and that of their friends.}}
    \label{fig:para}
    \vspace{-15pt}
\end{figure}

An alternative method recently introduced in~\cite{singer} is a two-stage
approach called adaptive seeding. In the first stage, one spends a fraction
of the budget on the core users so that they invite their friends to
participate in the campaign; in the second stage, one spends the rest of the
budget on the influential friends who have hopefully joined.  The idea behind
this approach is to leverage a structural phenomenon in social networks known
as the friendship paradox~\cite{feld1991}: even though individuals are not
likely to have many friends, they likely have a friend who does (``your
friends have more friends than you''). Figure~\ref{fig:para} illustrates this
effect on Facebook.

In this work, we present efficient algorithms for adaptive seeding achieving an
optimal approximation ratio of $(1-1/e)$.  The guarantees of our algorithms
hold for linear models of influence. While this class does not include models
such as the independent cascade and the linear threshold model, it includes the
well-studied \emph{voter model}~\cite{holley1975ergodic}. We then use these
algorithms to conduct a series of experiments to show the potential of adaptive
approaches for influence maximization both on synthetic and real social
networks.

%The main component of the experiments involved collecting publicly
%available data from Facebook on users who expressed interest (``liked'')
%a certain post from a topic they follow and data on their friends.  The premise
%here is that such users mimic potential participants in a viral marketing
%campaign.  The results on these data sets suggest that adaptive seeding can
%have dramatic improvements over standard influence maximization methods.

\section{Framework}

\noindent\textbf{Model.} Given a graph $G=(V,E)$, for $S\subseteq  V$ we denote
by $\neigh{S}$ the neighborhood of $S$. The notion of influence in the graph is
captured by a function $f:2^{V}\rightarrow \reals_+$ mapping a subset of
nodes to a non-negative influence value. In this work, we focus on linear
influence functions: $f(S) = \sum_{u\in S} w_u$ where $(w_u)_{u\in V}$ are
non-negative weights capturing the influence of individual vertices. The input
of the \emph{adaptive seeding} problem is a \emph{core set} of nodes
$X\subseteq V$ and for any node $u\in\neigh{X}$ a probability $p_u$ that $u$
realizes if one of its neighbors in $X$ is seeded. The goal is to solve:
\begin{equation}\label{eq:problem}
    \begin{split}
        &\max_{S\subseteq X} \sum_{R\subseteq\neigh{S}} p_R
        \max_{\substack{T\subseteq R\\|T|\leq k-|S|}}f(T)\\
        &\text{s.t. }|S|\leq k
    \end{split}
\end{equation}
where $p_R$ is the probability that the set $R$ realizes, $p_R \defeq
\prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u)$. Intuitively, we want
to select at most $k$ nodes in the core set $X$ so as to maximize the expected
influence that can be obtained by spending the remaining budget on the set $R$
of neighbors that realize.\newline
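
\noindent\textbf{Example.} As a hypothetical toy instance (chosen here for
illustration only), take $X=\{x\}$, $\neigh{X}=\{u,v\}$, $w_u=3$, $w_v=1$,
$p_u=p_v=1/2$ and $k=2$. Seeding $S=\{x\}$ leaves a budget of $k-|S|=1$ for
the second stage, so the inner maximum in \eqref{eq:problem} selects the
heaviest node of $R$, and the objective evaluates to
\begin{equation*}
    \underbrace{\tfrac{1}{4}\cdot 0}_{R=\emptyset}
    + \underbrace{\tfrac{1}{4}\cdot 3}_{R=\{u\}}
    + \underbrace{\tfrac{1}{4}\cdot 1}_{R=\{v\}}
    + \underbrace{\tfrac{1}{4}\cdot 3}_{R=\{u,v\}}
    = \tfrac{7}{4},
\end{equation*}
whereas $S=\emptyset$ has no neighbors and yields value $0$. The optimal value
of this instance is therefore $7/4$.\newline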

\noindent\textbf{Non-adaptive Optimization.} We say that a policy is
\emph{non-adaptive} if it selects a set of nodes $S \subseteq X$ to be seeded
in the first stage and a vector of probabilities $\mathbf{q}\in[0,1]^n$, with
$n \defeq |\neigh{X}|$, such that each neighbor $u$ of $S$ which realizes is
included in the solution independently with probability $q_u$.  The budget
constraint is then only required to hold in expectation, \emph{i.e.}, $|S|
+ \textbf{p}^T\textbf{q} \leq k$. Formally, the optimization problem for
non-adaptive policies can be written as:
\begin{equation}\label{eq:relaxed}
    \begin{split}
        \max_{\substack{S\subseteq X\\\textbf{q}\in[0,1]^n} }& \;
    \sum_{R\subseteq\neigh{X}} \Big (\prod_{u\in R} p_uq_u\prod_{u\in\neigh{X}\setminus
    R}(1-p_uq_u) \Big )
 f(R)\\
    \text{s.t. } & \; |S|+\textbf{p}^T\textbf{q} \leq k,\;
q_u \leq \mathbf{1}\{u\in\neigh{S}\}
\end{split}
\end{equation}
where $\mathbf{1}\{E\}$ is the indicator variable of the event $E$.
Non-adaptive policies are related to adaptive policies:
\begin{proposition}\label{prop:cr}
    Let $(S,\textbf{q})$ be an $\alpha$-approximate solution to
    \eqref{eq:relaxed}, then $S$ is an $\alpha$-approximate solution to
    \eqref{eq:problem}.
\end{proposition}  
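
For the linear influence functions considered here, the objective of
\eqref{eq:relaxed} takes a simpler form. The following short derivation is
a sketch that uses only the definitions above: under a non-adaptive policy,
each $u\in\neigh{X}$ belongs to the random set $R$ independently with
probability $p_uq_u$, so by linearity of expectation
\begin{equation*}
    \begin{split}
        &\sum_{R\subseteq\neigh{X}} \Big(\prod_{u\in R}p_uq_u
        \prod_{u\in\neigh{X}\setminus R}(1-p_uq_u)\Big) f(R)\\
        &\qquad= \ex{\sum_{u\in R} w_u}
        = \sum_{u\in\neigh{X}} p_uq_uw_u.
    \end{split}
\end{equation*}
In particular, the objective is linear in $\mathbf{q}$, and $S$ enters only
through the constraints.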

\begin{figure}[t]
    \centerline{ \includegraphics[width=0.4\textwidth]{images/comp2.pdf} }
    \vspace{-10pt}
    \caption{\small{Ratio of the performance of adaptive seeding to \textsf{IM}. Bars
    represent the mean improvement across all verticals, and the ``error bars''
represent the range of improvement across verticals.}}
    \label{fig:compare}
    \vspace{-15pt}
\end{figure}

\section{Algorithms}

Proposition~\ref{prop:cr} allows us to focus on designing non-adaptive policies
for \eqref{eq:relaxed}, which is easier to solve than \eqref{eq:problem}.

Our first algorithm is obtained by considering a relaxation of
\eqref{eq:relaxed} where the binary choices of including vertices in $S$ are
relaxed to fractional values. The fractional solution is then rounded using
the pipage rounding framework.

The second algorithm is combinatorial: first, we
note that for additive influence functions and for fixed $S$, the maximization
over $\mathbf{q}$ in \eqref{eq:relaxed} is a simple fractional knapsack problem
which can be solved efficiently. Furthermore, the optimal value of this problem
is a monotone submodular function of $S$. Our algorithm can thus be obtained by
applying the celebrated greedy algorithm for monotone submodular maximization
where we repeatedly solve fractional knapsack problems when greedily
constructing the solution.
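
To make the inner step concrete, the following is a sketch under stated
assumptions rather than a description of the exact procedure: the notation
$F_b$ and the choice to treat the second-stage budget $b$ as a fixed parameter
are introduced here for illustration only (in \eqref{eq:relaxed},
$b = k-|S|$). Since $f$ is linear, the objective of \eqref{eq:relaxed} equals
$\sum_{u} p_uq_uw_u$ by linearity of expectation, so for fixed $S$ the
maximization over $\mathbf{q}$ reads
\begin{equation*}
    \begin{split}
        F_b(S) \defeq \max_{\mathbf{q}} &\; \sum_{u\in\neigh{S}} p_uq_uw_u\\
        \text{s.t. } &\; \sum_{u\in\neigh{S}} p_uq_u\leq b,\;
        0\leq q_u\leq 1.
    \end{split}
\end{equation*}
Each unit of budget spent on $u$ yields value $w_u$, so an optimal solution is
obtained by ordering the neighbors with $p_u>0$ by non-increasing weight,
$w_{u_1}\geq w_{u_2}\geq\dots$, and setting
\begin{equation*}
    q_{u_i} = \min\Big\{1,\, \max\Big\{0,\,
    \frac{b-\sum_{j<i}p_{u_j}}{p_{u_i}}\Big\}\Big\}.
\end{equation*}
A greedy step then adds to $S$ a node
$x^*\in\argmax_{x\in X\setminus S} F_b(S\cup\{x\}) - F_b(S)$, which requires
solving one such knapsack problem per candidate $x$.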

Both algorithms achieve an optimal $(1-1/e)$ approximation ratio. The first
algorithm is extremely efficient on instances with a large budget.
The second algorithm can be easily parallelized and implemented in MapReduce,
has good theoretical guarantees on its running time, and does well on instances
with smaller budgets.

\section{Experiments}

The main component of our experiments involved collecting publicly available
data from Facebook.  Despite the extreme difficulty of collecting such data, we
were able to collect large networks. For 10 Facebook Pages, each
associated with a commercial entity that uses the Facebook page to communicate
with its followers, we selected a post and then collected data about the users
who expressed interest in (``liked'') the post and about their friends.  The
advantage of this data set is that it is highly representative of the scenario
we study here. We focused on posts which were liked by about 1,000 users,
which, when we include their friends, leads to networks of about 100,000 users.


\begin{figure}[t]
    \begin{subfigure}[t]{0.23\textwidth}
    \includegraphics[scale=0.48]{images/prob.pdf}
    \vspace{-15pt}
    \caption{}
    \label{fig:prob}
\end{subfigure}
\hspace{1pt}
\begin{subfigure}[t]{0.23\textwidth}
    \includegraphics[scale=0.48]{images/hbo_likes.pdf}
    \vspace{-15pt}
    \caption{}
    \label{fig:killer}
     \end{subfigure}
     \vspace{-10pt}
 \caption{\small{(a) Performance of adaptive seeding for various propagation
     probabilities. (b) Performance of \emph{adaptive seeding} when restricted
 to the subgraph of users who \emph{liked} HBO (red line).}}
 \vspace{-15pt}
\end{figure}

Figure~\ref{fig:compare} compares the performance of our approach to that of
the standard influence maximization (IM) approach applied to the core set.
Figure~\ref{fig:prob} shows the impact of the probability of neighbors
realizing, while Figure~\ref{fig:killer} shows the performance of adaptive
seeding when restricted to users who previously expressed interest in the
vertical and for whom we could expect the probability of realizing to be close
to one.  These results suggest that adaptive seeding can yield dramatic
improvements over standard IM. The full version~\cite{full} contains
additional experiments analyzing the impact of various parameters as well as
evaluations on synthetic data.\newline

\noindent\textbf{Acknowledgement.}
This research is supported in part by a Google Research Grant and NSF grant
CCF-1301976.

\bibliographystyle{abbrv}
\bibliography{main}

%\balancecolumns
\end{document}