\documentclass{sig-alternate-2013} \pdfpagewidth=8.5in \pdfpageheight=11in \usepackage[utf8x]{inputenc} \usepackage[english]{babel} \usepackage{microtype} \usepackage{caption} \usepackage{subcaption} \usepackage{amsmath, amsfonts, amssymb, bbm} \usepackage{verbatim} \newcommand{\reals}{\mathbb{R}} \newcommand{\ints}{\mathbb{N}} \renewcommand{\O}{\mathcal{O}} \DeclareMathOperator{\E}{\mathbb{E}} \let\P\relax \DeclareMathOperator{\P}{\mathbb{P}} \newcommand{\ex}[1]{\E\left[#1\right]} \newcommand{\prob}[1]{\P\left[#1\right]} \newcommand{\inprod}[2]{#1 \cdot #2} \newcommand{\neigh}[1]{\mathcal{N}(#1)} \newcommand{\defeq}{\equiv} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newtheorem{theorem}{Theorem} \newtheorem{lemma}{Lemma} \newtheorem{corollary}{Corollary} \newtheorem{remark}{Remark} \newtheorem{proposition}{Proposition} \newtheorem{definition}{Definition} \permission{Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s).} \conferenceinfo{WWW 2015 Companion,}{May 18--22, 2015, Florence, Italy.} \copyrightetc{ACM \the\acmcopyr} \crdata{978-1-4503-3473-0/15/05. \\ http://dx.doi.org/10.1145/2740908.2744108} \clubpenalty=10000 \widowpenalty = 10000 \title{Scalable Methods for Adaptively Seeding a Social Network\titlenote{The full version of this work is available as~\cite{full}.}} \numberofauthors{2} \author{ \alignauthor Thibaut Horel\\ \affaddr{Harvard University}\\ \email{thorel@seas.harvard.edu} \alignauthor Yaron Singer\\ \affaddr{Harvard University}\\ \email{yaron@seas.harvard.edu} } \begin{document} \maketitle \begin{abstract} In many applications of influence maximization, one is restricted to select influencers from a set of users who engaged with the topic being promoted, and due to the structure of social networks, these users often rank low in terms of their influence potential. To alleviate this issue, one can consider an adaptive method which selects users in a manner which targets their influential neighbors. The advantage of such an approach is that it leverages the friendship paradox in social networks: while users are often not influential, they often know someone who is. Despite the various complexities in such optimization problems, we show that scalable adaptive seeding is achievable. To show the effectiveness of our methods we collected data from various verticals social network users follow, and applied our methods on it. Our experiments show that adaptive seeding is scalable, and that it obtains dramatic improvements over standard approaches of information dissemination. \end{abstract} \category{H.2.8}{Database Management}{Database Applications}[Data Mining] \category{F.2.2}{Analysis of Algorithms and Problem Complexity}{Nonnumerical Algorithms and Problems} \section{Introduction} Influence Maximization~\cite{DR01} is the algorithmic challenge of selecting a fixed number of individuals who can serve as early adopters of a new idea, product, or technology in a manner that will trigger a large cascade in the social network. In many cases where influence maximization methods are applied one cannot select any user in the network but is limited to some subset of users. In general, we will call the \emph{core set} the set of users an influence maximization campaign can access. When the goal is to select influential users from the core set, the laws governing social networks can lead to poor outcomes: due to the heavy-tailed degree distribution of social networks, high degree nodes are rare, and since influence maximization techniques often depend on the ability to select high degree nodes, a naive application of influence maximization techniques to the core set is ineffective. \begin{figure} \centering \includegraphics[scale=0.55]{images/dist.pdf} \vspace{-20pt} \caption{\small{CDF of the degree distribution of users who liked a post by Kiva on Facebook and that of their friends.}} \label{fig:para} \vspace{-15pt} \end{figure} An alternative method recently introduced in~\cite{singer} is a two-stage approach called adaptive seeding. In the first stage, one can spend a fraction of the budget on the core users so that they invite their friends to participate in the campaign, then in the second stage spend the rest of the budget on the influential friends who hopefully have arrived. The idea behind this approach is to leverage a structural phenomenon in social networks known as the friendship paradox~\cite{feld1991}: even though individuals are not likely to have many friends, they likely have a friend that does (``your friends have more friends than you''). Figure~\ref{fig:para} illustrates this effect on Facebook. In this work, we present efficient algorithms for adaptive seeding achieving an optimal approximation ratio of $(1-1/e)$. The guarantees of our algorithms hold for linear models of influence. While this class does not include models such as the independent cascade and the linear threshold model, it includes the well-studied \emph{voter model}~\cite{holley1975ergodic}. We then use these algorithms to conduct a series of experiments to show the potential of adaptive approaches for influence maximization both on synthetic and real social networks. %The main component of the experiments involved collecting publicly %available data from Facebook on users who expressed interest (``liked'') %a certain post from a topic they follow and data on their friends. The premise %here is that such users mimic potential participants in a viral marketing %campaign. The results on these data sets suggest that adaptive seeding can %have dramatic improvements over standard influence maximization methods. \section{Framework} \noindent\textbf{Model.} Given a graph $G=(V,E)$, for $S\subseteq V$ we denote by $\neigh{S}$ the neighborhood of $S$. The notion of influence in the graph is captured by a function $f:2^{|V|}\rightarrow \reals_+$ mapping a subset of nodes to a non-negative influence value. In this work, we focus on linear influence functions: $f(S) = \sum_{u\in S} w_u$ where $(w_u)_{u\in V}$ are non-negative weights capturing the influence of individual vertices. The input of the \emph{adaptive seeding} problem is a \emph{core set} of nodes $X\subseteq V$ and for any node $u\in\neigh{X}$ a probability $p_u$ that $u$ realizes if one of its neighbor in $X$ is seeded. The goal is to solve: \begin{equation}\label{eq:problem} \begin{split} &\max_{S\subseteq X} \sum_{R\subseteq\neigh{S}} p_R \max_{\substack{T\subseteq R\\|T|\leq k-|S|}}f(T)\\ &\text{s.t. }|S|\leq k \end{split} \end{equation} where $p_R$ is the probability that the set $R$ realizes, $p_R \defeq \prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u)$. Intuitively, we want to select at most $k$ nodes in the core set $X$ such that the expected maximum influence which can be derived from the set $R$ of neighbors realizing using the remaining budget is maximal.\newline \noindent\textbf{Non-adaptive Optimization.} We say that a policy is \emph{non-adaptive} if it selects a set of nodes $S \subseteq X$ to be seeded in the first stage and a vector of probabilities $\mathbf{q}\in[0,1]^n$, such that each neighbor $u$ of $S$ which realizes is included in the solution independently with probability $q_u$. The constraint will now be that the budget is only respected in expectation, \emph{i.e.} $|S| + \textbf{p}^T\textbf{q} \leq k$. Formally the optimization problem for non-adaptive policies can be written as: \begin{equation}\label{eq:relaxed} \begin{split} \max_{\substack{S\subseteq X\\\textbf{q}\in[0,1]^n} }& \; \sum_{R\subseteq\neigh{X}} \Big (\prod_{u\in R} p_uq_u\prod_{u\in\neigh{X}\setminus R}(1-p_uq_u) \Big ) f(R)\\ \text{s.t. } & \; |S|+\textbf{p}^T\textbf{q} \leq k,\; q_u \leq \mathbf{1}\{u\in\neigh{S}\} \end{split} \end{equation} where $\mathbf{1}\{E\}$ is the indicator variable of the event $E$. Non-adaptive policies are related to adaptive policies: \begin{proposition}\label{prop:cr} Let $(S,\textbf{q})$ be an $\alpha$-approximate solution to \eqref{eq:relaxed}, then $S$ is an $\alpha$-approximate solution to \eqref{eq:problem}. \end{proposition} \begin{figure}[t] \centerline{ \includegraphics[width=0.4\textwidth]{images/comp2.pdf} } \vspace{-10pt} \caption{\small{Ratio of the performance of adaptive seeding to \textsf{IM}. Bars represents the mean improvement across all verticals, and the ``error bars'' represents the range of improvement across verticals.}} \label{fig:compare} \vspace{-15pt} \end{figure} \section{Algorithms} Proposition~\ref{prop:cr} allows us to focus on designing non-adaptive policies for \eqref{eq:relaxed} which is easier to solve than \eqref{eq:problem}. Our first algorithm is obtained by considering a relaxation of \eqref{eq:relaxed} where the binary choices of including vertices inĀ $S$ are relaxed to fractional values. The solution must then be rounded using the Pipage Rounding framework. The second algorithm is combinatorial: first, we note that for additive influence functions and for fixed $S$, the maximization over $\mathbf{q}$ in \eqref{eq:relaxed} is a simple fractional knapsack problem which can be solved efficiently. Furthermore, the optimal value of this problem is a monotone submodular function of $S$. Our algorithm can thus be obtained by applying the celebrated greedy algorithm for monotone submodular maximization where we repeatedly solve fractional knapsack problems when greedily constructing the solution. Both algorithms achieve an optimal $(1-1/e)$ approximation ratio. The first algorithm is extremely efficient over instances where there is a large budget. The second algorithm can be easily parallelized and implemented in MapReduce, has good theoretical guarantees on its running time and does well on instances with smaller budgets. \section{Experiments} The main component of our experiments involved collecting publicly available data from Facebook. Despite the extreme difficulty of collecting such data, we were able to collect large networks. For 10 several Facebook Pages, each associated with a commercial entity that uses the Facebook page to communicate with its followers, we selected a post and then collected data about the users who expressed interest (``liked'') the post and their friends. The advantage of this data set is that it is highly representative of the scenario we study here. We focused on posts which were liked by about 1,000 users, which when we include their friends, leads to networks of about 100,000 users. \begin{figure}[t] \begin{subfigure}[t]{0.23\textwidth} \includegraphics[scale=0.48]{images/prob.pdf} \vspace{-15pt} \caption{} \label{fig:prob} \end{subfigure} \hspace{1pt} \begin{subfigure}[t]{0.23\textwidth} \includegraphics[scale=0.48]{images/hbo_likes.pdf} \vspace{-15pt} \caption{} \label{fig:killer} \end{subfigure} \vspace{-10pt} \caption{\small{(a) Performance of adaptive seeding for various propagation probabilities. (b) Performance of \emph{adaptive seeding} when restricted to the subgraph of users who \emph{liked} HBO (red line).}} \vspace{-15pt} \end{figure} Figure~\ref{fig:compare} compares the performance of our approach to running the standard influence maximization (IM) approach to the core set. Figure~\ref{fig:prob} shows the impact of the probability of neighbors realizing, while Figure~\ref{fig:killer} shows the performance of adaptive seeding when restricted to users who previously expressed interest in the vertical and for which we could expect the probability of realizing to be close to one. These results suggest that adaptive seeding can have dramatic improvements over standard IM. \cite{full} contains additional experiments to analyze the impact of various parameters as well as evaluations on synthetic data.\newline \noindent\textbf{Acknowledgement.} This research is supported in part by a Google Research Grant and NSF grant CCF-1301976. \bibliographystyle{abbrv} \bibliography{main} %\balancecolumns \end{document}