\documentclass{sig-alternate-2013} \pdfpagewidth=8.5in \pdfpageheight=11in \usepackage[utf8x]{inputenc} \usepackage{amsmath, amsfonts, amssymb, bbm} \usepackage{verbatim} \newcommand{\reals}{\mathbb{R}} \newcommand{\ints}{\mathbb{N}} \renewcommand{\O}{\mathcal{O}} \DeclareMathOperator{\E}{\mathbb{E}} \let\P\relax \DeclareMathOperator{\P}{\mathbb{P}} \newcommand{\ex}[1]{\E\left[#1\right]} \newcommand{\prob}[1]{\P\left[#1\right]} \newcommand{\inprod}[2]{#1 \cdot #2} \newcommand{\neigh}[1]{\mathcal{N}(#1)} \newcommand{\defeq}{\equiv} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} \newtheorem{theorem}{Theorem} \newtheorem{lemma}{Lemma} \newtheorem{corollary}{Corollary} \newtheorem{remark}{Remark} \newtheorem{proposition}{Proposition} \newtheorem{definition}{Definition} \permission{Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s).} \conferenceinfo{WWW 2015 Companion,}{May 18--22, 2015, Florence, Italy.} \copyrightetc{ACM \the\acmcopyr} \crdata{978-1-4503-3473-0/15/05. \\ http://dx.doi.org/10.1145/2740908.2744108} \clubpenalty=10000 \widowpenalty = 10000 \title{Scalable Methods for Adaptively Seeding a Social Network\titlenote{The full version of this work is available as~\cite{full}.}} \numberofauthors{2} \author{ \alignauthor Thibaut Horel\\ \affaddr{Harvard University}\\ \email{thorel@seas.harvard.edu} \alignauthor Yaron Singer\\ \affaddr{Harvard University}\\ \email{yaron@seas.harvard.edu} } \begin{document} \maketitle \begin{abstract} In many applications of influence maximization, one is restricted to select influencers from a set of users who engaged with the topic being promoted, and due to the structure of social networks, these users often rank low in terms of their influence potential. To alleviate this issue, one can consider an adaptive method which selects users in a manner which targets their influential neighbors. The advantage of such an approach is that it leverages the friendship paradox in social networks: while users are often not influential, they often know someone who is. Despite the various complexities in such optimization problems, we show that scalable adaptive seeding is achievable. To show the effectiveness of our methods we collected data from various verticals social network users follow, and applied our methods on it. Our experiments show that adaptive seeding is scalable, and importantly, that it obtains dramatic improvements over standard approaches of information dissemination. \end{abstract} \category{H.2.8}{Database Management}{Database Applications}[Data Mining] \category{F.2.2}{Analysis of Algorithms and Problem Complexity}{Nonnumerical Algorithms and Problems} \section{Introduction} Influence Maximization~\cite{KKT03, DR01} is the algorithmic challenge of selecting a fixed number of individuals who can serve as early adopters of a new idea, product, or technology in a manner that will trigger a large cascade in the social network. In many cases where influence maximization methods are applied one cannot select any user in the network but is limited to some subset of users. In general, we will call the \emph{core set} the set of users an influence maximization campaign can access. When the goal is to select influential users from the core set, the laws governing social networks can lead to poor outcomes: due to the heavy-tailed degree distribution of social networks, high degree nodes are rare, and since influence maximization techniques often depend on the ability to select high degree nodes, a naive application of influence maximization techniques to the core set is ineffective. \begin{figure} \centering \includegraphics[scale=0.55]{images/dist.pdf} \vspace{-20pt} \caption{CDF of the degree distribution of users who liked a post by Kiva on Facebook and that of their friends.} \label{fig:para} \vspace{-15pt} \end{figure} An alternative method recently introduced in~\cite{singer} is a two-stage approach called adaptive seeding. In the first stage, one can spend a fraction of the budget on the core users so that they invite their friends to participate in the campaign, then in the second stage spend the rest of the budget on the influential friends who hopefully have arrived. The idea behind this approach is to leverage a structural phenomenon in social networks known as the friendship paradox~\cite{feld1991}: even though individuals are not likely to have many friends, they likely have a friend that does (``your friends have more friends than you''). Figure~\ref{fig:para} gives an example of such an effect on Facebook. In this work, we present efficient algorithms for adaptive seeding achieving an optimal approximation ratio of $(1-1/e)$. The guarantees of our algorithms hold for linear models of influence. While this class does not include models such as the independent cascade and the linear threshold model, it includes the well-studied \emph{voter model}~\cite{holley1975ergodic}. We then use these algorithms to conduct a series of experiments to show the potential of adaptive approaches for influence maximization both on synthetic and real social networks. %The main component of the experiments involved collecting publicly %available data from Facebook on users who expressed interest (``liked'') %a certain post from a topic they follow and data on their friends. The premise %here is that such users mimic potential participants in a viral marketing %campaign. The results on these data sets suggest that adaptive seeding can %have dramatic improvements over standard influence maximization methods. \section{Framework} \noindent\textbf{Model.} Given a graph $G=(V,E)$, for $S\subseteq V$ we denote by $\neigh{S}$ the neighborhood of $S$. The notion of influence in the graph is captured by a function $f:2^{|V|}\rightarrow \reals_+$ mapping a subset of nodes to a non-negative influence value. In this work, we focus on linear influence functions: $f(S) = \sum_{u\in S} w_u$ where $(w_u)_{u\in V}$ are non-negative weights capturing the influence of individual vertices. The input of the \emph{adaptive seeding} problem is a \emph{core set} of nodes $X\subseteq V$ and for any node $u\in\neigh{X}$ a probability $p_u$ that $u$ realizes if one of its neighbor in $X$ is seeded. The goal is to solve the following problem: \begin{equation}\label{eq:problem} \begin{split} &\max_{S\subseteq X} \sum_{R\subseteq\neigh{S}} p_R \max_{\substack{T\subseteq R\\|T|\leq k-|S|}}f(T)\\ &\text{s.t. }|S|\leq k \end{split} \end{equation} where $p_R$ is the probability that the set $R$ realizes, $p_R \defeq \prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u)$. Intuitively, we want to select at most $k$ nodes in the core set $X$ such that the expected maximum influence which can be derived from the set $R$ of neighbors realizing using the remaining budget is maximal.\newline \noindent\textbf{Non-adaptive Optimization.} We say that a policy is \emph{non-adaptive} if it selects a set of nodes $S \subseteq X$ to be seeded in the first stage and a vector of probabilities $\mathbf{q}\in[0,1]^n$, such that each neighbor $u$ of $S$ which realizes is included in the solution independently with probability $q_u$. The constraint will now be that the budget is only respected in expectation, \emph{i.e.} $|S| + \textbf{p}^T\textbf{q} \leq k$. Formally the optimization problem for non-adaptive policies can be written as: \begin{equation}\label{eq:relaxed} \begin{split} \max_{\substack{S\subseteq X\\\textbf{q}\in[0,1]^n} }& \; \sum_{R\subseteq\neigh{X}} \Big (\prod_{u\in R} p_uq_u\prod_{u\in\neigh{X}\setminus R}(1-p_uq_u) \Big ) f(R)\\ \text{s.t. } & \; |S|+\textbf{p}^T\textbf{q} \leq k,\; q_u \leq \mathbf{1}\{u\in\neigh{S}\} \end{split} \end{equation} where $\mathbf{1}\{E\}$ is the indicator variable of the event $E$. Non-adaptive policies are related to adaptive policies: \begin{proposition}\label{prop:cr} Let $(S,\textbf{q})$ be an $\alpha$-approximate solution to \eqref{eq:relaxed}, then $S$ is an $\alpha$-approximate solution to \eqref{eq:problem}. \end{proposition} \bibliographystyle{abbrv} \bibliography{main} %\balancecolumns \end{document}