summaryrefslogtreecommitdiffstats
path: root/poster_abstract/main.tex
blob: e5688ff0153d2696b9568de0214eb396bf73a38d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
\documentclass{sig-alternate-2013}
\pdfpagewidth=8.5in
\pdfpageheight=11in
\usepackage[utf8x]{inputenc}

\usepackage{amsmath, amsfonts, amssymb, bbm}
\usepackage{verbatim}
\newcommand{\reals}{\mathbb{R}}
\newcommand{\ints}{\mathbb{N}}
\renewcommand{\O}{\mathcal{O}}
\DeclareMathOperator{\E}{\mathbb{E}}
\let\P\relax
\DeclareMathOperator{\P}{\mathbb{P}}
\newcommand{\ex}[1]{\E\left[#1\right]}
\newcommand{\prob}[1]{\P\left[#1\right]}
\newcommand{\inprod}[2]{#1 \cdot #2}
\newcommand{\neigh}[1]{\mathcal{N}(#1)}
\newcommand{\defeq}{\equiv}
\DeclareMathOperator*{\argmax}{argmax}
\DeclareMathOperator*{\argmin}{argmin}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{remark}{Remark}
\newtheorem{proposition}{Proposition}
\newtheorem{definition}{Definition}

\permission{Permission to make digital or hard copies of part or all of this
work for personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage, and that copies
bear this notice and the full citation on the first page. Copyrights for
third-party components of this work must be honored. For all other uses,
contact the owner/author(s).  Copyright is held by the author/owner(s).}
\conferenceinfo{WWW 2015 Companion,}{May 18--22, 2015, Florence, Italy.}
\copyrightetc{ACM \the\acmcopyr}
\crdata{978-1-4503-3473-0/15/05. \\
http://dx.doi.org/10.1145/2740908.2744108}

\clubpenalty=10000
\widowpenalty = 10000

\title{Scalable Methods for Adaptively Seeding a Social Network\titlenote{The
full version of this work is available as~\cite{full}.}}
\numberofauthors{2}
\author{
\alignauthor
Thibaut Horel\\
       \affaddr{Harvard University}\\
       \email{thorel@seas.harvard.edu}
\alignauthor
Yaron Singer\\
       \affaddr{Harvard University}\\
       \email{yaron@seas.harvard.edu}
}

\begin{document}

\maketitle

\begin{abstract}
In many applications of influence maximization, one is restricted to select
influencers from a set of users who engaged with the topic being promoted, and
due to the structure of social networks, these users often rank low in terms of
their influence potential. To alleviate this issue, one can consider an
adaptive method which selects users in a manner which targets their influential
neighbors. The advantage of such an approach is that it leverages the
friendship paradox in social networks: while users are often not influential,
they often know someone who is.  

Despite the various complexities in such optimization problems, we show that
scalable adaptive seeding is achievable. To show the effectiveness of our
methods we collected data from various verticals social network users follow,
and applied our methods on it.  Our experiments show that adaptive seeding is
scalable, and importantly, that it obtains dramatic improvements over standard
approaches of information dissemination.
\end{abstract}

\category{H.2.8}{Database Management}{Database Applications}[Data Mining]
\category{F.2.2}{Analysis of Algorithms and Problem Complexity}{Nonnumerical
Algorithms and Problems}

\section{Introduction}

Influence Maximization~\cite{KKT03, DR01} is the algorithmic challenge of
selecting a fixed number of individuals who can serve as early adopters of
a new idea, product, or technology in a manner that will trigger a large
cascade in the social network.  In many cases where influence maximization
methods are applied one cannot select any user in the network but is limited to
some subset of users. In general, we will call the \emph{core set} the set of
users an influence maximization campaign can access.  When the goal is to
select influential users from the core set, the laws governing social
networks can lead to poor outcomes: due to the heavy-tailed degree
distribution of social networks, high degree nodes are rare, and since
influence maximization techniques often depend on the ability to select high
degree nodes, a naive application of influence maximization techniques to the
core set is ineffective.

\begin{figure}
    \centering
    \includegraphics[scale=0.55]{images/dist.pdf}
    \vspace{-20pt}
    \caption{CDF of the degree distribution of users who liked a post by Kiva
    on Facebook and that of their friends.}
    \label{fig:para}
    \vspace{-15pt}
\end{figure}

An alternative method recently introduced in~\cite{singer} is a two-stage
approach called adaptive seeding. In the first stage, one can spend a fraction
of the budget on the core users so that they invite their friends to
participate in the campaign, then in the second stage spend the rest of the
budget on the influential friends who hopefully have arrived.  The idea behind
this approach is to leverage a structural phenomenon in social networks known
as the friendship paradox~\cite{feld1991}: even though individuals are not
likely to have many friends, they likely have a friend that does (``your
friends have more friends than you''). Figure~\ref{fig:para} gives an
example of such an effect on Facebook.

In this work, we present efficient algorithms for adaptive seeding achieving an
optimal approximation ratio of $(1-1/e)$.  The guarantees of our algorithms
hold for linear models of influence. While this class does not include models
such as the independent cascade and the linear threshold model, it includes the
well-studied \emph{voter model}~\cite{holley1975ergodic}. We then use these
algorithms to conduct a series of experiments to show the potential of adaptive
approaches for influence maximization both on synthetic and real social
networks.

%The main component of the experiments involved collecting publicly
%available data from Facebook on users who expressed interest (``liked'')
%a certain post from a topic they follow and data on their friends.  The premise
%here is that such users mimic potential participants in a viral marketing
%campaign.  The results on these data sets suggest that adaptive seeding can
%have dramatic improvements over standard influence maximization methods.

\section{Framework}

\noindent\textbf{Model.} Given a graph $G=(V,E)$, for $S\subseteq  V$ we denote by $\neigh{S}$ the
neighborhood of $S$. The notion of influence in the graph is captured by
a function $f:2^{|V|}\rightarrow \reals_+$ mapping a subset of nodes to
a non-negative influence value. In this work, we focus on linear influence
functions: $f(S) = \sum_{u\in S} w_u$ where $(w_u)_{u\in V}$ are non-negative
weights capturing the influence of individual vertices. The input of the
\emph{adaptive seeding} problem is a \emph{core set} of nodes $X\subseteq V$
and for any node $u\in\neigh{X}$ a probability $p_u$ that $u$ realizes if one
of its neighbor in $X$ is seeded. The goal is to solve the following problem:
\begin{equation}\label{eq:problem}
    \begin{split}
        &\max_{S\subseteq X} \sum_{R\subseteq\neigh{S}} p_R
        \max_{\substack{T\subseteq R\\|T|\leq k-|S|}}f(T)\\
        &\text{s.t. }|S|\leq k
    \end{split}
\end{equation}
where $p_R$ is the probability that the set $R$ realizes, $p_R \defeq
\prod_{u\in R}p_u\prod_{u\in\neigh{S}\setminus R}(1-p_u)$. Intuitively, we want
to select at most $k$ nodes in the core set $X$ such that the expected maximum
influence which can be derived from the set $R$ of neighbors realizing using
the remaining budget is maximal.\newline

\noindent\textbf{Non-adaptive Optimization.} We say that a policy is
\emph{non-adaptive} if it selects a set of nodes $S \subseteq X$ to be seeded
in the first stage and a vector of probabilities $\mathbf{q}\in[0,1]^n$, such
that each neighbor $u$ of $S$ which realizes is included in the solution
independently with probability $q_u$.  The constraint will now be that the
budget is only respected in expectation, \emph{i.e.}  $|S|
+ \textbf{p}^T\textbf{q} \leq k$. Formally the optimization problem for
non-adaptive policies can be written as:
\begin{equation}\label{eq:relaxed}
    \begin{split}
        \max_{\substack{S\subseteq X\\\textbf{q}\in[0,1]^n} }& \;
    \sum_{R\subseteq\neigh{X}} \Big (\prod_{u\in R} p_uq_u\prod_{u\in\neigh{X}\setminus
    R}(1-p_uq_u) \Big )
 f(R)\\
    \text{s.t. } & \; |S|+\textbf{p}^T\textbf{q} \leq k,\;
q_u \leq \mathbf{1}\{u\in\neigh{S}\}
\end{split}
\end{equation}
where $\mathbf{1}\{E\}$ is the indicator variable of the event $E$.
Non-adaptive policies are related to adaptive policies:
\begin{proposition}\label{prop:cr}
    Let $(S,\textbf{q})$ be an $\alpha$-approximate solution to
    \eqref{eq:relaxed}, then $S$ is an $\alpha$-approximate solution to
    \eqref{eq:problem}.
\end{proposition}  

\bibliographystyle{abbrv}
\bibliography{main}

%\balancecolumns
\end{document}