In this section we validate the adaptive seeding approach experimentally.
Specifically, we show that our algorithms for adaptive seeding obtain
significant improvements over standard influence maximization, that these
improvements are robust to changes in the model parameters, and that our
approach is efficient in terms of running time and scales to large social
networks.


\subsection{Experimental setup}
We tested our algorithms on three types of datasets, each of which allows us to
experiment with a different aspect of the adaptive seeding problem. The Facebook
Pages dataset that we collected ourselves plays a central role in our
experiments since it is the closest to actual applications of
adaptive seeding.

\textbf{Synthetic networks.} We generated large-scale graphs using standard
models of social networks. To emulate the process of users following a topic
(the core set $X$), we sampled subsets of nodes at random and applied our
algorithms to the sample and their neighbors.
The main advantage of these datasets is that they allow us to generate graphs
of arbitrary sizes and to experiment with the various parameters that govern the
structure of the graph. The disadvantages are that users who follow a topic
are not necessarily random samples, and that real social networks often have
structural properties that are not captured by generative models.

\textbf{Real networks.}  We used publicly available datasets of real social
networks available at \cite{snapnets}. As with synthetic networks, we used
a random sample of nodes to emulate users who follow a topic, which is the main
disadvantage of this approach. The advantage, however, is that such datasets
contain the entire network, which allows testing different propagation
parameters.
  
\textbf{Facebook Pages.}  We collected data from several Facebook Pages, each
associated with a commercial entity that uses the page to communicate
with its followers. For each page, we selected a post and collected data
about the users who expressed interest in (``liked'') the post, as well as
their friends. The advantage of this dataset is that it is highly
representative of the scenario we study here: campaigns run on a social
network will primarily target users who have already expressed interest in the
topic being promoted. The main disadvantage is that such data is extremely
difficult to collect due to the crawling restrictions that Facebook applies,
and it gives us only the 2-hop neighborhood around a post, which makes it
difficult to experiment with different propagation parameters. Fortunately, as
we discuss below, we were able to circumvent some of the crawling restrictions
and collect large networks, and the properties of the voter influence model are
such that these datasets suffice to accurately account for influence
propagation in the graph.\newline


\begin{figure}[t]
    \centering
    \includegraphics[width=0.4\textwidth]{images/para.pdf}
    \caption{Comparison of the average degree of the core set users and the
    average degree of their friends.}
    \label{fig:paradox}
    \vspace{-10pt}
\end{figure}

\noindent\textbf{Data collection.}
We selected Facebook Pages in different verticals (topics). Each page is
operated by an institution or an entity whose associated Facebook Page is
regularly used for promotional posts related to this topic. On each of these
pages, we selected a recent post (posted no later than January 2014) with
approximately 1,000 \emph{likes}. The set of users who liked those posts
constitute our core set.  We then crawled the social network of
these sets: for each user, we collected her list of friends, and the degrees
(number of friends) of these friends.\newline


\noindent\textbf{Data description.} Among the verticals we collected, we
selected eight for which we report our results; we obtained similar results
for the others.  Table~\ref{tab:data} summarizes statistics
about the selected verticals.  We note that depending on the privacy settings
of the core set users, it was not always possible to access their list of
friends.  We decided to remove these users since their ability to spread
information could not be readily determined. This effect, combined with various
errors encountered during the data collection, accounts for an approximately
15\% reduction between the number of users who liked a post and the number of
users in the datasets we used.  Following our discussion in the introduction,
we observe that on average, the degrees of core set users are much lower than
the degrees of their friends. This is highlighted in Figure~\ref{fig:paradox}
and justifies our approach.
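The friendship paradox underlying this observation is easy to reproduce. The following sketch (our own illustration, not part of the experimental pipeline) samples a core set uniformly at random from a toy preferential-attachment graph and compares the average degree of the sample to the average degree of its neighbors:

```python
import random
from collections import defaultdict

def preferential_attachment_graph(n, m=3, seed=0):
    """Toy preferential-attachment graph: each new node links to m
    existing nodes chosen proportionally to their current degree."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    targets = list(range(m))      # the first node attaches to the seed nodes
    repeated = []                 # node multiset weighted by degree
    for v in range(m, n):
        for u in set(targets):
            adj[v].add(u)
            adj[u].add(v)
            repeated.extend([u, v])
        targets = [rng.choice(repeated) for _ in range(m)]
    return adj

adj = preferential_attachment_graph(5000)
rng = random.Random(1)
core = rng.sample(sorted(adj), 500)   # emulates users who follow a topic
core_deg = sum(len(adj[v]) for v in core) / len(core)
friends = [u for v in core for u in adj[v]]
friend_deg = sum(len(adj[u]) for u in friends) / len(friends)
assert friend_deg > core_deg          # "your friends have more friends"
```

On heavy-tailed graphs the gap is typically large; on near-regular graphs (such as the Small-World model discussed later) it essentially disappears.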

\begin{table}[t]
    \small
    \centering
    \setlength{\tabcolsep}{3pt}
    \begin{tabular}{llrr}
    \toprule
    Vertical & Page & $m$ & $n$ \\%& $S$ & $F$\\
    \midrule
    Charity & Kiva & 978 & 131334 \\%& 134.29 & 1036.26\\
    Travel & Lonely Planet & 753 & 113250 \\%& 150.40 & 898.50\\
    %Public Action & LaManifPourTous & 1041 & 97959 \\%& 94.10 & 722.02\\
    Fashion & GAP & 996 & 115524 \\%& 115.99 & 681.98\\
    Events & Coachella & 826 & 102291 \\%& 123.84 & 870.16\\
    Politics & Green Party & 1044 & 83490 \\%& 79.97 & 1053.25\\
    Technology & Google Nexus & 895 & 137995 \\%& 154.19 & 827.84\\
    News & The New York Times & 894 & 156222 \\%& 174.74 & 1033.94 \\
    %Consumption & Peet's & 776 & 56268 \\%& 72.51 & 520.47\\
    Entertainment & HBO & 828 & 108699 \\%& 131.28 & 924.09\\
    \bottomrule
\end{tabular}
\caption{Dataset statistics. $m$: number of users in the core set, $n$: number
of friends of core set users.}
%$S$: avg. degree of an initial user, $F$: avg. degree of a friend of an initial user.}
\label{tab:data}
    \vspace{-10pt}
\end{table}

\subsection{Performance of Adaptive Seeding}
\label{sec:performance} For a given problem instance with a budget of $k$ we
applied the adaptive seeding algorithm (the combinatorial version). Recall from
Section~\ref{sec:model} that performance is defined as the expected influence
that the seeder can obtain by optimally selecting users on the second stage,
where \emph{influence} is defined as the sum of the degrees of the selected
users. We tested our algorithm against the following benchmarks:

\begin{itemize}
    \item \emph{Random Node} (\textsf{RN}): we randomly select $k$ users from
        the core set.  This is a typical benchmark in comparing influence
        maximization algorithms~\cite{KKT03}.
   \item \emph{Influence Maximization} (\textsf{IM}): we apply the optimal
       influence maximization algorithm on the core set.  This is the naive
       application of influence maximization. For the voter model, when the
       propagation time is polynomially large in the network size, the optimal
       solution is to simply take the $k$ highest degree nodes~\cite{even-dar}.
       We study the case of bounded time horizons in Section~\ref{sec:inf}.
    \item \emph{Random Friend} (\textsf{RF}): we implement a naive two-stage approach:
        randomly select $k/2$ nodes from the core set, and for each
        node select a random neighbor (hence spending the budget of $k$ rewards
        overall).  This method was recently shown to outperform standard
        influence maximization when the core set is random~\cite{LS13}.
\end{itemize}
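On a toy instance, the three benchmarks can be sketched as follows (the graph, budget, and helper names are hypothetical and purely illustrative; influence is measured by the sum-of-degrees proxy used throughout this section):

```python
import random

rng = random.Random(0)

# Hypothetical toy instance: adjacency sets, with core set X = {0, 1, 2, 3}.
adj = {
    0: {4, 5}, 1: {5}, 2: {6}, 3: {6},
    4: {0, 5, 6, 7}, 5: {0, 1, 4}, 6: {2, 3, 4, 7}, 7: {4, 6},
}
core = [0, 1, 2, 3]
deg = {v: len(adj[v]) for v in adj}

def influence(S):
    """Sum-of-degrees proxy for the influence of a selected set."""
    return sum(deg[v] for v in S)

k = 2  # budget

# RN: k core users chosen uniformly at random.
rn = rng.sample(core, k)

# IM: the k highest-degree core users (optimal for the voter model with
# polynomially long propagation time).
im = sorted(core, key=deg.get, reverse=True)[:k]

# RF: seed k/2 random core users and pick a random friend of each,
# spending the budget of k rewards overall; the friends are the influencers.
seeds = rng.sample(core, k // 2)
rf = [rng.choice(sorted(adj[v])) for v in seeds]
```

Adaptive seeding, by contrast, chooses which core users to seed based on the degrees of the friends they can recruit, rather than recruiting friends blindly as \textsf{RF} does.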


\begin{figure*}[t]
    \centerline{\includegraphics[width=0.99\textwidth]{images/perf2.pdf}}
    \vspace{-5pt}
    \caption{\small{Performance of adaptive seeding compared to other influence
    maximization approaches. The horizontal axis represents the budget used as
a fraction of the size of the core set. The vertical axis is the
expected influence reachable by optimally selecting nodes on the second stage.}}
    \label{fig:performance}
    \vspace{-10pt}
\end{figure*}

\subsection{Performance on Facebook Pages} Figure~\ref{fig:performance}
compares the performance of \emph{adaptive seeding}, our own approach, to the
aforementioned benchmarks for all the verticals we collected.  In this first
experiment we made simplifying assumptions about the parameters of the model.
The first assumption is that all probabilities in the adaptive seeding model
are equal to one. This implicitly assumes that every friend of a user who
followed a certain topic is interested in promoting the topic given a reward.
Although this is a strong assumption that we will revisit, we note that the
probabilities can be controlled to some extent by the social networking service
on which the campaign is being run, by prominently showing the campaign
material (sponsored links, fund-raising banners, etc.).  The second assumption
is that the measure of influence is the sum of the degrees of the selected set.
This measure is an appealing proxy, as it is known that in the voter model,
after polynomially many time steps, the influence of each node is proportional
to its degree with high probability~\cite{even-dar}. Since the influence
process cannot be controlled by the designer, the assumption is often that the
influence process runs until it stabilizes (in the linear threshold and
independent cascade models, for example, the process terminates after a linear
number of steps~\cite{KKT03}).  We perform a set of experiments for different
time horizons in Section~\ref{sec:inf}.

 

It is striking how well adaptive seeding does in comparison to the other
methods.  Even with a small budget (a 0.1 fraction of the core set, which
in these cases is about 100 nodes), adaptive seeding improves influence by
a factor of at least 10 across all verticals.  To confirm this, we plot the
relative improvement of \emph{adaptive seeding} over \textsf{IM} in aggregate
over the different pages.  The results are shown in Figure~\ref{fig:compare}.
This dramatic improvement is largely due to the friendship paradox phenomenon
that adaptive seeding leverages.  Returning to Figure~\ref{fig:performance}, it
is also interesting to note that the \textsf{RF} heuristic significantly
outperforms the standard \textsf{IM} benchmark.  Using the same budget, the
degree gain induced by moving from the core set to its neighborhood is such
that selecting at random among the core set users' friends already does better
than the best heuristic restricted to the core set. Using \emph{adaptive
seeding} to optimize the choice of core set users based on their friends'
degrees then results in an order-of-magnitude increase over \textsf{RF},
consistently across all pages.\newline

\begin{figure}[t]
    \vspace{-10pt}
    \centerline{ \includegraphics[width=0.4\textwidth]{images/comp2.pdf} }
    \vspace{-10pt}
    \caption{Ratio of the performance of adaptive seeding to \textsf{IM}. Bars represent the mean improvement across all verticals, and the ``error bars'' represent the range of improvement across verticals.}
    \label{fig:compare}
    \vspace{-15pt}
\end{figure}

\subsection{The Effect of the Probabilistic Model}
\label{sec:robustness}

The results presented in Section~\ref{sec:performance} were computed assuming
the probabilities in the adaptive seeding model are one.  We now describe
several experiments we performed with the Facebook Pages data set that test the
advantages of adaptive seeding under different probability models.\newline 


\noindent\textbf{Impact of the Bernoulli parameter.} 
Figure~\ref{fig:prob} shows the impact of the probability of nodes realizing in
the second stage.  We computed the performance of \emph{adaptive seeding} when
each friend of a seeded user in the core set joins during the second stage
independently with probability $p$, for different values of $p$.  We call $p$
the \emph{Bernoulli} parameter, since the event that a given user joins on the
second stage of adaptive seeding is governed by a Bernoulli variable with
parameter $p$. We see that even with $p=0.01$, \emph{adaptive seeding} still
outperforms \textsf{IM}.  As $p$ increases, the performance of \emph{adaptive
seeding} quickly increases and reaches $80\%$ of the values of
Figure~\ref{fig:performance} at $p=0.5$.\newline
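This experiment can be emulated with a simple Monte Carlo sketch (the degrees and budget below are hypothetical; the second stage is modeled as optimally picking the $k$ highest-degree realized friends):

```python
import random

rng = random.Random(0)

# Hypothetical degrees of the friends reachable from the seeded core users.
friend_deg = [rng.randint(1, 200) for _ in range(300)]
k = 10  # second-stage budget

def expected_influence(p, trials=2000):
    """Estimate the expected influence when each friend joins the second
    stage independently with probability p and the seeder then optimally
    selects the k highest-degree realized friends."""
    total = 0.0
    for _ in range(trials):
        realized = [d for d in friend_deg if rng.random() < p]
        total += sum(sorted(realized, reverse=True)[:k])
    return total / trials

vals = [expected_influence(p) for p in (0.01, 0.1, 0.5, 1.0)]
```

Performance rises steeply for small $p$ and then saturates, consistent with Figure~\ref{fig:prob}: once enough high-degree friends realize, additional realizations barely change the optimal second-stage choice.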

\begin{figure}[t]
    \begin{subfigure}[t]{0.23\textwidth}
    \includegraphics[scale=0.48]{images/prob.pdf}
    \vspace{-10pt}
    \caption{}
    \label{fig:prob}
\end{subfigure}
\hspace{1pt}
\begin{subfigure}[t]{0.23\textwidth}
    \includegraphics[scale=0.48]{images/hbo_likes.pdf}
    \caption{}
    \label{fig:killer}
     \end{subfigure}
 \vspace{-5pt}
 \caption{\small{(a) Performance of adaptive seeding for various propagation
     probabilities. (b) Performance of \emph{adaptive seeding} when restricted
 to the subgraph of users who \emph{liked} HBO (red line).}}
 \vspace{-20pt}
\end{figure}

\noindent\textbf{Coarse estimation of probabilities.}  
In practice, the probability that a user is interested in promoting a campaign
that her friend is promoting may vary. However, for those who have already
expressed interest in the promoted content, we can expect this probability to
be close to one.  We therefore conducted the following experiment: we chose
a page (HBO) and trimmed the social graph we collected by keeping on the
second stage only those users who listed this page in their interests.  This
is a coarse estimation of the probabilities, as it assumes that a friend who
follows HBO will be willing to promote it with probability 1 (given a reward),
and that otherwise the probability of her promoting anything for HBO is 0.
Figure~\ref{fig:killer} shows that even on this very restricted set of users,
\emph{adaptive seeding} still outperforms \textsf{IM} and reaches approximately
$50\%$ of the performance of unrestricted adaptive seeding.\newline

\noindent\textbf{Impact of the probability distribution.} In order to test
scenarios where users have a rich spectrum of probabilities of realizing on the
second stage, we consider a setting where the Bernoulli parameter $p$ is drawn
from a distribution. We considered four different distributions; for each
distribution, and for fixed values of the budget and the parameter $p$, we
tuned the parameters of the distribution so that its mean is exactly $p$. We
then plotted the performance as a function of the budget and the mean $p$.

For the Beta distribution, we fixed $\beta=5$ and tuned the $\alpha$ parameter
to obtain a mean of $p$, thus obtaining a unimodal distribution. For the normal
distribution, we chose a standard deviation of $0.01$ to obtain a distribution
more concentrated around its mean than the Beta distribution. Finally, for the
inverse degree distribution, we took the probability of a node joining on
the second stage to be proportional to the inverse of its degree (scaled so that on
average, nodes join with probability $p$).  The results are shown in
Figure~\ref{fig:bernouilli}.

We observe that the results are comparable to those obtained in the
uniform case in Figure~\ref{fig:prob}, except for the inverse degree
distribution, for which the performance is roughly halved. Recall that the
value of a user $v$ on the second stage of adaptive seeding is given by $p_v
d_v$, where $d_v$ is its degree and $p_v$ is its probability of realizing on
the second stage.  Choosing $p_v$ proportional to ${1}/{d_v}$ has the
effect of equalizing the values of the nodes on the second stage and is a
strong perturbation of the original degree distribution of the nodes available
on the second stage.
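For concreteness, the tuning and its effect can be sketched as follows (sizes and degrees are hypothetical): for the Beta distribution with $\beta=5$, a mean of $p$ requires $\alpha = \beta p/(1-p)$, while for the inverse degree distribution the scaling constant makes the second-stage value $p_v d_v$ essentially constant.

```python
import random
import statistics

rng = random.Random(0)
p = 0.3  # target mean of the Bernoulli parameter

# Beta distribution: fix beta = 5 and solve alpha / (alpha + beta) = p.
beta = 5.0
alpha = beta * p / (1 - p)
samples = [rng.betavariate(alpha, beta) for _ in range(20000)]
assert abs(statistics.mean(samples) - p) < 0.01

# Inverse degree: p_v proportional to 1/d_v, scaled so the (uncapped)
# average probability is p.
degrees = [rng.randint(1, 200) for _ in range(1000)]  # hypothetical degrees
c = p * len(degrees) / sum(1 / d for d in degrees)
p_v = [min(1.0, c / d) for d in degrees]
# The second-stage value p_v * d_v collapses to the constant c for all
# but the lowest-degree nodes: the degree heterogeneity that adaptive
# seeding exploits is washed out.
values = [pv * d for pv, d in zip(p_v, degrees)]
```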

\begin{figure*}[t!]
\centering
\begin{subfigure}[b]{0.25\textwidth}
\includegraphics[width=\textwidth]{images/beta.pdf}
\caption{Beta distribution}
\end{subfigure}
\begin{subfigure}[b]{0.25\textwidth}
\includegraphics[width=\textwidth]{images/gauss.pdf}
\caption{Normal Distribution}
\end{subfigure}
\begin{subfigure}[b]{0.24\textwidth}
\includegraphics[width=\textwidth]{images/power.pdf}
\caption{Power-law distribution}
\end{subfigure}
\begin{subfigure}[b]{0.24\textwidth}
\includegraphics[width=\textwidth]{images/deg.pdf}
\caption{Inverse degree}
\end{subfigure}
\caption{Performance of adaptive seeding as a function of the budget and the
mean of the distribution from which the Bernoulli parameters are drawn. The
details of the parameters for each distribution can be found in
Section~\ref{sec:robustness}.}
\label{fig:bernouilli}
\vspace{-10pt}
\end{figure*}

\subsection{Impact of the Influence Model}
\label{sec:inf}

The Facebook Pages dataset we collected is limited in that we only have access
to the 2-hop neighborhood around the seed users, and we use the degree of the
second-stage users as a proxy for their influence. As proved in
\cite{even-dar}, in the voter model, the influence of nodes converges to their
degree with high probability when the number of time steps becomes polynomially
large in the network size.

\begin{figure}
    \centering
    \includegraphics[width=0.48\textwidth]{images/voter.pdf}
    \vspace{-20pt}
    \caption{Performance of adaptive seeding compared to \textsf{IM} for the voter
    influence model with $t$ steps.}
    \vspace{-10pt}
    \label{fig:voter}
\end{figure}

In order to analyze the expected number of nodes influenced according to the
voter model that terminates after some fixed number of time steps, we use
publicly available data sets from \cite{snapnets} where the entire network is
at our disposal.  As discussed above, we sample nodes uniformly at random to
model the core set. We then run the voter model for $t$ time steps
to compute the influence of the second stage users. Figure~\ref{fig:voter}
shows the performance of adaptive seeding as a function of $t$ compared to the
performance of the \textsf{IM} benchmark. In this experiment, the budget was
set to half the size of the core set.
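A minimal simulation of this kind of experiment (toy graph and helper names of our own choosing) runs the voter model for $t$ synchronous steps, each node copying the opinion of a uniformly random neighbor:

```python
import random

def voter_influence(adj, seeds, t, trials=200, seed=0):
    """Expected number of nodes holding the seeds' opinion after t steps
    of the voter model (synchronous updates, Monte Carlo average)."""
    rng = random.Random(seed)
    nodes = list(adj)
    total = 0
    for _ in range(trials):
        op = {v: v in seeds for v in nodes}
        for _ in range(t):
            # Each node simultaneously copies a uniformly random neighbor.
            op = {v: op[rng.choice(adj[v])] for v in nodes}
        total += sum(op.values())
    return total / trials

# Toy graph: a star (hub 0) with a short tail; in the voter model a
# node's long-run influence is proportional to its degree.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
hub = voter_influence(adj, {0}, t=5)
leaf = voter_influence(adj, {5}, t=5)
```

Seeding the high-degree hub yields a much larger expected following than seeding the peripheral node, and the estimates stabilize after a handful of steps, consistent with the fast convergence observed on \textsf{Slashdot} and \textsf{Epinions}.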

We see that the performance of adaptive seeding quickly converges (5 time steps
for \textsf{Slashdot}, 15 time steps for \textsf{Epinions}). In practice, the
voter model converges much faster than the theoretical guarantee of
\cite{even-dar}, which justifies using the degree of the second-stage users as
a measure of influence, as we did for the Facebook Pages datasets.
Furthermore, we see that, as on the Facebook datasets, adaptive seeding
significantly outperforms \textsf{IM}.


\subsection{Performance on Synthetic Networks} 

In order to analyze the impact of topological variations we generated synthetic
graphs using standard network models. All generated graphs have $100{,}000$
vertices. For each model, we tuned the generative parameters to obtain, when
possible, a degree distribution (or graph density otherwise) similar to what we
observed in the Facebook Pages datasets.

\begin{itemize}
    \item \emph{Barabási-Albert:} this well-known model is often used to model
        social graphs because its degree distribution is a power law. We took
        10 initial vertices and added 10 vertices at each step, using the
        preferential attachment model, until we reached 100,000 vertices.
    \item \emph{Small-World:} also known as the Watts-Strogatz model. This
        model was one of the first models proposed for social networks. Its
        diameter and clustering coefficient are more representative of a social
        network than what one would get with the Erdős–Rényi model. We started
        from a regular lattice of degree 200 and rewired each edge with
        probability 0.3.
    \item \emph{Kronecker:} Kronecker graphs were more recently introduced in
        \cite{kronecker} as a scalable and easy-to-fit model for social
        networks. We started from a star graph with 4 vertices and computed
        Kronecker products until we reached 100,000 nodes.
    \item \emph{Configuration model:} The configuration model allows us to
        construct a graph with a given degree distribution. We chose a page
        (GAP) and generated a graph with the same degree distribution using the
        configuration model.
\end{itemize}
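As an illustration of the last model, a bare-bones configuration model can be sketched in a few lines (the degree sequence below is synthetic and hypothetical, standing in for the empirical GAP distribution):

```python
import random
from collections import Counter

def configuration_model(degrees, seed=0):
    """Configuration model: create deg(v) half-edge 'stubs' per node,
    shuffle, and pair consecutive stubs into edges. Self-loops and
    duplicate edges are simply dropped in this sketch."""
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges

rng = random.Random(1)
# Heavy-tailed synthetic degree sequence (hypothetical, not the GAP data).
degrees = [min(500, int(1 / (1 - rng.random()))) for _ in range(10000)]
if sum(degrees) % 2:      # the number of stubs must be even
    degrees[0] += 1
edges = configuration_model(degrees)
realized = Counter()
for u, v in edges:
    realized[u] += 1
    realized[v] += 1
```

Dropping self-loops and multi-edges slightly deflates the realized degrees, which is acceptable here since only the shape of the distribution matters.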
The performance of adaptive seeding compared to our benchmarks can be found in
Figure~\ref{fig:synth}. We note that the improvement obtained by adaptive
seeding is comparable to what we observed on real data, except for the
\emph{Small-World} model. This is explained by the nature of the model:
starting from a regular lattice, some edges are rewired at random. The
resulting graph has properties similar to a random graph, where the friendship
paradox does not hold~\cite{LS13}. Since adaptive seeding is designed to
leverage the friendship paradox, such graphs are not amenable to this approach.

\begin{figure}[t]
    \centering
    \includegraphics[width=0.48\textwidth]{images/perf_synth.pdf}
    \vspace{-15pt}
    \caption{Performance of adaptive seeding on synthetic networks.}
    \label{fig:synth}
    \vspace{-10pt}
\end{figure}

\subsection{Scalability}\label{sec:scalability}
To test the scalability of adaptive seeding we were guided by two central
questions.  First, we were interested in the benefit our sampling-free
approach has over the standard SAA method. Second, we wanted to understand
when one should prefer the LP-based approach of Section~\ref{sec:lp}
over the combinatorial one of Section~\ref{sec:comb}. The computations in
this section were run on an Intel Core i5 CPU (4$\times$2.40GHz). For each
computation, we plot the time and the number of CPU cycles it took.\newline

\noindent\textbf{Comparison with SAA.} The objective function of the
non-adaptive problem \eqref{eq:relaxed} is an expectation over exponentially
many sets: all possible realizations of the neighbors in the second stage.
Following the sampling-based approach introduced in \cite{singer}, this
expectation can be estimated by averaging the values obtained over
$O\left(n^2\right)$ independent sample realizations of the second-stage users
(where $n$ is the number of neighbors of core set users).  One important aspect
of the algorithms introduced in this paper is that in the additive case, this
expectation can be computed exactly without sampling, thus significantly
improving the theoretical complexity.
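To illustrate why sampling is unnecessary in the additive case, consider the simplest additive setting in which the objective decomposes as a sum over neighbors (a deliberate simplification of the actual objective, which also involves the second-stage budget): linearity of expectation gives the exact value in one pass, while SAA only approximates it at far greater cost.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical additive instance: neighbor v of the seeded set joins the
# second stage with probability p_v and contributes its degree d_v.
p = [rng.random() for _ in range(500)]
d = [rng.randint(1, 200) for _ in range(500)]

# Exact expectation by linearity: one pass, no sampling.
exact = sum(pv * dv for pv, dv in zip(p, d))

# SAA-style estimate: average the realized value over many samples, each
# of which costs a linear-time pass of its own.
def saa(trials):
    vals = [sum(dv for pv, dv in zip(p, d) if rng.random() < pv)
            for _ in range(trials)]
    return statistics.mean(vals)

approx = saa(2000)
```

With $O(n^2)$ samples, each taking linear time, the sampled estimate costs $O(n^3)$ to recover a quantity available in closed form, which is the gap visible in Figure~\ref{fig:sampling}.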

In Figure~\ref{fig:sampling}, we compare the running time of our combinatorial
algorithm to that of the same algorithm where the expectation is computed via
sampling. We note that this sampling-based algorithm is still simpler than the
algorithm introduced in \cite{singer} for general influence models.  However,
we observe a significant gap between its running time and that of the
combinatorial algorithm. Since each sample takes linear time to compute, this
gap is in fact $O(n^3)$, quickly leading to impractical running times as the
size of the graph increases. This highlights the importance of the
sampling-free approach underlying the algorithms we introduced.\newline

\begin{figure}[t]
    \centerline{ \includegraphics[width=0.48\textwidth]{images/sampling.pdf} }
    \vspace{-10pt}
    \caption{Running time and number of CPU cycles used by the sampling-based
    algorithm and the combinatorial adaptive seeding algorithm for different
sizes of the core set.}
    \label{fig:sampling}
    \vspace{-15pt}
\end{figure}

\noindent\textbf{Combinatorial vs. LP algorithm.}
We now compare the running time of the LP-based approach and the combinatorial
approach for different instance sizes.

Figure~\ref{fig:time} shows the running time and number of CPU cycles used by
the LP algorithm and the combinatorial algorithm as a function of the network
size $n$. The varying size of the network was obtained by randomly sampling
a varying fraction of core set users and then trimming the social graph by only
keeping friends of this random sample on the second stage.  The LP solver used
was CLP~\cite{clp}.

\begin{figure}[t]
    \centerline{ \includegraphics[scale=0.9]{images/time.pdf} }
    \vspace{-10pt}
    \caption{Running time and number of CPU cycles of the combinatorial algorithm and the LP algorithm as a function of the number of nodes $n$. First row with budget $k=100$, second row with budget $k=500$.}
    \label{fig:time}
    \vspace{-10pt}
\end{figure}

We observe that for a \emph{small} value of the budget $k$ (first row
of Figure~\ref{fig:time}), the combinatorial algorithm outperforms the LP
algorithm. When $k$ becomes \emph{large} (second row of
Figure~\ref{fig:time}), the LP algorithm becomes faster. This can be explained
by the $k^2$ factor in the running time of the combinatorial
algorithm (see Proposition~\ref{prop:running_time}). Even though the asymptotic
running time of the combinatorial algorithm should theoretically beat that of
the LP-based approach for large $n$, we were not able to observe this
for our instance sizes. In practice, one can choose which of the two algorithms
to apply depending on the relative sizes of $k$ and $n$.\newline

