Diffstat (limited to 'finale/sections/bayesian.tex')
 finale/sections/bayesian.tex | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index dda5c81..5d95ddb 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -9,7 +9,6 @@ $\sigma_{ij}$. The source distribution, parameterized by $\phi$, is
 considered fixed here.}
 \end{figure}
 
-\subsection{Advantages of the Bayesian Framework}
 In this section, we develop a Bayesian approach to the Network Inference
 Problem by placing priors on the edge weights of the graph. The quantity of
 interest is the posterior distribution, given through Bayes' rule by:
@@ -20,41 +19,43 @@ the posterior distribution, given through Bayes' rule by:
 where $\mathcal{L}_\Theta(\bx)$ is the likelihood expressed in
 Eq.~\ref{eq:dist}.
 
+\subsection{Advantages for Graph Inference}
+
 One advantage of the Bayesian approach is its ability to convey distributional
-information about our belief for each parameter rather than the pointwise
-estimates accessible by MLE.~For example, exploring the entropy
-of the posterior on each parameter allows us to quantify how uncertain we are of
-each edge parameters' value. In the next section, we will explore how to
-exploit this knowledge to improve the rate at which we decrease our uncertainty
-by focusing on the most relevant parts of the network.
+information about our belief about each parameter rather than the pointwise
+estimates accessible by MLE.~For example, exploring the entropy of the
+posterior on each parameter allows us to quantify the uncertainty in edge
+weights. In the next section, we will exploit this information to improve the
+rate at which we decrease the uncertainty (and hence learn the network) by
+focusing on the most relevant parts of the network.
 
 Another advantage of the Bayesian approach is the ability to encode
 domain-knowledge through well-chosen prior distributions. For example, there
 is an extensive literature~\cite{} on parametric representations of social
-networks, which attempt to reproduce certain properties of such networks:
-density of triangles, diameter, degree distribution, clustering coefficient etc.
-Accounting for known graph properties, such as reciprocal links or the high
-density of triangles has the potential to greatly increase the information we
-leverage from each cascade. Of course, such priors no longer allow us to
+networks, which attempt to reproduce observed properties of such networks:
+density of triangles, diameter, degree distribution, clustering coefficient,
+etc. Accounting for known graph properties, such as reciprocal links or the
+high density of triangles, has the potential to greatly increase the
+information we extract from each cascade. Of course, such priors no longer
+allow us to perform inference in parallel, which was leveraged in prior work.
 
 \subsection{Inference}
 
 Depending on the link function $f$, the GLC model may not possess conjugate
-priors (e.g.~the IC model). Even if conjugate priors exist, they may be
-restricted to product form. In these cases, we resort to the use of sampling
-algorithms (MCMC) and approximate Bayesian methods (variational inference),
-which we cover here.
+priors (as is the case for the IC model, for example). Even if conjugate priors
+exist, they may be restricted to product form. In these cases, we resort to the
+use of sampling algorithms (MCMC) and approximate Bayesian methods (variational
+inference), which we cover here.
-\paragraph{MCMC}
+\paragraph{MCMC.}
 The Metropolis-Hastings algorithm (an MCMC method) allows us to draw samples
-posterior directly using the un-normalized posterior distribution. The advantage
-of this method is the ability to sample from the exact posterior and the wide
-availability of software packages which will work `out-of-the-box'. However,
-vanilla MCMC scales badly and is unsuitable for Bayesian learning of large
-networks ($\geq 100$ nodes).
+from the posterior directly using the un-normalized posterior distribution. The
+advantage of this method is the ability to sample from the exact posterior and
+the wide availability of software packages that work `out of the box'.
+However, as we show in our experiments, vanilla MCMC scales poorly and is
+unsuitable for Bayesian learning of large networks ($\geq 100$ nodes).
 
-\paragraph{Variational Inference}
+\paragraph{Variational Inference.}
 Variational inference algorithms consist in fitting an approximate family of
 distributions to the exact posterior. The variational objective can be
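
For concreteness, both inference routes in this section consume the same object: the un-normalized log-posterior, i.e. the log prior plus the log likelihood $\log \mathcal{L}_\Theta(\bx)$. Below is a minimal Python sketch, assuming IC-style edge weights in $(0,1)$ with independent Beta priors and a toy Bernoulli edge likelihood standing in for Eq.~\ref{eq:dist}; all function names are ours, not the paper's.

    import numpy as np
    from scipy.stats import beta

    def log_likelihood(theta, successes, failures):
        # Toy stand-in for the cascade likelihood L_Theta(x): each observed
        # activation attempt over an edge succeeds with probability theta[edge].
        return np.sum(successes * np.log(theta) + failures * np.log1p(-theta))

    def log_posterior(theta, successes, failures, a=1.0, b=1.0):
        # Un-normalized log-posterior = log prior + log likelihood, with
        # independent Beta(a, b) priors on each edge weight in (0, 1).
        if np.any(theta <= 0.0) or np.any(theta >= 1.0):
            return -np.inf  # outside the support of the prior
        return beta.logpdf(theta, a, b).sum() + log_likelihood(theta, successes, failures)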
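A generic random-walk Metropolis-Hastings loop over the edge-weight vector (continuing the sketch above; this is not the paper's implementation) shows why only the un-normalized posterior is needed: the normalizing constant cancels in the acceptance ratio.

    def metropolis_hastings(log_post, theta0, n_samples=5000, step=0.05, seed=0):
        # Minimal random-walk Metropolis-Hastings sampler.
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        lp = log_post(theta)
        samples = []
        for _ in range(n_samples):
            prop = theta + step * rng.standard_normal(theta.shape)  # symmetric proposal
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:  # accept w.p. min(1, ratio)
                theta, lp = prop, lp_prop
            samples.append(theta.copy())
        return np.asarray(samples)

Each iteration evaluates the full likelihood over all cascades, and the state dimension grows with the number of candidate edges (quadratic in the number of nodes), which is consistent with the scaling limitation noted in the MCMC paragraph.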
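The standard variational objective is the evidence lower bound (ELBO), maximized over the parameters of the approximating family. As a hypothetical sketch, assuming a mean-field Gaussian over logit-transformed edge weights (again, names and parameterization are ours):

    def elbo_estimate(mu, log_sigma, log_post, n_mc=32, seed=1):
        # Monte Carlo ELBO for q(z) = N(mu, diag(sigma^2)) on logits z, with
        # theta = sigmoid(z): ELBO = E_q[log p~(theta(z)) + log|dtheta/dz|] + H[q].
        rng = np.random.default_rng(seed)
        sigma = np.exp(log_sigma)
        z = mu + sigma * rng.standard_normal((n_mc, mu.size))  # reparameterization
        theta = 1.0 / (1.0 + np.exp(-z))
        log_jac = np.log(theta * (1.0 - theta)).sum(axis=1)    # change of variables
        log_p = np.array([log_post(t) for t in theta]) + log_jac
        entropy = np.sum(log_sigma + 0.5 * np.log(2.0 * np.pi * np.e))
        return log_p.mean() + entropy

Maximizing this estimate over (mu, log_sigma), e.g. with stochastic gradients, fits the approximate family to the exact posterior.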
