diff options
Diffstat (limited to 'paper.tex')
| -rw-r--r-- | paper.tex | 780 |
1 files changed, 6 insertions, 774 deletions
@@ -3,789 +3,21 @@ \usepackage{amsmath,amsfonts} \usepackage{algorithm} \usepackage{algpseudocode} -\newtheorem{lemma}{Lemma} -\newtheorem{fact}{Fact} -\newtheorem{example}{Example} -\newtheorem{prop}{Proposition} -\newtheorem{theorem}{Theorem} -\newcommand*{\defeq}{\stackrel{\text{def}}{=}} -\newcommand{\var}{\mathop{\mathrm{Var}}} -\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]} -\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]} -\newcommand{\norm}[1]{\lVert#1\rVert} -\newcommand{\tr}[1]{#1^*} -\newcommand{\ip}[2]{\langle #1, #2 \rangle} -\newcommand{\mse}{\mathop{\mathrm{MSE}}} -\DeclareMathOperator{\trace}{tr} -\DeclareMathOperator*{\argmax}{arg\,max} +\input{definitions} \title{Budgeted Auctions for Experiment Design} \begin{document} \maketitle \section{Intro} - -\begin{itemize} - \item already existing field of experiment design: survey-like setup, what - are the best points to include in your experiment? Measure of the - usefulness of the data: variance-reduction or entropy-reduction. - \item nowadays, there is also a big focus on purchasing data: paid surveys, - mechanical turk, etc. that add economic aspects to the problem of - experiment design - \item recent advances (Singer, Chen) in the field of budgeted mechanisms - \item we study ridge regression, very widely used in statistical learning, - and treat it as a problem of budgeted experiment design - \item we make the following contributions: ... - \item extension to a more general setup which includes a wider class of - machine learning problems -\end{itemize} +\input{intro} \section{Problem formulation} - -\subsection{Notations} - -Throughout the paper, we will make use of the following notations: if $x$ is -a (column) vector in $\mathbf{R}^d$, $x^*$ denotes its transposed (line) -vector. Thus, the standard inner product between two vectors $x$ and $y$ is -simply $x^* y$. $\norm{x}_2 = x^*x$ will denote the $L_2$ norm of $x$. - -We will also often use the following order over symmetric matrices: if $A$ and -$B$ are two $d\times d$ and $B$ are two $d\times d$ real symmetric matrices, we -write that $A\leq B$ iff: -\begin{displaymath} - \forall x\in\mathbf{R}^d,\quad - x^*Ax \leq x^*Bx -\end{displaymath} -That is, iff $B-A$ is symmetric semi-definite positive. - -This order let us define the notion of a \emph{decreasing} or \emph{convex} -matrix function similarly to their real counterparts. In particular, let us -recall that the matrix inversion is decreasing and convex over symmetric -definite positive matrices. - -\subsection{Data model} - -There is a set of $n$ users, $\mathcal{N} = \{1,\ldots, n\}$. Each user -$i\in\mathcal{N}$ has a public vector of features $x_i\in\mathbf{R}^d$ and an -undisclosed piece of information $y_i\in\mathbf{R}$. We assume that the data -has already been normalized so that $\norm{x_i}_2\leq 1$ for all -$i\in\mathcal{N}$. - -The experimenter is going to select a set of users and ask them to reveal their -private piece of information. We are interested in a \emph{survey setup}: the -experimenter has not seen the data yet, but he wants to know which users he -should be selecting. His goal is to learn the model underlying the data. Here, -we assume a linear model: -\begin{displaymath} - \forall i\in\mathcal{N},\quad y_i = \beta^* x_i + \varepsilon_i -\end{displaymath} -where $\beta\in\mathbf{R}^d$ and $\varepsilon_i\in\mathbf{R}$ follows a normal -distribution of mean $0$ and variance $\sigma^2$. Furthermore, we assume the -error $\varepsilon$ to be independent of the user: -$(\varepsilon_i)_{i\in\mathcal{N}}$ are mutually independent. - -After observing the data, the experimenter could simply do linear regression to -learn the model parameter $\beta$. However, in a more general setup, the -experimenter has a prior knowledge about $\beta$, a distribution over -$\mathbf{R}^d$. After observing the data, the experimenter performs -\emph{maximum a posteriori estimation}: computing the point which maximizes the -posterior distribution of $\beta$ given the observations. - -Here, we will assume, as it is often done, that the prior distribution is -a multivariate normal distribution of mean zero and covariance matrix $\kappa -I_d$. Maximum a posteriori estimation leads to the following maximization -problem: -\begin{displaymath} - \beta_{\text{max}} = \argmax_{\beta\in\mathbf{R}^d} \sum_i (y_i - \beta^*x_i)^2 - + \frac{1}{\mu}\sum_i \norm{\beta}_2^2 -\end{displaymath} -which is the well-known \emph{ridge regression}. $\mu -= \frac{\kappa}{\sigma^2}$ is the regularization parameter. Ridge regression -can thus be seen as linear regression with a regularization term which -prevents $\beta$ from having a large $L_2$-norm. - -\subsection{Value of data} - -Because the user private variables $y_i$ have not been observed yet when the -experimenter has to decide which users to include in his experiment, we treat -$\beta$ as a random variable whose distribution is updated after observing the -data. - -Let us recall that if $\beta$ is random variable over $\mathbf{R}^d$ whose -probability distribution has a density function $f$ with respect to the -Lebesgue measure, its entropy is given by: -\begin{displaymath} - \mathbb{H}(\beta) \defeq - \int_{b\in\mathbf{R}^d} \log f(b) f(b)\text{d}b -\end{displaymath} - -A usual way to measure the decrease of uncertainty induced by the observation -of data is to use the entropy. This leads to the following definition of the -value of data called the \emph{value of information}: -\begin{displaymath} - \forall S\subset\mathcal{N},\quad V(S) = \mathbb{H}(\beta) - - \mathbb{H}(\beta\,|\, - Y_S) -\end{displaymath} -where $Y_S = \{y_i,\,i\in S\}$ is the set of observed data. - -\begin{theorem} - Under the ridge regression model explained in section TODO, the value of data - is equal to: - \begin{align*} - \forall S\subset\mathcal{N},\; V(S) - & = \frac{1}{2}\log\det\left(I_d - + \mu\sum_{i\in S} x_ix_i^*\right)\\ - & \defeq \frac{1}{2}\log\det A(S) - \end{align*} -\end{theorem} - -\begin{proof} - -Let us denote by $X_S$ the matrix whose rows are the vectors $(x_i^*)_{i\in -S}$. Observe that $A_S$ can simply be written as: -\begin{displaymath} - A_S = I_d + \mu X_S^* X_S -\end{displaymath} - -Let us recall that the entropy of a multivariate normal variable $B$ over -$\mathbf{R}^d$ of covariance $\Sigma I_d$ is given by: -\begin{equation}\label{eq:multivariate-entropy} - \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^d \det \Sigma I_d\big) -\end{equation} - -Using the chain rule for conditional entropy, we get that: -\begin{displaymath} - V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta) -\end{displaymath} - -Conditioned on $\beta$, $(Y_S)$ follows a multivariate normal -distribution of mean $X\beta$ and of covariance matrix $\sigma^2 I_n$. Hence: -\begin{equation}\label{eq:h1} - \mathbb{H}(Y_S\,|\,\beta) - = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right) -\end{equation} - -$(Y_S)$ also follows a multivariate normal distribution of mean zero. Let us -compute its covariance matrix, $\Sigma_Y$: -\begin{align*} - \Sigma_Y & = \expt{YY^*} = \expt{(X_S\beta + E)(X_S\beta + E)^*}\\ - & = \kappa X_S X_S^* + \sigma^2I_n -\end{align*} -Thus, we get that: -\begin{equation}\label{eq:h2} - \mathbb{H}(Y_S) - = \frac{1}{2}\log\left((2\pi e)^n \det(\kappa X_S X_S^* + \sigma^2 I_n)\right) -\end{equation} - -Combining \eqref{eq:h1} and \eqref{eq:h2} we get: -\begin{displaymath} - V(S) = \frac{1}{2}\log\det\left(I_n+\frac{\kappa}{\sigma^2}X_S - X_S^*\right) -\end{displaymath} - -Finally, we can use Sylvester's determinant theorem to get the result. -\end{proof} - -It is also interesting to look at the marginal contribution of a user to a set: the -increase of value induced by adding a user to an already existing set of users. -We have the following lemma. - -\begin{lemma}[Marginal contribution] - \begin{displaymath} - \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S) - = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right) - \end{displaymath} -\end{lemma} - -\begin{proof} - We have: - \begin{align*} - V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\ - & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\ - & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i - x_i^*\right)\\ - & = V(S) + \frac{1}{2}\log\left(1 + \mu x_i^* A(S)^{-1}x_i\right) - \end{align*} - where the last equality comes from Sylvester's determinant formula. -\end{proof} - -Because $A(S)$ is symmetric definite positive, the marginal contribution is -positive, which proves that the value function is set increasing. Furthermore, -it is easy to see that if $S\subset S'$, then $A(S)\leq A(S')$. Using the fact -that matrix inversion is decreasing, we see that the marginal contribution of -a fixed user is a set decreasing function. This is the \emph{submodularity} of -the value function. - -TODO? Explain what are the points which are the most valuable : points which -are aligned along directions where the spread over the already existing points -is small. - -\subsection{Auction} - -TODO Explain the optimization problem, why it has to be formulated as an auction -problem. Explain the goals: -\begin{itemize} - \item truthful - \item individually rational - \item budget feasible - \item has a good approximation ratio - -TODO Explain what is already known: it is ok when the function is submodular. When -should we introduce the notion of submodularity? -\end{itemize} - +\input{problem} \section{Main result} - -All the budget feasible mechanisms studied recently (TODO ref Singer Chen) -rely on using a greedy heuristic extending the idea of the greedy heuristic for -the knapsack problem. In knapsack, objects are selected based on their -\emph{value-per-cost} ratio. The quantity which plays a similar role for -general submodular functions is the \emph{marginal-contribution-per-cost} -ratio: let us assume that you have already selected a set of points $S$, then -the \emph{marginal-contribution-per-cost} ratio per cost of a new point $i$ is -defined by: -\begin{displaymath} - \frac{V(S\cup\{i\}) - V(S)}{c_i} -\end{displaymath} - -The greedy heuristic then simply repeatedly selects the point whose -marginal-contribution-per-cost ratio is the highest until it reaches the budget -limit. Mechanism considerations aside, this is known to have an unbounded -approximation ratio. However, lemma TODO ref in Chen or Singer shows that the -maximum between the greedy heuristic and the point with maximum value (as -a singleton set) provides a $\frac{5e}{e-1}$ approximation ratio. - -Unfortunately, TODO Singer 2011 points out that taking the maximum between the -greedy heuristic and the most valuable point is not incentive compatible. -Singer and Chen tackle this issue similarly: instead of comparing the most -valuable point to the greedy solution, they compare it to a solution which is -close enough to keep a constant approximation ratio: -\begin{itemize} - \item Chen suggests using $OPT(V,\mathcal{N}\setminus\{i\}, B)$. - Unfortunately, in the general case, this cannot be computed exactly in - polynomial time. - \item Singer uses using the optimal value of a relaxed objective function - which can be proven to be close to the optimal of the original - objective function. The function used is tailored to the specific - problem of coverage. -\end{itemize} - -Here, we use a relaxation of the objective function which is tailored to the -problem of ridge regression. We define: -\begin{displaymath} - \forall\lambda\in[0,1]^{|\mathcal{N}|}\,\quad L_{\mathcal{N}}(\lambda) \defeq - \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} \lambda_i x_i x_i^*\right) -\end{displaymath} - -We can now present the mechanism we use, which has a same flavor to Chen's and -Singer's - -\begin{algorithm}\label{mechanism} - \caption{Mechanism for ridge regression} - \begin{algorithmic}[1] - \State $i^* \gets \argmax_{j\in\mathcal{N}}V(j)$ - \State $x^* \gets \argmax_{x\in[0,1]^n} \{L_{\mathcal{N}\setminus\{i^*\}}(x) - \,|\, c(x)\leq B\}$ - \Statex - \If{$L(x^*) < CV(i^*)$} - \State \textbf{return} $\{i^*\}$ - \Else - \State $i \gets \argmax_{1\leq j\leq n}\frac{V(j)}{c_j}$ - \State $S \gets \emptyset$ - \While{$c_i\leq \frac{B}{2}\frac{V(S\cup\{i\})-V(S)}{V(S\cup\{i\})}$} - \State $S \gets S\cup\{i\}$ - \State $i \gets \argmax_{j\in\mathcal{N}\setminus S} - \frac{V(S\cup\{j\})-V(S)}{c_j}$ - \EndWhile - \State \textbf{return} $S$ - \EndIf - \end{algorithmic} -\end{algorithm} - -Notice, that the stopping condition in the while loop is more sophisticated -than just ensuring that the sum of the costs does not exceed the budget. This -is because the selected users will be payed more than their costs, and this -stopping condition ensures budget feasibility when the users are paid their -threshold payment. - -We can now state the main result of this section: -\begin{theorem} - The mechanism in \ref{mechanism} is truthful, individually rational, budget - feasible. Furthermore, choosing: - \begin{multline*} - C = C^* = \frac{5e-1 + C_\mu(2e+1)}{2C_\mu(e-1)}\\ - + \frac{\sqrt{C_\mu^2(1+2e)^2 - + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)} - \end{multline*} - we get an approximation ratio of: - \begin{multline*} - 1 + C^* = \frac{5e-1 + C_\mu(4e-1)}{2C_\mu(e-1)}\\ - + \frac{\sqrt{C_\mu^2(1+2e)^2 - + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)} - \end{multline*} - where: - \begin{displaymath} - C_\mu = \frac{\log(1+\mu)}{2\mu} - \end{displaymath} -\end{theorem} - -The proof will consist of the claims of the theorem broken down into lemmas. - -Because the user strategy is parametrized by a single parameter, truthfulness -is equivalent to monotonicity (TODO ref). The proof here is the same as in TODO -and is given for the sake of completeness. - -\begin{lemma} -The mechanism is monotone. -\end{lemma} - -\begin{proof} - We assume by contradiction that there exists a user $i$ that has been - selected by the mechanism and that would not be selected had he reported - a cost $c_i'\leq c_i$ (all the other costs staying the same). - - If $i\neq i^*$ and $i$ has been selected, then we are in the case where - $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy - part of the mechanism. By reporting a cost $c_i'\leq c_i$, using the - submodularity of $V$, we see that $i$ will satisfy the greedy selection - rule: - \begin{displaymath} - i = \argmax_{j\in\mathcal{N}\setminus S} \frac{V(S\cup\{j\}) - - V(S)}{c_j} - \end{displaymath} - in an earlier iteration of the greedy heuristic. Let us denote by $S_i$ - (resp. $S_i'$) the set to which $i$ is added when reporting cost $c_i$ - (resp. $c_i'$). We have $S_i'\subset S_i$. Moreover: - \begin{align*} - c_i' & \leq c_i \leq - \frac{B}{2}\frac{V(S_i\cup\{i\})-V(S_i)}{V(S_i\cup\{i\})}\\ - & \leq \frac{B}{2}\frac{V(S_i'\cup\{i\})-V(S_i')}{V(S_i'\cup\{i\})} - \end{align*} - Hence $i$ will still be included in the result set. - - If $i = i^*$, $i$ is included iff $L(x^*) \leq C V(i^*)$. Reporting $c_i'$ - instead of $c_i$ does not change the value $V(i^*)$ nor $L(x^*)$ (which is - computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included by - reporting a different cost. -\end{proof} - -\begin{lemma} -The mechanism is budget feasible. -\end{lemma} - -The proof is the same as in Chen and is given here for the sake of -completeness. -\begin{proof} - -\end{proof} - -Using the characterization of the threshold payments from Singer TODO, we can -prove individual rationality similarly to Chen TODO. -\begin{lemma} -The mechanism is individually rational -\end{lemma} - -\begin{proof} - -\end{proof} - -The following lemma requires to have a careful look at the relaxation function -we chose in the mechanism. The next section will be dedicated to studying this -relaxation and will contain the proof of the following lemma: -\begin{lemma} - We have: - \begin{displaymath} - OPT(L_\mathcal{N}, B) \leq \frac{1}{C_\mu}\big(2 OPT(V,\mathcal{N},B) - + \max_{i\in\mathcal{N}}V(i)\big) - \end{displaymath} -\end{lemma} - -\begin{lemma} - Let us denote by $S_M$ the set returned by the mechanism. Let us also - write: - \begin{displaymath} - C_{\textrm{max}} = \max\left(1+C,\frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot - C_\mu(e-1) -5e +1}\right)\right) - \end{displaymath} - - Then: - \begin{displaymath} - OPT(V, \mathcal{N}, B) \leq - C_\text{max}\cdot V(S_M) - \end{displaymath} -\end{lemma} - -\begin{proof} - - If the condition on line 3 of the algorithm holds, then: - \begin{displaymath} - V(i^*) \geq \frac{1}{C}L(x^*) \geq - \frac{1}{C}OPT(V,\mathcal{N}\setminus\{i\}, B) - \end{displaymath} - - But: - \begin{displaymath} - OPT(V,\mathcal{N},B) \leq OPT(V,\mathcal{N}\setminus\{i\}, B) + V(i^*) - \end{displaymath} - - Hence: - \begin{displaymath} - V(i^*) \geq \frac{1}{C+1} OPT(V,\mathcal{N}, B) - \end{displaymath} - - If the condition of the algorithm does not hold: - \begin{align*} - V(i^*) & \leq \frac{1}{C}L(x^*) \leq \frac{1}{C\cdot C_\mu} - \big(2 OPT(V,\mathcal{N}, B) + V(i^*)\big)\\ - & \leq \frac{1}{C\cdot C_\mu}\left(\frac{2e}{e-1}\big(3 V(S_M) - + 2 V(i^*)\big) - + V(i^*)\right) - \end{align*} - - Thus: - \begin{align*} - V(i^*) \leq \frac{6e}{C\cdot C_\mu(e-1)- 5e + 1} V(S_M) - \end{align*} - - Finally, using again that: - \begin{displaymath} - OPT(V,\mathcal{N},B) \leq \frac{e}{e-1}\big(3 V(S_M) + 2 V(i^*)\big) - \end{displaymath} - - We get: - \begin{displaymath} - OPT(V, \mathcal{N}, B) \leq \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot - C_\mu(e-1) -5e +1}\right) V(S_M) - \end{displaymath} -\end{proof} - -The optimal value for $C$ is: -\begin{displaymath} - C^* = \arg\min C_{\textrm{max}} -\end{displaymath} - -This equation has two solutions. Only one of those is such that: -\begin{displaymath} - C\cdot C_\mu(e-1) -5e +1 \geq 0 -\end{displaymath} -which is needed in the proof of the previous lemma. Computing this solution, -we can state the main result of this section. - -\subsection{Relaxations of the value function} - -To prove lemma TODO, we will use a general method called pipage rounding, -introduced in TODO. Two consecutive relaxations are used: the one that we are -interested in, whose optimization can be computed efficiently, and the -multilinear extension which presents a \emph{cross-convexity} like behavior -which allows for rounding of fractional solution without decreasing the value -of the objective function and thus ensures a constant approximation of the -value function. The difficulty resides in showing that the ratio of the two -relaxations is bounded. - -We say that $R_\mathcal{N}:[0,1]^n\rightarrow\mathbf{R}$ is a relaxation of the -value function $V$ over $\mathcal{N}$ if it coincides with $V$ at binary -points. Formally, for any $S\subset\mathcal{N}$, let $\mathbf{1}_S$ denote the -indicator vector of $S$. $R_\mathcal{N}$ is a relaxation of $V$ over -$\mathcal{N}$ iff: -\begin{displaymath} - \forall S\subset\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S) -\end{displaymath} - -We can extend the optimisation problem defined above to a relaxation by -extending the cost function: -\begin{displaymath} - \forall \lambda\in[0,1]^n,\; c(\lambda) - = \sum_{i\in\mathcal{N}}\lambda_ic_i -\end{displaymath} -The optimisation problem becomes: -\begin{displaymath} - OPT(R_\mathcal{N}, B) = - \max_{\lambda\in[0,1]^n}\left\{R_\mathcal{N}(\lambda)\,|\, c(\lambda)\leq B\right\} -\end{displaymath} -The relaxations we will consider here rely on defining a probability -distribution over subsets of $\mathcal{N}$. - -Let $\lambda\in[0,1]^n$, let us define: -\begin{displaymath} - P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i - \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i) -\end{displaymath} -$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ if we select -a subset of $\mathcal{N}$ at random by deciding independently for each point to -include it in the set with probability $\lambda_i$ (and to exclude it with -probability $1-\lambda_i$). - -We will consider two relaxations of the value function $V$ over $\mathcal{N}$: -\begin{itemize} - \item the \emph{multi-linear extension} of $V$: - \begin{align*} - F_\mathcal{N}(\lambda) - & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[\log\det A(S)\big]\\ - & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\ - & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) \log\det A(S)\\ - \end{align*} - \item the \emph{concave relaxation} of $V$: - \begin{align*} - L_{\mathcal{N}}(\lambda) - & = \log\det \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[A(S)\big]\\ - & = \log\det\left(\sum_{S\subset N} - P_\mathcal{N}^\lambda(S)A(S)\right)\\ - & = \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} - \lambda_ix_ix_i^*\right)\\ - & \defeq \log\det \tilde{A}(\lambda) - \end{align*} -\end{itemize} - -\begin{lemma} - The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence - this relaxation is well-named!}. -\end{lemma} - -\begin{proof} - This follows from the concavity of the $\log\det$ function over symmetric - positive semi-definite matrices. More precisely, if $A$ and $B$ are two - symmetric positive semi-definite matrices, then: - \begin{multline*} - \forall\alpha\in [0, 1],\; \log\det\big(\alpha A + (1-\alpha) B\big)\\ - \geq \alpha\log\det A + (1-\alpha)\log\det B - \end{multline*} -\end{proof} - -\begin{lemma}[Rounding]\label{lemma:rounding} - For any feasible $\lambda\in[0,1]^n$, there exists a feasible - $\bar{\lambda}\in[0,1]^n$ such that at most one of its component is - fractional, that is, lies in $(0,1)$ and: - \begin{displaymath} - F_{\mathcal{N}}(\lambda)\leq F_{\mathcal{N}}(\bar{\lambda}) - \end{displaymath} -\end{lemma} - -\begin{proof} - We give a rounding procedure which given a feasible $\lambda$ with at least - two fractional components, returns some $\lambda'$ with one less fractional - component, feasible such that: - \begin{displaymath} - F_\mathcal{N}(\lambda) \leq F_\mathcal{N}(\lambda') - \end{displaymath} - Applying this procedure recursively yields the lemma's result. - - Let us consider such a feasible $\lambda$. Let $i$ and $j$ be two - fractional components of $\lambda$ and let us define the following - function: - \begin{displaymath} - F_\lambda(\varepsilon) = F(\lambda_\varepsilon) - \quad\textrm{where} \quad - \lambda_\varepsilon = \lambda + \varepsilon\left(e_i-\frac{c_i}{c_j}e_j\right) - \end{displaymath} - - It is easy to see that if $\lambda$ is feasible, then: - \begin{multline}\label{eq:convex-interval} - \forall\varepsilon\in\Big[\max\Big(-\lambda_i,(\lambda_j-1)\frac{c_j}{c_i}\Big), \min\Big(1-\lambda_i, \lambda_j - \frac{c_j}{c_i}\Big)\Big],\;\\ - \lambda_\varepsilon\;\;\textrm{is feasible} - \end{multline} - - Furthermore, the function $F_\lambda$ is convex, indeed: - \begin{align*} - F_\lambda(\varepsilon) - & = \mathbb{E}_{S'\sim P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[ - (\lambda_i+\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\})\\ - & + (\lambda_i+\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\ - & + (1-\lambda_i-\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\})\\ - & + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big]\\ - \end{align*} - Thus, $F_\lambda$ is a degree 2 polynomial whose dominant coefficient is: - \begin{multline*} - \frac{c_i}{c_j}\mathbb{E}_{S'\sim - P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[ - V(S'\cup\{i\})+V(S'\cup\{i\})\\ - -V(S'\cup\{i,j\})-V(S')\Big] - \end{multline*} - which is positive by submodularity of $V$. Hence, the maximum of - $F_\lambda$ over the interval given in \eqref{eq:convex-interval} is - attained at one of its limit, at which either the $i$-th or $j$-th component of - $\lambda_\varepsilon$ becomes integral. -\end{proof} - -\begin{lemma}\label{lemma:relaxation-ratio} - The following inequality holds: - \begin{displaymath} - \forall\lambda\in[0,1]^n,\; - \frac{\log\big(1+\mu\big)}{2\mu} - \,L_\mathcal{N}(\lambda)\leq - F_\mathcal{N}(\lambda)\leq L_{\mathcal{N}}(\lambda) - \end{displaymath} -\end{lemma} - -\begin{proof} - - We will prove that: - \begin{displaymath} - \frac{\log\big(1+\mu\big)}{2\mu} - \end{displaymath} - is a lower bound of the ratio $\partial_i F_\mathcal{N}(\lambda)/\partial_i - L_\mathcal{N}(\lambda)$. - - This will be enough to conclude, by observing that: - \begin{displaymath} - \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)} - \sim_{\lambda\rightarrow 0} - \frac{\sum_{i\in \mathcal{N}}\lambda_i\partial_i F_\mathcal{N}(0)} - {\sum_{i\in\mathcal{N}}\lambda_i\partial_i L_\mathcal{N}(0)} - \end{displaymath} - and that an interior critical point of the ratio - $F_\mathcal{N}(\lambda)/L_\mathcal{N}(\lambda)$ is defined by: - \begin{displaymath} - \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)} - = \frac{\partial_i F_\mathcal{N}(\lambda)}{\partial_i - L_\mathcal{N}(\lambda)} - \end{displaymath} - - Let us start by computing the derivatives of $F_\mathcal{N}$ and - $L_\mathcal{N}$ with respect to - the $i$-th component. - - For $F$, it suffices to look at the derivative of - $P_\mathcal{N}^\lambda(S)$: - \begin{displaymath} - \partial_i P_\mathcal{N}^\lambda(S) = \left\{ - \begin{aligned} - & P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})\;\textrm{if}\; i\in S \\ - & - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\;\textrm{if}\; - i\in \mathcal{N}\setminus S \\ - \end{aligned}\right. - \end{displaymath} - - Hence: - \begin{multline*} - \partial_i F_\mathcal{N} = - \sum_{\substack{S\subset\mathcal{N}\\ i\in S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)\\ - - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\ - \end{multline*} - - Now, using that every $S$ such that $i\in S$ can be uniquely written as - $S'\cup\{i\}$, we can write: - \begin{multline*} - \partial_i F_\mathcal{N} = - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S\cup\{i\})\\ - - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\ - \end{multline*} - - Finally, by using the expression for the marginal contribution of $i$ to - $S$: - \begin{displaymath} - \partial_i F_\mathcal{N}(\lambda) = - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big) - \end{displaymath} - - The computation of the derivative of $L_\mathcal{N}$ uses standard matrix - calculus and gives: - \begin{displaymath} - \partial_i L_\mathcal{N}(\lambda) - = \mu x_i^* \tilde{A}(\lambda)^{-1}x_i - \end{displaymath} - - Using the following inequalities: - \begin{gather*} - \forall S\subset\mathcal{N}\setminus\{i\},\quad - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\geq - P_{\mathcal{N}\setminus\{i\}}^\lambda(S\cup\{i\})\\ - \forall S\subset\mathcal{N},\quad P_{\mathcal{N}\setminus\{i\}}^\lambda(S) - \geq P_\mathcal{N}^\lambda(S)\\ - \forall S\subset\mathcal{N},\quad A(S)^{-1} \geq A(S\cup\{i\})^{-1}\\ - \end{gather*} - we get: - \begin{align*} - \partial_i F_\mathcal{N}(\lambda) - & \geq \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_\mathcal{N}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ - & \geq \frac{1}{2} - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_\mathcal{N}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ - &\hspace{-3.5em}+\frac{1}{2} - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_\mathcal{N}^\lambda(S\cup\{i\}) - \log\Big(1 + \mu x_i^*A(S\cup\{i\})^{-1}x_i\Big)\\ - &\geq \frac{1}{2} - \sum_{S\subset\mathcal{N}} - P_\mathcal{N}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ - \end{align*} - - Using that $A(S)\geq I_d$ we get that: - \begin{displaymath} - \mu x_i^*A(S)^{-1}x_i \leq \mu - \end{displaymath} - - Moreover: - \begin{displaymath} - \forall x\leq\mu,\; \log(1+x)\geq - \frac{\log\big(1+\mu\big)}{\mu} x - \end{displaymath} - - Hence: - \begin{displaymath} - \partial_i F_\mathcal{N}(\lambda) \geq - \frac{\log\big(1+\mu\big)}{2\mu} - x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)^{-1}\bigg)x_i - \end{displaymath} - - Finally, using that the inverse is a matrix convex function over symmetric - positive definite matrices: - \begin{align*} - \partial_i F_\mathcal{N}(\lambda) &\geq - \frac{\log\big(1+\mu\big)}{2\mu} - x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\ - & \geq \frac{\log\big(1+\mu\big)}{2\mu} - \partial_i L_\mathcal{N}(\lambda) - \end{align*} -\end{proof} - -We can now prove lemma TODO from previous section. - -\begin{proof} - Let us consider a feasible point $\lambda^*\in[0,1]^n$ such that $L_\mathcal{N}(\lambda^*) - = OPT(L_\mathcal{N}, B)$. By applying lemma~\ref{lemma:relaxation-ratio} - and lemma~\ref{lemma:rounding} we get a feasible point $\bar{\lambda}$ with at most - one fractional component such that: - \begin{equation}\label{eq:e1} - L_\mathcal{N}(\lambda^*) \leq \frac{1}{C_\mu} - F_\mathcal{N}(\bar{\lambda}) - \end{equation} - - Let $\lambda_i$ denote the fractional component of $\bar{\lambda}$ and $S$ - denote the set whose indicator vector is $\bar{\lambda} - \lambda_i e_i$. - Using the fact that $F_\mathcal{N}$ is linear with respect to the $i$-th - component and is a relaxation of the value function, we get: - \begin{displaymath} - F_\mathcal{N}(\bar{\lambda}) = V(S) +\lambda_i V(S\cup\{i\}) - \end{displaymath} - - Using the submodularity of $V$: - \begin{displaymath} - F_\mathcal{N}(\bar{\lambda}) \leq 2 V(S) + V(i) - \end{displaymath} - - Note that since $\bar{\lambda}$ is feasible, $S$ is also feasible and - $V(S)\leq OPT(V,\mathcal{N}, B)$. Hence: - \begin{equation}\label{eq:e2} - F_\mathcal{N}(\bar{\lambda}) \leq 2 OPT(V,\mathcal{N}, B) - + \max_{i\in\mathcal{N}} V(i) - \end{equation} - - Putting \eqref{eq:e1} and \eqref{eq:e2} together gives the results. -\end{proof} - +\input{main} \section{General setup} - +\input{general} \section{Conclusion} - +\input{conclusion} \end{document} |
