diff options
| author | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-03-20 14:25:16 -0400 |
|---|---|---|
| committer | jeanpouget-abadie <jean.pougetabadie@gmail.com> | 2015-03-20 14:25:16 -0400 |
| commit | 5d0edd895fe1f2aa3ec7c0f805cb504ffec41573 (patch) | |
| tree | 74a6f928e7a014977091bf60c0502e36693dc521 /results.tex | |
| parent | 0965d21907aaa81c7b0e11a2abae2746d7fe0d0c (diff) | |
| download | learn-optimize-5d0edd895fe1f2aa3ec7c0f805cb504ffec41573.tar.gz | |
some reorganizing
Diffstat (limited to 'results.tex')
| -rw-r--r-- | results.tex | 52 |
1 files changed, 35 insertions, 17 deletions
diff --git a/results.tex b/results.tex index e167a00..bae1694 100644 --- a/results.tex +++ b/results.tex @@ -116,33 +116,51 @@ results about the probability of the adjacency matrix to be non singular. \section{Passive Optimization} -In general, we say we can efficiently optimize a function $f$ under constraints +Let $\Omega$ be the universe of elements and $f$ a function defined on subsets +of $\Omega$: $f : S \in 2^{[\Omega]} \mapsto f(S) \in \mathbb{R}$. Let $K$ be a +collection of sets of $2^{[\Omega]}$, which we call \emph{constraints}. Let +$S^*_K$ be any solution to $\max_{S \in K} f(S)$, which we will also denote by +$S^*$ when there is no ambiguity. Let $L$ be the problem size, which is often +(but not always) equal to $|\Omega|$. + +In general, we say we can efficiently optimize a function $f$ under constraint $K$ when we have a polynomial-time algorithm making adaptive value queries to -$f$,which returns a set $S$ such that: $S \in K$ and $f(S) \geq \alpha f(S^*)$ -with high probability, where $\alpha$ is an absolute constant and $S^* \in \arg -\max_{S \in K}$. +$f$,which returns a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$ +with high probability and $\alpha$ an absolute constant. Here, we consider the scenario where we cannot make adaptive value queries, and in fact, where we cannot make queries at all! Instead, we suppose that we observe a polynomial number of set-value pairs $(S, f(S))$ where $S$ is taken -from a known distribution $D$. Let $L$ be the problem size. We say we can -efficiently \emph{passively optimize} $f$ under distribution $D$ and -constraints $K$ when, after observing ${\cal O}(L^c)$ set-value pairs from $D$ -where $c > 0$ is an absolute constant, we can return a set $S$ such that: $S -\in K$ and $f(S) \geq \alpha f(S^*)$ with high probability, where $\alpha$ is -an absolute constant and $S^* \in \arg \max_{S \in K} f(S)$. +from a known distribution $D$. We say we can efficiently \emph{passively +optimize} $f$ under distribution $D$ or $D-$optimize $f$ under constraints $K$ +when, after observing ${\cal O}(L^c)$ set-value pairs from $D$ where $c > 0$ is +an absolute constant, we can return a set $S$ such that $S \in K$ and $f(S) +\geq \alpha f(S^*)$ with high probability and $\alpha$ an absolute constant. + +In the case of \emph{passive} observations of set-value pairs under a +distribution $D$ for a function $f$, recent research has focused on whether we +can efficiently and approximately \emph{learn} $f$. This was formalized in the +{\cal PMAC} model from \cite{balcan2011learning}. When thinking about passive +optimization, it is necessary to understand the link between being able to +learn $D-{\cal PMAC}$ $f$ and being able to $D-$optimize $f$. \subsection{Additive function} +We consider here the simple case of additive functions. A function $f$ is +additive if there exists a weight vector $(w_s)_{s \in \Omega}$ such that +$\forall S \subseteq \Omega, \ f(S) = \sum_{s \in S} w_s$. We will sometimes +adopt the notation $w \equiv f(\Omega)$. + \subsubsection{Inverting the system}\label{subsec:invert_the_system} -Note that for observations ${(S_j, f(S_j))}_{j \in N}$ we can write the system $A -w = b$, where each row of $A_j$ corresponds to the indicator vector of the set -$S_j$ and $b_j \equiv f(S_j)$. From Corollary~\ref{cor:number_of_rows_needed}, -it is easy to see that ${(C_K+1)}^2 \frac{n^3}{k(n - k)}$ rows are sufficient for -$A$ to have full row rank with probability $1-e^{-c_k n}$. If $A$ has full row -rank, then it is easy to invert the system in polynomial time, and both learn -and optimize $f$ for any cardinality constraint. +Note that for observations ${(S_j, f(S_j))}_{j \in N}$ we can write the system +$A w = b$, where each row of $A_j$ corresponds to the indicator vector of the +set $S_j$ and $b_j \equiv f(S_j)$. From +Corollary~\ref{cor:number_of_rows_needed}, it is easy to see that ${(C_K+1)}^2 +\frac{n^3}{k(n - k)}$ rows are sufficient for $A$ to have full row rank with +probability $1-e^{-c_k n}$. If $A$ has full row rank, then it is easy to invert +the system in polynomial time, and both learn and optimize $f$ for any +cardinality constraint. \begin{proposition} Assume that $N$ pairs of set-value ${(S_j, f(S_j))}_{j \in N}$ are observed, where $S_j \sim D_p$ and $p \equiv \frac{k}{n}$. If $N > |
