From d8a68c6917f5b6053117e0145f6d4d80a8bec26b Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Tue, 14 Apr 2015 15:20:19 -0400
Subject: Starting paper draft
---
 paper/sections/negative.tex | 113 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)
 create mode 100644 paper/sections/negative.tex

diff --git a/paper/sections/negative.tex b/paper/sections/negative.tex
new file mode 100644
index 0000000..f88117a
--- /dev/null
+++ b/paper/sections/negative.tex
@@ -0,0 +1,113 @@

\subsection{Learnable $\nRightarrow$ Optimizable}

Consider the following function:
\begin{displaymath}
  g(S) = \max\left\{\sqrt{n}\,|X\cap S|, |S|\right\}
\end{displaymath}
where $|X|=\sqrt{n}$. Assume that you are given polynomially many samples in
which each element is included with probability $1/2$. Then with high
probability all the samples have size roughly $n/2$, so you will observe
$g(S)=|S|$ with high probability.

\paragraph{Claim 1:} $g$ is PAC-learnable: if you output $|S|$, you will be
correct with high probability when $S$ is drawn from the same distribution as
above.

\paragraph{Claim 2:} $g$ is not optimizable under budget $\sqrt{n}$ because you
never learn anything about $X$. Indeed, the optimum under this budget is
$g(X) = \sqrt{n}\cdot|X| = n$, whereas a set of size $\sqrt{n}$ disjoint from
$X$ only achieves value $\sqrt{n}$.

\paragraph{Open Question:} Cook up a similar example where $g$ is submodular
and where the observed sets have the same size as the budget.

\paragraph{Positive results:} Try to obtain guarantees about learning the
parameters of parametric submodular functions and show whether or not these
guarantees are sufficient for optimization. First look at learning the weights
of a coverage function. Maybe facility location? Sums of concave over modular
are probably too hard because of the connection to neural networks.

\subsection{Optimizable $\nRightarrow$ Learnable}

Recall the matroid construction from~\cite{balcan2011learning}:
\begin{theorem}
  For any $k$ with $k = 2^{o(n^{1/3})}$, there exists a family of sets
  ${\cal A} \subseteq 2^{[n]}$ and a family of matroids
  ${\cal M} = \{M_{\cal B} : {\cal B} \subseteq {\cal A}\}$ such that:
  \begin{itemize}
  \item $|{\cal A}| = k$ and $|A| = n^{1/3}$ for every $A \in {\cal A}$;
  \item for every ${\cal B} \subseteq {\cal A}$ and every $A \in {\cal A}$, we
    have:
    \begin{displaymath}
      \text{rank}_{M_{\cal B}}(A) =
      \begin{cases}
        8 \log k & \text{if } A \in {\cal B},\\
        |A| & \text{if } A \in {\cal A}\backslash{\cal B}.
      \end{cases}
    \end{displaymath}
  \end{itemize}
\end{theorem}

Consider the following subset of the above family of matroids: ${\cal
M}^{\epsilon} = \{M_{\cal B} : {\cal B} \subseteq {\cal A} \wedge |{\cal
A}\backslash{\cal B}| \geq \epsilon|{\cal A}|\}$ for $k = 2^{n^{1/6}}$.
Consider an \emph{unknown} function $f$ corresponding to the rank function of
one of the matroids $M_{\cal B}$ from ${\cal M}^{\epsilon}$. Note that as long
as we observe \emph{one} set from ${\cal A}\backslash{\cal B}$, we can optimize
$f$ exactly! Indeed, since the rank of a set never exceeds its cardinality, the
largest possible value of $f$ under cardinality constraint $n^{1/3}$ is
$n^{1/3}$, and any $A \in {\cal A}\backslash{\cal B}$ attains it.
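For concreteness, instantiating the theorem with the choice $k = 2^{n^{1/6}}$
made above (no assumptions beyond the statement of the theorem), the two
possible values of $f$ on the sets of ${\cal A}$ are
\begin{displaymath}
  f(A) =
  \begin{cases}
    8\log k = 8n^{1/6} & \text{if } A \in {\cal B},\\
    |A| = n^{1/3} & \text{if } A \in {\cal A}\backslash{\cal B},
  \end{cases}
\end{displaymath}
so a single observed set of ${\cal A}\backslash{\cal B}$ beats every set of
${\cal B}$ by a factor of $n^{1/6}/8$.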
One example of a distribution under which this occurs with probability at
least a constant is $D_u$, the uniform distribution over the sets of
${\cal A}$. For any $c>0$, after $n^c$ observations $O \equiv (S_i, f(S_i))_i$
with $S_i \sim D_u$, we observe at least one set from
${\cal A}\backslash{\cal B}$ with probability
\begin{equation} \nonumber
  \mathbb{P}\bigl(\exists i : S_i \in {\cal A}\backslash{\cal B}\bigr)
  \geq 1 - (1 - \epsilon)^{n^c} \geq 1 - e^{-\epsilon n^c},
\end{equation}
which is at least a constant (and in fact tends to one when $\epsilon$ is a
constant).

However, it follows from the analysis of~\cite{balcan2011learning} that we
cannot learn $f$ under any distribution, even with active value queries!
Indeed, for any polynomially-sized set of observations, there exists a
super-polynomial number of functions in ${\cal M}^{\epsilon}$ which coincide on
this set of observations but take very different values outside of it:
$8n^{1/6}$ on the sets of ${\cal B}$ and $n^{1/3}$ on the sets of
${\cal A}\backslash{\cal B}$.

{\bf TODO:} A cleaner, simpler example would be nice.

\subsection{Which learning guarantees imply optimization?}

Here, we consider the following question: which learning guarantees imply
optimization? For example, \cite{TODO} provides the following guarantee for
coverage functions:

\begin{theorem}
There exists an algorithm which, for any $\epsilon>0$, given random and
uniform examples of a coverage function $c$, outputs a hypothesis $h$ which is
also a coverage function, such that with probability $2/3$:
$\mathbb{E}_{x \sim {\cal U}}[|h(x) - c(x)|] \leq \epsilon$. The algorithm runs
in time $\tilde{\cal O}(n) \cdot \text{poly}(s/\epsilon)$ and uses
$\log(n)\cdot \text{poly}(s/\epsilon)$ examples, where
$s = \min\{\text{size}(c), (1/\epsilon)^{\log(1/\epsilon)}\}$.
\end{theorem}

We would like to understand to what extent this $\ell_1$-bound allows us to
optimize the coverage function $c$ under cardinality constraints using the
hypothesis $h$. Let us consider the simpler case of an additive function, and
suppose we have a similar guarantee:
$\mathbb{E}_{x \sim {\cal U}}[|h(x) - c(x)|] \leq \epsilon$, where for all $x$,
$c(x) \equiv \sum_i w_i x_i$ and $h(x) \equiv \sum_i \hat w_i x_i$. Can we find
a bound on $\|w - \hat w\|_{\infty}$?

As it turns out, yes. Define $\alpha_i \equiv w_i - \hat w_i$ for all $i$ and
$v(x) \equiv |h(x) - c(x)| = |\sum_{i} \alpha_i x_i|$; identifying a set $S$
with its indicator vector, we also write $v(S)$. Consider the collection of
good sets ${\cal G} \equiv \{S : v(S) < 4\epsilon\}$. We claim that $|{\cal G}|
\geq \frac{3}{4}\cdot 2^n$. Suppose the contrary: then strictly more than a
quarter of the sets satisfy $v(S) \geq 4\epsilon$, so that
$\mathbb{E}_{x \sim {\cal U}}[v(x)] \geq \frac{1}{2^n}\sum_{S \in {\cal G}^c}
v(S) > \frac{1}{4}\cdot 4\epsilon = \epsilon$, which contradicts the guarantee.
Now consider an element $i$ such that $|\alpha_i| = \|\alpha\|_{\infty}$, and
the collection of sets which contain $i$: ${\cal S}_i \equiv \{S : i \in S\}$.
Notice that $|{\cal S}_i| = |{\cal S}_i^c| = 2^{n-1}$, so that
$|{\cal G} \cap {\cal S}_i^c| \geq \frac{3}{4}\cdot 2^n - 2^{n-1}
= \frac{1}{4}\cdot 2^n$. For every set $S$ in ${\cal G} \cap {\cal S}_i^c$ we
have $v(S\cup\{i\}) \geq |\alpha_i| - v(S) > |\alpha_i| - 4\epsilon$, and these
sets $S \cup \{i\}$ are pairwise distinct. It follows that
$\epsilon \geq \mathbb{E}_{x \sim {\cal U}}[v(x)] \geq
\frac{1}{4}(|\alpha_i| - 4\epsilon)$, and therefore
$\|w - \hat w\|_{\infty} = |\alpha_i| \leq 8\epsilon$.
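To connect this back to optimization under a cardinality constraint $k$, here
is a sketch for the additive case only, assuming nonnegative weights (we have
not checked the analogous argument for coverage functions). For any set $S$
with $|S| \leq k$ we have $|c(S) - h(S)| = |\sum_{i \in S} \alpha_i| \leq
k\,\|w - \hat w\|_{\infty} \leq 8\epsilon k$. Hence, if $\hat S$ maximizes $h$
over sets of size at most $k$ (take the $k$ largest coefficients $\hat w_i$)
and $S^*$ maximizes $c$, then
\begin{displaymath}
  c(\hat S) \geq h(\hat S) - 8\epsilon k \geq h(S^*) - 8\epsilon k
  \geq c(S^*) - 16\epsilon k,
\end{displaymath}
i.e., optimizing the learned hypothesis $h$ is within an additive
$16\epsilon k$ of the optimum of $c$.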