Diffstat (limited to 'paper/sections/negative.tex')
-rw-r--r--  paper/sections/negative.tex  143
1 file changed, 143 insertions(+), 0 deletions(-)
diff --git a/paper/sections/negative.tex b/paper/sections/negative.tex
new file mode 100644
index 0000000..555aa1d
--- /dev/null
+++ b/paper/sections/negative.tex
@@ -0,0 +1,143 @@
+\subsection{Model}
+
+Let $\Omega$ be the universe of elements and $f$ a function defined on subsets
+of $\Omega$: $f : S \in 2^{\Omega} \mapsto f(S) \in \mathbb{R}$. Let $K$ be a
+collection of subsets of $\Omega$, which we call \emph{constraints}. Let
+$S^*_K$ be any solution to $\max_{S \in K} f(S)$, which we will also denote by
+$S^*$ when there is no ambiguity. Let $L$ be the problem size, which is often
+(but not always) equal to $|\Omega|$.
+
+In general, we say we can efficiently optimize a function $f$ under constraint
+$K$ when we have a polynomial-time algorithm making adaptive value queries to
+$f$, which returns a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$
+with high probability, where $\alpha$ is an absolute constant.
+
+Here, we consider the scenario where we cannot make adaptive value queries, and
+in fact, where we cannot make queries at all! Instead, we suppose that we
+observe a polynomial number of set-value pairs $(S, f(S))$ where $S$ is taken
+from a known distribution $D$. We say we can efficiently \emph{passively
+optimize} $f$ under distribution $D$, or $D$-optimize $f$ under constraint $K$,
+when, after observing ${\cal O}(L^c)$ set-value pairs from $D$, where $c > 0$ is
+an absolute constant, we can return a set $S$ such that $S \in K$ and $f(S)
+\geq \alpha f(S^*)$ with high probability, where $\alpha$ is an absolute constant.
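+
+In symbols (a restatement of the guarantee above; $\mathrm{ALG}$, $\hat S$ and
+$m$ are just shorthand for the passive algorithm, its output, and the number of
+observations):
+\begin{displaymath}
+  \hat S = \mathrm{ALG}\bigl((S_1, f(S_1)), \dots, (S_m, f(S_m))\bigr),
+  \qquad S_i \sim D \text{ i.i.d.},\ m = {\cal O}(L^c),
+\end{displaymath}
+\begin{displaymath}
+  \mathbb{P}\bigl[\hat S \in K \ \text{and} \ f(\hat S) \geq \alpha f(S^*)\bigr] \geq 1 - o(1).
+\end{displaymath}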
+
+In the case of \emph{passive} observations of set-value pairs under a
+distribution $D$ for a function $f$, recent research has focused on whether we
+can efficiently and approximately \emph{learn} $f$. This was formalized in the
+PMAC model from \cite{balcan2011learning}. When thinking about passive
+optimization, it is necessary to understand the link between being able to
+$D$-PMAC learn $f$ and being able to $D$-optimize $f$.
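+
+For reference, we recall the PMAC guarantee informally (see
+\cite{balcan2011learning} for the exact parameters): an algorithm
+$\alpha$-PMAC learns $f$ under $D$ if, given polynomially many pairs
+$(S_i, f(S_i))$ with $S_i \sim D$, it outputs a hypothesis $h$ such that, with
+high probability over the samples,
+\begin{displaymath}
+  \mathbb{P}_{S \sim D}\bigl[h(S) \leq f(S) \leq \alpha\, h(S)\bigr] \geq 1 - \epsilon.
+\end{displaymath}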
+
+\subsection{Learnable $\nRightarrow$ Optimizable}
+
+Consider the following function:
+\begin{displaymath}
+ g(S) = \max\left\{\sqrt{n}|X\cap S|, |S|\right\}
+\end{displaymath}
+where $|X|=\sqrt{n}$. Assume that you are given polynomially many samples where
+each element is included independently with probability $1/2$. Then with high
+probability all the samples have size roughly $n/2$ and $|X \cap S| \approx
+\sqrt{n}/2$, so every observed value satisfies $g(S) = (1+o(1))\,|S|$.
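+
+Quantitatively (standard Chernoff bounds, constants not optimized): with high
+probability, every observed sample $S$ satisfies
+\begin{displaymath}
+  |X \cap S| \leq \frac{\sqrt{n}}{2} + O\bigl(n^{1/4}\sqrt{\log n}\bigr)
+  \quad \text{and} \quad
+  |S| \geq \frac{n}{2} - O\bigl(\sqrt{n \log n}\bigr),
+\end{displaymath}
+so that $\sqrt{n}\,|X \cap S| \leq (1 + o(1))\,|S|$ and hence
+$g(S) = (1 + o(1))\,|S|$.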
+
+\paragraph{Claim 1:} $g$ is PMAC-learnable because if you output $|S|$ then you
+will be correct up to a $(1+o(1))$ factor with high probability if $S$ is drawn
+from the same distribution as above.
+
+\paragraph{Claim 2:} $g$ is not optimizable under budget $\sqrt{n}$ because you
+never learn anything about $X$.
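+
+To quantify this (a back-of-the-envelope estimate, assuming $X$ is uniformly
+random and the returned set $S$ of size $\sqrt{n}$ is chosen independently of
+it):
+\begin{displaymath}
+  \mathbb{E}\bigl[g(S)\bigr]
+  \leq \sqrt{n}\,\mathbb{E}\bigl[|X \cap S|\bigr] + |S|
+  = \sqrt{n}\cdot\frac{|S|\,|X|}{n} + \sqrt{n}
+  = 2\sqrt{n},
+\end{displaymath}
+whereas $g(X) = \sqrt{n}\cdot|X| = n$, so the approximation ratio is
+$O(1/\sqrt{n})$.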
+
+\paragraph{Open Question:} Construct a similar example where $g$ is submodular
+and where the observed sets have the same size as the budget.
+
+\paragraph{Positive results:} Try to obtain guarantees about learning
+parameters of parametric submodular functions and show whether or not these
+guarantees are sufficient for optimization. First look at learning weights in
+a cover function. Maybe facility location? Sums of concave over modular are
+probably too hard because of the connection to neural networks.
+
+\subsection{Optimizable $\nRightarrow$ Learnable}
+
+Recall the matroid construction from~\cite{balcan2011learning}:
+\begin{theorem}
+  For any $k$ with $k = 2^{o(n^{1/3})}$, there exists a family of sets ${\cal A}
+  \subseteq 2^{[n]}$ and a family of matroids ${\cal M} = \{M_{\cal B} :
+  {\cal B} \subseteq {\cal A} \}$ such that:
+  \begin{itemize}
+    \item $|{\cal A}| = k$ and $|A| = n^{1/3}$ for every $A \in {\cal A}$
+    \item For every ${\cal B} \subseteq {\cal A}$ and every $A \in {\cal A}$, we
+      have:
+      \begin{displaymath}
+        \text{rank}_{M_{\cal B}}(A) =
+        \begin{cases}
+          8 \log k & \text{if } A \in {\cal B} \\
+          |A| & \text{if } A \in {\cal A}\backslash{\cal B}
+        \end{cases}
+      \end{displaymath}
+  \end{itemize}
+\end{theorem}
+
+Consider the following subset of the above family of matroids: ${\cal
+M}^{\epsilon} = \{M_{\cal B} : {\cal B} \subseteq{\cal A} \wedge |{\cal
+A}\backslash{\cal B}| \geq \epsilon|{\cal A}|\}$ for $k = 2^{n^{1/6}}$.
+Consider an \emph{unknown} function $f$, corresponding to the rank function of
+one of the matroids $M_{\cal B}$ from ${\cal M}^{\epsilon}$. Note that as long
+as we observe \emph{one} set from ${\cal A} \backslash {\cal B}$, we can
+optimize $f$ exactly! Indeed, since the rank of a set never exceeds its
+cardinality, the largest possible value of $f$ under the cardinality constraint
+$n^{1/3}$ is $n^{1/3}$, and any $A \in {\cal A}\backslash{\cal B}$ attains it:
+$f(A) = |A| = n^{1/3}$.
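+
+Concretely, with $k = 2^{n^{1/6}}$ (and $\log$ taken base $2$), the two
+possible rank values of a set $A \in {\cal A}$ are
+\begin{displaymath}
+  f(A) =
+  \begin{cases}
+    8 \log k = 8\, n^{1/6} & \text{if } A \in {\cal B}, \\
+    |A| = n^{1/3} & \text{if } A \in {\cal A}\backslash{\cal B},
+  \end{cases}
+\end{displaymath}
+so an observed pair $(A, f(A))$ with $A \in {\cal A}$ and $f(A) = n^{1/3}$
+certifies that $A \in {\cal A}\backslash{\cal B}$, and we can simply return $A$.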
+
+One example of a distribution under which this occurs with high probability is
+$D_u$, the uniform distribution over the sets of ${\cal A}$. For $c>0$, after
+$n^c$ observations $O \equiv (S_i, f(S_i))_i$ with $S_i \sim D_u$, the
+probability of observing at least one set from ${\cal A}\backslash{\cal B}$
+satisfies
+\begin{equation} \nonumber
+\mathbb{P}\left(\exists i : S_i \in {\cal A}\backslash{\cal B}\right)
+\geq 1 - (1 - \epsilon)^{n^c} \geq 1 - e^{-\epsilon n^c}.
+\end{equation}
+
+However, it follows from the analysis of~\cite{balcan2011learning} that we
+cannot learn $f$ under any distribution, even with active value queries!
+Indeed, for any polynomially-sized set of observations, there exists a
+super-polynomial number of functions in ${\cal M}^{\epsilon}$ which coincide on
+this set of observations, but which take very different values outside of it:
+$8n^{1/6}$ for $A \in {\cal B}$ and $n^{1/3}$ for $A \in {\cal
+A}\backslash {\cal B}$.
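+
+To give a sense of the counting involved (a partial illustration only, for the
+simple case in which every observed set lies in ${\cal A}$): by the theorem,
+the value $f(A)$ of a set $A \in {\cal A}$ only reveals whether $A \in {\cal
+B}$, so $T = \text{poly}(n)$ observations pin down the membership of at most
+$T$ of the $k = 2^{n^{1/6}}$ sets of ${\cal A}$, leaving at least
+\begin{displaymath}
+  2^{k - T} = 2^{2^{n^{1/6}} - \text{poly}(n)}
+\end{displaymath}
+choices of ${\cal B}$ consistent with the observations (up to discarding those
+violating $|{\cal A}\backslash{\cal B}| \geq \epsilon|{\cal A}|$).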
+
+{\bf TODO:} A cleaner, simpler example would be nice.
+
+\subsection{Which learning guarantees imply optimization?}
+
+Here, we consider the following question: which learning guarantees imply
+optimization? For example, \cite{TODO} provides the following guarantee for
+cover functions:
+
+\begin{theorem}
+There exists an algorithm which, for any $\epsilon>0$, given random and uniform
+examples of a coverage function $c$, outputs a hypothesis $h$, which is also a
+coverage function, such that with probability $2/3$: $\mathbb{E}_{x \sim {\cal
+U}}[|h(x) - c(x)|] \leq \epsilon$. The algorithm runs in time $\tilde {\cal O}(n)
+\cdot \text{poly}(s/\epsilon)$ and uses $\log(n)\cdot \text{poly}(s/\epsilon)$
+examples, where $s = \min\{\text{size}(c), (1/\epsilon)^{\log(1/\epsilon)}\}$.
+\end{theorem}
+
+We would like to understand to what extent this $\ell_1$-bound allows us to
+optimize the coverage function $c$ under cardinality constraints using the
+hypothesis $h$. Here and below we identify a set $S \subseteq [n]$ with its
+indicator vector $x \in \{0,1\}^n$, and ${\cal U}$ denotes the uniform
+distribution over $\{0,1\}^n$. Let us consider the simpler case of an additive
+function, and suppose we have a similar guarantee: $\mathbb{E}_{x \sim {\cal
+U}}[|h(x) - c(x)|] \leq \epsilon$, where $\forall x, c(x) \equiv \sum_i w_i x_i$
+and $h(x) \equiv \sum_i \hat w_i x_i$. Can we find a bound on
+$\|w - \hat w\|_{\infty}$?
+
+As it turns out, yes. For all $i$, let $\alpha_i \equiv w_i - \hat w_i$ and let
+$v(x) \equiv |h(x) - c(x)| = |\sum_{i} \alpha_i x_i |$. Consider the collection
+of good sets ${\cal G} \equiv \{S : v(S) \leq 4\epsilon\}$. We claim that
+$|{\cal G}| \geq \frac{3}{4}\cdot2^n$. Suppose the contrary: then more than a
+quarter of the sets have value $v(S) > 4\epsilon$, so that $\mathbb{E}_{x \sim
+{\cal U}}[v(x)] \geq \frac{1}{2^n}\sum_{S \in {\cal G}^c} v(S) >
+\frac{1}{4}\cdot4\epsilon = \epsilon$, which is a contradiction. Now consider an
+element $i$ such that $|\alpha_i| = \|\alpha\|_{\infty}$, and the collection of
+sets which contain $i$: ${\cal S}_i \equiv \{S : i \in S\}$. Notice that
+$|{\cal S}_i| = |{\cal S}_i^c| = 2^{n-1}$, and therefore $|{\cal G} \cap {\cal
+S}_i^c| \geq \frac{1}{4}\cdot 2^n$. For every set $S$ in ${\cal G} \cap {\cal
+S}_i^c$, $v(S\cup\{i\}) \geq |\alpha_i| - v(S) \geq |\alpha_i| - 4\epsilon$, and
+the sets $S \cup \{i\}$ are all distinct. It follows that $\mathbb{E}_{x \sim
+{\cal U}}[v(x)] \geq \frac{1}{4}(|\alpha_i| - 4\epsilon)$; combined with
+$\mathbb{E}_{x \sim {\cal U}}[v(x)] \leq \epsilon$, this gives
+$\|w - \hat w\|_{\infty} = |\alpha_i| \leq 8 \epsilon$.
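+
+For intuition, such an $\ell_{\infty}$ bound does translate into an
+optimization guarantee, at least in this additive case (a sketch): under a
+cardinality constraint $|S| \leq B$, we have $|c(S) - h(S)| \leq \|w - \hat
+w\|_{\infty}\,|S| \leq 8\epsilon B$ for every feasible $S$, so if $\hat S$
+maximizes $h$ and $S^*$ maximizes $c$ over this constraint, then
+\begin{displaymath}
+  c(\hat S) \geq h(\hat S) - 8\epsilon B \geq h(S^*) - 8\epsilon B \geq c(S^*) - 16\epsilon B,
+\end{displaymath}
+i.e., maximizing the hypothesis $h$ is optimal for $c$ up to an additive error
+of $16\epsilon B$.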
+
+
+