\subsection{Learnable $\nRightarrow$ Optimizable}

Consider the following function:
\begin{displaymath}
g(S) = \max\left\{\sqrt{n}\,|X\cap S|, |S|\right\}
\end{displaymath}
where $|X|=\sqrt{n}$. Assume that you are given polynomially many samples where each element is included independently with probability $1/2$. Then with high probability all the samples will have size roughly $n/2$, so you will observe $g(S)=|S|$ with high probability.

\paragraph{Claim 1:} $g$ is PAC-learnable because if you output $|S|$ then you will be correct with high probability if $S$ is drawn from the same distribution as above.

\paragraph{Claim 2:} $g$ is not optimizable under budget $\sqrt{n}$ because you never learn anything about $X$.

\paragraph{Open Question:} Cook up a similar example where $g$ is submodular and where you are observing sets of the same size as your budget.

\paragraph{Positive results:} Try to obtain guarantees for learning the parameters of parametric submodular functions and show whether or not these guarantees are sufficient for optimization. First look at learning weights in a cover function. Maybe facility location? Sums of concave over modular are probably too hard because of the connection to neural networks.

\subsection{Optimizable $\nRightarrow$ Learnable}

Recall the matroid construction from~\cite{balcan2011learning}:
\begin{theorem}
For any $k$ with $k = 2^{o(n^{1/3})}$, there exists a family of sets ${\cal A} \subseteq 2^{[n]}$ and a family of matroids ${\cal M} = \{M_{\cal B} : {\cal B} \subseteq {\cal A}\}$ such that:
\begin{itemize}
\item $|{\cal A}| = k$ and $|A| = n^{1/3}$ for every $A \in {\cal A}$
\item For every ${\cal B} \subseteq {\cal A}$ and every $A \in {\cal A}$, we have:
\begin{displaymath}
\text{rank}_{M_{\cal B}}(A) =
\begin{cases}
8 \log k & \text{if } A \in {\cal B} \\
|A| & \text{if } A \in {\cal A}\backslash{\cal B}
\end{cases}
\end{displaymath}
\end{itemize}
\end{theorem}

Consider the following subset of the above family of matroids: ${\cal M}^{\epsilon} = \{M_{\cal B} : {\cal B} \subseteq {\cal A} \wedge |{\cal A}\backslash{\cal B}| \geq \epsilon|{\cal A}|\}$ for $k = 2^{n^{1/6}}$. Consider an \emph{unknown} function $f$, corresponding to the rank function of one of the matroids $M_{\cal B}$ from ${\cal M}^{\epsilon}$. Note that as long as we observe \emph{one} set from ${\cal A} \backslash {\cal B}$, we can optimize $f$ exactly! Indeed, under cardinality constraint $n^{1/3}$ the value of $f$ never exceeds $n^{1/3}$ (the rank of a set is at most its cardinality), and any observed $A \in {\cal A}\backslash{\cal B}$ attains this value since $f(A) = |A| = n^{1/3}$.

One example of a distribution under which this occurs with probability at least a constant is $D_u$, the uniform distribution over the sets of ${\cal A}$. For $c>0$, after $n^c$ observations $O \equiv (S_i, f(S_i))_i$ with $S_i \sim D_u$, we observe at least one set from ${\cal A}\backslash{\cal B}$ with constant probability:
\begin{equation} \nonumber
\mathbb{P}\left(\exists i : S_i \in {\cal A}\backslash{\cal B}\right) \geq 1 - (1 - \epsilon)^{n^c} \geq \epsilon.
\end{equation}
However, it follows from the analysis of~\cite{balcan2011learning} that we cannot learn $f$ under any distribution, even with active value queries! Indeed, for any polynomially-sized set of observations, there exists a super-polynomial number of functions in ${\cal M}^{\epsilon}$ which coincide on this set of observations, but which take very different values outside of it: $8n^{1/6}$ for $A \in {\cal B}$ and $n^{1/3}$ for $A \in {\cal A}\backslash {\cal B}$.

{\bf TODO:} A cleaner, simpler example would be nice.
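\paragraph{Numerical sketch:} The following is a minimal Python simulation of the function $g$ from the first subsection above; the parameters ($n = 10^6$, the number of samples, the random seed) are arbitrary choices for illustration only. Uniform samples look essentially like the modular function $|S|$, while under budget $\sqrt{n}$ the hidden optimum $g(X) = n$ is a factor $\sqrt{n}$ larger than what a random feasible set achieves.
\begin{verbatim}
import numpy as np

# Illustration only: g(S) = max(sqrt(n) * |X intersect S|, |S|), |X| = sqrt(n).
rng = np.random.default_rng(0)
n = 1_000_000
r = int(np.sqrt(n))                        # sqrt(n), also the budget
X = rng.choice(n, size=r, replace=False)   # hidden set X

def g(S):                                  # S is a 0/1 indicator vector
    return max(r * S[X].sum(), S.sum())

# Samples where each element is included independently with probability 1/2:
# both g(S) and |S| concentrate around n/2, revealing little about X.
for _ in range(5):
    S = rng.random(n) < 0.5
    print(g(S), S.sum())

# Under budget sqrt(n): the optimum is S = X with g(X) = n, while a random
# feasible set of size sqrt(n) only achieves a value of order sqrt(n).
X_ind = np.zeros(n, dtype=bool); X_ind[X] = True
R = rng.choice(n, size=r, replace=False)   # a random feasible set
R_ind = np.zeros(n, dtype=bool); R_ind[R] = True
print(g(X_ind), g(R_ind))
\end{verbatim}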
\subsection{Which learning guarantees imply optimization?}

Here, we consider the following question: which learning guarantees imply optimization? For example, \cite{TODO} provides the following guarantee for coverage functions:
\begin{theorem}
There exists an algorithm such that, for any $\epsilon>0$, given random and uniform examples of a coverage function $c$, it outputs a hypothesis $h$, which is also a coverage function, such that with probability $2/3$: $\mathbb{E}_{x \sim {\cal U}}[|h(x) - c(x)|] \leq \epsilon$. The algorithm runs in time $\tilde {\cal O}(n) \cdot \text{poly}(s/\epsilon)$ and uses $\log(n)\cdot \text{poly}(s/\epsilon)$ examples, where $s = \min\{\text{size}(c), (1/\epsilon)^{\log(1/\epsilon)}\}$.
\end{theorem}

We would like to understand to what extent this $\ell_1$-bound allows us to optimize the coverage function $c$ under cardinality constraints using the hypothesis $h$.

Let us consider the simpler case of an additive function, and suppose we have a similar guarantee: $\mathbb{E}_{x \sim {\cal U}}[|h(x) - c(x)|] \leq \epsilon$, where $\forall x$, $c(x) \equiv \sum_i w_i x_i$ and $h(x) \equiv \sum_i \hat w_i x_i$. Can we find a bound on $\|w - \hat w\|_{\infty}$? As it turns out, yes.

Let $\alpha_i \equiv w_i - \hat w_i$ for all $i$, and let $v(x) \equiv |h(x) - c(x)| = |\sum_{i} \alpha_i x_i|$. Consider the collection of good sets ${\cal G} \equiv \{S : v(S) < 4\epsilon\}$. We claim that $|{\cal G}| \geq \frac{3}{4}\cdot 2^n$. Suppose the contrary: then more than a quarter of the sets have value $v(S) \geq 4\epsilon$, so that $\mathbb{E}_{x \sim {\cal U}}[v(x)] \geq \frac{1}{2^n}\sum_{S \in {\cal G}^c} v(S) > \frac{1}{4}\cdot 4\epsilon = \epsilon$, which is a contradiction.

Now consider an element $i$ such that $|\alpha_i| = \|\alpha\|_{\infty}$, and the collection of sets which contain $i$: ${\cal S}_i \equiv \{S : i \in S\}$. Notice that $|{\cal S}_i| = |{\cal S}_i^c| = 2^{n-1}$. Therefore, $|{\cal G} \cap {\cal S}_i^c| \geq |{\cal G}| - |{\cal S}_i| \geq \frac{1}{4}\cdot 2^n$. For every set $S$ in ${\cal G} \cap {\cal S}_i^c$, we have $v(S\cup\{i\}) \geq |\alpha_i| - v(S) \geq |\alpha_i| - 4\epsilon$. Since these sets $S\cup\{i\}$ are distinct, it follows that $\mathbb{E}_{x \sim {\cal U}}[v(x)] \geq \frac{1}{4}\left(|\alpha_i| - 4\epsilon\right)$; combined with $\mathbb{E}_{x \sim {\cal U}}[v(x)] \leq \epsilon$, this gives $\|w - \hat w\|_{\infty} \leq 8 \epsilon$.
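\paragraph{Sanity check:} The following is a minimal Python check of the additive-function argument above; the weights, perturbation scale, and number of samples are arbitrary choices for illustration only. We estimate $\mathbb{E}_{x \sim {\cal U}}[|h(x)-c(x)|]$ by sampling and verify that $\|w - \hat w\|_{\infty}$ is indeed below eight times this estimate.
\begin{verbatim}
import numpy as np

# Illustration only: additive c(x) = sum_i w_i x_i and learned h(x).
rng = np.random.default_rng(1)
n = 30
w = rng.random(n)                            # true weights of c
w_hat = w + rng.normal(scale=0.01, size=n)   # perturbed (learned) weights of h
alpha = w - w_hat

# Uniform random subsets of [n], encoded as 0/1 vectors.
x = (rng.random((200_000, n)) < 0.5).astype(float)
eps_hat = np.abs(x @ alpha).mean()           # estimate of E[|h(x) - c(x)|]

print(np.abs(alpha).max())                   # ||w - w_hat||_inf
print(8 * eps_hat)                           # the claimed upper bound
\end{verbatim}
With the Gaussian perturbation above the bound is quite loose; the check is only meant as a sanity check of the inequality, not of its tightness.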