Let $\Omega$ be the universe of elements and $f$ a function defined on subsets of $\Omega$: $f : S \in 2^{[\Omega]} \mapsto f(S) \in \mathbb{R}$. Let $K$ be a collection of sets of $2^{[\Omega]}$, which we call \emph{constraints}. Let $S^*_K$ be any solution to $\max_{S \in K} f(S)$, which we will also denote by $S^*$ when there is no ambiguity. Let $L$ be the problem size, which is often (but not always) equal to $|\Omega|$. In general, we say we can efficiently optimize a function $f$ under constraint $K$ when we have a polynomial-time algorithm making adaptive value queries to $f$,which returns a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$ with high probability and $\alpha$ an absolute constant. Here, we consider the scenario where we cannot make adaptive value queries, and in fact, where we cannot make queries at all! Instead, we suppose that we observe a polynomial number of set-value pairs $(S, f(S))$ where $S$ is taken from a known distribution $D$. We say we can efficiently \emph{passively optimize} $f$ under distribution $D$ or $D-$optimize $f$ under constraints $K$ when, after observing ${\cal O}(L^c)$ set-value pairs from $D$ where $c > 0$ is an absolute constant, we can return a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$ with high probability and $\alpha$ an absolute constant. In the case of \emph{passive} observations of set-value pairs under a distribution $D$ for a function $f$, recent research has focused on whether we can efficiently and approximately \emph{learn} $f$. This was formalized in the PMAC model from \cite{balcan2011learning}. When thinking about passive optimization, it is necessary to understand the link between being able to $D-PMAC$ learn $f$ and being able to $D-$optimize $f$.