some reorganizing

author: jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-03-20 14:25:16 -0400
committer: jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-03-20 14:25:16 -0400
commit: 5d0edd895fe1f2aa3ec7c0f805cb504ffec41573 (patch)
tree: 74a6f928e7a014977091bf60c0502e36693dc521 /results.tex
parent: 0965d21907aaa81c7b0e11a2abae2746d7fe0d0c (diff)
download: learn-optimize-5d0edd895fe1f2aa3ec7c0f805cb504ffec41573.tar.gz
1 files changed, 35 insertions, 17 deletions
diff --git a/results.tex b/results.tex
index e167a00..bae1694 100644
--- a/results.tex
+++ b/results.tex
@@ -116,33 +116,51 @@ results about the probability of the adjacency matrix to be non singular.
 
 \section{Passive Optimization}
 
-In general, we say we can efficiently optimize a function $f$ under constraints
+Let $\Omega$ be the universe of elements and $f$ a function defined on subsets
+of $\Omega$: $f : S \in 2^{[\Omega]} \mapsto f(S) \in \mathbb{R}$. Let $K$ be a
+collection of sets of $2^{[\Omega]}$, which we call \emph{constraints}. Let
+$S^*_K$ be any solution to $\max_{S \in K} f(S)$, which we will also denote by
+$S^*$ when there is no ambiguity. Let $L$ be the problem size, which is often
+(but not always) equal to $|\Omega|$.
+
+In general, we say we can efficiently optimize a function $f$ under constraint
 $K$ when we have a polynomial-time algorithm making adaptive value queries to
-$f$,which returns a set $S$ such that:  $S \in K$ and $f(S) \geq \alpha f(S^*)$
-with high probability, where $\alpha$ is an absolute constant and $S^* \in \arg
-\max_{S \in K}$.
+$f$,which returns a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$
+with high probability and $\alpha$ an absolute constant.
 
 Here, we consider the scenario where we cannot make adaptive value queries, and
 in fact, where we cannot make queries at all! Instead, we suppose that we
 observe a polynomial number of set-value pairs $(S, f(S))$ where $S$ is taken
-from a known distribution $D$. Let $L$ be the problem size. We say we can
-efficiently \emph{passively optimize} $f$ under distribution $D$ and
-constraints $K$ when, after observing ${\cal O}(L^c)$ set-value pairs from $D$
-where $c > 0$ is an absolute constant, we can return a set $S$ such that:  $S
-\in K$ and $f(S) \geq \alpha f(S^*)$ with high probability, where $\alpha$ is
-an absolute constant and $S^* \in \arg \max_{S \in K} f(S)$.
+from a known distribution $D$. We say we can efficiently \emph{passively
+optimize} $f$ under distribution $D$ or $D-$optimize $f$ under constraints $K$
+when, after observing ${\cal O}(L^c)$ set-value pairs from $D$ where $c > 0$ is
+an absolute constant, we can return a set $S$ such that $S \in K$ and $f(S)
+\geq \alpha f(S^*)$ with high probability and $\alpha$ an absolute constant.
+
+In the case of \emph{passive} observations of set-value pairs under a
+distribution $D$ for a function $f$, recent research has focused on whether we
+can efficiently and approximately \emph{learn} $f$. This was formalized in the
+{\cal PMAC} model from \cite{balcan2011learning}. When thinking about passive
+optimization, it is necessary to understand the link between being able to
+learn $D-{\cal PMAC}$ $f$ and being able to $D-$optimize $f$.
 
 \subsection{Additive function}
 
+We consider here the simple case of additive functions. A function $f$ is
+additive if there exists a weight vector $(w_s)_{s \in \Omega}$ such that
+$\forall S \subseteq \Omega, \ f(S) = \sum_{s \in S} w_s$. We will sometimes
+adopt the notation $w \equiv f(\Omega)$.
+
 \subsubsection{Inverting the system}\label{subsec:invert_the_system}
 
-Note that for observations ${(S_j, f(S_j))}_{j \in N}$ we can write the system $A
-w = b$, where each row of $A_j$ corresponds to the indicator vector of the set
-$S_j$ and $b_j \equiv f(S_j)$. From Corollary~\ref{cor:number_of_rows_needed},
-it is easy to see that ${(C_K+1)}^2 \frac{n^3}{k(n - k)}$ rows are sufficient for
-$A$ to have full row rank with probability $1-e^{-c_k n}$. If $A$ has full row
-rank, then it is easy to invert the system in polynomial time, and both learn
-and optimize $f$ for any cardinality constraint.
+Note that for observations ${(S_j, f(S_j))}_{j \in N}$ we can write the system
+$A w = b$, where each row of $A_j$ corresponds to the indicator vector of the
+set $S_j$ and $b_j \equiv f(S_j)$. From
+Corollary~\ref{cor:number_of_rows_needed}, it is easy to see that ${(C_K+1)}^2
+\frac{n^3}{k(n - k)}$ rows are sufficient for $A$ to have full row rank with
+probability $1-e^{-c_k n}$. If $A$ has full row rank, then it is easy to invert
+the system in polynomial time, and both learn and optimize $f$ for any
+cardinality constraint.
 
 \begin{proposition} Assume that $N$ pairs of set-value ${(S_j, f(S_j))}_{j \in
   N}$ are observed, where $S_j \sim D_p$ and $p \equiv \frac{k}{n}$. If $N >
author	jeanpouget-abadie <jean.pougetabadie@gmail.com>	2015-03-20 14:25:16 -0400
committer	jeanpouget-abadie <jean.pougetabadie@gmail.com>	2015-03-20 14:25:16 -0400
commit	5d0edd895fe1f2aa3ec7c0f805cb504ffec41573 (patch)
tree	74a6f928e7a014977091bf60c0502e36693dc521 /results.tex
parent	0965d21907aaa81c7b0e11a2abae2746d7fe0d0c (diff)
download	learn-optimize-5d0edd895fe1f2aa3ec7c0f805cb504ffec41573.tar.gz