From d8a68c6917f5b6053117e0145f6d4d80a8bec26b Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Tue, 14 Apr 2015 15:20:19 -0400
Subject: Starting paper draft

---
 paper/sections/negative.tex | 113 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)
 create mode 100644 paper/sections/negative.tex

(limited to 'paper/sections/negative.tex')

diff --git a/paper/sections/negative.tex b/paper/sections/negative.tex
new file mode 100644
index 0000000..f88117a
--- /dev/null
+++ b/paper/sections/negative.tex
@@ -0,0 +1,113 @@
+\subsection{Learnable $\nRightarrow$ Optimizable}
+
+Consider the following function:
+\begin{displaymath}
+  g(S) = \max\left\{\sqrt{n}|X\cap S|, |S|\right\}
+\end{displaymath}
+where $|X|=\sqrt{n}$. Assume that you are given polynomially many samples where
+each element is included with probability $1/2$. Then with high probability all
+the samples will have size roughly $n/2$, so you will observe $g(S)=|S|$ with
+high probability.
+
+\paragraph{Claim 1:} $g$ is PAC-learnable because if you output $|S|$ then you
+will be correct with high probability if $S$ is drawn from the same
+distribution as above.
+
+\paragraph{Claim 2:} $g$ is not optimizable under budget $\sqrt{n}$ because you
+never learn anything about $X$.
+
+\paragraph{Open Question:} Construct a similar example where $g$ is submodular
+and where you are observing sets of the same size as your budget.
+
+\paragraph{Positive results:} Try to obtain guarantees about learning
+parameters of parametric submodular functions and show whether or not these
+guarantees are sufficient for optimization. First look at learning weights in
+a cover function. Maybe facility location? Sums of concave over modular are
+probably too hard because of the connection to neural networks.
+
+\subsection{Optimizable $n\Rightarrow$ Learnable}
+
+Recall the matroid construction from~\cite{balcan2011learning}:
+\begin{theorem}
+  For any $k$ with $k = 2^{o(n^{1/3})}$, there exists a family of sets
+  ${\cal A} \subseteq 2^{[n]}$ and a family of matroids
+  ${\cal M} = \{M_{\cal B} : {\cal B} \subseteq {\cal A}\}$ such that:
+  \begin{itemize}
+  \item $|{\cal A}| = k$ and $|A| = n^{1/3}$ for every $A \in {\cal A}$
+  \item For every ${\cal B} \subseteq {\cal A}$ and every $A \in {\cal A}$, we
+    have:
+    \begin{align*}
+      \text{rank}_{M_{\cal B}}(A) & = 8 \log k & & \text{if } A \in {\cal B} \\
+      & = |A| & & \text{if } A \in {\cal A}\backslash{\cal B}
+    \end{align*}
+  \end{itemize}
+\end{theorem}
+
+Consider the following subset of the above family of matroids: ${\cal
+M}^{\epsilon} = \{M_{\cal B} : {\cal B} \subseteq {\cal A} \wedge |{\cal
+A}\backslash{\cal B}| \geq \epsilon|{\cal A}|\}$ for $k = 2^{n^{1/6}}$.
+Consider an \emph{unknown} function $f$, corresponding to the rank function of
+one of the matroids $M_{\cal B}$ from ${\cal M}^{\epsilon}$. Note that as long
+as we observe \emph{one} set from ${\cal A} \backslash {\cal B}$, we can
+optimize $f$ exactly! Indeed, since $f(S) \leq |S|$ for every set $S$, the
+largest possible value for $f$ under cardinality constraint $n^{1/3}$ is
+$n^{1/3}$.
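+Concretely, a sample $S_i \in {\cal A}\backslash{\cal B}$ achieves
+\begin{displaymath}
+  f(S_i) = |S_i| = n^{1/3} = \max_{S : |S| \leq n^{1/3}} f(S),
+\end{displaymath}
+so it suffices to return $S_i$. Moreover, among samples drawn from ${\cal A}$,
+such a set is easy to recognize: it is exactly a sample with $f(S_i) = |S_i|$,
+since a sample $S_i \in {\cal B}$ has $f(S_i) = 8\log k = 8n^{1/6} < n^{1/3}$.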
+
+One example of a distribution under which this occurs with probability at least
+a constant is $D_u$, the uniform distribution over all sets of ${\cal A}$. For
+$c>0$, after $n^c$ observations $O \equiv (S_i, f(S_i))_i$ for $S_i \sim D_u$,
+we will observe at least one set from ${\cal A}\backslash{\cal B}$ with
+constant probability:
+
+\begin{equation} \nonumber
+  \mathbb{P}\bigl(\exists i : S_i \in {\cal A}\backslash{\cal B}\bigr)
+  \geq 1 - (1 - \epsilon)^{n^c} \geq 1 - e^{-\epsilon n^c}.
+\end{equation}
+
+However, it follows from the analysis of~\cite{balcan2011learning} that we
+cannot learn $f$ under any distribution, even with active value queries!
+Indeed, for any polynomially-sized set of observations, there is a
+super-polynomial number of functions in ${\cal M}^{\epsilon}$ which coincide on
+this set of observations, but which take very different values outside of this
+set of observations: $8n^{1/6}$ for $A \in {\cal B}$ and $n^{1/3}$ for $A \in
+{\cal A}\backslash {\cal B}$.
+
+{\bf TODO:} A cleaner simpler example would be nice.
+
+\subsection{Which learning guarantees imply optimization?}
+
+Here, we consider the following question: which learning guarantees imply
+optimization? For example, \cite{TODO} provides the following guarantee for
+coverage functions:
+
+\begin{theorem}
+There exists an algorithm which, for any $\epsilon>0$, given random and
+uniform examples of a coverage function $c$, outputs a hypothesis $h$ which is
+also a coverage function, such that with probability $2/3$: $\mathbb{E}_{\cal
+U}[|h(x) - c(x)|] \leq \epsilon$. The algorithm runs in time $\tilde {\cal O}(n)
+\cdot \text{poly}(s/\epsilon)$ and uses $\log(n)\cdot \text{poly}(s/\epsilon)$
+examples, where $s = \min\{\text{size}(c), (1/\epsilon)^{\log(1/\epsilon)}\}$.
+\end{theorem}
+
+We would like to understand to what extent this $\ell_1$-bound allows us to
+optimize the coverage function $c$ under cardinality constraints using the
+hypothesis $h$. Let us consider the simpler case of an additive function, and
+suppose we have a similar guarantee: $\mathbb{E}_{x \sim {\cal U}}[|h(x) -
+c(x)|] \leq \epsilon$, where $\forall x, c(x) \equiv \sum_i w_i x_i$ and
+$h(x) \equiv \sum_i \hat w_i x_i$. Can we find a bound on
+$\|w - \hat w\|_{\infty}$?
+
+As it turns out, yes. Let $\forall i, w_i - \hat w_i \equiv \alpha_i$ and let
+$v(x) \equiv |h(x) - c(x)| = |\sum_{i} \alpha_i x_i |$. Consider the collection
+of good sets ${\cal G} \equiv \{S : v(S) < 4\epsilon\}$. We claim that
+$|{\cal G}| \geq \frac{3}{4}\cdot2^n$. Suppose the contrary: then more than
+a quarter of the sets have value $v(S) \geq 4\epsilon$, so that
+$\mathbb{E}_{x \sim {\cal U}}[|v(x)|] \geq \frac{1}{2^n}\sum_{S \in {\cal G}^c}
+|v(S)| > \frac{1}{4}\cdot4\epsilon = \epsilon$, which contradicts the
+guarantee. Consider an element $i$ such that $|\alpha_i| = \|\alpha\|_{\infty}$
+and the collection of sets which contain $i$: ${\cal S}_i \equiv \{S : i \in
+S\}$. Notice that $|{\cal S}_i| = |{\cal S}_i^c| = 2^{n-1}$ and
+$|{\cal G}^c| \leq \frac{1}{4}\cdot 2^n$, so that $|{\cal G} \cap {\cal S}_i^c|
+\geq 2^{n-1} - \frac{1}{4}\cdot 2^n = \frac{1}{4}\cdot 2^n$. For all sets $S$
+in ${\cal G} \cap {\cal S}_i^c$, $v(S\cup\{i\}) \geq |\alpha_i| - v(S) >
+|\alpha_i| - 4\epsilon$. It follows that
+$\mathbb{E}_{x \sim {\cal U}}[|v(x)|] \geq \frac{1}{4}(|\alpha_i| - 4\epsilon)$,
+and since this expectation is at most $\epsilon$ we obtain
+$\|w - \hat w\|_{\infty} \leq 8 \epsilon$.
+
-- cgit v1.2.3-70-g09d2


From 90211ba1847251c552361a7d0b2eef1c540ef72a Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Tue, 14 Apr 2015 15:27:24 -0400
Subject: Nips format and minor tweaks.
We are already close to the page limit… MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- paper/main.tex | 10 +- paper/nips15submit_e.sty | 236 ++++++++++++++++++++++++++++++++++++++++++++ paper/sections/negative.tex | 2 +- 3 files changed, 244 insertions(+), 4 deletions(-) create mode 100644 paper/nips15submit_e.sty (limited to 'paper/sections/negative.tex') diff --git a/paper/main.tex b/paper/main.tex index 58b6b22..0b33bdd 100644 --- a/paper/main.tex +++ b/paper/main.tex @@ -1,6 +1,11 @@ \documentclass[10pt]{article} -\usepackage{fullpage, amsmath, amssymb, amsthm, bbm} +\usepackage{nips15submit_e} +\usepackage{amsmath, amssymb, amsthm, bbm, microtype} \usepackage[utf8x]{inputenc} +\usepackage[capitalize, noabbrev]{cleveref} +\usepackage[pagebackref=false,breaklinks=true,colorlinks=true]{hyperref} +\usepackage{natbib} +\usepackage{url} \DeclareMathOperator{\E}{\mathbb{E}} \let\P\relax @@ -11,7 +16,6 @@ \newcommand{\ints}{\mathbb{N}} \renewcommand{\O}{\mathcal{O}} - \newtheorem{proposition}{Proposition} \newtheorem{corollary}{Corollary} \newtheorem{problem}{Problem} @@ -35,7 +39,7 @@ \section{Positive results} \input{sections/positive.tex} -\bibliographystyle{apalike} +\bibliographystyle{abbrv} \bibliography{optimize} diff --git a/paper/nips15submit_e.sty b/paper/nips15submit_e.sty new file mode 100644 index 0000000..75dc8a8 --- /dev/null +++ b/paper/nips15submit_e.sty @@ -0,0 +1,236 @@ +%%%% NIPS Macros (LaTex) +%%%% Style File +%%%% Dec 12, 1990 Rev Aug 14, 1991; Sept, 1995; April, 1997; April, 1999 + +% This file can be used with Latex2e whether running in main mode, or +% 2.09 compatibility mode. +% +% If using main mode, you need to include the commands +% \documentclass{article} +% \usepackage{nips10submit_e,times} +% as the first lines in your document. Or, if you do not have Times +% Roman font available, you can just use +% \documentclass{article} +% \usepackage{nips10submit_e} +% instead. +% +% If using 2.09 compatibility mode, you need to include the command +% \documentstyle[nips10submit_09,times]{article} +% as the first line in your document. Or, if you do not have Times +% Roman font available, you can include the command +% \documentstyle[nips10submit_09]{article} +% instead. + +% Change the overall width of the page. If these parameters are +% changed, they will require corresponding changes in the +% maketitle section. +% +\usepackage{eso-pic} % used by \AddToShipoutPicture + +\renewcommand{\topfraction}{0.95} % let figure take up nearly whole page +\renewcommand{\textfraction}{0.05} % let figure take up nearly whole page + +% Define nipsfinal, set to true if nipsfinalcopy is defined +\newif\ifnipsfinal +\nipsfinalfalse +\def\nipsfinalcopy{\nipsfinaltrue} +\font\nipstenhv = phvb at 8pt % *** IF THIS FAILS, SEE nips10submit_e.sty *** + +% Specify the dimensions of each page + +\setlength{\paperheight}{11in} +\setlength{\paperwidth}{8.5in} + +\oddsidemargin .5in % Note \oddsidemargin = \evensidemargin +\evensidemargin .5in +\marginparwidth 0.07 true in +%\marginparwidth 0.75 true in +%\topmargin 0 true pt % Nominal distance from top of page to top of +%\topmargin 0.125in +\topmargin -0.625in +\addtolength{\headsep}{0.25in} +\textheight 9.0 true in % Height of text (including footnotes & figures) +\textwidth 5.5 true in % Width of text line. 
+\widowpenalty=10000 +\clubpenalty=10000 + +% \thispagestyle{empty} \pagestyle{empty} +\flushbottom \sloppy + +% We're never going to need a table of contents, so just flush it to +% save space --- suggested by drstrip@sandia-2 +\def\addcontentsline#1#2#3{} + +% Title stuff, taken from deproc. +\def\maketitle{\par +\begingroup + \def\thefootnote{\fnsymbol{footnote}} + \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}} % for perfect author + % name centering +% The footnote-mark was overlapping the footnote-text, +% added the following to fix this problem (MK) + \long\def\@makefntext##1{\parindent 1em\noindent + \hbox to1.8em{\hss $\m@th ^{\@thefnmark}$}##1} + \@maketitle \@thanks +\endgroup +\setcounter{footnote}{0} +\let\maketitle\relax \let\@maketitle\relax +\gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax} + +% The toptitlebar has been raised to top-justify the first page + +% Title (includes both anonimized and non-anonimized versions) +\def\@maketitle{\vbox{\hsize\textwidth +\linewidth\hsize \vskip 0.1in \toptitlebar \centering +{\LARGE\bf \@title\par} \bottomtitlebar % \vskip 0.1in % minus +\ifnipsfinal + \def\And{\end{tabular}\hfil\linebreak[0]\hfil + \begin{tabular}[t]{c}\bf\rule{\z@}{24pt}\ignorespaces}% + \def\AND{\end{tabular}\hfil\linebreak[4]\hfil + \begin{tabular}[t]{c}\bf\rule{\z@}{24pt}\ignorespaces}% + \begin{tabular}[t]{c}\bf\rule{\z@}{24pt}\@author\end{tabular}% +\else + \begin{tabular}[t]{c}\bf\rule{\z@}{24pt} +Anonymous Author(s) \\ +Affiliation \\ +Address \\ +\texttt{email} \\ +\end{tabular}% +\fi +\vskip 0.3in minus 0.1in}} + +\renewenvironment{abstract}{\vskip.075in\centerline{\large\bf +Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex} + +% sections with less space +\def\section{\@startsection {section}{1}{\z@}{-2.0ex plus + -0.5ex minus -.2ex}{1.5ex plus 0.3ex +minus0.2ex}{\large\bf\raggedright}} + +\def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus +-0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\bf\raggedright}} +\def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex +plus -0.5ex minus -.2ex}{0.5ex plus +.2ex}{\normalsize\bf\raggedright}} +\def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus +0.5ex minus .2ex}{-1em}{\normalsize\bf}} +\def\subparagraph{\@startsection{subparagraph}{5}{\z@}{1.5ex plus + 0.5ex minus .2ex}{-1em}{\normalsize\bf}} +\def\subsubsubsection{\vskip +5pt{\noindent\normalsize\rm\raggedright}} + + +% Footnotes +\footnotesep 6.65pt % +\skip\footins 9pt plus 4pt minus 2pt +\def\footnoterule{\kern-3pt \hrule width 12pc \kern 2.6pt } +\setcounter{footnote}{0} + +% Lists and paragraphs +\parindent 0pt +\topsep 4pt plus 1pt minus 2pt +\partopsep 1pt plus 0.5pt minus 0.5pt +\itemsep 2pt plus 1pt minus 0.5pt +\parsep 2pt plus 1pt minus 0.5pt +\parskip .5pc + + +%\leftmargin2em +\leftmargin3pc +\leftmargini\leftmargin \leftmarginii 2em +\leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em + +%\labelsep \labelsep 5pt + +\def\@listi{\leftmargin\leftmargini} +\def\@listii{\leftmargin\leftmarginii + \labelwidth\leftmarginii\advance\labelwidth-\labelsep + \topsep 2pt plus 1pt minus 0.5pt + \parsep 1pt plus 0.5pt minus 0.5pt + \itemsep \parsep} +\def\@listiii{\leftmargin\leftmarginiii + \labelwidth\leftmarginiii\advance\labelwidth-\labelsep + \topsep 1pt plus 0.5pt minus 0.5pt + \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt + \itemsep \topsep} +\def\@listiv{\leftmargin\leftmarginiv + \labelwidth\leftmarginiv\advance\labelwidth-\labelsep} +\def\@listv{\leftmargin\leftmarginv + 
\labelwidth\leftmarginv\advance\labelwidth-\labelsep} +\def\@listvi{\leftmargin\leftmarginvi + \labelwidth\leftmarginvi\advance\labelwidth-\labelsep} + +\abovedisplayskip 7pt plus2pt minus5pt% +\belowdisplayskip \abovedisplayskip +\abovedisplayshortskip 0pt plus3pt% +\belowdisplayshortskip 4pt plus3pt minus3pt% + +% Less leading in most fonts (due to the narrow columns) +% The choices were between 1-pt and 1.5-pt leading +%\def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt} % got rid of @ (MK) +\def\normalsize{\@setsize\normalsize{11pt}\xpt\@xpt} +\def\small{\@setsize\small{10pt}\ixpt\@ixpt} +\def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt} +\def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt} +\def\tiny{\@setsize\tiny{7pt}\vipt\@vipt} +\def\large{\@setsize\large{14pt}\xiipt\@xiipt} +\def\Large{\@setsize\Large{16pt}\xivpt\@xivpt} +\def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt} +\def\huge{\@setsize\huge{23pt}\xxpt\@xxpt} +\def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt} + +\def\toptitlebar{\hrule height4pt\vskip .25in\vskip-\parskip} + +\def\bottomtitlebar{\vskip .29in\vskip-\parskip\hrule height1pt\vskip +.09in} % +%Reduced second vskip to compensate for adding the strut in \@author + +% Vertical Ruler +% This code is, largely, from the CVPR 2010 conference style file +% ----- define vruler +\makeatletter +\newbox\nipsrulerbox +\newcount\nipsrulercount +\newdimen\nipsruleroffset +\newdimen\cv@lineheight +\newdimen\cv@boxheight +\newbox\cv@tmpbox +\newcount\cv@refno +\newcount\cv@tot +% NUMBER with left flushed zeros \fillzeros[] +\newcount\cv@tmpc@ \newcount\cv@tmpc +\def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi +\cv@tmpc=1 % +\loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi + \ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat +\ifnum#2<0\advance\cv@tmpc1\relax-\fi +\loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat +\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}% +% \makevruler[][][][][] +\def\makevruler[#1][#2][#3][#4][#5]{\begingroup\offinterlineskip +\textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt% +\global\setbox\nipsrulerbox=\vbox to \textheight{% +{\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight +\cv@lineheight=#1\global\nipsrulercount=#2% +\cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2% +\cv@refno1\vskip-\cv@lineheight\vskip1ex% +\loop\setbox\cv@tmpbox=\hbox to0cm{{\nipstenhv\hfil\fillzeros[#4]\nipsrulercount}}% +\ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break +\advance\cv@refno1\global\advance\nipsrulercount#3\relax +\ifnum\cv@refno<\cv@tot\repeat}}\endgroup}% +\makeatother +% ----- end of vruler + +% \makevruler[][][][][] +\def\nipsruler#1{\makevruler[12pt][#1][1][3][0.993\textheight]\usebox{\nipsrulerbox}} +\AddToShipoutPicture{% +\ifnipsfinal\else +\nipsruleroffset=\textheight +\advance\nipsruleroffset by -3.7pt + \color[rgb]{.7,.7,.7} + \AtTextUpperLeft{% + \put(\LenToUnit{-35pt},\LenToUnit{-\nipsruleroffset}){%left ruler + \nipsruler{\nipsrulercount}} + } +\fi +} diff --git a/paper/sections/negative.tex b/paper/sections/negative.tex index f88117a..2eaeb47 100644 --- a/paper/sections/negative.tex +++ b/paper/sections/negative.tex @@ -25,7 +25,7 @@ guarantees are sufficient for optimization. First look at learning weights in a cover function. Maybe facility location? Sums of concave over modular are probably too hard because of the connection to neural networks. 
-\subsection{Optimizable $n\Rightarrow$ Learnable} +\subsection{Optimizable $\nRightarrow$ Learnable} Recall the matroid construction from~\cite{balcan2011learning}: \begin{theorem} -- cgit v1.2.3-70-g09d2 From ead8ef85b50848f078a360b040f59ea962821cb4 Mon Sep 17 00:00:00 2001 From: Thibaut Horel Date: Tue, 14 Apr 2015 16:46:34 -0400 Subject: Sketch of the introduction --- paper/sections/introduction.tex | 81 +++++++++++++++++++++++++++-------------- paper/sections/negative.tex | 30 +++++++++++++++ 2 files changed, 84 insertions(+), 27 deletions(-) (limited to 'paper/sections/negative.tex') diff --git a/paper/sections/introduction.tex b/paper/sections/introduction.tex index bcb406f..6eda59d 100644 --- a/paper/sections/introduction.tex +++ b/paper/sections/introduction.tex @@ -1,27 +1,54 @@ -Let $\Omega$ be the universe of elements and $f$ a function defined on subsets -of $\Omega$: $f : S \in 2^{[\Omega]} \mapsto f(S) \in \mathbb{R}$. Let $K$ be a -collection of sets of $2^{[\Omega]}$, which we call \emph{constraints}. Let -$S^*_K$ be any solution to $\max_{S \in K} f(S)$, which we will also denote by -$S^*$ when there is no ambiguity. Let $L$ be the problem size, which is often -(but not always) equal to $|\Omega|$. - -In general, we say we can efficiently optimize a function $f$ under constraint -$K$ when we have a polynomial-time algorithm making adaptive value queries to -$f$,which returns a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$ -with high probability and $\alpha$ an absolute constant. - -Here, we consider the scenario where we cannot make adaptive value queries, and -in fact, where we cannot make queries at all! Instead, we suppose that we -observe a polynomial number of set-value pairs $(S, f(S))$ where $S$ is taken -from a known distribution $D$. We say we can efficiently \emph{passively -optimize} $f$ under distribution $D$ or $D-$optimize $f$ under constraints $K$ -when, after observing ${\cal O}(L^c)$ set-value pairs from $D$ where $c > 0$ is -an absolute constant, we can return a set $S$ such that $S \in K$ and $f(S) -\geq \alpha f(S^*)$ with high probability and $\alpha$ an absolute constant. - -In the case of \emph{passive} observations of set-value pairs under a -distribution $D$ for a function $f$, recent research has focused on whether we -can efficiently and approximately \emph{learn} $f$. This was formalized in the -PMAC model from \cite{balcan2011learning}. When thinking about passive -optimization, it is necessary to understand the link between being able to - $D-PMAC$ learn $f$ and being able to $D-$optimize $f$. +The natural “diminishing return” property of submodular functions makes them +suitable to model many real-life scenarios. However, even when they are +suitable as a model, it is not realistic to assume that they are entirely +known. For example, most submodular functions depend on unknown parameters that +have to be learned from observations. + +A recent line of work has focused on learning submodular functions. The +PMAC-learnable framework formalizes what it means to learn a submodular +function in an “overall” sense. For specific examples of submodular functions, +results are known on how well the parameters of the function can be learned +from observations. 
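+
+Recall, stated informally, the PMAC guarantee of~\cite{balcan2011learning}:
+given samples $(S_i, f(S_i))$ with the $S_i$ drawn independently from a
+distribution $D$, the learner must output, with probability at least
+$1-\delta$ over the samples, a hypothesis $h$ such that
+\begin{displaymath}
+  \mathbb{P}_{S \sim D}\bigl[h(S) \leq f(S) \leq \alpha\, h(S)\bigr]
+  \geq 1 - \epsilon
+\end{displaymath}
+for some approximation factor $\alpha \geq 1$.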
+
+When the experimenter wishes to optimize the unknown submodular function, the
+well-known approximation algorithms cannot be readily applied because they rely
+on a value query oracle, which is much stronger than what the present scenario
+suggests (at best, random independent samples). The results on learning
+submodular functions also seem to be insufficient in that they are myopic to
+the ultimate use which will be made of the submodular function (optimizing).
+The experimenter only cares about learning the function to the extent that
+what he knows is sufficient to optimize it. As such, the main question
+underlying the present work is the following:
+
+\begin{center}
+  \emph{What is a good model of learning for submodular functions when the
+  ultimate goal is to be able to optimize them?}
+\end{center}
+
+We make the following contributions:
+\begin{itemize}
+  \item We show that the PMAC learning framework does not provide a satisfying
+    framework for optimizing under uncertainty. In particular, we show that
+    there are functions which are PMAC-learnable but not optimizable and
+    vice-versa.
+  \item For specific classes of submodular functions, we show how to optimize
+    them under uncertainty.
+\end{itemize}
+
+\subsection{Related Work}
+
+\paragraph{Maximizing submodular functions} Vast literature on maximizing
+submodular functions under many kinds of constraints. All the algorithms rely
+on some kind of oracle (e.g. value query or demand query) which is stronger
+than what we have: access to polynomially many observations coming from
+a distribution. In particular, an algorithm cannot make arbitrary (and
+certainly not adaptive) queries to the oracle.
+
+\paragraph{PMAC-learning} Interesting line of work, but oblivious to the
+ultimate goal.
+
+\paragraph{Bandits} Interesting in that it ties together learning and
+optimizing. However, it assumes that arbitrary queries can be made to an
+oracle, allowing the experimenter to explore more efficiently (even though the
+standard regret benchmark means that he will incur some cost for exploring vs
+exploiting).
diff --git a/paper/sections/negative.tex b/paper/sections/negative.tex
index 2eaeb47..555aa1d 100644
--- a/paper/sections/negative.tex
+++ b/paper/sections/negative.tex
@@ -1,3 +1,33 @@
+\subsection{Model}
+
+Let $\Omega$ be the universe of elements and $f$ a function defined on subsets
+of $\Omega$: $f : S \in 2^{\Omega} \mapsto f(S) \in \mathbb{R}$. Let $K$ be a
+collection of sets of $2^{\Omega}$, which we call \emph{constraints}. Let
+$S^*_K$ be any solution to $\max_{S \in K} f(S)$, which we will also denote by
+$S^*$ when there is no ambiguity. Let $L$ be the problem size, which is often
+(but not always) equal to $|\Omega|$.
+
+In general, we say we can efficiently optimize a function $f$ under constraint
+$K$ when we have a polynomial-time algorithm making adaptive value queries to
+$f$, which returns a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$
+with high probability, where $\alpha$ is an absolute constant.
+
+Here, we consider the scenario where we cannot make adaptive value queries, and
+in fact, where we cannot make queries at all! Instead, we suppose that we
+observe a polynomial number of set-value pairs $(S, f(S))$ where $S$ is taken
+from a known distribution $D$.
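+In other words, the only input available to the algorithm is a sample
+\begin{displaymath}
+  O \equiv \bigl((S_1, f(S_1)), \dots, (S_m, f(S_m))\bigr), \qquad
+  S_i \sim D \text{ independently},
+\end{displaymath}
+where $m$ is polynomial in $L$; no further queries to $f$ are allowed.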
+We say we can efficiently \emph{passively optimize} $f$ under distribution
+$D$, or $D$-optimize $f$ under constraints $K$, when, after observing
+${\cal O}(L^c)$ set-value pairs from $D$ where $c > 0$ is an absolute constant,
+we can return a set $S$ such that $S \in K$ and $f(S) \geq \alpha f(S^*)$ with
+high probability, where $\alpha$ is an absolute constant.
+
+In the case of \emph{passive} observations of set-value pairs under a
+distribution $D$ for a function $f$, recent research has focused on whether we
+can efficiently and approximately \emph{learn} $f$. This was formalized in the
+PMAC model from \cite{balcan2011learning}. When thinking about passive
+optimization, it is necessary to understand the link between being able to
+$D$-PMAC learn $f$ and being able to $D$-optimize $f$.
+
 \subsection{Learnable $\nRightarrow$ Optimizable}
 
 Consider the following function:
-- cgit v1.2.3-70-g09d2