author    jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-03-19 11:38:52 -0400
committer jeanpouget-abadie <jean.pougetabadie@gmail.com> 2015-03-19 11:38:52 -0400
commit    2d98cb411b617ae1548f1c64c9d2e9801e4ab73d (patch)
tree      254e100e2979c028b2cf15277f55b81aa3730197
parent    4a68b488b10e6f32e154f3316eb0026b7fa33b10 (diff)
parent    3c091fa90c6a4251777f8eea4fbe3751e76a785f (diff)
download  learn-optimize-2d98cb411b617ae1548f1c64c9d2e9801e4ab73d.tar.gz
fixed merge conflicts
-rw-r--r--  results.tex | 84
1 file changed, 73 insertions(+), 11 deletions(-)
diff --git a/results.tex b/results.tex
index f9ca671..66e8f10 100644
--- a/results.tex
+++ b/results.tex
@@ -1,5 +1,6 @@
\documentclass[10pt]{article}
\usepackage{fullpage, amsmath, amssymb, amsthm}
+\usepackage[utf8x]{inputenc}
\newtheorem{proposition}{Proposition}
\newtheorem{corollary}{Corollary}
@@ -15,6 +16,7 @@
\section{Preliminary Results}
+\subsection{Generic Random Matrix Theory}
We cite the following result from~\cite{vershynin2010introduction} (Remark 5.40):
\begin{proposition}\label{prop:non_isotropic_isometry}
@@ -61,20 +63,80 @@ rows are sufficient before the matrix $A$ has full row rank with high
probability, and not ${\cal O}(n^2)$ as stated by
Corollary~\ref{cor:number_of_rows_needed}.
+\paragraph{Note:} $\Theta(n\log n)$ is the expectation for the coupon collector
+problem. If one wants an exponentially small probability of failure, then
+I think results on the coupon collector problem also imply that a quadratic
+number of rows is required.
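The $n H_n$ expectation is easy to confirm numerically; a minimal Python sketch (the values $n = 50$ and the trial count are arbitrary choices for illustration):

```python
import random

def coupon_collector_draws(n, rng):
    """Draw uniform coupons from {0, ..., n-1} until all have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n, trials = 50, 2000
rng = random.Random(0)
empirical = sum(coupon_collector_draws(n, rng) for _ in range(trials)) / trials

# exact expectation: n * H_n, the Theta(n log n) rate quoted above
harmonic = sum(1.0 / i for i in range(1, n + 1))
expected = n * harmonic
print(empirical, expected)
```

The empirical mean tracks $n H_n$ closely, consistent with the $\Theta(n\log n)$ claim in expectation.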
+
+\subsection{More Direct Approach}
+
+Here we work over $\mathbb{F}_2$: an $n\times n$ binary matrix can be seen as
+a matrix over $\mathbb{F}_2$. Let us assume that each row of $A$ is chosen
+uniformly at random among all binary rows of length $n$. A standard counting
+argument shows that the number of non-singular matrices over $\mathbb{F}_2$
+is:
+\begin{displaymath}
+ N_n = (2^n-1)(2^n - 2)\dots (2^n- 2^{n-1})
+ = (2^n)^n\prod_{i=1}^n\left(1-\frac{1}{2^i}\right)
+\end{displaymath}
+
+Hence the probability that our random matrix $A$ is invertible is:
+\begin{displaymath}
+ P_n = \prod_{i=1}^n\left(1-\frac{1}{2^i}\right)
+\end{displaymath}
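For small $n$, both the count $N_n$ and the probability $P_n$ can be verified by brute-force enumeration; a Python sketch (the helpers `rank_f2`, `count_invertible`, and `formula` are ad hoc names for this check, not library functions):

```python
from itertools import product

def rank_f2(rows, n):
    """Rank over F_2 of a matrix given as a list of n-bit row bitmasks."""
    rows, rank = list(rows), 0
    for col in range(n):
        # find a pivot row with a 1 in this column, at or below current rank
        pivot = next((i for i in range(rank, len(rows))
                      if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]  # eliminate: addition over F_2 is XOR
        rank += 1
    return rank

def count_invertible(n):
    # enumerate all n x n binary matrices as tuples of n row-bitmasks
    return sum(rank_f2(rows, n) == n
               for rows in product(range(2 ** n), repeat=n))

def formula(n):
    # N_n = (2^n - 1)(2^n - 2) ... (2^n - 2^(n-1))
    N = 1
    for i in range(n):
        N *= 2 ** n - 2 ** i
    return N

for n in (1, 2, 3):
    # P_n is the invertible count divided by the 2^(n^2) binary matrices
    print(n, count_invertible(n), formula(n), count_invertible(n) / 2 ** (n * n))
```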
+It is easy to see that $P_n$ is a decreasing sequence. Its limit is
+$\phi(\frac{1}{2})$ where $\phi$ is the Euler function. We have
+$\phi(\frac{1}{2})\approx
+0.2887880950866024212788997219292307800889119048406857$ and $P_n$ converges
+exponentially fast to this constant (one way to see this is to use the
+Pentagonal number theorem).
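A small numerical check of both the limit and the speed of convergence (the partial product at $n = 200$ stands in for $\phi(\frac{1}{2})$; the remaining factors are within double precision of $1$):

```python
def P(n):
    """Partial product prod_{i=1}^{n} (1 - 2^{-i})."""
    p = 1.0
    for i in range(1, n + 1):
        p *= 1.0 - 0.5 ** i
    return p

phi_half = P(200)  # proxy for the Euler function value phi(1/2)
print(P(5), P(10), P(30), phi_half)
```

The gap $P_n - \phi(\frac{1}{2})$ shrinks roughly like $2^{-n}$, matching the claimed exponential convergence.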
+
+Hence, if we observe $\ell\cdot n$ uniformly random binary rows, the
+probability that they have full column rank is at least (split the rows into
+$\ell$ blocks of $n$ rows each; every block is invertible with probability
+$P_n\geq\phi(\frac{1}{2})$, and a single invertible block suffices):
+\begin{displaymath}
+ P_{\ell\cdot n}\geq 1 - \left(1-\phi\left(\frac{1}{2}\right)\right)^{\ell}
+\end{displaymath}
+
+Note that this is the probability of having full column rank over
+$\mathbb{F}_2$. A standard linear algebra argument shows that this implies full
+column rank over $\mathbb{R}$.
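Both claims, the $1-(1-\phi(\frac{1}{2}))^{\ell}$ lower bound and the $\mathbb{F}_2$-to-$\mathbb{R}$ implication, can be spot-checked by simulation; a Python sketch with arbitrary illustrative parameters ($n = 16$, $\ell = 2$):

```python
import numpy as np

def rank_f2(A):
    """Rank of a 0/1 matrix over F_2 via Gaussian elimination."""
    A = A.copy() % 2
    m, n = A.shape
    rank = 0
    for col in range(n):
        pivots = np.nonzero(A[rank:, col])[0]
        if pivots.size == 0:
            continue
        p = rank + pivots[0]
        A[[rank, p]] = A[[p, rank]]          # swap pivot row into place
        for i in range(m):
            if i != rank and A[i, col]:
                A[i] ^= A[rank]              # XOR = row addition over F_2
        rank += 1
    return rank

rng = np.random.default_rng(0)
n, ell, trials = 16, 2, 300
phi_half = np.prod(1.0 - 0.5 ** np.arange(1, 100))
bound = 1.0 - (1.0 - phi_half) ** ell

full_rank_f2 = 0
for _ in range(trials):
    A = rng.integers(0, 2, size=(ell * n, n))
    if rank_f2(A) == n:
        full_rank_f2 += 1
        # full column rank over F_2 must imply full column rank over R
        assert np.linalg.matrix_rank(A) == n
print(full_rank_f2 / trials, bound)
```

For $2n$ rows the true full-rank probability is in fact much higher than the block bound; the bound is loose but sufficient for the argument.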
+
+\paragraph{TODO:} Study the case where we only observe sets of size exactly
+$k$, or at most $k$. This amounts to replacing $2^n$ in the computation above
+by:
+\begin{displaymath}
+{n\choose k}\quad\text{or}\quad\sum_{j=0}^k{n\choose j}
+\end{displaymath}
+
+Thinking about it, why do we assume that the sample sets are of size exactly
+$k$? I think it would make more sense from the learning perspective to consider
+uniformly random sets. In this case, the above approach allows us to conclude
+directly.
+
+More generally, I think the “right” way to think about this is to look at $A$
+as the adjacency matrix of a random $k$-regular graph. There are many
+results on the probability that the adjacency matrix is non-singular.
+
\section{Passive Optimization of Additive Functions}
-Let $\Omega$ be a universe of elements of size $n$. For $k \in [n]$, let $D_k$
-be the uniform probability distribution over all sets of size $k$, and let
-$D_p$ be the product probability distribution over sets such that $\forall j
-\in \Omega, \mathbb{P}_{S \sim D_p}(j \in S) = p$. Note that $D_{k}$ and
-$D_{p}$ for $p \equiv \frac{k}{n}$ are very similar, and will be considered
-almost interchangeably throughout the following notes. Consider the following
-problem:
+Let $\Omega$ be a universe of elements of size $n$. For $k \in [n]$,
+let $D_k$ be the uniform probability distribution over all sets of size $k$,
+and let $D_p$ be the product probability distribution over sets such that
+$\forall j \in \Omega, \mathbb{P}_{S \sim D_p}(j \in S) = p$. Note that $D_{k}$
+and $D_{p}$ for $p \equiv \frac{k}{n}$ are very similar, and will be considered
+almost interchangeably throughout the following notes.
+
+\textbf{TODO:} Make the argument about independently choosing elements with
+probability $k/n$ vs.\ choosing a set of size $k$ rigorous. The “more direct”
+approach above should allow us to consider the case of sets of size $k$ directly.
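For intuition on why the two distributions are close: under $D_p$ with $p = k/n$, the size $|S|$ is $\mathrm{Binomial}(n, k/n)$ and concentrates around $k$. A quick simulation (the parameters $n = 100$, $k = 20$ are arbitrary):

```python
import random

rng = random.Random(0)
n, k, trials = 100, 20, 5000
p = k / n

sizes = []
for _ in range(trials):
    # S ~ D_p: include each element of the universe independently with prob p
    S = {j for j in range(n) if rng.random() < p}
    sizes.append(len(S))

mean_size = sum(sizes) / trials
print(mean_size, k)  # mean |S| should be close to k
```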
+
+Consider the following problem:
-\begin{problem} Let $f = {(w_i)}_{i \in \Omega}$ be an unknown additive function:
- $f(S) \equiv \sum_{u \in S} w_u$. How many (set, value) pairs $(S, f(S))$
- where $S \sim D_k$ must be observed until we can optimize $f$ exactly or to a
- constant approximation factor under cardinality constraint $k$?
+\begin{problem}
+Let $f = (w_i)_{i \in \Omega}$ be an unknown additive function: $f(S) \equiv
+\sum_{u \in S} w_u$. How many (set, value) pairs $(S, f(S))$ where $S \sim D_k$
+must be observed until we can optimize $f$ exactly or to a constant
+approximation factor under cardinality constraint $k$?
\end{problem}
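One natural baseline, sketched here under assumptions not fixed by the problem statement (noiseless observations, sets drawn with each element included with probability $1/2$ rather than $S \sim D_k$, and illustrative sizes $n = 20$, $k = 5$, $m = 60$): once the observed indicator vectors have full column rank, least squares recovers the weights exactly, and sorting them optimizes $f$ under the cardinality constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 20, 5, 60   # universe size, cardinality constraint, sample count

w = rng.normal(size=n)               # hidden additive weights (w_i)
A = rng.integers(0, 2, size=(m, n))  # row s: 0/1 indicator vector of set S
y = A @ w                            # observed values f(S) = sum_{u in S} w_u

# with full column rank and no noise, least squares recovers w exactly
w_hat, *_ = np.linalg.lstsq(A.astype(float), y, rcond=None)

# exact optimization under the cardinality constraint: take the top-k weights
top_k = set(np.argsort(w_hat)[-k:])
print(top_k == set(np.argsort(w)[-k:]))
```

By the full-rank discussion above, $m = {\cal O}(n)$ such samples suffice with constant probability, which is what makes this passive approach plausible.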