author     Thibaut Horel <thibaut.horel@gmail.com>    2015-03-18 18:05:46 -0400
committer  Thibaut Horel <thibaut.horel@gmail.com>    2015-03-18 18:05:46 -0400
commit     f0b3b883fd8f652fdaa245c99d9ccd29edff2dc5 (patch)
tree       d9c11c990116c3dc95e32e1448e5198f9ee2777b
parent     da6f6f069e0272d237c3d29e50c737d803fb9a3b (diff)
download   learn-optimize-f0b3b883fd8f652fdaa245c99d9ccd29edff2dc5.tar.gz
Attempt at a more combinatorial argument
-rw-r--r--  results.tex  57
1 file changed, 55 insertions, 2 deletions
diff --git a/results.tex b/results.tex
index 2c0873d..974209b 100644
--- a/results.tex
+++ b/results.tex
@@ -1,5 +1,6 @@
\documentclass[10pt]{article}
\usepackage{fullpage, amsmath, amssymb, amsthm}
+\usepackage[utf8x]{inputenc}
\newtheorem{proposition}{Proposition}
\newtheorem{corollary}{Corollary}
@@ -14,6 +15,7 @@
\section{Preliminary Results}
+\subsection{Generic Random Matrix Theory}
We cite the following result from \cite{vershynin2010introduction} (Remark 5.40):
\begin{proposition}
@@ -49,9 +51,60 @@ It follows that for any vector $x \in \mathbb{R}^n$, $|\frac{1}{N}\|A x\|^2_2 -\
Note that this bound is clearly not optimal. For example, for $k=1$, a coupon collector argument shows that only ${\cal O}(n\log n)$ rows are needed before the matrix $A$ has full column rank with high probability, and not ${\cal O}(n^2)$ as stated by Corollary~\ref{cor:number_of_rows_needed}.
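+Indeed, for $k=1$ each observed row is the indicator vector of a single
+uniformly random element, i.e.\ a uniformly random standard basis vector, and
+by the coupon collector bound the expected number of draws needed to see all
+$n$ of them is $nH_n = n\ln n + {\cal O}(n)$.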
+\subsection{More Direct Approach}
+
+Here we work over $\mathbb{F}_2$. An $n\times n$ binary matrix can be seen as
+a matrix over $\mathbb{F}_2$. Let us assume that each row of $A$ is chosen
+uniformly at random among all binary rows of length $n$. A standard counting
+argument shows that the number of non-singular $n\times n$ matrices over
+$\mathbb{F}_2$ is:
+\begin{displaymath}
+ N_n = (2^n-1)(2^n - 2)\dots (2^n- 2^{n-1})
+ = (2^n)^n\prod_{i=1}^n\left(1-\frac{1}{2^i}\right)
+\end{displaymath}
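+As a quick sanity check of this formula: for $n=2$ it gives
+$N_2 = (2^2-1)(2^2-2) = 6$, and indeed exactly $6$ of the $2^4 = 16$ binary
+$2\times 2$ matrices are invertible over $\mathbb{F}_2$ (the group
+$\mathrm{GL}_2(\mathbb{F}_2)$ has order $6$).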
+
+Hence, when $A$ has exactly $n$ rows, the probability that it is invertible is:
+\begin{displaymath}
+ P_n = \prod_{i=1}^n\left(1-\frac{1}{2^i}\right)
+\end{displaymath}
+It is easy to see that $P_n$ is a decreasing sequence. Its limit is
+$\phi(\frac{1}{2})$, where $\phi(q)=\prod_{i=1}^{\infty}(1-q^i)$ is the Euler
+function. We have $\phi(\frac{1}{2})\approx
+0.2887880950866024212788997219292307800889119048406857$, and $P_n$ converges
+exponentially fast to this constant (one way to see this is to use the
+pentagonal number theorem).
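+As a quick numerical illustration (values rounded to four decimal places):
+$P_1 = 0.5$, $P_2 = 0.375$, $P_3 \approx 0.3281$, $P_5 \approx 0.2980$ and
+$P_{10} \approx 0.2891$, already within $3\cdot 10^{-4}$ of the limit.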
+
+Hence, if we observe $\ell\cdot n$ uniformly random binary rows, we can split
+them into $\ell$ consecutive blocks of $n$ rows each. The matrix has full
+column rank as soon as one of the blocks is invertible, and each block is
+invertible independently with probability $P_n\geq\phi(\frac{1}{2})$. The
+probability of full column rank is therefore at least:
+\begin{displaymath}
+  1-\left(1-P_n\right)^{\ell}
+  \geq 1 - \left(1-\phi\left(\frac{1}{2}\right)\right)^{\ell}
+\end{displaymath}
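+For instance, since $1-\phi(\frac{1}{2})\approx 0.711$, taking roughly
+$\ell\approx 3\ln(1/\delta)$ blocks (about $14$ blocks for $\delta = 0.01$)
+already brings the failure probability below $\delta$; the constants here are
+rough, the point is the exponential decay in $\ell$.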
+
+Note that the bound above concerns full column rank over $\mathbb{F}_2$,
+which implies full column rank over $\mathbb{R}$: if some $n\times n$
+submatrix has nonzero determinant modulo $2$, its determinant is a nonzero
+(odd) integer, hence nonzero over $\mathbb{R}$.
+
+\paragraph{TODO:} Study the case where we only observe sets of size exactly
+$k$, or at most $k$. This amounts to replacing $2^n$ in the computation above
+by:
+\begin{displaymath}
+{n\choose k}\quad\text{or}\quad\sum_{j=0}^k{n\choose j}
+\end{displaymath}
+
+Thinking about it, why do we assume that the sample sets are of size exactly
+$k$? I think it would make more sense from the learning perspective to consider
+uniformly random sets. In this case the above approach applies directly, since
+a uniformly random subset of a ground set of size $n$ is exactly a uniformly
+random binary indicator row of length $n$.
+
\section{Passive Optimization of Additive Functions}
-Let $\Omega$ be a universe of elements of size $n$. For $k \in [n]$, let $D_k$ be the uniform probability distribution over all sets of size $k$, and let $D_p$ be the product probability distribution over sets such that $\forall j \in \Omega, \mathbb{P}_{S \sim D_p}(j \in S) = p$. Note that $D_{k}$ and $D_{p}$ for $p \equiv \frac{k}{n}$ are very similar, and will be considered almost interchangeably throughout the following notes. Consider the following problem:
+Let $\Omega$ be a universe of $n$ elements. For $k \in [n]$, let $D_k$ be the uniform probability distribution over all sets of size $k$, and let $D_p$ be the product probability distribution over sets such that $\forall j \in \Omega, \mathbb{P}_{S \sim D_p}(j \in S) = p$. Note that $D_{k}$ and $D_{p}$ for $p \equiv \frac{k}{n}$ are very similar, and will be considered almost interchangeably throughout the following notes.
+
+\textbf{TODO:} Make the argument about independently choosing elements with
+probability $k/n$ vs.\ choosing a set of size $k$ rigorous. The ``more direct''
+approach above should allow us to consider the case of sets of size $k$
+directly.
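+
+One possible route (just a sketch, to be checked): under $D_{k/n}$ the size
+$|S|$ is distributed as $\mathrm{Binomial}(n, k/n)$, hence tightly concentrated
+around its mean $k$ by a Chernoff bound; conditioning on $|S|$ and comparing
+with $D_{|S|}$ might then let us transfer results between the two models, at
+the cost of slightly perturbing the cardinality constraint.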
+
+Consider the following problem:
\begin{problem}
Let $f = (w_i)_{i \in \Omega}$ be an unknown additive function: $f(S) \equiv \sum_{u \in S} w_u$. How many (set, value) pairs $(S, f(S))$ where $S \sim D_k$ must be observed until we can optimize $f$ exactly or to a constant approximation factor under cardinality constraint $k$?
@@ -129,4 +182,4 @@ Same thing can probably be said for the average weight method.
\bibliographystyle{apalike}
-\end{document}
\ No newline at end of file
+\end{document}