1 files changed, 55 insertions, 2 deletions
diff --git a/results.tex b/results.tex
index 2c0873d..974209b 100644
--- a/results.tex
+++ b/results.tex
@@ -1,5 +1,6 @@
 \documentclass[10pt]{article}
 \usepackage{fullpage, amsmath, amssymb, amsthm}
+\usepackage[utf8x]{inputenc}
 
 \newtheorem{proposition}{Proposition}
 \newtheorem{corollary}{Corollary}
@@ -14,6 +15,7 @@
 \section{Preliminary Results}
 
+\subsection{Generic Random Matrix Theory}
 We cite the following result from \cite{vershynin2010introduction} (Remark 5.40):
 \begin{proposition}
@@ -49,9 +51,60 @@ It follows that for any vector $x \in \mathbb{R}^n$, $|\frac{1}{N}\|A x\|^2_2 -\
 Note that this is clearly not optimal. For example, for $k=1$, the solution
 to the coupon collector problem states that only ${\cal O}(n\log n)$ rows
 are sufficient before the matrix $A$ has full row rank with high
 probability, and not ${\cal O}(n^2)$ as stated by
 Corollary~\ref{cor:number_of_rows_needed}.
 
+\subsection{More Direct Approach}
+
+Here we work over $\mathbb{F}_2$: an $n\times n$ binary matrix can be seen
+as a matrix over $\mathbb{F}_2$. Let us assume that each row of $A$ is
+chosen uniformly at random among all binary rows of length $n$. A standard
+counting argument shows that the number of non-singular matrices over
+$\mathbb{F}_2$ is:
+\begin{displaymath}
+  N_n = (2^n-1)(2^n - 2)\dots (2^n- 2^{n-1})
+  = (2^n)^n\prod_{i=1}^n\left(1-\frac{1}{2^i}\right)
+\end{displaymath}
+
+Hence the probability that our random matrix $A$ is invertible is:
+\begin{displaymath}
+  P_n = \prod_{i=1}^n\left(1-\frac{1}{2^i}\right)
+\end{displaymath}
+It is easy to see that $P_n$ is a decreasing sequence. Its limit is
+$\phi(\frac{1}{2})$, where $\phi$ is the Euler function. We have
+$\phi(\frac{1}{2})\approx
+0.2887880950866024212788997219292307800889119048406857$, and $P_n$ converges
+exponentially fast to this constant (one way to see this is to use the
+pentagonal number theorem).
+
+Hence, if we observe $\ell\cdot n$ uniformly random binary rows, the
+probability that they have full column rank is at least:
+\begin{displaymath}
+  P_{\ell\cdot n}\geq 1 - \left(1-\phi\left(\frac{1}{2}\right)\right)^{\ell}
+\end{displaymath}
+(Split the rows into $\ell$ blocks of $n$ rows: the matrix has full column
+rank as soon as one block is invertible, and each block is invertible with
+probability $P_n\geq\phi(\frac{1}{2})$.)
+
+Note that this is the probability of having full column rank over
+$\mathbb{F}_2$. A standard linear algebra argument shows that this implies
+full column rank over $\mathbb{R}$: an invertible $0/1$ matrix over
+$\mathbb{F}_2$ has odd, hence non-zero, determinant over $\mathbb{Z}$.
+
+\paragraph{TODO:} Study the case where we only observe sets of size exactly
+$k$, or at most $k$. This amounts to replacing $2^n$ in the computation
+above by:
+\begin{displaymath}
+{n\choose k}\quad\text{or}\quad\sum_{j=0}^k{n\choose j}
+\end{displaymath}
+
+Thinking about it, why do we assume that the sample sets are of size exactly
+$k$? From the learning perspective it would make more sense to consider
+uniformly random sets; in that case, the approach above lets us conclude
+directly.
+
 \section{Passive Optimization of Additive Functions}
 
-Let $\Omega$ be a universe of elements of size $n$. For $k \in [n]$, let $D_k$ be the uniform probability distribution over all sets of size $k$, and let $D_p$ be the product probability distribution over sets such that $\forall j \in \Omega, \mathbb{P}_{S \sim D_p}(j \in S) = p$. Note that $D_{k}$ and $D_{p}$ for $p \equiv \frac{k}{n}$ are very similar, and will be considered almost interchangeably throughout the following notes. Consider the following problem:
+Let $\Omega$ be a universe of elements of size $n$. For $k \in [n]$, let $D_k$ be the uniform probability distribution over all sets of size $k$, and let $D_p$ be the product probability distribution over sets such that $\forall j \in \Omega, \mathbb{P}_{S \sim D_p}(j \in S) = p$. Note that $D_{k}$ and $D_{p}$ for $p \equiv \frac{k}{n}$ are very similar, and will be considered almost interchangeably throughout the following notes.
+
+\textbf{TODO:} Make the argument about independently choosing elements with
+probability $k/n$ vs.\ choosing a set of size $k$ rigorous. The ``more
+direct'' approach above should allow us to treat the case of sets of size
+$k$ directly.
+
+Consider the following problem:
 \begin{problem}
 Let $f = (w_i)_{i \in \Omega}$ be an unknown additive function: $f(S) \equiv \sum_{u \in S} w_u$. How many (set, value) pairs $(S, f(S))$ where $S \sim D_k$ must be observed until we can optimize $f$ exactly or to a constant approximation factor under cardinality constraint $k$?
@@ -129,4 +182,4 @@
 Same thing can probably be said for the average weight method.
 
 \bibliographystyle{apalike}
-\end{document}
\ No newline at end of file
+\end{document}
