From 7c253d520c8f20a719ac782386310348b9783660 Mon Sep 17 00:00:00 2001 From: Thibaut Horel Date: Tue, 28 Oct 2014 00:08:34 -0400 Subject: [ps5] Add solutions --- ps5/main.tex | 351 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 351 insertions(+) create mode 100644 ps5/main.tex diff --git a/ps5/main.tex b/ps5/main.tex new file mode 100644 index 0000000..7ebda39 --- /dev/null +++ b/ps5/main.tex @@ -0,0 +1,351 @@ +\documentclass[10pt]{article} +\usepackage{fullpage} +\usepackage[utf8x]{inputenc} +\usepackage{amsmath,amsfonts,amsthm} +\usepackage[english]{babel} +\usepackage[capitalize, noabbrev]{cleveref} +\usepackage{algorithm} +\usepackage{algpseudocode} + +\newenvironment{enumerate*}% + {\vspace{-2ex} \begin{enumerate} % + \setlength{\itemsep}{-1ex} \setlength{\parsep}{0pt}}% + {\end{enumerate}} + +\newenvironment{itemize*}% + {\vspace{-2ex} \begin{itemize} % + \setlength{\itemsep}{-1ex} \setlength{\parsep}{0pt}}% + {\end{itemize}} + +\newenvironment{description*}% + {\vspace{-2ex} \begin{description} % + \setlength{\itemsep}{-1ex} \setlength{\parsep}{0pt}}% + {\end{description}} + +\DeclareMathOperator{\E}{\mathbb{E}} +\let\P\relax +\DeclareMathOperator{\P}{\mathbb{P}} +\newcommand{\ex}[1]{\E\left[#1\right]} +\newcommand{\prob}[1]{\P\left[#1\right]} +\newcommand{\inprod}[1]{\left\langle #1 \right\rangle} +\newcommand{\R}{\mathbf{R}} +\newcommand{\N}{\mathbf{N}} +\newcommand{\C}{\mathcal{C}} +\newcommand{\eps}{\varepsilon} +\newcommand{\eqdef}{\mathbin{\stackrel{\rm def}{=}}} +\newcommand{\llbracket}{[\![} + +\newtheorem{theorem}{Theorem} +\newtheorem{lemma}{Lemma} + +\author{Thibaut Horel} +\title{CS 224 Problem Set 5 -- Solutions} +\date{October 27, 2014} + +\begin{document} +\maketitle + +\section*{Problem 1} + +\paragraph{(a)}We are going to prove that $\text{OPT}\geq p_{max}$ and that +$\text{OPT}\geq\frac{1}{m}\sum_{i=1}^m p_i$. + +The machine to which the job of maximum processing time is +allocated in $\text{OPT}$ will have a completion time of at least $p_{max}$. +Hence $\text{OPT}\geq p_{max}$. + +Let us assume by contradiction that $\text{OPT} < \frac{1}{m}\sum_{i=1}^m p_i$, +then this implies that each machine has a completion time bounded from above by +$\frac{1}{m}\sum_{i=1}^m p_i$, but then the total amount of processing time in +this allocation is strictly bounded from above by +$m\times\frac{1}{m}\sum_{i=1}^m p_i = \sum_{i=1}^m p_i$. This contradicts the +fact that all the tasks are processed and we conclude that +$\text{OPT}\geq\frac{1}{m}\sum_{i=1}^m p_i$. + +\paragraph{(b)} Let us consider the machine of maximum completion time in the +allocation given by the greedy algorithm and let us consider the last time +a job was allocated to this machine. Without loss of generality, we can rename +the jobs so that the last job allocated to this machine is $p_n$. + +Right before $p_n$ is allocated to this machine, its load is bounded from above +by $\frac{1}{m}\sum_{i=1}^{n-1}p_i$. Otherwise, by contradiction, since this +machine is currently the least loaded, it would implie that the total +processing time on all machines is bounded from below by $\sum_{i=1}^{n-1} +p_i$, this is a contradiction since at most, we have allocated jobs $p_1$ to +$p_{n-1}$. Hence, after adding task $p_n$ to this machine, we can bound its +load, which is also the completion time of the greedy algorithm by: +\begin{align*} + \textsc{Greedy} &\leq \frac{1}{m}\sum_{i=1}^{n-1}p_i + p_n + \leq \frac{1}{m}\sum_{i=1}^n p_i +p_n\left(1-\frac{1}{m}\right) + \leq \frac{1}{m}\sum_{i=1}^n p_i +p_{max}\left(1-\frac{1}{m}\right)\\ + &\leq \max\left\{\frac{1}{m}\sum_{i=1}^n p_i, +p_{max}\right\}\left(2-\frac{1}{m}\right)\leq\left(2-\frac{1}{m}\right)\text{OPT} +\end{align*} +where the last inequality comes from question \textbf{(a)}. This concludes the +proof of the approximation ratio of \textsc{Greedy}. + +\paragraph{(c)} Consider the allocation after greedily assigning the jobs +in $S$ (on top of an allocation of load $L$ for the jobs in $B$). We +distinguish two cases: +\begin{itemize*} + \item either the load remains $L$ after allocating the jobs in $B$. + \item either the load is greater than $L$: in this case, the machine of + maximum load (greater than $L$) has an element in $S$ assigned to it. + Let us consider the last time a job in $S$ was assigned to it. Without + loss of generality we can rename the jobs so that this job is named + $p_n$. Similarly to \textbf{(b)}, we can prove that the load of this + machine at that time was bounded from above by + $\frac{1}{m}\sum_{i=1}^{n-1}p_i$ (otherwise we get a contradiction + using that it is the least loaded machine). Hence we can bound the load + of this machine after adding this last job in $S$: + \begin{displaymath} + \text{Max Load} \leq \frac{1}{m}\sum_{i=1}^{n-1}p_i + p_n + \leq \frac{1}{m}\sum_{i=1}^n p_i + + \eps\text{OPT}\leq(1+\eps)\text{OPT} + \end{displaymath} + where the second inequality uses that $p_n\leq \eps\text{OPT}$ since it + is a job in $S$ and the last inequality uses the bound of question + \textbf{(a)}. +\end{itemize*} + +\paragraph{(d)} An instance of the job scheduling problem with $k$ distinct job +processing times can be described by a $k$-tuple $(n_1,\ldots, n_k)$ with +$n_1+\ldots+n_k= n$ and $n_i$ is the number of jobs of type $i$ ($1\leq i\leq +k)$. We observe that this problem can be reduced to the bin packing problem: +there exists an allocation of load at most $T$ if and only if there exist +a packing of the jobs using at most $m$ bins where each bin has capacity at +most $T$. + +This suggests the following dynamic programming algorithm: let us +denote by $M(i_1,\ldots,i_k)$ the minimum number of machines required to +complete the instance $(i_1,\ldots,i_k)$ in time less than $T$, then we have +the following induction formula: +\begin{equation}\label{eq:induction} + 1 + \min_{(j_1,\ldots,j_k)\in S(i_1,\ldots,i_k)} M(i_1 - j_1,\ldots, + i_k-j_k) +\end{equation} +where $S(i_1,\ldots,i_k) = \{(j_1,\ldots,j_k)\in \{0,\ldots, +i_1\}\times\ldots\times\{0,\ldots, i_k\}\,|\, M(j_1,\ldots,j_k)=1\}$ is the set +of all subsets of the instance $(i_1,\ldots,i_k)$ which can fit on single +machine. The correctness of this induction formula is easy to establish: for +a given instance, we can first ``guess'' (i.e. try all possible) a sub-instance +to be allocated on a single machine, and then find the optimal allocation on +the remaining instance. + +The set $S(i_1,\ldots, i_k)$ is easy to compute since $M(j_1,\ldots,j_k)=1$ if +and only if $\sum_{l=1}^k j_j t_l \leq T$, where $t_l$ is the processing time +of jobs of type $l$. + +Then, we note that by computing the function $M$ in lexicographic order of its +argument, the induction is well-defined (i.e. we never depend on a value which +hasn't been computed yet). Hence, we can compute $M$ using a dynamic +programming algorithm: fill a $n_1\times\ldots\times n_k$ multi-array in +lexicographic order. When we are done computing $M$, either $M(n_1,\ldots, +n_k)\leq m$ and the allocation computed by the algorithm is an allocation of +load less than $T$, otherwise we report ``not feasible''. + +Finally, the running time of equation \cref{eq:induction} is $O(n^k)$, since +$S(i_1,\ldots,i_k)$ can be computed in time $i_1\times\ldots\times i_k\leq +(n+1)^k$. There are at $n_1\times\ldots\times n_k\leq n^k$ entries in the +multi-array. Hence the total running time of this algorithm is $O(n^k\times +n^k) = O(n^{2k})$. + +\paragraph{(e)} If we know $\text{OPT}$, we can compute $B$ and $S$. Then we +can round the jobs in $B$: for job $p_i$, we round it to +$\tilde{p}_i=\eps(1+\eps)^j$ where +$j=\lceil\log_{1+\eps}\frac{p_i}{\eps}\rceil$. By doing so, we have at most +$\lceil\log_{1+\eps}\frac{p_{max}}{\eps}\rceil$ distinct values for jobs in +$B$. + +We can apply the dynamic programing algorithm of \textbf{(d)} to the rounded +jobs in $B$ and obtain an allocation of load $L'$. I claim that this leads to an +allocation of the (unrounded) jobs in $B$ of load at most $(1+\eps)\text{OPT}$. +Indeed, let us denote by $p_1,\ldots,p_k$ the jobs in $B$ in the maximally +loaded machine in the allocation given by $\text{OPT}$. Then: +\begin{displaymath} + \text{OPT}\geq \sum_{i=1}^k p_k\geq \frac{1}{1+\eps}\sum_{i=1}^k + \tilde{p_i}\geq \frac{1}{1+\eps} L' +\end{displaymath} +where the last inequality comes from the fact that $L'$ is the optimal load +(given by the dynamic algorithm) and thus beats the allocation given by +$\text{OPT}$ on the rounded jobs. Now, note that $L'$ is an upper bound on the +load for the un-rounded jobs (since $\tilde{p_i}\geq p_i$ for all $i$). Hence +we obtained an allocation of the jobs in $B$ of load $L\leq +(1+\eps)\text{OPT}$. + +We can then complete the allocation by greedily allocation jobs in $S$, by +question \textbf{(c)} this loads to a load bounded by $\max\{L, +(1+\eps)\text{OPT}\} = (1+\eps)\text{OPT}$ since we have just proved that +$L\leq(1+\eps)\text{OPT}$. + +The running time of the DP algorithm will be +$n^{O(\log_{1+\eps}\frac{1}{\eps})} = n^{O(\frac{1}{\eps}\log\frac{1}{\eps})}$ +(for $\eps\leq 1$, but for $\eps>1$, we already know a $2$-approximation +aglorithm using $\textbf{(a)}$). + +The greedy allocation of the jobs in $S$ takes time $O(nm) = O(n^2)$ since we +can assume $m\leq n$ (if $n