From 85a1c1d0a819621bdccb256d320f9df40ebee995 Mon Sep 17 00:00:00 2001 From: Thibaut Date: Wed, 14 Dec 2011 18:49:57 -0800 Subject: More notes. Simple cases --- .gitignore | 6 ++- notes.tex | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++---------- 2 files changed, 117 insertions(+), 22 deletions(-) diff --git a/.gitignore b/.gitignore index e10afc6..a2ca3fe 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,5 @@ *~ - - +*.log +*.aux +*.bbl +*.blg diff --git a/notes.tex b/notes.tex index 2d6477d..615f71b 100644 --- a/notes.tex +++ b/notes.tex @@ -1,59 +1,152 @@ \documentclass{article} \usepackage[utf8]{inputenc} - +\usepackage{amsmath} \begin{document} -\paragraph{General problem} A user is evolving in an environment, understand how to place ads in this environment. +\section{Introduction} +\paragraph{General problem} A user is evolving in an environment, understand how +to place ads in this environment. -\paragraph{Model} The environment is a weighted directed graph, the vertices are called the \emph{scenes} (aisles in a store, web pages). The outgoing edges of a vertex are the actions the user can do at this vertex. The weight of an edge is the probability of chosing this action. A user \emph{session} is a Markov process over the graph (shopping session, web browsing session). +\paragraph{Model} The environment is a weighted directed graph, the vertices are +called the \emph{scenes} (aisles in a store, web pages). The outgoing edges of a +vertex are the actions the user can do at this vertex. The weight of an edge is +the probability of choosing this action. A user \emph{session} is a Markov +process over the graph (shopping session, web browsing session). -\emph{Could be added later: each scene has a duration, the average time spent by a user on this scene.} +\emph{Could be added later: each scene has a duration, the average time spent by + a user on this scene.} -On each scene an ad can be displayed. Advertisers want to place their ad on the graph. Each advertiser's request is characterized by the following parameters: +On each scene an ad can be displayed. Advertisers want to place their ad on the +graph. Each advertiser's request is characterized by the following parameters: \begin{itemize} \item a per-impression value \item a subset of the nodes where the ad can be displayed \item the expected number of times the ad will be seen \item \emph{could be added later: the expected duration of exposure to the user} -\item \emph{could be added later: a multiplying factor for the per-impression value if the ad is displayed on consecutive scenes} +\item \emph{could be added later: a multiplying factor for the per-impression + value if the ad is displayed on consecutive scenes} \end{itemize} Two different types of problem: \begin{itemize} -\item there is a flow of request, only one request can be granted at the same time: when a request arrives, it can either be granted or delayed (with a cost). This yields an online scheduling problem. -\item all the request arrive at the same time, several requests can be granted at the same time, an auction is runned to select the winning advertisers. +\item there is a flow of request, only one request can be granted at the same + time: when a request arrives, it can either be granted or delayed (with a + cost). This yields an online scheduling problem. +\item all the request arrive at the same time, several requests can be granted + at the same time, an auction is run to select the winning advertisers. \end{itemize} \paragraph{Related work} \begin{itemize} -\item \cite{cm} “Slated Cascade model”: the graph is a matrix, the user starts at the top of a column and goes down it, only at the end of the column can he switch to another one. There is a PTAS to compute the assignment which maximizes the social welfare. -\item \cite{oad} the graph is an infinite path and is a modelisation of a web browsing session: on each page their is a constant probability of continuation. The paper gives a 7-competitive algorithm against the optimal offline solution. +\item \cite{cm} “Slated Cascade model”: the graph is a matrix, the user starts + at the top of a column and goes down it, only at the end of the column can he + switch to another one. There is a PTAS to compute the assignment which + maximizes the social welfare. +\item \cite{oad} the graph is an infinite path and is a model of a web browsing + session: on each page their is a constant probability of continuation. The + paper gives a 7-competitive algorithm against the optimal offline solution. \end{itemize} -\paragraph{Restriction} -The graph is a DAG: if the environment is well designed, the user shouldn't be passing through the same vertex twice. -Start first with a tree or even path and see if all the advertiser's requirements can be matched. +\paragraph{Restriction} The graph is a DAG: if the environment is well designed, +the user shouldn't be passing through the same vertex twice. Start first with a +tree or even path and see if all the advertiser's requirements can be matched. Muthu's strategy: \emph{find the first non trivial problem and solve it}. -\paragraph{Starting problem} Let $T$ be a weighted tree: for each $u\in T$, $C(u)$ is the set of children of $u$. For all $v$ in $C(u)$, $p(v)$ will denote the probability of transition from vertex $u$ to vertex $v$. $T(u)$ is the tree rooted at $u$. $r$ is the root of the tree. +\section{Static ads placement on a tree} + +\subsection{Notations} + +Let $T$ be a weighted tree: for each $u\in T$, $C(u)$ is the set of children of +$u$. For all $v$ in $C(u)$, $p(v)$ will denote the probability of transition +from vertex $u$ to vertex $v$. $T(u)$ is the tree rooted at $u$. $r$ is the root +of the tree. + +For any function $f$ of the nodes, $f(T)$ will denote the expected return of a +random walk on the tree starting at $r$, stopping when it reaches a leaf and +where passing through vertex $v$ adds $f(u)$ to the return: +\begin{displaymath} + f(T)= f(r) + \sum_{v\in C(u)} p_v f(T(v)) +\end{displaymath} + +Introducing the cumulative probabilities yields a simple expression for +$f(T)$. Let $\lambda_u$ denote the probability of walking from the root of the +tree to node $u$: $\lambda_u=\Pi_{i=1}^k p(u_k)$, with $u_1, \ldots, u_k$ the +path from the root to $u$, then: -For any function $f$ of the nodes, $f(T)$ will denote the expected return of a random walk on the tree starting at $r$, stopping when it reaches a leaf and where passing through vertex $v$ adds $f(u)$ to the return: \begin{displaymath} -f(T)= f(r) + \sum_{v\in C(u)} p_v f(T(v)) + f(T)= \sum_{u\in T}\lambda_u f(u) \end{displaymath} -There is a set of $n$ advertisers. $b_i$ is the per-impression value of advertiser $i$ and $n_i$ the expected number of times advertiser $i$ would like his ad to be seen. +\subsection{Problem} + +There is a set of $n$ advertisers. $b_i$ is the per-impression value +of advertiser $i$ and $n_i$ the expected number of times advertiser +$i$ would like his ad to be seen. -An assignment of advertisers is a collection of subsets of $T$: $(S_i)_{1\leq i \leq n}$. The elements of $S_i$ are the vertices of $T$ where the ad of advertiser $i$ will be displayed. The overall return of advertiser $i$ is simply $R_i = f_i(T)$ with $f_i:u\mapsto b_i\mathbf{1}_{S_i}(u)$. +An assignment is a subset $A$ of advertisers and a collection of subsets of $T$: +$(A_i)_{i\in S}$. The elements of $A_i$ are the vertices of $T$ where the ad of +advertiser $i$ will be displayed. The overall return of advertiser $i$ is simply +$R_i = f_i(T)$ with $f_i:u\mapsto b_i\mathbf{1}_{A_i}(u)$. The goal is to find an assignment which maximizes the social welfare: +\begin{equation}\label{eq:opt} + W = \sum_{i\in A} R_i = \sum_{i\in A} b_i \mathbf{1}_{A_i}(T) +\end{equation} with respect to the constraint: + +\begin{equation}\label{eq:constraint} + \forall A_i,\; \mathbf{1}_{A_i}(T)\geq n_i +\end{equation} + +If \eqref{eq:constraint} is the only constraint, then the optimization problem +is trivial: select the advertiser whose $b_i n_i$ is the greatest and place his +ad on all the nodes of the tree. However, by doing this $\mathbf{1}_{A_i}(T)$ +could be much greater than $n_i$. There are three ways of further refining the +problem to get a reasonable solution: +\begin{enumerate} +\item\label{over-serve} replacing $\mathbf{1}_{A_i}(T)$ by $n_i$ in + \eqref{eq:opt}. In other words, even if the publisher over-serve an + advertiser's request, he will not receive more than what the advertiser was + initially willing to pay for. +\item\label{min-screen} adding the constraint $\mathbf{1}_{A_i}(T)\leq n_i + + \max_{x\in A_i} \lambda_x$: this simply expresses that removing any screen to + the set of screens assigned to advertiser $i$ will lead to under-serving his + request. +\item\label{fractional} allow fractional occupation of a screen by an ad: in + addition to choosing a set of screen for advertiser $i$, you can also choose + for each screen a fraction of the time this ad will be displayed on the + screen. This way, the screen can be share between several ads by doing a + simple rotation of the ads. The main advantage of fractional allocation of the + screens is that it allows the publisher to meet exactly the advertiser's + request. +\end{enumerate} + +Solutions \ref{over-serve}. and \ref{min-screen}. seem to lead to very complex +combinatorial problems (dynamic programming gives a running time of +$O(n2^{|T|})$). Solution \ref{fractional}. seem to be more realistic: what is +the point of having screens if you do not leverage their ability to change their +content frequently? + +Without taking the constraint of the advertisers into account (for each +advertiser, there is a subset of node where his ad can be displayed), the +optimization problem is simply choosing a subset of advertisers $A$ that +maximizes: +\begin{displaymath} + \sum_{i\in A} b_i n_i +\end{displaymath} +subject to the constraint: \begin{displaymath} -W = \sum_{i=1}^n R_i = \sum_{i=1}^n b_i \mathbf{1}_{S_i}(T) + \sum_{i\in A} n_i \leq B \end{displaymath} +where $B$ is the number of views' capacity of the tree. This is nothing else +than a Knapsack Problem. In this case, the placement of ads has no importance: +the tree is just seen as a pool of available number of views that can be filled +continuously, and the position of ads has no importance. + + \bibliographystyle{plain} \bibliography{papers.bib} -\end{document} +\end{document} \ No newline at end of file -- cgit v1.2.3-70-g09d2