Skip to main content

Section 19.3 A Simple Application

Imagine that a query into an internet search engine finds \(5\) websites. The engine wants to report these sites to the user in some sort of order relating to how useful each site might be, i.e. the engine wants to work out a ranking for each site.

One approach to this problem is to look at the links between the sites. For the sake of discussion, let the links between the sites be as shown in Figure 19.3.1.

Figure 19.3.1.
We can represent the information in this diagram via a matrix \(A\) where the entries \(a_{ij}\) satisfy

\begin{equation*} a_{ij}=\begin{cases} 1 \amp \textrm{ if website } i \textrm{ has a link to it from website } j \\ 0 \amp \textrm{ otherwise } \end{cases} \end{equation*}

Thus

\begin{equation*} A=\begin{pmatrix} 0 \amp 0 \amp 1 \amp 0 \amp 0 \\ 1 \amp 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \amp 0 \amp 1 \\ 1 \amp 1 \amp 1 \amp 0 \amp 0 \\ 1 \amp 1 \amp 0 \amp 1 \amp 0 \end{pmatrix} \end{equation*}

Now let \(r_i\) be the ranking of website \(i\text{.}\) Then we require

\begin{equation*} 0\leq r_i\leq 1 \end{equation*}
\begin{equation*} \sum_{i=1}^{5} r_i =1 \end{equation*}

and

\begin{equation*} r_i \propto \textrm{ sum of the rankings of the sites which have links to it.} \end{equation*}

The idea here is that if a site has links to it from an important site (i.e. one with a high ranking) then that is better than having a link to it from an obscure site (i.e. one with a low ranking). So, letting the constant of proportionality be \(k\text{,}\) for the websites linked according to Figure 19.3.1 we have

\begin{align*} r_1 \amp =kr_3\\ r_2 \amp =kr_1\\ r_3 \amp =k(r_2+r_5)\\ r_4 \amp =k(r_1+r_2+r_3)\\ r_5 \amp =k(r_1+r_2+r_4) \end{align*}

or, in matrix form

\begin{equation*} \begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{pmatrix} =k\begin{pmatrix} 0 \amp 0 \amp 1 \amp 0 \amp 0 \\ 1 \amp 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \amp 0 \amp 1 \\ 1 \amp 1 \amp 1 \amp 0 \amp 0 \\ 1 \amp 1 \amp 0 \amp 1 \amp 0 \end{pmatrix}\begin{pmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \\ r_5 \end{pmatrix} \end{equation*}

Thus the rankings satisfy the equation

\begin{equation*} \mathbf{r}=kA\mathbf{r} \end{equation*}

or

\begin{equation*} A\mathbf{r}=\frac{1}{k}\mathbf{r}=\lambda\mathbf{r} \end{equation*}

i.e. an eigenvalue problem. For the matrix \(A\) as given above it turns out that the ranking vector is

\begin{equation*} \mathbf{r}=\begin{pmatrix} 0.14 \amp 0.08 \amp 0.22 \amp 0.27 \amp 0.29\end{pmatrix}^T \end{equation*}

and so the sites would be listed in the order \(5, 4, 3, 1, 2\text{.}\)

Of course Google’s page ranking system is much more sophisticated than that described above but the idea is essentially the same. Also, there is much more advanced mathematics needed behind the scenes than indicated above. For example how do we know which eigenvalue and associated eigenvector to take? What happens if all of the eigenvalues are complex? And so on.