Convergence
We shall now explain why the iterative procedure described in the examples works. Define:
- \(x=\left(x_{1},x_{2},...,x_{n}\right)\) a vector whose coordinates are the correct PageRank value of each web page, that is, such that \(x_{i}=PR\left(P_{i}\right)\)
- \(H=\left(h_{ij}\right)_{1\leq i,j\leq n}\) the \(n\times n\) matrix with entries \(h_{ij}\) equal to \(\frac{1}{C\left(P_{j}\right)}\) if there is a link from page \(j\) pointing to page \(i\), and equal to \(0\) if such a link does not exist
- \(e=\left(1,1,...,1\right)\) the vector with all \(n\) coordinates equal to \(1\)
The PageRank formula may now be rewritten in the form \[x=pHx+\left(1-p\right)e\]
since, coordinate by coordinate, it reads \(x_{i}=p\sum_{j=1}^{n}h_{ij}x_{j}+\left(1-p\right)\), for every \(i\in\left\{ 1,...,n\right\}\). Note further that the equation above is equivalent to\[\left(I-pH\right)x=\left(1-p\right)e\]
where \(I\) represents the identity matrix. Hence, we are looking for a solution for the simultaneous equations \(Ax=b\), where \(A=I-pH\) and \(b=\left(1-p\right)e\).
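To make this setup concrete, here is a minimal numerical sketch in Python (with NumPy) for a small hypothetical web of three pages; the link structure, the value of \(p\) and all names are illustrative, not part of the text above. The direct solution obtained at the end serves as a reference for the iterative approximations discussed below.

```python
import numpy as np

# A hypothetical 3-page web, given as an adjacency list: links[j] lists the
# pages that page j points to (pages numbered 0..n-1 here).
links = {0: [1, 2], 1: [2], 2: [0]}
n, p = 3, 0.85  # p is the damping factor used throughout the text

# Build H column by column: H[i, j] = 1 / C(P_j) if page j links to page i.
H = np.zeros((n, n))
for j, targets in links.items():
    for i in targets:
        H[i, j] = 1.0 / len(targets)  # C(P_j) = number of links out of page j

# The linear system (I - pH) x = (1 - p) e from the text, solved directly.
A = np.eye(n) - p * H
b = (1 - p) * np.ones(n)
x_exact = np.linalg.solve(A, b)
print(x_exact)
```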
The most commonly used iterative methods to solve simultaneous linear equations of the form \(Ax=b\), where \(A\) is an invertible matrix and \(b\) is a vector, start by writing \(A\) in the form \(C-M\) for some suitable invertible matrix \(C\). In this case, it follows that \[\begin{array}{rcl} Ax=b & \Leftrightarrow & (C-M)x=b\\ & \Leftrightarrow & Cx=Mx+b\\ & \Leftrightarrow & x=C^{-1}Mx+C^{-1}b\\ & \Leftrightarrow & x=M'x+b' \end{array}\] where \(M'=C^{-1}M\) and \(b'=C^{-1}b\). Now, taking a sequence of vectors iteratively defined by \[x^{\left(k+1\right)}=M'x^{\left(k\right)}+b'\] it is known that, under suitable conditions, it holds that \(\left\Vert x^{\left(k\right)}-x\right\Vert \rightarrow0\) for a suitable norm \(\left\Vert .\right\Vert\) in the space \(\mathbb{R}^{n}\), where \(x\) is the solution of the given simultaneous equations. Indeed, we have that \[\begin{array}{rcl} x^{\left(1\right)}-x & = & \left(M'x^{\left(0\right)}+b'\right)-\left(M'x+b'\right)=M'\left(x^{\left(0\right)}-x\right)\\ x^{\left(2\right)}-x & = & \left(M'x^{\left(1\right)}+b'\right)-\left(M'x+b'\right)=M'\left(x^{\left(1\right)}-x\right)=\left(M'\right)^{2}\left(x^{\left(0\right)}-x\right)\\ x^{\left(3\right)}-x & = & \left(M'x^{\left(2\right)}+b'\right)-\left(M'x+b'\right)=M'\left(x^{\left(2\right)}-x\right)=\left(M'\right)^{3}\left(x^{\left(0\right)}-x\right). \end{array}\]
It follows that, for the general case, we have \[x^{\left(k\right)}-x=\left(M'\right)^{k}\left(x^{\left(0\right)}-x\right).\]
So, it follows that \[\left\Vert x^{\left(k\right)}-x\right\Vert =\left\Vert \left(M'\right)^{k}\left(x^{\left(0\right)}-x\right)\right\Vert.\]
For matrices, we define the norm \(\left\Vert .\right\Vert \) given by \[\left\Vert A\right\Vert =\max_{x\neq0}\frac{\left\Vert Ax\right\Vert }{\left\Vert x\right\Vert }.\]
This norm satisfies \(\left\Vert Ay\right\Vert \leq\left\Vert A\right\Vert \left\Vert y\right\Vert \) for every vector \(y\) (if \(y=0\), the inequality is obvious; if \(y\neq0\), we have \[\left\Vert Ay\right\Vert =\frac{\left\Vert Ay\right\Vert }{\left\Vert y\right\Vert }\left\Vert y\right\Vert \leq\left\Vert A\right\Vert \left\Vert y\right\Vert )\]
and it is submultiplicative, that is, \(\left\Vert AB\right\Vert \leq\left\Vert A\right\Vert \left\Vert B\right\Vert \) \((\left\Vert AB\right\Vert =\max_{x\neq0}\frac{\left\Vert ABx\right\Vert }{\left\Vert x\right\Vert }\leq\max_{x\neq0}\frac{\left\Vert A\right\Vert \left\Vert Bx\right\Vert }{\left\Vert x\right\Vert }=\left\Vert A\right\Vert \max_{x\neq0}\frac{\left\Vert Bx\right\Vert }{\left\Vert x\right\Vert }=\left\Vert A\right\Vert \left\Vert B\right\Vert )\).
In particular, if \(B=A\) we find that \(\left\Vert A^{2}\right\Vert \leq\left\Vert A\right\Vert ^{2}\).
More generally, we may prove, using an induction argument, that \(\left\Vert A^{k}\right\Vert \leq\left\Vert A\right\Vert ^{k}\), for every \(k\geq1\).
Hence, we have that \[\left\Vert x^{\left(k\right)}-x\right\Vert =\left\Vert \left(M'\right)^{k}\left(x^{\left(0\right)}-x\right)\right\Vert \leq c\left\Vert \left(M'\right)^{k}\right\Vert \leq c\left\Vert M'\right\Vert ^{k}\] where \(c=\left\Vert x^{\left(0\right)}-x\right\Vert \) is constant. Now, if we have \(\left\Vert M'\right\Vert <1\), it follows that \(\left\Vert x^{\left(k\right)}-x\right\Vert \leq c\left\Vert M'\right\Vert ^{k}\rightarrow0\), so the conclusion follows: \(\left\Vert x^{\left(k\right)}-x\right\Vert \rightarrow0\).
We may thus conclude that \(x_{i}^{\left(k\right)}\rightarrow x_{i}\), for every \(i\in\left\{ 1,...,n\right\} \), that is, we have an iterative method producing approximations for the exact solution of the given simultaneous equations.
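The general scheme translates directly into code. The following sketch (function name and stopping tolerance are illustrative choices, not prescribed by the text) iterates \(x^{\left(k+1\right)}=M'x^{\left(k\right)}+b'\) until successive iterates are close:

```python
import numpy as np

def splitting_iteration(M_prime, b_prime, x0, tol=1e-10, max_iter=1000):
    """Iterate x^(k+1) = M' x^(k) + b', which converges to the solution
    of Ax = b whenever ||M'|| < 1, as argued above."""
    x = x0
    for k in range(max_iter):
        x_new = M_prime @ x + b_prime
        # stop when successive iterates differ by less than tol in the 1-norm
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter
```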
The iterative procedure known as the Jacobi method is a particular case of the described approach. It consists of choosing \(C\) as the diagonal matrix whose entries are the ones on the diagonal of \(A\). As regards our given simultaneous equations, as long as there is no link from a web page pointing to itself, every diagonal entry of the matrix \(H\) is null; hence, the diagonal entries of the matrix \(A=I-pH\) are all equal to \(1\). So, we have \(C=I\) (thus, obviously, \(C\) is invertible), \(M'=M=pH\) and \(b'=b=\left(1-p\right)e\), and the iterative sequence of vectors approximating the solution is given by \[x^{\left(k+1\right)}=pHx^{\left(k\right)}+\left(1-p\right)e\] or, rewriting the coordinates explicitly, \(x_{i}^{\left(k+1\right)}=p\sum_{j=1}^{n}h_{ij}x_{j}^{\left(k\right)}+\left(1-p\right)\), for every \(i\in\left\{ 1,...,n\right\} \).
This means that, starting from any set of initial values representing the initial PageRank values for each web page, described by the vector \(x^{\left(0\right)}=\left(x_{1}^{\left(0\right)},x_{2}^{\left(0\right)},...,x_{n}^{\left(0\right)}\right)\), we will successively build new approximations for the PageRank values by applying the PageRank formula to the current set of approximations. To take a closer look at the convergence of the iterative method, comparing the successive approximations with the exact values, we may use this app.
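In code, the Jacobi iteration for the PageRank system is then a one-line specialization of the generic sketch above, with \(M'=pH\) and \(b'=\left(1-p\right)e\); this snippet reuses the hypothetical `H`, `p`, `b` and `splitting_iteration` from the earlier sketches.

```python
# Jacobi iteration for the PageRank system: M' = pH and b' = (1 - p)e.
x0 = np.ones(n) / n  # any initial guess works; uniform values are a common choice
x_jacobi, steps = splitting_iteration(p * H, b, x0)
print(x_jacobi, steps)  # should agree with x_exact up to the tolerance
```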
Remarks:
- For the simultaneous equations we are trying to solve, it is convenient to use the \(\left\Vert .\right\Vert _{1}\) norm for vectors, defined by \(\left\Vert y\right\Vert _{1}=\sum_{i=1}^{n}\left|y_{i}\right|\).
It follows that, given a matrix \(A=\left(a_{ij}\right)_{1\leq i,j\leq n}\), its norm is \[\left\Vert A\right\Vert _{1}=\max_{x\neq0}\frac{\left\Vert Ax\right\Vert _{1}}{\left\Vert x\right\Vert _{1}}=\max_{1\leq j\leq n}\sum_{i=1}^{n}\left|a_{ij}\right|\] (that is, the matrix norm is the largest sum of the absolute values along columns).
As for the matrix \(H\), note that the sum of the absolute values along column \(j\) is equal to \(1\) if the web page \(j\) includes some link pointing towards another page, and \(0\) otherwise. It follows then that \[\left\Vert M'\right\Vert _{1}=\left\Vert pH\right\Vert _{1}=p\left\Vert H\right\Vert _{1}\leq p<1.\]
Moreover, as \(\left\Vert x^{\left(k\right)}-x\right\Vert _{1}\leq c\left\Vert M'\right\Vert _{1}^{k}\leq c\,p^{k}\), we conclude that the smaller we choose \(p\), the faster the convergence will be.
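Both points of this remark are easy to check numerically on the hypothetical three-page example built earlier (this sketch reuses `H`, `n` and `splitting_iteration` from above; the values of \(p\) tried are illustrative):

```python
# ||H||_1 is the largest absolute column sum; for a link matrix it is <= 1.
print(np.abs(H).sum(axis=0).max(), np.linalg.norm(H, 1))  # the two agree

# Smaller p means a smaller contraction factor ||pH||_1 <= p, hence fewer
# Jacobi steps to reach the same tolerance.
for p_test in (0.5, 0.7, 0.85, 0.95):
    b_test = (1 - p_test) * np.ones(n)
    _, steps = splitting_iteration(p_test * H, b_test, np.ones(n) / n)
    print(p_test, steps)
```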
- Another common particular case of the general iterative procedure described is known as the Gauss-Seidel method. In this case, we take the matrix \(C\) with entries equal to the entries of \(A\) on and below the diagonal, and null above the diagonal. For our purpose, as long as there are no links from a web page pointing to itself, we have \(H=L+U\), where \(L\) holds the entries of \(H\) below the diagonal, with all entries on and above the diagonal being null, and \(U\) holds the entries of \(H\) above the diagonal, with all the remaining entries being null. So, we have \(C=I-pL\) and \(M=pU\); hence, starting from the equation \(Cx=Mx+b\), we find \[\begin{array}{rcl} \left(I-pL\right)x^{\left(k+1\right)} & = & pUx^{\left(k\right)}+b\\ x^{\left(k+1\right)} & = & pLx^{\left(k+1\right)}+pUx^{\left(k\right)}+b \end{array}\] or, equivalently, \[x_{i}^{\left(k+1\right)}=p\sum_{j=1}^{i-1}h_{ij}x_{j}^{\left(k+1\right)}+p\sum_{j=i+1}^{n}h_{ij}x_{j}^{\left(k\right)}+\left(1-p\right)\] for every \(i\in\left\{ 1,...,n\right\}\).
For practical purposes, the Gauss-Seidel method differs from the Jacobi method in that it uses the newly computed coordinates immediately, within the same iteration step, instead of holding them back until the next iteration step.
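A minimal sketch of this variant, assuming no self-links (so the diagonal of \(H\) is null, as in the text) and reusing the conventions of the earlier sketches; the function name and tolerance are again illustrative:

```python
import numpy as np

def gauss_seidel_pagerank(H, p, x0, tol=1e-10, max_iter=1000):
    """One Gauss-Seidel sweep per iteration for (I - pH) x = (1 - p)e,
    assuming H has a null diagonal (no self-links)."""
    n = len(x0)
    x = x0.copy()
    for k in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # x already holds the new values for j < i and the old ones for
            # j > i, so H[i] @ x matches the update formula above (h_ii = 0).
            x[i] = p * (H[i] @ x) + (1 - p)
        if np.abs(x - x_old).sum() < tol:  # 1-norm of the change
            return x, k + 1
    return x, max_iter
```

On the same hypothetical example, `gauss_seidel_pagerank(H, p, np.ones(n) / n)` typically reaches a given tolerance in fewer sweeps than the Jacobi iteration, precisely because each sweep already exploits the freshly updated coordinates.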