The model (II)
We have justified the we are seeking for the vector \(x\) that is the fixed point for the mapping defined by the matrix \(P\), that is, the vector satisfying \(x=Px\). We have then that \[\begin{array}{ccl} x_{i} & = & \sum_{j=1}^{n}p_{ij}x_{j}=\sum_{s_{j}=0}p_{ij}x_{j}+\sum_{s_{j}\neq0}p_{ij}x_{j}=\\ & = & \sum_{s_{j}=0}\frac{1}{n}x_{j}+\sum_{s_{j}\neq0}\left(\frac{1-p}{n}+p\frac{g_{ij}}{s_{j}}\right)x_{j}=\\ & = & \frac{1}{n}\sum_{s_{j}=0}x_{j}+\frac{1}{n}\sum_{s_{j}\neq0}x_{j}-\frac{p}{n}\sum_{s_{j}\neq0}x_{j}+p\sum_{s_{j}\neq0}\frac{g_{ij}}{s_{j}}x_{j}=\\ & = & \frac{1}{n}\left(\sum_{s_{j}=0}x_{j}+\sum_{s_{j}\neq0}x_{j}\right)-\frac{p}{n}\sum_{s_{j}\neq0}x_{j}+p\sum_{s_{j}\neq0}\frac{g_{ij}}{s_{j}}x_{j}=\\ & = & \frac{1}{n}-\frac{p}{n}\sum_{s_{j}\neq0}x_{j}+p\sum_{s_{j}\neq0}\frac{g_{ij}}{s_{j}}x_{j}=\\ & = & \frac{1-p\sum_{s_{j}\neq0}x_{j}}{n}+p\sum_{s_{j}\neq0}\frac{g_{ij}}{s_{j}}x_{j}, \end{array}\] where \(\sum_{s_{j}\neq0}x_{j}\) represents the summation of the \(x_{j}\) along web pages \(j\) that include at least one link (or, equivalently, such that \(s_{j}\neq0\)) and \(\sum_{s_{j}=0}x_{j}\) represents the summation of the \(x_{j}\) for web pages \(j\) that include no link (or, such that \(s_{j}=0\)). Note that Note-se que \[\sum_{s_{j}=0}x_{j}+\sum_{s_{j}\neq0}x_{j}=\sum_{j=i}^{n}x_{j}=1.\]
Assume, for the moment, that every web page includes at least one link, that is, \(s_{j}\neq0,\forall j\in\{1,2,...,n\}.\)
Then , it follows that \(\sum_{s_{j}\neq0}x_{j}=\sum_{j=i}^{n}x_{j}=1\) and \[x_{i}=\frac{1-p}{n}+p\sum_{j=1}^{n}\frac{g_{ij}}{s_{j}}x_{j},\] from which we derive that, \[x_{i}=\frac{1-p}{n}+p\left(\frac{x_{j_{1}}}{s_{j_{1}}}+\frac{x_{j_{2}}}{s_{j_{2}}}+...+\frac{x_{j_{k}}}{s_{j_{k}}}\right),\] where\(j_{1},j_{2},...,j_{k}\) are the indices of web pages that include a link pointing to page \(i\).
Recall now the PageRank formula: \[PR\left(P_{i}\right)=\left(1-p\right)+p\left(\frac{PR\left(P_{j1}\right)}{C\left(P_{j1}\right)}+\frac{PR\left(P_{j2}\right)}{C\left(P_{j2}\right)}+...+\frac{PR\left(P_{jk}\right)}{C\left(P_{jk}\right)}\right),\] where \(P_{j}\) represents the web page \(j\), \(PR\left(P_{j}\right)\) its PageRank and \(C\left(P_{j}\right)\) the number of links it includes.
It is obvious that \(C\left(P_{j}\right)=s_{j}\), but what is the relation between \(PR\left(P_{i}\right)\) and \(x_{i}\)? Observe that, starting from \[x_{i}=\frac{1-p}{n}+p\left(\frac{x_{j_{1}}}{s_{j_{1}}}+\frac{x_{j_{2}}}{s_{j_{2}}}+...+\frac{x_{j_{k}}}{s_{j_{k}}}\right),\] by multiplying both sides of the equation by \(n\), we find \[nx_{i}=n\frac{1-p}{n}+np\left(\frac{x_{j_{1}}}{s_{j_{1}}}+\frac{x_{j_{2}}}{s_{j_{2}}}+...+\frac{x_{j_{k}}}{s_{j_{k}}}\right),\] \[nx_{i}=(1-p)+p\left(\frac{nx_{j_{1}}}{s_{j_{1}}}+\frac{nx_{j_{2}}}{s_{j_{2}}}+...+\frac{nx_{j_{k}}}{s_{j_{k}}}\right).\]
Indeed, it follows that \(PR\left(P_{i}\right)=nx_{i}\) and the PageRank value is, up to the multiplication by \(n\), the long-run probability that the user is visiting page \(i\). Hence, the page PageRank value for any web page is between \(0\) and \(n\) (or, more specifically, between \(1-p\) and \(n\)) and the sum of the PageRank values foe very existing page is \[\sum_{i=1}^{n}PR\left(P_{i}\right)=\sum_{i=1}^{n}nx_{i}=n\sum_{i=1}^{n}x_{i}=n.1=n.\]
The above assume that every web page includes at least one link. What happens if this is not true? In this case we will have that \(PR\left(P_{i}\right)<nx_{i},\forall j\in\{1,2,...,n\}\) and the sum of every PagrRank values is smaller than \(n\). Can you explain whay?