Computer Ranks – Intro to Computer Science


So, I’m going to provide a start on this code and now we’ll leave the crucial part of it for you to do as a quiz. You should feel free at any point to stop and try to figure out more of the code on your own. There are lots of different ways to do this and we’ll show you one. And you’ll finish that as a quiz. So, the first thing we’re going to do is define two constants. So, we’re going to use d as the damping factor. And I’ll use 0.8 as my damping factor. That’s the probability, thinking about our random web surfer, that she selects the link on the current page, rather than starting over with a new random page. The other constant I’m going to define here we’ll call numloops. That’s the number of times we’re going to go through the relaxation. What we’re computing is the value of rank at some time step. The number of times we go through that is going to determine the accuracy of our ranks. We’ll use ten as the number of loops. You can experiment with changing that, and one of the questions in the homework assignment will ask you to think about what happens when you change that. So, now we need to start, we said initially the rank of each url is 1 divided by the number of pages and so the dictionary ranks, we want to initialize with those values. So, we are going to create an empty dictionary. We’ll call it ranks. The number of pages, the number of pages, we can get from the graph. The graph is a dictionary of nodes and len of graph will tell us the number of entries in that dictionary. So, that’s the number of nodes in the graph which is the number of pages that we’ve crawled. And now we want to go through the pages initializing each page to the value one divided by npages, and I’m remembering to use 1.0 here to make sure the division is done as floating point division and we get an accurate number rather than integer division. So, now, we’ve initialized the ranks. We have a dictionary that maps each page to its current rank, which is one divided by the number of pages. So, now, we get to the interesting part. We need a loop that’s going to go through the number of times of numloops. Each time through this loop, what we want to do is update the newranks based on this formula, using the old ranks. And then, at the end of the loop, we are going to make the variable ranks hold what was previous newranks and that way, we can keep going, each time we are going to get a new value from newranks. At the end of doing all updates, we are going to update ranks to refer to whatever newranks is. So, that means each time through this loop we are going to create a new dictionary called newranks that starts as empty, and we’re going to add all the pages to newranks as we update their rank. So, to do that we need to go through the pages in the graph and for each page what we want to do is compute the new rank for that page. And the first thing we’ll do is this part. The new rank is 1 minus d divided by npages plus this summation. So, the first thing we will do is introduce a new variable, we will call it newrank and we will assign it to this value. Then we are going to update it as we go through the pages that link into this page. So, we’ll start by initializing newrank as 1 minus d divided by the number of pages. So, then what we need to do is update for this summation. And I’m going to leave this blank. And I’m going to skip that for now. This is going to be left as a quiz. We’ll finish the rest of the code. And then your quiz will be to finish this part of the code, which is really the most interesting part of computing the page ranks. Once we’ve done that, so we’ve used newrank as a variable to keep track of the rank for this page. Well, we want to update our dictionary, so we’re going to add an entry, newranks. We’re still within the for loop, you’re going to put your code that sums up all the links here. Once we’ve done that, we’ve got the value of newrank that reflects both the probability of starting from that page, and the popularity from all the inlinks. And so, we’ll update this to be newrank. We’ve added that to our dictionary, so once we’ve finished looping through all the pages in the graph, well now, we’re ready for the next step. So, that means, we want to make the variable ranks refer to the newranks, so we’ve changed the time step to the next time step, and now we’re going to go back through this loop, and we go through this loop, number of loops times, each time we’re updating the ranks, and when we’re done, what we want to return is ranks. That’s the dictionary that maps each page to its rank.

Add a Comment

Your email address will not be published. Required fields are marked *