Ranking Web Pages – Intro to Computer Science


So now that you’ve survived the bunny uprising, we’re ready to get to the main goal of this class as far as building the search engine goes. Our goal is to improve the results by finding the best web page, instead of just returning a list of all the web pages that match a query. As the web has grown, it has become more and more important for search engines to do this ranking well. What really distinguished Google from previous search engines was that it had a much smarter way of ranking pages, one that produced more useful results: the first one or two results for a search query were often the very thing the user was looking for. So now we’re ready to start thinking about how to rank web pages.
Let’s start by recapping how our search engine works. We started by building a crawler, which is what we did in Units 1, 2, and 3. The crawler followed all the links in the web pages, and by following those links it built up an index. The end result, after Units 4 and 5, was that index. By the end of Unit 5, it was a hash table: we could look up a keyword, find the bucket where that keyword might appear, and look through the entries in that bucket to find the one matching the keyword we were looking for. The value of that entry is a list of all the URLs where the keyword appears.
The order of the URLs in that list was determined only by the order in which we added them during the crawl. Every time we encountered a new page, we indexed it and added its URL to the list for each keyword it contained. So the first URL in the list is just the page we happened to find first, and the second is the one we found next. This tells us nothing about which page is best.
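To make the recap concrete, here is a minimal Python sketch of a crawl-order index. It assumes a plain dictionary in place of the hash table built in Unit 5, and a hypothetical list of (url, content) pairs standing in for pages in the order a crawler happens to reach them. The procedure names are modeled loosely on the course’s add_to_index and lookup, and crawl_and_index and the example.com URLs are made up for illustration; this is not the course’s exact code. It shows that the list returned for a keyword simply reflects crawl order.

```python
# Sketch of a crawl-order index (hypothetical helpers, not the course's exact code).
# A dict stands in for the hash table from Unit 5: keyword -> list of URLs.

def add_to_index(index, keyword, url):
    """Record that keyword appears at url, keeping URLs in the order first seen."""
    urls = index.setdefault(keyword, [])
    if url not in urls:  # skip repeats of the same word on the same page
        urls.append(url)

def lookup(index, keyword):
    """Return the URLs where keyword appears, or an empty list if none."""
    return index.get(keyword, [])

def crawl_and_index(pages):
    """Index (url, content) pairs in the order given, mimicking crawl order."""
    index = {}
    for url, content in pages:
        for word in content.split():
            add_to_index(index, word, url)
    return index

# Hypothetical pages, listed in the order a crawler might reach them.
pages = [
    ("http://example.com/c", "bunny uprising survival"),
    ("http://example.com/a", "bunny care guide"),
    ("http://example.com/b", "how to train a bunny"),
]

index = crawl_and_index(pages)
print(lookup(index, "bunny"))
# ['http://example.com/c', 'http://example.com/a', 'http://example.com/b']

# Same pages, crawled in the opposite order: the result order flips.
print(lookup(crawl_and_index(reversed(pages)), "bunny"))
# ['http://example.com/b', 'http://example.com/a', 'http://example.com/c']
```

Reversing the crawl order reverses the result even though the pages themselves are unchanged; nothing stored in the index says which page is best.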
So the order of the URLs in the list, and therefore in our output, just depends on the order in which things happen to go in the crawl. When the web was really small, which was quite a while ago now, this was sort of okay: there were only a few pages that might match a given keyword, and you could look through them all and decide which one you wanted. With the web today, this doesn’t work at all. There are thousands of pages that match any interesting keyword, maybe millions, and certainly many more than you want to look through by hand. So the most important thing a good search engine does is figure out how to rank these pages so that the one at the front of the list is the one the user wants. That’s our goal for the rest of this unit: to figure out how to rank pages. Before we do this for web pages, we’re going to do something very similar, but perhaps easier to relate to. We’re going to talk about how we decide who’s popular.