An Educator's Guide to Technology and the Web

Internet @ Schools

How Google Works: Are Search Engines Really Dumb and Why Should Educators Care?

By Paul Barron - Posted Jan 1, 2011

Before the web and search engines, libraries and librarians were the best answer to students’ question, Where do I find information about …? Today, search engines, especially Google, rule. 1

Educators have known for many years that Google is not just a search engine. “To Google” is a research behavior that is a habit for students. A study of college students’ information seeking habits found that Google is the primary resource for the majority of students for course-related research. 2 That dependency was not adopted when students entered college; that practice was embraced when they first taught themselves how to use search engines for research.

Given their popularity with students, knowing more about how search engines work is vital to understanding information access in a digital age. 3 Unfortunately, most students do not understand how Google and other search engines rank results. Laura Granka, user-experience researcher at Google, notes, “Users are not familiar with how search engines ‘find’ what they are looking for; they would benefit knowing how Google determines how a website is ranked.” 4

Results Rankings—Hits Don’t Matter, Links to a Webpage Do

Consider Google’s challenge when determining the best results for a search query, given the following: The majority of search queries are four words or fewer; half of the search terms occur less than once a month in Google; 20% of the hundreds of millions of queries processed daily worldwide are ones Google has never seen before; searchers rarely look at more than five results; and searchers almost always select the first result! 5

Google solves this problem by using an authority-based algorithm that is domain-centric: Links to a webpage from well-established and reputable .edu and .gov sites and from massive and well-known sites such as Wikipedia raise the rank of the webpage they link to. 6 Sometimes, links are a vote of confidence in the quality of a webpage, similar to citations to a well-written and well-researched journal article.

When ranking results, Google evaluates more than 200 signals to determine which webpages are most relevant to a search query. 7 Although Google’s exact ranking formula is a business secret, search engine experts agree that keyword use in the title tag and the popularity of the webpage—as determined by number and quality of links to the webpage—top the list. 8 The most well-known signal is PageRank, a technology that examines the link structure of the web and determines the authority of a webpage by looking at other webpages linking to it. 9 When ranking webpages, PageRank is the accepted standard for authority on the web. As one communications strategist states: “When it comes to online credibility, Google Page Rank rules over all. Few metrics illustrate true authority on the Web more than Google’s PageRank.” 10
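
To make the idea of link-based authority concrete, here is a minimal sketch of the textbook PageRank calculation. It is not Google's ranking formula, which combines more than 200 signals and remains secret. The toy web, the site names, and the damping value of 0.85 are assumptions chosen for illustration. Note that nothing in the calculation counts visits or "hits"; only the links between pages matter.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of scores along links.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three sites link to topic-page.org.
toy_web = {
    "university.edu": ["topic-page.org"],
    "agency.gov": ["topic-page.org"],
    "topic-page.org": ["university.edu"],
    "busy-but-unlinked.com": ["topic-page.org"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, topic-page.org ends up with the highest score because three pages link to it, and university.edu comes second because the top-scoring page links to it. The site named busy-but-unlinked.com stays at the baseline score no matter how many visitors it might have, because no page links to it.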

The key point is that a Google webpage ranking is not determined by the number of people who visit a webpage. “Hits” are not considered important in ranking webpages. In February 2010, Google Fellow Amit Singhal stated, “Individuals’ tastes and preferences [to rank results] don’t produce the quality and relevant ranking that our algorithms do.” 11

Blind Trust

The obvious question, then, is why Google doesn’t consider the popularity of sites with users when ranking sites in the list of results. One factor is that students “trust” search engines and perceive a site as credible when it is returned at the top of the results. 12 In her research, Laura Granka noticed that college students clicked on the No. 1 abstract most of the time, even when the abstracts were less relevant to their needs.

Peter Norvig, director of research at Google, confirmed that Google does not use real usage data to tune its search ranking algorithm. The reason is simply that if a result on Page 4 provides better information than the results on the first three pages, users will not know this result exists because the user will not look past the first page of results. 13

Librarians substantiate student confidence in Google. One study concluded, “There is blind trust of search results, especially on whatever appears on the first couple of screens.” 14 How many of our students select the first result reasoning that the site must be relevant because it is at the top of Google’s results?

Search Engines Do Not Understand a Searcher’s Query

If searchers’ preferences are not important, Google’s ranking algorithm must be really advanced “smart” technology that students can trust to return the best results. Students’ confidence in Google to return the best results confirms that they expect a lot from search engines even though they may not know that search engines don’t understand their query—search engines are not that smart.

Search engine developer Gary Flake, Ph.D., stated: “Search engines have no understanding of words or language. They don’t recognize user intent, can’t distinguish goal-oriented search from browsing search.” 15 A Google software engineer was more critical when he cautioned that search engine developers “can’t write a program to understand a sentence with anywhere near the precision of a child.” 16

Do not expect that precision to increase for years to come. In 2010 Singhal said, “Search isn’t out of its infancy yet. The science is at the point where we are crawling. Soon we’ll walk. I hope in my lifetime I’ll see search enter its adolescence.” 17

Search Engines Do Not Always Return the Best Results

Even if Google does not understand the search query, it must return the best results because searchers usually select the first result, right? Maybe not.

An instructive example is the set of results returned for the phrase search Martin Luther King. At the time I did this search, the third result was the website Martin Luther King Jr.—A True Historical Examination (www.martinlutherking.org), which is described as, “A valuable resource for teachers and students alike.” After opening the site, scrolling to the bottom of the webpage shows a link to the hosting site, Stormfront, which describes itself as “a community of White Nationalists.” (CAUTION: There is profanity on the homepage.)

This simple search not only confirms that Google does not always return the best results but also reveals the value of links from .edu sites. Checking the links to the Martin Luther King .org site and limiting the results to only .edu sites shows that college and university libraries use the site to teach the evaluation of online resources. Google does not know why the educational sites link to the egregious site; Google only knows that the linking site is an .edu site and, as noted, Google places a lot of weight on links from trusted domains such as university websites. 18 (See Figure 1.)

Top-Level Domains Matter

Search engines classify queries into three types: informational, navigational, and transactional. The majority of queries are informational, in which the searcher seeks specific information. 19 According to Bruce Clay, a top search engine expert, Google places a heavy bias on informational resources, and .edu and .gov sites tend to rank higher than other top-level domains when sites are returned for an informational query. 20 Google “trusts” the well-established and reputable government, college, and university sites. Links from the .edu and .gov sites and high Google PageRank sites will raise a webpage’s ranking in the search results.

One of the highest PageRank websites is Wikipedia, which is ranked eighth among the 500 most important sites on the web to which other sites link. 21 As noted, Google’s authority-based algorithm is domain-centric. Google focuses on “domain-trusting by pushing to the top of the results massive sites like Wikipedia that couldn’t have been created by spammers.” 22 This explains why, for most informational queries of four words or fewer, Wikipedia is likely to be in the top three results.

High ranking in Google is no guarantee of the quality of the resource, and students might be interested to learn that Wikipedia’s creator discourages academic use of Wikipedia. In response to emails from students who have received a low grade for citing Wikipedia, Jimmy Wales stated, “You’re in college; don’t cite the encyclopedia.” 23

Helping Google—Crafting Queries Using Advanced Search Syntaxes

The search query is the only control that a searcher wields over a search engine. However, librarians know that the predominant difficulty students experience while performing web-based research is conceptualizing the search topic and constructing effective search strings. 24 The inability to construct appropriate search statements limits a student’s success in searching for relevant information.

Unfortunately, most students have not learned that they can influence the accuracy of the search results by stating a search query at an adequate level of detail to help the search engine grasp the intent of the query. 25 The remedy is to first gain an understanding of how search engines work and then craft queries to exploit the factors Google considers when ranking sites, such as the importance of the webpage title and the top-level domain of sites. Search engine users should also heed Greg Notess’ dictum that the more words you search for, the smaller and more refined your results list will be for the search query. 26 Also, the more words used in the query, the less likely that Wikipedia will be at the top of the results, if returned at all.

One method for better searching is to develop queries that locate webpages where the research topic is the subject or title of the webpage, to add search terms that describe and specify the topic, and to limit the results to sites that are interested in the issue. For example, a student researching the effects of climate change on precipitation levels could craft a query that limits the results to webpages from a .gov site that are titled “climate change” and that have the phrase “precipitation levels” on the webpage document. (See Figure 2.)
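
The query described above can be assembled mechanically. The sketch below is one plausible form of it, assuming Google's intitle: and site: operators and quotation marks for exact phrases; the exact query shown in Figure 2 is not reproduced in this text, so the operator choice is an assumption. The function name build_query is hypothetical.

```python
from urllib.parse import urlencode


def build_query(title_phrase, body_phrase, top_level_domain):
    """Combine a title phrase, a body phrase, and a domain limit into one query.

    intitle:"..."  asks for the phrase in the webpage title
    "..."          asks for the exact phrase somewhere on the page
    site:...       limits results to one top-level domain
    """
    return f'intitle:"{title_phrase}" "{body_phrase}" site:{top_level_domain}'


query = build_query("climate change", "precipitation levels", "gov")
print(query)

# The same query as a link that can be pasted into a lesson handout.
search_url = "https://www.google.com/search?" + urlencode({"q": query})
print(search_url)
```

The first print statement shows the query exactly as a student would type it into the search box: intitle:"climate change" "precipitation levels" site:gov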

The advanced syntax search in Figure 2 returned fewer than 150 results from .gov sites; they included, for example, the Environmental Protection Agency and state environmental sites. Compare that with the more than 3 million results for the keyword search climate change precipitation levels, which is how the majority of students would search Google for the topic.

The advanced search query can be modified to return only .org sites by replacing site:gov with site:org. That query returned 500 results from organizations such as the Environmental Defense Fund, and Wikipedia was not in the top 10 results.

Another top-level domain that returns relevant information for the climate change topic is the top-level domain .int, which is for international organizations. Limiting the query to site:int returns webpages from the United Nations World Health Organization, United Nations Framework Convention on Climate Change, and Convention on Biological Diversity.

To limit the results to a specific country, add the country’s two-letter identifier to the query; for example, site:ca (Canada), site:cn (China), or site:uk (United Kingdom). 27 (See Figure 3.)
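
Swapping the domain limit is a simple substitution, so a whole set of queries for a class exercise can be generated at once. The following is a minimal sketch; the base search terms assume the intitle: operator and quoted phrases, and the helper name restrict_to_domain is hypothetical. The domains listed are the ones mentioned in this article.

```python
# Search terms shared by every variant (operator choice is an assumption).
BASE_TERMS = 'intitle:"climate change" "precipitation levels"'

# Top-level domains and country codes discussed in the article.
DOMAINS = {
    "gov": "U.S. government",
    "org": "organizations",
    "int": "international organizations",
    "ca": "Canada",
    "cn": "China",
    "uk": "United Kingdom",
}


def restrict_to_domain(terms, domain):
    """Append a site: limit, accepting the domain with or without a leading dot."""
    return f"{terms} site:{domain.lstrip('.')}"


variants = {domain: restrict_to_domain(BASE_TERMS, domain) for domain in DOMAINS}

for domain, label in DOMAINS.items():
    print(f"{label}: {variants[domain]}")
```

Students can run each variant and compare which organizations and countries appear in the results for the same topic.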

Limiting the query to a specific top-level domain such as .int returns webpages unlikely to be ranked in the top results in Google because of Google’s “trust” in .edu and .gov webpages for informational queries. Another benefit of limiting results to other top-level domains is that sites are returned from organizations and countries from around the globe, which enables the student to experience the “worldwideness” of the web.

Microsoft researchers studying the behavior of advanced syntax users have found that these users are better web searchers. One study concluded: “Advanced syntax users demonstrate search expertise that the majority of user population does not. They are more adept at combining query operators to formulate powerful query statements and return more relevant results. Not only were they more successful in their searching, they were consistently more successful.” 28

Nancy Keenan, former library media specialist/computer coordinator at Glenvar High School in Salem, Va., confirms this in these observations: “[The students] searched their usual way. Then I demonstrated a search using advanced search techniques. Then they searched using advanced search syntaxes. They narrowed their search results from one million to 55 and it was amazing how many hits were on target.” 29

Using Google to Hook Students

Educators know that libraries provide access to more relevant information sources and that there are specialists in libraries who enjoy helping students with their research projects. The challenge is influencing the students to use the resources.

Students’ preference to begin their research with Google provides opportunities for educators to integrate the databases hosted in the school library into their research. After teaching a student to use the advanced search features in Google, educators can show how, with minimal modifications, Google’s advanced search syntaxes are similar to the features provided by the library’s proprietary databases. After teaching students to search using Google’s advanced search options, an effective leading question is to ask the student, “Would you like me to teach you a search method that saves you time, provides more relevant resources, and that will improve the quality of your research and earn you a higher grade?”

This approach works! Lori Donovan, a teacher-librarian at Thomas Dale High School in Chester, Va., noted: “I revised my lesson plan for teaching students how to search the Web and library databases. Students were frustrated using the Web; when we got to Gale and ABC-CLIO, their amazement in the difference of the quality of information was priceless. One student researching working women of the 1930s said, ‘Google is aggravating; I found much more in Student Resource Center.’”

Google: A Partner in Educating an Information-Literate Student

In the February 2009 update to the University of Washington’s iSchool’s Project Information Literacy, the authors stated: “[N]o matter where students are enrolled, no matter what information resources they may have, and no matter how much time they have, the proliferation of digital information resources make conducting research uniquely paradoxical: Research seems to be far more difficult to conduct in the digital age than it did in previous times.” 30

Educators, especially librarians, can help ease that difficulty by embracing Google, teaching students how the search engine works, and coaching them in the use of Google’s advanced search syntaxes. Once students master how to craft queries that exploit Google’s advanced search options, educators can introduce the use of the high-quality library databases. Students will no longer be dependent on only Google; they will incorporate their knowledge of how to search Google in their use of the library’s databases. The result will be a more information-literate student who is no longer inclined to use only Google and who integrates other information sources into his or her research. By welcoming Google as a partner in the education of an information-literate student, the library and the librarian will once again be the best source when a student is seeking information.

Paul Barron is the director of library and archives at the George C. Marshall Foundation in Lexington, Va. Contact him at barronpb@marshallfoundation.org.

Endnotes

1. Google is the top search engine in the U.S. and worldwide, controlling 75% of searches in the U.S. and 90% of searches worldwide. http://gs.statcounter.com

2. “Lessons Learned: How College Students Seek Information in the Digital Age.” Project Information Literacy Progress Report. www.projectinfolit.org/pdfs/PIL_Fall2009_Year1Report_12_2009.pdf

3. “The Social, Political, Economic, and Cultural Dimensions of Search Engines: An Introduction.” Eszter Hargittai. Journal of Computer-Mediated Communication, 2007, Vol. 12 No. 3, article 1. http://jcmc.indiana.edu/vol12/issue3/hargittai.html

4. “In Google We Trust: Users’ Decisions on Rank, Position, and Relevance.” Laura Granka, Bing Pan, Helene Hembrooke, Thorsten Joachims, Lori Lorigo, Geri Gay. Journal of Computer-Mediated Communication, 2007, Vol. 12 No. 3. http://jcmc.indiana.edu/vol12/issue3/pan.html

5. “Masters of Information.” Forbes. Sept. 5, 2005.

“This Stuff Is Tough.” Google European Public Policy Blog. Feb. 25, 2010. http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html

6. Top Sites: The 500 Most Important Domains on the Internet (Updated August 2010). www.seomoz.org/top500

“Can You Please Them All?” www.bruceclay.com/blog/archives/2006/08/can_you_please.html

7. Google Corporate Information: Technology Overview. www.google.com/corporate/tech.html

8. Top Five Ranking Factors. www.seomoz.org/article/search-ranking-factors

9. PageRank is Google’s patented technology that determines the “importance” of a webpage by looking at what other pages link to it. www.google.com/corporate/tech.html

“How Does Google Collect and Rank Results?” Google Librarian Center. www.google.com/librariancenter/articles/0512_01.html

“Link Building for High-Quality Links.” Pandia Search World. www.pandia.com/sew/2595-quality-links.html

10. “The Top 25 U.S. Newspapers According to Google.” Journalistics. Oct. 16, 2010. http://blog.journalistics.com/2010/top-25-newspapers-google-pagerank

11. “This Stuff Is Tough.” Google European Public Policy Blog. Feb. 25, 2010. http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html

12. “Trust Online: Young Adults’ Evaluation of Web Content.” Eszter Hargittai. International Journal of Communication, 2010, 4, 468–494. http://ijoc.org/ojs/index.php/ijoc/article/view/636

13. “How Google Measures Search Quality.” Datawocky. http://anand.typepad.com/datawocky/2008/06/how-google-measures-search-quality.html

14. “Search Engine Use Behavior of Students and Faculty: User Perceptions and Implications for Future Research.” Oya Y. Rieger. First Monday. www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2716/2385

15. “A ResourceShelf Interview: 20 Questions With Dr. Gary Flake, Ph.D. Head of Yahoo! Research Labs.” http://searchenginewatch.com/showPage.html?page=3372051

16. “Helping Computers Understand Language.” The Official Google Blog. http://googleblog.blogspot.com/2010/01/helping-computers-understand-language.html

17. “This Stuff Is Tough.” Google European Public Policy Blog. Feb. 25, 2010. http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html

18. “Link Building for High-Quality Links.” Pandia Search World. www.pandia.com/sew/2595-quality-links.html

19. “Determining the Informational, Navigational, and Transactional Intent of Web Queries.” Bernard J. Jansen, Danielle L. Booth, Amanda Spink. Information Processing and Management: an International Journal, May 2008, Vol. 44 No. 3. Eighty percent of web queries are informational, with about 10% each being navigational and transactional. The intent of informational searching is to locate content concerning a particular topic; the intent of navigational searching is to locate a particular website; the intent of transactional searching is to locate a website with the goal of obtaining some other product. http://ist.psu.edu/faculty_pages/jjansen/academic/pubs/jansen_user_intent.pdf

20. “Can You Please Them All?” www.bruceclay.com/blog/archives/2006/08/can_you_please.html

21. Top Sites: The 500 Most Important Domains on the Internet. August 2010. www.seomoz.org/top500

22. “Google’s New Algorithm: if($domain==‘wikipedia.org’){ $rank=1;} .” The Google Cache. Aug. 1, 2007. www.thegooglecache.com/white-hat-seo/googles-new-algorithm-ifdomainwikipediaorgrank1

23. “Wikipedia Founder Discourages Academic Use of His Creation.” Jeff Young. The Chronicle of Higher Education. June 12, 2006. http://chronicle.com/blogs/wiredcampus/wikipedia-founder-discourages-academic-use-of-his-creation/2305

For other remarks by Wikipedia’s creator discouraging the citation of Wikipedia, see “Wikipedia: ‘A Work in Progress.’” Burt Helm. Bloomberg Business Week. Dec. 14, 2005. http://www.businessweek.com/technology/content/dec2005/tc20051214_441708.htm

“Interview with Wikipedia Founder Jimmy Wales.” Will Paoletto. bigoak blog. April 2, 2009. http://www.bigoakinc.com/blog/interview-with-wikipedia-founder-jimmy-wales

24. “Internet Searching by K–12 Students: A Research-based Process Model.” Kathleen Guinee. http://eric.ed.gov/PDFS/ED485138.pdf

“Student vs. Search Engine: Undergraduates Rank Results for Relevance.” Stacy Nowicki. portal: Libraries and the Academy. July 2003, Vol. 3 No. 3, 503–515.

25. “Web Search Behavior of Internet Experts and Newbies.” Christoph Hölscher, Gerhard Strube. http://www9.org/w9cdrom/81/81.html

26. Teaching Web Search Skills: Techniques and Strategies of Top Trainers. Greg Notess. (Information Today, Inc.) ISBN-10 1573872679, ISBN-13 978-1573872676.

27. The top-level domains and country-code top-level domains are listed at Root Zone Database. www.iana.org/domains/root/db/index.html

28. “Investigating the Querying and Browsing Behavior of Advanced Search Engine Users.” Ryen W. White, Dan Morris. http://techhouse.brown.edu/~dmorris/publications/WhiteSIGIR2007b.pdf

29. To review an instructional lesson on advanced search syntaxes, see “Best Practices for Integrating Google Searching Into Student Research Projects.” Paul Barron. Virginia Educational Media Association. http://tinyurl.com/2fupybp.

30. “Finding Context: What Today’s College Students Say about Conducting Research in the Digital Age” Project Information Literacy Progress Report. Alison J. Head, Ph.D., Michael B. Eisenberg, Ph.D., February 2009. www.projectinfolit.org/pdfs/PIL_ProgressReport_2_2009.pdf


 