Search discrimination
July 20, 2010 by Cynthia
“All web search engines give identical results, so there’s really no reason to use anything but Google,” she said.
I was participating in a “web for artists” seminar, and that declaration came from a fellow speaker. Being a mini-nerd, I couldn’t, oh, realize that it would embarrass the lady to contradict her in front of her audience, so I spoke right up.
“Uhm, ‘scuse me, but that’s not really true…” and got glared at for my pains.
But then I got to thinking: Is there really that much difference? If you made the same query in the top three search engines (Google, Yahoo!, Bing), would you get that much more (or better) information? And would those results differ materially from results delivered by lesser-known engines?
In other words, does extending your query to other search engines increase the quality of your results?
Short answer:
Depends on what you’re looking for. I made a little study of this over the last few months and found that, in general, the more commercially valuable or newsworthy the query, the less benefit there is in using more than one search engine. When the query is of low commercial interest (such as reference searches), using multiple search engines can greatly enrich results.
Long answer:
According to Nielsen NetRatings, the top three search sites grab more than 90 percent of all US search engine traffic, with Google taking 65 percent, Yahoo 14 and Microsoft’s Bing/Live/MSN search coming in at 12.* I doubt that the combined traffic of seven lesser engines–Duck Duck Go, AltaVista, Leapfish, Quintura, DogPile, Timmp and Yebol–even passes 1 percent. (This is in the US; numbers are VERY different elsewhere in the world.)
But will these ten engines deliver similar results? I devised a simple test to answer that: I made 100 queries in all ten of these search engines over a three-month period, documented the results and looked for patterns. My choice of queries came from whatever I happened to be curious about at the time (and believe me, it’s an eclectic collection) and I divided them evenly into five categories:
- Current events: Searches for wildly popular subjects in the news like “BP oil cleanup” or “oscar winner predictions”
- High tech: Searches in technology-related areas such as “cloud computing” and “Windows 7 feature comparison”
- Cultural/arts/history: Searches for “pate de verre” or “origins of the word barf” or “tools used by aborigines”
- How-to: Searches on process-related queries, such as “calculate the volume of a sphere” or “diagnose the cause of engine knock”
- Reference: Encyclopedic-like searches for factual information, such as “how big can a raccoon grow” or “specific gravity of leaded crystal”
(And yes, I need to get a life.) I’m not pretending for a minute that this is a scientifically valid analysis or–since search algorithms and web content can change dramatically with time–that this study will be particularly reproducible. But I took a few precautions to make the test as fair as possible. (See footnotes for more info**)
I wound up with a gigantic spreadsheet containing 100 of these query forms and a bunch of pivot tables. (click to view the sample full-sized)
Colors indicate which engine produced the results in which rank, using Google’s top ten results as a baseline; they provide a visual cue to recurring hits. Results are listed in order of rank; if the space is blank in the Yahoo or Bing listings at right, the engine repeated a Google result in that position. I lumped exception results for the other 7 engines in the purple “other” listings. Except for current events queries, all seven engines rarely returned more than ten results that didn’t appear in Google, Yahoo! or Bing.
I wasn’t trying to answer the most obvious question, i.e., which search engine consistently returns the most relevant results? Instead, I was asking “Which search engines give me the most relevant results that I WON’T find in the Google top ten?”
The answer to that, I figured, would tell me whether or not additional searches were a waste of time. I found slight skews between the three major engines: Google leaned toward product or company listings in a search result, while Bing preferred media and academic/explanatory results. Yahoo! usually produced less relevant results than the other two, except for current events.
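For anyone who'd rather script the "what's NOT in Google's top ten" check than wrestle a spreadsheet, here's a minimal sketch in Python. It is not the tool I used (that was the spreadsheet), and the function name `novel_results`, the engine keys and the URLs are all made-up placeholders. It assumes you've already collected each engine's results as an ordered list of URLs.

```python
def novel_results(results_by_engine, baseline="google", top_n=10):
    """For each non-baseline engine, list the top-N results
    that do not appear in the baseline engine's top N."""
    baseline_top = set(results_by_engine[baseline][:top_n])
    novel = {}
    for engine, urls in results_by_engine.items():
        if engine == baseline:
            continue
        # Keep rank order so the highest-ranked new hits come first.
        novel[engine] = [u for u in urls[:top_n] if u not in baseline_top]
    return novel


# Placeholder data, invented purely for illustration.
sample = {
    "google": ["a.example/cloud", "b.example/cloud", "c.example/cloud"],
    "yahoo": ["a.example/cloud", "d.example/cloud", "b.example/cloud"],
    "bing": ["e.example/cloud", "a.example/cloud", "f.example/cloud"],
}

novel = novel_results(sample)
for engine, urls in novel.items():
    print(engine, len(urls), urls)
```

The fewer new URLs an engine adds, the less a second search buys you for that query. Note that this only compares exact URLs; it says nothing about whether the new hits are any good, which I still had to judge by eye.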
The more likely a query could be tied to a product or service someone wanted to sell, the less deviation I found in different search engine results, and therefore the less useful a multi-engine search was likely to be. The less commercial the query, the more varied the results from engine to engine, and the more likely I was to find rich new sources of information by extending my search.
When I thought about it, that made sense: Search engine marketing is an increasingly important weapon in a marketeer’s arsenal, and companies (or SEM agencies) frame much of their online presence to appear high in Google search results.
Since search engines spend considerable time tuning algorithms to level the playing field, there really isn’t a magic bullet to shoot your site to the top in hotly contested queries. Coming out in the top ten requires concerted, consistent efforts in site tuning, media buying (advertising), social/media presence, and content updating…something most likely practiced by an organization with deep web resources and an interest in selling something.
It stands to reason, then, that companies with the strongest search engine marketing programs will probably be very closely matched…and show up again and again in the top ten. Queries with less commercial interest have fewer champions with the resources to do this, so there should be more variance in results.
The above query for “cloud computing” was pretty typical: IBM, Rackspace and six other companies in Google’s top ten sell products and services related to cloud computing, and they (along with a few other companies) showed up in most engines’ results. In fact, only 20 sites delivered all 100 results for this query set.
The popularity of a subject at the time also influenced the diversity of results, but not necessarily the richness of the information. Querying all ten engines on a “hot” news topic, such as “BP oil spill cleanup” gave me great diversity–the results came from 95 different websites–but most of the hits were simply repackaging of the same newswire stories.
Adding a second or third search engine didn’t get me much. What did? Sticking with a single major engine and searching more deeply, i.e., reviewing the next 20-40 result listings. That’s where I found local and special-interest sites that often had different insights into the topic.
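The site counts above (20 sites for “cloud computing,” 95 for the oil spill) are just a tally of distinct websites across every engine's results for one query. A rough way to get that number in Python might look like the sketch below. The function name `distinct_sites` and the sample URLs are hypothetical, and treating each hostname (minus a leading "www.") as one "site" is my simplifying assumption; subdomains count separately here.

```python
from urllib.parse import urlparse


def distinct_sites(results_by_engine):
    """Collect the distinct hostnames across all engines' result URLs."""
    sites = set()
    for urls in results_by_engine.values():
        for url in urls:
            host = urlparse(url).netloc.lower()
            # Treat www.example.com and example.com as the same site.
            if host.startswith("www."):
                host = host[4:]
            sites.add(host)
    return sites


# Placeholder URLs, invented purely for illustration.
sample_urls = {
    "google": ["https://www.example.com/a", "https://news.example.org/story"],
    "bing": ["https://example.com/b", "https://blog.example.net/post"],
}

sites = distinct_sites(sample_urls)
print(len(sites), sorted(sites))
```

A low count means the engines keep serving up the same handful of sites; a high count means diversity of sources, though (as with the newswire stories) not necessarily diversity of information.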
In contrast, low-currency queries on less popular or timeless topics, such as “card counting methods in blackjack” or “krakatoa eruption casualties,” seemed to benefit most from shallower, multi-engine searches. Alternative engines, such as Yebol, generally delivered the richest troves in listings 6 through 10.
And–not much of a surprise–Wikipedia usually scored in the top five results for all but current events searches, and frequently took the number one spot. Wikipedia’s search tools are clumsy enough, however, that it’s easier for me to narrow my search with Google than to start with the Wikipedia site.
The subject category also influenced results; more about that later.
Yet in the end, does it matter?
A recent article on Search Engine Watch (by Eli Goodman) announces that search engines make us smarter. I completely disagree; search engines simply make infobits more readily available. They’re not particularly good at “smarts,” the ability to evaluate and apply information accurately to improve your situation. And if you blindly take the first couple of results from Google and run, you could actually wind up “dumber.”
That’s because one of the biggest advantages of search-based information is also one of its greatest disadvantages: The first couple of search results tend to mirror the searcher’s bias and/or popular opinion, and may not give you the best answer.
At one time or another prevailing wisdom told us the earth is flat, human brains stop growing at 12, corn-based fuel solves all our environmental problems, the Nazis didn’t have concentration camps and Macs never get viruses. The still, small voices of dissent–the things that probably DON’T show up high in search engine listings–may tell the more accurate story.
To find them, you almost certainly have to get beyond the first couple of results, refine queries as you learn, and apply your own intelligence to filter the results. Understanding how to leverage single/multiple engine queries helps.
I’m going to stop here, for now, since this is already something like 1,600 words. If you’re interested in seeing more results (or debating the subject, yum), lemme know.
—————————
*A slightly more up-to-date view of search engine share says about the same thing, but with more detail.
**How I tested
Because results vary over time, I made each query in all ten engines within the same two-hour window. I recorded only the first ten first-level results. Many engines group similar results from the same site; I counted those as a single result if they were grouped hierarchically. If other website listings separated them, I counted each as a separate result. I documented the rank of each result on the page and scored its relevance to my original query. My relevancy rankings were weighted toward impartial (where possible), non-commercial explanations.
I also noted the “currency” of each query, i.e., whether the subject was a hot news topic in the US and likely to be spiking in search popularity (versus a relatively timeless search with more stable results). Since Google has the lion’s share of traffic I always started my query sessions with Google; I’d record Google’s search results, then do the same in the other nine engines, looking for top 10 results that hadn’t shown up in Google’s.

If you should happen to get a life, please continue with the greatness for all us wonderers. I bet you’d make a super interesting neighbor.
:O)