When we perform Google searches, how do we know that Google has searched every database on the Internet to come up with suggested matches for what we’re looking for? The easy answer is that Google can’t do that, and we are presented only with hits for Web pages that Google knows about. Considering that this past summer Google added the one trillionth address to its list of Web pages, wouldn’t one think that’s far and away enough?
That’s where the Deep Web or the Invisible Web comes into play. Chris Sherman and Gary Price wrote a terrific book in 2001, The Invisible Web, that covered dozens and dozens of information sources that search engines haven’t found. There have been a few more books on this topic since then but I wish Sherman & Price would update their book.
Most of the Web pages in the Deep Web are from associations, businesses, libraries, universities and government agencies. The amount of information, statistics, and data that can be found within these is enormous.
There are some great developments in deeper searching that have popped up recently. Kosmix (www.kosmix.com) started out as a search engine for health and travel information. It has since developed a platform for a universal search engine that snags data from lots of sources: Flickr, Google, Wikipedia, Yahoo Answers, YouTube, and others.
It then creates sort of a customized web page that breaks your search into segments. I searched for the topic “Burma” and Kosmix returned more information than I knew was available. Everything from reference, media, news & blogs, to ethnic groups, history, shopping, and books. Sources included Wikipedia, BBC & CNN, Shopping.com, Flickr, SeeqPod, the blog Backtype, and Slideshare.net. Yes, my search did uncover Burma Shave but the other riches outshone it. And, yes, Kosmix is one of those Mountain View, CA companies.
Another Deep Web crawler is DeepPeep (www.deeppeep.org), which is being developed by a professor at the University of Utah. When I entered my search term, “Burma,” I received 143 documents. I initially thought the search was totally off the wall, but when I investigated each retrieved document I discovered what DeepPeep is trying to do.
I had also told DeepPeep to search in “all domains” rather than the more selective subjects airfare, book, rental, job, or biology. Therefore, I hit the mother lode of stuff. One of the first hits was for horse jobs and, sure enough, Burma is one of the countries listed on the horse-jobs.biz web site. I couldn’t figure out how the Hotel Oscar in Athens could be related to Burma, but a very close look at the bottom of its home page turned up links to other hotels. What russiamaritime.com had to do with Burma (and just who knew there was a russiamaritime.com?) was also easily discovered. This is really deep web searching and totally fascinating for those of us who love to bounce around the web discovering Web databases.
So, is there one search engine that does it all? Obviously, no. It’s great to have multiple search engines that approach searching so differently. It makes searchers think harder about how to formulate their keyword strategies. Now, if I could just whittle down my favorite search engines to five or six from several dozen.

Hi there,
I agree, finding a single search engine for all information is nigh impossible! At Deep Web Technologies, we have chosen certain industries and targeted our searches for professionals, rather than consumers. We also do not index, but search in real-time through advanced “connectors” which give up-to-the-minute information from deep web sites. Our technology powers Biznar.com, Mednar.com and other sites such as Science.gov.
You hit on a big plus in Deep Web searching: source discovery. So many people search the same, familiar databases without realizing the wealth of information they are missing.
Thanks for the post!
Darcy
By: Darcy on March 20, 2009
at 4:47 pm