<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.thestandard.com" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>The Industry Standard - Deep-Net Fishing - Comments</title>
 <link>http://www.thestandard.com/deep-net-fishing</link>
 <description>Comments for &quot;Deep-Net Fishing&quot;</description>
 <language>en</language>
<item>
 <title>Deep-Net Fishing</title>
 <link>http://www.thestandard.com/deep-net-fishing</link>
 <description>
&lt;p&gt;	Intelliseek CEO Mahendra Vora says it&#039;s a shame Firestone wasn&#039;t able to use his company&#039;s online rumor-detection software when its tires started blowing up. Firestone&#039;s media nightmare peaked in August 2000, the month it recalled 6.5 million failure-prone tires. But a search after the fact, using Intelliseek&#039;s software, &quot;found evidence of the Firestone problem as early as August 1998,&quot; boasts Vora. Firestone might have begun its damage control efforts a lot earlier had it known that people were already beginning to talk about its tires on the Net.
&lt;/p&gt;
&lt;p&gt;Intelliseek&#039;s system, called Corporate Intelligence Services, combs through areas of the Web that don&#039;t show up in search engines, such as archives of Usenet forums, message boards and chat-room discussions, to ferret out discussions relating to a brand&#039;s products or services.
&lt;/p&gt;
&lt;p&gt;Intelliseek, along with its chief competitor, BrightPlanet, is coming up with ways to work around the limitations of standard search engines to let users plumb what has been dubbed the &quot;deep Web&quot; or &quot;invisible Web,&quot; the formerly unsearchable depths of the Internet. Even consumer-based engines such as Google (&lt;a href=&quot;/companies/dossier/0,1922,274559,00.html&quot; rel=&quot;nofollow&quot;&gt;dossier&lt;/a&gt;) are beginning to offer findings from parts of the Web that had been previously hard to mine.
&lt;/p&gt;
&lt;p&gt;&quot;There&#039;s a lot of material you&#039;ll never find using a general search engine, no matter how hard you search,&quot; says Gary Price, an independent research consultant and co-author of the forthcoming The Invisible Web.
&lt;/p&gt;
&lt;p&gt;While some 1.5 billion Web pages are currently available to the average searcher, BrightPlanet estimates that some 550 billion documents would never show up in an index. This includes material such as Salary.com&#039;s comparative compensation statistics, the U.S. Patent and Trademark Office&#039;s full-text and full-page image databases, Securities and Exchange Commission records, academic papers, census data, Library of Congress records, medical research and untold numbers of art images and music files.
&lt;/p&gt;
&lt;p&gt;The problem is that the software &quot;spiders&quot; used by search companies such as Lycos and AltaVista (&lt;a href=&quot;/companies/dossier/0,1922,266432,00.html&quot; rel=&quot;nofollow&quot;&gt;dossier&lt;/a&gt;) to crawl around the Web and generate indexes are too stupid to access most databases or information stored in formats other than HTML (the formatting language used to construct ordinary Web pages). Intelliseek and BrightPlanet have designed software agents that can automatically extract requested information from multiple invisible databases at the same time and present the results in customizable reports.
&lt;/p&gt;
&lt;p&gt;Chris Sherman, associate editor of the online site Search Engine Watch, and Price&#039;s co-author, says the deep Web is &quot;of huge value as a business. Look at the market that traditional proprietary database or information service providers like Dow Jones or Lexis-Nexis have.&quot; (That would be $2.2 billion and $1.8 billion in sales, respectively, for 2000.) &quot;There are a huge number of really authoritative resources out there that are either free or very low-cost. I think that they pose a real threat to some of these more established concerns.&quot;
&lt;/p&gt;
&lt;p&gt;But if the information is cheap - or free - how&#039;s money to be made from it? One way is by directing people to it and making it meaningful. That&#039;s BrightPlanet&#039;s mission. In May, the 2-year-old Sioux Falls, S.D., company launched a subscription-only Web site that lets corporate clients set up automated search queries and generate reports of both the deep Web and the surface Web.
&lt;/p&gt;
&lt;p&gt;	When a user enters a search term, say, &quot;osteoporosis,&quot; a &quot;direct query engine&quot; automatically configures the request to conform to the syntaxes of the various deep site forms, sending the same query out to multiple databases at once. Of the 40,000 deep Web databases that BrightPlanet can access, only those determined to be relevant are queried. &quot;That&#039;s part of what our automated technology does,&quot; says BrightPlanet Chairman Michael Bergman. &quot;We actually evaluate the search request, match that against our profiles of the search sites and make a selection in the background as to where the query gets directed.&quot;
&lt;/p&gt;
&lt;p&gt;Intelliseek also is dredging up treasure from the bottom of the deep Web. Founded in 1997 with backing from Ford Ventures and Nokia (&lt;a href=&quot;/companies/dossier/0,1922,NOK,00.html&quot; rel=&quot;nofollow&quot;&gt;NOK&lt;/a&gt;) Ventures, the Cincinnati-based company uses its technology to collect and analyze what people on the Web are saying about its clients&#039; products and services. Companies such as Ford, Goldman Sachs, Nokia and Procter &amp;amp; Gamble (&lt;a href=&quot;/companies/dossier/0,1922,PG,00.html&quot; rel=&quot;nofollow&quot;&gt;PG&lt;/a&gt;) schedule recurring searches to monitor Usenet forums, message boards, archived chat-room discussions and news articles, grabbing anything related to a particular trademark or company.
&lt;/p&gt;
&lt;p&gt;Intelliseek&#039;s Vora says the technology&#039;s real strength is not how it finds information from disparate sources but the way it can take that information and generate reports and charts to monitor competitors and identify trends in customer attitudes.
&lt;/p&gt;
&lt;p&gt;&quot;Assume that I&#039;m a Lincoln brand manager,&quot; he says. &quot;I can do a search on 10 different engines all day long and end up with 300 documents. I don&#039;t have time to read 300 documents. All I care about is that 37 percent of the people prefer my interior over the Chrysler 300M.&quot;
&lt;/p&gt;
&lt;p&gt;Because general consumers are unwilling to pay for search services, businesses are likely to be the main market for these new search capabilities for now. But BrightPlanet and Intelliseek might need a new business model before long: At least one consumer search engine is diving into the deep Web.
&lt;/p&gt;
&lt;p&gt;In late February, Mountain View, Calif.-based Google added 13 million Portable Document Format (PDF) files to its search engine index. The electronic file format from Adobe Systems (&lt;a href=&quot;/companies/dossier/0,1922,ADBE,00.html&quot; rel=&quot;nofollow&quot;&gt;ADBE&lt;/a&gt;) is popular for online publications, including white papers, academic articles and business reports. Google now has 70 percent of the publicly available PDF documents on the Web, with more to come, says David Krane, a Google spokesman.
&lt;/p&gt;
&lt;p&gt;In recent weeks, Google also obtained access to Usenet archives - containing 500 million discussion messages on every topic imaginable - dating back to 1995.
&lt;/p&gt;
&lt;p&gt;Krane says Google has made the deep Web a &quot;top priority&quot; and may even endow its crawler with a bigger brain. If Google succeeds in advancing its spider on the evolutionary tree, then the other search engine crawlers will have to evolve or perish, their remains lurking somewhere in the depths of the deep Web.
&lt;/p&gt;
&lt;p&gt;Mark Frauenfelder is a frequent contributor to The Standard.&lt;/p&gt;
</description>
 <category domain="http://www.thestandard.com/taxonomy/term/1253">Wire</category>
 <pubDate>Mon, 18 Jun 2001 18:00:00 -0400</pubDate>
 <dc:creator>Baldwin Louie</dc:creator>
 <guid isPermaLink="false">89707 at http://www.thestandard.com</guid>
</item>
</channel>
</rss>
