Wednesday, November 30, 2005

Search Engine Update

As more and more material--both written and audio--becomes available online, the ability to navigate this mass of information and locate the data you need grows ever more important.

We are, of course, talking about search engines: those programs designed to scour the Web for specific content--usually a word, a phrase or a series of words--identified by a user. A search engine retrieves the material that meets the user's identified parameters and offers a list weighted according to the individual search engine's algorithms.

This post is an update on search engines. Two recent articles--one in the Wall Street Journal (WSJ) and the other in Wired--address the subject, although from different angles.

Today the search engine market is dominated by Google, Yahoo and Microsoft's MSN, with Google being the most widely used of the three. However, according to the WSJ, "a handful of closely held upstarts such as Technorati, Inc., Feedster, Inc. and [IceRocket] LLC see an opportunity: Build a search engine that can track the information zipping through blogs, nearly in real time."

I first read about IceRocket last month on the blog of Dallas Mavericks owner Mark Cuban. IceRocket launched in August with Cuban as an investor. In his blog, he described the new service this way: "[IceRocket] isn't really a blog search engine, although that's the term we use since it's the easiest way to communicate what we are doing. We are a tracking engine. We . . . index any and every source of information that is updated on an ongoing basis."

Since the WSJ article was written, Google has launched its own blog search service called, appropriately, Google Blog Search.

"No one knows exactly how many blogs exist. But the number of them tracked by Technorati has doubled every five months or so to, most recently, about 16.5 million. The rapid proliferation has made it increasingly frustrating for Web users to find what they're looking for." (WSJ)

Because the blog search sites cover only 10 million to 20 million blogs, they are often faster than the giant search engines, which must search billions of Web pages. At the same time, the blog search engines do not attract anywhere near the number of visitors that Google, Yahoo and MSN do. For example, the WSJ says that Technorati logged 642,000 unique visitors in July. That was less than 1% of the number of visitors that Google logged, which puts Google's figure above 64 million.

Meanwhile, the new popularity of podcasts has sparked the development of two new search engines dedicated to podcasts.

Wired online magazine describes the new services: "Podzinger and blinkx scour audio content for keywords by translating the audio into text and creating an index for quick searching. It's a significant step above traditional search engines that identify only keywords in a podcast's metadata, such as the headline and introductory notes describing the audio file's general content."

For readers not familiar with the word "metadata," it simply means "data about the data." Search engines use meta tags, which are pieces of metadata embedded in a web page's source code, to help classify web pages.

To see what I'm talking about, go to your favorite author's website (not their blog). When you arrive, go to the menu bar at the top of your browser window and select "View." From the drop-down menu, click on "Source." You will then see the source code for the website. Near the top, you should see several lines starting with "meta." These are the meta tags.

The two most important meta tags are "keywords" and "description." If the programmer who set up the site was doing his job, you will see a list of words that classify the website. Mystery writers are likely to use the words: mystery, suspense, thriller. Romance writers are likely to use the words: romance, love, women's fiction. The words selected help search engines match the site to searches, although meta tags are not the only criterion used to rank results.
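For anyone who would rather let a program do the looking, here is a minimal sketch in Python of how the "keywords" and "description" meta tags can be pulled out of a page's source. The sample page, the author name in it and the helper names (MetaTagParser, extract_meta) are all made up for illustration; real sites vary, and this is not how any particular search engine does it.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags in a page's source."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Only keep tags that carry both a name and a content value.
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page source, standing in for what "View > Source" would show.
SAMPLE_PAGE = """<html><head>
<title>Jane Doe, Mystery Author</title>
<meta name="keywords" content="mystery, suspense, thriller">
<meta name="description" content="Official site of a hypothetical mystery writer.">
</head><body></body></html>"""


def extract_meta(html):
    """Return a dict mapping each meta tag's name to its content."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


meta = extract_meta(SAMPLE_PAGE)
# The keywords tag is a comma-separated list, so split it into words.
keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]
print(keywords)
print(meta["description"])
```

Running this against the sample page prints the three keywords followed by the description text, which is the same information you would spot by eye in the source view.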

Essentially, the new podcast search engines go beyond the traditional search engine's reliance on metadata: they convert the spoken audio into text and then build their indexes from the words that were actually said.
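To give a feel for the indexing half of that process, here is a rough sketch in Python. It assumes the speech-to-text step has already happened and simply builds a keyword index from the resulting text. The transcripts, episode names and function names are invented for the example, and nothing here is meant to describe how Podzinger or blinkx work internally.

```python
import re
from collections import defaultdict

# Hypothetical transcripts, standing in for the output of speech recognition.
TRANSCRIPTS = {
    "episode-1": "Today we talk about search engines and blogs",
    "episode-2": "Podcasts are audio files and search is hard for audio",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each spoken word to the set of episodes in which it occurs."""
    index = defaultdict(set)
    for episode_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(episode_id)
    return index


def search(index, query):
    """Return the episodes that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(TRANSCRIPTS)
print(sorted(search(index, "search")))
print(sorted(search(index, "audio search")))
```

Because the index is built from every word spoken rather than from a headline or show notes, a query can match an episode even when the word never appears in the episode's metadata.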

Wired says that "Podzinger is based on speech-recognition software that BBN, a Massachusetts-based research and development firm, created for U.S. intelligence agencies. It was intended to help analysts translate and scour foreign television broadcasts . . . for topics and speakers of interest."

Gary Price, news editor of Search Engine Watch, is quoted in the article as saying, "[t]he spoken word is now becoming as searchable as the printed word has always been."
