At a time when few Internet experts seem to agree on anything, one Web
theory is emerging: Over the next five years, what is now an open market of data
will likely shake down to only one, two, or three real winners. According to this spin on
traditional media theory, these sites (or networks of sites) will prosper as the
unofficial “sites of record” for their given field — be it “children,”
“games,” “sports,” or “travel.” These monster sites stand to
become each Internet category’s ABC, NBC, and CBS, and one of the fiercest competitions
for dominance lies in the “search engine” category, where early leaders like
Yahoo!, AltaVista, and Excite are battling to become the library that marks the
starting point for each Web trip. It can be argued that there are no real winners in
any one category so far — too many start-ups with deep pockets are still entering
the game for any one site to monopolize any one area of interest. The exception is, of
all things, a search engine created and maintained in Austin — Deja News
(http://www.dejanews.com).
Last year, iGuide, an online review of websites, declared
“Deja News may do only one thing, but it does it remarkably well.” That
“thing” entails making the Usenet portion of the Internet searchable via a
retrievable archive/database system. By loose definition, the term “Usenet” encompasses all of the
Internet’s community aspects, where people communicate via publicly posted and viewed
messages — most commonly within “newsgroups,” that is, forums that range
from the general (rec.food.cooking) to the highly specific (austin.food). There’s
still some debate over what constitutes the “Usenet,” but there’s no denying
the untamed beast is big. In just over two years, Deja News has collected over 109
million different posts or “articles” from over 20,000 newsgroups. Until the advent
of Deja News, there had been no vehicle for either archiving or searching these
Usenet posts, because storing that much information took up far too much
disk space. By creating the proprietary software to operate, maintain, and search a
database as large as Usenet, Deja News founder Steve Madere has not only discovered an
uncharted Internet niche but has also claimed, and seemingly held on to, Deja News’
self-appointed title as “The Source For Internet Newsgroups.”
“[iGuide] is right, it’s a niche, for sure,” says Madere
of his company, which has grown from three employees in December 1995 to over 50
today. “But if you can get a sufficient handle on a niche, it can be
significant. At this point, there are five or six players battling it out for web-searching
dominance and we pretty much own Internet discussion groups. We plan to continue that
by focusing specifically on discussion groups and do discussion groups extraordinarily
well. Our focus is one of our big strengths in that it allows us to get way far
ahead of anybody who could possibly compete with us in the area. So, it may be a
niche, but owning the whole thing is very significant.”
What Deja News seems to have is a focus on the segment of the Internet
that is traditionally the most unfocused — with an estimated 24 million users
spreading the messages over those 20,000 newsgroups. The result is a complicated and
burdensome “feed,” or path that Usenet follows. And as Usenet has grown, many
local and regional providers have begun limiting their intake of that feed to save
money on hardware, offering their customers access to only a portion of the
newsgroups. But Deja News, along with allowing users to search the archives, also lets
them access Usenet’s unabridged feed and is so far the only site on the
Internet that allows “everybody and anybody” to connect for
Usenet access. As such, Deja News has the distinction of originating an estimated 3%
of all newsgroup postings, a share that even the largest regular Internet
provider in New York or Tokyo can’t claim. And that number should grow as Deja News
unveils an updated newsreading application next month, which promises to make
accessing and navigating the full feed easier. The new program should also
cut the already impressive four-hour lag before Deja News recognizes
a post to an unheard-of, almost instantaneous turnaround that rivals the speed of
sending and receiving e-mail.
“Part of the reason we’re improving our newsreading capabilities, in
some sense, is to save Usenet,” says Madere, who admits that the potential
for finding more users by making Usenet more user-friendly is also in his own best
interest. “And as it turns out, our system is, in technical terms, perfectly
capable of daily reading. At the same time, we’re also finding that because of this
problem of Usenet growing faster than the machines, more and more providers start
restricting the feed or, in some cases, getting out altogether. Frankly, we think
Internet discussion groups are incredibly useful and are far and away the most powerful
communication medium ever invented.”

But while newsreading has become a popular use of Deja News, the site’s
primary purpose has been twofold: to serve as a bridge between Usenet and the Web, and
to act as a search engine capable of retrieving detailed information
or specific mentions from the archives. Madere says neither could have been achieved without
Deja News’ search software, which he first looked into because of his
own Usenet frustrations.
“I’d been using Usenet since the mid-Eighties and had been wanting to
be able to search through it pretty much ever since I found it,” says
Madere, a University of Texas graduate. “I always thought, ‘Wow, this is great
stuff, and I know that somebody answered my question last week, I just wish I could
find it.’”
But until Madere rolled out Deja News’ first site in May of 1995,
questions and answers were disappearing just as fast as they were being posted. According
to Madere, although disk drive storage capacity typically doubles each year,
so does the number of Usenet posts. “It’s always been just as unmanageable as it
is today,” he says. “And so because everybody’s always been expiring
what they could store after two weeks, everybody has had to buy new hardware every
year just to keep up.”
Interestingly, Madere’s decision to buy the hardware necessary to store a
Usenet archive wasn’t nearly as significant as the creation of Deja News’ search
engine. “Before, even if you had a private collection of posts on disk, it would
have taken so long to search through it that everybody else would have killed you for
taking up so much computer time,” Madere says. “So creating a specific
database just for the specific purpose of searching through everything that had ever
been done in discussion groups was really our breakthrough.”
And like most great inventions, the mechanics of Deja News’ search engine
are fairly uncomplicated. By their nature, all posts come pre-packaged in text
form and are public information, sparing Deja News the overhead costs of converting
messages to computer text and paying the authors a licensing fee. But Deja News’
advantage is the speed with which its search technology can cover its database and find
particular words, names, and discussions within Usenet, which experts routinely call
“the world’s largest database.” By comparison, AltaVista — the search engine
that covers the largest percentage of the Web side of the Internet — indexes just
over 30 gigabytes of information. Deja News, according to Madere, currently
indexes and stores 180 gigabytes of Internet discussion groups — at what is believed to
be a fraction of the cost it takes to run the AltaVista search engine.
“What we have that nobody else does is the database and the large
database technology to go with it,” says Madere. “And the interesting thing
is that is what also makes it plausible for us to do large scale news reading,
because when it comes down to it, Usenet is just a giant database. Discussion groups
represent a huge database of messages and when you’re reading a newsgroup, it just
means you’re reading messages that meet the particular criterion that they were posted on
this newsgroup. But we also can make large databases searchable for a reasonable
cost, and it costs us one-fifth to one-tenth as much for searching on a large
database than it costs other people because of our special software. As a result, it’s just
cost-prohibitive for anybody else to offer an Internet discussion group search, because the
data itself is so big.”
Madere does indeed have a lock on the discussion group search market so
far. In fact, Microsoft, America Online, Yahoo!, and Excite have all signed up with
Deja News to offer, through each of their individual (and competing) services,
Deja News-driven searches. Madere believes that his “strategic deals”
with his Web-based competitors not only help stretch the reach of Usenet itself but also
ensure they’ll stay out of the discussion group market themselves. “Right now, it’s
still the cost advantage that is allowing us to dominate. Certain huge companies could
decide to lose a ton of money and compete with us, and that way could even be able
to take over our space,” admits Madere. “But frankly, it’s still far easier
for them to team up with us. It would be short-sighted to take us on.”
Updating Deja News’ newsreader and interface and forging more
relationships with potential competitors appear to be Deja News’ short-term goals,
Madere acknowledges, though he says that the company’s long-term mission is based around
marketing, particularly DNCampaign, a suite of Internet-based marketing tools
Deja News provides to other businesses interested in online advertising. Because the Deja News
search engine is used by nearly 3.5 million different users monthly, Madere can
offer other companies an opportunity to test the effectiveness of their online
advertising by targeting distinct demographic groups within that audience, groups
Madere can recognize and target based on which newsgroup topic the user is searching
through.
“Since we have 25,000 distinct newsgroups, we can extremely rapidly
target messages in the form of advertising banners to specific groups, be it
programmers, system administrators, travelers, or European travelers,” says Madere,
who regularly offers local businesses, such as local rock band Velvet Hammer, the chance to
advertise for free so that Deja News can test the response to one banner against another.
“Then, we can test the responses of those people in the specific demographic groups
to specific messages and report back to the companies what we’ve found and how we can
help them better achieve their goals.”
And yet, Deja News’ marketing studies would be nothing without its base of
users, who come to the site not to be tested but to test the database for the
information they seek. Deja News’ critics contend that increasingly, posts to several of
the more controversial newsgroups are being dragged out of Deja News’ search
engine and reposted to either make a point or discredit someone else’s opinion with what
they’d perhaps said before on the topic. Could Deja News be creating more
clutter?
“Sure, following up an old article or thread could be like walking
into a conversation three hours after it ended just to answer somebody,” Madere
says. “But our service tries to prevent that because when you try to follow up
something older than a couple of weeks it says ‘You can’t follow up that message,’
which is a general warning to alert people ‘This conversation is over.’”
Even so, there is nothing that Deja News can do about people cutting and
pasting old posts into new ones, and it’s Madere’s contention that the search engine
itself actually reduces the number of repetitive posts by cutting down on the number
of questions already asked and answered. Better yet, in a virtual community
where anybody can run amok unidentified or unchecked, Deja News’ “Author
Profile” feature, which allows access to a complete list of where a user has posted
and what they’ve said, has been applauded both for unmasking users who routinely
post misinformation and for verifying an author’s credibility on a subject.
“The reason we invented the author profile was as a convenience to
help people filter out bozos,” says Madere. “Discussion groups have a lot more
in common with traditional communities. When you meet somebody in the real world and
they say something interesting and you want to get to know them better, typically
you’ll go to other people who’ve talked with them in the past and find out about the
person. Here, you’re looking at a post from somebody giving you information and
you’ll want to get an idea of how credible this person is — what they’ve done elsewhere
on Usenet and what they’ve said in the past. And this, by looking at what and where
they’ve posted, can help you determine whether somebody’s a gadfly or significant
contributor.”
In theory, offering full access to a user’s online history might appear to
raise a new set of Internet privacy concerns. What’s to stop someone from typing in
the name of a friend interested in cooking and discovering his set of posts to
alt.personals.spanking? And is there a reasonable expectation of privacy, considering that,
pre-Deja News, posts disappeared after two weeks and were far less likely to
come back and haunt someone?
“We do not expose any private information,” says Madere.
“The only things that are available on Deja News are things that have been posted to
Usenet and have been already visible to 24 million people daily. There can be no
privacy concerns to making something that has been posted to Internet discussion
groups searchable. It’s a question of making it more clear to the original poster that maybe
they shouldn’t have posted it. No reason to get mad at us, we’re just the messenger — 24
million people saw this before we got it.”
“In fact, at our site, we have a copy of the one FAQ posted to
news.newusers, and most of the newsreading software essentially forced the new user to read
this document before their first post. In this document, which was written in 1985
or 1986, it says, ‘Don’t say anything that you wouldn’t want to come back years
later.’ This stuff will get back to everybody that’s related to you. We didn’t have
to write anything, because it’s already there.”
Privacy issues aside, the real story behind Deja News is the search engine
itself and the ease with which information is now available on Usenet. Already, Deja
News is seeing a 20% monthly increase in the number of users at its site,
growth that is not only strengthening Deja News’ hold on the Usenet market
but also making Madere’s chosen “niche” into something far more significant
than it was just two years ago.
“When I first started the company I knew the potential of the
discussion groups, and knew the tens of millions of people participating needed a way to
search. The only question in my mind was whether we could singularly dominate the
space. If we were able, I knew we’d have to grow at this rate. And if we didn’t, I
didn’t know if we could survive.”
“But now, we expect the Internet discussion group market to grow
rather rapidly over the next few years again. We’re going to make it easier to participate,
and as more people get more used to using the Internet they realize that the only
thing that’s unique about the Internet, that no other medium can offer, is
discussion groups. Web pages aren’t that different than a library — except being able to access
everything at once. Internet telephone is a temporary thing based on regulatory glitches
and the whole Pointcast ‘push’ thing is just television, and people are going
to get a clue on that soon.
“Discussion groups are the one thing that the Internet is the only
medium that can do it. There’s no other communication medium in which you can go out
and send a message to thousands of people and any of them can respond to it, also
to thousands of people. It’s like a global telepathy system, instantly capable
of reaching people with similar interests all over the world.”
This article appears in the July 11, 1997 issue (Cover).



