The Information Junkie
Austin Chronicle: How did you become a professional researcher?
Reva Basch: By the most prosaic of routes -- I got a master's degree in library science from UC Berkeley (and I'm not volunteering the year). I worked as a librarian in a variety of different contexts, including an engineering school and a couple of engineering firms. In the early Eighties, I joined Information on Demand, a pioneering fee-for-service research firm, as a research associate. There, I learned to use whatever means were available -- phone, online, or searching through the stacks -- to find what our clients were looking for.
My projects at IOD covered a tremendous, sometimes bizarre, range of topics, from the market for water-pumping windmills, to the latest trends in designer sunglasses, to current therapies for canine hip dysplasia. I learned to think quickly, and creatively, about information sources and techniques.
About 10 years ago, I went into business for myself, specializing in online research on services like Dialog, Lexis-Nexis, and Dow Jones. As the Internet, and then the Web, developed as a serious research environment, I of course developed my own expertise there.
The rest is history, I guess.
AC: Your latest book, Researching Online for Dummies, is a juicy guide to Web-based research, but there are some people who believe that the Internet is too chaotic and inconsistent to have real value as a research tool. Do you think there's any substance to that argument?
RB: Actually, Researching Online for Dummies covers both Web-based resources and the traditional, proprietary research databanks like Dialog and the others I mentioned.
One of the points I make in the book is that there's a trade-off, definitely, between the neatly organized and highly indexed information that you tend to find in those proprietary services, and the price you pay for using them. Increasingly, much of the same high-quality information is becoming available on the Web, for free or very cheaply. The problem, as you suggest, is in finding it, and finding it in a timely manner.
Yes, much of the Web is still chaotic, and quality in particular is very inconsistent. But what I see happening is pockets of organization and high-quality, reliable information, with search engines tailored specifically to the specialized kinds of data found in those clusters of information. Take a site like the Electric Library's business edition, for instance. Or the National Library of Medicine's searchable front-end to Medline.
There's a perceptual change going on at the same time. People are beginning to realize that so-called "Web-wide" search engines aren't the optimal place to start when they're doing serious research -- and that they don't necessarily have to cover the entire Web.
AC: There are an increasing number of search engines online, and some of them accept money to give search preference to some sites or product affiliations over other possible links as a kind of undisclosed advertising. Some search sites also have broader indexing capabilities than others, better database organization, etc. How would you advise online researchers to go about finding the more credible and technically competent search engines?
RB: I have no problem -- or very little problem, anyway -- with "pay for placement" sites like goto.com, as long as they're up front about their method of operation. From a researcher's perspective, I can't see that they add a whole lot of value, and there's always the possibility that they might skew your search results. The important thing is to be aware whether the search engine you're using does, in fact, give preferential placement to sites that pay for the privilege, and to adjust accordingly.
Beyond that, my feeling is that every search engine is "technically competent" to some degree. You don't always want the largest database or the deepest, most comprehensive results. In fact, one of the reasons I like WebCrawler, one of the oldest search engines around, is that it isn't that large; it tends to pull up company home pages and The Complete Such and Such Page more readily than a lot of the more exhaustive engines. To tell you the truth, I use whatever search engine happens to have gotten me good results the last few times. I've always got a handful in rotation; right now it's Excite, Alta Vista, HotBot, and Infoseek. If one doesn't work, I try another. I have no one favorite. No single search engine covers the entire Web, and each one retrieves slightly different results, including some unique hits.
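The point that no single engine covers the whole Web, and that each one returns some unique hits, can be illustrated with a small sketch. The Python below is only an illustration, not a method described in the interview: the function names, the placeholder URLs, and the idea of holding each engine's results as a plain list of URLs are all assumptions made for the example.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Map each URL to the set of engines that returned it."""
    seen = defaultdict(set)
    for engine, urls in results_by_engine.items():
        for url in urls:
            seen[url].add(engine)
    return dict(seen)


def unique_hits(results_by_engine):
    """Return, for each engine, the URLs that no other engine returned."""
    merged = merge_results(results_by_engine)
    unique = {engine: [] for engine in results_by_engine}
    for url, engines in merged.items():
        if len(engines) == 1:
            (only_engine,) = engines
            unique[only_engine].append(url)
    return unique


# Hypothetical result lists; the URLs are placeholders, not real search output.
sample = {
    "Excite": ["http://a.example/", "http://b.example/"],
    "HotBot": ["http://b.example/", "http://c.example/"],
}

merged = merge_results(sample)
only_one_engine = unique_hits(sample)
```

With this sample, `merged` holds three distinct pages, of which only `http://b.example/` was found by both engines. `only_one_engine` lists one page per engine that the other missed, which is the practical case for keeping several engines in rotation.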
AC: Are there effective ways to find information on the Web that are outside the "search engine" category?
RB: There sure are. Search engines are best when you're looking for that needle in a haystack, and preferably when you have a unique or distinctive word or phrase to search on. But if you're just looking for general background on a topic, like travel in Europe or marketing on the Internet, I'd go for the subject catalog approach. Just about all the major search engine sites now offer a subject approach too. The best known subject catalog is Yahoo, but there are lots of others, including some that focus on particular fields.
The advantage of the subject catalog approach is that you can drill down from the general to the particular. That way, even if you don't know how to describe what you're looking for -- or aren't even sure of what you're looking for -- you can still narrow down your search.
Subject catalogs also offer the advantage of filtered information. Instead of including every site that purports to deal with a topic, someone's actually evaluated the information and included what they judge to be relevant.
Of course, once you start researching in a particular area, you'll discover sites that are great starting places in themselves, like FedWorld for government information. That's where your bookmark file comes in!
People online can be a great source of information, too. But that's a whole 'nother topic.
AC: You're talking about "portals" in the real sense of the word, doorways to the Web's vast database. I recall reading of an index site like this, I forget its name (maybe you'll recall), that had a participation rate by women much higher than the average for the Web overall, because of the way it organized info ... does this ring a bell? I'd like to talk about some of the variations in the ways index sites are organized.
RB: That was LookSmart (http://www.looksmart.com). The explanation was that the site is set up to encourage browsing. Research apparently shows that women prefer burrowing through channels to flinging terms against a search engine, which is men's preferred style. I can see the analogy with shopping styles, though I'm not sure I completely buy the theory.
As for variations in how index sites are organized, the biggest difference I see is between the ones that are designed to appeal to a broad consumer base -- Yahoo, of course, Excite, Infoseek, the usual suspects -- and the more academic subject guides, such as the Argus Clearinghouse (http://www.clearinghouse.net) and the W3 Virtual Library. The latter tend to map toward the major scholarly disciplines, such as literature, chemistry, sociology, and so on.
AC: I was also thinking of the potential for an increasing number of indexes that focus on a particular range of material. I tend to think that the portal concept will explode and that no single site will dominate. That's actually another way the Internet-as-marketplace has been misunderstood: The recent portal frenzy that resulted in high market valuations for sites like Yahoo is based on old media thinking, an attempt to see "portals" as something like "television networks." I tend to think that's not the way it's gonna be. How about you?
RB: Let's see ... We've got three or four questions here, I think.
I agree that the current portal-mania is an attempt to brand major search sites along the lines of network TV. On one level, it makes sense: You have your e-mail, your Web hosting service, your real-time news feeds, and whatever else a portal site might be offering, all in one place, no more than a click or two away. This can be quite convenient -- especially if you're fairly new to the Web and haven't developed that spatial sense that makes it easier to navigate your way around.
But the idea that portals will breed loyalty this way is, I think, mistaken. You don't hear about people who are NBC "loyalists" or diehard ABC fans, or CBS maniacs, do you? You do hear about ESPN nuts, but that's something different; you're talking about sports fans there.
For portals to work, you need some inherent attraction, something about the site itself that goes beyond the mere convenience of having all your online activities in one place. You need compelling content, like ESPN. Or you need a sense of community, such as you get at some of the geographically specific sites. A portal built around a magazine with an already loyal, committed readership makes more sense to me. So, in that regard, I think you're right: We'll see dozens, if not hundreds, of portal sites, not just a "Big Three" or four.
As for specialized indexes, yes, that's one of the significant ways in which the Web, and our use of it to find information, is changing. It's like starting with a bibliography on a given topic; you're definitely one step ahead.
AC: You've been online for a long time. The prevailing myth is that the Internet is male-dominated or that it is or has been a hostile environment for women. How does that align with your experience?
RB: No question the Internet has been a hostile, or at least non-welcoming, environment for women in the past. That's due, I think, to its essential geek nature, the fact that the Net was founded and run by programmer types, most of whom were, back in the early days, men. But women have always been online, in whatever small numbers, and quite able to take care of themselves. I haven't seen any percentage figures in the last couple of months, but if we're still a minority, we're a significant and, I'm sure, growing one.
Given the extent to which the Net is becoming intertwined with our lives -- personal, professional, commercial, entertainment -- and the fact that kids of both genders are getting online as soon as they're capable of manipulating a mouse, I expect the Internet population will reflect real-world demographics within the next 10 years at the outside. The very outside.
AC: Current stats show that research is still the numero uno activity on the Web. Do you see that changing as more people come online? Or do you think it's the nature of the beast?
RB: How do you define "research" -- finding stuff out, in the broadest sense? If so, I think it's a function of how the Web looks and feels, and what it offers, right now. It's rapidly becoming, if it's not already, the default medium for information exchange and publishing. So that's what's driving online activity right now.
But I expect that balance will shift, over time, as more people discover the Web's potential for interactivity and communication. Internet telephony is in its infancy. Online collaboration, net-meetings, distance education, and chat -- not to mention the joys of virtual community with which you and I are both very familiar -- could well drive online usage in the not very distant future.
AC: Do you think there's a business model for collaboration and community? Or do these emerge as value adds in other projects, or total nonprofits? I'm thinking about Electric Minds, which, given the chance, might have been profitable, though probably not by venture-capitalist standards.
RB: I see community as more of an end in itself -- other people, and the ability to tap into that vast well -- than as a profit-making venture. It's a compelling reason to be online. I have my doubts -- a few obvious exceptions aside -- about the financial viability of community qua community. I'm skeptical about any business venture that seeks to "create" community for its own sake.
On the other hand, the ability to converse and collaborate in a professional setting -- lawyers, teachers, librarians, whatever -- can be a real attraction; added value, as you suggest, in the appropriate context.
AC: My own belief is that you can create contexts for community, but you can't "create" communities; they have to form. However, there are "best practices" for nurturing a community, making it grow. It's interesting to be living and working in an environment that is essentially information, where the leading preoccupations are conversation and research. I wonder what else works ... online shopping seems to be a contender. Multimedia events, when the bandwidth is there.
Environment is a key word here. Attorney Lance Rose argues that those who define the Internet as a "medium" are mistaken, that it is not a medium but an environment in which media occur. What do you think about this? Does it have relevance for the online researcher?
RB: That's an interesting distinction. I think I know what Lance is getting at. Online has always seemed like an environment to me, even when all I was doing was linear, text-based searching on database services like Dialog. Certain databases were "over there" with the environmental cluster; others were "down there" with the patents or the medical literature. I always had a spatial perception of the CPU on which all this information was loaded, with me on the other end of the phone line exploring various groupings and teasing out a string of relevant results.
So, yes, "environment" seems very apt to me. "Online" is such an abstraction, really; the concept of cyberSPACE is one attempt to begin to give it some form. In my own mind, I feel as if I am traveling through physical space when I move from one Web site to another, or from the Web to a text-based system like the WELL's Picospan interface, or native-mode Dialog.
The next step, and it's already starting to happen, is spatial visualization of large, complex data structures, where you can actually navigate around and through collections of documents using keywords and concepts, and watch the data reshape itself in response to your changing search query.
AC: Full circle: that sounds exactly like William Gibson's vision of cyberspace in Neuromancer!
RB: It sure does. I was thinking that as I spoke. "The Matrix" is upon us.
AC: Can you give us a real-world example of the kind of conjure work you're describing?
RB: One commercial product that I'm aware of is DR-LINK, marketed by Manning and Napier (http://www.mnis.net), which rearranges document groupings dynamically -- picture sort of a freeform org chart -- as you add and change your search criteria. I've also seen demos of several other "data visualization" models. There's one by Magnifi (http://www.magnifi.com) that handles multi-object retrieval (images, multimedia, and so on) and shows the connections between related types of content regardless of the medium. For instance, it'll recognize that a picture of a boat is related to a story about a boat, plans for building a boat, a video about a boat, and so on. Semio Corporation (http://www.semio.com) uses a Java-based "Visualizer" that, like DR-LINK, lets you navigate visually through large content domains. Plumb Designs also uses Java to represent data structures that flow, rotate, and shift in real time as you move through them. And Lucent uses a lovely, multicolor circular display that analyzes your search results and clusters them by relevance and their relationship to each other. It's very intuitive; you just know what you're looking at.
To my knowledge, none of these are yet available for use on the Web at large; they're pretty much confined to large proprietary database collections or in-house document management projects. But that will probably change.
AC: Good, powerful Web-based visualizations would probably tend to be Java-based to work across platforms, and Java's still maturing. I guess bandwidth is a consideration, too.
I've been wondering about the processes that search engines use to collect data: spiders and such. I wonder what's the effective lead time before something's reliably indexed, i.e., how current can your search be, and still be effective?
RB: It varies with the search engine. HotBot, until fairly recently, claimed to be -- and often was -- current within a couple of days. Last time I checked, though, the lag was more like two weeks. I haven't seen a correlation between currency and the size of the index, the percentage of the Web, or the number of Web pages a particular engine claims to cover; I suspect it has more to do with their internal algorithms, but that's not really my field of expertise.
Sometimes you'll hit a fairly new page just because it happened to come up right before the search engine spider got around to that point in its periodic cycle of the Web. Equally often, you'll miss a page that's been up for months because the spider got there just before it went up, and won't be back for another few weeks.
If I'm looking for very current information, or for a particular page I know to be new, I'll take a different approach. For information on a breaking topic, I'll go to one of the news sites, or to Usenet. If I'm looking for a particular page, I'll try guessing the URL or e-mailing people I think might know about it. Who knows, I might even resort to picking up the phone!
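The timing effect described above, where a page is picked up almost at once or missed for weeks depending on where the spider is in its cycle, can be sketched with a little arithmetic. This assumes a deliberately simplified model that is not taken from any real search engine: a spider that revisits a site on a fixed schedule, with days counted as whole numbers. The function names and the 14-day cycle are illustrative assumptions.

```python
def first_indexed_day(published_day, first_visit_day, cycle_days):
    """Day on which a fixed-schedule spider first sees a page.

    Assumes the spider visits on first_visit_day and then every
    cycle_days after that (a simplification for illustration).
    """
    if published_day <= first_visit_day:
        return first_visit_day
    elapsed = published_day - first_visit_day
    # Round up to the next scheduled visit using integer arithmetic.
    visits_needed = -(-elapsed // cycle_days)
    return first_visit_day + visits_needed * cycle_days


def indexing_lag(published_day, first_visit_day, cycle_days):
    """Number of days a page waits before the spider first sees it."""
    return first_indexed_day(published_day, first_visit_day, cycle_days) - published_day


# Spider visits on day 0, then every 14 days (hypothetical schedule).
lucky_lag = indexing_lag(13, 0, 14)    # page went up the day before a visit
unlucky_lag = indexing_lag(15, 0, 14)  # page went up the day after a visit
```

Under these assumptions the "lucky" page waits one day, while the "unlucky" page, published just two days later, waits thirteen. Average lag figures quoted for an engine say little about how long any one page will take to appear.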
AC: Do you have a particular methodology for approaching a news site in research mode? Or do you operate intuitively?
RB: You mean a broadcast media site, like NBC et al., or newspaper sites? I use the two very differently. I seldom visit network sites unless I'm trying to follow a breaking story, or get some background on something that's been in the news for a while. There, I operate pretty intuitively.
As for printed news sources, it depends on whether I'm following something current, or digging back in the archives. And it depends on how good a search engine a particular site has.
AC: I was really thinking about newspaper sites, and net-native news sites. And that raises another question: In looking at sites that don't have extensive history to feed an assessment of credibility, do you have particular clues you consider in determining whether content posted there is valid and useful?
RB: First, there are the obvious markers: Is it literate and well thought-out, or riddled with typos and misspellings, disorganized and ranting?
Then, you read between the lines, thinking critically, the same way you do when you're reading something in print. You look at the assumptions that are being made, the arguments that are being advanced, the "facts" that are set forth -- or withheld. You also look at authorship: Who's responsible for the site, or for the material you're reading? Is it an individual, a corporation, a lobbying group, or some entity you've never heard of? Is the author who he or she purports to be? What are their credentials? Can you verify them through other sources, such as databases of published papers in a particular academic discipline, or newspaper accounts, or biographical directories?
Speaking of "other sources," you apply a similar method to the information itself: Is it verifiable? Can you back it up in at least two independent sources? "Sources" can be either other publications or Web sites, or people with an expertise on the subject.
And finally, I'd say that, just because a site does have an ax to grind or a particular bias or slant on a subject, don't discount it as an information source; just be sure to take that bias into account.