Q&A: Building a Visual Internet

The Internet, for all its transformative effects on business and culture, is a Frankenstein-like monster, built by accretion over a span of decades. It takes a certain kind of person to look beyond the mishmash and envision a cleaner, more beautiful virtual experience, one more tailored to the particular strengths of the human mind. Blaise Aguera y Arcas, a computer scientist at Microsoft, is that kind of person. His Seadragon software is a zooming interface that could make computing a more fluid, visual, and natural experience. NEWSWEEK's Barrett Sheridan asked Aguera y Arcas why he considers the Internet, as we know it, to be like black-and-white TV—and how he plans to create a color version. Excerpts:

NEWSWEEK: What's wrong with the Internet as it functions today?
Blaise Aguera y Arcas: If we think about the computer as a vehicle or a mechanism for delivering content, it's primarily visual content, and the Web page is the basic unit of content delivery and viewing. The standards behind the Web have a long history. Originally, Web pages weren't supposed to have any design in them—the original conception of HTML was just text, and all of the styling was actually up to the browser. Of course, as soon as people started offering Web pages and playing with this as a new medium, there was demand for much greater control over the way it looks. And the same is true of images. If you have an image that's 400 pixels on a side, then that has a very definite size on the screen, and it also has a very definite weight in terms of the bandwidth it requires. This is why you don't find images on the Web that are bigger than, at most, a megapixel or so.

Has anyone tried to address this?
This problem was recognized by the industry a number of years ago, and it motivated the development of an image standard called JPEG 2000. The idea behind JPEG 2000 is that instead of encoding an image in reading order, from the top-left pixel all the way to the bottom-right pixel, you encode it holistically. You could imagine it like this: the first pixel, or the first byte, is the average color of the entire image. The next four pixels, say, are the average colors of the top left, top right, bottom left, bottom right. And so on. And what that lets you do is this: as you read it from the beginning, the moment you've read even a single byte, you can already start to render the entire image. Your initial rendering will look like a constant color, then it will look very, very blurry, and then it will refine as you keep going.
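The averaging scheme described above can be sketched in a few lines of code. This is a minimal illustration of the analogy only, using a grayscale grid and simple block averages; the real JPEG 2000 standard uses wavelet transforms, not plain averaging, and the function names here are hypothetical:

```python
def build_pyramid(image, size):
    """Build a list of progressively finer versions of a square image.

    `image` is a size x size grid of grayscale values (0-255), and
    `size` must be a power of two. Level 0 is a single overall
    average; each subsequent level doubles the resolution, mirroring
    the "average color, then four quadrant averages, and so on"
    description above.
    """
    levels = []
    step = size
    while step >= 1:
        n = size // step  # this level is an n x n grid of block averages
        level = [
            [block_average(image, r * step, c * step, step) for c in range(n)]
            for r in range(n)
        ]
        levels.append(level)
        step //= 2
    return levels

def block_average(image, row, col, step):
    """Average the step x step block whose top-left corner is (row, col)."""
    total = sum(image[r][c]
                for r in range(row, row + step)
                for c in range(col, col + step))
    return total // (step * step)
```

Reading the levels in order reproduces the progressive rendering described above: level 0 is the constant-color view, and each later level is a sharper refinement of the same image.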

What's the benefit of that?
That means if the transmission gets cut off, you still have the whole image—it's just blurrier than it would be otherwise. Or suppose that you're on a low-bandwidth connection. You don't have to wait for the entire thing to download; you just cut it off at an arbitrary point and you still see what you can see. Or suppose that you have a low-resolution screen. It might have been an 8-megapixel image, but you don't need to send the whole 8 megapixels. You can say, OK, stop now, because I don't have any more pixels to show content on the screen. Add to the idea of multi-resolution the idea of spatial random access—meaning that I can dive into a small portion of a very high-resolution image. Either you're zoomed out and you can see everything but at really low resolution, or you're zoomed way in and looking at very high resolution, but at only a very small subset.
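Spatial random access is usually implemented by cutting each resolution level into fixed-size tiles and fetching only the tiles under the current view. The sketch below shows that idea under an assumed generic tile layout (tile size, coordinate scheme, and function name are illustrative, not Seadragon's actual format):

```python
def tiles_for_viewport(level, x0, y0, x1, y1, tile_size=256):
    """Return the tile addresses needed to display the viewport
    [x0, x1) x [y0, y1), in pixel coordinates at the given pyramid
    level. Only the tiles under the view are requested—never the
    whole image—which is what makes diving into a small portion of
    a very high-resolution image cheap.
    """
    first_col = x0 // tile_size
    last_col = (x1 - 1) // tile_size
    first_row = y0 // tile_size
    last_row = (y1 - 1) // tile_size
    return [(level, col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

Zooming out means asking a coarser level for the same view; zooming in means asking a finer level, where the viewport covers a smaller and smaller fraction of the tiles.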

But JPEG 2000 never took off, and you used those ideas in your technology, Seadragon.
Right. We took those ideas to their logical conclusion, and also generalized them to arbitrary kinds of documents. JPEG 2000 was really just about images.

What's the next step for Seadragon? Is this the new way to access information across the whole of the Internet?
Yes. I can't speak too specifically to Microsoft's Seadragon plans. More generally, absolutely. I feel maybe a little bit like this is a necessity in the same sense that a color TV after black-and-white was a necessity—you see it and it's obvious that it has to go that way. The alternative is offering content that is always going to be impoverished with respect to what you can display.

Edward Tufte says that scrolling and linking—the way we navigate the Internet today—are inferior to scanning or zooming. Do you agree that the Internet has been built on the second- and third-best options?
If anything, Tufte doesn't even go far enough. It's not just about zooming and panning, it's about bringing to the computer things that we really are evolved to do. The origins of scrolling are back in the dark ages of computers—it's a very primitive idea. It's an interesting one, modeled on the pre-codex way of representing text in physical scrolls. But in real life we find lots and lots of ways of organizing information. If you just think about everything in your house, you have a much richer mental map of all those things than you have of where your files are in your computer, and where your documents are in e-mail. Bringing the full power of your visual system to bear on processing information seems so obvious it almost doesn't bear mentioning.

But surely there's a place for text-based searching? Zooming can't do everything.
Yes. Text is unbeatable for organizing information semantically. Visual presentation is unbeatable for processing large amounts of information. This is how you use your computer now—you do both of these things, but the visual part is crippled with respect to the real world, while the searching part is already much better than what you can do in the real world.