HotSauce and Meta-Content Format
by Matt Deatherage <firstname.lastname@example.org>
Originally published in MDJ in 1996.
In nearly every recent important Apple executive speech, the Powers That Be have mentioned an Apple technology investigation initially referred to as Project X and now called HotSauce. HotSauce presents a three-dimensional fly-through of sets of data (like all the Web sites categorized by Yahoo, the Usenet newsgroup hierarchy, or similar hierarchical sets of data) that can be described through common themes.
Apple had shown HotSauce before this year, but now the execs are mentioning HotSauce not just as a potential user-interface gizmo but also as an underlying technology that's going to revolutionize the way we browse data. What's the amazing part? Not the three-dimensional representation, or the Netscape Navigator plug-in to do it in a browser window, or even the programs that create the data files used by it. No, it's the format of the data files - a way of describing information about data. Apple calls this format MCF, which is short for Meta-Content Format. In this article, I cut through the hype and look at whether MCF lives up to Apple's recent claims that it will do for databases what HTML did for text.
Meta-content? Many of us just got used to the idea that information businesses are now "content providers," and now we're being asked to understand "meta-content?" It sounds like a technogeek term from hell, but it's not so bad.
The American Heritage Dictionary defines the prefix "meta-" in part as meaning "beyond, transcending, more comprehensive." Engineers like using the prefix to describe the process of referring to a process. For example, a joke about a joke would be "meta-humor," and a language invented to describe other languages is a "meta-language." (The concept is discussed thoroughly in the Pulitzer Prize-winning 1979 book Goedel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter - a must-read for engineering or science enthusiasts.) Following this tradition, meta-content is content that talks about other content.
MCF, as defined by Apple's R.V. Guha (who is responsible for both HotSauce and MCF), is a "language for representing a wide range of information about content." A simple example of meta-content is the header on an email message. It tells you information about the message (who sent it, at what time, how it got to you, where replies should go, and more) but it's not the message itself: the person who sent you the mail wasn't sending you the header, but was sending the content of the message.
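To make the split between content and meta-content concrete, here is a minimal sketch using the email module from Python's standard library. The message, addresses, and date are invented placeholders, and this shows only one way of pulling the two halves apart.

```python
from email import message_from_string

# A made-up message: the addresses, date, and text are placeholders.
RAW_MESSAGE = """\
From: sender@example.org
To: reader@example.org
Subject: Lunch on Friday?
Date: Mon, 02 Sep 1996 09:30:00 -0700

Are you free for lunch on Friday?
"""

message = message_from_string(RAW_MESSAGE)

# Meta-content: information about the message (the header fields).
meta_content = dict(message.items())

# Content: what the sender actually meant to send (the body).
content = message.get_payload()

print(meta_content["From"])
print(content)
```

The header fields answer questions about the message, while the body is the message itself. A program can sort or filter mail using only the first part, without ever reading the second.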
Why Describe Content? Email headers can be described as a simple language for describing the content of an email message. A language, for these purposes, is a set of simple rules that define valid expressions - in the normal language of mathematics, for example, "4 + 4" is a valid expression but "76#&98+A!" isn't. Email would be less useful if there were no headers: any sender would have to be sure to include the header information in the body of the message or you, as recipient, would never see it. A lack of a signature would leave you clueless as to the message's origin (and a false signature could mislead you further).
So, describing content is a useful pursuit. In fact, when you have lots and lots of content, navigating through it is next to impossible without some form of meta-content. Millions of people turn to Yahoo to find Web pages sorted into useful (if somewhat arbitrary) categories and classes. The same people could turn to AltaVista to search millions of Web pages by content, but searching by content is often less useful when you're browsing. If you want to find magazines about the Macintosh, you can dive through Yahoo until you get to a list of some 30 separate Web sites on the subject. Searching for "Macintosh magazine" in AltaVista returns about 400,000 matches, including job listings at Macworld, dozens of pages from the MacToday site, articles from old issues of Byte in Italian and so on. The raw text searching capability returns thousands of times more matches, but they're not as useful as Yahoo's more limited set.
Once you have a good description of some kind of content, that meta-content can be effectively and efficiently searched with excellent results. The major problem is that - so far - good meta-content comes only from actual people. Technology is getting better at this - Apple demonstrated agents that distill text documents into one sentence at Macworld Boston - but humans can still do much better. Publishers often create library card catalog entries for books to assist librarians - without that help, libraries have no way of knowing a book's contents except by jacket blurbs or the table of contents, and it's rare to find a library with enough resources to hire a librarian just to read books and catalog them properly.
In a similar vein, the trend in Web publishing is towards self-description of Web pages. Assuming you're honest, you can describe your page in 25 words more accurately than someone at Yahoo, and much more accurately than a text retrieval system. The HTML 3.2 standard includes a META tag so you can add some meta-content information to your Web pages to assist with automatic indexing and other meta-content creation activities.
But Why MCF? Individually generated pieces of meta-content are useful, but when you describe collections of hundreds of thousands of pieces of content, you must have some standards. Let's extend the example of a library card catalog to one that's being computerized. When you look at a book's card, you can easily see if the book has 27 authors (perhaps it's an anthology). If you enter that information into a database that has only three "author" fields, though, you're stuck - you either leave out 24 authors or you enter them in an unrelated field, such as "description." Either way will foil people searching for books by one of those 24 people (who's going to search the description field for an author?). Large meta-content systems must be flexible; in fact, the MARC format used by the Library of Congress consists of a set of tagged data - you can have as many author tags and authors as you want for any particular entry, limited only by your particular computer's capability to store them.
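The difference between fixed fields and tagged data can be sketched in a few lines of Python. This is an illustration only: the tag names ("title", "author"), the three-slot layout, and the helper functions are assumptions made for the example, not MARC's actual numeric tags or record structure.

```python
# Hypothetical fixed-field layout: room for exactly three authors.
FIXED_AUTHOR_SLOTS = 3


def to_fixed_record(title, authors):
    """Keep only as many authors as there are fields; the rest are lost."""
    slots = authors[:FIXED_AUTHOR_SLOTS]
    slots += [None] * (FIXED_AUTHOR_SLOTS - len(slots))
    return {
        "title": title,
        "author1": slots[0],
        "author2": slots[1],
        "author3": slots[2],
    }


def to_tagged_record(title, authors):
    """One (tag, value) pair per fact; a tag may repeat any number of times."""
    return [("title", title)] + [("author", name) for name in authors]


def values_for(record, tag):
    """Return every value stored under the given tag, in order."""
    return [value for record_tag, value in record if record_tag == tag]


authors = [f"Author {n}" for n in range(1, 28)]  # 27 placeholder names
fixed = to_fixed_record("An Anthology", authors)
tagged = to_tagged_record("An Anthology", authors)

print(len(values_for(tagged, "author")))   # all 27 authors are searchable
print("Author 27" in fixed.values())       # the fixed record dropped this one
```

A search of the tagged record finds any of the 27 authors; a search of the fixed record finds only the first three.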
So why not use an existing format like MARC to describe content on the Web as well? MARC is not an open standard. The "tags" used to indicate what each given entry contains are in fact numbers; and numbers not published are reserved for the MARC committee's definition, with only some exceptions. Further, MARC records include binary data and aren't easily human-readable. Conversely, Guha's MCF format is more like HTML. Consider that in HTML, a Web page author can invent her own tags. If someone's browser doesn't know how to interpret them, they'll just be ignored. If a browser does interpret them, then the page can include nifty new features. Netscape does this with nearly every release of Navigator.
Apple's hoping MCF has a similar reception - it's a simple, text-based format that defines objects and their properties. There are no restrictions on what properties are described for each object, nor are there requirements that all properties be described or that all relationships between objects be included. HotSauce's implementation of MCF only handles a few properties for each object: "parent" objects, "child" objects, suggested locations where the children might appear in the 3-D fly-by in relation to the parent objects, and that's just about it. You can get the white paper on MCF at the URL below.
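The "objects and their properties" idea can be sketched with plain Python dictionaries. This is not MCF syntax, which is not reproduced in this article; the object identifiers, the property names, and the "reviewed_by" property are all hypothetical. The sketch shows only that no property is required, that a viewer can skip properties it doesn't understand, and that a relationship stated on just one side is still usable.

```python
# Hypothetical objects: each is just a bag of properties, none required.
objects = {
    "root": {"name": "Macintosh", "children": ["magazines"]},
    "magazines": {"name": "Magazines", "parent": "root", "reviewed_by": "someone"},
    "software": {"name": "Software", "parent": "root"},  # not listed by "root"
}

# The only properties this imaginary viewer knows how to use.
KNOWN_PROPERTIES = {"name", "parent", "children"}


def understood(obj):
    """Drop properties the viewer doesn't recognize, as a browser skips unknown tags."""
    return {key: value for key, value in obj.items() if key in KNOWN_PROPERTIES}


def children_of(all_objects, object_id):
    """Combine children the object lists with objects that name it as their parent."""
    found = list(all_objects[object_id].get("children", []))
    for other_id, other in all_objects.items():
        if other.get("parent") == object_id and other_id not in found:
            found.append(other_id)
    return found


print(understood(objects["magazines"]))
print(children_of(objects, "root"))
```

Here "root" never mentions "software", and "magazines" carries a property the viewer has never heard of, yet both are handled without complaint.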
Apple has submitted MCF to the Internet Engineering Task Force (IETF) for consideration as an Internet standard for describing content, and I'm unaware of any similar counter-proposals. If the IETF does accept MCF as a standard, we can presume there will be a set of standard attributes for describing data (common things like "name", or "URL"; maybe a "description", or "creator", or other similar tags), but extra data can still be included.
What Will MCF Do For Us? In case you're digesting all this with a resounding "Big deal!" building in the back of your throat, you have to realize that most standards are boring - it's what's done with them that's interesting.
Think of HTML. The idea of marking up text with more text that indicates what the original text should look like is, well, a silly idea. It's not a compact way to indicate stylistic changes (a "bold" command can be expressed in less than one byte, rather than the lengthy <STRONG> tag), HTML source is not easy to read, and it's not suitable for advanced page descriptions.
But, HTML is easy for computers to work with, it's extensible (as we've seen), and the simple hypertext capabilities that link a phrase on a page to a completely different page led to the Web browser, which led to today's World Wide Web, which has been noted to be a Really Big Deal.
MCF has the same features - it's easy to create, easy to use, and easy for computers to work with. For lack of a better term, I envision an MCF "browser" program that can navigate through any collection of MCF-described data. Apple's HotSauce Web site has several such MCF collections, called "X Spaces" because of the early Project X name. If you have Apple's Netscape plug-in for HotSauce, you can fly through any of these X Spaces in your Web browser.
You can also download a stand-alone HotSauce application and view X Spaces that way. It includes a choice of viewing formats recently added to the plug-in - the 3-D fly-through method, or a two-dimensional Finder-like view with folders and disclosure triangles that reveal folder contents when clicked, just like the Finder's View by Name capability. Note that the MCF file describing the data didn't change; the program is just viewing it in a different way.
There is the real key - a single way of describing a large set of data can be displayed in whatever fashion a programmer can invent. The current HotSauce visual interface isn't all that impressive in today's age of 3-D rendered graphics, but it's just a way of looking at MCF data - it would be relatively easy to create a different interface to the same data.
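As a rough illustration of one description feeding several displays, the Python sketch below renders the same small hierarchy two ways: as an indented outline (loosely like a View by Name list) and as a flat list of paths. The data, the two functions, and the " > " separator are invented for the example and have nothing to do with how HotSauce itself works.

```python
# One hypothetical hierarchy, described once.
tree = {
    "name": "Macintosh",
    "children": [
        {"name": "Magazines", "children": [{"name": "MDJ", "children": []}]},
        {"name": "Software", "children": []},
    ],
}


def outline_view(node, depth=0):
    """Indented outline: two spaces of indentation per level of nesting."""
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        lines.extend(outline_view(child, depth + 1))
    return lines


def path_view(node, prefix=""):
    """Flat list: every object shown with the full path that leads to it."""
    path = prefix + node["name"]
    paths = [path]
    for child in node["children"]:
        paths.extend(path_view(child, path + " > "))
    return paths


print("\n".join(outline_view(tree)))
print("\n".join(path_view(tree)))
```

Neither function changes the data; each is just a different way of looking at it, and a third view could be added without touching the description.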
If every Web site generated an MCF description of itself, you could fly through any site and find the information most relevant to you without using a site map (which may not be useful at all; some Web site maps are woefully inadequate), or you could search the site as if it were a Finder window. An MCF-viewing Live Object would add that capability to any OpenDoc container on your Macintosh.
The same MCF browser or viewer part could take you through your own hard disk, through Yahoo's Web pages, through every Web site with an MCF description - even through a database that has an MCF description (imagine browsing huge databases as you could your own hard disk!) - through just about anything at all.
That's why Apple executives say MCF will do for databases what HTML does for text. If it's adopted by the world at large, as an IETF standard or otherwise, they could be right.
Competing Meta-Content Standards -- There have been other efforts to create a standard description for content, but none has a company like Apple behind it. Further, Apple's MCF inventor, R.V. Guha, has built upon the work of committees investigating such possibilities, including the Dublin Core group that has a preliminary standard. MCF is in its early stages; though Dublin Core is a little bit more academically inclined and bears a resemblance to library cataloging structures, Guha's white paper says there's no reason why the benefits of Dublin Core can't be expressed in MCF with some work to define a syntax.
What about Microsoft's Nashville project? Nashville is the code name for Microsoft's "Internet Add-On Pack," expected to come soon for Windows 95 and Windows NT platforms (apparently now also called "Active Desktop"). It's been described in the press as "building the browser into the operating system," and is supposed to include a way to let you view your hard disk as a Web page, complete with hyperlinks. It does exactly that, according to my research.
What Nashville does not do is describe both Web pages and hard disk contents in a meta-content format, then use an MCF-like technology to view both. Nashville replaces (or adds to) Windows' desktop program (their Finder, if you will) by sharing code with Microsoft Internet Explorer 4.0. If you move the discussion to more familiar Macintosh terms, then with Nashville, Web windows could open in the Finder without launching a separate browser program (just like sounds and clippings files), and you could even change your desktop to display live Web content instead of just file icons and Finder windows. You could also embed Finder-like panes into Web pages or documents.
Microsoft does all this without a meta-content format by using an ActiveX control to display file and folder views inside Internet Explorer windows. The browser itself doesn't know anything about the hard disk; it just knows about ActiveX and has an ActiveX component that knows about the hard disk. (In our earlier example, OpenDoc wouldn't know about MCF, but an MCF Live Object could give that functionality to every OpenDoc document.)
Nashville's technology is nice. A future version of the Macintosh OS could go even further with OpenDoc because OpenDoc can embed any Live Object, where Nashville appears to embed only ActiveX controls inside Web browser windows or panes (you couldn't, as I read Microsoft's descriptions, have a large spreadsheet with some Web content embedded in it, but you could have a large Web page with spreadsheet content embedded in it).
Nashville is likely to be available before the IETF does serious work with MCF, but since the two are not competing standards, that shouldn't make any difference except in public perception. Microsoft hasn't come out against MCF, and if MCF takes off as it could, Microsoft will probably embrace it as quickly as any other Internet-savvy company.
Access to data isn't a problem anymore, but finding useful data is becoming extremely difficult with the proliferation of sources. MCF is a potential way to make the growing Internet a little more manageable, and I can see why Apple is excited about it.