This paper presents BMF (the Burr Metadata Framework), an XML-based framework for creating integrated libraries of metadata and encoded documents. BMF is in large part an extension and expansion of the FRBR (Functional Requirements for Bibliographic Records) model proposed by IFLA, and it uses standard thesaurus relationships to create complex, scalable, hierarchical structures.
BMF [Burr Metadata Framework] uses a shorthand notation to map out the hierarchical relationships between Burrs (the basic record-level building block in BMF).
Each line represents a single Burr, concept or term in a hierarchy. Boldface text marks the logical focus of the map: lines above it are broader terms, and lines below it at the same level of indentation are equivalent terms. Lines below it with greater indentation (depth is indicated by one period for each level in the hierarchy) are narrower or related terms.
Each line is made up of four fields separated by whitespace.
PT per .. Dick, Philip Kindred.
Where :
For example.
BTI work Ann Charters's Intro to "The special view of history".
PT expr . original text
NTP div .. body of text.
Note: The entity code field may be omitted in examples discussing relationships between terms, but it is required when discussing relationships between Burrs.
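The examples above suggest that the four fields appear in this order: relationship code, entity code, depth marker (one period per level, absent at the top level), and label. The following is a minimal Python sketch of a single-line parser under that reading. The `BurrLine` class, the `parse_line` function and the `ENTITY_CODES` set are hypothetical; the set holds only the entity codes that appear in the examples, so a real tool would need the full list.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Relationship codes listed in this paper (Z39.19 codes plus the BMF additions).
RELATIONSHIP_CODES = {
    "BT", "BTG", "BTI", "BTP", "GS", "NL", "NT", "NTG", "NTI", "NTP",
    "PT", "RT", "TT", "U", "UF", "UF+", "BTR", "NTR", "PRE", "NEX",
}

# Hypothetical, partial set: only the entity codes used in the examples above.
ENTITY_CODES = {"per", "work", "expr", "div"}


@dataclass
class BurrLine:
    relation: str            # e.g. "PT"
    entity: Optional[str]    # None when the entity code is omitted
    depth: int               # number of periods; 0 when there are none
    label: str               # the term or Burr title


def parse_line(line: str) -> BurrLine:
    """Parse one line of the shorthand notation (assumed field order)."""
    tokens = line.split()
    if not tokens or tokens[0] not in RELATIONSHIP_CODES:
        raise ValueError(f"unknown relationship code in {line!r}")
    relation, rest = tokens[0], tokens[1:]

    entity = None
    if rest and rest[0] in ENTITY_CODES:      # optional entity code
        entity, rest = rest[0], rest[1:]

    depth = 0
    if rest and re.fullmatch(r"\.+", rest[0]):  # depth marker: ".", "..", ...
        depth, rest = len(rest[0]), rest[1:]

    return BurrLine(relation, entity, depth, " ".join(rest))
```

Restricting entity codes to a known set avoids mistaking a short lowercase word at the start of a label for an entity code when that field is omitted.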
The following relationship types are defined in ANSI Z39.19: Guidelines for the Construction, Format and Management of Monolingual Thesauri. [Z39.19]
All relationship codes should use uppercase characters.
| BT | broader term |
| BTG | broader term (generic) |
| BTI | broader term (instance) |
| BTP | broader term (partitive) |
| GS | generic structure |
| NL | node label |
| NT | narrower term |
| NTG | narrower term (generic) |
| NTI | narrower term (instance) |
| NTP | narrower term (partitive) |
| PT | primary term |
| RT | related term |
| TT | top term |
| U | use |
| UF | used for |
| UF+ | used for ... and ... |
In addition to the above relationships defined in Z39.19, BMF also uses the following codes:
| BTR | broader term (responsibility) |
| NTR | narrower term (responsibility) |
and the following codes, which are used in Burrs, such as a chapter or a story, that are collected into a compound document:
| PRE | previous node |
| NEX | next node |
All entity codes should use lowercase characters and be three to four characters in length.
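Most of the relationship codes come in reciprocal pairs. Z39.19 pairs BT with NT (and likewise the generic, instance and partitive variants), RT with itself, and USE with UF (written U and UF above). Treating BTR/NTR and PRE/NEX the same way is an assumption. The Python sketch below, using the hypothetical names `RECIPROCALS`, `reciprocal` and `with_reciprocals`, shows how a tool could generate the inverse of each stated relationship so that a hierarchy can be walked from either end.

```python
from typing import Dict, Iterable, Set, Tuple

# Reciprocal pairs from Z39.19 (USE/UF written as U/UF, as in the table above).
Z3919_PAIRS = {"BT": "NT", "BTG": "NTG", "BTI": "NTI", "BTP": "NTP",
               "RT": "RT", "U": "UF"}
# Assumption: the BMF additions are reciprocal in the same way.
BMF_PAIRS = {"BTR": "NTR", "PRE": "NEX"}

# Build a lookup that works in both directions.
RECIPROCALS: Dict[str, str] = {}
for code, inverse in {**Z3919_PAIRS, **BMF_PAIRS}.items():
    RECIPROCALS[code] = inverse
    RECIPROCALS[inverse] = code

Triple = Tuple[str, str, str]  # (subject, relationship code, object)


def reciprocal(code: str) -> str:
    """Return the inverse code, e.g. BT -> NT; codes such as TT have none."""
    if code not in RECIPROCALS:
        raise ValueError(f"{code} has no reciprocal in this table")
    return RECIPROCALS[code]


def with_reciprocals(triples: Iterable[Triple]) -> Set[Triple]:
    """Return the given relationships plus the inverse of each one."""
    triples = list(triples)
    closed = set(triples)
    for subject, code, obj in triples:
        closed.add((obj, reciprocal(code), subject))
    return closed
```

Storing only one direction and deriving the other keeps the two sides of each relationship from drifting out of step.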
If you do not work on an important problem, it's unlikely you'll do important work. It's perfectly obvious.
—Richard Hamming, You and Your Research [HAMMING]
In late 1997 I was sitting in Osaka, at a cramped desk on the top floor of a musty, cold, cluttered office, stinking of stale cigarettes, when I read the following:
There is no useful distinction between the representational needs of data and metadata. The kinds of information that need to be represented in metadata and data are very similar. Furthermore, every item of information, without exception, is likely to be regarded by some applications as ancillary and never to be displayed, and by others as core content that needs to be formatted, printed, or searched.
—Meta Content Framework Using XML [GUHA]
I knew at that moment that this was the insight which would be at the core of, maybe not the next generation of the Web, but perhaps the one after.
At that time the Dot-Com bubble was ready to pop, and nowhere was this more keenly felt than in Japan, which was still stinging from the collapse of a monster bubble economy that had nearly wrecked the country some years before. The Web had no business or revenue models at that time. It was all just smoke and mirrors.
So I packed it in, tried to prepare for the crash, moved to the backwaters of Thailand and turned my efforts to the next generation of the Internet, an Internet which would have oodles of bandwidth into every home and office, an Internet with a browser which could support powerful and mature applications that could get real work done, an Internet with a business and revenue model that you could make real money from.
It was in this context that I latched onto the idea that information has a dual nature, like the particle-wave nature of light.
The lack of any real metadata and cataloging of Web resources was such an obvious problem that, at the time, it seemed that if you could crack the problem of providing a universal metadata system, you'd have everything.
I wasn't alone. It looked like Tim Berners-Lee over at the W3C was thinking along the same lines. But his approach with the Semantic Web, although brilliant, didn't feel right. It felt like a cop-out.
Just because the problem of adding metadata was difficult, people gave up working on it. Everyone threw up their collective hands and said "We'll never get people to do metadata, so let's try to find a way of automating the process and let the machines distill meaning from chaos". The shortcomings of metadata systems were brilliantly summed up by Cory Doctorow in his essay "Metacrap". [DOCTOROW]
But that was six years ago, and as they say in New England, if you don't like the weather, wait ten minutes. This is especially true on the Internet.
We now have Wikipedia1, Distributed Proofreaders2, Del.icio.us3, Flickr4, and Technorati5.
Metadata is a matter of priorities, not of how much work it takes. If you can get tens of thousands of people to volunteer every day to proofread mind-numbingly dull texts like lists of copyright renewals, nothing is impossible.
What the Semantic Web crowd was really missing was that automatically organizing, sorting and uncovering patterns in collections of data is not an end in itself. Search is not everything. It's the process of organizing, sorting, abstracting and cataloging that leads to meaning and ultimately to understanding. In other words, it's the process that results in the knowledge we use to make decisions.
BMF is designed to be not only a content or metadata framework, but an infrastructure for the process of learning, creating, sharing, collaborating and remembering.
That's about as important a problem and design goal as you can hope for.
The tree is already the image of the world, or the root the image of the world-tree. This is the classical book, as noble, signifying, and subjective organic interiority (the strata of the book). The book imitates the world, as art imitates nature: by procedures specific to it that accomplish what nature cannot or can no longer do.
—Gilles Deleuze, Rhizome Versus Tree [DELEUZE]
The small compass in which the eye can see clearly is little more than a knothole through which we are continuously taking a series of snapshots the brain uses to form a composite image, tricking us into thinking that we live in a panorama of clarity.
Memory is the mid-day light cast through the canopy of a grove of birch on a clear August day, coloring and mellowing the carpet of yellow leaves rustling and crunching beneath our passing feet. It is not the world, it is just what our feeble senses can take in, and even that is more than the brain can process and store.
So if the book, as Deleuze said, is an imitation of the world, it is an imitation twice removed from the world it seeks to ape. And if art imitates nature, it also captures our perception of nature so that others, twice more removed can see with another's eyes what has been mulched by another's mind.
Much of our lives is spent sorting, organizing and picking out patterns in this cacophony of distorted information, which interleaves the clear, the fuzzy, and a whole lot of line noise in between.
Every and all is schlepped onto the scales and weighed. All so that we can decide what to do.
I would be an historian as Herodotus was, looking for oneself for the evidence of what is said.
—Charles Olson, Maximus Poems, Letter 23
Isn't this exactly what we do with information? The evidence comes to us directly through observation, but also, again removed, as hearsay, tales told in bars through the amber lens of a pint glass. It's the oral tradition, the most immediate form of human intercourse.
But noise is added with each remove. And it's that noise that man has worked so hard to minimize. So we actualize human language through writing systems, but the duplication and distribution of that writing introduced a different kind of noise, which Caxton's ink-stained fingers finally remedied with blocks of lead viced together into a mirror image of what was written.
We have learned to capture light and fix it on paper. We have given it the illusion of motion by exceeding the brain's ability to discern change. We have fixed sound by reducing it to a groove etched in wax, which can be reanimated on a whim by pushing a paper cone fourteen thousand times a second.
So armed we can now transpose the garbage our senses take in, and the garbage our brains pass out and fix it all into something that is nothing short of magic! A click of a shutter, the ball on the point of a pen applying a smooth ellipse of ink on a piece of paper and we can teleport our memory and experience through any measure of space or passage of time.
Think of it. The words that Homer fixed in his present can become anyone's present so long as his words are not lost. Homer is our contemporary, as is anyone who has fixed some fragment of mind, no matter how trivial the catharsis, and passed it into the physical world.
We are what we pass on, in body, memory, experience and mind. But we are also the by-product of what others have passed on to us.
But the noise still bugs us. And with each reduction in noise, the bombardment of information is stepped up a notch.
Cut the noise and you are punished for your innovation with not just an equal but an exponential bombardment of new information.
"On the Internet, nobody knows you're a dog," or at least so the tagline went in the early 90's. Cyberspace was thought of as being disconnected from the physical world. The interfaces were so abstract, and the few people on the Net were so geographically spread out across the planet, that it certainly did feel that way. But what we were forgetting was that Cyberspace only existed because its entire population was pounding away at keyboards in darkened rooms which were unquestionably still in meatspace.
Ideas are not physical in any sense. The products of intellectual and creative work are not property, but shadows cast by the mind as part of a process of taking in the world through the senses and then trying to make sense, identify, label, define and eventually understand in order to take action.
So cyberspace is a collective tapestry of our minds' interpretations of what our senses have gathered, overlaid and interwoven through the world.
The process of digitalization can be thought of as a technological consolidation of all our different technologies for fixing what we experience, and our interpretations of it, into a single system where all forms of writing and of recording images and sound are interoperable. The network revolution is a complementary consolidation of communications, broadcasting and publishing.
But what has not yet happened is the corresponding shift in how we use this new medium.
We live in turbulent times, much like the end of the 19th century as the horse was rudely goosed to the side of the road by a puff of steam. But it wasn't steam that displaced the horse, it was the internal combustion engine which finally did that.
Before the telegraph, communication was a form of time travel where most information described events after the fact. An earthquake in Tokyo was something that happened in the past to someone living in London and vice versa.
The telegraph transformed communication so that everything that happened, happened everywhere simultaneously. Think of that. All those dots and dashes, tapped out across the wires, a beat you could almost tap your foot to and about as abstract in the moment as Brancusi's "Symbol of Joyce" (and if you got that one, I am truly sorry for you).
Like steam, the telegraph changed communications, but did not transform them for the average man. This feat was accomplished by the telephone, radio and television which brought us together into the same room.
Collapsing time changes our perception of space. In the global village, everyone is your neighbor. And every day, people all over the world turn on their televisions and see their new neighbors and mumble under their breath, "there goes the neighborhood..."
The PC revolution, the Windows GUI and the Office Suite are a lot like steam. They are clunky, transitional technologies which got people to adopt them, but aren't as revolutionary as they like to think of themselves.
Information in the 19th century was based on paper. Communications, entertainment, business, government and even organized religion all used paper as the means of creating, organizing and controlling information. And, as has been said, the way an organization organizes information is the way it organizes power in that organization. Since all information was on paper, paper became synonymous with information.
The computer came along with a new way of creating and organizing information, but most people couldn't imagine information without paper, so there was little interest or adoption of computers as personal tools until the Desktop GUI, Word Processor, Spreadsheet, and presentation software gave us paper metaphors for using computers to work with information.
That paper crutch is beginning to show its age, and it's time for us to begin moving to a new conceptual framework for finding, creating, organizing and sharing information to replace it, just as the internal combustion engine replaced the steam engine.
Only when this happens will we truly have begun to live in the networked computer age.
This is where we begin.
This paper assumes the reader has a working knowledge of XML and of the basic concepts in IFLA's FRBR [Functional Requirements for Bibliographic Records] 6 and ANSI Z39.19 [Guidelines for the Construction, Format, and Management of Monolingual Thesauri] 7.
It's strongly suggested that the reader keep copies of these papers as companions to this paper.
At the time of writing (April 2006), BMF has a stable core feature set. A working schema is in place, as well as a usable alpha version of a BMF browser and development environment.
In August 2006, the BMF Guidelines (which will be a greatly expanded version of this paper) will be released for public comment together with the BMF schema, a comprehensive set of BMF-encoded content for testing applications, and a content browser and development environment running in the Emacs text editor.
BMF will be released as an open specification under a free license.
A tragic sigh. "Information. What's wrong with dope and women? Is it any wonder the world's gone insane, with information come to be the only real medium of exchange?"
"I thought it was cigarettes."
"You dream." ....
—Gravity's Rainbow, pg. 258.
BMF [Burr Metadata Framework] is built on a number of core concepts which, taken together, form a vision for the next generation of the Internet, digital content and communications.
These concepts are the consequence of the two central trends which have sparked the twin Digital and Network revolutions.
Just to get these out of the way, these are:
These two trends have a number of consequences, some of which we are already aware of and others we are just beginning to recognize.
Much of this can be described in terms of information having a dual nature, which is discussed in the next section, and can be summed up in the following five assumptions on which BMF is built:
BMF also draws on a number of other key concepts which include:
Physical media comes with a lot of baggage. In many respects, since Caxton, mankind has increasingly based whole civilizations on this baggage.
Digitization and networking have all but removed the limitations that physical media impose though most people haven't realized this yet. Centuries of living within those confines have led us to believe that they are universal laws which can't be challenged.
The limits of physical media are physical — you can only fit so many words on a page, only bind so many pages into a book before it gets too big to handle.
Once you have divided words into volumes you need a means of organizing the information in each volume. It's practically impossible for a library to create a single index for every keyword in every book in the collection, or to create a single table of contents, so these navigational devices were created only at the level of single volumes. Library catalogs could practically only seek to treat each volume as an item, so the catalogs stopped at the covers of the books.
Significant physical resources are required to duplicate and distribute physical media, and the economics favor larger volumes containing a lot of information over smaller publications. So smaller texts were collected into larger volumes, individual songs were collected into LPs (long-playing record albums), and so on.
After you strip away the paper from a text, the vinyl from a record album, or the film from an image, one of the first things that starts to become apparent is that those divisions are indeed artificial and that when they are removed information begins to behave as if it has a dual nature like the dual particle-wave nature of light.
BMF is based on five general principles for how this dual nature applies to information.
The idea that data and metadata are interchangeable is both natural and astonishing at the same time.
We think of metadata as a description of something else, in the way that a card in a library catalog is an external description of a resource in a library.
But a collection of bibliographic data on a particular subject becomes a bibliography, which is a work in its own right. The title page in a book, the liner notes in an album or a telephone directory can all be thought of as data in one context or metadata in another.
If metadata and data are indeed interchangeable, then metadata is not inherently external. This leads us to a very different concept of metadata.
Metadata is not simply a description of data, but a less detailed view of that data. Metadata is data seen at a distance.
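One way to picture "data seen at a distance" is a single tree rendered to different depths: cut off near the root it reads like a catalog record, and rendered in full it is the document itself. The Python sketch below illustrates that reading only; the `Burr` class and the `view` function are hypothetical and do not reflect the BMF schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Burr:
    label: str
    children: List["Burr"] = field(default_factory=list)


def view(burr: Burr, depth: int, indent: int = 0) -> List[str]:
    """Render a Burr and its descendants down to `depth` levels below it."""
    lines = ["  " * indent + burr.label]
    if depth > 0:
        for child in burr.children:
            lines.extend(view(child, depth - 1, indent + 1))
    return lines


book = Burr("work: The special view of history", [
    Burr("expr: original text", [
        Burr("div: body of text", [Burr("(the full text)")]),
    ]),
])

# Seen from a distance: only the top two levels, much like a catalog entry.
print("\n".join(view(book, 1)))
# work: The special view of history
#   expr: original text
```

Nothing separates the "record" from the "document" here except how far down the tree the view goes.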
For our purposes, the document and the library are essentially the same. In other words, the traditional library-document dichotomy can be viewed as a smooth spectrum, which we consider as a whole.
Towards one end of the spectrum, the number of authors decreases and the topics under discussion become more integrated, and the information artifacts look more document-like. Towards the other end, the number of authors grows and the semantic gaps between topics increase, and the information artifacts become more library-like.
—A Scholia-based Document Model for Commons-based Peer Production, Joseph Corneli and Aaron Krowne [CORNELI]
The illusion of the distinction between document and library is in large part a by-product of the limits of physical media and Caxton's printing press.
Before Caxton, the distinction between a work and a library was far less clear, as was authorial ownership of documents and all sorts of other assumptions that we take for granted today. We'll come back to this point again later.
Once you have digitized all the works in a library and placed them within a single framework, the distinction is far less clear.
For example, in a digital library you can have one index rather than a different index at the end of every document. The table of contents, which is a tree, can be merged with the tables of contents of all the other works in the library into a single tree.
The library catalog can be merged with all of the works it describes, so that a bibliographic record is a description of a work at a distance.
Links between documents can lead directly to any part of any other document without the reader having to open the document like the cover of a book, work out the organization of the work and only then find the passage that was being referenced.
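To illustrate the single index, here is a minimal Python sketch of one inverted index built across every work in a collection, where each entry points to a specific passage rather than to a whole volume. The `Library` layout (a work title mapped to section paths and their text) and the `build_index` function are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical layout: work title -> {section path -> passage text}.
Library = Dict[str, Dict[str, str]]
Address = Tuple[str, str]  # (work title, section path)


def build_index(library: Library) -> Dict[str, Set[Address]]:
    """One index for the whole library, addressed down to the passage."""
    index: Dict[str, Set[Address]] = defaultdict(set)
    for work, sections in library.items():
        for path, text in sections.items():
            for word in re.findall(r"[a-z']+", text.lower()):
                index[word].add((work, path))
    return dict(index)
```

Because every entry carries a section path, a link built from the index can land directly on the passage instead of on the cover of the work.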
Many books and sound recordings are not mutually exclusive, but are collections of a number of smaller documents or songs which could easily stand on their own.
In some cases, the collection itself has value as a work in its own right, but this does not take away from the fact that the parts could stand on their own.
Encyclopedia articles, main entries in dictionaries, newspaper stories and even chapters in many books could stand on their own without the reader needing to see any other part of the collection.
Many collections exist for the sole purpose of making the amount of content that is sold on physical media viable as a commercial product. Sound recordings are well known for including songs of dubious quality to make an album with a few popular singles long enough to sell as an album and justify a themed concert tour.
But the MP3 revolution, and more recently iTunes and the iPod, have brought back a new age of singles. iTunes downloads are the digital equivalent of the old 45 rpm records which were the backbone of the recording industry during the 50's and 60's, when radio was the chief marketing vehicle for music.
The first decade of the World Wide Web was based in large part on the idea of a Web Site being a mutually exclusive collection of information. In effect, Web Sites were treated as self-contained works like a physical book. Imposing the limits of physical media on electronic media is a theme which has been repeated over and over.
For the Web, RSS [Really Simple Syndication] blew this idea out of the water by breaking up content so that individual articles on the Web could stand on their own, irrespective of the Web Site that published them.
The relationship between text and commentary is probably as old as texts themselves.
Commentary can take all sorts of forms, such as footnotes, glosses scribbled in the margins of a book, or notes made while reading a book for a class. Commentary can be as small as a single word or as large as a multi-volume work composed by an army of scholars.
Commentary made by an authoritative person with lots of letters tagged on the end of their name, and published along with a document, is not functionally or practically any different from notes scribbled by a high school student doing homework on the kitchen table.
Such commentary is often a marketing function for a publisher, who is trying to add value to a work (which might be in the public domain) to try to coax readers to purchase their edition over another.
This is not to say that such commentary is not useful or important. It is enormously important to provide context and insight into texts which were based on common knowledge used within a narrow discipline or general knowledge from a past age.
Once commentary is understood to be simply a text, which has as a subject another text, irrespective of who wrote it or how it is published, then all commentary becomes an extension of and part of a work and by extension, the collective content of a library.
It could be said that the Internet itself is all commentary. Email between friends or in a discussion group on Usenet or a list-server, threaded comments on Slashdot8, tags and comments about images on Flickr, bookmarks on del.icio.us, reviews on Amazon Books, and of course the entire blogosphere are all a relentless tidal current of commentary that ebbs and flows across the planet as each timezone passes from day into night.
Everything in Lisp is a list. There is no useful distinction in Lisp between the code and the data it is processing.9
The expression (+ 2 2) which is the way you write "2 + 2" in Lisp is a list with three elements where the first item is a symbol which represents a function ("+" is the name of a function which adds numbers together) and the second and third items are the numbers "2" and "2".
Documents which are marked up as Lisp data structures can be thought of in one context as a document, and in another as a program which can be evaluated (or invoked) to get a result.
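A toy Python model of the (+ 2 2) example makes the dual reading concrete: `read` turns the text into a three-element list that can be inspected as data, and `evaluate` runs the same list as a program. This is not how Lisp or Emacs is implemented; `read`, `evaluate` and `FUNCTIONS` are hypothetical names, and the environment supports only `+` and `*`.

```python
import math
from typing import List, Union

Expr = Union[int, str, List["Expr"]]

# Hypothetical minimal environment: symbol -> function.
FUNCTIONS = {"+": lambda *args: sum(args), "*": lambda *args: math.prod(args)}


def read(source: str) -> Expr:
    """Turn text such as "(+ 2 2)" into nested Python lists."""
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos: int):
        token = tokens[pos]
        if token == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1          # skip the closing ")"
        try:
            return int(token), pos + 1     # a number
        except ValueError:
            return token, pos + 1          # a symbol such as "+"

    expr, _ = parse(0)
    return expr


def evaluate(expr: Expr):
    """Treat a list as a function call; numbers evaluate to themselves."""
    if isinstance(expr, list):
        head, *args = expr
        return FUNCTIONS[head](*(evaluate(arg) for arg in args))
    return expr


expr = read("(+ 2 2)")
print(expr, len(expr), evaluate(expr))  # ['+', 2, 2] 3 4
```

The same value, `expr`, is a three-item list when we count its elements and a program when we hand it to `evaluate`.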
To understand this, think of Harry Potter, who lives in a world where magic is real. In Harry Potter's world, a device like a wand is used to invoke spells which are spoken. This results in some kind of action, which can be anything from levitating a chair to erasing someone's memories.
Among other things, magic is based on the premise that human language, when used by someone with the appropriate skill and innate ability, has the power to affect the physical world around us. Speaking, or incanting, a spell invokes unseen powers which can move and manipulate physical objects.
This belief is as old as humanity. Written texts in some contexts are believed to have magical powers in their own right. Sacred texts like the Bible are thought by believers to have the power to protect them from evil, and invoke supernatural powers.
I am writing this paper using Emacs, a text editor written in Lisp. I can move my cursor to just after the expression (+ 2 2) on the screen and invoke the expression with a tap of my wand (by holding down the Control key and typing "x e", the key sequence Emacs writes as C-x C-e). The number "4" is returned in the echo area at the bottom of the frame.
A hypertext link on a Web page behaves in a similar way. When you click on a link and the browser opens up another page, you are invoking the link made between two documents.
The distinction between text and code will gradually fade. Twenty years from now, we could well have a generation of children who will have a difficult time thinking of a text as being an inert chunk of information permanently stamped on physical media.10
When content recorded on physical media has been digitized and placed in a larger framework, you have in fact ripped the covers off all the books and tossed all of the jewel cases and album sleeves (if you are old enough to remember those) into the bin.
The PC revolution was based on convincing people that computers were just electronic versions of what they already knew. And what people knew was paper.
The desktop metaphor at the heart of the graphical user interface is based on manipulating and managing pieces of paper.
The now ubiquitous "Office Suite" is little more than a metaphor for its paper counterparts. Word processors are typewriters, spreadsheets are ledgers, and presentation software like Powerpoint is foam core on an easel.
The Web too is built on paper metaphors. The Browser Wars were driven, at least in part, by the addition of proprietary features by Netscape and Microsoft that people were demanding to make Web pages look and feel more like paper based documents, magazines and catalogs.
Many traditional publishers who established Web sites brought with them the same territorial attitude that they had about physical media. They wanted people to first visit their home page before seeing any other content on the site, in the same way that you have to see the dust jacket of a book before seeing what's inside.
The consequence of the digitalization and networking of all content and communications is to erase the illusion, created by the limits of physical media, that each work is a self-contained universe.
The first major crack in the paper legacy was with the widespread adoption of P2P. Napster so completely destroyed the music record album as a mutually exclusive unit of content that the recording industry was left dumbstruck and it was left to companies like Apple with iTunes and Musicmatch to cash in on the new era of music singles.
The second great fissure was RSS which pulled content from millions of blogs into a breathtaking interconnected Web of content, rather than just a network of Web Sites.
Much of the anguish and beating of breasts by publishers and authors when Google Print was launched has nothing to do with copyright violations. What really scared them, though they probably didn't know it, was that Google had violated the sacred covers of the book and replaced the index at the back of the book with an index which could be used for all books ever written. Google had ripped off the covers and shattered the illusion that a book was a self-contained universe which can't be messed with.
This was as rude a shock to the publishing world as P2P was to the film and music industries. It never occurred to anyone that something as sacred as the covers of a book could be violated. The novelist John Updike recently summarized these sentiments in an anti-ebook rant in the New York Times, heavily laden with nostalgic memories of bookshops. [UPDIKE]
This same process will be repeated again and again at all levels of the information hierarchy until everything has been digitized and assimilated into a single global fabric of information containing all of mankind's experience and memory.
The Lisp concept of the REPL [Read Evaluate Print Loop] is all around us. Any process that collects information, requires you to do something with it, and then leads to some kind of action is an instance of the REPL.
The term REPL comes from the process used to write Lisp programs. But it is also a good way of thinking about more general and practical issues of how humans work and process information.
Lisp is a programming language which has been around since 1958. In fact, the only programming language older than Lisp that is still in active use is Fortran. Lisp was far ahead of its time. Many of its most powerful features have only been introduced into more popular languages like Perl and Python in the last few years. Many people still consider Lisp to be more powerful than any other programming language. The Read, Evaluate, Print Loop (REPL) is a part of the Lisp development environment for writing Lisp programs.
Lisp languages are frequently used with an interactive command line, which may be combined with an integrated development environment. The user types in expressions at the command line, or directs the IDE to transmit them to the Lisp system. Lisp reads the entered expressions, evaluates them, and prints the result. For this reason, the Lisp command line is called a "read-eval-print-loop", or REPL.
—Wikipedia: Lisp programming language [WIKIPEDIA-LISP]
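The loop itself is small. Below is a minimal Python sketch of a read-eval-print loop with the three steps passed in as functions; the names `repl` and `read_numbers` are hypothetical, and the toy reader simply adds up whitespace-separated integers. Results are collected into a list instead of being written to a terminal, which makes the loop easy to test.

```python
from typing import Callable, Iterable, List


def repl(inputs: Iterable[str],
         read: Callable[[str], object],
         evaluate: Callable[[object], object],
         show: Callable[[object], str]) -> List[str]:
    """Read each input, evaluate it, and 'print' the result until input ends."""
    transcript: List[str] = []
    for text in inputs:                  # Loop: keep going while there is input
        form = read(text)                # Read: turn text into a structure
        value = evaluate(form)           # Eval: compute a result from it
        transcript.append(show(value))   # Print: render the result
    return transcript


def read_numbers(text: str) -> List[int]:
    """Toy reader (hypothetical): whitespace-separated integers."""
    return [int(token) for token in text.split()]


print(repl(["2 2", "1 2 3"], read_numbers, sum, str))  # ['4', '6']
```

Swapping in a different reader, evaluator or printer changes what the loop does without changing its shape, which is the sense in which the REPL describes more than programming.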
So why are we using the term REPL? After all, we could just as easily call it the "Search, Process, Publish Cycle" or SPPC. Is there a reason for using such obscure hardcore geek terminology? Well, yes.
The REPL embodies both the human process, as well as the machine process and keeps in mind our fifth principle that there is no useful distinction between text and code.
One of the simplest and most elegant examples of the REPL is found on practically every desk in every office in the form of the ubiquitous in-tray, pending-tray and out-tray.
Information is dropped into your in-tray. In many offices there is a cover note which indicates where the information came from, who sent it, what action you are required to make and then a list of other people who are expected to receive the information.
You take a look at it and evaluate it. Then you deal with it right away, perhaps just by reading it and marking on the note that you've seen it. You then drop it into your out-tray, and it is picked up and filed or passed on to the next person in the chain.
If you can't evaluate something right away it is then put in a pending-tray to be evaluated at a later time.
Many people also keep in- and out-trays on their desks at home, but in many cases (including mine) they tend to fill up without anything ever moving out of the in-tray. Over time, the pending and out-trays just become holders for the overflow once the in-tray has reached capacity.
The reason for this is that there is no mechanism, like the cover or action note attached to items, to keep information flowing, and no one to pick up things from the out-tray and pass them on to others.
We will come back to this point later.
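The missing mechanism can be sketched in a few lines of Python: each item carries a routing list that stands in for the cover note, and a `dispatch` step plays the person who empties the out-trays. `Item`, `Desk` and `dispatch` are hypothetical names, and the `can_handle` callback stands in for whatever judgment decides between acting now and deferring.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Item:
    content: str
    routing: List[str]                            # hypothetical cover note: who sees it
    seen_by: List[str] = field(default_factory=list)


@dataclass
class Desk:
    owner: str
    in_tray: Deque[Item] = field(default_factory=deque)
    pending: List[Item] = field(default_factory=list)
    out_tray: List[Item] = field(default_factory=list)

    def work(self, can_handle: Callable[[Item], bool]) -> None:
        """Evaluate everything in the in-tray once: act now or defer."""
        while self.in_tray:
            item = self.in_tray.popleft()
            if can_handle(item):
                item.seen_by.append(self.owner)   # initial the cover note
                self.out_tray.append(item)
            else:
                self.pending.append(item)


def dispatch(desks: Dict[str, Desk]) -> None:
    """Move each out-tray item to the next unseen desk on its routing list."""
    for desk in desks.values():
        for item in desk.out_tray:
            remaining = [name for name in item.routing if name not in item.seen_by]
            if remaining:
                desks[remaining[0]].in_tray.append(item)
        desk.out_tray.clear()
```

In this sketch an item that has been seen by everyone on its routing list simply drops out of circulation, which is the point at which a real office would file it.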
An idea often precedes and triggers the REPL. It can be anything from something funny in an email which you want to remember to a news story about a new product you think you might be interested in. Any information that you find interesting, that you want to remember or know more about, or that might interest someone you know is all fodder for the REPL.
Sometimes this will lead to an action, such as writing a report or proposal, making a purchase or changing jobs.
The REPL represents a process which employs any number of techniques and approaches. It's worth looking at each step in the loop.
The read process of the loop includes searching, collecting and remembering information that we are looking for or that we come across.
Searching and collecting information is a continuous, ongoing process. Sometimes it is done deliberately; at other times information may arrive in an email or be dropped in the in-tray and kept until it can be evaluated at a later date. It's common to collect information on specific topics over days, weeks, months and even years before there is enough to act on.
In many respects, the evaluation process is the most important part of the REPL and oddly enough, it's the part that has received the least attention from software developers.
The evaluation process takes what we find and collect, makes sense of it, and decides on the actions to take based on what we come up with. It includes a wide variety of techniques, which each person uses in any number of ways depending on their preferences and the job at hand.
This requires a set of tools which can be as simple and general, or as complex and fine-grained, as is needed. Evaluation is as much a creative process as a formal one, and tools should be flexible enough to work with whatever information you have, rather than imposing limits on how you can display, edit, sort or publish that information.
The print process involves editing the results of the evaluation process into a format that others can understand, and then distributing it.
The most informal way to do this is through email. Email allows us to easily exchange information with other people or groups (mailing lists).
In the past few years, blogs have emerged to fill a need for publication which is more formal than email, but far less formal than traditional publishing. A blog is what we come up with after going through the initial evaluation process. It's a means of getting feedback on ideas in progress and whatever other half-baked stuff is going on in our brains.
For formal publication there are journals, newspapers and books, which have traditionally been paper-based but are increasingly being replaced by Web-based services.
But formal publication is not just a matter of a single person making something public. Publication is a collaboration which requires intermediate steps, including peer review (or review by some other authority), copy editing and formatting, which follow conventions that make it easier for people to understand what is being published.
The print process is a means of travel, both through space and time. The added steps taken for formal publication are important in order for works to become part of mankind's collective knowledge.
The feedback we get from the print process is then fed back into the beginning of a new loop to be read again. This will then spark new ideas which prompt us to search, collect and remember new things which are passed on to be evaluated again.
This process is used to understand change, to make decisions, and to contribute new information for publication and addition to mankind's collective knowledge and memory.
There is still the problem of exchanging information between individuals, between groups, and between individuals and groups.
Each person or group is surrounded by a sphere of information which is processed using the REPL. This information sphere is made up of all of the email, notes, addresses, receipts, images, media, publications and other information which has been collected by the REPL process and makes up the base of information which is used to understand the world around us and to make decisions on how to deal with the world as things change.[ENGELBART]
There is no single way of accomplishing this. The way we organize information and put it in context determines the value and meaning of that information. Everyone collects information for different purposes and evaluates and organizes it in different ways.
If you send information to another person or group without the context and structure that give it meaning, the recipient will spend a large amount of time pulling that information apart, organizing it and putting it into a context that they can understand and use in their own REPL process.
What is missing is a means for each person or group to integrate information sent to them into their own REPL process without having to strip everything down to bare wood. This has the potential to save an enormous amount of time and resources in the exchange of information.
We have already touched briefly on the idea of metadata as data at a distance. In 3D modeling and animation, there is a similar concept called LOD [Level of Detail].
A 3D model is made up of polygons. The more polygons you have, the more detailed the model. And the more detailed the model, the more clock cycles your computer will have to burn to render them on screen.
The model for King Kong in the recent remake of the movie is likely composed of millions of polygons. And in complex scenes Kong will have to share the stage with any number of other high polygon models including dinosaurs, giant cockroaches, buildings etc.
For close ups you need all of those polygons to create a realistic image, but if the shot is from a distance, most of the detail is wasted. Your computer is computing polygons which will never be seen.
LOD is used to reduce the number of polygons in a model the farther away it is viewed from. This saves an enormous amount of computational power that can be put to better use on models which are close up.
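Distance-based LOD selection can be sketched in a few lines. This is a minimal illustration, assuming a model stored as several precomputed meshes and a list of distance thresholds; the mesh names and threshold values are invented for the example. Production renderers often use more elaborate criteria, such as projected screen size.

```python
import bisect


def select_lod(distance, thresholds, meshes):
    """Return the mesh variant to draw at a given camera distance.

    thresholds: ascending distances at which detail drops one level.
    meshes: len(thresholds) + 1 variants, ordered from most to least detailed.
    """
    if len(meshes) != len(thresholds) + 1:
        raise ValueError("need exactly one more mesh than thresholds")
    # bisect_right counts how many thresholds the distance has reached,
    # which is exactly the index of the mesh to use.
    return meshes[bisect.bisect_right(thresholds, distance)]


# Hypothetical variants of one model, from close-up to far away.
THRESHOLDS = [10.0, 50.0, 200.0]
MESHES = ["kong_full", "kong_high", "kong_medium", "kong_low"]
```

At a distance of 75 units this sketch would pick `kong_medium`, so the polygons of the two more detailed variants are never computed.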
The same principles can be applied to a book or even a library.
If you are standing across the room from a book on a shelf, you are looking at the book from a distance. All you might be able to read is the title, author and publisher on the spine. Walk up to the book, take it off the shelf and open it to the title page, which shows metadata describing the book in more detail. Go to the table of contents and you are closer still. The levels of detail, from farthest to closest, might look like this:
- list display -- title, author
- scope note -- one or two lines describing the item
- detailed metadata and scope note
- introductory note or synopsis
- detailed introduction or analysis
- table of contents
- chapter synopsis
- text of chapter
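One way to picture these levels in code is as an ordered scale, where a record supplies text for whichever levels have been written so far. This sketch is hypothetical: the `Detail` enum and `render` function are illustrative names, not BMF elements, and the sample record reuses the Omael entry discussed later in this paper.

```python
from enum import IntEnum


class Detail(IntEnum):
    """Levels of detail, from farthest (0) to closest (7)."""
    LIST = 0              # list display: title, author
    SCOPE = 1             # one or two lines describing the item
    METADATA = 2          # detailed metadata and scope note
    SYNOPSIS = 3          # introductory note or synopsis
    INTRODUCTION = 4      # detailed introduction or analysis
    CONTENTS = 5          # table of contents
    CHAPTER_SYNOPSIS = 6  # chapter synopsis
    TEXT = 7              # text of chapter


def render(record, level):
    """Return the text for every level up to `level` that the record has.

    `record` maps Detail -> str. Levels nobody has written yet are skipped,
    so a one-line note and a finished book go through the same code.
    """
    return [record[d] for d in Detail if d <= level and d in record]


omael = {
    Detail.LIST: "Omael (angel; fallen or upright)",
    Detail.SCOPE: "An angel who multiplies species, perpetuates races, "
                  "influences chemists etc.",
}
```

The sketch returns everything up to the requested level, so moving closer adds detail rather than replacing it; showing only the closest available level would be an equally reasonable design.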
This hierarchy of detail is not simply a convenient means of organizing and finding information; it is an important part of the creation process.
Creating information within a framework which incorporates LOD gives you far more flexibility in what you can create and how. This is covered in more detail in the next section.
If the REPL represents the larger repeated process of acquisition, evaluation and publication, then what is happening in each iteration of the loop?
Knowledge advances through increasingly complex and accurate systems which build one atop the next without canceling out what came before. Quantum mechanics was built on Einsteinian relativity, which was built on Newtonian physics, which had been built on Copernicus' model of planetary motion.
This principle goes to the heart of the process of creation.
In programming there are two great design methodologies, top-down and bottom-up. Top-down favors the prepared, while bottom-up favors the prepared mind.
A top-down approach to writing a novel might be to define the setting, outline the characters, and then write an outline for each chapter. When the outline is complete, you simply write every chapter according to it.
Top-down is favored by large organized projects and is perfect for projects like bridges, rockets and dams which need to have all the kinks worked out beforehand or there could be some nasty consequences.
In terms of what we've been talking about, a top-down approach starts by describing something from a distance and then moves closer to what you are creating, producing increasingly detailed descriptions until you are finished.
A bottom-up approach might start with writing a simple sentence like "Marley was dead: to begin with" without a clue as to who Marley was or how, when or why he was dead. From there you just continue writing and let, to paraphrase Tolkien, "the tale grow in the telling."
Bottom-up is an organic, meandering, experimental learning process, full of blind alleys, wild epiphanies and a lot of mistakes along the way, which allows you to create things that you hadn't intended when you set out.
Bottom-up can start anywhere, from a distance or right smack in the middle. From there you can work your way closer by adding detail, adding new threads, lengthening existing ones and unraveling bits that you don't like as you go along.
In practice, people tend to use a mix of both top-down and bottom-up, a combination of planning peppered with taking advantage of the unexpected encountered along the way.
Collections of information, no matter how large or small, must reflect both top-down and bottom-up methodologies. An electronic library should be able to represent works in progress, aborted drafts and anonymous fragments as transparently as it handles polished, published masterpieces.
The <hi> element is used to mark words or phrases which are highlighted in some way, but for which identification of the intended distinction is difficult, controversial or impossible. It enables an encoder simply to record the fact of highlighting, possibly describing it by the use of a rend attribute, as discussed above, without however taking a position as to the function of the highlighting. This may also be useful if the text is to be processed in two stages: representing simply typographic distinctions during a first pass, and then replacing the <hi> tags with more specific tags in a second pass.
—TEI Guidelines, 6.3.2.2 Emphatic Words and Phrases [TEI5]
The process of creating complex semantic markup and metadata is hard work which takes time and a lot of thought. In a world of exponential change, both seem to be in short supply.
Depending on the task at hand, people won't adopt a system which makes simple things too difficult to do, or which is too simple to do complex things with.
Even if your ultimate goal is to create something rich and complex, if it takes too much effort to start the process, not many people will get very far.
So an important design goal for BMF is to make it as simple as possible to jot down a note which enters the system with little or no thought. At a later time that note can be added to, linked to other related terms, and eventually developed into as complex and dense a structure as is needed.
This can be accomplished by doing composition and markup in multiple passes without there being any requirement for anything to be more complex than it is in order to become part of the larger collection.
The idea of marking up texts in multiple passes is certainly nothing new, but it hasn't had a lot of attention lavished on it either.
To be clear, we are talking about markup here, not application user interfaces which hide the markup and present a relatively simple interface to the user. Everything we will discuss in this section should be possible using a good text editor with basic syntax highlighting.
No one syntax will accomplish this, so instead we will use three different syntaxes which build one on top of another and, just as importantly, can gracefully degrade as well.
Let's now use an example to work from the simplest encoding up to the most complex semantic markup possible.
Our example is a simple entry from the Dictionary of Angels.[11]
At the bottom of the ladder is structured plain text. We prefer to use UTF-8 for all text, but for this example let's use basic ASCII.
Omael -- an angel who multiplies species, perpetuates races, influences chemists etc. Omael is (or was) of the order of dominations and is among the 72 angels bearing the mystical name of God Shemhamphorae. Whether Omeal is fallen or still upright is difficult to determine from the data available. He seems to operate in both domains (Heaven and Hell). [Rf. Amberlain, La Kabbale Pratique.]
Plain text has a lot going for it. Basic structural formatting, such as paragraphs, sentences and lists, can be easily indicated, and there is a wide variety of tools for processing plain text.
But it is difficult to unequivocally indicate sections, headers, or bold and italic text. To do this we can use a wiki markup language.[12]
**Omael** -- an angel who multiplies species, perpetuates races, influences chemists etc. *Omael* is (or was) of the order of dominations and is among the 72 angels bearing the mystical name of God Shemhamphorae. Whether *Omeal* is fallen or still upright is difficult to determine from the data available. He seems to operate in both domains (Heaven and Hell). [Rf. Amberlain, La Kabbale Pratique.]
The wiki markup is simple and easily converted into HTML or, in our case, simple BMF. Block-level and inline markup in BMF is based on TEI, so the following markup may look familiar.
<p><hi>Omael</hi> -- an angel who multiplies species, perpetuates races, influences chemists etc. <hi>Omael</hi> is (or was) of the order of dominations and is among the 72 angels bearing the mystical name of God Shemhamphorae. Whether <hi>Omeal</hi> is fallen or still upright is difficult to determine from the data available. He seems to operate in both domains (Heaven and Hell). [Rf. <hi>Amberlain, La Kabbale Pratique</hi>.]</p>
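The conversion from the wiki form to the simple BMF form is mechanical. A minimal sketch, assuming only the `**bold**` and `*italic*` conventions used in the example (real wiki dialects have many more rules); both map to `<hi>`, because at this stage we only record that something was highlighted, not why. The function name `wiki_to_bmf` is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Bold (**text**) is matched before italic (*text*) so that a pair of
# double asterisks is not consumed as two single ones.
_BOLD = re.compile(r"\*\*(.+?)\*\*")
_ITALIC = re.compile(r"\*(.+?)\*")


def wiki_to_bmf(paragraph):
    """Convert one wiki-style paragraph into a simple BMF/TEI-style <p>."""
    text = escape(paragraph)               # protect &, < and > in the prose
    text = _BOLD.sub(r"<hi>\1</hi>", text)
    text = _ITALIC.sub(r"<hi>\1</hi>", text)
    return f"<p>{text}</p>"
```

Escaping before inserting tags keeps the output well-formed even when the prose contains characters such as `&`.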
But now we might want to mark this up more carefully, identifying each name and title and treating this as a formally marked up text in a division entity within an expression entity representing the book.
<p><pn>Omael</pn> -- an <top>angel</top> who multiplies species,
perpetuates races, influences chemists etc. <pn>Omael</pn> is (or
was) of the order of <top>dominations</top> and is among the 72
angels bearing the mystical name of God <pn>Shemhamphorae</pn>.
Whether <pn>Omeal</pn> is fallen or still upright is difficult to
determine from the data available. He seems to operate in both
domains (<pl>Heaven</pl> and <pl>Hell</pl>). <ref>[Rf.
<tit>Amberlain, La Kabbale Pratique</tit>.]</ref></p>
Proper names have been marked up with the <pn> element, concepts with the topic <top> element and titles of works with the title <tit> element.
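Moving from `<hi>` to specific elements is the second pass described in the TEI Guidelines quoted earlier, and the same step can run in reverse for graceful degradation. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `KNOWN` lookup table is a hypothetical stand-in for whatever authority (for example, existing Burrs) decides that a highlighted string is a personal name or a title.

```python
import xml.etree.ElementTree as ET

# Hypothetical authority list mapping highlighted strings to specific
# elements; in practice this knowledge might come from existing Burrs.
KNOWN = {
    "Omael": "pn",                              # proper name
    "Amberlain, La Kabbale Pratique": "tit",    # title of a work
}

SEMANTIC_TAGS = {"pn", "top", "pl", "tit"}


def upgrade_hi(p, known=KNOWN):
    """Second pass: replace <hi> with a specific tag when the text is known."""
    for hi in list(p.iter("hi")):
        tag = known.get(hi.text or "")
        if tag:
            hi.tag = tag
    return p


def degrade(p):
    """Graceful degradation: fall back from semantic tags to plain <hi>."""
    for el in list(p.iter()):
        if el.tag in SEMANTIC_TAGS:
            el.tag = "hi"
            el.attrib.clear()   # drop relationship attributes such as r="PT"
    return p


def plain_text(p):
    """Bottom rung of the ladder: discard all markup, keep the text."""
    return "".join(p.itertext())
```

Highlighted strings that the lookup does not recognize stay as `<hi>`, so the second pass never has to be complete to be useful.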
This is as far as most document-based markup languages will go. But BMF can go a step further by turning this entry into a proper record for the angel named Omael.
First we'll use BMF Wiki shorthand to outline the Burr. The following markup is used in Emacs Burs, a BMF browsing and development environment. The Wiki syntax is based on Emacs Muse-Mode wiki syntax and is still in development.
* hierarchy
$TT top Dictionary of Angels (topicspace)
$BTI top beings (mythical & legendary)
$BTI top dominions (angelic order)
$PT per Omael (angel; fallen or upright)
* terms
$PT Omael (angel; fallen or upright)
$UF Shemhamphorae (used for the angel, Omael)
meta:
## entityType : person
## PersonalName : Omael
## Affiliation : Heaven; Hell.
## Roles : Angel.
* scope
An angel who multiplies species, perpetuates races,
influences chemists etc. Omael is (or was) of the order of
dominations and is among the 72 angels bearing the mystical
name of God Shemhamphorae. Whether *Omeal* is fallen or
still upright is difficult to determine from the data
available. He seems to operate in both domains (Heaven and
Hell).
* references
- Dictionary of Angels. pg 212.
- Amberlain, La Kabbale Pratique
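Because the shorthand is line-oriented, a small parser can turn it into structured data. This sketch assumes the conventions visible in the example: `* name` starts a section (and `meta:` is treated as a heading too), `$` lines hold a relationship code, an optional entity code, a label and an optional parenthesized qualifier, and `##` lines hold `key : value` pairs. The entity codes are taken from the examples in this paper and are probably incomplete; all function and class names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Entity codes seen in this paper's examples; assumed, not exhaustive.
ENTITY_CODES = {"per", "top", "work", "expr", "div"}
_QUALIFIER = re.compile(r"\s*\(([^()]*)\)\s*$")


@dataclass
class Term:
    rel: str                  # relationship code, e.g. PT, BTI, UF
    entity: Optional[str]     # entity code, may be omitted
    label: str
    qualifier: Optional[str]


def parse_term(line):
    """Parse '$REL [entity] label (qualifier)' into a Term."""
    rel, _, rest = line[1:].partition(" ")
    first, _, remainder = rest.partition(" ")
    entity = None
    if first in ENTITY_CODES and remainder:
        entity, rest = first, remainder
    qualifier = None
    m = _QUALIFIER.search(rest)
    if m:
        qualifier = m.group(1)
        rest = rest[:m.start()]
    return Term(rel, entity, rest.strip(), qualifier)


def parse_burr(text):
    """Group shorthand lines by section: Terms, (key, value) pairs or text."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("* "):
            current = line[2:].strip()
            sections[current] = []
        elif re.fullmatch(r"\w+:", line):     # e.g. "meta:"
            current = line[:-1]
            sections[current] = []
        elif current is None:
            continue                          # ignore text before any heading
        elif line.startswith("$"):
            sections[current].append(parse_term(line))
        elif line.startswith("##"):
            key, _, value = line[2:].partition(":")
            sections[current].append((key.strip(), value.strip()))
        else:
            sections[current].append(line)    # scope prose, reference lines
    return sections
```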
We can then mark this up using BMF XML syntax. This example is simplified and shortened to make it more readable.
<BURR typ="per">
<sec typ="hierarchy">
<i r="TT" e="top" l="Dictionary of Angels" q="topicspace" />
<i r="BTI" e="top" l="beings" q="mythical &amp; legendary" />
<i r="BTI" e="top" l="dominions" q="angelic order" />
<i r="PT" e="per" l="Omael" q="angel; fallen or upright" />
</sec>
<sec typ="terms">
<i r="PT" l="Omael" q="angel; fallen or upright" />
<i r="UF" l="Shemhamphorae" q="used for the angel, Omael" />
</sec>
<sec typ="meta">
<entityType l="person" />
<personalName l="Omael" />
<affiliation>
<i l="Heaven;" />
<i l="Hell." />
</affiliation>
<roles>
<i typ="preferred" l="Angel." q="preferred"/>
</roles>
</sec>
<sec typ="scope">
<p><pn r="PT">Omael</pn> -- an <top r="BTG">angel</top> who
multiplies species,
perpetuates races, influences chemists etc. <pn>Omael</pn> is (or
was) of the order of <top>dominations</top> and is among the 72
angels bearing the mystical name of God <pn>Shemhamphorae</pn>.
Whether <pn>Omeal</pn> is fallen or still upright is difficult to
determine from the data available. He seems to operate in both
domains (<pl r="RT">Heaven</pl> and <pl r="RT">Hell</pl>).</p>
</sec>
<sec typ="reference">
<i id="DOA" r="BTP" l="Dictionary of Angels">
<a>Dictionary of Angels</a><b>/ Gustav Davidson.
- Toronto, Collier-Macmillan, 1967. - pg. 212.</b>
</i>
<i id="AMBERLAIN" r="BT" l="La Kabbale Pratique">
<a>La Kabbale Pratique</a><b>/ Robert Amberlain.
- Paris: Editions Niclaus, 1951.</b>
</i>
</sec>
</BURR>
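The XML layer can also be read back, or degraded to the shorthand, with standard tools. A minimal sketch using `xml.etree.ElementTree` on a trimmed copy of the Burr above; the helper names `relationships` and `to_shorthand` are hypothetical.

```python
import xml.etree.ElementTree as ET

BURR = """<BURR typ="per">
<sec typ="hierarchy">
<i r="TT" e="top" l="Dictionary of Angels" q="topicspace" />
<i r="BTI" e="top" l="beings" q="mythical &amp; legendary" />
<i r="PT" e="per" l="Omael" q="angel; fallen or upright" />
</sec>
<sec typ="terms">
<i r="PT" l="Omael" q="angel; fallen or upright" />
<i r="UF" l="Shemhamphorae" q="used for the angel, Omael" />
</sec>
</BURR>"""


def relationships(burr_xml, section):
    """Return (relationship, entity, label, qualifier) tuples for one <sec>."""
    burr = ET.fromstring(burr_xml)
    sec = burr.find(f"sec[@typ='{section}']")
    if sec is None:
        return []
    return [(i.get("r"), i.get("e"), i.get("l"), i.get("q"))
            for i in sec.findall("i")]


def to_shorthand(burr_xml, section):
    """Degrade one XML section back to '$REL [entity] label (qualifier)' lines."""
    lines = []
    for rel, entity, label, qual in relationships(burr_xml, section):
        parts = [f"${rel}"]
        if entity:                 # entity codes are optional in the terms section
            parts.append(entity)
        parts.append(label)
        if qual:
            parts.append(f"({qual})")
        lines.append(" ".join(parts))
    return lines
```

For the `terms` section this produces the same `$PT` and `$UF` lines as the shorthand outline, which is one way the XML layer could degrade gracefully to the simpler syntax.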
Multiple pass markup may not be painless, but it should at least ease the pain as much as possible.
info-civilians are remarkably cavalier about their information. Your clueless aunt sends you email with no subject line, half the pages on Geocities are called "Please title this page" and your boss stores all of his files on his desktop with helpful titles like "UNTITLED.DOC."
This laziness is bottomless. No amount of ease-of-use will end it. To understand the true depths of meta-laziness, download ten random MP3 files from Napster. Chances are, at least one will have no title, artist or track information — this despite the fact that adding in this info merely requires clicking the "Fetch Track Info from CDDB" button on every