This is a history of OpenURL, followed by a summary of its capabilities, and a discussion of the extent to which it can address the requirements of the learning content federations being supported by the FRED project.
OpenURL in libraries
Rather than giving an historically accurate account of OpenURL, we will couch the entire discussion in terms of identifiers, since that has been the main framework for FRED to deal with object retrieval.
Say you're browsing a Nature article in an online repository. The Nature article includes in its bibliography a reference to a paper in Science. As Elsevier (or whoever publishes Nature), you would like to provide a hyperlink from Nature to the Science paper.
Now, by 1998 this problem was already solved: you could link to the DOI of the Science paper:
* <a href="doi:10.234/3478"> Stalin, J.V. 1953. On Marxism and Linguistics. Science 43: 856-899. </a>
That DOI, 10.234/3478, provides an identifier for Stalin's paper; it's persistent, it's registered, and it's resolvable.
The catch is where that DOI resolves to. By default, it takes you to Elsevier's homepage, with maybe some metadata about "Marxism and Linguistics", and an invitation to buy the PDF for $4000.
Now that's no good. If I'm at Melbourne Uni, I want the URL to point to the Melbourne Uni Library copy of the paper (if available). I don't want to be sent off to Elsevier.
So we can add the Melbourne Uni PDF URL to the Handle record for 10.234/3478, right?
Well, no, because there are two problems:
- Every university under the sun will want to put their local URLs, once they've licensed the content, into 10.234/3478; that Elsevier Handle is going to be nightmarish to maintain, and Elsevier is not about to turn the Handle record over for updating by 4000 institutions anyway.
- Every university wants the DOI to resolve to their own copy of the PDF, not someone else's. Which means someone has to create a smarter Handle server, to pick the URL specific to a particular university; and not --- as now happens --- a random URL.
Now, the link is not actually a doi:10.234/3478 link at all. It is a http://hdl.handle.net/10.234/3478 link. So I'm saying that instead of http://hdl.handle.net/10.234/3478 always resolving to http://elsevier.com/mucho-dinero/10.234/3478, I want it to resolve to http://lib.unimelb.edu.au/nature/nature3478.pdf if I'm at Melbourne Uni, to http://lib.monash.edu.au/pubs/10/234/nature3478.pdf if I'm at Monash, etc.
But Elsevier want to create only one link, for all instances of the PDF it sends out to universities. They shouldn't be handcrafting different URLs for different customers. They should just link to something that looks like http://hdl.handle.net/10.234/3478.
What we want is an appropriate copy service. This service is given an identifier, and resolves to your instance of the identified resource. http://hdl.handle.net can't do that.
Van de Sompel's solution was simple. Way too simple in fact. Each university knows which local URL a given DOI should resolve to, right? Well, don't insert a full URL into the PDF at Elsevier at all. Instead:
- At Elsevier's side (when the content is produced), insert as your hyperlink a URL query for the identified resource --- but leave the server blank: http://________/?id=10.234%2F3478.
- At the university, you have a server which maps DOIs to local URLs. It is an appropriate copy server; e.g. http://openurl.unimelb.edu.au.
- At runtime, when you're about to display the content (including the hyperlink) to a consumer at a university, fill in the blank with the uni's appropriate copy server address: http://openurl.unimelb.edu.au/?id=10.234%2F3478
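The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `{resolver}` placeholder and the function name are made up here to stand in for the blank server name, and a real delivery platform would do this substitution wherever it renders links. Note that the slash inside the DOI is percent-encoded as %2F when it travels as a query value.

```python
from urllib.parse import quote

# Hypothetical template as the publisher might ship it; "{resolver}" stands in
# for the blank server name in the text.
LINK_TEMPLATE = "http://{resolver}/?id={doi}"


def fill_in_resolver(doi: str, resolver_host: str) -> str:
    """Return the link with the local appropriate copy server filled in."""
    # quote(..., safe="") percent-encodes the "/" in the DOI as %2F.
    return LINK_TEMPLATE.format(resolver=resolver_host, doi=quote(doi, safe=""))


link = fill_in_resolver("10.234/3478", "openurl.unimelb.edu.au")
print(link)
```

The publisher only ever writes the template; each institution supplies its own host at display time.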
What have we achieved by this?
- Elsevier only has to write one link for all copies of its content: http://________/?id=10.234%2F3478.
- Each university fills in its own Appropriate Copy server address.
- It just works.
It's not a good solution for static content: you have to be able to insert a server name into the URL at runtime. But it does the job: your own database, at openurl.unimelb.edu.au, tells you where your copy of 10.234/3478 lives, without your having to edit or even consult the Handle record maintained by Elsevier.
Now IDs are not the only way to identify a record. You could identify a record by Author+Title. Or ISSN. Or anything you want:
- http://________/?id=10.234%2F3478
- http://________/?author=Stalin&title="Marxism and Linguistics"
- http://________/?issn=1034578345X
Definition
DOI, Author, Title, ISSN --- they're all just attributes of a record. So I need to define which attributes I can query by, to retrieve a matching record.
Notice that these attributes (or combinations of them) act as database keys. The idea is that each query retrieves a single item as the appropriate copy. This is not a search: you would not have a hyperlink to http://________/?author=Stalin, because you don't want to look at the complete works of Stalin: links are always to specific resources.
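To make the "database key, not search" point concrete, here is a minimal sketch of a local lookup that refuses any query that does not pin down exactly one record. The holdings table, its field names, and the second record are hypothetical, invented purely for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical local holdings table; field names and the second record are
# illustrative only.
HOLDINGS = [
    {"id": "10.234/3478", "author": "Stalin", "title": "Marxism and Linguistics",
     "url": "http://lib.unimelb.edu.au/nature/nature3478.pdf"},
    {"id": "10.234/9999", "author": "Stalin", "title": "Another Paper",
     "url": "http://lib.unimelb.edu.au/nature/nature9999.pdf"},
]


def resolve(link: str) -> str:
    """Return the local URL of the single record matching the query."""
    # parse_qs also undoes the percent-encoding (%2F becomes "/").
    query = {k: v[0] for k, v in parse_qs(urlsplit(link).query).items()}
    if not query:
        raise LookupError("no attributes given")
    matches = [r for r in HOLDINGS
               if all(r.get(k) == v for k, v in query.items())]
    if len(matches) != 1:
        raise LookupError(f"query matched {len(matches)} records; expected exactly one")
    return matches[0]["url"]
```

With this table, a query by id (or by author plus title) resolves to one PDF, while a query for author=Stalin alone raises an error because it behaves like a search rather than a key.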
The attributes of the object you're looking for determine what the URL is going to be. But that's not the only object involved:
- My user profile might include the fact that I am visually disabled. Since I'm already customising links to resources dynamically, through a query, I could pass this information along as a query parameter, and get an audio or large print version of the PDF, instead of the default.
- I might be clicking on the link embedded in a French translation of Nature. If the link knows it's currently embedded in French text, it can also use that information for the query to pick a French language PDF as the resolution.
So Van de Sompel ended up creating OpenURL.
- OpenURL is a syntax
- for querying a server
- for the appropriate copy
- of a resource
- specified by unique attributes
- of the resource
- AND the requester
- AND of other entities involved in the transaction
- the attributes in turn being specified in an external schema
- e.g. Articles have: DOIs, Authors, Titles, ISSNs
- e.g. Web pages containing links have languages, HTML versions, default stylesheets
- e.g. Users have accessibility requirements, languages, physical locations, institutional affiliations
Because it's a consistent syntax, any number of vendors can use it to formulate links to appropriate copies, without worrying about the details of local systems.
Because it's extensible (you can make up your own schemas), any number of attributes can be invoked to further customise the URL retrieved to user requirements.
That's the promise of OpenURL. However:
- Everyone until recently has been using the same commercial implementation of OpenURL, and the Open Source OCLC solution we're about to go with is not battle-hardened (and barely even documented).
- OpenURL has only been used in the context of institutional libraries, so no one's extended the schemata past what you need for libraries --- books and papers.
So what do OpenURLs actually look like?
Data Model
OpenURLs are all about changing the resolution of the query based on context. The link should be different if I'm in Melbourne Uni or at Monash, if I'm visually disabled or not, if the web page with the link is in French or English, and so on.
OpenURL has philosophised context into SIX entities. Each of these entities has attributes you can use to refer to them by. Each entity has a schema for those attributes.
- Attributes can be specified in-line in the query ("by-value metadata");
- or they can be included in an external file, whose location is given in the query ("by-reference metadata").
- Identifiers are a special attribute, treated separately by the standard.
Some entities are more obviously relevant than others.
- The Referent is what you're linking to.
- The Requester is who wants access to the referent.
- The ServiceType is what you're doing to the referent.
- The Resolver is what does things to the referent.
- The Referring Entity is what contains the link.
- The Referrer is what creates the link.
Let's run through this in our library scenario.
Jock McJock is reading an article by Lenin in Bolshevik Quarterly. He's reading it through Omsk Uni's library. Bolshevik Quarterly is put online by Komsomol Press, and that's who Omsk Uni have licensed the content from. The Lenin article has a hyperlink to an article by Stalin in Communist Daily. Ideally, that hyperlink should link to the full-text article by Stalin in the Omsk Uni copy of Communist Daily. Omsk Uni has licensed Communist Daily from Comintern Publishers.
- The Referent is the Stalin article.
- The Requester is Jock McJock.
- The ServiceType is Browse Full-Text PDF.
- The Resolver is library.omsk-uni.ru, the Omsk Uni OpenURL service
- The Referring Entity is the Lenin article
- The Referrer is Komsomol Press
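One way to picture the six entities is as a single record with six slots, each of which can be described by identifier, by in-line attributes, or by a pointer to an external record. The class and field names below are assumptions made for this sketch, not names taken from the standard, and the entities are described here only by a made-up "name" attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    """One of the six entities; it can be described in three ways."""
    identifier: Optional[str] = None                        # e.g. a DOI or Handle
    by_value: Dict[str, str] = field(default_factory=dict)  # in-line attributes
    by_reference: Optional[str] = None                      # location of an external record


@dataclass
class ContextObject:
    referent: Entity          # what you're linking to
    requester: Entity         # who wants access to the referent
    service_type: Entity      # what is being done to the referent
    resolver: Entity          # what does things to the referent
    referring_entity: Entity  # what contains the link
    referrer: Entity          # what created the link


def named(name: str) -> Entity:
    """Helper for this sketch: describe an entity by a single 'name' attribute."""
    return Entity(by_value={"name": name})


ctx = ContextObject(
    referent=named("Stalin article in Communist Daily"),
    requester=named("Jock McJock"),
    service_type=named("Browse Full-Text PDF"),
    resolver=named("Omsk Uni OpenURL service"),
    referring_entity=named("Lenin article in Bolshevik Quarterly"),
    referrer=named("Komsomol Press"),
)
```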
Why do these entities matter?
- Referent: The URL resolved to is going to look different if I ask for the Lenin article or the Stalin article, obviously.
- Requester: The URL may look different depending on Jock's user preferences; e.g. he's visually impaired.
- It may also look different depending on Jock's location --- assuming Omsk Uni has more than one campus. The article may be fetched from the Siberian campus server or the Costa Rican campus server.
- ServiceType: The URL will look different if you're asking for a full PDF, HTML, an abstract, or an RSS feed.
- Referring Entity: The URL may look different depending on the context it appears in.
- If I'm browsing Bolshevik Quarterly in French, that's a property of the object I am browsing, not the user. The links from the French version should be to French-language versions of material where available.
- Referrer: Occasionally, the URL may look different depending on who made the object it appears in.
- Say Omsk Uni has licensed a copy of Communist Daily from both Komsomol Press and Comintern Publishers. I'm looking at a copy of Bolshevik Quarterly supplied by Komsomol. Its hyperlink to Communist Daily should by default go to the Komsomol copy, for consistent user experience. (Same branding etc.)
- The Resolver is specified by the server address, which gets filled in at runtime. We could parameterise extra information about the server in the query, but that seems redundant. However, we can specify multiple resolvers in the one query, so that all of them can be attempted.
Let's try that again with an e-learning example:
Shane, a student at Cairns High, is looking at a learning object on volcanoes. They are accessing the learning object through Learning Place, the Queensland school jurisdiction server. The learning object was created at the e-learning provider TLF --- who embedded "see also" links in the metadata to other related objects; for instance, a learning object on Sicily. When the student clicks on these links, they are meant to get abstracts of the learning objects, and links to the local full version where available.
- The Referent is the object on Sicily.
- The Requester is Shane, at Cairns.
- The ServiceType is Abstract.
- The Resolver is Learning Place
- The Referring Entity is the object on volcanoes
- The Referrer is TLF
Let's go further and see what values we might need.
- The referent has a TLF ID, which will be used consistently to refer to the object. It's TLF763.
- The requester has two important attributes: that they're a student, and that they're at Cairns.
- The Sicily object has a student version and a teacher version, with answer keys. We'll link to the student version.
- Learning Place has local hubs in Townsville, Birdsville, and Brisbane. Shane's in Cairns, so we'll link to the Townsville copy.
- The ServiceType has a vocabulary: abstract, fulltextpdf, fulltexthtml. We use the abstract version as the link from the volcano object; that abstract will itself link to the full text pdf version.
- The Resolver is Learning Place, but we already know that because Learning Place insert their server name into the OpenURL at runtime.
- The Referring Entity uses a particular CSS for rendering; to impose consistency, we can tell the OpenURL query to use the same CSS.
- Because the referrer is TLF, we know that we may need to check our licensing arrangements with TLF before resolving the link. We may need to interpose a license agreement page.
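The selection logic in this walk-through could be sketched as a pair of table lookups: map the requester's location to a hub, then pick the copy for that hub, role, and service type. Everything in the tables below (the hub mapping, the example.org URLs, the function name) is hypothetical and only mirrors the scenario above; the licensing check and the CSS hint are left out.

```python
from typing import Dict, Tuple

# All table contents are hypothetical, for illustration only.
NEAREST_HUB: Dict[str, str] = {"cairns": "townsville", "brisbane": "brisbane"}

# (referent id, requester role, hub, service type) -> local copy
COPIES: Dict[Tuple[str, str, str, str], str] = {
    ("TLF763", "student", "townsville", "abstract"):
        "http://townsville.example.org/TLF763/student/abstract.html",
    ("TLF763", "student", "townsville", "fulltextpdf"):
        "http://townsville.example.org/TLF763/student/full.pdf",
    ("TLF763", "teacher", "townsville", "fulltextpdf"):
        "http://townsville.example.org/TLF763/teacher/full.pdf",
}


def pick_copy(referent_id: str, role: str, location: str,
              service: str = "abstract") -> str:
    """Choose the appropriate copy from requester and service attributes."""
    hub = NEAREST_HUB[location]  # Shane is in Cairns, so Townsville
    return COPIES[(referent_id, role, hub, service)]


print(pick_copy("TLF763", "student", "cairns"))
```

The same referent yields a different URL as soon as the role, location, or service type changes, which is the whole point of passing them in the query.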
OpenURL Parameters
So let's look at the parameters:
- Each of the six entities has a three-letter abbreviation:
- Referent: rft
- Referring Entity: rfe
- Requester: req
- Service Type: svc
- Resolver: res
- Referrer: rfr
- The combination of entities makes up a context object (ctx), which has its own attributes (version of notation, character encoding, ID, timestamp)
- An entity may be specified by identifier. The descriptor (the attribute in the OpenURL query) is XXX_id ; e.g. rft_id for referent ID; svc_id = fulltextpdf for Full Text PDF browse.
- Identifier namespaces get registered separately. So do transports.
- An entity may be specified by attributes in-line. This is by-value metadata.
- Attributes follow a defined schema, called a Metadata Format. Metadata Formats have URIs, specifying their serialisation (how are arguments represented?), their constraint language (how is the format notated) and their constraint definition (what values and cardinalities are legal).
- e.g. the inline convention is KEV serialisation (key-value pairs), MTX (Z39.88-2004 Matrix Constraint Language: a simple tab-delimited file) and Z39.88-2004 Matrix Constraint Definitions.
- Z39.88 is the OpenURL standard
- if you use XML for your record, the serialisation is XML, and the constraint language and definition is XML Schema
- Metadata formats have the descriptor XXX_val_fmt
- Individual attributes have the descriptor XXX.ATTRIBUTE
- For instance: I want to refer to a journal article by author and title. The attributes are:
- rft_val_fmt = info:ofi/fmt:kev:mtx:journal (I am an OpenURL identifier, for metadata formats, in KEV = key-value format for my arguments, MTX = Matrix Constraint Language notation for the format description itself, and the metadata format is for journals.)
- rft.aulast = Stalin
- rft.atitle = Marxism and Linguistics
- An entity may be specified by attributes in an external file.
- Attributes still have a metadata format, identified as XXX_ref_fmt
- The external file is a URI, with the descriptor XXX_ref; e.g.
- req_ref_fmt = http://omsk-uni.ru/mxt/ldap.html : Homebrew LDAP server metadata format
- req_ref = ldap://ldap.omsk-uni.ru:389/jockmcjock : Jock's LDAP record
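The descriptor conventions above (XXX_id, XXX_val_fmt, XXX.ATTRIBUTE, XXX_ref_fmt, XXX_ref) are regular enough that a resolver can sort an incoming key-value query into per-entity buckets mechanically. The following is a rough sketch of that idea, not the parsing algorithm of any particular resolver; the function name and the bucket layout are assumptions made here.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode

ENTITY_PREFIXES = ("rft", "rfe", "req", "svc", "res", "rfr")


def group_by_entity(query: str) -> dict:
    """Sort KEV descriptors into one bucket per entity abbreviation."""
    entities = defaultdict(lambda: {"id": [], "val_fmt": None,
                                    "ref_fmt": None, "ref": None, "attrs": {}})
    for key, value in parse_qsl(query):
        prefix, sep, rest = key[:3], key[3:4], key[4:]
        if prefix not in ENTITY_PREFIXES:
            continue  # context object keys such as ctx_ver are skipped here
        bucket = entities[prefix]
        if sep == ".":                       # XXX.ATTRIBUTE: by-value metadata
            bucket["attrs"][rest] = value
        elif sep == "_" and rest == "id":    # XXX_id: may be repeated
            bucket["id"].append(value)
        elif sep == "_" and rest in ("val_fmt", "ref_fmt", "ref"):
            bucket[rest] = value
    return dict(entities)


# The by-value referent and by-reference requester from the bullets above.
query = urlencode([
    ("ctx_ver", "Z39.88-2004"),
    ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ("rft.aulast", "Stalin"),
    ("rft.atitle", "Marxism and Linguistics"),
    ("req_ref_fmt", "http://omsk-uni.ru/mxt/ldap.html"),
    ("req_ref", "ldap://ldap.omsk-uni.ru:389/jockmcjock"),
])
grouped = group_by_entity(query)
```

Here `urlencode` takes care of percent-encoding the colons, slashes, and spaces in the values, and `parse_qsl` reverses it on the resolver side.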
The metadata formats, serialisations, namespaces, transports, etc. etc. get profiled as community profiles. The San Antonio Community Profiles define OpenURL as used by libraries, with key-value pairs and XML.
A FRED customisation of the OpenURL for the Sicily example above could look like this:
http://openurl.tlf.edu.au?
ctx_ver = Z39.88-2004 & # We're using the Z39.88 (OpenURL) standard for key-value pairs
rft_id = info:hdl/100.200.1/7348 & # I'm retrieving the appropriate copy of Handle 100.200.1/7348, the Sicily object
req_val_fmt = info:hdl/100.200.1/72 & # I've created a format for requester metadata -- and made it a Handle. This metadata record contains:
req.id = shane@cairns.hs.edu.au & # Shane's email address
req.location = cairnsHS & # Shane's location, using a vocabulary
svc_id = http://tlf.edu.au/svc/abstract & # IDs belong to a registered namespace; I'll just use a URL to specify the "abstract" service, rather than make up a metadata format
res_id = http://learningplace.qld.edu.au & # I'll resolve at Learning Place by default...
res_id = http://openurl.tlf.edu.au & # but I'll keep TLF in there as a backup resolver
rfr_id = http://www.tlf.edu.au & # The OpenURL appears in TLF-generated content
rfe_id = info:hdl/100.200.1/7347 # This link appears in TLF7347, the Volcano object
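The annotated listing is spread over several lines with spaces and comments for readability; on the wire it would be a single percent-encoded URL. A minimal sketch of assembling it in Python follows, using the values from the listing as-is. The use of a list of pairs (rather than a dict) is a choice made here so that res_id can appear twice.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Key-value pairs copied from the annotated listing; a list of tuples keeps
# the order and allows the repeated res_id key.
params = [
    ("ctx_ver", "Z39.88-2004"),
    ("rft_id", "info:hdl/100.200.1/7348"),
    ("req_val_fmt", "info:hdl/100.200.1/72"),
    ("req.id", "shane@cairns.hs.edu.au"),
    ("req.location", "cairnsHS"),
    ("svc_id", "http://tlf.edu.au/svc/abstract"),
    ("res_id", "http://learningplace.qld.edu.au"),
    ("res_id", "http://openurl.tlf.edu.au"),
    ("rfr_id", "http://www.tlf.edu.au"),
    ("rfe_id", "info:hdl/100.200.1/7347"),
]

openurl = "http://openurl.tlf.edu.au?" + urlencode(params)
print(openurl)

# Decoding on the resolver side recovers both resolvers, in order.
decoded = parse_qs(urlsplit(openurl).query)
print(decoded["res_id"])
```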
