Document information
Title: Application profile for OpenURL Context Objects and SUSHI
Subject:
Moderator:
Version:
Date published:
Excerpt: Write an excerpt here
(Optional information)
Type:
Format:
Identifier:
Language:
Rights:
Tags: wo, dataformats, data-transfer, sureworkinggroup
|
|
Document History
| Date |
Version |
Owner |
Changelog |
PDF |
| |
|
|
|
|
Abstract
The abstract describes what the application profile is about. It should contain a problem definition, the standards described by the application profile and the goal of the application profile.
1. Aspects of usage events to be recorded
All Dutch repositories make use of Apache server software for the maintenance of their repository websites. The Apache log files will be used as the primary source of information on usage statistics. The table below contains a typical entry from an Apache log file.
| [13/Jul/2009:09:36:43 +0200] 66.249.72.138 TLSv1 RC4-MD5 openaccess.leidenuniv.nl "GET /handle/1887/9742/items-by-author?author=Walker%2C+C.E. HTTP/1.1" 10911 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 200 24581 0 |
The recommendations in this report will closely follow the findings of the JISC Usage Statistics Review which was released in September 2008. One of the main aims of this survey was "to propose a standard for the aggregation of repository log files in order to provide comparable usage statistics". The report stipulates that the following items of information must minimally be provided:
- Who (Identification of user/session)
- What (Item identification)
- Type of request performed (e.g. full-text, front-page, including failed/partially fulfilled requests)
- When (Date and time)
- Usage event ID
The following data elements are considered optional:
- From where (Referrer/the referring entity)
- Identity of the service (Item identification)
The JISC Usage Statistics Review also recommends that usage events should be exchanged in the form of OpenURL Context Objects, and that automated access by robots should be tagged.
The remainder of this section will describe which elements from the Apache Log File may be used to record the generic aspects that were mentioned in the JISC Usage Statistics Review.
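As a rough illustration of how these aspects can be pulled out of a log entry, the sketch below splits the sample entry shown earlier into named fields. The field order is inferred from that single sample, so the names `referrer`, `user_agent`, `status` and `size` are assumptions about the positions; a repository with a different Apache LogFormat would need a different pattern. `parse_log_line` is a hypothetical helper name.

```python
import re

# Field order follows the sample entry in this section. The two tokens between
# the request line and the referrer (10911 and - in the sample) are not
# explained in the text, so they are matched but left unnamed.
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'(?P<ip>\S+) '
    r'(?P<tls_protocol>\S+) (?P<tls_cipher>\S+) '
    r'(?P<host>\S+) '
    r'"(?P<request>[^"]*)" '
    r'\S+ \S+ '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)" '
    r'(?P<status>\d{3}) '
    r'(?P<size>\S+)'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Split the request line into method, path and protocol when possible.
    parts = fields["request"].split(" ")
    if len(parts) == 3:
        fields["method"], fields["path"], fields["protocol"] = parts
    return fields


SAMPLE = ('[13/Jul/2009:09:36:43 +0200] 66.249.72.138 TLSv1 RC4-MD5 '
          'openaccess.leidenuniv.nl '
          '"GET /handle/1887/9742/items-by-author?author=Walker%2C+C.E. HTTP/1.1" '
          '10911 - "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
          '200 24581 0')

entry = parse_log_line(SAMPLE)
print(entry["time"], entry["ip"], entry["host"], entry["status"])
```

Lines that do not match the pattern yield `None`, so a caller can count or log them instead of silently producing incomplete usage events.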
a. Identification of user/session
| Request IP-address |
|
| Description |
The IP-address of the agent that has sent the request. |
| Usage |
Prohibited, as giving the full IP-address is not allowed by international privacy laws |
| Format |
Four decimal numbers separated by dots. |
| Example |
132.229.202.153 |
| C-Class subnet |
|
| Description |
The first three bytes of an IP-address, which are used to designate the network ID. It is similar to the IP-address, with the crucial difference that the final (least significant) byte, which designates the host ID, has been left out. |
| Usage |
Mandatory |
| Format |
Three decimal numbers separated by dots |
| Example |
132.229.202 |
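The reduction from a full IP-address to the C-Class subnet can be sketched as follows. This is a minimal illustration that assumes IPv4 addresses only; `c_class_subnet` is a hypothetical helper name.

```python
import ipaddress


def c_class_subnet(ip):
    """Drop the final byte (the host ID) of an IPv4 address."""
    # IPv4Address raises ValueError for malformed input, so garbage from the
    # log file is rejected rather than truncated blindly.
    address = ipaddress.IPv4Address(ip)
    return ".".join(str(address).split(".")[:3])


print(c_class_subnet("132.229.202.153"))
```

Applying the reduction before the usage event is serialised means that the full address is not carried into the exchanged data.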
b. Item identification
| Document identifier |
|
| Description |
The document identifier provides a globally unique identification of the resource that is requested. |
| Usage |
Mandatory |
| Format |
The identifier must be given in the form of an OAI identifier. Permitted identifiers include handles, URNs and PURLs. |
| Example |
https://openaccess.leidenuniv.nl/dspace/handle/1887/584 |
| Document identifier |
|
| Description |
The URL of the object that was downloaded. This URL must contain an indication of the object file. |
| Usage |
Mandatory |
| Format |
The identifier must be given in the form of an OAI identifier. Permitted identifiers include handles, URNs and PURLs. |
| Example |
https://openaccess.leidenuniv.nl/dspace/handle/1887/584 |
| File format |
|
| Description |
The file format refers to the MIME type of the object that was requested. It indicates, for example, whether an HTML file or a PDF file was downloaded. |
| Usage |
Optional |
| Format |
The MIME type as provided in the IANA list of registered MIME types: http://www.iana.org/assignments/media-types/. In the case of a metadata view, use the MIME type text/html. |
| Example |
application/pdf |
c. Type of request performed
| HTTP Status Code |
|
| Description |
HTTP response status codes provide information about the status of the request and, as such, indicate whether or not the request was successful. The most common codes are 200 (the server successfully returned the page), 404 (the page that is requested does not exist) and 503 (the server is temporarily unavailable). |
| Usage |
Optional |
| Format |
The code must be provided as a three-digit code, identical to the code that appears in the Apache log file. A full list of HTTP status codes can be found at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html |
| Example |
200 |
d. Date and time
| Request Time |
|
| Description |
The exact time on which the usage event took place. |
| Usage |
Mandatory |
| Format |
The request time must be given in a format that conforms to ISO 8601. The YYYY-MM-DDTHH:MM:SSZ representation must be used. Note that this format differs from the format that is used in the Apache log file. |
| Example |
2009-07-29T06:15:46Z |
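Because the Apache log file records local time with a numeric offset, the request time has to be converted before it can be written in the YYYY-MM-DDTHH:MM:SSZ representation. The sketch below shows one way to do this; `apache_time_to_iso8601` is a hypothetical helper name, and the month abbreviation is assumed to be in English, as in the sample log entry.

```python
from datetime import datetime, timezone


def apache_time_to_iso8601(apache_time):
    """Convert e.g. '13/Jul/2009:09:36:43 +0200' to a UTC ISO 8601 string."""
    # %b parses the month abbreviation (locale dependent; English assumed),
    # %z parses the numeric UTC offset.
    local = datetime.strptime(apache_time, "%d/%b/%Y:%H:%M:%S %z")
    # Normalise to UTC so that the trailing Z is accurate.
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(apache_time_to_iso8601("13/Jul/2009:09:36:43 +0200"))
```

Normalising to UTC keeps events from repositories in different time zones directly comparable once they are aggregated.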
e. Usage event ID
| Usage Event ID |
|
| Description |
Unique identification of the usage event. This identification will be generated; it cannot be derived from the Apache log file. |
| Usage |
Mandatory |
| Format |
The identifier will consist of a combination of the item identifier, the date, an identification of the institute and a generated number. |
| Example |
|
f. From where
| Referrer |
|
| Description |
The environment which has directed the user to the requested object. This usually refers to the search engine which the client has used to find the object. |
| Usage |
Optional |
| Format |
Standardised list of web browser names |
| Example |
|
g. Identity of service
[Other aspects]
| Hostname |
|
| Description |
An identification of the repository that has recorded the usage event |
| Usage |
Mandatory |
| Format |
OAI BaseURL |
| Example |
openaccess.leidenuniv.nl |
| Session Identifier |
|
| Description |
Identification of the session |
| Usage |
Optional |
| Format |
A number |
| Example |
|
3. OpenURL Context Objects
In compliance with the JISC Usage Statistics Review, individual usage events need to be serialized in XML using the syntax that is specified in the OpenURL Context Objects schema. The XML Schema for XML Context Objects can be accessed at http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx. This section describes a recommended practice for the use of this schema.
The root element of the XML-document must be <context-objects>. It must contain a reference to the official schema and declare the namespace xmlns:ctx="info:ofi/fmt:xml:xsd:ctx".
Each usage event must be described in a separate <context-object> element, which must appear as a direct child of <context-objects>. Two attributes must be used:
- The time and date on which the usage event took place must be recorded in a timestamp attribute.
- The identification of the usage event must be captured in an attribute with the name identifier.
Within <context-object>, a number of elements can be used which describe the context of the event. The names of these elements are as follows:
- <Requester>: refers to the agent that has initiated the usage event
- <Referent>: the object that was downloaded or viewed
- <ServiceType>: the type of service that was requested
- <Referrer>: the environment that has forwarded the reader to the downloaded object
- <Resolver>: the institution that provides access to the requested item and which has received the usage event.
Information about these contextual entities can be given in four different ways. Firstly, they can be characterised using an <identifier>. Secondly, metadata can be included literally by wrapping it into the file using a <metadata-by-val> element. Thirdly, a reference to metadata stored elsewhere can be included by using <metadata-by-ref>. A fourth method is the use of the element <private-data>. In the SURE Statistics project, only the first two methods shall be used. Listing 1 is an example of a full OpenURL Context Object document.
- The <Referent> must be described using a globally unique identifier. The identifier must be given in the <identifier> element.
- The <Requester>, the agent who has requested the <Referent>, must be identified by providing the C-class subnet. This number must be given in an <identifier> element. In addition, the name of the country where the request was initiated must be provided. The <metadata-by-val> element must be used for this purpose. The country must be given in <dcterms:spatial>.
- The recommended practice is to use an <identifier> for the institution that provides access to the object within <Resolver>.
- The <Referrer> is the browser that was used by the agent. It must be provided in an <identifier>.
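The rules in this section can be sketched in code as follows. This is an illustration under stated assumptions, not the project's reference serialisation: entity element names are written in lower case (`referent`, `requester`, `resolver`, `referrer`), which is an assumption about how the official schema spells the entities named in this section; `<dcterms:spatial>` is placed directly inside `<metadata-by-val>`, whereas the schema may require additional wrapper elements; and the event ID, country code and browser name in the sample event are hypothetical values. The output has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

CTX = "info:ofi/fmt:xml:xsd:ctx"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("ctx", CTX)
ET.register_namespace("dcterms", DCTERMS)


def _ctx(tag):
    """Qualify a tag name with the Context Object namespace."""
    return "{%s}%s" % (CTX, tag)


def add_entity(parent, name, identifier):
    """Add a contextual entity that is described by an <identifier>."""
    entity = ET.SubElement(parent, _ctx(name))
    ET.SubElement(entity, _ctx("identifier")).text = identifier
    return entity


def build_context_objects(events):
    """Serialise usage events (dicts) as one <context-objects> tree."""
    root = ET.Element(_ctx("context-objects"))
    for event in events:
        # One <context-object> per usage event, with the two attributes
        # required by this profile.
        obj = ET.SubElement(root, _ctx("context-object"), {
            "timestamp": event["timestamp"],
            "identifier": event["event_id"],
        })
        add_entity(obj, "referent", event["document_id"])
        requester = add_entity(obj, "requester", event["subnet"])
        # Simplified nesting (assumption): country directly in metadata-by-val.
        by_val = ET.SubElement(requester, _ctx("metadata-by-val"))
        ET.SubElement(by_val, "{%s}spatial" % DCTERMS).text = event["country"]
        add_entity(obj, "resolver", event["resolver"])
        add_entity(obj, "referrer", event["referrer"])
    return root


sample_event = {
    "timestamp": "2009-07-29T06:15:46Z",
    "event_id": "example-event-0001",  # hypothetical value
    "document_id": "https://openaccess.leidenuniv.nl/dspace/handle/1887/584",
    "subnet": "132.229.202",
    "country": "NL",  # hypothetical value
    "resolver": "openaccess.leidenuniv.nl",
    "referrer": "Mozilla/5.0",  # hypothetical value
}
root = build_context_objects([sample_event])
print(ET.tostring(root, encoding="unicode"))
```

The reference to the official schema that the root element must carry is left out of this sketch.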
4. Standardized Usage Statistics Harvesting Initiative (SUSHI)
SUSHI (http://www.niso.org/schemas/sushi/) was developed by NISO (National Information Standards Organization) in cooperation with COUNTER. SUSHI enables parties to harvest usage statistics. It is very simple, as it works with only two types of messages: requests and responses. The protocol was originally developed for the exchange of COUNTER reports, but other types of reports can be retrieved as well. However, the standard does require that its requirements for report naming are adhered to. SUSHI is based on SOAP. The services that can be offered by the Web Service are described in a WSDL document, which can be accessed at http://www.niso.org/schemas/sushi/counter_sushi2_5.wsdl.
In the infrastructure to be built in this project, the log aggregator will bear the primary responsibility for obtaining the statistical data from individual repositories. Once every 24 hours, it will send a request to each repository for the usage events that have occurred on that particular day.
In SUSHI version 1.0, the following information must be sent along with the request:
- Requestor ID
- CustomerReference ID (may be identical to the Requestor ID)
- Name of the report that is requested
- Version number of the report
- Start and end date of the report
This request will activate a special tool that inspects the server logs and returns the requested data. These data are transferred as OpenURL Context Object log entries, as part of a SUSHI response.
The response must contain the following information:
- Requestor ID (copied from the request)
- Name of the report, version number and date range (copied from the request)
- The requested report as XML payload
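The information carried by the two message types can be modelled as plain data structures, as in the sketch below. It deliberately does not produce SOAP or SUSHI XML; it only captures the fields listed in this section and the rule that the response echoes the request. The class names, the helper names and the report name `example-usage-report` are hypothetical, and the response is assumed here to echo the date range of the request.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SushiRequest:
    requestor_id: str
    customer_reference_id: str  # may be identical to requestor_id
    report_name: str
    report_version: str
    begin: date
    end: date


@dataclass(frozen=True)
class SushiResponse:
    requestor_id: str
    report_name: str
    report_version: str
    begin: date
    end: date
    report_xml: str  # the requested report as XML payload


def daily_request(requestor_id, report_name, report_version, day):
    """Build the request the log aggregator sends for a single day."""
    return SushiRequest(requestor_id, requestor_id, report_name,
                        report_version, day, day)


def response_matches(request, response):
    """Check that the response echoes the request and carries a payload."""
    return (
        response.requestor_id == request.requestor_id
        and response.report_name == request.report_name
        and response.report_version == request.report_version
        and (response.begin, response.end) == (request.begin, request.end)
        and bool(response.report_xml.strip())
    )


request = daily_request("aggregator-01", "example-usage-report", "1",
                        date(2009, 7, 13))
response = SushiResponse("aggregator-01", "example-usage-report", "1",
                         date(2009, 7, 13), date(2009, 7, 13),
                         "<context-objects/>")
print(response_matches(request, response))
```

A check of this kind lets the log aggregator reject a response that belongs to a different request before the payload is stored.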
The usage data are subsequently stored in a central database. External parties can obtain information about the contents of this central database through specially developed web services. The log harvester can also expose these data in the form of COUNTER-compliant reports.
Listing 2 is an example of a SUSHI request, sent from the log aggregator to a repository.
Listing 3 is an example of a SUSHI response, sent from a repository to the log aggregator.
Comments (1)
nov 05, 2009
Maurice Vanderfeesten says:
from thomas: related material from the NEEO project
http://homepages.ulb.ac.be/~bpauwels/NEEO/WP5/WP5 Usage metadata guidelines.pdf