SURFshare use of Usage Statistics Exchange

Document information

Title: Application profile for OpenURL Context Objects and SUSHI
Subject:
Moderator:
Version:
Date published:
Excerpt: Write an excerpt here

(Optional information)
Type:
Format:
Identifier:
Language:
Rights:
Tags: , , ,

Document History

Date Version Owner Changelog PDF
         

Abstract

The abstract describes what the application profile is about. It should contain a problem definition, the standards described by the application profile and the goal of the application profile.

1. Aspects of usage events to be recorded


All Dutch repositories make use of Apache server software for the maintenance of their repository websites. The Apache log files will be used as the primary source of information on usage statistics. The table below contains a typical entry from an Apache log file.

[13/Jul/2009:09:36:43 +0200] 66.249.72.138 TLSv1 RC4-MD5 openaccess.leidenuniv.nl "GET /handle/1887/9742/items-by-author?author=Walker%2C+C.E. HTTP/1.1" 10911 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 200 24581 0
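
As an illustration of how such an entry could be split into its parts, the following Python sketch parses the sample line with a regular expression. The field names are assumptions made for this example: the log format itself is not documented here, so the quoted "-" is read as the referrer, the last quoted string as the user agent, and three columns whose meaning is not stated are kept as unknown_1 to unknown_3.

```python
import re

# Sample entry copied from the text above.
SAMPLE_LINE = (
    r'[13/Jul/2009:09:36:43 +0200] 66.249.72.138 TLSv1 RC4-MD5 '
    r'openaccess.leidenuniv.nl "GET /handle/1887/9742/items-by-author'
    r'?author=Walker%2C+C.E. HTTP/1.1" 10911 - "-" "Mozilla/5.0 '
    r'(compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
    r'200 24581 0'
)

# Field names are assumptions; unknown_1..3 are columns whose meaning
# is not described in this document.
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] '
    r'(?P<ip>\S+) (?P<tls>\S+) (?P<cipher>\S+) (?P<host>\S+) '
    r'"(?P<request>[^"]*)" '
    r'(?P<unknown_1>\S+) (?P<unknown_2>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) (?P<unknown_3>\S+)'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


fields = parse_log_line(SAMPLE_LINE)
```

A repository that uses a different LogFormat directive would need a different pattern; the sketch only covers the layout of the sample entry.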


The recommendations in this report will closely follow the findings of the JISC Usage Statistics Review which was released in September 2008. One of the main aims of this survey was "to propose a standard for the aggregation of repository log files in order to provide comparable usage statistics". The report stipulates that the following items of information must minimally be provided:

  • Who (Identification of user/session)
  • What (Item identification)
  • Type of request performed (e.g. full-text, front-page, including failed/partially fulfilled requests)
  • When (Date and time)
  • Usage event ID


The following data elements are considered optional:

  • From where (Referrer/the referring entity)
  • Identity of the service (Service type)


The JISC Usage Statistics Review also recommends that usage events should be exchanged in the form of OpenURL Context Objects, and that automated access by robots should be tagged.
The remainder of this section will describe which elements from the Apache Log File may be used to record the generic aspects that were mentioned in the JISC Usage Statistics Review.
a. Identification of user/session

Request IP-address  
Description The IP-address of the agent that has sent the request.
Usage Prohibited, as disclosing the full IP-address is not allowed by international privacy laws.
Format Four decimal numbers separated by dots.
Example 132.229.202.153


C-Class subnet  
Description The first three bytes of an IP-address, which are used to designate the network ID. It is similar to the IP-address, with the crucial difference that the final (least significant) byte, which designates the host ID, has been left out.
Usage Mandatory
Format Three decimal numbers separated by dots.
Example 132.229.202
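
The derivation of the C-Class subnet from a full IP-address can be sketched as follows in Python. The function name is hypothetical, and the sketch assumes IPv4 addresses only, as in the examples above.

```python
import ipaddress


def c_class_subnet(ip_address):
    """Return the first three bytes of an IPv4 address as a dotted string.

    Raises ValueError if the input is not a valid IPv4 address.
    """
    address = ipaddress.IPv4Address(ip_address)  # validates the input
    return ".".join(str(address).split(".")[:3])


subnet = c_class_subnet("132.229.202.153")
```

Only the truncated value would be written to the usage event; the full address stays in the local log file.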


b. Item identification

Document identifier  
Description The document identifier provides a globally unique identification of the resource that is requested.
Usage Mandatory
Format The identifier must be given in the form of an OAI identifier. Permitted identifiers include handles, URNs and PURLs.
Example https://openaccess.leidenuniv.nl/dspace/handle/1887/584


Document URL  
Description The URL of the object that was downloaded. This URL must contain an indication of the object file.
Usage Mandatory
Format The identifier must be given in the form of an OAI identifier. Permitted identifiers include handles, URNs and PURLs.
Example https://openaccess.leidenuniv.nl/dspace/handle/1887/584


File format  
Description The file format refers to the MIME type of the object that was requested. It indicates, for example, whether an HTML file or a PDF file was downloaded.
Usage Optional
Format The MIME type as provided in the IANA list of registered MIME types, http://www.iana.org/assignments/media-types/. In the case of a metadata view, use the MIME type text/html.
Example application/pdf


c. Type of request performed

HTTP Status Code  
Description HTTP response status codes provide information about the status of the request and, as such, indicate whether or not the request was successful. The most common codes are 200 (the server successfully returned the page), 404 (the requested page does not exist) and 503 (the server is temporarily unavailable).
Usage Optional
Format The code must be provided as a three-digit code, as it appears in the Apache log file. A full list of HTTP status codes can be found at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
Example 200


d. Date and time

Request Time  
Description The exact time on which the usage event took place.
Usage Mandatory
Format The request time must be given in a format that conforms to ISO 8601. The YYYY-MM-DDTHH:MM:SSZ representation must be used. Note that this format may differ from the format that is provided in the Apache log file.
Example 2009-07-29T06:15:46Z (08:15:46 at offset +0200, expressed in UTC)
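
A minimal Python sketch of the conversion from the Apache log timestamp to the required representation is given below. It assumes that the trailing Z in YYYY-MM-DDTHH:MM:SSZ denotes UTC, so the local offset from the log file is converted rather than copied; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Layout of the timestamp in the sample Apache log entry,
# e.g. "13/Jul/2009:09:36:43 +0200" (without the square brackets).
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def apache_time_to_iso8601(apache_time):
    """Convert an Apache log timestamp to YYYY-MM-DDTHH:MM:SSZ in UTC."""
    local_time = datetime.strptime(apache_time, APACHE_TIME_FORMAT)
    utc_time = local_time.astimezone(timezone.utc)
    return utc_time.strftime("%Y-%m-%dT%H:%M:%SZ")


request_time = apache_time_to_iso8601("13/Jul/2009:09:36:43 +0200")
```

Note that the %b directive depends on the locale; the sketch assumes English month abbreviations, as they appear in the log file.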


e. Usage event ID

Usage Event ID  
Description Unique identification of the usage event. This identification will be generated; it cannot be derived from the Apache log file.
Usage Mandatory
Format The identifier will consist of a combination of the item identifier, the date, an identification of the institute and a generated number.
Example  


f. From where

Referrer  
Description The environment which has directed the user to the requested object. This usually refers to the search engine which the client has used to find the object.
Usage Optional
Format Standardised list of web browser names
Example  


g. Identity of service

Service Type  
Description The service type specifies whether the full text was requested or only the abstract.
Usage Optional
Format A term taken from the OFI-registered list of service types must be chosen.
http://alcme.oclc.org/openurl/servlet/OAIHandler/extension?verb=GetMetadata&metadataPrefix=xsd&identifier=info:ofi/fmt:xml:xsd:sch_svc
Example Fulltext


[Other aspects]

Hostname  
Description An identification of the repository that has recorded the usage event
Usage Mandatory
Format OAI BaseURL
Example openaccess.leidenuniv.nl


Session Identifier  
Description Identification of the session
Usage Optional
Format A number
Example  



3. OpenURL Context Objects


In compliance with the JISC Usage Statistics Review, individual usage events need to be serialized in XML using the syntax that is specified in the OpenURL Context Objects schema. The XML Schema for XML Context Objects can be accessed at http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx. This section describes a recommended practice for the use of this schema.
The root element of the XML document must be <context-objects>. It must contain a reference to the official schema and declare the namespace xmlns:ctx="info:ofi/fmt:xml:xsd:ctx".
Each usage event must be described in a separate <context-object> element, which must appear as a direct child of <context-objects>. Two attributes must be used:

  • The time and date on which the usage event took place must be recorded in a timestamp attribute.
  • The identification of the usage event must be captured in an attribute with the name identifier.


Within <context-object>, a number of elements can be used to describe the context of the event. The names of these elements are as follows:

  • <Requester>: the agent that has initiated the usage event
  • <Referent>: the object that was downloaded or viewed
  • <ServiceType>: the type of service that was requested
  • <Referrer>: the environment that has forwarded the reader to the downloaded object
  • <Resolver>: the institution that provides access to the requested item and which has received the usage event.


Information about these contextual entities can be given in four different ways. Firstly, they can be characterised using an <identifier>. Secondly, metadata can be included literally by wrapping them in a <metadata-by-val> element. Thirdly, a reference to metadata stored elsewhere can be included by using <metadata-by-ref>. A fourth method is the use of the element <private-data>. In the SURE Statistics project, only the first two methods shall be used. Listing 1 is an example of a full OpenURL Context Object document.

Listing 1


  • The <Referent> must be described using a globally unique identifier. The identifier must be given in the <identifier> element.
  • The <Requester>, the agent who has requested the <Referent>, must be identified by providing the C-class subnet. This number must be given in an <identifier> element. In addition, the name of the country where the request was initiated must be provided. The <metadata-by-val> element must be used for this purpose. The country must be given in <dcterms:spatial>.
  • The recommended practice is to use an <identifier> for the institution that provides access to the object within <Resolver>.
  • The <Referrer> is the browser that was used by the agent. It must be provided in an <identifier>.
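
The rules above can be illustrated with a short Python sketch that serializes usage events with the standard library. It is not the project's reference serialization. Several points are assumptions made for this example: the element names are written in the lowercase form (referent, requester, resolver, referrer) believed to be used by the ctx schema and should be checked against it; <dcterms:spatial> is placed directly inside <metadata-by-val> as a simplification; and the event identifier and country value are invented placeholders.

```python
import xml.etree.ElementTree as ET

CTX_NS = "info:ofi/fmt:xml:xsd:ctx"
DCTERMS_NS = "http://purl.org/dc/terms/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_URL = "http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx"

ET.register_namespace("ctx", CTX_NS)
ET.register_namespace("dcterms", DCTERMS_NS)
ET.register_namespace("xsi", XSI_NS)


def _ctx(tag):
    """Qualify a tag name with the ctx namespace."""
    return "{%s}%s" % (CTX_NS, tag)


def _entity(parent, name, identifier):
    """Add a contextual entity described by a single <identifier>."""
    entity = ET.SubElement(parent, _ctx(name))
    ET.SubElement(entity, _ctx("identifier")).text = identifier
    return entity


def build_context_objects(events):
    """Serialize a list of usage-event dicts to an XML string."""
    root = ET.Element(_ctx("context-objects"))
    root.set("{%s}schemaLocation" % XSI_NS, CTX_NS + " " + SCHEMA_URL)
    for event in events:
        obj = ET.SubElement(root, _ctx("context-object"), {
            "timestamp": event["timestamp"],
            "identifier": event["event_id"],
        })
        _entity(obj, "referent", event["document_id"])
        requester = _entity(obj, "requester", event["subnet"])
        by_val = ET.SubElement(requester, _ctx("metadata-by-val"))
        ET.SubElement(by_val, "{%s}spatial" % DCTERMS_NS).text = event["country"]
        _entity(obj, "resolver", event["resolver"])
        if event.get("referrer"):  # optional element
            _entity(obj, "referrer", event["referrer"])
    return ET.tostring(root, encoding="unicode")


EXAMPLE_EVENT = {
    "timestamp": "2009-07-29T06:15:46Z",
    "event_id": "example-event-0001",  # hypothetical identifier
    "document_id": "https://openaccess.leidenuniv.nl/dspace/handle/1887/584",
    "subnet": "132.229.202",
    "country": "NL",  # hypothetical value
    "resolver": "openaccess.leidenuniv.nl",
}
xml_text = build_context_objects([EXAMPLE_EVENT])
```

Each event becomes one <context-object> child of the root, with the timestamp and identifier carried as attributes.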


4. Standardized Usage Statistics Harvesting Initiative (SUSHI)


SUSHI (http://www.niso.org/schemas/sushi/) was developed by NISO (National Information Standards Organization) in cooperation with COUNTER. SUSHI enables parties to harvest usage statistics. It is very simple, as it works with only two types of messages: requests and responses. The protocol was originally developed for the exchange of COUNTER reports, but other types of reports can be retrieved as well. However, the standard does require that its requirements for report naming are adhered to. SUSHI is based on SOAP. The services that can be offered by the Web Service are described in a WSDL document, which can be accessed at http://www.niso.org/schemas/sushi/counter_sushi2_5.wsdl.
In the infrastructure to be built in this project, the log aggregator will bear the primary responsibility for obtaining the statistical data from individual repositories. Once every 24 hours, it will send a request to each repository for the usage events that have occurred on that particular day.
In SUSHI version 1.0, the following information must be sent along with the request:

  • Requestor ID
  • CustomerReference ID (may be identical to the Requestor ID)
  • Name of the report that is requested
  • Version number of the report
  • Start and end date of the report
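
The request parameters listed above can be modelled as a small data structure. The Python sketch below does not produce the SOAP message itself; it only captures the five items of information and the daily harvesting window. The class name, field names and the report name used in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SushiReportRequest:
    """The information that accompanies a SUSHI request (names assumed)."""
    requestor_id: str
    customer_reference_id: str
    report_name: str
    report_version: str
    start_date: date
    end_date: date

    def __post_init__(self):
        if self.end_date < self.start_date:
            raise ValueError("end_date must not precede start_date")


def daily_request(requestor_id, report_name, report_version, day):
    """Build the request for the usage events of one particular day.

    The CustomerReference ID is set to the Requestor ID, which the
    list above permits.
    """
    return SushiReportRequest(
        requestor_id=requestor_id,
        customer_reference_id=requestor_id,
        report_name=report_name,
        report_version=report_version,
        start_date=day,
        end_date=day,
    )


request = daily_request("aggregator-1", "EXAMPLE_REPORT", "1", date(2009, 7, 29))
```

A log aggregator would build one such request per repository per day and hand it to whatever SOAP client it uses.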


This request will activate a special tool that can inspect the server logs and return the requested data. These data are transferred as OpenURL Context Object log entries, as part of a SUSHI response.
The response must contain the following information:

  • Requestor ID (copied from the request)
  • Name of the report, version number and dates (copied from the request)
  • The requested report as XML payload
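
A log aggregator may want to verify that a response matches the request it sent. The following Python sketch shows one possible check on plain dictionaries. The key names are hypothetical, and the report period is assumed to be among the fields that are copied from the request.

```python
# Fields that the response is expected to copy from the request
# (key names are assumptions made for this sketch).
ECHOED_FIELDS = (
    "requestor_id",
    "report_name",
    "report_version",
    "start_date",
    "end_date",
)


def response_matches_request(request, response):
    """Return True if the response echoes the request and has a payload."""
    echoed = all(response.get(key) == request[key] for key in ECHOED_FIELDS)
    has_payload = bool(response.get("report_xml"))
    return echoed and has_payload


example_request = {
    "requestor_id": "aggregator-1",
    "report_name": "EXAMPLE_REPORT",
    "report_version": "1",
    "start_date": "2009-07-29",
    "end_date": "2009-07-29",
}
example_response = dict(example_request, report_xml="<context-objects/>")
is_valid = response_matches_request(example_request, example_response)
```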


The usage data are subsequently stored in a central database. External parties can obtain information about the contents of this central database through specially developed web services. The log harvester can also expose these data in the form of COUNTER-compliant reports.

Listing 2 is an example of a SUSHI request, sent from the log aggregator to a repository.

Listing 2

Listing 3 is an example of a SUSHI response, sent from a repository to the log aggregator.

Listing 3

  1. nov 05, 2009

    Maurice Vanderfeesten says:

    from thomas: related material from the NEEO project

    http://homepages.ulb.ac.be/~bpauwels/NEEO/WP5/WP5 Usage metadata guidelines.pdf
