Sunday, February 5, 2017

4 different ways of measuring library eresource usage

How does one measure library eresource usage? This is a question I've bumped into numerous times recently in the course of my work, whether it be trying to do correlation studies between student success and electronic resource usage, choosing the right metric for the library dashboard or, more mundanely, just evaluating a database for subscription.

My way of looking at it is twofold.

Firstly, you can classify metrics by source, that is, where you get the data from. Secondly, you can classify them by the type of usage metric.

For many electronic resource librarians, the main source of electronic resource usage statistics is the publishers, whose statistics are usually, but not always, COUNTER compliant.

But that's not the only possible source. A secondary source of electronic resource usage, perhaps less commonly used, is the library's own systems, which typically means EZproxy (or perhaps OpenAthens) logs.

Of the usage statistics that you can derive from these two sources, I divide them into 2 main types of statistics, download based and non-download (session) based.

This creates a 2x2 grid of possible statistics.



My thoughts on the strengths and weaknesses of the 4 types of electronic usage metrics, and when you should use them, are as follows.


Type (1) - Publisher based download metrics

This is probably the most common type of usage metric. For most big journal based publishers, you will typically get standardised COUNTER compliant statistics (up to Release 4 now). While there are many different types of COUNTER reports, the most widely used are JR1 and BR1, and perhaps BR2.

JR1 - Number of Successful Full-Text Article Requests by Month and Journal
BR1 - Number of Successful Title Requests by Month and Title
BR2 - Number of Successful Section Requests by Month and Title

There are others like Multimedia Report 1 (basically JR1 for multimedia) and more complicated ones like "Title Report 1 Mobile", but these are rarely known to most librarians.

JR1, BR1 and BR2 are easy for anyone to understand and basically tell you how many times the journal article/book title/book chapter was downloaded.

Pros : Easy to understand; after all, a download is a download! Heavily used to calculate cost per download for decisions on renewals. JR1 and BR1 are pretty much industry standard and almost always comparable across vendors if they implement COUNTER statistics.
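To make the cost per download idea concrete, here is a minimal sketch. The function name, the subscription cost and the monthly JR1 counts are all made up for illustration; they are not figures from any real report.

```python
def cost_per_download(annual_cost, monthly_downloads):
    """Divide the annual subscription cost by the total downloads for the year.

    Returns None when there were no downloads, since the ratio is undefined.
    """
    total = sum(monthly_downloads)
    if total == 0:
        return None
    return annual_cost / total


# Hypothetical JR1 monthly totals for one journal package (illustration only).
jr1_monthly = [120, 95, 140, 80, 60, 30, 25, 40, 150, 160, 130, 70]

# Hypothetical annual cost of 5500 in whatever currency you pay in.
print(cost_per_download(5500.0, jr1_monthly))
```

Returning None rather than zero for an unused resource is a design choice: a resource with no downloads is infinitely expensive per download, not free, so it should be flagged rather than sorted to the top of a "best value" list.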

Cons : While journal based platforms are mostly COUNTER compliant, many resources are not (e.g. many law and finance/business databases).

Many non-traditional types of resources that don't serve up journal articles or books don't adapt well to the concept of downloads. The most obvious examples are A&I databases, or databases that have a variety of different types of content.

A bigger issue is that COUNTER statistics a) only provide monthly reports and b) only show total counts.

As a result, if you are doing correlation type studies where you correlate, say, student GPAs with electronic resource use, COUNTER statistics can't be used as you can't relate usage to individuals.

Firstly, using only COUNTER statistics would mean you couldn't track usage of a lot of non-COUNTER resources. More seriously, JR1 and BR1 are not appropriate as you can't do any granular analysis by discipline, much less by individual. Even tracking the time of heaviest use (beyond the month) is impossible.



Type (2) - Publisher based non-download metrics

COUNTER includes other statistics that don't count "successful requests" (AKA downloads). These include, among others:

Journal Report 4 -  Total Searches Run By Month and Collection
Database Report 1 - Total Searches, Result Clicks and Record Views by Month and Database
Book Report 5 -  Total Searches by Month and Title

These are what I call "non download based". They count the number of searches made or views. Some people are of the view that these are of lesser value than downloads, since one can search or view a lot but still not gain any value. Of course, a possible counterargument is that even a download might still turn out to be useless when read.

Still, they share the advantages of other COUNTER statistics in that they are standardised and theoretically comparable across publishers. Of course, they also share the same issues: many content providers are not COUNTER compliant and, in particular, you cannot drill down beyond monthly data.



Type (4) - Log based non-download metrics

The point is, do you 100% trust what publishers tell you? What if you want to double check? The main way to do so would be to analyse your EZproxy logs.

This is done a lot less often in my experience because of the size and complexity of EZproxy logs. As such, the simplest way most libraries deal with this is to count "sessions". This can be done fairly easily using various methods.

For those who are unaware, when you start an EZproxy session, a session ID is created, logged and stored in your browser cookie. The session continues until you time out, sign out or close the browser.

As such, to measure usage of, say, Scopus, one can just count the number of unique sessions in which there is a request for scopus.com. In the same way, one can count unique sessions for each database or journal of interest.

In practice one just counts unique sessions per domain, though this can get complicated when deciding whether to count subdomains together, or where a content provider has multiple domains. How much extra work you want to do here is up to you.
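As a rough sketch of counting unique sessions per domain: the log layout below (IP, session ID, username, timestamp, request, status, bytes) is an assumption, since EZproxy log formats are configurable and your own logs may well differ. The sample lines, the function names and the "keep the last two labels" rule for merging subdomains are likewise illustrative only.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed layout (adjust to your own log format):
# ip session_id username [timestamp] "METHOD url PROTOCOL" status bytes
LINE_RE = re.compile(
    r'^(?P<ip>\S+) (?P<session>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def registered_domain(host, parts=2):
    """Naively keep the last `parts` labels, so www.scopus.com becomes scopus.com.

    This is wrong for suffixes such as .co.uk or .edu.sg; handle those separately.
    """
    return ".".join(host.lower().split(".")[-parts:])


def sessions_per_domain(lines):
    """Count distinct session IDs seen for each (naively merged) domain."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("session") == "-":
            continue  # skip unparseable lines and requests with no session
        host = urlsplit(match.group("url")).hostname
        if host:
            seen[registered_domain(host)].add(match.group("session"))
    return {domain: len(ids) for domain, ids in seen.items()}


# Made-up sample lines in the assumed layout.
sample = [
    '10.0.0.1 abc123 alice [05/Feb/2017:10:00:00 +0800] "GET https://www.scopus.com:443/search HTTP/1.1" 200 512',
    '10.0.0.1 abc123 alice [05/Feb/2017:10:01:00 +0800] "GET https://api.scopus.com:443/results HTTP/1.1" 200 900',
    '10.0.0.2 def456 bob [05/Feb/2017:11:00:00 +0800] "GET https://www.scopus.com:443/home HTTP/1.1" 200 300',
    '10.0.0.2 def456 bob [05/Feb/2017:11:05:00 +0800] "GET https://search.proquest.com:443/docview/1 HTTP/1.1" 200 700',
    '10.0.0.3 - - [05/Feb/2017:12:00:00 +0800] "GET https://www.scopus.com:443/home HTTP/1.1" 200 300',
]
print(sessions_per_domain(sample))
```

Using a set per domain means repeated requests within the same session are only counted once, which is the whole point of a session based metric.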

The main advantages of using this method are that a) it works for pretty much all types of resources (including those that don't have "downloads" or aren't COUNTER compliant) as long as they are accessed over EZproxy, b) it's fairly simple to get technically and fairly comparable, and most importantly c) if you set up your EZproxy config properly you can uniquely identify the individual using it (e.g. by NT login/email).

You can then link up the email with the data sitting in your library management system and you have access to a rich source of data on who is accessing your eresources, what they are accessing and when.

The main disadvantage is that for many libraries not all traffic is channelled via EZproxy, particularly on campus. In my current and former institutions, all traffic is required to go through the proxy even on campus, so this isn't an issue.

I'm not too familiar with OpenAthens type systems, but I understand those by default make it trivial to calculate sessions by user and date/time since those are already recorded in the logs, but make it hard to go further and study downloads and other details recorded by EZproxy. I could be wrong, though.
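A minimal sketch of that linking step follows. It assumes you already have (username, session ID, domain) records from your logs, however you extracted them, and a lookup from username to school exported from your library management system. Both structures, the names and the sample values are hypothetical.

```python
from collections import defaultdict


def sessions_by_group(records, patron_school):
    """Count distinct sessions per (school, domain) pair.

    records: iterable of (username, session_id, domain) tuples.
    patron_school: dict mapping username to school; unknown users are
    grouped under "Unknown" rather than dropped.
    """
    seen = defaultdict(set)
    for user, session_id, domain in records:
        school = patron_school.get(user, "Unknown")
        seen[(school, domain)].add(session_id)
    return {key: len(ids) for key, ids in seen.items()}


# Made-up records and patron data for illustration.
records = [
    ("alice", "abc123", "scopus.com"),
    ("alice", "abc123", "scopus.com"),   # same session, counted once
    ("alice", "ghi789", "scopus.com"),
    ("bob", "def456", "scopus.com"),
    ("carol", "jkl012", "proquest.com"),
]
patron_school = {"alice": "Business", "bob": "Law"}

print(sessions_by_group(records, patron_school))
```

Keeping an explicit "Unknown" bucket makes it visible how much usage cannot be matched to a patron record, which is worth knowing before drawing conclusions by school.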


Type (3) - Log based download metrics

Sessions obtained from ezproxy are well and good, but what if you want downloads to calculate cost per download?

This can be very time consuming, as you need to set up a complicated ruleset to identify from the logs which lines are downloads and which journals or platforms they refer to. This is the main reason why people tend to stick with COUNTER download statistics or other publisher provided download statistics.

This is where the open source ezPAARSE comes in.

I've already referred to this open source software in the past. It's an amazing piece of software that can crunch your logs and spit out the downloads it recognises. It's a community effort, with rulebases being updated constantly.
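To give a feel for why such rulesets get complicated, here is a toy sketch of rule based download detection. The hostnames, URL patterns and platform labels are entirely hypothetical (they use reserved example domains), and this is not how any particular tool implements it; real platforms each need their own carefully tested rules.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules: (host, path pattern, platform label).
# Real platforms use very different and frequently changing URL schemes.
RULES = [
    ("journals.example.org", re.compile(r"^/doi/pdf/"), "Example Journals"),
    ("ebooks.example.com", re.compile(r"^/chapter/\d+/download$"), "Example Ebooks"),
]


def classify_download(url, status):
    """Return the platform label if the request looks like a successful
    full-text download under RULES, otherwise None."""
    if status != 200:
        return None  # only successful requests count as downloads
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for rule_host, path_re, platform in RULES:
        host_matches = host == rule_host or host.endswith("." + rule_host)
        if host_matches and path_re.search(parts.path):
            return platform
    return None


print(classify_download("https://journals.example.org/doi/pdf/10.1000/xyz", 200))
print(classify_download("https://journals.example.org/doi/abs/10.1000/xyz", 200))
```

Even in this toy version, every platform needs its own host and path rule, and an abstract page must be told apart from a PDF, which is where most of the maintenance effort goes.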

It even allows you to create COUNTER like statistics for comparisons!



Once you have obtained the logs crunched out by ezPAARSE, you can then further enrich the data with more user information, in a similar way to (4).

Since the last time I tried it, I've done more bulk processing of our logs. My main learning point is that as good as ezPAARSE is, at least for our set of databases, it is still incapable of identifying a lot of aggregator platforms. It could be my setup, but for example it doesn't seem to identify ProQuest at all for us. Even for platforms it does recognise, like EBSCO, it can't reliably identify journal titles. A lot more testing is needed.

Of course the project is open source and always looking for help in creating new rule sets.


Conclusion

Obviously, what type of statistics you use depends on the use case you have in mind, and there's no reason you can't combine them.

If all you want to do is evaluate the renewal of a specific journal database and it has COUNTER JR1 statistics, that is the obvious thing to use.

But if you need to go down to the level of which schools or types of users use the journal (perhaps for allocating costs), then you would need to use some sort of EZproxy/OpenAthens log based metric.

Another question you need to consider is whether you need to compare across a variety of resources. A correlation study that tries to compare usage of library resources against student grades would obviously need a metric that firstly covers as broad a range of resources as possible and secondly does so in a consistent way.

I've found that counting sessions from EZproxy/OpenAthens generally fits the bill best here. It's still not perfect, since many resources are not tracked this way (including some particularly important ones, like high-end financial databases such as Bloomberg), but that's the best I can do.

For showing data on the dashboard, it is harder to say what would be useful. Perhaps all of them?
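For the correlation study case, the calculation itself is simple once you have per-student session counts. Below is a minimal sketch of a Pearson correlation written out by hand; the function name and the toy numbers are made up for illustration, and a real study would need far more care about confounders, sample size and privacy.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Toy data: sessions per student over a semester and their GPAs (not real).
sessions = [2, 15, 7, 30, 11]
gpas = [2.9, 3.4, 3.0, 3.8, 3.3]

r = pearson(sessions, gpas)
print(round(r, 3))
```

A high value here would only show that usage and grades move together, not that one causes the other.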

Which of the types of electronic usage statistics I have outlined do you use? Are any of them useless to you? Most importantly if you have a library dashboard tracking such statistics, which one do you use?






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.