Tuesday, July 31, 2012

Searching for review articles, literature reviews and more in Summon & Google Scholar

I find students working on theses and dissertations love to look for past theses and dissertations, because they instinctively know the literature review chapter is a virtual goldmine, containing a wealth of resources that allows them to jump-start their research.

What they might not know about is the existence of survey papers, review papers/articles etc., which are often peer-reviewed articles but contain no original research beyond a review and perhaps a critique of the state-of-the-art research in a certain area.

Somewhat related are bibliographies, meta-analyses and systematic reviews. The last two, in particular systematic reviews, are well known in the medical field of course. But let's leave out the medical field for now.

The question is how does one find such papers? Besides looking at publications like Annual Reviews Publications, there are generally two approaches to searching for them.

1. By Facet/Subject heading control

Firstly, some databases have facets that allow you to select for them. So do a search for the keyword, then refine down into them. Some examples:

i) Document type : Reviews - in Web of Science
ii) Document type : Review - in Scopus
iii) Document type : Literature Review - in Ebscohost
iv) Publication type : Meta-analysis/reviews/systematic reviews - in Pubmed (but more on this later)

Document type facet in Web of Science

It is unclear to me how effective these facets are in terms of precision and recall for finding such articles, but I suppose it depends on the quality of the indexing, and I assume these are controlled terms.

This approach is also available in Summon, but it is a lot less effective. You occasionally see the subject terms facet appear with values like:
  • literature review 
  • meta-analysis
  • metaanalysis
  • review
  • reviews
  • bibliographie
  • bibliography
  • systematic review
Unfortunately you can't count on it appearing, for two reasons. Firstly, by default Summon lists (I believe) only the top 100 most common subject terms in the results, so if you have a lot of results, such subject terms won't appear because they aren't numerous enough.

You can get around this problem by forcing a search in Summon with field searching on subject terms plus the keyword you want. In Summon syntax you do subjectterms:("xxxxx"), where xxxxx is the subject term, to match an item with that subject term.

So for example you could do

"Class size" AND (subjectterms:("literature review") OR subjectterms:("bibliographie") OR subjectterms:("bibliography") OR subjectterms:("meta-analysis") OR subjectterms:("metaanalysis") OR subjectterms:("review") OR subjectterms:("reviews") OR subjectterms:("systematic review") OR subjectterms:("bibliographic literature") OR subjectterms:("bibliographical literature"))

and find articles with the phrase "class size" (or perhaps restricting this just to the title might be even more relevant) which have any of the subject terms listed. Try this example search using Princeton's Summon.
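Such a long OR chain is tedious to type by hand each time you switch topics. Here is a minimal sketch of assembling it programmatically; the subjectterms:("...") syntax is Summon's own, but the helper function and its name are my own illustration:

```python
# Assemble a Summon query that ANDs a topic phrase with an OR-chain of
# review-related subject terms (the terms listed in the post).
REVIEW_SUBJECT_TERMS = [
    "literature review", "bibliographie", "bibliography",
    "meta-analysis", "metaanalysis", "review", "reviews",
    "systematic review", "bibliographic literature",
    "bibliographical literature",
]

def summon_review_query(topic: str) -> str:
    """Build '"<topic>" AND (subjectterms:("a") OR subjectterms:("b") ...)'."""
    clauses = " OR ".join(f'subjectterms:("{t}")' for t in REVIEW_SUBJECT_TERMS)
    return f'"{topic}" AND ({clauses})'

print(summon_review_query("Class size"))
```

Swapping in a different topic, or adding discipline-specific subject terms to the list, then only takes one change.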

A minor weakness of this search is that you will notice items with "book review" subject terms appearing, but these can be easily excluded after the search using the facet.

The main weakness of this approach was already hinted at by the fact that you had to search for both meta-analysis and metaanalysis as subject terms. The reason is that the subject term facet is not controlled, as it comes from items drawn from varying sources with different standards; I have even seen examples where journal articles do not have any subject terms at all.

EDIT: It seems that in Summon, if you do subjectterms:("library"), you don't get an exact search for items whose subject term is exactly library; you can also get items with subject terms like Science library or library & librarians. This is different from using the widget builder from Serials Solutions, which gives you exact matches. But unfortunately the widget builder only lists a small subset of the possible subject terms.

That said, at least Summon has a subject term facet; you can't do this in Google Scholar even if you wanted to. So what can you do instead, if such an option is not available and/or you don't trust the metadata you are refining on?

2. Matching phrases in titles, abstracts, and other important parts of the journal article

If you look through many review articles you will notice a pattern: they tend to have very similar titles, something along the lines of

  • XXXX, a review of the literature
  • A survey of the literature on XXXX
  • XXXX, a survey of the literature

and many more.

The obvious idea here is to generate a complicated search strategy to match as many such titles as possible.

For example here's a fairly complicated string I am toying with to match in the title field in Summon or Google Scholar.

((literature AND (survey OR review)) OR "systematic review" OR meta-analysis OR meta-analytical OR "a review" OR "a survey" OR "a mini review" OR "a brief review" OR bibliography OR bibliographic)

| Phrase | To match | Comment |
| --- | --- | --- |
| literature review (no quotes) | a literature review, a review of the literature | Probably most accurate |
| "Systematic review" | systematic review | Not needed for non-medical? |
| meta-analysis OR meta-analytical | meta-analytical analysis | Hyphens may make a difference; use * if it works |
| "A review" | A review | May lead to false positives of the nature "A review of..."; some databases allow you to filter |
| "A survey" | A survey | May lead to false positives of a real survey, not a survey of the literature |
| literature survey (no quotes) | survey of literature | |
| "A brief review" or "A mini review" | | |
| bibliography OR bibliographic | bibliographic survey, a select bibliography | |
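The title patterns in the table above can be approximated with a regular expression, which is handy for checking whether a given title looks like a review article. This is my own rough sketch, not a tested strategy; the alternatives would need tuning per discipline:

```python
import re

# Regex approximating the review-title patterns discussed above
# (illustrative only; expect both false positives and false negatives).
REVIEW_TITLE_RE = re.compile(
    r"(literature\s+(survey|review)"          # "literature review/survey"
    r"|(survey|review)\s+of\s+the\s+literature"
    r"|systematic\s+review"
    r"|meta-?analy(sis|tic(al)?)"             # meta-analysis / metaanalytical
    r"|\ba\s+(brief\s+|mini\s+)?review\b"     # "a review", "a brief review"
    r"|\ba\s+survey\b"
    r"|bibliograph(y|ic))",
    re.IGNORECASE,
)

def looks_like_review(title: str) -> bool:
    """Heuristic: does this title match a common review-article pattern?"""
    return bool(REVIEW_TITLE_RE.search(title))

print(looks_like_review("Class size: a review of the literature"))  # True
print(looks_like_review("Effects of class size on test scores"))    # False
```

The same caveats as the table apply: "a survey" will happily match titles of papers reporting actual surveys.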


Here's a sample search in Summon

And some search results

The phrase meta-analysis is carrying a lot of the load due to the nature of this topic, but further down you can see other matches.

I tried it with other topics and it seemed reasonably robust; in my subjective view it outperformed similar searches by facet refinement. In fact the equivalent search strategy in Scopus, in my view, gave better results in terms of recall than using the document type: review facet in Scopus. More testing is needed, of course.

A similar search in Google Scholar using intitle:(xxxx) to match phrases in the title works well too.

I was feeling somewhat pleased with myself until, while working through a PubMed tutorial, I coincidentally stumbled upon the fact that such strategies are old hat in the medical community.

Correct me if I am wrong, any medical librarians reading this, but while PubMed has publication types for meta-analysis and reviews, it doesn't actually assign a publication type to systematic reviews.

This is very counter-intuitive for many reasons, one of which is that one of the filters under Article types is indeed systematic reviews; so if the rest are publication types, shouldn't that be one as well?

But a close reading shows that while almost all the values in the Article Types filter are publication types, systematic review itself is not.

Here's what it says about Article Types.

In fact, this filter (which you can access via clinical query as well) is a search strategy.

Surprising, isn't it? We are talking about PubMed here, where Medline articles are carefully indexed using the most specific MeSH headings and searchers do auto-exploding MeSH heading searches.

But it seems that while "Review" and "Meta-Analysis" are directly indexed/assigned as publication types, systematic review isn't.

In short, when you select Systematic review in the limiter, it's doing XXXXX AND systematic[sb].

For those not into PubMed, this is a subset [sb] search for systematic reviews, where the subset is a predefined saved search strategy.
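You can see how this composes by building a query against NCBI's E-utilities ESearch endpoint (a real, documented API); the sketch below only constructs the URL rather than sending the request, and the helper function name is my own:

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint for programmatic PubMed searching.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_systematic_query(topic: str) -> str:
    """Compose '<topic> AND systematic[sb]' as an ESearch URL,
    mirroring what the Systematic Reviews filter does."""
    params = {"db": "pubmed", "term": f"{topic} AND systematic[sb]"}
    return f"{ESEARCH}?{urlencode(params)}"

print(pubmed_systematic_query("class size"))
```

The point is that systematic[sb] stands in for the whole predefined saved strategy, so any search can be intersected with it in one clause.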

Very impressive indeed, makes my search string above look like child's play.

Further digging shows that people actually write papers on figuring out the best search strategy to pull out such papers, measuring both precision and recall of each search strategy against a certain known test bed.

Pubmed credits the current search strategy mostly to this paper

Taking advantage of the explosion of systematic reviews: an efficient MEDLINE search strategy. Eff Clin Pract. 2001 Jul-Aug;4(4):157-62. [PMID: 11525102].

It's a free paper, worth a read. I couldn't locate any non-medical papers devoted to this area, i.e. finding survey/review articles/papers, though I think some might exist. (Is there a review article on this topic?)

To be fair my search strategy is a lot simpler than PubMed's because it was designed for Summon and Google Scholar.

The PubMed search above has the advantage of matching on very fine-grained fields, not just the title [ti], including:

This allows all kinds of sophisticated search strategies to maximize recall and precision. With Google Scholar, all you can do is match on the title.

Summon is a little better: you can match on subject terms (as already seen above, or by ORing that with matches in the title) and increase precision by excluding content via facets, such as book reviews in content type (which tend to appear as false positives).

But you can't restrict matches to just the abstract, for example, so there is less room for sophisticated searches.


I wonder if it is worth doing such complicated searches, as Summon and Google Scholar really aren't meant for such complicated logic, and weird results may occur.

It is also unlikely users will bother with such complicated searches.

Personally I like PubMed's idea of using a saved search strategy as a filter/facet, so perhaps Summon could implement something similar: a filter that, when clicked, applies the saved search strategy as a subset against the current search.

So perhaps a new content type, or an option under "Refine", could be added that does this automatically.

The challenge, of course, is that you need to do a lot of testing to come up with a robust search strategy that works reasonably well across all fields, a taller order than for just one field, but arguably even a half-working one could be worth having.

Also, this strategy only works if there are such articles to be found; pair it with too broad or too specific a keyword and you will end up with nothing relevant. You need to search at the right granularity of detail, where a survey/review article is likely to exist.

Beyond even that, we can look at Primo's ScholarRank technology, which is supposed to be able to tell the difference between broad and narrow topic searches, and for the former tries to rank review articles higher. This makes sense, since if you are new to a field you are likely to benefit from such articles.

Tuesday, July 24, 2012

Sending updates via FourSquare claimed library venues - location based alerts

It's finally here.

FourSquare launches Local Updates from businesses .

"Foursquare Local Updates let merchants send text, photos, and specials to customers who have either checked into a business several times or liked it on Foursquare" (more)

Here's how it looks.


From my institution's Foursquare business page

As we have claimed 6 NUS library venues, we can now send updates to "loyal customers", and such updates will appear in their friends tab, which typically tells them what their friends are doing and where they are.

It's unclear what "Customers will see your updates when they're close to your locations" means above, but this says

"Starting today, you’ll start seeing updates in your friends tab from the places where you’re a loyal customer. It’s an easy way to keep up with news from places you frequent, including things like new specials, pictures of the latest shipment of shoes, or a serendipitous food truck appearance. The best part is there’s no extra work for you to do: we already know you care about a place if you’ve checked in often or liked it, and will show you updates from it when you’re in the same city." (emphasis mine)

So close to your locations = same city?

If so, this effectively turns your claimed FourSquare venue into a Facebook page (for loyal customers within the city, which is effectively the whole country for me).

We have been running active FourSquare specials campaigns for our library, as seen in our IATUL2012 presentation Check-ins... Not Just for Books! - NUS Libraries' Experience with Foursquare, and this seems to have paid off: as seen above, we have 1,731 customers we can send updates to!

We have more Facebook fans, but it is still a nice bonus.

In addition, updates you send will also appear after people check-in to your claimed venues.

I feel the ability to show updates after the user checks in is perhaps even more effective, since you can push news to users when they are already in the library, maximizing the relevancy of what you are sending.

With the new term starting soon, you can imagine all sorts of possibilities opening up for marketing...
For example, let's say you have longer or shorter opening hours than normal today. Post the update on your FourSquare venue, and when users check in they will automatically be reminded!

There are plenty of other possibilities, including events happening later in the day, but a major use is to advertise FourSquare specials.
Tips & Thoughts

I haven't tested this much yet, but it seems the image or text needs to do the bulk of the work, since the URL posted isn't currently hotlinked.

Also, FourSquare doesn't keep track of whether you have seen an update before, so if you check in once, see the update, and check in again, say, the next day, it will still show you the same update unless, presumably, a more recent update has been posted. Not a big deal though, since it isn't that intrusive.

It would be nice if FourSquare used FourSquare Radar or geofencing to automatically send notifications when you are within range, something like how location alerts work in the built-in Reminders app on the iPhone. But this might be considered too intrusive.

An even more interesting idea, though probably requiring a specialised app, would be a library app that is aware of the loan periods of your books and notifies you whenever the due date is close and you are within range of a library or book drop!

PS: For more details see the FAQ on Local Updates, in particular "Who gets my Local Updates?" and "Where do people see my Local Updates?"

Wednesday, July 18, 2012

How big are the indexes of web scale discovery services? How do they affect searching?

With the rise of web scale discovery services like Summon, Ebsco Discovery Service, WorldCat Local and Primo Central, librarians have begun to assess how to teach searching and information literacy differently.

As I blog this, there is an Information Literacy & Summon workshop going on at Sheffield Hallam University.

Librarians are talking about "Going beyond Boolean" and invoking Dave's Law ("users should not have to become mini-librarians in order to use the library"), and one blogger talked about how a library was teaching Boolean operators, phrase searching and truncation despite the library having "one search tool on their page, a discovery system in which almost none of these search strategies will work". (Off topic: while the discovery tools I am familiar with, primarily Summon and Ebsco Discovery Service, don't need Boolean or phrase searching, they *can* work with it.)

In short, we are talking about a Google-like search where users can key in a few keywords and get relevant results.

In an earlier blog post, "How is Google different from traditional Library OPACs & databases?", I discussed some of the differences between Google and traditional library databases/OPACs. Some of the differences, like ranking by relevance and "implied AND", have filtered down to most library databases and OPACs and are no longer a distinguishing factor.

Other features, like autostemming (for databases) and covering full text, are slowly gaining ground in library systems.

As noted in a comment on that post, I forgot to mention the biggest obvious difference between Google and traditional library databases/OPACs: the huge size of the index!

A huge index, magnitudes larger than traditional databases, coupled with full-text searching means a more forgiving search. It also means throwing away traditional stop words like "the" becomes a productive strategy.

Currently the only library systems that come even close to fulfilling the two criteria of a huge index and full-text searching are the web scale discovery services. But how huge is huge?

I am most familiar with Summon, so I will discuss that from here on. 

What I did was to again use lib-web-cats to pull out the ARL libraries using Summon. One or two may have been in testing, or perhaps the information was outdated and I was unable to find Summon. But I ended up with 21 university libraries.

The nice thing about Summon is that you can do a "blank search" to search everything, e.g. here's one for Duke University Library.

By doing such searches for all 21 libraries and looking at the facet counts, one can quickly figure out how much is in the index in total and in each category.
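The tallying itself is simple spreadsheet-style arithmetic; a minimal sketch (the library names and figures below are made-up placeholders, not the actual data, which came from the 21 real Summon instances):

```python
# Facet counts from blank searches, in millions of items per library
# (hypothetical illustrative figures).
facet_counts = {
    "Library A": {"Newspaper Article": 272, "Journal Article": 83, "Book": 12},
    "Library B": {"Newspaper Article": 150, "Journal Article": 70, "Book": 9},
}

# Total index size per library, and the average across libraries.
totals = {lib: sum(counts.values()) for lib, counts in facet_counts.items()}
average = sum(totals.values()) / len(totals)

print(totals)   # {'Library A': 367, 'Library B': 229}
print(average)  # 298.0
```

The same dictionary-of-facets shape also makes it easy to recompute averages with a category (say, newspaper articles) stripped out, as done later in this post.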

For example, in the blank search above, you can see the library has 389 million items, of which 272 million are newspaper articles and 83 million are journal articles, etc.

So how much do the typical ARL libraries have in Summon?

As seen above, at the time I did this last week, Princeton University had the biggest index with 392 million items, the smallest was the University of Nevada, Las Vegas with 157 million, and the average was 300 million.

How big is 300 million? Given that even the largest university library probably has no more than 10 million unique titles, the typical Summon index at an ARL library is at least 30 times the size of the largest catalogue. (The average ARL library seems to have on average 60 times as much in Summon as in its own catalogue.) Also consider the fact that a large amount (unknown how much) is full-text searchable.

A large proportion of most Summon libraries' indexes consists of newspaper articles (for obvious reasons); removing those, we see the ARL libraries start to look similar, averaging 100 million.

How about if one were just interested in journal articles? The average seems to be 72 million. Compare this with Scopus, which claims 40+ million (SciVerse Hub, with ScienceDirect's 10 million + Scopus + third-party data, is much larger and is, I believe, in the class of a web scale discovery service). Again, consider almost double the size plus full-text search...

And finally, if we just look at items in peer-reviewed publications, which amounts to mostly peer-reviewed articles:

Summon knows of 54 million peer-reviewed articles; most libraries are on par at about 45 million.

See the Google doc for more raw data; perhaps one could analyse it by discipline in the future.

While Summon is huge, one suspects it is still smaller than Google, or Google Scholar if you think Google is an unfair comparison, but there is no way to tell except by estimating using search terms, which I won't do.

A few qualifiers about the data above. The first thing to note is that such statistics are outdated the moment they are obtained, due to the addition of packages; and even if no new packages were added, newspapers and journals would still gain new items.

Neither are the figures a 100% representation of what each library has: some might be more complete in terms of putting the packages they subscribe to into Summon, and even if this weren't a factor, what Summon shows is limited to what is indexed in Summon.

The last point probably explains why the libraries all bunch up in terms of the number of peer-reviewed articles. The biggest library possibly has a much bigger set, but due to the "limited" number of articles indexed in Summon, the advantage doesn't show as clearly.

Of course, arguably, if an item can't be found in the discovery tool in an environment where the discovery tool is the default, the effective accessibility for most users is indeed just the amount indexed in Summon.

That said, in this enlightened day and age, we are supposed to move away from input measures and even output measures, and should try to measure outcomes if not impact, so it seems totally archaic to start benchmarking on the size of your discovery tool index; perhaps the individual numbers are not important.

Size is important though, as arguably the search experience changes quite a bit when you scale up from searching a few million items (either full text or just metadata) to hundreds of millions (full text).

(As an aside, I have heard of the 'new' growing field of digital humanities; if more and more full text of journals and books is indexed in Summon, I wonder if one could use it to run all sorts of analyses?)

Different search strategies need to come into play, and perhaps this explains the resistance of some librarians to web scale discovery tools: despite the fact that superficially they look like databases such as JSTOR, they don't quite behave like them...

I speculate, of course. For those of you who have some experience with such tools, how do they feel to you? Like a typical library database? Or do they feel like a different category altogether, such that your usual instincts start to feel odd?

Actually the answer is simple: they feel like Google! Typically you get a list of results that you are unlikely to exhaust reading, so you just look through the first few until you start to get a string of irrelevant results, then you stop.

That I feel is quite a different mind-set from searching a typical A&I database or even a smaller full-text database.

As we just launched Summon here, I don't have that much experience yet, but in future posts I will talk about what I mean.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.