How do you get access to current and historical research articles if you are not affiliated with a university or large research organization? Our second public service article discusses some useful online research archives.
Most readers of this blog probably keep track of the latest developments in their field through journal subscriptions and memberships in appropriate professional associations. Perhaps some of you even splurge on digital library subscriptions, such as IEEE Xplore or the INFORMS Digital Library, both of which I have found quite useful. In our field (Computer Science), academic researchers are generally conscientious about making their research papers available through their websites.
But researchers in other fields are not always so good about making copies of their papers easily available. Older classic papers (say, for example, Bradley Efron’s 1979 Annals of Statistics paper, “Bootstrap Methods: Another Look at the Jackknife”) are often still worth reading, but are not always easy to find. Where to go?
This is a list of some resources that I’ve discovered over the years. The list isn’t comprehensive, by any means, but I offer it here because you may find it helpful, too. The list, and my opinions, are biased towards research in the mathematical and computer sciences, but many of these resources are potentially useful for any research area, including the humanities.
JSTOR is a digital archive of over one thousand scholarly journals, covering topics in the humanities, social and physical sciences and mathematics. I love JSTOR. It is an incredibly useful resource, containing the full contents of every issue of every journal in their collection up to within 3-5 years of the present time (it’s a moving wall). The collection is full-text searchable. I use JSTOR to find classic papers in Math, Statistics, and Computer Science, as well as more recent papers that have been published in journals that are otherwise not available to me.
Access to JSTOR is available to members of participating institutions, mostly universities, but also many public libraries. I have access to JSTOR free with my San Francisco Public Library card, via the SFPL website. (I believe that any resident of California is eligible for an SFPL library card with proof of California residency; good news if you are in California and your local library doesn’t subscribe.)
As a side note, San Francisco Public Library subscribes to several quite useful digital research services, including FirstSearch, the OED, Encyclopaedia Britannica, and Morningstar. Some of these other services also provide access to selected full-text articles. SFPL also participates in ILL (Interlibrary Loan) and Link+, a similar cross-library loan service. All good reasons to support your local library!
ArXiv is a pre-print server hosted by Cornell, serving pre-prints of papers in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics. Many important researchers use ArXiv to get around the fact that major journal publishers insist on holding the copyright to articles published in their journals. “Pre-prints” haven’t yet been published, and hence the authors are free to distribute them. Fields Medalist Terence Tao regularly distributes his about-to-be-published work through ArXiv.
On the other hand, ArXiv has very open submission policies, so you should be more careful with the papers you find here than you would be with a refereed or curated source, such as JSTOR or PubMed Central (which we will discuss later). ArXiv has, unfortunately, more than its fair share of what Augustus de Morgan used to politely call “paradoxers”. The “Journal Reference” field of the article summaries will generally give you an indication of whether or not the paper is legitimate, in the sense of having been peer-reviewed; but note, for instance, this paper on a polynomial-time algorithm for Traveling Salesman (the decision version of the Traveling Salesman problem is NP-complete, so a polynomial-time algorithm would imply P = NP and would win a Clay Millennium Prize, if true).
Another side note: I’ve linked to the Amazon page on de Morgan’s Budget of Paradoxes because that was the first synopsis I found. The copyright on the book has expired, so if you are actually interested in reading it (it’s fairly funny, in places), you can find the full version on Google Books or Project Gutenberg.
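If you want to check the “Journal Reference” field for a batch of ArXiv papers rather than one at a time, here is a minimal sketch in Python. It assumes you have already fetched an Atom feed from ArXiv’s public query API and that entries with a journal reference carry an `arxiv:journal_ref` element; the sample feed, titles, and journal name below are made up for illustration, and `journal_refs` is my own helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# XML namespaces used in Atom feeds (the arXiv one is assumed from the API docs).
ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"

# Hypothetical sample feed: both entries and the journal name are invented.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>A made-up refereed paper</title>
    <arxiv:journal_ref>Hypothetical J. Example 1 (2000) 1-10</arxiv:journal_ref>
  </entry>
  <entry>
    <title>A made-up unrefereed paper</title>
  </entry>
</feed>"""


def journal_refs(feed_xml):
    """Return (title, journal_ref or None) for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="").strip()
        ref = entry.findtext(ARXIV + "journal_ref")
        results.append((title, ref.strip() if ref else None))
    return results


if __name__ == "__main__":
    for title, ref in journal_refs(SAMPLE_FEED):
        print(title, "|", ref if ref else "no journal reference listed")
```

As with reading the summaries by hand, a journal reference is only a hint: a missing one doesn’t mean the paper is bad, and a present one doesn’t guarantee the result is right.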
CiteSeer was the original search engine and archive for online technical papers; it got me through graduate school, and my first post-PhD position at SRI. I don’t believe that the original CiteSeer system is still active, but its successor, CiteSeerX, is being developed and hosted at Penn State. It concentrates on computer science literature, as did the original. CiteSeerX builds its corpus by webcrawling, so again, the papers it finds are not necessarily refereed. Like its predecessor, CiteSeerX search results include the paper’s abstract, a BibTeX citation, a list of the paper’s references, a pointer to the paper’s original location, and (usually) an archived version of the paper, in case the original link has gone dead. Good stuff.
AccessMyLibrary is a service that pools the periodical resources of several libraries across the United States. Any article in a periodical held by a participating library is available for free download to anyone who holds a library card in any other participating library. I find this service less useful than JSTOR: the holdings are generally newspapers and popular magazines, although there are some journals represented, as well as law and business reviews. The download format strips all of the original formatting from articles, which makes them rather ugly and a bit harder to read. I think you lose the figures, too. Still, it’s free if you have a library card, and it’s a good place to search for an article if you can’t find it anywhere else.
Questia is a for-pay service that claims to have “the world’s largest online collection of books and journal articles in the humanities and social sciences, plus magazine and newspaper articles”. Their collection is full-text searchable and, as they say, “you can read every title cover to cover”. Good luck doing so, though — articles and book chapters are not downloadable. Instead, you have to read them through Questia’s online interface, which is pretty clunky. On the plus side, they allow you to build your own “bookshelves” to collect books and articles that are relevant to you by topic or project. You can bookmark key sections, and highlight key passages. I used Questia when I was involved in research projects with psychology and organizational science aspects. I could get hold of articles or textbooks that I wanted to look at faster than through Interlibrary Loan, and more conveniently than going down to Stanford. The subscription fee at the time was cheaper than a membership to the APA or buying the articles piecemeal from Elsevier, or whoever.
Currently, Questia’s subscription fee is $19.95/month for full library access; you can also subscribe to specific collections (such as Psychology, Literature, or Philosophy) for $9.95 per collection per month.
Another way to find useful literature is to connect with other people out there who share your interests. Mendeley is a tool that allows you to organize your collection of research papers, share it with colleagues, and peruse the collections of other researchers with similar interests. I haven’t used it myself, but a friend of ours who is an active and influential AI researcher recommends it. It’s certainly worth a mention.
PubMed Central is a free digital archive of biomedical and life sciences journal literature, sponsored and managed by the NIH. We don’t do life science research here at Win-Vector, but I’m mentioning PubMed because of this awesome policy by the NIH:
NSF and DoD should institute similar policies, too.
Google Books, Google Scholar
Yes, they’re out there. Personally, I find them less useful than JSTOR or a subscription to (say) IEEE Xplore. Google Scholar generally returns the abstracts of articles at sites that don’t provide open access to the full-text article, such as the website of the journal that published the article, or the website of a restricted research archive, like the ACM. This is useful, in that it tells you that the article exists, but it’s rather frustrating, too. I don’t find Google Scholar to be significantly more helpful than doing a general Google search on the same keywords. On the other hand, some people swear by Google Scholar, so obviously your mileage may vary.
Google Books has a very annoying habit of returning hits on your search terms, then not giving you read access to the page in question. Useless. If you happen to be doing research in an area where older books in the public domain are still of interest (for instance, my amateur interest in folklore and mythology), then Google Books can be quite helpful; of course, that is generally not the case in technical research.
Offline: Your Local University Library
Here in the Bay Area, we are fortunate because the Stanford Library System has generous visitor access policies. The visitors’ policy statement is here; briefly, non-Stanford visitors are allowed 7 courtesy visits per year, with no borrowing privileges. For more visits, you can purchase an access card. I used the Stanford Libraries when my company was down in Mountain View, and I’m grateful for their openness. I don’t think many universities are as generous as Stanford is, but if you are near a university campus, it doesn’t hurt to check. For instance, the University of San Francisco will sell access cards to their library, with or without borrowing privileges, to non-affiliated visitors (it ain’t cheap), and allows practicing California attorneys access to their Law Library. San Francisco State has a Friends of the Library program, whereby non-affiliated visitors have access and borrowing privileges to the CSUSF library collection for $45/year.
And there you have it. Research away!
Data scientist with Win Vector LLC. I also dance, read ghost stories and folklore, and sometimes blog about it all.