To keep pace with information changes and the needs of users, librarians need some programming skills, argues David Stuart
Recent ICT innovations have led to a reappraisal of our understanding of library and information services. The traditional vision of the library as a provider of physical documents has been superseded by that of the library as a provider of access to information – virtual and physical.
However, until now the focus has still primarily been on the concept of the document. If librarians are to remain relevant in the age of Google and Google Scholar, they need to move beyond the document and facilitate access to the increasing amount of data being made available on the web. To do this effectively, librarians need to develop their programming skills. Such skills will enable them to provide access to this growing body of data, to combine it in new and innovative ways, and to deliver it in the places that customers want. Equally importantly, librarians will then have a recognisable set of skills that can differentiate them in the minds of users.
While the traditional role of the librarian continues to be important, it would be naïve not to recognise changes in how today’s information is being made available, as well as changes in user expectations, and user perceptions of the librarian. For most users today, the web is the first place they look for information, and it is a process that is rarely accompanied by the assistance of a librarian.
Although librarians have taken the time to learn the intricacies of advanced search features and the subtleties of Boolean algebra, most users will type two or three words into Google’s query box and find that the simple search is ‘good enough’. Librarians play a vital role in providing access to online subscription services. However, attempts to make access to such subscriptions seamless through integration with services such as Google Scholar mean that the user is often unaware of the librarian’s role; “free at the point of use” is often equated with “free” by the user. There are still some users who make use of a librarian’s specialist skills, and recognise the contributions they can make to the research process, but increasingly few users get that far.
The role of APIs
The large quantities of structured data available online come from many sources and in many formats. Up to now much of the focus has been on the data available through Application Programming Interfaces (APIs) – interfaces that allow software to interact with websites’ data and services, particularly those of the big three search engines and the popular Web 2.0 sites. A website may gain additional context by including information from an external source, or combining more than one external source of information in an innovative fashion. For example, geo-tagged Flickr photos may be provided with additional context by displaying them on a Google Map.
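To make the idea concrete, the short Python sketch below shows the basic pattern behind such mashups: parse the structured data an API returns, then reshape it for another service – here, a set of map markers. The response shown is an invented, greatly simplified stand-in; real photo-sharing APIs return far richer structures.

```python
import json

# A simplified, hypothetical extract of what a photo-sharing API might
# return for a geo-tagged image search (real responses differ in shape).
api_response = json.loads("""
{"photos": [
    {"title": "Tower Bridge", "lat": 51.5055, "lon": -0.0754},
    {"title": "British Library", "lat": 51.5300, "lon": -0.1276}
]}
""")

def to_map_markers(photos):
    """Reshape photo records into markers a mapping service could plot."""
    return [{"label": p["title"], "position": (p["lat"], p["lon"])}
            for p in photos]

markers = to_map_markers(api_response["photos"])
for m in markers:
    print(m["label"], m["position"])
```

The point is not the handful of lines of code, but the pattern: one service’s output becomes another service’s input, with a small transformation in between.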
The provision of APIs not only allows data to be combined in previously unthought-of ways, but also enables services to be used in innovative ways and on multiple platforms. Much of the recent success of Twitter may be ascribed to its simple API, which has enabled the creation of numerous applications presenting the data in different ways and on different platforms. Research by the social media analytics company Sysomos shows that fewer than half of Twitter updates are sent from the main website. Twitter streams not only answer the question “What are you doing?”, but provide information from a host of real-world objects, from when plants need watering to when London’s Tower Bridge opens and closes. Few of these applications are likely to have been envisaged when the service was first designed.
Recognising the potential of user communities to come up with far more innovative uses for their information, many more traditional sources of information are attempting to make their data available online in a usable format. The British government’s Digital Engagement Team is currently working to make increasing amounts of government data available, whilst newspapers such as the Guardian and the New York Times have also recently made APIs available. The quantity of data available and the ways in which it can be combined are likely to increase exponentially as we move towards an increasingly semantic web, where data on the web has meaning.
Much of the external information available becomes more useful to a specific library community when combined with information that the library already holds in its own systems, as well as additional information the library is in a position to gather. For many libraries, much of the information that could enhance a user’s experience, such as circulation records, will be embedded within proprietary integrated library systems. Having said that, in response to calls from the library community, these systems increasingly claim to offer service-oriented architectures and APIs.
In addition, newer OPACs, such as Bibliocommons, are increasingly gathering an additional level of social content similar to the sort that users have come to expect from sites such as Amazon (for example, book ratings). It is important, however, that the social interaction features of Web 2.0 are coupled with the ethos of Web 2.0, enabling innovation wherever it may occur. This is better enabled by open-source integrated library systems such as Koha and Evergreen and open-source OPACs such as VuFind and SOPAC.
The skill sets of librarians and computer programmers are very different, and it would undoubtedly be an inefficient use of resources to train librarians to a professional standard of programming. Programming languages go in and out of fashion, and new platforms regularly emerge, requiring their own scripting languages. However, a basic level of programming, and experience of manipulating and combining some of the available data, will give librarians a better understanding of the potential opportunities that data offers. At a minimum it should be expected that librarians have experience of some of the available mashup tools and editors, and are aware of the scope of the data available.
Of the mashup tools currently available, Yahoo! Pipes is by far the easiest with which to begin. It is a web application, requiring no download, and has a simple graphical user interface that allows the mashup elements to be dragged and dropped into a flowchart. Because of this, Yahoo! Pipes has become one of the most popular mashup editors online, with numerous books and online tutorials available. However, as is often the case, the simplicity of the application comes at the price of relatively limited inputs and outputs. Whilst Yahoo! Pipes is limited to data structured in certain formats, the more complicated Openkapow desktop application allows the manipulation of data from websites without RSS feeds and APIs.
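Beyond the point-and-click editors, even a few lines of a general-purpose language go a long way with feed data of the kind these tools consume. The Python sketch below parses a minimal, made-up RSS 2.0 fragment using only the standard library; the fragment stands in for a feed that would normally be fetched from the web.

```python
import xml.etree.ElementTree as ET

# A minimal, invented RSS 2.0 fragment, standing in for a live feed.
rss = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Library news</title>
  <item><title>New data released</title><link>http://example.org/1</link></item>
  <item><title>API workshop</title><link>http://example.org/2</link></item>
</channel></rss>"""

# Parse the XML and pull out each item's title and link.
root = ET.fromstring(rss)
items = [(item.findtext("title"), item.findtext("link"))
         for item in root.iter("item")]

for title, link in items:
    print(title, "->", link)
```

Once the feed is reduced to a simple list of titles and links, it can be filtered, merged with other feeds, or re-published in whatever form a library’s users prefer.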
Keeping abreast of the data available is difficult due to the sheer quantity of different sources and the breadth of the data that is often being released. Whilst announcements about the freeing of data from major information providers may make the professional news, most information is released with little fanfare. One of the best places to keep up to date with the increasing variety of data that is available is www.programmableweb.com. Not only does it attempt to track the data that is available, but it also provides coverage of some of the numerous mashups that have been built upon it. As for the breadth of data that may be released, there is little alternative to playing around with the data itself to see what is available and how it can be manipulated.
Beyond these initial steps, the learning curve for the novice programmer is seemingly very steep. They find themselves in a world using alien terminology and with a multitude of possible avenues to take. What are the differences between REST and SOAP? Client-side scripting and server-side scripting? Should the librarian be concentrating on learning Python, PHP, Visual Basic, or one of the host of other languages available?
Unfortunately there are few simple answers, especially when it comes to which languages to learn. Instead, the easiest way to start is with the data and platform that the librarian is interested in using, and to work backwards. Many services offer software development kits or examples in certain languages, which can form the foundation of a librarian’s first programs. For the librarian, gaining the necessary programming skills to manipulate the increasing quantities of available data should not be seen as an overnight task, but rather as an ongoing part of professional development.
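As a first taste of what working with a REST-style service involves, it helps to see that many such APIs are queried with nothing more than a carefully constructed URL. The Python sketch below builds such a query string; the endpoint and parameter names are purely illustrative and do not belong to any real service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, for illustration only:
# a real API's documentation specifies its own base URL and fields.
base = "http://api.example.org/search"
params = {"q": "open data libraries", "format": "json", "rows": 10}

# urlencode handles the escaping (spaces, punctuation) correctly.
url = base + "?" + urlencode(params)
print(url)
```

Fetching that URL and parsing the response is then only a few more lines, which is precisely why REST has become the dominant style over the heavier-weight SOAP for public web APIs.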
A good position
Librarians are in the best position to make use of the increasing amounts of information available, having access to both internal and external information as well as knowledge of the information needs of their specific users. As we move towards a web of data rather than a web of documents, a basic level of proficiency in data collection and manipulation could become as important a skill for the future librarian as proficiency with search engines is today.
At the simplest level, this may be data manipulation through mashup editors, although for real innovation to occur within the library community we will require more expert skills. Whilst the librarian’s skill set may be supplemented in part by their user community and by outsourcing to programmers, for libraries to be truly innovative, librarians need to be aware of potential opportunities, and this only comes from experimenting with the data and the platforms themselves.
David Stuart is an independent web analyst and consultant for the Statistical Cybermetrics Research Group at the University of Wolverhampton, UK