5 Jan 2016

Project Gutenberg: collection development and economics

This post discusses Project Gutenberg’s (PG) collection development trajectory. It also applies to PG selected aspects of the evolutionary stages of the digital library life-cycle model proposed by Calhoun (2014, pp. 159 – 177).

Candela et al. (2007) note that an online digital library is any virtual organisation that comprehensively collects, manages and preserves digital content, as well as offering it to respective user communities according to codified policies. This particular description of an online digital library is fitting for the purpose of this post since a well-defined collection development policy (CDP) arguably determines the level of success of a digital library. A CDP can be characterised as “a formal written statement of the principles guiding a library’s selection of materials, including the criteria used in making selection and deselection decisions” (Reitz, 2004).

The process of developing a collection over time is guided by a library’s mission statement. Another important function of collection development includes resource sharing, which specifically involves the sharing of collections, data and facilities. For an in-depth discussion on collection development within the digital library context consider Jones (1999) and Joint (2006). Fundamentally, collection development frameworks apply to any library type, whether analogue, hybrid, or digital as is the case with Project Gutenberg (Jones, 1999, p. 28).

Project Gutenberg owes its growing collection to a vibrant online community of content providers. But what does the construct of community in an online environment mean? Lee, Vogel, and Limayem (2003) explain that such communities are “cyberspace(s) supported by computer-based information technology, centred upon communication and interaction of participants to generate member-driven content, resulting in a relationship being built” (p. 51). There are other definitions of virtual communities (see, for example, Craig and Zimring, 2000; Ho, Schraefel, and Chignell, 2000), including one that describes them as computer-mediated spaces where there is potential for an integration of content and communication with an emphasis on member-generated content (Hagel, 1999).

One way of tracing the evolution of Project Gutenberg’s electronic library collection to date is to apply the theoretical life-cycle model of success factors for digital libraries in social environments (Calhoun, 2014, pp. 160-176). The model was developed specifically for the digital library context and adapted from Iriberri and Leroy’s (2009) original life-cycle model for online communities, which offers an in-depth theoretical and practical treatment of the diverse activities of online communities. Calhoun’s (2014) adaptation contains four cyclical stages: 1) inception, 2) creation, 3) growth, 4) maturity. Each stage holds distinct sub-components, also called success factors, of which a targeted selection is discussed below in relation to Project Gutenberg.

Purpose and focus
From the outset, Project Gutenberg’s collection acquisition strategy was built around the idea of creating a vibrant, user- and contributor-oriented online social environment. Calhoun (2014) notes that digital libraries tend to succeed if they are “backed by passionate, committed builders on the one hand and enthusiastic, vested community participants on the other” (p. 162). The committed initiator and start-up builder was Michael S. Hart (2015), whose public e-text library strategy is based upon two clearly communicated premises: 1) straightforward open access to and use of e-texts (i.e. low legal and technological entry barriers), and 2) inclusion of digital artefacts on the basis of the “bang for a buck” philosophy, so as to appeal to and capture 99% of the general (online) reading public (Hart, 1992).

(Virtual) community orientation
One effective way of encouraging existing PG volunteers and recruiting new ones is to showcase contributors’ practical experiences and motivations, as the “Volunteers’ Voices” feature does at http://www.gutenberg.org/wiki/Gutenberg:Volunteers'_Voices. These accounts tend to be framed around the “love of good literature”, “free availability” and the belief that “people could be influenced for good by what they read”, among other reasons. Fundamentally, Project Gutenberg’s social structure can be described as a “virtual volunteer organisation” (Jones and Rafaeli, 2000), which excels at sustaining individuals with a mutual interest who keep on contributing to the communal material acquisition effort. Towards the end of 2003, about 2,000 people were already doing some form of constructive work for PG (Project Gutenberg, 2014).

Volunteers are supported through comprehensive documentation provided on the Project Gutenberg homepage (Project Gutenberg, 2007). Different levels of user involvement are possible: proofreading an e-text, procuring source material, donating money or promoting Project Gutenberg on one’s website through a PG widget (Project Gutenberg, 2011). This idea of encouraging pro-active user involvement is also emphasised by Witten, Bainbridge and Nichols (2009), who note that “libraries can evolve from exclusive suppliers of information that users consume into a partnership where both the library and its users supply material” (p. 67). Project Gutenberg represents a radically reversed content acquisition model, whereby the user/contributor fully absorbs the role of the acquisition librarian. Essentially, this approach goes against the traditional library-mediated digital library eco-system, where librarian oversight is absolute despite the obvious downsides: “most digital libraries do not allow users to contribute in this manner, missing out on potentially valuable sources of quality improvement” (Witten, Bainbridge and Nichols, 2009, p. 68).


Project Gutenberg’s collection development policy enables the success of an ever-growing collection since “individual volunteers choose and produce books according to their own tastes and values, and the availability … of the book” (Project Gutenberg, 2014). This broad policy enables the project to adjust organically to the varying needs and expectations of its user audience over time. Other digital libraries, e.g. those serving academic users in a university context, are considerably more restricted by stringent material selection criteria that need to satisfy the particularities of teaching and learning programmes, cost control and licensing requirements, among other variables (Jones, 1999, p. 29).

Interaction support
Since 2000, Distributed Proofreaders (DP) has supported the development of e-texts for Project Gutenberg (Lebert, 2008). DP provides a web-based workflow that assists in the conversion of analogue public domain books into e-texts. Volunteers share the workload by, for example, splitting book conversion projects into individual pages, which significantly speeds up the process (Distributed Proofreaders, n.d.).

Individuals must register with the site before they can contribute to Project Gutenberg. Distributed Proofreaders aims to act as a one-stop shop for this purpose and offers a detailed help section, called FAQ Central, which supports its volunteers on subjects including proofreading, formatting, and creating and managing projects, alongside mentoring and a dedicated public mailing list (Distributed Proofreaders, n.d.).

The existence of DP reinforces the assertion by the Center for History and New Media (2010), the providers of Omeka, that “Web 2.0 technologies and approaches to academic and cultural websites … foster user interaction and participation” (para. 3). Essentially, gutenberg.org and pgdp.net/c/ (DP) form a symbiotic relationship that actuates virtual community participation.

At the same time, Web 2.0 technologies need to be designed in such a way that they keep on attracting, growing and, most importantly, maintaining a vibrant community of dynamic content contributors. Lampert and Chung (2011) rightly point out that community requirements must be consistently met during the planning process of new digital library projects, in the sense that key decision points are clearly defined and processes are well designed and documented (pp. 83-90).

Quality content
Calhoun (2014) notes that a digital library can be considered a candidate for lasting success if it is perceived by its users “as a hub for a certain type of content that is essential to their shared interests” and if it is able to build up a critical mass of content (p. 171).

Evidently, Project Gutenberg is very good at doing both if one considers its collection development trajectory over time: “to this day, nobody has done a better job of putting the world's literature at everyone's disposal … and to create a vast network of volunteers all over the world, without wasting people's skills or energy.” (Lebert, 2008). Until the mid-nineties, when the Internet started to become more ubiquitous, Michael Hart was more or less the sole contributor to the project. It then started to expand rapidly through the involvement of a growing number of enthusiastic volunteers in many countries: 1,000 books by August 1997; 2,000 in May 1999; 3,000 in December 2000; 4,000 in October 2001; 10,000 in October 2003 and 25,000 books in April 2008 (Lebert, 2008). Presently, the project offers over 50,000 e-texts completely free of charge (Project Gutenberg News, 2015).
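The milestone figures quoted above lend themselves to a quick back-of-the-envelope calculation. The following Python sketch estimates the average number of books added per month between consecutive milestones. The dates and counts are the ones cited from Lebert (2008); the function and variable names are purely illustrative, and each milestone is assumed to fall in the stated month, so the resulting rates are rough estimates only.

```python
# Milestones quoted above from Lebert (2008): (year, month, cumulative books).
# Assumption: each milestone is treated as reached in the stated month.
MILESTONES = [
    (1997, 8, 1_000),
    (1999, 5, 2_000),
    (2000, 12, 3_000),
    (2001, 10, 4_000),
    (2003, 10, 10_000),
    (2008, 4, 25_000),
]


def months_between(start, end):
    """Whole months from a (year, month) start to a (year, month) end."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def growth_rates(milestones):
    """Average books added per month between consecutive milestones."""
    rates = []
    for (y0, m0, n0), (y1, m1, n1) in zip(milestones, milestones[1:]):
        months = months_between((y0, m0), (y1, m1))
        rates.append(((y0, m0), (y1, m1), (n1 - n0) / months))
    return rates


if __name__ == "__main__":
    for start, end, rate in growth_rates(MILESTONES):
        print(f"{start[0]}-{start[1]:02d} to {end[0]}-{end[1]:02d}: "
              f"about {rate:.0f} books per month")
```

Under these assumptions, the average rate rises from roughly 48 books per month between 1997 and 1999 to roughly 278 per month between 2003 and 2008, which is consistent with the rapid expansion described above.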

Ongoing funding
Baker and Evans (2008, pp. 46-47) identified eight classic economic models for digital libraries including the free model, which is somewhat applicable to Project Gutenberg. Here, the “costs of set-up and maintenance are absorbed by the owner rather than passed on to the user” (Baker and Evans, 2008, p. 46). The reality is that the free model in the Gutenberg context is conditional in the sense that it operates as a public digital library relying on the financial support and e-text contributions of the wider public.

The Project Gutenberg Literary Archive Foundation operates as a tax-exempt registered charity drawing on individuals’ financial donations (Project Gutenberg, 2014). As a result, its organisational structure is lean and low-cost by definition; only two paid part-time individuals are presently employed. The project is flexible and adaptive in its approach to generating donations. PG also generates funding streams through the services offered by micro-crowdfunding platforms. Since 2012, Project Gutenberg has used flattr.com (n.d.), which enables individuals to contribute to PG (and other projects) on an ongoing basis at flattr.com/thing/509045/Project-Gutenberg (Teller, 2012).

Arms, Calimlim, and Walle (2009) observed that “financial sustainability is the Achilles' heel of digital libraries”. PG’s very business model is rooted in the idea of collection building through an army of volunteers, lean operations management, and a highly optimised operational cost-base. The project’s continued success depends on the fusion of two distinct roles: e-text contributions and financial support through a committed volunteer base.

Arguably, the risk of failure is mitigated by the fact that the individual volunteer feels very much empowered. (S)he is encouraged to influence every aspect of Project Gutenberg. Contributors are given the opportunity to shape the project’s daily operation, as well as its strategic projection into the future (see the example list of volunteering opportunities at gutenbergnews.org/category/volunteers/).

The road ahead
It is challenging to speculate on what further developments could take place within the Project Gutenberg universe. Maron, Smith and Loy (2009) identified various factors for ongoing success and stability in digital library projects: 1) dedicated and entrepreneurial leadership, 2) a clear value proposition, 3) minimising direct costs, 4) developing diverse revenue sources, 5) clear accountability and metrics for success (pp. 13-27).

Project Gutenberg delivers on all of the above, as previously outlined. The project is pro-active and embraces opportunities to expand its appeal. It links up with a variety of like-minded partners and affiliates, including, for example, Wattpad (Wattpad, n.d.), which lists thousands of titles accessible on computers or mobile devices at m.wattpad.com (Project Gutenberg, 2014). Wattpad, just like ManyBooks.net, delivers PG titles to the mobile reader, which expands PG’s potential user base significantly. For a full list of affiliates and partners, see Partners, Affiliates and Resources (Project Gutenberg, 2014).

Another interesting link-up is OCLC’s WorldCat indexing of targeted Project Gutenberg books, which currently includes over 1,400 titles: https://www.worldcat.org/search?q=au%3AProject+Gutenberg.&qt=hot_author.

As a strategy to expand revenue sources, Maron, Smith and Loy (2009) consider the approach of licensing content to users or commercial publishers (p. 23). In principle, this idea could be considered by PG, whereby copyright-owning authors and smaller-scale publishers could be invited to contribute to the project to get access to an expanding user base. However, there are significant cost imperatives involved:

“Though the professional licensing business can be lucrative – professional clients, in particular, have the ability and motivation to pay for this content – there are significant costs associated with meeting the unique needs of these demanding customers in a competitive environment. Professional clients require custom tools, functionality and metadata to address their specific needs, and labour-intensive customer support must be available.” (Maron, Smith and Loy, 2009, p. 24)

Realistically, these expectations cannot be met as PG’s legal status and operational model revolve around the volunteer principle. Commerciality, by definition, is outside of the equation.

