Controlled Vocabularies in Digital Libraries: Challenges and Solutions for Increased Discoverability of Digital Objects

Digital Library Systems are widely used in the Higher Education sector, through Institutional Repositories (IRs), to collect, store, manage and make available scholarly research output produced by Higher Education Institutions (HEIs). This wide application of IRs is a direct response to the increase in scholarly research output produced. In order to facilitate discoverability of digital content in IRs, accurate, consistent and comprehensive association of descriptive metadata with digital objects during ingestion into IRs is crucial. However, due to human errors resulting from complex IR ingestion workflows, most digital content in IRs has incorrect and inconsistent descriptive metadata. While there exists a broad spectrum of descriptive metadata elements, subject headings present a classic example of a crucial metadata element that adversely affects discoverability of digital content when incorrectly and inconsistently specified. This paper outlines a case study conducted at an HEI, The University of Zambia, in order to demonstrate the effectiveness of integrating controlled subject vocabularies during the ingestion of digital objects into IRs. A situational analysis was conducted to understand how subject headings are associated with digital objects and to analyse subject headings associated with already ingested digital objects. In addition, an exploratory study was conducted to determine domain-specific subject headings to be integrated with the IR. Furthermore, a usability study was conducted in order to comparatively determine the usefulness of using controlled vocabularies during the ingestion of digital objects into IRs. Finally, multi-label classification experiments were carried out in which digital objects were assigned more than one class.
The results of the study revealed that a noticeable amount of digital content is associated with incorrect subject categories and, additionally, with few subject headings: two or fewer subject headings (71.2%), with a significant number of subject headings (92.1%) being associated with a single publication. A comparative study suggests that IRs integrated with controlled vocabularies are perceived to be more usable (SUS Score = 68.9) than IRs without controlled vocabularies (SUS Score = 66.2). Furthermore, the effectiveness of the multi-label arXiv subjects classifier demonstrates the viability of integrating automated techniques for subject classification.
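For readers unfamiliar with the SUS scores quoted above, the following is a minimal sketch of the standard System Usability Scale scoring rule, which maps ten 1-5 ratings onto a 0-100 scale. It is not taken from the paper: the function names `sus_score` and `mean_sus` are hypothetical, and the assumption is that the study reports the mean of per-participant scores.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5            # rescale the 0-40 sum to 0-100


def mean_sus(all_responses):
    """Average the per-participant SUS scores (assumed aggregation)."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

Under this rule a participant who answers 3 to every item scores 50, so the reported values of 68.9 and 66.2 both sit above the midpoint of the scale.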
International Journal on Digital Libraries
Journal Article
Chipangila, Bertha, Eric Liswaniso, Andrew Mawila, Philomena Mwanza, Daisy Nawila, Robert M'sendo, Mayumbo Nyirenda, and Lighton Phiri. 2023. “Controlled Vocabularies in Digital Libraries: Challenges and Solutions for Increased Discoverability of Digital Objects”. International Journal on Digital Libraries, 17. doi:10.1007/s00799-023-00374-1.