Keyspider
Case Study

From Keyword to Concept: How a Research University Transformed Discovery Across 180,000 Academic Resources

A research-intensive university deployed Keyspider semantic search across its institutional repository, library catalogue, research databases, and faculty publications — 180,000 resources that keyword search was consistently failing to surface. Research discovery time fell by 58%. Postgraduate students reported finding relevant prior work in minutes that had previously taken days.

12 min read · Higher Education · May 2025

58%

reduction in research discovery time

Academic research has a discoverability problem that nobody talks about openly because it feels like an admission of failure: researchers routinely fail to find relevant work that already exists. Not because it is hidden, but because keyword search cannot bridge the gap between the way a researcher describes their question and the way prior work was indexed in a catalogue, repository, or database two decades ago. A research university decided to fix this, not only for its own institutional repository but for the entire academic resource ecosystem it provided. The results changed how researchers, librarians, and postgraduate students described the experience of finding prior work.

The University and Its Research Infrastructure

The university is a research-intensive institution with approximately 3,200 academic staff, 8,400 postgraduate students (including 2,100 doctoral candidates), and 24,000 undergraduate students. Its research profile spans the natural sciences, engineering, social sciences, health sciences, law, and humanities. Annual research revenue exceeds $380 million, placing it among the top research universities in its country.

The library's digital resource portfolio comprised 180,000 items: 42,000 theses and dissertations in the institutional repository, 68,000 journal articles and conference papers authored by university affiliates, 24,000 research datasets and supplementary materials, 18,000 digitised special collection items, 14,000 technical reports and working papers, and 14,000 items in miscellaneous academic resource categories. These were distributed across four separate systems: the institutional repository (hosted on an open-source repository platform), the library catalogue, a research data portal, and a digitised collections viewer.

Each system had its own search interface, its own metadata schema, and its own access control model. A postgraduate student beginning a literature review would need to search four separate systems, adapt their query to each system's metadata conventions, reconcile duplicate results, and manually compile a reference list from four separate interfaces. The library's own annual user research had ranked 'difficulty finding relevant prior work across all systems' as the top digital services pain point for three consecutive years.

180,000

academic resources across 4 separate systems

2,100

doctoral candidates regularly conducting literature reviews

4

separate search interfaces researchers navigated per literature review

3 years

cross-system discoverability ranked as top library pain point

Why Keyword Search Fails Academic Discovery

Academic research terminology evolves. A paper published in 2003 on 'neural network language models' may be directly relevant to a researcher studying 'large language model alignment' in 2025 — but keyword search cannot make this connection. The concepts are related; the vocabulary is not. Semantic search, which understands conceptual proximity rather than lexical identity, bridges this gap.

The university's library conducted a structured test in early 2024 to quantify the keyword search failure rate. Twenty postgraduate students — five from each of four disciplines — were each given three research questions relevant to their field and asked to find the five most relevant items in the institutional repository using the existing keyword search interface. An expert panel of senior academics then independently identified the five most relevant items for each question. The overlap — how many of the student-identified items matched the expert panel's selections — was 34% on average. Keyword search was returning what was lexically relevant; it was missing what was conceptually relevant.
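The overlap measure is an average, over every student and question, of how many of the student's five picks also appear in the expert panel's five. Here is a minimal Python sketch of that calculation, assuming selections are stored as lists of item identifiers. The identifiers and the two trials below are invented for illustration and are not the study's data:

```python
from statistics import mean

def overlap_at_k(student_items: list[str], expert_items: list[str], k: int = 5) -> float:
    """Fraction of the student's top-k items that also appear in the expert panel's top-k."""
    return len(set(student_items[:k]) & set(expert_items[:k])) / k

# Hypothetical item identifiers for illustration only; not the study's data.
trials = [
    (["a", "b", "c", "d", "e"], ["a", "c", "x", "y", "z"]),  # 2 of 5 shared
    (["f", "g", "h", "i", "j"], ["f", "p", "q", "r", "s"]),  # 1 of 5 shared
]

# Average the per-trial fractions across all student/question pairs.
mean_overlap = mean(overlap_at_k(student, expert) for student, expert in trials)
print(f"mean overlap: {mean_overlap:.0%}")
```

With 2 of 5 and 1 of 5 items shared, these two trials average to 30%. The real test averaged the same per-trial fraction over 60 student-question pairs (20 students, three questions each).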

The vocabulary problem in academic search

Academic terminology changes faster than repository metadata. A researcher searching for prior work on 'transformer attention mechanisms' may find nothing in a repository catalogued in 2019 under 'sequence-to-sequence neural network architectures', even though the items are directly relevant. Semantic search recognises that these phrases describe related concepts and surfaces the older work alongside material that uses current terminology. For researchers doing thorough literature reviews, this is not a convenience improvement: it is the difference between finding and missing prior work that could affect the validity of their research.
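The sketch below makes the lexical-versus-conceptual distinction concrete. It compares a crude keyword match (shared words) with cosine similarity between embedding vectors, which is a common way for semantic search to score closeness. The three-dimensional vectors are hand-assigned toys standing in for a real embedding model's output. They are assumptions for illustration and do not reflect how Keyspider represents content:

```python
import math

# Hypothetical 3-dimensional "embeddings", hand-assigned for illustration.
# A real system would obtain much higher-dimensional vectors from a trained model.
EMBEDDINGS = {
    "transformer attention mechanisms": [0.9, 0.8, 0.0],
    "sequence-to-sequence neural network architectures": [0.8, 0.9, 0.1],
    "copyright law for digitised archives": [0.0, 0.1, 0.9],
}

def keyword_overlap(query: str, text: str) -> int:
    """Number of shared lowercase whitespace tokens: a crude lexical match."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = "transformer attention mechanisms"
for title, vec in EMBEDDINGS.items():
    if title == query:
        continue
    print(f"{title!r}: keyword={keyword_overlap(query, title)}, "
          f"semantic={cosine(EMBEDDINGS[query], vec):.2f}")
```

Neither title shares a word with the query, so keyword matching scores both at zero. Cosine similarity, by contrast, places the sequence-to-sequence title far above the unrelated law title.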

A second failure mode was cross-system invisibility. A postgraduate student researching a topic that spanned empirical data (research data portal), published findings (institutional repository), and methodological background (digitised working papers collection) needed to search three separate systems, with no guarantee that all relevant content had been found. The absence of a unified search layer was not just inconvenient; it was a systematic cause of incomplete literature reviews.

What the Library Deployed

The library's digital services team deployed Keyspider across all four resource systems, creating a unified semantic search index covering all 180,000 items. The deployment used Keyspider's API connectors to index content directly from each system's API, pulling item metadata, abstracts, full-text content where available (institutional repository theses, working papers, and technical reports), and access control flags (open access vs. authenticated-only items).
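One way to picture the unified index is as a single record shape that each connector maps its source system into. The sketch below uses hypothetical field names and payload keys for two of the four systems. It is not Keyspider's schema or connector API. It only illustrates how heterogeneous metadata can be normalised while keeping the access flag alongside each item:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IndexRecord:
    """Hypothetical unified index record; field names are illustrative only."""
    source_system: str         # which of the four systems the item came from
    item_id: str
    title: str
    abstract: str
    full_text: Optional[str]   # present only where the source exposes full text
    open_access: bool          # access-control flag carried into the index

def from_repository(raw: dict) -> IndexRecord:
    # Hypothetical payload keys; real repository APIs differ.
    return IndexRecord(
        source_system="repository",
        item_id=raw["handle"],
        title=raw["title"],
        abstract=raw.get("abstract", ""),
        full_text=raw.get("full_text"),   # e.g. theses, working papers, reports
        open_access=raw.get("access") == "open",
    )

def from_data_portal(raw: dict) -> IndexRecord:
    # In this sketch, datasets are indexed on descriptive metadata only.
    return IndexRecord(
        source_system="data_portal",
        item_id=raw["dataset_id"],
        title=raw["name"],
        abstract=raw.get("description", ""),
        full_text=None,
        open_access=not raw.get("restricted", False),
    )

unified_index = [
    from_repository({"handle": "rep/1042", "title": "Thesis on urban heat",
                     "abstract": "Abstract text", "full_text": "Chapter 1 text",
                     "access": "embargoed"}),
    from_data_portal({"dataset_id": "ds-77", "name": "City temperature sensors",
                      "description": "Hourly readings", "restricted": False}),
]
```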

Access control was preserved through Keyspider's permission-aware indexing. Open-access items were surfaced in the public search interface. Items requiring university authentication — subscription-based journal articles, restricted datasets, and embargoed theses — appeared in search results with an authentication prompt, directing students to log in via the university's SSO system before accessing the content. No restricted content was ever returned in full to unauthenticated users.
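Here is a minimal sketch of the result-shaping rule described above. It assumes each indexed item carries an open-access flag and that a restricted item's title may be shown to anonymous users. `Hit` and `render_results` are hypothetical names, not part of any Keyspider API:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    item_id: str
    title: str
    snippet: str
    open_access: bool

def render_results(hits: list[Hit], authenticated: bool) -> list[dict]:
    """Shape results for display. Restricted items stay visible to anonymous
    users, but only as a title plus a login prompt; no content is returned."""
    rendered = []
    for hit in hits:
        if hit.open_access or authenticated:
            rendered.append({"id": hit.item_id, "title": hit.title, "snippet": hit.snippet})
        else:
            rendered.append({"id": hit.item_id, "title": hit.title, "login_required": True})
    return rendered

hits = [
    Hit("oa-1", "Open access article", "Open text excerpt", open_access=True),
    Hit("emb-2", "Embargoed thesis", "Restricted text excerpt", open_access=False),
]
anonymous_view = render_results(hits, authenticated=False)
```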

The search interface was deployed in two locations: a unified academic search portal accessible from the library homepage, and an embedded search widget on the postgraduate research portal — the entry point most commonly used by doctoral candidates beginning literature reviews. Both surfaces drew from the same unified index; the postgraduate portal widget applied a default filter prioritising theses, journal articles, and working papers over digitised special collections for that audience.
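The postgraduate widget's default is described as prioritising some item types rather than excluding others. The sketch below therefore reads it as a soft boost applied to a shared relevance score. The weights and type labels are illustrative assumptions:

```python
# Hypothetical per-audience type weights; the numbers are illustrative assumptions.
POSTGRAD_TYPE_BOOST = {
    "thesis": 1.3,
    "journal_article": 1.3,
    "working_paper": 1.3,
    "special_collection": 0.8,
}

def rerank(results: list[tuple[str, str, float]],
           boosts: dict[str, float]) -> list[tuple[str, str, float]]:
    """Reorder (item_id, item_type, relevance) tuples by relevance x type boost.
    Types without a boost keep weight 1.0; nothing is removed from the list."""
    return sorted(results, key=lambda r: r[2] * boosts.get(r[1], 1.0), reverse=True)

results = [
    ("sc-12", "special_collection", 0.90),
    ("th-07", "thesis", 0.75),
    ("ds-03", "dataset", 0.80),
]
postgrad_order = [item_id for item_id, _, _ in rerank(results, POSTGRAD_TYPE_BOOST)]
```

The thesis overtakes the special collections item (0.975 against 0.72), yet every item stays in the list. With an empty boost table, the same function returns the plain relevance order.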

An AI answer layer was configured for methodology and policy queries — questions like 'what citation format does the university require for doctoral theses?' or 'what is the open access policy for published research?' that could be answered from the library's own guidance documentation rather than the research content. This separated the 'find research content' use case from the 'navigate library policy' use case, keeping the AI responses appropriately scoped.
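Scoping the answer layer can be thought of as restricting which documents it may draw on. The sketch below assumes passages are tagged by collection and uses naive shared-word ranking as a stand-in for real retrieval. The `Passage` type, the collection names, and the ranking are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Passage:
    collection: str   # hypothetical tag, e.g. "library_guidance" or "research"
    text: str

def answer_context(question: str, passages: list[Passage],
                   collection: str = "library_guidance", k: int = 2) -> list[Passage]:
    """Select up to k passages for the AI answer layer, drawn only from one
    collection and ranked by shared-word count (a stand-in for real retrieval)."""
    terms = set(question.lower().split())
    pool = [p for p in passages if p.collection == collection]
    pool.sort(key=lambda p: len(terms & set(p.text.lower().split())), reverse=True)
    return pool[:k]

passages = [
    Passage("library_guidance", "doctoral theses must use the citation format set out in the thesis guide"),
    Passage("library_guidance", "the open access policy applies to all published research outputs"),
    Passage("research", "citation format requirements for doctoral theses in the university repository study"),
]
context = answer_context("what citation format does the university require for doctoral theses?", passages)
```

The research passage matches the question's wording more closely than either guidance passage. It is still never passed to the answer layer, because it sits outside the guidance collection.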

Results: First Academic Year

58%

reduction in self-reported research discovery time

2.4×

increase in institutional repository item retrievals

71%

of researchers found relevant cross-system content they would not have found otherwise

94%

postgraduate researcher satisfaction with unified search

Discovery Time

The library's annual user research survey, administered nine months after deployment, asked postgraduate researchers to estimate how long the initial search phase of a literature review took: the period from starting to search to feeling confident they had identified the most relevant prior work in their area. Pre-deployment, the median estimate was 3.5 days. Post-deployment, the median was 1.5 days, the roughly 58% reduction reported above. The distribution had also narrowed. The pre-deployment range was 1–12 days, reflecting the highly variable outcomes of manual keyword-based searching, while the post-deployment range was 0.5–4 days.
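As a check on the headline number, the relative reduction can be computed directly from the rounded medians. This short sketch uses only the figures quoted in this section:

```python
def pct_reduction(before: float, after: float) -> float:
    """Relative reduction, in percent, from a 'before' value to an 'after' value."""
    return (before - after) / before * 100

# Rounded medians and ranges from the survey, in days.
pre_median, post_median = 3.5, 1.5
pre_range, post_range = (1, 12), (0.5, 4)

print(f"median reduction: {pct_reduction(pre_median, post_median):.1f}%")
print(f"range width: {pre_range[1] - pre_range[0]} -> {post_range[1] - post_range[0]} days")
```

The rounded medians give about 57%, slightly below the reported 58%. Rounding of the published medians could account for that gap. The range width also shrinks from 11 days to 3.5.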

Repository Retrievals

Monthly item retrievals from the institutional repository (the number of times a repository item was downloaded or viewed) increased 2.4-fold in the first academic year after deployment compared with the equivalent period of the prior year. Content growth does not explain the change: new items added to the repository in that period increased total holdings by only 4%. The increase came from existing items, which keyword search had consistently failed to surface, becoming discoverable through semantic search.
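The 4% figure can be used to separate collection growth from increased discoverability. Here is a minimal sketch, assuming retrievals would otherwise have grown in proportion to holdings. This is a simplification, since new items may attract more or fewer retrievals than older ones:

```python
def per_item_growth(retrieval_ratio: float, holdings_growth_pct: float) -> float:
    """Growth in retrievals per item held, after removing the effect of a larger
    collection. Assumes retrievals would otherwise scale in proportion to holdings."""
    return retrieval_ratio / (1 + holdings_growth_pct / 100)

print(f"{per_item_growth(2.4, 4):.2f}x per item")
```

On that assumption, retrievals per item held rose by roughly 2.3 times, so nearly all of the 2.4-fold increase comes from items already in the repository.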

Cross-System Discovery

The user survey asked researchers whether they had found relevant content in a system they would not have thought to search using the previous separate interfaces. Seventy-one percent answered yes. The most frequently cited cross-system discoveries were finding relevant datasets in the research data portal while searching for published findings in the institutional repository, and finding relevant digitised working papers from the 1990s while searching for contemporary journal articles on the same concept. These items had been overlooked not because they were unavailable, but because no researcher would have known to search a fourth separate system for them.

"I found three papers from the 1980s that were directly relevant to my thesis — work I would never have found by searching each system separately. One of them changed how I framed my entire literature review. It was sitting in the digitised collections for 40 years, completely invisible."

Doctoral candidate, Faculty of Social Sciences

Library Reference Enquiries

Reference enquiries to the library's research support team (questions about how to search for literature, how to access specific resource types, and how to navigate the repository) fell 38% over the first academic year. The research support librarians redirected this capacity to higher-value work: one-on-one research methodology consultations, systematic review support for postgraduate students, and discipline-specific search guides for the subject areas where search analytics showed high zero-result rates.
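Zero-result rate per subject area is a simple metric to derive from search logs. The sketch below assumes a hypothetical log of (subject area, result count) pairs and a 30% threshold for flagging an area. Neither the log format nor the threshold comes from the deployment described here:

```python
from collections import defaultdict

def zero_result_rates(log: list[tuple[str, int]]) -> dict[str, float]:
    """Share of queries per subject area that returned no results.
    `log` holds (subject_area, result_count) pairs: a hypothetical log format."""
    totals: dict[str, int] = defaultdict(int)
    zeros: dict[str, int] = defaultdict(int)
    for subject, result_count in log:
        totals[subject] += 1
        if result_count == 0:
            zeros[subject] += 1
    return {subject: zeros[subject] / totals[subject] for subject in totals}

log = [("law", 0), ("law", 5), ("law", 0), ("law", 2),
       ("history", 3), ("history", 1), ("history", 0), ("history", 7)]
rates = zero_result_rates(log)
# Flag areas above an assumed 30% zero-result threshold as candidates for a guide.
needs_guide = sorted(s for s, rate in rates.items() if rate > 0.3)
```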

For Undergraduate Students: A Different Benefit

The deployment's impact on undergraduate students was different from its impact on postgraduate researchers — and in some respects more immediate. Undergraduates conducting assignments and assessments typically search for specific resource types (a peer-reviewed article on a particular topic, a dataset for a class project) rather than conducting comprehensive literature reviews. For them, the benefit of semantic search was not depth of discovery but ease of first-contact: finding one highly relevant resource quickly, rather than navigating through pages of loosely matched keyword results.

The undergraduate population's use of the unified search was 4.3 times higher than their use of the individual system interfaces in the equivalent prior-year period, a clear signal that the simpler, single-interface experience drove adoption. The library's digital access team noted that many undergraduates had not previously used the institutional repository at all, despite it containing directly relevant undergraduate-level resources. By surfacing the repository's content alongside the library catalogue on a single results page, semantic search introduced students to a resource they had not previously known to use.

Ready to make your institution's research genuinely discoverable?

Book a demo with our higher education team. We'll show you how unified semantic search works across your institutional repository and library systems.

