Boston Public Library aims to increase access to a vast historic archive using AI

SCOTT DETROW, HOST:

The Boston Public Library is launching a project this summer to make it easier for the public to search for and read the historically significant documents in its collection. As NPR's Chloe Veltman reports, the plan to digitize hundreds of thousands of documents involves artificial intelligence.

CHLOE VELTMAN, BYLINE: Boston Public Library is one of the oldest and largest public library systems in the country. Its collection includes masses of government documents dating back to the early 1800s.

JESSICA CHAPEL: Things like oral histories, congressional reports, reports on varied industries and communities.

VELTMAN: Jessica Chapel is the library's chief of digital and online services.

CHAPEL: It really is an incredible repository of primary source materials, covering the whole history of the United States.

VELTMAN: Using these materials usually involves showing up at the library, filling out a slip, and waiting for a librarian to fetch them from the stacks. The library now wants this content to be available digitally to anyone who wants it from anywhere in the world and more fully searchable.

CHAPEL: This is where the AI piece of this comes in. We want to be able to enhance the metadata to create more description for these objects so there's, like, this richer detail about what this collection is or what this object is and making these collections more usable.

VELTMAN: So for example, instead of just searching the usual titles, keywords, subject headings and so on, thanks to AI, users will be able to search and cross-reference entire texts - and not just printed materials, handwritten ones, too. But these goals are daunting because the historic collection is massive and fragile.

(SOUNDBITE OF MACHINE RUNNING)

VELTMAN: Every item has to be run through a scanner by hand. It takes about an hour to do 300 to 400 pages. Harvard University said it could help. Researchers at the Harvard Law School Library's Institutional Data Initiative are working with libraries, museums and archives on a number of fronts, including training new AI models to help libraries enhance the searchability of their collections.

Among the initiative's supporters are AI companies. They help fund these efforts and, in return, get to train their large language models on high-quality materials that are out of copyright and therefore less likely to lead to lawsuits. Greg Leppert is the Institutional Data Initiative's executive director.

GREG LEPPERT: The goal is not to give advantaged access to the AI companies. The goal is to get more information out there.

VELTMAN: Leppert says it's a two-way street.

LEPPERT: Where we are improving data in a way that will help AI, that those improvements work their way back into the library so it improves the patron experience, as well.

VELTMAN: OpenAI is helping Boston Public Library cover such costs as scanning and project management, but anyone can have access to the digitized data, not just OpenAI. In a statement to NPR, OpenAI said it always benefits from being able to train its large language models on high-quality materials. Michael Hanegan is the coauthor of a new book about generative AI and libraries. He says AI systems, which have been heavily criticized for regurgitating unreliable information from the internet, can only improve through collaborations between public institutions and tech companies. But he also expresses caution.

MICHAEL HANEGAN: The kind of move fast and break things of Silicon Valley is counter to the values of librarianship, which are about access and transparency.

VELTMAN: Boston Public Library says it plans to digitize 5,000 documents by the end of the year and, if all goes well, grow the project from there. Chloe Veltman, NPR News.

(SOUNDBITE OF FLUME SONG, "SPACE CADET")

Transcript provided by NPR, Copyright NPR.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.

Chloe Veltman
Chloe Veltman is a correspondent on NPR's Culture Desk.