Hathi Members: Exporting Holdings

HathiTrust members are required to update their holdings yearly.

Instructions for Creating Lists:

Hathi requires three separate .tsv (tab-separated values) files:

  1. Single-part monographic holdings – leader 07 (bib level) = m or i
  2. Multi-part monographic holdings – leader 07 (bib level) = a or c
  3. Serial holdings – leader 07 (bib level) = s, b, or d
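
The mapping above can be written as a small lookup. This is a minimal sketch in Python, assuming the 24-character MARC leader is available as a string; the function name and the category labels (borrowed from the submission filenames further down) are illustrative only.

```python
# Hypothetical helper: map leader position 07 (bib level) to a Hathi file type.
BIB_LEVEL_TO_FILE = {
    "m": "single-part", "i": "single-part",
    "a": "multi-part", "c": "multi-part",
    "s": "serials", "b": "serials", "d": "serials",
}

def hathi_file_for_leader(leader):
    """Return the file category for a MARC leader, or None if unmapped."""
    if len(leader) < 8:
        return None  # leader too short to contain position 07
    return BIB_LEVEL_TO_FILE.get(leader[7])
```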

Below are links to the search strategies used in CBBCat and URSUS to pull lists of print materials, including government documents. The searches restrict results to materials with a valid OCLC number in the 001 and exclude records whose FormItem value indicates a non-print format. Note that in Create Lists the FormItem search field differs by rec type, so three separate lists are created for each of the single-part and multi-part monograph searches, one per rec type group. (It might be more efficient to dump the MARC records as one batch and parse them with a script.) More on FormItem can be found in CSDirect.

Note also that suppressed and withdrawn items are included, as Hathi still considers these holdings. (See the note below on tracking withdrawn materials.)
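
If the records are dumped and parsed with a script instead, the rec-type dependence of FormItem could be handled as in the sketch below. It is a rough illustration, not a tested workflow: the offsets assume the MARC 21 008 layout (form of item at position 23 for books and music, 29 for maps) and should be checked against CSDirect. The non-print codes are the ones excluded in the Orono search further down, and the function name is hypothetical.

```python
# Form-of-item codes treated as non-print (taken from the saved search).
NON_PRINT_FORMS = {"a", "b", "c", "o", "q", "s"}

# Assumed 008 offset of form of item for each rec type used in the searches.
FORM_ITEM_OFFSET = {"a": 23, "t": 23, "c": 23, "d": 23, "e": 29, "f": 29}

def is_print(rec_type, field_008):
    """True if the record's form of item is not one of the non-print codes."""
    offset = FORM_ITEM_OFFSET.get(rec_type)
    if offset is None or len(field_008) <= offset:
        return False  # rec type outside the search, or 008 too short
    return field_008[offset] not in NON_PRINT_FORMS
```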

Colby Saved Searches:

Hathi-CBY-Single-part-monograph-at    Hathi-CBY-Multi-part-monograph-at
Hathi-CBY-Single-part-monograph-cd    Hathi-CBY-Multi-part-monograph-cd
Hathi-CBY-Single-part-monograph-ef    Hathi-CBY-Multi-part-monograph-ef
Hathi-CBY-Series

Note below the fields that were used for Form Item in each saved search:
[Image: SearchStrategies]

University of Maine (Orono): Saved search: Hathi-MEU-Monographs (used in 2014; may need to be updated to use the Form Item fields noted above)

(BIBLIOGRAPHIC  MARC Tag 001  starts with  "ocn"    OR 
BIBLIOGRAPHIC  MARC Tag 001  starts with  "ocm"    OR 
BIBLIOGRAPHIC  MARC Tag 001  matches  "^001..(|a){0,1}[0-9]+") AND 
(BIBLIOGRAPHIC  BRANCH  starts with  "d"    OR 
BIBLIOGRAPHIC  BRANCH  starts with  "o")    AND 
(BIBLIOGRAPHIC  REC TYPE  equal to  "a"    OR 
BIBLIOGRAPHIC  REC TYPE  equal to  "c"    OR 
BIBLIOGRAPHIC  REC TYPE  equal to  "d"    OR 
BIBLIOGRAPHIC  REC TYPE  equal to  "e"    OR 
BIBLIOGRAPHIC  REC TYPE  equal to  "f"    OR 
BIBLIOGRAPHIC  REC TYPE  equal to  "t")    AND 
(BIBLIOGRAPHIC  BIB LEVL  equal to  "m"    OR 
BIBLIOGRAPHIC  BIB LEVL  equal to  "i")    AND 
BIBLIOGRAPHIC  FormItem  not equal to  "a"    AND 
BIBLIOGRAPHIC  FormItem  not equal to  "b"    AND 
BIBLIOGRAPHIC  FormItem  not equal to  "c"    AND 
BIBLIOGRAPHIC  FormItem  not equal to  "o"    AND 
BIBLIOGRAPHIC  FormItem  not equal to  "q"    AND 
BIBLIOGRAPHIC  FormItem  not equal to  "s"    AND 
BIBLIOGRAPHIC  BRANCH  not equal to  "oweb "    AND 
BIBLIOGRAPHIC  BRANCH  not equal to  "owebb"    AND 
BIBLIOGRAPHIC  BRANCH  not equal to  "dweb "    AND 
BIBLIOGRAPHIC  BRANCH  not equal to  "dwebb"

See also the saved searches Hathi-MEU-Multi-part-monograph and Hathi-MEU-Series.
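
Outside Create Lists, the OCLC 001 test in the search above could be approximated with a regular expression. This is only a sketch: the "on" prefix and the stripping of leading zeros are assumptions not taken from the saved search, and the function name is hypothetical.

```python
import re

# Optional ocm/ocn/on prefix followed by digits only.
OCLC_001 = re.compile(r"^(?:ocm|ocn|on)?\s*(\d+)\s*$")

def oclc_number(field_001):
    """Return the bare OCLC number from an 001 value, or None if invalid."""
    match = OCLC_001.match(field_001.strip())
    if match is None:
        return None
    # Assumption: leading zeros are dropped for the submission.
    return match.group(1).lstrip("0") or "0"
```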

Instructions for Exporting and Formatting Files:

Data can be exported using the saved export “Hathi”. This exports the 001, bib number, 022, 008 position 28 (gov doc*), and item Volume fields in tab-delimited format, with multiple values in a field separated by semicolons. Note that the ‘Field delimiter’ value <9> specifies the tab character.

[Image: IIIHathiExport]

*Position 28 in the 008 of music (scores) records does not hold the gov doc code, so a few scores may be coded as gov docs for Hathi, but the numbers are negligible.
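
For anyone processing the export with their own script, the sketch below reads the tab-delimited output described above. It assumes the columns arrive in the order listed (001, bib number, 022, 008 position 28, Volume), that there is no header row, and that no text qualifier is applied; the column labels and function name are hypothetical.

```python
# Assumed column order of the "Hathi" saved export.
EXPORT_COLUMNS = ["oclc", "bib", "issn", "govdoc", "volumes"]

def read_export(text):
    """Parse exported text into a list of dicts, one per record."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Pad short lines so every column is present.
        fields += [""] * (len(EXPORT_COLUMNS) - len(fields))
        row = dict(zip(EXPORT_COLUMNS, fields))
        # Repeated Volume values are separated by semicolons.
        row["volumes"] = [v.strip() for v in row["volumes"].split(";") if v.strip()]
        rows.append(row)
    return rows
```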

Hathi requires different fields in each of the three files, as outlined below. They also request that government documents be flagged as such if possible. The exported files can be edited manually to conform to the required specifications, changing the gov doc values ‘s’ and ‘f’ to ‘1’ and all others (including blank) to ‘0’, or you may use this Perl script. If using the script, simply change the four variables at the top (infile, outfile, oclcfile, and outputformat) to match your scenario.
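
The gov doc recoding described above is simple enough to show directly. This Python sketch illustrates the rule only; it is not the Perl script referred to, and the function name is hypothetical.

```python
def gov_doc_flag(code):
    """Recode 008 position 28: 's' or 'f' becomes '1', anything else '0'."""
    return "1" if code.strip() in {"s", "f"} else "0"
```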

Fields and Filenames for Hathi Submission in Tab-Delimited Files:

Single-Part Monograph (filename: {symbol}_single-part_yyyymmdd.tsv):
  OCLC, Bib #, Holding Status (blank), Condition (blank), Gov Doc
Multi-Part Monograph (filename: {symbol}_multi-part_yyyymmdd.tsv):
  OCLC, Bib #, Holding Status (blank), Condition (blank), Vol/Copy, Gov Doc
Serials (filename: {symbol}_serials_yyyymmdd.tsv):
  OCLC, Bib #, ISSN, Gov Doc
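
As an illustration of the layouts above, the following sketch builds a filename and one tab-delimited row per file type. The function names and the "XYZ" symbol in the usage note are placeholders, not values used by either library.

```python
from datetime import date

def hathi_filename(symbol, kind, when):
    """Build {symbol}_{kind}_yyyymmdd.tsv for a given date."""
    return f"{symbol}_{kind}_{when:%Y%m%d}.tsv"

def hathi_row(kind, oclc, bib, gov_doc, vol="", issn=""):
    """Return one tab-delimited line in the column order for the file type."""
    if kind == "single-part":
        fields = [oclc, bib, "", "", gov_doc]       # status, condition blank
    elif kind == "multi-part":
        fields = [oclc, bib, "", "", vol, gov_doc]  # status, condition blank
    elif kind == "serials":
        fields = [oclc, bib, issn, gov_doc]
    else:
        raise ValueError(f"unknown file type: {kind}")
    return "\t".join(fields)
```

For example, hathi_filename("XYZ", "serials", date(2015, 6, 30)) gives XYZ_serials_20150630.tsv.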

Also note that Hathi will allow some access to materials that have been lost or withdrawn. This requires member libraries to maintain a list of withdrawn material. The dropbox folders listed at the top of this document include lists of the OCLC numbers that were output, and the script mentioned in the previous paragraph will generate a list of the OCLC numbers in the current export. By comparing the two lists it is possible to determine what has been withdrawn over the year. Here’s another small Perl script to compare lists of OCLC numbers and append the missing numbers to the monographs file (assuming most of these would be monographs).
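
The comparison itself amounts to a set difference. The Python sketch below shows the idea only (it is not the Perl script mentioned above) and assumes both lists hold OCLC numbers normalized the same way; the function name is hypothetical.

```python
def withdrawn_numbers(previous, current):
    """Return numbers in the previous list that are absent from the current one.

    Order follows the previous list; duplicates are reported once.
    """
    current_set = set(current)
    seen = set()
    missing = []
    for number in previous:
        if number not in current_set and number not in seen:
            seen.add(number)
            missing.append(number)
    return missing
```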

Number of records Submitted:

Colby – 2015
  Single-part monographs   463,552
  Multi-part monographs    89
  Serials                  3,638

University of Maine – 2014
  Single-part monographs   1,110,693
  Multi-part monographs    825
  Serials                  42,449

Reminder of Leader and FormItem values:
[Image: leaderhints]

