Hathi Records: Loading into Catalog

Loading Records

Copies of m2bmaps and m2btab are located on:

Dropbox/MSCS Project Team/HathiRecords/LoadProfiles

Load profiles accomplish the following:

1) Prefix 001 with “hathi” (m2bmap.hathi001s). This is to prevent overlaying current records based on 001, which may or may not be desirable. Note that in MaineCat matching is based on more than the 001, so records sometimes still match in MaineCat.

2) Create Read Online from HathiTrust links – m2bmap.hathitext . Includes Google Analytics tracking link.

**SEE Note regarding problem when record has multiple 856s!!!

3) Create Google Book link – this references http://mainecat.maine.edu/screens/gbcheck.js which controls text and links output.  (m2bmap.hathiurls).

4) Create POD link. (m2bmap.mscspod)  This references http://mainecat.maine.edu/screens/mscspod.html.

5) Remove 9xx fields.

6) Use the ‘hathib’ template for bibs; do not create items. (As of Feb 2015, hathib needs to be made a template on Ursus.)
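
As a rough illustration of steps 1 and 5 only, the sketch below applies the same two transformations to a simplified record held as a list of (tag, value) pairs. This is not the load table itself: the apply_profile_sketch name, the record shape, and the sample values are assumptions made for illustration, and it assumes the prefix is joined directly to the existing 001 value.

```python
HATHI_PREFIX = "hathi"  # prefix named in step 1


def apply_profile_sketch(fields):
    """Return a copy of `fields` with 001 prefixed and 9xx fields dropped.

    `fields` is a simplified record: a list of (tag, value) tuples.
    """
    result = []
    for tag, value in fields:
        if tag.startswith("9"):
            # Step 5: drop every 9xx field.
            continue
        if tag == "001":
            # Step 1: prefix the control number (assumed direct concatenation).
            value = HATHI_PREFIX + value
        result.append((tag, value))
    return result


# Hypothetical sample record, not taken from a real file.
record = [
    ("001", "5518087"),
    ("245", "Example title"),
    ("910", "local note"),
]
print(apply_profile_sketch(record))
```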

Links that need to be in the page head (toplogo):

<script src="https://code.jquery.com/jquery-1.9.1.min.js"></script>

and on mainecat:
<script src="/screens/ga.js"></script>
<script type="text/javascript" src="/screens/sharedprint.js"></script>

Files that need to exist :

http://mainecat.maine.edu/screens/ga.js
http://mainecat.maine.edu/screens/gbcheck.js
http://mainecat.maine.edu/screens/mscspod.html
http://mainecat.maine.edu/screens/sharedprint.js
http://mainecat.maine.edu/screens/umaine.png
http://mainecat.maine.edu/screens/google.png
http://mainecat.maine.edu/screens/hathi_logo.png

 **NOTE regarding multiple 856s:

Some, though not many, records have more than one 856 field. Possible field text is below.

85640 $uhttp://catalog.hathitrust.org/api/volumes/oclc/5518087.html$zHathiTrust Public Domain only in US Access
85640 $uhttp://catalog.hathitrust.org/api/volumes/oclc/5518087.html$zHathiTrust Public Domain Access
85640 $uhttp://catalog.hathitrust.org/api/volumes/oclc/36909847.html$zHathiTrust Creative Commons - No Copyright
85640 $uhttp://catalog.hathitrust.org/api/volumes/oclc/132582276.html$zHathiTrust World Access

If the record has more than one 856, the load table creates 3 URLs for each 856 (e.g. if there are 2, it creates 6). In theory one could use the %first directive in the load table to process only the first instance. This would be ideal, EXCEPT that we are also using the %map directive, and from what we’ve seen so far the system respects ONLY the first directive on the line and ignores the second, so something like this still processes all URLs:

856|85641|+|0|0|b|y|0|y|N|0|%map=("m2bmap.hathitextPD")%first

Unless or until this is fixed, there are two possible workarounds:

1)  Use four separate load tables, one for each input file, e.g. CC, PD, US, and World,  and use translation maps that exclude matches to text in all other files.  E.g. in the load table for Public Domain Access:

856|85641|+|0|0|b|y|0|y|N|0|%map=("m2bmap.hathitextPD")
856|85641|+|0|0|b|y|0|y|N|0|%map=("m2bmap.hathiurlsPD")
856|85641|+|0|0|b|y|0|y|N|0|%map=("m2bmap.mscspodPD")

In the PD translation maps, put regexes that exclude the other possible types of text:

@delimiter=/
/.*only in US.*/
/.*Creative.*/
/.*World.*/

2)  Use MarcEdit or Perl MARC::Record to remove extra 856 fields before loading.
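
For workaround 1, the exclusion patterns can be checked against the four $z texts listed in the note above before building the translation map. The sketch below is plain Python, not load-table syntax; PD_EXCLUDE and kept_by_pd_map are hypothetical names, and it assumes Python's re module treats these simple patterns the same way the translation map does.

```python
import re

# Exclusion patterns from the PD translation map (delimiters removed).
PD_EXCLUDE = [r".*only in US.*", r".*Creative.*", r".*World.*"]

# The four $z texts seen in the Hathi 856 fields.
Z_TEXTS = [
    "HathiTrust Public Domain only in US Access",
    "HathiTrust Public Domain Access",
    "HathiTrust Creative Commons - No Copyright",
    "HathiTrust World Access",
]


def kept_by_pd_map(z_text):
    """Return True if none of the PD exclusion patterns match the $z text."""
    return not any(re.fullmatch(pattern, z_text) for pattern in PD_EXCLUDE)


for text in Z_TEXTS:
    print("keep" if kept_by_pd_map(text) else "skip", "-", text)
```

Run as written, only "HathiTrust Public Domain Access" is reported as kept. The CC, US, and World maps would each need their own exclusion list built the same way.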

A sample record that had two 856s is http://www.worldcat.org/oclc/729489102 , and was contained in the HathiTrustUS.5.mrc file.
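
Workaround 2 amounts to keeping one 856 per record. The sketch below shows that logic in Python on a simplified record (a list of (tag, value) pairs); it does not read or write .mrc files, which would still need MarcEdit, MARC::Record, or a similar MARC library. The keep_first_856 name and the record shape are assumptions for illustration, and keeping the first 856 rather than a later one is also an assumption.

```python
def keep_first_856(fields):
    """Return a copy of `fields` with every 856 after the first removed."""
    seen_856 = False
    result = []
    for tag, value in fields:
        if tag == "856":
            if seen_856:
                continue  # extra 856: drop it
            seen_856 = True
        result.append((tag, value))
    return result


# Simplified record using two of the 856 texts listed earlier.
record = [
    ("245", "Example title"),
    ("856", "$uhttp://catalog.hathitrust.org/api/volumes/oclc/5518087.html"
            "$zHathiTrust Public Domain only in US Access"),
    ("856", "$uhttp://catalog.hathitrust.org/api/volumes/oclc/5518087.html"
            "$zHathiTrust Public Domain Access"),
]
cleaned = keep_first_856(record)
print(sum(1 for tag, _ in cleaned if tag == "856"))
```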


Acquiring Records From OCLC WorldShare Kb

NOTE: A complete set of records from October 2014 is available on Dropbox/MSCS Project Team/HathiRecords/WorldShareRecords in preparation for loading into URSUS.

These directions assume that the Hathi collections are not already selected. If they are, and a new set of records is desired, deselect them, wait 1-3 weeks for the holdings to be removed, and then reselect them via the process below. (This was the case as of Summer 2014; it may be worth checking with OCLC again whether creating a custom collection is possible and/or whether record deselection is faster now. Records were left unselected after download in October 2014.)

In WorldShare choose Manage Collections.    Search “Collection” for “hathitrust”.

[Screenshot: worldsharekb1]

Select the 4 public domain collections, as shown below.  Note that selection may take some time on the larger collections.

[Screenshot: worldsharekb2]

Once selected, click on the title to bring up the Holdings and Marc Records option, and open it.

[Screenshot: HathiMarc]

If you don’t want holdings added in WorldCat for these records, make sure the following is set to “Disable..”:

[Screenshot: worldsharekb4]

Under “Enable MARC Record Delivery” choose “Use institution setting”.   Choose “Deliver Records for this collection in a separate file” and give it a name that you will recognize later, e.g.:

[Screenshot: ws3]

Specify the frequency of delivery.  In most cases you will only be grabbing the new set of files once they are generated, at which point you may want to deselect the holdings.  However, if you wish to continue to receive updated files it is possible to do so:

[Screenshot: worldsharekb3]

Finally, under the main menu “Manage Collection” / “Library Holdings” / “Enable Marc Delivery”, make sure Yes is selected.

[Screenshot: worldsharekb5]

Records will now be delivered for FTP pickup, and will continue to be delivered at the frequency specified above.

Picking Up FTP Marc Records:

1. Enter host: ftp2.oclc.org (or scp.oclc.org)
2. Enter your institution’s username and your password
3. Navigate through the following folders: metacoll—out—ongoing
4. Choose a folder: new, updates or deletes.  In most cases files should be in “New”.

Look for the file name you entered under “Deliver Records for this collection in a separate file”. Files usually take 24 hours to appear.
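
The pickup steps could also be scripted. The following is a minimal sketch using Python's standard ftplib, assuming the three folders in step 3 are nested as metacoll/out/ongoing, that folder names are lowercase on the server, and that a plain FTP login is permitted. The remote_dir and fetch_file names and the username, password, and file name arguments are placeholders, not values from OCLC.

```python
from ftplib import FTP

HOST = "ftp2.oclc.org"          # host from step 1
BASE = "metacoll/out/ongoing"   # assumed nesting of the folders in step 3


def remote_dir(kind="new"):
    """Build the folder path for one of: new, updates, deletes."""
    if kind not in ("new", "updates", "deletes"):
        raise ValueError("kind must be new, updates or deletes")
    return BASE + "/" + kind


def fetch_file(user, password, filename, kind="new"):
    """Download `filename` from the chosen folder into the current directory."""
    with FTP(HOST) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir(kind))
        with open(filename, "wb") as out:
            # RETR streams the remote file to the local file in binary mode.
            ftp.retrbinary("RETR " + filename, out.write)


# Example call (placeholders, not real credentials or file names):
# fetch_file("your_username", "your_password", "your_delivery_file.mrc")
```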

 


 
