You can use JSTOR Seeklight to generate high-quality transcripts for items containing text, including handwritten, typed, and mixed-media documents. Seeklight can generate transcripts in multiple languages, though support and accuracy vary by language, script, and document quality.
Transcripts can be searched, edited, and downloaded from JSTOR Digital Stewardship Services, or you can choose to upload your own transcript. Transcripts can be generated or uploaded for any item in JSTOR Stewardship Tier 3, whether or not it was uploaded through Seeklight.
Notes:
- Transcripts can be generated with Seeklight for PDF, JPEG, and TIFF files only. Individual TIFF files can be up to 400MB.
- Transcripts cannot be generated in bulk or published to JSTOR directly from JSTOR Stewardship at this time.
- Seeklight's transcript generation is separate from machine-generated transcripts for audio and video content.
Generating transcripts with Seeklight
You can generate transcripts when uploading items through Seeklight, or you can generate them at any time from an item’s page within JSTOR Stewardship.
During upload
To create transcripts while uploading:
- To start with a new project, select Upload media and generate metadata with JSTOR Seeklight from the JSTOR Stewardship Home page.
- Alternatively, to add items to an existing project, open the Add drop-down menu and select Upload & Generate Metadata.
- See Generating, reviewing, and editing metadata for more information.
- Select Browse Files and choose your PDF (.pdf), JPEG (.jpeg/.jpg), or TIFF (.tiff/.tif) file(s).
- For TIFF, individual files up to 400MB are supported.
- Check the checkbox beside Create transcripts for items with text, after the optional context box.
- Select Upload.
Your items’ transcripts will be generated along with the item-level metadata, and will appear under the Transcript tab on the item page after processing.
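As a rough pre-upload check, the file type and size limits described above could be verified with a short script. This is only a sketch: the function name check_file is hypothetical, and it assumes "400MB" means 400 × 1024 × 1024 bytes, which this article does not specify.

```python
from pathlib import Path
from typing import Optional

# File extensions this article lists as supported for transcript generation.
SUPPORTED_EXTENSIONS = {".pdf", ".jpeg", ".jpg", ".tiff", ".tif"}
TIFF_EXTENSIONS = {".tiff", ".tif"}

# Assumption: "400MB" is interpreted as 400 * 1024 * 1024 bytes.
MAX_TIFF_BYTES = 400 * 1024 * 1024


def check_file(name: str, size_bytes: int) -> Optional[str]:
    """Return a description of the problem, or None if the file looks eligible."""
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"{name}: unsupported file type {ext or '(none)'}"
    if ext in TIFF_EXTENSIONS and size_bytes > MAX_TIFF_BYTES:
        return f"{name}: TIFF larger than 400MB"
    return None
```

For files on disk, the size can be read with Path(name).stat().st_size before calling check_file.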
For existing content
To create a transcript for an existing item:
- In your project, double-click an item to edit it or select an item and select Edit.
- Select the Transcript tab after Item details.
- Select the Generate transcript button.
Your item’s transcript will be generated and will appear on the Transcript tab once it has been processed.
Note:
For items that have been processed through JSTOR Seeklight, generating transcripts with Seeklight is included in your item allowance and does not count as an additional item processed. This applies whether you generate transcripts while uploading your items through Seeklight or generate a transcript later for an existing item that was originally uploaded through Seeklight.
For items that were not processed through Seeklight, generating a transcript will count as one (1) item out of your item allowance.
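The allowance rule in this note can be expressed as a small calculation. The sketch below is illustrative only; the function name and the list-of-booleans input are hypothetical and not part of JSTOR Stewardship.

```python
def transcript_allowance_cost(processed_through_seeklight):
    """Count how many allowance items transcript generation would use.

    processed_through_seeklight: one bool per item, True if the item was
    already processed through Seeklight (no additional cost, per the note),
    False if it was not (counts as one item).
    """
    return sum(1 for processed in processed_through_seeklight if not processed)
```

For example, generating transcripts for four items, two of which were originally uploaded through Seeklight, would use two items from the allowance under this reading.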
Uploading or replacing transcripts
In addition to generating transcripts with Seeklight, you can also upload your own transcripts for any text item in JSTOR Stewardship, including items with Seeklight-generated transcripts. (See Transcripts for audio and video for information on audio and video content.)
You can upload transcripts for any text or image file type, and uploading your own transcripts does not count toward your item processing allowance.
To upload a transcript or replace an existing transcript:
- In your project, double-click an item to edit it or select an item and select Edit.
- Select the Transcript tab after Item details.
- Select the Upload a transcript button after Add a transcript manually.
- Alternatively, for items with an existing transcript, open the three-dot More actions menu and select Upload new transcript.
- Select Upload file and choose your TXT (.txt) or RTF (.rtf) file.
- Select the Upload button.
Tip: To include line breaks in your uploaded transcripts...
- For RTF files, you can add line breaks to the file prior to uploading or afterward in the transcript editor.
- For TXT files, line breaks in the file will not be preserved during upload, but you can add line breaks afterward in the transcript editor.
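Because line breaks in TXT files are not preserved on upload, one option is to convert a plain-text transcript to RTF first. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the uploader treats the RTF \par control word as a line break, which this article does not confirm.

```python
def escape_rtf(text: str) -> str:
    """Escape one line of text for inclusion in an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII characters become \uN? escapes, one per UTF-16
            # code unit, written as a signed 16-bit number.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def txt_to_rtf(text: str) -> str:
    """Wrap plain text in a minimal RTF document, one \\par per line break."""
    body = "\\par\n".join(escape_rtf(line) for line in text.splitlines())
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0 "
    return header + body + "}"
```

The result contains only ASCII characters, so it can be written to a .rtf file with encoding="ascii" and then uploaded. It is worth checking the uploaded transcript in the transcript editor to confirm the line breaks appear as expected.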
Editing transcripts
You can edit any transcript in JSTOR Stewardship, whether it was generated by Seeklight or uploaded, by selecting the Edit button under the Transcript tab on the item page. Select the Save changes button to save your edits.
The transcript editor supports the following:
- Unicode (UTF-8) characters
- Bold, italics, underline, and strikethrough text formatting
- Page breaks
If needed, you can revert to the original Seeklight-generated transcript at any time by selecting Revert to generated transcript from the three-dot More actions menu.
Downloading transcripts
You can download the transcript for an individual item in RTF or TXT format using the Download button in the Transcript tab on the item page. The downloaded transcript can then be edited and re-uploaded, if desired.
To download transcripts for one or more items in bulk:
- In your project, select one or more items.
- At this time, downloading transcripts in bulk is limited to 250 items per export.
- Open the three-dot More actions menu and select Download transcripts.
- If this option is disabled, make sure you have no more than 250 items selected and that the Select all... link is not selected.
- Optionally, select a file format for your transcripts: TXT (default) or RTF.
- Select Send download link to finish.
A link to a ZIP file containing one transcript for each selected item (if available) will be emailed to you. Transcript filenames will match the filenames of the associated media files.
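For projects with more than 250 items, it may help to plan exports in groups and then check each downloaded ZIP for items without a transcript. The sketch below is a hypothetical helper, not part of JSTOR Stewardship. It assumes that "matching" filenames means the transcript shares the media file's base name and differs only in its extension (for example, letter_001.tif and letter_001.txt).

```python
import zipfile
from pathlib import PurePath

MAX_ITEMS_PER_EXPORT = 250  # bulk download limit stated in this article


def split_into_exports(item_names, size=MAX_ITEMS_PER_EXPORT):
    """Split a list of items into groups no larger than the export limit."""
    return [item_names[i:i + size] for i in range(0, len(item_names), size)]


def missing_transcripts(media_filenames, zip_file):
    """Return the media files that have no same-named transcript in the ZIP.

    zip_file may be a path or a file-like object.
    Assumption: a transcript matches a media file when their base names
    (filename without the final extension) are identical.
    """
    with zipfile.ZipFile(zip_file) as archive:
        stems = {PurePath(name).stem for name in archive.namelist()}
    return [m for m in media_filenames if PurePath(m).stem not in stems]
```

Each group returned by split_into_exports corresponds to one selection and one emailed download link. Any filenames returned by missing_transcripts would be items for which no transcript was available at export time, under the naming assumption above.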
Transcripts and accessibility
Transcripts can be used as an accessible text alternative for images of text, such as handwritten letters. (This is a technique to meet WCAG SC 1.4.5 Images of Text (AA). You can learn more about SC 1.4.5 on W3C’s website.)
RTF and TXT files are machine-readable files, which means your downloaded transcripts are in an accessible file format and ready to be presented as an accessible alternative wherever you would like to display them. If you choose to convert them into an alternate format, such as a PDF, be sure to check accessibility guidelines for the corresponding file format.
Accessibility techniques
This section has recommendations for how you can use Seeklight-generated transcripts or your own uploaded transcripts to increase your items’ accessibility on JSTOR. Each recommendation is an equivalent technique to meet accessibility requirements. You can select the technique that fits your institution’s needs, or you can combine techniques if desired.
Technique #1: Provide a hyperlink to the transcript within your item’s metadata
Download and host your transcript somewhere it can be accessed, such as your institution’s library website. You can then add a link to the transcript, depending on the metadata schema your project uses:
- If your project uses the AI-Compatible Dublin Core schema, you can include a link to the transcript in the Description field.
- If your project uses any other metadata schema, you can create a custom field labeled "Transcript" or "Link to transcript" and link to the transcript in that field.
To be accessible, make sure your link’s purpose is clearly communicated. You should not use the naked URL or words such as "click here." Example:
<a href="https://www.example.org">Accessible text transcript for [item name]. (This will open a new page.)</a>
For more information on using hyperlinks in your project fields, see Hyperlinks in metadata.
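If you are preparing links for many items, the link text pattern from the example above can be generated with a short script. This is a sketch under stated assumptions: the function name transcript_link is hypothetical, and the wording simply follows the example in this article.

```python
from html import escape


def transcript_link(item_name: str, url: str) -> str:
    """Build an HTML link whose text states the link's purpose.

    The link text follows the example in this article; item_name and url
    are supplied by you. Special characters are HTML-escaped.
    """
    label = (
        f"Accessible text transcript for {item_name}. "
        "(This will open a new page.)"
    )
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
```

For example, transcript_link("Letter to Mr. Smith, 1905", "https://www.example.org") produces a link in the same form as the example above, with the item name filled in.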
Technique #2: Include the transcript text in a custom field
If your project does not use the AI-Compatible Dublin Core schema, you can create a custom text area field within your project labeled "Transcript" or "Accessible Text Transcript."
You can then copy and paste your transcript text into this field. The transcript text will appear as plain text within the Item Details section on JSTOR. This approach ensures the transcript text is indexed in our search, but please note the transcript will not be downloaded along with the item.
The item page on JSTOR supports accessibility features such as text resizing and reflow. Information about the accessibility features the JSTOR platform supports is available in our Accessibility policies.
This approach would work best for shorter text items such as postcards or short letters. Text within a custom field will display in full on the item page on JSTOR. Large amounts of text within a single field may be difficult for some users to navigate.
Technique #3: Publish the transcript as a separate item
Upload your transcript file as a separate item in your project, which can then be published to JSTOR. If your project uses the AI-Compatible Dublin Core schema, you can link the items by clearly titling them and including hyperlinks in the items’ Description fields.
For example, if you had an item titled "Letter to Mr. Smith, 1905," you could upload the transcript and title it "Letter to Mr. Smith, 1905 [Accessible Transcript]." In the original item’s Description field, you can add a link to the accessible transcript item on JSTOR, and in the accessible transcript’s Description field, you can add a link to the original item.
Alternatively, if your project uses any other metadata schema, you can group the item and its transcript together using a container.
Either approach ensures the transcript is indexed in JSTOR’s search and can be downloaded by users.
Technique #4: Add the transcript to your media file
If you would prefer the transcript to appear and be downloadable as part of the original item on JSTOR, you can choose to add the transcript to your item’s media file.
For example, if you have a 3-page letter, you can add the transcript to the end of the letter and save it as a single PDF file using your PDF-editing software of choice. You can then re-upload the media file with the included transcript and publish the item to JSTOR. This approach ensures the transcript is indexed in JSTOR’s search and can be downloaded by users.
We recommend adding a note to the item’s Description field clearly stating the transcript is included in the item file.
Transcripts and OCR
Please note that Seeklight-generated transcripts are not the same as OCR (Optical Character Recognition). OCR is a technology that converts an image of text into machine-readable text. OCR is embedded in the file’s metadata, so a PDF with OCR does not require an accessible alternative. Transcripts are presented as an alternative format to an image of text, and do not need to be embedded into the original file to meet accessibility requirements.
If publishing the transcript to JSTOR using one of the techniques described above, either as a separate item or with the original item, you will be transforming the machine-readable transcript file to a PDF, which requires OCR to be machine readable.
All textual content types published to JSTOR have OCR automatically applied to them if there is no existing OCR in the document. Since this is an automated process, we recommend double-checking the downloaded PDF to ensure the OCR meets your accessibility standards.
You can use a program like Adobe Acrobat to review or apply your own OCR to your files.