DOI Check

1. Overview and Context

This procedure checks the DOIs extracted from a PDF against the PDF.

2. Triggers

The execution of this procedure is usually triggered by

3. Steps to Be Performed

  • Open the CSV file generated by the DOI Extraction procedure. The file contains a column with ordinals and a column with DOIs.
  • The first line will contain the DOI of the publication itself. Remove the DOI and keep the empty place for the first DOI missed by the script (see below).
  • Search the PDF for the string DOI: and compare each instance against the CSV file.
  • The DOI script will have generated the following errors:
    • DOIs cut with a line break after a slash /: these are not recognized and should be added manually with the correct ordinal.
    • DOIs cut with a line break after a period .: these are only recognized up to the period; the rest should be added to the entry.
    • DOIs at the end of a page: the script will recognize the page number as part of the DOI, and the page number will have to be removed.
  • Once the CSV file has been corrected, make sure each DOI is correctly numbered and there are no empty lines or spaces.
  • Save the CSV file.
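The final check in the steps above (correct numbering, no empty lines, no spaces) can be partly automated. The following is a minimal sketch, not part of the DOI Extraction Script. It assumes that the CSV has no header row, that each row is an ordinal followed by a DOI, and that ordinals start at 1. The names check_doi_rows, check_doi_csv and DOI_PATTERN are hypothetical, and the pattern is only a loose shape check, not a full DOI validator.

```python
import csv
import re

# Loose shape check (assumption): "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_doi_rows(rows):
    """Return a list of human-readable problems found in (ordinal, DOI) rows."""
    problems = []
    expected = 1  # assumption: ordinals start at 1 and increase by 1
    for line_no, row in enumerate(rows, start=1):
        # csv.reader yields an empty list for a blank line.
        if not row or all(cell == "" for cell in row):
            problems.append(f"line {line_no}: empty line")
            continue
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, found {len(row)}")
            expected += 1
            continue
        ordinal, doi = row
        # Leading/trailing spaces in either cell, or any space inside the DOI.
        if ordinal != ordinal.strip() or doi != doi.strip() or " " in doi:
            problems.append(f"line {line_no}: stray space")
        if ordinal.strip() != str(expected):
            problems.append(
                f"line {line_no}: ordinal {ordinal.strip()!r}, expected {expected}"
            )
        if not DOI_PATTERN.match(doi.strip()):
            problems.append(f"line {line_no}: {doi.strip()!r} does not look like a DOI")
        expected += 1
    return problems


def check_doi_csv(path):
    """Read the CSV at `path` and return the problems found by check_doi_rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return check_doi_rows(list(csv.reader(handle)))
```

Calling check_doi_csv with the path of the corrected CSV file returns an empty list when nothing is flagged. Each flagged line should still be compared against the PDF by hand, since the sketch cannot tell whether a well-formed DOI is the right one.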

4. Additional Information

5. Document Control

Document ID PRO-004
Document Owner Vincent
Version 1.0
Last Date of Change October 2, 2025
Next Review Due Date
Version & Change Tracking