DOI Check

1. Overview and Context

This procedure checks the DOIs extracted from a PDF against the PDF.

2. Triggers

The execution of this procedure is usually triggered by

3. Steps to Be Performed

  • Open the CSV file generated by the DOI Extraction procedure. The file contains a column with ordinals and a column with DOIs.
  • The first line will contain the DOI of the publication itself. Remove this DOI but keep the empty slot for the first DOI missed by the script (see below).
  • Search the PDF for the string DOI: and compare each instance against the CSV file. The DOI script will have generated the following errors:
      • DOIs cut with a line break after a slash /: these are not recognized and should be added manually with the correct ordinal.
      • DOIs cut with a line break after a period .: these are only recognized up to the period; the rest should be added to the entry.
      • DOIs at the end of a page: the script will recognize the page number as part of the DOI, and the page number will have to be removed.
  • Once the CSV file has been corrected, make sure each DOI is correctly numbered and there are no empty lines or spaces.
  • Save the CSV file.
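
The final check (correct numbering, no empty lines or spaces) could be partly automated. The following is a minimal sketch, not part of the DOI Extraction Script. It assumes a comma-separated file with no header row, ordinals starting at 1, and exactly two columns; the function name find_csv_issues is hypothetical. It also flags DOIs that still end in a period or slash, which may indicate a line-break cut. It cannot detect a page number appended to a DOI, so the comparison against the PDF remains a manual step.

```python
import csv
import io


def find_csv_issues(rows):
    """Return (line_number, message) pairs for rows that need attention.

    Assumes each row is [ordinal, doi], with no header row and
    ordinals counting up from 1 (assumptions, not confirmed by the script).
    """
    issues = []
    expected = 1  # next ordinal we expect to see
    for line_no, row in enumerate(rows, start=1):
        # csv.reader yields an empty list for a blank line
        if not row or all(cell.strip() == "" for cell in row):
            issues.append((line_no, "empty line"))
            continue
        if len(row) != 2:
            issues.append((line_no, "expected two columns"))
            continue
        ordinal, doi = row
        if ordinal != ordinal.strip() or doi != doi.strip() or " " in doi:
            issues.append((line_no, "stray space"))
        if ordinal.strip() != str(expected):
            issues.append((line_no, f"ordinal should be {expected}"))
        if doi.strip() == "":
            issues.append((line_no, "missing DOI"))
        elif doi.strip().endswith((".", "/")):
            issues.append((line_no, "DOI may be cut at a line break"))
        expected += 1
    return issues


# Small in-memory example; the DOIs are made up for illustration.
sample = "1,10.1000/abc\n2,10.1000/\n\n4, 10.1000/xyz\n"
for line_no, message in find_csv_issues(list(csv.reader(io.StringIO(sample)))):
    print(f"line {line_no}: {message}")
```

To run the same check on a real file, the rows can be read with csv.reader from the opened CSV instead of the in-memory sample.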

4. Additional Information

5. Document Control

Document ID PRO-003004
Document Owner Vincent
Version 1.0
Last Date of Change October 2, 2025
Next Review Due Date
Version & Change Tracking