DOI Check
1. Overview and Context
This procedure checks the DOIs extracted from a PDF against the PDF itself.
2. Triggers
The execution of this procedure is usually triggered by
3. Steps to Be Performed
- Open the CSV file generated by the DOI Extraction procedure. The file contains a column with ordinals and a column with DOIs.
- The first line will contain the DOI of the publication itself. Remove this DOI and keep the empty place for the first DOI missed by the script (see below).
- Search the PDF for the string DOI: and compare each instance against the CSV file.
- The DOI script will have generated the following errors:
  - DOIs cut with a line break after a slash (/): these are not recognized and should be added manually with the correct ordinal.
  - DOIs cut with a line break after a period (.): these are only recognized up to the period; the rest should be added to the entry.
  - DOIs at the end of a page: the script will recognize the page number as part of the DOI, and the page number will have to be removed.
- Once the CSV file has been corrected, make sure each DOI is correctly numbered and there are no empty lines or spaces.
- Save the CSV file.
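The final check in the steps above (numbering, empty lines, stray spaces) could be partly automated. The following is a minimal sketch, not part of the DOI Extraction script. It assumes the CSV has exactly two columns (ordinal, DOI), has no header row, and starts numbering at 1. The function names check_doi_rows and check_doi_csv are hypothetical. It also flags DOIs that end in a period or slash, which may indicate an entry cut off at a line break. It cannot detect a page number appended to a DOI, so the manual comparison against the PDF is still required.

```python
import csv


def check_doi_rows(rows, first_ordinal=1):
    """Return a list of human-readable problems found in (ordinal, DOI) rows.

    Assumption: each row is [ordinal, doi] and ordinals count up from
    first_ordinal without gaps.
    """
    problems = []
    expected = first_ordinal
    for line_no, row in enumerate(rows, start=1):
        # csv.reader yields an empty list for a blank line.
        if not row or all(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: empty line")
            continue
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, found {len(row)}")
            continue
        ordinal, doi = row
        if ordinal != str(expected):
            problems.append(f"line {line_no}: ordinal {ordinal!r}, expected {expected}")
        # Resynchronise on the ordinal actually found so one gap is reported once.
        expected = int(ordinal) + 1 if ordinal.isdigit() else expected + 1
        if doi == "":
            problems.append(f"line {line_no}: DOI is missing")
        elif any(ch.isspace() for ch in doi):
            problems.append(f"line {line_no}: DOI contains whitespace")
        elif not doi.startswith("10."):
            problems.append(f"line {line_no}: DOI does not start with '10.'")
        elif doi.endswith((".", "/")):
            problems.append(f"line {line_no}: DOI may be cut off at a line break")
    return problems


def check_doi_csv(path, first_ordinal=1):
    """Read the corrected CSV file and run the row checks on it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return check_doi_rows(list(csv.reader(handle)), first_ordinal)
```

Calling check_doi_csv with the path of the corrected CSV file returns an empty list when none of these checks fails. Each entry in a non-empty list names the line to revisit.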
4. Additional Information
5. Document Control
Document ID | PRO- |
Document Owner | Vincent |
Version | 1.0 |
Last Date of Change | October 2, 2025 |
Next Review Due Date | |
Version & Change Tracking |