Abstract (eng)
Freely available online resources such as ChemSpider, DrugBank, PubChem, and Wikipedia are widely used for obtaining information on drugs. For pharmacy students of the University of Vienna, PharmXplorer is a commonly used source of information.
This project investigates whether the drug-related InChI & InChIKey are consistent across the databases ChemSpider, DrugBank, PubChem, and Wikipedia. In addition, a gold-standard dataset was created from the results of the consistency tests and used to validate the databases ChemSpider, DrugBank, PubChem, PharmXplorer, and Wikipedia.
The workflow tool KNIME Analytics Platform was used to obtain InChI & InChIKey for all drugs approved in Austria from ChemSpider, DrugBank, PubChem, and Wikipedia.
The consistency test showed an overall consistency of 79.34%.
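A consistency figure of this kind can be illustrated with a minimal sketch. It assumes that a drug counts as consistent when every database that returned an InChIKey returned the same one, and that missing entries are ignored; the abstract does not state the exact rule used. The drug names, the placeholder keys, and the helper functions `is_consistent` and `consistency_rate` are hypothetical and serve illustration only.

```python
from typing import Dict, Optional

# Hypothetical toy records: drug -> {database: InChIKey, or None if not found}.
# The keys are placeholder strings, not real InChIKeys.
records: Dict[str, Dict[str, Optional[str]]] = {
    "drug_a": {"ChemSpider": "KEY-1", "DrugBank": "KEY-1",
               "PubChem": "KEY-1", "Wikipedia": "KEY-1"},
    "drug_b": {"ChemSpider": "KEY-2", "DrugBank": "KEY-2",
               "PubChem": "KEY-2X", "Wikipedia": "KEY-2"},
    "drug_c": {"ChemSpider": "KEY-3", "DrugBank": "KEY-3",
               "PubChem": "KEY-3", "Wikipedia": "KEY-3"},
    "drug_d": {"ChemSpider": "KEY-4", "DrugBank": None,
               "PubChem": "KEY-4", "Wikipedia": "KEY-4"},
}


def is_consistent(entries: Dict[str, Optional[str]]) -> bool:
    """Assumed rule: exactly one distinct key among the databases that returned one."""
    found = {key for key in entries.values() if key is not None}
    return len(found) == 1


def consistency_rate(data: Dict[str, Dict[str, Optional[str]]]) -> float:
    """Percentage of drugs whose retrieved keys all agree."""
    if not data:
        return 0.0
    consistent = sum(is_consistent(entries) for entries in data.values())
    return 100.0 * consistent / len(data)


print(f"{consistency_rate(records):.2f}%")
```

In this toy example only `drug_b` yields two different keys, so three of the four drugs count as consistent.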
The database validation revealed that PubChem performed best with a correctness of 96.59%, followed by DrugBank (96.07%), ChemSpider (93.88%), Wikipedia (92.83%) and PharmXplorer (83.94%).
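A per-database correctness value can be sketched as the share of a database's entries that match the gold standard. This is only an assumed definition: the abstract does not specify how correctness was computed or how missing entries were treated, so here drugs absent from a database are simply skipped. The data, the placeholder keys, and the function `correctness` are hypothetical.

```python
from typing import Dict

# Hypothetical gold standard: drug -> accepted InChIKey (placeholder strings).
gold: Dict[str, str] = {"drug_a": "KEY-1", "drug_b": "KEY-2", "drug_c": "KEY-3"}

# Hypothetical retrieved entries per database (not real results).
database_entries: Dict[str, Dict[str, str]] = {
    "PubChem": {"drug_a": "KEY-1", "drug_b": "KEY-2", "drug_c": "KEY-3"},
    "PharmXplorer": {"drug_a": "KEY-1", "drug_b": "KEY-2X"},  # drug_c not listed
}


def correctness(entries: Dict[str, str], reference: Dict[str, str]) -> float:
    """Percentage of compared entries equal to the gold standard.

    Assumption: drugs missing from the database are excluded from the comparison.
    """
    compared = [drug for drug in reference if drug in entries]
    if not compared:
        return 0.0
    correct = sum(entries[drug] == reference[drug] for drug in compared)
    return 100.0 * correct / len(compared)


for name, entries in database_entries.items():
    print(f"{name}: {correctness(entries, gold):.2f}%")
```

Counting missing entries as errors instead of skipping them would lower the score of sparsely populated databases, so the choice should be stated alongside any reported figure.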
Overall, when International Nonproprietary Names are used to query InChI & InChIKey automatically in four different databases, at least two different InChIs & InChIKeys are returned in about 20% of cases.