Objective

Structuring chemical datasets on a modern, secure, and scalable tech stack, grounded in information retrieval principles, transforms raw data into actionable knowledge. This foundation is essential for supporting experimentation, enabling scientific exploration, integrating workflows, and powering advanced simulations and machine learning models. Ultimately, it accelerates the discovery and development of new products, which provides a strategic advantage in scientific innovation.


Eligible Datasets


| Dataset Name | Description | Format(s) | Size | Update Frequency | Reference/Link |
| OPoly26 | Large-scale open dataset of 26M+ unique polymer structures with computed properties and rich metadata for polymer informatics and machine learning | CSV, JSON, SDF, HDF5, SMILES, SELFIES | ~1.2 TB | Batch releases (every few months) | arXiv:2512.23117 |
| PolyInfo | Polymer properties, structures, synthesis routes | CSV, SDF, JSON | ~GBs | Periodic | https://polymer.nims.go.jp/ |
| Polymer Genome | Polymer property predictions, descriptors | CSV, JSON | ~GBs | Periodic | https://www.polymergenome.org/ |
| Materials Project | Inorganic materials, properties, structures | JSON, CSV | ~TBs | Weekly | https://materialsproject.org/ |
| QM9 | Small organic molecules, quantum properties | XYZ, CSV, JSON | ~GBs | Static | https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/qm9.csv |
| Open Catalyst Project | Catalysts, surface reactions, DFT calculations | HDF5, JSON | ~TBs | Periodic | https://opencatalystproject.org/ |
| PubChem | Chemical structures, properties, bioactivity | SDF, CSV, JSON | ~TBs | Daily | https://pubchem.ncbi.nlm.nih.gov/ |
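
The table above could also be kept as a small machine-readable catalog, so that ingestion jobs can select datasets by file format. The following is a minimal sketch only: the names `DatasetEntry`, `CATALOG`, and `datasets_with_format` are hypothetical and not part of any existing tool, and the values are copied from the table rather than checked against the upstream sources.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    """One row of the Eligible Datasets table (hypothetical structure)."""
    name: str
    formats: Tuple[str, ...]
    size: str
    update_frequency: str
    reference: str


# Values transcribed from the table; sizes are kept as the approximate strings given there.
CATALOG: List[DatasetEntry] = [
    DatasetEntry("OPoly26", ("CSV", "JSON", "SDF", "HDF5", "SMILES", "SELFIES"),
                 "~1.2 TB", "Batch releases (every few months)", "arXiv:2512.23117"),
    DatasetEntry("PolyInfo", ("CSV", "SDF", "JSON"),
                 "~GBs", "Periodic", "https://polymer.nims.go.jp/"),
    DatasetEntry("Polymer Genome", ("CSV", "JSON"),
                 "~GBs", "Periodic", "https://www.polymergenome.org/"),
    DatasetEntry("Materials Project", ("JSON", "CSV"),
                 "~TBs", "Weekly", "https://materialsproject.org/"),
    DatasetEntry("QM9", ("XYZ", "CSV", "JSON"),
                 "~GBs", "Static",
                 "https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/qm9.csv"),
    DatasetEntry("Open Catalyst Project", ("HDF5", "JSON"),
                 "~TBs", "Periodic", "https://opencatalystproject.org/"),
    DatasetEntry("PubChem", ("SDF", "CSV", "JSON"),
                 "~TBs", "Daily", "https://pubchem.ncbi.nlm.nih.gov/"),
]


def datasets_with_format(catalog: List[DatasetEntry], fmt: str) -> List[str]:
    """Return the names of datasets that list the given format (case-insensitive)."""
    wanted = fmt.upper()
    return [entry.name for entry in catalog if wanted in entry.formats]


if __name__ == "__main__":
    # Example: which datasets would need an SDF parser in the ingestion pipeline?
    print(datasets_with_format(CATALOG, "sdf"))
```

Keeping the formats as a tuple per entry makes it straightforward to decide which parsers an ingestion pipeline needs before any data is downloaded.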