---
license: mit
language: en
tags:
- materials science
- synthesis prediction
- lightgbm
- cheminformatics
datasets: []
metrics:
- accuracy
- f1
---

# Synthesis Condition Predictor

This model predicts optimal temperature bins and atmosphere categories for inorganic material synthesis.
It was trained on a dataset of text-mined synthesis procedures, which is described in https://www.nature.com/articles/s41597-019-0224-1

**Models Included:**
* Temperature Bin Prediction (LightGBM)
* Atmosphere Category Prediction (LightGBM)

**Intended Use:**
The models are intended to assist researchers in designing synthesis experiments by predicting key process parameters.
Provide a target material, its precursors, and basic operational details to get predictions.

**How to Use:**
```python
# Ensure your inference script and its dependencies are in the PYTHONPATH.
# If the repo is importable under its package name, use this import instead:
# from synthesis_predictor_hf_repo.src.inference import predict_synthesis_outcome, load_all_artifacts_once

# If running from a cloned repo where 'src' is a subdirectory:
from src.inference import predict_synthesis_outcome, load_all_artifacts_once

if not load_all_artifacts_once():
    print("Failed to load model artifacts.")
else:
    raw_input_example = {
        'target_formula_raw': "YBa2Cu3O7",
        'precursor_formulas_raw': ["Y2O3", "BaCO3", "CuO"],
        'operations_simplified_list': [
            {'type': 'MixingOperation', 'string': 'Ball milling for 2h',
             'conditions': {'duration': [{'value': 2, 'unit': 'h'}]}},
            {'type': 'HeatingOperation', 'string': 'Calcined at 920C for 10h in air',
             'conditions': {'heating_temperature': [{'value': 920}], 'heating_time': [{'value': 10}], 'atmosphere': 'air'}},
            {'type': 'HeatingOperation', 'string': 'Sintered at 950C for 20h in O2',
             'conditions': {'heating_temperature': [{'value': 950}], 'heating_time': [{'value': 20}], 'atmosphere': 'Oxygen'}},
        ],
        'reactants_coeffs': [("Y2O3", 0.5), ("BaCO3", 2.0), ("CuO", 3.0)],  # Example, adjust as needed
        'products_coeffs': [("YBa2Cu3O7", 1.0)],  # Example
    }
    predictions = predict_synthesis_outcome(raw_input_example)
    print(predictions)
```

**Limitations:**
* Test-set accuracy is about 68% for the temperature bin model and about 72% for the atmosphere category model.
* Predictions are based on patterns in the training data and may not generalize to all chemical systems.
* Feature engineering for process parameters in the inference script relies on the user providing an `operations_simplified_list` that the internal logic can parse. The quality of these inputs directly affects prediction accuracy.
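
Because malformed `operations_simplified_list` entries can degrade predictions, it may help to check the input before calling the predictor. The sketch below is only an illustration: `check_operations` is a hypothetical helper, not part of this repository, and the required keys (`type`, `string`, `conditions`) are an assumption inferred from the usage example rather than a documented schema.

```python
def check_operations(operations):
    """Return a list of human-readable problems; an empty list means none were found.

    Assumption: every operation is a dict with 'type', 'string' and 'conditions'
    keys, as in the usage example. The real parser may accept more or less.
    """
    if not isinstance(operations, list):
        return ["operations_simplified_list should be a list of dicts"]
    problems = []
    for i, op in enumerate(operations):
        if not isinstance(op, dict):
            problems.append(f"operation {i}: expected a dict, got {type(op).__name__}")
            continue
        for key in ("type", "string", "conditions"):
            if key not in op:
                problems.append(f"operation {i}: missing key '{key}'")
        if not isinstance(op.get("conditions", {}), dict):
            problems.append(f"operation {i}: 'conditions' should be a dict")
    return problems


ops = [
    {"type": "HeatingOperation", "string": "Calcined at 920C for 10h in air",
     "conditions": {"heating_temperature": [{"value": 920}], "atmosphere": "air"}},
    {"type": "MixingOperation"},  # deliberately incomplete entry
]
issues = check_operations(ops)
for issue in issues:
    print(issue)
```

A check like this only catches structural problems. It cannot tell whether the values themselves (temperatures, durations, atmosphere strings) are in the form the feature engineering expects.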

**Training Data:**
The model was trained on the dataset of text-mined inorganic synthesis procedures described by Kononova et al.:
https://www.nature.com/articles/s41597-019-0224-1

**Evaluation Results:**
Each model was evaluated on a hold-out test set (3,180 samples for temperature bins, 3,898 samples for atmosphere categories).

**1. Tuned Temperature Bin Prediction Model:**
* **Overall Test Set Accuracy:** 0.6821
* **Overall Test Set F1 Score (Weighted):** 0.6785
* **Per-Class Performance (Test Set):**
```
                          precision    recall  f1-score   support

    TempBin_1_(1_to_900]       0.77      0.79      0.78       954
 TempBin_2_(900_to_1100]       0.62      0.53      0.57       743
TempBin_3_(1100_to_1300]       0.58      0.58      0.58       768
TempBin_4_(1300_to_3000]       0.72      0.80      0.76       715

                accuracy                           0.68      3180
               macro avg       0.67      0.68      0.67      3180
            weighted avg       0.68      0.68      0.68      3180
```
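
The class labels above encode right-closed intervals, so a reported synthesis temperature can be mapped to a label for comparison with the model's prediction. The following is a minimal sketch, assuming the temperatures are in degrees Celsius (as in the usage example strings) and that the bin edges are exactly those in the labels. `temperature_to_bin` is a hypothetical helper and may not match how the training pipeline assigned bins.

```python
from bisect import bisect_left

# Upper edges of the four right-closed bins, read off the class labels.
BIN_UPPER_EDGES = [900, 1100, 1300, 3000]
BIN_LABELS = [
    "TempBin_1_(1_to_900]",
    "TempBin_2_(900_to_1100]",
    "TempBin_3_(1100_to_1300]",
    "TempBin_4_(1300_to_3000]",
]
LOWER_BOUND = 1  # exclusive lower edge of the first bin


def temperature_to_bin(temp_c):
    """Return the bin label for a temperature, or None if it is outside (1, 3000]."""
    if not (LOWER_BOUND < temp_c <= BIN_UPPER_EDGES[-1]):
        return None
    # bisect_left places a value equal to an edge in the bin that ends at that edge,
    # which matches the right-closed "(low_to_high]" notation.
    return BIN_LABELS[bisect_left(BIN_UPPER_EDGES, temp_c)]


for t in (900, 920, 1300, 3500):
    print(t, temperature_to_bin(t))
```

Note that edge values such as 900 fall into the lower bin under this convention.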

**2. Tuned Atmosphere Category Prediction Model:**
* **Overall Test Set Accuracy:** 0.7193
* **Overall Test Set F1 Score (Weighted):** 0.7174
* **Per-Class Performance (Test Set):**
```
                      precision    recall  f1-score   support

               Inert       0.59      0.38      0.46       139
    Other_Atm_Target       1.00      0.44      0.62         9
           Oxidizing       0.67      0.71      0.69      1552
            Reducing       0.70      0.47      0.56       100
Unknown_Atm_Category       0.76      0.76      0.76      2098

            accuracy                           0.72      3898
           macro avg       0.74      0.55      0.62      3898
        weighted avg       0.72      0.72      0.72      3898
```
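
The gap between macro-average recall (0.55) and weighted-average recall (0.72) in the atmosphere report comes from class imbalance: the two largest classes dominate the weighted figure. The sketch below recomputes both averages from the rounded per-class recall and support values in the table. It is plain arithmetic for illustration, not the evaluation code used for this model.

```python
# (recall, support) per class, copied from the atmosphere report.
per_class = {
    "Inert": (0.38, 139),
    "Other_Atm_Target": (0.44, 9),
    "Oxidizing": (0.71, 1552),
    "Reducing": (0.47, 100),
    "Unknown_Atm_Category": (0.76, 2098),
}

total_support = sum(support for _, support in per_class.values())

# Macro average: every class counts equally, regardless of size.
macro_recall = sum(recall for recall, _ in per_class.values()) / len(per_class)

# Weighted average: each class counts in proportion to its support.
weighted_recall = sum(recall * support for recall, support in per_class.values()) / total_support

print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

For minority classes such as `Inert` and `Reducing`, the per-class rows are a better guide to expected performance than the overall accuracy.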