---
license: mit
language: en
tags:
- materials science
- synthesis prediction
- lightgbm
- cheminformatics
datasets: []
metrics:
- accuracy
- f1
---

# Synthesis Condition Predictor

This model predicts the temperature bin and atmosphere category for an inorganic material synthesis.
It was trained on a dataset of text-mined synthesis procedures (Kononova et al.): https://www.nature.com/articles/s41597-019-0224-1

**Models Included:**
* Temperature Bin Prediction (LightGBM)
* Atmosphere Category Prediction (LightGBM)

**Intended Use:**
The models are intended to assist researchers in designing synthesis experiments by predicting key process parameters.
Given a target material, its precursors, and a basic list of operations, they return a predicted temperature bin and atmosphere category.

**How to Use:**
```python
# Run from a cloned repo where 'src' is a subdirectory. If the repo lives
# elsewhere on your PYTHONPATH, import from
# synthesis_predictor_hf_repo.src.inference instead.
from src.inference import predict_synthesis_outcome, load_all_artifacts_once

# Load the model artifacts once before requesting any prediction.
if not load_all_artifacts_once():
    print("Failed to load model artifacts.")
else:
    raw_input_example = {
        'target_formula_raw': "YBa2Cu3O7",
        'precursor_formulas_raw': ["Y2O3", "BaCO3", "CuO"],
        'operations_simplified_list': [
            {'type': 'MixingOperation', 'string': 'Ball milling for 2h',
             'conditions': {'duration': [{'value': 2, 'unit': 'h'}]}},
            {'type': 'HeatingOperation', 'string': 'Calcined at 920C for 10h in air',
             'conditions': {'heating_temperature': [{'value': 920}], 'heating_time': [{'value': 10}], 'atmosphere': 'air'}},
            {'type': 'HeatingOperation', 'string': 'Sintered at 950C for 20h in O2',
             'conditions': {'heating_temperature': [{'value': 950}], 'heating_time': [{'value': 20}], 'atmosphere': 'Oxygen'}}
        ],
        'reactants_coeffs': [("Y2O3", 0.5), ("BaCO3", 2.0), ("CuO", 3.0)],  # Example, adjust as needed
        'products_coeffs': [("YBa2Cu3O7", 1.0)]  # Example
    }
    predictions = predict_synthesis_outcome(raw_input_example)
    print(predictions)
```

**Limitations:**
* Test-set accuracy is modest: about 68% for the temperature bin model and about 72% for the atmosphere category model.
* Predictions are based on patterns in the training data and may not generalize to all chemical systems.
* The feature engineering for process parameters in the inference script relies on the user providing an `operations_simplified_list` that can be parsed by the internal logic. The quality of these inputs directly affects prediction accuracy.
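
Because malformed operation entries can degrade predictions, it may help to check the list before calling the model. The following is a minimal sketch, not part of the repository: `check_operations` and `REQUIRED_OPERATION_KEYS` are hypothetical names, and the set of required keys is only an assumption inferred from the example input above.

```python
from __future__ import annotations

from typing import Any

# Assumption: every operation needs these keys, as in the example input above.
REQUIRED_OPERATION_KEYS = ("type", "string", "conditions")


def check_operations(operations: list[Any]) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems: list[str] = []
    for index, operation in enumerate(operations):
        if not isinstance(operation, dict):
            problems.append(
                f"operation {index}: expected a dict, got {type(operation).__name__}"
            )
            continue
        for key in REQUIRED_OPERATION_KEYS:
            if key not in operation:
                problems.append(f"operation {index}: missing key '{key}'")
        # 'conditions' should itself be a dict of condition name -> value(s).
        if "conditions" in operation and not isinstance(operation["conditions"], dict):
            problems.append(f"operation {index}: 'conditions' should be a dict")
    return problems


example_operations = [
    {"type": "MixingOperation", "string": "Ball milling for 2h",
     "conditions": {"duration": [{"value": 2, "unit": "h"}]}},
    {"type": "HeatingOperation"},  # deliberately incomplete
]
for problem in check_operations(example_operations):
    print(problem)
```

A check like this only confirms the outer shape of each entry; whether the internal parsing logic accepts a given `conditions` payload still depends on the inference script.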

**Training Data:**
The model was trained on a dataset derived from the text-mined inorganic synthesis procedures published by Kononova et al.:
https://www.nature.com/articles/s41597-019-0224-1

**Evaluation Results:**
The models were evaluated on a hold-out test set.

**1. Tuned Temperature Bin Prediction Model:**
* **Overall Test Set Accuracy:** 0.6821
* **Overall Test Set F1 Score (Weighted):** 0.6785
* **Per-Class Performance (Test Set):**
    ```
                                  precision    recall  f1-score   support

        TempBin_1_(1_to_900]       0.77      0.79      0.78       954
     TempBin_2_(900_to_1100]       0.62      0.53      0.57       743
    TempBin_3_(1100_to_1300]       0.58      0.58      0.58       768
    TempBin_4_(1300_to_3000]       0.72      0.80      0.76       715

                    accuracy                           0.68      3180
                   macro avg       0.67      0.68      0.67      3180
                weighted avg       0.68      0.68      0.68      3180
    ```
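
To relate a known synthesis temperature to the class labels in the table above, the bin edges can be read off the label names. The sketch below is an illustration only: `temperature_bin` is a hypothetical helper, the half-open `(low, high]` convention is inferred from the labels, and degrees Celsius is an assumption based on the example strings such as "Calcined at 920C".

```python
from __future__ import annotations

from bisect import bisect_left

# Upper edges and labels copied from the per-class table; units assumed to be deg C.
BIN_UPPER_EDGES = [900, 1100, 1300, 3000]
BIN_LABELS = [
    "TempBin_1_(1_to_900]",
    "TempBin_2_(900_to_1100]",
    "TempBin_3_(1100_to_1300]",
    "TempBin_4_(1300_to_3000]",
]
LOWEST_EDGE = 1  # open lower bound of the first bin


def temperature_bin(temperature: float) -> str | None:
    """Map a temperature to its bin label, or None if it lies outside (1, 3000]."""
    if not LOWEST_EDGE < temperature <= BIN_UPPER_EDGES[-1]:
        return None
    # bisect_left sends a value equal to an edge into the bin that edge closes.
    return BIN_LABELS[bisect_left(BIN_UPPER_EDGES, temperature)]


print(temperature_bin(920))   # TempBin_2_(900_to_1100]
print(temperature_bin(900))   # TempBin_1_(1_to_900]
print(temperature_bin(3500))  # None
```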

**2. Tuned Atmosphere Category Prediction Model:**
* **Overall Test Set Accuracy:** 0.7193
* **Overall Test Set F1 Score (Weighted):** 0.7174
* **Per-Class Performance (Test Set):**
    ```
                              precision    recall  f1-score   support

                   Inert       0.59      0.38      0.46       139
        Other_Atm_Target       1.00      0.44      0.62         9
               Oxidizing       0.67      0.71      0.69      1552
                Reducing       0.70      0.47      0.56       100
    Unknown_Atm_Category       0.76      0.76      0.76      2098

                accuracy                           0.72      3898
               macro avg       0.74      0.55      0.62      3898
            weighted avg       0.72      0.72      0.72      3898
    ```
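
The gap between macro-average recall (0.55) and weighted-average recall (0.72) in the atmosphere table comes from class imbalance: the two large classes dominate the weighted figure, while the small classes pull the macro figure down. As a rough check, the sketch below recomputes both averages from the rounded per-class recall and support values in the table; the variable names are illustrative only.

```python
# (recall, support) per class, copied from the atmosphere table above.
per_class = {
    "Inert": (0.38, 139),
    "Other_Atm_Target": (0.44, 9),
    "Oxidizing": (0.71, 1552),
    "Reducing": (0.47, 100),
    "Unknown_Atm_Category": (0.76, 2098),
}

total_support = sum(support for _, support in per_class.values())

# Macro average: every class counts equally, regardless of size.
macro_recall = sum(recall for recall, _ in per_class.values()) / len(per_class)

# Weighted average: each class counts in proportion to its support.
weighted_recall = (
    sum(recall * support for recall, support in per_class.values()) / total_support
)

print(f"macro recall:    {macro_recall:.2f}")     # 0.55
print(f"weighted recall: {weighted_recall:.2f}")  # 0.72
```

For users who care mainly about the minority classes (Inert, Reducing), the macro figures are the more relevant guide.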