Raymond-dev-546730 commited on
Commit
5f4be9a
·
verified ·
1 Parent(s): e62371f

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +215 -51
README.md CHANGED
@@ -6,38 +6,48 @@ license: apache-2.0
6
 
7
  ![MaterialsAnalyst-AI Logo](Logo.png)
8
 
9
- A specialized AI model for materials science analysis. Built on Qwen 2.5 Instruct 7B and fine-tuned with LoRA, this model analyzes materials properties and provides structured insights from materials databases.
10
 
11
- ## Key Capabilities
12
 
13
- - **Multi-Property Analysis**: Electronic, mechanical, thermal, structural, and magnetic characteristics
14
- - **Property Correlation**: Identifies relationships between material properties
15
- - **Application Prediction**: Suggests practical applications based on characteristics
16
- - **Stability Assessment**: Evaluates thermodynamic and structural stability
17
- - **Structured Output**: Provides both reasoning process and concise analysis
18
 
19
- ## Quick Start
20
 
21
- **Install dependencies:**
22
- ```bash
23
- pip install torch transformers accelerate safetensors
24
- # For LLaMA.cpp: pip install llama-cpp-python
25
- ```
 
26
 
27
- **Run analysis:**
28
- ```bash
29
- # SafeTensors (recommended)
30
- python Scripts/Inference_safetensors.py
31
 
32
- # LLaMA.cpp (CPU optimized)
33
- python Scripts/Inference_llama.cpp.py
34
- ```
35
 
36
- Edit the `JSON_INPUT` variable in either script with your materials data.
37
 
38
- ## Input Format
 
 
 
 
39
 
40
- Provide materials data as JSON:
 
 
 
 
 
 
 
 
 
 
 
 
41
  ```json
42
  {
43
  "material_id": "mp-8062",
@@ -47,48 +57,202 @@ Provide materials data as JSON:
47
  "band_gap": 3.26,
48
  "formation_energy_per_atom": -0.73,
49
  "density": 3.21,
 
 
 
50
  "elastic_modulus": 448,
51
  "bulk_modulus": 220,
52
- "thermal_conductivity": 490,
 
 
53
  "crystal_system": "Hexagonal",
54
- "magnetic_property": "Non-magnetic"
 
 
 
 
55
  }
56
  ```
57
 
58
- ## Output Structure
59
 
60
- The model provides dual output:
61
 
62
- **Reasoning Process (`<think>` section):**
63
- Step-by-step analysis of material properties and relationships
 
 
 
 
 
 
64
 
65
- **Structured Analysis (`<answer>` section):**
66
- - Composition & Structure
67
- - Electronic Characteristics
68
- - Stability & Performance
69
- - Recommended Applications
70
- - Key Advantages
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
71
 
72
  ## Repository Contents
73
 
74
- - `Scripts/` - Inference scripts for SafeTensors and LLaMA.cpp
75
- - `Model_Weights/` - Model files in various formats
76
- - `Data/` - Training dataset
77
- - `Training/` - Training logs
 
 
 
 
 
 
 
 
 
 
 
 
 
78
 
79
- ## Technical Details
 
 
 
80
 
81
- **Base Model:** Qwen 2.5 Instruct 7B
82
- **Fine-tuning:** LoRA (Low-Rank Adaptation)
83
- **Training:** 6,000 samples, 6.4M tokens, NVIDIA A100 (~5.4 hours)
84
- **Data Sources:** Materials Project, AFLOW, custom JSON formats
 
 
85
 
86
- ## Use Cases
 
 
 
 
87
 
88
- - Materials screening and selection
89
- - Property correlation analysis
90
- - Research and development workflows
91
- - Educational materials characterization
92
- - Database curation and analysis
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
93
 
94
- **Developed by:** Mike in collaboration with Oregon State University Materials Modeling and Development Group
 
6
 
7
  ![MaterialsAnalyst-AI Logo](Logo.png)
8
 
9
+ A specialized **open-source** AI model designed to assist materials scientists and researchers in comprehensive analysis and interpretation of materials data. Built on Qwen 2.5 Instruct 7B and fine-tuned with LoRA (Low-Rank Adaptation), MaterialsAnalyst-AI-7B delivers expert-level materials property analysis and actionable insights from complex materials databases.
10
 
11
+ ## Overview
12
 
13
+ MaterialsAnalyst-AI-7B transforms raw materials data into comprehensive, structured analyses through advanced chain-of-thought reasoning. The model excels at interpreting relationships between material properties, predicting applications, and providing clear insights that accelerate materials research and development.
 
 
 
 
14
 
15
+ ### Key Capabilities
16
 
17
+ - **Multi-Property Analysis**: Interprets electronic, mechanical, thermal, structural, and magnetic characteristics
18
+ - **Property Correlation**: Identifies relationships between different material properties and their implications
19
+ - **Application Prediction**: Suggests practical applications based on material characteristics
20
+ - **Stability Assessment**: Evaluates thermodynamic and structural stability indicators
21
+ - **Performance Benchmarking**: Compares materials against industry standards
22
+ - **Structured Reasoning**: Provides both detailed analysis and concise conclusions
23
 
24
+ ## How It Works
 
 
 
25
 
26
+ 1. **Input**: Provide materials data in JSON format with properties, structure, and characteristics
27
+ 2. **Analysis**: The model performs chain-of-thought reasoning about material properties and relationships
28
+ 3. **Output**: Receive structured analysis with practical insights and application recommendations
29
 
30
+ ## Use Cases
31
 
32
+ **Research & Development**
33
+ - Materials screening for specific applications
34
+ - Property correlation analysis
35
+ - Comparative materials assessment
36
+ - Database curation and documentation
37
 
38
+ **Education & Training**
39
+ - Graduate student research support
40
+ - Materials characterization learning
41
+ - Computational results interpretation
42
+
43
+ **Industry Applications**
44
+ - Material selection for product development
45
+ - R&D pipeline automation
46
+ - Technical documentation generation
47
+
48
+ ## Example Analysis
49
+
50
+ ### Input Data
51
  ```json
52
  {
53
  "material_id": "mp-8062",
 
57
  "band_gap": 3.26,
58
  "formation_energy_per_atom": -0.73,
59
  "density": 3.21,
60
+ "volume": 41.2,
61
+ "nsites": 8,
62
+ "is_stable": true,
63
  "elastic_modulus": 448,
64
  "bulk_modulus": 220,
65
+ "thermal_expansion": 4.2e-06,
66
+ "electron_affinity": 4.0,
67
+ "ionization_energy": 6.7,
68
  "crystal_system": "Hexagonal",
69
+ "magnetic_property": "Non-magnetic",
70
+ "thermal_conductivity": 490,
71
+ "specific_heat": 0.69,
72
+ "is_superconductor": false,
73
+ "band_gap_type": "Indirect"
74
  }
75
  ```
76
 
77
+ ### Model Output
78
 
79
+ The model provides dual-structured output:
80
 
81
+ **Reasoning Process (`<think>` section)**
82
+ ```
83
+ Analyzing SiC composition and hexagonal crystal structure (P63mc)...
84
+ Electronic properties: 3.26 eV indirect bandgap indicates wide-bandgap semiconductor behavior...
85
+ Thermodynamic stability: -0.73 eV/atom formation energy shows strong bonding...
86
+ Mechanical properties: High elastic modulus (448 GPa) suggests exceptional stiffness...
87
+ Thermal behavior: 490 W/m·K conductivity ideal for heat dissipation applications...
88
+ ```
89
 
90
+ **Structured Analysis (`<answer>` section)**
91
+ ```
92
+ **SiC Materials Analysis (ID: mp-8062)**
93
+
94
+ **Composition & Structure**
95
+ - Silicon carbide with hexagonal crystal structure (P63mc space group)
96
+ - High symmetry configuration with anisotropic properties
97
+
98
+ **Electronic Characteristics**
99
+ - Wide bandgap semiconductor (3.26 eV, indirect)
100
+ - Suitable for high-power and optoelectronic applications
101
+
102
+ **Stability & Performance**
103
+ - Thermodynamically stable (-0.73 eV/atom formation energy)
104
+ - Exceptional mechanical stiffness (448 GPa elastic modulus)
105
+ - Outstanding thermal management (490 W/m·K conductivity)
106
+
107
+ **Recommended Applications**
108
+ - High-power electronics and devices
109
+ - Thermal management systems
110
+ - Optoelectronic components
111
+ - Abrasion-resistant coatings
112
+
113
+ **Key Advantages**
114
+ Superior combination of thermal, mechanical, and electronic properties makes SiC ideal for demanding high-temperature and high-power applications.
115
+ ```
116
 
117
  ## Repository Contents
118
 
119
+ ```
120
+ MaterialsAnalyst-AI-7B/
121
+ ├── Model_Weights/
122
+ │ ├── llama.cpp/ # LLaMA.cpp compatible weights (.gguf format)
123
+ │ ├── safetensors/ # SafeTensors format models
124
+ │ └── LoRA_adapter/ # LoRA adapter weights
125
+ ├── Scripts/
126
+ │ ├── Inference_llama.cpp.py # LLaMA.cpp deployment script
127
+ │ └── Inference_safetensors.py # SafeTensors deployment script
128
+ ├── Data/
129
+ │ └── Train-Ready.jsonl # Complete training dataset
130
+ ├── Training/
131
+ │ └── Training_Logs.txt # Training process logs
132
+ └── README.md
133
+ ```
134
+
135
+ ## Technical Specifications
136
 
137
+ **Base Architecture**
138
+ - **Foundation Model**: Qwen 2.5 Instruct 7B
139
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
140
+ - **Parameters**: 7 billion parameters
141
 
142
+ **Training Details**
143
+ - **Infrastructure**: Single NVIDIA A100 SXM4 GPU
144
+ - **Training Duration**: ~5.4 hours
145
+ - **Dataset Size**: 6,000 samples (6.4M tokens)
146
+ - **Average Sample Length**: 1,074 tokens
147
+ - **Data Generation**: DeepSeekV3 API
148
 
149
+ **Supported Formats**
150
+ - Materials Project database format
151
+ - AFLOW database format
152
+ - Custom JSON materials data
153
+ - Hugging Face Transformers integration
154
 
155
+ ## Installation & Requirements
156
+
157
+ ### Basic Requirements
158
+ ```bash
159
+ pip install torch transformers accelerate
160
+ pip install safetensors
161
+ pip install numpy pandas
162
+ ```
163
+
164
+ ### For CUDA GPU Support
165
+ If you have NVIDIA GPUs with CUDA support:
166
+ ```bash
167
+ # Install PyTorch with CUDA support (replace cu118 with your CUDA version)
168
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
169
+
170
+ # For faster inference with GPU acceleration
171
+ pip install bitsandbytes
172
+ ```
173
+
174
+ ### For LLaMA.cpp Deployment
175
+ ```bash
176
+ # Install llama-cpp-python for optimized CPU/GPU inference
177
+ pip install llama-cpp-python
178
+
179
+ # For GPU acceleration with llama.cpp
180
+ CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
181
+ ```
182
+
183
+ ### Optional Dependencies
184
+ ```bash
185
+ # For advanced materials data processing
186
+ pip install pymatgen
187
+ pip install matminer
188
+ pip install ase # Atomic Simulation Environment
189
+ ```
190
+
191
+ ## Quick Start
192
+
193
+ ### Option 1: SafeTensors (Recommended)
194
+ ```bash
195
+ python Scripts/Inference_safetensors.py
196
+ ```
197
+
198
+ ### Option 2: LLaMA.cpp (CPU Optimized)
199
+ ```bash
200
+ python Scripts/Inference_llama.cpp.py
201
+ ```
202
+
203
+ Both scripts include example SiC data - simply edit the `JSON_INPUT` variable with your materials data.
204
+
205
+ ## Getting Started
206
+
207
+ 1. **Install dependencies**
208
+ ```bash
209
+ pip install torch transformers accelerate safetensors
210
+ # For LLaMA.cpp option:
211
+ pip install llama-cpp-python
212
+ ```
213
+
214
+ 2. **Clone or download this repository**
215
+ ```bash
216
+ git clone https://huggingface.co/your-username/MaterialsAnalyst-AI-7B
217
+ cd MaterialsAnalyst-AI-7B
218
+ ```
219
+
220
+ 3. **Run the provided scripts**
221
+
222
+ **For SafeTensors deployment:**
223
+ ```bash
224
+ python Scripts/Inference_safetensors.py
225
+ ```
226
+
227
+ **For LLaMA.cpp deployment:**
228
+ ```bash
229
+ python Scripts/Inference_llama.cpp.py
230
+ ```
231
+
232
+ 4. **Customize your analysis**
233
+ - Edit the `JSON_INPUT` variable in either script with your materials data
234
+ - Modify the `model_path` variable to point to your model files
235
+ - Adjust generation parameters as needed
236
+
237
+ 5. **Input your materials data**
238
+ - Replace the example SiC data with your material properties
239
+ - Common sources: Materials Project, AFLOW, DFT calculations, experimental databases
240
+
241
+ ## License
242
+
243
+ This project is licensed under the Apache 2.0 License.
244
+
245
+ ## Citation
246
+
247
+ If you use MaterialsAnalyst-AI-7B in your research, please cite:
248
+
249
+ ```bibtex
250
+ @software{materialsanalyst_ai_7b,
251
+ title={MaterialsAnalyst-AI-7B: Specialized AI for Materials Analysis},
252
+ author={Mike and Oregon State University Materials Modeling and Development Group},
253
+ year={2024},
254
+ license={Apache-2.0}
255
+ }
256
+ ```
257
 
258
+ **Developed by**: Mike in collaboration with Oregon State University Materials Modeling and Development Group