Raymond-dev-546730 committed on
Commit
066a27b
·
verified ·
1 Parent(s): 472634e

Update README.md

  1. README.md +296 -122
README.md CHANGED
@@ -1,154 +1,328 @@
1
- ---
2
- license: apache-2.0
3
- ---
4
 
5
- ![Logo](Logo.png)
6
 
7
- # Introducing MaterialsAnalyst-AI-7B:
8
 
9
- A specialized **open-source** AI model designed to assist materials scientists and researchers in **comprehensive analysis** and interpretation of materials data. Built on Qwen 2.5 Instruct 7B and fine-tuned with LoRA (Low-Rank Adaptation), MaterialsAnalyst-AI-7B is optimized to **analyze materials properties** and provide clear, actionable insights from complex materials databases.
10
 
11
- ## How It Works
12
 
13
- The process is *beautifully* simple:
14
 
15
- 1. You input materials data (JSON format with properties, structure, and characteristics)
16
- 2. The model engages in chain-of-thought reasoning about the material's properties
17
- 3. You receive a structured, comprehensive analysis with practical applications
18
 
19
- ## Features
20
 
21
- MaterialsAnalyst-AI-7B offers a comprehensive suite of capabilities tailored specifically for materials analysis:
22
 
23
- * **Dual-Output Structure**: Provides both detailed chain-of-thought reasoning tokens and concise answer tokens
24
- * **Multi-Property Analysis**: Trained on diverse materials properties including electronic, mechanical, thermal, structural, and magnetic characteristics
25
- * **Comprehensive Materials Characterization**: Excels at interpreting structural, compositional, and phase relationships across diverse material types
26
- * **Property Correlation**: Identifies relationships between different material properties and their implications
27
- * **Application Prediction**: Suggests practical applications based on material characteristics
28
- * **Stability Assessment**: Evaluates thermodynamic and structural stability indicators
29
- * **Performance Benchmarking**: Compares materials against industry standards and competing materials
30
- * **Materials Database Integration**: Optimized for standard materials database formats (Materials Project, AFLOW, etc.)
31
- * **Structured Output Format**: Consistently delivers well-organized, hierarchical materials analysis with clear section delineation
32
 
33
  ## Use Cases
34
 
35
- MaterialsAnalyst-AI-7B serves as a valuable tool for:
36
 
37
- * **Materials scientists and engineers** needing comprehensive property analysis
38
- * **Graduate students and researchers** learning materials characterization
39
- * **R&D teams** screening materials for specific applications
40
- * **Academic researchers** analyzing large materials datasets
41
- * **Industry professionals** evaluating material selection for products
42
- * **Database curators** ensuring comprehensive materials documentation
43
- * **Computational materials scientists** interpreting simulation results
44
- * **Materials informatics researchers** building automated analysis pipelines
45
 
46
- ## See It In Action:
47
 
48
- Input Example:
49
 
 
50
  ```json
51
  {
52
- "material_id": "mp-8062",
53
- "formula": "SiC",
54
- "elements": [
55
- "Si",
56
- "C"
57
- ],
58
- "spacegroup": "P63mc",
59
- "band_gap": 3.26,
60
- "formation_energy_per_atom": -0.73,
61
- "density": 3.21,
62
- "volume": 41.2,
63
- "nsites": 8,
64
- "is_stable": true,
65
- "elastic_modulus": 448,
66
- "bulk_modulus": 220,
67
- "thermal_expansion": 4.2e-06,
68
- "electron_affinity": 4.0,
69
- "ionization_energy": 6.7,
70
- "crystal_system": "Hexagonal",
71
- "magnetic_property": "Non-magnetic",
72
- "thermal_conductivity": 490,
73
- "specific_heat": 0.69,
74
- "is_superconductor": false,
75
- "band_gap_type": "Indirect"
76
  }
77
  ```
78
- Output Example:
79
-
80
- The model produces two key components:
81
-
82
- ### The Thinking Process
83
- ```markdown
84
- <think>
85
- I need to think through how to interpret this SiC entry and assess its characteristics.
86
- First, I should examine the composition and structure. This material contains silicon (Si) and carbon (C) in a hexagonal crystal structure with space group P63mc. This structure suggests high symmetry and anisotropic properties, typical for wide-bandgap materials like SiC.
87
- Next, I should analyze the electronic properties. With a band gap of 3.26 eV that is indirect, this material would be classified as a wide-bandgap semiconductor. This suggests applications in optoelectronics, high-power electronics, and transparent conductive oxides.
88
- For thermodynamic stability, the formation energy per atom of -0.73 eV indicates strong bonding and thermodynamic favorability. The material is stable according to the data, which means it can be synthesized and used under practical conditions without decomposition.
89
- The mechanical properties show a high elastic modulus (448 GPa) and bulk modulus (220 GPa). This suggests the material would be very stiff and resistant to deformation when subjected to stress or pressure.
90
- Regarding thermal behavior, the thermal conductivity of 490 W/m·K and expansion coefficient of 4.2e-6 K^-1 indicate excellent heat dissipation and dimensional stability under temperature changes. This would make it suitable for high-temperature applications like thermal management in electronics or coatings.
91
- The magnetic properties are listed as non-magnetic, which means it won't interact significantly with magnetic fields. This would affect applications in magneto-electronics but is generally beneficial for general semiconductor uses.
92
- Additionally, the material is marked as not superconducting, which is significant because it rules out cryogenic or quantum computing applications where superconductivity is required.
93
- Considering all these properties together, this material would be particularly valuable for high-power electronics, optoelectronics, and thermal management due to its combination of wide bandgap, high thermal conductivity, and mechanical robustness. It might compete with other materials like GaN or AlN but offers advantages in cost (Si-based) and thermal stability.
94
- </think>
95
  ```
96
 
97
- ### The Structured Materials Analysis
98
- ```markdown
99
- <answer>
100
- **Summary of SiC (ID: mp-8062):**
101
- 1. **Material Type & Composition**
102
- - Silicon carbide (SiC), a widely studied wide-bandgap semiconductor compound.
103
- 2. **Crystal Structure**
104
- - Hexagonal structure with space group P63mc, indicating anisotropic properties and high symmetry.
105
- 3. **Electronic Properties**
106
- - Indirect bandgap of 3.26 eV, suitable for optoelectronics and high-power devices.
107
- 4. **Thermodynamic Stability**
108
- - Highly stable (formation energy: -0.73 eV/atom), making it practical for synthesis and use.
109
- 5. **Mechanical Properties**
110
- - Exceptional stiffness (elastic modulus: 448 GPa) and resistance to compression (bulk modulus: 220 GPa).
111
- 6. **Thermal Properties**
112
- - Outstanding thermal conductivity (490 W/m·K) and low expansion coefficient (4.2e-6 K^-1), ideal for heat dissipation.
113
- 7. **Magnetic & Conductive Properties**
114
- - Non-magnetic and non-superconducting, limiting use in magnetic applications but beneficial for general electronics.
115
- 8. **Applications**
116
- - High-power electronics, optoelectronics, thermal management systems, and abrasion-resistant coatings.
117
- **Verdict**: SiC is a high-performance material with exceptional thermal, mechanical, and electronic properties, making it ideal for demanding applications like power devices and high-temperature environments. Its stability and robustness give it an edge over competing wide-bandgap materials.
118
- </answer>
119
  ```
 
120
 
121
- ## What's Included
 
 
122
 
123
- This repository contains everything you need to use and understand MaterialsAnalyst-AI-7B:
 
 
124
 
125
- * **Model_Weights/** - All model weights in various formats
126
- * **llama.cpp/** - LLaMA.cpp compatible weights with various quantization options available
127
- * **safetensors/** - SafeTensors format models
128
- * **LoRA_adapter/** - LoRA adapter weights
129
- * **Scripts/** - Ready-to-use inference scripts
130
- * **Inference_llama.cpp.py** - For LLaMA.cpp deployment
131
- * **Inference_safetensors.py** - For SafeTensors deployment
132
- * **Data/** - Training data
133
- * **Train-Ready.jsonl** - Complete JSONL training dataset
134
- * **Training/** - Training terminal logs
135
- * **Training_Logs.txt** - Complete terminal logs from the training process
136
 
137
- ## Model Training Details
138
 
139
- * **Base Model**: Qwen 2.5 Instruct 7B
140
- * **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
141
- * **Training Infrastructure**: Single NVIDIA A100 SXM4 GPU
142
- * **Training Duration**: Around 5.4 hours
143
- * **Training Dataset**: Custom curated dataset specifically for materials analysis
144
- * **Total Token Count**: 6,441,671
145
- * **Total Sample Count**: 6,000
146
- * **Average Tokens Per Sample**: 1073.61
147
- * **Dataset Creation**: Generated using DeepSeekV3 API
148
 
149
- ## Attribution
150
 
151
- MaterialsAnalyst-AI-7B was developed by Raymond Lee. If you use this model in your work, please include a reference to this repository. As of **June 3, 2025**, this model has been downloaded **0** times. Thank you for your interest and support!
 
 
152
 
153
- *Download statistics are manually updated as HuggingFace doesn't display this metric publicly. Visit this repository periodically for the latest metrics.*
154
 
 
 
1
+ # MaterialsAnalyst-AI-7B
 
 
2
 
3
+ ![MaterialsAnalyst-AI Logo](Logo.png)
4
 
5
+ A specialized **open-source** AI model designed to assist materials scientists and researchers in comprehensive analysis and interpretation of materials data. Built on Qwen 2.5 Instruct 7B and fine-tuned with LoRA (Low-Rank Adaptation), MaterialsAnalyst-AI-7B delivers expert-level materials property analysis and actionable insights from complex materials databases.
6
 
7
+ 🤗 **Available on Hugging Face**: [MaterialsAnalyst-AI-7B](https://huggingface.co/your-username/MaterialsAnalyst-AI-7B)
8
 
9
+ ## Overview
10
 
11
+ MaterialsAnalyst-AI-7B transforms raw materials data into comprehensive, structured analyses through advanced chain-of-thought reasoning. The model excels at interpreting relationships between material properties, predicting applications, and providing clear insights that accelerate materials research and development.
12
 
13
+ ### Key Capabilities
 
 
14
 
15
+ - **Multi-Property Analysis**: Interprets electronic, mechanical, thermal, structural, and magnetic characteristics
16
+ - **Property Correlation**: Identifies relationships between different material properties and their implications
17
+ - **Application Prediction**: Suggests practical applications based on material characteristics
18
+ - **Stability Assessment**: Evaluates thermodynamic and structural stability indicators
19
+ - **Performance Benchmarking**: Compares materials against industry standards
20
+ - **Structured Reasoning**: Provides both detailed analysis and concise conclusions
21
 
22
+ ## How It Works
23
 
24
+ 1. **Input**: Provide materials data in JSON format with properties, structure, and characteristics
25
+ 2. **Analysis**: The model performs chain-of-thought reasoning about material properties and relationships
26
+ 3. **Output**: Receive structured analysis with practical insights and application recommendations
27
 
28
  ## Use Cases
29
 
30
+ **Research & Development**
31
+ - Materials screening for specific applications
32
+ - Property correlation analysis
33
+ - Comparative materials assessment
34
+ - Database curation and documentation
35
 
36
+ **Education & Training**
37
+ - Graduate student research support
38
+ - Materials characterization learning
39
+ - Computational results interpretation
40
 
41
+ **Industry Applications**
42
+ - Material selection for product development
43
+ - R&D pipeline automation
44
+ - Technical documentation generation
45
 
46
+ ## Example Analysis
47
 
48
+ ### Input Data
49
  ```json
50
  {
51
+ "material_id": "mp-8062",
52
+ "formula": "SiC",
53
+ "elements": ["Si", "C"],
54
+ "spacegroup": "P63mc",
55
+ "band_gap": 3.26,
56
+ "formation_energy_per_atom": -0.73,
57
+ "density": 3.21,
58
+ "volume": 41.2,
59
+ "nsites": 8,
60
+ "is_stable": true,
61
+ "elastic_modulus": 448,
62
+ "bulk_modulus": 220,
63
+ "thermal_expansion": 4.2e-06,
64
+ "electron_affinity": 4.0,
65
+ "ionization_energy": 6.7,
66
+ "crystal_system": "Hexagonal",
67
+ "magnetic_property": "Non-magnetic",
68
+ "thermal_conductivity": 490,
69
+ "specific_heat": 0.69,
70
+ "is_superconductor": false,
71
+ "band_gap_type": "Indirect"
72
  }
73
  ```
74
+
75
+ ### Model Output
76
+
77
+ The model provides dual-structured output:
78
+
79
+ **Reasoning Process (`<think>` section)**
80
+ ```
81
+ Analyzing SiC composition and hexagonal crystal structure (P63mc)...
82
+ Electronic properties: 3.26 eV indirect bandgap indicates wide-bandgap semiconductor behavior...
83
+ Thermodynamic stability: -0.73 eV/atom formation energy shows strong bonding...
84
+ Mechanical properties: High elastic modulus (448 GPa) suggests exceptional stiffness...
85
+ Thermal behavior: 490 W/m·K conductivity ideal for heat dissipation applications...
86
  ```
87
 
88
+ **Structured Analysis (`<answer>` section)**
89
  ```
90
+ **SiC Materials Analysis (ID: mp-8062)**
91
 
92
+ **Composition & Structure**
93
+ - Silicon carbide with hexagonal crystal structure (P63mc space group)
94
+ - High symmetry configuration with anisotropic properties
95
 
96
+ **Electronic Characteristics**
97
+ - Wide bandgap semiconductor (3.26 eV, indirect)
98
+ - Suitable for high-power and optoelectronic applications
99
 
100
+ **Stability & Performance**
101
+ - Thermodynamically stable (-0.73 eV/atom formation energy)
102
+ - Exceptional mechanical stiffness (448 GPa elastic modulus)
103
+ - Outstanding thermal management (490 W/m·K conductivity)
104
+
105
+ **Recommended Applications**
106
+ - High-power electronics and devices
107
+ - Thermal management systems
108
+ - Optoelectronic components
109
+ - Abrasion-resistant coatings
110
+
111
+ **Key Advantages**
112
+ Superior combination of thermal, mechanical, and electronic properties makes SiC ideal for demanding high-temperature and high-power applications.
113
+ ```
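Reviewer sketch (not part of the commit): the two-part output described above can be separated after generation. The snippet below is a minimal sketch that assumes the model wraps its reasoning in `<think>...</think>` and its final analysis in `<answer>...</answer>`, as the earlier README example shows. The helper name `split_model_output` and the sample string are hypothetical.

```python
import re


def split_model_output(text: str) -> dict:
    """Return the contents of the <think> and <answer> sections, or None if a tag is missing."""
    sections = {}
    for tag in ("think", "answer"):
        # DOTALL lets "." match newlines, since both sections span several lines.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections


# Hypothetical, shortened model output used only to exercise the helper.
sample = (
    "<think>\nWide bandgap, high thermal conductivity...\n</think>\n"
    "<answer>\n**Summary of SiC (ID: mp-8062)**\n</answer>"
)
parts = split_model_output(sample)
print(parts["answer"])
```

Returning `None` for a missing section, rather than raising, lets a caller detect truncated generations (for example when `max_new_tokens` is reached before `</answer>`).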
114
+
115
+ ## Repository Contents
116
+
117
+ ```
118
+ MaterialsAnalyst-AI-7B/
119
+ ├── Model_Weights/
120
+ │ ├── llama.cpp/ # LLaMA.cpp compatible weights (.gguf format)
121
+ │ ├── safetensors/ # SafeTensors format models
122
+ │ └── LoRA_adapter/ # LoRA adapter weights
123
+ ├── Scripts/
124
+ │ ├── Inference_llama.cpp.py # LLaMA.cpp deployment script
125
+ │ └── Inference_safetensors.py # SafeTensors deployment script
126
+ ├── Data/
127
+ │ └── Train-Ready.jsonl # Complete training dataset
128
+ ├── Training/
129
+ │ └── Training_Logs.txt # Training process logs
130
+ └── README.md
131
+ ```
132
 
133
+ ## Technical Specifications
134
+
135
+ **Base Architecture**
136
+ - **Foundation Model**: Qwen 2.5 Instruct 7B
137
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
138
+ - **Parameters**: 7 billion
139
+
140
+ **Training Details**
141
+ - **Infrastructure**: Single NVIDIA A100 SXM4 GPU
142
+ - **Training Duration**: ~5.4 hours
143
+ - **Dataset Size**: 6,000 samples (6.4M tokens)
144
+ - **Average Sample Length**: 1,074 tokens
145
+ - **Data Generation**: DeepSeekV3 API
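Reviewer sketch (not part of the commit): the rounded dataset figures above can be checked against the exact counts in the previous README version (6,441,671 tokens across 6,000 samples). The variable names are illustrative only.

```python
# Exact counts taken from the earlier "Model Training Details" section.
total_tokens = 6_441_671
total_samples = 6_000

# Average sample length in tokens.
avg_tokens = total_tokens / total_samples

print(f"{avg_tokens:.2f}")          # unrounded average, two decimals
print(round(avg_tokens))            # value reported as "Average Sample Length"
print(f"{total_tokens / 1e6:.1f}M")  # value reported as dataset size in tokens
```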
146
+
147
+ **Supported Formats**
148
+ - Materials Project database format
149
+ - AFLOW database format
150
+ - Custom JSON materials data
151
+ - Hugging Face Transformers integration
152
+
153
+ ## Installation & Requirements
154
+
155
+ ### Basic Requirements
156
+ ```bash
157
+ pip install torch transformers accelerate
158
+ pip install safetensors
159
+ pip install numpy pandas
160
+ ```
161
+
162
+ ### For CUDA GPU Support
163
+ If you have NVIDIA GPUs with CUDA support:
164
+ ```bash
165
+ # Install PyTorch with CUDA support (replace cu118 with your CUDA version)
166
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
167
+
168
+ # Optional: 8-bit/4-bit quantized loading to reduce GPU memory use
169
+ pip install bitsandbytes
170
+ ```
171
+
172
+ ### For LLaMA.cpp Deployment
173
+ ```bash
174
+ # Install llama-cpp-python for optimized CPU/GPU inference
175
+ pip install llama-cpp-python
176
+
177
+ # For GPU acceleration with llama.cpp (newer llama-cpp-python releases expect -DGGML_CUDA=on in place of -DLLAMA_CUBLAS=on)
178
+ CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
179
+ ```
180
+
181
+ ### Optional Dependencies
182
+ ```bash
183
+ # For advanced materials data processing
184
+ pip install pymatgen
185
+ pip install matminer
186
+ pip install ase # Atomic Simulation Environment
187
+ ```
188
 
189
+ ## Quick Start
190
 
191
+ ### Option 1: Using Hugging Face Transformers (Recommended)
192
 
193
+ ```python
194
+ from transformers import AutoModelForCausalLM, AutoTokenizer
195
+ import torch
196
 
197
+ # Load model and tokenizer
198
+ model_name = "your-username/MaterialsAnalyst-AI-7B"
199
+ model = AutoModelForCausalLM.from_pretrained(
200
+ model_name,
201
+ torch_dtype=torch.float16,
202
+ device_map="auto",
203
+ trust_remote_code=True
204
+ )
205
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
206
+
207
+ # Prepare your materials data
208
+ materials_data = """
209
+ {
210
+ "material_id": "mp-8062",
211
+ "formula": "SiC",
212
+ "elements": ["Si", "C"],
213
+ "spacegroup": "P63mc",
214
+ "band_gap": 3.26,
215
+ "formation_energy_per_atom": -0.73,
216
+ "density": 3.21,
217
+ "elastic_modulus": 448,
218
+ "bulk_modulus": 220,
219
+ "thermal_conductivity": 490,
220
+ "crystal_system": "Hexagonal",
221
+ "magnetic_property": "Non-magnetic"
222
+ }
223
+ """
224
+
225
+ # Generate analysis
226
+ prompt = f"USER: {materials_data}\nASSISTANT:"
227
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
228
+ outputs = model.generate(
229
+ **inputs,
230
+ max_new_tokens=3000,
231
+ temperature=0.7,
232
+ top_p=0.9,
233
+ repetition_penalty=1.1,
234
+ do_sample=True
235
+ )
236
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
237
+ print(response.split("ASSISTANT:")[-1].strip())
238
+ ```
239
+
240
+ ### Option 2: Using LLaMA.cpp (For CPU/Optimized Inference)
241
+
242
+ ```python
243
+ from llama_cpp import Llama
244
+
245
+ # Load model (download .gguf file from the repo)
246
+ model_path = "path/to/MaterialsAnalyst-AI-7B.gguf"
247
+ llm = Llama(
248
+ model_path=model_path,
249
+ n_gpu_layers=29, # Adjust based on your GPU memory
250
+ n_ctx=10000,
251
+ n_threads=4
252
+ )
253
+
254
+ # Prepare your materials data
255
+ materials_data = """
256
+ {
257
+ "material_id": "mp-8062",
258
+ "formula": "SiC",
259
+ "elements": ["Si", "C"],
260
+ "spacegroup": "P63mc",
261
+ "band_gap": 3.26,
262
+ "formation_energy_per_atom": -0.73,
263
+ "density": 3.21,
264
+ "elastic_modulus": 448,
265
+ "bulk_modulus": 220,
266
+ "thermal_conductivity": 490,
267
+ "crystal_system": "Hexagonal",
268
+ "magnetic_property": "Non-magnetic"
269
+ }
270
+ """
271
+
272
+ # Generate analysis
273
+ prompt = f"USER: {materials_data}\nASSISTANT:"
274
+ output = llm(
275
+ prompt,
276
+ max_tokens=3000,
277
+ temperature=0.7,
278
+ top_p=0.9,
279
+ repeat_penalty=1.1
280
+ )
281
+ result = output.get("choices", [{}])[0].get("text", "").strip()
282
+ print(result)
283
+ ```
284
+
285
+ ## Getting Started
286
+
287
+ 1. **Install dependencies**
288
+ ```bash
289
+ pip install torch transformers accelerate safetensors
290
+ ```
291
+
292
+ 2. **Download the model**
293
+ - Option A: Use Hugging Face Hub (automatic download)
294
+ - Option B: Clone this repository for local files
295
+
296
+ 3. **Prepare your materials data**
297
+ - Format as JSON with material properties
298
+ - Include relevant structural, electronic, and mechanical data
299
+ - Common sources: Materials Project, AFLOW, DFT calculations, experimental databases
300
+
301
+ 4. **Run analysis**
302
+ - Use the provided scripts in the `Scripts/` folder
303
+ - Or integrate the code examples above into your workflow
304
+
305
+ 5. **Customize your analysis**
306
+ - Modify the JSON input with your specific materials data
307
+ - Adjust generation parameters (temperature, top_p) for different output styles
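Reviewer sketch (not part of the commit): steps 3 and 5 above amount to serializing a property record as JSON and wrapping it in the `USER: ... ASSISTANT:` template used in the Quick Start examples. The snippet below is a minimal sketch of that step. `build_prompt` is a hypothetical helper, and the record is a shortened copy of the SiC example.

```python
import json


def build_prompt(material: dict) -> str:
    """Serialize a materials record and wrap it in the Quick Start prompt template."""
    # json.dumps guarantees valid JSON (e.g. Python True becomes JSON true).
    payload = json.dumps(material, indent=2)
    return f"USER: {payload}\nASSISTANT:"


# Shortened record based on the README's SiC example.
material = {
    "material_id": "mp-8062",
    "formula": "SiC",
    "elements": ["Si", "C"],
    "band_gap": 3.26,
    "is_stable": True,
}
prompt = build_prompt(material)
print(prompt)
```

Building the prompt from a dict rather than a hand-written string avoids malformed JSON when fields are added or removed for a different material.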
308
+
309
+ ## License
310
+
311
+ This project is licensed under the Apache 2.0 License.
312
+
313
+ ## Citation
314
+
315
+ If you use MaterialsAnalyst-AI-7B in your research, please cite:
316
+
317
+ ```bibtex
318
+ @software{materialsanalyst_ai_7b,
319
+ title={MaterialsAnalyst-AI-7B: Specialized AI for Materials Analysis},
320
+ author={Mike and {Oregon State University Materials Modeling and Development Group}},
321
+ year={2024},
322
+ license={Apache-2.0}
323
+ }
324
+ ```
325
+
326
+ ---
327
 
328
+ **Developed by**: Mike in collaboration with Oregon State University Materials Modeling and Development Group