Raymond-dev-546730 committed on
Commit
740fd6f
·
verified ·
1 Parent(s): 7b68cf7

Update README.md

Files changed (1)
  1. README.md +35 -64
README.md CHANGED
@@ -8,7 +8,7 @@ license: apache-2.0
8
 
9
  A specialized **open source** AI model designed to assist materials scientists and researchers in comprehensive analysis and interpretation of materials data. Built on Qwen 2.5 Instruct 7B and fine-tuned with LoRA (Low-Rank Adaptation), MaterialsAnalyst-AI-7B delivers expert-level materials property analysis and actionable insights from complex materials databases.
10
 
11
- ### Key Capabilities
12
 
13
  - **Multi-Property Analysis**: Interprets electronic, mechanical, thermal, structural, and magnetic characteristics
14
  - **Property Correlation**: Identifies relationships between different material properties and their implications
@@ -17,15 +17,32 @@ A specialized **open source** AI model designed to assist materials scientists a
17
  - **Performance Benchmarking**: Compares materials against industry standards
18
  - **Structured Reasoning**: Provides both detailed analysis and concise conclusions
19
 
20
- ## How It Works
21
 
22
- 1. **Input**: Provide materials data in JSON format with properties, structure, and characteristics
23
- 2. **Analysis**: The model performs chain-of-thought reasoning about material properties and relationships
24
- 3. **Output**: Receive structured analysis with practical insights and application recommendations
25
 
26
- ## Example Analysis
27
 
28
  ### Input Data
 
29
  ```json
30
  {
31
  "material_id": "mp-8062",
@@ -56,7 +73,7 @@ A specialized **open source** AI model designed to assist materials scientists a
56
 
57
  The model provides dual-structured output:
58
 
59
- **Reasoning Process (`<think>` section)**
60
  ```
61
  Analyzing SiC composition and hexagonal crystal structure (P63mc)...
62
  Electronic properties: 3.26 eV indirect bandgap indicates wide-bandgap semiconductor behavior...
@@ -65,7 +82,7 @@ Mechanical properties: High elastic modulus (448 GPa) suggests exceptional stiff
65
  Thermal behavior: 490 W/m·K conductivity ideal for heat dissipation applications...
66
  ```
67
 
68
- **Structured Analysis (`<answer>` section)**
69
  ```
70
  **SiC Materials Analysis (ID: mp-8062)**
71
 
@@ -94,67 +111,21 @@ Superior combination of thermal, mechanical, and electronic properties makes SiC
94
 
95
  ## Repository Contents
96
 
97
- ```
98
- MaterialsAnalyst-AI-7B/
99
- ├── Model_Weights/
100
- │ ├── llama.cpp/ # LLaMA.cpp compatible weights (.gguf format)
101
- │ ├── safetensors/ # SafeTensors format models
102
- │ └── LoRA_adapter/ # LoRA adapter weights
103
- ├── Scripts/
104
- │ ├── Inference_llama.cpp.py # LLaMA.cpp deployment script
105
- │ └── Inference_safetensors.py # SafeTensors deployment script
106
- ├── Data/
107
- │ └── Train-Ready.jsonl # Complete training dataset
108
- ├── Training/
109
- │ └── Training_Logs.txt # Training process logs
110
- └── README.md
111
- ```
112
 
113
  ## Technical Specifications
114
 
115
- **Base Architecture**
116
- - **Foundation Model**: Qwen 2.5 Instruct 7B
117
- - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
118
- - **Parameters**: 7 billion parameters
119
 
120
  **Training Details**
121
- - **Infrastructure**: Single NVIDIA A100 SXM4 GPU
122
- - **Training Duration**: ~5.4 hours
123
- - **Dataset Size**: 6,000 samples (6.4M tokens)
124
- - **Average Sample Length**: 1,074 tokens
125
- - **Data Generation**: DeepSeekV3 API
126
-
127
-
128
- ## Getting Started
129
-
130
- 1. **Install dependencies**
131
- ```bash
132
- pip install torch transformers accelerate safetensors
133
- # For LLaMA.cpp option:
134
- pip install llama-cpp-python
135
- ```
136
-
137
- 2. **Run the provided scripts**
138
-
139
- **For SafeTensors deployment:**
140
- ```bash
141
- python Scripts/Inference_safetensors.py
142
- ```
143
-
144
- **For LLaMA.cpp deployment:**
145
- ```bash
146
- python Scripts/Inference_llama.cpp.py
147
- ```
148
-
149
- 3. **Customize your analysis**
150
- - Edit the `JSON_INPUT` variable in either script with your materials data
151
- - Modify the `model_path` variable to point to your model files
152
- - Adjust generation parameters as needed
153
-
154
- 4. **Input your materials data**
155
- - Replace the example SiC data with your material properties
156
- - Common sources: Materials Project, AFLOW, DFT calculations, experimental databases
157
-
158
 
159
  ## Citation
160
 
 
8
 
9
  A specialized **open source** AI model designed to assist materials scientists and researchers in comprehensive analysis and interpretation of materials data. Built on Qwen 2.5 Instruct 7B and fine-tuned with LoRA (Low-Rank Adaptation), MaterialsAnalyst-AI-7B delivers expert-level materials property analysis and actionable insights from complex materials databases.
10
 
11
+ ## Key Capabilities
12
 
13
  - **Multi-Property Analysis**: Interprets electronic, mechanical, thermal, structural, and magnetic characteristics
14
  - **Property Correlation**: Identifies relationships between different material properties and their implications
 
17
  - **Performance Benchmarking**: Compares materials against industry standards
18
  - **Structured Reasoning**: Provides both detailed analysis and concise conclusions
19
 
20
+ ## Quick Start
21
 
22
+ **Install dependencies:**
23
+ ```bash
24
+ pip install torch transformers accelerate safetensors
25
+ # For LLaMA.cpp option: pip install llama-cpp-python
26
+ ```
27
+
28
+ **Run analysis:**
29
+ ```bash
30
+ # SafeTensors deployment (recommended)
31
+ python Scripts/Inference_safetensors.py
32
+
33
+ # LLaMA.cpp deployment (CPU optimized)
34
+ python Scripts/Inference_llama.cpp.py
35
+ ```
36
 
37
+ **Customize your analysis:**
38
+ - Edit the `JSON_INPUT` variable in either script with your materials data
39
+ - Modify the `model_path` variable to point to your model files
40
+ - Common data sources: Materials Project, AFLOW, DFT calculations, experimental databases
41
+
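
As an illustration of the customization step, the sketch below builds a JSON string that could be pasted into the `JSON_INPUT` variable. It assumes `JSON_INPUT` holds a JSON-encoded string. Only `material_id` is a key shown in the README excerpt; the other key names and the helper `to_json_input` are hypothetical placeholders, so check them against the full example in the scripts.

```python
import json

# Hypothetical record. Only "material_id" is a key taken from the README
# excerpt; "formula" and "band_gap_eV" are placeholder key names, with
# values borrowed from the SiC example (SiC, 3.26 eV bandgap).
material = {
    "material_id": "mp-8062",
    "formula": "SiC",
    "band_gap_eV": 3.26,
}


def to_json_input(record: dict) -> str:
    """Serialize a materials record to a JSON string (hypothetical helper)."""
    if "material_id" not in record:
        raise ValueError("record needs a 'material_id' field")
    text = json.dumps(record, indent=2)
    json.loads(text)  # round-trip check: raises if the text is not valid JSON
    return text


JSON_INPUT = to_json_input(material)
print(JSON_INPUT)
```

Validating the record before it reaches the model keeps malformed input from surfacing later as a confusing generation error.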
42
+ ## Input/Output Format
43
 
44
  ### Input Data
45
+ Provide materials data as JSON with properties, structure, and characteristics:
46
  ```json
47
  {
48
  "material_id": "mp-8062",
 
73
 
74
  The model provides dual-structured output:
75
 
76
+ **Reasoning Process (`<think>` section)** - Step-by-step analysis:
77
  ```
78
  Analyzing SiC composition and hexagonal crystal structure (P63mc)...
79
  Electronic properties: 3.26 eV indirect bandgap indicates wide-bandgap semiconductor behavior...
 
82
  Thermal behavior: 490 W/m·K conductivity ideal for heat dissipation applications...
83
  ```
84
 
85
+ **Structured Analysis (`<answer>` section)** - Comprehensive summary:
86
  ```
87
  **SiC Materials Analysis (ID: mp-8062)**
88
 
 
111
 
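
The dual-structured output described above can be separated programmatically. The following is a minimal sketch, assuming the model wraps each part in matching `<think>...</think>` and `<answer>...</answer>` tags; the README names the two sections but does not show the closing tags, and `split_sections` is a hypothetical helper rather than part of the provided scripts.

```python
import re


def split_sections(text: str) -> dict:
    """Return the contents of the <think> and <answer> sections.

    Assumes each section is closed by a matching end tag; a section
    that is missing or unclosed is reported as None.
    """
    sections = {}
    for tag in ("think", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections


# Shortened sample built from the README's SiC example.
sample = (
    "<think>Analyzing SiC composition...</think>\n"
    "<answer>**SiC Materials Analysis (ID: mp-8062)**</answer>"
)
parts = split_sections(sample)
print(parts["answer"])
```

Returning `None` for a missing section lets a caller detect truncated generations instead of silently treating them as empty.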
112
  ## Repository Contents
113
 
114
+ - **Scripts/** - Inference scripts for SafeTensors and LLaMA.cpp deployment
115
+ - **Model_Weights/** - Model files (.gguf, safetensors, LoRA adapter formats)
116
+ - **Data/** - Complete training dataset (Train-Ready.jsonl)
117
+ - **Training/** - Training process logs
118
 
119
  ## Technical Specifications
120
 
121
+ **Model Architecture**
122
+ - Foundation: Qwen 2.5 Instruct 7B (7 billion parameters)
123
+ - Fine-tuning: LoRA (Low-Rank Adaptation)
 
124
 
125
  **Training Details**
126
+ - Infrastructure: Single NVIDIA A100 SXM4 GPU (~5.4 hours)
127
+ - Dataset: 6,000 samples (6.4M tokens, avg 1,074 tokens/sample)
128
+ - Data Generation: DeepSeekV3 API
129
 
130
  ## Citation
131