taradutt007 committed
Commit 53973c1 · verified · 1 Parent(s): ce4c9f4

Upload 2 files

Files changed (3)
  1. .gitattributes +1 -0
  2. HEA_query.png +3 -0
  3. README.md +105 -11
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ HEA_query.png filter=lfs diff=lfs merge=lfs -text
HEA_query.png ADDED

Git LFS Details

  • SHA256: e77fa15bd668d08b08069ebd7a7f92422343dd4c173f451e614a9df54f7146f4
  • Pointer size: 131 Bytes
  • Size of remote file: 178 kB
README.md CHANGED
@@ -1,14 +1,108 @@
  ---
- title: HEA Query
- emoji: ๐Ÿข
- colorFrom: gray
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.45.0
- app_file: app.py
- pinned: false
- license: mit
- short_description: LLM-Powered assistant to query HEA papers and datasets
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
+ # 🔬 HEA Query
+
+ **LLM-Powered Research Assistant for High Entropy Alloys**
+ *Hackathon: LLM for Materials Science*
+
+ ---
+
+ ## 🧠 Summary
+
+ **HEA Query** is a research assistant built to enable intelligent access to both unstructured scientific literature and structured datasets related to **High Entropy Alloys (HEAs)**.
+
+ We combine:
+
+ - 🔍 A semantic search engine over thousands of research paper chunks
+ - 📊 Cleaned and normalized tabular datasets containing alloy properties
+ - 🤖 A powerful LLM (Mistral-7B) to generate natural language answers
+ - 🖥️ An interactive Gradio interface for easy exploration
+
+ This assistant helps materials scientists ask questions like:
+
+ > 🧪 _"Which FCC alloys have Vickers hardness above 200?"_
+ > 📈 _"Show me alloys with high entropy mixing and low modulus."_
+ > 📚 _"What do recent papers say about yield strength trends in BCC HEAs?"_
+
+ ---
+
+ ## 🚀 Features
+
+ - 🔍 **Semantic paper search** using FAISS + BAAI embeddings
+ - 📊 **Smart dataset filtering** using canonical and synonymous property names
+ - 🤖 **LLM-based reasoning** via Mistral-7B
+ - 🧾 **Unified prompt generation** from papers + structured data
+ - 🖥️ **Gradio app** with answer + tables + raw paper context
+
+ ---
+
+ ## 📂 Resources Used
+
+ ### 📚 Resource 1: Literature Corpus
+
+ - ~1800 open-access PDFs on HEAs
+ - Extracted sections: `abstract`, `introduction`, `methods`, `conclusion`
+ - Chunked with LangChain `RecursiveCharacterTextSplitter`
+ - Embedded with `BAAI/bge-base-en`
+ - Indexed using `FAISS`
+
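
The retrieval step over this corpus can be pictured as nearest-neighbour search over normalized embedding vectors. The sketch below is plain Python under stated assumptions: toy 3-dimensional vectors stand in for real `BAAI/bge-base-en` embeddings, and a brute-force scan stands in for the FAISS index (an exact flat inner-product index would produce the same ranking). The function names and sample chunks are hypothetical and are not taken from `app.py`.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def top_k(query_vec, chunk_vecs, k=2):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    q = normalize(query_vec)
    scores = [
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(chunk_vecs)
    ]
    # Highest inner product first, as an exact inner-product index would rank them.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Toy stand-ins for embedded paper chunks (hypothetical data).
chunks = ["hardness of FCC alloys", "BCC yield strength", "synthesis methods"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

hits = top_k([1.0, 0.2, 0.0], chunk_vecs, k=2)
retrieved = [chunks[i] for i, _ in hits]
```

Normalizing both sides means the inner product is the cosine similarity, which is a common way to compare sentence embeddings; whether the app normalizes its vectors is not stated in this README.
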
+ ### 📊 Resource 2: Structured Datasets
+
+ | Dataset | Description                                | File                 |
+ |---------|--------------------------------------------|----------------------|
+ | MPEA    | Experimental data (HV, UTS, density, etc.) | `dataset1_clean.csv` |
+ | ML Pred | Design parameters + predictions            | `dataset2_clean.csv` |
+ | Achief  | Thermodynamic + phase data                 | `dataset3_clean.csv` |
+
+ ➡️ Each dataset was cleaned, normalized, and reformatted for query integration.
+
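
Filtering by canonical and synonymous property names could work roughly as follows. This is a sketch, not the project's code: the synonym map, the column names (`hardness_hv`, `uts_mpa`, `phase`), and the sample rows are all assumptions made for illustration.

```python
# Hypothetical synonym map: user-facing terms -> canonical column name.
SYNONYMS = {
    "vickers hardness": "hardness_hv",
    "hv": "hardness_hv",
    "hardness": "hardness_hv",
    "uts": "uts_mpa",
    "ultimate tensile strength": "uts_mpa",
}

def canonical_name(term):
    """Map a property term to its canonical column, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())

def filter_rows(rows, term, minimum, phase=None):
    """Keep rows whose property exceeds `minimum`, optionally matching a phase."""
    column = canonical_name(term)
    if column is None:
        return []
    matches = []
    for row in rows:
        value = row.get(column)
        if value is None or value <= minimum:
            continue
        if phase is not None and row.get("phase") != phase:
            continue
        matches.append(row)
    return matches

# Made-up rows standing in for a cleaned dataset.
rows = [
    {"alloy": "A", "phase": "FCC", "hardness_hv": 250.0},
    {"alloy": "B", "phase": "FCC", "hardness_hv": 150.0},
    {"alloy": "C", "phase": "BCC", "hardness_hv": 480.0},
]

# "Which FCC alloys have Vickers hardness above 200?"
fcc_hard = filter_rows(rows, "Vickers hardness", 200, phase="FCC")
```

Resolving every synonym to one canonical column before filtering keeps the comparison logic independent of how a user phrases the property.
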
+ ---
+
+ ## 🧠 Model Details
+
+ - **LLM**: `Mistral-7B-Instruct v0.3`
+ - **Embeddings**: `BAAI/bge-base-en`
+ - **Frameworks**: LangChain, Transformers, FAISS, Gradio
+
  ---
+
+ ## 💬 How It Works
+
+ 1. User submits a question.
+ 2. The system:
+    - Uses **FAISS** to retrieve relevant paper chunks.
+    - Filters HEA datasets based on numeric or categorical queries.
+    - Constructs a unified prompt combining both results.
+ 3. **Mistral LLM** generates a structured, domain-aware response.
+
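
The prompt construction in step 2 can be sketched as plain string assembly. The template wording, the function name `build_prompt`, and the sample inputs below are assumptions for illustration; the actual prompt used by the app is not shown in this README.

```python
def build_prompt(question, paper_chunks, table_rows, max_rows=5):
    """Combine retrieved paper text and matching dataset rows into one prompt."""
    context = "\n\n".join(
        f"[Paper {i}] {chunk}" for i, chunk in enumerate(paper_chunks, start=1)
    )
    # Render each row as "key=value" pairs; cap the count to keep the prompt short.
    rows = "\n".join(
        ", ".join(f"{key}={value}" for key, value in row.items())
        for row in table_rows[:max_rows]
    )
    return (
        "You are an assistant for high entropy alloy research.\n\n"
        f"Paper context:\n{context or '(none)'}\n\n"
        f"Dataset rows:\n{rows or '(none)'}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Which FCC alloys have Vickers hardness above 200?",
    ["Placeholder text from a retrieved chunk."],
    [{"alloy": "A", "phase": "FCC", "hardness_hv": 250.0}],
)
```

Writing an explicit `(none)` marker when either source returns nothing lets the model see that a source was empty rather than silently omitted.
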
+ ### Outputs:
+ - 📄 **LLM Answer**
+ - 📊 **Matching rows from datasets**
+ - 📚 **Raw context from papers**
+
+ ---
+
+ ## 🖥️ Gradio Demo UI
+
+ | Panel            | Description                            |
+ |------------------|----------------------------------------|
+ | 🧠 LLM Answer    | Natural language response from Mistral |
+ | 📊 CSV Matches   | Tabular results matching query filters |
+ | 📚 FAISS Context | Raw text from relevant research papers |
+
+ ---
+
+ ## 🧑‍💻 Team
+
+ - **Taradutt Pattnaik** – [University of Connecticut, Storrs]
+ - **Sanjeev K Nayak** – [University of Connecticut, Storrs]
+ - **Alexander Horvath** – [University of Connecticut, Storrs]
+
+ ---
+
+ ## 📚 Dataset References _(to be added)_
+
+ - [1] MPEA Dataset – *C. Borg, “Expanded dataset of mechanical properties and observed phases of multi-principal element alloys,” figshare, 12-Jul-2020, doi: 10.6084/m9.figshare.12642953.v9.*
+ - [2] ML Pred Dataset – *R. Machaka, G. T. Motsi, L. M. Raganya, P. M. Radingoana, and S. Chikosha, “Machine learning-based prediction of phases in high-entropy alloys: A data article,” Data in Brief, vol. 38, p. 107346, Oct. 2021, doi: 10.1016/j.dib.2021.107346.*
+ - [3] Achief Dataset – *C. E. Precker, A. Gregores Coto, and S. Muíños Landín, “Materials for Design Open Repository. High Entropy Alloys,” Zenodo, Aug. 03, 2021, doi: 10.5281/zenodo.5155150.*
+
  ---