Zwounds committed on
Commit 218a680 · verified · 1 Parent(s): 38729c7

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +182 -15
README.md CHANGED
@@ -1,22 +1,189 @@
  ---
- base_model: boolean_model_merged
  tags:
- - text-generation-inference
- - transformers
- - unsloth
- - llama
- - trl
- license: apache-2.0
- language:
- - en
  ---

- # Uploaded model

- - **Developed by:** Zwounds
- - **License:** apache-2.0
- - **Finetuned from model :** boolean_model_merged

- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
  ---
  tags:
+ - transformers
+ - llama
+ - boolean-search
+ - search
+ - language-to-query
+ library_name: transformers
+ pipeline_tag: text2text-generation
+ license: llama2
  ---

+ # Boolean Search Query Model

+ Convert natural language queries into proper boolean search expressions for academic databases. This model helps researchers and librarians create properly formatted boolean search queries from natural language descriptions.

+ ## Features

+ - Converts natural language to boolean search expressions
+ - Handles multi-word terms correctly with quotes
+ - Removes meta-terms (articles, papers, research, etc.)
+ - Groups OR clauses appropriately
+ - Minimal, clean formatting
+
+ ## Installation
+
+ ```bash
+ pip install transformers torch unsloth
+ ```
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     "Zwounds/boolean-search-model",
+     max_seq_length=2048,
+     dtype=None,  # Auto-detect
+     load_in_4bit=True
+ )
+ FastLanguageModel.for_inference(model)
+ ```
+
+ ## Quick Start
+
+ ```python
+ # Format your query
+ query = "Find papers about climate change and renewable energy"
+ prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+
+ ### Instruction:
+ Convert this natural language query into a boolean search query by following these rules:
+
+ 1. FIRST: Remove all meta-terms from this list (they should NEVER appear in output):
+ - articles, papers, research, studies
+ - examining, investigating, analyzing
+ - findings, documents, literature
+ - publications, journals, reviews
+ Example: "Research examining X" → just "X"
+
+ 2. SECOND: Remove generic implied terms that don't add search value:
+ - Remove words like "practices," "techniques," "methods," "approaches," "strategies"
+ - Remove words like "impacts," "effects," "influences," "role," "applications"
+ - For example: "sustainable agriculture practices" → "sustainable agriculture"
+ - For example: "teaching methodologies" → "teaching"
+ - For example: "leadership styles" → "leadership"
+
+ 3. THEN: Format the remaining terms:
+ CRITICAL QUOTING RULES:
+ - Multi-word phrases MUST ALWAYS be in quotes - NO EXCEPTIONS
+ - Examples of correct quoting:
+ - Wrong: machine learning AND deep learning
+ - Right: "machine learning" AND "deep learning"
+ - Wrong: natural language processing
+ - Right: "natural language processing"
+ - Single words must NEVER have quotes (e.g., science, research, learning)
+ - Use AND to connect required concepts
+ - Use OR with parentheses for alternatives (e.g., ("soil health" OR biodiversity))
+
+ Example conversions showing proper quoting:
+ "Research on machine learning for natural language processing"
+ → "machine learning" AND "natural language processing"
+
+ "Studies examining anxiety depression stress in workplace"
+ → (anxiety OR depression OR stress) AND workplace
+
+ "Articles about deep learning impact on computer vision"
+ → "deep learning" AND "computer vision"
+
+ "Research on sustainable agriculture practices and their impact on soil health or biodiversity"
+ → "sustainable agriculture" AND ("soil health" OR biodiversity)
+
+ "Articles about effective teaching methods for second language acquisition"
+ → teaching AND "second language acquisition"
+
+ ### Input:
+ {query}
+
+ ### Response:
+ """
+
+ # Generate boolean query
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(result) # "climate change" AND "renewable energy"
+ ```
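
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` as in the Quick Start snippet would be expected to print the whole prompt together with the answer rather than the bare query. Below is a minimal sketch of one way to isolate the answer. It assumes the `### Response:` marker from the prompt template appears in the decoded text and that the query sits on a single line; `extract_response` is a hypothetical helper, not part of the repository.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the first non-empty line after the last response marker."""
    # rpartition yields ("", "", decoded) when the marker is absent
    _, found, tail = decoded.rpartition(marker)
    if not found:
        return decoded.strip()
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    return lines[0] if lines else ""


# Stand-in for the decoded model output (prompt followed by the answer)
decoded = (
    "### Input:\nFind papers about climate change and renewable energy\n\n"
    '### Response:\n"climate change" AND "renewable energy"\n'
)
print(extract_response(decoded))  # "climate change" AND "renewable energy"
```

An alternative that avoids string matching is to decode only the new tokens, for example `outputs[0][inputs["input_ids"].shape[1]:]`. When the model is loaded on a GPU, the tokenized inputs would likely also need to be moved to `model.device` before calling `generate`.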
+
+ ## Examples
+
+ Input queries and their boolean translations:
+
+ 1. Natural: "Studies about anxiety depression stress in workplace"
+ - Boolean: (anxiety OR depression OR stress) AND workplace
+
+ 2. Natural: "Articles about artificial intelligence ethics and regulation or policy"
+ - Boolean: "artificial intelligence" AND (ethics OR regulation OR policy)
+
+ 3. Natural: "Research on quantum computing applications in cryptography or optimization"
+ - Boolean: "quantum computing" AND (cryptography OR optimization)
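
The three examples share one shape: each concept is either a single term or a group of alternatives, alternatives are joined with OR inside parentheses, and concepts are joined with AND. The sketch below assembles a query of that shape by hand. It is an illustration of the target format only, not how the model produces its output, and `quote_term` and `build_boolean_query` are hypothetical names.

```python
from typing import List, Sequence


def quote_term(term: str) -> str:
    """Quote a term only when it contains more than one word."""
    term = term.strip()
    return f'"{term}"' if len(term.split()) > 1 else term


def build_boolean_query(concepts: Sequence[Sequence[str]]) -> str:
    """Join alternatives with OR (parenthesised) and concepts with AND."""
    parts: List[str] = []
    for alternatives in concepts:
        terms = [quote_term(t) for t in alternatives if t.strip()]
        if not terms:
            continue
        # Parentheses are added only when a group has several alternatives
        parts.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)


print(build_boolean_query([["artificial intelligence"], ["ethics", "regulation", "policy"]]))
# "artificial intelligence" AND (ethics OR regulation OR policy)
```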
+
+ ## Rules
+
+ The model follows these formatting rules:
+
+ 1. Meta-terms are removed:
+ - "articles", "papers", "research", "studies"
+ - Focus on actual search concepts
+
+ 2. Quotes only for multi-word terms:
+ - "artificial intelligence" AND ethics ✓
+ - NOT: "ethics" AND "ai" ✗
+
+ 3. Logical grouping:
+ - Use parentheses for OR groups
+ - (x OR y) AND z
+
+ 4. Minimal formatting:
+ - No unnecessary parentheses
+ - No repeated terms
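
These rules are mechanical enough to check in code. The sketch below is a rough checker, assuming straight double quotes and only the operators AND and OR. `check_boolean_query` and its short meta-term list are illustrative and are not the repository's `test_boolean_model.py`; the check for unnecessary parentheses is left out.

```python
import re
from typing import List

META_TERMS = {"articles", "papers", "research", "studies"}
OPERATORS = {"AND", "OR"}
PHRASE = "<phrase>"  # placeholder standing in for a quoted phrase


def check_boolean_query(query: str) -> List[str]:
    """Return rule violations found in query; an empty list means it passes."""
    problems: List[str] = []
    phrases = re.findall(r'"([^"]*)"', query)
    # Swap quoted phrases for a placeholder and pad parentheses so split() isolates them
    stripped = re.sub(r'"[^"]*"', f" {PHRASE} ", query)
    tokens = re.sub(r"([()])", r" \1 ", stripped).split()
    words = [t for t in tokens if t not in OPERATORS and t not in {"(", ")", PHRASE}]

    # Rule 1: meta-terms are removed
    for word in words + [w for p in phrases for w in p.split()]:
        if word.lower() in META_TERMS:
            problems.append(f"meta-term present: {word}")

    # Rule 2: quotes only for multi-word terms, and multi-word terms always quoted
    for phrase in phrases:
        if len(phrase.split()) < 2:
            problems.append(f'single word in quotes: "{phrase}"')
    for left, right in zip(tokens, tokens[1:]):
        if left in words and right in words:
            problems.append(f"unquoted multi-word term: {left} {right}")

    # Rule 3: AND and OR inside the same group (or both at top level) need parentheses
    stack = [set()]
    mixed = False
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")" and len(stack) > 1:
            mixed = mixed or stack.pop() == OPERATORS
        elif token in OPERATORS:
            stack[-1].add(token)
    if mixed or any(ops == OPERATORS for ops in stack):
        problems.append("AND and OR mixed without parentheses")

    # Rule 4: no repeated terms
    terms = [w.lower() for w in words] + [p.lower() for p in phrases]
    if len(terms) != len(set(terms)):
        problems.append("repeated term")
    return problems


print(check_boolean_query('"artificial intelligence" AND ethics'))  # []
print(check_boolean_query('"ethics" AND "ai"'))
```

The first call returns an empty list because the query satisfies every rule; the second reports both single-word quoted terms.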
+
+ ## Local Development
+
+ ```bash
+ # Clone repo
+ git clone https://github.com/your-username/boolean-search-model.git
+ cd boolean-search-model
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run tests
+ python test_boolean_model.py
+ ```
+
+ ## Contributing
+
+ 1. Fork the repository
+ 2. Create your feature branch
+ 3. Add tests for any new functionality
+ 4. Submit a pull request
+
+ ## Model Card
+
+ See [MODEL_CARD.md](MODEL_CARD.md) for detailed model information including:
+ - Training data details
+ - Performance metrics
+ - Limitations
+ - Intended use cases
+
+ ## License
+
+ This model is subject to the Llama 2 license. See the [LICENSE](LICENSE) file for details.
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+ ```bibtex
+ @misc{boolean-search-llm,
+ title={Boolean Search Query LLM},
+ author={Stephen Zweibel},
+ year={2025},
+ publisher={Hugging Face},
+ url={https://huggingface.co/Zwounds/boolean-search-model}
+ }
+ ```
+
+ ## Contact
+
+ Stephen Zweibel - [@szweibel](https://github.com/szweibel)