@@ -1,22 +1,189 @@
 ---
-base_model: boolean_model_merged
 tags:
-- text-generation-inference
-- transformers
-- unsloth
-- llama
-- trl
-license: apache-2.0
-language:
-- en
 ---
-# Uploaded  model
-- **Developed by:** Zwounds
-- **License:** apache-2.0
-- **Finetuned from model :** boolean_model_merged
-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
 tags:
+  - transformers
+  - llama
+  - boolean-search
+  - search
+  - language-to-query
+library_name: transformers
+pipeline_tag: text2text-generation
+license: llama2
 ---
+# Boolean Search Query Model
+Convert natural-language queries into Boolean search expressions for academic databases. The model is intended to help researchers and librarians produce correctly quoted and grouped queries from plain-language descriptions.
+## Features
+- Converts natural language to boolean search expressions
+- Handles multi-word terms correctly with quotes
+- Removes meta-terms (articles, papers, research, etc.)
+- Groups OR clauses appropriately
+- Minimal, clean formatting
+## Installation
+```bash
+pip install transformers torch unsloth
+```
+```python
+from unsloth import FastLanguageModel
+model, tokenizer = FastLanguageModel.from_pretrained(
+    "Zwounds/boolean-search-model",
+    max_seq_length=2048,
+    dtype=None,  # Auto-detect
+    load_in_4bit=True
+)
+FastLanguageModel.for_inference(model)
+```
+## Quick Start
+```python
+# Format your query
+query = "Find papers about climate change and renewable energy"
+prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+### Instruction:
+Convert this natural language query into a boolean search query by following these rules:
+1. FIRST: Remove all meta-terms from this list (they should NEVER appear in output):
+   - articles, papers, research, studies
+   - examining, investigating, analyzing
+   - findings, documents, literature
+   - publications, journals, reviews
+   Example: "Research examining X" → just "X"
+2. SECOND: Remove generic implied terms that don't add search value:
+   - Remove words like "practices," "techniques," "methods," "approaches," "strategies"
+   - Remove words like "impacts," "effects," "influences," "role," "applications"
+   - For example: "sustainable agriculture practices" → "sustainable agriculture"
+   - For example: "teaching methodologies" → "teaching"
+   - For example: "leadership styles" → "leadership"
+3. THEN: Format the remaining terms:
+   CRITICAL QUOTING RULES:
+   - Multi-word phrases MUST ALWAYS be in quotes - NO EXCEPTIONS
+   - Examples of correct quoting:
+     - Wrong: machine learning AND deep learning
+     - Right: "machine learning" AND "deep learning"
+     - Wrong: natural language processing
+     - Right: "natural language processing"
+   - Single words must NEVER have quotes (e.g., science, research, learning)
+   - Use AND to connect required concepts
+   - Use OR with parentheses for alternatives (e.g., ("soil health" OR biodiversity))
+Example conversions showing proper quoting:
+"Research on machine learning for natural language processing"
+→ "machine learning" AND "natural language processing"
+"Studies examining anxiety depression stress in workplace"
+→ (anxiety OR depression OR stress) AND workplace
+"Articles about deep learning impact on computer vision"
+→ "deep learning" AND "computer vision"
+"Research on sustainable agriculture practices and their impact on soil health or biodiversity"
+→ "sustainable agriculture" AND ("soil health" OR biodiversity)
+"Articles about effective teaching methods for second language acquisition"
+→ teaching AND "second language acquisition"
+### Input:
+{query}
+### Response:
+"""
+# Generate boolean query
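+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=100)
+# generate() returns the prompt tokens followed by the new tokens; decode only the new ones
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+result = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+print(result)  # expected: "climate change" AND "renewable energy"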
+```
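If the decoded text still contains the prompt (decoder-only models generally return the prompt followed by the completion), the Boolean query can be isolated by taking the text after the last `### Response:` marker. The helper below is a minimal sketch and not part of the repository: `extract_response` is a hypothetical name, and it assumes the model emits the query on the first non-empty line after the marker.

```python
RESPONSE_MARKER = "### Response:"  # marker used in the Quick Start prompt


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the first non-empty line after the last occurrence of `marker`.

    If the marker is absent, str.rpartition returns the whole string as the
    tail, so the first non-empty line of `decoded` is returned instead.
    """
    tail = decoded.rpartition(marker)[2]
    for line in tail.splitlines():
        line = line.strip()
        if line:
            return line
    return ""
```

Calling `extract_response(result)` on the decoded string is harmless when the prompt has already been stripped, since the function then simply returns the first non-empty line.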
+## Examples
+Input queries and their boolean translations:
+1. Natural: "Studies about anxiety depression stress in workplace"
+   - Boolean: (anxiety OR depression OR stress) AND workplace
+2. Natural: "Articles about artificial intelligence ethics and regulation or policy"
+   - Boolean: "artificial intelligence" AND (ethics OR regulation OR policy)
+3. Natural: "Research on quantum computing applications in cryptography or optimization"
+   - Boolean: "quantum computing" AND (cryptography OR optimization)
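The three pairs above can double as a small regression set. The sketch below is an assumption about how one might check a conversion function against them, not a description of the repository's `test_boolean_model.py`; `find_mismatches` and the `convert` callable are hypothetical names, where `convert` stands for whatever function wraps prompt construction and generation.

```python
from typing import Callable

# Natural-language inputs and expected Boolean outputs, copied from the Examples list.
EXAMPLES = [
    ("Studies about anxiety depression stress in workplace",
     "(anxiety OR depression OR stress) AND workplace"),
    ("Articles about artificial intelligence ethics and regulation or policy",
     '"artificial intelligence" AND (ethics OR regulation OR policy)'),
    ("Research on quantum computing applications in cryptography or optimization",
     '"quantum computing" AND (cryptography OR optimization)'),
]


def normalize(query: str) -> str:
    """Collapse runs of whitespace so spacing differences are not counted as errors."""
    return " ".join(query.split())


def find_mismatches(convert: Callable[[str], str], examples=EXAMPLES):
    """Return (input, expected, actual) triples where the output differs."""
    mismatches = []
    for natural, expected in examples:
        actual = convert(natural)
        if normalize(actual) != normalize(expected):
            mismatches.append((natural, expected, actual))
    return mismatches
```

An exact string comparison is deliberately strict: it treats `(a OR b)` and `(b OR a)` as different, which may or may not be desirable for a given evaluation.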
+## Rules
+The model follows these formatting rules:
+1. Meta-terms are removed:
+   - "articles", "papers", "research", "studies"
+   - Focus on actual search concepts
+2. Quotes only for multi-word terms:
+   - "artificial intelligence" AND ethics ✓
+   - Wrong: "ethics" AND "ai" ✗
+3. Logical grouping:
+   - Use parentheses for OR groups
+   - (x OR y) AND z
+4. Minimal formatting:
+   - No unnecessary parentheses
+   - No repeated terms
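The rules above are mechanical enough to check on generated output. The following is a rough sketch of such a check, not the author's implementation: `check_query` is a hypothetical helper, the meta-term set contains only the four words named in rule 1, and operators are assumed to be uppercase `AND`, `OR`, and `NOT`. It does not check for unnecessary parentheses.

```python
import re

# Meta-terms named in rule 1; extend this set as needed.
META_TERMS = {"articles", "papers", "research", "studies"}
OPERATORS = {"AND", "OR", "NOT"}

# A token is a quoted phrase, a single parenthesis, or a bare word.
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def check_query(query: str) -> list[str]:
    """Return human-readable rule violations; an empty list means none were found."""
    problems = []
    seen = set()
    prev_bare = None  # previous bare word, if no operator or phrase has intervened

    for tok in TOKEN_RE.findall(query):
        if tok in ("(", ")"):
            continue
        if tok in OPERATORS:
            prev_bare = None
            continue
        if tok.startswith('"'):
            term = tok[1:-1].strip()
            if len(term.split()) < 2:  # rule 2: quotes only for multi-word terms
                problems.append(f"quoted single word: {term}")
            prev_bare = None
        else:
            term = tok
            if prev_bare is not None:  # two bare words in a row: phrase needs quotes
                problems.append(f"unquoted phrase: {prev_bare} {term}")
            if term.lower() in META_TERMS:  # rule 1: meta-terms are removed
                problems.append(f"meta-term: {term}")
            prev_bare = term
        if term.lower() in seen:  # rule 4: no repeated terms
            problems.append(f"repeated term: {term}")
        seen.add(term.lower())
    return problems
```

For instance, `check_query('"artificial intelligence" AND ethics')` returns an empty list, while `check_query('"ethics" AND "ai"')` reports both quoted single words.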
+## Local Development
+```bash
+# Clone repo
+git clone https://github.com/your-username/boolean-search-model.git
+cd boolean-search-model
+# Install dependencies
+pip install -r requirements.txt
+# Run tests
+python test_boolean_model.py
+```
+## Contributing
+1. Fork the repository
+2. Create your feature branch
+3. Add tests for any new functionality
+4. Submit a pull request
+## Model Card
+See [MODEL_CARD.md](MODEL_CARD.md) for detailed model information including:
+- Training data details
+- Performance metrics
+- Limitations
+- Intended use cases
+## License
+This model is subject to the Llama 2 license. See the [LICENSE](LICENSE) file for details.
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{boolean-search-llm,
+  title={Boolean Search Query LLM},
+  author={Stephen Zweibel},
+  year={2025},
+  publisher={Hugging Face},
+  url={https://huggingface.co/Zwounds/boolean-search-model}
+}
+```
+## Contact
+Stephen Zweibel - [@szweibel](https://github.com/szweibel)