Improve language tag

#2
by lbourdois - opened
Files changed (1): README.md (+231 -219)
Previous front matter:

---
library_name: transformers
license: mit
base_model:
- Qwen/Qwen2.5-3B-Instruct
language:
- en
---
Updated front matter:

---
library_name: transformers
license: mit
base_model:
- Qwen/Qwen2.5-3B-Instruct
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# Model Card for EQuIP-Queries/EQuIP_3B

An AI model that understands natural language and translates it into accurate Elasticsearch queries. The model is based on the Qwen2.5 3B architecture, a compact yet powerful language model known for its efficiency, and was fine-tuned on 10,000 Elasticsearch query data points to specialize it in generating accurate and relevant queries.

## Model Details

### Model Description

**Our Solution: An AI-Powered Query Generator**

Our team has developed an AI model that understands natural language and translates it into accurate Elasticsearch queries. The model is based on the Qwen2.5 3B architecture, a compact yet powerful language model known for its efficiency, and was fine-tuned on 10,000 Elasticsearch query data points to specialize it in generating accurate and relevant queries.

We employed LoRA (Low-Rank Adaptation) to optimize the model for performance and efficiency: LoRA reduces the number of trainable parameters by introducing low-rank matrices into the Transformer layers. This combination allows us to achieve high accuracy while minimizing computational resource requirements.


**Key Features and Benefits**

- **Natural Language Interface:** Users can simply describe the data they're looking for in plain English, and the model will generate the corresponding Elasticsearch query.
- **Increased Efficiency:** Reduces the time and effort required to write complex queries, allowing users to focus on analyzing their data.
- **Improved Accessibility:** Makes Elasticsearch more accessible to a wider audience, including those who are not experts in its query language.
- **Open Source:** We are committed to open source and believe in the power of community-driven innovation. By making our model open source, we aim to contribute to the advancement of AI and empower others to build upon our work. We recognize the lack of readily available solutions in this specific area, and we're excited to fill that gap.
- **Future Developments:** This is just the beginning. Our team plans to release further updates and enhancements to this model, and we are committed to continuous improvement and innovation in AI-powered search.

- **Developed by:** EQuIP
- **Funded by:** EQuIP
- **Model type:** Causal language model
- **Language(s) (NLP):** English (en)
- **License:** MIT
- **Finetuned from model:** Qwen/Qwen2.5-3B-Instruct

### Model Sources

- **Repository:** https://huggingface.co/EQuIP-Queries/EQuIP_3B

## Uses

### Direct Use

This model is intended to be used directly to translate natural language prompts into Elasticsearch queries, without additional fine-tuning.

Example use cases include:

- Generating Elasticsearch queries from plain-English prompts.
- Simplifying query generation for analysts, developers, or data scientists unfamiliar with Elasticsearch syntax.
- Automating query creation as part of search, analytics, or data exploration tools.

Intended users:

- Developers integrating natural language querying capabilities into Elasticsearch-based applications.
- Analysts and data scientists who frequently interact with Elasticsearch data.

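As an illustration of the intended input/output relationship, a prompt such as "Find me products which are less than $50" should map to a standard Elasticsearch `range` query. The snippet below is a hypothetical example (the field name `price` is an assumption, not taken from the model's training data):

```python
import json

# Hypothetical illustration: the kind of Elasticsearch query the model is
# expected to produce for "Find me products which are less than $50",
# assuming an index with a numeric "price" field.
expected_query = {
    "query": {
        "range": {
            "price": {"lt": 50}
        }
    }
}

print(json.dumps(expected_query, indent=2))
```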
### Out-of-Scope Use

The model is not intended for use cases such as:

- Generating queries for databases or search engines other than Elasticsearch.
- Handling languages other than English.
- Providing factual answers or general conversational interactions.
- Tasks involving sensitive decision-making, such as medical, legal, or financial advice, where inaccurate queries may lead to significant consequences.

## Bias, Risks, and Limitations

**Bias Awareness:**
- The model may inherit biases present in the training data. Users should assess generated outputs for unintended biases or patterns, particularly in sensitive contexts.

**Misuse and Malicious Use:**
- Users must not use the model to intentionally produce harmful or misleading search queries or to manipulate search results.

**Limitations:**
- Performance may degrade significantly if input prompts differ substantially from the fine-tuning data domain.
- The model does not validate query accuracy or safety; generated queries should be reviewed before execution, especially in production environments.

### Recommendations

**Query Validation:**
- Always validate and test generated Elasticsearch queries before deploying them in production or running them on sensitive data. Automatic generation may occasionally produce syntactic or semantic inaccuracies.

**Bias Awareness:**
- The model may inherit biases present in the training data. Users should assess generated outputs for unintended biases or patterns, particularly in sensitive contexts.

**Use in Sensitive Contexts:**
- Avoid using this model for critical or high-stakes decision-making tasks without additional human oversight and validation.

**Continuous Monitoring:**
- Monitor outputs regularly to identify and correct issues promptly, ensuring long-term reliability.

**Transparency:**
- Clearly communicate the AI-driven nature of generated Elasticsearch queries to end users to manage expectations and encourage verification.

## How to Get Started with the Model

Install the required dependencies:

```bash
pip install transformers torch
```

Here's how you can quickly start generating Elasticsearch queries from natural language prompts using this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EQuIP-Queries/EQuIP_3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

mapping = "[Your Elasticsearch mappings]"
user_request = "Find me products which are less than $50"

prompt = f"Given this mapping: {mapping}\nGenerate an Elasticsearch query for: {user_request}"

# Tokenize the prompt; passing the full encoding forwards the attention mask too.
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,      # budget for the generated query, independent of prompt length
    do_sample=True,
    temperature=0.2,         # low temperature keeps generation close to deterministic
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
generated_query = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print("Generated Elasticsearch query:")
print(generated_query)
```
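Since the model does not validate its own output, a minimal sanity check before sending a generated query to Elasticsearch is to confirm that it parses as JSON. This is a sketch of such a check (the helper name `try_parse_query` is ours, not part of any library):

```python
import json

def try_parse_query(generated_query: str):
    """Return the parsed query dict, or None if the text is not valid JSON."""
    try:
        return json.loads(generated_query)
    except json.JSONDecodeError:
        return None

# A well-formed generated query parses...
assert try_parse_query('{"query": {"match_all": {}}}') is not None
# ...while truncated or malformed output is caught before reaching Elasticsearch.
assert try_parse_query('{"query": {"match_all":') is None
```

Parsing success does not guarantee the query matches the user's intent, so human review is still advised for sensitive data.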

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of 10,000 pairs of natural language prompts and corresponding Elasticsearch queries. Each prompt describes the desired query in plain English and is paired with a manually crafted, accurate Elasticsearch query.

The dataset covers various query types and common Elasticsearch query patterns, including filters, range queries, aggregations, boolean conditions, and text search scenarios.

Currently, the dataset is not publicly available. If it is released in the future, a Dataset Card link will be provided here.

Preprocessing:
- Prompts and queries were cleaned to ensure consistent formatting.
- Special tokens and unnecessary whitespace were removed to ensure high-quality training data.

### Training Procedure

The model was fine-tuned using Low-Rank Adaptation (LoRA) on top of the pre-trained Qwen2.5-3B-Instruct model. LoRA significantly reduces computational requirements by training only low-rank matrices within the Transformer layers.

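To illustrate why LoRA is cheap: for a d×d weight matrix W, LoRA learns an update ΔW = B·A with B of shape d×r and A of shape r×d, so only 2·d·r parameters are trained instead of d². The dimensions below are assumptions for the sake of the example, not the actual LoRA configuration used for this model:

```python
# Illustrative parameter count for a single d x d projection matrix.
# d and r here are assumed values, not this model's actual configuration.
d = 2048   # hidden size of the projection
r = 16     # LoRA rank

full_params = d * d          # training the full matrix
lora_params = 2 * d * r      # training only A (r x d) and B (d x r)

print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.2%}")
# prints: full: 4,194,304  lora: 65,536  ratio: 1.56%
```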

#### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision

## Evaluation

The model was evaluated on a held-out test set of 1,000 prompt-query pairs not included in the training data. The primary goal of the evaluation was to measure the accuracy and relevance of the generated Elasticsearch queries.

### Testing Data, Factors & Metrics

#### Testing Data

- Size: 1,000 prompt-query pairs (held out from training).
- Composition: Representative of diverse Elasticsearch query types, including boolean conditions, aggregations, text search, and date-based queries.

#### Factors

The evaluation considered:
- Complexity of the Elasticsearch query.
- Accuracy in interpreting the intent of the natural language prompt.
- Syntactic correctness and relevance of the generated query.

#### Metrics

- **Exact Match:** Percentage of generated queries that match the ground-truth query exactly.
- **Semantic Similarity:** Embedding-based similarity between generated and ground-truth queries (e.g., cosine similarity).
- **Token-level F1:** Precision and recall at the token level, measuring partial correctness of generated queries.

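A token-level F1 of the kind described can be computed by treating both queries as bags of tokens. This is a minimal sketch; the actual tokenization and matching rules used in the evaluation are not specified in this card:

```python
from collections import Counter

def token_f1(predicted: str, reference: str) -> float:
    """Token-level F1 between two whitespace-tokenized strings."""
    pred_tokens = Counter(predicted.split())
    ref_tokens = Counter(reference.split())
    # Tokens shared between prediction and reference, counted with multiplicity.
    overlap = sum((pred_tokens & ref_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)

print(token_f1('{ "query": { "match_all": {} } }',
               '{ "query": { "match_all": {} } }'))  # identical queries -> 1.0
```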
### Results

| Model | Parameters | Generation Time (sec) | Token Precision | Token Recall | Token F1 | Validity Rate | Field Similarity |
|----------------------|------------|-----------------------|-----------------|--------------|----------|---------------|------------------|
| **EQuIP** | 3B | 0.7969 | 0.8738 | 0.9737 | 0.9808 | 0.97 | 0.9916 |
| **LLaMA 3.1** | 8B | 13.4822 | 0.3979 | 0.6 | 0.5693 | 0.5723 | 0.4622 |
| **Qwen 2.5** | 7B | 1.4233 | 0.6667 | 0.7 | 0.7743 | 0.82 | 0.6479 |
| **DeepSeek Distill** | 8B | 9.2516 | 0.5846 | 0.65 | 0.6979 | 0.7496 | 0.8908 |
| **Gemma 2** | 9B | 3.0801 | 0.6786 | 0.82 | 0.7309 | 0.8 | 0.8151 |
| **Mistral** | 7B | 2.1068 | 0.6786 | 0.79 | 0.7551 | 0.8 | 0.7437 |

#### Summary

The evaluation shows that the model performs strongly at translating natural language prompts into valid Elasticsearch queries, with particularly high token precision, recall, and semantic similarity. Compared to several larger, widely used models, it offers an excellent balance of accuracy, speed, and computational efficiency, making it well suited for production use in Elasticsearch query generation. Users should nonetheless verify query outputs, especially in critical or sensitive applications.

## Environmental Impact

Carbon emissions for the training and fine-tuning of this model can be estimated using the Machine Learning Impact calculator introduced by Lacoste et al. (2019).

- **Hardware Type:** NVIDIA A100 GPU
- **Hours used:** 11
- **Cloud Provider:** Vast.ai


### Model Architecture and Objective

This model is based on the Qwen2.5-3B-Instruct architecture, a decoder-only, transformer-based causal language model with approximately 3 billion parameters, designed for efficient, high-quality natural language understanding and generation.

The primary objective of this fine-tuned model is to convert natural language prompts into syntactically correct and semantically relevant Elasticsearch queries. To achieve this, the model was fine-tuned on domain-specific data using Low-Rank Adaptation (LoRA) for performance and resource efficiency.

## Model Card Contact

Contact: EQuIP
Email: info@equipqueries.com