---
title: Question Generation AI
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---
# Question Generation AI
This Hugging Face Space provides a ChatGPT-style interface for generating thoughtful questions from input statements using the **Meta Llama-3.1-8B-Instruct** model.
## Features
- 🤖 **ChatGPT-Style Interface**: Intuitive chat interface for generating questions
- 🎯 **Customizable**: Adjust number of questions, difficulty level, and creativity
- 📚 **Llama Powered**: Uses Meta's instruction-tuned Llama 3.1 model for high-quality questions
- 🚀 **Fast & Reliable**: Optimized for quick response times
- 🔧 **GPU Optimized**: Runs efficiently on NVIDIA A10G hardware
- 💡 **Educational Focus**: Perfect for creating study materials and assessments
## How to Use
### Chat Interface
Simply enter any statement or topic in the chat box, and the AI will generate thoughtful questions about it. You can:
- **Adjust Settings**: Control the number of questions (1-10), difficulty level, and creativity
- **Try Different Topics**: Works great with educational content, research topics, or any text
- **Interactive Experience**: Chat-like interface similar to ChatGPT
### API Access (Still Available)
The original API endpoints are still accessible at `/generate-questions` for programmatic access.
**Request Body:**
```json
{
  "statement": "Your input statement here",
  "num_questions": 5,
  "temperature": 0.8,
  "max_length": 2048,
  "difficulty_level": "mixed"
}
```
**Parameters:**
- `statement` (required): The input text to generate questions from
- `num_questions` (1-10): Number of questions to generate (default: 5)
- `temperature` (0.1-2.0): Generation creativity (default: 0.8)
- `max_length` (100-4096): Maximum response length (default: 2048)
- `difficulty_level`: "easy", "medium", "hard", or "mixed" (default: "mixed")
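Because the API rejects out-of-range values with a `400`, it can help to mirror the documented parameter ranges on the client before sending a request. The following is an illustrative sketch; `validate_request` is not part of the API, just a local helper:

```python
def validate_request(payload: dict) -> list[str]:
    """Client-side checks mirroring the documented parameter ranges.

    Returns a list of error messages; an empty list means the payload
    should pass the API's own validation.
    """
    errors = []
    if not payload.get("statement"):
        errors.append("statement is required")
    n = payload.get("num_questions", 5)
    if not 1 <= n <= 10:
        errors.append("num_questions must be between 1 and 10")
    t = payload.get("temperature", 0.8)
    if not 0.1 <= t <= 2.0:
        errors.append("temperature must be between 0.1 and 2.0")
    m = payload.get("max_length", 2048)
    if not 100 <= m <= 4096:
        errors.append("max_length must be between 100 and 4096")
    if payload.get("difficulty_level", "mixed") not in {"easy", "medium", "hard", "mixed"}:
        errors.append("difficulty_level must be easy, medium, hard, or mixed")
    return errors

print(validate_request({"statement": "AI in healthcare", "num_questions": 3}))  # []
```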
**Response:**
```json
{
  "questions": [
    "What is the main concept discussed?",
    "How does this relate to...?",
    "Why is this important?"
  ],
  "statement": "Your original statement",
  "metadata": {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "temperature": 0.8,
    "difficulty_level": "mixed"
  }
}
```
### Health Check
**GET** `/health`
Check the API and model status.
**Response:**
```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "memory_usage": {
    "allocated_gb": 12.5,
    "reserved_gb": 14.2,
    "total_gb": 24.0
  }
}
```
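A client can use the `memory_usage` figures to gauge how much VRAM headroom the Space has left before raising `max_length`. Here is a small sketch that interprets a parsed `/health` payload; `vram_headroom_gb` is an illustrative helper, not part of the API:

```python
def vram_headroom_gb(health: dict) -> float:
    """Free VRAM implied by a /health response, in GB (total minus reserved)."""
    mem = health["memory_usage"]
    return round(mem["total_gb"] - mem["reserved_gb"], 2)

# The sample response from above, already parsed from JSON
sample = {
    "status": "healthy",
    "model_loaded": True,
    "device": "cuda",
    "memory_usage": {"allocated_gb": 12.5, "reserved_gb": 14.2, "total_gb": 24.0},
}

if sample["status"] == "healthy" and sample["model_loaded"]:
    print(f"Headroom: {vram_headroom_gb(sample)} GB")  # Headroom: 9.8 GB
```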
## Usage Examples
### Python
```python
import requests

# API endpoint
url = "https://your-space-name.hf.space/generate-questions"

# Request payload
data = {
    "statement": "Artificial intelligence is transforming healthcare by enabling more accurate diagnoses, personalized treatments, and efficient drug discovery processes.",
    "num_questions": 3,
    "difficulty_level": "medium"
}

# Make request and print the numbered questions
response = requests.post(url, json=data)
response.raise_for_status()
questions = response.json()["questions"]

for i, question in enumerate(questions, 1):
    print(f"{i}. {question}")
```
### JavaScript
```javascript
const generateQuestions = async (statement) => {
  const response = await fetch('https://your-space-name.hf.space/generate-questions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      statement: statement,
      num_questions: 5,
      difficulty_level: 'mixed'
    })
  });
  const data = await response.json();
  return data.questions;
};
```
### cURL
```bash
curl -X POST "https://your-space-name.hf.space/generate-questions" \
  -H "Content-Type: application/json" \
  -d '{
    "statement": "Climate change is one of the most pressing challenges of our time.",
    "num_questions": 4,
    "difficulty_level": "hard"
  }'
```
## Model Information
This Space uses the **meta-llama/Llama-3.1-8B-Instruct** model, which features:
- **Instruction-Tuned**: Aligned by Meta to follow prompts and produce well-structured output
- **Large Context**: Supports up to 128K tokens of context
- **Strong Quality**: Generates coherent, relevant questions across a wide range of topics
- **Efficient Size**: 8B parameters fit comfortably on a single A10G GPU
## Hardware Requirements
- **GPU**: NVIDIA A10G (24GB VRAM)
- **Memory**: ~14-16GB VRAM usage
- **Context**: Up to 32K tokens (adjustable based on available memory)
## API Documentation
Visit `/docs` for interactive API documentation with Swagger UI.
## Error Handling
The API returns appropriate HTTP status codes:
- `200`: Success
- `400`: Bad Request (invalid parameters)
- `503`: Service Unavailable (model not loaded)
- `500`: Internal Server Error
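In practice, `503` is worth treating differently from the other errors, since it usually just means the model is still loading after a Space restart. A hedged sketch of client-side handling follows; `describe_status` and `RETRYABLE` are illustrative helpers, not part of the API:

```python
# Status codes that are worth retrying after a short delay
RETRYABLE = {503}  # model not loaded yet (e.g. Space is still starting)

def describe_status(code: int) -> str:
    """Human-readable summary of a /generate-questions response code."""
    messages = {
        200: "Success",
        400: "Bad Request: check parameter names and ranges",
        503: "Service Unavailable: model not loaded yet, retry shortly",
        500: "Internal Server Error: inspect the Space logs",
    }
    return messages.get(code, f"Unexpected status {code}")

print(describe_status(503))
```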
## Rate Limits
This is a demo space. For production use, consider:
- Implementing rate limiting
- Adding authentication
- Scaling to multiple instances
- Using dedicated inference endpoints
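One simple way to add the rate limiting mentioned above, either in the client or in a proxy in front of the API, is a token bucket. This is an illustrative sketch, not something the Space ships with:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative only)."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Allow a burst of 2 requests, then roughly 1 request every 2 seconds
bucket = TokenBucket(capacity=2, refill_per_sec=0.5)
print([bucket.allow() for _ in range(3)])  # burst of 2 passes, third is throttled
```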
## Support
For issues or questions:
1. Check the `/health` endpoint
2. Review the error messages
3. Ensure your requests match the API schema
4. Consider adjusting parameters for your hardware
---
**Note**: This Space requires a GPU runtime to function properly. Make sure your Space is configured with GPU support.