---
title: Question Generation AI
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
app_port: 7860
---

Question Generation AI

This Hugging Face Space provides a ChatGPT-style interface for generating thoughtful questions from input statements using the Meta Llama-3.1-8B-Instruct model.

Features

  • 🤖 ChatGPT-Style Interface: Intuitive chat interface for generating questions
  • 🎯 Customizable: Adjust number of questions, difficulty level, and creativity
  • 📚 Llama Powered: Uses Meta's instruction-tuned Llama 3.1 model for high-quality questions
  • 🚀 Fast & Reliable: Optimized for quick response times
  • 🔧 GPU Optimized: Runs efficiently on NVIDIA A10G hardware
  • 💡 Educational Focus: Perfect for creating study materials and assessments

How to Use

Chat Interface

Simply enter any statement or topic in the chat box, and the AI will generate thoughtful questions about it. You can:

  • Adjust Settings: Control the number of questions (1-10), difficulty level, and creativity
  • Try Different Topics: Works great with educational content, research topics, or any text
  • Interactive Experience: Chat-like interface similar to ChatGPT

API Access (Still Available)

The original API endpoints are still accessible at /generate-questions for programmatic access.

Request Body:

{
  "statement": "Your input statement here",
  "num_questions": 5,
  "temperature": 0.8,
  "max_length": 2048,
  "difficulty_level": "mixed"
}

Parameters:

  • statement (required): The input text to generate questions from
  • num_questions (1-10): Number of questions to generate (default: 5)
  • temperature (0.1-2.0): Generation creativity (default: 0.8)
  • max_length (100-4096): Maximum response length (default: 2048)
  • difficulty_level: "easy", "medium", "hard", or "mixed" (default: "mixed")
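The documented ranges can be checked client-side before a request goes out, so invalid parameters fail fast instead of round-tripping to the server. A minimal sketch in Python (the build_payload helper is hypothetical, not part of the API):

```python
DIFFICULTY_LEVELS = {"easy", "medium", "hard", "mixed"}

def build_payload(statement, num_questions=5, temperature=0.8,
                  max_length=2048, difficulty_level="mixed"):
    """Validate parameters against the documented ranges and return
    a dict ready to send as the JSON request body."""
    if not statement:
        raise ValueError("statement is required")
    if not 1 <= num_questions <= 10:
        raise ValueError("num_questions must be 1-10")
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be 0.1-2.0")
    if not 100 <= max_length <= 4096:
        raise ValueError("max_length must be 100-4096")
    if difficulty_level not in DIFFICULTY_LEVELS:
        raise ValueError(f"difficulty_level must be one of {sorted(DIFFICULTY_LEVELS)}")
    return {
        "statement": statement,
        "num_questions": num_questions,
        "temperature": temperature,
        "max_length": max_length,
        "difficulty_level": difficulty_level,
    }
```

Omitted arguments fall back to the documented defaults, so `build_payload("some topic")` produces the same body as the example above with 5 questions at temperature 0.8.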

Response:

{
  "questions": [
    "What is the main concept discussed?",
    "How does this relate to...?",
    "Why is this important?"
  ],
  "statement": "Your original statement",
  "metadata": {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "temperature": 0.8,
    "difficulty_level": "mixed"
  }
}

Health Check

GET /health

Check the API and model status.

Response:

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "memory_usage": {
    "allocated_gb": 12.5,
    "reserved_gb": 14.2,
    "total_gb": 24.0
  }
}
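Before sending a batch of requests, the /health response can be interpreted programmatically. A small sketch, assuming the field names in the sample response above (is_ready and gpu_headroom_gb are hypothetical helpers):

```python
def is_ready(health):
    """True when the service reports healthy and the model is loaded."""
    return health.get("status") == "healthy" and health.get("model_loaded", False)

def gpu_headroom_gb(health):
    """Free VRAM in GB (total minus reserved), or None if no GPU info
    is present in the response."""
    mem = health.get("memory_usage")
    if not mem:
        return None
    return round(mem["total_gb"] - mem["reserved_gb"], 2)
```

With the sample response above, `is_ready` returns True and `gpu_headroom_gb` reports 9.8 GB free.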

Usage Examples

Python

import requests

# API endpoint
url = "https://your-space-name.hf.space/generate-questions"

# Request payload
data = {
    "statement": "Artificial intelligence is transforming healthcare by enabling more accurate diagnoses, personalized treatments, and efficient drug discovery processes.",
    "num_questions": 3,
    "difficulty_level": "medium"
}

# Make request
response = requests.post(url, json=data)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
questions = response.json()["questions"]

for i, question in enumerate(questions, 1):
    print(f"{i}. {question}")

JavaScript

const generateQuestions = async (statement) => {
  const response = await fetch('https://your-space-name.hf.space/generate-questions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      statement: statement,
      num_questions: 5,
      difficulty_level: 'mixed'
    })
  });

  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const data = await response.json();
  return data.questions;
};

cURL

curl -X POST "https://your-space-name.hf.space/generate-questions" \
     -H "Content-Type: application/json" \
     -d '{
       "statement": "Climate change is one of the most pressing challenges of our time.",
       "num_questions": 4,
       "difficulty_level": "hard"
     }'

Model Information

This API uses the meta-llama/Llama-3.1-8B-Instruct model, which features:

  • Instruction Tuned: Fine-tuned by Meta for dialogue and instruction following
  • 8B Parameters: Strong output quality at a size that fits on a single GPU
  • Large Context: Supports up to 128K tokens of context
  • Multilingual: Trained on multiple languages beyond English

Hardware Requirements

  • GPU: NVIDIA A10G (24GB VRAM)
  • Memory: ~14-16GB VRAM usage
  • Context: Up to 32K tokens (adjustable based on available memory)

API Documentation

Visit /docs for interactive API documentation with Swagger UI.

Error Handling

The API returns appropriate HTTP status codes:

  • 200: Success
  • 400: Bad Request (invalid parameters)
  • 503: Service Unavailable (model not loaded)
  • 500: Internal Server Error
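These status codes suggest a simple client-side retry policy: 500 and 503 are transient (the model may still be loading), while 400 means the request itself must change. A sketch of one such policy (should_retry and backoff_seconds are hypothetical helpers, not part of the API):

```python
RETRYABLE = {500, 503}  # transient server-side errors

def should_retry(status_code, attempt, max_attempts=3):
    """Retry transient errors while attempts remain; never retry
    client errors such as 400."""
    return status_code in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt, base=1.0):
    """Exponential backoff between attempts: 1s, 2s, 4s, ...
    (a common choice, not mandated by the API)."""
    return base * (2 ** attempt)
```

A caller would loop over attempts, sleeping for `backoff_seconds(attempt)` whenever `should_retry` returns True, and raising otherwise.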

Rate Limits

This is a demo space. For production use, consider:

  • Implementing rate limiting
  • Adding authentication
  • Scaling to multiple instances
  • Using dedicated inference endpoints
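For the first of those points, a client-side token bucket is one common way to smooth out request bursts. A minimal sketch (the TokenBucket class is illustrative, not part of this Space):

```python
import time

class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to
    `capacity`. The injectable clock makes the limiter testable."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Consume one token if available; refill based on elapsed time."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A client would call `bucket.allow()` before each request and sleep (or drop the request) when it returns False.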

Support

For issues or questions:

  1. Check the /health endpoint
  2. Review the error messages
  3. Ensure your requests match the API schema
  4. Consider adjusting parameters for your hardware

Note: This Space requires a GPU runtime to function properly. Make sure your Space is configured with GPU support.