# 🚀 DeepWiki API

This is the backend API for DeepWiki, providing smart code analysis and AI-powered documentation generation.

## ✨ Features

- **Streaming AI Responses**: Real-time responses using Google's Generative AI (Gemini)
- **Smart Code Analysis**: Automatically analyzes GitHub repositories
- **RAG Implementation**: Retrieval Augmented Generation for context-aware responses
- **Local Storage**: All data stored locally - no cloud dependencies
- **Conversation History**: Maintains context across multiple questions

## 🔧 Quick Setup

### Step 1: Install Dependencies

```bash
# From the project root
pip install -r api/requirements.txt
```

### Step 2: Set Up Environment Variables

Create a `.env` file in the project root:
```
# Required API Keys
GOOGLE_API_KEY=your_google_api_key          # Required for Google Gemini models
OPENAI_API_KEY=your_openai_api_key          # Required for embeddings and OpenAI models

# Optional API Keys
OPENROUTER_API_KEY=your_openrouter_api_key  # Required only if using OpenRouter models

# AWS Bedrock Configuration
AWS_ACCESS_KEY_ID=your_aws_access_key_id    # Required for AWS Bedrock models
AWS_SECRET_ACCESS_KEY=your_aws_secret_key   # Required for AWS Bedrock models
AWS_REGION=us-east-1                        # Optional, defaults to us-east-1
AWS_ROLE_ARN=your_aws_role_arn              # Optional, for role-based authentication

# OpenAI API Configuration
OPENAI_BASE_URL=https://custom-api-endpoint.com/v1  # Optional, for custom OpenAI API endpoints

# Ollama host
OLLAMA_HOST=https://your_ollama_host        # Optional if Ollama is not local; defaults to http://localhost:11434

# Server Configuration
PORT=8001                                   # Optional, defaults to 8001
```
If you're not using Ollama mode, you need to configure an OpenAI API key for embeddings. Other API keys are only required when configuring and using models from the corresponding providers.

> 💡 **Where to get these keys:**
> - Get a Google API key from [Google AI Studio](https://makersuite.google.com/app/apikey)
> - Get an OpenAI API key from [OpenAI Platform](https://platform.openai.com/api-keys)
> - Get an OpenRouter API key from [OpenRouter](https://openrouter.ai/keys)
> - Get AWS credentials from [AWS IAM Console](https://console.aws.amazon.com/iam/)

#### Advanced Environment Configuration

##### Provider-Based Model Selection

DeepWiki supports multiple LLM providers. The environment variables above are required depending on which providers you want to use:

- **Google Gemini**: Requires `GOOGLE_API_KEY`
- **OpenAI**: Requires `OPENAI_API_KEY`
- **OpenRouter**: Requires `OPENROUTER_API_KEY`
- **AWS Bedrock**: Requires `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
- **Ollama**: No API key required (runs locally)
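For example, if you only plan to use Google Gemini for generation, a minimal `.env` needs just two keys (the OpenAI key is still required for embeddings, as noted above, unless you run in Ollama mode):

```
GOOGLE_API_KEY=your_google_api_key
OPENAI_API_KEY=your_openai_api_key
```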
##### Custom OpenAI API Endpoints

The `OPENAI_BASE_URL` variable allows you to specify a custom endpoint for the OpenAI API. This is useful for:

- Enterprise users with private API channels
- Organizations using self-hosted or custom-deployed LLM services
- Integration with third-party OpenAI API-compatible services
**Example:** you can use any endpoint that supports the OpenAI protocol, regardless of which organization provides it:

```
OPENAI_BASE_URL=https://custom-openai-endpoint.com/v1
```
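If you want to sanity-check a custom endpoint before pointing DeepWiki at it, a quick probe with the official `openai` Python client (reading the same variables from your environment) might look like the sketch below; the model name is a placeholder for whatever your endpoint actually serves:

```python
import os

from openai import OpenAI  # pip install openai

# Reuse the same variables DeepWiki reads from .env
client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],  # e.g. https://custom-openai-endpoint.com/v1
    api_key=os.environ["OPENAI_API_KEY"],
)

# Any OpenAI-protocol endpoint should answer a basic chat completion.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: substitute a model your endpoint provides
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```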
##### Configuration Files

DeepWiki now uses JSON configuration files to manage various system components instead of hardcoded values:

1. **`generator.json`**: Configuration for text generation models (see the illustrative sketch after this list)
   - Located in `api/config/` by default
   - Defines available model providers (Google, OpenAI, OpenRouter, AWS Bedrock, Ollama)
   - Specifies default and available models for each provider
   - Contains model-specific parameters like temperature and top_p

2. **`embedder.json`**: Configuration for embedding models and text processing
   - Located in `api/config/` by default
   - Defines embedding models for vector storage
   - Contains retriever configuration for RAG
   - Specifies text splitter settings for document chunking

3. **`repo.json`**: Configuration for repository handling
   - Located in `api/config/` by default
   - Contains file filters to exclude certain files and directories
   - Defines repository size limits and processing rules
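As a purely hypothetical illustration of the kind of structure described above (the real schema ships with DeepWiki in `api/config/generator.json`; the field and model names here are invented, not the actual format):

```json
{
  "providers": {
    "google": {
      "default_model": "gemini-2.0-flash",
      "available_models": ["gemini-2.0-flash", "gemini-1.5-pro"],
      "parameters": { "temperature": 0.7, "top_p": 0.8 }
    },
    "ollama": {
      "default_model": "llama3",
      "available_models": ["llama3"],
      "parameters": { "temperature": 0.7, "top_p": 0.8 }
    }
  }
}
```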
You can customize the configuration directory location using the environment variable:

```
DEEPWIKI_CONFIG_DIR=/path/to/custom/config/dir  # Optional, for custom config file location
```

This allows you to maintain different configurations for various environments or deployment scenarios without modifying the code.
### Step 3: Start the API Server

```bash
# From the project root
python -m api.main
```

The API will be available at `http://localhost:8001`.
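Once the server is running, you can confirm it responds by querying the root endpoint, which (as described under API Endpoints below) returns basic API information:

```bash
curl http://localhost:8001/
```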
## 🧠 How It Works

### 1. Repository Indexing

When you provide a GitHub repository URL, the API:

- Clones the repository locally (if not already cloned)
- Reads all files in the repository
- Creates embeddings for the files using OpenAI
- Stores the embeddings in a local database
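In pseudocode terms, the indexing pass boils down to something like the following (a conceptual sketch, not DeepWiki's actual code; `embed` and `db` are hypothetical stand-ins for the embedding call and the local vector store):

```python
from pathlib import Path

def index_repository(repo_dir: str, embed, db) -> None:
    """Embed every file in the cloned repo and persist the vectors locally."""
    for path in Path(repo_dir).rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        vector = embed(text)  # e.g. an OpenAI embedding call
        db.store(key=str(path), vector=vector, text=text)  # local database
```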
### 2. Smart Retrieval (RAG)

When you ask a question:

- The API finds the most relevant code snippets
- These snippets are used as context for the AI
- The AI generates a response based on this context
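The retrieval step, again as a rough sketch with hypothetical helpers rather than DeepWiki's internals, embeds the question, pulls the nearest stored snippets, and feeds them to the model as context:

```python
def answer_question(question: str, embed, db, llm) -> str:
    """Retrieve the most similar snippets, then generate an answer from them."""
    query_vector = embed(question)
    snippets = db.top_k(query_vector, k=5)  # nearest-neighbor search over stored vectors
    context = "\n\n".join(s.text for s in snippets)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)  # the response is streamed back to the client
```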
### 3. Real-Time Streaming

- Responses are streamed in real-time
- You see the answer as it's being generated
- This creates a more interactive experience

## 📡 API Endpoints

### GET /

Returns basic API information and available endpoints.

### POST /chat/completions/stream

Streams an AI-generated response about a GitHub repository.

**Request Body:**

```json
{
  "repo_url": "https://github.com/username/repo",
  "messages": [
    {
      "role": "user",
      "content": "What does this repository do?"
    }
  ],
  "filePath": "optional/path/to/file.py" // Optional
}
```

**Response:**

A streaming response with the generated text.
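For a quick smoke test from the shell, the same endpoint can be exercised with `curl`; the `--no-buffer` flag prints the streamed chunks as they arrive:

```bash
curl -X POST http://localhost:8001/chat/completions/stream \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{
    "repo_url": "https://github.com/username/repo",
    "messages": [{"role": "user", "content": "What does this repository do?"}]
  }'
```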
## 📝 Example Code

```python
import requests

# API endpoint
url = "http://localhost:8001/chat/completions/stream"

# Request data
payload = {
    "repo_url": "https://github.com/AsyncFuncAI/deepwiki-open",
    "messages": [
        {
            "role": "user",
            "content": "Explain how React components work"
        }
    ]
}

# Make streaming request
response = requests.post(url, json=payload, stream=True)

# Process the streaming response
for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode('utf-8'), end='', flush=True)
```
## 💾 Storage

All data is stored locally on your machine:

- Cloned repositories: `~/.adalflow/repos/`
- Embeddings and indexes: `~/.adalflow/databases/`
- Generated wiki cache: `~/.adalflow/wikicache/`

No cloud storage is used - everything runs on your computer!
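Since everything lives under `~/.adalflow/`, ordinary shell tools are enough to inspect or reclaim the space; the assumption here is that deleted caches are simply rebuilt the next time a repository is processed:

```bash
# See how much space each local store is using
du -sh ~/.adalflow/repos ~/.adalflow/databases ~/.adalflow/wikicache

# Remove cached data; repositories should be re-cloned and re-indexed on next use
rm -rf ~/.adalflow/repos ~/.adalflow/databases ~/.adalflow/wikicache
```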