| title: Beta | |
| emoji: 🐢 | |
| colorFrom: blue | |
| colorTo: yellow | |
| sdk: static | |
| pinned: true | |
| license: mpl-2.0 | |
| short_description: চা খাবা? | |
| <div align="center"> | |
| <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67497128927b345d1345e9de/69fZeWPoXB20L7do9nZDY.png" width="300" alt="Shôbdhonic Logo"> | |
| # শব্দনিক | Shôbdhonic | |
| ### **বাংলা NLP-এর নতুন যুগ** | |
| *"ভাষাকে জানো, AI-কে চেনো!"* | |
| *(Unlock Bangla's Future with AI)* | |
| [Website](https://shobdhonic.com) | |
| [Discord](https://discord.gg/shobdhonic) | |
| [Twitter](https://twitter.com/Shobdhonic) | |
| [Telegram](https://t.me/Shobdhonic) | |
| [GitHub](https://github.com/Shobdhonic) | |
| [Hugging Face](https://huggingface.co/Shobdhonic) | |
| </div> | |
| --- | |
| ## 🚀 **Why Shôbdhonic?** | |
| A **next-gen Bangla NLP platform** built for: | |
| - 🔥 **Gen-Z Creators**: Meme generators, slang translators, TikTok/Reels integrations | |
| - 🏢 **Enterprises**: Sentiment analysis, fraud detection, document processing | |
| - 🇧🇩 **Cultural Preservation**: Digitize literature, dialects, and oral histories | |
| - 🧠 **Research**: Advanced Bangla language models, transformer architectures, and fine-tuning pipelines | |
| - 🌐 **Web3**: Blockchain integration for digital Bangla content authentication | |
| --- | |
| ## ✨ **Key Features** | |
| | **Category** | **Tools** | | |
| |-----------------------|------------------------------------------------------------------------------------| | |
| | **Gen-Z Playground** | `MemeGPT` • `Slang Translator` • `AI Rap Generator` • `Voice Filters` • `TikTok Content API` | | |
| | **Enterprise NLP** | `Legal Doc Analyzer` • `News Sentiment API` • `Plagiarism Checker` • `Customer Service Bot` • `Bangla Data OCR` | | |
| | **Voice Lab** | `Celebrity Voice Cloning` • `Regional Accent TTS` • `Audio Transcription` • `Dialect Analysis` • `Emotion Detection` | | |
| | **Real-Time AI** | `Trend Predictor` • `Social Media Pulse` • `Ittefaq News Scanner` • `Market Sentiment Analysis` • `Election Opinion Tracker` | | |
| | **Academia** | `Literature Analysis` • `Academic Paper Assistant` • `Educational Content Generator` • `Bangla Research Corpus` | | |
| | **Security Suite** | `Bangla Fraud Detection` • `Phishing Text Analysis` • `Disinformation Tracker` • `Financial Alert System` | | |
| --- | |
| ## 🎯 **Core Technologies** | |
| ### **Model Architecture** | |
| - **ShobdhoBERT**: Transformer-based model trained on a 5 TB Bangla text corpus | |
| - **ShobdhoGPT-3.5**: GPT-based generative model fine-tuned on diverse Bangla content | |
| - **DialectDiffusion**: Voice synthesis specialized for regional Bangla dialects | |
| - **BanglaLLM-7B**: Large Language Model optimized for Bangla instruction following | |
| - **Multimodal-Bangla**: Vision-language model for Bangla image-text understanding | |
| ### **Data Processing Pipeline** | |
| - Proprietary text normalization for Bangla script variations | |
| - Context-aware slang detection and interpretation | |
| - Real-time news corpus analysis with automated categorization | |
| - Specialized tokenization for Bangla script with compound word handling | |
| - Advanced sentiment analysis for cultural nuances | |
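The normalization step is described as proprietary, so the following is only a minimal sketch of what handling script variations can look like, using Unicode NFC normalization from Python's standard library. The function name `normalize_bangla` is hypothetical and is not part of the Shôbdhonic package.

```python
import re
import unicodedata


def normalize_bangla(text: str) -> str:
    """Return text in Unicode NFC form with whitespace runs collapsed."""
    # NFC maps canonically equivalent sequences to one form, e.g. the
    # two-part vowel sign U+09C7 U+09BE becomes the single sign U+09CB.
    normalized = unicodedata.normalize("NFC", text)
    # Collapse whitespace runs; ZWJ/ZWNJ are left alone because they can
    # affect how Bangla conjuncts are rendered.
    return re.sub(r"\s+", " ", normalized).strip()


# The same word typed with a composed and a decomposed vowel sign.
composed = "\u0995\u09cb\u09a8"          # ক + ো + ন
decomposed = "\u0995\u09c7\u09be\u09a8"  # ক + ে + া + ন
print(normalize_bangla(composed) == normalize_bangla(decomposed))
```

Running a step like this before tokenization means visually identical words compare equal regardless of how they were typed.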
| --- | |
| ## 🎨 **Brand Identity** | |
| ### **Colors** | |
| | Role | Hex | | |
| |---------------|-----------| | |
| | Primary | `#6A5ACD` | | |
| | Secondary | `#FF69B4` | | |
| | Accent | `#00FFE0` | | |
| | Dark Mode | `#1A1A2E` | | |
| | Light Mode | `#F5F5F7` | | |
| ### **Mascot** | |
| **বর্গী বট (Borgi Bot)** – Our street-smart AI mascot for Gen-Z campaigns. | |
| --- | |
| ## ⚡ **Quick Start** | |
| ### **Prerequisites** | |
| - Python 3.10+ / Node.js 18+ | |
| - Hugging Face API Key (Register [here](https://huggingface.co/Shobdhonic)) | |
| - Docker (optional, for containerized deployment) | |
| - GPU acceleration (recommended for model training/inference) | |
| ### **Installation** | |
| ```bash | |
| # Clone repo | |
| git clone https://github.com/Shobdhonic/core-engine.git | |
| cd core-engine | |
| # Create virtual environment | |
| python -m venv shobdhonic-env | |
| source shobdhonic-env/bin/activate # On Windows: shobdhonic-env\Scripts\activate | |
| # Install dependencies (Python) | |
| pip install -r requirements.txt | |
| # Or for Node.js | |
| npm install | |
| # Set up environment variables | |
| cp .env.example .env | |
| # Edit .env with your API keys | |
| ``` | |
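The Python examples in this README pass the API key as a literal string. A common alternative, sketched here, is to read it from the environment once the values in `.env` have been exported or loaded by your tooling. The variable name `SHOBDHONIC_API_KEY` and the helper `load_api_key` are assumptions; check `.env.example` for the names the project actually uses.

```python
import os


def load_api_key(env=None, var_name="SHOBDHONIC_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    # Accept an explicit mapping so the helper is easy to test.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(load_api_key({"SHOBDHONIC_API_KEY": "demo-key"}))
```

The returned value can then be passed wherever the examples use `api_key="your_api_key_here"`, which keeps secrets out of source control.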
| ### **Docker Setup** | |
| ```bash | |
| # Build the Docker image | |
| docker build -t shobdhonic:latest . | |
| # Run the container | |
| docker run -p 8000:8000 -v "$(pwd)":/app --env-file .env shobdhonic:latest | |
| ``` | |
| ### **Generate Your First Meme** | |
| ```python | |
| from shobdhonic import MemeMaster | |
| # Initialize with your API key | |
| meme_api = MemeMaster(api_key="your_api_key_here") | |
| # Create a meme with custom text and template | |
| meme = meme_api.create( | |
|     text="একটা চা আর হয়না? ☕", | |
|     template="cha_kaku", | |
|     style="viral",  # Options: viral, minimal, dramatic, retro | |
|     font="bangla_classic", | |
|     format="jpg"  # Options: jpg, png, gif, mp4 | |
| ) | |
| # Save the meme | |
| meme.download("output/cha_kaku_meme.jpg") | |
| # Share directly to social media | |
| meme.share(platform="facebook") # Options: facebook, twitter, instagram, whatsapp | |
| ``` | |
| ### **Advanced Voice Cloning** | |
| ```python | |
| from shobdhonic import VoiceForge | |
| import numpy as np | |
| # Initialize voice engine | |
| voice_api = VoiceForge(api_key="your_api_key_here") | |
| # Clone a voice with emotion parameters | |
| voice = voice_api.clone( | |
|     target_voice="bappa_sir",  # Popular Bangla YouTuber | |
|     text="ভাই, লাইক আর সাবস্ক্রাইব মনে হয়না!", | |
|     emotion="excited",  # Options: neutral, sad, excited, angry, persuasive | |
|     dialect="dhaka",  # Options: dhaka, chittagong, sylhet, rajshahi, khulna, barishal | |
|     speed=1.2,  # Playback speed multiplier (0.5 - 2.0) | |
|     pitch_shift=0.3  # Adjust pitch (-1.0 to 1.0) | |
| ) | |
| # Play the generated audio | |
| voice.play() | |
| # Save to file | |
| voice.save("output/bappa_youtube_promo.mp3") | |
| # Get waveform data for further processing | |
| waveform = voice.get_waveform() | |
| # The FFT returns a complex spectrum, not a list of frequencies | |
| spectrum = np.fft.fft(waveform) | |
| ``` | |
| ### **News Sentiment Analysis** | |
| ```python | |
| from shobdhonic import NewsAnalyzer | |
| import pandas as pd | |
| import matplotlib.pyplot as plt | |
| # Initialize news analyzer | |
| news_api = NewsAnalyzer(api_key="your_api_key_here") | |
| # Analyze recent articles | |
| results = news_api.analyze( | |
|     source="prothom_alo",  # Options: prothom_alo, ittefaq, bangla_tribune, bbc_bangla | |
|     category="politics",  # Options: politics, business, sports, entertainment, tech | |
|     date_range="last_7_days",  # Options: today, last_24h, last_7_days, last_30_days, custom | |
|     sample_size=100  # Number of articles to analyze | |
| ) | |
| # Get sentiment breakdown | |
| sentiment_df = pd.DataFrame(results.sentiment_data) | |
| # Plot results | |
| plt.figure(figsize=(10, 6)) | |
| plt.bar(sentiment_df['sentiment'], sentiment_df['percentage']) | |
| plt.title('Political News Sentiment Analysis') | |
| plt.xlabel('Sentiment') | |
| plt.ylabel('Percentage (%)') | |
| plt.savefig('output/sentiment_analysis.png') | |
| ``` | |
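The plotting code assumes `results.sentiment_data` holds rows with `sentiment` and `percentage` fields. How the library computes them is not documented here. One plausible way to derive such rows from per-article labels is sketched below; `sentiment_breakdown` and the label values are hypothetical.

```python
from collections import Counter


def sentiment_breakdown(labels):
    """Turn a list of per-article labels into sentiment/percentage rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return []
    # Most frequent label first, percentages rounded to one decimal place.
    return [
        {"sentiment": label, "percentage": round(100 * n / total, 1)}
        for label, n in counts.most_common()
    ]


labels = ["positive", "negative", "neutral", "negative"]
for row in sentiment_breakdown(labels):
    print(row)
```

A list of dictionaries in this shape can be passed directly to `pd.DataFrame`, which matches how the example above consumes the data.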
| ### **Enterprise Document Processing** | |
| ```python | |
| from shobdhonic import DocumentProcessor | |
| from shobdhonic.security import SensitiveDataDetector | |
| # Initialize document processor | |
| doc_api = DocumentProcessor(api_key="your_api_key_here") | |
| # Process legal document | |
| processed_doc = doc_api.process( | |
|     file_path="contracts/agreement.pdf", | |
|     tasks=[ | |
|         "summarize",  # Create executive summary | |
|         "extract_entities",  # Find people, organizations, dates | |
|         "identify_clauses",  # Detect important legal clauses | |
|         "risk_assessment"  # Flag potentially problematic terms | |
|     ], | |
|     output_format="json" | |
| ) | |
| # Check for sensitive information | |
| sensitive_detector = SensitiveDataDetector() | |
| security_scan = sensitive_detector.scan(processed_doc.raw_text) | |
| if security_scan.has_sensitive_data: | |
|     print(f"WARNING: Found {len(security_scan.findings)} instances of sensitive data") | |
|     for finding in security_scan.findings: | |
|         print(f"- {finding.type}: {finding.severity} risk level") | |
| # Export processed results | |
| processed_doc.export( | |
|     output_path="output/processed_contract.json", | |
|     include_metadata=True, | |
|     redact_sensitive=True | |
| ) | |
| ``` | |
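The internals of `SensitiveDataDetector` are not shown in this README. As a rough illustration of the scan-then-redact idea, the sketch below uses two regular expressions. The patterns, the `Finding` class, and the `scan` and `redact` helpers are all hypothetical, and the phone pattern assumes the common Bangladeshi mobile format (optional `+880` or leading `0`, then `1`, a digit from 3 to 9, and eight more digits).

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; a production scanner would
# need many more categories and locale-specific rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"(?:\+?880|0)1[3-9]\d{8}"),
}


@dataclass
class Finding:
    type: str
    value: str


def scan(text):
    """Return a Finding for every pattern match in the text."""
    return [
        Finding(kind, match.group())
        for kind, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def redact(text):
    """Replace every match with a bracketed placeholder such as [EMAIL]."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(f"[{kind.upper()}]", text)
    return text


sample = "Contact: rahim@example.com, 01712345678"
print(scan(sample))
print(redact(sample))
```

Regex matching like this only catches well-formed identifiers; names, addresses, and account details usually need a trained entity recognizer.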
| --- | |
| ## 🔋 **Core Modules** | |
| ### **Text Processing** | |
| - `shobdhonic.tokenizer`: Advanced Bangla tokenization | |
| - `shobdhonic.transformer`: Pre-trained transformer models | |
| - `shobdhonic.nlp`: Natural language processing utilities | |
| - `shobdhonic.generator`: Text generation capabilities | |
| - `shobdhonic.translator`: Cross-language translation services | |
| ### **Audio & Speech** | |
| - `shobdhonic.voice`: Text-to-speech and speech-to-text | |
| - `shobdhonic.audio`: Audio processing utilities | |
| - `shobdhonic.dialect`: Regional dialect processing | |
| ### **Media & Content** | |
| - `shobdhonic.meme`: Meme generation engine | |
| - `shobdhonic.social`: Social media integration | |
| - `shobdhonic.content`: Content creation assistants | |
| - `shobdhonic.video`: Video generation and editing | |
| ### **Analysis & Intelligence** | |
| - `shobdhonic.sentiment`: Sentiment analysis tools | |
| - `shobdhonic.analytics`: Usage statistics and reporting | |
| - `shobdhonic.trends`: Trend detection and prediction | |
| ### **Security & Enterprise** | |
| - `shobdhonic.security`: Security and compliance tools | |
| - `shobdhonic.enterprise`: Enterprise integration utilities | |
| - `shobdhonic.docs`: Document processing pipeline | |
| --- | |
| ## 📈 **Performance Benchmarks** | |
| | **Task** | **Shôbdhonic** | **Other Bangla NLP** | **Improvement** | | |
| |------------------------------|-----------------|----------------------|-----------------| | |
| | Text Classification | 94.7% | 88.2% | +6.5 pts | | |
| | Named Entity Recognition | 92.3% | 85.9% | +6.4 pts | | |
| | Sentiment Analysis | 89.8% | 81.3% | +8.5 pts | | |
| | Question Answering | 87.6% | 79.1% | +8.5 pts | | |
| | Text Generation (BLEU) | 0.731 | 0.658 | +11.1% relative | | |
| | Speech Recognition (WER) | 6.4% | 11.7% | -5.3 pts (lower is better) | | |
| | Text-to-Speech (MOS) | 4.52/5 | 3.87/5 | +16.8% relative | | |
| *Improvements for percentage metrics are absolute differences in percentage points; BLEU and MOS improvements are relative to the baseline. Benchmarks conducted using standard Bangla test sets and industry metrics. Full methodology available in our [technical paper](https://shobdhonic.com/research/benchmarks).* | |
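The Improvement figures follow two conventions: absolute differences for the percentage metrics and relative gains for BLEU and MOS. The short check below recomputes both from the table's own numbers; the helper names are illustrative.

```python
def absolute_gain(ours, baseline):
    """Difference in the metric's own units (percentage points for % scores)."""
    return round(ours - baseline, 1)


def relative_gain(ours, baseline):
    """Gain expressed as a percentage of the baseline score."""
    return round(100 * (ours - baseline) / baseline, 1)


print(absolute_gain(94.7, 88.2))    # text classification, percentage points
print(relative_gain(0.731, 0.658))  # BLEU, percent of baseline
print(relative_gain(4.52, 3.87))    # MOS, percent of baseline
print(absolute_gain(6.4, 11.7))     # WER, negative means fewer errors
```

Keeping the two conventions apart matters when comparing rows: an 8.5-point accuracy gain and an 11.1% relative BLEU gain are not on the same scale.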
| --- | |
| ## 📊 **Enterprise Solutions** | |
| <div align="center"> | |
| <a href="https://shobdhonic.com/enterprise"> | |
| <img src="https://img.shields.io/badge/Shobdhonic_Enterprise-Get_Custom_Solutions-f42a41?style=for-the-badge&logo=gitlab"> | |
| </a> | |
| </div> | |
| ### **Banking & Finance** | |
| - Fraud detection in Bangla SMS/call transcripts | |
| - Customer support automation | |
| - Financial document processing | |
| - Transaction pattern analysis | |
| - Risk assessment NLP | |
| ### **Media & Publishing** | |
| - Auto-summarize news articles from Prothom Alo/Ittefaq | |
| - Content recommendation engines | |
| - Automated content tagging | |
| - Engagement prediction | |
| - Toxic comment filtering | |
| ### **Education** | |
| - Essay grading and feedback | |
| - Personalized learning content | |
| - Question generation from textbooks | |
| - Academic plagiarism detection | |
| - Educational chatbots in Bangla | |
| ### **Government & NGOs** | |
| - Citizen feedback analysis | |
| - Service request categorization | |
| - Policy document processing | |
| - Public sentiment monitoring | |
| - Disinformation detection | |
| --- | |
| ## 💻 **API Integration** | |
| ### **REST API Example** | |
| ```javascript | |
| // Using fetch in JavaScript | |
| const fetchMeme = async () => { | |
|   const response = await fetch('https://api.shobdhonic.com/v1/create-meme', { | |
|     method: 'POST', | |
|     headers: { | |
|       'Content-Type': 'application/json', | |
|       'Authorization': 'Bearer YOUR_API_KEY' | |
|     }, | |
|     body: JSON.stringify({ | |
|       text: 'পরীক্ষার রেজাল্ট দেখার পর আমি', | |
|       template: 'sad_pepe', | |
|       format: 'jpg' | |
|     }) | |
|   }); | |
|   // Surface HTTP errors instead of parsing an error body as a meme | |
|   if (!response.ok) { | |
|     throw new Error(`Request failed with status ${response.status}`); | |
|   } | |
|   const data = await response.json(); | |
|   return data.meme_url; | |
| }; | |
| // Call the function | |
| fetchMeme().then(url => { | |
|   document.getElementById('meme-image').src = url; | |
| }); | |
| ``` | |
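For server-side callers, the same request can be assembled with Python's standard library. This sketch only builds the request object and does not send it. The endpoint and JSON fields are copied from the JavaScript example above, and `build_meme_request` is a hypothetical helper.

```python
import json
import urllib.request

API_URL = "https://api.shobdhonic.com/v1/create-meme"  # from the example above


def build_meme_request(api_key, text, template, fmt="jpg"):
    """Build (but do not send) the POST request for the meme endpoint."""
    payload = {"text": text, "template": template, "format": fmt}
    return urllib.request.Request(
        API_URL,
        # ensure_ascii=False keeps Bangla text readable in the UTF-8 body
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_meme_request("YOUR_API_KEY", "পরীক্ষার রেজাল্ট দেখার পর আমি", "sad_pepe")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response.
```

Separating request construction from sending makes the payload easy to inspect in tests without network access.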
| ### **Python SDK Example** | |
| ```python | |
| from shobdhonic import ShobdhonicClient | |
| import asyncio | |
| async def main(): | |
|     # Initialize client | |
|     client = ShobdhonicClient(api_key="YOUR_API_KEY") | |
|     # Use the sentiment analysis API | |
|     result = await client.analyze_sentiment( | |
|         text="এই সিনেমাটা দেখে আমি খুবই মুগ্ধ হয়েছি।", | |
|         detailed=True | |
|     ) | |
|     print(f"Overall sentiment: {result.sentiment}") | |
|     print(f"Confidence score: {result.confidence:.2f}") | |
|     print(f"Emotional breakdown: {result.emotions}") | |
|     # Use the translation API | |
|     translation = await client.translate( | |
|         text="আমি বাংলায় কথা বলতে পারি।", | |
|         target_language="en" | |
|     ) | |
|     print(f"Translation: {translation.text}") | |
|     print(f"Source language detected: {translation.source_language}") | |
| # Run the async function | |
| asyncio.run(main()) | |
| ``` | |
| ### **Webhook Integration** | |
| ```python | |
| from flask import Flask, request, jsonify | |
| import hmac | |
| import hashlib | |
| app = Flask(__name__) | |
| @app.route('/webhook/shobdhonic', methods=['POST']) | |
| def shobdhonic_webhook(): | |
|     # Verify the webhook signature; a missing header becomes '' so the | |
|     # comparison below fails cleanly instead of raising TypeError | |
|     signature = request.headers.get('X-Shobdhonic-Signature', '') | |
|     secret = 'your_webhook_secret' | |
|     computed_signature = hmac.new( | |
|         secret.encode('utf-8'), | |
|         request.data, | |
|         hashlib.sha256 | |
|     ).hexdigest() | |
|     if not hmac.compare_digest(signature, computed_signature): | |
|         return jsonify({'error': 'Invalid signature'}), 401 | |
|     # Process the webhook data; tolerate a missing or non-JSON body | |
|     data = request.get_json(silent=True) or {} | |
|     event_type = data.get('event_type') | |
|     if event_type == 'sentiment_alert': | |
|         handle_sentiment_alert(data) | |
|     elif event_type == 'content_moderation': | |
|         handle_content_moderation(data) | |
|     elif event_type == 'trend_detected': | |
|         handle_trend_detection(data) | |
|     return jsonify({'status': 'success'}), 200 | |
| def handle_sentiment_alert(data): | |
|     # Process sentiment alerts | |
|     pass | |
| def handle_content_moderation(data): | |
|     # Process content moderation events | |
|     pass | |
| def handle_trend_detection(data): | |
|     # Process trend detection events | |
|     pass | |
| if __name__ == '__main__': | |
|     # debug=True is for local development only | |
|     app.run(debug=True, port=5000) | |
| ``` | |
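To exercise the handler locally, a test client needs to produce the same signature the handler computes. The sketch below mirrors the handler's HMAC-SHA256 logic with the standard library. `sign_payload` and `verify_signature` are hypothetical helpers, and the real service's signing scheme may differ.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body, as the handler computes it."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison; a missing or empty signature never verifies."""
    if not signature:
        return False
    return hmac.compare_digest(sign_payload(secret, body), signature)


body = json.dumps({"event_type": "trend_detected"}).encode("utf-8")
signature = sign_payload("your_webhook_secret", body)
print(verify_signature("your_webhook_secret", body, signature))         # True
print(verify_signature("your_webhook_secret", body + b" ", signature))  # False
```

In a test, send `body` unchanged with the signature in the `X-Shobdhonic-Signature` header; re-serializing the JSON before sending would change the bytes and invalidate the signature.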
| --- | |
| ## 🧩 **Project Structure** | |
| ``` | |
| shobdhonic/ | |
| ├── api/ # API endpoints | |
| ├── cli/ # Command-line tools | |
| ├── core/ # Core functionality | |
| │ ├── models/ # ML models | |
| │ ├── processors/ # Text processors | |
| │ ├── tokenizers/ # Bangla tokenizers | |
| │ └── vectors/ # Word embeddings | |
| ├── data/ # Data handling | |
| │ ├── corpus/ # Text corpora | |
| │ ├── loaders/ # Data loaders | |
| │ └── scrapers/ # Web scrapers | |
| ├── media/ # Media generation | |
| │ ├── audio/ # Audio processing | |
| │ ├── images/ # Image generation | |
| │ └── video/ # Video processing | |
| ├── security/ # Security tools | |
| ├── services/ # External services | |
| ├── ui/ # User interfaces | |
| │ ├── web/ # Web interface | |
| │ ├── mobile/ # Mobile interface | |
| │ └── widgets/ # Embeddable widgets | |
| ├── utils/ # Utility functions | |
| └── tests/ # Test suite | |
| ``` | |
| --- | |
| ## 🛠️ **Development Workflow** | |
| ### **Setting Up Development Environment** | |
| ```bash | |
| # Clone the development repository | |
| git clone https://github.com/Shobdhonic/shobdhonic-dev.git | |
| cd shobdhonic-dev | |
| # Create development environment | |
| python -m venv dev-env | |
| source dev-env/bin/activate | |
| # Install development dependencies | |
| pip install -r requirements-dev.txt | |
| # Set up pre-commit hooks | |
| pre-commit install | |
| ``` | |
| ### **Running Tests** | |
| ```bash | |
| # Run all tests | |
| pytest | |
| # Run a specific test file | |
| pytest tests/test_tokenizers.py | |
| # Run with coverage report | |
| pytest --cov=shobdhonic --cov-report=html | |
| ``` | |
| ### **Building Documentation** | |
| ```bash | |
| # Generate API documentation | |
| cd docs | |
| make html | |
| # View documentation | |
| python -m http.server -d _build/html | |
| ``` | |
| ### **CI/CD Pipeline** | |
| Our continuous integration and deployment pipeline automatically: | |
| 1. Runs tests on all pull requests | |
| 2. Performs code quality checks | |
| 3. Builds and publishes packages on releases | |
| 4. Deploys to staging/production environments | |
| 5. Updates documentation site | |
| --- | |
| ## 🤝 **Contribute to Bangla AI** | |
| We welcome contributions from the community! Here's how to get started: | |
| 1. **Fork the Repository**: [GitHub/Shobdhonic](https://github.com/Shobdhonic) | |
| 2. **Pick an Issue**: Look for issues labeled `good-first-issue`, `help-wanted`, or `Gen-Z feature` | |
| 3. **Set Up Your Environment**: Follow the development setup instructions above | |
| 4. **Make Your Changes**: Write code and tests for your feature or fix | |
| 5. **Submit a Pull Request**: Follow our [Contribution Guidelines](CONTRIBUTING.md) | |
| ### **Areas We Need Help With** | |
| - 🧠 **Model Training**: Fine-tuning transformers on Bangla data | |
| - 🎮 **Gen-Z Features**: Cultural memes, slang translators, social integrations | |
| - 📱 **Mobile Development**: React Native components for our SDK | |
| - 🔊 **Voice Data**: Collection and processing of regional dialects | |
| - 📚 **Documentation**: Tutorials, examples, and API documentation | |
| ### **Contributor Code of Conduct** | |
| All contributors are expected to adhere to our [Code of Conduct](CODE_OF_CONDUCT.md), which promotes a welcoming, inclusive, and harassment-free experience for everyone. | |
| --- | |
| ## 📒 **Documentation** | |
| ### **API Reference** | |
| Complete API documentation is available at [docs.shobdhonic.com](https://docs.shobdhonic.com) | |
| ### **Tutorials** | |
| Step-by-step tutorials for common tasks: | |
| - [Getting Started with Shôbdhonic](https://docs.shobdhonic.com/tutorials/getting-started) | |
| - [Building a Bangla Chatbot](https://docs.shobdhonic.com/tutorials/chatbot) | |
| - [Voice Cloning Basics](https://docs.shobdhonic.com/tutorials/voice-cloning) | |
| - [Meme Generation](https://docs.shobdhonic.com/tutorials/meme-gen) | |
| - [Enterprise Document Processing](https://docs.shobdhonic.com/tutorials/document-processing) | |
| ### **Examples** | |
| Explore our [examples directory](https://github.com/Shobdhonic/examples) for complete code samples: | |
| - Basic NLP tasks (tokenization, classification, etc.) | |
| - Voice synthesis and analysis | |
| - Media generation workflows | |
| - Enterprise integration patterns | |
| - Web and mobile application samples | |
| --- | |
| ## 📜 **License & Ethics** | |
| ```text | |
| MIT License | © 2024 Shôbdhonic | |
| *Bangla Data Ethics Pledge:* | |
| - No misuse of dialects/regional languages | |
| - Cite sources like Ittefaq/Prothom Alo | |
| - Free access for academic research and non-profits/NGOs | |
| - Respecting privacy and data sovereignty | |
| - Preserving Bangla linguistic diversity | |
| ``` | |
| ### **Ethical AI Commitment** | |
| At Shôbdhonic, we commit to: | |
| - Transparency in our AI systems | |
| - Fairness and bias mitigation | |
| - Protection of user privacy | |
| - Responsible data collection practices | |
| - Supporting cultural preservation | |
| - Making advanced Bangla NLP accessible to all | |
| Our complete AI Ethics Policy is available [here](https://shobdhonic.com/ethics). | |
| --- | |
| ## 🧪 **Research** | |
| Our team publishes open research on Bangla NLP: | |
| - [BanglaTransformers: Pre-training Transformers for Bengali NLP](https://arxiv.org/abs/xxxx.xxxxx) | |
| - [Dialect-Aware Speech Synthesis for Low-Resource Languages](https://arxiv.org/abs/xxxx.xxxxx) | |
| - [BanglaEval: Benchmarking NLP Systems for Bengali](https://arxiv.org/abs/xxxx.xxxxx) | |
| Interested in research collaboration? Contact us at research@shobdhonic.com | |
| --- | |
| ## 🌐 **Connect** | |
| <div align="center"> | |
| [Hugging Face](https://huggingface.co/Shobdhonic) | |
| [YouTube](https://youtube.com/Shobdhonic) | |
| [LinkedIn](https://linkedin.com/company/Shobdhonic) | |
| [Medium](https://medium.com/Shobdhonic) | |
| [Discord](https://discord.gg/shobdhonic) | |
| </div> | |
| --- | |
| <div align="center"> | |
| **মহাযুদ্ধ বাংলা ভাষার, আমরা প্রস্তুত!** | |
| *Powered by রক্তে বাংলা, প্রযুক্তিতে Shôbdhonic* | |
| </div> |