🛠 Suggested GitHub Issues for Contributors
Beginner-Friendly Issues
- Improve Data Cleaning: Refactor `data/clean_wikipedia.py` to remove unwanted symbols.
- Add More Text Data: Find and add additional Macedonian text datasets.
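As a starting point for the data cleaning issue, here is a minimal sketch of what "removing unwanted symbols" could look like. It is not taken from `data/clean_wikipedia.py`; the function name `clean_text`, the set of allowed punctuation, and the decision to strip footnote markers such as `[1]` are all assumptions made for illustration. It uses only the Python standard library.

```python
import re
import unicodedata

# Hypothetical helper for illustration; not the project's actual cleaning code.
_CITATION = re.compile(r"\[\d+\]")        # footnote markers such as [12]
_WHITESPACE = re.compile(r"\s+")
_ALLOWED_PUNCT = set(".,!?;:()\"'-")      # assumed punctuation worth keeping


def clean_text(text: str) -> str:
    """Drop symbols that are not letters, digits, marks, or basic punctuation."""
    # Normalize so that accented Cyrillic letters use one consistent encoding.
    text = unicodedata.normalize("NFC", text)
    text = _CITATION.sub("", text)

    kept = []
    for ch in text:
        major = unicodedata.category(ch)[0]  # "L" letter, "N" number, "M" mark
        if major in ("L", "N", "M") or ch.isspace() or ch in _ALLOWED_PUNCT:
            kept.append(ch)
        else:
            kept.append(" ")  # replace the symbol so neighboring words stay apart

    # Collapse the runs of whitespace left behind by removed symbols.
    return _WHITESPACE.sub(" ", "".join(kept)).strip()


print(clean_text("Скопје[1] е   главен град ★ на Македонија."))
# → Скопје е главен град на Македонија.
```

Filtering by Unicode category rather than by a fixed alphabet keeps both Cyrillic and Latin text intact, which may matter if the corpus mixes scripts. A contributor would likely want to adjust the allowed punctuation after inspecting real samples.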
Intermediate Issues
- Optimize Model Training: Improve `training/train_pipeline.py` to support LoRA and 4-bit quantization.
- API Deployment: Improve `inference/api.py` to optimize inference speed.
Advanced Issues
- Expand to Other LLMs: Modify training scripts to support LLaMA & Falcon models.
- Develop a Web UI: Create a Streamlit-based interface to interact with the trained model.
📢 Have ideas? Open a GitHub Issue and contribute to Macedonia's first open-source LLM!