---
license: mit
language:
- en
- ro
base_model:
- LLMLit/LLMLit
tags:
- LLMLit
- Romania
- LLM
datasets:
- LLMLit/LitSet
metrics:
- accuracy
- character
- code_eval
---

# Model Card for LLMLit

## Quick Summary

LLMLit is a high-performance, multilingual large language model (LLM) fine-tuned from Meta's Llama 3.1 8B Instruct model. Designed for both English and Romanian NLP tasks, LLMLit leverages advanced instruction-following capabilities to provide accurate, context-aware, and efficient results across diverse applications.

## Model Details

### Model Description

LLMLit is tailored to handle a wide array of tasks, including content generation, summarization, and question answering, in both English and Romanian. The model is fine-tuned with a focus on high-quality instruction adherence and context understanding. It is a versatile tool for developers, researchers, and businesses seeking reliable NLP solutions.

- **Developed by:** LLMLit Development Team
- **Funded by:** Open-source contributions and private sponsors
- **Shared by:** LLMLit Community
- **Model type:** Large language model (instruction-tuned)
- **Languages:** English (en), Romanian (ro)
- **License:** MIT
- **Fine-tuned from model:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** [GitHub Repository](https://github.com/PyThaGoAI/LLMLit)
- **Paper:** To be published
- **Demo:** Coming soon

## Uses

### Direct Use

LLMLit can be applied directly to tasks such as:

- Generating human-like text responses
- Translating between English and Romanian
- Summarizing articles, reports, or documents
- Answering complex questions with context sensitivity
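
For translation, an instruction-tuned model like this is usually steered with an explicit directive in the prompt. A minimal prompt-building sketch (the exact instruction wording is an illustrative assumption, not a format prescribed by the model):

```python
def build_translation_prompt(text: str, source: str = "English",
                             target: str = "Romanian") -> str:
    """Assemble an instruction-style translation prompt (illustrative wording)."""
    return (
        f"Translate the following {source} text into {target}. "
        f"Reply with the translation only.\n\n{text}"
    )

prompt = build_translation_prompt("The library opens at nine.")
print(prompt)
```

The resulting string can be passed to the tokenizer exactly as shown in the quickstart section below.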

### Downstream Use

When fine-tuned or integrated into larger ecosystems, LLMLit can be used for:

- Chatbots and virtual assistants
- Educational tools for bilingual environments
- Legal or medical document analysis
- E-commerce and customer-support automation
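
A chatbot built on the model needs to keep its conversation history within the context window. A minimal sketch of one common approach, using character count as a rough proxy for tokens (the budget value and message format are illustrative assumptions):

```python
def truncate_history(messages, max_chars=2000):
    """Keep the system message plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"]) for m in system)
    for m in reversed(turns):  # walk backwards from the newest turn
        if used + len(m["content"]) > max_chars:
            break
        kept.append(m)
        used += len(m["content"])
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful bilingual assistant."},
    {"role": "user", "content": "Salut!"},
    {"role": "assistant", "content": "Salut! Cu ce te pot ajuta?"},
    {"role": "user", "content": "Translate 'good morning' into Romanian."},
]
trimmed = truncate_history(history, max_chars=80)
```

A production system would count tokens with the model's tokenizer instead of characters, but the truncation logic is the same.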

### Out-of-Scope Use

LLMLit is not suitable for:

- Malicious or unethical applications, such as spreading misinformation
- Highly sensitive or critical decision-making without human oversight
- Tasks requiring real-time, low-latency performance in constrained environments

## Bias, Risks, and Limitations

### Bias

- LLMLit inherits biases present in its training data and may produce outputs that reflect societal or cultural biases.

### Risks

- Misuse of the model could spread misinformation or cause harm.
- Responses to complex or domain-specific queries may be inaccurate.

### Limitations

- Performance depends on the quality of input instructions.
- Understanding of niche or highly technical domains is limited.

### Recommendations

- Always review model outputs for accuracy, especially in sensitive applications.
- Fine-tune or customize the model for domain-specific tasks to minimize risks.

## How to Get Started with the Model

To use LLMLit, install the required libraries and load the model as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("llmlit/LLMLit-0.2-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("llmlit/LLMLit-0.2-8B-Instruct")

# Generate text (max_new_tokens bounds the generated continuation,
# independent of the prompt length)
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

LLMLit is fine-tuned on a diverse dataset containing bilingual (English and Romanian) content, ensuring both linguistic accuracy and cultural relevance.

### Training Procedure

#### Preprocessing

- Data was filtered for high-quality, instruction-based examples.
- Augmentation techniques were used to balance linguistic domains.
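
The filtering step above might look roughly like the following sketch. The field names (`instruction`, `response`) and the length threshold are illustrative assumptions, not the team's actual pipeline:

```python
def keep_example(record: dict, min_len: int = 8) -> bool:
    """Keep only well-formed instruction/response pairs (illustrative heuristic)."""
    instruction = (record.get("instruction") or "").strip()
    response = (record.get("response") or "").strip()
    return len(instruction) >= min_len and len(response) >= min_len

raw = [
    {"instruction": "Summarize the following article ...", "response": "The article argues ..."},
    {"instruction": "", "response": "orphan response"},      # dropped: no instruction
    {"instruction": "Hi", "response": "Hello"},              # dropped: too short
]
filtered = [r for r in raw if keep_example(r)]
```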

#### Training Hyperparameters

- **Training regime:** Mixed precision (fp16)
- **Batch size:** 512
- **Epochs:** 3
- **Learning rate:** 2e-5
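
The hyperparameters above imply a fixed optimizer step count once the dataset size is known. A quick sanity-check calculation; the example count below is an assumption for illustration, since the card does not state how many examples LitSet contains:

```python
# Derive the optimizer step count implied by the listed hyperparameters.
num_examples = 1_000_000  # assumed dataset size, for illustration only
batch_size = 512          # from the card
epochs = 3                # from the card

steps_per_epoch = -(-num_examples // batch_size)  # ceiling division
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```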

#### Speeds, Sizes, Times

- **Checkpoint size:** ~16 GB
- **Training time:** approx. one week on 8 A100 GPUs

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was conducted on multilingual benchmarks such as:

- FLORES-101 (translation accuracy)
- HELM (instruction-following capabilities)

#### Factors

Evaluation considered:

- Linguistic fluency
- Instruction adherence
- Contextual understanding

#### Metrics

- BLEU for translation tasks
- ROUGE-L for summarization
- Human evaluation scores for instruction tasks
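
ROUGE-L scores summaries by the longest common subsequence (LCS) between candidate and reference. Real evaluations typically use an established implementation, but the computation itself is small enough to sketch:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens (sketch; real tooling also normalizes text)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```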

### Results

LLMLit achieves state-of-the-art performance on instruction-following tasks in English and Romanian, with BLEU scores surpassing those of comparable models.

#### Summary

LLMLit excels at bilingual NLP tasks, offering robust performance across diverse domains while maintaining instruction adherence and linguistic accuracy.

## Model Examination

Efforts to interpret the model include:

- Attention visualization
- Prompt engineering guides
- Bias audits

## Environmental Impact

Training LLMLit resulted in estimated emissions of ~200 kg CO2eq. Carbon offsets were purchased to mitigate environmental impact, and future optimizations aim to reduce energy consumption.



---