Upload NBR-1B: Brazilian Portuguese 1.13B model

Files changed (5)

README.md ADDED

+---
+license: apache-2.0
+language:
+- pt
+library_name: transformers
+tags:
+- portuguese
+- brazilian
+- llama
+- causal-lm
+- text-generation
+datasets:
+- uonlp/CulturaX
+- HuggingFaceFW/fineweb-2
+- eduagarcia/cc_news_pt_v2
+pipeline_tag: text-generation
+---
+# NBR-1B: Brazilian Portuguese Language Model
+**NBR-1B** is a 1.13-billion-parameter language model trained from scratch for Brazilian Portuguese.
+## Model Details
+| Attribute | Value |
+|-----------|-------|
+| **Parameters** | 1.13B |
+| **Architecture** | LLaMA-style (GQA, RMSNorm, SwiGLU, RoPE) |
+| **Hidden Size** | 2048 |
+| **Layers** | 24 |
+| **Attention Heads** | 16 |
+| **KV Heads** | 4 |
+| **Vocabulary** | 32,000 (BPE) |
+| **Context Length** | 2048 tokens |
+| **Training Tokens** | 3.12B |
+| **Final Loss** | ~2.8 |
+## Training Data
+- CulturaX PT (40%)
+- FineWeb-2 PT (52%)
+- mC4 PT (5%)
+- CC-News PT v2 (2%)
+- Books PT (1%)
+## Usage
+This is a base model intended for text completion. Load it with the `transformers` library.
+## License
+Apache 2.0
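
The Usage section of the README names the `transformers` library but gives no snippet. A minimal sketch of greedy completion follows. It assumes the repository loads through `AutoTokenizer` and `AutoModelForCausalLM`, which is plausible given the `LlamaForCausalLM` architecture in `config.json` but is not confirmed by this diff. The function name `complete` and the repository id in the comment are hypothetical.

```python
def complete(prompt: str, repo_id: str, max_new_tokens: int = 50) -> str:
    """Greedy text completion with a causal LM loaded from `repo_id`."""
    # Imported lazily so that defining the function does not require
    # torch/transformers to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # bfloat16 mirrors the torch_dtype declared in config.json.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (assumption: replace the placeholder with the real repository id):
# print(complete("O Brasil é um país", "your-namespace/NBR-1B"))
```

Because this is a base model, the prompt should be the beginning of a text to continue rather than an instruction.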

config.json ADDED

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "model_type": "llama",
+  "vocab_size": 32000,
+  "hidden_size": 2048,
+  "intermediate_size": 5504,
+  "num_hidden_layers": 24,
+  "num_attention_heads": 16,
+  "num_key_value_heads": 4,
+  "max_position_embeddings": 2048,
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 10000.0,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.40.0"
+}
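
The 1.13B figure in the README can be cross-checked against the values in `config.json`. The sketch below counts weights for a LLaMA-style decoder without bias terms (the default for this architecture). The config does not set `tie_word_embeddings`, so whether the output head shares the embedding matrix is an assumption; both cases are computed. The helper name `llama_param_count` is hypothetical.

```python
def llama_param_count(cfg: dict, tie_embeddings: bool) -> int:
    """Count parameters of a bias-free LLaMA-style decoder from its config."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]  # GQA: fewer K/V heads

    attn = 2 * h * h + 2 * h * kv_dim       # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]  # gate_proj, up_proj, down_proj (SwiGLU)
    norms = 2 * h                           # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms

    embed = cfg["vocab_size"] * h
    lm_head = 0 if tie_embeddings else embed
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


config = {
    "vocab_size": 32000,
    "hidden_size": 2048,
    "intermediate_size": 5504,
    "num_hidden_layers": 24,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
}

tied = llama_param_count(config, tie_embeddings=True)
untied = llama_param_count(config, tie_embeddings=False)
print(f"tied:   {tied:,}")    # 1,128,892,416
print(f"untied: {untied:,}")  # 1,194,428,416
```

The tied count rounds to 1.13B, matching the README, while the untied count rounds to 1.19B. This suggests, but does not prove, that the reported figure counts the embedding matrix once.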

pytorch_model.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7a40eff6f949ff68369baeaf3f33adcaec6621c6352ecee5e426a7db74d6b61
+size 4515653991
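
The LFS pointer size gives a rough check on how the weights are stored. The arithmetic below divides the file size by the parameter count stated in the README (1.13B, rounded) to estimate bytes per parameter. This is only a sketch: it ignores serialization overhead and assumes the file holds a single copy of the weights and nothing else.

```python
size_bytes = 4_515_653_991   # "size" field of the LFS pointer
approx_params = 1.13e9       # parameter count as stated in the README (rounded)

bytes_per_param = size_bytes / approx_params
size_gib = size_bytes / 2**30

print(f"bytes per parameter: {bytes_per_param:.2f}")
print(f"checkpoint size: {size_gib:.2f} GiB")
```

The result is close to 4 bytes per parameter, which is what 32-bit floats would occupy, whereas `config.json` declares `"torch_dtype": "bfloat16"` (2 bytes per parameter). One possible reading is that the checkpoint was saved in float32; the diff alone does not settle this.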

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

+{
+  "model_type": "llama",
+  "vocab_size": 32000,
+  "bos_token": "<s>",
+  "eos_token": "</s>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>",
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 3,
+  "unk_token_id": 0
+}
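
A quick internal-consistency check of the special tokens declared above can be sketched as follows. It verifies only that the four token strings and ids are distinct and that each id falls inside the declared vocabulary. It cannot confirm that the ids match `tokenizer.json`, which is not rendered in this diff. The helper name `check_special_tokens` is hypothetical.

```python
import json

tokenizer_config = json.loads("""
{
  "model_type": "llama",
  "vocab_size": 32000,
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "<pad>",
  "unk_token": "<unk>",
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 3,
  "unk_token_id": 0
}
""")


def check_special_tokens(cfg: dict) -> list[str]:
    """Return a list of problems found; an empty list means no issue detected."""
    names = ["bos", "eos", "pad", "unk"]
    ids = {n: cfg[f"{n}_token_id"] for n in names}
    tokens = {n: cfg[f"{n}_token"] for n in names}

    problems = []
    if len(set(ids.values())) != len(names):
        problems.append("duplicate special token ids")
    if len(set(tokens.values())) != len(names):
        problems.append("duplicate special token strings")
    for name, token_id in ids.items():
        if not 0 <= token_id < cfg["vocab_size"]:
            problems.append(f"{name}_token_id {token_id} outside vocabulary")
    return problems


print(check_special_tokens(tokenizer_config))
```

Having a dedicated `<pad>` token with its own id, rather than reusing `</s>`, keeps padding distinguishable from end-of-sequence when batching.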