Instructions to use ccore/LLAMA2-446m with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ccore/LLAMA2-446m with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="ccore/LLAMA2-446m")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("ccore/LLAMA2-446m")
    model = AutoModelForCausalLM.from_pretrained("ccore/LLAMA2-446m")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ccore/LLAMA2-446m with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "ccore/LLAMA2-446m"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "ccore/LLAMA2-446m",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
    docker model run hf.co/ccore/LLAMA2-446m
- SGLang
How to use ccore/LLAMA2-446m with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "ccore/LLAMA2-446m" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "ccore/LLAMA2-446m",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "ccore/LLAMA2-446m" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "ccore/LLAMA2-446m",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use ccore/LLAMA2-446m with Docker Model Runner:
    docker model run hf.co/ccore/LLAMA2-446m
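The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and it assumes a server started as shown above is listening locally and replies in the OpenAI completions format (a `choices` list whose entries carry a `text` field).

```python
import json
import urllib.request

MODEL_ID = "ccore/LLAMA2-446m"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumption: the response body follows the OpenAI completions schema.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the vLLM server running, `complete("Once upon a time,")` would send the same payload as the curl example; for the SGLang server, pass `base_url="http://localhost:30000"`.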
Update README.md
README.md
CHANGED
@@ -11,48 +11,40 @@ model-index:

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+Certainly! Here's a Model Card for your Hugging Face model with the provided information in English:

-
+---
+
+# Model Card: LLama 2 - Version 7b (Embedding + Output + 1 Hidden Layer)

-
-It achieves the following results on the evaluation set:
-- Loss: 2.5063
-- Accuracy: 0.4398
+## Overview

-
+- **Link to Training Progress:** [WandB Training Progress](https://wandb.ai/inteligenciaartificialcursos/huggingface/runs/wjh9m9x1/workspace?workspace=user-inteligenciaartificialcursos)

-
+- **Model Name:** LLama 2 - Version 7b

-
+- **Total Parameters:** 446 million

-
+## Training Data

-
+The model has been trained with the following sequence of datasets:

-
+1. **GPT-2 Data (In Progress):** The initial training phase involves GPT-2 data and is currently in the finalization stage.

-
+2. **Wikipedia QA in Markdown (Next Stage):** The model's training will continue with Wikipedia question-answering data in Markdown format.

-
+3. **QA with Rhetoric (Future Stages):** The model will further be fine-tuned with question-answering data generated from various LLama models, incorporating rhetorical elements.

-
-- learning_rate: 0.0005
-- train_batch_size: 1
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 32
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 2.0
+## Model Description

-
+The LLama 2 - Version 7b model is a powerful language model with a total of 446 million parameters. It utilizes embeddings, an output layer, and one hidden layer to perform a wide range of natural language processing tasks. The training is conducted in multiple stages, each focused on different datasets and objectives.

+## Disclaimer

+This model card provides an overview of the LLama 2 - Version 7b model, its training data, and intended use cases. Keep in mind that the model's performance may vary depending on the specific task or dataset. Users are encouraged to evaluate the model's suitability for their applications and exercise caution when using it in real-world scenarios.

-
+For any further inquiries or issues related to this model, please contact the model developers through the provided training progress link.
+
+---

-
-- Pytorch 2.0.1+cu117
-- Datasets 2.14.5
-- Tokenizers 0.13.3
+Feel free to customize this Model Card further if you have additional details or specific use cases you'd like to highlight.
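The figures on the removed lines of this diff can be cross-checked with a little arithmetic. The sketch below assumes a single training device (the card does not say how many were used) and assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model evaluation; the function names are hypothetical.

```python
import math

# Values copied from the removed lines of the diff above.
EVAL_LOSS = 2.5063
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 32


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


def perplexity(mean_cross_entropy):
    """Perplexity, assuming the loss is a mean cross-entropy in nats."""
    return math.exp(mean_cross_entropy)


total = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
ppl = perplexity(EVAL_LOSS)
print(f"effective batch size: {total}")
print(f"perplexity: {ppl:.2f}")
```

Under the single-device assumption the product matches the reported `total_train_batch_size` of 32, and the evaluation loss corresponds to a perplexity of about 12.26.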