aws-neuron/CodeLlama-7b-hf-neuron-8xlarge

Text Generation


jburtoft committed on Dec 29, 2023

Commit 7e3e96c · 1 parent: b1d557c

Update README.md

Files changed (1)

README.md +66 -0

README.md CHANGED

@@ -1,3 +1,69 @@
 ---
 license: llama2
+language:
+- en
+pipeline_tag: text-generation
+inference: false
+tags:
+- facebook
+- meta
+- pytorch
+- llama
+- llama-2
+- inferentia2
+- neuron
 ---
+# Neuronx model for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)
+This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf).
+You can find detailed information about the base model on its [Model Card](https://huggingface.co/codellama/CodeLlama-7b-hf).
+This model has been exported to the `neuron` format using the `input_shapes` and `compiler_args` parameters listed under "Arguments passed during export" below.
+It has been compiled to run on an AWS inf2.8xlarge instance.
+Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.
+## Usage on Amazon SageMaker
+_coming soon_
+## Usage with 🤗 `optimum-neuron`
+```python
+>>> from optimum.neuron import pipeline
+>>> p = pipeline('text-generation', 'jburtoft/CodeLlama-7b-hf-neuron-8xlarge')
+>>> p("import socket\n\ndef ping_exponential_backoff(host: str):",
+...     do_sample=True,
+...     top_k=10,
+...     temperature=0.1,
+...     top_p=0.95,
+...     num_return_sequences=1,
+...     max_length=200,
+... )
+```
+```
+[{'generated_text': 'import socket\n\ndef ping_exponential_backoff(host: str):\n    """\n    Ping a host with exponential backoff.\n\n    :param host: Host to ping\n    :return: True if host is reachable, False otherwise\n    """\n    for i in range(1, 10):\n        try:\n            socket.create_connection((host, 80), 1).close()\n            return True\n        except OSError:\n            time.sleep(2 ** i)\n    return False\n\n\ndef ping_exponential_backoff_with_timeout(host: str, timeout: int):\n    """\n    Ping a host with exponential backoff and timeout.\n\n    :param host: Host to ping\n    :param timeout: Timeout in seconds\n    :return: True if host is reachable, False otherwise\n    """\n    for'}]
+```
+This repository contains tags specific to versions of `neuronx`. When you use it with 🤗 `optimum-neuron`, load the repo revision that matches your installed version of `neuronx`. This ensures that the matching serialized checkpoints are loaded.
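The following is a minimal sketch of how to pick a revision. It assumes that `huggingface_hub` and `optimum-neuron` are installed. It also assumes that the relevant `neuronx` version is the one reported by the `neuronx-cc` compiler package. The sketch reads that version with `importlib.metadata`, lists the repository's git tags with `huggingface_hub.list_repo_refs`, and loads the checkpoint at the chosen revision. This card does not list the tag names, so the revision string is yours to choose from the printed tags. `list_tags` and `load_at_revision` are hypothetical helper names, not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version

# Repo id taken from the usage example in this card.
REPO_ID = "jburtoft/CodeLlama-7b-hf-neuron-8xlarge"


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def list_tags(repo_id: str = REPO_ID) -> list:
    """Hypothetical helper: return the git tag names of a Hub model repo (needs network)."""
    from huggingface_hub import list_repo_refs

    return [tag.name for tag in list_repo_refs(repo_id).tags]


def load_at_revision(revision: str, repo_id: str = REPO_ID):
    """Hypothetical helper: load the compiled checkpoint stored at `revision`.

    Needs an Inferentia2 instance with optimum-neuron installed.
    """
    from optimum.neuron import NeuronModelForCausalLM

    return NeuronModelForCausalLM.from_pretrained(repo_id, revision=revision)
```

On the instance, compare `installed_version("neuronx-cc")` with the output of `list_tags()`. Then pass the matching tag to `load_at_revision`. The imports that need the network or Neuron hardware are kept inside the helpers, so the module itself imports anywhere.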
+## Arguments passed during export
+**input_shapes**
+```json
+{
+  "batch_size": 1,
+  "sequence_length": 2048
+}
+```
+**compiler_args**
+```json
+{
+  "auto_cast_type": "fp16",
+  "num_cores": 2
+}
+```
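The two argument blocks above can be passed as keyword arguments to `NeuronModelForCausalLM.from_pretrained(..., export=True)`. This follows the pattern in the `optimum-neuron` export guide linked at the top of this card. The following is a minimal sketch of recompiling the base model with the same settings. It assumes an Inferentia2 instance with `optimum-neuron` installed. `export_kwargs` and `export_model` are hypothetical helper names, and the output directory is up to you.

```python
import json

# The input_shapes and compiler_args blocks above, parsed as JSON.
INPUT_SHAPES = json.loads('{"batch_size": 1, "sequence_length": 2048}')
COMPILER_ARGS = json.loads('{"auto_cast_type": "fp16", "num_cores": 2}')


def export_kwargs(input_shapes: dict, compiler_args: dict) -> dict:
    """Merge input shapes and compiler args into one kwargs dict, rejecting duplicate keys."""
    overlap = input_shapes.keys() & compiler_args.keys()
    if overlap:
        raise ValueError(f"duplicate export arguments: {sorted(overlap)}")
    return {**input_shapes, **compiler_args}


def export_model(output_dir: str) -> None:
    """Hypothetical helper: compile the base model with the arguments above and save it."""
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(
        "codellama/CodeLlama-7b-hf",
        export=True,
        **export_kwargs(INPUT_SHAPES, COMPILER_ARGS),
    )
    model.save_pretrained(output_dir)
```

`num_cores: 2` corresponds to the two NeuronCores of the single Inferentia2 chip on an inf2.8xlarge. The saved directory can later be reloaded with `NeuronModelForCausalLM.from_pretrained(output_dir)` without recompiling.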