---
license: llama3
language:
- en
pipeline_tag: text-generation
tags:
- nlp
---

# LLama-3-8B-Tele Model Card

## Model Summary

The language model LLama-3-8B-Tele is a Transformer with **8 billion** parameters, specialized in telecommunications. It is based on Meta [LLama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and was continually pretrained on [Tele-Data](https://huggingface.co/datasets/AliMaatouk/Tele-Data), a large-scale dataset of approximately 2.5 billion tokens of telecommunications material, including articles, standards, and general web content related to the telecommunications domain.

When assessed against telecommunications benchmarks such as [Tele-Eval](https://huggingface.co/datasets/AliMaatouk/Tele-Eval), LLama-3-8B-Tele outperforms [LLama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) by several percentage points. Additionally, LLama-3-8B-Tele matches [LLama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) across benchmarks covering common sense, language understanding, and logical reasoning. Thus, the domain adaptation was achieved with minimal compromise in performance relative to the original model.

### Context Length

The model was trained on a context length of 8192 tokens.
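
For reference, the trained context window can be read from the model configuration and used to cap input length. The snippet below is a minimal sketch; `long_document` is a placeholder for your own text.

```python
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("AliMaatouk/LLama-3-8B-Tele")
print(config.max_position_embeddings)  # expected to match the 8192-token training context

tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/LLama-3-8B-Tele")
long_document = "..."  # placeholder: any long telecommunications document
inputs = tokenizer(long_document, truncation=True, max_length=config.max_position_embeddings)
```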

## Usage

LLama-3-8B-Tele is a base model best suited for fine-tuning on applications related to telecommunications. It has not been fine-tuned to follow instructions and operates solely within a text completion framework. An example of this completion can be found below:

```markdown
Prompt: Shannon capacity is

Model: the maximum rate at which information can be transmitted over a communication channel. It was named after Claude Shannon, who introduced it in his 1948 paper "A Mathematical Theory of Communication". Shannon capacity is also known as channel capacity, channel transmission capacity, or simply capacity.
```

The instruct version of this model is available at [LLama-3-8B-Tele-it](https://huggingface.co/AliMaatouk/LLama-3-8B-Tele-it).
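
Since the model is released as a base model, a common next step is parameter-efficient fine-tuning on your own telecom data. The snippet below is a minimal LoRA sketch using the `peft`, `datasets`, and `transformers` libraries; the file `telecom_corpus.txt`, the adapter settings, and the training hyperparameters are illustrative placeholders, not values used to train this model.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "AliMaatouk/LLama-3-8B-Tele"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Train low-rank adapters instead of updating all 8 billion parameters
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Plain-text corpus, one document per line (placeholder file name)
dataset = load_dataset("text", data_files={"train": "telecom_corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                      remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama3-tele-ft", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```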

## Sample Code

Below we share some code snippets on how to get started quickly with running the model. First, make sure to `pip install transformers`, then copy the snippet corresponding to your hardware and adapt it to your use case.

#### Running the model on a CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("AliMaatouk/LLama-3-8B-Tele", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/LLama-3-8B-Tele")

prompt = "Shannon capacity is"
input_ids = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=100)

# Decode only the newly generated tokens, excluding the prompt
generated_tokens = outputs[0, len(input_ids['input_ids'][0]):]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```

#### Running the model on a single / multi GPU

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AliMaatouk/LLama-3-8B-Tele", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/LLama-3-8B-Tele")

prompt = "Shannon capacity is"
input_ids = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=100)

# Decode only the newly generated tokens, excluding the prompt
generated_tokens = outputs[0, len(input_ids['input_ids'][0]):]
response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(response)
```
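
#### Running the model with 4-bit quantization

If GPU memory is limited, the model can also be loaded in 4-bit precision. The following is a minimal sketch that assumes the `bitsandbytes` package is installed; it is an illustrative configuration, not the setup used for the benchmark comparisons above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("AliMaatouk/LLama-3-8B-Tele",
                                             quantization_config=bnb_config,
                                             device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AliMaatouk/LLama-3-8B-Tele")

prompt = "Shannon capacity is"
input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=100)

# Decode only the newly generated tokens, excluding the prompt
generated_tokens = outputs[0, len(input_ids["input_ids"][0]):]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```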

## Citation

You can find the paper with all details about the model at https://arxiv.org/abs/2409.05314. Please cite it as follows:

```bibtex
@misc{maatouk2024telellmsseriesspecializedlarge,
      title={Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications},
      author={Ali Maatouk and Kenny Chirino Ampudia and Rex Ying and Leandros Tassiulas},
      year={2024},
      eprint={2409.05314},
      archivePrefix={arXiv},
      primaryClass={cs.IT},
      url={https://arxiv.org/abs/2409.05314},
}
```