Aman0runanywhere committed
Commit be78005 · verified · 1 parent: 135e8c9

Upload 5 files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Llama-3.2-3B-Instruct-4bit/tokenizer.json filter=lfs diff=lfs merge=lfs -text
Llama-3.2-3B-Instruct-4bit/benchmark_task.json ADDED
@@ -0,0 +1,17 @@
+ {
+ "greedy": false,
+ "identifier": "llama-3.2-3b-instruct-4bit",
+ "messages": [
+ {
+ "content": "Summarize user's input",
+ "role": "system"
+ },
+ {
+ "content": "Large language models, commonly referred to as LLMs, are a class of artificial intelligence systems designed to understand, generate, and manipulate human language at scale. They are built using deep learning techniques, most notably transformer architectures, and are trained on vast collections of text data drawn from books, articles, websites, and other written sources. Through this training process, LLMs learn statistical patterns in language, allowing them to predict likely sequences of words and produce coherent, contextually appropriate responses to prompts. At their core, LLMs operate by representing words or subword units as numerical vectors in a high-dimensional space. These representations capture semantic and syntactic relationships, such that words with similar meanings or grammatical roles tend to have similar vector representations. The transformer architecture enables the model to process entire sequences of text simultaneously rather than sequentially, using mechanisms such as self-attention to determine which parts of the input are most relevant at any given moment. This allows LLMs to handle long-range dependencies in text, such as references made many sentences earlier, more effectively than earlier generations of language models. Training an LLM typically involves two main phases. The first is pretraining, during which the model learns general language patterns by predicting missing or next tokens in large, mostly uncurated text corpora. This phase gives the model broad linguistic competence and general world knowledge as reflected in its training data. The second phase often involves fine-tuning, where the model is further trained on more specific datasets, such as question-answer pairs, instructional content, or conversational examples. Fine-tuning helps align the model\u2019s behavior with particular tasks or desired styles of interaction. One of the most notable characteristics of LLMs is their versatility. 
A single model can perform a wide range of language-related tasks, including text generation, summarization, translation, classification, question answering, and code generation, often without task-specific retraining. This flexibility arises from the model\u2019s general-purpose training and its ability to condition its outputs on the instructions or context provided in the prompt. As a result, LLMs are increasingly used as foundational models that can be adapted to many applications across different domains. Despite their impressive capabilities, LLMs do not possess true understanding or consciousness. They do not have beliefs, intentions, or awareness in the human sense. Instead, they generate responses based on learned correlations in data. This limitation can lead to errors such as producing confident-sounding but incorrect information, sometimes referred to as hallucinations. Because LLMs rely on patterns in their training data, they may also reflect biases, inaccuracies, or gaps present in that data. Addressing these issues is an ongoing area of research and development. The computational cost of training and running LLMs is another significant consideration. Training state-of-the-art models can require enormous amounts of computing power, energy, and financial investment. This has implications for environmental sustainability and for who can realistically develop and deploy such models. As a result, there is growing interest in techniques that improve efficiency, such as model compression, distillation, sparse architectures, and more efficient training algorithms, as well as in smaller models that can perform well on specific tasks. LLMs also raise important ethical and social questions. Their ability to generate human-like text can be beneficial in areas such as education, accessibility, and productivity, but it can also be misused for purposes like misinformation, plagiarism, or automated spam. 
Ensuring responsible use involves a combination of technical safeguards, policy decisions, and user education. Researchers and developers are actively exploring methods for making LLMs more transparent, controllable, and aligned with human values. As research continues, LLMs are likely to become more capable, more efficient, and more integrated into everyday tools and workflows. Future developments may involve better reasoning abilities, improved factual reliability, stronger multimodal integration with images, audio, and video, and more personalized interactions that adapt to individual users while respecting privacy. While LLMs are not a replacement for human judgment or creativity, they represent a powerful technology that, when used thoughtfully, can augment human capabilities and transform how people interact with information and with machines.",
+ "role": "user"
+ }
+ ],
+ "number_of_runs": 15,
+ "repo_id": "mlx-community/Llama-3.2-3B-Instruct-4bit",
+ "tokens_limit": 512
+ }
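The benchmark task added above is a plain JSON file: a fixed chat prompt, a repo ID, a run count, and a token limit. A minimal sketch of consuming such a file follows; `load_task` is a hypothetical helper (not part of any benchmark harness), and the long user message is stood in by a placeholder string.

```python
import json

# Schema mirrors Llama-3.2-3B-Instruct-4bit/benchmark_task.json from this
# commit; the long user message is abbreviated to a placeholder.
TASK_JSON = """
{
  "greedy": false,
  "identifier": "llama-3.2-3b-instruct-4bit",
  "messages": [
    {"content": "Summarize user's input", "role": "system"},
    {"content": "<long user text about LLMs>", "role": "user"}
  ],
  "number_of_runs": 15,
  "repo_id": "mlx-community/Llama-3.2-3B-Instruct-4bit",
  "tokens_limit": 512
}
"""

def load_task(raw: str) -> dict:
    """Parse a benchmark task and sanity-check its required fields."""
    task = json.loads(raw)
    required = {"identifier", "messages", "number_of_runs",
                "repo_id", "tokens_limit"}
    missing = required - task.keys()
    if missing:
        raise ValueError(f"task is missing fields: {sorted(missing)}")
    return task

task = load_task(TASK_JSON)
print(task["repo_id"], task["number_of_runs"], task["tokens_limit"])
```

With `number_of_runs` set to 15 and `greedy` false, a harness reading this file would presumably sample the same prompt fifteen times, capping each generation at 512 tokens.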
Llama-3.2-3B-Instruct-4bit/config.json ADDED
The diff for this file is too large to render.
 
Llama-3.2-3B-Instruct-4bit/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c661c554eacc950bbc774bbed93e66d489fa42767d3874a0c94b4f8c7ce3a802
+ size 1874584848
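The large binaries in this commit (model.safetensors, model.bin, tokenizer.json) are stored as Git LFS pointer files: three `key value` lines giving the spec version, a SHA-256 object ID, and the byte size of the real file. A small sketch of reading one, assuming the pointer text above; `parse_lfs_pointer` is a hypothetical helper, not part of the git-lfs tooling:

```python
# Pointer text taken verbatim from the model.safetensors diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c661c554eacc950bbc774bbed93e66d489fa42767d3874a0c94b4f8c7ce3a802
size 1874584848
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }

info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 1e9:.2f} GB, {info['hash_algo']} {info['oid'][:12]}")
```

The `size` field shows why only pointers live in git history: this 4-bit quantized 3B model still weighs in at roughly 1.87 GB.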
Llama-3.2-3B-Instruct-4bit/speculators/chat/model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f25ed0a0f359412c74d1e6dabe082a30dca9674810ceb5a2322ba2866749332
+ size 33816592
Llama-3.2-3B-Instruct-4bit/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
+ size 17209920