topshik committed
Commit 1ff5e29 (verified) · 1 Parent(s): 3f7fb2a

Update README.md

Files changed (1)
  1. README.md +0 -73

README.md CHANGED
@@ -164,79 +164,6 @@ Designed for integration into professional developer tooling (e.g., intelligent
  - Biases: May reflect biases present in public codebases. For example, it will likely produce code similar in style to open-source repositories.
  - Security: Code suggestions should not be assumed to be secure or free of vulnerabilities.
 
- # Sample Usage
- Here are examples of how to run and sample from the model.
-
- ## Generic generation
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- example = """
- import sys
- import os
- import time
-
- sys.path.append(os.getcwd())
-
- from cluster.prepare_data import get_headers_pairs_list, write_dist_matrix
- from cluster.token_edit_distance import get_distance_matrix
-
- if len(sys.argv) < 3:
-     print(
-         "Too few arguments. You should provide: \n1. dataset_filename" +
-         "\n2. output_data_filename"
-     )
-     sys.exit()
-
- start = time.perf_counter()
- dataset_filename_ = sys.argv[1]
- output_data_filename_ = sys.argv[2]
-
- headers_pairs = get_headers_pairs_list(dataset_filename_, verbose=True)
-
- dist_matrix, max_dist = get_distance_matrix(
-     list(map(lambda x: x[1], headers_pairs)),
-     verbose=True
- )
-
- write_dist_matrix(dist_matrix, max_dist, output_data_filename_, verbose=True)
-
- end = time.perf_counter()
- """
-
- tokenizer = AutoTokenizer.from_pretrained('JetBrains/Mellum-4b-base')
- model = AutoModelForCausalLM.from_pretrained('JetBrains/Mellum-4b-base')
- encoded_input = tokenizer(example, return_tensors='pt', return_token_type_ids=False)
- input_len = len(encoded_input["input_ids"][0])
- out = model.generate(
-     **encoded_input,
-     max_new_tokens=100,
- )
- print("### Context")
- print(tokenizer.decode(out[0][:input_len]))
- print("### Prediction")
- print(tokenizer.decode(out[0][input_len:]))
- ```
-
- ## Fill-in-the-middle generation
- ```python
- prefix = """
- def fibonacci(n: int) -> int:
- """
-
- suffix = """
- if __name__ == "__main__":
-     print(fibonacci(10))
- """
-
- encoded_input = tokenizer(f"<fim_suffix>{suffix}<fim_prefix>{prefix}<fim_middle>", return_tensors='pt', return_token_type_ids=False)
- out = model.generate(
-     **encoded_input,
-     max_new_tokens=100,
- )
- print(tokenizer.decode(out[0][len(encoded_input["input_ids"][0]):]))
- ```
-
  # Citation
  If you use this model, please cite:
 
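For reference, the removed fill-in-the-middle snippet places the suffix *before* the prefix in the prompt. The string it assembles can be sketched without loading the model; `build_fim_prompt` below is a hypothetical helper for illustration, not part of the Transformers API or the original README.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a suffix-prefix-middle style FIM prompt, matching the
    token order used in the removed README example."""
    return f"<fim_suffix>{suffix}<fim_prefix>{prefix}<fim_middle>"

prefix = "def fibonacci(n: int) -> int:\n"
suffix = 'if __name__ == "__main__":\n    print(fibonacci(10))\n'

# The model is expected to continue after <fim_middle> with the function body.
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

Generation then proceeds as in the generic example: tokenize `prompt`, call `model.generate`, and decode only the tokens past the prompt length to obtain the infilled middle.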
 