---
tags:
- deepsparse
---

## Usage

```python
from deepsparse import TextGeneration

prompt = "How to get in a good university?"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

model = TextGeneration(model="hf:neuralmagic/TinyLlama-1.1B-Chat-v0.3-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)

"""
Getting into a good university is a complex process that involves factors such as academic performance, financial aid, and personal qualifications. Here are some steps you can follow to get in a good university:

1. Academic performance:
- Look for a university that has a strong academic program, including a well-rounded curriculum that covers a wide range of subjects.
- Check if the university offers a clear curriculum that includes a clear sequence of courses.
- Check if the university offers a clear pathway to graduation, including clear dates and deadlines.

2. Financial aid:
- Look for a university that offers financial aid, such as scholarships, grants, or loans.
- Check if the university offers financial aid that fits your budget.
- Consider the university's financial aid package, including the cost of tuition, room and board, and other expenses.
""" ``` ## One-shot and Export ``` git clone https://github.com/neuralmagic/sparseml pip install -e "sparseml[transformers]" "torch<2" python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py PY007/TinyLlama-1.1B-Chat-v0.3 open_platypus --recipe recipe.yaml --save True python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment --sequence_length 512 cp deployment/model.onnx deployment/model-orig.onnx python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx ``` `recipe.yaml` ``` test_stage: obcq_modifiers: SparseGPTModifier: sparsity: 0.5 block_size: 128 sequential_update: false quantize: QuantizationModifier: ignore: - LlamaRotaryEmbedding - LlamaRMSNorm - SiLUActivation - model.layers.21.mlp.down_proj - model.layers.7.mlp.down_proj - model.layers.2.mlp.down_proj - model.layers.20.mlp.down_proj - model.layers.19.mlp.down_proj post_oneshot_calibration: false scheme_overrides: Embedding: input_activations: null weights: num_bits: 8 symmetric: false percdamp: 0.01 prunen: 0 prunem: 0 targets: - model.layers.0 - model.layers.1 - model.layers.2 - model.layers.3 - model.layers.4 - model.layers.5 - model.layers.6 - model.layers.7 - model.layers.8 - model.layers.9 - model.layers.10 - model.layers.11 - model.layers.12 - model.layers.13 - model.layers.14 - model.layers.15 - model.layers.16 - model.layers.17 - model.layers.18 - model.layers.19 - model.layers.20 - model.layers.21 target_ids: - attention_mask - position_ids ```