RamzyBakir/CySent-SmolLM3-3B

@@ -1,199 +1,126 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+license: apache-2.0
+base_model: HuggingFaceTB/SmolLM3-3B
+metrics:
+- accuracy
+- Training Loss
+- Validation Loss
+datasets:
+- Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset
+pipeline_tag: text-generation
+tags:
+- cybersecurity
+- instruction-tuning
+- security
+- smollm
+- lora
 ---
+# CySent-SmolLM3-3B
+<p align="center">
+    <img src="https://www.cysent.org/_next/image?url=%2Fimages%2FCySent.png&w=384&q=100" width="400"/>
+</p>
+CySent-SmolLM3-3B is a fine-tuned version of [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B), specifically adapted for cybersecurity instruction-following tasks. It was trained on a 20,000-sample subset of the [Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset](https://huggingface.co/datasets/Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset). This model aims to act as a knowledgeable assistant for a wide range of cybersecurity topics.
+It achieves the following results on the evaluation set:
+- **Loss:** 0.757
+- **Mean Token Accuracy:** 0.796
+## Intended uses
+This model is designed to assist with a variety of natural language cybersecurity tasks, including:
+-   Answering technical questions about security concepts.
+-   Explaining vulnerabilities, attack vectors, and defense mechanisms.
+-   Generating simple security-related scripts or commands (e.g., for network analysis or pentesting).
+-   Summarizing security logs, reports, or articles.
+-   Assisting in educational settings for cybersecurity students and professionals.
+It is intended as a **co-pilot or assistant** and not as a standalone, automated security tool.
+## Limitations
+-   **Not for Real-Time Threat Detection:** This model is not designed for or capable of real-time intrusion detection or automated threat response.
+-   **Potential for Hallucination:** Like all language models, it may generate incorrect, outdated, or completely fabricated information. Always verify critical information from authoritative sources.
+-   **Inherited Biases:** The model may inherit biases and limitations from its base model (SmolLM3-3B) and the fine-tuning dataset.
+-   **Knowledge Cutoff:** The model's knowledge is limited to its training data, so it may not be aware of the latest vulnerabilities or security trends.
+-   **Misuse Potential:** The model could potentially be used to generate malicious code or instructions for harmful purposes. Please use it responsibly and ethically.
+## How to use
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "RamzyBakir/CySent-SmolLM3-3B"
+# Load the model and tokenizer
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Create a prompt
+prompt = "### Instruction:\nExplain what a SQL injection attack is and provide a simple example of a vulnerable code snippet.\n\n### Response:\n"
+# Generate a response
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+output = model.generate(**inputs, max_new_tokens=250, do_sample=True, temperature=0.7, top_p=0.9)
+# Decode and print the result
+response = tokenizer.decode(output[0], skip_special_tokens=True)
+print(response)
+```
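The prompt above uses an `### Instruction:` / `### Response:` layout, and decoding `output[0]` returns the prompt together with the completion. The sketch below shows one way to build that prompt and keep only the text after the response header. The helper names `build_prompt` and `extract_response` are hypothetical and not part of this repository, and the layout is taken from the single example prompt above rather than from a documented template.

```python
# Hypothetical helpers; the header strings are copied from the example prompt above.
INSTRUCTION_HEADER = "### Instruction:\n"
RESPONSE_HEADER = "\n\n### Response:\n"


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Instruction/Response layout."""
    return f"{INSTRUCTION_HEADER}{instruction.strip()}{RESPONSE_HEADER}"


def extract_response(decoded: str) -> str:
    """Return the text after the first response header.

    If the header is missing, the decoded text is returned stripped,
    so the caller never loses output silently.
    """
    marker = RESPONSE_HEADER.strip()
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


# Stand-in for tokenizer.decode(output[0], skip_special_tokens=True)
prompt = build_prompt("Explain what a SQL injection attack is.")
decoded = prompt + "SQL injection is an attack that ..."
print(extract_response(decoded))
```

With the real model, `decoded` would be the string returned by `tokenizer.decode(...)` in the snippet above.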
+## Training procedure
+### Training hyperparameters
+The model was fine-tuned using Low-Rank Adaptation (LoRA) with the following configuration:
+**SFTConfig:**
+-   `max_length`: 2048
+-   `per_device_train_batch_size`: 8
+-   `gradient_accumulation_steps`: 2
+-   `learning_rate`: 1e-4
+-   `num_train_epochs`: 3
+-   `warmup_ratio`: 0.1
+-   `weight_decay`: 0.01
+-   `optim`: adamw_torch
+-   `bf16`: True
+-   `eval_strategy`: steps
+-   `eval_steps`: 200
+-   `save_steps`: 200
+-   `metric_for_best_model`: eval_loss
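To make the interaction of `learning_rate`, `warmup_ratio`, and the batch settings concrete, here is a minimal sketch of the resulting schedule. It rests on assumptions the card does not state: that the scheduler was left at the `Trainer` default (linear warmup followed by linear decay) and that the run lasted the 3200 steps reported under the training results. The function name `linear_warmup_decay_lr` is illustrative only.

```python
import math

PEAK_LR = 1e-4        # learning_rate from the list above
WARMUP_RATIO = 0.1    # warmup_ratio from the list above
TOTAL_STEPS = 3200    # assumption: step count reported in the training results


def linear_warmup_decay_lr(step: int,
                           peak_lr: float = PEAK_LR,
                           total_steps: int = TOTAL_STEPS,
                           warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup then linear decay to zero.

    Assumes the default linear scheduler; the card does not name one.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Sequences per optimizer step on one GPU:
# per_device_train_batch_size * gradient_accumulation_steps * number of devices
effective_batch_size = 8 * 2 * 1

for s in (0, 160, 320, 1600, 3200):
    print(s, linear_warmup_decay_lr(s))
print("effective batch size:", effective_batch_size)
```

If a different `lr_scheduler_type` was used, only the decay branch changes; the warmup length and effective batch size follow directly from the listed values.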
+**LoraConfig:**
+-   `r`: 16
+-   `lora_alpha`: 32
+-   `lora_dropout`: 0.05
+-   `task_type`: CAUSAL_LM
+-   `target_modules`: ["q_proj", "k_proj", "v_proj", "o_proj"]
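As a rough illustration of what this LoRA configuration implies, the sketch below counts the adapter parameters added to the four attention projections and computes the scaling factor `lora_alpha / r` that PEFT applies by default. The layer shapes are assumptions for illustration (hidden size 2048, 16 query heads, 4 key/value heads, head dimension 128, 36 layers) and should be checked against the base model's `config.json`. The function names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Default LoRA scaling applied to the low-rank update (alpha / r)."""
    return lora_alpha / r


def attention_lora_params(hidden: int, n_heads: int, n_kv_heads: int,
                          head_dim: int, n_layers: int, r: int) -> int:
    """Total adapter parameters for q_proj, k_proj, v_proj and o_proj."""
    q_out = n_heads * head_dim
    kv_out = n_kv_heads * head_dim
    per_layer = (
        lora_param_count(hidden, q_out, r)      # q_proj
        + lora_param_count(hidden, kv_out, r)   # k_proj
        + lora_param_count(hidden, kv_out, r)   # v_proj
        + lora_param_count(q_out, hidden, r)    # o_proj
    )
    return per_layer * n_layers


# Assumed shapes, not confirmed by this card.
total = attention_lora_params(hidden=2048, n_heads=16, n_kv_heads=4,
                              head_dim=128, n_layers=36, r=16)
print("adapter parameters (under the assumed shapes):", total)
print("scaling factor:", lora_scaling(lora_alpha=32, r=16))
```

Because only the attention projections are targeted, the MLP weights stay frozen. The adapter is therefore a small fraction of the base model's roughly 3B parameters.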
+### Training results
+The model was trained for 3200 steps on a single H200 GPU. The training and validation metrics progressed as follows:
+| Step | Training Loss | Validation Loss | Entropy  | Num Tokens | Mean Token Accuracy |
+|------|---------------|-----------------|----------|------------|---------------------|
+| 200  | 1.111500      | 1.045437        | 1.002200 | 2,182,437  | 0.740981            |
+| 400  | 0.975900      | 0.944684        | 0.917857 | 4,368,626  | 0.759094            |
+| 800  | 0.863500      | 0.860705        | 0.862549 | 8,721,104  | 0.775031            |
+| 1200 | 0.834900      | 0.816342        | 0.849365 | 13,096,717 | 0.784405            |
+| 1600 | 0.792200      | 0.794083        | 0.802182 | 17,452,772 | 0.788403            |
+| 2000 | 0.777900      | 0.779576        | 0.790627 | 21,807,624 | 0.791107            |
+| 2400 | 0.749800      | 0.771720        | 0.761689 | 26,151,814 | 0.792799            |
+| 2800 | 0.747800      | 0.762957        | 0.761588 | 30,504,962 | 0.794528            |
+| 3200 | 0.735800      | 0.757395        | 0.757575 | 34,860,059 | 0.795802            |
+The model achieved its best performance at the final step, with a validation loss of **0.757** and a mean token accuracy of **0.796**.
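One way to read these numbers is to convert the validation loss into perplexity and to estimate the average number of tokens per optimizer step. This is a sketch under two assumptions: the reported loss is a mean per-token cross-entropy in nats, and "Num Tokens" is a cumulative count of tokens seen so far. Neither is stated explicitly in this card.

```python
import math


def perplexity(mean_token_loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_token_loss)


# (step, validation loss) pairs copied from the table above
eval_log = [
    (200, 1.045437),
    (1600, 0.794083),
    (3200, 0.757395),
]

for step, loss in eval_log:
    print(f"step {step}: loss {loss:.3f} -> perplexity {perplexity(loss):.3f}")

# Average tokens per optimizer step, assuming "Num Tokens" is cumulative.
final_step, final_tokens = 3200, 34_860_059
tokens_per_step = final_tokens / final_step
print(f"about {tokens_per_step:,.0f} tokens per step")
```

Under these assumptions, the final validation loss of 0.757 corresponds to a perplexity of roughly 2.13 on the evaluation set.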
+---