m4vic committed on
Commit e1c5caa · verified · 1 Parent(s): a618341

Update README.md

Files changed (1)
  1. README.md +51 -34
README.md CHANGED
@@ -6,61 +6,78 @@ sdk: static
  pinned: false
  ---
  
- # Neuralchemy
+ # Neuralchemy Research
  
- **AI Security Prompt Defense LLM Safety**
+ **AI Security · Autonomous Systems · LLM Safety**
  
- Building secure, reliable AI systems focused on prompt security, adversarial robustness, and practical safety tooling.
+ Independent research lab building open datasets,
+ models, and frameworks for LLM security and
+ autonomous evaluation.
  
  ---
  
- ## Featured Project — PromptShield & Threat Matrix
- A comprehensive prompt injection and adversarial intent detection framework, classifying malicious jailbreak patterns across real-world and massive synthetic attack typologies.
+ ## Research Papers
  
- ### Core Resources
+ **AI In The Loop (AITL): A Systems Taxonomy
+ for Closed-Loop Autonomous Evaluation**
+ Sanskar Jajoo · Neuralchemy Labs · 2026
+ [zenodo.org/records/19551173](https://zenodo.org/records/19551173)
  
- * **SOTA Datasets:**
- [neuralchemy/prompt-injection-Threat-Matrix](https://huggingface.co/datasets/neuralchemy/prompt-injection-Threat-Matrix)
- A highly curated, leakage-free classification dataset mapping 32,000+ entries across a 5-dimensional security ontology (Intent, Technique, Severity).
+ **The Autonomous Sunk-Cost Fallacy: Stopping
+ Failures and Meta-Reasoning in LLMs Deployed
+ within AEOS**
+ Sanskar Jajoo · Neuralchemy Labs · 2026
+ [zenodo.org/records/19846960](https://zenodo.org/records/19846960)
  
- [neuralchemy/prompt-injection-dataset](https://huggingface.co/datasets/neuralchemy/prompt-injection-dataset)
- 6000+ prompt injection and benign samples collected from realistic attack scenarios.
-
- * **DeBERTa Fine-Tuned Model:** [neuralchemy/prompt-injection-deberta](https://huggingface.co/neuralchemy/prompt-injection-deberta)
- Transformer-based prompt injection classifier.
+ ---
  
- * **DistilBERT Base Model:** [neuralchemy/distilbert-base-threat-matrix](https://huggingface.co/neuralchemy/distilbert-base-threat-matrix)
- A 99.4% F1-scoring Transformer defense gateway, optimized for high-speed, accurate prompt intent gating.
+ ## Datasets
  
- * **Classical ML Models:** [neuralchemy/prompt-injection-detector](https://huggingface.co/neuralchemy/prompt-injection-detector)
- Ultra-lightweight machine learning classifiers (RF, LR) for legacy/offline prompt risk detection.
+ **Prompt Injection Threat Matrix**
+ 32,320 samples · 7 intent classes ·
+ 10 severity levels · Full threat schema
+ [View Dataset](https://huggingface.co/datasets/neuralchemy/prompt-injection-Threat-Matrix)
  
- * **Live Demo Space:** [Prompt-injection-DeBERTa](https://huggingface.co/spaces/neuralchemy/Prompt-injection-DeBERTa)
- Interactive inference demo for prompt safety classification.
+ **Prompt Injection Dataset**
+ 6,000+ samples · Benign vs malicious ·
+ Real-world attack scenarios
+ [View Dataset](https://huggingface.co/datasets/neuralchemy/prompt-injection-dataset)
  
  ---
  
- ## Research & Architecture
+ ## Models
  
- * **AI In The Loop (AITL):**
- Pioneering an inherently secure, multi-agent orchestration loop designed strictly to mitigate Prompt Injection (PI) bypass methodologies, enforce JSON-structured constraints, and evaluate autonomous systemic risks.
- https://zenodo.org/records/19551173
-
- **The Autonomous Sunk-Cost Fallacy: Stopping Failures and Meta-Reasoning in LLMs Deployed within the Autonomous Empirical Optimization System (AEOS)**
- https://zenodo.org/records/19846960
- ---
+ **DistilBERT Threat Matrix Classifier**
+ 99.4% F1 · Prompt intent classification ·
+ High-speed inference
+ [View Model](https://huggingface.co/neuralchemy/distilbert-base-threat-matrix)
  
- ## Mission
+ **DeBERTa Prompt Injection Classifier**
+ Transformer-based injection detection
+ [View Model](https://huggingface.co/neuralchemy/prompt-injection-deberta)
  
- Advancing AI security through enterprise open-source datasets, robust model deployment, and adversarial safety research.
+ **Classical ML Detector**
+ Lightweight RF/LR classifiers for
+ offline/legacy deployment
+ [View Model](https://huggingface.co/neuralchemy/prompt-injection-detector)
  
  ---
  
- ## Connectivity
-
- * **Website:** https://www.neuralchemy.in
+ ## Live Demo
  
+ Try our prompt injection classifier:
+ [Prompt-injection-DeBERTa Space](https://huggingface.co/spaces/neuralchemy/Prompt-injection-DeBERTa)
  
  ---
  
- *Building safer AI systems through open security research.* 🚀
+ ## About
+
+ NeuralAlchemy is an independent AI security
+ research lab based in India. We build open
+ datasets, train security models, and publish
+ research on LLM behavioral failures and
+ autonomous evaluation systems.
+
+ **Website:** neuralchemy.in
+ **GitHub:** github.com/m4vic
+ **Contact:** Via GitHub or neuralchemy.in