Upload BuilderBrain small model

Files changed (9)

README.md +69 -0
model/config.json +52 -0
model/generation_config.json +9 -0
model/pytorch_model.bin +3 -0
model/pytorch_model.bin.index.json +10 -0
model/special_tokens_map.json +6 -0
model/tokenizer/merges.txt +1 -0
model/tokenizer/tokenizer_config.json +10 -0
model/tokenizer/vocab.json +1 -0

README.md ADDED

	@@ -0,0 +1,69 @@

+---
+language: en
+license: apache-2.0
+tags:
+- builderbrain
+- compositional-ai
+- grammar-constrained
+- pytorch
+- transformers
+model-index:
+- name: builderbrain-small
+  results: []
+---
+# BuilderBrain Small Model
+BuilderBrain is a dual-rail compositional AI system that extends pretrained transformers with learned composition blocks, grammar constraints, and executable plans.
+## Model Description
+This is a small-scale BuilderBrain model trained for compositional reasoning tasks.
+### Architecture
+- **Base Model**: GPT-2 based transformer
+- **Builder Rail**: Additional composition layer with discrete program skills
+- **Grammar Constraints**: CFG/PEG parsing for structured outputs
+- **Plan Validation**: DAG-based plan execution with precondition checking
+- **Multi-objective Training**: Lagrangian optimization with constraint satisfaction
+### Training
+- **Dataset**: Compositional reasoning tasks
+- **Loss Functions**: Multi-objective with grammar, plan, and reuse constraints
+- **Training Duration**: 50 epochs
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("builderbrain_small_1759327754")
+model = AutoModelForCausalLM.from_pretrained("builderbrain_small_1759327754")
+# Grammar-constrained generation
+input_text = "Generate a JSON API call"
+inputs = tokenizer(input_text, return_tensors="pt")
+# Plain sampling call; grammar constraints are implementation specific and are not applied here
+outputs = model.generate(**inputs, max_new_tokens=150)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+```
+## Limitations
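+- This is a mock export for demonstration purposes: the weights, `merges.txt`, and `vocab.json` files contain placeholder content
+- In production, models would be trained on domain-specific datasets
+- Grammar constraints and plan validation are not fully implemented in this export; a production release would implement them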
+## Citation
+```bibtex
+@misc{builderbrain_small,
+  title={BuilderBrain: Dual-Rail Compositional AI System},
+  author={BuilderBrain Team},
+  year={2024},
+  url={https://github.com/JacobFV/builderbrain}
+}
+```
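
The README's usage snippet leaves grammar-constrained generation as "implementation specific". One common way to realize it is logit masking: before picking the next token, every token the grammar forbids in the current state is set to negative infinity. The sketch below illustrates only that idea on a toy vocabulary. The function names, the vocabulary, and the set of allowed tokens are hypothetical and are not taken from the BuilderBrain code base.

```python
import math

def apply_grammar_mask(logits, allowed_token_ids):
    """Return a copy of logits with every token outside allowed_token_ids set to -inf."""
    allowed = set(allowed_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy vocabulary and scores (hypothetical, for illustration only).
vocab = ["{", "}", '"key"', ":", "hello"]
logits = [0.1, 0.3, 0.2, 0.0, 2.5]

# Hypothetical grammar state: at the start of a JSON object only "{" is legal.
masked = apply_grammar_mask(logits, allowed_token_ids=[0])
next_token = vocab[greedy_pick(masked)]
```

Without the mask the highest-scoring token is `hello`; with it, the only legal token `{` is chosen. A real implementation would derive the allowed set from a CFG/PEG parser state at each decoding step.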

model/config.json ADDED

	@@ -0,0 +1,52 @@

+{
+  "model_type": "builderbrain",
+  "scale": "small",
+  "builderbrain_version": "1.0.0",
+  "model": {
+    "type": "gpt2",
+    "name": "gpt2",
+    "hidden_size": 768,
+    "num_layers": 4,
+    "num_programs": 16,
+    "alpha_cap": 0.1
+  },
+  "constraints": {
+    "grammar": {
+      "enabled": true,
+      "target": 0.0,
+      "normalizer": "rank"
+    },
+    "graph2graph": {
+      "enabled": true,
+      "target": 0.2,
+      "normalizer": "rank"
+    },
+    "buildability": {
+      "enabled": true,
+      "target": 0.0,
+      "normalizer": "winsor"
+    },
+    "reuse": {
+      "enabled": true,
+      "target": 0.5,
+      "normalizer": "rank"
+    }
+  },
+  "training": {
+    "batch_size": 8,
+    "learning_rate": "5e-4",
+    "eta_lambda": "1e-2",
+    "lambda_max": 20.0,
+    "num_epochs": 50,
+    "save_every": 10
+  },
+  "data": {
+    "max_length": 512,
+    "vocab_size": 50257
+  },
+  "runtime": {
+    "max_generation_length": 100,
+    "temperature": 0.8,
+    "use_grammar_mask": true
+  }
+}
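
The `constraints` targets together with `eta_lambda` and `lambda_max` suggest a Lagrangian dual-ascent scheme, in which each constraint has a multiplier that grows while the constraint exceeds its target. The sketch below is an assumption about how these fields could be consumed, not the project's training loop: `dual_update` and the measured constraint values are hypothetical, and the `rank`/`winsor` normalizers are ignored. Note that `eta_lambda` (like `learning_rate`) is stored as a string in the config, so it has to be converted with `float()` before use.

```python
import json

# Excerpt of model/config.json, reduced to the fields used here.
CONFIG_EXCERPT = """
{
  "constraints": {
    "grammar": {"enabled": true, "target": 0.0},
    "reuse": {"enabled": true, "target": 0.5}
  },
  "training": {"eta_lambda": "1e-2", "lambda_max": 20.0}
}
"""

def dual_update(lam, value, target, eta, lam_max):
    """One projected dual-ascent step, clipped to the interval [0, lam_max]."""
    return min(max(lam + eta * (value - target), 0.0), lam_max)

cfg = json.loads(CONFIG_EXCERPT)
eta = float(cfg["training"]["eta_lambda"])      # stored as the string "1e-2"
lam_max = float(cfg["training"]["lambda_max"])

# Hypothetical measured constraint losses for one training step.
measured = {"grammar": 0.3, "reuse": 0.4}

lambdas = {name: 0.0 for name in cfg["constraints"]}
for name, spec in cfg["constraints"].items():
    if spec["enabled"]:
        lambdas[name] = dual_update(
            lambdas[name], measured[name], spec["target"], eta, lam_max
        )
```

With these made-up values the grammar multiplier rises because the loss is above its target, while the reuse multiplier stays at zero because the loss is already below its target.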

model/generation_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "max_new_tokens": 150,
+  "temperature": 0.7,
+  "top_p": 0.9,
+  "do_sample": true,
+  "pad_token_id": 50256,
+  "bos_token_id": 50256,
+  "eos_token_id": 50256
+}

model/pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c532b929bb974691b379361fc095c803a9e50cb5052abe694ed5238e9bbf19d7
+size 18
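
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real object, and its size in bytes. A small parser makes the mock nature of this upload easy to check, since an 18-byte object cannot hold transformer weights. The helper name `parse_lfs_pointer` is hypothetical; the pointer text is copied from the diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c532b929bb974691b379361fc095c803a9e50cb5052abe694ed5238e9bbf19d7
size 18
"""

info = parse_lfs_pointer(POINTER)
is_placeholder = info["size"] < 1024  # arbitrary threshold chosen for this sketch
```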

model/pytorch_model.bin.index.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "metadata": {
+    "total_size": 18
+  },
+  "weight_map": {
+    "model.embed_tokens.weight": "pytorch_model.bin",
+    "model.layers.0.weight": "pytorch_model.bin",
+    "lm_head.weight": "pytorch_model.bin"
+  }
+}
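
A quick way to sanity-check an index file like this one is to list the shard files it references and compare the tensor names with what the base architecture would produce. The sketch below only reads the JSON shown above. The prefix check is an illustrative assumption based on the `transformer.` prefix that GPT-2 checkpoints in `transformers` normally use for their backbone tensors, and it is not part of any BuilderBrain tooling.

```python
import json

# Contents of model/pytorch_model.bin.index.json as shown in the diff.
INDEX = """
{
  "metadata": {"total_size": 18},
  "weight_map": {
    "model.embed_tokens.weight": "pytorch_model.bin",
    "model.layers.0.weight": "pytorch_model.bin",
    "lm_head.weight": "pytorch_model.bin"
  }
}
"""

index = json.loads(INDEX)

# Every distinct shard file that the weight map points to.
shards = sorted(set(index["weight_map"].values()))

# Tensor names that do not carry the usual GPT-2 backbone prefix (lm_head excluded).
non_gpt2_names = [
    name for name in index["weight_map"]
    if not name.startswith("transformer.") and not name.startswith("lm_head.")
]
```

Here all three tensors map to a single shard, and the two backbone names do not follow the GPT-2 naming pattern, which is consistent with the README's statement that this is a mock export.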

model/special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "eos_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>",
+  "additional_special_tokens": []
+}

model/tokenizer/merges.txt ADDED

	@@ -0,0 +1 @@


+# Mock merges file

model/tokenizer/tokenizer_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "tokenizer_class": "GPT2Tokenizer",
+  "model_max_length": 1024,
+  "padding_side": "right",
+  "truncation_side": "right",
+  "pad_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "bos_token": "<|endoftext|>"
+}

model/tokenizer/vocab.json ADDED

	@@ -0,0 +1 @@


+{"mock": "vocabulary"}