---
language:
- en
tags:
- regression
- similarity
- sql
- natural-language
- reward-model
license: mit
datasets:
- custom
metrics:
- mse
- mae
- rmse
model-index:
- name: BERT Reward Model for CoT Filtering
  results:
  - task:
      type: regression
      name: Similarity Score Prediction
    dataset:
      name: Custom CoT Dataset
      type: custom
    metrics:
    - type: mse
      value: 0.0238
    - type: mae
      value: 0.1229
    - type: rmse
      value: 0.1543
---

# BERT Reward Model for CoT Filtering

A BERT-based regression model fine-tuned to predict similarity scores between SQL queries, reasoning chains (Chain-of-Thought), and natural language descriptions.

## Model Description

This model is based on `bert-base-uncased` and has been fine-tuned for regression to predict similarity scores in the range [0, 1].

The model takes as input a concatenation of:
- SQL query
- Reasoning/Chain-of-Thought explanation
- Predicted natural language description

It outputs a similarity score indicating how well the predicted NL matches the ground truth.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DarianNLP/bert_sequel_beagles")
model = AutoModelForSequenceClassification.from_pretrained(
    "DarianNLP/bert_sequel_beagles",
    num_labels=1,
    problem_type="regression"
)
model.eval()

# Prepare input
sql = "SELECT movie_title FROM movies WHERE movie_release_year = 1945"
reasoning = "think: The SQL selects the movie title..."
predicted_nl = "What was the most popular movie released in 1945?"
input_text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {predicted_nl}"

# Tokenize and predict
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# Apply sigmoid to map the raw output into the [0, 1] score range
similarity_score = torch.sigmoid(outputs.logits).item()
print(f"Predicted similarity: {similarity_score:.3f}")
```

## Training Details

- **Base Model**: bert-base-uncased
- **Training Dataset**: Custom CoT dataset with corruptions (7,342 examples)
- **Train/Val/Test Split**: 75% / 12.5% / 12.5%
- **Training Loss**: MSE (Mean Squared Error)
- **Evaluation Metrics**:
  - MSE: 0.0238
  - MAE: 0.1229
  - RMSE: 0.1543

## Limitations

- Maximum input length: 512 tokens (BERT's limit); longer inputs are truncated
- Trained on a specific domain (SQL to NL translation with CoT)
- Performance may vary on out-of-domain data

## Citation

If you use this model, please cite:

```bibtex
@misc{bert_cot_reward_model,
  title={BERT Reward Model for Chain-of-Thought Filtering},
  author={Darian Lee},
  year={2025},
}
```
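
## Filtering Sketch

As a rough illustration of how the similarity score from the Usage section might drive CoT filtering, the sketch below keeps only candidates whose score clears a threshold. The helper names (`build_input`, `filter_candidates`), the `score_fn` callable, and the 0.5 threshold are assumptions made for illustration and are not part of the released model. The stub scorer stands in for the model call and returns made-up values.

```python
from typing import Callable, Dict, List


def build_input(sql: str, reasoning: str, predicted_nl: str) -> str:
    """Format one candidate the same way as the Usage example."""
    return f"SQL: {sql}\nReasoning: {reasoning}\nNL: {predicted_nl}"


def filter_candidates(
    candidates: List[Dict[str, str]],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cutoff; tune on validation data
) -> List[Dict[str, str]]:
    """Return the candidates whose predicted similarity is >= threshold."""
    kept = []
    for cand in candidates:
        text = build_input(cand["sql"], cand["reasoning"], cand["predicted_nl"])
        if score_fn(text) >= threshold:
            kept.append(cand)
    return kept


# Stub scorer with made-up scores, standing in for the model call
# (tokenize -> model -> sigmoid) shown in the Usage section.
def stub_score(text: str) -> float:
    return 0.9 if "released in 1945" in text else 0.1


candidates = [
    {
        "sql": "SELECT movie_title FROM movies WHERE movie_release_year = 1945",
        "reasoning": "think: The SQL selects the movie title...",
        "predicted_nl": "Which movies were released in 1945?",
    },
    {
        "sql": "SELECT movie_title FROM movies WHERE movie_release_year = 1945",
        "reasoning": "think: The SQL selects the movie title...",
        "predicted_nl": "How many directors are there?",
    },
]

kept = filter_candidates(candidates, stub_score)
print(f"Kept {len(kept)} of {len(candidates)} candidates")
```

In practice, `score_fn` would wrap the tokenizer and model call from the Usage section, and the threshold would be chosen on held-out data rather than fixed in advance.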