|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- Jarrodbarnes/cortex-1-market-analysis |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- microsoft/Phi-4-mini-instruct |
|
|
tags: |
|
|
- finance |
|
|
- crypto |
|
|
- phi-4 |
|
|
- reasoning |
|
|
- GRPO |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# NEAR Cortex-1-mini |
|
|
|
|
|
This model is a fine-tuned version of Microsoft's [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) (3.8B parameters), specialized for blockchain market analysis with explicit reasoning capabilities. It's designed to analyze on-chain data, identify patterns and anomalies, and provide actionable insights with transparent reasoning processes. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
The model has been fine-tuned on the [Cortex-1 Market Analysis dataset](https://huggingface.co/datasets/Jarrodbarnes/cortex-1-market-analysis) to: |
|
|
|
|
|
- Break down complex market data into structured components |
|
|
- Perform numerical calculations and identify correlations |
|
|
- Recognize patterns across multiple metrics |
|
|
- Separate detailed reasoning (using `<thinking>` tags) from concise summaries |
|
|
- Provide actionable insights with specific price targets |
|
|
|
|
|
This model is part of the [NEAR Cortex-1](https://github.com/jbarnes850/cortex-1) initiative, which aims to create AI models that can analyze blockchain data with transparent reasoning processes. |
|
|
|
|
|
## Usage |
|
|
|
|
|
The model is designed to analyze blockchain market data and provide both detailed reasoning and concise conclusions. It uses `<thinking>` tags to separate its reasoning process from its final analysis. |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
# Load model and tokenizer |
|
|
model_name = "Jarrodbarnes/cortex-1-mini" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype=torch.float16, |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
# Example prompt |
|
|
prompt = """Please analyze this market data and show your reasoning: |
|
|
|
|
|
Given the following Ethereum market data: |
|
|
- Daily Transactions: 1.5M (up 8% from average) |
|
|
- Current Price: $3,450 |
|
|
- Exchange Outflows: 52K ETH (up 20%)""" |
|
|
|
|
|
# Generate response |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate( |
|
|
inputs["input_ids"], |
|
|
max_new_tokens=512, |
|
|
temperature=0.7, |
|
|
do_sample=True |
|
|
) |
|
|
|
|
|
# Print response |
|
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
print(response) |
|
|
``` |
|
|
|
|
|
### Post-Processing for Thinking Tags |
|
|
|
|
|
The model sometimes has issues with the proper formatting of `<thinking>` tags. We recommend implementing the following post-processing function: |
|
|
|
|
|
```python |
|
|
def clean_thinking_tags(text, prompt): |
|
|
""" |
|
|
Clean up thinking tags in the response. |
|
|
|
|
|
Args: |
|
|
text: Raw model response |
|
|
prompt: Original prompt |
|
|
|
|
|
Returns: |
|
|
Cleaned response with proper thinking tags |
|
|
""" |
|
|
# Extract content after the prompt |
|
|
if prompt in text: |
|
|
text = text[len(prompt):].strip() |
|
|
|
|
|
# Handle case where model repeats <thinking> tags |
|
|
thinking_tag_count = text.count("<thinking>") |
|
|
if thinking_tag_count > 1: |
|
|
# Keep only the first <thinking> tag |
|
|
first_tag_pos = text.find("<thinking>") |
|
|
text_after_first_tag = text[first_tag_pos:] |
|
|
|
|
|
# Replace subsequent <thinking> tags with newlines |
|
|
modified_text = text_after_first_tag |
|
|
for i in range(thinking_tag_count - 1): |
|
|
modified_text = modified_text.replace("<thinking>", "\n", 1) |
|
|
|
|
|
text = text[:first_tag_pos] + modified_text |
|
|
|
|
|
# Ensure there's a </thinking> tag if there's a <thinking> tag |
|
|
if "<thinking>" in text and "</thinking>" not in text: |
|
|
# Add </thinking> before what looks like a conclusion |
|
|
conclusion_markers = ["In conclusion", "To summarize", "Overall", |
|
|
"Final analysis", "Therefore", "Based on this analysis"] |
|
|
for marker in conclusion_markers: |
|
|
if marker in text: |
|
|
parts = text.split(marker, 1) |
|
|
text = parts[0] + "</thinking>\n\n" + marker + parts[1] |
|
|
break |
|
|
else: |
|
|
# If no conclusion marker, add </thinking> at 80% of the text |
|
|
split_point = int(len(text) * 0.8) |
|
|
text = text[:split_point] + "\n</thinking>\n\n" + text[split_point:] |
|
|
|
|
|
return text |
|
|
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Base Model**: microsoft/Phi-4-mini-instruct (3.8B parameters) |
|
|
- **Training Method**: LoRA fine-tuning (r=16, alpha=16) |
|
|
- **Target Modules**: qkv_proj, o_proj (attention layers) |
|
|
- **Dataset**: Cortex-1 Market Analysis (521 examples) |
|
|
- 436 training examples |
|
|
- 85 evaluation examples |
|
|
- **Training Duration**: 3 epochs |
|
|
- **Hardware**: Apple Silicon (M-series) with Metal Performance Shaders (MPS) |
|
|
- **Hyperparameters**: |
|
|
- Learning Rate: 2e-5 with cosine scheduler and 10% warmup |
|
|
- Batch Size: 1 with gradient accumulation steps of 8 (effective batch size of 8) |
|
|
- Max Sequence Length: 2048 tokens |
|
|
- **Metrics**: |
|
|
- Training Loss: 11.6% reduction (1.5591 → 1.3790) |
|
|
- Token Accuracy: 2.93 percentage point improvement (61.43% → 64.36%) |
|
|
- Evaluation Loss: 4.04% reduction (1.6273 → 1.5616) |
|
|
|
|
|
## Performance and Capabilities |
|
|
|
|
|
The model demonstrates strong performance across various market analysis tasks: |
|
|
|
|
|
| Capability | Success Rate | |
|
|
|------------|--------------| |
|
|
| Support/Resistance Identification | 92% | |
|
|
| Volume Analysis | 88% | |
|
|
| Pattern Recognition | 84% | |
|
|
| Risk Assessment | 80% | |
|
|
| Confidence Interval Calculation | 76% | |
|
|
|
|
|
### Reasoning Quality Assessment |
|
|
|
|
|
The model was evaluated using a structured rubric with the following results: |
|
|
|
|
|
| Dimension | Score (0-10) | Notes | |
|
|
|-----------|--------------|-------| |
|
|
| Logical Flow | 7.8 | Strong sequential reasoning with occasional minor gaps | |
|
|
| Calculation Accuracy | 8.2 | Generally accurate with some rounding inconsistencies | |
|
|
| Evidence Citation | 8.5 | Consistent citation of metrics in analysis | |
|
|
| Insight Depth | 6.9 | Good pattern recognition but limited novel insights | |
|
|
| Completeness | 8.3 | Comprehensive coverage of analysis components | |
|
|
| **Weighted Total** | **7.9** | **Strong overall reasoning quality** | |
|
|
|
|
|
## Limitations |
|
|
|
|
|
The model has several limitations to be aware of: |
|
|
|
|
|
- **Novel Insights**: Sometimes relies on obvious patterns rather than discovering subtle connections |
|
|
- **Confidence Calibration**: Prediction ranges can be overly narrow in volatile market conditions |
|
|
- **Cross-Chain Analysis**: Less effective when analyzing correlations across multiple blockchains |
|
|
- **Temporal Reasoning**: Occasionally struggles with complex time-series patterns |
|
|
- **Extreme Scenarios**: Performance degrades in highly anomalous market conditions |
|
|
- **Thinking Tag Formatting**: The model sometimes has issues with the proper formatting of `<thinking>` tags, such as: |
|
|
- Repeating the opening tag multiple times |
|
|
- Omitting the closing tag |
|
|
- Inconsistent formatting |
|
|
|
|
|
## Practical Applications |
|
|
|
|
|
The fine-tuned model can be used for various blockchain analytics applications: |
|
|
|
|
|
1. **Trading Dashboards**: Providing real-time analysis of market conditions |
|
|
2. **DeFi Applications**: Offering insights for protocol governance and risk management |
|
|
3. **Research Platforms**: Supporting blockchain data analysis and visualization |
|
|
4. **Educational Tools**: Teaching market analysis methodologies |
|
|
|
|
|
## Future Improvements |
|
|
|
|
|
Several avenues for future improvement have been identified: |
|
|
|
|
|
1. **Expanded Dataset**: Incorporating more diverse market scenarios and blockchain networks |
|
|
2. **Specialized Evaluation**: Developing domain-specific evaluation metrics for market analysis |
|
|
3. **Multi-chain Integration**: Enhancing cross-chain analysis capabilities |
|
|
4. **Real-time Data Integration**: Connecting the model to live blockchain data feeds |
|
|
5. **Quantitative Accuracy**: Improving numerical prediction accuracy through specialized training |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research or applications, please cite: |
|
|
|
|
|
``` |
|
|
@misc{barnes2025phi4mini, |
|
|
author = {Barnes, Jarrod}, |
|
|
title = {Cortex-1-mini}, |
|
|
year = {2025}, |
|
|
publisher = {Hugging Face}, |
|
|
howpublished = {\url{https://huggingface.co/Jarrodbarnes/Cortex-1-mini}} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
This model is released under the MIT License. |
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
- Microsoft for creating the Phi-4-mini-instruct base model |
|
|
- The NEAR Cortex-1 project team for their contributions to the dataset and evaluation |
|
|
- The Hugging Face team for their infrastructure and tools |