BeanCounter Collection
An open dataset containing more than 150B tokens of low-toxicity and business-oriented text.
How to use bradfordlevy/phi-1_5-bc-cp with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="bradfordlevy/phi-1_5-bc-cp")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("bradfordlevy/phi-1_5-bc-cp")
model = AutoModelForCausalLM.from_pretrained("bradfordlevy/phi-1_5-bc-cp")
How to use bradfordlevy/phi-1_5-bc-cp with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bradfordlevy/phi-1_5-bc-cp"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "bradfordlevy/phi-1_5-bc-cp",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, run the model with Docker:
docker model run hf.co/bradfordlevy/phi-1_5-bc-cp
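The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM or any other library, and the default `base_url` assumes the server is running locally on port 8000 as in the command above.

```python
import json
import urllib.request

MODEL_ID = "bradfordlevy/phi-1_5-bc-cp"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint.

    Hypothetical helper: it mirrors the JSON body of the curl example.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # OpenAI-compatible completion responses carry the text in choices[i].text.
    return body["choices"][0]["text"]


# Example (requires a running server):
# print(complete("Once upon a time,", max_tokens=64))
```

The same request shape applies to any server exposing this endpoint; pass a different `base_url` to target another host or port.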
How to use bradfordlevy/phi-1_5-bc-cp with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "bradfordlevy/phi-1_5-bc-cp" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "bradfordlevy/phi-1_5-bc-cp",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "bradfordlevy/phi-1_5-bc-cp" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "bradfordlevy/phi-1_5-bc-cp",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use bradfordlevy/phi-1_5-bc-cp with Docker Model Runner:
docker model run hf.co/bradfordlevy/phi-1_5-bc-cp
This model was created from Microsoft's Phi-1.5 model by continued pretraining on the BeanCounter dataset. Full details of the training process are available in Wang and Levy (2024). The model has not undergone any safety checks or alignment, so it should be used for research purposes only.
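Because the model is a continued-pretraining checkpoint without alignment, it is best treated as a plain text-completion model. The following is a minimal sketch of generating a continuation with Transformers; the helper names `generation_kwargs` and `generate`, the default sampling settings, and the example prompt are illustrative assumptions, not settings taken from the paper.

```python
MODEL_ID = "bradfordlevy/phi-1_5-bc-cp"


def generation_kwargs(max_new_tokens=64, temperature=0.5):
    """Assemble keyword arguments for model.generate (hypothetical helper).

    Sampling is enabled only for a positive temperature; otherwise the
    library's default greedy decoding is used.
    """
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature > 0:
        kwargs.update(do_sample=True, temperature=temperature)
    return kwargs


def generate(prompt, **overrides):
    """Load the model and return the prompt followed by its continuation."""
    # Imported here so the helper above can be used without loading the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,  # avoids a missing-pad-token warning
        **generation_kwargs(**overrides),
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example (downloads the model weights on first use; the prompt is illustrative):
# print(generate("The company's revenue for the fiscal year", max_new_tokens=32))
```

Outputs are unfiltered, so they should be reviewed before any use beyond research.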
If you use this model in your work, please cite us:
@inproceedings{
wang2024beancounter,
title={BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text},
author={Siyan Wang and Bradford Levy},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
url={https://openreview.net/forum?id=HV5JhUZGpP}
}