How to use with the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="sjiang1/codecse")

# Load the model directly
# Note: GraphCodeBERTForCL is not part of the transformers library itself;
# it is defined in the CodeCSE repository (see below)
model = GraphCodeBERTForCL.from_pretrained("sjiang1/codecse", dtype="auto")
Model Card for CodeCSE

CodeCSE is a simple pre-trained model that produces sentence embeddings for code and comments, trained with contrastive learning. It was pretrained on CodeSearchNet.
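Contrastive pretraining pulls matching code/comment pairs together in embedding space and pushes mismatched pairs apart. A pure-Python sketch of an InfoNCE-style objective over a batch similarity matrix follows; the actual loss, temperature, and batching used by CodeCSE are assumptions here, so see the repository for the real implementation:

```python
import math

def info_nce_loss(sim, tau=0.05):
    """InfoNCE-style contrastive loss over a batch.

    sim[i][j] is the similarity between NL embedding i and code
    embedding j; diagonal entries are the matching (positive) pairs.
    tau is a temperature hyperparameter (the value here is an assumption).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / tau for s in sim[i]]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # negative log-softmax of the positive pair
    return total / n
```

The loss is lowest when each diagonal similarity dominates its row, which is what drives a comment and its matching code toward nearby embeddings.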

To use this pretrained model, please clone the CodeCSE repository (https://github.com/emu-se/CodeCSE), which provides GraphCodeBERTForCL and the other dependencies.

Detailed instructions are listed in the repository's README.md. Overall, you will need:

  1. GraphCodeBERT (CodeCSE uses GraphCodeBERT's input format for code)
  2. GraphCodeBERTForCL defined in codecse/codecse
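Since GraphCodeBERTForCL is not distributed as a package, the cloned checkout has to be importable. One way is to put it on sys.path before importing; the directory name below follows the codecse/codecse path in item 2, but the exact layout is an assumption and the repository's README.md is authoritative:

```python
import sys

# Make the cloned CodeCSE checkout importable; "CodeCSE/codecse" follows
# the codecse/codecse path mentioned above but is otherwise an assumption
sys.path.insert(0, "CodeCSE/codecse")
```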

Inference example

NL input example: example_nl.json

{
    "original_string": "", 
    "docstring_tokens": ["Save", "model", "to", "a", "pickle", "located", "at", "path"], 
    "url": "https://github.com/openai/baselines/blob/3301089b48c42b87b396e246ea3f56fa4bfc9678/baselines/deepq/deepq.py#L55-L72"
}
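The helper functions used in the snippet below ship with the repository's complete example. For illustration only, load_example can be assumed to simply read one JSON example like the one above from disk; this sketch is an assumption, not the repository's actual code:

```python
import json

def load_example(path):
    # Assumed behavior: read a single JSON example (e.g. example_nl.json)
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```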

Code snippet to get the embedding of an NL document (link to complete code):

import torch

# Tokenize the NL example and run the model in "nl" sentence-embedding mode
nl_json = load_example("example_nl.json")
batch = prepare_inputs(nl_json, tokenizer, args)
nl_inputs = batch[3]
with torch.no_grad():
    nl_vec = model(input_ids=nl_inputs, sent_emb="nl")[1]
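Once NL and code embeddings are computed, code search reduces to ranking candidates by similarity. A minimal pure-Python sketch using cosine similarity follows; in practice the CodeCSE embeddings are torch tensors, and the plain-list version here is just for illustration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(nl_vec, code_vecs):
    # Return indices of code embeddings sorted most-similar first
    sims = [cosine(nl_vec, c) for c in code_vecs]
    return sorted(range(len(code_vecs)), key=lambda i: sims[i], reverse=True)
```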