Text Classification
Transformers
PyTorch
JAX
Safetensors
code
English
roberta
text-embeddings-inference
Update README.md
README.md
CHANGED
@@ -102,17 +102,17 @@ Using model with Jax and Pytorch
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification, FlaxAutoModelForSequenceClassification
 
-#Load
+#Load model with jax
 model = FlaxAutoModelForSequenceClassification.from_pretrained("Fsoft-AIC/Codebert-docstring-inconsistency")
 
-#Load
+#Load model with torch
 model = AutoModelForSequenceClassification.from_pretrained("Fsoft-AIC/Codebert-docstring-inconsistency")
 ```
 
 ## Limitations
-This model is trained on
+This model is trained on 5M subset of The Vault in a self-supervised manner. Since the negative samples are generated artificially, the model's ability to identify instances that require a strong semantic understanding between the code and the docstring might be restricted.
 
-It is hard to evaluate the model due to the unavailable labeled datasets. ChatGPT is adopted as a reference to measure the correlation between the model and ChatGPT's scores. However, the result could be influenced by ChatGPT's potential biases and ambiguous conditions. Therefore, we recommend having human labeling dataset and
+It is hard to evaluate the model due to the unavailable labeled datasets. ChatGPT is adopted as a reference to measure the correlation between the model and ChatGPT's scores. However, the result could be influenced by ChatGPT's potential biases and ambiguous conditions. Therefore, we recommend having human labeling dataset and fine-tune this model to achieve the best result.
 
 ## Additional information
 ### Licensing Information
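
The Limitations text in this change describes evaluating the model by correlating its scores with reference scores (ChatGPT's, or preferably human labels). The README does not say which correlation measure was used, so the following is only a minimal sketch that assumes Spearman rank correlation. The function names and both score lists are hypothetical placeholders, not data or code from the model authors.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for five (code, docstring) pairs.
# model_scores: assumed model probability that the pair is consistent.
# reference_scores: assumed reference ratings on an arbitrary 1-5 scale.
model_scores = [0.91, 0.15, 0.67, 0.40, 0.05]
reference_scores = [5, 1, 4, 2, 2]

print(f"Spearman correlation: {spearman(model_scores, reference_scores):.3f}")
```

A rank-based measure is used here because model probabilities and reference ratings may sit on different scales. The same comparison could be repeated against a human-labeled set once one is available, as the README recommends.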