How to use kaushik-harsh-99/Code-Lang-Classifier with fastText:

from huggingface_hub import hf_hub_download
import fasttext

model = fasttext.load_model(hf_hub_download("kaushik-harsh-99/Code-Lang-Classifier", "model.bin"))
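Once the model is loaded, a prediction call might look like the sketch below. It assumes the model was trained on text normalized the same way as in the dataset conversion script (label prefix masked, whitespace collapsed), which this page does not confirm. The helper names `normalize_snippet`, `clean_label`, `predict_language`, and `load_classifier` are hypothetical and not part of the repository. Collapsing whitespace is useful in any case, since fastText's `predict` rejects input that contains newlines.

```python
def normalize_snippet(code: str) -> str:
    """Mirror the conversion script: mask the label prefix, collapse whitespace."""
    code = code.replace("__label__", "__lbl__")
    return " ".join(code.split())


def clean_label(raw_label: str) -> str:
    """Strip fastText's '__label__' prefix from a predicted label."""
    prefix = "__label__"
    return raw_label[len(prefix):] if raw_label.startswith(prefix) else raw_label


def predict_language(model, code: str, k: int = 3):
    """Return up to k (label, probability) pairs for one source snippet."""
    labels, probs = model.predict(normalize_snippet(code), k=k)
    return [(clean_label(label), float(prob)) for label, prob in zip(labels, probs)]


def load_classifier():
    """Download model.bin from the Hub and load it with fastText."""
    # Local imports so the helpers above work without these packages installed.
    from huggingface_hub import hf_hub_download
    import fasttext

    path = hf_hub_download("kaushik-harsh-99/Code-Lang-Classifier", "model.bin")
    return fasttext.load_model(path)


if __name__ == "__main__":
    classifier = load_classifier()
    sample = "def add(a, b):\n    return a + b\n"
    for label, prob in predict_language(classifier, sample):
        print(f"{label}: {prob:.3f}")
```

The label strings returned depend on how the training data was labeled, so no expected output is shown here.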
import json

# Map each JSONL split to the fastText-format text file to write.
FILES = {
    "dataset/train.jsonl": "fasttext_train.txt",
    "dataset/validation.jsonl": "fasttext_validation.txt",
    "dataset/test.jsonl": "fasttext_test.txt",
}

for input_file, output_file in FILES.items():
    print(f"Converting {input_file} -> {output_file}")
    count = 0
    with open(input_file, "r", encoding="utf-8") as fin, \
         open(output_file, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines, which json.loads would reject
            row = json.loads(line)
            label = str(row["label"]).strip()
            text = str(row["content"])
            # Mask any literal label prefix inside the code sample so
            # fastText does not read it as an extra label.
            text = text.replace("__label__", "__lbl__")
            # fastText expects one example per line: collapse all whitespace.
            text = " ".join(text.split())
            fout.write(f"__label__{label} {text}\n")
            count += 1
    print(f"Saved {count:,} samples")

print("\nDone.")
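After the conversion, it can be worth checking that every output line has the `__label__<label> <text>` shape and seeing how the classes are distributed. The following is a minimal sketch of such a check. The functions `parse_fasttext_line` and `label_counts` are hypothetical helpers, not part of the repository. The sketch also assumes labels contain no internal whitespace, since the script only trims the ends of each label.

```python
from collections import Counter

LABEL_PREFIX = "__label__"


def parse_fasttext_line(line: str):
    """Split one '__label__<label> <text>' line into (label, text)."""
    head, _, text = line.rstrip("\n").partition(" ")
    if not head.startswith(LABEL_PREFIX) or len(head) == len(LABEL_PREFIX):
        raise ValueError(f"missing or empty label: {line[:40]!r}")
    return head[len(LABEL_PREFIX):], text


def label_counts(path: str) -> Counter:
    """Count examples per label in a fastText-format file."""
    counts = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # ignore blank lines
                label, _ = parse_fasttext_line(line)
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # File name taken from the conversion script's output mapping.
    for label, n in label_counts("fasttext_train.txt").most_common():
        print(f"{label}: {n:,}")
```

Files that pass this check use fastText's default label prefix, so they can be given as the `input` argument to `fasttext.train_supervised`.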