SSL Audio 1k-base

This model is pretrained on 1,000 hours of audio content from INA, sampled following the base setting, as described in our LREC 2026 paper "Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts".

Model: Pretraining data selection
ssl-audio-1k-base: Random sample of 1,000 h
ssl-audio-1k-no_music: Samples not containing music
ssl-audio-1k-only_speech: Samples composed only of speech
ssl-audio-1k-only_fr: Samples composed only of French content
ssl-audio-1k-gender: Samples with a balanced proportion of male and female speech
ssl-audio-1k-duplicates: Samples with duplicated content (this model is not released)

The features generated by these models have been used for Voice Activity Detection (VAD) and music detection. For detailed information about the training and results associated with this model, please refer to our publication. Along with the TensorBoard training metrics, we release the hyperparameters.

Usage

import librosa
import torch
from transformers import AutoModel, AutoFeatureExtractor

# load the audio file, resampled to 16 kHz as required by the model
audio, sr = librosa.load('/path/to/your/audio/file.wav', sr=16000)

# load the feature extractor and the SSL model
model_name = 'ina-foss/ssl-audio-1k-base'
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
processor = AutoFeatureExtractor.from_pretrained(model_name)
model.eval()

# convert the waveform into model inputs (PyTorch tensors)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# extract features without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
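
Broadcast recordings are often much longer than a single forward pass can comfortably handle. As a minimal sketch, and not part of the released code, the helper below computes consecutive sample ranges so that a long waveform can be processed chunk by chunk. The function name `chunk_bounds` and the 30-second default are hypothetical choices made for illustration; the card does not specify a recommended chunk length.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_seconds=30.0):
    """Return (start, end) sample indices that split a signal into consecutive chunks.

    Hypothetical helper: the 30 s default is an arbitrary illustration,
    not a value taken from the model card or the paper.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk_len = int(round(chunk_seconds * sample_rate))
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be positive")
    # The last chunk is truncated so that it never runs past the end of the signal.
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# Example: 70 s of audio at 16 kHz yields two full chunks and one shorter one.
bounds = chunk_bounds(70 * 16000)
```

Each `(start, end)` pair can then be used to slice the waveform, for example `audio[start:end]`, before passing the slice to the feature extractor and the model as in the snippet above. Whether features extracted from independent chunks match those from the full recording at chunk boundaries is not covered here.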

License and citation

The model is distributed under the pantagruel-research-license.

If you use this model or find it useful in your research, publications, or applications, please cite the following work:

@inproceedings{pelloin2026lrec,
  author =       "Pelloin, Valentin and Bekkali, Lina and Dehak, Reda and Doukhan, David",
  year =         "2026",
  title =        "Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts",
  booktitle =    "Fifteenth International Conference on Language Resources and Evaluation (LREC 2026)",
  address = "Palma, Mallorca, Spain",
  publisher = "European Language Resources Association",
}