How can I get higher accuracy

#1
by blackmaksym - opened

I am trying to use your model with signals from the SIGID wiki website, and the results are nowhere near 0.99. Can you tell me what I am doing wrong?
Here is my code:

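import torchaudio

# feature_extractor and wav_path are defined elsewhere (not shown in this post)
waveform, sample_rate = torchaudio.load(wav_path)  # shape: (channels, samples)

# Downmix to mono so the extractor always receives a 1-D array
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# Resample to 16 kHz if necessary
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
    waveform = resampler(waveform)

# Preprocess the audio (extract features)
inputs = feature_extractor(waveform.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt")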
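
For comparing results between runs, it may help to be explicit about how a per-clip confidence is derived from the model output. The sketch below is not the model's own code: it assumes the classifier returns one raw logit per class and that the confidence is the softmax probability of the top class. The label names are taken from outputs in this thread, the logit values are made up purely for illustration, and `softmax` and `top_k` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative values only: these logits are invented, not model output.
labels = [
    "Morse Code (CW)",
    "STANAG 5065",
    "Automatic Picture Transmission (APT)",
    "Digital Mobile Radio (DMR)",
]
logits = [2.0, 1.1, -0.6, -1.0]

for rank, (label, prob) in enumerate(top_k(logits, labels), start=1):
    print(f"{rank}. {label} {prob:.4f} ({prob:.2%})")
```

Note that a per-clip softmax confidence is a different quantity from an accuracy figure measured over a whole test set, so the two numbers are not directly comparable.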

Yeah, same here. Even the Sigwiki files it was trained on are a total miss. I ran a bunch of them, and it is not even close to guessing what they are:

Loading model from local path: D:\Projects\AI\Sigwiki\sigwiki\model\AST_finetuned_SIGIDwiki
Processing audio: .\2G_ALEaudio.mp3
Detected original sampling rate: 44100 Hz
Raw loaded shape: (1317888,) (channels × samples or samples)
After mono/1D conversion shape: (1317888,)
Resampling from 44100 Hz to 16000 Hz...
Final input to extractor shape: (478146,) (must be 1D)

============================================================
Predicted Signal Type : Morse Code (CW)
Confidence : 0.4621 (46.21%)

Top 3 predictions:

  1. Morse Code (CW) 0.4621 (46.21%)
  2. STANAG 5065 0.1916 (19.16%)
  3. Automatic Picture Transmission (APT) 0.0359 (3.59%)
    PS D:\Projects\AI\Sigwiki>

PS D:\Projects\AI\Sigwiki> python .\sigwiki.py .\29B6_40Hz_USB_5kHz.ogg

Loading model from local path: D:\Projects\AI\Sigwiki\sigwiki\model\AST_finetuned_SIGIDwiki
Processing audio: .\29B6_40Hz_USB_5kHz.ogg
Detected original sampling rate: 48000 Hz
Raw loaded shape: (518400, 2) (channels × samples or samples)
Downmixed multi-channel audio to mono
After mono/1D conversion shape: (518400,)
Resampling from 48000 Hz to 16000 Hz...
Final input to extractor shape: (172800,) (must be 1D)

============================================================
Predicted Signal Type : STANAG 5065
Confidence : 0.4002 (40.02%)

Top 3 predictions:

  1. STANAG 5065 0.4002 (40.02%)
  2. 5G "New Radio" cellular network - Downlink 0.2346 (23.46%)
  3. Digital Audio Broadcasting Plus (DAB+) 0.1490 (14.90%)
    PS D:\Projects\AI\Sigwiki>

PS D:\Projects\AI\Sigwiki> python .\sigwiki.py .\Amps_cell_broadcast_decreasedVOL.wav
Loading model from local path: D:\Projects\AI\Sigwiki\sigwiki\model\AST_finetuned_SIGIDwiki
Processing audio: .\Amps_cell_broadcast_decreasedVOL.wav
Detected original sampling rate: 96000 Hz
Raw loaded shape: (1217802,) (channels × samples or samples)
After mono/1D conversion shape: (1217802,)
Resampling from 96000 Hz to 16000 Hz...
Final input to extractor shape: (202967,) (must be 1D)

============================================================
Predicted Signal Type : 5G "New Radio" cellular network - Downlink
Confidence : 0.5153 (51.53%)

Top 3 predictions:

  1. 5G "New Radio" cellular network - Downlink 0.5153 (51.53%)
  2. Digital Mobile Radio (DMR) 0.0775 (7.75%)
  3. STANAG 5065 0.0659 (6.59%)
    PS D:\Projects\AI\Sigwiki>
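
One thing that might be worth checking with clips of this length is how much of each file actually reaches the model. The sketch below is only a rough estimate, under the assumption that the checkpoint keeps the stock AST feature-extractor settings (25 ms windows, 10 ms hop, input padded or truncated to 1024 frames); the fine-tuned model's real configuration is not shown in this thread. `fbank_frames` and `coverage` are hypothetical helper names, and the sample counts are the "Final input to extractor" shapes from the logs above.

```python
def fbank_frames(num_samples, sample_rate=16000, frame_length_ms=25, frame_shift_ms=10):
    """Estimate the number of filterbank frames for a clip (no edge padding)."""
    frame_length = sample_rate * frame_length_ms // 1000  # 400 samples at 16 kHz
    frame_shift = sample_rate * frame_shift_ms // 1000    # 160 samples at 16 kHz
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_shift


MAX_FRAMES = 1024  # assumption: default AST input length, about 10 s of audio


def coverage(num_samples, max_frames=MAX_FRAMES):
    """Return (total frames, frames kept, fraction of the clip kept)."""
    frames = fbank_frames(num_samples)
    used = min(frames, max_frames)
    fraction = used / frames if frames else 0.0
    return frames, used, fraction


# Sample counts after resampling to 16 kHz, copied from the logs in this thread.
clips = {
    "2G_ALEaudio.mp3": 478146,
    "29B6_40Hz_USB_5kHz.ogg": 172800,
    "Amps_cell_broadcast_decreasedVOL.wav": 202967,
}

for name, n in clips.items():
    frames, used, fraction = coverage(n)
    print(f"{name}: {n / 16000:.2f} s, {frames} frames, {used} kept ({fraction:.0%})")
```

If that assumption holds, the 478146-sample clip yields 2986 frames, so only about the first third of it would be seen by the model. Splitting long recordings into roughly 10-second windows and averaging the predictions could be one way to test whether truncation contributes to the misses.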
