Model Description

This model is a transformer-based sentiment classification system built using DistilBERT and trained on text data derived from the InfoBay.AI audio dataset.

The training pipeline converts raw conversational audio into structured text using Whisper base, followed by segmentation and sentiment labeling. The resulting text dataset is then used to train the sentiment classification model.

This approach enables the transformation of unstructured audio data into meaningful NLP intelligence, demonstrating the value of the InfoBay.AI dataset for downstream AI applications.


Training Pipeline

The complete pipeline used for training is as follows:

Raw Audio (InfoBay.AI Dataset) → Whisper ASR (Speech-to-Text) → Text Segmentation → Sentiment Labeling → DistilBERT Training

Audio Source: InfoBay.AI podcast dataset
Transcription: Whisper base model
Data Processing: Sentence-level segmentation
Labeling: VADER-based sentiment scoring
Model Training: DistilBERT for 3-class sentiment classification

Key Insight

This model demonstrates that audio data alone can be converted into high-quality training data and used effectively to train transformer-based NLP models.

It validates the ability of the InfoBay.AI dataset to support:

Speech-to-text pipelines
Sentiment analysis systems
End-to-end conversational AI workflows

Dataset Split

Train/Test Split: 80% / 20%
Split Strategy: Stratified sampling (to preserve class distribution)
Label Encoding: Applied using LabelEncoder
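
A minimal sketch of this split with scikit-learn (the toy DataFrame and column names are illustrative; the real pipeline uses the segment-level CSV produced by the audio-to-text code below):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the segment-level dataset; "text" and "sentiment"
# mirror the columns written by the audio-to-text pipeline.
df = pd.DataFrame({
    "text": [f"segment {i}" for i in range(15)],
    "sentiment": ["positive", "neutral", "negative"] * 5,
})

# LabelEncoder sorts classes alphabetically: negative=0, neutral=1, positive=2
encoder = LabelEncoder()
df["label"] = encoder.fit_transform(df["sentiment"])

# 80/20 stratified split preserves the per-class proportions
train_texts, test_texts, train_labels, test_labels = train_test_split(
    df["text"], df["label"],
    test_size=0.20,
    stratify=df["label"],
    random_state=42,
)
```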

Training Hyperparameters

Number of Epochs: 15
Train Batch Size: 16
Evaluation Batch Size: 16
Learning Rate: 2e-5
Optimizer: AdamW
Loss Function: Cross-Entropy Loss
Logging Directory: ./logs
Output Directory: ./results

Model Performance

The model demonstrates strong performance in internal evaluation on the speech-derived dataset:

Accuracy: ~98%
Macro F1-score: ~0.98
Weighted F1-score: ~0.99

Classification Report

Class Sentiment Precision Recall F1-score Support
0 Negative 0.97 0.96 0.96 1,128
1 Neutral 0.99 0.99 0.99 7,865
2 Positive 0.98 0.98 0.98 2,658
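
These figures follow scikit-learn's classification_report format; a sketch of how such a report is computed (the label arrays below are dummy values for illustration, not the real evaluation data):

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

labels = ["Negative", "Neutral", "Positive"]

# Dummy ground truth and model predictions, for illustration only.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 1, 1]

print(classification_report(y_true, y_pred, target_names=labels))
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))
```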

Usage

Install dependencies

pip install -U transformers torch

from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
import torch.nn.functional as F

repo_id = "InfoBayAI/Audio-to-Sentiment_Intelligence_Model"

tokenizer = DistilBertTokenizer.from_pretrained(
    repo_id,
    subfolder="sentiment-model"
)

model = DistilBertForSequenceClassification.from_pretrained(
    repo_id,
    subfolder="sentiment-model"
)

model.eval()

text = "Write your text here"

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=1)

predicted_class = torch.argmax(probs, dim=1).item()

labels = ["Negative", "Neutral", "Positive"]  

print("Text:", text)
print("Prediction:", labels[predicted_class])
print("Confidence:", probs[0][predicted_class].item())

Audio-to-Text Pipeline

import os

import pandas as pd
import whisper
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

model = whisper.load_model("base")
audio_folder = r"C:\Users\3\Documents\AUDIO 2\b6"
print(os.path.exists(audio_folder))


analyzer = SentimentIntensityAnalyzer()

data = []
sr = 1  # running serial number across all segments

# Loop through all audio files
for file in os.listdir(audio_folder):

    if file.endswith((".wav", ".mp3")):

        audio_path = os.path.join(audio_folder, file)

        print("Processing:", file)

        # task="translate" outputs English text even for non-English speech
        result = model.transcribe(audio_path, task="translate", fp16=False)

        segment_id = 1

        for segment in result["segments"]:

            text = segment["text"]

            # Sentiment score
            sentiment_score = analyzer.polarity_scores(text)["compound"]

            # Convert score to label
            if sentiment_score > 0.05:
                sentiment = "positive"
            elif sentiment_score < -0.05:
                sentiment = "negative"
            else:
                sentiment = "neutral"

            data.append({
                "sr_no": sr,
                "call_id": file,
                "segment_id": segment_id,
                "start_time": segment["start"],
                "end_time": segment["end"],
                "text": text,
                "sentiment": sentiment
            })

            sr += 1
            segment_id += 1

df = pd.DataFrame(data)

df.to_csv("AUDIO.csv", index=False)
print("Dataset created")

print(df.head())

Considerations

This model is trained on text derived from the InfoBay.AI audio dataset and is provided for research and evaluation purposes. The dataset contains a larger collection of high-quality conversational audio. For access to the full dataset or enterprise licensing inquiries, please visit our website InfoBay.AI or contact us directly.

Ph: (91) 8303174762
Email: vipul@infobay.ai