Model Description
This model is a transformer-based sentiment classification system built using DistilBERT and trained on text data derived from the InfoBay.AI audio dataset.
The training pipeline converts raw conversational audio into structured text using Whisper base, followed by segmentation and sentiment labeling. The resulting text dataset is then used to train the sentiment classification model.
This approach turns unstructured audio data into text that NLP models can learn from, demonstrating the value of the InfoBay.AI dataset for downstream AI applications.
Training Pipeline
The complete pipeline used for training is as follows:
Raw Audio (InfoBay.AI Dataset) → Whisper ASR (Speech-to-Text) → Text Segmentation → Sentiment Labeling → DistilBERT Training
Audio Source: InfoBay.AI podcast dataset
Transcription: Whisper base model
Data Processing: Sentence-level segmentation
Labeling: VADER-based sentiment scoring
Model Training: DistilBERT for 3-class sentiment classification
Key Insight
This model demonstrates that audio data alone can be converted into high-quality training data and used effectively to train transformer-based NLP models.
It validates the ability of the InfoBay.AI dataset to support:
Speech-to-text pipelines
Sentiment analysis systems
End-to-end conversational AI workflows
Dataset Split
Train/Test Split: 80% / 20%
Split Strategy: Stratified sampling (to preserve class distribution)
Label Encoding: Applied using LabelEncoder
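The two steps above can be illustrated with a minimal, dependency-free sketch. `encode_labels` and `stratified_split` are hypothetical helpers that mimic what scikit-learn's `LabelEncoder` and `train_test_split(..., stratify=y)` do; they are not the code used for training, and the toy data is made up.

```python
import random
from collections import defaultdict


def encode_labels(labels):
    """Map each distinct label to an integer in sorted order, as LabelEncoder does."""
    classes = sorted(set(labels))
    mapping = {name: idx for idx, name in enumerate(classes)}
    return [mapping[name] for name in labels], mapping


def stratified_split(items, labels, test_fraction=0.2, seed=42):
    """Split per class so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(items[i], labels[i]) for i in train_idx]
    test = [(items[i], labels[i]) for i in test_idx]
    return train, test


# Toy data (made up for illustration): 10 negative, 70 neutral, 20 positive segments
labels = ["negative"] * 10 + ["neutral"] * 70 + ["positive"] * 20
texts = [f"segment {i}" for i in range(len(labels))]

encoded, mapping = encode_labels(labels)
train, test = stratified_split(texts, encoded)
print(mapping)
print(len(train), len(test))
```

Sorted-order encoding assigns negative, neutral and positive the integers 0, 1 and 2, which is consistent with the class indices shown in the classification report.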
Training Hyperparameters
Number of Epochs: 15
Train Batch Size: 16
Evaluation Batch Size: 16
Learning Rate: 2e-5
Optimizer: AdamW
Loss Function: Cross-Entropy Loss
Logging Directory: ./logs
Output Directory: ./results
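The settings above can be collected in one place. The sketch below assumes the Hugging Face `Trainer` was used on a single device, which the model card does not state; the dictionary keys mirror `TrainingArguments` keyword names. The step estimate additionally assumes that the 11,651 examples in the classification report are the 20% test split, so treat it as an approximation.

```python
import math

# Keys follow Hugging Face TrainingArguments keyword names (an assumption about the setup)
training_config = {
    "num_train_epochs": 15,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "learning_rate": 2e-5,
    "logging_dir": "./logs",
    "output_dir": "./results",
}

# Assumption: the evaluated examples are the 20% test split,
# so the 80% training split is about four times as large.
test_examples = 1128 + 7865 + 2658
train_examples = test_examples * 4

steps_per_epoch = math.ceil(
    train_examples / training_config["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_config["num_train_epochs"]
print(steps_per_epoch, total_steps)
```

Under the same assumption, the optimizer and loss need no explicit setting: `Trainer` uses AdamW by default, and a sequence-classification head with integer labels is trained with cross-entropy loss.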
Model Performance
On internal evaluation with the speech-derived dataset, the model performs as follows:
Accuracy: ~98%
Macro F1-score: ~0.98
Weighted F1-score: ~0.99
Classification Report
| Class | Sentiment | Precision | Recall | F1-score | Support |
|---|---|---|---|---|---|
| 0 | Negative | 0.97 | 0.96 | 0.96 | 1,128 |
| 1 | Neutral | 0.99 | 0.99 | 0.99 | 7,865 |
| 2 | Positive | 0.98 | 0.98 | 0.98 | 2,658 |
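To show how the macro and weighted averages relate to the per-class rows, the short sketch below recomputes both from the table. It uses the rounded per-class F1 values, so the results are approximate and small differences from the reported aggregates are expected.

```python
# Per-class F1 and support copied from the classification report
report = {
    "Negative": {"f1": 0.96, "support": 1128},
    "Neutral": {"f1": 0.99, "support": 7865},
    "Positive": {"f1": 0.98, "support": 2658},
}

total_support = sum(row["support"] for row in report.values())

# Macro F1: unweighted mean, every class counts equally
macro_f1 = sum(row["f1"] for row in report.values()) / len(report)

# Weighted F1: mean weighted by each class's share of the examples
weighted_f1 = (
    sum(row["f1"] * row["support"] for row in report.values()) / total_support
)

neutral_share = report["Neutral"]["support"] / total_support

print(f"macro F1      = {macro_f1:.3f}")
print(f"weighted F1   = {weighted_f1:.3f}")
print(f"neutral share = {neutral_share:.1%}")
```

Because the Neutral class makes up roughly two thirds of the evaluated examples, the weighted average sits closer to the Neutral score, while the macro average gives the smaller Negative class equal influence.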
Usage
Install dependencies
pip install -U transformers torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
import torch.nn.functional as F

repo_id = "InfoBayAI/Audio-to-Sentiment_Intelligence_Model"

# The tokenizer and model files are loaded from the "sentiment-model" subfolder of the repo
tokenizer = DistilBertTokenizer.from_pretrained(
    repo_id,
    subfolder="sentiment-model"
)
model = DistilBertForSequenceClassification.from_pretrained(
    repo_id,
    subfolder="sentiment-model"
)
model.eval()  # inference mode: disables dropout

text = " Write your text "
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Run the forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and pick the most likely class
probs = F.softmax(outputs.logits, dim=1)
predicted_class = torch.argmax(probs, dim=1).item()

# Index order follows the label encoding: 0 = Negative, 1 = Neutral, 2 = Positive
labels = ["Negative", "Neutral", "Positive"]
print("Text:", text)
print("Prediction:", labels[predicted_class])
print("Confidence:", probs[0][predicted_class].item())
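To make the last steps of the snippet concrete, here is a dependency-free sketch of how three logits become a label and a confidence value. `decode` is a hypothetical helper, and the logit values are made up for illustration rather than taken from the model.

```python
import math

LABELS = ["Negative", "Neutral", "Positive"]


def decode(logits):
    """Apply a numerically stable softmax and return (label, confidence)."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing without changing the result
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for a single input
label, confidence = decode([-1.2, 0.3, 2.5])
print(label, round(confidence, 3))
```

The confidence is the softmax probability of the winning class, so it always lies between 1/3 and 1 for a three-class model.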
AUDIO-TO-TEXT
import os

import pandas as pd
import whisper
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Load the Whisper "base" speech-to-text model
model = whisper.load_model("base")

audio_folder = r"C:\Users\3\Documents\AUDIO 2\b6"
print(os.path.exists(audio_folder))  # sanity check that the folder exists

analyzer = SentimentIntensityAnalyzer()
data = []
sr = 1  # running serial number across all files

# Loop through all audio files
for file in os.listdir(audio_folder):
    if file.endswith((".wav", ".mp3")):
        audio_path = os.path.join(audio_folder, file)
        print("Processing:", file)
        # task="translate" produces English text; fp16=False runs inference in 32-bit floats
        result = model.transcribe(audio_path, task="translate", fp16=False)
        segment_id = 1  # restarts at 1 for every file
        for segment in result["segments"]:
            text = segment["text"]
            # Sentiment score: VADER compound value between -1 and 1
            sentiment_score = analyzer.polarity_scores(text)["compound"]
            # Convert score to label
            if sentiment_score > 0.05:
                sentiment = "positive"
            elif sentiment_score < -0.05:
                sentiment = "negative"
            else:
                sentiment = "neutral"
            data.append({
                "sr_no": sr,
                "call_id": file,
                "segment_id": segment_id,
                "start_time": segment["start"],
                "end_time": segment["end"],
                "text": text,
                "sentiment": sentiment
            })
            sr += 1
            segment_id += 1

# One row per transcribed segment
df = pd.DataFrame(data)
df.to_csv("AUDIO", index=False)
print("dataset created")
print(df.head())
Considerations
This model is trained on text derived from the InfoBay.AI audio dataset and is provided for research and evaluation purposes. The full dataset contains a larger collection of high-quality conversational audio. For access to the full dataset or for enterprise licensing inquiries, please visit our website, InfoBay.AI, or contact us directly.
Ph: (91) 8303174762
Email: vipul@infobay.ai