---
title: Phoneme Detection Leaderboard
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.1"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Phoneme Detection Leaderboard

A clean, simplified phoneme detection leaderboard based on the open_asr_leaderboard interface.

## Features

- **Clean Interface**: Uses the same interface structure as open_asr_leaderboard
- **Phoneme Evaluation**: Evaluates models on phoneme recognition tasks
- **Multiple Datasets**: Supports evaluation on multiple phoneme datasets
- **Model Request System**: Allows users to request evaluation of new models

## Structure

```
β”œβ”€β”€ app.py                 # Main Gradio application
β”œβ”€β”€ constants.py          # Constants and text definitions
β”œβ”€β”€ utils_display.py      # Display utilities and column definitions
β”œβ”€β”€ init.py              # Initialization and hub integration
β”œβ”€β”€ phoneme_eval.py      # Core phoneme evaluation logic
β”œβ”€β”€ utils/               # Utility modules
β”‚   β”œβ”€β”€ load_model.py    # Model loading and inference
β”‚   β”œβ”€β”€ audio_process.py # Audio processing and PER calculation
β”‚   └── cmu_process.py   # CMU to IPA conversion
β”œβ”€β”€ requirements.txt     # Python dependencies
└── README.md           # This file
```
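The `utils/cmu_process.py` module converts CMU-dictionary (ARPAbet) phoneme symbols to IPA. As a rough illustration of what such a conversion involves (the actual table and function names in the repo may differ), a minimal sketch with a handful of standard ARPAbet-to-IPA pairs:

```python
# Illustrative sketch only; the real mapping in utils/cmu_process.py
# covers the full ARPAbet inventory and may use different names.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "IY": "i", "UW": "u",
    "K": "k", "T": "t", "S": "s", "D": "d",
}

def cmu_to_ipa(phonemes):
    """Strip ARPAbet stress digits (e.g. 'AE1' -> 'AE') and map to IPA.

    Symbols not in the table are passed through unchanged.
    """
    return [ARPABET_TO_IPA.get(p.rstrip("012"), p) for p in phonemes]
```

For example, `cmu_to_ipa(["K", "AE1", "T"])` yields `["k", "æ", "t"]`.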

## Usage

1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Run the application:
   ```bash
   python app.py
   ```

3. Run evaluation:
   ```bash
   python phoneme_eval.py
   ```

## Evaluation

The leaderboard evaluates models on:
- **PER (Phoneme Error Rate)**: edit distance between the predicted and reference phoneme sequences, normalized by reference length; lower is better
- **Average Duration**: average processing time per audio sample

Models are ranked by Average PER across all datasets.
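PER is an edit-distance metric computed over phoneme sequences, analogous to WER over words. A minimal sketch of the computation (the repo's actual implementation lives in `utils/audio_process.py` and may differ in detail):

```python
# Illustrative PER sketch: Levenshtein distance over phoneme lists,
# normalized by reference length. Not the repo's actual code.
def per(reference, hypothesis):
    """Phoneme Error Rate = edit_distance(ref, hyp) / len(ref)."""
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # i deletions
    for j in range(n + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution (or match)
            )
    return dp[m][n] / max(m, 1)
```

For instance, `per(["k", "æ", "t"], ["k", "ɑ", "t"])` is one substitution over three reference phonemes, i.e. 1/3.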

## Datasets

- `phoneme_asr`: General phoneme recognition dataset
- `kids_phoneme_md`: Children's speech phoneme dataset