Commit 3a5e7187 authored by Carl De Sousa Trias

Update ASR_1/src/asr_funs.py, ASR_1/src/main.py, ASR_1/src/run_funs.py, ASR_1/src/trs_class.py, ASR_1/Dockerfile, ASR_1/README.md, ASR_1/requirements.txt, ASR_2/ASR.py, ASR_2/README.md, ASR_2/requirements.txt, README.md
parent 06e37fca
```
cd $PATH_SHARED
mkdir models
cd models
mkdir mmc_asr
cd mmc_asr
git lfs install
git clone https://huggingface.co/openai/whisper-large-v3
```
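Once the clone finishes, the checkpoint can be loaded from disk rather than from the Hub. The following is a minimal sketch, assuming `PATH_SHARED` is exported in the environment and the directory layout matches the commands above. `local_model_dir` and `load_local_asr` are hypothetical helper names, not part of the repository.

```python
import os
from pathlib import Path


def local_model_dir(shared_root=None):
    """Return the directory created by the clone commands above."""
    root = shared_root or os.environ["PATH_SHARED"]
    return Path(root) / "models" / "mmc_asr" / "whisper-large-v3"


def load_local_asr(shared_root=None):
    """Build a transformers ASR pipeline from the local clone (hypothetical helper)."""
    # Imported lazily so the path helper works without torch/transformers installed.
    import torch
    from transformers import pipeline

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return pipeline(
        "automatic-speech-recognition",
        model=str(local_model_dir(shared_root)),
        device=device,
    )
```

Calling `load_local_asr()` returns a callable that accepts an audio file path and yields a dict with a `"text"` key, as in the `transformers` pipeline used in `ASR_2/ASR.py`.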
@@ -5,4 +5,4 @@ torchaudio # VAD needs it
 transformers
 typeguard==4.1.5
 typing_extensions==4.8.0
-whisper-timestamped
\ No newline at end of file
+whisper-timestamped
import torch
from transformers import pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class AutomaticSpeechRecognition():
    QuestionAudio = None
    ##
    QuestionText = None

    def funcAutomaticSpeechRecognition(self, input):
        '''
        Verify the inference
        '''
        speech_reco = pipeline(
            "automatic-speech-recognition", model="openai/whisper-base", device=device
        )
        res = speech_reco(input)
        return res["text"]

    def run(self):
        self.QuestionText = self.funcAutomaticSpeechRecognition(self.QuestionAudio)


if __name__ == '__main__':
    module = AutomaticSpeechRecognition()
    module.QuestionAudio = "path/to/audiofile"
    module.run()
    print(module.QuestionText)
Implementation of MMC-ASR as a class in Python.
## Installation
The code was designed and tested on Ubuntu 20.04 using Anaconda 23.7.2 and Python 3.9.
An environment with all the necessary libraries can be created using:
```bash
conda create --name <env> --file requirements.txt
```
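After creating the environment, it can be worth confirming that the two required libraries are importable before running the module. The snippet below is a small sketch using only the standard library; `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util


def missing_packages(names=("torch", "transformers")):
    """Return the packages from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```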
torch
transformers
# MPAI-MMC Automatic Speech Recognition
This code refers to the implementation of the MPAI-NNW under MPAI-AIF, as described in the [AIMs](https://mpai.community/standards/mpai-mmc/v2-2/ai-modules/automatic-speech-recognition/).
### Guide to the ASR code #1
The code takes Speech Objects from MMC-AUS and generates Text Segments, here called text transcripts. It uses the whisper-large-v3 model to convert each input Speech Object (a speaker’s turn) into a Text Segment. Disfluencies (e.g., repetitions, repairs, filled pauses) are often omitted from the output. The Whisper reference document is available.
The MMC-ASR Reference Software is found at the MPAI GitLab site. This AI Module is intended for developers who are familiar with Python, Docker, RabbitMQ, and downloading models from HuggingFace. The Reference Software contains:
1. src: a folder with the Python code implementing the AIM
2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
3. requirements.txt: dependencies installed in the Docker image
4. README.md: commands for cloning https://huggingface.co/openai/whisper-large-v3
Library: https://github.com/linto-ai/whisper-timestamped
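To illustrate how a Speech Object could become timestamped Text Segments with this library, here is a minimal sketch based on the `load_audio`, `load_model`, and `transcribe` calls shown in the whisper-timestamped README. The helper names, the `"tiny"` default, and the `(start, end, text)` tuple layout are assumptions made for illustration; the Reference Software itself uses whisper-large-v3 and may structure its output differently.

```python
def to_text_segments(result):
    """Flatten a whisper-timestamped result dict into (start, end, text) tuples."""
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in result.get("segments", [])
    ]


def transcribe_with_timestamps(audio_path, model_name="tiny", device="cpu"):
    """Transcribe one audio file; model_name and device are illustrative defaults."""
    # Imported lazily so to_text_segments can be used without the library installed.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name, device=device)
    return to_text_segments(whisper.transcribe(model, audio))
```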
### Guide to the ASR code #2
This AI Module is intended for developers who are familiar with Python and downloading models from HuggingFace.
A wrapper for the Whisper NN Module:
1. Manages input files and parameters: Speech Object
2. Performs Speech Recognition on each Speech Object by executing the Whisper Module.
3. Outputs Recognised Text.
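The three steps above can be sketched as a small class. This is an illustrative variant, not the code in `ASR_2/ASR.py`: it assumes each Speech Object is an audio file path, builds the `transformers` pipeline once rather than on every call, and accepts an injectable `recogniser` callable so the flow can be exercised without downloading a model. `SpeechRecognitionWrapper` and its parameters are hypothetical names.

```python
class SpeechRecognitionWrapper:
    """Hypothetical wrapper mirroring the three steps above."""

    def __init__(self, recogniser=None, model_name="openai/whisper-base"):
        # `recogniser` is any callable mapping an audio path to {"text": ...};
        # when omitted, a transformers pipeline is built lazily on first use.
        self._recogniser = recogniser
        self._model_name = model_name

    def _get_recogniser(self):
        if self._recogniser is None:
            import torch
            from transformers import pipeline

            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            self._recogniser = pipeline(
                "automatic-speech-recognition", model=self._model_name, device=device
            )
        return self._recogniser

    def run(self, speech_objects):
        """Return one Recognised Text string per input Speech Object (file path)."""
        recogniser = self._get_recogniser()
        return [recogniser(path)["text"] for path in speech_objects]
```

Building the pipeline once keeps the model in memory across Speech Objects, which matters when many speaker turns are processed in sequence.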
The MMC-ASR Reference Software is found at the NNW GitLab site (registration required). It contains:
1. The Python code implementing the AIM.
2. The list of required libraries: PyTorch and transformers (HuggingFace).