diff --git a/Dockerfile b/ASR_1/Dockerfile
similarity index 100%
rename from Dockerfile
rename to ASR_1/Dockerfile
diff --git a/ASR_1/README.md b/ASR_1/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..57477fbd2c2331ad7d8f0e99f8b80fa6ef3e0e53
--- /dev/null
+++ b/ASR_1/README.md
@@ -0,0 +1,9 @@
+```
+cd $PATH_SHARED
+mkdir models
+cd models
+mkdir mmc_asr
+cd mmc_asr
+git lfs install
+git clone https://huggingface.co/openai/whisper-large-v3
+```
diff --git a/requirements.txt b/ASR_1/requirements.txt
similarity index 86%
rename from requirements.txt
rename to ASR_1/requirements.txt
index c44f19b058b764bbfc79487b5c7ad87159b2d8ac..e61861af2e56cab8488d8731b119ed6cf61b45f4 100644
--- a/requirements.txt
+++ b/ASR_1/requirements.txt
@@ -5,4 +5,4 @@ torchaudio # VAD needs it
 transformers
 typeguard==4.1.5
 typing_extensions==4.8.0
-whisper-timestamped
\ No newline at end of file
+whisper-timestamped
diff --git a/src/asr_funs.py b/ASR_1/src/asr_funs.py
similarity index 100%
rename from src/asr_funs.py
rename to ASR_1/src/asr_funs.py
diff --git a/src/main.py b/ASR_1/src/main.py
similarity index 100%
rename from src/main.py
rename to ASR_1/src/main.py
diff --git a/src/run_funs.py b/ASR_1/src/run_funs.py
similarity index 100%
rename from src/run_funs.py
rename to ASR_1/src/run_funs.py
diff --git a/src/trs_class.py b/ASR_1/src/trs_class.py
similarity index 100%
rename from src/trs_class.py
rename to ASR_1/src/trs_class.py
diff --git a/ASR_2/ASR.py b/ASR_2/ASR.py
new file mode 100644
index 0000000000000000000000000000000000000000..dba341ea42c13352d2e1c36ee6ff21640d5540d1
--- /dev/null
+++ b/ASR_2/ASR.py
@@ -0,0 +1,27 @@
+import torch
+from transformers import pipeline
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+class AutomaticSpeechRecognition():
+    QuestionAudio = None  # input: audio to transcribe (e.g. a file path)
+    ##
+    QuestionText = None  # output: recognised text
+
+    def funcAutomaticSpeechRecognition(self, input):
+        '''
+        Run Whisper inference on the input audio and return the recognised text
+        '''
+        speech_reco = pipeline(
+            "automatic-speech-recognition", model="openai/whisper-base", device=device
+        )
+        res = speech_reco(input)
+        return res["text"]
+
+    def run(self):
+        self.QuestionText = self.funcAutomaticSpeechRecognition(self.QuestionAudio)
+
+if __name__ == '__main__':
+    module = AutomaticSpeechRecognition()
+    module.QuestionAudio = "path/to/audiofile"
+    module.run()
+    print(module.QuestionText)
diff --git a/ASR_2/README.md b/ASR_2/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..745bbc8a330872247b8c47b4c68cdf743556a564
--- /dev/null
+++ b/ASR_2/README.md
@@ -0,0 +1,8 @@
+Implementation of MMC-ASR as a class in Python.
+
+## Installation
+The code was designed and tested on Ubuntu 20.04 using Anaconda 23.7.2 and Python 3.9.
+An environment with all the necessary libraries can be created using (replace ENV_NAME with a name of your choice):
+```bash
+conda create --name ENV_NAME --file requirements.txt
+```
diff --git a/ASR_2/requirements.txt b/ASR_2/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..4f492ddc93de3952474c43b325c02b1b0fcdec9c
--- /dev/null
+++ b/ASR_2/requirements.txt
@@ -0,0 +1,2 @@
+torch
+transformers
diff --git a/README.md b/README.md
index 57477fbd2c2331ad7d8f0e99f8b80fa6ef3e0e53..01b068f52f134e275e9e6ff74bf3efc053466241 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,33 @@
-```
-cd $PATH_SHARED
-mkdir models
-cd models
-mkdir mmc_asr
-cd mmc_asr
-git lfs install
-git clone https://huggingface.co/openai/whisper-large-v3
-```
+# MPAI-MMC Automatic Speech Recognition
+
+
+This code refers to the implementation of the MPAI-NNW under MPAI-AIF, as described in the [AIMs](https://mpai.community/standards/mpai-mmc/v2-2/ai-modules/automatic-speech-recognition/).
+
+### Guide to the ASR code #1
+
+The code takes Speech Objects from MMC-AUS and generates Text Segments (here called text transcripts). It uses the whisper-large-v3 model to convert each input Speech Object (a speaker’s turn) into a Text Segment. Disfluencies (e.g., repetitions, repairs, filled pauses) are often omitted. The Whisper reference document is available.
+
+The MMC-ASR Reference Software is found at the MPAI GitLab site. This AI Module is intended for developers who are familiar with Python, Docker, RabbitMQ, and downloading models from HuggingFace. The Reference Software contains:
+
+ 1. src: a folder with the Python code implementing the AIM
+ 2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
+ 3. requirements.txt: dependencies installed in the Docker image
+ 4. README.md: commands for cloning https://huggingface.co/openai/whisper-large-v3
+
+Library: https://github.com/linto-ai/whisper-timestamped
+
+### Guide to the ASR code #2
+
+This AI Module is intended for developers who are familiar with Python and downloading models from HuggingFace.
+
+A wrapper for the Whisper NN Module:
+
+ 1. Manages input files and parameters: Speech Object
+ 2. Performs Speech Recognition on each Speech Object by executing the Whisper Module.
+ 3. Outputs Recognised Text.
+
+The MMC-ASR Reference Software is found at the NNW GitLab site (registration required). It contains:
+
+ 1. The Python code implementing the AIM.
+ 2. The required libraries: PyTorch and transformers (HuggingFace).
+
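Note on the wrapper steps listed under "Guide to the ASR code #2" (take a Speech Object, run the Whisper Module, output Recognised Text): the sketch below shows one possible variant that builds the recognizer once and reuses it for every Speech Object, instead of constructing the pipeline on each call as `ASR_2/ASR.py` does. The names `Recognizer`, `build_whisper_recognizer`, and `CachedSpeechRecognition` are hypothetical and not part of this diff. It assumes only the `pipeline("automatic-speech-recognition", ...)` call and the `res["text"]` result shape already used in `ASR_2/ASR.py`.

```python
from typing import Callable, Dict, Optional

# Hypothetical alias: any callable mapping an audio input to {"text": ...},
# the result shape that ASR_2/ASR.py reads via res["text"].
Recognizer = Callable[[str], Dict[str, str]]


def build_whisper_recognizer(model: str = "openai/whisper-base") -> Recognizer:
    """Build the Hugging Face ASR pipeline (heavy imports deferred until needed)."""
    import torch
    from transformers import pipeline

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return pipeline("automatic-speech-recognition", model=model, device=device)


class CachedSpeechRecognition:
    """Sketch of the wrapper: one recognizer, reused for every Speech Object."""

    def __init__(self, recognizer: Optional[Recognizer] = None):
        self._recognizer = recognizer  # built lazily on first run() if None
        self.QuestionAudio = None      # input: audio to transcribe
        self.QuestionText = None       # output: recognised text

    def run(self) -> None:
        if self._recognizer is None:
            self._recognizer = build_whisper_recognizer()
        self.QuestionText = self._recognizer(self.QuestionAudio)["text"]


if __name__ == "__main__":
    # Stand-in recognizer so the sketch runs without downloading a model.
    def fake_recognizer(audio: str) -> Dict[str, str]:
        return {"text": f"transcript of {audio}"}

    module = CachedSpeechRecognition(recognizer=fake_recognizer)
    module.QuestionAudio = "path/to/audiofile"
    module.run()
    print(module.QuestionText)
```

Passing the recognizer in through the constructor keeps the class usable in tests without a GPU or model download. Omitting the argument falls back to the Whisper pipeline on the first `run()`.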