# MPAI-MMC Answer to Multimodal Question


This repository contains the implementation of MMC-AMQ, as described in the [AIW](https://mpai.community/standards/mpai-mmc/v2-2/ai-workflows/answer-to-multimodal-question/).


## Guide to the AMQ code

    1. Manages input files and parameters: Speech Object, Visual Object, Text Object
    2. Executes the AIW to perform the Answer to Multimodal Question on each individual pair of Speech/Text and Visual Object.
    3. Outputs the answer as Speech Object and Text Object.

The OSD-AMQ Reference Software is found at the NNW GitLab site. It contains:

    1. The Python code implementing the AIW.
    2. The list of required libraries: pytorch, transformers (HuggingFace), datasets (HuggingFace), soundfile, and pillow.
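
Before running the AIW, it can help to confirm that the libraries listed above are importable in the active environment. The following is a minimal sketch, not part of the Reference Software: the helper name `find_missing` is hypothetical, and the import names assume the usual mapping (pytorch is imported as `torch`, pillow as `PIL`).

```python
import importlib.util

# Import names for the libraries listed above. The import name can differ
# from the package name: pytorch is imported as "torch", pillow as "PIL".
REQUIRED_MODULES = ["torch", "transformers", "datasets", "soundfile", "PIL"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

The check only locates the modules without importing them, so it runs quickly even when the heavier libraries are installed.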


## Installation
The code was designed and tested on Ubuntu 20.04 using Anaconda 23.7.2 and Python 3.9.
An environment with all the necessary libraries can be created using:
```bash
# python=3.9 matches the tested version and ensures pip installs into the new environment
conda create --name <env> python=3.9
conda activate <env>
pip install -r requirements.txt
```
Depending on your ffmpeg installation, you might also need:
```bash
conda install -c conda-forge ffmpeg
```
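
To decide whether the extra ffmpeg install is needed, one option is to check whether an `ffmpeg` executable is already visible on the `PATH`. This is a small sketch using only the Python standard library; the function name `ffmpeg_path` is hypothetical and not part of the Reference Software.

```python
import shutil


def ffmpeg_path():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = ffmpeg_path()
    if path is None:
        print("ffmpeg not found on PATH; consider the conda-forge install above.")
    else:
        print("ffmpeg found at", path)
```

Run it inside the activated environment, since activating a conda environment changes the `PATH`.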