Whisper Transcription

This utility transcribes voice or video recordings using Whisper, the automatic speech recognition model from OpenAI.

Input format

The app can process .mp3, .mp4, .m4a, .wav and .mpg files.
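
As a rough illustration of the list above, the following Python sketch filters a directory down to files with a supported extension before a job is submitted. The helper names `is_supported` and `find_supported_files` are hypothetical and not part of the app, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions listed in this section. Case-insensitive matching is an assumption.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".mpg"}


def is_supported(path):
    """Return True if the file name ends in one of the supported extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_supported_files(directory):
    """Return the supported files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and is_supported(p)
    )
```

For example, `is_supported("interview.mp3")` is true, while `is_supported("notes.txt")` is false.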

Output format

Output folder

By default, the transcript files are saved in /Jobs/Whisper Transcription/<job-id>/out. The user can select another directory with the optional --output_dir parameter.

Optional parameters

From the application's job submission page the user may specify a number of optional parameters, as needed:

  • Input file:
    A single file which will be transcribed by Whisper. This optional parameter is suitable if the user only needs Whisper to transcribe one file in the job.

  • Input directory:
    The directory containing the file(s) which will be transcribed by Whisper. This optional parameter is suitable if the user needs Whisper to transcribe more than one file in the job. Output files are generated for each file in the 'Input directory'.

  • Option: --output_dir:
    If the user wants the output files to be saved somewhere other than the default output folder, the desired folder is specified here.

  • Option: --output_format:
    The file format of the output. See details above. By default, all output formats are generated and saved.

  • Option: --model:
    The model that Whisper will use for the transcription. The default is large, i.e., the largest and most accurate model.

  • Option: --language:
    Selecting a specific language forces Whisper to transcribe the input file(s) in that language. If no language is selected, Whisper tries to recognize the language.

  • Interactive mode:
    Determines whether Whisper runs interactively. Setting the parameter value to true starts the job in interactive mode, in which the user can access the app terminal or the web interface. The latter gives access to a JupyterLab workspace for running notebooks. The default is non-interactive mode.

  • Archive password:
    Encrypts the ZIP output archive with AES and protects it with a password. The user must specify the password as a text string.
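
To show how the command-line options above fit together, here is a minimal Python sketch that assembles an argument list for the open-source `whisper` command-line tool. The function name `build_whisper_command` is hypothetical, and the assumption that the app forwards these options to the `whisper` CLI unchanged is not confirmed by this page. The defaults mirror the ones described above (model large, all output formats).

```python
def build_whisper_command(input_file, model="large", language=None,
                          output_dir=None, output_format="all"):
    """Assemble an argument list for the `whisper` command-line tool.

    Hypothetical helper: the defaults follow this page, not the CLI's own.
    """
    command = ["whisper", str(input_file),
               "--model", model,
               "--output_format", output_format]
    if language is not None:
        # Leaving --language out lets Whisper try to detect the language.
        command += ["--language", language]
    if output_dir is not None:
        # Only set when the default output folder should be overridden.
        command += ["--output_dir", str(output_dir)]
    return command
```

Calling `build_whisper_command("interview.mp3", language="da", output_dir="out")` yields a list that can be passed to `subprocess.run` on a machine where the `whisper` tool is installed.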

General considerations

When using the Whisper app, there are a few things to keep in mind:

  • In general, larger models yield more accurate transcription results but also take longer to run. The user should therefore be sure to allocate enough time for the job, and/or extend the job lifetime if necessary.

  • Running Whisper on a GPU node is considerably faster than running it on a CPU node. However, the app can only use one GPU at a time. Therefore, users should only allocate single-GPU machines (i.e., *-gpu-1 machines) to their Whisper jobs.

  • Speaker diarization (i.e., distinguishing between different speakers) is not currently a feature of Whisper.