API Reference

Core functionality

openl3.core.get_audio_embedding(audio, sr, model=None, input_repr=None, content_type='music', embedding_size=6144, center=True, hop_size=0.1, batch_size=32, frontend='kapre', verbose=True)[source]

Computes and returns L3 embedding for given audio data.

Embeddings are computed for 1-second windows of audio.

Parameters
audio : np.ndarray [shape=(N,) or (N,C)] or list[np.ndarray]

1D numpy array of audio data, or a list of audio arrays for multiple inputs.

sr : int or list[int]

Sampling rate, or list of sampling rates. If not 48 kHz, the audio will be resampled.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model. Ignored if model is a valid Keras model.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 6144 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

center : bool

If True, pads the beginning of the signal so that timestamps correspond to the center of each window.

hop_size : float

Hop size in seconds.

batch_size : int

Batch size used for input to the embedding model.

frontend : “kapre” or “librosa”

The audio frontend to use. Defaults to “kapre”.

verbose : bool

If True, prints verbose messages.

Returns
embedding : np.ndarray [shape=(T, D)] or list[np.ndarray]

Array of embeddings for each window, or a list of such arrays for multiple audio clips.

timestamps : np.ndarray [shape=(T,)] or list[np.ndarray]

Array of timestamps corresponding to each embedding, or a list of such arrays for multiple audio clips.
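As a rough sketch of the shapes involved, the relationship between signal length, hop size, and the returned (T, D) / (T,) pair can be illustrated as follows. This is a hypothetical helper written for this reference, not openl3's actual framing code, and its edge behavior (padding, final partial window) is an assumption:

```python
import numpy as np

def expected_frames(n_samples, sr, hop_size=0.1, center=True):
    # Hypothetical helper (not part of openl3): count of 1-second
    # windows and their timestamps for a mono signal of n_samples.
    if center:
        n_samples += sr // 2  # padding shifts window centers onto timestamps
    hop = int(sr * hop_size)
    n_frames = 1 + max(0, n_samples - sr) // hop
    timestamps = np.arange(n_frames) * hop_size
    return n_frames, timestamps

n_frames, ts = expected_frames(48000 * 2, 48000, hop_size=0.1)  # 2 s at 48 kHz
```

With a 6144-dimensional model, the embedding returned for this clip would then have shape (n_frames, 6144), with one timestamp per row.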

openl3.core.get_image_embedding(image, frame_rate=None, model=None, input_repr='mel256', content_type='music', embedding_size=8192, batch_size=32, verbose=True)[source]

Computes and returns L3 embedding for given video frame (image) data.

Embeddings are computed for every image in the input.

Parameters
image : np.ndarray [shape=(H, W, C) or (N, H, W, C)] or list[np.ndarray]

3D or 4D numpy array of image data. If the images are not 224x224, they are resized so that the smallest side is 256, and the center 224x224 patch is then extracted. Any dtype is accepted and will be converted to np.float32 in the range [-1, 1]. Signed dtypes are assumed to take on negative values. A list of image arrays can also be provided.

frame_rate : int or list[int] or None

Video frame rate (if applicable); if provided, a timestamp array is also returned. A list of frame rates can also be provided. If None, no timestamp array is returned.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used to train the audio part of the embedding model. Ignored if model is a valid Keras model.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 8192 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

batch_size : int

Batch size used for input to the embedding model.

verbose : bool

If True, prints verbose messages.

Returns
embedding : np.ndarray [shape=(N, D)]

Array of embeddings for each frame.

timestamps : np.ndarray [shape=(N,)]

Array of timestamps for each frame. Not returned if frame_rate is None.
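The center-crop and value-scaling steps described for the image parameter can be sketched in NumPy. This is an illustration of those two steps only (the smallest-side-to-256 resize is omitted, since resizing needs an image library); the helper names are hypothetical, not openl3 functions:

```python
import numpy as np

def center_crop(img, size=224):
    # Extract the central size x size patch from an (H, W, C) array.
    # Assumes H and W are both >= size.
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

def to_float_range(img):
    # Map uint8 image data onto np.float32 in [-1, 1].
    return (img.astype(np.float32) / 127.5) - 1.0

frame = np.zeros((256, 341, 3), dtype=np.uint8)  # smallest side already 256
patch = to_float_range(center_crop(frame))
```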

openl3.core.get_output_path(filepath, suffix, output_dir=None)[source]

Returns path to output file corresponding to the given input file.

Parameters
filepath : str

Path to audio file to be processed.

suffix : str

String to append to the filename (including extension).

output_dir : str or None

Path to the directory where the file will be saved. If None, the directory of the given filepath will be used.

Returns
output_path : str

Path to the output file.
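The path logic described above can be sketched in plain Python. This is a hypothetical reimplementation for illustration only; in practice, call openl3.core.get_output_path itself:

```python
import os

def sketch_output_path(filepath, suffix, output_dir=None):
    # Illustrative stand-in: strip the input extension, append the
    # suffix (which includes the output extension), and place the
    # result in output_dir or the input file's directory.
    base = os.path.splitext(os.path.basename(filepath))[0]
    directory = output_dir if output_dir else os.path.dirname(filepath)
    return os.path.join(directory, base + suffix)

path = sketch_output_path("/data/clip.wav", ".npz", output_dir="/out")
```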

openl3.core.preprocess_audio(audio, sr, hop_size=0.1, input_repr=None, center=True, **kw)[source]

Preprocess the audio into a format compatible with the model.

Parameters
audio : np.ndarray [shape=(N,) or (N,C)] or list[np.ndarray]

1D numpy array of audio data, or a list of audio arrays for multiple inputs.

sr : int or list[int]

Sampling rate, or list of sampling rates. If not 48 kHz, the audio will be resampled.

hop_size : float

Hop size in seconds.

input_repr : str or None

Spectrogram representation used for the model. If input_repr is None, no spectrogram is computed and the model is assumed to contain the details of the input representation.

center : bool

If True, pads the beginning of the signal so that timestamps correspond to the center of each window.

Returns
input_data : np.ndarray

The preprocessed audio. If a valid input representation is provided, the shape is (batch, time, frequency, 1); if input_repr is None, the shape is (batch, time, 1).
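The windowing that produces the batch dimension can be sketched as follows. This is a hypothetical illustration of splitting a mono signal into overlapping 1-second windows with a given hop; it is not openl3's actual code, and it drops any trailing partial window:

```python
import numpy as np

def frame_audio(audio, sr, hop_size=0.1):
    # Split a mono signal into overlapping 1-second windows (illustrative).
    win = sr                      # 1-second window, in samples
    hop = int(sr * hop_size)      # hop, in samples
    n_frames = 1 + max(0, len(audio) - win) // hop
    return np.stack([audio[i * hop:i * hop + win] for i in range(n_frames)])

audio = np.zeros(48000 * 3)       # 3 s of mono audio at 48 kHz
batch = frame_audio(audio, 48000)  # shape: (batch, time)
```

With input_repr set, each window would additionally be turned into a spectrogram, yielding the (batch, time, frequency, 1) shape described above.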

openl3.core.process_audio_file(filepath, output_dir=None, suffix=None, model=None, input_repr=None, content_type='music', embedding_size=6144, center=True, hop_size=0.1, batch_size=32, overwrite=False, frontend='kapre', verbose=True)[source]

Computes and saves L3 embedding for a given audio file

Parameters
filepath : str or list[str]

Path or list of paths to WAV file(s) to be processed.

output_dir : str or None

Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.

suffix : str or None

String to be appended to the output filename, i.e. <base filename>_<suffix>.npz. If None, no suffix will be added, i.e. <base filename>.npz.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used as model input. Ignored if model is a valid Keras model with a Kapre frontend; required when using the Librosa frontend.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 6144 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

center : bool

If True, pads the beginning of the signal so that timestamps correspond to the center of each window.

hop_size : float

Hop size in seconds.

batch_size : int

Batch size used for input to the embedding model.

overwrite : bool

If True, overwrites existing output files.

frontend : “kapre” or “librosa”

The audio frontend to use. Defaults to “kapre”.

verbose : bool

If True, prints verbose messages.
Returns
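The saved .npz file holds the embedding and timestamp arrays; reading one back can be sketched with a plain NumPy round trip. The key names used here ('embedding' and 'timestamps') are an assumption for illustration and are not confirmed by this reference:

```python
import os
import tempfile
import numpy as np

# Round-trip sketch of an .npz output (key names assumed, not confirmed
# by this reference): a (T, D) embedding plus (T,) timestamps.
emb = np.zeros((19, 512), dtype=np.float32)
ts = np.arange(19, dtype=np.float64) * 0.1

path = os.path.join(tempfile.mkdtemp(), "clip.npz")
np.savez(path, embedding=emb, timestamps=ts)
data = np.load(path)
```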
openl3.core.process_image_file(filepath, output_dir=None, suffix=None, model=None, input_repr='mel256', content_type='music', embedding_size=8192, batch_size=32, overwrite=False, verbose=True)[source]

Computes and saves L3 embedding for a given image file

Parameters
filepath : str or list[str]

Path or list of paths to image file(s) to be processed.

output_dir : str or None

Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.

suffix : str or None

String to be appended to the output filename, i.e. <base filename>_<suffix>.npz. If None, no suffix will be added, i.e. <base filename>.npz.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model. Ignored if model is a valid Keras model.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 8192 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

batch_size : int

Batch size used for input to the embedding model.

overwrite : bool

If True, overwrites existing output files.

verbose : bool

If True, prints verbose messages.

Returns
openl3.core.process_video_file(filepath, output_dir=None, suffix=None, audio_model=None, image_model=None, input_repr=None, content_type='music', audio_embedding_size=6144, audio_center=True, audio_hop_size=0.1, image_embedding_size=8192, audio_batch_size=32, image_batch_size=32, audio_frontend='kapre', overwrite=False, verbose=True)[source]

Computes and saves L3 audio and video frame embeddings for a given video file

Note that image embeddings are computed for every frame of the video. Also note that embeddings for the audio and images are not temporally aligned. Please refer to the timestamps in the output files for the corresponding timestamps for each set of embeddings.

Parameters
filepath : str or list[str]

Path or list of paths to video file(s) to be processed.

output_dir : str or None

Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.

suffix : str or None

String to be appended to the output filename, i.e. <base filename>_<modality>_<suffix>.npz. If None, no suffix will be added, i.e. <base filename>_<modality>.npz.

audio_model : tf.keras.Model or None

Loaded audio model object. If a model is provided, then input_repr, content_type, and audio_embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and audio_embedding_size.

image_model : tf.keras.Model or None

Loaded image model object. If a model is provided, then input_repr, content_type, and image_embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and image_embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the audio model. Ignored if audio_model is a valid Keras model with a Kapre frontend; required when using the Librosa frontend.

content_type : “music” or “env”

Type of content used to train the embedding models. Ignored if a valid Keras model is provided.

audio_embedding_size : 6144 or 512

Audio embedding dimensionality. Ignored if audio_model is a valid Keras model.

audio_center : bool

If True, pads the beginning of the audio signal so that timestamps correspond to the center of each window.

audio_hop_size : float

Hop size in seconds.

image_embedding_size : 8192 or 512

Video frame embedding dimensionality. Ignored if image_model is a valid Keras model.

audio_batch_size : int

Batch size used for input to the audio embedding model.

image_batch_size : int

Batch size used for input to the image embedding model.

audio_frontend : “kapre” or “librosa”

The audio frontend to use. Defaults to “kapre”.

overwrite : bool

If True, overwrites existing output files.

verbose : bool

If True, prints verbose messages.
Returns
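Per the naming convention above, one video input yields one output file per modality. The paths can be sketched with a hypothetical helper; the modality names "audio" and "image" are an assumption for illustration:

```python
import os

def sketch_video_output_paths(filepath, suffix=None, output_dir=None):
    # Illustrative only: build <base>_<modality>[_<suffix>].npz paths
    # for both modalities (modality names assumed).
    base = os.path.splitext(os.path.basename(filepath))[0]
    directory = output_dir if output_dir else os.path.dirname(filepath)
    tail = f"_{suffix}.npz" if suffix else ".npz"
    return {m: os.path.join(directory, f"{base}_{m}{tail}")
            for m in ("audio", "image")}

paths = sketch_video_output_paths("/data/talk.mp4", suffix="emb")
```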

Models functionality

openl3.models.get_audio_embedding_model_path(input_repr, content_type)[source]

Returns the local path to the model weights file for the model with the given characteristics

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model.

content_type : “music” or “env”

Type of content used to train the embedding.

Returns
output_path : str

Path to the model weights file.

openl3.models.get_image_embedding_model_path(input_repr, content_type)[source]

Returns the local path to the model weights file for the model with the given characteristics

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model.

content_type : “music” or “env”

Type of content used to train the embedding.

Returns
output_path : str

Path to the model weights file.

openl3.models.kapre_v0_1_4_magnitude_to_decibel(x, ref_value=1.0, amin=1e-10, dynamic_range=80.0)[source]

Magnitude-to-decibel (log10) conversion implemented as a TensorFlow op, replicating the behavior of Kapre v0.1.4.
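Assuming this mirrors Kapre v0.1.4's amplitude-to-decibel behavior (log10 scaling with a floor at amin, output shifted so the maximum sits at 0 dB and clipped to dynamic_range), a NumPy analogue looks like this. It is an illustrative sketch, not the TensorFlow implementation, and ref_value handling is omitted:

```python
import numpy as np

def magnitude_to_decibel_np(x, amin=1e-10, dynamic_range=80.0):
    # NumPy sketch of Kapre-v0.1.4-style magnitude-to-dB conversion
    # (assumed behavior; ref_value handling omitted for brevity).
    log_spec = 10.0 * np.log10(np.maximum(x, amin))
    log_spec -= log_spec.max()                    # ceiling at 0 dB
    return np.maximum(log_spec, -dynamic_range)   # floor at -dynamic_range dB

db = magnitude_to_decibel_np(np.array([1.0, 0.1, 1e-12]))
```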

openl3.models.load_audio_embedding_model(input_repr, content_type, embedding_size, frontend='kapre')[source]

Returns a model with the given characteristics. Loads the model if the model has not been loaded yet.

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the audio model.

content_type : “music” or “env”

Type of content used to train the embedding.

embedding_size : 6144 or 512

Embedding dimensionality.

frontend : “kapre” or “librosa”

The audio frontend to use. If frontend == ‘kapre’, the Kapre frontend will be included inside the Keras model; otherwise, no frontend is added.

Returns
model : tf.keras.Model

Model object.
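The "loads the model if the model has not been loaded yet" behavior is a cached loader: repeated calls with the same arguments should return the same object. The pattern can be sketched generically with functools.lru_cache (a hypothetical stand-in, not openl3's code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_model_cached(input_repr, content_type, embedding_size):
    # Stand-in for an expensive model load; the result is cached per
    # argument tuple, so later identical calls reuse one object.
    return {"input_repr": input_repr, "content_type": content_type,
            "embedding_size": embedding_size}

a = load_model_cached("mel256", "music", 512)
b = load_model_cached("mel256", "music", 512)  # cache hit: same object as a
```

In practice, the same idea applies at the call site: load the model once and pass it via the model argument of get_audio_embedding or process_audio_file, rather than letting each call reload the weights.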

openl3.models.load_audio_embedding_model_from_path(model_path, input_repr, embedding_size, frontend='kapre')[source]

Loads a model with weights at the given path.

Parameters
model_path : str

Path to model weights HDF5 (.h5) file. Must be in the format *._<input_repr>_<content_type>.h5 or *._<input_repr>_<content_type>-.*.h5, since the model configuration will be determined from the filename.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the audio model.

embedding_size : 6144 or 512

Embedding dimensionality.

frontend : “kapre” or “librosa”

The audio frontend to use. If frontend == ‘kapre’, the Kapre frontend will be included inside the Keras model; otherwise, no frontend is added.

Returns
model : tf.keras.Model

Model object.
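Since the configuration is determined from the filename, the pattern above (read here loosely as <anything>_<input_repr>_<content_type>[-<anything>].h5) can be sketched with a regular expression. This is a hypothetical parser for illustration; openl3's actual parsing may differ:

```python
import re

def parse_model_filename(path):
    # Illustrative: extract (input_repr, content_type) from a weights
    # filename shaped like <anything>_<input_repr>_<content_type>[-...].h5.
    m = re.search(r"_(linear|mel128|mel256)_(music|env)(?:-.*)?\.h5$", path)
    if m is None:
        raise ValueError(f"unrecognized model filename: {path}")
    return m.group(1), m.group(2)

config = parse_model_filename("openl3_audio_mel256_music.h5")
```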

openl3.models.load_image_embedding_model(input_repr, content_type, embedding_size)[source]

Returns a model with the given characteristics. Loads the model if the model has not been loaded yet.

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the corresponding audio model.

content_type : “music” or “env”

Type of content used to train the embedding.

embedding_size : 8192 or 512

Embedding dimensionality.

Returns
model : tf.keras.Model

Model object.

openl3.models.load_image_embedding_model_from_path(model_path, embedding_size)[source]

Loads a model with weights at the given path.

Parameters
model_path : str

Path to model weights HDF5 (.h5) file.

embedding_size : 8192 or 512

Embedding dimensionality.

Returns
model : tf.keras.Model

Model object.