API Reference

Core functionality

openl3.core.get_audio_embedding(audio, sr, model=None, input_repr=None, content_type='music', embedding_size=6144, center=True, hop_size=0.1, batch_size=32, frontend='kapre', verbose=True)[source]

Computes and returns L3 embedding for given audio data.

Embeddings are computed for 1-second windows of audio.

Parameters
audio : np.ndarray [shape=(N,) or (N,C)] or list[np.ndarray]

1D numpy array of audio data, or a list of audio arrays for multiple inputs.

sr : int or list[int]

Sampling rate, or list of sampling rates. If not 48 kHz, the audio will be resampled.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model. Ignored if model is a valid Keras model.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 6144 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

center : bool

If True, pads the beginning of the signal so that timestamps correspond to the center of each window.

hop_size : float

Hop size in seconds.

batch_size : int

Batch size used for input to the embedding model.

frontend : “kapre” or “librosa”

The audio frontend to use. Defaults to “kapre”.

verbose : bool

If True, prints verbose messages.

Returns
embedding : np.ndarray [shape=(T, D)] or list[np.ndarray]

Array of embeddings for each window, or a list of such arrays for multiple audio clips.

timestamps : np.ndarray [shape=(T,)] or list[np.ndarray]

Array of timestamps corresponding to each embedding, or a list of such arrays for multiple audio clips.
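As a rough sketch of the shapes involved, the relationship between signal length, hop size, and the returned (T, D) / (T,) pair can be illustrated as follows. This is a hypothetical helper written for this reference, not openl3's actual framing code, and its edge behavior (padding, final partial window) is an assumption:

```python
import numpy as np

def expected_frames(n_samples, sr, hop_size=0.1, center=True):
    # Hypothetical helper (not part of openl3): count of 1-second
    # windows and their timestamps for a mono signal of n_samples.
    if center:
        n_samples += sr // 2  # padding shifts window centers onto timestamps
    hop = int(sr * hop_size)
    n_frames = 1 + max(0, n_samples - sr) // hop
    timestamps = np.arange(n_frames) * hop_size
    return n_frames, timestamps

n_frames, ts = expected_frames(48000 * 2, 48000, hop_size=0.1)  # 2 s at 48 kHz
```

With a 6144-dimensional model, the embedding returned for this clip would then have shape (n_frames, 6144), with one timestamp per row.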

openl3.core.get_image_embedding(image, frame_rate=None, model=None, input_repr='mel256', content_type='music', embedding_size=8192, batch_size=32, verbose=True)[source]

Computes and returns L3 embedding for given video frame (image) data.

Embeddings are computed for every image in the input.

Parameters
image : np.ndarray [shape=(H, W, C) or (N, H, W, C)] or list[np.ndarray]

3D or 4D numpy array of image data. If the images are not 224x224, they are resized so that the smallest side is 256, and the center 224x224 patch is then extracted. Any dtype is accepted and will be converted to np.float32 in the range [-1, 1]. Signed dtypes are assumed to take on negative values. A list of image arrays can also be provided.

frame_rate : int or list[int] or None

Video frame rate (if applicable); if provided, a timestamp array is also returned. A list of frame rates can also be provided. If None, no timestamp array is returned.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used to train the audio part of the embedding model. Ignored if model is a valid Keras model.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 8192 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

batch_size : int

Batch size used for input to the embedding model.

verbose : bool

If True, prints verbose messages.

Returns
embedding : np.ndarray [shape=(N, D)]

Array of embeddings for each frame.

timestamps : np.ndarray [shape=(N,)]

Array of timestamps for each frame. Not returned if frame_rate is None.
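The center-crop and value-scaling steps described for the image parameter can be sketched in NumPy. This is an illustration of those two steps only (the smallest-side-to-256 resize is omitted, since resizing needs an image library); the helper names are hypothetical, not openl3 functions:

```python
import numpy as np

def center_crop(img, size=224):
    # Extract the central size x size patch from an (H, W, C) array.
    # Assumes H and W are both >= size.
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

def to_float_range(img):
    # Map uint8 image data onto np.float32 in [-1, 1].
    return (img.astype(np.float32) / 127.5) - 1.0

frame = np.zeros((256, 341, 3), dtype=np.uint8)  # smallest side already 256
patch = to_float_range(center_crop(frame))
```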

openl3.core.get_output_path(filepath, suffix, output_dir=None)[source]

Returns path to output file corresponding to the given input file.

Parameters
filepath : str

Path to audio file to be processed.

suffix : str

String to append to the filename (including extension).

output_dir : str or None

Path to the directory where the file will be saved. If None, the directory of the given filepath will be used.

Returns
output_path : str

Path to the output file.
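The path logic described above can be sketched in plain Python. This is a hypothetical reimplementation for illustration only; in practice, call openl3.core.get_output_path itself:

```python
import os

def sketch_output_path(filepath, suffix, output_dir=None):
    # Illustrative stand-in: strip the input extension, append the
    # suffix (which includes the output extension), and place the
    # result in output_dir or the input file's directory.
    base = os.path.splitext(os.path.basename(filepath))[0]
    directory = output_dir if output_dir else os.path.dirname(filepath)
    return os.path.join(directory, base + suffix)

path = sketch_output_path("/data/clip.wav", ".npz", output_dir="/out")
```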

openl3.core.preprocess_audio(audio, sr, hop_size=0.1, input_repr=None, center=True, **kw)[source]

Preprocess the audio into a format compatible with the model.

Parameters
audio : np.ndarray [shape=(N,) or (N,C)] or list[np.ndarray]

1D numpy array of audio data, or a list of audio arrays for multiple inputs.

sr : int or list[int]

Sampling rate, or list of sampling rates. If not 48 kHz, the audio will be resampled.

hop_size : float

Hop size in seconds.

input_repr : str or None

Spectrogram representation used for the model. If input_repr is None, no spectrogram is computed and the model is assumed to contain the details of the input representation.

center : bool

If True, pads the beginning of the signal so that timestamps correspond to the center of each window.

Returns
input_data : np.ndarray

The preprocessed audio. If a valid input representation is provided, the shape is (batch, time, frequency, 1); if input_repr is None, the shape is (batch, time, 1).
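The windowing that produces the batch dimension can be sketched as follows. This is a hypothetical illustration of splitting a mono signal into overlapping 1-second windows with a given hop; it is not openl3's actual code, and it drops any trailing partial window:

```python
import numpy as np

def frame_audio(audio, sr, hop_size=0.1):
    # Split a mono signal into overlapping 1-second windows (illustrative).
    win = sr                      # 1-second window, in samples
    hop = int(sr * hop_size)      # hop, in samples
    n_frames = 1 + max(0, len(audio) - win) // hop
    return np.stack([audio[i * hop:i * hop + win] for i in range(n_frames)])

audio = np.zeros(48000 * 3)       # 3 s of mono audio at 48 kHz
batch = frame_audio(audio, 48000)  # shape: (batch, time)
```

With input_repr set, each window would additionally be turned into a spectrogram, yielding the (batch, time, frequency, 1) shape described above.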

openl3.core.process_audio_file(filepath, output_dir=None, suffix=None, model=None, input_repr=None, content_type='music', embedding_size=6144, center=True, hop_size=0.1, batch_size=32, overwrite=False, frontend='kapre', verbose=True)[source]

Computes and saves L3 embedding for a given audio file

Parameters
filepath : str or list[str]

Path or list of paths to WAV file(s) to be processed.

output_dir : str or None

Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.

suffix : str or None

String to be appended to the output filename, i.e. <base filename>_<suffix>.npz. If None, no suffix will be added, i.e. <base filename>.npz.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used as model input. Ignored if model is a valid Keras model with a Kapre frontend; required when using the Librosa frontend.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 6144 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

center : bool

If True, pads the beginning of the signal so that timestamps correspond to the center of each window.

hop_size : float

Hop size in seconds.

batch_size : int

Batch size used for input to the embedding model.

overwrite : bool

If True, overwrites existing output files.

frontend : “kapre” or “librosa”

The audio frontend to use. Defaults to “kapre”.

verbose : bool

If True, prints verbose messages.
Returns
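The saved .npz file holds the embedding and timestamp arrays; reading one back can be sketched with a plain NumPy round trip. The key names used here ('embedding' and 'timestamps') are an assumption for illustration and are not confirmed by this reference:

```python
import os
import tempfile
import numpy as np

# Round-trip sketch of an .npz output (key names assumed, not confirmed
# by this reference): a (T, D) embedding plus (T,) timestamps.
emb = np.zeros((19, 512), dtype=np.float32)
ts = np.arange(19, dtype=np.float64) * 0.1

path = os.path.join(tempfile.mkdtemp(), "clip.npz")
np.savez(path, embedding=emb, timestamps=ts)
data = np.load(path)
```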
openl3.core.process_image_file(filepath, output_dir=None, suffix=None, model=None, input_repr='mel256', content_type='music', embedding_size=8192, batch_size=32, overwrite=False, verbose=True)[source]

Computes and saves L3 embedding for a given image file

Parameters
filepath : str or list[str]

Path or list of paths to image file(s) to be processed.

output_dir : str or None

Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.

suffix : str or None

String to be appended to the output filename, i.e. <base filename>_<suffix>.npz. If None, no suffix will be added, i.e. <base filename>.npz.

model : tf.keras.Model or None

Loaded model object. If a model is provided, then input_repr, content_type, and embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model. Ignored if model is a valid Keras model.

content_type : “music” or “env”

Type of content used to train the embedding model. Ignored if model is a valid Keras model.

embedding_size : 8192 or 512

Embedding dimensionality. Ignored if model is a valid Keras model.

batch_size : int

Batch size used for input to the embedding model.

overwrite : bool

If True, overwrites existing output files.

verbose : bool

If True, prints verbose messages.

Returns
openl3.core.process_video_file(filepath, output_dir=None, suffix=None, audio_model=None, image_model=None, input_repr=None, content_type='music', audio_embedding_size=6144, audio_center=True, audio_hop_size=0.1, image_embedding_size=8192, audio_batch_size=32, image_batch_size=32, audio_frontend='kapre', overwrite=False, verbose=True)[source]

Computes and saves L3 audio and video frame embeddings for a given video file

Note that image embeddings are computed for every frame of the video. Also note that embeddings for the audio and images are not temporally aligned. Please refer to the timestamps in the output files for the corresponding timestamps for each set of embeddings.

Parameters
filepath : str or list[str]

Path or list of paths to video file(s) to be processed.

output_dir : str or None

Path to directory for saving output files. If None, output files will be saved to the directory containing the input file.

suffix : str or None

String to be appended to the output filename, i.e. <base filename>_<modality>_<suffix>.npz. If None, no suffix will be added, i.e. <base filename>_<modality>.npz.

audio_model : tf.keras.Model or None

Loaded audio model object. If a model is provided, then input_repr, content_type, and audio_embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and audio_embedding_size.

image_model : tf.keras.Model or None

Loaded image model object. If a model is provided, then input_repr, content_type, and image_embedding_size will be ignored. If None is provided, the model will be loaded using the provided values of input_repr, content_type, and image_embedding_size.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the audio model. Ignored if audio_model is a valid Keras model with a Kapre frontend; required when using the Librosa frontend.

content_type : “music” or “env”

Type of content used to train the embedding models. Ignored if a valid Keras model is provided.

audio_embedding_size : 6144 or 512

Audio embedding dimensionality. Ignored if audio_model is a valid Keras model.

audio_center : bool

If True, pads the beginning of the audio signal so that timestamps correspond to the center of each window.

audio_hop_size : float

Hop size in seconds.

image_embedding_size : 8192 or 512

Video frame embedding dimensionality. Ignored if image_model is a valid Keras model.

audio_batch_size : int

Batch size used for input to the audio embedding model.

image_batch_size : int

Batch size used for input to the image embedding model.

audio_frontend : “kapre” or “librosa”

The audio frontend to use. Defaults to “kapre”.

overwrite : bool

If True, overwrites existing output files.

verbose : bool

If True, prints verbose messages.
Returns
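Per the naming convention above, one video input yields one output file per modality. The paths can be sketched with a hypothetical helper; the modality names "audio" and "image" are an assumption for illustration:

```python
import os

def sketch_video_output_paths(filepath, suffix=None, output_dir=None):
    # Illustrative only: build <base>_<modality>[_<suffix>].npz paths
    # for both modalities (modality names assumed).
    base = os.path.splitext(os.path.basename(filepath))[0]
    directory = output_dir if output_dir else os.path.dirname(filepath)
    tail = f"_{suffix}.npz" if suffix else ".npz"
    return {m: os.path.join(directory, f"{base}_{m}{tail}")
            for m in ("audio", "image")}

paths = sketch_video_output_paths("/data/talk.mp4", suffix="emb")
```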

Models functionality

openl3.models.get_audio_embedding_model_path(input_repr, content_type)[source]

Returns the local path to the model weights file for the model with the given characteristics

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model.

content_type : “music” or “env”

Type of content used to train the embedding.

Returns
output_path : str

Path to the model weights file.

openl3.models.get_image_embedding_model_path(input_repr, content_type)[source]

Returns the local path to the model weights file for the model with the given characteristics

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the model.

content_type : “music” or “env”

Type of content used to train the embedding.

Returns
output_path : str

Path to the model weights file.

openl3.models.kapre_v0_1_4_magnitude_to_decibel(x, ref_value=1.0, amin=1e-10, dynamic_range=80.0)[source]

Magnitude-to-decibel (log10) conversion implemented as a TensorFlow op, replicating the behavior of Kapre v0.1.4.
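Assuming this mirrors Kapre v0.1.4's amplitude-to-decibel behavior (log10 scaling with a floor at amin, output shifted so the maximum sits at 0 dB and clipped to dynamic_range), a NumPy analogue looks like this. It is an illustrative sketch, not the TensorFlow implementation, and ref_value handling is omitted:

```python
import numpy as np

def magnitude_to_decibel_np(x, amin=1e-10, dynamic_range=80.0):
    # NumPy sketch of Kapre-v0.1.4-style magnitude-to-dB conversion
    # (assumed behavior; ref_value handling omitted for brevity).
    log_spec = 10.0 * np.log10(np.maximum(x, amin))
    log_spec -= log_spec.max()                    # ceiling at 0 dB
    return np.maximum(log_spec, -dynamic_range)   # floor at -dynamic_range dB

db = magnitude_to_decibel_np(np.array([1.0, 0.1, 1e-12]))
```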

openl3.models.load_audio_embedding_model(input_repr, content_type, embedding_size, frontend='kapre')[source]

Returns a model with the given characteristics. Loads the model if the model has not been loaded yet.

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the audio model.

content_type : “music” or “env”

Type of content used to train the embedding.

embedding_size : 6144 or 512

Embedding dimensionality.

frontend : “kapre” or “librosa”

The audio frontend to use. If frontend == ‘kapre’, the Kapre frontend will be included inside the Keras model; otherwise, no frontend is added.

Returns
model : tf.keras.Model

Model object.
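The "loads the model if the model has not been loaded yet" behavior is a cached loader: repeated calls with the same arguments should return the same object. The pattern can be sketched generically with functools.lru_cache (a hypothetical stand-in, not openl3's code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_model_cached(input_repr, content_type, embedding_size):
    # Stand-in for an expensive model load; the result is cached per
    # argument tuple, so later identical calls reuse one object.
    return {"input_repr": input_repr, "content_type": content_type,
            "embedding_size": embedding_size}

a = load_model_cached("mel256", "music", 512)
b = load_model_cached("mel256", "music", 512)  # cache hit: same object as a
```

In practice, the same idea applies at the call site: load the model once and pass it via the model argument of get_audio_embedding or process_audio_file, rather than letting each call reload the weights.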

openl3.models.load_audio_embedding_model_from_path(model_path, input_repr, embedding_size, frontend='kapre')[source]

Loads a model with weights at the given path.

Parameters
model_path : str

Path to model weights HDF5 (.h5) file. Must be in the format *._<input_repr>_<content_type>.h5 or *._<input_repr>_<content_type>-.*.h5, since the model configuration will be determined from the filename.

input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the audio model.

embedding_size : 6144 or 512

Embedding dimensionality.

frontend : “kapre” or “librosa”

The audio frontend to use. If frontend == ‘kapre’, the Kapre frontend will be included inside the Keras model; otherwise, no frontend is added.

Returns
model : tf.keras.Model

Model object.
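Since the configuration is determined from the filename, the pattern above (read here loosely as <anything>_<input_repr>_<content_type>[-<anything>].h5) can be sketched with a regular expression. This is a hypothetical parser for illustration; openl3's actual parsing may differ:

```python
import re

def parse_model_filename(path):
    # Illustrative: extract (input_repr, content_type) from a weights
    # filename shaped like <anything>_<input_repr>_<content_type>[-...].h5.
    m = re.search(r"_(linear|mel128|mel256)_(music|env)(?:-.*)?\.h5$", path)
    if m is None:
        raise ValueError(f"unrecognized model filename: {path}")
    return m.group(1), m.group(2)

config = parse_model_filename("openl3_audio_mel256_music.h5")
```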

openl3.models.load_image_embedding_model(input_repr, content_type, embedding_size)[source]

Returns a model with the given characteristics. Loads the model if the model has not been loaded yet.

Parameters
input_repr : “linear”, “mel128”, or “mel256”

Spectrogram representation used for the corresponding audio model.

content_type : “music” or “env”

Type of content used to train the embedding.

embedding_size : 8192 or 512

Embedding dimensionality.

Returns
model : tf.keras.Model

Model object.

openl3.models.load_image_embedding_model_from_path(model_path, embedding_size)[source]

Loads a model with weights at the given path.

Parameters
model_path : str

Path to model weights HDF5 (.h5) file.

embedding_size : 8192 or 512

Embedding dimensionality.

Returns
model : tf.keras.Model

Model object.