VTT Ingest Configuration

Updated on December 8th, 2022

Table of Contents

- Assumptions
- VTT Ingest
- Add Watchfolder
- Configure Caption Ingest MediaStore
- Caption Processing Configuration
- Convert Subclips to VTT Configuration
- Translation Configuration
- SetMetadata
- Known Problems and Limitations
- Known Bugs
- Testing problems
- Integration Limitations
- Updates
- 2021-01-21

This document describes the configuration requirements for VTT Ingest.

THIS IS A WORK IN PROGRESS

Assumptions

We assume there is an existing Curator 3.1 system such that:

All proxies are made using the PROXY-V3 MediaStore

VTT Ingest

At this time we only support VTT ingest by watchfolder.

Add Watchfolder

To configure VTT ingest, use the new feature of the Ingest File - Spawn by Type workflow that allows additional ingest types to be configured. Add the following setting to the XChange Manager MediaStore (e.g., XCHANGEMANAGER-INGEST-TO-CURATOR):

AdditionalFileTypes = vtt:FILEINGEST-CAPTION

This instructs the workflow to use the FILEINGEST-CAPTION MediaStore for ingest of VTT files.

Note that this will only work if VTT is not included in any known file type, so it may have to be removed from FileTypesOther in the GLOBAL MediaStore.
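The routing rule above can be illustrated with a minimal Python sketch. This is not the workflow's implementation: the function names, the `known_types` parameter, and the assumption that several `extension:MediaStore` pairs would be separated by a bar or semicolon are all hypothetical, since the source only shows a single pair.

```python
from pathlib import PurePath


def parse_additional_file_types(value):
    """Parse 'ext:STORE' pairs into a dict keyed by lowercase extension.

    Assumption: multiple pairs are separated by '|' or ';'. The document
    only shows a single pair, 'vtt:FILEINGEST-CAPTION'.
    """
    mapping = {}
    for pair in value.replace(";", "|").split("|"):
        pair = pair.strip()
        if not pair:
            continue
        ext, _, store = pair.partition(":")
        mapping[ext.strip().lower().lstrip(".")] = store.strip()
    return mapping


def resolve_ingest_store(file_name, additional_types, known_types):
    """Return the MediaStore for an additional file type, or None.

    known_types is a hypothetical set of extensions already claimed by a
    known file type (for example via FileTypesOther). A claimed extension
    is never routed to an additional store, mirroring the note above.
    """
    ext = PurePath(file_name).suffix.lower().lstrip(".")
    if ext in known_types:
        return None
    return additional_types.get(ext)


types = parse_additional_file_types("vtt:FILEINGEST-CAPTION")
print(resolve_ingest_store("MyShow_subtitle_es.vtt", types, known_types={"mov"}))
```

If `vtt` were still present in `known_types`, the lookup would return `None`, which is the situation the note about FileTypesOther warns against.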

Configure Caption Ingest MediaStore

To manage the caption file ingest procedure, the FILEINGEST-CAPTION MediaStore is used.

This will need to be created, with the following settings:

Key	Value	Description
CaptionsStore	PROCESS-CAPTIONS	The name of the store that contains information about how caption data is stored (see the next section for details).
FilePathRegex	([^\\\/]+)_subtitle(?:_?([a-zA-Z]))?(\.(?:vtt|srt))$*	A regex that describes how to extract metadata from the file name. In this example, we look for files of the format (name)_subtitle_(lang).vtt, where _(lang) is optional. Capture groups are defined for the name, the language code and the file extension, which are named in the FilePathRegexMetadataNames entry.
FilePathRegexMetadataNames	BaseFileName;Language;FileExtension	A semicolon or bar separated list of names for extracted data from the regex. Each name is assigned to a capture group in the pattern in order; a name can be left blank if there is a capture group that is not required.
IngestWorkflow	Spawn - Ingest Caption File	The name of the workflow to use for ingesting. This should not be changed.
LanguageCode	{Language}	The pattern used to determine the language of the subtitles. The default value shown here assumes a field called Language was extracted by the regex.
UpdateMatchMetadata	Name:{BaseFileName}	A pattern matching system to match the file with the source asset. In this case, we are looking for anything that has the same Name as the part of the file path from before the _subtitle - as extracted from the FilePathRegex.
CaptionMetadataBlobName	CaptionMetadataBlob	This needs to match the name of the metadata field configured above.
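To see how FilePathRegex and FilePathRegexMetadataNames work together, here is a small Python sketch. The pattern is an adaptation for illustration, not the shipped value: it assumes the language code may be more than one letter (`[a-zA-Z]+`) and ends with a plain `$`. The function name and the choice to return an empty string for a missing optional group are also assumptions.

```python
import re

# Adapted pattern (an assumption, not the configured value): multi-letter
# language code and a plain end-of-string anchor.
FILE_PATH_REGEX = r"([^\\/]+)_subtitle(?:_?([a-zA-Z]+))?(\.(?:vtt|srt))$"
METADATA_NAMES = "BaseFileName;Language;FileExtension"


def extract_file_metadata(path, pattern=FILE_PATH_REGEX, names=METADATA_NAMES):
    """Map each capture group to its name, in order.

    Returns None when the path does not match. Names may be separated by
    ';' or '|'; a blank name skips the corresponding capture group.
    """
    match = re.search(pattern, path)
    if match is None:
        return None
    fields = {}
    for name, value in zip(re.split(r"[;|]", names), match.groups()):
        if name:
            # An optional group that did not participate yields None.
            fields[name] = value or ""
    return fields


print(extract_file_metadata(r"C:\watch\MyShow_subtitle_es.vtt"))
print(extract_file_metadata("MyShow_subtitle.vtt"))
```

With these inputs, the extracted BaseFileName is the value that UpdateMatchMetadata would compare against the asset Name, and Language is the value substituted into LanguageCode.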

Caption Processing Configuration

Once the file is ingested, the workflow uses the caption processing configuration to store the file, convert it into subclips, and add it to a proxy. The PROCESS-CAPTIONS MediaStore contains the configuration for these parts of the workflow.

Key	Value	Description
AddToProxy	PROXY-V3	The names of the Proxy stores to which a processed VTT will be added.
CaptionMetadataBlobName	default:CaptionData|es:CaptionDataEs|fr:CaptionDataFr	A bar separated list of LanguageCode:MetadataName pairs. Captions with a given language code are stored in the corresponding blob. Default is used for captions that do not have a language code.
CreateSubclips	True	This defaults to true. If this is false, no subclips will be created when the caption data is processed using this store.
DefaultLanguages	default|fr|es	A bar separated list of default languages to use if the workflow is not given a specific language (e.g., if this is the target of a transfer).
DeleteExistingSubclips	True	This defaults to true, but can be set to false to disable deleting existing subclips when processing caption data.
SourceRequired	False	Used by the Transfer system, this indicates that there does not need to be any source configured to Transfer to this MediaStore.
StoreType	Dynamic	Used by the Transfer system, this indicates that all tracking will be done by the workflows.
SubclipCaptionMetadataName	default:CuratorClosedCaption|es:ClosedCaptionEspanol|fr:ClosedCaptionFrancais	A bar separated list of pairs of LanguageCode:MetadataName. This defines the metadata name used for the text for subclips made when processing caption data for the given language.
VTTIdentifierMetadataName	VTTIdentifier	For subclips, this is the metadata field used to store the identifier (if any) of the VTT block used to create the subclip.
VTTSettingsMetadataName	VTTSettings	For subclips, this is the metadata field used to store the settings information (if any) for the VTT block used to create the subclip.
Workflow	Spawn - Process Caption Data	Used by the Transfer system, this is the name of the workflow used to process the caption data. This should not be changed.
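The LanguageCode:MetadataName lists used by CaptionMetadataBlobName and SubclipCaptionMetadataName can be read as a simple lookup table. The following Python sketch shows one way to interpret them; the function names are hypothetical, and returning `None` for a language code that is not listed is an assumption, since the document only states that `default` is used when there is no language code.

```python
def parse_language_map(value):
    """Turn 'default:CaptionData|es:CaptionDataEs' into a dict."""
    result = {}
    for pair in value.split("|"):
        code, _, name = pair.partition(":")
        result[code.strip()] = name.strip()
    return result


def metadata_name_for(language_code, language_map):
    """Look up the metadata name for a language code.

    A missing or empty language code falls back to the 'default' entry.
    An unlisted code returns None (assumed behaviour).
    """
    key = language_code or "default"
    return language_map.get(key)


blobs = parse_language_map("default:CaptionData|es:CaptionDataEs|fr:CaptionDataFr")
print(metadata_name_for("es", blobs))
print(metadata_name_for(None, blobs))
```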

Convert Subclips to VTT Configuration

After editing caption data or after translation, we need an additional store to convert back to the VTT metadata blob.

This is another new MediaStore, e.g., PROCESS-CAPTION-SUBCLIPS.

Key	Value	Description
AddToProxy	PROXY-V3	This is a list of proxy stores to add a VTT file to when converting subtitles to a VTT file.
CaptionsStore	PROCESS-CAPTIONS	This stores the name of another MediaStore which contains the settings for Captions, such as the name of the VTT blobs and metadata names used for the subclips.
DefaultLanguages	default|fr|es	A bar separated list of language codes. When no specific language is given to the workflow (e.g., when initiated by a transfer), it will attempt to convert subclips for all of these languages.
SourceRequired	False	Used by the Transfer system, this indicates that no source needs to be configured for a transfer to this store.
StoreType	Dynamic	Used by the Transfer system, this indicates that all tracking will be done by the workflows.
Workflow	Spawn - Convert Caption Subclips to VTT	Used by the Transfer system, this is the name of the workflow that will be used to do the transfer. This value should not be changed.
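As a rough illustration of what converting subclips back to VTT involves, the Python sketch below builds WebVTT text from a list of subclips. It is not the Spawn - Convert Caption Subclips to VTT workflow itself: the `start_ms`, `end_ms` and `text` keys are hypothetical, while `VTTIdentifier` and `VTTSettings` follow the metadata names configured in PROCESS-CAPTIONS.

```python
def format_timestamp(milliseconds):
    """Format an integer millisecond offset as a WebVTT hh:mm:ss.ttt timestamp."""
    hours, rest = divmod(milliseconds, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def subclips_to_vtt(subclips):
    """Build WebVTT text from subclip dicts, ordered by start time."""
    lines = ["WEBVTT", ""]
    for clip in sorted(subclips, key=lambda c: c["start_ms"]):
        # The optional cue identifier goes on its own line before the timing.
        if clip.get("VTTIdentifier"):
            lines.append(clip["VTTIdentifier"])
        timing = f"{format_timestamp(clip['start_ms'])} --> {format_timestamp(clip['end_ms'])}"
        # Optional cue settings (e.g. positional data) follow the timing.
        if clip.get("VTTSettings"):
            timing += " " + clip["VTTSettings"]
        lines.append(timing)
        lines.append(clip["text"])
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


print(subclips_to_vtt([
    {"start_ms": 1000, "end_ms": 4000, "text": "Hello",
     "VTTIdentifier": "1", "VTTSettings": "line:0"},
]))
```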

Translation Configuration

Once ingest is configured, Translation can be configured as well. Each translation MediaStore handles a single target language.

Example: TRANSLATE-SUBTITLE-ES.

Key	Value	Description
AWSConfigStore	AWS	AWS is used by default; this store contains the AWS credentials used by the translate workflow.
ResultLanguage	es	The language code for the result language.
ResultMetadataName	ClosedCaptionEspanol	The metadata name used to store the translation result.
SetMetadata	CaptionMultiLanguage:True	Sets the multilanguage flag so we don’t delete this subclip if we reingest.
SourceLanguage	en	The source language.
SourceMetadataName	CuratorClosedCaption	The metadata name that contains the source data.
SourceRequired	False	For the Transfer workflow support.
StoreType	Dynamic	For the Transfer workflow support.
TranslationDateMetadataName	TranslateDateEs	The name of a metadata field used to store the translation date on the asset.
TranslateMaxLineLength	50	The maximum line length when reformatting lines using the SingleLine strategy.
TranslateMaxParallel	5	The maximum number of subclips to translate at once.
TranslateProcess	Spawn - AWS Translate Metadata Fields	The process used to do the actual translation.
TranslateStrategy	Simple	The translation strategy - either Simple or SingleLine.
VTTConversionMediaStore	PROCESS-CAPTION-SUBCLIPS	This is used to process the caption data after it is translated; for example to generate a new VTT file for the proxies, or to store a VTT file as a blob on the asset.
Workflow	Spawn - Translate Subtitle Subclips	For the Transfer workflow support.

There are two translation strategies: Simple, which translates the text verbatim, and SingleLine. In SingleLine mode, all line breaks are first removed from the text, the text is then translated, and the result is converted back into multiple lines. With AWS Translate this can give slightly better translations, because otherwise each line is treated as a separate sentence.
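The two strategies can be sketched in Python as follows. This is only an illustration: `translate` is a stand-in callable rather than the AWS Translate client, and re-wrapping with `textwrap.wrap` at TranslateMaxLineLength is an assumption about how the text is split back into lines.

```python
import textwrap


def translate_subclip_text(text, translate, strategy="Simple", max_line_length=50):
    """Apply a translation callable using the Simple or SingleLine strategy."""
    if strategy == "Simple":
        # Translate the text verbatim, line breaks included.
        return translate(text)
    if strategy == "SingleLine":
        # 1. Remove line breaks by collapsing all whitespace to single spaces.
        single_line = " ".join(text.split())
        # 2. Translate the whole caption as one piece of text.
        translated = translate(single_line)
        # 3. Convert back into lines no longer than max_line_length.
        return "\n".join(textwrap.wrap(translated, width=max_line_length))
    raise ValueError(f"Unknown TranslateStrategy: {strategy}")


# str.upper stands in for a real translation call.
print(translate_subclip_text("the quick brown\nfox jumps", str.upper,
                             strategy="SingleLine", max_line_length=12))
```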

As configured above, this will create a new metadata field on existing subclips. This may be undesirable as they may be deleted by the update process. Do we want to make this behaviour default?

In order to instead create new subclips, the following settings would be added to the translation store (e.g., TRANSLATE-SUBTITLE-ES):

Key	Value	Description
CreateNewSubclips	True	Defaults to False. This indicates we should create new subclips rather than adding to old ones.
NewSubclipName	{Name} - es	A metadata pattern that is used for the name of the new subclip.
SetMetadata	VTTIdentifier:{VTTIdentifier}|VTTSettings:{VTTSettings}|CaptionSource:AWS Translate|CaptionLanguageCode:es|CaptionLanguage:Espanol|CaptionSourceText:{CuratorClosedCaption}	A bar separated list of metadata to set on the new asset. Each entry consists of MetadataName:MetadataPattern. See below.

SetMetadata

The value for this field is quite important and needs to be managed carefully.

This is a bar separated list of values, and the settings are different for new subclips and shared subclips:

CaptionMultiLanguage:True is required if we are using shared subclips. This means that once translation data is applied to the subclip, it is no longer automatically deleted if we re-process the data; this means we don’t accidentally delete the translation. This should not be included if we are making new subclips.
CaptionSource:AWS Translate and CaptionLanguageCode:es are required if we are making new subclips. This helps the subclip system identify which subclips were created by translation if we want to delete them later.
VTTIdentifier:{VTTIdentifier} and VTTSettings:{VTTSettings} are needed when copying subclips. This makes sure the additional VTT data is copied over from the source subclip, e.g., if there is positional data.
Additional values can be added to this list for reference or for customer specific needs, but they are not required.
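The rules above lend themselves to a simple pre-flight check. The Python sketch below parses a SetMetadata value, expands `{Field}` patterns from a source subclip, and reports entries that conflict with the rules. The function names are hypothetical, and replacing an unknown `{Field}` with an empty string is an assumption rather than documented behaviour.

```python
import re


def parse_set_metadata(value):
    """Split 'Name:Pattern|Name:Pattern' on the first colon of each entry."""
    entries = {}
    for entry in value.split("|"):
        name, _, pattern = entry.partition(":")
        entries[name.strip()] = pattern.strip()
    return entries


def expand_patterns(entries, source_metadata):
    """Replace {Field} placeholders with values from the source subclip."""
    def fill(pattern):
        return re.sub(r"\{(\w+)\}",
                      lambda m: str(source_metadata.get(m.group(1), "")),
                      pattern)
    return {name: fill(pattern) for name, pattern in entries.items()}


def check_set_metadata(entries, create_new_subclips):
    """Return a list of problems based on the rules in this section."""
    problems = []
    if create_new_subclips:
        for required in ("CaptionSource", "CaptionLanguageCode"):
            if required not in entries:
                problems.append(f"{required} is required when creating new subclips")
        if "CaptionMultiLanguage" in entries:
            problems.append("CaptionMultiLanguage should not be set when creating new subclips")
    elif entries.get("CaptionMultiLanguage") != "True":
        problems.append("CaptionMultiLanguage:True is required for shared subclips")
    return problems


entries = parse_set_metadata(
    "VTTIdentifier:{VTTIdentifier}|VTTSettings:{VTTSettings}"
    "|CaptionSource:AWS Translate|CaptionLanguageCode:es"
)
print(expand_patterns(entries, {"VTTIdentifier": "12", "VTTSettings": "line:0"}))
print(check_set_metadata(entries, create_new_subclips=True))
```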

Known Problems and Limitations

Known Bugs

VTT is left in watch folder after ingest.

Testing problems

If an asset does not have any subtitles when it is first viewed, Clip Select will cache this information and will not check whether subtitles have since been added to the proxy. This can make it look as though there are no subtitles after they have been created, when in fact Clip Select is still using the cached settings. In this case, reloading the whole page might help; otherwise, IIS might need to be restarted.

Integration Limitations

Curator AI does not integrate cleanly into this system at the moment.
- AWS Transcribe can be configured to make subclips that match this system, and if so it can be followed by a transfer to PROCESS-CAPTION-SUBCLIPS to convert the subclips into a blob.
Ingest of VTT has to be done after the asset is ingested.
There are no conversion routines at present, so only VTT is supported.
There is currently no Export of VTT, although the subtitle file in the proxy can be used for validation during testing.

Updates

2021-01-21

Additional metadata fields required for subclip identification have been added:
- CaptionSource - Text
- CaptionLanguageCode - Text
- CaptionMultiLanguage - Boolean
An additional key for deleting subclips has been added to the PROCESS-CAPTIONS MediaStore:
- DeleteExistingSubclips - optional, default to true
Additional keys for SetMetadata have been added for translation:
- See the section above for more details; this is now required for both creating new subclips and using existing ones.
Additional features are now working:
- Added thumbnails for subclips
- Added deleting of existing subclips on ingest. These will be deleted currently if you do on-clip translation, so handle with care.
