This document describes how to configure a simple Search and Transfer workflow.
This is correct for Curator 3.2.
Overview
The search and transfer system, also known as Scavenge, is a system for transferring assets based on a search which is done repeatedly in the background. This is mainly controlled by the Persistent - Search & TransferV3 workflow.
This workflow searches all MediaStores for suitable search and transfer targets, and executes them if:
They are due to run according to their repeat after time
They are in their operating schedule window
They are not currently running
They are enabled
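The four conditions above can be sketched as a single check. This is only an illustration, not the workflow's actual code: the `ScavengeState` class and its field names are hypothetical, and "due" is simplified here to "at least one repeat interval has passed since the last run".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScavengeState:
    """Hypothetical snapshot of one search and transfer target."""
    enabled: bool
    running: bool
    in_schedule_window: bool
    last_run: Optional[datetime]
    repeat_after: timedelta


def should_execute(state: ScavengeState, now: datetime) -> bool:
    """Return True only when all four conditions hold."""
    # Simplification: a target that has never run is treated as due.
    due = state.last_run is None or now - state.last_run >= state.repeat_after
    return due and state.in_schedule_window and not state.running and state.enabled
```

All four conditions are combined with "and", so a disabled or already running target is skipped even when it is due.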
The search workflow will search for a number of assets, and transfer each one to a designated target MediaStore.
Typical Basic Configuration
In normal use, only some of the possible fields are used. A typical minimum configuration is:
AssetTypes = Search types, e.g., image, media, audio
ScavengeEnabled = True
ScavengeMediaStore = Destination MediaStore name
ScavengeRepeatAfter = Repeat after time, e.g., 01:00:00
ScavengeSearchProcess = Spawn - Search & Transfer
ScavengeSearchString = Solr search string
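As a rough aid when reviewing a store's settings, the typical minimum above can be checked with a few lines of Python. This is a sketch only: the `missing_keys` helper is hypothetical, and it assumes the settings have already been read into a plain dictionary by some other means.

```python
# Keys taken from the typical minimum listed above.
REQUIRED_KEYS = (
    "AssetTypes",
    "ScavengeEnabled",
    "ScavengeMediaStore",
    "ScavengeRepeatAfter",
    "ScavengeSearchProcess",
    "ScavengeSearchString",
)


def missing_keys(config):
    """Return the typical-minimum keys that are absent or blank in config."""
    missing = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Example values only; the store name and search string are illustrative.
example = {
    "AssetTypes": "image,media,audio",
    "ScavengeEnabled": "True",
    "ScavengeMediaStore": "UPLOAD-PROXY",
    "ScavengeRepeatAfter": "01:00:00",
    "ScavengeSearchProcess": "Spawn - Search & Transfer",
    "ScavengeSearchString": "ProxyExists_meta_bool:true",
}
print(missing_keys(example))  # prints []
```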
MediaStore Configuration
A search and transfer MediaStore supports the following configuration settings:
Key | Description | Example |
---|---|---|
AssetTypes | A comma-separated list of asset types to be searched for. | audio,media,images |
BatchSize | The number of child transfers to do at once. | 2 |
ScavengeEnabled | If False, this search and transfer store will be ignored. | True |
ScavengeMaxSearchResults | The number of results to search for in each iteration of Search and Transfer. | 50 |
ScavengeMediaStore | The name of the store (or stores) to which the assets found will be transferred. | UPLOAD-PROXY |
ScavengeRepeatAfter | The time after which to repeat this search, in hh:mm:ss format. It will never repeat more frequently than once a minute. If more than 24 hours is required, prefix the number of days, separated by a dot, e.g. 3.00:00:00 repeats every three days. | 01:00:00 (every hour); 3.00:00:00 (every three days); 00:05:00 (every five minutes) |
ScavengeSchedule | The schedule; see Scheduling Configuration below. | |
ScavengeSearchProcess | The name of the process definition that will be used for this Search and Transfer. This is usually Spawn - Search & Transfer. This must be present for a search and transfer to work. | Spawn - Search & Transfer |
ScavengeSearchString | The search string used for this Search and Transfer. This is a Solr search string. | ProxyExists_meta_bool:true AND -ProxyUploaded_meta_bool |
ScavengeSetMetadata | See Set Metadata section below | |
ScavengeSetMetadataFail | | |
ScavengeSetMetadataPending | | |
ScavengeSetMetadataSuccess | | |
ScavengeSortOrder | The order in which the results are transferred. This can have multiple values separated by a vertical bar; values are prefixed by ~ if they are descending sorts. | ~ingestdate |
ScavengeStartTime | The time of day from which the repeat after interval is counted. For example, set 04:00:00 with a daily repeat to run every day at 4am. If not specified, 4am is assumed. | 02:00:00 |
ScavengeUseServerTime | If True, all times in this store are taken to be in server (i.e. local) time. If False, all times are UTC. | True |
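The ScavengeRepeatAfter and ScavengeSortOrder formats can be illustrated with a short parsing sketch. The function names are hypothetical and this is not the product's parser. It assumes the one-minute remark is a lower bound on the interval (shorter values are raised to one minute), and the field name `title` in the usage line is made up for illustration.

```python
import re
from datetime import timedelta

# Optional "days." prefix followed by hh:mm:ss, e.g. 3.00:00:00 or 01:00:00.
_REPEAT_RE = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_repeat_after(text):
    """Parse a [d.]hh:mm:ss repeat-after value into a timedelta."""
    match = _REPEAT_RE.match(text.strip())
    if match is None:
        raise ValueError("expected [d.]hh:mm:ss, got %r" % text)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    interval = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    # Assumption: intervals shorter than one minute are treated as one minute.
    return max(interval, timedelta(minutes=1))


def parse_sort_order(text):
    """Split a sort order into (field, descending) pairs; ~ marks descending."""
    pairs = []
    for part in text.split("|"):
        part = part.strip()
        if part:
            pairs.append((part.lstrip("~"), part.startswith("~")))
    return pairs


print(parse_repeat_after("3.00:00:00"))       # prints 3 days, 0:00:00
print(parse_sort_order("~ingestdate|title"))  # prints [('ingestdate', True), ('title', False)]
```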
Scheduling Configuration
The search and transfer can be configured to run only during certain times of the day or week. For example, it can be restricted to run only during the night, or only on weekends.
The format is similar to cron, in that it is five fields separated by spaces:
minute hour dayofmonth month dayofweek
- Star matches anything:
  * * * * * means always available
  * * * * 1 means run only on Monday, but at any time
  * 10 * * * means run only when the hour is 10 (i.e. between 10:00 and 10:59)
- Ranges can be specified:
  * 4-6 * * * means run when the hour is 4, 5 or 6, so between 4:00 and 6:59
  0-29 * * * * means run during the first half of any hour
  0-29 10 * * * means run between 10:00 and 10:29
- Multiple values (or ranges) can be specified:
  * 10-11,15 * * * means run on any day when the hour is 10, 11 or 15
- Day of week is 1-7, starting from Monday (so the weekend is simply 6-7):
  * * * * 6-7 means run only at the weekend
It is possible to have more than one condition if they are separated by semicolons:
* * * * 6-7;* 3-6 * * 1-5 means run any time at the weekend, and between 3:00 and 6:59 on weekdays
If any condition matches, this will run.
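The matching rules described above can be sketched in Python as follows. This is an illustrative reading of the format, not the product's implementation; the function names are hypothetical, and it assumes day of week 1 is Monday and 7 is Sunday.

```python
from datetime import datetime


def _field_matches(field, value):
    """Match one schedule field: '*', a number, a range, or a comma list."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def schedule_allows(schedule, when):
    """Return True if any semicolon-separated condition matches the time."""
    # Field order: minute hour dayofmonth month dayofweek (Monday=1 .. Sunday=7).
    values = (when.minute, when.hour, when.day, when.month, when.isoweekday())
    for condition in schedule.split(";"):
        fields = condition.split()
        if len(fields) != 5:
            raise ValueError("expected five fields, got %r" % condition)
        if all(_field_matches(f, v) for f, v in zip(fields, values)):
            return True
    return False


schedule = "* * * * 6-7;* 3-6 * * 1-5"
print(schedule_allows(schedule, datetime(2024, 1, 6, 15, 0)))  # Saturday afternoon: prints True
print(schedule_allows(schedule, datetime(2024, 1, 3, 12, 0)))  # Wednesday noon: prints False
```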
Potential Problems
The repeat after and start time can affect this. If the repeat after never falls in the permitted schedule, then it will never run. For example:
Schedule of * 10-12 * * * (run between 10:00 and 12:59)
StartTime of 00:00:00 (ie start at midnight)
RepeatAfter of 1.00:00:00 (ie repeat daily)
This means it is due to run daily at midnight, but since midnight is not between 10:00 and 12:59, it will never actually run.
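A configuration like this can be spotted in advance by stepping through the due times and checking each one against the permitted window. The sketch below is a simplified model under stated assumptions: the function names are hypothetical, the window is passed in as an ordinary Python function rather than parsed from a schedule string, and a run is assumed to happen only if its due time falls inside the window.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def first_due_in_window(
    start: datetime,
    repeat_after: timedelta,
    allowed: Callable[[datetime], bool],
    max_runs: int = 1000,
) -> Optional[datetime]:
    """Return the first due time inside the window, or None if none is found."""
    due = start
    for _ in range(max_runs):
        if allowed(due):
            return due
        due += repeat_after
    return None


def between_10_and_13(moment: datetime) -> bool:
    """Stand-in for the schedule * 10-12 * * * (10:00 to 12:59)."""
    return 10 <= moment.hour <= 12


midnight = datetime(2024, 1, 1, 0, 0)
print(first_due_in_window(midnight, timedelta(days=1), between_10_and_13))   # prints None
print(first_due_in_window(midnight, timedelta(hours=1), between_10_and_13))  # prints 2024-01-01 10:00:00
```

A result of None for a daily repeat suggests changing either the start time or the schedule so that the two overlap.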
Set Metadata Configuration
The following four keys can be used if you are using Spawn - Search and Transfer V3 as the search process, in order to set metadata during or as a result of the transfer:
Key | Description |
---|---|
ScavengeSetMetadata | Set metadata when the transfer completes, regardless of status |
ScavengeSetMetadataFail | Set metadata only if the transfer fails |
ScavengeSetMetadataPending | Set metadata before starting the transfer |
ScavengeSetMetadataSuccess | Set metadata only if the transfer succeeds |
All of these have the same syntax; they differ only in when the metadata is set.
The syntax is a | (bar) separated list of Name:Value pairs, each joined by a : (colon), as follows:
MetadataName:Value1|AnotherMetadataName:Value2
This sets MetadataName to Value1 and AnotherMetadataName to Value2.
Some special values are possible:
Value | Description | Example |
---|---|---|
$utcnow | Sets the value to the current time | TransferFailedDate:$utcnow |
$status | Sets the value to the status value; this is usually only useful for failed, when it will set it to the status message of the failed process to diagnose the problem later. | TransferFailReason:$status |
$null | Clears the value of a Metadata Name and removes it from the database - distinct from setting it to the empty string. | TemporaryValue:$null |
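To make the syntax concrete, here is a minimal Python sketch of how such a string could be expanded into name and value pairs. It is not the product's parser: the `resolve_set_metadata` name is hypothetical, the timestamp format used for $utcnow (ISO 8601) is an assumption, and None is used here to represent a value that should be removed by $null.

```python
from datetime import datetime, timezone


def resolve_set_metadata(spec, status="", now=None):
    """Expand a Name:Value|Name:Value string into a dict of metadata updates."""
    now = now or datetime.now(timezone.utc)
    updates = {}
    for pair in spec.split("|"):
        if not pair.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        name, separator, value = pair.partition(":")
        if not separator:
            raise ValueError("expected Name:Value, got %r" % pair)
        name, value = name.strip(), value.strip()
        if value == "$utcnow":
            updates[name] = now.isoformat()  # assumed timestamp format
        elif value == "$status":
            updates[name] = status
        elif value == "$null":
            updates[name] = None  # None stands for "remove this metadata"
        else:
            updates[name] = value
    return updates


print(resolve_set_metadata("FixFailedReason:$status|FixFailed:True", status="Timed out"))
# prints {'FixFailedReason': 'Timed out', 'FixFailed': 'True'}
```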
Example: Setting up a scavenge to apply a fix workflow once to everything
ScavengeSetMetadataFail = FixFailedReason:$status|FixFailed:True
ScavengeSetMetadataSuccess = FixApplied:True
ScavengeSetMetadataPending = FixAttempted:True
ScavengeSearchString = -FixAttempted_meta_boolean:true
ScavengeMediaStore = APPLY-FIX
Appendix A: Timing Detail
ScavengeStartTime is applied as a time of day on the day when the process first finds the scavenge. This only really matters if the repeat time does not divide evenly into a whole day. For example, if we repeat after 25 hours starting at midnight, the process will run at midnight, then at 1am the next day, then at 2am the day after that. But if the process is restarted, this data is lost, and it will start again at midnight the next day.
Because the start time defaults to 4am, an hourly configuration will still work as expected: it will run on the hour. If you want the process to run on the half hour, specifying 00:30:00 for the start time with an hourly repeat will cause it to do so.
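The arithmetic behind these examples can be sketched as follows. This is a simplified model, not the process's actual scheduling code: the function names are hypothetical, and the anchor is assumed to be the start time applied to the day on which the scavenge was first found.

```python
from datetime import datetime, timedelta


def run_times(first_found, start_time, repeat_after, count):
    """List the first `count` due times, anchored at start_time on the first day."""
    anchor = datetime.combine(first_found.date(), datetime.min.time()) + start_time
    return [anchor + i * repeat_after for i in range(count)]


def next_run(anchor, repeat_after, now):
    """Return the first due time at or after `now`."""
    if now <= anchor:
        return anchor
    # Ceiling division: the number of whole intervals needed to reach `now`.
    steps = -((anchor - now) // repeat_after)
    return anchor + steps * repeat_after


# Repeat after 25 hours from midnight: midnight, then 1am, then 2am on later days.
for due in run_times(datetime(2024, 1, 1, 9, 0), timedelta(0), timedelta(hours=25), 3):
    print(due)

# Hourly repeat: a 4am start lands on the hour, a 00:30 start on the half hour.
now = datetime(2024, 1, 1, 9, 10)
print(next_run(datetime(2024, 1, 1, 4, 0), timedelta(hours=1), now))   # prints 2024-01-01 10:00:00
print(next_run(datetime(2024, 1, 1, 0, 30), timedelta(hours=1), now))  # prints 2024-01-01 09:30:00
```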