Configuration Guide: Search and Transfer (Scavenge)

Updated on December 8th, 2022

This document describes how to configure a simple Search and Transfer workflow.

This guide applies to Curator 3.2.

Overview

The search and transfer system, also known as Scavenge, transfers assets that match a search which runs repeatedly in the background. It is controlled mainly by the Persistent - Search & TransferV3 workflow.

This workflow checks all MediaStores for Search and Transfer configurations and runs each one only if:

  • They are due to run according to their ScavengeRepeatAfter time

  • They are within their operating schedule window (ScavengeSchedule)

  • They are not currently running

  • They are enabled
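The four conditions above can be read as a single check that must pass before a store is run. The following is a minimal sketch of that decision, not Curator's implementation: the ScavengeState class, its field names, and the precomputed next_due time are hypothetical stand-ins for whatever the workflow tracks internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class ScavengeState:
    """Hypothetical snapshot of one Search and Transfer MediaStore."""
    enabled: bool                            # ScavengeEnabled
    next_due: datetime                       # derived from ScavengeRepeatAfter / ScavengeStartTime
    in_schedule: Callable[[datetime], bool]  # ScavengeSchedule evaluated at a given moment
    running: bool = False                    # a search and transfer from this store is in progress


def should_run(state: ScavengeState, now: datetime) -> bool:
    """Apply the four conditions listed above; every one of them must hold."""
    return (
        state.enabled
        and not state.running
        and now >= state.next_due
        and state.in_schedule(now)
    )
```

Because the conditions are combined with `and`, a store that is due but outside its schedule window is simply skipped on that pass.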

The search workflow searches for a number of matching assets and transfers each one to a designated target MediaStore.

Typical Basic Configuration

In normal use, only some of the available keys are set. A typical minimum configuration is:

  • AssetTypes = Search types, e.g., image, media, audio

  • ScavengeEnabled = True

  • ScavengeMediaStore = Destination MediaStore name

  • ScavengeRepeatAfter = Repeat after time, e.g., 01:00:00

  • ScavengeSearchProcess = Spawn - Search & Transfer

  • ScavengeSearchString = Solr search string
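As an illustration, the typical minimum above can be held as a simple key/value map and checked for gaps before it is applied to a MediaStore. This is a hypothetical helper for sanity-checking a configuration, not part of Curator; the example values are taken from the key table below.

```python
# Keys from the typical minimum configuration listed above.
TYPICAL_MINIMUM = (
    "AssetTypes",
    "ScavengeEnabled",
    "ScavengeMediaStore",
    "ScavengeRepeatAfter",
    "ScavengeSearchProcess",
    "ScavengeSearchString",
)


def missing_keys(config: dict[str, str]) -> list[str]:
    """Return the typical-minimum keys that are absent or blank."""
    return [key for key in TYPICAL_MINIMUM if not config.get(key, "").strip()]


# Example values from the key table in this guide.
example = {
    "AssetTypes": "audio,media,images",
    "ScavengeEnabled": "True",
    "ScavengeMediaStore": "UPLOAD-PROXY",
    "ScavengeRepeatAfter": "01:00:00",
    "ScavengeSearchProcess": "Spawn - Search & Transfer",
    "ScavengeSearchString": "ProxyExists_meta_bool:true",
}
```

Of these keys, ScavengeSearchProcess is the one the guide says must be present for a search and transfer to work at all.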

MediaStore Configuration

A Search and Transfer MediaStore supports the following configuration keys:

Key

Description

Example

AssetTypes

A comma-separated list of asset types to search for.

audio,media,images

BatchSize

The number of child transfers to do at once.

2

ScavengeEnabled

If False, this search and transfer store will be ignored.

True

ScavengeMaxSearchResults

The maximum number of search results to process in each iteration of Search and Transfer.

50

ScavengeMediaStore

The name of the store (or stores) to which the assets found will be transferred.

UPLOAD-PROXY

ScavengeRepeatAfter

The interval after which to repeat this search, in hh:mm:ss format. It will never repeat more often than once a minute. For intervals of 24 hours or more, prefix the number of days followed by a dot, e.g., 3.00:00:00 repeats every three days.

01:00:00 - every hour

3.00:00:00 - every three days

00:05:00 - every five minutes

ScavengeSchedule

The operating schedule window; see Scheduling Configuration below.

* * * * 6-7 - weekends only

ScavengeSearchProcess

The name of the process definition that will be used for this Search and Transfer. This is usually Spawn - Search & Transfer. This must be present for a search and transfer to work.

Spawn - Search & Transfer

ScavengeSearchString

The search string used for this Search and Transfer. This is a Solr search string.

ProxyExists_meta_bool:true AND -ProxyUploaded_meta_bool:true

ScavengeSetMetadata
ScavengeSetMetadataFail
ScavengeSetMetadataPending
ScavengeSetMetadataSuccess

See Set Metadata Configuration below.

ScavengeSortOrder

The order in which results are transferred. Multiple sort fields can be given, separated by | (bar); prefix a field with ~ for a descending sort.

~ingestdate

ScavengeStartTime

The time of day from which ScavengeRepeatAfter intervals are measured. For example, to run every day at 2am, set this to 02:00:00 with a ScavengeRepeatAfter of 1.00:00:00. If not specified, 04:00:00 (4am) is assumed.

02:00:00

ScavengeUseServerTime

If True, all times in this store are interpreted as server (i.e., local) time. If False, all times are interpreted as UTC.

True
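Two of the keys in the table above, ScavengeRepeatAfter and ScavengeSortOrder, use small string formats of their own. The sketch below shows one way to parse them, assuming a d.hh:mm:ss layout for the repeat interval and reading the one-minute limit in its description as a floor on the interval. The to_solr_sort helper is hypothetical: translating the sort order into Solr's standard `sort` parameter ("field desc, field asc") is an assumption about how the value is used, not documented Curator behaviour.

```python
import re
from datetime import timedelta

# "hh:mm:ss", optionally prefixed by a number of days and a dot: "d.hh:mm:ss".
_REPEAT_RE = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_repeat_after(value: str) -> timedelta:
    """Parse a ScavengeRepeatAfter value such as "01:00:00" or "3.00:00:00"."""
    m = _REPEAT_RE.match(value.strip())
    if m is None:
        raise ValueError(f"not a valid ScavengeRepeatAfter value: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    interval = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    # Assumption: anything shorter than a minute behaves like one minute.
    return max(interval, timedelta(minutes=1))


def parse_sort_order(value: str) -> list[tuple[str, bool]]:
    """Split "a|~b" into [("a", False), ("b", True)]; True means descending."""
    order = []
    for part in value.split("|"):
        part = part.strip()
        if not part:
            continue
        descending = part.startswith("~")
        order.append((part[1:] if descending else part, descending))
    return order


def to_solr_sort(order: list[tuple[str, bool]]) -> str:
    """Hypothetical mapping onto Solr's "field asc|desc, ..." sort syntax."""
    return ", ".join(f"{field} {'desc' if desc else 'asc'}" for field, desc in order)
```

For instance, parse_repeat_after("3.00:00:00") yields a three-day timedelta, and "~ingestdate" becomes a single descending sort on ingestdate.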

Scheduling Configuration

A search and transfer can be configured to run only during certain times of day or days of the week. For example, it can be restricted to run only at night, or only on weekends.

The format is similar to cron: five fields separated by spaces, in this order:

minute hour dayofmonth month dayofweek

  • An asterisk (*) matches any value

    • * * * * * means always available

    • * * * * 1 means only run on Monday, but at any time

    • * 10 * * * means run only when the hour is 10 (i.e., between 10:00 and 10:59)

  • Ranges can be specified

    • * 4-6 * * * means run when the hour is 4, 5 or 6 - so between 4:00 and 6:59

    • 0-29 * * * * means run during the first half of any hour

    • 0-29 10 * * * means run between 10:00 and 10:29

  • Multiple values (or ranges) can be specified

    • * 10-11,15 * * * means run on any day when the hour is 10, 11 or 15

  • Day of week runs from 1 (Monday) to 7 (Sunday), so the weekend is simply 6-7

    • * * * * 6-7 means only run on the weekend

Multiple conditions can be combined by separating them with semicolons:

  • * * * * 6-7;* 3-6 * * 1-5 means run at any time at the weekend, and between 3:00 and 6:59 on weekdays

  • If any condition matches, the search and transfer is allowed to run.
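The matching rules above can be expressed as a small evaluator. This is a minimal sketch that follows the rules as written in this section (asterisks, single values, ranges, comma lists, Monday = 1 to Sunday = 7, and semicolon-separated conditions where any match is enough); it is not Curator's parser, and schedule_allows is a hypothetical name. Whether the time passed in should be server time or UTC depends on ScavengeUseServerTime.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one field: "*", a number, a range "a-b", or a comma-separated mix."""
    if field == "*":
        return True
    for part in field.split(","):
        low, _, high = part.partition("-")
        # A single number is treated as the range low..low.
        if int(low) <= value <= int(high or low):
            return True
    return False


def schedule_allows(schedule: str, when: datetime) -> bool:
    """True if any semicolon-separated condition matches `when`."""
    # isoweekday() numbers Monday as 1 and Sunday as 7, matching the 1-7 convention.
    values = (when.minute, when.hour, when.day, when.month, when.isoweekday())
    for condition in schedule.split(";"):
        fields = condition.split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields in {condition!r}")
        if all(_field_matches(f, v) for f, v in zip(fields, values)):
            return True
    return False
```

With the combined example above, a Thursday at 06:59 is allowed but a Thursday at 07:00 is not, while any time on Saturday is allowed.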

Potential Problems

ScavengeRepeatAfter and ScavengeStartTime interact with the schedule. If none of the due times falls within the permitted schedule, the search and transfer will never run. For example:

  • Schedule of * 10-12 * * * (run between 10:00 and 12:59)

  • ScavengeStartTime of 00:00:00 (i.e., start at midnight)

  • ScavengeRepeatAfter of 1.00:00:00 (i.e., repeat daily)

With these settings the search is due daily at midnight, but because midnight is not between 10:00 and 12:59, it never runs.
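A configuration can be checked for this problem ahead of time by stepping through its due times and testing each one against the schedule. The sketch below assumes, as the example implies, that a run whose due time falls outside the window is skipped rather than deferred until the window opens; the function names are hypothetical and the schedule matcher follows the rules from Scheduling Configuration.

```python
from datetime import datetime, timedelta
from typing import Optional


def _field_matches(field: str, value: int) -> bool:
    # "*", a single number, a range "a-b", or a comma-separated mix of these.
    if field == "*":
        return True
    for part in field.split(","):
        low, _, high = part.partition("-")
        if int(low) <= value <= int(high or low):
            return True
    return False


def in_schedule(schedule: str, when: datetime) -> bool:
    values = (when.minute, when.hour, when.day, when.month, when.isoweekday())
    return any(
        all(_field_matches(f, v) for f, v in zip(cond.split(), values))
        for cond in schedule.split(";")
    )


def first_permitted_run(schedule: str, first_run: datetime, repeat: timedelta,
                        horizon: timedelta) -> Optional[datetime]:
    """Step through first_run, first_run + repeat, ... and return the first due
    time the schedule permits, or None if there is none within `horizon`."""
    due = first_run
    while due - first_run <= horizon:
        if in_schedule(schedule, due):
            return due
        due += repeat
    return None
```

Run against the example above (daily from midnight with a 10:00 to 12:59 window), it returns None however long the horizon is, while moving the start time to 10:00 or shortening the repeat to one hour gives a permitted run.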

Set Metadata Configuration

If Spawn - Search and Transfer V3 is used as the search process, the following four keys can be used to set metadata before, or as a result of, the transfer:

Key

Description

ScavengeSetMetadata

Set metadata when the transfer completes, regardless of status

ScavengeSetMetadataFail

Set metadata only if transfer fails

ScavengeSetMetadataPending

Set metadata before starting transfer

ScavengeSetMetadataSuccess

Set metadata only if transfer succeeds

All of these have the same syntax; they just depend on when the metadata is set.

The syntax is a list of Name:Value pairs, with a : (colon) between each name and its value and a | (bar) between pairs:

  • MetadataName:Value1|AnotherMetadataName:Value2

    • This sets MetadataName to Value1 and AnotherMetadataName to Value2
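The pair syntax above is straightforward to split. This sketch assumes each pair is split on its first colon only, so a value that itself contains colons is kept whole; whether Curator behaves the same way for such values is not stated here, and parse_set_metadata is a hypothetical name.

```python
def parse_set_metadata(value: str) -> dict[str, str]:
    """Parse "Name1:Value1|Name2:Value2" into {"Name1": "Value1", "Name2": "Value2"}."""
    pairs: dict[str, str] = {}
    for item in value.split("|"):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so a value may contain further colons.
        name, sep, val = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {item!r}")
        pairs[name.strip()] = val
    return pairs
```

Applied to the example above, this yields MetadataName set to Value1 and AnotherMetadataName set to Value2; special values such as $status are returned unchanged as strings for a later substitution step.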

The following special values are supported:
Value

Description

Example

$utcnow

Sets the value to the current time (UTC)

TransferFailedDate:$utcnow

$status

Sets the value to the process status. This is mainly useful with ScavengeSetMetadataFail, where it records the status message of the failed process so the problem can be diagnosed later.

TransferFailReason:$status

$null

Clears the metadata value and removes it from the database, which is distinct from setting it to the empty string.

TemporaryValue:$null
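The special values can be thought of as a substitution pass applied just before the metadata is written. In this sketch, None stands for $null (remove the field) so that it stays distinct from an empty string, and $utcnow is rendered as an ISO 8601 string; the stored date format, the resolve_values name, and the way the status message is passed in are all assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_values(pairs: dict[str, str], status: str,
                   now: Optional[datetime] = None) -> dict[str, Optional[str]]:
    """Substitute $utcnow, $status and $null; other values pass through unchanged."""
    now = now or datetime.now(timezone.utc)
    resolved: dict[str, Optional[str]] = {}
    for name, value in pairs.items():
        if value == "$utcnow":
            resolved[name] = now.isoformat()   # assumed format for the stored time
        elif value == "$status":
            resolved[name] = status            # e.g. the failed process's message
        elif value == "$null":
            resolved[name] = None              # remove the field entirely
        else:
            resolved[name] = value             # includes "" (an empty string is kept)
    return resolved
```

Keeping None and "" separate mirrors the distinction drawn for $null above: one deletes the metadata, the other stores an empty value.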

Example: Setting up a scavenge to apply a fix workflow once to everything

  • ScavengeSetMetadataFail = FixFailedReason:$status|FixFailed:True

  • ScavengeSetMetadataSuccess = FixApplied:True

  • ScavengeSetMetadataPending = FixAttempted:True

  • ScavengeSearchString = -FixAttempted_meta_bool:true

  • ScavengeMediaStore = APPLY-FIX
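The reason this example applies the fix only once is that the Pending metadata marks each asset before the transfer starts, and the search string excludes marked assets. The simulation below is a simplified sketch of that cycle: assets are plain dictionaries, and the transfer to the fix store is replaced by a hypothetical `fix` callable that raises on failure.

```python
def run_iteration(assets: list[dict], fix) -> int:
    """One Search and Transfer pass over in-memory assets; returns how many were selected."""
    # The search string excludes assets already marked FixAttempted.
    selected = [a for a in assets if a.get("FixAttempted") is not True]
    for asset in selected:
        asset["FixAttempted"] = True              # ScavengeSetMetadataPending
        try:
            fix(asset)
            asset["FixApplied"] = True            # ScavengeSetMetadataSuccess
        except Exception as exc:
            asset["FixFailed"] = True             # ScavengeSetMetadataFail
            asset["FixFailedReason"] = str(exc)   # $status
    return len(selected)
```

A second pass selects nothing, because every asset was marked FixAttempted on the first pass, whether or not its fix succeeded; failed assets can then be found later through FixFailed and FixFailedReason.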

Appendix A: Timing Detail

ScavengeStartTime is interpreted as a time of day on the day the process first finds the scavenge. It only really matters when the repeat interval does not divide evenly into a whole day. For example, with a 25-hour repeat starting at midnight, the process runs at midnight, then at 1am the next day, then at 2am the day after. If the process is restarted, this history is lost and the schedule restarts at midnight the next day.

Because the default start time is 4am, hourly configurations still behave as expected and run on the hour. To run on the half hour instead, set the start time to 00:30:00 with an hourly repeat.
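The drift described above can be reproduced by generating run times on a grid anchored at the start time. This is a minimal sketch under the stated model: the anchor day is passed in explicitly (which day is used after a restart is only described by the example above), and run_times and keeps_time_of_day are hypothetical helper names.

```python
from datetime import date, datetime, time, timedelta


def run_times(day: date, start_time: time, repeat: timedelta, count: int) -> list[datetime]:
    """First `count` run times on a grid anchored at `start_time` on `day`."""
    anchor = datetime.combine(day, start_time)
    return [anchor + k * repeat for k in range(count)]


def keeps_time_of_day(repeat: timedelta) -> bool:
    """True if the interval divides a whole day, so runs recur at the same clock times."""
    return timedelta(days=1) % repeat == timedelta(0)
```

With a 25-hour repeat from midnight, the first three runs land at 00:00, 01:00 the next day and 02:00 the day after; with an hourly repeat and a 00:30:00 start time, every run lands on the half hour.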
