Deploying an S3 File Gateway for use with Curator
Introduction
The purpose of this document is to detail how AWS S3 File Gateway can be used to provide SMB access to S3 for Curator microservices. The reason for deploying this functionality is predominantly to improve ingest efficiency: instead of requiring high-resolution content to be downloaded to a local volume, the content remains in S3 and Curator accesses it directly. Additionally, there is no requirement to upload Curator's HLS proxy back to S3, and no subsequent clean-up is required as a result.
Alongside this change in AWS, Curator requires some additional configuration to make use of it. In testing, ingest efficiency improved by a factor of approximately 4x (originally tested with Curator Arrival 3.2 update 1 workflows), even on a development-spec system with two channels of XCode.
Caveats
IPV is working toward offering a more comprehensive AWS S3 integration; however, it is not currently possible to restore a file from Glacier or a deeper storage tier to an accessible storage tier (e.g. S3 Standard) without copying it to block storage in the same process. As such, we cannot make archived files available through the S3 File Gateway, so this functionality should be limited to ingest only. Any back-catalog content should remain in an accessible storage tier. For all other functions, such as remote conform and re-ingest of assets, the standard functionality of restoring the high-resolution content to block storage should remain in place.
We have made our best efforts to test this functionality as comprehensively as possible, but testing it "like production" is very difficult, so making the solution easy to undo (if required) was of paramount importance.
Solution
As previously explained, using the S3 File Gateway allows Curator microservices to access S3 via SMB. As we do not want to lose the existing configuration, a separate configuration path will be set up so that the change can be easily reverted if desired, and so that the current functionality for re-ingesting and remote conforms is maintained.
Because no restore is required, proxy generation and technical metadata extraction no longer have to be performed together; the two processes can be separated instead of being intrinsically linked. This means there can be a process dedicated to generating proxies, rather than it being one step of a monolithic ingest process, which is currently set up as follows (each of these steps runs sequentially):
- Generate placeholder
- Restore high-resolution from S3 to block storage
- Generate proxy on block storage
- Generate technical metadata
- Copy the proxy from block storage to S3
- Delete proxy from block storage
- Delete high-resolution from block storage
When using an S3 File Gateway, the process is as follows:
- Generate placeholder
- Modify asset with S3 File Gateway Path
- Parallel Processes: Generate proxy, Generate technical metadata
There are no changes to core workflows required for this to function; it is purely configuration. Because the S3 File Gateway cache has a minimum automatic refresh interval of 5 minutes, we ensure the "Modify asset" process only runs against placeholder assets whose ingest date/time is at least 6 minutes in the past. Subsequent scavenges for proxy and technical metadata generation are based on whether a given asset has a value in the S3FileGatewayPath metadata field.
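If content needs to appear in the share sooner than the automatic refresh allows, the gateway cache can also be refreshed on demand. A minimal sketch using the AWS CLI from PowerShell; the file share ARN below is a placeholder for your own:

    # Trigger an immediate refresh of the gateway's cached view of the bucket.
    aws storagegateway refresh-cache `
        --file-share-arn "arn:aws:storagegateway:us-east-1:123456789012:share/share-XXXXXXXX"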
Method
For the testing environment, the public endpoint option was selected, with only a private IP assigned. The File Gateway was then activated using that private IP from a server within the same subnet.
For the storage cache, testing was performed against the minimum size (150 GB). We believe the cache is predominantly useful for on-premises deployments as a way of reducing egress (keeping a local copy available on disk reduces fetching from S3).
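For reference, cache disks can also be allocated via the AWS CLI; a minimal sketch with a placeholder gateway ARN, where the real disk IDs are returned by list-local-disks:

    # List the local disks the gateway can see, then allocate one as cache.
    aws storagegateway list-local-disks `
        --gateway-arn "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-XXXXXXXX"
    aws storagegateway add-cache `
        --gateway-arn "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-XXXXXXXX" `
        --disk-ids "<disk-id-from-list-local-disks>"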
For the SMB credentials, "Guest" access was used; the password can be set from the console and will need to be provided to IPV.
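If preferred, the guest password can be set via the AWS CLI instead of the console; a minimal sketch with placeholder values:

    # Set the SMB guest password on the gateway (equivalent to the console step).
    aws storagegateway set-smb-guest-password `
        --gateway-arn "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-XXXXXXXX" `
        --password "<providedpassword>"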
Create an S3 File Gateway
https://docs.aws.amazon.com/filegateway/latest/files3/create-gateway-file.html
Activating the Gateway
https://docs.aws.amazon.com/filegateway/latest/files3/gateway-private-link.html
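As an alternative to the console flow in the link above, activation can be scripted. A minimal sketch, assuming the activation key has already been retrieved from the gateway VM (browse to http://<gatewayipaddress>/?activationRegion=<region> from within the subnet and copy the activationKey parameter from the redirect URL); the gateway name, timezone, and region below are placeholders:

    # Activate the VM as an S3 File Gateway using the retrieved activation key.
    aws storagegateway activate-gateway `
        --activation-key "<activation-key>" `
        --gateway-name "curator-s3fg" `
        --gateway-timezone "GMT" `
        --gateway-region "us-east-1" `
        --gateway-type "FILE_S3"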
Create an SMB Share (one for each bucket)
https://docs.aws.amazon.com/filegateway/latest/files3/CreatingAnSMBFileShare.html
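Shares can likewise be created from the AWS CLI; a minimal sketch for a guest-access share, with placeholder ARNs (the role must grant the gateway access to the bucket):

    # Create a guest-access SMB share for one bucket; repeat per bucket.
    aws storagegateway create-smb-file-share `
        --client-token "curator-share-1" `
        --gateway-arn "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-XXXXXXXX" `
        --role "arn:aws:iam::123456789012:role/StorageGatewayBucketAccess" `
        --location-arn "arn:aws:s3:::curator-proxy-testgold1-sandbox" `
        --authentication "GuestAccess"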
Adding SMB File Gateway share credentials to Windows
- Log in with the ipvservice account to any server that requires access to the File Gateway. This will typically be the Transform and Transfer servers, as well as the Process Engine server where Curator Media Agent is deployed for generating thumbnails.
- In PowerShell: cmdkey /add:<gatewayipaddress> /user:smbguest /pass:<providedpassword>
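To confirm the credential was stored and the share is reachable, something like the following can be run in the same PowerShell session (the share name is a placeholder):

    # Verify the stored credential, then list the share to confirm access.
    cmdkey /list:<gatewayipaddress>
    Get-ChildItem "\\<gatewayipaddress>\<sharename>"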
Curator Configuration
The configuration for S3FG within Curator is now built into our CloudFormation scripts; however, the manual steps for setting up an existing Curator system (or a new one that is not deployed using CloudFormation) to use S3FG are as follows:
Curator Gateway API Route (/proxies/{catchAll})
- This should be configured to point at your proxy bucket in S3, as in the example below:
- The host in this example is: curator-proxy-testgold1-sandbox.s3.amazonaws.com
Curator Client UIs (Clip Link, Curator Logger, Curator for Adobe)
- These should be configured as per the out-of-the-box (OOTB) standard to utilise the Curator Gateway API for proxy playback.
New Metadata Fields
- S3FileGatewayFolderPath (data type FilePath)
- S3FileGatewayPath (data type FilePath)
New MediaStores
ANALYSE-C300-CARD-S3FG (clone ANALYSE-C300-CARD with 1 change):
- Source key set to HI-RES-S3FG
ANALYSE-P2-CARD-S3FG (clone ANALYSE-P2-CARD with 1 change):
- Source key set to HI-RES-S3FG
ANALYSIS-S3FG (clone ANALYSIS with 2 changes):
- SelectTransferDefaultStore key set to MEDIAINFO-S3FG
- SelectTransferTable key set to c300;media:MEDIAINFO-S3FG|p2;media:ANALYSE-P2-CARD-S3FG|red;media:REDINFO-S3FG|*;scratchpad:#none|*;image:#none
EXIFMETADATA-S3FG (clone EXIFMETADATA with 1 change):
- Source key set to HI-RES-S3FG
MEDIAINFO-S3FG (clone MEDIAINFO with 1 change):
- Source key set to HI-RES-S3FG
REDINFO-S3FG (clone REDINFO with 1 change):
- Source key set to HI-RES-S3FG
PROXY-CARD-V3-S3FG (clone of PROXY-CARD-V3 with multiple changes):
- FolderPathMetadataKey key set to WebProxyPath
- Path key updated to point at the S3FG server/share
- PurgeRequiredMetadata key added with a value of S3WebProxyExists:True
- Source key set to HI-RES-S3FG
PROXY-SELECT-S3FG (new MediaStore):
- Note: MediaStoreTemplate: Proxy
PROXY-V3-S3FG (clone of PROXY-V3 with multiple changes):
- FolderPathMetadataKey key set to WebProxyPath
- Path key updated to point at the S3FG server/share
- PurgeRequiredMetadata key added with a value of S3WebProxyExists:True
- Source key set to HI-RES-S3FG
HI-RES-S3FG (new MediaStore, although it could be cloned from HI-RES for ease):
- PathMetadataKey key set to S3FileGatewayPath
The following MediaStores are used as part of the S3FG configuration where a user uploads a file to S3 using Curator Connect, which creates an asset whose S3HiResPath points to the location of the file. These MediaStores convert that path into an S3FG path so that our ingest workflows can use it.
SCAVENGE-MODIFY-ASSET-S3FG (new MediaStore):
- Note: MediaStoreTemplate: SearchAndTransfer
MODIFY-ASSET-S3FG (new MediaStore):
- Be sure to set the ExtractRegex and SetMetadataValues keys to values appropriate to your S3 host and S3FG path (an illustrative sketch of this transformation appears after the MediaStore list).
- Note: MediaStoreTemplate: Custom
MODIFY-ASSET-S3FG-REVERSE (new MediaStore):
- Be sure to set the ExtractRegex and SetMetadataValues keys to values appropriate to your S3 host and S3FG path.
- Note: MediaStoreTemplate: Custom
MODIFY-ASSET-CAMERA-CARD-S3FG (new MediaStore):
- Be sure to set the ExtractRegex and SetMetadataValues keys to values appropriate to your S3 host and S3FG path.
- Note: MediaStoreTemplate: Custom
SCAVENGE-PLACEHOLDER-INGEST-S3FG (new MediaStore):
- Note: MediaStoreTemplate: SearchAndTransfer
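The exact ExtractRegex and SetMetadataValues syntax is specific to Curator, but the transformation those keys perform can be illustrated. A minimal PowerShell sketch, assuming an S3HiResPath in the https://<bucket>.s3.amazonaws.com/<key> form and an SMB share named after the bucket; the gateway IP, bucket, and file path below are all hypothetical:

    # Convert an S3 HTTPS path into the equivalent S3 File Gateway UNC path, e.g.
    #   https://curator-hires-sandbox.s3.amazonaws.com/cards/clip001.mxf
    #   ->  \\10.0.0.25\curator-hires-sandbox\cards\clip001.mxf
    $s3HiResPath = "https://curator-hires-sandbox.s3.amazonaws.com/cards/clip001.mxf"
    $gatewayIp   = "10.0.0.25"   # hypothetical File Gateway private IP

    if ($s3HiResPath -match '^https://(?<bucket>[^.]+)\.s3[^/]*\.amazonaws\.com/(?<key>.+)$') {
        # Build the UNC path: \\<gateway>\<bucket share>\<key with backslashes>
        "\\$gatewayIp\$($Matches.bucket)\$($Matches.key -replace '/', '\')"
    }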
Conclusion
This concludes the configuration required on the Curator system to implement S3FG.
The above configuration has been set up on a full-cloud Curator Arrival 3.5 system, making use of placeholder ingest scavenge. You can also use this configuration on a hybrid Curator system, provided the ground-station Transform & Transfer servers have access to the S3FG server.