This guide explains how to ingest a CSV file that lists objects held in your AWS S3 cloud storage.
This lets you bulk-ingest large numbers of files that reside in AWS.
The process works as follows:
- List each file in a specific AWS S3 directory into a text file.
- Convert the text file into a CSV.
- Confirm the CSV information and the Curator configuration.
- Ingest the CSV into Curator to create placeholder assets.
- Scavenges pick up the placeholder assets for proxy creation, MediaInfo, etc.
- The assets complete ingest and are usable for editing, etc.
It is assumed that you have followed this guide: How to run AWS list objects - IPV Curator
That guide explains how to list the objects within an S3 directory in a text file. If you have not done this, or do not know how, it is highly advised that you follow the linked guide and then return to this article.
It is also assumed that you have Scavenges that perform placeholder proxy creation and other steps such as MediaInfo.
If you do not have these set up, you will need to configure them; otherwise you will be left with placeholder assets that have no proxies or extended metadata.
Once the above guide has been followed, you should have a text file listing all of the objects you want to ingest.
First, the text file must be converted into a CSV file so that it can be ingested into Curator.
To do this, confirm within Process Engine that you have the following mediastore: "BACKCATALOG-INGEST"
If you do not have this mediastore, you will need to add it. The required keys and their values are as follows.
<BACKCATALOG-INGEST>
<BucketName>AWS S3 BUCKET NAME</BucketName>
<CSVPath>PATH WHERE YOU WANT THE CSV TO LAND</CSVPath>
<CuratorFolderPath>Library</CuratorFolderPath>
<ExcludeRegex></ExcludeRegex>
<InputType>File</InputType>
<MaxParallel>15</MaxParallel>
<MediaStoreDescription></MediaStoreDescription>
<MediaStoreTemplate>Custom</MediaStoreTemplate>
<UrlDecode>True</UrlDecode>
</BACKCATALOG-INGEST>
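Before moving on, it can help to sanity-check a copy of the mediastore definition. The following is a minimal sketch only: it assumes you have the fragment above available as text, and the function name check_mediastore and the rules it applies (which keys may be empty, MaxParallel being a whole number, UrlDecode being True or False) are illustrative assumptions drawn from the example values, not something Process Engine itself provides or enforces.

```python
import xml.etree.ElementTree as ET

# Keys taken from the BACKCATALOG-INGEST example in this guide.
EXPECTED_KEYS = [
    "BucketName", "CSVPath", "CuratorFolderPath", "ExcludeRegex",
    "InputType", "MaxParallel", "MediaStoreDescription",
    "MediaStoreTemplate", "UrlDecode",
]
# Assumption: these two are left blank in the example, so blank is allowed.
MAY_BE_EMPTY = {"ExcludeRegex", "MediaStoreDescription"}

SAMPLE = """
<BACKCATALOG-INGEST>
<BucketName>AWS S3 BUCKET NAME</BucketName>
<CSVPath>PATH WHERE YOU WANT THE CSV TO LAND</CSVPath>
<CuratorFolderPath>Library</CuratorFolderPath>
<ExcludeRegex></ExcludeRegex>
<InputType>File</InputType>
<MaxParallel>15</MaxParallel>
<MediaStoreDescription></MediaStoreDescription>
<MediaStoreTemplate>Custom</MediaStoreTemplate>
<UrlDecode>True</UrlDecode>
</BACKCATALOG-INGEST>
"""


def check_mediastore(xml_text: str) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    root = ET.fromstring(xml_text.strip())
    if root.tag != "BACKCATALOG-INGEST":
        problems.append(f"unexpected mediastore name: {root.tag}")

    # Map each key to its trimmed text value ("" when the element is empty).
    values = {child.tag: (child.text or "").strip() for child in root}

    for key in EXPECTED_KEYS:
        if key not in values:
            problems.append(f"missing key: {key}")
        elif not values[key] and key not in MAY_BE_EMPTY:
            problems.append(f"empty value: {key}")

    # Assumed value formats, based only on the example values above.
    if values.get("MaxParallel") and not values["MaxParallel"].isdigit():
        problems.append("MaxParallel is not a whole number")
    if values.get("UrlDecode") and values["UrlDecode"] not in ("True", "False"):
        problems.append("UrlDecode is not True or False")
    return problems


if __name__ == "__main__":
    for problem in check_mediastore(SAMPLE) or ["no problems found"]:
        print(problem)
```

Replace SAMPLE with your own values (bucket name and CSV path) before running the check.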
CONFIGURATION OPTIONS HERE
Once you have configured the above mediastore, move on to the next step.
Navigate to the Process Definitions tab and search in the filter for Utility - Start S3 Parsing for Directory
The workflow should appear in the results. If nothing appears for the search, please reach out to the IPV Tech Ops team, who will install the workflow for you.
Next, select 'Create Instance'.
Enter the following in the text fields.
Mediastore = "BACKCATALOG-INGEST"
SplitFilesPrefix = ""
SplitFilesRoot = "PATH TO TEXT FILE" EXAMPLE: "\\curatorgateway\Curator\PRODUCTION\Backcatalogtxtfile\"
Once you have run the workflow, you will see an entry for the process in the Processes tab while it converts the text file into a CSV.
When it completes (green traffic light), you will have a CSV file in the directory you configured earlier (the CSVPath key). This is the CSV file used for the ingest.
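As an optional pre-flight check before ingesting, you may want to confirm that the CSV exists and see roughly how many rows it contains. The sketch below is read-only and never modifies the file. It makes no assumption about the CSV's columns or whether it has a header row, so the count is only an approximate indication of how many placeholders to expect. The function names and the UTF-8 encoding are assumptions for illustration.

```python
import csv
from pathlib import Path
from typing import Iterable


def count_rows(lines: Iterable[str]) -> int:
    """Count CSV rows that contain at least one non-blank cell."""
    return sum(
        1 for row in csv.reader(lines)
        if any(cell.strip() for cell in row)
    )


def preflight(csv_path: str) -> int:
    """Check that the CSV exists and return its non-blank row count."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"CSV not found: {csv_path}")
    # Assumption: the file is UTF-8 text; adjust if your export differs.
    with path.open(newline="", encoding="utf-8") as handle:
        return count_rows(handle)


if __name__ == "__main__":
    # Hypothetical in-memory rows, only to show the counting behaviour.
    print(count_rows(["a,b", "", "c,d"]))
```

A count of zero, or a missing file, suggests the conversion workflow did not finish as expected and is worth investigating before starting the ingest.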
Next, on the same Process Definitions tab, search for Spawn - Parallel Create Placeholders from CSV and again select 'Create Instance'.
Enter the following in the text fields.
Mediastore = "BACKCATALOG-INGEST"
IngestFileFullPath = "PATH TO YOUR CSV FILE" EXAMPLE : "\\curatorgateway\Curator\PRODUCTION\Backcatalog Csv\test.csv"
Once configured, select 'Create Instance'.
The workflow will then create a placeholder for each asset listed in the CSV file, in its respective Curator folder.
If configured, the Scavenges that create proxies etc. should pick up these placeholders and complete their ingest.
You may want to set up separate Scavenges to prioritise these ingests.
Note that you cannot create your own CSV files and ingest them. The workflow creates CSVs in a very specific format to enable ingest, and hand-built CSV files are likely to fail or cause other issues.
Please only use the process above to ingest CSV files.