The output schema is the main transformation configuration: it defines how incoming data files are processed into a predetermined output. An output schema has many creation and configuration options. This example covers the most common ones.
To create an output schema, follow these steps:
- In the left navigation bar, choose Data Source
- Click Output Schemas in the top navigation tabs
- Click the New button to create a schema. This opens a dialog box.
- Enter the name of the schema.
- Acceptable characters are letters, underscores, and numbers
- Optionally, choose which Data Owners (clients) will have access to the schema
- Click Next
- In the Mappable Fields tab, add all the fields that will be mapped to input AND all the fields that will be used for output.
- You can also check any attributes that apply to a field here, just as when configuring the individual fields earlier.
- NOTE: The originalRecord field can be used to pass through all the original fields of a CSV file if needed
- Click Next
- In the Output tab, choose the type of file that will be created, such as JSON or delimited (CSV)
- In this example we will choose delimited, with a comma as the output delimiter
- For the output fields, click the add button and choose the field to add. You can alias (rename) the field and set a format or width depending on the field type, for example a format for a date field or a width for a fixed-length field
- You may also reorder the fields to set their positions in the output
- Click on the Options subtab. This tab exposes a number of options that can be used to modify the file. These include:
- Whether the file should be encrypted or compressed
- Whether duplicate records should be dropped or flagged
- Whether the original records column will be included, using the originalRecord field
- Any parsing options available
- Address standardization options
- Data append options
- Once done, click on the Filename tab
- Format the file name as needed. You can include variables to build the file name
- Click on the Target tab
- Select the type of location where the data will go once it is processed
- Different output types have different required parameters. In this example we will send the data to an S3 bucket. Select "S3" under "Select Output Location".
- Enter the S3 bucket parameters. As with the file name, you can use variables to set the directory structure if needed. For example:
- /{{client._id}}/{{convertType.name}}/{{file.createDate}}/
- This would create a path from the client ID, the output schema name, and the file date, so that the data is partitioned by those values
- The last available tab is the Code tab. We will not alter any code in this example, but this tab is used for processing that is not available out of the box. The main language is Scala, and the Bettrdata team would be happy to help you get started if necessary.
- Click the Save button to save, then click the x button in the upper right of the dialog box to finish
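
To make the naming rule and the target path variables from the steps above more concrete, here is a minimal Python sketch. It is an illustration only, not the platform's implementation: the functions `is_valid_schema_name` and `render_path` are hypothetical helpers, the sample values (`acme01`, `customer_export`, `2024-01-15`) are made up, and the assumptions that "letters" means ASCII letters and that the file date renders as `YYYY-MM-DD` are not confirmed by the product.

```python
import re

# Assumption: schema names allow only ASCII letters, digits, and underscores.
SCHEMA_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")

# Matches placeholders such as {{client._id}} and captures the dotted name.
PLACEHOLDER_PATTERN = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def is_valid_schema_name(name: str) -> bool:
    """Return True if the whole name uses only the accepted characters."""
    return SCHEMA_NAME_PATTERN.fullmatch(name) is not None


def render_path(template: str, context: dict) -> str:
    """Replace each {{a.b}} placeholder with the value found at context['a']['b']."""

    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if a variable is missing
        return str(value)

    return PLACEHOLDER_PATTERN.sub(lookup, template)


# Hypothetical values standing in for what the platform would supply.
context = {
    "client": {"_id": "acme01"},
    "convertType": {"name": "customer_export"},
    "file": {"createDate": "2024-01-15"},  # assumed date format
}

template = "/{{client._id}}/{{convertType.name}}/{{file.createDate}}/"
print(is_valid_schema_name("customer_export"))  # True
print(is_valid_schema_name("customer export"))  # False (contains a space)
print(render_path(template, context))           # /acme01/customer_export/2024-01-15/
```

With these sample values, each client, schema, and file date ends up under its own prefix in the bucket, which is what makes the path useful for partitioning.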