Adding Datasets (Data Ingest)
Once the metadata for your data asset and dataset are complete and all your supporting materials are at hand, you are ready to upload your data to the DDL for ingest into the platform. This step serves to validate your data and to make your data available for review by the Data Services team.
On the Primer Page for your dataset, select “Add Data” to navigate to your data source.
There are three ways to add data to your data asset. You may Upload a Data File, Import Data from a URL, or Link to an External Data Source.
Upload a Data File
When the option to Upload a Data File is selected, an upload box will appear, giving you the ability to upload a data file by either dragging your data file into the box on the platform or clicking browse to navigate to the file on your computer.
Import Data from a URL
Use this option if your dataset is hosted on an external URL. Once you enter the URL and click Start Import, the DDL will browse to the data source and download the data to the platform.
Link to an External Data Source
Alternatively, you can link to an External Data Source. You will be able to link multiple files to one dataset and link multiple datasets to the overall External Data Source. With this option, however, your data is not ingested into the DDL. Instead, your data asset and dataset link to a site hosting data developed through USAID activities.
After you import your data by uploading a data file or importing data from a URL, a data table preview will be available so that you can review and modify the way the DDL reads your dataset.
Guessing Column Data Types
The import process does its best to guess what each column represents: numbers, text, dates and times, or true/false values. It does this by taking a sample from the beginning of the file and trying out every possible type for every column. The data type with the fewest errors is selected.
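The guessing approach described above can be illustrated with a minimal Python sketch. This is not the DDL's actual implementation. The function name, the sample size, the accepted date formats, and the rule that falls back to text when the best candidate fails on more than half of the sampled values are all assumptions made for illustration.

```python
from datetime import datetime

def _is_boolean(value):
    return value.strip().lower() in {"true", "false"}

def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def _is_date(value):
    # Assumption: only these three formats count as dates in this sketch.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S"):
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False

# Candidate types, most specific first; ties go to the earlier entry.
SPECIFIC_CHECKS = [
    ("true/false", _is_boolean),
    ("number", _is_number),
    ("date and time", _is_date),
]

def guess_column_types(rows, sample_size=100, max_error_share=0.5):
    """Guess a type per column from the first `sample_size` data rows.

    `rows` is a list of lists of strings; the first row is the header.
    """
    header, *data = rows
    sample = data[:sample_size]
    guesses = {}
    for index, name in enumerate(header):
        values = [row[index] for row in sample
                  if index < len(row) and row[index].strip()]
        if not values:
            guesses[name] = "text"
            continue
        # Count how many sampled values each candidate type fails to parse.
        errors = {label: sum(not check(v) for v in values)
                  for label, check in SPECIFIC_CHECKS}
        best = min(errors, key=errors.get)
        # Assumption: fall back to text if the best type fails too often.
        too_many = errors[best] > max_error_share * len(values)
        guesses[name] = "text" if too_many else best
    return guesses
```

In this sketch, a column that is mostly numeric but contains one stray entry such as "x3" is still guessed as a number, and that stray entry becomes an error for that row. A smaller sample makes such a guess less reliable, which is why reviewing the guesses matters.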
Review the data types the DDL automatically guesses for you. You can override any guess by making a different selection in the data type dropdown menu, under the column name.
This is particularly important when importing a file with a large number of columns. As mentioned above, during the import process, a sample is taken from the beginning of the file. If the number of columns is quite large, the number of rows sampled may decrease, and the system will be less certain of the correct data type.
Records with errors that the data upload tool recognizes will not be imported. Details of the errors are available underneath the column data type.
If the errors cannot be resolved by selecting a different data type in the display above, you can update your dataset in an external tool, then return to the ‘Choose Data Source’ tab from the ‘Preview’ screen and select the corrected file as the new data source.
Note: After a data source has been imported, changing the data source to Link to an External Data Source will no longer be possible. The restriction works in both directions: if you start by linking to an External Data Source, you will not be able to change to importing a data source.
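When fixing a file in an external tool, it can help to locate the problem rows first. The following is a minimal sketch using only the Python standard library. The function name and the choice of `float` as the parser are hypothetical, and the check may not match the DDL's own validation rules.

```python
import csv
import io

def find_unparseable_rows(csv_text, column, parser=float):
    """Return (line_number, value) pairs whose value in `column` fails `parser`.

    Blank values are skipped. Line numbers count the header as line 1.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        value = (row.get(column) or "").strip()
        if not value:
            continue
        try:
            parser(value)
        except ValueError:
            problems.append((reader.line_num, value))
    return problems
```

For a local file, read its contents into a string first (for example with `open(path, encoding="utf-8", newline="").read()`) and pass the name of the column you intend to import as a number.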
If you need to change or import a new version of the source file to this dataset, you can do so from this page. Additionally, from the Data Sources list, you can revert to a previously imported dataset.
Dataset Level Formatting
From the Preview window, you will be able to configure the dataset by designating the Header Row, Column Separator, Character Encoding of the source file and Quote Character. From the sidebar, you can add a new column or a georeferenced location column using location data from the dataset.
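To see how the four dataset-level settings change the way a file is read, the sketch below applies them to raw file bytes using Python's standard `csv` module. The function and parameter names are hypothetical and only mirror the settings named above. The DDL's own parser may behave differently.

```python
import csv
import io

def read_table(raw_bytes, encoding="utf-8", separator=",",
               quote_char='"', header_row=1):
    """Parse raw file bytes into (header, data_rows).

    encoding   -> Character Encoding of the source file
    separator  -> Column Separator
    quote_char -> Quote Character
    header_row -> Header Row (1-based); rows above it are discarded
    """
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=separator, quotechar=quote_char)
    rows = list(reader)
    header = rows[header_row - 1]
    return header, rows[header_row:]
```

For example, a semicolon-separated Latin-1 export with a title line above the column names would be read with `encoding="latin-1"`, `separator=";"`, and `header_row=2`. Choosing the wrong encoding typically shows up as garbled accented characters in the preview, and the wrong separator as a single wide column.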
Column Level Formatting and Transformations
From the data table preview, you can perform the following column-level actions: selecting a Data Type, formatting how column data is rendered, performing Data Transforms, removing columns from the dataset, and reordering the columns.
If you do make any changes to your data during this step, it is essential that you also attach a complete, unaltered version of the dataset to your submission as a CSV file. Use the option to Upload an Attachment in the Data Detail tab of the metadata workflow next to the heading Other Reference Materials, as shown below.
Add a Georeference (location) column
From the data table preview, you can add a georeference (location) column to your dataset using the component columns already in your dataset. There are three tabs that give options to create this column. Lat/Long can be used when your dataset has columns for both latitude and longitude. Address (separated) is used when fields such as street, city, state, and zip code are separated out into different columns. Combined Address is used when there is a single column that contains the complete address.
Finally, you can run the geocoder right from this window and preview how the column will display before deciding to publish your dataset.
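Before using the Lat/Long option, it can be useful to confirm locally that the latitude and longitude values are valid and not swapped. The sketch below combines one pair into a point written as Well-Known Text (WKT), which lists longitude first. The function name is hypothetical, and the format the DDL uses internally for location columns is not specified in this guide.

```python
def to_point(latitude, longitude):
    """Return a WKT point string, or None if a value is missing or out of range."""
    try:
        lat, lon = float(latitude), float(longitude)
    except (TypeError, ValueError):
        return None
    # Latitude must lie in [-90, 90] and longitude in [-180, 180].
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return f"POINT ({lon} {lat})"
```

A result of `None` for a row with an apparent latitude above 90 is often a sign that the two columns have been swapped.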
Collaboration and Publishing
Prior to publishing, the dataset will exist as a working copy that can be accessed from the Asset Inventory. This allows different stages of the dataset to be completed at different times and by different contributors.
Any changes made to the dataset, such as schema modifications, changing data sources, etc., will show up in the revision log on the right-hand side of the dataset draft page. You will be able to view the changes made to the data schema by clicking on the ‘Schema’ link in the revision log. This will bring you to the data preview version of the changes that were made. To use those settings, select ‘Save’ from the bottom right corner of the data table.
Submit for Review
Once you are finished creating the dataset and are ready to submit it to the DDL for review, select ‘Submit for Review.’ The DDL will alert you if there are steps you still need to take prior to submission.
The platform will prompt you to select visibility permissions for the dataset. Because you should have already specified appropriate access levels in the Risk Utility Assessment tab of the metadata entry form, the Data Services team will use that information to determine the appropriate access level for your dataset.
In this step, select Public to continue. Once this process is complete, you can select Go to Primer to view your new dataset.