Adding Datasets (Data Ingest)
You can submit your data files for ingest into the platform once you have completed the metadata for your data asset and dataset. All related datasets belonging to a single DDL submission should be grouped under a single Data Asset. Think of the asset as a book and the datasets as its chapters, which contain the actual data. See How to Associate a Dataset with a Data Asset for guidance. Your submission is complete once you have created the data asset and dataset records and submitted the related data. Data Services reviews the submission as a whole, so you do not need to wait for the asset page to be approved before associating your datasets with the asset. The Data Services team can review your submission only after your data are uploaded and your submission includes a tabular codebook and, if applicable, the questionnaire and informed consent documentation.
ADD DATA TO A DATASET RECORD
On the Primer Page for your dataset, select “Add Data” to choose a data source.
There are three ways to add data to your dataset: Upload a Data File, Import Data from a URL, or Link to an External Data Source.
Upload a Data File
When you select Upload a Data File, an upload box appears. Drag your data file into the box, or click browse to locate the file on your computer. The platform will not ingest a dataset that contains 500 or more columns. In that case, attach the dataset under “Other Reference Materials” in the metadata instead, or create a download button so users can download it.
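Before uploading, you can check a file's column count on your own computer. The sketch below is a hypothetical Python helper, not part of the DDL: it reads only the header row of a CSV file and compares its width against the 500-column limit stated above. It assumes the first row is the header and that columns are separated by commas.

```python
import csv
from typing import Iterable

# Limit stated in this guide: datasets with 500 or more columns are not ingested.
COLUMN_LIMIT = 500

def count_columns(lines: Iterable[str]) -> int:
    """Count the columns in the header row.

    `lines` can be an open text file or any iterable of CSV lines.
    Assumes the first row is the header and the separator is a comma.
    """
    header = next(csv.reader(lines), [])
    return len(header)

def can_ingest(lines: Iterable[str]) -> bool:
    """True if the file stays under the column limit."""
    return count_columns(lines) < COLUMN_LIMIT
```

To check a real file, open it with `open("survey.csv", newline="", encoding="utf-8")` (a hypothetical file name) and pass the file object to `can_ingest`.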
Import Data from a URL
Use this option if your dataset is hosted at an external URL. Once you enter the URL and click Start Import, the DDL retrieves the data from that URL and imports it into the platform.
Link to an External Data Source
Alternatively, you can link to an External Data Source. With this option, you can link multiple files to one dataset and link multiple datasets to the overall External Data Source. However, your data will not be ingested into the DDL; instead, your data asset and dataset records will link to a site that hosts data developed through USAID activities.
After you upload a data file or import data from a URL, a data table preview lets you review and adjust how the DDL reads your dataset.
Guessing Column Data Types
The import process tries to guess what each column represents: numbers, text, dates and times, or true/false values. It does this by taking a sample from the beginning of the file and testing every possible type against every column. The data type that produces the fewest errors is selected.
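To illustrate the guessing rule described above, here is a minimal Python sketch of "try every type on a sample, keep the one with the fewest errors." It is not the DDL's implementation: the type names, the accepted date formats, the 100-row sample size, the treatment of blank cells, and the tie-breaking order are all assumptions made for the example.

```python
import csv
from datetime import datetime
from itertools import islice
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed date/time formats; the DDL's actual list is not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S")

def is_boolean(value: str) -> bool:
    return value.strip().lower() in {"true", "false"}

def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False

def is_datetime(value: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            pass
    return False

# Ordered from most to least specific. Text accepts any value, so it wins only
# when every more specific type produces at least one error.
CANDIDATES: List[Tuple[str, Callable[[str], bool]]] = [
    ("boolean", is_boolean),
    ("number", is_number),
    ("datetime", is_datetime),
    ("text", lambda value: True),
]

def guess_column_types(lines: Iterable[str], sample_rows: int = 100) -> Dict[str, str]:
    """Guess a type per column from the first `sample_rows` data rows."""
    reader = csv.reader(lines)
    header = next(reader, [])
    sample = list(islice(reader, sample_rows))  # only the start of the file is read
    guesses = {}
    for i, name in enumerate(header):
        # Blank cells are treated as missing values, not errors (an assumption).
        values = [row[i] for row in sample if i < len(row) and row[i].strip()]
        if not values:
            guesses[name] = "text"
            continue
        # Count how many sampled values each candidate type rejects.
        errors = {t: sum(not check(v) for v in values) for t, check in CANDIDATES}
        # min() keeps the first of several equally good candidates,
        # so ties go to the more specific type.
        guesses[name] = min(CANDIDATES, key=lambda c: errors[c[0]])[0]
    return guesses
```

Under these assumptions, a single unparseable value in a sampled numeric column is enough to tip the guess to text, while a bad value that falls outside the sample is never seen at all. Both cases are reasons to review each guess.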
Review the data types the DDL automatically guesses for you. You can override any guess by making a different selection in the data type dropdown menu, under the column name.
This is particularly important when importing a file with a large number of columns. As mentioned above, during the import process, a sample is taken from the beginning of the file. If the number of columns is quite large, the number of rows sampled may decrease, and the system will be less certain of the correct data type.
Records that the data upload tool flags as errors will not be imported at all. Details of the errors are shown underneath each column's data type.
If the errors cannot be resolved by selecting a different data type, correct your dataset in an external tool, then return to the ‘Choose Data Source’ tab from the ‘Preview’ screen and select the corrected file as the new data source.
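If you plan to fix errors in an external tool, a short script can locate the offending values first. This hypothetical Python sketch lists every value in one column that fails a validation function you supply, together with its row number. The `is_number` check is only an example; the DDL's own parsing rules may be stricter or looser.

```python
import csv
from typing import Callable, Iterable, List, Tuple

def is_number(value: str) -> bool:
    """Example validator: does the value parse as a number?"""
    try:
        float(value)
        return True
    except ValueError:
        return False

def find_bad_rows(lines: Iterable[str], column: str,
                  is_valid: Callable[[str], bool]) -> List[Tuple[int, str]]:
    """Return (row_number, value) pairs for values in `column` that fail `is_valid`.

    Row numbers count the header as row 1. Blank cells are skipped (an assumption).
    """
    reader = csv.reader(lines)
    header = next(reader, [])
    index = header.index(column)  # raises ValueError if the column is missing
    bad = []
    for row_number, row in enumerate(reader, start=2):
        value = row[index] if index < len(row) else ""
        if value.strip() and not is_valid(value):
            bad.append((row_number, value))
    return bad
```

Fix the reported rows in your spreadsheet or editor, save the file, and select it again as the data source.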
Note: After a data source has been imported, you can no longer change the data source to Link to an External Data Source. The restriction also applies in reverse: if you start by linking to an External Data Source, you cannot later switch to importing a data source.
If you need to change or import a new version of the source file to this dataset, you can do so from this page. Additionally, from the Data Sources list, you can revert to a previously imported dataset.
Dataset Level Formatting
From the Preview window, you can configure how the source file is read by designating the Header Row, Column Separator, Character Encoding, and Quote Character. From the sidebar, you can add a new column or a georeferenced location column built from location data in the dataset.
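If you are unsure which settings to choose, Python's standard `csv.Sniffer` can offer a starting guess for the separator, quote character, and presence of a header row. The sketch below assumes the file is either UTF-8 (with or without a byte-order mark) or Windows-1252 and simply tries those encodings in order. Sniffing is heuristic, so confirm the results in the Preview window rather than relying on them.

```python
import csv
from typing import Dict, Sequence

def detect_encoding(raw: bytes, candidates: Sequence[str] = ("utf-8-sig", "cp1252")) -> str:
    """Return the first candidate encoding that decodes `raw` without errors."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the file")

def detect_format(raw: bytes, sample_size: int = 64 * 1024) -> Dict[str, object]:
    """Guess encoding, column separator, quote character, and header row."""
    encoding = detect_encoding(raw)
    sample = raw.decode(encoding)[:sample_size]  # sniff only the start of the file
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    return {
        "encoding": encoding,
        "separator": dialect.delimiter,
        "quote_char": dialect.quotechar,
        "has_header": sniffer.has_header(sample),
    }
```

Pass it the bytes read from your file opened in binary mode. `csv.Sniffer` raises `csv.Error` when it cannot determine a separator, which is itself a hint that the file needs a closer look.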
Column Level Formatting and Transformations
From the data table preview, you can perform the following column-level actions: selecting a Data Type, formatting how column data are rendered, performing Data Transforms, removing columns from the dataset, and reordering columns.
If you make any changes to your data during this step, you must also attach a complete, unaltered version of the dataset to your submission as a CSV file. To do so, use the Upload an Attachment option in the Data Detail tab of the metadata workflow, next to the heading Other Reference Materials.
Add a Georeference (location) column
Tabular datasets (e.g., .csv files) with location data, such as addresses or latitude/longitude, require an extra step after uploading before the data can be mapped.
From the data table preview, you can add a georeference (location) column to your dataset built from component columns already in the dataset. Three tabs offer different ways to create this column. Use Lat/Long when your dataset has separate latitude and longitude columns. Use Address (separated) when fields such as street, city, state, and zip code are in different columns. Use Combined Address when a single column contains the complete address.
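Before creating a Lat/Long georeference column, it can help to confirm that the coordinates are plausible. This hypothetical Python check assumes decimal-degree values in columns named `latitude` and `longitude` (adjust the names to match your file) and reports rows that are non-numeric or out of range. It does not reflect any validation the DDL itself performs.

```python
import csv
import math
from typing import Iterable, List, Tuple

def invalid_coordinates(lines: Iterable[str], lat_col: str = "latitude",
                        lon_col: str = "longitude") -> List[Tuple[int, str]]:
    """Return (row_number, problem) for rows with unusable coordinates.

    Assumes decimal degrees; row numbers count the header as row 1.
    Raises KeyError if either column name is missing from the header.
    """
    problems = []
    for row_number, row in enumerate(csv.DictReader(lines), start=2):
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (TypeError, ValueError):  # TypeError: cell missing in a short row
            problems.append((row_number, "not numeric"))
            continue
        if not (math.isfinite(lat) and math.isfinite(lon)):
            problems.append((row_number, "not numeric"))
        elif abs(lat) > 90:
            # A latitude beyond +/-90 paired with a small longitude often means
            # the two columns were swapped.
            hint = " (latitude and longitude may be swapped)" if abs(lon) <= 90 else ""
            problems.append((row_number, "latitude out of range" + hint))
        elif abs(lon) > 180:
            problems.append((row_number, "longitude out of range"))
    return problems
```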
Select the map icon at the top left of the column to preview how the georeferenced column will display before you publish your dataset. If you find a dataset on the DDL that has no georeferenced column but appears to contain geospatial data, contact Data Services for assistance at email@example.com.
Geospatial files should be uploaded in one of these formats: .json (GeoJSON), .geojson, .kml, .kmz, or .zip (shapefile), as described in this article on Importing Geospatial Data. If you use a .zip file for this purpose, make sure it contains only shapefiles. When uploading a shapefile with many layers, choose one layer for each dataset. You can test whether the data were ingested successfully by using the map visualization tool, as discussed in this article on Publishing and Visualizing Geospatial Data.
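To confirm that a .zip archive contains only shapefiles, and to see how many layers it holds, you could list its contents with Python's `zipfile` module. The set of allowed component extensions below is an assumption based on common shapefile components, not a list published by the DDL. macOS `__MACOSX` folders are flagged because they are not shapefiles.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Assumed shapefile component extensions; adjust if your archive legitimately
# includes other sidecar files.
SHAPEFILE_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".cpg", ".sbn", ".sbx", ".qix", ".xml"}

def classify_entries(names: Iterable[str]) -> Dict[str, List[str]]:
    """Split archive entry names into shapefile layers and unexpected files."""
    unexpected, layers = [], []
    for name in names:
        if name.endswith("/"):
            continue  # directory entry, not a file
        path = PurePosixPath(name)
        suffix = path.suffix.lower()
        if path.parts[0] == "__MACOSX" or suffix not in SHAPEFILE_EXTENSIONS:
            unexpected.append(name)
        elif suffix == ".shp":
            layers.append(path.stem)  # each .shp file is one layer
    return {"unexpected_files": unexpected, "layers": sorted(layers)}

def check_shapefile_zip(path: str) -> Dict[str, List[str]]:
    """Inspect a .zip archive's file list without extracting it."""
    with zipfile.ZipFile(path) as zf:
        return classify_entries(zf.namelist())
```

If `unexpected_files` is not empty, remove those files from the archive; if `layers` lists more than one name, split the layers so each dataset receives one.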
Collaboration and Publishing
Until you publish the dataset, it exists as a working copy that can be accessed from the Asset Inventory. This lets you complete different stages of the dataset at different times, and lets different contributors complete different stages.
To make changes to the dataset, such as adding columns or changing data types, select ‘Review & Configure Data.’ This will bring you to the Data Preview page. Any changes made to the dataset will automatically show up in the revision log on the left side of the page. To accept changes, select ‘Done’ in the bottom right corner of the data table.
SUBMIT FOR REVIEW
Once you are finished creating the dataset and are ready to submit it to the DDL for review, select ‘Submit for Review.’ The DDL will alert you if there are steps you still need to take prior to submission.
The platform will then prompt you to select visibility permissions for the dataset. Because you should already have specified appropriate access levels in the Risk Utility Assessment tab of the metadata entry form, the Data Services team will use that information to determine the appropriate access level for your dataset.
In this step, select Public to continue. Once this process is complete, you can select Go to Primer to view your new dataset.