Dataset Management in DNAWave
Overview
DNAWave provides a robust dataset management system that allows users to upload, organize, and analyze genomic datasets efficiently. This guide covers how to create, edit, and manage datasets using the DNAWave platform.
Creating a Dataset
To create a dataset, navigate to the Dataset Creation Page and fill in the required fields.
Required Fields:
- Name: A descriptive name for your dataset.
- Description: Provide details about the dataset contents and purpose.
- Dataset Type: Choose between Omics (genomics, transcriptomics, proteomics, etc.) or Research (scientific literature and studies).
- Source: Select the storage type (S3 or Google Cloud Storage (GCS)).
- Storage Path:
  - For S3, provide the S3 bucket URI.
  - For GCS, select a bucket and choose a file path.
- Keywords: Enter relevant keywords to enhance AI-driven literature recommendations.
- Tags: Categorize datasets for easy retrieval.
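The field list above can be sketched as a small record with a pre-submission check. This is only an illustration: the names DatasetForm and validate_form are hypothetical, the s3:// prefix check assumes the platform expects a standard S3 URI, and allowing empty Keywords and Tags lists is an assumption rather than documented DNAWave behavior.

```python
from dataclasses import dataclass, field

# Allowed values come from the field list above; everything else in this
# sketch (class name, function name, messages) is hypothetical.
DATASET_TYPES = {"Omics", "Research"}
SOURCES = {"S3", "GCS"}


@dataclass
class DatasetForm:
    name: str
    description: str
    dataset_type: str
    source: str
    storage_path: str
    keywords: list[str] = field(default_factory=list)  # assumed may be empty
    tags: list[str] = field(default_factory=list)      # assumed may be empty


def validate_form(form: DatasetForm) -> list[str]:
    """Return a list of problems; an empty list means the form looks valid."""
    problems = []
    if not form.name.strip():
        problems.append("Name is required.")
    if not form.description.strip():
        problems.append("Description is required.")
    if form.dataset_type not in DATASET_TYPES:
        problems.append(f"Dataset Type must be one of {sorted(DATASET_TYPES)}.")
    if form.source not in SOURCES:
        problems.append(f"Source must be one of {sorted(SOURCES)}.")
    if not form.storage_path.strip():
        problems.append("Storage Path is required.")
    elif form.source == "S3" and not form.storage_path.startswith("s3://"):
        # Assumption: an S3 bucket URI uses the usual s3:// scheme.
        problems.append("For S3, Storage Path should start with s3://.")
    return problems
```

Running a check like this before opening the Dataset Creation Page catches a wrong type or a malformed bucket URI early; the platform's own validation remains authoritative.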
Viewing and Managing Datasets
Once a dataset is created, users can view and manage it from the Dataset View Page.
Dataset Details Page
The Dataset View Page provides an overview of the dataset, including:
- Dataset metadata (Name, Description, Source, Type, Keywords, and Tags).
- Storage details (S3/GCS paths and bucket information).
- Workflow information (Linked workflows and versions).
- Data Quality Score (Validation results for dataset integrity).
- Literature Recommendations (AI-powered research paper suggestions).
- File Management (Upload, Download, Delete, and Share dataset files).
Data Quality Validation
DNAWave includes an automated Data Quality Score Section that evaluates dataset files using key metrics:
- Headers: Checks for missing or improperly formatted headers.
- Duplicate Records: Identifies redundant entries.
- Missing Data: Flags incomplete or empty fields.
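To make the three metrics concrete, here is a minimal sketch of comparable checks on a CSV file. It does not reproduce how DNAWave computes its Data Quality Score, which this guide does not specify; the function name quality_report and the returned keys are hypothetical, and the sketch only counts issues without combining them into a score.

```python
import csv
import io


def quality_report(csv_text: str) -> dict:
    """Count header problems, duplicate rows, and rows with empty fields."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {
            "header_issues": ["file is empty"],
            "duplicate_records": 0,
            "rows_with_missing_data": 0,
        }

    header, records = rows[0], rows[1:]

    # Headers: flag blank or repeated column names.
    header_issues = []
    seen_names = set()
    for index, name in enumerate(header):
        cleaned = name.strip()
        if not cleaned:
            header_issues.append(f"column {index} has no name")
        elif cleaned in seen_names:
            header_issues.append(f"column name {cleaned!r} is repeated")
        seen_names.add(cleaned)

    seen_rows = set()
    duplicates = 0
    missing = 0
    for record in records:
        # Duplicate Records: an exact repeat of an earlier row.
        key = tuple(record)
        if key in seen_rows:
            duplicates += 1
        seen_rows.add(key)
        # Missing Data: a row shorter than the header or with a blank cell.
        if len(record) < len(header) or any(not cell.strip() for cell in record):
            missing += 1

    return {
        "header_issues": header_issues,
        "duplicate_records": duplicates,
        "rows_with_missing_data": missing,
    }
```

A local check of this kind can help clean a file before upload, so the platform's validation has fewer issues to report.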
Dataset Versioning
Users can track dataset modifications with Dataset Version History:
- Each version logs changes, including timestamps and modification reasons.
- Users can compare changes across versions.
- Versions are linked to the user responsible for updates.
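One way to picture a version entry is as a record holding the timestamp, the reason, the responsible user, and a snapshot of the metadata, with comparison done key by key. The sketch below is an assumed data model for illustration only; DatasetVersion and diff_versions are hypothetical names, and DNAWave's internal representation may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DatasetVersion:
    number: int
    author: str          # user responsible for the update
    reason: str          # modification reason
    timestamp: datetime
    metadata: dict       # snapshot of the dataset metadata at this version


def diff_versions(old: DatasetVersion, new: DatasetVersion) -> dict:
    """Map each changed metadata key to an (old, new) pair; absent keys show as None."""
    keys = old.metadata.keys() | new.metadata.keys()
    return {
        key: (old.metadata.get(key), new.metadata.get(key))
        for key in sorted(keys)
        if old.metadata.get(key) != new.metadata.get(key)
    }


v1 = DatasetVersion(
    1, "alice", "initial upload",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    {"description": "draft", "tags": ["rna"]},
)
v2 = DatasetVersion(
    2, "bob", "fixed description",
    datetime(2024, 1, 2, tzinfo=timezone.utc),
    {"description": "final", "tags": ["rna"]},
)
changes = diff_versions(v1, v2)
```

Here changes contains only the description entry, since the tags are identical in both versions.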
File Management
DNAWave supports robust dataset file management:
- Upload Files: Add new files via the Dataset Upload Page.
- Download Files: Retrieve dataset files securely.
- Delete Files: Remove outdated or unnecessary files.
- Share Files: Generate shareable links with expiration times.
- File Validation: Perform automated quality checks on dataset files.
API Integration
Developers can programmatically manage datasets using the DNAWave REST API. The API allows users to:
- Create and update datasets.
- Manage dataset metadata.
- Integrate with external analysis pipelines.
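This guide does not list the API's endpoints, so the following is only a sketch of what a dataset-creation request might look like. The base URL, the /datasets path, the JSON field names, and bearer-token authentication are all placeholders chosen for illustration, not the documented DNAWave API. The code builds the request without sending it.

```python
import json
import urllib.request

# Placeholder values: the host, path, payload keys, and auth scheme are
# assumptions and must be replaced with those from the real API reference.
BASE_URL = "https://dnawave.example.com/api"


def build_create_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a new dataset."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_request(
    "example-token",
    {
        "name": "Tumor RNA-seq",
        "description": "Bulk RNA-seq runs",
        "dataset_type": "Omics",
        "source": "S3",
        "storage_path": "s3://example-bucket/rnaseq/",
    },
)
```

Once the real endpoint and authentication scheme are known, the request can be sent with urllib.request.urlopen(request), or the same payload can be passed to whatever HTTP client an analysis pipeline already uses.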

