Dataset Management in DNAWave
Overview
DNAWave's dataset management system lets users upload, organize, and analyze genomic datasets. This guide covers how to create, edit, and manage datasets on the DNAWave platform.
Creating a Dataset
To create a dataset, navigate to the Dataset Creation Page and fill in the required fields.
Required Fields:
- Name: A descriptive name for your dataset.
- Description: Provide details about the dataset contents and purpose.
- Dataset Type: Choose either Omics (genomics, transcriptomics, proteomics, etc.) or Research (scientific literature and studies).
- Source: Select the storage type (S3 or Google Cloud Storage (GCS)).
- Storage Path:
  - For S3, provide the S3 bucket URI.
  - For GCS, select a bucket and choose a file path.
- Keywords: Enter relevant keywords to enhance AI-driven literature recommendations.
- Tags: Categorize datasets for easy retrieval.
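The required fields above can be checked before submitting the form or calling an API. The following is a minimal sketch, not DNAWave's actual schema: the class name `DatasetDraft`, its attribute names, and the `validate` helper are all hypothetical, and the only format rule assumed is that an S3 bucket URI starts with `s3://`.

```python
from dataclasses import dataclass

# Hypothetical names mirroring the form fields; not DNAWave's real schema.
DATASET_TYPES = {"Omics", "Research"}
SOURCES = {"S3", "GCS"}


@dataclass
class DatasetDraft:
    name: str
    description: str
    dataset_type: str
    source: str
    storage_path: str
    keywords: list[str]
    tags: list[str]


def validate(draft: DatasetDraft) -> list[str]:
    """Return a list of problems; an empty list means the draft looks complete."""
    problems = []
    if not draft.name.strip():
        problems.append("Name is required")
    if not draft.description.strip():
        problems.append("Description is required")
    if draft.dataset_type not in DATASET_TYPES:
        problems.append(f"Dataset Type must be one of {sorted(DATASET_TYPES)}")
    if draft.source not in SOURCES:
        problems.append(f"Source must be one of {sorted(SOURCES)}")
    elif draft.source == "S3" and not draft.storage_path.startswith("s3://"):
        # Assumption: an S3 bucket URI uses the s3:// scheme.
        problems.append("For S3, the storage path should be a bucket URI (s3://...)")
    elif draft.source == "GCS" and not draft.storage_path.strip():
        problems.append("For GCS, a bucket and file path are required")
    return problems


draft = DatasetDraft(
    name="RNA-seq batch 1",
    description="Tumor sample expression data",
    dataset_type="Omics",
    source="S3",
    storage_path="s3://example-bucket/rnaseq/",
    keywords=["rna-seq"],
    tags=["oncology"],
)
print(validate(draft))
```

Returning a list of problems rather than raising on the first one lets a form show every issue at once.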
Viewing and Managing Datasets
Once a dataset is created, users can view and manage it from the Dataset View Page.
Dataset Details Page
The Dataset View Page provides an overview of the dataset, including:
- Dataset metadata (Name, Description, Source, Type, Keywords, and Tags).
- Storage details (S3/GCS paths and bucket information).
- Workflow information (Linked workflows and versions).
- Data Quality Score (Validation results for dataset integrity).
- Literature Recommendations (AI-powered research paper suggestions).
- File Management (Upload, Download, Delete, and Share dataset files).
Data Quality Validation
DNAWave includes an automated Data Quality Score Section that evaluates dataset files against the following metrics:
- Headers: Checks for missing or improperly formatted headers.
- Duplicate Records: Identifies redundant entries.
- Missing Data: Flags incomplete or empty fields.
Each metric is displayed with a percentage score, along with a visual progress bar indicating data quality.
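To make the three metrics concrete, here is a rough sketch of how percentage scores like these could be computed for a CSV file. DNAWave's actual scoring formulas are not documented here, so the function `quality_scores` and the definitions it uses (share of well-formed headers, share of unique rows, share of non-empty cells) are assumptions for illustration only.

```python
import csv
import io


def quality_scores(csv_text: str) -> dict[str, float]:
    """Score a CSV on three checks, each as a percentage (100 = no problems found)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or not rows[0]:
        # No header row at all: nothing can be scored.
        return {"headers": 0.0, "duplicates": 0.0, "missing": 0.0}
    header, records = rows[0], rows[1:]

    # Headers: share of header cells that are non-empty and not repeated.
    cleaned = [h.strip() for h in header]
    good_headers = sum(1 for h in cleaned if h and cleaned.count(h) == 1)
    header_score = 100.0 * good_headers / len(header)

    # Duplicate records: share of rows that remain after removing exact repeats.
    unique_rows = len({tuple(r) for r in records})
    duplicate_score = 100.0 * unique_rows / len(records) if records else 100.0

    # Missing data: share of cells that contain a non-blank value.
    cells = [c for r in records for c in r]
    filled = sum(1 for c in cells if c.strip())
    missing_score = 100.0 * filled / len(cells) if cells else 100.0

    return {
        "headers": header_score,
        "duplicates": duplicate_score,
        "missing": missing_score,
    }


sample = "gene,sample,value\nTP53,s1,0.5\nTP53,s1,0.5\nBRCA1,s2,\n"
for metric, score in quality_scores(sample).items():
    print(f"{metric}: {score:.1f}%")
```

In this sample, one of three rows is an exact repeat and one of nine cells is empty, so the duplicate and missing-data scores fall below 100 while the header score stays at 100.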
Dataset Versioning
Users can track dataset modifications with Dataset Version History:
- Each version logs changes, including timestamps and modification reasons.
- Users can compare changes across versions.
- Versions are linked to the user responsible for updates.
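The version history described above can be pictured as a list of records, each carrying a timestamp, a reason, and the responsible user. The sketch below is a hypothetical data model, not DNAWave's internal one: `DatasetVersion`, its fields, and `diff_versions` are illustrative names, and comparison is limited to metadata snapshots.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetVersion:
    # Hypothetical record shape based on the bullets above.
    number: int
    modified_by: str
    reason: str
    timestamp: datetime
    metadata: dict


def diff_versions(old: DatasetVersion, new: DatasetVersion) -> dict[str, tuple]:
    """Map each changed metadata key to (old value, new value); None marks an absent key."""
    keys = old.metadata.keys() | new.metadata.keys()
    return {
        key: (old.metadata.get(key), new.metadata.get(key))
        for key in sorted(keys)
        if old.metadata.get(key) != new.metadata.get(key)
    }


v1 = DatasetVersion(
    1, "alice", "Initial upload",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    {"name": "RNA-seq batch 1", "tags": ["oncology"]},
)
v2 = DatasetVersion(
    2, "bob", "Added pilot tag",
    datetime(2024, 1, 8, tzinfo=timezone.utc),
    {"name": "RNA-seq batch 1", "tags": ["oncology", "pilot"]},
)
print(diff_versions(v1, v2))
```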
File Management
DNAWave supports the following file operations:
- Upload Files: Add new files via the Dataset Upload Page.
- Download Files: Retrieve dataset files securely.
- Delete Files: Remove outdated or unnecessary files.
- Share Files: Generate shareable links with expiration times.
- File Validation: Perform automated quality checks on dataset files.
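How DNAWave generates expiring share links is not described in this guide, and S3 and GCS each provide their own signed-URL mechanisms. Purely to illustrate the idea of a link that stops working after a set time, the sketch below signs a file identifier and an expiry timestamp with an HMAC. The names `SECRET_KEY`, `make_share_token`, and `check_share_token` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; a real deployment would load this from secure config.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_share_token(file_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a token of the form '<file_id>:<expiry>:<signature>'."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    payload = f"{file_id}:{expires}"
    return f"{payload}:{_sign(payload)}"


def check_share_token(token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature matches and the expiry has not passed."""
    try:
        file_id, expires, signature = token.rsplit(":", 2)
    except ValueError:
        return False  # Not enough parts to be a valid token.
    expected = _sign(f"{file_id}:{expires}")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return expires.isdigit() and current < int(expires)


token = make_share_token("file-123", ttl_seconds=3600)
print(check_share_token(token))
```

Because the expiry is part of the signed payload, changing it in the link invalidates the signature.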
API Integration
Developers can programmatically manage datasets using the DNAWave REST API. The API allows users to:
- Create and update datasets.
- Manage dataset metadata.
- Integrate with external analysis pipelines.
Refer to the API Documentation for further details.
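As a rough illustration of creating a dataset programmatically, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The base URL, the `/datasets` path, the bearer-token header, and the JSON field names are all assumptions made for this example; consult the API Documentation for the real endpoints and authentication scheme.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the value from the API Documentation.
API_BASE = "https://dnawave.example.com/api"


def build_create_dataset_request(token: str, dataset: dict) -> urllib.request.Request:
    """Prepare a POST request carrying the dataset definition as JSON."""
    return urllib.request.Request(
        url=f"{API_BASE}/datasets",  # assumed endpoint path
        data=json.dumps(dataset).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_dataset_request(
    token="YOUR_API_TOKEN",
    dataset={
        "name": "RNA-seq batch 1",
        "description": "Tumor sample expression data",
        "dataset_type": "Omics",
        "source": "S3",
        "storage_path": "s3://example-bucket/rnaseq/",
    },
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. An external analysis pipeline could wrap this call in a step that registers its output files as a new dataset.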
Summary
The DNAWave Dataset Management System streamlines genomic data handling by providing an intuitive UI, secure storage integration, and AI-powered insights. Start managing your datasets today to accelerate your genomic research!