Data Governance

The Open Data Manager (ODM) platform indexes both data and metadata to support efficient search and exploration.

Metadata

A structured representation of metadata is stored in ODM’s internal MySQL database.
This acts as a partial copy of metadata, ensuring fast performance for:
- Search
- Filtering
- Data exploration

Raw files (e.g., images, BAM files) are not copied into ODM. Instead, ODM stores pointers to the files in their existing storage locations.
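To illustrate what storing a pointer rather than a copy means, here is a minimal Python sketch. It is not ODM code: the `FilePointer` class, the `register_raw_file` function, and the example bucket path are all hypothetical names chosen for illustration. The point is only that the record holds a location and no file bytes are read or copied.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class FilePointer:
    """Hypothetical record: a reference to a raw file, not its contents."""

    uri: str  # where the file already lives (e.g. an S3 or file URL)

    @property
    def scheme(self) -> str:
        # Storage type taken from the URI, e.g. "s3"
        return urlparse(self.uri).scheme

    @property
    def filename(self) -> str:
        # Last path segment of the URI
        return urlparse(self.uri).path.rsplit("/", 1)[-1]


def register_raw_file(uri: str) -> FilePointer:
    """Record a pointer only; the file itself is never opened or copied."""
    parsed = urlparse(uri)
    if not parsed.scheme or not parsed.path:
        raise ValueError(f"not a usable file location: {uri!r}")
    return FilePointer(uri=uri)


# Illustrative location, not a real bucket
pointer = register_raw_file("s3://example-bucket/run42/sample1.bam")
```

Because only the URI is kept, the file stays under the access controls and lifecycle rules of its original storage location.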

Processed data is stored in a columnar database so that it can be indexed and searched within ODM.

When data files are imported via the GUI or attached to a study (e.g., supplementary documents), they must be uploaded to the platform’s S3 bucket.
This creates a copy of the file in ODM so that it is accessible through the user interface.

Type	Example	Details	Copy	Configurability
Attachments	.pdf, .ppt, .h5 (any file type)	ODM indexes basic file metadata (name, date, type, and file contents for archives such as .zip and .h5).	A copy is always stored in ODM, in ODM’s S3 bucket by default.	Can be configured to use the customer’s S3 bucket.
Metadata	.tsv metadata files (e.g. https://s3.amazonaws.com/bio-test-data/odm/Test_1000g/Test_1000g.samples.tsv)	ODM captures and indexes study, sample, library/prep and other metadata.	A copy is always stored in ODM’s databases.	No region separation; use permissions to control access.
Raw Data	.fastq, .bam, images, and so on (anything that is neither an attachment nor indexed data)	ODM stores and indexes the pointer to the file and nothing else.	No copy is made anywhere.
Indexed Data	Tabular data, VCFs, etc.	ODM indexes most of the data by storing a compressed partial copy of the data columns.	A copy is made.	No configurability.
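
The copy behaviour in the table above can be summarised as a small lookup. The following Python sketch is only an illustration of the table, not part of ODM: the `StoragePolicy` class, the `POLICIES` mapping, its keys, and the `is_copied` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical summary of one table row."""

    copied: bool     # does ODM keep its own copy of the content?
    stored_in: str   # where the content (or pointer) lives


# Keys are illustrative labels for the four table rows
POLICIES = {
    "attachment": StoragePolicy(True, "S3 bucket (ODM's by default, customer's if configured)"),
    "metadata": StoragePolicy(True, "ODM databases"),
    "raw_data": StoragePolicy(False, "original location; ODM keeps only a pointer"),
    "indexed_data": StoragePolicy(True, "compressed partial copy of the data columns"),
}


def is_copied(data_type: str) -> bool:
    """Return True if the table says ODM stores a copy for this type."""
    return POLICIES[data_type].copied
```

Raw data is the only type for which `is_copied` returns `False`, which matches the "No copy is made anywhere" entry.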