SchemaBlocks aims to translate the work of the workstreams into data models that:
After discussions with stakeholders from GA4GH work streams and driver projects who create data models (such as Phenopackets, Search API) or who would use SchemaBlocks for the development of their APIs and data exchange formats (Beacon, EGA, GeL), the SchemaBlocks team has come up with the following principles for this initiative:
Work streams will continue to create standards proposals and their own coherent project implementations, but will work with the SchemaBlocks group to write the Blocks that will come from their own work and are considered of overarching use. Generally, primary work stream and driver project outputs will live in their own spaces outside of SchemaBlocks, with shareable, mature elements - code, documentation, implementation snapshots - being represented in {S}[B].
This process will allow the experts from the various work streams to be in charge of how their models are represented while ensuring alignment with the rest of the Blocks.
In addition to providing a place for schema elements, SchemaBlocks collects input from work streams about standard formats and best practices, e.g. the use of genome coordinates in GA4GH projects.
Driver Projects, including ELIXIR Beacon, GeL, EGA, and HCA, have requested several schemas for developing implementations. Common requirements include variant representation and annotation, data use conditions, and “phenopackets”. These requests reflect the practical needs of projects dealing with applications in such areas as data search and exchange, attribute scoping, etc. More use cases will be gathered during the future development process.
On the technical side, SchemaBlocks does not intend to produce a single, complete schema specification for universal use. Instead, schema “blocks” will be represented in JSON Schema, including inline documentation and examples, and accompanied by tooling for integrity checks and transformations.
A light-weight process (at minimum JSON Schema conformity checks using a dedicated linter) will be used to ensure consistent quality across all Blocks. The technical maturity of each schema, as well as its adoption in GA4GH ecosystem products and standards, will be documented.
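To make the idea of a light-weight check more concrete, the following is a minimal sketch of what a block linter might look at. It is not the SchemaBlocks linter: the `lint_block` function, the list of required keywords, and the example "Phenotype" block are all assumptions made for illustration, and a real check would likely also validate the block against the JSON Schema meta-schema.

```python
import json

# Hypothetical block for illustration only; not taken from the SchemaBlocks repository.
EXAMPLE_BLOCK = json.loads("""
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Phenotype",
  "description": "Illustrative phenotype block.",
  "type": "object",
  "properties": {
    "id": {
      "type": "string",
      "description": "Ontology term identifier.",
      "examples": ["HP:0001250"]
    },
    "label": {
      "type": "string",
      "examples": ["Seizure"]
    }
  },
  "required": ["id"],
  "examples": [{"id": "HP:0001250", "label": "Seizure"}]
}
""")

# Assumed house rules: every block carries these top-level keywords.
REQUIRED_KEYWORDS = ("$schema", "title", "description", "type", "properties")


def lint_block(block):
    """Return a list of human-readable problems; an empty list means the block passed."""
    problems = []

    # Top-level keywords that carry the schema dialect and inline documentation.
    for key in REQUIRED_KEYWORDS:
        if key not in block:
            problems.append(f"missing top-level keyword: {key}")

    # Each property should be typed (or reference another block) and documented.
    properties = block.get("properties", {})
    for name, spec in properties.items():
        if "type" not in spec and "$ref" not in spec:
            problems.append(f"property '{name}' has neither type nor $ref")
        if "description" not in spec:
            problems.append(f"property '{name}' has no description")

    # Names listed under "required" must actually be defined.
    for name in block.get("required", []):
        if name not in properties:
            problems.append(f"required property '{name}' is not defined")

    # Inline examples are part of the block's documentation.
    if not block.get("examples"):
        problems.append("block has no examples")

    return problems


if __name__ == "__main__":
    for problem in lint_block(EXAMPLE_BLOCK):
        print(problem)
```

Run against the example block, the sketch reports only the undocumented `label` property. Returning a list of problems rather than raising on the first one is a design choice that lets a contributor fix everything in one pass.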
These Blocks can then be used by other work streams to ensure alignment across products. For example, Beacon or Search API could use the phenotype Blocks to allow searching of phenotypic information; a group wanting to create a storage format for large amounts of phenotypic information could use the same Blocks. Driver Projects may use Blocks to develop data exchange formats, or in other parts of their development processes to ease adoption of GA4GH community standards. Since product teams may rely on different programming languages and schema description formats, we expect that the structure of {S}[B] Blocks may have to be translated between implementations, either manually or - increasingly - using automatic conversions.
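As a rough illustration of an automatic conversion, the sketch below turns a flat JSON Schema block into Python dataclass source text. Everything here is an assumption made for the example: the `block_to_dataclass` helper, the type mapping, and the "Phenotype" block are hypothetical, and nested objects, arrays, and `$ref` are deliberately not handled.

```python
# Assumed mapping from JSON Schema primitive types to Python type names.
TYPE_MAP = {
    "string": "str",
    "integer": "int",
    "number": "float",
    "boolean": "bool",
}

# Hypothetical flat block for illustration only.
PHENOTYPE_BLOCK = {
    "title": "Phenotype",
    "type": "object",
    "properties": {
        "label": {"type": "string"},
        "id": {"type": "string"},
        "severity": {"type": "integer"},
    },
    "required": ["id"],
}


def block_to_dataclass(block):
    """Render a flat JSON Schema block as Python dataclass source text."""
    required = set(block.get("required", []))
    properties = block.get("properties", {})
    lines = ["@dataclass", f"class {block['title']}:"]

    # Required fields go first: dataclass fields without defaults
    # must precede fields that have defaults.
    ordered = sorted(properties.items(), key=lambda item: item[0] not in required)

    for name, spec in ordered:
        # Unknown or missing types fall back to the generic "object".
        py_type = TYPE_MAP.get(spec.get("type"), "object")
        if name in required:
            lines.append(f"    {name}: {py_type}")
        else:
            lines.append(f"    {name}: Optional[{py_type}] = None")

    if not properties:
        lines.append("    pass")
    return "\n".join(lines)


if __name__ == "__main__":
    print(block_to_dataclass(PHENOTYPE_BLOCK))
```

The generated text assumes the consumer imports `dataclass` and `Optional` itself. A production converter would more likely rely on an established code generator than on a hand-written mapping like this one.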
This process was designed to be simple while providing a solution to many existing needs within GA4GH and its community. We welcome your thoughts and feedback. Ideas should preferably be raised as issues in one of the “GitHub Issues” trackers: