
SIMORC Formats 

This page provides up-to-date information and documentation on the metadata format, the data formats, and the supporting vocabularies.

Introduction

The SIMORC service consists of a central index metadatabase and a central database of actual data sets, which together are accessible through the Internet. The index metadatabase is public domain, while access to data sets is regulated by a dedicated SIMORC User Licence Agreement, which contains rules for access to and use of data sets by scientific users and by others.

Metadata:
The SIMORC metadatabase provides an index to individual data sets, which are stored as individual files in a unified format in the SIMORC database. It also provides information about the owners of data sets and links through which registered users can request and download data sets directly online from the SIMORC database.

For purposes of standardization and international exchange, the ISO19115 metadata standard has been adopted. The SIMORC metadata format is defined as a dedicated subset of this standard, which makes it ISO compliant. For exchange of files, SIMORC metadata is translated into a SIMORC XML format, which supports interoperability with other systems and networks. The ISO19115 schema provides the basis and is used as the reference model. It is also prescribed by the new European Directive for a Spatial Data Infrastructure in Europe (INSPIRE).

The SIMORC metadata format is defined as a slightly modified version of the Common Data Index (CDI), which was developed as part of the EU project SeaDataNet (see www.seadatanet.org).

Data:
The SIMORC database contains metocean data sets that have undergone quality control and conversion to unified formats, resulting in consistent, high-quality, harmonized data sets. The data quality control (DQC) and conversion activities are executed by BODC. Sufficient documentation will be compiled and stored alongside each data series to ensure that the data are adequately qualified and can therefore be used with confidence by a secondary user. For purposes of standardization, all data sets are converted to NetCDF and BODC ASCII formats.

Users can search for data sets and request access to them using the central index metadatabase and the associated shopping mechanism. Once a data request has been approved, users can download the requested data sets in the indicated format. The downloaded files follow the naming convention b0nnnnnn.txt for BODC ASCII files and b0nnnnnn.qxf for BODC NetCDF files, where nnnnnn is the SIMORC-Data-ID that is also included in the metadata. To let users relate data files to their metadata, the search interface of the central index metadatabase includes the SIMORC-Data-ID in the metadata results table, in the metadata results export and in the detailed metadata per data set, and also offers it as a search criterion for later look-up.
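The naming convention can be turned into a small helper that recovers the SIMORC-Data-ID and the format from a downloaded file name. This is a minimal sketch, assuming that the `n` placeholders are decimal digits (of any count) and that the `.txt`/`.qxf` extension alone identifies the format; the function name `parse_simorc_filename` is hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern: "b0" prefix, then the numeric SIMORC-Data-ID, then the extension.
_PATTERN = re.compile(r"^b0(?P<data_id>\d+)\.(?P<ext>txt|qxf)$", re.IGNORECASE)

# File extensions mapped to the format names used on this page.
_FORMATS = {"txt": "BODC ASCII", "qxf": "BODC NetCDF"}


def parse_simorc_filename(path):
    """Return (data_id, format_name) for a downloaded SIMORC file.

    Hypothetical helper: raises ValueError if the file name does not match
    the assumed b0<digits>.txt / b0<digits>.qxf pattern. The ID is kept as a
    string so that leading zeros are preserved.
    """
    name = Path(path).name
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a SIMORC data file name: {name!r}")
    ext = match.group("ext").lower()
    return match.group("data_id"), _FORMATS[ext]


if __name__ == "__main__":
    print(parse_simorc_filename("downloads/b0123456.qxf"))
```

The extracted ID can then be used as the SIMORC-Data-ID search criterion in the metadatabase to find the matching metadata record.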

Data flow:
The SIMORC user interface has been developed by partner MARIS around the SIMORC metadata format. It features an alphanumeric user interface for searching the metadatabase and locating interesting data sets, and an ordering facility for registered users to submit requests for data use and to download the actual data sets. The SIMORC service has been loaded with metadata and data sets that have been quality controlled and processed by partner BODC. For this purpose, BODC has developed an export facility on its in-house data management system that exports data sets in NetCDF and BODC ASCII format, while the related metadata records are exported in the agreed SIMORC metadata XML format and, alternatively, in an Excel metadata format.

Publications:
In addition, relevant documents and reports related to the data sets may be available in digital format. These publication files are stored in SIMORC separately from the data set files. Publications are described in a publications metadatabase that follows the Dublin Core format, and relations are recorded between the SIMORC metadata of data sets and the related publication metadata. Note: some publications may already be available at the start of the QC and conversion process, while others may only become available later from scientific users of the data sets, e.g. as a thesis. The SIMORC system therefore supports adding new publications over time.

Basis metadata reference

This guidance document gives a logical description of the SIMORC format, followed by a description of the SIMORC XML schema, which is ISO 19115 compliant and can be considered a "CDI plus" format. It describes in detail all XML tags, syntax and semantics to be used for preparing SIMORC XML records, and explains how to apply this documentation to generate SIMORC XML files.

Common Vocabularies

The SIMORC format is supported by a number of Common Vocabularies that cover a broad spectrum of disciplines of relevance to the oceanographic and wider community. These are available as Web services as part of the EU SeaDataNet project.

Using standardised sets of terms solves the problem of ambiguities associated with data markup and also enables records to be interpreted by computers. This opens up data sets to a whole world of possibilities for computer aided manipulation, distribution and long term reuse. The Common Vocabularies delivered contain the following information for each term:

  • ConceptID - a compact permanent identifier for the term, designed for computer storage rather than human readability
  • Term - the text string representing the term in human-readable form
  • Preferred label - a concise text string representing the term in human-readable form where space is limited
  • Alt label - an alternative label
  • Definition - a full description of what is meant by the term

All of the vocabularies are fully versioned and a permanent record is kept of all changes made.

The SeaDataNet Vocabulary service is based upon the NERC Vocabulary Service (NVS), which was originally developed in 2006 and is operated by BODC. To support the requirements of the user community, several enhancements to the existing NVS were required, and therefore version 2.0 (NVS2.0) was developed. A major part of the NVS2.0 upgrade is a move to the latest version of the World Wide Web Consortium's (W3C) Simple Knowledge Organization System (SKOS) specification for encoding the data dictionaries and taxonomies served through the NVS. In addition, MARIS has developed and operates a vocabulary Client Interface that lets users search and browse the various vocabularies and export selected entries as CSV files for download. The Client Interface always shows the latest version of the vocabularies, because it downloads them from the BODC Web service and loads the latest entries into a local buffer that feeds the search and browse interface.
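To illustrate what a SKOS-encoded vocabulary entry looks like to a client, the following sketch parses a small RDF/XML fragment with Python's standard library and extracts a subset of the per-term fields listed above. The sample concept, its example.org URI and its labels are invented for illustration and are not NVS content; real NVS responses may carry additional elements. Deriving the ConceptID from the last path segment of the concept URI is an assumption, and the function name `read_skos_concepts` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard RDF and SKOS namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Invented sample concept; URI and labels are placeholders, not real NVS entries.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/collection/X01/current/ABC1/">
    <skos:prefLabel xml:lang="en">Example term</skos:prefLabel>
    <skos:altLabel>ExTerm</skos:altLabel>
    <skos:definition xml:lang="en">A placeholder concept used for illustration.</skos:definition>
  </skos:Concept>
</rdf:RDF>"""


def read_skos_concepts(rdf_xml):
    """Return one dict per skos:Concept found in an RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    concepts = []
    for concept in root.iter(f"{{{NS['skos']}}}Concept"):
        uri = concept.get(f"{{{NS['rdf']}}}about", "")
        # Assumption: the ConceptID is the last non-empty path segment of the URI.
        concept_id = uri.rstrip("/").rsplit("/", 1)[-1]
        concepts.append({
            "concept_id": concept_id,
            "preferred_label": concept.findtext("skos:prefLabel", default="", namespaces=NS),
            "alt_label": concept.findtext("skos:altLabel", default="", namespaces=NS),
            "definition": concept.findtext("skos:definition", default="", namespaces=NS),
        })
    return concepts


if __name__ == "__main__":
    for entry in read_skos_concepts(SAMPLE):
        print(entry)
```

Because SKOS is a W3C standard, a client like this does not need to know anything about BODC's internal database; it only needs the RDF and SKOS namespaces.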

Content governance is essential to keep these vocabularies up to date and in sync with ongoing developments. Therefore a combined SeaDataNet and MarineXML Vocabulary Content Governance Group (SeaVoX) has been set up, moderated by BODC, with active membership from experts from SeaDataNet, MMI, MOTIIVE, JCOMMOPS and other international groups. SeaVoX operates through a mailing list server.

This Vocabularies Client Interface can be found at:

http://seadatanet.maris2.nl/v_bodc_vocab_v2/welcome.asp

The following vocabularies are relevant for SIMORC:

  • L02: SeaDataNet Geospatial Feature Types (use Preferred label)
  • L03: SeaDataNet_Measurement_Periodicity_Classes (use ConceptID)
  • L05: SeaDataNet device categories (use ConceptID)
  • L06: SeaVox_Platform_Categories (use ConceptID)
  • L07: SeaDataNet_Data_access_mechanisms (use Alt label)
  • L08: SeaDataNet_Data_Access_Restriction_Policies (use ConceptID) (only LI or RS)
  • L10: SeaDataNet_geographic_co-ordinate_reference_frames (use ConceptID)
  • L11: SeaDataNet_depth_measurement_reference_planes (use ConceptID)
  • P02: SeaDataNet_Parameter_Discovery_Vocabulary (use ConceptID)
  • P05: International_Standards_Organisation_ISO19115_Topic_Categories (use Alt label) (only oceans)
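The rules in this list can be captured as a lookup table, so that a tool filling in SIMORC metadata writes the correct representation of each term. This is a minimal sketch: a term is modelled as a plain dictionary with hypothetical keys (`concept_id`, `preferred_label`, `alt_label`), and the restrictions noted for L08 (only LI or RS) and P05 (only oceans) are assumed to apply to the value that is written. The function name `simorc_value` is hypothetical.

```python
# Which term field to write into SIMORC metadata, per vocabulary (from the list above).
FIELD_FOR_VOCAB = {
    "L02": "preferred_label",
    "L03": "concept_id",
    "L05": "concept_id",
    "L06": "concept_id",
    "L07": "alt_label",
    "L08": "concept_id",
    "L10": "concept_id",
    "L11": "concept_id",
    "P02": "concept_id",
    "P05": "alt_label",
}

# Value restrictions noted in the list; assumed to apply to the written value.
ALLOWED_VALUES = {
    "L08": {"LI", "RS"},
    "P05": {"oceans"},
}


def simorc_value(vocab, term):
    """Return the string to write for `term` (a dict) taken from vocabulary `vocab`."""
    try:
        field = FIELD_FOR_VOCAB[vocab]
    except KeyError:
        raise ValueError(f"vocabulary {vocab} is not used by SIMORC") from None
    value = term[field]
    allowed = ALLOWED_VALUES.get(vocab)
    if allowed is not None and value not in allowed:
        raise ValueError(f"{vocab}: {value!r} not allowed, expected one of {sorted(allowed)}")
    return value


if __name__ == "__main__":
    # Invented sample term, shaped like a P05 topic category entry.
    term = {"concept_id": "X", "preferred_label": "Oceans", "alt_label": "oceans"}
    print(simorc_value("P05", term))
```

Keeping the rules in data rather than in branching code makes it easy to update them if the list of relevant vocabularies changes.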

Organisations are entered using the SeaDataNet European Directory of Marine Organisations (EDMO), which contains the full addresses of relevant organisations. EDMO is maintained by SeaDataNet partners for their national entries, while MARIS and BODC can also enter international entries from outside Europe.

EDMO is provided by MARIS as a Web service and also as a Client User Interface. More information on EDMO and access to the client user interface can be found at:

http://www.seadatanet.org/metadata/edmo

For entering publications, a dedicated metadatabase has been developed within the framework of SIMORC for publications related to the data sets. Wherever possible, a digital file of each publication is also stored in the SIMORC service and made available together with the data set. The new Publications metadatabase uses a format based on Dublin Core and specifies the title, author, publication date, etc. of related publications (analysis reports, theses, ...). It is maintained by the SIMORC operators (MARIS, BODC) using a dedicated Content Management System (CMS) (http://www.sea-search.net/vu_publications/welcome.asp).
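As an illustration of a Dublin Core based publication record that is linked to data sets, the following sketch builds one with Python's standard library. It uses the Dublin Core element namespace (`dc:title`, `dc:creator`, `dc:date`, `dc:relation`); the `publication` wrapper element, the use of `dc:relation` to hold SIMORC-Data-IDs, the function name `publication_record` and all sample values are assumptions, not the actual SIMORC Publications format.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def publication_record(title, creators, date, data_ids):
    """Build a hypothetical Dublin Core publication record as an XML string."""
    root = ET.Element("publication")  # hypothetical wrapper element
    ET.SubElement(root, f"{{{DC}}}title").text = title
    for creator in creators:
        ET.SubElement(root, f"{{{DC}}}creator").text = creator
    ET.SubElement(root, f"{{{DC}}}date").text = date
    # Assumption: each related data set is referenced by its SIMORC-Data-ID.
    for data_id in data_ids:
        ET.SubElement(root, f"{{{DC}}}relation").text = data_id
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Invented sample values for illustration only.
    print(publication_record("Example analysis report", ["A. Author"], "2008-01-01", ["123456"]))
```

Storing the data set link as a repeatable element lets one publication refer to several data sets, matching the many-to-many relation described for SIMORC.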

The Publications metadata format is specified in the following SIMORC document:

XML-ISO19115 xsd file

For purposes of standardization, international exchange and interoperability with other systems and networks, it was decided to adopt the International Metadata Standard for Geographic Information ISO19115. This XML ISO19115 schema (or DTD) is defined and managed by Technical Committee TC 211 of the International Organization for Standardization (ISO), which is responsible for international standards on geographic information (www.isotc211.org).

The standard defines more than 300 metadata elements, most of which are optional; around ten of them are mandatory "core" metadata elements. Moreover, users can create profiles and add new elements. The CDI and the SIMORC XML format are defined as subsets of this standard and are therefore fully ISO19115 compliant.
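Because a profile uses only a subset of the standard's elements, a consumer typically reads just the elements it needs and ignores the rest. The following sketch shows this for two elements of the ISO 19139 XML encoding of ISO 19115 (`gmd:fileIdentifier` and `gmd:dateStamp`). It assumes that SIMORC XML records follow the ISO 19139 `gmd`/`gco` namespaces; the sample record is invented and far smaller than a real SIMORC record, and the function name `read_core_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces for metadata (gmd) and basic types (gco).
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Invented minimal record; a real SIMORC XML record contains many more elements.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example-record-0001</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:dateStamp>
    <gco:Date>2008-06-30</gco:Date>
  </gmd:dateStamp>
</gmd:MD_Metadata>"""


def read_core_fields(xml_text):
    """Extract a few elements from an ISO 19139 record, ignoring everything else."""
    root = ET.fromstring(xml_text)
    return {
        "file_identifier": root.findtext("gmd:fileIdentifier/gco:CharacterString", namespaces=NS),
        # dateStamp may hold either a gco:Date or a gco:DateTime.
        "date_stamp": root.findtext("gmd:dateStamp/gco:Date", namespaces=NS)
        or root.findtext("gmd:dateStamp/gco:DateTime", namespaces=NS),
    }


if __name__ == "__main__":
    print(read_core_fields(SAMPLE))
```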

Example of SIMORC-XML file

DQC methods, NetCDF and BODC ASCII formats for data sets

SIMORC Metadata exchange by Excel file (alternative to XML)

The general principle is that SIMORC metadata records are delivered to MARIS as SIMORC XML files. As an alternative, SIMORC metadata records can also be delivered as a SIMORC-Metadata Excel file, which likewise makes use of the various Common Vocabularies and EDMO.