
Introduction

This feature synchronizes two openBIS instances. One instance (called Data Source) provides the data (meta-data and data sets). The other instance (called Harvester) fetches these data and makes them available. At regular time intervals the Harvester instance synchronizes its data with the data on the Data Source instance. That is, synchronization deletes data from or adds data to the Harvester instance. The Harvester instance can also synchronize only part of the data. It is also possible to gather data from several Data Source instances.

Data Source

The Data Source instance provides a service based on the ResourceSync Framework Specification (see http://www.openarchives.org/rs/1.1/resourcesync). This service is provided as the core plugin module openbis-sync, which has a DSS service based on Service Plugins.

This DSS service accesses the main openBIS database directly. If the name of this database isn't openbis_prod, the property database.kind in the DSS service.properties should be set to the same value as the same property in the AS service.properties. Example:

servers/openBIS-server/jetty/etc/service.properties
...
database.kind = production
...
servers/datastore_server/etc/service.properties
...
database.kind = production
...


The URL of the service is <DSS base URL>/datastore_server/re-sync. The returned XML document looks like the following:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="https://localhost:8444/datastore_server/re-sync/?verb=about.xml" rel="describedby"/>
  <rs:md capability="description"/>
  <url>
    <loc>https://localhost:8444/datastore_server/re-sync/?verb=capabilitylist.xml</loc>
    <rs:md capability="capabilitylist"/>
  </url>
</urlset>

The loc element contains the URL which delivers a list of all capabilities:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="https://localhost:8444/datastore_server/re-sync/?verb=about.xml" rel="up"/>
  <rs:md capability="capabilitylist" from="2013-02-07T22:39:00"/>
  <url>
    <loc>https://localhost:8444/datastore_server/re-sync/?verb=resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
</urlset>
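A client that follows this chain of documents can read the next URL from the loc element that sits next to the wanted capability. The following Python sketch only illustrates that step. The function name capability_urls is hypothetical, and the sample document is the capability list shown above.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the sample documents above.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln href="https://localhost:8444/datastore_server/re-sync/?verb=about.xml" rel="up"/>
  <rs:md capability="capabilitylist" from="2013-02-07T22:39:00"/>
  <url>
    <loc>https://localhost:8444/datastore_server/re-sync/?verb=resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
</urlset>
"""


def capability_urls(xml_text):
    """Map each capability listed in a <url> entry to the URL in its <loc> element."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.find("sm:loc", NS)
        md = url.find("rs:md", NS)
        if loc is not None and md is not None:
            result[md.get("capability")] = loc.text.strip()
    return result


urls = capability_urls(SAMPLE)
print(urls["resourcelist"])
```

The same function can be applied to the first document (the one with capability "description") to obtain the capabilitylist URL.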

Of the capabilities described in the ResourceSync Framework Specification, only resourcelist is supported. The resourcelist returns an XML document with all metadata of the Data Source openBIS instance. This includes master data and meta data, including file meta data.

Two optional URL parameters filter the data by spaces:

  • black_list: comma-separated list of regular expressions. All entities which belong to a space which matches one of the regular expressions of this list will be suppressed.
  • white_list: comma-separated list of regular expressions. If defined only entities which belong to a space which matches one of the regular expressions of this list will be delivered (if not suppressed by the black list).
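The interplay of the two lists can be sketched in Python as follows. This is an illustration of the rules stated above, not the server code. The function name space_is_delivered is hypothetical, and it is an assumption that a regular expression has to match the whole space code (re.fullmatch) rather than only a part of it.

```python
import re


def _patterns(comma_separated):
    """Split a comma-separated list into compiled regular expressions."""
    parts = (comma_separated or "").split(",")
    return [re.compile(part.strip()) for part in parts if part.strip()]


def space_is_delivered(space, white_list=None, black_list=None):
    """Return True if entities of the given space would be delivered.

    Assumption: a pattern must match the complete space code.
    """
    # The black list always wins: a matching space is suppressed.
    if any(p.fullmatch(space) for p in _patterns(black_list)):
        return False
    # If a white list is defined, only matching spaces are delivered.
    white = _patterns(white_list)
    if white:
        return any(p.fullmatch(space) for p in white)
    # Without a white list every space that is not suppressed is delivered.
    return True


for space in ["ABC_1", "SYSTEM", "XYZ"]:
    print(space, space_is_delivered(space, white_list="ABC_.*", black_list="SYSTEM"))
```

Because the lists are comma-separated, this sketch cannot handle regular expressions that themselves contain a comma.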

Remarks:

  • HTTP Basic authentication is used.
  • The resourcelist capability returns only data visible to the authenticated user.
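A request for the resource list could be prepared as in the following Python sketch, which uses only the standard library. The function name build_resourcelist_request, the user name and the password are hypothetical. It is also an assumption that black_list and white_list are passed as query parameters next to the verb parameter seen in the sample URLs.

```python
import base64
import urllib.parse
import urllib.request


def build_resourcelist_request(base_url, user, password,
                               white_list=None, black_list=None):
    """Build a request for the resource list with HTTP Basic authentication.

    base_url is <DSS base URL>/datastore_server/re-sync.
    """
    params = {"verb": "resourcelist.xml"}
    if white_list:
        params["white_list"] = ",".join(white_list)
    if black_list:
        params["black_list"] = ",".join(black_list)
    url = base_url.rstrip("/") + "/?" + urllib.parse.urlencode(params)

    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


request = build_resourcelist_request(
    "https://localhost:8444/datastore_server/re-sync",
    "user", "secret",  # hypothetical credentials
    white_list=["ABC_.*"], black_list=["SYSTEM"],
)
print(request.full_url)
# Sending it would be: urllib.request.urlopen(request).read()
```

Only data visible to the user given here would be returned, so the choice of user is itself a filter.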

Harvester

In order to get the data and meta-data from a Data Source openBIS instance a DSS harvester maintenance task has to be configured on the Harvester openBIS instance. This maintenance task reads another configuration file each time the task is executed.

plugin.properties
class = ch.ethz.sis.openbis.generic.server.dss.plugins.sync.harvester.HarvesterMaintenanceTask
interval = 1 d
harvester-config-file = ../../data/harvester-config.txt

The only specific property of HarvesterMaintenanceTask is harvester-config-file, which is an absolute or relative path to the actual configuration file. The configuration is split into two files because plugin.properties is read only once (at start-up of DSS). This way the Harvester configuration can be changed without restarting DSS.

This DSS service accesses the main openBIS database directly in order to synchronize timestamps and users. If the name of this database isn't openbis_prod, the property database.kind in the DSS service.properties should be set to the same value as the same property in the AS service.properties. Example:

servers/openBIS-server/jetty/etc/service.properties
...
database.kind = production
...
servers/datastore_server/etc/service.properties
...
database.kind = production
...

Harvester Config File

Here is an example of a typical configuration:

harvester-config.txt
[DS1]

resource-list-url = https://<data source host>:<DSS port>/datastore_server/re-sync

data-source-openbis-url = https://<data source host>:<AS port>/openbis/openbis
data-source-dss-url = https://<data source host>:<DSS port>/datastore_server
data-source-auth-realm = OAI-PMH
data-source-auth-user = <data source user id>
data-source-auth-pass = <data source password>
space-black-list = SYSTEM
space-white-list = ABC_.*

harvester-user = <user id>
harvester-pass = <password>

keep-original-timestamps-and-users = false
harvester-tmp-dir = temp
last-sync-timestamp-file = ../../data/last-sync-timestamp-file_HRVSTR.txt
log-file = log/synchronization.log

email-addresses = <e-mail 1>, <e-mail 2>, ...

translate-using-data-source-alias = true

  • The configuration file can have one or more sections, one for each Data Source openBIS instance. Each section starts with an arbitrary name in square brackets.
  • <data source host>, <DSS port> and <AS port> have to be the host name and ports of the Data Source openBIS instance as seen by the Harvester instance.
  • <data source user id> and <data source password> are the credentials to access the Data Source openBIS instance. Only data seen by this user is harvested.
  • space-black-list and space-white-list have the same meaning as black_list and white_list as specified above in the Data Source section.
  • <user id> and <password> are the credentials to access the Harvester openBIS instance. It has to be a user with instance admin rights.
  • Temporary files created during harvesting are stored in harvester-tmp-dir, which is a path relative to the root of the data store. The root of the data store is specified by storeroot-dir in DSS service.properties. The default value of harvester-tmp-dir is temp.
  • By default the original timestamps (registration timestamps and modification timestamps) and users (registrator and modifier) are synchronized. If necessary, users will be created. With the configuration property keep-original-timestamps-and-users = false no timestamps and users will be synchronized.
  • The last-sync-timestamp-file is a relative or absolute path to the file which stores the timestamp of the last synchronization.
  • The log-file is a relative or absolute path to the file where synchronization information is logged. This information does not appear in the standard DSS log file.
  • In case of an error an e-mail is sent to the specified e-mail addresses.
  • translate-using-data-source-alias is a flag which controls whether the codes of spaces, types and materials get a prefix or not. If true, the prefix will be the section name in the square brackets followed by an underscore. The default value of this flag is false.
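The defaults and the prefix rule described in this list can be sketched in Python as follows. This is not the parser used by the Harvester. It is an assumption that the file can be read as an INI-style file with configparser, and the names read_harvester_sections and translated_code as well as the code ABC_SPACE are hypothetical.

```python
import configparser

# A shortened version of the example above (placeholders left out).
SAMPLE = """\
[DS1]
space-black-list = SYSTEM
space-white-list = ABC_.*
keep-original-timestamps-and-users = false
harvester-tmp-dir = temp
translate-using-data-source-alias = true
"""


def read_harvester_sections(text):
    """Read the flags of each section, applying the defaults described above."""
    # interpolation=None keeps '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    sections = {}
    for alias in parser.sections():
        section = parser[alias]
        sections[alias] = {
            # Default: original timestamps and users are synchronized.
            "keep_original": section.getboolean(
                "keep-original-timestamps-and-users", fallback=True),
            # Default: codes are not prefixed.
            "translate": section.getboolean(
                "translate-using-data-source-alias", fallback=False),
            # Default: temp, relative to the root of the data store.
            "tmp_dir": section.get("harvester-tmp-dir", fallback="temp"),
        }
    return sections


def translated_code(alias, code, translate):
    """Prefix a code with the section name and an underscore if requested."""
    return f"{alias}_{code}" if translate else code


for alias, settings in read_harvester_sections(SAMPLE).items():
    print(alias, translated_code(alias, "ABC_SPACE", settings["translate"]))
```

With the sample section DS1 and the flag set to true, a space with the code ABC_SPACE would appear as DS1_ABC_SPACE on the Harvester instance.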







