How can openBIS be configured so that the DSS can still serve every data set, even if the DSS file store becomes corrupted or completely unavailable?
Currently there is no fully out-of-the-box solution for configuring a warm failover. However, you can set up a "warm failover" with only a few configuration steps by combining two openBIS features, "second copy" and "multi-share data stores". The details follow below.
The following steps are necessary to configure a warm failover:
First, add or extend the post-registration task in the service.properties of the DSS:

    maintenance-plugins = post-registration
    post-registration.class = ch.systemsx.cisd.etlserver.postregistration.PostRegistrationMaintenanceTask
    post-registration.interval = 60
    post-registration.cleanup-tasks-folder = ../../cleanup-tasks
    post-registration.ignore-data-sets-before-date = 2010-01-01
    post-registration.last-seen-data-set-file = ../../last-seen-data-set-for-postregistration.txt
    post-registration.post-registration-tasks = second-copy
    post-registration.second-copy.class = ch.systemsx.cisd.etlserver.postregistration.SecondCopyPostRegistrationTask
    post-registration.second-copy.destination = ${storeroot-dir}/2
Make sure you do NOT define a second PostRegistrationMaintenanceTask, either in service.properties or in a core plugin. If a PostRegistrationMaintenanceTask already exists, extend it with the SecondCopyPostRegistrationTask instead.
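You can verify the warning above with a small script. It parses service.properties and confirms that exactly one PostRegistrationMaintenanceTask is configured and that this task runs a SecondCopyPostRegistrationTask with a destination. This is only a minimal sketch under several assumptions:

- It uses a deliberately simplified parser that reads plain `key = value` lines and `#`/`!` comments. It does not handle line continuations, escapes, or `:` separators, which the full Java properties format allows.
- It assumes that `maintenance-plugins` and `post-registration-tasks` are comma-separated lists.
- It does not inspect core plugins.

The function names are hypothetical.

```python
from typing import Dict, List

POST_REG_CLASS = "ch.systemsx.cisd.etlserver.postregistration.PostRegistrationMaintenanceTask"
SECOND_COPY_CLASS = "ch.systemsx.cisd.etlserver.postregistration.SecondCopyPostRegistrationTask"


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified parser: one 'key = value' per line; '#' and '!' start comment lines.
    Java line continuations, escapes and ':' separators are NOT handled."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def split_list(value: str) -> List[str]:
    # Assumption: list-valued properties are comma-separated.
    return [item.strip() for item in value.split(",") if item.strip()]


def check_post_registration(text: str) -> List[str]:
    """Hypothetical helper: return a list of problems (an empty list means the setup looks fine)."""
    props = parse_properties(text)
    plugins = split_list(props.get("maintenance-plugins", ""))
    # Every maintenance plugin whose class is the PostRegistrationMaintenanceTask.
    post_reg = [p for p in plugins if props.get(f"{p}.class") == POST_REG_CLASS]
    if len(post_reg) != 1:
        return [f"expected exactly one PostRegistrationMaintenanceTask, found {len(post_reg)}"]
    plugin = post_reg[0]
    tasks = split_list(props.get(f"{plugin}.post-registration-tasks", ""))
    second_copy = [t for t in tasks if props.get(f"{plugin}.{t}.class") == SECOND_COPY_CLASS]
    problems: List[str] = []
    if not second_copy:
        problems.append("no SecondCopyPostRegistrationTask in post-registration-tasks")
    for task in second_copy:
        if not props.get(f"{plugin}.{task}.destination"):
            problems.append(f"task '{task}' has no destination")
    return problems
```

For the configuration shown above, `check_post_registration` returns an empty list. If a second plugin with the PostRegistrationMaintenanceTask class is added to `maintenance-plugins`, it reports the duplicate.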
Next, add a symbolic link at the store root level. The TARGET argument of the ln command must be the actual path to a local directory for the second copies. LINK_NAME must be the share number N used in the second-copy destination (in this example '2'). For example:

    $ pwd
    /raid/data/store
    $ ln -s /<path-to-second-copy> 2
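If you provision the store with a script instead of by hand, the `ln -s` step can be sketched in Python as shown below. This assumes a local POSIX file system that supports symbolic links. The helper name and its default share number are hypothetical, and the paths passed to it are placeholders.

```python
import os
from pathlib import Path


def link_second_copy_share(store_root: str, second_copy_dir: str, share_id: int = 2) -> Path:
    """Hypothetical helper: create <store_root>/<share_id> as a symlink to second_copy_dir,
    the equivalent of `ln -s <second_copy_dir> <share_id>` run inside the store root."""
    target = Path(second_copy_dir).resolve()
    if not target.is_dir():
        raise NotADirectoryError(f"second-copy directory does not exist: {target}")
    link = Path(store_root) / str(share_id)
    # Refuse to overwrite an existing share directory or link.
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"share {share_id} already exists: {link}")
    os.symlink(target, link, target_is_directory=True)
    return link
```

Unlike a bare `ln -s`, the helper refuses to proceed if the target directory is missing or if a share with that number already exists. This protects you from accidentally linking over an existing share.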
Finally, add a share.properties file to the new share, with the property ignored-for-shuffling set to true. For example:

    $ echo "ignored-for-shuffling = true" > 2/share.properties
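The `echo ... >` command above overwrites any existing share.properties. The following minimal sketch sets the same property but keeps any other lines already in the file. It assumes a simple `key = value` format, one entry per line. The function name is hypothetical.

```python
from pathlib import Path


def mark_share_ignored_for_shuffling(share_dir: str) -> Path:
    """Hypothetical helper: ensure <share_dir>/share.properties contains
    'ignored-for-shuffling = true', preserving all other lines."""
    path = Path(share_dir) / "share.properties"
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any existing ignored-for-shuffling entry; keep every other line unchanged.
    kept = [line for line in lines if line.split("=", 1)[0].strip() != "ignored-for-shuffling"]
    kept.append("ignored-for-shuffling = true")
    path.write_text("\n".join(kept) + "\n")
    return path
```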
Once the second copy is set up successfully, you have two store directories with the same sharding structure, both mounted locally:

    ├── 04
    │   ├── 6d
    │   │   └── 97
    │   │       └── 20120420165224800-4406
    │   ├── 72
    │   │   └── f5
    │   │       └── 20120420165124577-4394
    │   │           └── original
    │   │               └── plate-3_4-G6-10x
If you know which data sets are no longer available on your main share, you only need to change an entry in the openBIS database so that these data sets point to the second share and their files become available again.
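To work out which data sets are affected, you can compare the two shares on disk. The sketch below makes the following assumptions:

- A data set's files live at `<store root>/<share id>/<location>`, where `location` is the relative path stored in the external_data table. This is consistent with the share directories `1` and `2` in the store root.
- The main share is `1` and the copy is `2`.
- The data set code (permID) is the last component of the location.

The list of locations would come from a database query. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def recoverable_locations(store_root: str, locations: Iterable[str],
                          main_share: str = "1", copy_share: str = "2") -> List[str]:
    """Hypothetical helper: return the locations missing on the main share
    but present on the second-copy share (assumes <root>/<share>/<location> layout)."""
    root = Path(store_root)
    result: List[str] = []
    for location in locations:
        missing_on_main = not (root / main_share / location).exists()
        present_on_copy = (root / copy_share / location).exists()
        if missing_on_main and present_on_copy:
            result.append(location)
    return result


def perm_id(location: str) -> str:
    # Assumption: the last path component of a location is the data set code (permID).
    return location.rstrip("/").rsplit("/", 1)[-1]
```

The permIDs returned by `perm_id` are what the `code in (...)` variant of the UPDATE statement expects.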
Let's have a look at the DB table called external_data, which needs to be changed:

    openbis_productive=# select share_id,location from external_data limit 5;
     share_id |                                 location
    ----------+--------------------------------------------------------------------------
            1 | CD847E34-26E8-4396-965F-8EA210A203B4/cb/4f/c4/20120118171259605-60397154
            1 | CD847E34-26E8-4396-965F-8EA210A203B4/f4/1f/c4/20120118171308806-60397160
            1 | CD847E34-26E8-4396-965F-8EA210A203B4/47/cf/35/20120118171307830-60397156
            1 | CD847E34-26E8-4396-965F-8EA210A203B4/61/59/52/20120118171308860-60397161
            1 | CD847E34-26E8-4396-965F-8EA210A203B4/eb/a8/84/20120118171307344-60397155
    (5 rows)
Change the share_id entry in table external_data:

    openbis_productive=# update external_data set share_id=2 where location like '%20120618134644946-60401863%';
    UPDATE 1

    -- or, for a list of permIDs, using the DB indices:
    openbis_productive=# update external_data set share_id=2 where data_id in (select id from data where code in ('20120618134559701-60401861', '20120618135241171-60401883'));
In this case we assume that the second share has the share_id 2.
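When many data sets are affected, it is less error-prone to build the permID-based UPDATE with bound parameters than to paste codes into the SQL text. The sketch below relies only on the table and column names used in the statements above (`external_data.share_id`, `external_data.data_id`, `data.id`, `data.code`).

The demo runs against an in-memory SQLite database with a hypothetical, stripped-down schema. Against the real openBIS PostgreSQL database, you would pass the statement to your driver instead (for example, psycopg2, whose placeholder is `%s`). Ideally, run it inside a transaction and check that the affected row count matches the number of permIDs before committing.

```python
import sqlite3
from typing import List, Sequence, Tuple


def build_share_update(perm_ids: Sequence[str], share_id: int,
                       placeholder: str = "%s") -> Tuple[str, list]:
    """Build the permID-based UPDATE with bound parameters.
    '%s' is the psycopg2 placeholder style; sqlite3 uses '?'."""
    if not perm_ids:
        raise ValueError("no permIDs given")
    marks = ", ".join([placeholder] * len(perm_ids))
    sql = (f"update external_data set share_id = {placeholder} "
           f"where data_id in (select id from data where code in ({marks}))")
    return sql, [share_id, *perm_ids]


def demo() -> List[tuple]:
    # Hypothetical, stripped-down stand-ins for the openBIS 'data' and 'external_data'
    # tables: only the columns used by the UPDATE are modelled.
    con = sqlite3.connect(":memory:")
    con.execute("create table data (id integer primary key, code text)")
    con.execute("create table external_data (data_id integer, share_id integer, location text)")
    codes = ["20120618134559701-60401861", "20120618135241171-60401883",
             "20120618134644946-60401863"]
    for i, code in enumerate(codes, start=1):
        con.execute("insert into data values (?, ?)", (i, code))
        con.execute("insert into external_data values (?, 1, ?)", (i, "UUID/xx/yy/zz/" + code))
    # Move only the first two data sets to share 2.
    sql, params = build_share_update(codes[:2], 2, placeholder="?")
    con.execute(sql, params)
    return con.execute("select data_id, share_id from external_data order by data_id").fetchall()
```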