Warm failover when a DSS share fails

How can openBIS be configured so that the DSS is still able to serve every data set, even if a file in the DSS store gets corrupted or the main share becomes completely unavailable?

Currently we do not have a fully out-of-the-box solution for configuring a warm failover. However, by combining the openBIS features "second copy" and "multi-share data stores", a warm failover can be set up with only a few configuration steps, see details below.

Here are the steps necessary to configure a warm failover:

 

  • Add or extend post-registration task in service.properties of DSS:

    maintenance-plugins = post-registration
     
    post-registration.class = ch.systemsx.cisd.etlserver.postregistration.PostRegistrationMaintenanceTask
    post-registration.interval = 60
    post-registration.cleanup-tasks-folder = ../../cleanup-tasks
    post-registration.ignore-data-sets-before-date = 2010-01-01
    post-registration.last-seen-data-set-file = ../../last-seen-data-set-for-postregistration.txt
    
    post-registration.post-registration-tasks = second-copy
    post-registration.second-copy.class = ch.systemsx.cisd.etlserver.postregistration.SecondCopyPostRegistrationTask
    post-registration.second-copy.destination = ${storeroot-dir}/2

Note: Please make sure you are NOT defining a second PostRegistrationMaintenanceTask in service.properties or in a core plugin. If a PostRegistrationMaintenanceTask already exists, just extend it with the SecondCopyPostRegistrationTask.
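If a PostRegistrationMaintenanceTask is already configured, extending it might look like the following sketch. The task name `existing-task` is a hypothetical placeholder for whatever task is already in your list; only the `second-copy` entries are new:

```properties
# Append second-copy to the already-configured task list
# ('existing-task' is a placeholder for your current task name).
post-registration.post-registration-tasks = existing-task, second-copy
post-registration.second-copy.class = ch.systemsx.cisd.etlserver.postregistration.SecondCopyPostRegistrationTask
post-registration.second-copy.destination = ${storeroot-dir}/2
```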

 

  • Add a symbolic link at the store root level. The TARGET argument of the ln command must be the actual path of a local directory holding the second copies, and the LINK_NAME must be the share id N (in this example '2'). For example:

    $ pwd
    /raid/data/store
    $ ln -s /<path-to-second-copy> 2

 

  • Add a share.properties file to the second share with the property ignored-for-shuffling set to true, so that this share is ignored when data sets are shuffled between shares. For example:

    $ echo "ignored-for-shuffling = true" > 2/share.properties
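The symlink and share.properties steps above can be sketched together. For illustration, temporary directories stand in for the real paths; substitute your actual store root (e.g. /raid/data/store) and second-copy location:

```shell
# Sketch of the two steps above; mktemp directories are stand-ins
# for the real store root and second-copy directory.
STORE_ROOT=$(mktemp -d)    # stand-in for /raid/data/store
SECOND_COPY=$(mktemp -d)   # stand-in for the second-copy directory

# Link the second-copy directory into the store root as share '2',
# then mark it so it is ignored for shuffling.
ln -s "$SECOND_COPY" "$STORE_ROOT/2"
echo "ignored-for-shuffling = true" > "$STORE_ROOT/2/share.properties"
```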
  • When you have successfully set up the second copy, you have two store directories with the same sharding structure, both mounted locally:

    ├── 04
    │   ├── 6d
    │   │   └── 97
    │   │       └── 20120420165224800-4406
    │   ├── 72
    │   │   └── f5
    │   │       └── 20120420165124577-4394
    │   │           └── original
    │   │               └── plate-3_4-G6-10x
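Because both shares use the same sharding structure, standard tools can compare them, for example to list data sets that exist only in the second-copy share. A minimal sketch, with temporary directories standing in for the two shares (replace them with your real share paths, e.g. /raid/data/store/1 and /raid/data/store/2):

```shell
# Stand-ins for the main share (1) and the second-copy share (2).
share1=$(mktemp -d)
share2=$(mktemp -d)
mkdir -p "$share1/04/6d/97/20120420165224800-4406"
mkdir -p "$share2/04/6d/97/20120420165224800-4406"
mkdir -p "$share2/04/72/f5/20120420165124577-4394"

# Data set directories sit at sharding depth 4: xx/yy/zz/<data-set-code>.
lst1=$(mktemp)
lst2=$(mktemp)
(cd "$share1" && find . -mindepth 4 -maxdepth 4 -type d | sort) > "$lst1"
(cd "$share2" && find . -mindepth 4 -maxdepth 4 -type d | sort) > "$lst2"

# comm -13 prints lines present only in the second listing,
# i.e. data sets that exist only in share 2.
only_in_2=$(comm -13 "$lst1" "$lst2")
echo "$only_in_2"
```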
  • Given that you know which data sets are no longer available on your main share, you simply need to change a DB entry in the openBIS database so that the files become available again.
    Let's have a look at the DB table external_data, which needs to be changed:

    openbis_productive=# select share_id,location from external_data limit 5;
     share_id |                                 location                                 
    ----------+--------------------------------------------------------------------------
     1        | CD847E34-26E8-4396-965F-8EA210A203B4/cb/4f/c4/20120118171259605-60397154
     1        | CD847E34-26E8-4396-965F-8EA210A203B4/f4/1f/c4/20120118171308806-60397160
     1        | CD847E34-26E8-4396-965F-8EA210A203B4/47/cf/35/20120118171307830-60397156
     1        | CD847E34-26E8-4396-965F-8EA210A203B4/61/59/52/20120118171308860-60397161
     1        | CD847E34-26E8-4396-965F-8EA210A203B4/eb/a8/84/20120118171307344-60397155
    (5 rows)


    Change the share_id entry in the table external_data:

    openbis_productive=# update external_data set share_id=2 where location like '%20120618134644946-60401863%';
    UPDATE 1
     
    -- or for a list of permIDs using the DB indices:
    openbis_productive=# update external_data set share_id=2 where data_id in (select id from data where code in ('20120618134559701-60401861', '20120618135241171-60401883'));

    Note: In this case we assume that the second share has the share_id 2.
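If the entire main share is lost and every data set is mirrored in the second copy, the failover can be sketched as a bulk update. This is an assumption-laden sketch, not part of the official procedure: it is only safe if all data sets of share 1 really exist in share 2, so check the distribution first:

```sql
-- Overview: how many data sets live on each share.
select share_id, count(*) from external_data group by share_id;

-- Bulk failover sketch: point every data set of the failed share 1
-- to the second-copy share 2. Only run this if all of share 1 is
-- mirrored in share 2.
update external_data set share_id=2 where share_id=1;
```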

 
