
Link Data Sets

Introduction

A Link Data Set is a special kind of data set whose data is not stored in the local openBIS instance. Instead, copies of the content are stored in any number of external data management systems. The following kinds of external data management systems are supported:

  • openBIS - a remote openBIS instance
  • file system - a file system on a remote server
  • URL - a system that gives access to the data sets through a URL

The local openBIS instance stores links to the data set in the external data management systems, optionally along with additional properties. Apart from not storing the data itself, openBIS treats this kind of data set in the same way as regular data sets.

 

External Data Management System

At least one External Data Management System must be created in openBIS before Link Data Sets can be registered. An External Data Management System in openBIS represents a system that is completely independent of openBIS and stores data sets that openBIS refers to.

External Data Management System's properties

Property       Description
code           The unique identifier of the External Data Management System in the local openBIS instance.
label          The descriptive label of the External Data Management System. Optional.
address        The address of the external data management system. The format depends on the external data management system type.
               If the external data management system is openBIS or URL, the value is a template of the URL referring to the data set in the external data management system. The ${code} pattern in the template will be replaced by the data set code. Example: https://sprint-openbis.ethz.ch/openbis/index.html?viewMode=SIMPLE#entity=DATA_SET&permId=${code}
               If the external data management system is a file system, the value has the format hostname:/path/to/directory. Example: sprint-openbis.ethz.ch:/home/openbis/datasets
address_type   The type of the address: URL, OPENBIS or FILE_SYSTEM.
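The two address formats above can be illustrated with a small sketch. It is not openBIS code: the class EdmsAddressSketch and its methods are hypothetical names introduced here, and the data set code used in main is made up. It only shows how a ${code} template could be filled in and how a hostname:/path/to/directory value could be split into its two parts.

```java
/** Illustrative helper; a hypothetical class, not part of the openBIS API. */
final class EdmsAddressSketch {

    /** Placeholder that the address template uses for the data set code. */
    static final String CODE_PLACEHOLDER = "${code}";

    /** Fills the data set code into an OPENBIS or URL address template. */
    static String resolveUrl(String addressTemplate, String dataSetCode) {
        // String.replace substitutes literally, so "$" and "{" need no escaping.
        return addressTemplate.replace(CODE_PLACEHOLDER, dataSetCode);
    }

    /** Splits a FILE_SYSTEM address of the form hostname:/path/to/directory. */
    static String[] splitFileSystemAddress(String address) {
        int separator = address.indexOf(":/");
        if (separator <= 0) {
            throw new IllegalArgumentException(
                    "Expected hostname:/path/to/directory but got: " + address);
        }
        // Keep the leading slash with the directory part.
        return new String[] {
            address.substring(0, separator),
            address.substring(separator + 1)
        };
    }

    public static void main(String[] args) {
        String template = "https://sprint-openbis.ethz.ch/openbis/index.html"
                + "?viewMode=SIMPLE#entity=DATA_SET&permId=${code}";
        // "20170912112249208-38888" is a made-up data set code.
        System.out.println(resolveUrl(template, "20170912112249208-38888"));

        String[] parts =
                splitFileSystemAddress("sprint-openbis.ethz.ch:/home/openbis/datasets");
        System.out.println("host=" + parts[0] + ", directory=" + parts[1]);
    }
}
```

Whether openBIS itself validates the address format on registration is not stated here, so a check like the one in splitFileSystemAddress may be worth running on the client side before creating the External Data Management System.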

Registering External Data Management System

An External Data Management System can be created through a master data registration script or through the V3 API.

Here is an example of registering an External Data Management System through a master data registration script:

Example master data registration script that registers an External Data Management System
tr = service.transaction()

external_data_management_system = tr.getOrCreateNewExternalDataManagementSystem('DMS')
external_data_management_system.setLabel('Example of External Data Management System')
external_data_management_system.setAddress('sprint-openbis.ethz.ch:/home/openbis/datasets')
external_data_management_system.setAddressType(ExternalDataManagementSystemAddressType.FILE_SYSTEM)

Here is an example of registering an External Data Management System through the V3 API:

Example V3 API code that registers an External Data Management System
// Assumes 'openbis' is a V3 application server API object and 'token' is a valid session token.
List<ExternalDmsCreation> edmsCreations = new ArrayList<>();

// Describe a file system based External Data Management System.
ExternalDmsCreation edmsCreation = new ExternalDmsCreation();
edmsCreation.setCode("filesystem-1");
edmsCreation.setLabel("External filesystem 1");
edmsCreation.setAddressType(ExternalDmsAddressType.FILE_SYSTEM);
edmsCreation.setAddress("sprint-openbis.ethz.ch:/home/openbis");
edmsCreations.add(edmsCreation);

// Register all collected creations in one call.
openbis.createExternalDataManagementSystems(token, edmsCreations);

Registering a Link Data Set

Link Data Sets can be registered via the DSS V3 API. Before registering a Link Data Set, the user must have registered in openBIS all the external data management systems that hold a copy of the data set's content.

Here is an example of creating a Link Data Set:

Example link data set creation
ExternalDmsPermId openBIS = ...
ExternalDmsPermId url = ...
ExternalDmsPermId fs = ...

LinkedDataCreation linkedDataCreation = new LinkedDataCreation();
List<ContentCopyCreation> copies = new ArrayList<>();

ContentCopyCreation cc = new ContentCopyCreation();
cc.setExternalDmsId(openBIS);
cc.setExternalId("data-set-code");
copies.add(cc);

cc = new ContentCopyCreation();
cc.setExternalDmsId(url);
cc.setExternalId("identifier");
copies.add(cc);

cc = new ContentCopyCreation();
cc.setExternalDmsId(fs);
cc.setPath("a/directory/somewhere");
copies.add(cc);

cc = new ContentCopyCreation();
cc.setExternalDmsId(fs);
cc.setPath("/a/git-repo/somewhere");
cc.setGitCommitHash("abcdef1234567890");
copies.add(cc);

linkedDataCreation.setContentCopies(copies);

DataSetCreation metadataCreation = new DataSetCreation();
metadataCreation.setLinkedData(linkedDataCreation);
metadataCreation.setTypeId(dataSetType);
metadataCreation.setDataSetKind(DataSetKind.LINK);
metadataCreation.setExperimentId(new ExperimentIdentifier("DEFAULT", "DEFAULT", "DEFAULT"));
metadataCreation.setDataStoreId(new DataStorePermId("DSS-SCREENING"));

DataSetFileCreation file1 = new DataSetFileCreation();
file1.setChecksumCRC32(1234);
file1.setDirectory(false);
file1.setFileLength(4321);
file1.setPath("path/to/the/file.txt");

DataSetFileCreation file2 = new DataSetFileCreation();
file2.setDirectory(true);
file2.setPath("path/to/empty/directory");

DataSetFileCreation file3 = new DataSetFileCreation();
file3.setChecksumCRC32(12345);
file3.setDirectory(false);
file3.setFileLength(4321);
file3.setPath("path2/to/another/file.txt");

FullDataSetCreation newDataSet = new FullDataSetCreation();
newDataSet.setMetadataCreation(metadataCreation);
newDataSet.setFileMetadata(Arrays.asList(file1, file2, file3));

dss.createDataSets(token, Arrays.asList(newDataSet));
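The example above passes hard-coded values to setChecksumCRC32 and setFileLength. Since the files live outside openBIS, the client has to supply this file metadata itself. The following is a minimal sketch of how the length and CRC32 checksum of a local file could be computed with the standard java.util.zip.CRC32 class. The class FileMetadataSketch and its nested FileMetadata holder are hypothetical names, not openBIS types, and converting the resulting values to the parameter types expected by DataSetFileCreation is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

/** Illustrative helper; a hypothetical class, not part of the openBIS API. */
final class FileMetadataSketch {

    /** Length and CRC32 of one file; a hypothetical holder, not an openBIS type. */
    static final class FileMetadata {
        final long length;
        final long crc32;

        FileMetadata(long length, long crc32) {
            this.length = length;
            this.crc32 = crc32;
        }
    }

    /** Streams the file once, accumulating its length and CRC32 checksum. */
    static FileMetadata read(Path file) throws IOException {
        CRC32 crc = new CRC32();
        long length = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
                length += n;
            }
        }
        return new FileMetadata(length, crc.getValue());
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so that the sketch runs on its own.
        Path tmp = Files.createTempFile("link-data-set", ".txt");
        try {
            Files.write(tmp, "123456789".getBytes(StandardCharsets.US_ASCII));
            FileMetadata metadata = read(tmp);
            // Prints: 9 bytes, CRC32 = cbf43926
            System.out.println(metadata.length + " bytes, CRC32 = "
                    + Long.toHexString(metadata.crc32));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Reading the file in a single pass keeps memory use constant regardless of file size, which matters when the external copies are large.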

Using Link Data Sets in openBIS

openBIS handles Link Data Sets like regular data sets: they have the same metadata, they can have parent/child relationships, and they can be attached to a sample or an experiment. The difference appears when a user opens the Data Set Detail View. There the user can browse the data set contents but cannot download the actual files, because they are not stored in openBIS. The view also shows the location of all physical copies on the remote servers.

 
