The Excel import service reads xls definitions for both types and entities and send them to openBIS. It is the replacement of the old master data scripts adding support for the creation of openBIS entities.
The goals are:
For common users an import format with the following features to avoid the shortcomings of the old format:
- Recognisable labels as column names.
- Mutli type imports.
- Parents/Children creation and linking on a single import.
For advanced users like consultants and plugin developers a tool that allows to specify on an Excel sheet:
- Metadata model.
- Basic entity structures used for navigation.
To support different use cases the import service supports the next modes, specifying one of them is mandatory.
- UPDATE IF EXISTS: This one should be the default mode to use to make incremental updates.
- IGNORE EXISTING: This mode should be used when the intention is to force only inserts. Existing entities will be ignored. That way is possible to avoid unintentionally updating entities and at the same time adding new ones.
- FAIL IF EXISTS: This mode should be used when the intention is to force only inserts. If any of the entities or types exist the process will fail. That way is possible to avoid unintentionally updating entities.
Organising Files and Worksheets
All data can be arranged according to the needs of the user, in any number of files and any number of worksheets. All files have to be in one directory. Since there may be dependencies between types and entities described in distributed files, they should all be submitted together. The names of the files and worksheets are ignored by the service, the user is advised to use descriptive names that they can quickly remember/refer to later.
We want to define vocabularies and sample types with properties using these vocabularies. We can arrange our files in several ways:
- put vocabulary and sample types in separate files named i.e vocabulary.xls and sample_types.xlsx respectively
- put vocabulary and sample types in different worksheets in the same xls file
- put everything in one worksheet in the same file
Ordering Definitions within Files and Worksheets
Since all definitions can be distributed, the order of definitions is not important for the Excel import service.
Text cell formatting (colours, fonts, font style, text decorations)
All types of formatting are permitted, and users are encouraged to use them to make their excel files more readable. Adding any non text element (table, clipart) will cause the import to fail.
(A valid, but not easily readable, example)
Definition, rows and sheet formatting
A valid sheet has to start with definition on the first row.
Each definition has to be separated by one empty row (here it is shown in green)
Two or more consecutive empty rows mark the end of the definitions. Everything below these can be considered a comment.
Header rows can contain values that do not have corresponding elements in the properties row (can be left empty).
Empty columns are allowed between each attribute. However, it is possible that this feature might be disabled in future.
If attribute does not exist, it will be ignored.
Values that do not have corresponding headers will be ignored.
The case of letters in header rows is ignored.
Entity Types Definitions
Definition is a single description of an openBIS type or entity. A definition usually consists of:
- Definition type row - contains the type of the entity being defined. Possible options are: VOCABULARY_TYPE, PROPERTY_TYPE, EXPERIMENT_TYPE, SAMPLE_TYPE, DATASET_TYPE.
- Attributes header row - contains attributes of the defined entities. Attributes that are not used can be omitted, or can be added with an empty value.
- Attribute value row - contains values for the attributes listed in the row above.
- Properties header row - intended for creating entity types, and entities which can have properties. Properties header contains the attributes of a property.
- Properties value rows - each row is a new property assigned to entity types.
For in depth explanation of Definitions, please see Definitions, defining entities and other special cases paragraph.
It is possible to create property types, which are not attached to any entity types. In such a case, Definition type row has to state PROPERTY_TYPE; Attribute header and value rows are omitted.
Entity Types Versions
Version is a mandatory field for entity types, it just starts at 1; and during updating a type definition is expected to increment it; otherwise, the system will ignore the changes.
In Deep Explanation
Additionally, use the keyword FORCE to reinstall the type even if is present and has been deleted.
The system keeps track of what versions of entities have been installed storing this information, so in the future when one updates their types, the version specified in the spreadsheet is checked against the stored version.
For every TYPE found in the Excel sheet the next algorithm is performed:
IF (TYPE.Version > STORED_VERSION) OR (TYPE.Version == FORCE):
IF TYPE EXISTS in openBIS:
ELSE IF FAIL_IF_EXISTS:
ELSE IF IGNORE_EXISTING:
PASS (We basically ignore as requested)
ELSE: // Doesn't exist branch
NEW_TYPE = TYPE.VERSION == 1
IF TYPE.Version == FORCE OR NEW_TYPE:
PASS (We basically ignore types that have been deleted by users)
PASS (We basically ignore types that have not been updated)
Most entities can be created, excluding DataSets. There are differences due to the nature of the defined elements themselves.
Space and Project
Spaces and projects cannot have properties, therefore they have to be omitted.
Experiments and Samples
In the properties header and values rows, the user has to specify attributes and properties. The order in which they are listed is arbitrary.
Properties can be referred to as labels in the header row.
Vocabularies in properties value rows can be referred to by either the vocabulary code or the label.
Defining Parent and Children in Samples
Parent and child columns can be used to define relations between samples. Samples can be addressed by:
- $ column identifier (In case the code is not unique or autogenerated) for newly created samples
- Code from Excel (Works only for Samples codes in Excel, in case of ambiguity the $ column should be used)
- Sample identifier
Advanced Features - Defining Validation and Dynamic Scripts
Scripts have to reside in .py files in the scripts directory within the folder that contains the Excel files.
Within scripts, files can be organised in any suitable setup.
In order to refer to a validation or dynamic script (e.g. valid.py above), the relative path (from the scripts directory) to the file has to be provided in the relevant column. See the example columns in green below.
Referring to Existing Elements on the Server. Handling Duplicates
<Not complete, subject to change>
Duplicates may be one of the following:
- Trying to define a type/entity that already exists on the server
- In this case the definition will be ignored, and all references will refer to the existing object on the server
- Trying to define type/definition twice
- Unspecified behaviour - only one of the definitions will be created, all references will refer to the defined element
- Property type assignments to entity types cannot be updated since the current V3 API does not support this functionality. This means that a change in the order of assignments or group names during an update will be ignored.
For further reference please see an example spreadsheet: