Child pages
  • Jython DataSetValidator
Skip to end of metadata
Go to start of metadata


Jython dataset validators are an option for implementing validation of data sets using the python scripting language when using a jython dropbox. See Dropboxes for the basic configuration. The validators can also be run on clients, either the command-line dss client or the web start Data Set Batch Uploader, though there are some additional restrictions on which scripts can be run within the batch uploader.


To configure a validator, add the configuration parameter "validation-script-path" to the thread definition. For example:
# --------------------------------------------------------------------------------------------------
# Jython thread
# --------------------------------------------------------------------------------------------------
# The directory to watch for incoming data.
incoming-dir = /local0/openbis/data/incoming-jython
top-level-data-set-handler = ch.systemsx.cisd.etlserver.registrator.JythonTopLevelDataSetHandler
incoming-data-completeness-condition = auto-detection
strip-file-extension = true
storage-processor = ch.systemsx.cisd.etlserver.DefaultStorageProcessor
script-path =
validation-script-path =

The script file (in this case "") needs to implement one method, validate_data_set_file(file), which takes a file object as an argument and returns a collection of validation error objects as a result. If the collection is empty, then it is assumed that there were no validation errors.

There are convenience methods to create various kinds of validation errors. These methods are:

  • createFileValidationError(message: String),
  • createDataSetTypeValidationError(message : String),
  • createOwnerValidationError(message: String) and
  • createPropertyValidationError(property : String, message : String).
    In the context of the validation scripts as they are currently implemented, the first one is probably the most relevant.

These methods are defined on the class ch.systemsx.cisd.openbis.dss.generic.shared.api.v1.validation.ValidationError. The documentation for this class should be available here:

Example scripts

One can use both python standard libraries and Java libraries.

Simple script using python libraries:

import os
import re

def validate_data_set_file(file):
	found_match = False
	if re.match('foo-.*bar', file.getName()):
		found_match = True

	errors = []
	if found_match:
		errors.append(createFileValidationError(file.getName() + " is not a valid data set."))

	return errors

Simple script using only java libraries:

def validate_data_set_file(file):
	found_match = False
# Note we use the python startswith method here.
	if file.getName().startswith('foo'):
		found_match = True

	errors = []
	if found_match:
		errors.append(createFileValidationError(file.getName() + " is not a valid data set."))

	return errors

Extracting Displaying Metadata

The module that validates a data set may, in addition to performing validation, implement a function that extracts metadata. This makes it possible to give the user immediate feedback about how the system interprets the data, giving her an opportunity to correct any inconsistencies she detects.

To do this, implement a function call extract_metadata in the module that implements valadate_data_set_file. The function extract_metadata should return a dictionary where the keys are the property codes and values are property values.


def extract_metadata(file):
    return { 'FILE-NAME' : file.getName() }


Validation Scripts

Scripts can be tested using the command-line client's "testvalid" command. This command takes the same arguments as put, plus an optional script parameter. If the script is not specified, the data set is validated against the server's validation script.


# Use the server script
./ testvalid -u username -p password -s openbis-url experiment E-TEST-2 /path/to/data/set

# Use a local script
./ testvalid -u username -p password -s openbis-url experiment E-TEST-2 /path/to/data/set /path/to/script

Extract Metadata Scripts

The extract metadata script can be tested with the testextract command in the command-line client. The arguments are the same as for testvalid.

  • No labels