Jython dataset validators make it possible to validate data sets with scripts written in Python when using a Jython dropbox. See Dropboxes for the basic configuration. The validators can also be run on clients, either the command-line dss client or the Web Start Data Set Batch Uploader, though there are additional restrictions on which scripts can be run within the batch uploader.
To configure a validator, add the configuration parameter "validation-script-path" to the thread definition. For example:
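A minimal sketch of such an entry, assuming the thread definition uses Java properties syntax with the thread name as key prefix. The thread name "my-dropbox" is hypothetical; only the parameter name and the script file name come from this page.

```
# Hypothetical thread name "my-dropbox"; adjust to the name of your dropbox thread.
my-dropbox.validation-script-path = data-set-validator.py
```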
The script file (in this case "data-set-validator.py") must implement one function, validate_data_set_file(file), which takes a file object as its argument and returns a collection of validation error objects. If the collection is empty, the data set is assumed to have no validation errors.
There are convenience methods to create various kinds of validation errors. These methods are:
createDataSetTypeValidationError(message : String),
createPropertyValidationError(property : String, message : String).
In the context of the validation scripts as they are currently implemented, the first one is probably the most relevant.
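A minimal sketch of how a script might use these factory methods. The ValidationError class defined here is only a stand-in so that the snippet runs outside a dropbox; in a real script the Java ValidationError class would be used instead. The ".csv" check, the "kind" labels, the SAMPLE_ID property code, and the use of str(file) to obtain the path are assumptions made for illustration.

```python
class ValidationError(object):
    """Stand-in for the Java ValidationError class (illustration only)."""

    def __init__(self, kind, message, property_code=None):
        self.kind = kind
        self.message = message
        self.property_code = property_code

    @staticmethod
    def createDataSetTypeValidationError(message):
        return ValidationError("DATA_SET_TYPE", message)

    @staticmethod
    def createPropertyValidationError(property, message):
        return ValidationError("PROPERTY", message, property)


def validate_data_set_file(file):
    """Return a list of validation errors; an empty list means no errors."""
    errors = []
    path = str(file)  # assumption: str() of the file object yields its path
    if not path.lower().endswith(".csv"):
        errors.append(ValidationError.createDataSetTypeValidationError(
            "Expected a .csv file but got: " + path))
    return errors


# A property error also names the offending property (SAMPLE_ID is hypothetical).
property_error = ValidationError.createPropertyValidationError(
    "SAMPLE_ID", "Sample id must not be empty")
```

Returning a plain list keeps the "empty collection means valid" contract easy to see: every check either appends an error or leaves the list untouched.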
These methods are defined on the class ch.systemsx.cisd.openbis.dss.generic.shared.api.v1.validation.ValidationError. The documentation for this class should be available here:
Scripts can use both Python standard libraries and Java libraries.
Simple script using Python libraries:
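The following is a sketch rather than the original example. It uses only the Python modules os and re to check the name of the incoming file against a naming convention. The convention itself ("name-YYYYMMDD.csv"), the stand-in ValidationError class, and the assumption that str(file) yields the file's path are all hypothetical.

```python
import os
import re


class ValidationError(object):
    """Stand-in so the sketch runs outside the dropbox (illustration only)."""

    def __init__(self, message):
        self.message = message

    @staticmethod
    def createDataSetTypeValidationError(message):
        return ValidationError(message)


# Hypothetical naming convention: <name>-<8 digits>.csv
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_]+-\d{8}\.csv$")


def validate_data_set_file(file):
    errors = []
    # Assumption: str(file) yields the path of the incoming file.
    name = os.path.basename(str(file))
    if not NAME_PATTERN.match(name):
        errors.append(ValidationError.createDataSetTypeValidationError(
            "Unexpected file name: " + name))
    return errors
```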
Simple script using only Java libraries:
Extracting and Displaying Metadata
The module that validates a data set may, in addition to performing validation, implement a function that extracts metadata. This makes it possible to give the user immediate feedback about how the system interprets the data, giving her an opportunity to correct any inconsistencies she detects.
To do this, implement a function called extract_metadata in the module that implements validate_data_set_file. The function extract_metadata should return a dictionary in which the keys are property codes and the values are property values.
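A minimal sketch of such a function. It assumes that extract_metadata receives the same file argument as validate_data_set_file, which this page does not state. The property codes SAMPLE_NAME and ACQUISITION_DATE and the file naming convention are hypothetical.

```python
import os
import re

# Hypothetical naming convention: <sample>-<8 digit date>.csv
NAME_PATTERN = re.compile(r"^(?P<sample>[A-Za-z0-9_]+)-(?P<date>\d{8})\.csv$")


def extract_metadata(file):
    """Return a dictionary mapping property codes to property values."""
    # Assumption: str(file) yields the path of the incoming file.
    name = os.path.basename(str(file))
    metadata = {}
    match = NAME_PATTERN.match(name)
    if match:
        metadata["SAMPLE_NAME"] = match.group("sample")
        metadata["ACQUISITION_DATE"] = match.group("date")
    return metadata
```

If the name does not follow the convention, the function returns an empty dictionary, so no metadata is reported rather than a guess.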
Scripts can be tested using the command-line client's "testvalid" command. This command takes the same arguments as put, plus an optional script parameter. If the script is not specified, the data set is validated against the server's validation script.
Extract Metadata Scripts
The extract metadata script can be tested with the testextract command in the command-line client. The arguments are the same as for