Importing Feature Vectors
Typically, an image analysis will create a set of feature vectors, one for each well of the plate.
Image analysis results aggregated on the well level should be stored in datasets of a certain type. Specifically, the dataset type code should always begin with "
The user can use the predefined type
HCS_ANALYSIS_WELL_FEATURES, or create more specific types like
or HCS_ANALYSIS_WELL_CLASSIFICATION to distinguish different types of image analysis results.
To enable importing of the analysis data from a file or a set of files in any format, use the flexible dropbox tool, which is configurable using Python.
Configuring the Datastore Server
Please note that in order to use the dropbox, the storage processor
ch.systemsx.cisd.openbis.dss.etl.featurevector.FeatureVectorStorageProcessor or a sub-class of it needs to be configured. Otherwise, the feature data will not be written to the database.
A screening drop box for importing image analysis results should be created as a core plugin of type
incoming-root-dir is defined in
Define the Jython dropbox script in
If the folder
incoming-analysis doesn't exist it will be created on DSS start up.
Jython Dropbox Configuration
To demonstrate how the API defines feature vectors, here is a very simple example which does not read the data from any files, but constructs them in memory instead. You will have to redefine
extractSpaceCode() methods as follows:
To see how it works:
- Create a file named
- Create a plate named
- Copy the file to the
- Go to the web browser and display the
MY-PLATE detail view. (You may need to refresh the page to see it.) You should be able to display a heatmap for the registered features.
- The featuresBuilder object implements the IFeaturesBuilder interface
- featuresBuilder.defineFeature("<FEATURE_NAME>") returns an object implementing the IFeatureDefinition interface
Here is an example of the
defineFeatures() implementation. It parses .csv files like this one.
Importing features for timepoint or depth-scan series
Here is a simple example to demonstrate how to import feature vectors for each timepoint series.
The values will be imported to the database, but, since the current user interface does not yet support viewing multiple timepoints, only the values of the first timepoint will be shown.
In HCS scenarios, image datasets can be analysed several times using different algorithms. This is useful in many cases, such as when bioinformaticians make serial improvements to the analysis procedure, or they wish to try different analysis approaches.
Example: Imagine that all the plates of an assay have been analysed using 3 different algorithms: A, B and C. In openBIS, these algorithms are called 'analysis procedures'.
openBIS can aggregate all of the analysis results for a particular gene or compound. For example, it can find all the wells in which a particular gene has been screened and, for each feature, calculate the median value over all replicates.
It is clear that if we have 3 sets of analysis results for a plate, and each is produced with a different algorithm, they should not be mixed with each other when calculating aggregates.
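The point about not mixing results can be illustrated with a small stand-alone sketch. It does not use the openBIS API; the record layout, gene name, feature code and values are made up for illustration. The only idea it shows is that the analysis procedure is part of the grouping key when medians are computed.

```python
from collections import defaultdict
from statistics import median

# Hypothetical in-memory records: (analysis_procedure, gene, feature_code, value).
# Names and numbers are invented for illustration only.
records = [
    ("A", "GENE1", "CELL_COUNT", 10.0),
    ("A", "GENE1", "CELL_COUNT", 14.0),
    ("A", "GENE1", "CELL_COUNT", 12.0),
    ("B", "GENE1", "CELL_COUNT", 100.0),
    ("B", "GENE1", "CELL_COUNT", 120.0),
]


def aggregate_medians(records):
    """Return {(procedure, gene, feature): median over the replicate wells}."""
    groups = defaultdict(list)
    for procedure, gene, feature, value in records:
        # The procedure is part of the key, so results produced by
        # different algorithms never end up in the same aggregate.
        groups[(procedure, gene, feature)].append(value)
    return {key: median(values) for key, values in groups.items()}


medians = aggregate_medians(records)
```

Dropping the procedure from the key would merge the values of procedures A and B into a single median, which is exactly the mixing the text warns against.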
Additionally, when the analysis result dataset for the plate is requested (e.g. through the API), then specifying the analysis procedure name helps return a unique result.
For situations such as these, openBIS makes it possible to specify which analysis procedure was used to produce each analysis dataset. Internally, this information is stored as a dataset property
To set the analysis procedure in the dropbox, use the default mechanism for setting properties, but call it using the following recommended method:
The analysis procedure should be set at least for:
- Well-level image analysis datasets (
- Image segmentation datasets (
openBIS offers views in the web browser to utilize this information.
- Two datasets should have the same analysis procedure only if the algorithm to compute them was the same and the results are comparable
- It does not make sense to have two datasets of the same plate with the same analysis procedure
It is very common for a large number of different features to be registered, which makes the amount of data difficult to handle. To make this easier, it is possible to register features lists, each of which groups several features together. By selecting one of the lists, the user filters out all features that are not included in that list.
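As a rough illustration of the filtering effect (not of the openBIS implementation), the following sketch narrows a set of registered feature codes to those named in a selected list. The function name and the feature codes are hypothetical.

```python
def filter_features(all_feature_codes, features_list=None):
    """Return the feature codes to offer, keeping their original order.

    features_list=None means that no features list is selected,
    so every registered feature stays available.
    """
    if features_list is None:
        return list(all_feature_codes)
    wanted = set(features_list)
    return [code for code in all_feature_codes if code in wanted]


# Hypothetical feature codes, for illustration only.
registered = ["CELL_COUNT", "MEAN_INTENSITY", "AREA", "INFECTION_INDEX"]
selected = filter_features(registered, ["AREA", "CELL_COUNT"])
```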
Registering a new features list
Features lists are registered per feature vector data set, which means that every feature vector data set can have a different set of connected features lists. A new features list can be registered via a dropbox. Below is a simple example of dropbox code.
The dropbox above registers a new features list called
Example Features List. The list consists of 2 features:
Every dropbox script that registers a features list needs to perform the following steps:
- Create a new instance of the ch.systemsx.cisd.openbis.dss.etl.dto.api.v2.FeatureListDataConfig config object.
- Define the name by calling the setName method on the config object.
- Specify the list of feature codes to include in the list by calling the setFeatureList method on the config object.
- Specify the feature vector container data set by calling the setContainerDataSet method on the config object.
- As a final step, when all the configuration is set, method createNewFeatureListDataSet should be called on
Dealing with Features Lists
Features lists are visible on the Plate View. To the right of the Choose heatmap kind dropdown list is the Choose features list dropdown list, which contains all the features lists registered for the given feature vector data set. When All is chosen, all heatmap kinds are available. When the user selects one of the features lists, the list of heatmap kinds is narrowed to the features on that list. To remove a features list, the user can go to the Contained tab of the feature vector data set view and delete the features list data sets that are no longer needed.
Features can be either floating-point numbers or strings. A feature is one or the other; it cannot be a mix. openBIS determines the value type automatically: if all values provided can be parsed as a float, the feature is a float feature; otherwise it is a string feature. Note that setting a value like NaN turns a feature into a string feature! See below for how to handle missing values correctly.
Missing values are provided by simply not setting a value. In essence, all feature values start out as a missing value and you need to set it for a given location to change that. Do not set a value of
N/A to set a missing value as this will lead the feature vector to be misinterpreted as a string type feature.
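The rule can be sketched in plain Python as follows. This is not the openBIS parser: the strict number pattern, the function name and the well identifiers are assumptions chosen so that the sketch behaves the way the text describes (NaN and N/A are treated as non-numeric, and a missing value is simply a well that has no entry).

```python
import re

# Assumption: a value counts as a float only if it is a plain decimal number.
# This strict pattern deliberately rejects "NaN" and "N/A", mirroring the
# behaviour described in the text; the real parsing rules may differ.
_NUMBER = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")


def infer_feature_type(values_by_well):
    """Return "float" if every value that was set is numeric, else "string"."""
    for value in values_by_well.values():
        if not _NUMBER.match(str(value).strip()):
            return "string"
    return "float"


# Well A2 has no measurement: it is left out instead of being set to "N/A".
with_missing = {"A1": "0.5", "A3": "1.25"}
# Here the placeholder "N/A" was set explicitly, which changes the type.
with_placeholder = {"A1": "0.5", "A2": "N/A", "A3": "1.25"}

type_with_missing = infer_feature_type(with_missing)
type_with_placeholder = infer_feature_type(with_placeholder)
```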
Importing from a CSV file
File Format of Feature Vector Data
The file is a CSV or TSV file. The actual separator can be specified in the DSS configuration. The first line contains the column headers. Each following line contains the feature values for one well. The well is denoted by one or two columns, whose names have to be specified. If both names are identical, a single column denotes the well as a combination of letters and digits. Otherwise, the row column contains letters or a number, and the column column contains a number. A feature is a column with numerical values. An unknown feature value for a particular well can be denoted by NaN. Columns with non-numerical values (such as bar codes) are ignored.
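The two ways of denoting a well can be sketched as follows. The header names ("Well", "row", "col"), the function names and the spreadsheet-style convention for multi-letter rows (AA = 27) are assumptions made for this illustration; they are not taken from the openBIS configuration.

```python
import re


def letters_to_number(letters):
    """Convert row letters to a 1-based number (A=1, Z=26, AA=27; assumed convention)."""
    if not re.fullmatch(r"[A-Za-z]+", letters):
        raise ValueError("not a letter row: %r" % letters)
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def parse_well(record, row_header, column_header):
    """Return (row, column) as 1-based integers for one parsed CSV record."""
    if row_header == column_header:
        # Single column: letters followed by digits, e.g. "B12".
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", record[row_header].strip())
        if match is None:
            raise ValueError("not a well position: %r" % record[row_header])
        row_text, column_text = match.groups()
    else:
        # Two columns: the row is letters or a number, the column is a number.
        row_text = record[row_header].strip()
        column_text = record[column_header].strip()
    row = int(row_text) if row_text.isdigit() else letters_to_number(row_text)
    return row, int(column_text)


single = parse_well({"Well": "B12"}, "Well", "Well")
double = parse_well({"row": "C", "col": "4"}, "row", "col")
```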
A column header defines the label as well as the code of a feature vector. The label is used for output (column headers in tables, axis labels in plots). The code is used for data retrieval, e.g. when defining a custom column or filter in an openBIS table. By default, the label is just the header and the code is the normalized label. Normalization is done as follows:
- The label is converted to upper case.
- All special characters (i.e. characters other than the letters A-Z or digits) are replaced by an underscore character '_'.
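The two normalization steps above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that every special character is replaced individually, so a run of special characters yields a run of underscores; the text does not say whether openBIS collapses such runs.

```python
import re


def normalize_code(label):
    """Upper-case the label, then replace every character outside A-Z and 0-9 by '_'."""
    return re.sub(r"[^A-Z0-9]", "_", label.upper())


code_a = normalize_code("Mean intensity")
code_b = normalize_code("area-px2")
```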
If the code should differ from the normalized label, the following syntax has to be used for the column header in the feature vector file: <code> label. Note that the actual code is the normalized form of code. If label is missing, the actual label will be the same as the actual code.
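A minimal sketch of how such a header could be split into code and label is shown below. The function names and the regular expression are assumptions for illustration; only the rules themselves (default label and code, the <code> label syntax, and the fallback when the label is missing) come from the text.

```python
import re


def normalize_code(text):
    """Upper-case the text, then replace every character outside A-Z and 0-9 by '_'."""
    return re.sub(r"[^A-Z0-9]", "_", text.upper())


def parse_header(header):
    """Return (code, label) for one column header."""
    match = re.match(r"^\s*<([^>]*)>\s*(.*)$", header)
    if match is None:
        # Plain header: the label is the header, the code is the normalized label.
        label = header.strip()
        return normalize_code(label), label
    # "<code> label" syntax: the given code is still normalized.
    code = normalize_code(match.group(1).strip())
    # If the label is missing, the label falls back to the actual code.
    label = match.group(2).strip() or code
    return code, label


plain = parse_header("Mean intensity")
explicit = parse_header("<mean int> Mean intensity [a.u.]")
code_only = parse_header("<area>")
```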
Here are some valid but incomplete examples:
Another example can be downloaded here.
FeatureVectorStorageProcessor has to be configured for uploading feature vector files in the format described above. The following properties control variations of this format:
Separator character between headers and row cells.
Header of the column denoting the row of a well.
Header of the column denoting the column of a well.
Importing other types of analysis results
openBIS can also store the results of other types of image analysis. Such datasets can be attached to each plate or each well.
The user can download such a dataset later and browse it on their own computer. More sophisticated functionality may be added later.
The following dataset types should be used:
- HCS_ANALYSIS_CELL_SEGMENTATION for HCS image analysis cell segmentation results
- HCS_ANALYSIS_CELL_FEATURES for HCS image analysis cell feature vectors results
- HCS_ANALYSIS_CELL_CLASS for HCS image analysis cell classification results