Segmentation datasets

SegmentationDatasets and SegmentationObjects are implemented in segmentation.py (syconn.reps). It is accompanied by helper functions in segmentation_helper.py and rep_helper.py (syconn.reps) for basic functionality such as loading and storing and sd_proc.py (syconn.proc) intensive processing that is usually parallelized.

Typically, the voxel storage of a SegmentationDatasets is created first (eg. by the object extraction). Please check the corresponding documentation to learn more about that.

On a fundamental level, each SegmentationObject owns voxels, attributes, a skelton and a mesh which are stored in different dictionaries (VoxelDict, AttributeDict, SkeletonDict, MeshDict; see section ‘Backend’). Each dictionary consists of the associated data from many objects and compresses it individually for efficient storage. The number of dictionaries per data type can be defined with n_folders_fs (only powers of 10). Please note for the general way of creating SegmentationDatasets this has to be passed to the object extraction as well.

Initialization

To load (or create) a SegmentationDataset at least the obj_type has to be defined. Defaults exist for other parameters such as version and working_dir. These are stored in config.ini (eg. version) in the working_dir and project wide in config.global_params (eg. working_dir).

sd_cell_sv = SegmentationDataset("sv", working_dir="path/to/wd")

It is useful to run sd_proc.dataset_analysis(...) when loading a SegmentationDataset the first time (after writing its voxel storage) or after making changes to the attributes. dataset_analysis creates global numpy arrays for fast access for each attribute and calculates some attributes itself (such as size and bounding box). This can be viewed as a distributed column store of the underlying database.

sd_proc.dataset_analysis(sd_cell_sv)

When running dataset_analysis one can include only a subset of the attributes to avoid problems with non-consistent entries (see below). As most functions, dataset_analysis can either run on a single shared memory system or on a distributed custer using qsub.

It also is recommended to preprocess the meshes of the SegmentationObjects. See mesh_proc_chunked in syconn/proc/sd_proc.py.

Usage

If sd_proc.dataset_analysis(...) was applied, the SegmentationDataset can access the values of an attribute of all objects as an array. For instance, the attribute size can be accesses via

sizes = sd_cell_sv.load_numpy_data("size")

Some attributes, such as size and id, are also available as object attributes (e.g. sd_cell_sv.sizes). Values in different attribute arrays are always sorted in the same way. Hence, one can use the id array (sd_cell_sv.ids) as a reference.

A SegmentationDataset allows easy access to its SegmentationObjects by

cell_sv_obj = sd_cell_sv.get_segmentation_object(obj_id)

There are four additional data structures for each SegmentationObject: voxels (VoxelStorage), attributes (AttributeDict), meshes (MeshStorage) and skeletons (SkeletonStorage). Typically, every SegmentationObject owns the first three while only supervoxels (sv) have a skeleton. While voxels, meshes and skeletons are predefined datatypes, attributes are an arbitrary key value store. It is advised though to be consistent in type and naming of attributes across the SegmentationDataset to avoid problems with the aforementioned numpy arrays.

The different data structures can be accessed by e.g.

voxels = cell_sv_obj.voxels
mesh = cell_sv_obj.mesh
skeleton = cell_sv_obj.skeleton
attr_value = cell_sv_obj.lookup_in_attribute_dict("attr_key")

The attribute dict can also be accessed as a whole

cell_sv_obj.load_attr_dict()
attr_dict = cell_sv_obj.attr_dict

SegmentationObjects cache data that was accessed. This can be disabled by

cell_sv_obj.mesh_caching = False
cell_sv_obj.voxel_caching = False
cell_sv_obj.skeleton_caching = False

and the cache can be cleared by

cell_sv_obj.clear_cache()