Segmentation datasets
SegmentationDatasets and SegmentationObjects are implemented in segmentation.py (syconn.reps).
They are accompanied by helper functions in segmentation_helper.py and rep_helper.py (syconn.reps) for
basic functionality such as loading and storing, and by sd_proc.py (syconn.proc) for intensive processing that
is usually parallelized.
Typically, the voxel storage of a SegmentationDataset is created first (e.g. by the object extraction).
Please check the corresponding documentation to learn more about that.
On a fundamental level, each SegmentationObject owns voxels, attributes, a skeleton and a mesh, which
are stored in different dictionaries (VoxelDict, AttributeDict, SkeletonDict, MeshDict; see section ‘Backend’).
Each dictionary consists of the associated data from many objects and compresses it individually for
efficient storage. The number of dictionaries per data type can be defined with n_folders_fs (only powers of 10).
Note that in the usual workflow for creating SegmentationDatasets, the same value has to be passed to the object extraction as well.
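The idea of spreading many objects over a fixed number of individually compressed dictionaries can be illustrated with a minimal sketch. It is not SyConn's implementation: the shard_of mapping (trailing decimal digits of the object ID), the ShardedStore class and the use of pickle plus zlib are assumptions made only for illustration.

```python
import pickle
import zlib


def shard_of(obj_id: int, n_folders: int) -> int:
    """Map an object ID to one of n_folders shards (hypothetical scheme)."""
    # With n_folders restricted to powers of 10, the shard is simply given
    # by the trailing decimal digits of the ID.
    return obj_id % n_folders


class ShardedStore:
    """Toy stand-in for a set of per-shard dictionaries (not SyConn code)."""

    def __init__(self, n_folders: int = 100):
        digits = str(n_folders)
        if n_folders < 1 or digits[0] != "1" or set(digits[1:]) - {"0"}:
            raise ValueError("n_folders must be a power of 10")
        self.n_folders = n_folders
        # One dictionary per shard; each holds the data of many objects.
        self.shards = {i: {} for i in range(n_folders)}

    def __setitem__(self, obj_id, data):
        # Every entry is serialized and compressed on its own.
        blob = zlib.compress(pickle.dumps(data))
        self.shards[shard_of(obj_id, self.n_folders)][obj_id] = blob

    def __getitem__(self, obj_id):
        blob = self.shards[shard_of(obj_id, self.n_folders)][obj_id]
        return pickle.loads(zlib.decompress(blob))


store = ShardedStore(n_folders=10)
store[1234] = {"size": 42}
store[5674] = {"size": 7}   # ends in 4 as well, so it shares a shard with 1234
```

Because the mapping depends on the number of shards, a dataset written with one value cannot be read back with another, which is why the value has to agree across all steps that touch the storage.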
Initialization
To load (or create) a SegmentationDataset, at least the obj_type has to be defined. Defaults exist for
other parameters such as version and working_dir. These are stored in config.ini (e.g. version) in
the working_dir and project-wide in config.global_params (e.g. working_dir).
sd_cell_sv = SegmentationDataset("sv", working_dir="path/to/wd")
It is useful to run sd_proc.dataset_analysis(...) when loading a SegmentationDataset for the first time
(after writing its voxel storage) or after making changes to the attributes. dataset_analysis creates a global numpy
array for each attribute for fast access and calculates some attributes itself (such as size and bounding box). This can
be viewed as a distributed column store of the underlying database.
sd_proc.dataset_analysis(sd_cell_sv)
When running dataset_analysis one can include only a subset of the attributes to avoid problems with inconsistent
entries (see below). Like most functions, dataset_analysis can run either on a single shared-memory system or on
a distributed cluster using qsub.
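The column-store view can be sketched in a few lines. The following is an illustration only and does not reproduce dataset_analysis: the build_columns function and its keys parameter are hypothetical, and plain lists stand in for the numpy arrays.

```python
def build_columns(attr_dicts, keys=None):
    """Turn per-object attribute dicts into aligned per-attribute columns.

    attr_dicts: mapping obj_id -> attribute dict (hypothetical input).
    keys: optional subset of attribute names to include.
    """
    ids = sorted(attr_dicts)  # one shared ordering for every column
    if keys is None:
        keys = sorted({k for d in attr_dicts.values() for k in d})
    columns = {"id": ids}
    for key in keys:
        # Indexing raises KeyError if an object lacks the attribute, which is
        # the kind of inconsistency that a restricted key list avoids.
        columns[key] = [attr_dicts[i][key] for i in ids]
    return columns


objects = {
    3: {"size": 120, "label": "a"},
    1: {"size": 15},                 # no "label": inconsistent entry
    2: {"size": 987, "label": "b"},
}

# Restricting the analysis to "size" sidesteps the missing "label" entry.
cols = build_columns(objects, keys=["size"])
```

Here cols holds an "id" column and a "size" column in the same order, while calling build_columns(objects) without a key subset fails on the object that has no "label".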
It is also recommended to preprocess the meshes of the SegmentationObjects.
See mesh_proc_chunked in syconn/proc/sd_proc.py.
Usage
If sd_proc.dataset_analysis(...) was applied, the SegmentationDataset can access the values of an attribute of all objects
as an array. For instance, the attribute size can be accessed via
sizes = sd_cell_sv.load_numpy_data("size")
Some attributes, such as size and id, are also available as object attributes (e.g. sd_cell_sv.sizes). Values in
different attribute arrays are always sorted in the same way. Hence, one can use the id array (sd_cell_sv.ids) as a reference.
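Because all attribute arrays share one ordering, the id array can serve as an index into any of them. A minimal sketch, using small hypothetical lists in place of sd_cell_sv.ids and sd_cell_sv.sizes:

```python
# Hypothetical stand-ins for sd_cell_sv.ids and sd_cell_sv.sizes.
ids = [11, 42, 7, 105]
sizes = [300, 12, 8450, 77]

# Position i in every attribute array refers to the object ids[i].
id_to_index = {obj_id: i for i, obj_id in enumerate(ids)}
size_of_42 = sizes[id_to_index[42]]

# Select the IDs of all objects at or above a size threshold.
large_ids = [obj_id for obj_id, size in zip(ids, sizes) if size >= 100]
```

With numpy arrays the same selection is usually written as a boolean mask, e.g. ids[sizes >= 100].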
A SegmentationDataset allows easy access to its SegmentationObjects by
cell_sv_obj = sd_cell_sv.get_segmentation_object(obj_id)
There are four additional data structures for each SegmentationObject: voxels (VoxelStorage), attributes
(AttributeDict), meshes (MeshStorage) and skeletons (SkeletonStorage).
Typically, every SegmentationObject owns the first three while only supervoxels (sv) have a skeleton. While
voxels, meshes and skeletons are predefined datatypes, attributes are an arbitrary key-value store. It is advisable, though, to be consistent in the type and
naming of attributes across the SegmentationDataset to avoid problems with the aforementioned numpy arrays.
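One way to spot such inconsistencies before building attribute arrays is a simple check over the attribute dictionaries. The helper below is a hypothetical sketch, not part of SyConn; it reports keys that are missing from some objects or that occur with more than one value type.

```python
def find_inconsistent_keys(attr_dicts):
    """Return attribute names that are missing somewhere or vary in type."""
    all_keys = {k for d in attr_dicts for k in d}
    bad = set()
    for key in all_keys:
        types = {type(d[key]) for d in attr_dicts if key in d}
        present_everywhere = all(key in d for d in attr_dicts)
        if len(types) > 1 or not present_everywhere:
            bad.add(key)
    return sorted(bad)


attr_dicts = [
    {"size": 10, "class": "axon"},
    {"size": 12.5, "class": "dendrite"},  # float instead of int
    {"size": 7},                          # "class" is missing
]
inconsistent = find_inconsistent_keys(attr_dicts)
print(inconsistent)  # ['class', 'size']
```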
The different data structures can be accessed by e.g.
voxels = cell_sv_obj.voxels
mesh = cell_sv_obj.mesh
skeleton = cell_sv_obj.skeleton
attr_value = cell_sv_obj.lookup_in_attribute_dict("attr_key")
The attribute dict can also be accessed as a whole
cell_sv_obj.load_attr_dict()
attr_dict = cell_sv_obj.attr_dict
SegmentationObjects cache data that was accessed. This can be disabled by
cell_sv_obj.mesh_caching = False
cell_sv_obj.voxel_caching = False
cell_sv_obj.skeleton_caching = False
and the cache can be cleared by
cell_sv_obj.clear_cache()
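The effect of the caching flags can be illustrated with a toy class. This is a sketch under assumptions, not the SegmentationObject implementation: CachedObject, its loader argument and the n_loads counter are hypothetical and exist only to make the number of storage reads visible.

```python
class CachedObject:
    """Toy object with an optional mesh cache (not the SyConn class)."""

    def __init__(self, loader):
        self._loader = loader     # hypothetical callable that reads storage
        self._mesh = None
        self.mesh_caching = True
        self.n_loads = 0          # counts storage reads, for demonstration

    @property
    def mesh(self):
        if self._mesh is not None:
            return self._mesh     # served from the cache
        self.n_loads += 1
        mesh = self._loader()
        if self.mesh_caching:
            self._mesh = mesh     # keep the data for later accesses
        return mesh

    def clear_cache(self):
        self._mesh = None


obj = CachedObject(loader=lambda: [0.0, 1.0, 2.0])
_ = obj.mesh
_ = obj.mesh
loads_with_cache = obj.n_loads    # the second access hit the cache

obj.clear_cache()
obj.mesh_caching = False
_ = obj.mesh
_ = obj.mesh
loads_total = obj.n_loads         # both accesses read from storage again
```

Disabling caching trades repeated reads for a lower memory footprint, which can matter when iterating over many objects.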