Segmentation datasets
SegmentationDatasets and SegmentationObjects are implemented in segmentation.py (syconn.reps). They are accompanied by helper functions in segmentation_helper.py and rep_helper.py (syconn.reps) for basic functionality such as loading and storing, and in sd_proc.py (syconn.proc) for intensive processing that is usually parallelized.
Typically, the voxel storage of a SegmentationDataset is created first (e.g. by the object extraction). Please check the corresponding documentation to learn more about that.
On a fundamental level, each SegmentationObject owns voxels, attributes, a skeleton and a mesh, which are stored in different dictionaries (VoxelDict, AttributeDict, SkeletonDict, MeshDict; see section ‘Backend’). Each dictionary holds the associated data of many objects and compresses it individually for efficient storage. The number of dictionaries per data type can be defined with n_folders_fs (only powers of 10). Please note that, in the usual workflow for creating SegmentationDatasets, this value has to be passed to the object extraction as well.
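To make the role of n_folders_fs more concrete, here is a minimal sketch of how object IDs could be spread over a fixed number of storage folders. The helper `folder_for_id` and the mapping via trailing decimal digits are assumptions made for illustration only; they are not taken from SyConn, whose actual folder layout may differ. The sketch only shows one reason why a power of 10 is convenient: the folder index is simply the last digits of the ID.

```python
def folder_for_id(obj_id: int, n_folders: int) -> str:
    """Hypothetical mapping of an object ID to a storage folder name.

    Uses the trailing decimal digits of the ID, which only works cleanly
    if n_folders is a power of 10.
    """
    n_digits = len(str(n_folders)) - 1
    if n_folders < 1 or n_folders != 10 ** n_digits:
        raise ValueError("n_folders must be a power of 10")
    # Zero-pad so that every folder name has the same length.
    return str(obj_id % n_folders).zfill(n_digits)


# Objects whose IDs share the same trailing digits end up in the same folder.
print(folder_for_id(123456, 1000))
print(folder_for_id(7, 100))
```

With such a scheme, choosing a larger n_folders means fewer objects per dictionary, at the cost of more files on disk.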
Initialization
To load (or create) a SegmentationDataset, at least the obj_type has to be defined. Defaults exist for other parameters such as version and working_dir. These are stored in config.ini in the working_dir (e.g. version) and project-wide in config.global_params (e.g. working_dir).
from syconn.reps.segmentation import SegmentationDataset
sd_cell_sv = SegmentationDataset("sv", working_dir="path/to/wd")
It is useful to run sd_proc.dataset_analysis(...) when loading a SegmentationDataset for the first time (after writing its voxel storage) or after making changes to the attributes. dataset_analysis creates global numpy arrays for fast access to each attribute and calculates some attributes itself (such as size and bounding box). This can be viewed as a distributed column store of the underlying database.
from syconn.proc import sd_proc
sd_proc.dataset_analysis(sd_cell_sv)
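The column-store idea can be sketched without SyConn: per-object attribute dictionaries are turned into one array per attribute, all ordered by the same ID list. The function `build_columns` and the example values are hypothetical, and plain Python lists stand in for the numpy arrays; this is not how dataset_analysis is implemented.

```python
from typing import Any, Dict, List

# Hypothetical per-object attribute dicts, keyed by object ID.
attr_dicts: Dict[int, Dict[str, Any]] = {
    12: {"size": 300, "bounding_box": ((0, 0, 0), (5, 6, 10))},
    3: {"size": 120, "bounding_box": ((2, 2, 2), (6, 6, 9))},
    7: {"size": 45, "bounding_box": ((1, 0, 4), (3, 3, 8))},
}


def build_columns(attr_dicts: Dict[int, Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Collect one column per attribute; all columns share the ID order."""
    ids = sorted(attr_dicts)
    keys = sorted({key for attrs in attr_dicts.values() for key in attrs})
    columns: Dict[str, List[Any]] = {"id": ids}
    for key in keys:
        # Raises KeyError if an object lacks the attribute.
        columns[key] = [attr_dicts[obj_id][key] for obj_id in ids]
    return columns


columns = build_columns(attr_dicts)
print(columns["id"], columns["size"])
```

Reading one attribute for all objects then touches a single column instead of every object's dictionary.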
When running dataset_analysis, one can include only a subset of the attributes to avoid problems with inconsistent entries (see below). Like most functions, dataset_analysis can run either on a single shared-memory system or on a distributed cluster using qsub.
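Why restricting the attributes helps can be illustrated with a small sketch. The function `collect_columns`, its parameter `attr_keys` and the attribute `label` are hypothetical names chosen for this example; they do not reflect the actual signature of dataset_analysis.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical attribute dicts with one inconsistent attribute ("label").
attr_dicts: Dict[int, Dict[str, Any]] = {
    1: {"size": 10, "label": "a"},
    2: {"size": 25},               # "label" is missing here
    3: {"size": 7, "label": 3},    # "label" has a different type here
}


def collect_columns(attr_dicts: Dict[int, Dict[str, Any]],
                    attr_keys: Iterable[str]) -> Dict[str, List[Any]]:
    """Build columns only for the requested attribute keys."""
    ids = sorted(attr_dicts)
    return {key: [attr_dicts[obj_id][key] for obj_id in ids]
            for key in attr_keys}


# Only the consistent attribute is requested, so "label" is never touched.
columns = collect_columns(attr_dicts, ["size"])
print(columns)
```

Requesting `label` as well would fail in this sketch because object 2 has no such entry.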
It is also recommended to preprocess the meshes of the SegmentationObjects; see mesh_proc_chunked in syconn/proc/sd_proc.py.
Usage
If sd_proc.dataset_analysis(...) was applied, the SegmentationDataset can access the values of an attribute of all objects as an array. For instance, the attribute size can be accessed via
sizes = sd_cell_sv.load_numpy_data("size")
Some attributes, such as size and id, are also available as object attributes (e.g. sd_cell_sv.sizes). Values in different attribute arrays are always sorted in the same way. Hence, one can use the id array (sd_cell_sv.ids) as a reference.
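Because all attribute arrays share the same order, a position found in the id array is valid in every other array. The following sketch uses made-up values, with plain lists standing in for the arrays that sd_cell_sv.ids and sd_cell_sv.sizes would provide.

```python
# Made-up example data; both lists are sorted in the same way.
ids = [3, 7, 12, 20]
sizes = [120, 45, 300, 80]

# Map each ID to its position once, then reuse it for any attribute array.
index_of = {obj_id: i for i, obj_id in enumerate(ids)}
size_of_12 = sizes[index_of[12]]

# Filter IDs by a condition on another attribute array.
large_ids = [obj_id for obj_id, size in zip(ids, sizes) if size >= 100]

print(size_of_12, large_ids)
```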
A SegmentationDataset allows easy access to its SegmentationObjects by
cell_sv_obj = sd_cell_sv.get_segmentation_object(obj_id)
There are four additional data structures for each SegmentationObject: voxels (VoxelStorage), attributes (AttributeDict), meshes (MeshStorage) and skeletons (SkeletonStorage).
Typically, every SegmentationObject owns the first three, while only supervoxels (sv) have a skeleton. While voxels, meshes and skeletons are predefined data types, attributes are an arbitrary key-value store. It is advised, though, to be consistent in the type and naming of attributes across the SegmentationDataset to avoid problems with the aforementioned numpy arrays.
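A simple way to spot such inconsistencies before building the arrays is to compare keys and value types across objects. The helper `find_inconsistent_keys` and the attribute name `mesh_area` are hypothetical; this is a sketch of the idea, not part of SyConn.

```python
from collections import defaultdict
from typing import Any, Dict, List


def find_inconsistent_keys(
        attr_dicts: Dict[int, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return keys that are missing in some objects or have mixed types."""
    types_per_key = defaultdict(set)
    count_per_key = defaultdict(int)
    for attrs in attr_dicts.values():
        for key, value in attrs.items():
            types_per_key[key].add(type(value).__name__)
            count_per_key[key] += 1
    n_objects = len(attr_dicts)
    problems = {}
    for key, type_names in types_per_key.items():
        if len(type_names) > 1 or count_per_key[key] < n_objects:
            problems[key] = sorted(type_names)
    return problems


# Made-up attribute dicts: "mesh_area" is both incomplete and mixed-type.
example = {
    1: {"size": 10, "mesh_area": 1.5},
    2: {"size": 25, "mesh_area": "n/a"},
    3: {"size": 7},
}
print(find_inconsistent_keys(example))
```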
The different data structures can be accessed by e.g.
voxels = cell_sv_obj.voxels
mesh = cell_sv_obj.mesh
skeleton = cell_sv_obj.skeleton
attr_value = cell_sv_obj.lookup_in_attribute_dict("attr_key")
The attribute dict can also be accessed as a whole
cell_sv_obj.load_attr_dict()
attr_dict = cell_sv_obj.attr_dict
SegmentationObjects cache data that was accessed. This can be disabled by
cell_sv_obj.mesh_caching = False
cell_sv_obj.voxel_caching = False
cell_sv_obj.skeleton_caching = False
and the cache can be cleared by
cell_sv_obj.clear_cache()
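The effect of these switches can be sketched with a small stand-in class. `CachedObject` and `load_mesh` are hypothetical and only mimic the behaviour described above (lazy loading, an optional cache, and clear_cache); the real SegmentationObject may work differently internally.

```python
class CachedObject:
    """Hypothetical stand-in for an object with a lazily loaded mesh."""

    def __init__(self, loader):
        self._loader = loader      # callable that reads the data from storage
        self.mesh_caching = True   # keep loaded data in memory by default
        self._mesh = None

    @property
    def mesh(self):
        if self._mesh is not None:
            return self._mesh      # served from the cache
        mesh = self._loader()
        if self.mesh_caching:
            self._mesh = mesh
        return mesh

    def clear_cache(self):
        self._mesh = None


load_calls = []


def load_mesh():
    load_calls.append(1)           # record every access to the storage
    return [0.0, 1.0, 2.0]


obj = CachedObject(load_mesh)
_ = obj.mesh
_ = obj.mesh                       # second access hits the cache
print(len(load_calls))
```

Disabling caching trades repeated loading for a lower memory footprint, which can matter when iterating over many objects.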