dataset package
Submodules
dataset.config module
Configuration for development - change these with caution!
dataset.dataset module
The Dataset object aggregates information about your whole-slide images, their labels, any filtration that should be applied to their regions, and any augmentation functions that should be applied to those regions as they are fetched from disk. This class inherits from PyTorch’s torch.utils.data.Dataset.
- class dataset.dataset.Dataset(*args: Any, **kwargs: Any)
Bases:
torch.utils.data.Dataset
Dataset implements automatic label inference, optional augmentation, and optional dynamic filtration.
- augment_region(region: numpy.ndarray) numpy.ndarray
augment_region augments a region using self.augmentation
- Parameters
region (np.ndarray) – the region to be (potentially) augmented
- Returns
the (potentially) augmented region
- Return type
np.ndarray
- get_label(filename: str) Any
get_label returns the label associated with a certain filename
- Parameters
filename (str) – the filename in question
- Returns
the label for the filename in question
- Return type
Any
- get_label_distribution() dict
get_label_distribution gives the count of regions belonging to each label. It assumes that each region’s label is the label of the image it came from.
- Returns
a dictionary mapping each label to its region count
- Return type
dict
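The counting rule described for get_label_distribution (each region inherits its image's label) can be sketched as follows. This is a minimal illustration, not the package's implementation: label_distribution, regions_per_file, and labels are hypothetical names, and the sample filenames are made up.

```python
from collections import Counter
from typing import Any, Dict


def label_distribution(regions_per_file: Dict[str, int],
                       labels: Dict[str, Any]) -> Dict[Any, int]:
    """Count regions per label, giving each region its image's label."""
    counts: Counter = Counter()
    for filename, n_regions in regions_per_file.items():
        # Every region in the file contributes one count to the file's label.
        counts[labels[filename]] += n_regions
    return dict(counts)


# Hypothetical data: two slides labeled "tumor" and one labeled "normal".
distribution = label_distribution(
    {"a.svs": 10, "b.svs": 5, "c.svs": 7},
    {"a.svs": "tumor", "b.svs": "tumor", "c.svs": "normal"},
)
```

A distribution like this is typically used to spot class imbalance before training.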
- get_region(filename: str, region_num: int) numpy.ndarray
get_region returns the region at index region_num in the image at filename, applying filtration and the filtration cache if necessary
- Parameters
filename (str) – the filename from which to get the region
region_num (int) – the number of the region in the image at filename
- Returns
the region in question
- Return type
np.ndarray
- get_region_labels_as_list() List[Any]
get_region_labels_as_list returns an ordered list of region labels corresponding to the order of regions in the dataset
- Returns
a list of labels
- Return type
list[Any]
- get_region_location_from_index(index: int) Tuple[str, int]
get_region_location_from_index returns the (filename, region_num) location of the region at dataset[index]
- Parameters
index (int) – the index i used in dataset[i]
- Raises
IndexError – if index is out of bounds
- Returns
the location of the region at dataset[i]
- Return type
Tuple[str, int]
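One way to map a flat dataset index to a (filename, region_num) pair is a cumulative-count lookup. The sketch below assumes regions are ordered file by file and that indices are non-negative; region_location_from_index, filenames, and region_counts are hypothetical names, not the package's API.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def region_location_from_index(index: int, filenames: List[str],
                               region_counts: List[int]) -> Tuple[str, int]:
    """Map a flat dataset index to (filename, region_num)."""
    # ends[k] is the total number of regions in files 0..k inclusive.
    ends = list(accumulate(region_counts))
    if not ends or not 0 <= index < ends[-1]:
        raise IndexError(f"index {index} is out of bounds")
    # The first file whose cumulative count exceeds index owns the region.
    file_idx = bisect_right(ends, index)
    start = ends[file_idx - 1] if file_idx > 0 else 0
    return filenames[file_idx], index - start


# Hypothetical dataset: 3 regions in a.svs followed by 2 regions in b.svs.
first_of_b = region_location_from_index(3, ["a.svs", "b.svs"], [3, 2])
```

The binary search keeps the lookup logarithmic in the number of files, which matters when a dataset holds many slides.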
- iterate_by_file(as_pytorch_datasets=False) Generator[Tuple[str, Any, Generator], None, None]
iterate_by_file lets users iterate over the dataset one image at a time, yielding each image’s filename and label together with an iterator over its regions
- Parameters
as_pytorch_datasets (bool) – Whether to return pytorch datasets instead of the normal generators, defaults to False
- Yield
the filename, the label, and an iterator for the regions in the image
- Return type
Tuple[str, Any, Generator]
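The nested-generator shape of iterate_by_file's yield type can be illustrated with a small stand-in. This sketch does not load real image data: iterate_files, regions_per_file, and labels are hypothetical names, and each "region" is just a (filename, region_num) placeholder.

```python
from typing import Any, Dict, Generator, Iterator, Tuple


def iterate_files(regions_per_file: Dict[str, int], labels: Dict[str, Any]
                  ) -> Generator[Tuple[str, Any, Iterator], None, None]:
    """Yield (filename, label, region iterator) once per file."""

    def regions(filename: str, n_regions: int) -> Iterator:
        for region_num in range(n_regions):
            # Placeholder for the pixel data a real dataset would return.
            yield (filename, region_num)

    for filename, n_regions in regions_per_file.items():
        yield filename, labels[filename], regions(filename, n_regions)


# Hypothetical usage: collect the region placeholders for each file.
collected = {
    filename: (label, list(region_iter))
    for filename, label, region_iter in iterate_files(
        {"a.svs": 2, "b.svs": 1}, {"a.svs": "tumor", "b.svs": "normal"})
}
```

Because the inner iterator is lazy, regions are only produced for the files a caller actually consumes.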
- number_of_regions(filename: Optional[str] = None) int
number_of_regions gets the number of regions in the dataset or in a single image managed by the dataset
- Parameters
filename (Optional[str]) – An optional parameter which will get the number of regions in that image (if it’s in the dataset) instead of the number of regions in the entire dataset, defaults to None
- Returns
the number of regions in the dataset or image at filename
- Return type
int
dataset.filtration_cache module
A system for caching information about which regions of an image pass through a filter. Includes metadata verification and context management.
- class dataset.filtration_cache.FiltrationCache(h5filepath: Optional[Filepath] = 'filtration_cache.h5', h5filetitle: Optional[str] = 'filtration_cache', region_dims: Optional[RegionDimensions] = (512, 512))
Bases:
contextlib.AbstractContextManager
FiltrationCache tracks images’ regions’ filtration statuses in a PyTables HDF5 database
- Raises
NotImplementedError – when region_index is given as region coordinates
TypeError – when region_index is of the wrong type
- class Description(*args: Any, **kwargs: Any)
Bases:
tables.IsDescription
Description - the structure for tables in the database
- get_metadata(filtration: FiltrationRepr, filepath: Filepath) FiltrationCacheMetadata
get_metadata gets the metadata for a table if it exists
- Parameters
filtration (util.FiltrationRepr) – the filtration for which to get metadata
filepath (util.FilePath) – the image for which to get metadata
- Returns
the metadata in question, if available
- Return type
util.FiltrationCacheMetadata
- get_status(filtration: Union[FiltrationRepr, str], filepath: Filepath, region_index: Optional[RegionIndex] = None) FiltrationStatus
get_status gets one or all records from the table for filtration and os.path.basename(filepath)
- Parameters
filtration (Union[util.FiltrationRepr, str]) – a string representing the filtration applied to the image
filepath (util.FilePath) – a filepath representing the image filtration was applied to
region_index (Optional[util.RegionIndex], optional) – the index of the region in question, defaults to None
- Raises
NotImplementedError – if region_index is coordinates
TypeError – if region_index is of the wrong type
- Returns
a tuple of (region index, target region index, region index filtration status)
- Return type
util.FiltrationStatus
- has_data(filtration: FiltrationRepr, filepath: Filepath, **kwargs) bool
has_data checks whether the FiltrationCache has a table at filtration/filepath
- Parameters
filtration (util.FiltrationRepr) – the filtration for which statuses are checked
filepath (util.FilePath) – a key for the image in question
- Returns
whether the FiltrationCache has the data in question
- Return type
bool
- metadata_fields = ['_image_filepath', '_image_size', '_image_region_count', '_image_dark_regions_count', '_image_regions_discounted', '_image_region_dims']
- preprocess(filtration: FiltrationRepr, filepath: Filepath, loadingbars: bool, overwrite: bool = True, **kwargs) None
preprocess applies filtration to the image(s) at filepath, listing directories recursively
- Parameters
filtration (util.FiltrationRepr) – the filtration to apply to the images’ regions
filepath (util.FilePath) – if a directory, applied to all files in directory
overwrite (bool, optional) – whether to overwrite existing data, if applicable, defaults to True
- dataset.filtration_cache.clear_table(table: tables.Table) None
clear_table clears a pytables table of all contents (currently leaves metadata)
- Parameters
table (pt.Table) – the table to clear
- dataset.filtration_cache.postprocess_filepath(filepath: Filepath) str
postprocess_filepath undoes the preprocessing applied by preprocess_filepath
- Parameters
filepath (util.FilePath) – the filepath to un-preprocess
- Returns
the natural filepath
- Return type
str
- dataset.filtration_cache.preprocess(filtration: Callable, filepath: Filepath, region_dims: RegionDimensions, loadingbars: bool, **kwargs) Tuple[Dict[int, FiltrationStatus], int]
preprocess returns a mapping of region index to (filtration status and dark-region-mapped region index)
- Parameters
filtration (util.FiltrationRepr) – the filtration to apply to the image’s regions. If callable and not strictly filtration, then a ranked threshold approach is used - see _apply_filtration_to_regions_ranked_threshold
filepath (util.FilePath) – the filepath where the image in question is found
region_dims (unified_image_reader.util.RegionDimensions) – the dimensions of the regions to which filtration is applied
- Returns
filtration status records and dark region count
- Return type
Tuple[Dict[int, util.FiltrationStatus], int]
- dataset.filtration_cache.preprocess_filepath(filepath: Filepath) str
preprocess_filepath rewrites a filepath for use with PyTables, which can’t handle certain characters
- Parameters
filepath (util.FilePath) – the filepath to preprocess
- Returns
a filepath that conforms to PyTables’s restrictions
- Return type
str
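The reference does not say which characters preprocess_filepath rewrites or how postprocess_filepath restores them. The sketch below shows one reversible scheme under loud assumptions: the token table, encode_path, and decode_path are all hypothetical, and the round trip only holds when the original path does not already contain a token.

```python
# Hypothetical replacement table; the package's actual mapping is not documented here.
_REPLACEMENTS = (("/", "__SLASH__"), (".", "__DOT__"))


def encode_path(filepath: str) -> str:
    """Replace characters that are awkward in node names with tokens."""
    for char, token in _REPLACEMENTS:
        filepath = filepath.replace(char, token)
    return filepath


def decode_path(encoded: str) -> str:
    """Undo encode_path by applying the replacements in reverse order."""
    for char, token in reversed(_REPLACEMENTS):
        encoded = encoded.replace(token, char)
    return encoded


original = "slides/case1/image.svs"
encoded = encode_path(original)
restored = decode_path(encoded)
```

Keeping the two functions driven by one table makes it harder for the encoding and decoding steps to drift apart.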
- dataset.filtration_cache.preprocess_filtration(filtration: FiltrationRepr, **kwargs) FiltrationRepr
preprocess_filtration removes whitespace for pytables compatibility
- Parameters
filtration (util.FiltrationRepr) – the filtration to represent
- Returns
a pytables-agreeable filtration representation
- Return type
util.FiltrationRepr
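Whitespace removal of the kind preprocess_filtration describes can be sketched in one line, assuming the filtration representation is a plain string. strip_whitespace is a hypothetical helper name, and the sample representation is made up.

```python
def strip_whitespace(filtration_repr: str) -> str:
    """Remove all whitespace (spaces, tabs, newlines) from a string."""
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(filtration_repr.split())


# Hypothetical filtration representation.
compact = strip_whitespace("Filter( threshold = 0.5 )\n")
```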
- dataset.filtration_cache.process_region(filepath: util.FilePath, filtration: util.FiltrationRepr, region_index: unified_image_reader.util.RegionIndex, region_dims: util.RegionDimensions) Tuple[int, Dict[str, Any]]
process_region applies filtration to the specified region of the given image
- Returns
the region index and the filtration status
- Return type
Tuple[int, Dict[str, Any]]
dataset.label_extractor module
An automatic (or overloaded) label inference class
- class dataset.label_extractor.LabelExtractor
Bases:
abc.ABC
Strategy pattern: extracts labels from path for dictionary-based lookup
- abstract static extract_labels(path: str)
extracts labels from path for dictionary-based lookup
- class dataset.label_extractor.LabelExtractorCSV
Bases:
dataset.label_extractor.LabelExtractor
labels in a CSV file
- static extract_labels(path: str)
labels are read from a CSV file at path, where each line has the structure <key><sep><label>
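Parsing <key><sep><label> lines into a lookup dictionary might look like the sketch below. It is not the package's implementation: parse_label_lines is a hypothetical name, the separator is assumed to default to a comma, and the sample rows are made up.

```python
import csv
from typing import Dict, Iterable


def parse_label_lines(lines: Iterable[str], sep: str = ",") -> Dict[str, str]:
    """Build a {key: label} dictionary from <key><sep><label> lines."""
    labels: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter=sep):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        labels[row[0].strip()] = row[1].strip()
    return labels


# Hypothetical file contents, passed as a list of lines.
csv_labels = parse_label_lines(["a.svs,tumor", "b.svs, normal", ""])
```

For a file on disk, the same function accepts an open file object, since csv.reader iterates over lines.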
- class dataset.label_extractor.LabelExtractorJSON
Bases:
dataset.label_extractor.LabelExtractor
labels in a JSON file
- static extract_labels(path: str)
labels are read from a JSON file at path with the structure {key: label, …}
- class dataset.label_extractor.LabelExtractorNoLabels
Bases:
dataset.label_extractor.LabelExtractor
- class DefaultDictWithGet
Bases:
collections.defaultdict
- get(*args, **kwargs)
Return the value for key if key is in the dictionary, else default.
- static extract_labels(path: str)
returns ‘LabelExtractorNoLabels’ for all labels
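A defaultdict subclass with an overridden get is needed for this kind of "same label for everything" lookup because the built-in defaultdict.get does not call default_factory; only item access does. The sketch below illustrates the idea with a hypothetical class name, AlwaysDefaultDict, and is not the package's DefaultDictWithGet.

```python
from collections import defaultdict


class AlwaysDefaultDict(defaultdict):
    """A defaultdict whose get() also falls back to default_factory."""

    def get(self, key, default=None):
        # Item access triggers default_factory for missing keys,
        # so the explicit default argument is intentionally ignored.
        return self[key]


plain = defaultdict(lambda: "LabelExtractorNoLabels")
patched = AlwaysDefaultDict(lambda: "LabelExtractorNoLabels")

plain_result = plain.get("missing.svs")      # default_factory is not used
patched_result = patched.get("missing.svs")  # default_factory is used
```

Note that this version stores the missing key as a side effect of the lookup, which is standard defaultdict behavior for item access.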
- class dataset.label_extractor.LabelExtractorParentDir
Bases:
dataset.label_extractor.LabelExtractor
labels represented by relative path
- static extract_labels(path: str)
labels are the file paths relative to the path argument (a label_postprocessor is recommended)
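The reason a label_postprocessor is recommended becomes clearer with an example: a relative path such as tumor/a.svs still contains the filename, so a postprocessor can reduce it to the parent directory. The sketch below assumes labels are keyed by full file path; labels_from_relative_paths and parent_dir_postprocessor are hypothetical names.

```python
import os
import tempfile
from typing import Dict


def labels_from_relative_paths(root: str) -> Dict[str, str]:
    """Label every file under root with its path relative to root."""
    labels: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            labels[full_path] = os.path.relpath(full_path, root)
    return labels


def parent_dir_postprocessor(label: str) -> str:
    """Reduce a relative-path label to its parent directory."""
    return os.path.dirname(label)


# Hypothetical layout: <root>/tumor/a.svs
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "tumor"))
    slide = os.path.join(root, "tumor", "a.svs")
    open(slide, "w").close()
    raw_label = labels_from_relative_paths(root)[slide]
    clean_label = parent_dir_postprocessor(raw_label)
```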
dataset.label_manager module
Label inference for the custom dataset
- class dataset.label_manager.LabelManager(path: Filepath, label_extraction: Optional[dataset.label_extractor.LabelExtractor] = None, label_preprocessor: Optional[Callable] = None, label_postprocessor: Optional[Callable] = None, error_if_no_labels: bool = True)
Bases:
object
A dictionary wrapper for managing labels
- Raises
NotImplementedError – when a given file extension cannot be parsed natively
TypeError – when label_extraction isn’t a LabelExtractor
TypeError – when label_preprocessor isn’t Callable
TypeError – when label_postprocessor isn’t Callable
IndexError – when a key doesn’t have a value
dataset.util module
Utility functions, etc. for the custom dataset
- class dataset.util.ThreadingLock
Bases:
contextlib.AbstractContextManager
A wrapper on threading.Lock that implements a context manager so that when the context closes the lock will unlock
- Example:
with status_lock as permission:  # will hang until it gets permission
    # do things
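A context manager of this kind can be sketched in a few lines. This is an assumption-laden illustration rather than the package's ThreadingLock: LockContext is a hypothetical name, and what the real class returns from __enter__ is not documented here.

```python
import threading
from contextlib import AbstractContextManager


class LockContext(AbstractContextManager):
    """Acquire a threading.Lock on entry and release it on exit."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def __enter__(self) -> "LockContext":
        self._lock.acquire()  # blocks until the lock is available
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Release even if the body raised; returning None re-raises.
        self._lock.release()
        return None


status_lock = LockContext()
with status_lock as permission:
    held_inside = status_lock._lock.locked()
held_after = status_lock._lock.locked()
```

threading.Lock already supports the with statement directly; a wrapper like this is mainly useful when extra behavior is attached to entering or leaving the context.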
- exception dataset.util.UnsupportedFileType
Bases:
Exception
UnsupportedFileType is raised when a file extension can’t be parsed natively
- dataset.util.apply_args_and_kwargs(fn, args, kwargs)
- dataset.util.listdir_recursive(path: Filepath) List[Filepath]
listdir_recursive lists files (not directories) recursively from path
- Parameters
path (FilePath) – the path to the directory whose files should be listed recursively
- Returns
a list of filepaths relative to path
- Return type
List[FilePath]
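Listing files recursively, relative to a starting directory, can be done with os.walk. The sketch below is one possible approach, not the package's code: list_files_recursive is a hypothetical name, and sorting the result is an added choice for deterministic output.

```python
import os
import tempfile
from typing import List


def list_files_recursive(path: str) -> List[str]:
    """Return files (not directories) under path, relative to path."""
    found: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), path))
    return sorted(found)


# Hypothetical directory tree with files at three depths.
expected = [
    "top.txt",
    os.path.join("sub", "mid.txt"),
    os.path.join("sub", "deeper", "low.txt"),
]
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deeper"))
    for rel in expected:
        open(os.path.join(root, rel), "w").close()
    listed = list_files_recursive(root)
```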
- dataset.util.starmap_with_kwargs(pool, fn, args_iter, kwargs_iter)
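The reference gives no description for apply_args_and_kwargs or starmap_with_kwargs. Judging only by their names and signatures, they may follow the common pattern for passing keyword arguments through Pool.starmap, which only unpacks positional arguments. The sketch below is a guess at that pattern, using a thread pool and a hypothetical scale function for demonstration.

```python
from itertools import repeat
from multiprocessing.pool import ThreadPool


def apply_args_and_kwargs(fn, args, kwargs):
    """Call fn with unpacked positional and keyword arguments."""
    return fn(*args, **kwargs)


def starmap_with_kwargs(pool, fn, args_iter, kwargs_iter):
    """Like pool.starmap, but each call also receives its own kwargs."""
    # Each work item is (fn, args, kwargs), unpacked by apply_args_and_kwargs.
    work_items = zip(repeat(fn), args_iter, kwargs_iter)
    return pool.starmap(apply_args_and_kwargs, work_items)


def scale(x, factor=1):
    """Hypothetical worker function used only for this demonstration."""
    return x * factor


with ThreadPool(2) as pool:
    results = starmap_with_kwargs(
        pool, scale, [(1,), (2,), (3,)],
        [{"factor": 10}, {"factor": 20}, {}])
```

With a process pool instead of a thread pool, fn and the helper must be importable at module level so they can be pickled.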