Dataset 📂
Download the OCELOT Dataset
The OCELOT dataset can be downloaded from Zenodo. We collect basic information (name, email, and institution) along with a short justification for requesting the dataset.
Introduction
The OCELOT dataset is a histopathology dataset designed to facilitate the development of methods that exploit the relationship between cells and tissues. It consists of small and large field-of-view (FoV) patches with overlapping regions, extracted from digitally scanned whole slide images (WSIs). The small FoV patches are accompanied by cell annotations, and the large FoV patches by tissue annotations. The WSIs are sourced from the publicly available TCGA database; they were stained with hematoxylin and eosin (H&E) and then scanned with an Aperio scanner. Each sample of the OCELOT dataset is composed of six components.
Each sample of the dataset consists of two input patches and their corresponding annotations. The left panel shows the large FoV patch x_l with its tissue segmentation annotation y_l^t, where green denotes the cancer area. The right panel shows the small FoV patch x_s with its cell point annotation y_s^c, where blue and yellow dots denote tumor cells and background cells, respectively. The red box indicates the size and location of x_s with respect to x_l. Note that for every sample, x_s and x_l overlap, i.e., x_s lies entirely inside x_l. However, the relative location of x_s within x_l varies from sample to sample.
Patch Configurations
Cell detection benefits from fine-grained spatial information, which captures detailed cell properties (e.g., border, shape, color, and opacity). In contrast, tissue segmentation requires a larger context to capture the overall structural information. Therefore, we define the FoV sizes of x_s (cell detection) and x_l (tissue segmentation) as 1024×1024 and 4096×4096 pixels, respectively, at a resolution of 0.2 microns per pixel (MPP). Finally, the large FoV patches and tissue annotations (x_l, y_l^t) are down-sampled by a factor of 4, resulting in a size of 1024×1024 pixels at 0.8 MPP.
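The patch geometry described above can be checked with a little arithmetic. The Python sketch below assumes the position of x_s inside x_l is known as a top-left offset measured in full-resolution (0.2 MPP) pixels of x_l; how the dataset actually stores that position is not described in this section, so `offset_x`, `offset_y`, and all helper names are hypothetical.

```python
# Constants taken from the patch configuration above; helper names are hypothetical.
SMALL_FOV_PX = 1024          # side of x_s at 0.2 MPP
LARGE_FOV_PX = 4096          # side of x_l at 0.2 MPP, before down-sampling
BASE_MPP = 0.2               # microns per pixel of the extracted patches
DOWNSAMPLE = 4               # factor applied to x_l and y_l^t
LARGE_FOV_STORED_PX = LARGE_FOV_PX // DOWNSAMPLE  # 1024 px after down-sampling


def physical_size_um(pixels: int, mpp: float) -> float:
    """Side length, in microns, of a square patch."""
    return pixels * mpp


def small_to_large(x: float, y: float, offset_x: float, offset_y: float) -> tuple[float, float]:
    """Map a point from x_s pixel coordinates to down-sampled x_l coordinates.

    offset_x, offset_y: ASSUMED top-left corner of x_s inside x_l,
    in full-resolution (0.2 MPP) x_l pixels.
    """
    return ((offset_x + x) / DOWNSAMPLE, (offset_y + y) / DOWNSAMPLE)


def small_box_in_large(offset_x: float, offset_y: float) -> tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) of the red box in down-sampled x_l coordinates."""
    x0, y0 = small_to_large(0, 0, offset_x, offset_y)
    x1, y1 = small_to_large(SMALL_FOV_PX, SMALL_FOV_PX, offset_x, offset_y)
    return (x0, y0, x1, y1)


def is_contained(offset_x: float, offset_y: float) -> bool:
    """True if x_s, placed at the given offset, lies entirely inside x_l."""
    limit = LARGE_FOV_PX - SMALL_FOV_PX
    return 0 <= offset_x <= limit and 0 <= offset_y <= limit


if __name__ == "__main__":
    print(physical_size_um(SMALL_FOV_PX, BASE_MPP))  # x_s side in microns
    print(physical_size_um(LARGE_FOV_PX, BASE_MPP))  # x_l side in microns
    print(small_box_in_large(1536, 1536))            # a centered x_s
```

Under these assumptions, x_s spans about 204.8 µm and x_l about 819.2 µm, so after down-sampling the red box is always 256×256 pixels inside the 1024×1024 x_l; only its position changes between samples.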
Subsets
Label Information
- Cell: Background Cell (BC, index 1) and Tumor Cell (TC, index 2)
- Tissue: Background (BG, index 1), Cancer Area (CA, index 2), and Unknown (not labeled, index 255)
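A minimal Python sketch of the label encoding listed above follows. The dictionaries mirror the listed indices; the `cancer_area_fraction` helper is hypothetical and treats index 255 as an ignore value, excluding unknown pixels from both the numerator and the denominator. The tissue mask is assumed to be a row-major list of rows of integer labels.

```python
# Label indices as listed above; helper names are hypothetical.
CELL_LABELS = {1: "BC", 2: "TC"}
TISSUE_LABELS = {1: "BG", 2: "CA", 255: "Unknown"}
TISSUE_CANCER = 2
TISSUE_IGNORE = 255


def cancer_area_fraction(mask: list[list[int]]) -> float:
    """Fraction of labeled tissue pixels that are Cancer Area (CA).

    Unknown pixels (index 255) are skipped entirely. Returns NaN if the
    mask contains no labeled pixels. Raises ValueError on unexpected labels.
    """
    labeled = 0
    cancer = 0
    for row in mask:
        for value in row:
            if value == TISSUE_IGNORE:
                continue
            if value not in TISSUE_LABELS:
                raise ValueError(f"unexpected tissue label {value}")
            labeled += 1
            if value == TISSUE_CANCER:
                cancer += 1
    return cancer / labeled if labeled else float("nan")


if __name__ == "__main__":
    toy_mask = [[1, 2], [2, 255]]
    print(cancer_area_fraction(toy_mask))
```

Skipping unknown pixels mirrors the common "ignore index" convention in segmentation losses (for example, PyTorch's `CrossEntropyLoss(ignore_index=255)`), so unlabeled regions neither count as background nor as cancer.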
For cell point annotations, we use an x-y coordinate system whose origin (0, 0) is the top-left pixel of x_s and whose bottom-right pixel is (1023, 1023).
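The sketch below shows one way to load and validate cell points under this coordinate system. It assumes, hypothetically, that each annotation row is a comma-separated `x,y,label` triple and that x runs horizontally (image column) while y runs vertically (image row); neither the file format nor the axis orientation is specified in this section, and `parse_cell_points` and `to_array_index` are illustrative names.

```python
import csv
import io

PATCH_SIZE = 1024               # x_s is 1024x1024, so valid coordinates are 0..1023
CELL_LABELS = {1: "BC", 2: "TC"}


def parse_cell_points(text: str) -> list[tuple[int, int, int]]:
    """Parse ASSUMED 'x,y,label' rows into (x, y, label) tuples.

    Raises ValueError if a point falls outside the patch or uses an
    unknown cell label. Blank lines are skipped.
    """
    points = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        x, y, label = (int(value) for value in row)
        if not (0 <= x < PATCH_SIZE and 0 <= y < PATCH_SIZE):
            raise ValueError(f"point ({x}, {y}) lies outside the 1024x1024 patch")
        if label not in CELL_LABELS:
            raise ValueError(f"unknown cell label {label}")
        points.append((x, y, label))
    return points


def to_array_index(x: int, y: int) -> tuple[int, int]:
    """Convert (x, y) to (row, col) for a row-major image array.

    ASSUMES x is the horizontal axis and y the vertical axis.
    """
    return (y, x)


if __name__ == "__main__":
    sample = "10,20,1\n1023,0,2\n"
    for x, y, label in parse_cell_points(sample):
        print(CELL_LABELS[label], to_array_index(x, y))
```

Rejecting out-of-range points early catches off-by-one errors, since (1023, 1023) is the last valid pixel rather than (1024, 1024).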