OpenML
Meta_Album_PRT_Extended

active ARFF CC BY-NC 4.0 Visibility: public Uploaded 08-11-2022 by Meta Album


## Meta-Album Subcellular Human Protein Dataset (Extended)

This dataset is a subset of the Subcellular dataset in the Protein Atlas project (https://www.proteinatlas.org/). The original dataset, which stems from the Human Protein Atlas Image Classification Kaggle competition (https://www.kaggle.com/competitions/human-protein-atlas-image-classification), comprises 31,072 RGBY images of size 512x512 px, each of which belongs to one or more of 28 classes. The labels correspond to protein organelle localizations. For Meta-Album, we made two modifications: (1) to turn the dataset into a multi-class dataset, we dropped all images belonging to more than one class, as well as images belonging to classes with fewer than 40 members; (2) we converted the remaining images to RGB by dropping the yellow channel, which was also common practice in the competition. Finally, as for all datasets in Meta-Album, the images were resized to 128x128 px.
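The two modifications described above can be sketched in a few lines of Python. This is only an illustration, not the Meta-Album preprocessing code: the function names, the dictionary layout of the labels, and the tuple representation of a pixel are assumptions, and the sketch assumes that class sizes are counted after the multi-label images have been removed, which the description does not state explicitly.

```python
from collections import Counter

MIN_CLASS_SIZE = 40  # threshold stated in the description


def to_multiclass(labels_per_image, min_class_size=MIN_CLASS_SIZE):
    """Keep single-label images whose class has at least `min_class_size` members.

    `labels_per_image` (hypothetical layout) maps an image id to the list of
    its class labels. Returns a dict mapping kept image ids to their one label.
    """
    # Modification (1), first half: drop images that belong to several classes.
    single = {img: labels[0] for img, labels in labels_per_image.items()
              if len(labels) == 1}
    # Modification (1), second half: drop classes with too few members.
    # Assumption: sizes are counted after the multi-label images are gone.
    counts = Counter(single.values())
    return {img: label for img, label in single.items()
            if counts[label] >= min_class_size}


def drop_yellow(rgby_pixel):
    """Modification (2): turn one RGBY pixel into RGB by discarding yellow."""
    r, g, b, _yellow = rgby_pixel
    return (r, g, b)


# Toy usage with a lowered threshold so the effect is visible on four images.
toy_labels = {"a": ["0"], "b": ["0", "1"], "c": ["0"], "d": ["2"]}
kept = to_multiclass(toy_labels, min_class_size=2)
rgb = drop_yellow((10, 20, 30, 40))
```

In the toy run, image `b` is removed because it has two labels and image `d` is removed because its class has a single member. The resize to 128x128 px is left out because it depends on the image library in use.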
### Dataset Details

![](https://meta-album.github.io/assets/img/samples/PRT.png)

- Meta Album ID: MCR.PRT
- Meta Album URL: [https://meta-album.github.io/datasets/PRT.html](https://meta-album.github.io/datasets/PRT.html)
- Domain ID: MCR
- Domain Name: Microscopy
- Dataset ID: PRT
- Dataset Name: Subcellular Human Protein
- Short Description: Subcellular protein patterns in human cells
- \# Classes: 21
- \# Images: 15050
- Keywords: human protein, subcellular
- Data Format: images
- Image size: 128x128
- License (original data release): CC BY-SA 3.0
- License URL (original data release): https://www.proteinatlas.org/about/licence
- License (Meta-Album data release): CC BY-SA 3.0
- License URL (Meta-Album data release): [https://www.proteinatlas.org/about/licence](https://www.proteinatlas.org/about/licence)
- Source: The Human Protein Atlas
- Source URL: https://proteinatlas.org, https://www.kaggle.com/c/human-protein-atlas-image-classification
- Original Author: Peter J Thul, Lovisa Akesson, Mikaela Wiking, Diana Mahdessian, Aikaterini Geladaki, Hammou Ait Blal, Tove Alm, Anna Asplund, Lars Bjork, Lisa Breckels, and others
- Original contact: contact@proteinatlas.org
- Meta Album author: Felix Mohr
- Created Date: 01 June 2022
- Contact Name: Felix Mohr
- Contact Email: meta-album@chalearn.org
- Contact URL: [https://meta-album.github.io/](https://meta-album.github.io/)

### Cite this dataset

```
@article{thul2017subcellular,
  title={A subcellular map of the human proteome},
  author={Thul, Peter J and Akesson, Lovisa and Wiking, Mikaela and Mahdessian, Diana and Geladaki, Aikaterini and Ait Blal, Hammou and Alm, Tove and Asplund, Anna and Bjork, Lars and Breckels, Lisa M},
  journal={Science},
  volume={356},
  number={6340},
  year={2017},
  publisher={American Association for the Advancement of Science}
}
```

### Cite Meta-Album

```
@inproceedings{meta-album-2022,
  title={Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification},
  author={Ullah, Ihsan and Carrion, Dustin and Escalera, Sergio and Guyon, Isabelle M and Huisman, Mike and Mohr, Felix and van Rijn, Jan N and Sun, Haozhe and Vanschoren, Joaquin and Vu, Phan Anh},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  url = {https://meta-album.github.io/},
  year = {2022}
}
```

### More

- For more information on the Meta-Album dataset, please see the [[NeurIPS 2022 paper]](https://meta-album.github.io/paper/Meta-Album.pdf)
- For details on the dataset preprocessing, please see the [[supplementary materials]](https://openreview.net/attachment?id=70_Wx-dON3q&name=supplementary_material)
- Supporting code can be found on our [[GitHub repo]](https://github.com/ihsaan-ullah/meta-album)
- Meta-Album on Papers with Code [[Meta-Album]](https://paperswithcode.com/dataset/meta-album)

### Other versions of this dataset

[[Micro]](https://www.openml.org/d/44278) [[Mini]](https://www.openml.org/d/44308)

3 features

CATEGORY (target): string, 21 unique values, 0 missing
FILE_NAME: string, 15050 unique values, 0 missing
SUPER_CATEGORY: numeric, 0 unique values, 15050 missing
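
Since SUPER_CATEGORY is missing for every row, only FILE_NAME and CATEGORY carry information. The following is a minimal sketch of reading such a label table with the Python standard library. It assumes the table is available as CSV with the three column names listed above; the two sample rows, including the file names and class names, are made up for illustration.

```python
import csv
import io

# Hypothetical two-row excerpt; file names and class names are invented.
SAMPLE = """FILE_NAME,CATEGORY,SUPER_CATEGORY
img_0001.jpg,class_a,
img_0002.jpg,class_b,
"""


def load_labels(handle):
    """Return (file_name, category) pairs, ignoring the empty SUPER_CATEGORY."""
    reader = csv.DictReader(handle)
    return [(row["FILE_NAME"], row["CATEGORY"]) for row in reader]


pairs = load_labels(io.StringIO(SAMPLE))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle.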

19 properties

Number of instances (rows) of the dataset: 15050
Number of attributes (columns) of the dataset: 3
Number of distinct values of the target attribute (if it is nominal): 21
Number of missing values in the dataset: 15050
Number of instances with at least one value missing: 15050
Number of numeric attributes: 1
Number of nominal attributes: 0
Number of instances belonging to the least frequent class: 98
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Percentage of missing values: 33.33
Average class difference between consecutive instances: 1
Percentage of numeric attributes: 33.33
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: 16.04
Number of instances belonging to the most frequent class: 2414
Percentage of instances belonging to the least frequent class: 0.65
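
The percentage properties follow directly from the counts listed above. As a quick check, the arithmetic can be reproduced as below; the variable names and the `pct` helper are illustrative only, and the imbalance ratio is a derived figure that does not appear among the listed properties.

```python
# Counts taken from the properties listed above.
n_instances = 15050
n_attributes = 3
n_numeric_attributes = 1
n_missing_values = 15050
majority_class_size = 2414
minority_class_size = 98


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Missing values are counted over all cells (rows times columns).
pct_missing_values = pct(n_missing_values, n_instances * n_attributes)
pct_numeric_attributes = pct(n_numeric_attributes, n_attributes)
pct_majority = pct(majority_class_size, n_instances)
pct_minority = pct(minority_class_size, n_instances)

# Derived figure: how many times larger the biggest class is than the smallest.
imbalance_ratio = round(majority_class_size / minority_class_size, 2)
```

The four percentages match the listed values (33.33, 33.33, 16.04, and 0.65), and the largest class is roughly 25 times the size of the smallest, so the dataset is clearly imbalanced.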

1 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: CATEGORY
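
To illustrate what this task setup implies, here is a minimal standard-library sketch of a 10-fold split and of predictive accuracy. It is not OpenML's implementation: the function names are hypothetical, and the sketch assumes the folds are stratified by the target, which the task line above does not state.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign every instance index to one of `n_folds` folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    offset = 0  # running start position so fold sizes stay balanced overall
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[(offset + position) % n_folds].append(idx)
        offset += len(indices)
    return folds


def predictive_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy usage: 20 instances of class "a" and 10 of class "b".
toy_targets = ["a"] * 20 + ["b"] * 10
toy_folds = stratified_folds(toy_targets)
```

Each fold serves once as the test set while the other nine are used for training, and the reported score is typically the mean accuracy over the ten test folds.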