Datasets
No need to download common datasets yourselves!
On the Umbrella Cluster, we maintain a collection of datasets that are frequently used to train or benchmark models, usually in the context of machine learning. Instead of using up your own storage space and waiting for a download to finish, you are free to use the datasets available in the dataset folder (/dataset) on the Umbrella Cluster.
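One way to use a shared dataset without copying it is to link it into your own working directory. The sketch below uses a hypothetical helper, link_dataset, with an optional DATASET_ROOT override that defaults to /dataset. It only makes sense for datasets that can be used directly from their folder, not for archives such as the ADE20K zip file, which must be unpacked first.

```shell
# Hypothetical helper: expose a shared dataset inside your own working
# directory through a symbolic link instead of copying it.
# DATASET_ROOT is an illustrative override; it defaults to /dataset.
link_dataset() {
    local name=$1 workdir=$2
    local root=${DATASET_ROOT:-/dataset}
    if [ ! -d "$root/$name" ]; then
        echo "no such dataset: $root/$name" >&2
        return 1
    fi
    mkdir -p "$workdir"
    # -n replaces an existing link instead of creating a link inside it
    ln -sfn "$root/$name" "$workdir/$name"
}
```

For example, link_dataset MNIST "$HOME/data" creates $HOME/data/MNIST pointing at /dataset/MNIST. The link only references the shared copy, so treat the linked folder as input data and write outputs elsewhere.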
List of available datasets
Name | Versions | Free access | Path | License | References |
---|---|---|---|---|---|
ADE20K | 2021-17-01 | ✓ | /dataset/ADE20K | ADE20K license | Website |
AlphaFold | 2.3.1 | ✓ | /dataset/AlphaFold | Apache 2.0 | GitHub |
CAMELYON16 | — | ✓ | /dataset/CAMELYON16 | CC0 1.0 | Website |
CIFAR-10 | — | ✓ | /dataset/CIFAR-10 | See website | Website |
MNIST | — | ✓ | /dataset/MNIST | CC BY-SA 4.0 | Website |

Note: the ADE20K dataset must be unzipped before use. For example:

```shell
# In jobs (including Open OnDemand interactive apps, salloc and srun):
datadir=$TMPDIR/ade20k
# In interactive sessions, use this instead:
# datadir=/scratch-shared/$USER/ade20k
mkdir -p "$datadir"
unzip /dataset/ADE20K/ADE20K.zip -d "$datadir"
```

Note: AlphaFold has a related module: module load AlphaFold/2.3.1-foss-2022a
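The ADE20K unzip step can also be packaged as two small shell functions, so that the same script works both inside Slurm allocations and in interactive sessions. This is a sketch under assumptions: the function names ade20k_datadir and unpack_dataset are hypothetical, and detecting an allocation through a non-empty SLURM_JOB_ID together with a set TMPDIR is our assumption, not something this page guarantees.

```shell
# Choose where to unpack ADE20K.
# Assumption: inside a Slurm allocation (batch job, salloc, srun, Open OnDemand
# interactive app) SLURM_JOB_ID is set and $TMPDIR points at job-local scratch.
# Otherwise fall back to the shared scratch directory used for interactive work.
ade20k_datadir() {
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -n "${TMPDIR:-}" ]; then
        printf '%s\n' "$TMPDIR/ade20k"
    else
        printf '%s\n' "/scratch-shared/$USER/ade20k"
    fi
}

# Unpack a zip archive into a directory, creating the directory if needed.
# Extraction is skipped when the directory already has content; note that
# this also skips a partially extracted directory.
unpack_dataset() {
    local zipfile=$1 dest=$2
    mkdir -p "$dest"
    if [ -z "$(ls -A "$dest")" ]; then
        unzip -q "$zipfile" -d "$dest"
    fi
}
```

A script could then call unpack_dataset /dataset/ADE20K/ADE20K.zip "$(ade20k_datadir)". The emptiness check avoids unpacking the archive twice into a persistent location such as /scratch-shared; if an earlier extraction was interrupted, remove the directory and run it again.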
Dataset or model not listed?
If a dataset or model you need is missing, you can download it or upload it to the Umbrella Cluster yourself. If you think other people would also use this dataset or model, please contact us: we can then add a copy of it to the public model and dataset space. This avoids having many duplicate copies of the same models or datasets on the system, and saves users from downloading or uploading them from external sources. Of course, this does not apply if your dataset or model is proprietary or privacy-sensitive.
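If you do download a dataset yourself, it is good practice to check the file against a checksum published by the provider, when one is available. The sketch below assumes curl and GNU sha256sum are installed; fetch_and_verify is a hypothetical helper, and the URL and checksum you pass it come from the provider, not from any Umbrella Cluster tooling.

```shell
# Hypothetical helper: download a file (unless it is already present) and
# verify it against a SHA-256 checksum published by the provider.
fetch_and_verify() {
    local url=$1 dest=$2 sha256=$3
    if [ ! -f "$dest" ]; then
        # -f: fail on HTTP errors, -L: follow redirects
        curl -fL --retry 3 -o "$dest" "$url" || { rm -f "$dest"; return 1; }
    fi
    # sha256sum -c expects lines of the form "<hash>  <file>"
    printf '%s  %s\n' "$sha256" "$dest" | sha256sum -c --status -
}
```

A non-zero exit status means either the download or the checksum failed; in the latter case, delete the file before trying again, because an existing file is not downloaded a second time.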
Getting access to restricted datasets and models
Some datasets and models are not accessible by default on the Umbrella Cluster, because they require you to explicitly accept a license or agree to terms of use on the website of the dataset or model provider.
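Before requesting access, you can check whether your account can already read a dataset directory. This sketch assumes, without confirmation from this page, that a restricted dataset shows up under /dataset as a directory you cannot read or enter; dataset_access is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the current user can read a dataset
# directory. Exit status: 0 = accessible, 1 = present but not readable,
# 2 = path does not exist.
dataset_access() {
    local path=$1
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 2
    elif [ -r "$path" ] && [ -x "$path" ]; then
        echo "accessible: $path"
    else
        echo "restricted: $path"
        return 1
    fi
}
```

For example, run dataset_access /dataset/CAMELYON16 and inspect the printed status or the exit code.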
If you would like to access these datasets or models on the Umbrella Cluster, please contact the system administrators with a screenshot showing that the dataset or model provider has granted you access to the data.
Even if access to a dataset or model is not restricted, it usually still comes with a license and terms of use. By using the dataset or model, you agree to both the license and the terms of use.