SGEMM_GPU_kernel_performance

active ARFF CC BY 4.0 Visibility: public Uploaded 20-04-2022 by Sebastian Fischer
Dataset description

This data set measures the running time of a matrix-matrix product $A \times B = C$, where all matrices have size 2048 x 2048, using a parameterizable *SGEMM GPU* (Single Precision General Matrix Multiply) kernel with 241600 possible parameter combinations. For each tested combination, 4 runs were performed, and their results are reported in the last 4 columns. All times are measured in milliseconds\*.

There are 14 parameters: the first 10 are ordinal and each takes at most 4 distinct values, all powers of two; the last 4 are binary. Out of 1327104 total parameter combinations, only 241600 are feasible (due to various kernel constraints). This data set contains the results for all of these feasible combinations.

The experiment was run on a desktop workstation running Ubuntu 16.04 Linux with an Intel Core i5 (3.5GHz), 16GB RAM, and an NVidia Geforce GTX 680 4GB GF580 GTX-1.5GB GPU. We use the 'gemm_fast' kernel from the automatic OpenCL kernel tuning library 'CLTune' (https://github.com/CNugteren/CLTune).

\* *Note*: For this kind of data set it is usually better to work with the logarithm of the running times (see e.g. Falch and Elster, 'Machine learning-based auto-tuning for enhanced performance portability of OpenCL applications', 2015).

Attribute description

*Independent variables*

* MWG, NWG: per-matrix 2D tiling at workgroup level: {16, 32, 64, 128} (integer)
* KWG: inner dimension of 2D tiling at workgroup level: {16, 32} (integer)
* MDIMC, NDIMC: local workgroup size: {8, 16, 32} (integer)
* MDIMA, NDIMB: local memory shape: {8, 16, 32} (integer)
* KWI: kernel loop unrolling factor: {2, 8} (integer)
* VWM, VWN: per-matrix vector widths for loading and storing: {1, 2, 4, 8} (integer)
* STRM, STRN: enable stride for accessing off-chip memory within a single thread: {0, 1} (categorical)
* SA, SB: per-matrix manual caching of the 2D workgroup tile: {0, 1} (categorical)

*Output*

* Run1, Run2, Run3, Run4: performance times in milliseconds for 4 independent runs using the same parameters. They range between 13.25 and 3397.08. Run1 is used as the default target variable.

Related Studies

Rafael Ballester-Ripoll, Enrique G. Paredes, Renato Pajarola. Sobol Tensor Trains for Global Sensitivity Analysis. In arXiv Computer Science / Numerical Analysis e-prints, 2017, https://doi.org/10.1016/j.ress.2018.11.007

Authors

Enrique Paredes and Rafael Ballester-Ripoll. The original data was obtained from the UCI Machine Learning repository [Link](https://archive.ics.uci.edu/ml/datasets/sgemm+gpu+kernel+performance).

Citation

Please cite one of the following papers:

* Rafael Ballester-Ripoll, Enrique G. Paredes, Renato Pajarola. Sobol Tensor Trains for Global Sensitivity Analysis. In arXiv Computer Science / Numerical Analysis e-prints, 2017, https://arxiv.org/abs/1712.00233
* Cedric Nugteren and Valeriu Codreanu. CLTune: A Generic Auto-Tuner for OpenCL Kernels. In: MCSoC: 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip. IEEE, 2015, https://doi.org/10.1109/MCSoC.2015.10
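The total of 1327104 parameter combinations quoted in the dataset description can be checked by multiplying the sizes of the value sets listed in the attribute description. The following is a minimal sketch of that arithmetic; the dictionary name `PARAM_VALUES` and the variable names are illustrative choices, not part of the dataset or of CLTune.

```python
from math import prod

# Value sets copied from the attribute description.
# The dictionary and variable names are illustrative, not part of the dataset.
PARAM_VALUES = {
    "MWG": (16, 32, 64, 128),
    "NWG": (16, 32, 64, 128),
    "KWG": (16, 32),
    "MDIMC": (8, 16, 32),
    "NDIMC": (8, 16, 32),
    "MDIMA": (8, 16, 32),
    "NDIMB": (8, 16, 32),
    "KWI": (2, 8),
    "VWM": (1, 2, 4, 8),
    "VWN": (1, 2, 4, 8),
    "STRM": (0, 1),
    "STRN": (0, 1),
    "SA": (0, 1),
    "SB": (0, 1),
}

# Size of the full grid: product of the number of values per parameter.
total_combinations = prod(len(values) for values in PARAM_VALUES.values())

# Number of feasible combinations, as stated in the description.
feasible_combinations = 241600
feasible_fraction = feasible_combinations / total_combinations

print(total_combinations)  # 1327104
print(f"{feasible_fraction:.1%} of the grid is feasible")
```

The product matches the figure in the description, and the feasible combinations make up a little under a fifth of the full grid.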

15 features

Run1 (target): numeric, 58161 unique values, 0 missing
MWG: numeric, 4 unique values, 0 missing
NWG: numeric, 4 unique values, 0 missing
KWG: numeric, 2 unique values, 0 missing
MDIMC: numeric, 3 unique values, 0 missing
NDIMC: numeric, 3 unique values, 0 missing
MDIMA: numeric, 3 unique values, 0 missing
NDIMB: numeric, 3 unique values, 0 missing
KWI: numeric, 2 unique values, 0 missing
VWM: numeric, 4 unique values, 0 missing
VWN: numeric, 4 unique values, 0 missing
STRM: numeric, 2 unique values, 0 missing
STRN: numeric, 2 unique values, 0 missing
SA: numeric, 2 unique values, 0 missing
SB: numeric, 2 unique values, 0 missing
Run2 (ignore): numeric, 58269 unique values, 0 missing
Run3 (ignore): numeric, 58264 unique values, 0 missing
Run4 (ignore): numeric, 58154 unique values, 0 missing
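
The dataset description recommends working with the logarithm of the running times, and the feature list marks Run1 as the target while Run2 to Run4 are ignored. The sketch below shows two possible targets under those hints: the log of Run1 alone, and the log of the mean of all four runs. Averaging the runs is an assumption made here for illustration, not something the dataset prescribes, and the row values are made up (they only fall inside the reported range of 13.25 to 3397.08 ms).

```python
import math
from statistics import mean

RUN_COLUMNS = ("Run1", "Run2", "Run3", "Run4")

def log_run1(row):
    """Natural log of the default target, Run1 (milliseconds)."""
    return math.log(row["Run1"])

def log_mean_runtime(row):
    """Natural log of the mean over the four runs (an illustrative choice)."""
    return math.log(mean(row[name] for name in RUN_COLUMNS))

# Hypothetical row: these timings are made up for illustration only.
example_row = {"Run1": 100.0, "Run2": 102.0, "Run3": 98.0, "Run4": 100.0}

target_run1 = log_run1(example_row)
target_mean = log_mean_runtime(example_row)
print(target_run1, target_mean)
```

Using the mean of the four runs reduces the influence of a single noisy measurement, at the cost of departing from the default target defined for this dataset.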

19 properties

Number of instances (rows) of the dataset: 241600
Number of attributes (columns) of the dataset: 15
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 15
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 100
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: -95.49
Percentage of missing values: 0

1 task

0 runs - estimation_procedure: 33% Holdout set - target_feature: Run1
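
The task above names a 33% holdout estimation procedure with Run1 as the target. As a rough sketch of what such a split looks like for the 241600 rows, the code below shuffles row indices and reserves 33% of them for testing. The shuffling scheme, the seed, and the function name `holdout_split` are assumptions made here for illustration; the exact splits used by the task are defined by the platform and are not reproduced by this code.

```python
import random

N_INSTANCES = 241600   # number of rows, from the properties list
TEST_FRACTION = 0.33   # holdout share named in the task
SEED = 0               # arbitrary seed, chosen here for reproducibility

def holdout_split(n_rows, test_fraction, seed):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(N_INSTANCES, TEST_FRACTION, SEED)
print(len(train_idx), len(test_idx))
```

The two index lists are disjoint and together cover every row, so they can be used to select training and test rows from any in-memory copy of the data.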