Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark. Original description:
Author:
Source: Unknown -
Please cite:
The Computer Activity databases are a collection of computer systems
activity measures. The data was collected from a Sun Sparcstation
20/712 with 128 Mbytes of memory running in a multi-user university
department. Users would typically be doing a large variety of tasks
ranging from accessing the internet, editing files or running very
cpu-bound programs. The data was collected continuously on two
separate occasions. On both occassions, system activity was gathered
every 5 seconds. The final dataset is taken from both occasions with
equal numbers of observations coming from each collection epoch.
System measures used:
1. lread - Reads (transfers per second ) between system memory and user memory.
2. lwrite - writes (transfers per second) between system memory and user memory.
3. scall - Number of system calls of all types per second.
4. sread - Number of system read calls per second.
5. swrite - Number of system write calls per second .
6. fork - Number of system fork calls per second.
7. exec - Number of system exec calls per second.
8. rchar - Number of characters transferred per second by system read calls.
9. wchar - Number of characters transfreed per second by system write calls.
10. pgout - Number of page out requests per second.
11. ppgout - Number of pages, paged out per second.
12. pgfree - Number of pages per second placed on the free list.
13. pgscan - Number of pages checked if they can be freed per second.
14. atch - Number of page attaches (satisfying a page fault by reclaiming a page in memory) per second.
15. pgin - Number of page-in requests per second.
16. ppgin - Number of pages paged in per second.
17. pflt - Number of page faults caused by protection errors (copy-on-writes).
18. vflt - Number of page faults caused by address translation.
19. runqsz - Process run queue size.
20. freemem - Number of memory pages available to user processes.
21. freeswap - Number of disk blocks available for page swapping.
22. usr - Portion of time (%) that cpus run in user mode.
23. sys - Portion of time (%) that cpus run in system mode.
24. wio - Portion of time (%) that cpus are idle waiting for block IO.
25. idle - Portion of time (%) that cpus are otherwise idle.
The two different regression tasks obtained from these databases are:
CompAct
Predict usr, the portion of time that cpus run in user mode from all attributes 1-21.
CompAct(s)
Predict usr using a restricted number (excluding the paging information (10-18)
Source: collection of regression datasets by Luis Torgo (ltorgo@ncc.up.pt) at
http://www.ncc.up.pt/~ltorgo/Regression/DataSets.html
Original source: DELVE repository of data.
Characteristics: 8192 cases, 22 continuous attributes