OpenML
data-science-survey-on-Kaggle

Status: active; Format: ARFF; License: GPL 2; Visibility: public; Uploaded 23-03-2022 by Onur Yildirim
Tags: Computer Systems, Machine Learning
Context
Data science offers many languages and tools for completing a given task. Although you can often use whichever tool you prefer, analysts frequently need to work on similar platforms so that they can share code with one another. Learning what professionals in the data science industry use at work gives you a better understanding of what you may be asked to do in the future.

Content
This project examines which tools and languages professionals use in their day-to-day work. The data comes from the Kaggle Data Science Survey, which includes responses from over 10,000 people who write code to analyze data in their daily work.

Acknowledgements
Kaggle and DataCamp helped me with the dataset.

4 features

Respondent (ignore): numeric, 10153 unique values, 0 missing
WorkToolsSelect: string, 5248 unique values, 2198 missing
LanguageRecommendationSelect: string, 13 unique values, 3619 missing
EmployerIndustry: string, 16 unique values, 1155 missing
WorkAlgorithmsSelect: string, 1420 unique values, 2852 missing
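
The large number of unique values in WorkToolsSelect and WorkAlgorithmsSelect suggests that each cell holds a combination of selected options rather than a single one. The following is a minimal sketch of how such a column could be tallied, assuming the options are comma-separated (the page does not state the delimiter). The function name `count_selections` and the sample answers are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def count_selections(rows, sep=","):
    """Count how many answers mention each option in a multi-select column.

    Assumes options are joined by `sep`; missing answers (None or blank)
    are skipped.
    """
    counts = Counter()
    for row in rows:
        if row is None or not row.strip():
            continue  # missing answer
        # Use a set so a respondent counts at most once per option.
        options = {part.strip() for part in row.split(sep) if part.strip()}
        counts.update(options)
    return counts


# Hypothetical sample answers, for illustration only.
sample = ["Python,R,SQL", "R", None, "Python, SQL", ""]
tool_counts = count_selections(sample)
print(tool_counts)
```

If the real column uses a different delimiter, only the `sep` argument would need to change.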

19 properties

Number of instances (rows) of the dataset: 10153
Number of attributes (columns) of the dataset: 4
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 9824
Number of instances with at least one value missing: 4162
Number of numeric attributes: 0
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 0
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 40.99
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 24.19
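
The missing-value figures on this page can be cross-checked against each other. The sketch below uses only numbers listed on the page; the variable names are illustrative. It assumes the percentage of missing values is computed over the 4 counted attributes (excluding the ignored Respondent column), which is the denominator that reproduces the listed 24.19.

```python
# Figures copied from the feature and property listings on this page.
n_instances = 10153
n_attributes = 4  # the ignored Respondent column is not counted
instances_with_missing = 4162

missing_per_feature = {
    "WorkToolsSelect": 2198,
    "LanguageRecommendationSelect": 3619,
    "EmployerIndustry": 1155,
    "WorkAlgorithmsSelect": 2852,
}

# Total missing cells across the four counted features.
total_missing = sum(missing_per_feature.values())

# Share of all cells that are missing, as a percentage.
pct_missing_values = round(100 * total_missing / (n_instances * n_attributes), 2)

# Share of rows with at least one missing value, as a percentage.
pct_instances_missing = round(100 * instances_with_missing / n_instances, 2)

print(total_missing, pct_missing_values, pct_instances_missing)
```

The per-feature counts sum to the listed total of 9824, and both percentages match the listed values.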

0 tasks
