Govt.-of-India-Census-2001-District-Wise

Status: active. Format: ARFF. License: Database: Open Database, Contents: Database Contents. Visibility: public. Uploaded 24-03-2022 by Dustin Carrion.


Context
Census of India is a rich database that can tell the stories of over a billion Indians. It matters not only for research but also commercially, for organizations that want to understand India's complex yet strongly knit heterogeneity. However, nowhere on the web is there a single database that combines the district-wise information for all the variables (most include no more than 4-5 of over 50 variables). Extracting and using data from Census of India 2001 is laborious because the data is published as scattered, district-wise PDFs. Individual PDFs can be extracted from http://www.censusindia.gov.in/(S(ogvuk1y2e5sueoyc5eyc0g55))/Tables_Published/Basic_Data_Sheet.aspx.

Content
This database has been extracted from the Census of 2001 and includes data for 590 districts, with around 80 variables each. If the meaning of a variable is unclear, refer to the following PDF: http://censusindia.gov.in/Dist_File/datasheet-2923.pdf
All the extraction work can be found at https://github.com/preetskhalsa97/census2001auto
The final CSV can be found at finalCSV/all.csv
The subtle hack that automated most of the extraction was that the URLs of all the PDFs were identical except for four digits (the respective state and district codes).

A few abbreviations used for states:
AN: Andaman and Nicobar
CG: Chhattisgarh
DD: Daman and Diu
DN_H: Dadra and Nagar Haveli
JK: Jammu and Kashmir
MP: Madhya Pradesh
TN: Tamil Nadu
UP: Uttar Pradesh
WB: West Bengal

A few variables for clarification:
Growth..1991...2001.: population growth from 1991 to 2001
X0...4.years: people in the age group 0 to 4 years
SC.1: Scheduled Caste with the highest population

Inspiration
This is a massive dataset that can be used to explain the interplay between education, caste, development, gender and much more. It can explain a lot about India and propel data-driven research.
Happy Number Crunching!
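The URL trick described under Content can be sketched as follows. This is only an illustration, not the code from the linked repository: the function names are hypothetical, and the layout of the four digits (two-digit state code followed by two-digit district code) is an assumption inferred from the example URL ending in datasheet-2923.pdf. The sketch only builds URL strings and does not download anything.

```python
# Pattern taken from the example PDF URL in the description; only the
# four-digit code is assumed to vary between districts.
BASE_URL = "http://censusindia.gov.in/Dist_File/datasheet-{code}.pdf"


def datasheet_url(state_code: int, district_code: int) -> str:
    """Build the PDF URL for one district.

    Assumption: the four digits are a zero-padded two-digit state code
    followed by a zero-padded two-digit district code.
    """
    if not (0 <= state_code <= 99 and 0 <= district_code <= 99):
        raise ValueError("state and district codes must fit in two digits")
    return BASE_URL.format(code=f"{state_code:02d}{district_code:02d}")


def candidate_urls(state_code: int, max_district: int) -> list[str]:
    """List candidate URLs for districts 1..max_district of one state."""
    return [datasheet_url(state_code, d) for d in range(1, max_district + 1)]


if __name__ == "__main__":
    print(datasheet_url(29, 23))
    print(candidate_urls(1, 3))
```

Not every generated code necessarily corresponds to a published PDF, so a real crawler would need to handle missing files.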

82 features

Unnamed:_0 (numeric): 590 unique values, 0 missing
State (string): 35 unique values, 0 missing
District (string): 589 unique values, 0 missing
Persons (numeric): 590 unique values, 0 missing
Males (numeric): 590 unique values, 0 missing
Females (numeric): 590 unique values, 0 missing
Growth..1991...2001. (string): 540 unique values, 0 missing
Rural (string): 582 unique values, 0 missing
Urban (string): 60 unique values, 522 missing
Scheduled.Caste.population (string): 14 unique values, 563 missing
Percentage...SC.to.total (string): 13 unique values, 562 missing
Number.of.households (numeric): 587 unique values, 3 missing
Household.size..per.household. (numeric): 5 unique values, 3 missing
Sex.ratio..females.per.1000.males. (numeric): 219 unique values, 3 missing
Sex.ratio..0.6.years. (numeric): 160 unique values, 3 missing
Scheduled.Tribe.population (string): 538 unique values, 3 missing
Percentage.to.total.population..ST. (string): 406 unique values, 3 missing
Persons..literate (numeric): 590 unique values, 0 missing
Males..Literate (numeric): 590 unique values, 0 missing
Females..Literate (numeric): 590 unique values, 0 missing
Persons..literacy.rate (numeric): 551 unique values, 0 missing
Males..Literatacy.Rate (numeric): 557 unique values, 0 missing
Females..Literacy.Rate (numeric): 557 unique values, 0 missing
Total.Educated (numeric): 587 unique values, 3 missing
Data.without.level (numeric): 582 unique values, 3 missing
Below.Primary (numeric): 587 unique values, 3 missing
Primary (numeric): 587 unique values, 3 missing
Middle (numeric): 587 unique values, 3 missing
Matric.Higher.Secondary.Diploma (numeric): 587 unique values, 3 missing
Graduate.and.Above (numeric): 586 unique values, 3 missing
X0...4.years (numeric): 586 unique values, 3 missing
X5...14.years (numeric): 587 unique values, 3 missing
X15...59.years (numeric): 587 unique values, 3 missing
X60.years.and.above..Incl..A.N.S.. (numeric): 586 unique values, 3 missing
Total.workers (numeric): 590 unique values, 0 missing
Main.workers (numeric): 590 unique values, 0 missing
Marginal.workers (numeric): 589 unique values, 0 missing
Non.workers (numeric): 590 unique values, 0 missing
SC.1.Name (string): 76 unique values, 51 missing
SC.1.Population (numeric): 575 unique values, 13 missing
SC.2.Name (string): 111 unique values, 64 missing
SC.2.Population (numeric): 570 unique values, 13 missing
SC.3.Name (string): 128 unique values, 66 missing
SC.3.Population (numeric): 562 unique values, 13 missing
Religeon.1.Name (string): 6 unique values, 53 missing
Religeon.1.Population (numeric): 590 unique values, 0 missing
Religeon.2.Name (string): 6 unique values, 53 missing
Religeon.2.Population (numeric): 590 unique values, 0 missing
Religeon.3.Name (string): 8 unique values, 53 missing
Religeon.3.Population (numeric): 586 unique values, 0 missing
ST.1.Name (string): 118 unique values, 0 missing
ST.1.Population (string): 527 unique values, 51 missing
ST.2.Name (string): 154 unique values, 50 missing
ST.2.Population (string): 511 unique values, 53 missing
ST.3.Name (string): 170 unique values, 50 missing
ST.3.Population (string): 475 unique values, 52 missing
Imp.Town.1.Name (string): 528 unique values, 21 missing
Imp.Town.1.Population (numeric): 576 unique values, 14 missing
Imp.Town.2.Name (string): 480 unique values, 97 missing
Imp.Town.2.Population (numeric): 526 unique values, 64 missing
Imp.Town.3.Name (string): 432 unique values, 142 missing
Imp.Town.3.Population (numeric): 480 unique values, 108 missing
Total.Inhabited.Villages (numeric): 521 unique values, 12 missing
Drinking.water.facilities (numeric): 515 unique values, 12 missing
Safe.Drinking.water (numeric): 522 unique values, 12 missing
Electricity..Power.Supply. (numeric): 480 unique values, 12 missing
Electricity..domestic. (string): 375 unique values, 12 missing
Electricity..Agriculture. (string): 95 unique values, 208 missing
Primary.school (numeric): 487 unique values, 12 missing
Middle.schools (string): 361 unique values, 12 missing
Secondary.Sr.Secondary.schools (string): 286 unique values, 12 missing
College (string): 34 unique values, 52 missing
Medical.facility (string): 412 unique values, 12 missing
Primary.Health.Centre (string): 98 unique values, 12 missing
Primary.Health.Sub.Centre (string): 270 unique values, 12 missing
Post..telegraph.and.telephone.facility (numeric): 422 unique values, 12 missing
Bus.services (string): 401 unique values, 12 missing
Paved.approach.road (string): 458 unique values, 12 missing
Mud.approach.road (string): 469 unique values, 12 missing
Permanent.House (numeric): 437 unique values, 0 missing
Semi.permanent.House (numeric): 413 unique values, 0 missing
Temporary.House (numeric): 325 unique values, 0 missing
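Several count-like columns in the feature list (for example Rural, Urban and Scheduled.Tribe.population) are typed as string rather than numeric. The page does not say why. The sketch below assumes the cause is stray formatting such as thousands separators, percent signs or textual placeholders, which is a guess that should be checked against the actual values. The helper name to_number is hypothetical.

```python
from typing import Optional


def to_number(raw: Optional[str]) -> Optional[float]:
    """Best-effort conversion of a string cell to a float.

    Assumption: non-numeric cells differ from numbers only by commas,
    surrounding whitespace or a trailing percent sign. Anything else
    (for example a textual placeholder) is treated as missing.
    """
    if raw is None:
        return None
    cleaned = raw.strip().replace(",", "").rstrip("%").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Empty strings and placeholders end up here.
        return None


if __name__ == "__main__":
    for cell in ["1,234", " 12.5% ", "N.A.", ""]:
        print(repr(cell), "->", to_number(cell))
```

Treating unparseable cells as missing keeps the conversion lossless for valid numbers while making the remaining problem cells easy to count.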

19 properties

Number of instances (rows) of the dataset: 590
Number of attributes (columns) of the dataset: 82
Number of distinct values of the target attribute (if it is nominal): (no value shown)
Number of missing values in the dataset: 3219
Number of instances with at least one value missing: 582
Number of numeric attributes: 47
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 98.64
Percentage of missing values: 6.65
Average class difference between consecutive instances: (no value shown)
Percentage of numeric attributes: 57.32
Number of attributes divided by the number of instances: 0.14
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: (no value shown)
Number of instances belonging to the most frequent class: (no value shown)
Percentage of instances belonging to the least frequent class: (no value shown)
Number of instances belonging to the least frequent class: (no value shown)
Number of binary attributes: 0
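The percentage and ratio properties listed above follow directly from the raw counts. As a quick consistency check, the short sketch below recomputes them from the counts reported on this page. The variable names are illustrative only, and rounding to two decimals is an assumption about how the page displays the values.

```python
# Raw counts as reported in the properties list.
n_rows = 590             # instances
n_cols = 82              # attributes
n_missing = 3219         # missing cells
n_rows_with_missing = 582
n_numeric = 47

# Derived properties, rounded to two decimals as displayed on the page.
pct_rows_with_missing = round(100 * n_rows_with_missing / n_rows, 2)
pct_missing_cells = round(100 * n_missing / (n_rows * n_cols), 2)
pct_numeric = round(100 * n_numeric / n_cols, 2)
dimensionality = round(n_cols / n_rows, 2)

if __name__ == "__main__":
    print(pct_rows_with_missing, pct_missing_cells, pct_numeric, dimensionality)
```

All four recomputed values match the listed properties, which suggests the counts and percentages on the page are internally consistent.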

0 tasks
