Commit ca0977cb authored by Liwen Huang

Upload New File

parent 55669682
%% Cell type:markdown id: tags:
## Question 1
Write a function called proportion_of_education that returns the proportion of children in the dataset whose mother's education level was less than high school (<12), high school (12), more than high school but not a college graduate (>12), or a college degree.
The function should return a dictionary of the following form (use the correct numbers and do not round them):
{"less than high school":0.2,
"high school":0.4,
"more than high school but not college":0.2,
"college":0.2}
%% Cell type:code id: tags:
``` python
import pandas as pd
import numpy as np
df = pd.read_csv('datasets/NISPUF17.csv', index_col = 0)
df.head()
```
%%%% Output: execute_result
SEQNUMC SEQNUMHH PDAT PROVWT_D RDDWT_D STRATUM YEAR AGECPOXR \
1 128521 12852 2 NaN 235.916956 1031 2017 NaN
2 10741 1074 2 NaN 957.353840 1068 2017 NaN
3 220011 22001 2 NaN 189.611299 1050 2017 NaN
4 86131 8613 1 675.430817 333.447418 1040 2017 NaN
5 227141 22714 1 482.617748 278.768063 1008 2017 NaN
HAD_CPOX AGEGRP ... XVRCTY2 XVRCTY3 XVRCTY4 XVRCTY5 XVRCTY6 \
1 2 1 ... NaN NaN NaN
2 2 1 ... NaN NaN NaN
3 2 3 ... NaN NaN NaN
4 2 1 ... NaN NaN NaN
5 2 1 ... NaN NaN NaN
XVRCTY7 XVRCTY8 XVRCTY9 INS_STAT2_I INS_BREAK_I
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN 1.0 2.0
5 NaN NaN NaN 2.0 1.0
[5 rows x 453 columns]
%% Cell type:code id: tags:
``` python
edu = df['EDUC1']
edu.head()
```
%%%% Output: execute_result
1 4
2 3
3 3
4 4
5 1
Name: EDUC1, dtype: int64
%% Cell type:code id: tags:
``` python
mum_edu = np.sort(edu.values)
mum_edu
```
%%%% Output: execute_result
array([1, 1, 1, ..., 4, 4, 4], dtype=int64)
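%% Cell type:markdown id: tags:
Sorting is not required before counting the codes. As a possible shortcut, pandas can report the share of each code directly with `Series.value_counts(normalize=True)`. The sketch below uses a small made-up series (`toy_edu`, a hypothetical name) rather than the NISPUF17 data. Note that `value_counts` drops missing values by default, so its denominator can differ from `len()` when a column contains NaN.
%% Cell type:code id: tags:
```python
import pandas as pd

# Made-up EDUC1-style codes for illustration only (not NISPUF17 data).
toy_edu = pd.Series([4, 3, 3, 4, 1, 2, 4, 4], name="EDUC1")

# normalize=True returns each code's share of the non-missing values;
# sort_index() orders the result by education code instead of by frequency.
shares = toy_edu.value_counts(normalize=True).sort_index()
print(shares)
```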
%% Cell type:code id: tags:
``` python
prop_edu = {"less than high school":0.00,
"high school":0,
"more than high school but not college":0,
"college":0}
prop_edu
```
%%%% Output: execute_result
{'less than high school': 0.0,
'high school': 0,
'more than high school but not college': 0,
'college': 0}
%% Cell type:code id: tags:
``` python
n = len(mum_edu)
```
%% Cell type:code id: tags:
``` python
prop_edu["less than high school"] = np.sum(mum_edu == 1)/n
prop_edu
```
%%%% Output: execute_result
{'less than high school': 0.10202002459160373,
'high school': 0,
'more than high school but not college': 0,
'college': 0}
%% Cell type:code id: tags:
``` python
prop_edu["high school"] = np.sum(mum_edu == 2)/n
prop_edu
```
%%%% Output: execute_result
{'less than high school': 0.10202002459160373,
'high school': 0.172352011241876,
'more than high school but not college': 0,
'college': 0}
%% Cell type:code id: tags:
``` python
prop_edu["more than high school but not college"] = np.sum(mum_edu == 3)/n
prop_edu["college"] = np.sum(mum_edu == 4)/n
prop_edu
```
%%%% Output: execute_result
{'less than high school': 0.10202002459160373,
'high school': 0.172352011241876,
'more than high school but not college': 0.24588090637625154,
'college': 0.47974705779026877}
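%% Cell type:markdown id: tags:
The cells above compute the four proportions step by step, but the question asks for a function. One way to wrap the same logic is sketched below. It assumes, as the cells above do, that `EDUC1` codes 1 to 4 correspond to the four categories in order. Taking the DataFrame as a parameter and the `EDUC_LABELS` name are choices made here so the sketch can be tried on a toy frame; they are not part of the assignment text.
%% Cell type:code id: tags:
```python
import pandas as pd

# Mapping from EDUC1 codes to category labels, following the cells above.
EDUC_LABELS = {
    1: "less than high school",
    2: "high school",
    3: "more than high school but not college",
    4: "college",
}

def proportion_of_education(df):
    """Return the share of children in each maternal education category."""
    edu = df["EDUC1"]
    n = len(edu)
    # Count rows matching each code and divide by the total number of rows.
    return {label: float((edu == code).sum() / n)
            for code, label in EDUC_LABELS.items()}

# Tiny made-up frame, for illustration only (not NISPUF17 data).
toy = pd.DataFrame({"EDUC1": [1, 2, 2, 3, 4]})
result = proportion_of_education(toy)
print(result)
```
With the real data, the call would be `proportion_of_education(df)` using the `df` loaded at the top of the notebook.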
%% Cell type:code id: tags:
``` python
print(prop_edu)
```
%%%% Output: error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-469d995073e1> in <module>
----> 1 print(round(prop_edu, 2))
TypeError: type dict doesn't define __round__ method
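%% Cell type:markdown id: tags:
The traceback above appears to come from an earlier run of the cell that called `round(prop_edu, 2)`. The built-in `round` works on numbers, not on a dict, hence the `TypeError`. The question asks for unrounded values, so rounding is only useful for display. A minimal sketch of rounding each value separately, using the proportions computed above and a hypothetical name `rounded`:
%% Cell type:code id: tags:
```python
# Proportions copied from the notebook output above.
prop_edu = {
    "less than high school": 0.10202002459160373,
    "high school": 0.172352011241876,
    "more than high school but not college": 0.24588090637625154,
    "college": 0.47974705779026877,
}

# Round each value on its own; the original dict is left unchanged.
rounded = {key: round(value, 2) for key, value in prop_edu.items()}
print(rounded)
```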
%% Cell type:code id: tags:
``` python
```