In this post, we'll analyze the diabetes dataset using the techniques from the previous posts.
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.utils import all_estimators
import warnings
warnings.filterwarnings('ignore')
datasets = load_diabetes()
x = datasets.data
y = datasets.target.reshape(-1,1)
print(x.shape,y.shape)
(442, 10) (442, 1)
As before, after loading the dataset I assigned the feature data to x and the target data to y. The diabetes dataset has 442 rows.
print(datasets.feature_names)
print(datasets.DESCR)
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
.. _diabetes_dataset:
Diabetes dataset
----------------
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
**Data Set Characteristics:**
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attribute Information:
- age age in years
- sex
- bmi body mass index
- bp average blood pressure
- s1 tc, T-Cells (a type of white blood cells)
- s2 ldl, low-density lipoproteins
- s3 hdl, high-density lipoproteins
- s4 tch, thyroid stimulating hormone
- s5 ltg, lamotrigine
- s6 glu, blood sugar level
Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).
Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
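The note in the description says every feature column is mean centered and scaled so that its sum of squares is 1. Here is a minimal sketch to check that claim, assuming the default `load_diabetes()` output (the variable names are just for illustration):

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Load the features the same way the post does (already centered and scaled).
X = load_diabetes().data

col_means = X.mean(axis=0)          # should be close to 0 for every column
col_sum_sq = (X ** 2).sum(axis=0)   # should be close to 1 for every column

print(np.allclose(col_means, 0.0, atol=1e-6))
print(np.allclose(col_sum_sq, 1.0, atol=1e-6))
```

This is why the values in the table are all small numbers such as 0.038 rather than ages in years or raw blood pressure readings.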
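# Copy the feature names so that appending the target label does not mutate datasets.feature_names
columns = list(datasets.feature_names)
columns.append("Target(Diabetes dataset)")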
data = np.concatenate([x,y],axis=1)
dataframe = pd.DataFrame(data,columns = columns)
dataframe
| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | Target(Diabetes dataset) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019908 | -0.017646 | 151.0 |
1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068330 | -0.092204 | 75.0 |
2 | 0.085299 | 0.050680 | 0.044451 | -0.005671 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002864 | -0.025930 | 141.0 |
3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022692 | -0.009362 | 206.0 |
4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031991 | -0.046641 | 135.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
437 | 0.041708 | 0.050680 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 | 178.0 |
438 | -0.005515 | 0.050680 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018118 | 0.044485 | 104.0 |
439 | 0.041708 | 0.050680 | -0.015906 | 0.017282 | -0.037344 | -0.013840 | -0.024993 | -0.011080 | -0.046879 | 0.015491 | 132.0 |
440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.026560 | 0.044528 | -0.025930 | 220.0 |
441 | -0.045472 | -0.044642 | -0.073030 | -0.081414 | 0.083740 | 0.027809 | 0.173816 | -0.039493 | -0.004220 | 0.003064 | 57.0 |
442 rows × 11 columns
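As a side note, a similar frame can be obtained without concatenating by hand. This is a sketch that assumes a scikit-learn version supporting the `as_frame` parameter (0.23 or later) with pandas installed. The target column is then named `target` rather than the custom label used above.

```python
from sklearn.datasets import load_diabetes

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays.
bunch = load_diabetes(as_frame=True)
frame = bunch.frame   # 10 feature columns plus a final "target" column

print(frame.shape)
print(frame.columns.tolist())
```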
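Looking at the dataset, the value we need to predict is a number rather than a class label (something to classify), so this is a regression problem.
As before, we'll slice x and y back out of the frame shown above.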
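The "numeric, not a class" observation can also be checked quickly in code. The sketch below counts distinct target values. The 20% threshold is a hypothetical rule of thumb introduced only for this illustration, not something from scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_diabetes

y = load_diabetes().target

n_unique = np.unique(y).size
print(y.dtype, n_unique, y.min(), y.max())

# Hypothetical rule of thumb (an assumption for this sketch): many distinct
# numeric values relative to the sample size suggests a regression target.
looks_like_regression = n_unique > 0.2 * y.size
print(looks_like_regression)
```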
datasets = dataframe.values
x = datasets[:,:-1]
y = datasets[:,-1]
print(x.shape,y.shape)
(442, 10) (442,)
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2)
print(x_train.shape,x_test.shape)
print(y_train.shape,y_test.shape)
(353, 10) (89, 10)
(353,) (89,)
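The split above is random, so the R2 scores further down will change from run to run. If a reproducible split is wanted, one option is to pass `random_state`. In this sketch the seed 42 is an arbitrary choice, and the post's own numbers were not produced with it.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# random_state=42 is an arbitrary seed chosen for this sketch.
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)

x_train, x_test, y_train, y_test = split_a
print(x_train.shape, x_test.shape)

# With the same seed, both calls return identical arrays.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)
```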
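all_Algorithm = all_estimators(type_filter='regressor')
scaler_list = [StandardScaler(), MinMaxScaler()]
best_r2 = []
for scaler in scaler_list:
    # Fit each scaler on the raw training split and keep the scaled copies in
    # separate variables, so one scaler's output is never fed into the next.
    scaler.fit(x_train)
    x_train_scaled = scaler.transform(x_train)
    x_test_scaled = scaler.transform(x_test)
    for (name, algorithm) in all_Algorithm:
        try:
            model = algorithm()
            model.fit(x_train_scaled, y_train)
            y_pred = model.predict(x_test_scaled)
            r2 = r2_score(y_test, y_pred)
            best_r2.append(r2)
            print(name, "\'s R2 : ", r2)
            print("-----------------------------------------------")
            print(r2, max(best_r2))
            # r2 was just appended, so this holds exactly when r2 is the best score so far
            if max(best_r2) <= r2:
                final_r2 = r2
                final_r2_model = name
                final_r2_scaler = scaler
        except Exception:
            # Skip estimators that cannot be built with default arguments or fail to fit
            continue
# Report the scaler stored with the best score, not the last scaler in the loop
print("Best R2 & model & scaler: ", final_r2, " & ", final_r2_model, "&", final_r2_scaler)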
ARDRegression 's R2 : 0.3750119550784249
-----------------------------------------------
0.3750119550784249 0.3750119550784249
AdaBoostRegressor 's R2 : 0.35328796754369163
-----------------------------------------------
0.35328796754369163 0.3750119550784249
BaggingRegressor 's R2 : 0.31602822849489987
-----------------------------------------------
0.31602822849489987 0.3750119550784249
BayesianRidge 's R2 : 0.37353709794963996
-----------------------------------------------
0.37353709794963996 0.3750119550784249
CCA 's R2 : 0.3645458498695413
-----------------------------------------------
0.3645458498695413 0.3750119550784249
DecisionTreeRegressor 's R2 : -0.5894581868323021
-----------------------------------------------
-0.5894581868323021 0.3750119550784249
DummyRegressor 's R2 : -0.00123645803999195
-----------------------------------------------
-0.00123645803999195 0.3750119550784249
ElasticNet 's R2 : 0.3790340028177188
-----------------------------------------------
0.3790340028177188 0.3790340028177188
ElasticNetCV 's R2 : 0.3754605025485025
-----------------------------------------------
0.3754605025485025 0.3790340028177188
ExtraTreeRegressor 's R2 : -0.3050881139739763
-----------------------------------------------
-0.3050881139739763 0.3790340028177188
ExtraTreesRegressor 's R2 : 0.3497806996920655
-----------------------------------------------
0.3497806996920655 0.3790340028177188
GammaRegressor 's R2 : 0.3701174806186931
-----------------------------------------------
0.3701174806186931 0.3790340028177188
GaussianProcessRegressor 's R2 : -1.069012594401333
-----------------------------------------------
-1.069012594401333 0.3790340028177188
GradientBoostingRegressor 's R2 : 0.33250985042993264
-----------------------------------------------
0.33250985042993264 0.3790340028177188
HistGradientBoostingRegressor 's R2 : 0.2751147773065651
-----------------------------------------------
0.2751147773065651 0.3790340028177188
HuberRegressor 's R2 : 0.34174750477637783
-----------------------------------------------
0.34174750477637783 0.3790340028177188
KNeighborsRegressor 's R2 : 0.25446926061787567
-----------------------------------------------
0.25446926061787567 0.3790340028177188
KernelRidge 's R2 : -4.165048939009507
-----------------------------------------------
-4.165048939009507 0.3790340028177188
Lars 's R2 : 0.34837209873152053
-----------------------------------------------
0.34837209873152053 0.3790340028177188
LarsCV 's R2 : 0.40257159246468
-----------------------------------------------
0.40257159246468 0.40257159246468
Lasso 's R2 : 0.38595024820903034
-----------------------------------------------
0.38595024820903034 0.40257159246468
LassoCV 's R2 : 0.35338208122936043
-----------------------------------------------
0.35338208122936043 0.40257159246468
LassoLars 's R2 : 0.35382890509034337
-----------------------------------------------
0.35382890509034337 0.40257159246468
LassoLarsCV 's R2 : 0.3531078796996593
-----------------------------------------------
0.3531078796996593 0.40257159246468
LassoLarsIC 's R2 : 0.39914059916427813
-----------------------------------------------
0.39914059916427813 0.40257159246468
LinearRegression 's R2 : 0.34837209873152
-----------------------------------------------
0.34837209873152 0.40257159246468
LinearSVR 's R2 : 0.23760072727231019
-----------------------------------------------
0.23760072727231019 0.40257159246468
MLPRegressor 's R2 : -1.25918081681785
-----------------------------------------------
-1.25918081681785 0.40257159246468
NuSVR 's R2 : 0.14418144713660686
-----------------------------------------------
0.14418144713660686 0.40257159246468
OrthogonalMatchingPursuit 's R2 : 0.30876981915826296
-----------------------------------------------
0.30876981915826296 0.40257159246468
OrthogonalMatchingPursuitCV 's R2 : 0.34286930368891755
-----------------------------------------------
0.34286930368891755 0.40257159246468
PLSCanonical 's R2 : -2.0847428167722253
-----------------------------------------------
-2.0847428167722253 0.40257159246468
PLSRegression 's R2 : 0.33865842534282786
-----------------------------------------------
0.33865842534282786 0.40257159246468
PassiveAggressiveRegressor 's R2 : 0.30424743439568547
-----------------------------------------------
0.30424743439568547 0.40257159246468
PoissonRegressor 's R2 : 0.39452724195999944
-----------------------------------------------
0.39452724195999944 0.40257159246468
RANSACRegressor 's R2 : 0.07825356015037255
-----------------------------------------------
0.07825356015037255 0.40257159246468
RandomForestRegressor 's R2 : 0.3917103268004255
-----------------------------------------------
0.3917103268004255 0.40257159246468
Ridge 's R2 : 0.3564366035510431
-----------------------------------------------
0.3564366035510431 0.40257159246468
RidgeCV 's R2 : 0.3494465130272413
-----------------------------------------------
0.3494465130272413 0.40257159246468
SGDRegressor 's R2 : 0.37452975226115104
-----------------------------------------------
0.37452975226115104 0.40257159246468
SVR 's R2 : 0.13622570343080198
-----------------------------------------------
0.13622570343080198 0.40257159246468
TheilSenRegressor 's R2 : 0.36489787952814867
-----------------------------------------------
0.36489787952814867 0.40257159246468
TransformedTargetRegressor 's R2 : 0.34837209873152
-----------------------------------------------
0.34837209873152 0.40257159246468
TweedieRegressor 's R2 : 0.36243293782488917
-----------------------------------------------
0.36243293782488917 0.40257159246468
ARDRegression 's R2 : 0.3750102764327631
-----------------------------------------------
0.3750102764327631 0.40257159246468
AdaBoostRegressor 's R2 : 0.37264064070142944
-----------------------------------------------
0.37264064070142944 0.40257159246468
BaggingRegressor 's R2 : 0.31819177735841186
-----------------------------------------------
0.31819177735841186 0.40257159246468
BayesianRidge 's R2 : 0.3752167808635509
-----------------------------------------------
0.3752167808635509 0.40257159246468
CCA 's R2 : 0.3645458498695415
-----------------------------------------------
0.3645458498695415 0.40257159246468
DecisionTreeRegressor 's R2 : -0.5959577953403712
-----------------------------------------------
-0.5959577953403712 0.40257159246468
DummyRegressor 's R2 : -0.00123645803999195
-----------------------------------------------
-0.00123645803999195 0.40257159246468
ElasticNet 's R2 : 0.12758535659494175
-----------------------------------------------
0.12758535659494175 0.40257159246468
ElasticNetCV 's R2 : 0.3871581150993674
-----------------------------------------------
0.3871581150993674 0.40257159246468
ExtraTreeRegressor 's R2 : -0.3061839524050296
-----------------------------------------------
-0.3061839524050296 0.40257159246468
ExtraTreesRegressor 's R2 : 0.388669846985901
-----------------------------------------------
0.388669846985901 0.40257159246468
GammaRegressor 's R2 : 0.0808616210477957
-----------------------------------------------
0.0808616210477957 0.40257159246468
GaussianProcessRegressor 's R2 : -18.53318219435217
-----------------------------------------------
-18.53318219435217 0.40257159246468
GradientBoostingRegressor 's R2 : 0.32579797431094626
-----------------------------------------------
0.32579797431094626 0.40257159246468
HistGradientBoostingRegressor 's R2 : 0.27523810100653423
-----------------------------------------------
0.27523810100653423 0.40257159246468
HuberRegressor 's R2 : 0.34367339121978224
-----------------------------------------------
0.34367339121978224 0.40257159246468
KNeighborsRegressor 's R2 : 0.24594913899940074
-----------------------------------------------
0.24594913899940074 0.40257159246468
KernelRidge 's R2 : 0.3628155857813502
-----------------------------------------------
0.3628155857813502 0.40257159246468
Lars 's R2 : 0.3483720987315203
-----------------------------------------------
0.3483720987315203 0.40257159246468
LarsCV 's R2 : 0.40257159246468
-----------------------------------------------
0.40257159246468 0.40257159246468
Lasso 's R2 : 0.416032632027838
-----------------------------------------------
0.416032632027838 0.416032632027838
LassoCV 's R2 : 0.3540874239008369
-----------------------------------------------
0.3540874239008369 0.416032632027838
LassoLars 's R2 : 0.3538289050903437
-----------------------------------------------
0.3538289050903437 0.416032632027838
LassoLarsCV 's R2 : 0.3531078796996586
-----------------------------------------------
0.3531078796996586 0.416032632027838
LassoLarsIC 's R2 : 0.39914059916427813
-----------------------------------------------
0.39914059916427813 0.416032632027838
LinearRegression 's R2 : 0.3483720987315203
-----------------------------------------------
0.3483720987315203 0.416032632027838
LinearSVR 's R2 : 0.18218041356580628
-----------------------------------------------
0.18218041356580628 0.416032632027838
MLPRegressor 's R2 : -0.7391046842747242
-----------------------------------------------
-0.7391046842747242 0.416032632027838
NuSVR 's R2 : 0.1330302415587895
-----------------------------------------------
0.1330302415587895 0.416032632027838
OrthogonalMatchingPursuit 's R2 : 0.30876981915826285
-----------------------------------------------
0.30876981915826285 0.416032632027838
OrthogonalMatchingPursuitCV 's R2 : 0.3428693036889173
-----------------------------------------------
0.3428693036889173 0.416032632027838
PLSCanonical 's R2 : -2.0847428167722244
-----------------------------------------------
-2.0847428167722244 0.416032632027838
PLSRegression 's R2 : 0.33865842534282786
-----------------------------------------------
0.33865842534282786 0.416032632027838
PassiveAggressiveRegressor 's R2 : 0.33870828293724464
-----------------------------------------------
0.33870828293724464 0.416032632027838
PoissonRegressor 's R2 : 0.3892955288595463
-----------------------------------------------
0.3892955288595463 0.416032632027838
RANSACRegressor 's R2 : 0.011896314581101852
-----------------------------------------------
0.011896314581101852 0.416032632027838
RadiusNeighborsRegressor 's R2 : 0.1656451755381757
-----------------------------------------------
0.1656451755381757 0.416032632027838
RandomForestRegressor 's R2 : 0.35712332450451356
-----------------------------------------------
0.35712332450451356 0.416032632027838
Ridge 's R2 : 0.3795714326643239
-----------------------------------------------
0.3795714326643239 0.416032632027838
RidgeCV 's R2 : 0.36502947676996134
-----------------------------------------------
0.36502947676996134 0.416032632027838
SGDRegressor 's R2 : 0.3790051512357022
-----------------------------------------------
0.3790051512357022 0.416032632027838
SVR 's R2 : 0.13059068934603013
-----------------------------------------------
0.13059068934603013 0.416032632027838
TheilSenRegressor 's R2 : 0.3668935873138036
-----------------------------------------------
0.3668935873138036 0.416032632027838
TransformedTargetRegressor 's R2 : 0.3483720987315203
-----------------------------------------------
0.3483720987315203 0.416032632027838
TweedieRegressor 's R2 : 0.07959167686934143
-----------------------------------------------
0.07959167686934143 0.416032632027838
Best R2 & model & scaler: 0.416032632027838 & Lasso & MinMaxScaler()
print("Best R2 & model & scaler: ",final_r2," & ",final_r2_model, "&",final_r2_scaler)
Best R2 & model & scaler: 0.416032632027838 & Lasso & MinMaxScaler()
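Because this ranking comes from a single random split, the winner can change between runs. One way to get a steadier estimate for the chosen combination is cross-validation with a pipeline. This is a sketch rather than part of the original analysis, and `cv=5` is an arbitrary choice.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_diabetes(return_X_y=True)

# The pipeline refits the scaler inside every fold, so the held-out fold
# never influences the scaling.
pipe = make_pipeline(MinMaxScaler(), Lasso())

# cv=5 is an arbitrary choice for this sketch.
scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
print(scores)
print(scores.mean(), scores.std())
```

Comparing the mean and spread of these fold scores across candidate models gives a fairer comparison than a single test-set R2.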