In this post, we will analyze the diabetes dataset using the techniques from the previous posts.
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.utils import all_estimators
import warnings
warnings.filterwarnings('ignore')
datasets = load_diabetes()
x = datasets.data
y = datasets.target.reshape(-1,1)
print(x.shape,y.shape)
(442, 10) (442, 1)
As before, after loading the dataset we assigned the feature data to x and the target data to y. The diabetes dataset consists of 442 rows.
print(datasets.feature_names)
print(datasets.DESCR)
['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - age     age in years
      - sex
      - bmi     body mass index
      - bp      average blood pressure
      - s1      tc, T-Cells (a type of white blood cells)
      - s2      ldl, low-density lipoproteins
      - s3      hdl, high-density lipoproteins
      - s4      tch, thyroid stimulating hormone
      - s5      ltg, lamotrigine
      - s6      glu, blood sugar level

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
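The note in the description says that each feature column was mean-centered and scaled so that its sum of squares totals 1. That claim is easy to check. The following is only a minimal sketch; the names `col_means` and `col_sq_sums` are made up for illustration and do not appear in the original code.

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Load the feature matrix the same way as in the post (default, pre-scaled version).
x = load_diabetes().data

# Each column should have a mean of approximately zero ...
col_means = x.mean(axis=0)
# ... and a sum of squares of approximately one.
col_sq_sums = (x ** 2).sum(axis=0)

print(np.round(col_means, 6))
print(np.round(col_sq_sums, 6))
```

Because the features are already on a common scale, applying another scaler afterwards mostly changes the numeric range rather than the relative spread of the columns.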
# Copy the list so that datasets.feature_names itself is not modified in place.
columns = list(datasets.feature_names) + ["Target(Diabetes dataset)"]
data = np.concatenate([x,y],axis=1)
dataframe = pd.DataFrame(data,columns = columns)
dataframe
 | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | Target(Diabetes dataset) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019908 | -0.017646 | 151.0 |
1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068330 | -0.092204 | 75.0 |
2 | 0.085299 | 0.050680 | 0.044451 | -0.005671 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002864 | -0.025930 | 141.0 |
3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022692 | -0.009362 | 206.0 |
4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031991 | -0.046641 | 135.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
437 | 0.041708 | 0.050680 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 | 178.0 |
438 | -0.005515 | 0.050680 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018118 | 0.044485 | 104.0 |
439 | 0.041708 | 0.050680 | -0.015906 | 0.017282 | -0.037344 | -0.013840 | -0.024993 | -0.011080 | -0.046879 | 0.015491 | 132.0 |
440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.026560 | 0.044528 | -0.025930 | 220.0 |
441 | -0.045472 | -0.044642 | -0.073030 | -0.081414 | 0.083740 | 0.027809 | 0.173816 | -0.039493 | -0.004220 | 0.003064 | 57.0 |
442 rows × 11 columns
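As a side note, recent scikit-learn versions can build a similar table directly. The sketch below assumes scikit-learn 0.23 or later, where `load_diabetes` accepts `as_frame=True`. The target column is then named `target` rather than the custom name used above.

```python
from sklearn.datasets import load_diabetes

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays
# (assumption: scikit-learn 0.23 or later).
bunch = load_diabetes(as_frame=True)

# bunch.frame holds the 10 feature columns plus a final column named "target".
frame = bunch.frame
print(frame.shape)
print(frame.columns.tolist())
```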
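Looking at the dataset, the value we need to predict is a number rather than a class label (which would call for classification), so this is a regression problem.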
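One rough way to back up that reading is to count how many distinct values the target takes. This is only a heuristic sketch, not a rule: a target with many distinct numeric values usually points to regression, while a small fixed set of labels points to classification. The name `n_unique` is illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes

y = load_diabetes().target

# A regression target typically takes many distinct numeric values,
# whereas a classification target has a small fixed set of labels.
n_unique = np.unique(y).size
print("distinct target values:", n_unique, "out of", y.size)
print("min / mean / max:", y.min(), y.mean(), y.max())
```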
As before, let's slice the table above back into features and target.
datasets = dataframe.values
x = datasets[:,:-1]
y = datasets[:,-1]
print(x.shape,y.shape)
(442, 10) (442,)
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2)
print(x_train.shape,x_test.shape)
print(y_train.shape,y_test.shape)
(353, 10) (89, 10)
(353,) (89,)
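The split above does not fix a random seed, so every run produces a different train/test partition and therefore different scores. If repeatable results are wanted, `train_test_split` accepts a `random_state` argument. The sketch below uses `random_state=42`, which is an arbitrary value chosen for illustration and not taken from the original post.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

x, y = load_diabetes(return_X_y=True)

# random_state=42 is an arbitrary choice; any fixed integer makes the split repeatable.
split_a = train_test_split(x, y, test_size=0.2, random_state=42)
split_b = train_test_split(x, y, test_size=0.2, random_state=42)

# Each call returns x_train, x_test, y_train, y_test; compare them pairwise.
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)
print(split_a[0].shape, split_a[1].shape)
```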
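all_Algorithm = all_estimators(type_filter='regressor')
scaler_list = [StandardScaler(), MinMaxScaler()]
best_r2 = []
for scaler in scaler_list:
    # Fit each scaler on the original training data and keep the scaled copies
    # in separate variables, so the scalers are not stacked on top of each other.
    scaler.fit(x_train)
    x_train_scaled = scaler.transform(x_train)
    x_test_scaled = scaler.transform(x_test)
    for (name, algorithm) in all_Algorithm:
        try:
            model = algorithm()
            model.fit(x_train_scaled, y_train)
            y_pred = model.predict(x_test_scaled)
            r2 = r2_score(y_test, y_pred)
            best_r2.append(r2)
            print(name, "\'s R2 : ", r2)
            print("-----------------------------------------------")
            print(r2, max(best_r2))
            # r2 was just appended, so this is true only when r2 is the best score so far.
            if max(best_r2) <= r2:
                final_r2 = r2
                final_r2_model = name
                final_r2_scaler = scaler
        except Exception:
            # Some estimators need constructor arguments or reject this data; skip them.
            continue
# Report the scaler stored with the best score, not the last loop variable.
print("Best R2 & model & scaler: ", final_r2, " & ", final_r2_model, "&", final_r2_scaler)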
ARDRegression 's R2 : 0.3750119550784249 ----------------------------------------------- 0.3750119550784249 0.3750119550784249
AdaBoostRegressor 's R2 : 0.35328796754369163 ----------------------------------------------- 0.35328796754369163 0.3750119550784249
BaggingRegressor 's R2 : 0.31602822849489987 ----------------------------------------------- 0.31602822849489987 0.3750119550784249
BayesianRidge 's R2 : 0.37353709794963996 ----------------------------------------------- 0.37353709794963996 0.3750119550784249
CCA 's R2 : 0.3645458498695413 ----------------------------------------------- 0.3645458498695413 0.3750119550784249
DecisionTreeRegressor 's R2 : -0.5894581868323021 ----------------------------------------------- -0.5894581868323021 0.3750119550784249
DummyRegressor 's R2 : -0.00123645803999195 ----------------------------------------------- -0.00123645803999195 0.3750119550784249
ElasticNet 's R2 : 0.3790340028177188 ----------------------------------------------- 0.3790340028177188 0.3790340028177188
ElasticNetCV 's R2 : 0.3754605025485025 ----------------------------------------------- 0.3754605025485025 0.3790340028177188
ExtraTreeRegressor 's R2 : -0.3050881139739763 ----------------------------------------------- -0.3050881139739763 0.3790340028177188
ExtraTreesRegressor 's R2 : 0.3497806996920655 ----------------------------------------------- 0.3497806996920655 0.3790340028177188
GammaRegressor 's R2 : 0.3701174806186931 ----------------------------------------------- 0.3701174806186931 0.3790340028177188
GaussianProcessRegressor 's R2 : -1.069012594401333 ----------------------------------------------- -1.069012594401333 0.3790340028177188
GradientBoostingRegressor 's R2 : 0.33250985042993264 ----------------------------------------------- 0.33250985042993264 0.3790340028177188
HistGradientBoostingRegressor 's R2 : 0.2751147773065651 ----------------------------------------------- 0.2751147773065651 0.3790340028177188
HuberRegressor 's R2 : 0.34174750477637783 ----------------------------------------------- 0.34174750477637783 0.3790340028177188
KNeighborsRegressor 's R2 : 0.25446926061787567 ----------------------------------------------- 0.25446926061787567 0.3790340028177188
KernelRidge 's R2 : -4.165048939009507 ----------------------------------------------- -4.165048939009507 0.3790340028177188
Lars 's R2 : 0.34837209873152053 ----------------------------------------------- 0.34837209873152053 0.3790340028177188
LarsCV 's R2 : 0.40257159246468 ----------------------------------------------- 0.40257159246468 0.40257159246468
Lasso 's R2 : 0.38595024820903034 ----------------------------------------------- 0.38595024820903034 0.40257159246468
LassoCV 's R2 : 0.35338208122936043 ----------------------------------------------- 0.35338208122936043 0.40257159246468
LassoLars 's R2 : 0.35382890509034337 ----------------------------------------------- 0.35382890509034337 0.40257159246468
LassoLarsCV 's R2 : 0.3531078796996593 ----------------------------------------------- 0.3531078796996593 0.40257159246468
LassoLarsIC 's R2 : 0.39914059916427813 ----------------------------------------------- 0.39914059916427813 0.40257159246468
LinearRegression 's R2 : 0.34837209873152 ----------------------------------------------- 0.34837209873152 0.40257159246468
LinearSVR 's R2 : 0.23760072727231019 ----------------------------------------------- 0.23760072727231019 0.40257159246468
MLPRegressor 's R2 : -1.25918081681785 ----------------------------------------------- -1.25918081681785 0.40257159246468
NuSVR 's R2 : 0.14418144713660686 ----------------------------------------------- 0.14418144713660686 0.40257159246468
OrthogonalMatchingPursuit 's R2 : 0.30876981915826296 ----------------------------------------------- 0.30876981915826296 0.40257159246468
OrthogonalMatchingPursuitCV 's R2 : 0.34286930368891755 ----------------------------------------------- 0.34286930368891755 0.40257159246468
PLSCanonical 's R2 : -2.0847428167722253 ----------------------------------------------- -2.0847428167722253 0.40257159246468
PLSRegression 's R2 : 0.33865842534282786 ----------------------------------------------- 0.33865842534282786 0.40257159246468
PassiveAggressiveRegressor 's R2 : 0.30424743439568547 ----------------------------------------------- 0.30424743439568547 0.40257159246468
PoissonRegressor 's R2 : 0.39452724195999944 ----------------------------------------------- 0.39452724195999944 0.40257159246468
RANSACRegressor 's R2 : 0.07825356015037255 ----------------------------------------------- 0.07825356015037255 0.40257159246468
RandomForestRegressor 's R2 : 0.3917103268004255 ----------------------------------------------- 0.3917103268004255 0.40257159246468
Ridge 's R2 : 0.3564366035510431 ----------------------------------------------- 0.3564366035510431 0.40257159246468
RidgeCV 's R2 : 0.3494465130272413 ----------------------------------------------- 0.3494465130272413 0.40257159246468
SGDRegressor 's R2 : 0.37452975226115104 ----------------------------------------------- 0.37452975226115104 0.40257159246468
SVR 's R2 : 0.13622570343080198 ----------------------------------------------- 0.13622570343080198 0.40257159246468
TheilSenRegressor 's R2 : 0.36489787952814867 ----------------------------------------------- 0.36489787952814867 0.40257159246468
TransformedTargetRegressor 's R2 : 0.34837209873152 ----------------------------------------------- 0.34837209873152 0.40257159246468
TweedieRegressor 's R2 : 0.36243293782488917 ----------------------------------------------- 0.36243293782488917 0.40257159246468
ARDRegression 's R2 : 0.3750102764327631 ----------------------------------------------- 0.3750102764327631 0.40257159246468
AdaBoostRegressor 's R2 : 0.37264064070142944 ----------------------------------------------- 0.37264064070142944 0.40257159246468
BaggingRegressor 's R2 : 0.31819177735841186 ----------------------------------------------- 0.31819177735841186 0.40257159246468
BayesianRidge 's R2 : 0.3752167808635509 ----------------------------------------------- 0.3752167808635509 0.40257159246468
CCA 's R2 : 0.3645458498695415 ----------------------------------------------- 0.3645458498695415 0.40257159246468
DecisionTreeRegressor 's R2 : -0.5959577953403712 ----------------------------------------------- -0.5959577953403712 0.40257159246468
DummyRegressor 's R2 : -0.00123645803999195 ----------------------------------------------- -0.00123645803999195 0.40257159246468
ElasticNet 's R2 : 0.12758535659494175 ----------------------------------------------- 0.12758535659494175 0.40257159246468
ElasticNetCV 's R2 : 0.3871581150993674 ----------------------------------------------- 0.3871581150993674 0.40257159246468
ExtraTreeRegressor 's R2 : -0.3061839524050296 ----------------------------------------------- -0.3061839524050296 0.40257159246468
ExtraTreesRegressor 's R2 : 0.388669846985901 ----------------------------------------------- 0.388669846985901 0.40257159246468
GammaRegressor 's R2 : 0.0808616210477957 ----------------------------------------------- 0.0808616210477957 0.40257159246468
GaussianProcessRegressor 's R2 : -18.53318219435217 ----------------------------------------------- -18.53318219435217 0.40257159246468
GradientBoostingRegressor 's R2 : 0.32579797431094626 ----------------------------------------------- 0.32579797431094626 0.40257159246468
HistGradientBoostingRegressor 's R2 : 0.27523810100653423 ----------------------------------------------- 0.27523810100653423 0.40257159246468
HuberRegressor 's R2 : 0.34367339121978224 ----------------------------------------------- 0.34367339121978224 0.40257159246468
KNeighborsRegressor 's R2 : 0.24594913899940074 ----------------------------------------------- 0.24594913899940074 0.40257159246468
KernelRidge 's R2 : 0.3628155857813502 ----------------------------------------------- 0.3628155857813502 0.40257159246468
Lars 's R2 : 0.3483720987315203 ----------------------------------------------- 0.3483720987315203 0.40257159246468
LarsCV 's R2 : 0.40257159246468 ----------------------------------------------- 0.40257159246468 0.40257159246468
Lasso 's R2 : 0.416032632027838 ----------------------------------------------- 0.416032632027838 0.416032632027838
LassoCV 's R2 : 0.3540874239008369 ----------------------------------------------- 0.3540874239008369 0.416032632027838
LassoLars 's R2 : 0.3538289050903437 ----------------------------------------------- 0.3538289050903437 0.416032632027838
LassoLarsCV 's R2 : 0.3531078796996586 ----------------------------------------------- 0.3531078796996586 0.416032632027838
LassoLarsIC 's R2 : 0.39914059916427813 ----------------------------------------------- 0.39914059916427813 0.416032632027838
LinearRegression 's R2 : 0.3483720987315203 ----------------------------------------------- 0.3483720987315203 0.416032632027838
LinearSVR 's R2 : 0.18218041356580628 ----------------------------------------------- 0.18218041356580628 0.416032632027838
MLPRegressor 's R2 : -0.7391046842747242 ----------------------------------------------- -0.7391046842747242 0.416032632027838
NuSVR 's R2 : 0.1330302415587895 ----------------------------------------------- 0.1330302415587895 0.416032632027838
OrthogonalMatchingPursuit 's R2 : 0.30876981915826285 ----------------------------------------------- 0.30876981915826285 0.416032632027838
OrthogonalMatchingPursuitCV 's R2 : 0.3428693036889173 ----------------------------------------------- 0.3428693036889173 0.416032632027838
PLSCanonical 's R2 : -2.0847428167722244 ----------------------------------------------- -2.0847428167722244 0.416032632027838
PLSRegression 's R2 : 0.33865842534282786 ----------------------------------------------- 0.33865842534282786 0.416032632027838
PassiveAggressiveRegressor 's R2 : 0.33870828293724464 ----------------------------------------------- 0.33870828293724464 0.416032632027838
PoissonRegressor 's R2 : 0.3892955288595463 ----------------------------------------------- 0.3892955288595463 0.416032632027838
RANSACRegressor 's R2 : 0.011896314581101852 ----------------------------------------------- 0.011896314581101852 0.416032632027838
RadiusNeighborsRegressor 's R2 : 0.1656451755381757 ----------------------------------------------- 0.1656451755381757 0.416032632027838
RandomForestRegressor 's R2 : 0.35712332450451356 ----------------------------------------------- 0.35712332450451356 0.416032632027838
Ridge 's R2 : 0.3795714326643239 ----------------------------------------------- 0.3795714326643239 0.416032632027838
RidgeCV 's R2 : 0.36502947676996134 ----------------------------------------------- 0.36502947676996134 0.416032632027838
SGDRegressor 's R2 : 0.3790051512357022 ----------------------------------------------- 0.3790051512357022 0.416032632027838
SVR 's R2 : 0.13059068934603013 ----------------------------------------------- 0.13059068934603013 0.416032632027838
TheilSenRegressor 's R2 : 0.3668935873138036 ----------------------------------------------- 0.3668935873138036 0.416032632027838
TransformedTargetRegressor 's R2 : 0.3483720987315203 ----------------------------------------------- 0.3483720987315203 0.416032632027838
TweedieRegressor 's R2 : 0.07959167686934143 ----------------------------------------------- 0.07959167686934143 0.416032632027838
Best R2 & model & scaler: 0.416032632027838 & Lasso & MinMaxScaler()
print("Best R2 & model & scaler: ",final_r2," & ",final_r2_model, "&",final_r2_scaler)
Best R2 & model & scaler: 0.416032632027838 & Lasso & MinMaxScaler()
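A score from a single train/test split on 442 rows can shift noticeably from one split to the next, so the ranking above should be read with some caution. One common alternative, which is not what the code above does, is to wrap the scaler and the model in a pipeline and compare averaged cross-validation scores. The sketch below swaps in that technique. It assumes a small hand-picked set of three linear models instead of `all_estimators`, and 5-fold cross-validation; both are illustrative choices, as are the names `models`, `scalers`, `rows`, and `results`.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x, y = load_diabetes(return_X_y=True)

# A small hand-picked subset of regressors, to keep the sketch fast (assumption).
models = {"LinearRegression": LinearRegression, "Ridge": Ridge, "Lasso": Lasso}
scalers = {"StandardScaler": StandardScaler, "MinMaxScaler": MinMaxScaler}

rows = []
for scaler_name, scaler_cls in scalers.items():
    for model_name, model_cls in models.items():
        # The pipeline refits the scaler inside every fold, so no held-out
        # fold information leaks into the scaling step.
        pipe = make_pipeline(scaler_cls(), model_cls())
        scores = cross_val_score(pipe, x, y, cv=5, scoring="r2")
        rows.append({"scaler": scaler_name, "model": model_name,
                     "mean_r2": scores.mean(), "std_r2": scores.std()})

# Sort so the combination with the best average R2 comes first.
results = pd.DataFrame(rows).sort_values("mean_r2", ascending=False).reset_index(drop=True)
print(results)
```

Keeping the standard deviation next to the mean makes it easier to see whether the gap between two combinations is larger than the fold-to-fold variation.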