sklearn_iris

scikit-learn(sklearn)

by DeepLearning Engineer 2021. 7. 21. 23:38

scikit-learn (sklearn) - iris datasets

This time, we analyze the iris dataset and compare a wider variety of scalers.
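
To get a feel for what "different scalers" means in practice, the following minimal sketch prints the value range each scaler produces on the raw iris features. The `ranges` dictionary and the `n_quantiles=100` setting are choices made for this illustration only (the latter just keeps the number of quantiles below the 150 available samples); they are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, MaxAbsScaler,
    RobustScaler, QuantileTransformer, PowerTransformer,
)

x = load_iris().data

# n_quantiles=100 is an assumption for this sketch (must not exceed n_samples=150)
scalers = [
    StandardScaler(), MinMaxScaler(), MaxAbsScaler(),
    RobustScaler(), QuantileTransformer(n_quantiles=100), PowerTransformer(),
]

# Record the overall min/max of the transformed features for each scaler
ranges = {}
for scaler in scalers:
    scaled = scaler.fit_transform(x)
    name = type(scaler).__name__
    ranges[name] = (float(scaled.min()), float(scaled.max()))
    print(f"{name:20s} min={scaled.min():6.2f}  max={scaled.max():6.2f}")
```

Scalers that center the data (for example `StandardScaler` and `RobustScaler`) produce negative values, while `MinMaxScaler` maps each feature to [0, 1]. This likely matters for estimators such as `MultinomialNB` and `ComplementNB`, which reject negative inputs.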

In [25]:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split,StratifiedKFold,cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler,MinMaxScaler,MaxAbsScaler,RobustScaler,QuantileTransformer,PowerTransformer
from sklearn.utils import all_estimators
import warnings
warnings.filterwarnings('ignore')

In [26]:

datasets = load_iris()
x = datasets.data
y = datasets.target.reshape(-1,1)
print(x.shape,y.shape)

(150, 4) (150, 1)

The iris dataset consists of 4 feature columns and 3 classes.
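
The column and class counts can be checked directly from the loaded data. This is a small sketch; the variable names (`iris`, `classes`, `counts`) are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Number of samples and feature columns
n_samples, n_features = iris.data.shape

# Distinct class labels and how many samples each one has
classes, counts = np.unique(iris.target, return_counts=True)

print("samples :", n_samples, " features :", n_features)
print("class names :", iris.target_names.tolist())
print("samples per class :", dict(zip(classes.tolist(), counts.tolist())))
```

Because the classes are perfectly balanced, plain accuracy is a reasonable metric for this dataset.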

In [27]:

print(datasets.feature_names)
print(datasets.DESCR)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

.. topic:: References

   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

In [28]:

# Copy the feature names so that datasets.feature_names itself is not mutated
columns = list(datasets.feature_names) + ["Target"]

data = np.concatenate([x,y],axis=1)
dataframe = pd.DataFrame(data,columns = columns)
dataframe

Out[28]:

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	Target
0	5.1	3.5	1.4	0.2	0.0
1	4.9	3.0	1.4	0.2	0.0
2	4.7	3.2	1.3	0.2	0.0
3	4.6	3.1	1.5	0.2	0.0
4	5.0	3.6	1.4	0.2	0.0
...	...	...	...	...	...
145	6.7	3.0	5.2	2.3	2.0
146	6.3	2.5	5.0	1.9	2.0
147	6.5	3.0	5.2	2.0	2.0
148	6.2	3.4	5.4	2.3	2.0
149	5.9	3.0	5.1	1.8	2.0

150 rows × 5 columns

In [29]:

datasets = dataframe.values

x = datasets[:,:-1]
y = datasets[:,-1]
print(x.shape,y.shape)

(150, 4) (150,)

In [30]:

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2)

In [31]:

kfold = StratifiedKFold(n_splits=5,shuffle=True)

In [36]:

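scaler_list = [StandardScaler(),MinMaxScaler(),MaxAbsScaler(),RobustScaler(),QuantileTransformer(),PowerTransformer()]
all_Algorithm = all_estimators(type_filter = 'classifier')
best_acc_score=[]

for scaler in scaler_list:
    # Always fit on the original x_train and store the result in new variables;
    # overwriting x_train would stack every scaler on top of the previous ones.
    scaler.fit(x_train)
    x_train_scaled = scaler.transform(x_train)
    x_test_scaled = scaler.transform(x_test)
    for (name,algorithm) in all_Algorithm:
        try:
            score = cross_val_score(algorithm(),x_train_scaled,y_train,cv=kfold)
            print("Model : ",name,"\n Mean Score : ",score.mean(),"\n")

            # list.append returns None, so its result is not assigned
            best_acc_score.append((name,score.mean()))
        except Exception:
            # Skip estimators that cannot be built or scored with default arguments
            continue

print("Best Model")
# Ignore nan scores (failed fits) when picking the best model
print(max((s for s in best_acc_score if not np.isnan(s[1])),key=lambda s:s[1]))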

Model :  AdaBoostClassifier 
 Mean Score :  0.9666666666666668 

Model :  BaggingClassifier 
 Mean Score :  0.9333333333333332 

Model :  BernoulliNB 
 Mean Score :  0.7416666666666666 

Model :  CalibratedClassifierCV 
 Mean Score :  0.925 

Model :  CategoricalNB 
 Mean Score :  nan 

Model :  ComplementNB 
 Mean Score :  nan 

Model :  DecisionTreeClassifier 
 Mean Score :  0.9583333333333334 

Model :  DummyClassifier 
 Mean Score :  0.325 

Model :  ExtraTreeClassifier 
 Mean Score :  0.9416666666666668 

Model :  ExtraTreesClassifier 
 Mean Score :  0.9666666666666668 

Model :  GaussianNB 
 Mean Score :  0.9583333333333334 

Model :  GaussianProcessClassifier 
 Mean Score :  0.9416666666666668 

Model :  GradientBoostingClassifier 
 Mean Score :  0.95 

Model :  HistGradientBoostingClassifier 
 Mean Score :  0.9333333333333332 

Model :  KNeighborsClassifier 
 Mean Score :  0.9583333333333334 

Model :  LabelPropagation 
 Mean Score :  0.925 

Model :  LabelSpreading 
 Mean Score :  0.9333333333333333 

Model :  LinearDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  LinearSVC 
 Mean Score :  0.95 

Model :  LogisticRegression 
 Mean Score :  0.9666666666666668 

Model :  LogisticRegressionCV 
 Mean Score :  0.975 

Model :  MLPClassifier 
 Mean Score :  0.9666666666666666 

Model :  MultinomialNB 
 Mean Score :  nan 

Model :  NearestCentroid 
 Mean Score :  0.9083333333333334 

Model :  NuSVC 
 Mean Score :  0.9666666666666668 

Model :  PassiveAggressiveClassifier 
 Mean Score :  0.8583333333333332 

Model :  Perceptron 
 Mean Score :  0.8916666666666666 

Model :  QuadraticDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  RandomForestClassifier 
 Mean Score :  0.9666666666666668 

Model :  RidgeClassifier 
 Mean Score :  0.8916666666666668 

Model :  RidgeClassifierCV 
 Mean Score :  0.8833333333333334 

Model :  SGDClassifier 
 Mean Score :  0.9083333333333334 

Model :  SVC 
 Mean Score :  0.9666666666666666 

Model :  AdaBoostClassifier 
 Mean Score :  0.9666666666666666 

Model :  BaggingClassifier 
 Mean Score :  0.9666666666666668 

Model :  BernoulliNB 
 Mean Score :  0.3583333333333333 

Model :  CalibratedClassifierCV 
 Mean Score :  0.925 

Model :  ComplementNB 
 Mean Score :  0.6249999999999999 

Model :  DecisionTreeClassifier 
 Mean Score :  0.9583333333333334 

Model :  DummyClassifier 
 Mean Score :  0.4166666666666667 

Model :  ExtraTreeClassifier 
 Mean Score :  0.8916666666666666 

Model :  ExtraTreesClassifier 
 Mean Score :  0.95 

Model :  GaussianNB 
 Mean Score :  0.9583333333333334 

Model :  GaussianProcessClassifier 
 Mean Score :  0.9166666666666666 

Model :  GradientBoostingClassifier 
 Mean Score :  0.9583333333333334 

Model :  HistGradientBoostingClassifier 
 Mean Score :  0.9583333333333334 

Model :  KNeighborsClassifier 
 Mean Score :  0.9333333333333333 

Model :  LabelPropagation 
 Mean Score :  0.9583333333333334 

Model :  LabelSpreading 
 Mean Score :  0.9583333333333334 

Model :  LinearDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  LinearSVC 
 Mean Score :  0.9333333333333333 

Model :  LogisticRegression 
 Mean Score :  0.9166666666666667 

Model :  LogisticRegressionCV 
 Mean Score :  0.9666666666666668 

Model :  MLPClassifier 
 Mean Score :  0.95 

Model :  MultinomialNB 
 Mean Score :  0.6916666666666667 

Model :  NearestCentroid 
 Mean Score :  0.9083333333333334 

Model :  NuSVC 
 Mean Score :  0.9666666666666668 

Model :  PassiveAggressiveClassifier 
 Mean Score :  0.925 

Model :  Perceptron 
 Mean Score :  0.8 

Model :  QuadraticDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  RadiusNeighborsClassifier 
 Mean Score :  0.8 

Model :  RandomForestClassifier 
 Mean Score :  0.9666666666666668 

Model :  RidgeClassifier 
 Mean Score :  0.8666666666666668 

Model :  RidgeClassifierCV 
 Mean Score :  0.8666666666666668 

Model :  SGDClassifier 
 Mean Score :  0.9166666666666666 

Model :  SVC 
 Mean Score :  0.9666666666666666 

Model :  AdaBoostClassifier 
 Mean Score :  0.95 

Model :  BaggingClassifier 
 Mean Score :  0.95 

Model :  BernoulliNB 
 Mean Score :  0.35833333333333334 

Model :  CalibratedClassifierCV 
 Mean Score :  0.8916666666666668 

Model :  ComplementNB 
 Mean Score :  0.6333333333333333 

Model :  DecisionTreeClassifier 
 Mean Score :  0.95 

Model :  DummyClassifier 
 Mean Score :  0.35 

Model :  ExtraTreeClassifier 
 Mean Score :  0.9 

Model :  ExtraTreesClassifier 
 Mean Score :  0.9583333333333334 

Model :  GaussianNB 
 Mean Score :  0.95 

Model :  GaussianProcessClassifier 
 Mean Score :  0.9166666666666667 

Model :  GradientBoostingClassifier 
 Mean Score :  0.95 

Model :  HistGradientBoostingClassifier 
 Mean Score :  0.95 

Model :  KNeighborsClassifier 
 Mean Score :  0.95 

Model :  LabelPropagation 
 Mean Score :  0.9333333333333332 

Model :  LabelSpreading 
 Mean Score :  0.9583333333333333 

Model :  LinearDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  LinearSVC 
 Mean Score :  0.9333333333333333 

Model :  LogisticRegression 
 Mean Score :  0.9166666666666667 

Model :  LogisticRegressionCV 
 Mean Score :  0.9666666666666666 

Model :  MLPClassifier 
 Mean Score :  0.95 

Model :  MultinomialNB 
 Mean Score :  0.675 

Model :  NearestCentroid 
 Mean Score :  0.9083333333333332 

Model :  NuSVC 
 Mean Score :  0.975 

Model :  PassiveAggressiveClassifier 
 Mean Score :  0.8833333333333334 

Model :  Perceptron 
 Mean Score :  0.825 

Model :  QuadraticDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  RadiusNeighborsClassifier 
 Mean Score :  0.8 

Model :  RandomForestClassifier 
 Mean Score :  0.9583333333333334 

Model :  RidgeClassifier 
 Mean Score :  0.85 

Model :  RidgeClassifierCV 
 Mean Score :  0.8833333333333332 

Model :  SGDClassifier 
 Mean Score :  0.9166666666666666 

Model :  SVC 
 Mean Score :  0.9583333333333334 

Model :  AdaBoostClassifier 
 Mean Score :  0.95 

Model :  BaggingClassifier 
 Mean Score :  0.9583333333333334 

Model :  BernoulliNB 
 Mean Score :  0.7 

Model :  CalibratedClassifierCV 
 Mean Score :  0.9333333333333333 

Model :  CategoricalNB 
 Mean Score :  nan 

Model :  ComplementNB 
 Mean Score :  nan 

Model :  DecisionTreeClassifier 
 Mean Score :  0.9666666666666668 

Model :  DummyClassifier 
 Mean Score :  0.3583333333333333 

Model :  ExtraTreeClassifier 
 Mean Score :  0.9416666666666668 

Model :  ExtraTreesClassifier 
 Mean Score :  0.95 

Model :  GaussianNB 
 Mean Score :  0.9583333333333333 

Model :  GaussianProcessClassifier 
 Mean Score :  0.9416666666666667 

Model :  GradientBoostingClassifier 
 Mean Score :  0.9583333333333334 

Model :  HistGradientBoostingClassifier 
 Mean Score :  0.9583333333333334 

Model :  KNeighborsClassifier 
 Mean Score :  0.9416666666666667 

Model :  LabelPropagation 
 Mean Score :  0.9333333333333333 

Model :  LabelSpreading 
 Mean Score :  0.95 

Model :  LinearDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  LinearSVC 
 Mean Score :  0.95 

Model :  LogisticRegression 
 Mean Score :  0.9416666666666668 

Model :  LogisticRegressionCV 
 Mean Score :  0.9666666666666666 

Model :  MLPClassifier 
 Mean Score :  0.9333333333333333 

Model :  MultinomialNB 
 Mean Score :  nan 

Model :  NearestCentroid 
 Mean Score :  0.875 

Model :  NuSVC 
 Mean Score :  0.9666666666666668 

Model :  PassiveAggressiveClassifier 
 Mean Score :  0.9416666666666668 

Model :  Perceptron 
 Mean Score :  0.9 

Model :  QuadraticDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  RadiusNeighborsClassifier 
 Mean Score :  0.8916666666666666 

Model :  RandomForestClassifier 
 Mean Score :  0.9416666666666668 

Model :  RidgeClassifier 
 Mean Score :  0.8833333333333332 

Model :  RidgeClassifierCV 
 Mean Score :  0.875 

Model :  SGDClassifier 
 Mean Score :  0.9333333333333333 

Model :  SVC 
 Mean Score :  0.9666666666666668 

Model :  AdaBoostClassifier 
 Mean Score :  0.9583333333333334 

Model :  BaggingClassifier 
 Mean Score :  0.9583333333333333 

Model :  BernoulliNB 
 Mean Score :  0.3583333333333333 

Model :  CalibratedClassifierCV 
 Mean Score :  0.9333333333333333 

Model :  ComplementNB 
 Mean Score :  0.625 

Model :  DecisionTreeClassifier 
 Mean Score :  0.9416666666666668 

Model :  DummyClassifier 
 Mean Score :  0.35 

Model :  ExtraTreeClassifier 
 Mean Score :  0.9333333333333333 

Model :  ExtraTreesClassifier 
 Mean Score :  0.95 

Model :  GaussianNB 
 Mean Score :  0.9666666666666666 

Model :  GaussianProcessClassifier 
 Mean Score :  0.925 

Model :  GradientBoostingClassifier 
 Mean Score :  0.95 

Model :  HistGradientBoostingClassifier 
 Mean Score :  0.95 

Model :  KNeighborsClassifier 
 Mean Score :  0.9416666666666668 

Model :  LabelPropagation 
 Mean Score :  0.95 

Model :  LabelSpreading 
 Mean Score :  0.95 

Model :  LinearDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  LinearSVC 
 Mean Score :  0.9416666666666667 

Model :  LogisticRegression 
 Mean Score :  0.9166666666666666 

Model :  LogisticRegressionCV 
 Mean Score :  0.9666666666666668 

Model :  MLPClassifier 
 Mean Score :  0.9333333333333333 

Model :  MultinomialNB 
 Mean Score :  0.7083333333333333 

Model :  NearestCentroid 
 Mean Score :  0.9166666666666667 

Model :  NuSVC 
 Mean Score :  0.9583333333333333 

Model :  PassiveAggressiveClassifier 
 Mean Score :  0.9083333333333334 

Model :  Perceptron 
 Mean Score :  0.8083333333333332 

Model :  QuadraticDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  RadiusNeighborsClassifier 
 Mean Score :  0.7750000000000001 

Model :  RandomForestClassifier 
 Mean Score :  0.9583333333333334 

Model :  RidgeClassifier 
 Mean Score :  0.8666666666666668 

Model :  RidgeClassifierCV 
 Mean Score :  0.8916666666666668 

Model :  SGDClassifier 
 Mean Score :  0.8666666666666666 

Model :  SVC 
 Mean Score :  0.9666666666666666 

Model :  AdaBoostClassifier 
 Mean Score :  0.95 

Model :  BaggingClassifier 
 Mean Score :  0.9666666666666668 

Model :  BernoulliNB 
 Mean Score :  0.7416666666666666 

Model :  CalibratedClassifierCV 
 Mean Score :  0.9166666666666667 

Model :  CategoricalNB 
 Mean Score :  nan 

Model :  ComplementNB 
 Mean Score :  nan 

Model :  DecisionTreeClassifier 
 Mean Score :  0.9333333333333333 

Model :  DummyClassifier 
 Mean Score :  0.3416666666666667 

Model :  ExtraTreeClassifier 
 Mean Score :  0.9166666666666666 

Model :  ExtraTreesClassifier 
 Mean Score :  0.9583333333333334 

Model :  GaussianNB 
 Mean Score :  0.95 

Model :  GaussianProcessClassifier 
 Mean Score :  0.9416666666666667 

Model :  GradientBoostingClassifier 
 Mean Score :  0.925 

Model :  HistGradientBoostingClassifier 
 Mean Score :  0.9666666666666668 

Model :  KNeighborsClassifier 
 Mean Score :  0.9416666666666667 

Model :  LabelPropagation 
 Mean Score :  0.9333333333333333 

Model :  LabelSpreading 
 Mean Score :  0.9333333333333332 

Model :  LinearDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  LinearSVC 
 Mean Score :  0.9333333333333332 

Model :  LogisticRegression 
 Mean Score :  0.975 

Model :  LogisticRegressionCV 
 Mean Score :  0.9666666666666668 

Model :  MLPClassifier 
 Mean Score :  0.9666666666666666 

Model :  MultinomialNB 
 Mean Score :  nan 

Model :  NearestCentroid 
 Mean Score :  0.9083333333333332 

Model :  NuSVC 
 Mean Score :  0.95 

Model :  PassiveAggressiveClassifier 
 Mean Score :  0.8833333333333334 

Model :  Perceptron 
 Mean Score :  0.85 

Model :  QuadraticDiscriminantAnalysis 
 Mean Score :  0.975 

Model :  RandomForestClassifier 
 Mean Score :  0.9583333333333334 

Model :  RidgeClassifier 
 Mean Score :  0.8833333333333334 

Model :  RidgeClassifierCV 
 Mean Score :  0.8833333333333334 

Model :  SGDClassifier 
 Mean Score :  0.9416666666666667 

Model :  SVC 
 Mean Score :  0.9583333333333334 

Best Model
('LinearDiscriminantAnalysis', 0.975)
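
The search above scores models only by cross-validation on the training split, and the scaler is fitted on the whole training split before the folds are made. As a hedged follow-up sketch (not the original author's code), the scaler can be placed inside a `Pipeline` so it is refit within each fold, and the selected model can then be checked once on the held-out test set. The `random_state=0` and `stratify=y` arguments, and the choice of `StandardScaler` with `LinearDiscriminantAnalysis`, are assumptions made here for reproducibility and illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x, y = load_iris(return_X_y=True)

# random_state and stratify are assumptions added for a reproducible, balanced split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler is refit inside every CV fold, so validation folds never influence it
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, x_train, y_train, cv=kfold)
print("CV mean accuracy :", cv_scores.mean())

# Final check on the held-out test set, which the model selection never touched
model.fit(x_train, y_train)
test_acc = accuracy_score(y_test, model.predict(x_test))
print("Test accuracy    :", test_acc)
```

With only 120 training samples, differences of one or two hundredths between models in the listing above are within the noise of a single shuffled 5-fold run, so the "best" model can change from run to run.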
