Basic end-to-end training of a LightGBM model

Features illustrated in this kernel:

  • Data reading with memory footprint reduction
  • A bit of feature engineering: an estimated credit length is added, which boosts ROC AUC by 0.015 on the public LB and by 0.035 in local CV
  • Categorical feature encoding using one-hot-encoding (OHE)
  • LightGBM's internal class weighting, tuned so that explicit resampling is not needed
  • Gradient-boosted decision trees using LightGBM package
  • Early stopping in LightGBM model training to avoid overtraining
  • Learning rate decay in LightGBM model training to improve convergence to the minimum
  • Hyperparameter optimisation of the model using random search in cross validation
  • Submission preparation
In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 
plt.xkcd()

# List files in the input directory
import os
PATH = "./input/"
print(os.listdir(PATH))
['application_test.csv', 'application_train.csv', 'bureau.csv', 'bureau_balance.csv', 'credit_card_balance.csv', 'HomeCredit_columns_description.csv', 'installments_payments.csv', 'POS_CASH_balance.csv', 'previous_application.csv', 'sample_submission.csv']

Read in the data, reducing the memory footprint of the variables.

The implementation was copied over from this kernel

  1. Iterate over every column
  2. Determine if the column is numeric
  3. Determine if the column can be represented by an integer
  4. Find the min and the max value
  5. Determine and apply the smallest datatype that can fit the range of values
In [2]:
def reduce_mem_usage(df):
    """ iterate through all the columns of a dataframe and modify the data type
        to reduce memory usage.        
    """
    start_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))
    
    for col in df.columns:
        col_type = df[col].dtype
        
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    
    return df


def import_data(file):
    """create a dataframe and optimize its memory usage"""
    df = pd.read_csv(file, parse_dates=True, keep_date_col=True)
    df = reduce_mem_usage(df)
    return df
In [3]:
application_train = import_data(PATH + 'application_train.csv')
application_test = import_data(PATH + 'application_test.csv')
Memory usage of dataframe is 286.23 MB
Memory usage after optimization is: 59.54 MB
Decreased by 79.2%
Memory usage of dataframe is 45.00 MB
Memory usage after optimization is: 9.40 MB
Decreased by 79.1%

The cleaning criteria in the following two cells were inherited from this kernel

In [4]:
application_train = application_train[application_train['AMT_INCOME_TOTAL'] != 1.170000e+08]
application_train = application_train[application_train['AMT_REQ_CREDIT_BUREAU_QRT'] != 261]
application_train = application_train[application_train['OBS_30_CNT_SOCIAL_CIRCLE'] < 300]
In [5]:
application_train['DAYS_EMPLOYED'] = (application_train['DAYS_EMPLOYED'].apply(lambda x: x if x != 365243 else np.nan))

Additional numerical features

The credit length feature idea comes from the following kernel

In [6]:
def feat_ext_source(df):
    x1 = df['EXT_SOURCE_1'].fillna(-1) + 1e-1
    x2 = df['EXT_SOURCE_2'].fillna(-1) + 1e-1
    x3 = df['EXT_SOURCE_3'].fillna(-1) + 1e-1
    
    # Pairwise ratios of the external source scores (NaN -> -1, then shifted by 0.1 so the denominator is never zero)
    df['EXT_SOURCE_1over2_NAminus1_Add0.1'] = x1/x2
    df['EXT_SOURCE_2over1_NAminus1_Add0.1'] = x2/x1
    df['EXT_SOURCE_1over3_NAminus1_Add0.1'] = x1/x3
    df['EXT_SOURCE_3over1_NAminus1_Add0.1'] = x3/x1
    df['EXT_SOURCE_2over3_NAminus1_Add0.1'] = x2/x3
    df['EXT_SOURCE_3over2_NAminus1_Add0.1'] = x3/x2
    
    # Cross terms: EXT_SOURCE_naX_Y equals EXT_SOURCE_Y where EXT_SOURCE_X is missing, and 0 otherwise
    df['EXT_SOURCE_na1_2'] = (df['EXT_SOURCE_1'].isnull()) * (df['EXT_SOURCE_2'].fillna(0))
    df['EXT_SOURCE_na1_3'] = (df['EXT_SOURCE_1'].isnull()) * (df['EXT_SOURCE_3'].fillna(0))
    df['EXT_SOURCE_na2_1'] = (df['EXT_SOURCE_2'].isnull()) * (df['EXT_SOURCE_1'].fillna(0))
    df['EXT_SOURCE_na2_3'] = (df['EXT_SOURCE_2'].isnull()) * (df['EXT_SOURCE_3'].fillna(0))
    df['EXT_SOURCE_na3_1'] = (df['EXT_SOURCE_3'].isnull()) * (df['EXT_SOURCE_1'].fillna(0))
    df['EXT_SOURCE_na3_2'] = (df['EXT_SOURCE_3'].isnull()) * (df['EXT_SOURCE_2'].fillna(0))
    
    # Estimated credit length: credit amount divided by the annuity, i.e. the approximate number of annuity payments
    df['CREDIT_LENGTH'] = df['AMT_CREDIT'] / df['AMT_ANNUITY']
    
    return df
In [7]:
application_train = feat_ext_source(application_train)
application_test  = feat_ext_source(application_test)

Categorical encoding

The function was taken from this kernel. It performs one-hot encoding (OHE) while keeping only the columns that are common to the train and test samples. OHE is done using pd.get_dummies, which converts categorical features while leaving numerical ones untouched.

In [8]:
# use this if you want to convert categorical features to dummies (default)
def cat_to_dummy(train, test):
    train_d = pd.get_dummies(train, drop_first=False)
    test_d = pd.get_dummies(test, drop_first=False)
    # make sure that train and test end up with the same feature columns
    for i in train_d.columns:
        if i not in test_d.columns:
            if i!='TARGET':
                train_d = train_d.drop(i, axis=1)
    for j in test_d.columns:
        if j not in train_d.columns:
            if j!='TARGET':
                test_d = test_d.drop(j, axis=1)
    print('Memory usage of train increases from {:.2f} to {:.2f} MB'.format(train.memory_usage().sum() / 1024**2, 
                                                                            train_d.memory_usage().sum() / 1024**2))
    print('Memory usage of test increases from {:.2f} to {:.2f} MB'.format(test.memory_usage().sum() / 1024**2, 
                                                                            test_d.memory_usage().sum() / 1024**2))
    return train_d, test_d
In [9]:
application_train_ohe, application_test_ohe = cat_to_dummy(application_train, application_test)
Memory usage of train increases from 71.03 to 106.39 MB
Memory usage of test increases from 10.70 to 16.32 MB
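
As an aside, the same "keep only the common columns" behaviour can be written more concisely with pandas DataFrame.align. The sketch below is a hypothetical alternative (cat_to_dummy_align is not part of this kernel); it assumes the same convention as cat_to_dummy, namely that only the train frame carries TARGET.

# Sketch (not used below): align the one-hot encoded train/test frames on their common columns
def cat_to_dummy_align(train, test):
    target = train['TARGET']
    train_d = pd.get_dummies(train.drop('TARGET', axis=1), drop_first=False)
    test_d = pd.get_dummies(test, drop_first=False)
    # inner join on the column axis keeps only the columns present in both frames
    train_d, test_d = train_d.align(test_d, join='inner', axis=1)
    train_d['TARGET'] = target
    return train_d, test_d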
In [10]:
# use this instead if you want to convert categorical features to integer codes (label encoding)
def cat_to_int(train, test):
    mem_orig_train = train.memory_usage().sum() / 1024**2
    mem_orig_test  = test .memory_usage().sum() / 1024**2
    categorical_feats = [ f for f in train.columns if train[f].dtype == 'object' or train[f].dtype.name == 'category' ]
    print('---------------------')
    print(categorical_feats)
    for f_ in categorical_feats:
        train[f_], indexer = pd.factorize(train[f_])
        test[f_] = indexer.get_indexer(test[f_])
    print('Memory usage of train changes from {:.2f} to {:.2f} MB'.format(mem_orig_train, 
                                                                            train.memory_usage().sum() / 1024**2))
    print('Memory usage of test changes from {:.2f} to {:.2f} MB'.format(mem_orig_test, 
                                                                            test.memory_usage().sum() / 1024**2))
    return categorical_feats, train, test
In [11]:
#categorical_feats, application_train_ohe, application_test_ohe = cat_to_int(application_train, application_test)

Use this instead if you want to make use of LightGBM's internal categorical feature handling.

In [12]:
#application_train_ohe, application_test_ohe = (application_train, application_test)
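
For reference, the sklearn wrapper of LightGBM picks up pandas 'category' columns automatically when categorical_feature='auto' (as set in the fit_params below), so no OHE is needed in that case. A minimal toy sketch of this behaviour; the data and column names here are made up purely for illustration:

# Toy sketch: LightGBM consumes 'category' dtype columns directly, no dummy encoding needed
import numpy as np
import pandas as pd
import lightgbm as lgb

toy = pd.DataFrame({
    'num_feat': np.random.rand(1000),
    'cat_feat': pd.Series(np.random.choice(['a', 'b', 'c'], size=1000), dtype='category'),
    'target':   np.random.randint(0, 2, size=1000),
})
toy_clf = lgb.LGBMClassifier(n_estimators=10)
toy_clf.fit(toy[['num_feat', 'cat_feat']], toy['target'], categorical_feature='auto')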

Deal with class imbalance

Use a standard library (imblearn) to randomly undersample the dominant class. Use this if you want to repeat the HP optimisation.

In [15]:
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=314)
X_rus, y_rus = rus.fit_resample(application_train_ohe.drop(['SK_ID_CURR', 'TARGET'], axis=1).fillna(-1), 
                                application_train_ohe['TARGET'])  # fit_sample in older imblearn versions

# You can also use the full sample and let lightgbm weight the classes via the `is_unbalance` OR `scale_pos_weight` argument.
# But this makes the code run 8-10x slower, which is fine for a single run with pre-optimised parameters
# but too slow for HP optimisation.
#X_rus, y_rus = (application_train_ohe.drop(['SK_ID_CURR', 'TARGET'], axis=1),
#                application_train_ohe['TARGET'])
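
As an alternative to undersampling, the comment above mentions LightGBM's `is_unbalance` / `scale_pos_weight` arguments. A minimal sketch of how scale_pos_weight would typically be derived from the class counts of the full training sample (shown for reference only, not used in the search below):

# Sketch: scale_pos_weight ~ (number of negative samples) / (number of positive samples)
y_full = application_train_ohe['TARGET']
n_neg, n_pos = (y_full == 0).sum(), (y_full == 1).sum()
print('scale_pos_weight = {:.2f}'.format(n_neg / n_pos))
# It would then be passed to the classifier, e.g. lgb.LGBMClassifier(scale_pos_weight=n_neg/n_pos, ...)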

Model fitting with hyperparameter optimisation

We will use a LightGBM classifier: LightGBM allows building very sophisticated models with a very short training time.

Split the full sample into train/test (80/20)

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_rus, y_rus, test_size=0.20, random_state=314, stratify=y_rus)

Prepare learning rate shrinkage

In [17]:
def learning_rate_010_decay_power_099(current_iter):
    base_learning_rate = 0.1
    lr = base_learning_rate  * np.power(.99, current_iter)
    return lr if lr > 1e-3 else 1e-3

def learning_rate_010_decay_power_0995(current_iter):
    base_learning_rate = 0.1
    lr = base_learning_rate  * np.power(.995, current_iter)
    return lr if lr > 1e-3 else 1e-3

def learning_rate_005_decay_power_099(current_iter):
    base_learning_rate = 0.05
    lr = base_learning_rate  * np.power(.99, current_iter)
    return lr if lr > 1e-3 else 1e-3
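
These schedules are attached to training through LightGBM's reset_parameter callback (the same mechanism shown, commented out, in the fit_params below). A minimal sketch of a standalone fit call using the 0.1 / 0.995 schedule, following the same sklearn-API conventions as the rest of this kernel; it is not executed here:

# Sketch (not executed): plug a learning-rate schedule into a fit via a callback
import lightgbm as lgb

clf_decay = lgb.LGBMClassifier(n_estimators=5000, random_state=314)
clf_decay.fit(X_train, y_train,
              eval_set=[(X_test, y_test)],
              eval_metric='auc',
              early_stopping_rounds=30,
              callbacks=[lgb.reset_parameter(learning_rate=learning_rate_010_decay_power_0995)])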

Use the test subset for the early stopping criterion

This allows us to avoid overtraining, and we do not need to optimise the number of trees explicitly.

In [18]:
import lightgbm as lgb
fit_params={"early_stopping_rounds":30, 
            "eval_metric" : 'auc', 
            "eval_set" : [(X_test,y_test)],
            'eval_names': ['valid'],
            #'callbacks': [lgb.reset_parameter(learning_rate=learning_rate_010_decay_power_099)],
            'verbose': 100,
            'categorical_feature': 'auto'}

We use random search, which is more flexible and more efficient than grid search.

In [19]:
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform
param_test ={'num_leaves': sp_randint(6, 50), 
             'min_child_samples': sp_randint(100, 500), 
             'min_child_weight': [1e-5, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4],
             'subsample': sp_uniform(loc=0.2, scale=0.8), 
             'colsample_bytree': sp_uniform(loc=0.4, scale=0.6),
             'reg_alpha': [0, 1e-1, 1, 2, 5, 7, 10, 50, 100],
             'reg_lambda': [0, 1e-1, 1, 5, 10, 20, 50, 100]}
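
Note that sp_uniform(loc, scale) draws from the interval [loc, loc + scale], so subsample is sampled from [0.2, 1.0] and colsample_bytree from [0.4, 1.0]; a quick sanity check:

# Sanity check: the continuous distributions above cover [loc, loc + scale]
print(sp_uniform(loc=0.2, scale=0.8).rvs(size=5, random_state=314))   # values in [0.2, 1.0]
print(sp_randint(6, 50).rvs(size=5, random_state=314))                # integers in [6, 50)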
In [20]:
#This parameter defines the number of HP points to be tested
n_HP_points_to_test = 100

import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

# n_estimators is set to a "large value"; the actual number of trees built will depend on early stopping, and 5000 only defines the absolute maximum
clf = lgb.LGBMClassifier(max_depth=-1, random_state=314, silent=True, metric='None', n_jobs=4, n_estimators=5000)
gs = RandomizedSearchCV(
    estimator=clf, param_distributions=param_test, 
    n_iter=n_HP_points_to_test,
    scoring='roc_auc',
    cv=3,
    refit=True,
    random_state=314,
    verbose=True)

Run the following cell to do the HP optimisation. To save time, the optimised parameters (opt_parameters) are also hardcoded further below, so the search can be skipped if needed.

In [21]:
gs.fit(X_train, y_train, **fit_params)
print('Best score reached: {} with params: {} '.format(gs.best_score_, gs.best_params_))
Fitting 3 folds for each of 100 candidates, totalling 300 fits
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757138
Early stopping, best iteration is:
[120]	valid's auc: 0.757999
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754595
Early stopping, best iteration is:
[149]	valid's auc: 0.755502
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756748
Early stopping, best iteration is:
[157]	valid's auc: 0.758432
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757663
Early stopping, best iteration is:
[163]	valid's auc: 0.759444
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753874
Early stopping, best iteration is:
[157]	valid's auc: 0.755239
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756783
Early stopping, best iteration is:
[138]	valid's auc: 0.758076
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742044
[200]	valid's auc: 0.747009
[300]	valid's auc: 0.749073
[400]	valid's auc: 0.750411
[500]	valid's auc: 0.751259
[600]	valid's auc: 0.751877
[700]	valid's auc: 0.752233
[800]	valid's auc: 0.75245
Early stopping, best iteration is:
[788]	valid's auc: 0.752505
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.74267
[200]	valid's auc: 0.747561
[300]	valid's auc: 0.749881
[400]	valid's auc: 0.750775
Early stopping, best iteration is:
[396]	valid's auc: 0.750819
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.743095
[200]	valid's auc: 0.748496
[300]	valid's auc: 0.75048
[400]	valid's auc: 0.751769
[500]	valid's auc: 0.75253
Early stopping, best iteration is:
[531]	valid's auc: 0.752697
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.74757
Early stopping, best iteration is:
[115]	valid's auc: 0.748097
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.748491
Early stopping, best iteration is:
[115]	valid's auc: 0.749078
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.746299
Early stopping, best iteration is:
[122]	valid's auc: 0.747115
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756933
[200]	valid's auc: 0.75896
Early stopping, best iteration is:
[235]	valid's auc: 0.759572
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754594
Early stopping, best iteration is:
[128]	valid's auc: 0.755755
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756182
[200]	valid's auc: 0.757928
Early stopping, best iteration is:
[184]	valid's auc: 0.758455
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757326
Early stopping, best iteration is:
[96]	valid's auc: 0.757827
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754091
Early stopping, best iteration is:
[79]	valid's auc: 0.754576
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757317
Early stopping, best iteration is:
[79]	valid's auc: 0.757653
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758527
Early stopping, best iteration is:
[113]	valid's auc: 0.758794
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755713
Early stopping, best iteration is:
[96]	valid's auc: 0.756239
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757847
Early stopping, best iteration is:
[89]	valid's auc: 0.758154
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758539
[200]	valid's auc: 0.760285
Early stopping, best iteration is:
[197]	valid's auc: 0.760296
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754495
[200]	valid's auc: 0.757034
Early stopping, best iteration is:
[220]	valid's auc: 0.7574
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757082
[200]	valid's auc: 0.757999
Early stopping, best iteration is:
[206]	valid's auc: 0.758133
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758392
Early stopping, best iteration is:
[167]	valid's auc: 0.759365
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755529
Early stopping, best iteration is:
[121]	valid's auc: 0.756019
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758276
Early stopping, best iteration is:
[101]	valid's auc: 0.758284
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.7587
Early stopping, best iteration is:
[148]	valid's auc: 0.75927
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756075
Early stopping, best iteration is:
[107]	valid's auc: 0.756363
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758368
Early stopping, best iteration is:
[138]	valid's auc: 0.758848
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758171
Early stopping, best iteration is:
[101]	valid's auc: 0.758373
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753987
Early stopping, best iteration is:
[97]	valid's auc: 0.754146
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757707
Early stopping, best iteration is:
[134]	valid's auc: 0.757951
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758833
Early stopping, best iteration is:
[136]	valid's auc: 0.759288
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755829
Early stopping, best iteration is:
[105]	valid's auc: 0.756195
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757875
Early stopping, best iteration is:
[92]	valid's auc: 0.758216
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.741829
[200]	valid's auc: 0.746463
[300]	valid's auc: 0.748696
[400]	valid's auc: 0.749826
[500]	valid's auc: 0.750573
[600]	valid's auc: 0.75141
[700]	valid's auc: 0.751842
Early stopping, best iteration is:
[719]	valid's auc: 0.751983
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742191
[200]	valid's auc: 0.747004
[300]	valid's auc: 0.749187
[400]	valid's auc: 0.750606
[500]	valid's auc: 0.751129
[600]	valid's auc: 0.751458
Early stopping, best iteration is:
[574]	valid's auc: 0.751557
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.741943
[200]	valid's auc: 0.746698
[300]	valid's auc: 0.749232
[400]	valid's auc: 0.750551
[500]	valid's auc: 0.751382
[600]	valid's auc: 0.752067
[700]	valid's auc: 0.752428
[800]	valid's auc: 0.752814
Early stopping, best iteration is:
[790]	valid's auc: 0.752824
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757429
Early stopping, best iteration is:
[153]	valid's auc: 0.759747
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754474
[200]	valid's auc: 0.755413
Early stopping, best iteration is:
[192]	valid's auc: 0.755746
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756087
Early stopping, best iteration is:
[168]	valid's auc: 0.757929
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75694
[200]	valid's auc: 0.760097
Early stopping, best iteration is:
[212]	valid's auc: 0.760157
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754102
[200]	valid's auc: 0.756527
Early stopping, best iteration is:
[218]	valid's auc: 0.756937
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755923
[200]	valid's auc: 0.759145
Early stopping, best iteration is:
[201]	valid's auc: 0.759194
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757989
Early stopping, best iteration is:
[123]	valid's auc: 0.758891
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756291
Early stopping, best iteration is:
[138]	valid's auc: 0.757441
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755895
Early stopping, best iteration is:
[147]	valid's auc: 0.757403
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752496
[200]	valid's auc: 0.755151
Early stopping, best iteration is:
[175]	valid's auc: 0.755166
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751356
Early stopping, best iteration is:
[167]	valid's auc: 0.75348
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751821
[200]	valid's auc: 0.754895
Early stopping, best iteration is:
[173]	valid's auc: 0.754896
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756733
Early stopping, best iteration is:
[92]	valid's auc: 0.757009
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754148
Early stopping, best iteration is:
[89]	valid's auc: 0.754864
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755014
Early stopping, best iteration is:
[93]	valid's auc: 0.755713
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.746815
Early stopping, best iteration is:
[114]	valid's auc: 0.747496
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.748437
Early stopping, best iteration is:
[116]	valid's auc: 0.749076
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.747322
Early stopping, best iteration is:
[119]	valid's auc: 0.748094
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756675
[200]	valid's auc: 0.759623
Early stopping, best iteration is:
[216]	valid's auc: 0.759754
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754657
[200]	valid's auc: 0.757696
[300]	valid's auc: 0.758927
Early stopping, best iteration is:
[322]	valid's auc: 0.759196
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755697
[200]	valid's auc: 0.757575
Early stopping, best iteration is:
[200]	valid's auc: 0.757575
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.747427
Early stopping, best iteration is:
[116]	valid's auc: 0.74801
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.747211
Early stopping, best iteration is:
[124]	valid's auc: 0.748202
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.745953
Early stopping, best iteration is:
[110]	valid's auc: 0.746443
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75595
[200]	valid's auc: 0.759554
Early stopping, best iteration is:
[245]	valid's auc: 0.760223
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755297
[200]	valid's auc: 0.757844
[300]	valid's auc: 0.75773
Early stopping, best iteration is:
[278]	valid's auc: 0.758039
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754707
[200]	valid's auc: 0.758289
Early stopping, best iteration is:
[241]	valid's auc: 0.759037
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756172
Early stopping, best iteration is:
[113]	valid's auc: 0.756772
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755156
Early stopping, best iteration is:
[120]	valid's auc: 0.755477
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757267
Early stopping, best iteration is:
[112]	valid's auc: 0.757587
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758637
Early stopping, best iteration is:
[149]	valid's auc: 0.759563
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757443
Early stopping, best iteration is:
[133]	valid's auc: 0.758095
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757672
Early stopping, best iteration is:
[151]	valid's auc: 0.758166
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757104
Early stopping, best iteration is:
[109]	valid's auc: 0.757514
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756686
Early stopping, best iteration is:
[93]	valid's auc: 0.756757
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755602
Early stopping, best iteration is:
[113]	valid's auc: 0.756338
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754873
[200]	valid's auc: 0.758475
[300]	valid's auc: 0.759207
Early stopping, best iteration is:
[293]	valid's auc: 0.759339
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754742
[200]	valid's auc: 0.757418
Early stopping, best iteration is:
[198]	valid's auc: 0.757462
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75416
[200]	valid's auc: 0.758344
[300]	valid's auc: 0.75918
Early stopping, best iteration is:
[294]	valid's auc: 0.759211
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752134
[200]	valid's auc: 0.755469
Early stopping, best iteration is:
[207]	valid's auc: 0.755479
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.750196
[200]	valid's auc: 0.75345
Early stopping, best iteration is:
[196]	valid's auc: 0.75345
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751373
[200]	valid's auc: 0.754923
Early stopping, best iteration is:
[188]	valid's auc: 0.754923
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.747889
Early stopping, best iteration is:
[109]	valid's auc: 0.748172
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.747926
Early stopping, best iteration is:
[120]	valid's auc: 0.748521
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.747343
Early stopping, best iteration is:
[115]	valid's auc: 0.747972
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758725
Early stopping, best iteration is:
[117]	valid's auc: 0.759062
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756493
Early stopping, best iteration is:
[92]	valid's auc: 0.756555
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75794
Early stopping, best iteration is:
[146]	valid's auc: 0.758546
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758334
Early stopping, best iteration is:
[116]	valid's auc: 0.759078
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754816
[200]	valid's auc: 0.755607
Early stopping, best iteration is:
[181]	valid's auc: 0.756003
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758228
Early stopping, best iteration is:
[150]	valid's auc: 0.759202
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752968
[200]	valid's auc: 0.755636
Early stopping, best iteration is:
[173]	valid's auc: 0.755636
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751703
[200]	valid's auc: 0.753628
Early stopping, best iteration is:
[171]	valid's auc: 0.753639
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751932
[200]	valid's auc: 0.754851
Early stopping, best iteration is:
[181]	valid's auc: 0.754853
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756928
Early stopping, best iteration is:
[127]	valid's auc: 0.757604
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754369
Early stopping, best iteration is:
[113]	valid's auc: 0.755058
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756029
Early stopping, best iteration is:
[145]	valid's auc: 0.757198
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758716
Early stopping, best iteration is:
[135]	valid's auc: 0.759593
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757415
Early stopping, best iteration is:
[141]	valid's auc: 0.758465
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.759522
Early stopping, best iteration is:
[136]	valid's auc: 0.760305
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742542
[200]	valid's auc: 0.747562
[300]	valid's auc: 0.749247
[400]	valid's auc: 0.750293
[500]	valid's auc: 0.75118
[600]	valid's auc: 0.751926
[700]	valid's auc: 0.752398
Early stopping, best iteration is:
[703]	valid's auc: 0.752402
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.743018
[200]	valid's auc: 0.747754
[300]	valid's auc: 0.749458
[400]	valid's auc: 0.750535
Early stopping, best iteration is:
[447]	valid's auc: 0.750819
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.743205
[200]	valid's auc: 0.748451
[300]	valid's auc: 0.750947
[400]	valid's auc: 0.752199
[500]	valid's auc: 0.752915
Early stopping, best iteration is:
[542]	valid's auc: 0.753133
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758757
Early stopping, best iteration is:
[150]	valid's auc: 0.759744
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756041
Early stopping, best iteration is:
[96]	valid's auc: 0.756484
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75617
Early stopping, best iteration is:
[152]	valid's auc: 0.756876
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757847
Early stopping, best iteration is:
[112]	valid's auc: 0.758445
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754627
Early stopping, best iteration is:
[115]	valid's auc: 0.754702
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757958
Early stopping, best iteration is:
[115]	valid's auc: 0.758037
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.748149
Early stopping, best iteration is:
[115]	valid's auc: 0.748714
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.748295
Early stopping, best iteration is:
[125]	valid's auc: 0.749082
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.745843
Early stopping, best iteration is:
[124]	valid's auc: 0.746708
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757467
Early stopping, best iteration is:
[104]	valid's auc: 0.757725
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756453
Early stopping, best iteration is:
[139]	valid's auc: 0.756868
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757289
Early stopping, best iteration is:
[99]	valid's auc: 0.75735
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757811
Early stopping, best iteration is:
[151]	valid's auc: 0.759446
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754717
[200]	valid's auc: 0.756211
Early stopping, best iteration is:
[206]	valid's auc: 0.756442
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756808
Early stopping, best iteration is:
[165]	valid's auc: 0.758513
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742647
[200]	valid's auc: 0.747294
[300]	valid's auc: 0.749118
[400]	valid's auc: 0.749966
[500]	valid's auc: 0.751028
[600]	valid's auc: 0.751766
[700]	valid's auc: 0.752315
Early stopping, best iteration is:
[691]	valid's auc: 0.752359
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742933
[200]	valid's auc: 0.747306
[300]	valid's auc: 0.749558
[400]	valid's auc: 0.750481
[500]	valid's auc: 0.751063
Early stopping, best iteration is:
[514]	valid's auc: 0.751109
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.743105
[200]	valid's auc: 0.747868
[300]	valid's auc: 0.750452
[400]	valid's auc: 0.751812
[500]	valid's auc: 0.752479
[600]	valid's auc: 0.752581
Early stopping, best iteration is:
[591]	valid's auc: 0.752659
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742597
[200]	valid's auc: 0.747795
[300]	valid's auc: 0.749498
[400]	valid's auc: 0.750629
[500]	valid's auc: 0.751528
[600]	valid's auc: 0.75225
[700]	valid's auc: 0.752707
Early stopping, best iteration is:
[750]	valid's auc: 0.75297
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742586
[200]	valid's auc: 0.747308
[300]	valid's auc: 0.749386
[400]	valid's auc: 0.750559
Early stopping, best iteration is:
[466]	valid's auc: 0.751311
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.74113
[200]	valid's auc: 0.746735
[300]	valid's auc: 0.749336
[400]	valid's auc: 0.750896
[500]	valid's auc: 0.751944
Early stopping, best iteration is:
[555]	valid's auc: 0.752268
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.7518
Early stopping, best iteration is:
[163]	valid's auc: 0.754968
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751678
[200]	valid's auc: 0.75446
Early stopping, best iteration is:
[187]	valid's auc: 0.75446
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751043
[200]	valid's auc: 0.754306
Early stopping, best iteration is:
[210]	valid's auc: 0.754307
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757225
Early stopping, best iteration is:
[117]	valid's auc: 0.75743
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754178
Early stopping, best iteration is:
[107]	valid's auc: 0.754514
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757813
Early stopping, best iteration is:
[104]	valid's auc: 0.757972
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742943
[200]	valid's auc: 0.747737
[300]	valid's auc: 0.749575
[400]	valid's auc: 0.750664
[500]	valid's auc: 0.751547
[600]	valid's auc: 0.752087
[700]	valid's auc: 0.752444
[800]	valid's auc: 0.752899
Early stopping, best iteration is:
[804]	valid's auc: 0.752943
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.743572
[200]	valid's auc: 0.747946
[300]	valid's auc: 0.749734
[400]	valid's auc: 0.751021
[500]	valid's auc: 0.75164
Early stopping, best iteration is:
[486]	valid's auc: 0.751815
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742232
[200]	valid's auc: 0.747549
[300]	valid's auc: 0.750199
[400]	valid's auc: 0.751178
Early stopping, best iteration is:
[375]	valid's auc: 0.751282
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.759325
Early stopping, best iteration is:
[92]	valid's auc: 0.759608
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756845
Early stopping, best iteration is:
[92]	valid's auc: 0.757195
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756546
Early stopping, best iteration is:
[109]	valid's auc: 0.75692
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752596
Early stopping, best iteration is:
[167]	valid's auc: 0.754995
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751351
Early stopping, best iteration is:
[157]	valid's auc: 0.753425
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752005
[200]	valid's auc: 0.754837
Early stopping, best iteration is:
[188]	valid's auc: 0.754837
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757639
Early stopping, best iteration is:
[120]	valid's auc: 0.75829
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754823
Early stopping, best iteration is:
[128]	valid's auc: 0.755449
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756486
Early stopping, best iteration is:
[157]	valid's auc: 0.758308
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742758
[200]	valid's auc: 0.747478
[300]	valid's auc: 0.749547
[400]	valid's auc: 0.750649
[500]	valid's auc: 0.751357
[600]	valid's auc: 0.751939
[700]	valid's auc: 0.752381
[800]	valid's auc: 0.752778
Early stopping, best iteration is:
[785]	valid's auc: 0.752807
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.74232
[200]	valid's auc: 0.747484
[300]	valid's auc: 0.749922
[400]	valid's auc: 0.751087
[500]	valid's auc: 0.75155
Early stopping, best iteration is:
[521]	valid's auc: 0.751589
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742236
[200]	valid's auc: 0.747842
[300]	valid's auc: 0.750342
[400]	valid's auc: 0.752025
[500]	valid's auc: 0.752291
Early stopping, best iteration is:
[557]	valid's auc: 0.752689
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752681
Early stopping, best iteration is:
[159]	valid's auc: 0.754898
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751343
Early stopping, best iteration is:
[152]	valid's auc: 0.753258
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752244
Early stopping, best iteration is:
[155]	valid's auc: 0.755148
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75626
[200]	valid's auc: 0.759951
[300]	valid's auc: 0.760682
Early stopping, best iteration is:
[278]	valid's auc: 0.760809
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754429
[200]	valid's auc: 0.757215
Early stopping, best iteration is:
[238]	valid's auc: 0.7575
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755577
[200]	valid's auc: 0.758416
[300]	valid's auc: 0.758964
Early stopping, best iteration is:
[282]	valid's auc: 0.759442
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.760679
Early stopping, best iteration is:
[111]	valid's auc: 0.760959
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755398
Early stopping, best iteration is:
[120]	valid's auc: 0.755656
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75787
Early stopping, best iteration is:
[100]	valid's auc: 0.75787
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75688
Early stopping, best iteration is:
[168]	valid's auc: 0.758716
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755885
[200]	valid's auc: 0.757114
Early stopping, best iteration is:
[221]	valid's auc: 0.75756
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755749
[200]	valid's auc: 0.757916
Early stopping, best iteration is:
[202]	valid's auc: 0.757977
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757343
Early stopping, best iteration is:
[116]	valid's auc: 0.757852
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752808
Early stopping, best iteration is:
[85]	valid's auc: 0.753039
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75644
Early stopping, best iteration is:
[113]	valid's auc: 0.756986
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756599
Early stopping, best iteration is:
[166]	valid's auc: 0.759191
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754094
Early stopping, best iteration is:
[124]	valid's auc: 0.755152
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756454
[200]	valid's auc: 0.759148
Early stopping, best iteration is:
[193]	valid's auc: 0.759284
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757581
Early stopping, best iteration is:
[88]	valid's auc: 0.757912
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754381
Early stopping, best iteration is:
[105]	valid's auc: 0.754858
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[69]	valid's auc: 0.755966
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756678
[200]	valid's auc: 0.7588
Early stopping, best iteration is:
[194]	valid's auc: 0.759008
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754502
[200]	valid's auc: 0.756557
Early stopping, best iteration is:
[200]	valid's auc: 0.756557
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756735
[200]	valid's auc: 0.758063
[300]	valid's auc: 0.758666
Early stopping, best iteration is:
[271]	valid's auc: 0.758706
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754039
[200]	valid's auc: 0.758403
[300]	valid's auc: 0.759502
Early stopping, best iteration is:
[361]	valid's auc: 0.760059
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752888
[200]	valid's auc: 0.756473
[300]	valid's auc: 0.757321
Early stopping, best iteration is:
[294]	valid's auc: 0.757369
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753131
[200]	valid's auc: 0.757705
[300]	valid's auc: 0.758496
Early stopping, best iteration is:
[299]	valid's auc: 0.758576
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757651
Early stopping, best iteration is:
[111]	valid's auc: 0.758125
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754648
Early stopping, best iteration is:
[125]	valid's auc: 0.755121
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756743
Early stopping, best iteration is:
[159]	valid's auc: 0.75699
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753275
Early stopping, best iteration is:
[162]	valid's auc: 0.755245
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751491
Early stopping, best iteration is:
[154]	valid's auc: 0.753483
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752254
Early stopping, best iteration is:
[149]	valid's auc: 0.754076
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757271
[200]	valid's auc: 0.759975
Early stopping, best iteration is:
[222]	valid's auc: 0.760111
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753442
[200]	valid's auc: 0.755902
[300]	valid's auc: 0.756644
Early stopping, best iteration is:
[299]	valid's auc: 0.756711
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755452
[200]	valid's auc: 0.757773
[300]	valid's auc: 0.758733
Early stopping, best iteration is:
[293]	valid's auc: 0.758799
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757128
[200]	valid's auc: 0.758302
Early stopping, best iteration is:
[172]	valid's auc: 0.758597
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755975
Early stopping, best iteration is:
[119]	valid's auc: 0.756851
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756102
Early stopping, best iteration is:
[161]	valid's auc: 0.757271
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751969
[200]	valid's auc: 0.756997
[300]	valid's auc: 0.758064
Early stopping, best iteration is:
[340]	valid's auc: 0.758357
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751203
[200]	valid's auc: 0.755525
[300]	valid's auc: 0.756342
[400]	valid's auc: 0.757205
[500]	valid's auc: 0.757988
Early stopping, best iteration is:
[504]	valid's auc: 0.758144
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75147
[200]	valid's auc: 0.755991
[300]	valid's auc: 0.757753
Early stopping, best iteration is:
[290]	valid's auc: 0.758025
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751352
[200]	valid's auc: 0.757099
Early stopping, best iteration is:
[233]	valid's auc: 0.757928
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751858
[200]	valid's auc: 0.756057
[300]	valid's auc: 0.756901
Early stopping, best iteration is:
[281]	valid's auc: 0.757111
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752594
[200]	valid's auc: 0.756866
[300]	valid's auc: 0.758107
Early stopping, best iteration is:
[290]	valid's auc: 0.758178
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752852
[200]	valid's auc: 0.755407
Early stopping, best iteration is:
[176]	valid's auc: 0.755407
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751751
Early stopping, best iteration is:
[165]	valid's auc: 0.75335
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751326
Early stopping, best iteration is:
[169]	valid's auc: 0.754026
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742506
[200]	valid's auc: 0.747729
[300]	valid's auc: 0.749483
[400]	valid's auc: 0.750974
[500]	valid's auc: 0.751901
[600]	valid's auc: 0.752527
[700]	valid's auc: 0.75297
Early stopping, best iteration is:
[758]	valid's auc: 0.753333
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742771
[200]	valid's auc: 0.747808
[300]	valid's auc: 0.749688
[400]	valid's auc: 0.751069
[500]	valid's auc: 0.751623
Early stopping, best iteration is:
[536]	valid's auc: 0.751839
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.741584
[200]	valid's auc: 0.747003
[300]	valid's auc: 0.749446
[400]	valid's auc: 0.751109
[500]	valid's auc: 0.752014
[600]	valid's auc: 0.752388
Early stopping, best iteration is:
[578]	valid's auc: 0.752518
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752692
[200]	valid's auc: 0.755626
Early stopping, best iteration is:
[178]	valid's auc: 0.755653
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.750742
[200]	valid's auc: 0.753595
Early stopping, best iteration is:
[176]	valid's auc: 0.753595
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751047
[200]	valid's auc: 0.75405
Early stopping, best iteration is:
[182]	valid's auc: 0.75405
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758563
Early stopping, best iteration is:
[162]	valid's auc: 0.759807
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755835
Early stopping, best iteration is:
[162]	valid's auc: 0.756839
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755896
[200]	valid's auc: 0.75708
Early stopping, best iteration is:
[172]	valid's auc: 0.757265
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.740996
[200]	valid's auc: 0.746251
[300]	valid's auc: 0.748515
[400]	valid's auc: 0.749702
[500]	valid's auc: 0.750694
[600]	valid's auc: 0.751485
[700]	valid's auc: 0.751857
[800]	valid's auc: 0.752413
Early stopping, best iteration is:
[857]	valid's auc: 0.752722
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.74182
[200]	valid's auc: 0.74724
[300]	valid's auc: 0.749264
[400]	valid's auc: 0.750806
[500]	valid's auc: 0.751379
[600]	valid's auc: 0.751693
Early stopping, best iteration is:
[591]	valid's auc: 0.751763
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.740988
[200]	valid's auc: 0.746645
[300]	valid's auc: 0.749358
[400]	valid's auc: 0.750581
[500]	valid's auc: 0.751454
[600]	valid's auc: 0.751983
[700]	valid's auc: 0.752448
Early stopping, best iteration is:
[707]	valid's auc: 0.75247
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75119
[200]	valid's auc: 0.75453
Early stopping, best iteration is:
[212]	valid's auc: 0.754633
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.750709
[200]	valid's auc: 0.753628
Early stopping, best iteration is:
[189]	valid's auc: 0.753628
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.750699
[200]	valid's auc: 0.754147
Early stopping, best iteration is:
[203]	valid's auc: 0.754186
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755695
Early stopping, best iteration is:
[147]	valid's auc: 0.757242
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754579
Early stopping, best iteration is:
[156]	valid's auc: 0.75575
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755844
Early stopping, best iteration is:
[158]	valid's auc: 0.757668
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.759287
Early stopping, best iteration is:
[116]	valid's auc: 0.759515
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756642
Early stopping, best iteration is:
[120]	valid's auc: 0.757505
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756776
Early stopping, best iteration is:
[159]	valid's auc: 0.757993
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752509
Early stopping, best iteration is:
[164]	valid's auc: 0.75484
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751368
Early stopping, best iteration is:
[158]	valid's auc: 0.753724
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75273
Early stopping, best iteration is:
[154]	valid's auc: 0.755518
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742523
[200]	valid's auc: 0.747093
[300]	valid's auc: 0.748983
[400]	valid's auc: 0.749515
[500]	valid's auc: 0.750614
[600]	valid's auc: 0.751152
[700]	valid's auc: 0.751823
[800]	valid's auc: 0.752138
[900]	valid's auc: 0.752366
Early stopping, best iteration is:
[923]	valid's auc: 0.752473
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.742283
[200]	valid's auc: 0.747178
[300]	valid's auc: 0.749277
[400]	valid's auc: 0.7502
Early stopping, best iteration is:
[444]	valid's auc: 0.750529
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.741361
[200]	valid's auc: 0.746965
[300]	valid's auc: 0.749419
[400]	valid's auc: 0.75098
Early stopping, best iteration is:
[454]	valid's auc: 0.751541
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758754
Early stopping, best iteration is:
[124]	valid's auc: 0.759378
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755021
Early stopping, best iteration is:
[126]	valid's auc: 0.755299
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757724
Early stopping, best iteration is:
[121]	valid's auc: 0.758038
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756672
Early stopping, best iteration is:
[138]	valid's auc: 0.757435
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753534
Early stopping, best iteration is:
[82]	valid's auc: 0.754145
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756215
Early stopping, best iteration is:
[125]	valid's auc: 0.756606
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.752148
Early stopping, best iteration is:
[162]	valid's auc: 0.754384
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.750899
Early stopping, best iteration is:
[164]	valid's auc: 0.753303
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.751944
Early stopping, best iteration is:
[169]	valid's auc: 0.754443
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75731
[200]	valid's auc: 0.759522
Early stopping, best iteration is:
[193]	valid's auc: 0.759575
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755253
[200]	valid's auc: 0.757428
Early stopping, best iteration is:
[214]	valid's auc: 0.757574
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757035
[200]	valid's auc: 0.759395
Early stopping, best iteration is:
[194]	valid's auc: 0.759452
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757581
Early stopping, best iteration is:
[106]	valid's auc: 0.75778
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755546
Early stopping, best iteration is:
[110]	valid's auc: 0.755699
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757264
Early stopping, best iteration is:
[100]	valid's auc: 0.757264
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758042
Early stopping, best iteration is:
[138]	valid's auc: 0.75951
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757496
Early stopping, best iteration is:
[109]	valid's auc: 0.757978
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758422
Early stopping, best iteration is:
[113]	valid's auc: 0.758837
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75799
Early stopping, best iteration is:
[101]	valid's auc: 0.758149
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754733
Early stopping, best iteration is:
[102]	valid's auc: 0.754852
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757013
Early stopping, best iteration is:
[108]	valid's auc: 0.757529
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757503
[200]	valid's auc: 0.759439
Early stopping, best iteration is:
[205]	valid's auc: 0.759445
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755139
Early stopping, best iteration is:
[157]	valid's auc: 0.75618
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75684
Early stopping, best iteration is:
[153]	valid's auc: 0.758848
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757382
[200]	valid's auc: 0.758623
Early stopping, best iteration is:
[198]	valid's auc: 0.75875
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755268
Early stopping, best iteration is:
[139]	valid's auc: 0.755651
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756456
Early stopping, best iteration is:
[160]	valid's auc: 0.757807
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75753
[200]	valid's auc: 0.758401
Early stopping, best iteration is:
[183]	valid's auc: 0.758982
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754237
[200]	valid's auc: 0.754901
Early stopping, best iteration is:
[210]	valid's auc: 0.755434
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756776
Early stopping, best iteration is:
[95]	valid's auc: 0.756893
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758111
Early stopping, best iteration is:
[166]	valid's auc: 0.760074
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756556
Early stopping, best iteration is:
[146]	valid's auc: 0.757504
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757027
[200]	valid's auc: 0.758655
Early stopping, best iteration is:
[175]	valid's auc: 0.759002
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757614
Early stopping, best iteration is:
[150]	valid's auc: 0.757832
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756416
Early stopping, best iteration is:
[126]	valid's auc: 0.756467
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756052
Early stopping, best iteration is:
[164]	valid's auc: 0.758475
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755644
[200]	valid's auc: 0.759113
[300]	valid's auc: 0.75977
Early stopping, best iteration is:
[282]	valid's auc: 0.759951
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754171
[200]	valid's auc: 0.757034
Early stopping, best iteration is:
[229]	valid's auc: 0.757626
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753729
[200]	valid's auc: 0.75783
[300]	valid's auc: 0.759103
Early stopping, best iteration is:
[309]	valid's auc: 0.759136
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
Early stopping, best iteration is:
[1]	valid's auc: 0.5
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757962
Early stopping, best iteration is:
[142]	valid's auc: 0.758977
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756115
[200]	valid's auc: 0.756836
Early stopping, best iteration is:
[236]	valid's auc: 0.757136
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757196
Early stopping, best iteration is:
[140]	valid's auc: 0.758668
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757338
[200]	valid's auc: 0.759734
Early stopping, best iteration is:
[187]	valid's auc: 0.75993
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755481
[200]	valid's auc: 0.757583
Early stopping, best iteration is:
[209]	valid's auc: 0.757722
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756596
Early stopping, best iteration is:
[131]	valid's auc: 0.757379
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758647
Early stopping, best iteration is:
[123]	valid's auc: 0.758851
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.75509
Early stopping, best iteration is:
[158]	valid's auc: 0.755556
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.758205
Early stopping, best iteration is:
[99]	valid's auc: 0.758316
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757373
[200]	valid's auc: 0.759987
Early stopping, best iteration is:
[264]	valid's auc: 0.760244
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755042
Early stopping, best iteration is:
[167]	valid's auc: 0.756734
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757271
[200]	valid's auc: 0.759962
Early stopping, best iteration is:
[196]	valid's auc: 0.760087
[Parallel(n_jobs=1)]: Done 300 out of 300 | elapsed:  9.5min finished
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756502
[200]	valid's auc: 0.760543
[300]	valid's auc: 0.761467
[400]	valid's auc: 0.761586
Early stopping, best iteration is:
[373]	valid's auc: 0.761723
Best score reached: 0.7618206647578548 with params: {'colsample_bytree': 0.4453924022919999, 'min_child_samples': 331, 'min_child_weight': 0.01, 'num_leaves': 10, 'reg_alpha': 7, 'reg_lambda': 50, 'subsample': 0.8388825269901063} 
In [23]:
# Take the best hyperparameters found by the random search
opt_parameters = gs.best_params_
# Alternatively, hard-code parameter values from a previous run:
#opt_parameters = {'colsample_bytree': 0.9234, 'min_child_samples': 399, 'min_child_weight': 0.1, 'num_leaves': 13, 'reg_alpha': 2, 'reg_lambda': 5, 'subsample': 0.855}

Tune the weights of unbalanced classesΒΆ

Following the discussion in this comment, the weight of the minority class (scale_pos_weight) was tuned with a small grid search:
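The grid of weights tried below, [1, 2, 6, 12], brackets the actual class ratio of the training sample. A quick check of that ratio (a small sketch, assuming y_train is the binary TARGET vector used throughout this kernel) could look like:

# neg/pos ratio of the training labels; TARGET=1 is a small minority here,
# so a fully balanced weighting would fall inside the tested grid
neg, pos = (y_train == 0).sum(), (y_train == 1).sum()
print('scale_pos_weight for fully balanced classes would be ~{:.1f}'.format(neg / pos))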

In [24]:
# Clone the classifier used in the random search and apply the optimal parameters
clf_sw = lgb.LGBMClassifier(**clf.get_params())
clf_sw.set_params(**opt_parameters)
Out[24]:
LGBMClassifier(boosting_type='gbdt', class_weight=None,
        colsample_bytree=0.4453924022919999, learning_rate=0.1,
        max_depth=-1, metric='None', min_child_samples=331,
        min_child_weight=0.01, min_split_gain=0.0, n_estimators=5000,
        n_jobs=4, num_leaves=10, objective=None, random_state=314,
        reg_alpha=7, reg_lambda=50, silent=True,
        subsample=0.8388825269901063, subsample_for_bin=200000,
        subsample_freq=0)
In [25]:
# Grid search over the positive-class weight only; all other parameters stay fixed
gs_sample_weight = GridSearchCV(estimator=clf_sw, 
                                param_grid={'scale_pos_weight':[1,2,6,12]},
                                scoring='roc_auc',
                                cv=5,
                                refit=True,
                                verbose=True)
In [26]:
gs_sample_weight.fit(X_train, y_train, **fit_params)
print('Best score reached: {} with params: {} '.format(gs_sample_weight.best_score_, gs_sample_weight.best_params_))
Fitting 5 folds for each of 4 candidates, totalling 20 fits
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.757183
[200]	valid's auc: 0.760806
Early stopping, best iteration is:
[256]	valid's auc: 0.761519
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755028
[200]	valid's auc: 0.759327
[300]	valid's auc: 0.75994
Early stopping, best iteration is:
[291]	valid's auc: 0.760046
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754919
[200]	valid's auc: 0.758335
Early stopping, best iteration is:
[268]	valid's auc: 0.759093
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754703
[200]	valid's auc: 0.758161
[300]	valid's auc: 0.758705
Early stopping, best iteration is:
[359]	valid's auc: 0.759164
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.7557
[200]	valid's auc: 0.759892
[300]	valid's auc: 0.761004
Early stopping, best iteration is:
[324]	valid's auc: 0.761242
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756685
[200]	valid's auc: 0.760767
[300]	valid's auc: 0.761757
Early stopping, best iteration is:
[334]	valid's auc: 0.762055
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755617
[200]	valid's auc: 0.759716
[300]	valid's auc: 0.760135
Early stopping, best iteration is:
[319]	valid's auc: 0.760355
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755878
[200]	valid's auc: 0.759239
[300]	valid's auc: 0.760367
Early stopping, best iteration is:
[288]	valid's auc: 0.760506
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755323
[200]	valid's auc: 0.758806
Early stopping, best iteration is:
[198]	valid's auc: 0.758845
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756247
[200]	valid's auc: 0.759972
[300]	valid's auc: 0.760939
Early stopping, best iteration is:
[275]	valid's auc: 0.760977
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756221
[200]	valid's auc: 0.760043
[300]	valid's auc: 0.760702
Early stopping, best iteration is:
[335]	valid's auc: 0.760918
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754842
[200]	valid's auc: 0.759016
[300]	valid's auc: 0.759849
Early stopping, best iteration is:
[365]	valid's auc: 0.760575
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754518
[200]	valid's auc: 0.758711
[300]	valid's auc: 0.759257
Early stopping, best iteration is:
[335]	valid's auc: 0.759627
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755749
[200]	valid's auc: 0.758294
Early stopping, best iteration is:
[267]	valid's auc: 0.758848
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756546
[200]	valid's auc: 0.760047
Early stopping, best iteration is:
[209]	valid's auc: 0.760327
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.755407
[200]	valid's auc: 0.759471
Early stopping, best iteration is:
[252]	valid's auc: 0.760592
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.753451
[200]	valid's auc: 0.758102
[300]	valid's auc: 0.759334
Early stopping, best iteration is:
[287]	valid's auc: 0.759609
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754798
[200]	valid's auc: 0.758286
[300]	valid's auc: 0.759209
[400]	valid's auc: 0.760137
Early stopping, best iteration is:
[428]	valid's auc: 0.76039
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754622
[200]	valid's auc: 0.758359
Early stopping, best iteration is:
[233]	valid's auc: 0.758693
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754687
[200]	valid's auc: 0.759112
Early stopping, best iteration is:
[196]	valid's auc: 0.75913
[Parallel(n_jobs=1)]: Done  20 out of  20 | elapsed:  3.1min finished
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.756502
[200]	valid's auc: 0.760543
[300]	valid's auc: 0.761467
[400]	valid's auc: 0.761586
Early stopping, best iteration is:
[373]	valid's auc: 0.761723
Best score reached: 0.7637377873294663 with params: {'scale_pos_weight': 1} 

As an outcome, the performance of the classifier does not depend much on the internal class weighting, but weight=1 still turns out to give slightly better ROC AUC than the weighted scenarios.

Look at the performance of the top-5 parameter choicesΒΆ

(the list is printed in ascending order of validation score, so the best configuration is the last row)

In [27]:
print("Valid+-Std        Train  :   Parameters")
for i in np.argsort(gs.cv_results_['mean_test_score'])[-5:]:
    print('{1:.3f}+-{3:.3f}     {2:.3f}   :  {0}'.format(gs.cv_results_['params'][i], 
                                    gs.cv_results_['mean_test_score'][i], 
                                    gs.cv_results_['mean_train_score'][i],
                                    gs.cv_results_['std_test_score'][i]))
Valid+-Std        Train  :   Parameters
0.762+-0.003     0.808   :  {'colsample_bytree': 0.7792703648870174, 'min_child_samples': 344, 'min_child_weight': 0.01, 'num_leaves': 10, 'reg_alpha': 0, 'reg_lambda': 10, 'subsample': 0.8503048560728566}
0.762+-0.002     0.803   :  {'colsample_bytree': 0.43473068607216775, 'min_child_samples': 478, 'min_child_weight': 0.01, 'num_leaves': 9, 'reg_alpha': 1, 'reg_lambda': 5, 'subsample': 0.4261926450859534}
0.762+-0.003     0.822   :  {'colsample_bytree': 0.5284213741879101, 'min_child_samples': 125, 'min_child_weight': 10.0, 'num_leaves': 22, 'reg_alpha': 0.1, 'reg_lambda': 20, 'subsample': 0.3080033455431848}
0.762+-0.003     0.797   :  {'colsample_bytree': 0.48263575356020577, 'min_child_samples': 311, 'min_child_weight': 1, 'num_leaves': 7, 'reg_alpha': 7, 'reg_lambda': 0.1, 'subsample': 0.3542761367404292}
0.762+-0.002     0.798   :  {'colsample_bytree': 0.4453924022919999, 'min_child_samples': 331, 'min_child_weight': 0.01, 'num_leaves': 10, 'reg_alpha': 7, 'reg_lambda': 50, 'subsample': 0.8388825269901063}
/opt/conda/lib/python3.6/site-packages/sklearn/utils/deprecation.py:125: FutureWarning: You are accessing a training score ('mean_train_score'), which will not be available by default any more in 0.21. If you need training scores, please set return_train_score=True
  warnings.warn(*warn_args, **warn_kwargs)
In [ ]:
#print("Valid+-Std     Train  :   Parameters")
#for i in np.argsort(gs_sample_weight.cv_results_['mean_test_score'])[-5:]:
#    print('{1:.3f}+-{3:.3f}     {2:.3f}   :  {0}'.format(gs_sample_weight.cv_results_['params'][i], 
#                                    gs_sample_weight.cv_results_['mean_test_score'][i], 
#                                    gs_sample_weight.cv_results_['mean_train_score'][i],
#                                    gs_sample_weight.cv_results_['std_test_score'][i]))

Build the final modelΒΆ

We train on 80% of the dataset and keep the remaining 20% for early stopping. We use the tuned parameter values, but apply learning-rate decay so that the effective learning rate shrinks over the iterations and the model converges more smoothly to the minimum.
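The decay callback passed to the fit below was defined earlier in this kernel; as a reminder, the schedule its name suggests (start at 0.1, multiply by 0.995 every iteration, with a small floor) would look roughly like the following sketch. The earlier cell's definition is authoritative.

def learning_rate_010_decay_power_0995(current_iter):
    # Sketch of the schedule suggested by the name: 0.1 * 0.995**iteration,
    # clipped from below so the learning rate never collapses to zero
    base_learning_rate = 0.1
    lr = base_learning_rate * np.power(0.995, current_iter)
    return max(lr, 1e-3)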

In [28]:
#Configure from the HP optimisation
clf_final = lgb.LGBMClassifier(**gs.best_estimator_.get_params())

#Configure locally from hardcoded values
#clf_final = lgb.LGBMClassifier(**clf.get_params())

#set optimal parameters
clf_final.set_params(**opt_parameters)

#Train the final model with learning rate decay
clf_final.fit(X_train, y_train, **fit_params, callbacks=[lgb.reset_parameter(learning_rate=learning_rate_010_decay_power_0995)])
Training until validation scores don't improve for 30 rounds.
[100]	valid's auc: 0.754885
[200]	valid's auc: 0.759045
[300]	valid's auc: 0.760362
[400]	valid's auc: 0.760888
[500]	valid's auc: 0.761139
[600]	valid's auc: 0.761268
[700]	valid's auc: 0.761316
[800]	valid's auc: 0.761357
[900]	valid's auc: 0.76138
Early stopping, best iteration is:
[906]	valid's auc: 0.761381
Out[28]:
LGBMClassifier(boosting_type='gbdt', class_weight=None,
        colsample_bytree=0.4453924022919999, learning_rate=0.1,
        max_depth=-1, metric='None', min_child_samples=331,
        min_child_weight=0.01, min_split_gain=0.0, n_estimators=5000,
        n_jobs=4, num_leaves=10, objective=None, random_state=314,
        reg_alpha=7, reg_lambda=50, silent=True,
        subsample=0.8388825269901063, subsample_for_bin=200000,
        subsample_freq=0)

Plot feature importanceΒΆ

In [29]:
# Feature importances (split counts) mapped back to the OHE feature names
feat_imp = pd.Series(clf_final.feature_importances_, index=application_train_ohe.drop(['SK_ID_CURR', 'TARGET'], axis=1).columns)
feat_imp.nlargest(20).plot(kind='barh', figsize=(8,10))
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9108666128>
/opt/conda/lib/python3.6/site-packages/matplotlib/font_manager.py:1316: UserWarning: findfont: Font family ['xkcd', 'Humor Sans', 'Comic Sans MS'] not found. Falling back to DejaVu Sans
  (prop.get_family(), self.defaultFamily[fontext]))
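For comparison, LightGBM also ships a plotting helper that produces a similar chart directly from the fitted model (a minimal alternative to the pandas plot above; it uses split-count importance by default):

# Built-in alternative to the pandas barh plot
lgb.plot_importance(clf_final, max_num_features=20, figsize=(8, 10))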

Predict on the submission test sampleΒΆ

In [30]:
# Predicted probability of default (class 1) for each application in the test set
probabilities = clf_final.predict_proba(application_test_ohe.drop(['SK_ID_CURR'], axis=1))
submission = pd.DataFrame({
    'SK_ID_CURR': application_test_ohe['SK_ID_CURR'],
    'TARGET':     probabilities[:, 1]
})
submission.to_csv("submission.csv", index=False)
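Before uploading, a quick sanity check on the submission is cheap (a sketch; the expected format is one row per SK_ID_CURR with a TARGET probability):

# Sanity check: two columns, one row per test application, probabilities within [0, 1]
print(submission.shape)
print(submission['TARGET'].between(0, 1).all())
submission.head()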
