An insurance company has approached you with a dataset of its clients' previous claims. The company wants you to develop a model that predicts which claims look fraudulent, which could save it millions of dollars annually.
#Import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
#Load the dataset into a dataframe
df = pd.read_csv('insurance_claims.csv')
df.head()
#Check the shape of the dataframe
df.shape
#Check the columns of the dataframe
df.columns
#check the data types of each column
df.info()
objectcols = [col for col in df.columns if df[col].dtype == object]
objectcols
feats = ['policy_state','insured_sex', 'insured_education_level','insured_occupation','insured_hobbies','insured_relationship',\
'collision_type','incident_severity','authorities_contacted','incident_state','incident_city','incident_location',\
'property_damage','police_report_available','auto_make','auto_model','fraud_reported','incident_type']
df_final = pd.get_dummies(df,columns=feats,drop_first=True)
df_final.head()
#We use sklearn’s train_test_split to split the data into a training set and a test set.
from sklearn.model_selection import train_test_split
X = df_final.drop(['fraud_reported_Y','policy_csl','policy_bind_date','incident_date'],axis=1).values
y = df_final['fraud_reported_Y'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
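Note that fraud labels are usually heavily imbalanced, so a stratified split keeps the proportion of fraudulent claims similar in the training and test sets, and passing random_state makes the split reproducible. This tutorial does not do that above; the variant below is only an optional sketch.
#Optional: reproducible, stratified split (keeps the fraud/non-fraud ratio in both sets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)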
Feature scaling is essential when training a neural network: gradient-based optimization converges poorly when the independent variables span very different ranges. Standardization rescales each feature to zero mean and unit variance.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
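Under the hood, StandardScaler applies z = (x - mean) / std to each column, using statistics computed on the training set only. An optional sanity check is to confirm that the scaled training columns have mean roughly 0 and standard deviation roughly 1:
#Sanity check: scaled training features should have mean ~0 and std ~1
print(X_train.mean(axis=0)[:5])
print(X_train.std(axis=0)[:5])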
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
print(type(X_train))
print(type(X_test))
# apply PCA on X_train
from sklearn.decomposition import PCA
pca = PCA().fit(X_train)
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of components')
plt.ylabel('Cumulative explained variance')
plt.annotate('15',xy=(15, .90))
# individual explained variance
plt.figure(figsize=(10, 5))
plt.bar(range(44), pca.explained_variance_ratio_[:44], alpha=0.5,
        label='individual explained variance')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.legend(loc='best')
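These plots only diagnose how many components would be needed; the network below is still trained on all of the scaled features. If you wanted to actually reduce the inputs, one option (shown here only as a sketch, with hypothetical names pca_90, X_train_pca and X_test_pca) is to refit PCA with a target explained-variance ratio and transform both sets:
#Optional: keep just enough components to explain ~90% of the variance
pca_90 = PCA(n_components=0.90)
X_train_pca = pca_90.fit_transform(X_train)
X_test_pca = pca_90.transform(X_test)
print(X_train_pca.shape)
If you trained on these reduced matrices instead, input_dim in the first Dense layer would have to match X_train_pca.shape[1].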
#import keras and its modules
import keras
from keras.models import Sequential #Sequential module is required to initialize ANN
from keras.layers import Dense #Dense module is required to build the layers of ANN
Next we initialize our ANN by creating an instance of Sequential. The Sequential class initializes a linear stack of layers, to which we can then add Dense layers one at a time.
classifier = Sequential()
Adding the Input Layer and First Hidden Layer
We use the add method to add layers to our ANN. The first parameter is the number of nodes in this layer. There is no hard rule for choosing this number, but a common heuristic is to take the average of the number of nodes in the input layer and the number of nodes in the output layer.
For example, with five independent variables and one output node, you would take (5 + 1) / 2 = 3. You can also experiment with this value through hyperparameter tuning. The second parameter, kernel_initializer, is the function used to initialize the weights; here a uniform distribution keeps the initial weights small and close to zero. The next parameter is the activation function. We use the rectified linear unit, shortened to relu, which is the usual choice for the hidden layers of an ANN. The final parameter, input_dim, is the number of nodes in the input layer, i.e. the number of independent variables (1143 after one-hot encoding).
classifier.add(
Dense(500, kernel_initializer = 'uniform',
activation = 'relu', input_dim=1143))
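As a quick illustration of that averaging heuristic on this dataset (the layer above simply uses 500 nodes instead), you could compute the suggestion directly from the training matrix:
#Heuristic only: average of input nodes (1143) and output nodes (1)
suggested_nodes = (X_train.shape[1] + 1) // 2
print(suggested_nodes)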
Adding Second Hidden Layer
Adding the second hidden layer is similar to adding the first hidden layer.
We don't need to specify the input_dim parameter again. We specified it in the first hidden layer so that layer knew how many input nodes to expect; every subsequent layer infers its input size from the layer before it.
classifier.add(
Dense(500, kernel_initializer = 'uniform',
activation = 'relu'))
Adding the Output layer
We change the first parameter to 1 because the output layer needs a single node: we only want to know whether a claim is fraudulent or not. We also change the activation function to sigmoid, so the network outputs the probability that a claim is fraudulent. If you were dealing with a classification problem with more than two classes (e.g. classifying cats, dogs, and monkeys), you would change two things: set the first parameter to 3 and use the softmax activation. Softmax is a generalization of the sigmoid function to a dependent variable with more than two categories.
classifier.add(
Dense(1, kernel_initializer = 'uniform',
activation = 'sigmoid'))
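For reference, a sketch of how the output layer would change for a three-class problem is shown below; it is commented out because it does not apply to this binary classifier, and the labels would also need to be one-hot encoded (e.g. with keras.utils.to_categorical).
#Multi-class variant (illustration only, do not add to this classifier):
#classifier.add(Dense(3, kernel_initializer = 'uniform', activation = 'softmax'))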
Compiling the ANN
Compiling configures how the network will be trained. The first parameter is the optimizer, the algorithm used to find the optimal set of weights; it is a variant of stochastic gradient descent, and adam is a very efficient choice. The second parameter is the loss function the optimizer minimizes. Since our target is binary we use binary_crossentropy; with more than two classes we would use categorical_crossentropy. The final argument is the metric used to evaluate the model, in this case accuracy.
classifier.compile(optimizer= 'adam',
loss = 'binary_crossentropy',
metrics = ['accuracy'])
Fitting ANN to the training set
X_train holds the independent variables used to train the ANN and y_train the column we're predicting. epochs is the number of times the full training set is passed through the network, and batch_size is the number of observations after which the weights are updated.
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100, verbose=2, validation_data = (X_test, y_test))
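fit returns a History object whose history dictionary records the loss and metrics for each epoch. The optional sketch below repeats the call above but assigns its return value so the per-epoch training and validation accuracy can be plotted; note that older Keras releases use the keys 'acc'/'val_acc' instead of 'accuracy'/'val_accuracy'.
history = classifier.fit(X_train, y_train, batch_size = 10, epochs = 100, verbose=2, validation_data = (X_test, y_test))
#Plot training vs. validation accuracy per epoch (use 'acc'/'val_acc' on older Keras)
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='best')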
Predicting on the test set
y_pred = classifier.predict(X_test)
y_pred[0:10]
y_pred = (y_pred > 0.5)
y_pred[0:10]
Checking the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm
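Because fraudulent claims are typically a small minority of the data, accuracy alone can be misleading; the precision and recall on the fraud class are worth checking as well. An optional addition using scikit-learn:
#Precision, recall and F1-score for each class (fraud is the positive class)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))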