XAI example

GrayHEAD 2023. 7. 25. 00:05

Grad-CAM

https://keras.io/examples/vision/grad_cam/

Keras documentation: Grad-CAM class activation visualization

Author: fchollet. Date created: 2020/04/26. Last modified: 2021/03/07. Description: How to obtain a class activation heatmap for an image classification model.
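
The linked Keras example needs a trained CNN, so as a rough, framework-free sketch of just the heatmap step, the snippet below assumes you already have the last convolutional layer's feature maps and the gradient of the class score with respect to them. The function name `grad_cam_heatmap` and the toy arrays are made up for illustration and are not taken from the Keras page.

```python
import numpy as np

def grad_cam_heatmap(feature_maps, gradients):
    """Combine conv feature maps (H, W, C) with class-score gradients (H, W, C)."""
    # Channel weights: average each channel's gradient over the spatial axes.
    weights = gradients.mean(axis=(0, 1))        # shape (C,)
    # Weighted sum of the feature maps over the channel axis.
    cam = feature_maps @ weights                 # shape (H, W)
    # Keep only positive evidence for the class, then scale to [0, 1].
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy inputs standing in for real activations and gradients (hypothetical values).
rng = np.random.default_rng(0)
feature_maps = rng.random((7, 7, 4))
gradients = rng.normal(size=(7, 7, 4))
heatmap = grad_cam_heatmap(feature_maps, gradients)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```

In a real pipeline the two arrays would come from the model (for example via a gradient tape), and the heatmap would be resized and overlaid on the input image.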

======

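LIME (Local Interpretable Model-agnostic Explanations): LIME explains individual predictions of a model. It works by fitting a simple, interpretable model in the local neighborhood of a data point so that it approximates the complex model's predictions there. This makes each individual prediction interpretable.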

# LIME
import lime
import lime.lime_tabular
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Create a lime explainer object
explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train,
    feature_names=data.feature_names,
    class_names=data.target_names,
    discretize_continuous=True,
)

# Explain a prediction (for the first instance in the test set)
exp = explainer.explain_instance(X_test[0], clf.predict_proba, num_features=5, top_labels=1)

# Renders inline HTML, so this line only shows output inside a Jupyter notebook
exp.show_in_notebook(show_table=True, show_all=False)
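
To make the "simple model in a local region" idea concrete, here is a minimal NumPy-only sketch of the same recipe: perturb around one point, weight the samples by proximity, and fit a weighted linear model. It is a simplification and not the `lime` library's actual implementation; `local_linear_explanation`, `black_box`, the kernel, and all numbers are assumptions chosen for illustration.

```python
import numpy as np

def local_linear_explanation(predict_fn, x, scale, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around the single point x."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[0]
    # 1. Perturb: draw standardized Gaussian offsets and rescale them per feature.
    offsets = rng.normal(size=(n_samples, n_features))
    samples = x + offsets * scale
    # 2. Query the black-box model on the perturbed samples.
    preds = predict_fn(samples)
    # 3. Weight each sample by its closeness to x (exponential kernel).
    width = kernel_width * np.sqrt(n_features)
    weights = np.exp(-np.sum(offsets ** 2, axis=1) / width ** 2)
    # 4. Weighted least squares: scale rows by sqrt(weight), then solve.
    design = np.column_stack([np.ones(n_samples), offsets])
    root_w = np.sqrt(weights)
    solution, *_ = np.linalg.lstsq(design * root_w[:, None], preds * root_w, rcond=None)
    return solution[0], solution[1:]

def black_box(X):
    # Hypothetical nonlinear model: depends on features 0 and 1, ignores feature 2.
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2)))

x = np.array([0.0, 1.0, 5.0])
intercept, coefs = local_linear_explanation(black_box, x, scale=np.array([0.5, 0.5, 0.5]))
print(coefs)  # local effect per feature, in units of one perturbation standard deviation
```

Near this point the surrogate should give feature 0 a positive coefficient, feature 1 a negative one, and feature 2 a coefficient close to zero, which is the kind of per-prediction explanation LIME reports.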

======

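SHAP (SHapley Additive exPlanations): SHAP is a method inspired by game theory that measures how much each feature contributes to a prediction. A SHAP value represents a feature's "fair" contribution, which helps in understanding individual predictions as well as the model's overall behavior.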

# SHAP
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Create a TreeExplainer object
explainer = shap.TreeExplainer(clf)

# Calculate SHAP values
shap_values = explainer.shap_values(X_test)

# Older SHAP releases return a list with one (n_samples, n_features) array per class.
# Newer releases (this changed around 0.45) return a single array of shape
# (n_samples, n_features, n_classes). Handle both to get the values for class 0.
if isinstance(shap_values, list):
    class0_values = shap_values[0]
else:
    class0_values = shap_values[:, :, 0]

# Plot the SHAP values for the first instance in the test set (class 0)
shap.initjs()
shap.force_plot(explainer.expected_value[0], class0_values[0], X_test[0], feature_names=data.feature_names)
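
The "fair contribution" behind SHAP is the Shapley value from cooperative game theory. As a small sketch of that definition (not of how `shap.TreeExplainer` computes it internally), the snippet below enumerates every feature coalition for a toy three-feature model. The model `f`, the instance, the all-zero baseline, and the helper names are assumptions for illustration; brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when it joins the coalition.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi

def f(x):
    # Hypothetical model: one additive term and one interaction term.
    return 2.0 * x[0] + x[1] * x[2]

instance = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)

def value_fn(coalition):
    # Features in the coalition take the instance's value, the rest the baseline's.
    mixed = [instance[j] if j in coalition else baseline[j] for j in range(3)]
    return f(mixed)

phi = shapley_values(value_fn, 3)
print(phi)  # roughly 2, 3, 3: the interaction x1 * x2 = 6 is split evenly
```

The contributions add up to `f(instance) - f(baseline)`, which is the additivity property that force plots visualize.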

======

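Partial Dependence Plots (PDP): A PDP visualizes the relationship between a particular feature and the predicted outcome. The feature of interest is varied while the other features are held at their observed values, and the resulting predictions are averaged, which shows how strongly that feature affects the outcome.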

# PDP: scikit-learn < 1.2 (plot_partial_dependence was deprecated in 1.0 and removed in 1.2)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import plot_partial_dependence
import matplotlib.pyplot as plt

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Plot PDP
features = [0, 1]  # Select two features for the partial dependence plots
plot_partial_dependence(clf, X_train, features, feature_names=data.feature_names)
plt.show()

# PDP: scikit-learn >= 1.0 (PartialDependenceDisplay.from_estimator)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Plot PDP
features = [0, 1]  # Select two features for the partial dependence plots
display = PartialDependenceDisplay.from_estimator(clf, X_train, features, feature_names=data.feature_names)
plt.show()
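
What the plot shows can also be computed by hand, which may help when reading the curves. The sketch below is a simplified one-feature version under assumed names (`partial_dependence_1d`, `toy_model`, random data): for each grid value it overwrites the feature in every row, predicts, and averages. It is not scikit-learn's implementation.

```python
import numpy as np

def partial_dependence_1d(predict_fn, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value in turn."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # same value for every row
        averages.append(predict_fn(X_mod).mean())
    return np.array(averages)

def toy_model(X):
    # Hypothetical model: linear in feature 0, quadratic in feature 1.
    return 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
grid = np.linspace(-2.0, 2.0, 5)
pd_curve = partial_dependence_1d(toy_model, X_demo, feature=0, grid=grid)
print(pd_curve)
```

Because the toy model is linear in feature 0, the curve is a straight line with slope 2, shifted by the average of the feature 1 term over the data.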

======

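Feature Importance: Feature importance is a method often used with models such as random forests; it measures how much each feature contributes to the model's predictions. It is useful for feature selection and shows which features the model relies on most.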

# Feature Importance
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
import numpy as np

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Get feature importances
importances = clf.feature_importances_

# Plot feature importances
indices = np.argsort(importances)
plt.figure(figsize=(10, 10))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), [data.feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
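
`feature_importances_` is computed from the training data (impurity decrease inside the trees). As a possible cross-check, the sketch below uses a different technique, permutation importance on the held-out test set, via `sklearn.inspection.permutation_importance`. The choice of `n_repeats=10`, the fixed `random_state`, and printing the top five features are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Same data and model setup as above, with a fixed seed for repeatability.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Shuffle one feature at a time on the test set and record the drop in accuracy.
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=42)

# Print the five features whose shuffling hurts the score the most.
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    name = data.feature_names[i]
    print(f"{name:<25} {result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```

If the two rankings disagree strongly, that can be a hint that correlated features are sharing importance or that the impurity-based scores reflect the training set more than generalization.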