
XAI example

GrayHEAD 2023. 7. 25. 00:05

Grad-CAM

 

https://keras.io/examples/vision/grad_cam/

 

Keras documentation: Grad-CAM class activation visualization (keras.io, author: fchollet, created 2020/04/26, last modified 2021/03/07). Shows how to obtain a class activation heatmap for an image classification model.

 

 

 

======

 

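LIME (Local Interpretable Model-agnostic Explanations): LIME explains individual predictions of a model. It fits a simple, interpretable model in the local neighborhood of a data point so that the simple model approximates the complex model's prediction there. This makes each individual prediction interpretable.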

# LIME

import lime
import lime.lime_tabular
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier(random_state=42)  # fixed seed so the run is reproducible
clf.fit(X_train, y_train)

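# Create a LIME explainer object from the training data
explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train,
    mode='classification',
    feature_names=data.feature_names,
    class_names=data.target_names,
    discretize_continuous=True,
    random_state=42,  # LIME samples perturbations, so fix the seed
)

# Explain a prediction (for the first instance in the test set),
# keeping the 5 most influential features for the top predicted class
exp = explainer.explain_instance(X_test[0], clf.predict_proba, num_features=5, top_labels=1)

# Renders an HTML widget, so this line only works inside a Jupyter notebook
exp.show_in_notebook(show_table=True, show_all=False)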
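
The library call above hides the local-surrogate idea, so here is a minimal sketch of it in plain Python. It is not the lime package's implementation: `black_box`, `local_surrogate`, the Gaussian perturbation, and the exponential proximity kernel are all assumptions chosen for illustration. The steps are: perturb the instance, weight each perturbed sample by how close it is to the instance, and fit a weighted linear model whose coefficients act as the explanation.

```python
import math
import random


def black_box(x):
    # Hypothetical nonlinear model standing in for clf.predict_proba.
    return x[0] ** 2 + 3.0 * x[1]


def solve_linear_system(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_surrogate(predict, instance, n_samples=500, scale=0.1,
                    kernel_width=0.25, seed=0):
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # 1. Perturb the instance with Gaussian noise (assumed sampling scheme).
        z = [xi + rng.gauss(0.0, scale) for xi in instance]
        # 2. Weight the sample by its proximity to the instance.
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, instance))
        weights.append(math.exp(-dist2 / kernel_width ** 2))
        rows.append([1.0] + z)  # the leading 1.0 is the intercept column
        targets.append(predict(z))
    # 3. Weighted least squares: solve (X^T W X) beta = X^T W y.
    k = len(instance) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(k)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]


intercept, coefs = local_surrogate(black_box, [2.0, 0.0])
print(coefs)  # local slopes of the surrogate around the point (2, 0)
```

Around the point (2, 0) the surrogate's coefficients should land near the local slopes of the black box (about 4 for the first feature and 3 for the second), even though the black box itself is not linear.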

======

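SHAP (SHapley Additive exPlanations): SHAP is a method inspired by game theory that measures how much each feature contributes to a prediction. SHAP values represent each feature's "fair" contribution, which helps in understanding individual predictions as well as the model's overall behavior.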

 

# SHAP

import shap
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier(random_state=42)  # fixed seed so the run is reproducible
clf.fit(X_train, y_train)

# Create a TreeExplainer object
explainer = shap.TreeExplainer(clf)

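# Calculate SHAP values
shap_values = explainer.shap_values(X_test)

# Older SHAP releases return a list with one array per class; newer releases
# return a single array of shape (n_samples, n_features, n_classes).
class_idx = 0
if isinstance(shap_values, list):
    instance_values = shap_values[class_idx][0]
else:
    instance_values = shap_values[0, :, class_idx]

# Plot the SHAP values for the first instance in the test set (notebook only)
shap.initjs()
shap.force_plot(explainer.expected_value[class_idx], instance_values, X_test[0], feature_names=data.feature_names)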
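
To make the "fair contribution" idea concrete, here is a minimal sketch that computes exact Shapley values for a toy three-feature function by enumerating every coalition of features. It is not how shap.TreeExplainer works internally: `model`, `value`, and `shapley_values` are hypothetical names, and "removing" a feature is approximated by replacing it with a single baseline value, which is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical model with an interaction between features 0 and 1.
    return 2.0 * x[0] + x[1] + x[0] * x[1] - 3.0 * x[2]


def value(subset, instance, baseline):
    # Features in `subset` take the instance's values; the rest keep the baseline.
    mixed = [instance[i] if i in subset else baseline[i]
             for i in range(len(instance))]
    return model(mixed)


def shapley_values(instance, baseline):
    n = len(instance)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                gain = value(s | {i}, instance, baseline) - value(s, instance, baseline)
                phi[i] += weight * gain
    return phi


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(instance, baseline)
print(phi)
# Additivity: the contributions sum to model(instance) - model(baseline).
print(sum(phi), model(instance) - model(baseline))
```

The interaction term `x[0] * x[1]` is split evenly between features 0 and 1, which is the sense in which the attribution is "fair". Exact enumeration grows exponentially with the number of features, which is why practical tools rely on model-specific or sampling-based algorithms instead.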

======

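Partial Dependence Plots (PDP): A PDP visualizes the relationship between a given feature and the predicted outcome. It varies the feature of interest while leaving the other features at their observed values and averaging the resulting predictions, which shows how the model's output depends on that feature on average.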

# PDP: scikit-learn < 1.2 (plot_partial_dependence was deprecated in 1.0 and removed in 1.2)

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import plot_partial_dependence
import matplotlib.pyplot as plt

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier(random_state=42)  # fixed seed so the run is reproducible
clf.fit(X_train, y_train)

# Plot PDP
features = [0, 1] # Select two features for the partial dependence plots
plot_partial_dependence(clf, X_train, features, feature_names=data.feature_names)
plt.show()

# PDP: scikit-learn >= 1.0
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Plot PDP
features = [0, 1] # Select two features for the partial dependence plots
display = PartialDependenceDisplay.from_estimator(clf, X_train, features, feature_names=data.feature_names)
plt.show()
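
The averaging behind a partial dependence curve can be sketched by hand in a few lines. This is only an illustration under stated assumptions, not scikit-learn's implementation: `predict` is a hypothetical stand-in for a fitted model, and `partial_dependence` is a helper name made up for this example.

```python
def predict(row):
    # Hypothetical fitted model: output rises with feature 0, falls with
    # feature 1, and includes a small interaction between the two.
    return 0.5 * row[0] - 0.2 * row[1] + 0.1 * row[0] * row[1]


def partial_dependence(predict_fn, rows, feature, grid):
    curve = []
    for v in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = v  # overwrite only the feature of interest
            total += predict_fn(modified)
        # Average over the dataset, so the other features keep their
        # observed values and are marginalized out.
        curve.append(total / len(rows))
    return curve


rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
grid = [0.0, 1.0, 2.0]
curve = partial_dependence(predict, rows, feature=0, grid=grid)
print(curve)
```

Each point on the curve is the mean prediction after forcing feature 0 to one grid value in every row. Because the result is an average, a PDP can hide interactions in which the feature pushes predictions in opposite directions for different subgroups.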

 

 

======

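Feature Importance: Feature importance is commonly used with models such as random forests and measures how much each feature contributes to the model's predictions. It is useful for feature selection and shows which features the model relies on most. The feature_importances_ attribute used below is impurity-based: it reflects how much the splits on each feature reduce impurity across the trees.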

 

# Feature Importance

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
import numpy as np

# Load data
data = load_breast_cancer()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a classifier
clf = RandomForestClassifier(random_state=42)  # fixed seed so the run is reproducible
clf.fit(X_train, y_train)

# Get feature importances
importances = clf.feature_importances_

# Plot feature importances
indices = np.argsort(importances)
plt.figure(figsize=(10, 10))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), [data.feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
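
For intuition about what an impurity-based importance measures, here is a minimal sketch that computes it for one tiny hand-written tree. The tree in `splits`, the class counts, and the helper names `gini` and `impurity_importances` are all hypothetical. A random forest averages this kind of per-tree quantity over its trees.

```python
def gini(counts):
    # Gini impurity of a node given the number of samples in each class.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical tree: each internal node records the feature it splits on and
# the class counts reaching the node and its two children.
splits = [
    {"feature": 0, "node": [50, 50], "left": [40, 10], "right": [10, 40]},
    {"feature": 1, "node": [40, 10], "left": [40, 0], "right": [0, 10]},
    {"feature": 0, "node": [10, 40], "left": [10, 5], "right": [0, 35]},
]


def impurity_importances(splits, n_features):
    n_total = sum(splits[0]["node"])  # number of samples at the root
    raw = [0.0] * n_features
    for s in splits:
        n_node, n_left, n_right = sum(s["node"]), sum(s["left"]), sum(s["right"])
        # Impurity decrease of this split, weighted by the share of samples
        # that reach the node.
        decrease = (n_node * gini(s["node"])
                    - n_left * gini(s["left"])
                    - n_right * gini(s["right"])) / n_total
        raw[s["feature"]] += decrease
    total = sum(raw)
    return [r / total for r in raw]  # normalize so the importances sum to 1


importances = impurity_importances(splits, n_features=3)
print(importances)
```

Feature 2 is never used for a split, so its importance is zero. Feature 0 ranks highest because it is used at the root, where every sample is affected.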