Evaluation

To import the following functions:

from maddlib import evaluation

Compute MADD

The evaluation.MADD() function, which computes MADD, relies on two other functions described below: evaluation.separate_pred_proba() and evaluation.normalized_density_vector().

evaluation.MADD(h, X_test=None, pred_proba=None, sf=None, pred_proba_sf0=None, pred_proba_sf1=None, min_nb_points=50)

Compute MADD.

Parameters:
  • h (float or str) – Bandwidth parameter (either a float \(\in \left]0, 1\right[\) or 'auto' for an automatic computation of the optimal bandwidth).

  • X_test (pandas.DataFrame or None) – Optional test set (without labels) on which to evaluate MADD. If X_test is given, pred_proba and sf are also expected to be given.

  • pred_proba (numpy.ndarray of shape (n, 1) or None) – Optional predicted probabilities (associated to the test set X_test) on which to evaluate MADD. If pred_proba is given, X_test and sf are also expected to be given.

  • sf (str or None) – Optional sensitive feature name (from the test set X_test) with which to evaluate MADD. If sf is given, X_test and pred_proba are also expected to be given.

  • pred_proba_sf0 (numpy.ndarray of shape (n, 1) or None) – Optional predicted probabilities of group 0 with which to evaluate MADD. If pred_proba_sf0 is given, pred_proba_sf1 is also expected to be given.

  • pred_proba_sf1 (numpy.ndarray of shape (n, 1) or None) – Optional predicted probabilities of group 1 with which to evaluate MADD. If pred_proba_sf1 is given, pred_proba_sf0 is also expected to be given.

  • min_nb_points (int or None) – Optional minimum number of points to consider in the bandwidth interval.

Returns:

MADD result

Return type:

float
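Since evaluation.MADD() relies on evaluation.normalized_density_vector(), the float it returns is presumably a distance between the two groups' density vectors \(D_{G_0}\) and \(D_{G_1}\). The minimal sketch below assumes, without checking maddlib's source, that this distance is the sum of absolute bin-wise differences (an L1 distance). madd_from_densities is a hypothetical helper, not part of the library.

```python
import numpy as np


def madd_from_densities(d_g0, d_g1) -> float:
    """Hypothetical helper (not part of maddlib): assumes MADD is the L1
    distance between two normalized density vectors built on the same bins."""
    d_g0 = np.asarray(d_g0, dtype=float)
    d_g1 = np.asarray(d_g1, dtype=float)
    if d_g0.shape != d_g1.shape:
        raise ValueError("density vectors must share the same bins")
    # Sum of absolute bin-wise differences between the two groups
    return float(np.sum(np.abs(d_g0 - d_g1)))


# Two groups whose predicted probabilities fall into 4 bins
print(madd_from_densities([0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]))  # 1.0
```

With density vectors that each sum to 1, this distance ranges from 0 (identical distributions) to 2 (the groups share no bin).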

evaluation.separate_pred_proba(X, pred_proba, sf)

Return the predicted probabilities separated according to the sensitive feature.

Parameters:
  • X (pandas.DataFrame) – The feature set.

  • pred_proba (numpy.ndarray of shape (n, 1)) – The predicted probabilities (of positive predictions).

  • sf (str) – Sensitive feature name included in the feature set X.

Returns:

The pair of separated predicted probabilities (pred_proba_sf0, pred_proba_sf1)

Return type:

pair of numpy.ndarray
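The sketch below shows one plausible way to perform this separation with pandas, assuming the sensitive feature column is binary and encoded as 0/1 and that the entries of pred_proba follow the row order of X. separate_pred_proba_sketch is a hypothetical name; maddlib's own function may handle other encodings differently.

```python
import numpy as np
import pandas as pd


def separate_pred_proba_sketch(X: pd.DataFrame, pred_proba, sf: str):
    """Hypothetical re-implementation: split predicted probabilities by a
    binary sensitive feature assumed to be encoded as 0/1 in X[sf]."""
    proba = np.asarray(pred_proba, dtype=float).ravel()  # accept shape (n,) or (n, 1)
    if len(proba) != len(X):
        raise ValueError("pred_proba must have one value per row of X")
    groups = X[sf].to_numpy()
    # Boolean masks select each group's rows, keeping the original row order
    return proba[groups == 0], proba[groups == 1]


X = pd.DataFrame({"gender": [0, 1, 1, 0], "age": [21, 34, 28, 45]})
pred_proba = np.array([[0.2], [0.9], [0.6], [0.4]])
p0, p1 = separate_pred_proba_sketch(X, pred_proba, "gender")
print(p0.tolist(), p1.tolist())  # [0.2, 0.4] [0.9, 0.6]
```

Converting X[sf] with .to_numpy() keeps the masks positional, so the split stays correct even when X carries a filtered, non-contiguous index.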

evaluation.normalized_density_vector(pred_proba_sfi, e)

Compute the normalized density vector of one group (\(D_{G_0}\) or \(D_{G_1}\)).

Parameters:
  • pred_proba_sfi (numpy.ndarray of shape (n, 1)) – The predicted probabilities (of positive predictions) for one group.

  • e (float) – Bandwidth parameter.

Returns:

The density vector

Return type:

numpy.ndarray
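A minimal sketch of a normalized density vector, assuming (this is not taken from maddlib) that e is the width of equal-size bins partitioning [0, 1] and that each entry is the share of the group's predicted probabilities falling into that bin. normalized_density_vector_sketch is a hypothetical name.

```python
import numpy as np


def normalized_density_vector_sketch(pred_proba_sfi, e: float) -> np.ndarray:
    """Hypothetical re-implementation: share of a group's predicted
    probabilities falling into each bin of width about e on [0, 1]."""
    if not 0.0 < e < 1.0:
        raise ValueError("e must lie in ]0, 1[")
    proba = np.asarray(pred_proba_sfi, dtype=float).ravel()
    if proba.size == 0:
        raise ValueError("the group has no predicted probabilities")
    n_bins = max(1, int(round(1.0 / e)))  # about 1/e equal-width bins
    counts, _ = np.histogram(proba, bins=n_bins, range=(0.0, 1.0))
    # Divide by the group size so the vector sums to 1 whatever the group size
    return counts / proba.size


d = normalized_density_vector_sketch(np.array([0.05, 0.1, 0.62, 0.97]), e=0.25)
print(d.tolist())  # [0.5, 0.0, 0.25, 0.25]
```

Dividing by the group size rather than keeping raw counts is what makes the vectors of two groups of different sizes comparable.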

Display MADD results

To plot the MADD results for graphical analysis, use the evaluation.madd_plot() function:

evaluation.madd_plot(h, pred_proba_sf0, pred_proba_sf1, legend_groups, title, figsize=(12, 4))

Return a plot of a visual approximation of the resulting MADD for graphical analysis.

Parameters:
  • h (float or str) – Bandwidth parameter (either a float \(\in \left]0, 1\right[\) or 'auto' for an automatic computation of the optimal bandwidth).

  • pred_proba_sf0 (numpy.ndarray of shape (n, 1)) – The predicted probabilities of group 0 with which to evaluate MADD.

  • pred_proba_sf1 (numpy.ndarray of shape (n, 1)) – The predicted probabilities of group 1 with which to evaluate MADD.

  • legend_groups (str or 2-tuple) – The name of the sensitive feature or the names of the two groups in a 2-tuple.

  • title (str) – The title of the graph (it could be the name of the model that outputs the predicted probabilities).

  • figsize (2-tuple) – Optional size of the figure, (12, 4) by default.

Returns:

Plot

Return type:

matplotlib.figure.Figure
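For intuition about what evaluation.madd_plot() displays, the hypothetical madd_plot_sketch below overlays the normalized densities of both groups with matplotlib. It is not maddlib's implementation: it accepts only a numeric bandwidth e (no 'auto'), assumes equal-width bins on [0, 1], and builds group labels from a sensitive feature name in an assumed "name = 0/1" format.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def madd_plot_sketch(e, pred_proba_sf0, pred_proba_sf1, legend_groups, title, figsize=(12, 4)):
    """Hypothetical sketch, not maddlib's implementation: overlay the two
    groups' normalized densities so their gap hints at the MADD value."""
    if isinstance(legend_groups, str):
        # Only the sensitive feature name is given: assumed labeling of the groups
        labels = (f"{legend_groups} = 0", f"{legend_groups} = 1")
    else:
        labels = tuple(legend_groups)
    n_bins = max(1, int(round(1.0 / e)))  # about 1/e equal-width bins
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    fig, ax = plt.subplots(figsize=figsize)
    for proba, label in zip((pred_proba_sf0, pred_proba_sf1), labels):
        proba = np.asarray(proba, dtype=float).ravel()
        counts, _ = np.histogram(proba, bins=edges)
        # Step curve of the share of the group's predictions in each bin
        ax.stairs(counts / proba.size, edges, label=label)
    ax.set_xlabel("Predicted probability")
    ax.set_ylabel("Normalized density")
    ax.set_title(title)
    ax.legend()
    return fig


rng = np.random.default_rng(0)
fig = madd_plot_sketch(0.1, rng.beta(2, 5, 300), rng.beta(5, 2, 300), ("group 0", "group 1"), "Model A")
```

Step curves (Axes.stairs) match the binned nature of the density vectors better than smoothed curves would; calling fig.savefig() on the returned figure writes it to disk.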