
Metrics Reference


Venturalitica ships with 35+ metrics organized into 7 categories. Each metric is registered in METRIC_REGISTRY and can be referenced by its key in OSCAL policy files.
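At its core, the registry is a mapping from string keys to metric functions. The following is a minimal sketch of that dispatch pattern; the toy registry, metric, and `evaluate` helper here are illustrative stand-ins, not Venturalitica's actual code:

```python
# Toy illustration of registry-based metric dispatch (not the real
# METRIC_REGISTRY): a dict maps string keys to callables, and a policy
# engine resolves the key named in an OSCAL control at evaluation time.

def calc_accuracy(y_true, y_pred):
    # Fraction of predictions that match the target.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

METRIC_REGISTRY = {"accuracy_score": calc_accuracy}

def evaluate(metric_key, y_true, y_pred):
    # Look up the function registered under the policy's metric_key.
    if metric_key not in METRIC_REGISTRY:
        raise KeyError(f"Unknown metric: {metric_key}")
    return METRIC_REGISTRY[metric_key](y_true, y_pred)

print(evaluate("accuracy_score", [1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```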


| Category | Metrics | Description |
| --- | --- | --- |
| Performance | 4 | Standard ML accuracy, precision, recall, F1 |
| Fairness (Traditional) | 2 | Demographic parity, equal opportunity |
| Fairness (Alternative) | 2 | Equalized odds, predictive parity |
| Multiclass Fairness | 7 | Fairness metrics for multi-class classification |
| Data Quality | 4 | Disparate impact, class imbalance, completeness |
| Privacy | 4 | k-anonymity, l-diversity, t-closeness, data minimization |
| Causal Fairness | 4 | Counterfactual, path decomposition, awareness |

| Registry Key | Function | Description |
| --- | --- | --- |
| accuracy_score | calc_accuracy | Overall classification accuracy |
| precision_score | calc_precision | Positive predictive value |
| recall_score | calc_recall | Sensitivity / true positive rate |
| f1_score | calc_f1 | Harmonic mean of precision and recall |
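All four performance metrics derive from the same confusion counts. A self-contained sketch of the definitions (pure Python for illustration; the library's calc_* functions may differ in signature):

```python
# Derive accuracy, precision, recall, and F1 from binary confusion counts.

def confusion(y_true, y_pred):
    # Count true/false positives and negatives for binary labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + fn + tn)          # overall correctness
precision = tp / (tp + fp)                          # positive predictive value
recall = tp / (tp + fn)                             # sensitivity / TPR
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```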

Usage in policy:

```yaml
- control-id: model-accuracy
  props:
    - name: metric_key
      value: accuracy_score
    - name: threshold
      value: "0.80"
    - name: operator
      value: ">="
```
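Conceptually, a control like this passes when `operator(metric_value, threshold)` holds. A sketch of that comparison (the `passes` helper is hypothetical, not Venturalitica's API):

```python
# Hypothetical evaluation of the threshold/operator props against a
# computed metric value, using stdlib comparison operators.
import operator

OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}

def passes(value, op, threshold):
    # Thresholds arrive as strings in OSCAL props, so coerce to float.
    return OPS[op](value, float(threshold))

print(passes(0.85, ">=", "0.80"))  # True: the control is satisfied
```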

These are the most commonly used fairness measures for binary classification.

Demographic Parity

Measures the difference in positive prediction rates between protected groups.

  • Formula: |P(Y=1|A=a) - P(Y=1|A=b)|
  • Ideal value: 0.0
  • Typical threshold: < 0.10
  • Requires: dimension (protected attribute column)
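The formula above reduces to an absolute difference of per-group positive rates. An illustrative pure-Python sketch (not the library's implementation):

```python
# Demographic parity difference: |P(Y=1|A=a) - P(Y=1|A=b)|, computed from
# predictions grouped by the protected attribute ("dimension") column.

def positive_rate(preds):
    return sum(preds) / len(preds)

preds_by_group = {
    "a": [1, 1, 0, 0, 1],  # P(Y=1|A=a) = 0.6
    "b": [1, 0, 0, 0, 1],  # P(Y=1|A=b) = 0.4
}
rates = [positive_rate(p) for p in preds_by_group.values()]
dp_diff = abs(rates[0] - rates[1])
print(dp_diff)  # ~0.2, above the typical < 0.10 threshold
```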

Equal Opportunity

Measures the difference in true positive rates (TPR) between groups.

  • Formula: |TPR_a - TPR_b|
  • Ideal value: 0.0
  • Typical threshold: < 0.10
  • Requires: dimension, target, prediction

Equalized Odds

Ensures both TPR and FPR are equal across groups. Stricter than equal opportunity.

  • Formula: |TPR_a - TPR_b| + |FPR_a - FPR_b|
  • Ideal value: 0.0
  • Typical threshold: < 0.20
  • Requires: dimension, target, prediction
  • Reference: Hardt et al. 2016
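A sketch of this computation on toy per-group targets and predictions (illustrative pure Python, not the library's implementation):

```python
# Equalized odds: |TPR_a - TPR_b| + |FPR_a - FPR_b| across two groups.

def rates(y_true, y_pred):
    # Return (TPR, FPR) for one group's targets and predictions.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

# Targets and predictions split by the protected attribute.
tpr_a, fpr_a = rates([1, 1, 0, 0], [1, 1, 0, 1])  # TPR=1.0, FPR=0.5
tpr_b, fpr_b = rates([1, 1, 0, 0], [1, 0, 0, 0])  # TPR=0.5, FPR=0.0
eo = abs(tpr_a - tpr_b) + abs(fpr_a - fpr_b)
print(eo)  # 1.0 -> fails the typical < 0.20 threshold
```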

Predictive Parity

Measures precision equality across groups. When a positive prediction is made, it should be equally reliable regardless of group membership.

  • Formula: |Precision_a - Precision_b|
  • Ideal value: 0.0
  • Typical threshold: < 0.10
  • Requires: dimension, target, prediction
  • Reference: Corbett-Davies et al. 2017

These metrics extend the binary fairness measures to multi-class classification tasks. See Multiclass Fairness for detailed usage.

| Registry Key | Description |
| --- | --- |
| multiclass_demographic_parity | Demographic parity extended to multi-class |
| multiclass_equal_opportunity | Equal opportunity per class |
| multiclass_confusion_metrics | Per-group confusion matrix analysis |
| weighted_demographic_parity_multiclass | Class-weighted demographic parity |
| macro_equal_opportunity_multiclass | Macro-averaged equal opportunity |
| micro_equalized_odds_multiclass | Micro-averaged equalized odds |
| predictive_parity_multiclass | Predictive parity per class |

| Registry Key | Description | Typical Threshold |
| --- | --- | --- |
| disparate_impact | Four-Fifths Rule ratio between groups | > 0.80 |
| class_imbalance | Minority class proportion | > 0.20 |
| group_min_positive_rate | Minimum positive rate across groups | > 0.10 |
| data_completeness | Proportion of non-null values | > 0.95 |
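The Four-Fifths Rule compares the lowest group selection rate to the highest. A sketch of that ratio (illustrative only; the library's disparate_impact function may take a DataFrame and dimension instead):

```python
# Disparate impact as the Four-Fifths Rule: min(rate) / max(rate) across
# groups. Values below 0.80 suggest adverse impact.

def disparate_impact(rates_by_group):
    rates = list(rates_by_group.values())
    return min(rates) / max(rates)

# Hypothetical selection rates per protected-attribute value.
selection = {"gender=F": 0.30, "gender=M": 0.40}
di = disparate_impact(selection)
print(di)  # ~0.75 -> fails the > 0.80 threshold
```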

Usage example (loan scenario):

```yaml
- control-id: credit-data-bias
  description: "Disparate impact ratio must satisfy the Four-Fifths Rule"
  props:
    - name: metric_key
      value: disparate_impact
    - name: "input:dimension"
      value: gender
    - name: operator
      value: ">"
    - name: threshold
      value: "0.8"
```

GDPR-aligned privacy measures from venturalitica.assurance.privacy.

k-Anonymity

Minimum group size when quasi-identifiers are known. Prevents re-identification.

  • Formula: min(|group|) where groups are defined by quasi-identifiers
  • Ideal value: >= 5 (GDPR recommendation)
  • Requires: quasi-identifier columns

```python
from venturalitica.metrics.privacy import calc_k_anonymity

k = calc_k_anonymity(df, quasi_identifiers=["age", "gender", "zipcode"])
assert k >= 5, "GDPR recommends k >= 5"
```
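The formula itself is simple to state in plain Python: group records by their quasi-identifier tuple and take the smallest group size. A self-contained sketch (list-of-dicts data instead of a DataFrame, for illustration):

```python
# k-anonymity from scratch: the smallest equivalence class over the
# quasi-identifier columns. k = 1 means at least one record is unique
# on its quasi-identifiers and may be re-identifiable.
from collections import Counter

records = [
    {"age": 34, "gender": "F", "zipcode": "90210"},
    {"age": 34, "gender": "F", "zipcode": "90210"},
    {"age": 51, "gender": "M", "zipcode": "10001"},
]

def k_anonymity(rows, quasi_identifiers):
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())

print(k_anonymity(records, ["age", "gender", "zipcode"]))  # 1
```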

l-Diversity

Minimum distinct values of a sensitive attribute per quasi-identifier group.

  • Formula: min(distinct values in sensitive_attribute per QI group)
  • Ideal value: >= 2

t-Closeness

Maximum distribution difference between groups using Earth Mover's Distance.

  • Formula: max(EMD between group distributions)
  • Ideal value: < 0.15
  • Reference: Li et al. 2007
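For an ordered attribute, the Earth Mover's Distance between two discrete distributions reduces to a sum of absolute cumulative differences. A sketch under that assumption (illustrative; the library's implementation and normalization may differ):

```python
# 1-D Earth Mover's Distance between a QI group's sensitive-attribute
# distribution and the overall table's, for ordered categories with unit
# spacing, normalized to [0, 1] by dividing by (m - 1).

def emd_1d(p, q):
    # p, q: probability vectors over the same ordered categories.
    cum, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        total += abs(cum)
    return total / (len(p) - 1)

overall = [0.5, 0.3, 0.2]  # sensitive attribute, whole table
group = [0.2, 0.3, 0.5]    # sensitive attribute, one QI group
print(round(emd_1d(overall, group), 3))  # 0.3 -> fails a < 0.15 threshold
```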

Data Minimization

GDPR Article 5 compliance — proportion of non-sensitive columns.

  • Formula: (total_columns - sensitive_columns) / total_columns
  • Ideal value: >= 0.70

```python
from venturalitica.metrics.privacy import calc_data_minimization_score

score = calc_data_minimization_score(
    df,
    sensitive_columns=["age", "income", "health_status"],
)
```

Advanced metrics from venturalitica.assurance.causal.

| Registry Key | Description |
| --- | --- |
| path_decomposition | Decomposes causal paths to identify direct vs indirect discrimination |
| counterfactual_fairness | Tests whether changing a protected attribute would change the outcome |
| fairness_through_awareness | Ensures similar individuals receive similar predictions |
| causal_fairness_diagnostic | Comprehensive diagnostic combining multiple causal tests |

These are convenience aliases that use calc_mean internally:

| Registry Key | Use Case |
| --- | --- |
| bias_score | General bias scoring for LLM outputs |
| stereotype_preference_rate | Stereotype detection in generated text |
| category_bias_score | Per-category bias in benchmark evaluations |

Specialized metrics for ESG report analysis:

| Registry Key | Description |
| --- | --- |
| classification_distribution | Distribution of ESG classifications |
| report_coverage | Coverage of reporting requirements |
| provenance_completeness | Completeness of data provenance chain |
| chunk_diversity | Diversity of text chunks in RAG pipelines |
| subtitle_diversity | Diversity of section headings in reports |

Different metrics capture different fairness concepts, ordered by strictness:

```
Demographic Parity (least strict)
  | Same approval rates across groups
  v
Equal Opportunity (medium)
  | Same TPR across groups
  v
Equalized Odds (most strict)
  | Same TPR AND FPR across groups
  v
Predictive Parity (orthogonal)
  | Same precision across groups
```

In practice, a system can pass demographic parity while failing equalized odds. Choose metrics aligned with your risk context:

  • Lending / Hiring: Equalized odds + predictive parity
  • Healthcare: Equalized odds + privacy metrics (k-anonymity >= 5)
  • Comprehensive audit: All metrics together
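The point that demographic parity can pass while equalized odds fails is easy to demonstrate with a toy dataset (illustrative construction, not library code):

```python
# Two groups with identical positive-prediction rates (demographic parity
# holds) but opposite error profiles (equalized odds fails badly).

def positive_rate(preds):
    return sum(preds) / len(preds)

def tpr_fpr(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

true_a, pred_a = [1, 1, 0, 0], [1, 1, 0, 0]  # group a: perfect classifier
true_b, pred_b = [1, 1, 0, 0], [0, 0, 1, 1]  # group b: inverted classifier

dp = abs(positive_rate(pred_a) - positive_rate(pred_b))  # 0.0: parity holds
tpr_a, fpr_a = tpr_fpr(true_a, pred_a)                   # 1.0, 0.0
tpr_b, fpr_b = tpr_fpr(true_b, pred_b)                   # 0.0, 1.0
eo = abs(tpr_a - tpr_b) + abs(fpr_a - fpr_b)             # 2.0: odds fail
print(dp, eo)
```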

Every metric can be referenced in an OSCAL policy via the metric_key property:

```yaml
assessment-plan:
  metadata:
    title: "Custom Fairness Policy"
  control-implementations:
    - description: "Fairness Controls"
      implemented-requirements:
        - control-id: my-check
          props:
            - name: metric_key
              value: equalized_odds_ratio  # <-- Registry key
            - name: threshold
              value: "0.20"
            - name: operator
              value: "<"
            - name: "input:dimension"
              value: gender  # <-- Protected attribute
```

See Policy Authoring Guide for the complete OSCAL format reference.


To register a new metric:

  1. Create the function in the appropriate module under venturalitica/assurance/:

```python
def calc_my_metric(df, **kwargs) -> float:
    # Validate inputs
    if "dimension" not in kwargs:
        raise ValueError("Missing 'dimension' parameter")
    # Calculate and return the metric value
    value = ...  # your metric computation here
    return value
```

  2. Register it in venturalitica/metrics/__init__.py:

```python
METRIC_REGISTRY["my_metric"] = calc_my_metric
```

  3. Use it in your OSCAL policy via metric_key: my_metric.
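The three steps can be exercised end to end with a local stand-in for the registry (toy data and metric; the real calc function would receive a DataFrame):

```python
# Toy walk-through of the registration flow: define a metric that
# validates its inputs, register it under a key, then resolve that key
# the way a policy evaluation with metric_key: my_metric would.

def calc_my_metric(rows, **kwargs) -> float:
    # Validate inputs, as in step 1.
    if "dimension" not in kwargs:
        raise ValueError("Missing 'dimension' parameter")
    dim = kwargs["dimension"]
    # Example computation: mean of the dimension column (illustrative).
    values = [row[dim] for row in rows]
    return sum(values) / len(values)

METRIC_REGISTRY = {}                              # stand-in registry
METRIC_REGISTRY["my_metric"] = calc_my_metric     # step 2

# Step 3, effectively:
result = METRIC_REGISTRY["my_metric"](
    [{"gender": 1}, {"gender": 0}], dimension="gender"
)
print(result)  # 0.5
```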