Column Binding

Venturalitica uses a synonym-based column binding system to map abstract concepts (like “gender” or “age”) to actual DataFrame column names (like “Attribute9” or “Attribute13”). This decouples your OSCAL policies from specific dataset schemas.


When you call vl.enforce(), the SDK resolves column names in three steps:

  1. Explicit parameters: Check if target or prediction values exist as column names
  2. Synonym discovery: Look up the column name in COLUMN_SYNONYMS
  3. Lowercase fallback: Try a case-insensitive match

import venturalitica as vl

vl.enforce(
    data=df,                        # df: your DataFrame, loaded elsewhere
    target="class",                 # Explicit: looks for 'class' column
    gender="Attribute9",            # Attribute mapping: 'gender' -> 'Attribute9'
    age="Attribute13",              # Attribute mapping: 'age' -> 'Attribute13'
    policy="data_policy.oscal.yaml",
)

In the OSCAL policy, controls reference abstract names like gender and age via input:dimension. The SDK resolves these to actual columns at runtime.


The built-in COLUMN_SYNONYMS dictionary maps semantic roles to known column name variants:

Role        Known Synonyms
----------  --------------------------------------------------------------------------
gender      sex, gender, sexo, Attribute9
age         age, age_group, edad, Attribute13
race        race, ethnicity, raza
target      target, class, label, y, true_label, ground_truth, approved, default, outcome
prediction  prediction, pred, y_pred, predictions, score, proba, output
dimension   sex, gender, age, race, Attribute9, Attribute13

If you do not explicitly pass a column mapping, the SDK automatically discovers columns by scanning your DataFrame for known synonyms:

# These are equivalent for the UCI German Credit dataset:
vl.enforce(data=df, target="class", gender="Attribute9")
vl.enforce(data=df) # Auto-discovers 'class' as target, 'Attribute9' as gender
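
To make the auto-discovery idea concrete, here is a minimal sketch in plain Python. It is not the SDK's implementation: the helper `auto_discover` is hypothetical, the synonym dictionary is a hand-copied subset of the table above, and a plain list of column names stands in for the DataFrame. It binds each role to the first column that appears in that role's synonym list.

```python
# Illustrative only: a hand-copied subset of the synonym table, not the SDK's dictionary.
SYNONYMS_SUBSET = {
    "gender": ["sex", "gender", "sexo", "Attribute9"],
    "age": ["age", "age_group", "edad", "Attribute13"],
    "target": ["target", "class", "label", "y"],
}


def auto_discover(columns, synonyms):
    """Hypothetical helper: bind each role to the first column that is a known synonym."""
    bindings = {}
    for role, variants in synonyms.items():
        for column in columns:
            if column in variants:
                bindings[role] = column
                break  # keep the first match for this role
    return bindings


# A few column names in the style of the UCI German Credit dataset.
german_credit_columns = ["Attribute1", "Attribute9", "Attribute13", "class"]
discovered = auto_discover(german_credit_columns, SYNONYMS_SUBSET)
print(discovered)
```

Under these assumptions, no explicit mapping is needed because every role has at least one synonym present among the column names.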

discover_column(requested, context_mapping, data, synonyms)


Discovers a single column using this priority order:

  1. Check context_mapping (explicit user-provided mapping)
  2. Check if requested is a direct column name
  3. Search COLUMN_SYNONYMS for a matching group
  4. Lowercase fallback
  5. Return "MISSING" if not found
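
The priority order above can be sketched in plain Python. This is an illustrative re-implementation, not the SDK's code: `discover_column_sketch` is a hypothetical name, a list of column names stands in for the DataFrame, and two details are assumptions (an explicit mapping is only honoured if its target column exists, and synonyms are looked up by role name).

```python
def discover_column_sketch(requested, context_mapping, columns, synonyms):
    """Illustrative version of the five-step priority order; not the SDK's code."""
    # 1. An explicit user-provided mapping wins (assumed: only if that column exists).
    mapped = context_mapping.get(requested)
    if mapped in columns:
        return mapped
    # 2. The requested name is itself a column.
    if requested in columns:
        return requested
    # 3. A known synonym of the requested role exists as a column.
    for variant in synonyms.get(requested, []):
        if variant in columns:
            return variant
    # 4. Case-insensitive match on the requested name.
    lowered = {column.lower(): column for column in columns}
    if requested.lower() in lowered:
        return lowered[requested.lower()]
    # 5. Nothing matched.
    return "MISSING"


columns = ["Attribute9", "Attribute13", "Class"]
synonyms = {"gender": ["sex", "gender", "sexo", "Attribute9"]}
results = [
    discover_column_sketch("age", {"age": "Attribute13"}, columns, synonyms),  # step 1
    discover_column_sketch("Attribute9", {}, columns, synonyms),               # step 2
    discover_column_sketch("gender", {}, columns, synonyms),                   # step 3
    discover_column_sketch("class", {}, columns, synonyms),                    # step 4
    discover_column_sketch("race", {}, columns, synonyms),                     # step 5
]
print(results)
```

Each call in the usage list exercises one step, so the order of the checks is visible from the results.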

resolve_col_names resolves one or more column names from a string or list:

  • Supports comma-separated strings: "target, gender"
  • Supports lists: ["target", "gender"]
  • Returns a list of resolved column names
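
The input handling described above can be sketched as follows. Both helpers are hypothetical and only illustrate the string-or-list normalisation; the `bindings` dictionary stands in for whatever per-name discovery the SDK performs, and returning "MISSING" for unknown names is an assumption carried over from discover_column.

```python
def split_requested_names(requested):
    """Hypothetical helper: normalise a comma-separated string or a list into a list."""
    if isinstance(requested, str):
        parts = requested.split(",")
    else:
        parts = list(requested)
    # Trim whitespace and drop empty entries such as a trailing comma.
    return [part.strip() for part in parts if part.strip()]


def resolve_names_sketch(requested, bindings):
    """Hypothetical helper: resolve each requested name through a lookup table."""
    return [bindings.get(name, "MISSING") for name in split_requested_names(requested)]


bindings = {"target": "class", "gender": "Attribute9"}
from_string = resolve_names_sketch("target, gender", bindings)
from_list = resolve_names_sketch(["target", "gender"], bindings)
print(from_string, from_list)
```

Both input forms yield the same list, which is the property the bullets above describe.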

Column binding bridges the gap between legal requirements (OSCAL policies) and technical reality (messy DataFrames):

OSCAL Policy                   Python Code                 DataFrame
+--------------+   enforce()   +-----------+    binding    +------------+
| dimension:   | ------------> | gender=   | ------------> | Attribute9 |
|   gender     |               | "Attr..." |               | (actual    |
| threshold:   |               |           |               | column)    |
|   > 0.8      |               |           |               |            |
+--------------+               +-----------+               +------------+

This design means:

  • Compliance Officers write policies using abstract names (gender, age)
  • Engineers provide the mapping to technical column names once
  • The same policy works across different datasets with different schemas
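
As a rough illustration of the last point, the sketch below binds one set of abstract policy dimensions to two different schemas. Everything here is hypothetical: `bind_policy` is not an SDK function, and the "spanish_survey" dataset is invented for the example.

```python
# Abstract names a policy might reference; the same list is reused for every dataset.
policy_dimensions = ["gender", "age"]

# One binding per dataset, written once by an engineer.
bindings_by_dataset = {
    "german_credit": {"gender": "Attribute9", "age": "Attribute13"},
    "spanish_survey": {"gender": "sexo", "age": "edad"},  # hypothetical dataset
}


def bind_policy(dimensions, binding):
    """Hypothetical helper: map each abstract dimension to a concrete column name."""
    return {dimension: binding[dimension] for dimension in dimensions}


bound = {
    name: bind_policy(policy_dimensions, binding)
    for name, binding in bindings_by_dataset.items()
}
print(bound)
```

The policy side never changes; only the per-dataset binding does.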

The synonym dictionary includes Spanish translations:

  • gender = sexo
  • age = edad
  • race = raza

This allows datasets with Spanish column headers to be used without explicit mapping.


You can extend the synonym dictionary programmatically:

from venturalitica.binding import COLUMN_SYNONYMS
# Add a custom synonym for your dataset
COLUMN_SYNONYMS["income"] = ["income", "salary", "wage", "earnings", "ingreso"]

Or pass a custom synonym dictionary to the resolution functions:

from venturalitica.binding import resolve_col_names
custom_synonyms = {
    "risk_score": ["risk", "risk_score", "risk_level", "riesgo"],
}
resolved = resolve_col_names("risk_score", df, synonyms=custom_synonyms)
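
Because a module-level dictionary is shared by everything that imports it, mutating it changes resolution for the rest of the process. If that is not what you want, one option is to merge your additions into a copy and pass the copy explicitly. The sketch below shows the merge in plain Python; `extend_synonyms` is a hypothetical helper, and `base` stands in for the built-in dictionary.

```python
import copy


def extend_synonyms(base, extra):
    """Hypothetical helper: return a merged copy, leaving `base` untouched."""
    merged = copy.deepcopy(base)
    for role, variants in extra.items():
        existing = merged.setdefault(role, [])
        for variant in variants:
            if variant not in existing:  # avoid duplicate entries
                existing.append(variant)
    return merged


base = {"gender": ["sex", "gender", "sexo", "Attribute9"]}  # stand-in for the built-ins
extra = {
    "gender": ["genero"],                      # extend an existing role
    "income": ["income", "salary", "ingreso"], # add a new role
}
merged = extend_synonyms(base, extra)
print(merged)
```

The merged dictionary can then be supplied wherever a synonyms argument is accepted, while the original stays as it was.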