Column Binding
Este conteúdo não está disponível em sua língua ainda.
Venturalitica uses a synonym-based column binding system to map abstract concepts (like “gender” or “age”) to actual DataFrame column names (like “Attribute9” or “Attribute13”). This decouples your OSCAL policies from specific dataset schemas.
How It Works
Section titled “How It Works”When you call vl.enforce(), the SDK resolves column names in three steps:
- Explicit parameters: Check if
targetorpredictionvalues exist as column names - Synonym discovery: Look up the column name in
COLUMN_SYNONYMS - Lowercase fallback: Try a case-insensitive match
Example
Section titled “Example”vl.enforce( data=df, target="class", # Explicit: looks for 'class' column gender="Attribute9", # Attribute mapping: 'gender' -> 'Attribute9' age="Attribute13", # Attribute mapping: 'age' -> 'Attribute13' policy="data_policy.oscal.yaml")In the OSCAL policy, controls reference abstract names like gender and age via input:dimension. The SDK resolves these to actual columns at runtime.
Synonym Dictionary
Section titled “Synonym Dictionary”The built-in COLUMN_SYNONYMS dictionary maps semantic roles to known column name variants:
| Role | Known Synonyms |
|---|---|
gender | sex, gender, sexo, Attribute9 |
age | age, age_group, edad, Attribute13 |
race | race, ethnicity, raza |
target | target, class, label, y, true_label, ground_truth, approved, default, outcome |
prediction | prediction, pred, y_pred, predictions, score, proba, output |
dimension | sex, gender, age, race, Attribute9, Attribute13 |
Auto-Discovery
Section titled “Auto-Discovery”If you do not explicitly pass a column mapping, the SDK automatically discovers columns by scanning your DataFrame for known synonyms:
# These are equivalent for the UCI German Credit dataset:vl.enforce(data=df, target="class", gender="Attribute9")vl.enforce(data=df) # Auto-discovers 'class' as target, 'Attribute9' as genderResolution Functions
Section titled “Resolution Functions”discover_column(requested, context_mapping, data, synonyms)
Section titled “discover_column(requested, context_mapping, data, synonyms)”Discovers a single column using this priority order:
- Check
context_mapping(explicit user-provided mapping) - Check if
requestedis a direct column name - Search
COLUMN_SYNONYMSfor a matching group - Lowercase fallback
- Return
"MISSING"if not found
resolve_col_names(val, data, synonyms)
Section titled “resolve_col_names(val, data, synonyms)”Resolves one or more column names from a string or list:
- Supports comma-separated strings:
"target, gender" - Supports lists:
["target", "gender"] - Returns a list of resolved column names
The Policy-Code Handshake
Section titled “The Policy-Code Handshake”Column binding bridges the gap between legal requirements (OSCAL policies) and technical reality (messy DataFrames):
OSCAL Policy Python Code DataFrame+--------------+ enforce() +-----------+ binding +------------+| dimension: | ------------> | gender= | ------------> | Attribute9 || gender | | "Attr..." | | (actual || threshold: | | | | column) || > 0.8 | | | | |+--------------+ +-----------+ +------------+This design means:
- Compliance Officers write policies using abstract names (
gender,age) - Engineers provide the mapping to technical column names once
- The same policy works across different datasets with different schemas
Multilingual Support
Section titled “Multilingual Support”The synonym dictionary includes Spanish translations:
gender=sexoage=edadrace=raza
This allows datasets with Spanish column headers to be used without explicit mapping.
Custom Synonyms
Section titled “Custom Synonyms”You can extend the synonym dictionary programmatically:
from venturalitica.binding import COLUMN_SYNONYMS
# Add a custom synonym for your datasetCOLUMN_SYNONYMS["income"] = ["income", "salary", "wage", "earnings", "ingreso"]Or pass a custom synonym dictionary to the resolution functions:
from venturalitica.binding import resolve_col_names
custom_synonyms = { "risk_score": ["risk", "risk_score", "risk_level", "riesgo"],}resolved = resolve_col_names("risk_score", df, synonyms=custom_synonyms)