Policies & Guardrails

Aberon enforces policies on AI agent behavior in real time. Policies are configured in the dashboard, so no agent code changes are needed.

Policy Types

1. PII Detection

Automatically detect and mask personally identifiable information in traces.

How it works:

Powered by the Microsoft Presidio NLP engine
Scans agent inputs/outputs before storage
Detected PII is replaced with type labels: [PERSON], [US_SSN], [IBAN_CODE]
Raw PII is NEVER stored

Supported entity types:

PERSON, PHONE_NUMBER, EMAIL_ADDRESS, CREDIT_CARD, US_SSN, IBAN_CODE, IP_ADDRESS, LOCATION, DATE_TIME, NRP, MEDICAL_LICENSE

Example:

Agent output (original):

"Employee John Smith (SSN: 123-45-6789) will receive
 payment to account DE89370400440532013000."

Stored in Aberon:

"Employee [PERSON] (SSN: [US_SSN]) will receive
 payment to account [IBAN_CODE]."

Audit log records: pii_detected: ["PERSON", "US_SSN", "IBAN_CODE"], fields_masked: 3
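The masking step can be sketched in a few lines. This is a minimal illustration that uses plain regular expressions instead of Presidio, so it only covers pattern-shaped entities and cannot detect PERSON. The `PATTERNS` table and the `mask_pii` function are hypothetical names, not part of Aberon or Presidio.

```python
import re

# Hypothetical regex stand-ins for a few entity types (not Presidio's recognizers).
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IBAN_CODE": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def mask_pii(text):
    """Replace each match with its type label and build an audit record."""
    detected = []
    fields_masked = 0
    for entity, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of replacements made.
        text, count = pattern.subn(f"[{entity}]", text)
        if count:
            detected.append(entity)
            fields_masked += count
    return text, {"pii_detected": detected, "fields_masked": fields_masked}


masked, audit = mask_pii("SSN: 123-45-6789, account DE89370400440532013000.")
print(masked)  # SSN: [US_SSN], account [IBAN_CODE].
print(audit)   # {'pii_detected': ['US_SSN', 'IBAN_CODE'], 'fields_masked': 2}
```

Only the masked string and the audit record would be written to storage; the original text is discarded after the substitution.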

Dashboard: Policies → Create → PII Detection

Select entity types to detect
Apply to specific agent or all agents
Action: redact (mask and store) or block (reject the trace)

2. Tool Restriction

Block or require approval for specific tool calls.

Use case: Your support agent has access to search_kb, create_ticket, send_email, execute_sql. You want to block the last two.

Dashboard: Policies → Create → Tool Restriction

List blocked tools, for example: send_email, execute_sql, delete_user
Action: block (instant) or require_approval (human decides)
Apply to specific agent or all agents

What happens when an agent calls a blocked tool:

14:23:01  search_kb("refund policy")       ✅ Allowed
14:23:03  create_ticket(customer=...)       ✅ Allowed
14:23:05  send_email(to="client@...")       ❌ BLOCKED

Dashboard shows a guardrail block notification with:

Which agent attempted it
Which tool was called
Which policy blocked it
Full trace link

Every block is recorded in the audit trail.
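The check behind a Tool Restriction policy can be pictured as a simple membership test. The sketch below is an illustration only: `ToolRestriction` and `check_tool` are hypothetical names, and the real evaluation happens inside Aberon, not in your code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRestriction:
    """Hypothetical in-memory model of a Tool Restriction policy."""
    name: str
    blocked_tools: frozenset
    action: str = "block"  # "block" or "require_approval"


def check_tool(policy, tool_name):
    """Return "allowed", or the policy's action if the tool is restricted."""
    if tool_name in policy.blocked_tools:
        return policy.action
    return "allowed"


policy = ToolRestriction(
    name="support-agent-tools",
    blocked_tools=frozenset({"send_email", "execute_sql"}),
)

for tool in ["search_kb", "create_ticket", "send_email"]:
    print(tool, "->", check_tool(policy, tool))
```

With the policy above, search_kb and create_ticket are allowed and send_email is blocked, matching the timeline shown earlier in this section.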

3. Cost Limit

Pause agent execution when cost exceeds a threshold.

Use case: A data analysis agent processes large datasets via GPT-4. A run normally costs $0.50, but the agent sometimes enters a loop and a single run costs $500.

Dashboard: Policies → Create → Cost Limit

Set max cost: $50.00 per run
Action: require_approval
Timeout: 600 seconds (10 minutes for a human to decide)

What happens:

Step 1: Parse dataset      $2.30   ✅
Step 2: Summarize          $18.40  ✅
Step 3: Cross-reference    $31.20  ✅
Step 4: Generate report    ⏳ PAUSED: running total $51.90 exceeds $50.00 limit

Dashboard shows approval request:

Current cost: $51.90
Limit: $50.00
[Approve] [Reject] [8:42 remaining]

If approved: Step 4 completes. Total: $64.00. Audit: "Approved by analyst@company.com"

If rejected: Agent stops. No additional cost.

If the request times out: Agent stops. Policy default applies.
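The arithmetic in this example can be reproduced with a running total that is checked before each step. This is a sketch under assumptions: `run_with_cost_limit` is a hypothetical helper, the check-before-each-step timing is inferred from the example, and the $12.10 cost of Step 4 is implied by the $64.00 total rather than stated.

```python
from decimal import Decimal

# Per-step costs from the example; Step 4's cost is implied by the $64.00 total.
STEPS = [
    ("Parse dataset", "2.30"),
    ("Summarize", "18.40"),
    ("Cross-reference", "31.20"),
    ("Generate report", "12.10"),
]


def run_with_cost_limit(steps, limit):
    """Return (completed step names, running total, step held for approval or None)."""
    total = Decimal("0")
    completed = []
    for name, cost in steps:
        # Check before starting the step: pause once the total is over the limit.
        if total > limit:
            return completed, total, name
        total += Decimal(cost)
        completed.append(name)
    return completed, total, None


done, total, paused = run_with_cost_limit(STEPS, Decimal("50.00"))
print(done, total, paused)
```

Decimal is used instead of float so that sums such as 2.30 + 18.40 + 31.20 come out as exactly 51.90.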

4. Approval Workflows (Human-in-the-Loop)

Any policy with action "require_approval" creates an approval request.

Approval lifecycle:

Policy triggers → approval request created
Agent pauses (SDK polls for decision)
Human sees request in Dashboard → Approvals
Human approves or rejects with optional reason
Agent receives decision and continues or stops
Everything recorded in audit trail

Dashboard: Approvals page shows:

Pending approvals with countdown timer
Approved/rejected history
Who decided, when, why

SDK integration:

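# `agent`, `client`, and `t` (the active trace) are assumed to come from your existing SDK setup.
result = agent.check_guardrails(tool_name="send_email", trace_id=t.trace_id)

if result.requires_approval:
    # Poll every 3 seconds, for up to 120 seconds, until a human decides.
    pending = PendingApproval(client._transport, result.requires_approval)
    # wait() returns when a human approves; it raises ApprovalDeniedError on
    # rejection and ApprovalExpiredError on timeout.
    decision = pending.wait(timeout=120, poll_interval=3)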
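What a polling wait with a timeout and a poll interval does can be approximated as follows. This is not the SDK's implementation: `wait_for_decision`, the `fetch_status` callback, and the status strings are hypothetical, and the two exception classes are local stand-ins that only borrow the SDK's names.

```python
import time


class ApprovalDeniedError(Exception):
    """Local stand-in for the SDK exception of the same name."""


class ApprovalExpiredError(Exception):
    """Local stand-in for the SDK exception of the same name."""


def wait_for_decision(fetch_status, timeout, poll_interval,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports a decision or the timeout passes."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "approved":
            return status
        if status == "rejected":
            raise ApprovalDeniedError("approval request was rejected")
        if clock() >= deadline:
            raise ApprovalExpiredError("no decision within the timeout")
        sleep(poll_interval)


# Simulated backend: two pending polls, then an approval.
statuses = iter(["pending", "pending", "approved"])
print(wait_for_decision(lambda: next(statuses), timeout=120, poll_interval=0))
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.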

Policy Scope

Policies can target:

All agents — global policy (target_agent_id = None)
Specific agent — only applies to one agent
Agent + children — applies to agent and all its sub-agents (apply_to_children = True)
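The three scopes can be expressed as one matching function. In this sketch only `target_agent_id` and `apply_to_children` come from the text above; the `Policy` class, the `applies_to` function, and the `parent_of` mapping are hypothetical, and treating "children" as sub-agents at any depth is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    name: str
    target_agent_id: Optional[str] = None  # None means all agents
    apply_to_children: bool = False


def applies_to(policy, agent_id, parent_of):
    """parent_of maps an agent ID to its parent's ID (hypothetical structure)."""
    if policy.target_agent_id is None or policy.target_agent_id == agent_id:
        return True
    if not policy.apply_to_children:
        return False
    # Walk up the ancestor chain; `seen` guards against cycles in the mapping.
    seen = set()
    current = parent_of.get(agent_id)
    while current is not None and current not in seen:
        if current == policy.target_agent_id:
            return True
        seen.add(current)
        current = parent_of.get(current)
    return False


parent_of = {"researcher": "planner", "summarizer": "researcher"}
tree_policy = Policy("planner-tree", target_agent_id="planner", apply_to_children=True)
print(applies_to(tree_policy, "summarizer", parent_of))  # True
```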

Policy Priority

When multiple policies apply to the same action, they are evaluated in priority order (lower number = higher priority). The first policy that blocks or requires approval wins.
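The evaluation order can be illustrated with a short sketch. The dictionary fields and the `evaluate` function are hypothetical, and how Aberon breaks ties between equal priorities is not stated here; this sketch simply keeps list order.

```python
def evaluate(policies, tool_name):
    """Return (outcome, policy name) for the first matching policy by priority."""
    # Lower number means higher priority; sorted() is stable, so ties keep list order.
    for policy in sorted(policies, key=lambda p: p["priority"]):
        if tool_name in policy["tools"] and policy["action"] in ("block", "require_approval"):
            return policy["action"], policy["name"]
    return "allowed", None


policies = [
    {"name": "email-needs-approval", "priority": 10,
     "tools": {"send_email"}, "action": "require_approval"},
    {"name": "no-outbound-email", "priority": 1,
     "tools": {"send_email"}, "action": "block"},
]

print(evaluate(policies, "send_email"))  # ('block', 'no-outbound-email')
print(evaluate(policies, "search_kb"))   # ('allowed', None)
```

Both policies match send_email, but the priority 1 policy is evaluated first, so the call is blocked and the approval policy is never reached.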

Audit Trail for Policies

Every policy action is recorded:

guardrail.passed — check passed, agent proceeds
guardrail.blocked — action blocked by policy
guardrail.approval_requested — human approval needed
approval.approved — human approved with reason
approval.rejected — human rejected with reason
approval.expired — no decision within timeout

All entries are part of a SHA-256 hash chain, which makes the audit trail tamper-evident. Learn more about the immutable audit trail.
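A hash chain of this kind can be sketched with the standard library. This is a generic illustration, not Aberon's actual record format: the entry layout, the all-zero genesis value, and the JSON canonicalization are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the hash before the first entry


def entry_hash(prev_hash, event):
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain, event):
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev_hash,
                  "hash": entry_hash(prev_hash, event)})


def verify(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append(chain, {"type": "guardrail.blocked", "tool": "send_email"})
append(chain, {"type": "approval.approved", "reason": "verified recipient"})
print(verify(chain))  # True

chain[0]["event"]["tool"] = "search_kb"  # tamper with a stored entry
print(verify(chain))  # False
```

Because each hash covers the previous one, changing any earlier entry invalidates every hash after it, which is what makes tampering detectable.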

SDK & Integration — guardrail checks in code
Licensing — plans and limits
Troubleshooting — policy issues and fixes