Policies & Guardrails

Aberon enforces policies on AI agent behavior in real time. Policies are configured in the dashboard, so no changes to agent code are needed.

Policy Types

1. PII Detection

Automatically detect and mask personally identifiable information in traces.

How it works:

  • Powered by the Microsoft Presidio NLP engine
  • Scans agent inputs and outputs before they are stored
  • Detected PII is replaced with type labels: [PERSON], [US_SSN], [IBAN_CODE]
  • Raw PII is NEVER stored

Supported entity types:

PERSON, PHONE_NUMBER, EMAIL_ADDRESS, CREDIT_CARD, US_SSN, IBAN_CODE, IP_ADDRESS, LOCATION, DATE_TIME, NRP (nationality, religious or political group), MEDICAL_LICENSE

Example:

Agent output (original):

"Employee John Smith (SSN: 123-45-6789) will receive
 payment to account DE89370400440532013000."

Stored in Aberon:

"Employee [PERSON] (SSN: [US_SSN]) will receive
 payment to account [IBAN_CODE]."

Audit log records: pii_detected: ["PERSON", "US_SSN", "IBAN_CODE"], fields_masked: 3
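The masking step in this example can be sketched in Python. This is a minimal sketch, not Aberon's implementation: it assumes the detector (Presidio, in Aberon's case) has already returned non-overlapping spans with an entity type and character offsets, and it only shows how those spans become type labels plus an audit record. `Detection` and `mask_pii` are hypothetical names; the two audit keys mirror the audit log line above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected PII span (hypothetical; similar in spirit to a Presidio result)."""
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def mask_pii(text: str, detections: List[Detection]) -> Tuple[str, Dict]:
    """Replace each span with a [TYPE] label; assumes spans do not overlap."""
    ordered = sorted(detections, key=lambda d: d.start)
    masked = text
    # Replace from the end of the string backwards so earlier offsets stay valid.
    for d in reversed(ordered):
        masked = masked[:d.start] + f"[{d.entity_type}]" + masked[d.end:]
    audit = {
        "pii_detected": [d.entity_type for d in ordered],
        "fields_masked": len(ordered),
    }
    return masked, audit


original = ("Employee John Smith (SSN: 123-45-6789) will receive "
            "payment to account DE89370400440532013000.")


def span(entity_type: str, value: str) -> Detection:
    """Build a Detection for the first occurrence of value in the example text."""
    start = original.index(value)
    return Detection(entity_type, start, start + len(value))


masked, audit = mask_pii(original, [
    span("PERSON", "John Smith"),
    span("US_SSN", "123-45-6789"),
    span("IBAN_CODE", "DE89370400440532013000"),
])
print(masked)
# Employee [PERSON] (SSN: [US_SSN]) will receive payment to account [IBAN_CODE].
print(audit)
# {'pii_detected': ['PERSON', 'US_SSN', 'IBAN_CODE'], 'fields_masked': 3}
```

Because only the masked string and the audit record leave this function, the raw values never need to reach storage.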

Dashboard: Policies → Create → PII Detection

  • Select entity types to detect
  • Apply to specific agent or all agents
  • Action: redact (mask and store) or block (reject the trace)
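The difference between the two actions, and the effect of selecting entity types, can be sketched as follows. This assumes detections outside the selected types are ignored before the action is applied; `PiiPolicy`, `TraceRejected`, and `apply_pii_policy` are hypothetical names used for illustration, not Aberon's SDK.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


class TraceRejected(Exception):
    """Raised when a block-mode policy rejects a trace (hypothetical)."""


@dataclass(frozen=True)
class PiiPolicy:
    entity_types: FrozenSet[str]  # entity types selected in the dashboard
    action: str                   # "redact" or "block"


def apply_pii_policy(policy: PiiPolicy, text: str,
                     detections: List[Tuple[str, int, int]]) -> str:
    """detections: (entity_type, start, end) spans, assumed non-overlapping."""
    # Only entity types selected in the policy count as hits.
    hits = [d for d in detections if d[0] in policy.entity_types]
    if not hits:
        return text
    if policy.action == "block":
        # Block: reject the whole trace instead of storing a masked copy.
        raise TraceRejected(f"PII detected: {sorted({d[0] for d in hits})}")
    # Redact: replace spans right to left so earlier offsets stay valid.
    for entity_type, start, end in sorted(hits, key=lambda d: d[1], reverse=True):
        text = text[:start] + f"[{entity_type}]" + text[end:]
    return text


text = "Call Jane at 555-0100."
detections = [("PERSON", 5, 9), ("PHONE_NUMBER", 13, 21)]

# PERSON is not selected here, so only the phone number is masked.
redacted = apply_pii_policy(PiiPolicy(frozenset({"PHONE_NUMBER"}), "redact"), text, detections)
print(redacted)  # Call Jane at [PHONE_NUMBER].

blocked = False
try:
    apply_pii_policy(PiiPolicy(frozenset({"PERSON"}), "block"), text, detections)
except TraceRejected:
    blocked = True
print(blocked)  # True
```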

2. Tool Restriction

Block or require approval for specific tool calls.

Use case: Your support agent has access to search_kb, create_ticket, send_email, and execute_sql. You want to block the last two.

Dashboard: Policies → Create → Tool Restriction

  • List blocked tools: send_email, execute_sql, delete_user
  • Action: block (instant) or require_approval (human decides)
  • Apply to specific agent or all agents

What happens when an agent calls a blocked tool:

14:23:01  search_kb("refund policy")       ✅ Allowed
14:23:03  create_ticket(customer=...)       ✅ Allowed
14:23:05  send_email(to="client@...")       ❌ BLOCKED

Dashboard shows a guardrail block notification with:

  • Which agent attempted it
  • Which tool was called
  • Which policy blocked it
  • Full trace link

Every block is recorded in the audit trail.
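At its core, a tool-restriction check is a set lookup before each call. The sketch below assumes a single policy holding a blocked-tool list and an action of block or require_approval, and returns a decision carrying the fields the notification shows (agent, tool, policy). `ToolPolicy`, `Decision`, `check_tool`, and the policy and agent names are hypothetical, not Aberon's SDK.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ToolPolicy:
    name: str
    blocked_tools: FrozenSet[str]
    action: str = "block"  # or "require_approval"


@dataclass(frozen=True)
class Decision:
    outcome: str                  # "allowed", "blocked", or "approval_requested"
    agent_id: str
    tool: str
    policy: Optional[str] = None  # name of the policy that intervened, if any


def check_tool(policy: ToolPolicy, agent_id: str, tool: str) -> Decision:
    """Decide whether agent_id may call tool under this policy."""
    if tool not in policy.blocked_tools:
        return Decision("allowed", agent_id, tool)
    outcome = "blocked" if policy.action == "block" else "approval_requested"
    return Decision(outcome, agent_id, tool, policy.name)


policy = ToolPolicy("support-no-outbound",
                    frozenset({"send_email", "execute_sql", "delete_user"}))
results = [check_tool(policy, "support-agent", call)
           for call in ["search_kb", "create_ticket", "send_email"]]
for r in results:
    print(r.tool, r.outcome, r.policy)
# search_kb allowed None
# create_ticket allowed None
# send_email blocked support-no-outbound
```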

3. Cost Limit

Pause agent execution when a run's cost exceeds a threshold.

Use case: A data analysis agent processes large datasets via GPT-4. A run normally costs $0.50, but occasionally the agent enters a loop and a single run costs $500.

Dashboard: Policies → Create → Cost Limit

  • Set max cost: $50.00 per run
  • Action: require_approval
  • Timeout: 600 seconds (10 minutes for human to decide)

What happens (costs shown are per step; after Step 3 the running total is $51.90):

Step 1: Parse dataset      $2.30   ✅
Step 2: Summarize          $18.40  ✅
Step 3: Cross-reference    $31.20  ✅
Step 4: Generate report    ⏳ PAUSED (running total $51.90 exceeds the $50.00 limit)

Dashboard shows approval request:

  • Current cost: $51.90
  • Limit: $50.00
  • [Approve] [Reject] [8:42 remaining]

If approved: Step 4 completes, adding $12.10 for a total of $64.00. The audit trail records "Approved by analyst@company.com".

If rejected: the agent stops and incurs no further cost.

If the timeout expires: the agent stops and the policy's default outcome applies.
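The figures in this walkthrough are consistent with checking the running total before each step: Step 3 is allowed to push the total to $51.90, and the check before Step 4 pauses the run. The sketch below assumes that pre-step check, tracks money in integer cents to avoid floating-point drift, and assumes one approval covers the rest of the run. `run_with_cost_limit` and the `decide` callback (returning "approved", "rejected", or None for a timeout) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def run_with_cost_limit(
    steps: List[Tuple[str, int]],
    limit_cents: int,
    decide: Callable[[str, int], Optional[str]],
) -> Tuple[str, int]:
    """Run (name, cost in cents) steps in order; pause once the total exceeds the limit.

    decide(step_name, total) returns "approved", "rejected", or None (timeout).
    Returns (final status, total cost in cents).
    """
    total = 0
    approved = False  # assumption: one approval covers the rest of the run
    for name, cost in steps:
        # Pre-step check: the previous step may already have pushed the total over.
        if total > limit_cents and not approved:
            decision = decide(name, total)
            if decision != "approved":
                # Rejection and timeout both stop the run before any further cost.
                return ("rejected" if decision == "rejected" else "expired", total)
            approved = True
        total += cost
    return ("completed", total)


steps = [("Parse dataset", 230), ("Summarize", 1840),
         ("Cross-reference", 3120), ("Generate report", 1210)]
paused_at = []


def approve(step: str, total: int) -> str:
    paused_at.append((step, total))  # record where the run paused
    return "approved"


print(run_with_cost_limit(steps, 5000, approve))                  # ('completed', 6400)
print(run_with_cost_limit(steps, 5000, lambda s, t: "rejected"))  # ('rejected', 5190)
print(run_with_cost_limit(steps, 5000, lambda s, t: None))        # ('expired', 5190)
print(paused_at)                                                  # [('Generate report', 5190)]
```

A pre-step check cannot stop a single expensive step from overshooting the limit; it only guarantees that no further step starts without a decision.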

4. Approval Workflows (Human-in-the-Loop)

Any policy with action "require_approval" creates an approval request when it triggers.

Approval lifecycle:

  1. Policy triggers → approval request created
  2. Agent pauses (the SDK polls for a decision)
  3. Human sees the request in Dashboard → Approvals
  4. Human approves or rejects, with an optional reason
  5. Agent receives the decision and either continues or stops
  6. Every step is recorded in the audit trail
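The lifecycle behaves like a small state machine: a request starts pending and moves exactly once to approved, rejected, or expired, appending an audit event on each transition. A minimal sketch; `ApprovalRequest`, `decide`, and `refresh` are hypothetical names, and the event strings reuse the audit entry names listed under Audit Trail for Policies.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApprovalRequest:
    request_id: str
    deadline: float                   # epoch seconds after which the request expires
    status: str = "pending"
    decided_by: Optional[str] = None
    reason: Optional[str] = None
    audit: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.audit.append(("guardrail.approval_requested", self.request_id))

    def refresh(self, now: Optional[float] = None) -> None:
        """Expire the request if the deadline passed without a decision."""
        now = time.time() if now is None else now
        if self.status == "pending" and now >= self.deadline:
            self.status = "expired"
            self.audit.append(("approval.expired", self.request_id))

    def decide(self, approve: bool, user: str, reason: Optional[str] = None,
               now: Optional[float] = None) -> None:
        """Record a human decision; only a pending, unexpired request accepts one."""
        self.refresh(now)
        if self.status != "pending":
            raise ValueError(f"request already {self.status}")
        self.status = "approved" if approve else "rejected"
        self.decided_by, self.reason = user, reason
        self.audit.append((f"approval.{self.status}", self.request_id))


req = ApprovalRequest("apr-1", deadline=100.0)
req.decide(True, "analyst@company.com", "report is needed", now=50.0)
print(req.status, req.decided_by)  # approved analyst@company.com

late = ApprovalRequest("apr-2", deadline=100.0)
late.refresh(now=120.0)            # nobody decided before the deadline
print([event for event, _ in late.audit])
# ['guardrail.approval_requested', 'approval.expired']
```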

Dashboard: Approvals page shows:

  • Pending approvals with countdown timer
  • Approved/rejected history
  • Who decided, when, why

SDK integration:

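# Assumes `client`, `agent`, trace `t`, PendingApproval, and the error classes come from the Aberon SDK
result = agent.check_guardrails(tool_name="send_email", trace_id=t.trace_id)

if result.requires_approval:
    # Wait for a human decision, polling every 3 seconds for up to 120 seconds
    pending = PendingApproval(client._transport, result.requires_approval)
    try:
        decision = pending.wait(timeout=120, poll_interval=3)  # returns once a human approves
    except (ApprovalDeniedError, ApprovalExpiredError) as exc:
        # Rejected or timed out: do not call send_email
        print(f"send_email not approved: {exc}")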
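As a rough illustration of what a call like `pending.wait(timeout=120, poll_interval=3)` does (not Aberon's actual implementation), the sketch below polls a status function until a decision arrives or the timeout passes. `fetch_status` is a hypothetical callable standing in for the SDK's request to the server, and the two exception classes are local stand-ins that reuse the SDK's error names. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


class ApprovalDeniedError(Exception):
    """Local stand-in for the SDK error raised on rejection."""


class ApprovalExpiredError(Exception):
    """Local stand-in for the SDK error raised on timeout."""


def wait_for_decision(fetch_status, timeout=120.0, poll_interval=3.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until "approved", "rejected", "expired", or timeout.

    fetch_status (hypothetical) returns "pending", "approved", "rejected", or "expired".
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "approved":
            return status
        if status == "rejected":
            raise ApprovalDeniedError("approval rejected")
        if status == "expired" or clock() >= deadline:
            raise ApprovalExpiredError("no decision before timeout")
        # Wait one interval, but never sleep past the deadline.
        sleep(min(poll_interval, max(0.0, deadline - clock())))


statuses = iter(["pending", "pending", "approved"])
result = wait_for_decision(lambda: next(statuses), poll_interval=0, sleep=lambda s: None)
print(result)  # approved


class FakeClock:
    """Simulated clock: sleep() advances time instantly."""
    def __init__(self) -> None:
        self.t = 0.0
    def __call__(self) -> float:
        return self.t
    def sleep(self, seconds: float) -> None:
        self.t += seconds


clk = FakeClock()
expired = False
try:
    wait_for_decision(lambda: "pending", timeout=10, poll_interval=3, clock=clk, sleep=clk.sleep)
except ApprovalExpiredError:
    expired = True
print(expired, clk.t)  # True 10.0
```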

Policy Scope

Policies can target:

  • All agents — global policy (target_agent_id = None)
  • Specific agent — only applies to one agent
  • Agent + children — applies to agent and all its sub-agents (apply_to_children = True)
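The three scopes reduce to one predicate. A sketch, assuming the caller knows each agent's chain of parent agents; `Policy`, `applies_to`, and the `ancestors` argument are hypothetical, while `target_agent_id` and `apply_to_children` reuse the field names above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    name: str
    target_agent_id: Optional[str] = None  # None = global policy
    apply_to_children: bool = False


def applies_to(policy: Policy, agent_id: str, ancestors: Sequence[str] = ()) -> bool:
    """ancestors: parent, grandparent, ... of agent_id (assumed known to the caller)."""
    if policy.target_agent_id is None:
        return True  # global policy
    if policy.target_agent_id == agent_id:
        return True  # the targeted agent itself
    # Sub-agents inherit the policy only when apply_to_children is set.
    return policy.apply_to_children and policy.target_agent_id in ancestors


with_children = Policy("no-sql", target_agent_id="orchestrator", apply_to_children=True)
print(applies_to(with_children, "orchestrator"))                  # True
print(applies_to(with_children, "sql-worker", ["orchestrator"]))  # True
print(applies_to(Policy("exact-only", "orchestrator"), "sql-worker", ["orchestrator"]))  # False
print(applies_to(Policy("global"), "any-agent"))                  # True
```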

Policy Priority

When multiple policies apply to the same action, they are evaluated in priority order (lower number = higher priority). The first policy that blocks the action or requires approval wins; if none does, the action proceeds.
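The evaluation rule can be sketched as: sort applicable policies by priority, ask each for a verdict, and return the first verdict that is not a pass. This assumes each policy exposes a numeric `priority` and a `check` function returning "pass", "block", or "require_approval"; `RankedPolicy` and `evaluate` are hypothetical names. Equal priorities keep their input order here only because Python's sort is stable, which is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class RankedPolicy:
    name: str
    priority: int                # lower number = higher priority
    check: Callable[[str], str]  # action -> "pass" | "block" | "require_approval"


def evaluate(policies: Iterable[RankedPolicy], action: str) -> Tuple[str, Optional[str]]:
    """Return (verdict, deciding policy name); ("pass", None) if nothing intervenes."""
    for policy in sorted(policies, key=lambda p: p.priority):
        verdict = policy.check(action)
        if verdict != "pass":
            return verdict, policy.name  # first blocking/approval policy wins
    return "pass", None


policies = [
    RankedPolicy("needs-approval", 20,
                 lambda a: "require_approval" if a == "send_email" else "pass"),
    RankedPolicy("hard-block", 10,
                 lambda a: "block" if a == "send_email" else "pass"),
]
print(evaluate(policies, "send_email"))  # ('block', 'hard-block')
print(evaluate(policies, "search_kb"))   # ('pass', None)
```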

Audit Trail for Policies

Every policy action is recorded:

  • guardrail.passed — check passed, agent proceeds
  • guardrail.blocked — action blocked by policy
  • guardrail.approval_requested — human approval needed
  • approval.approved — human approved with reason
  • approval.rejected — human rejected with reason
  • approval.expired — no decision within timeout

All entries are part of a SHA-256 hash chain, which makes the trail tamper-evident. Learn more about the immutable audit trail.
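A hash chain is tamper-evident because each entry's hash also covers the previous entry's hash: editing any entry changes its hash and breaks every link after it. A minimal sketch using Python's `hashlib`; the record layout (canonical JSON, a `prev_hash` field, a 64-zero genesis value) and the helper names are assumptions for illustration, not Aberon's actual format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed starting value for the first link


def entry_hash(entry: Dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    payload = json.dumps({"prev_hash": prev_hash, "entry": entry}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: List[Dict], entry: Dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"entry": entry, "prev_hash": prev, "hash": entry_hash(entry, prev)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every link; any edited entry or broken link fails verification."""
    prev = GENESIS
    for record in chain:
        if record["prev_hash"] != prev or record["hash"] != entry_hash(record["entry"], prev):
            return False
        prev = record["hash"]
    return True


chain: List[Dict] = []
append(chain, {"event": "guardrail.approval_requested", "tool": "send_email"})
append(chain, {"event": "approval.approved", "by": "analyst@company.com"})
print(verify(chain))  # True

chain[0]["entry"]["tool"] = "search_kb"  # tamper with the first entry
print(verify(chain))  # False
```

A chain on its own does not stop someone who also recomputes every later hash; detecting that requires keeping the latest hash somewhere the writer cannot alter.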