TestSavant SDK — Add Guardrails to Your AI App
Guardrails for modern AI teams

TestSavant SDK

TestSavant SDK gives you policy-as-code guardrails across every model, tool, and route. Stop prompt injection, toxic drift, and policy gaps with scans that run before and after each model call.

Low Latency Policy-as-code Audit-ready Easy integration
policy_guardrails.py
from testsavant.guard import InputGuard, OutputGuard

input_guard = InputGuard(api_key=API_KEY, project_id=PROJECT_ID)
output_guard = OutputGuard(api_key=API_KEY, project_id=PROJECT_ID)

prompt = "Summarize the latest release train for executives."
if input_guard.scan(prompt).is_valid:
    completion = llm.generate(prompt)
    verdict = output_guard.scan(prompt=prompt, output=completion)
    if verdict.is_valid:
        ship(completion)
    else:
        escalate(verdict.results)

Our Scanners

Input scanners

Anonymize

Anonymize(entity_types, tag="base", threshold=0.5, redact=False)

Detects and redacts PII before it hits your model.

BanCode

BanCode(tag="base", threshold=0.5)

Stops code snippets or scripts from being submitted upstream.

BanCompetitors

BanCompetitors(competitors, tag="base", threshold=0.5, redact=False)

Filters prompts that mention disallowed competitor names.

BanTopics

BanTopics(topics, tag="base", threshold=0.5, mode="blacklist")

Blocks any prompt requesting sensitive or restricted themes.

Code

Code(languages, tag="base", threshold=0.5, is_blocked=True)

Detects programming languages you want to quarantine before use.

Gibberish

Gibberish(tag="base", threshold=0.5)

Keeps garbage prompts from wasting cycles and tokens.

Language

Language(valid_languages, tag="base", threshold=0.5)

Ensures incoming prompts match your allowed language list.

NSFW

NSFW(tag="base", threshold=0.5)

Screens explicit or harmful requests before they enter workflows.

PromptInjection

PromptInjection(tag="base", threshold=0.5)

Neutralizes adversarial jailbreak attempts embedded in prompts.

Toxicity

Toxicity(tag="base", threshold=0.5)

Detects hateful, harassing, or unsafe user inputs.

Anonymize

Anonymize(entity_types, tag="base", threshold=0.5, redact=False)

Detects and redacts PII before it hits your model.

BanCode

BanCode(tag="base", threshold=0.5)

Stops code snippets or scripts from being submitted upstream.

BanCompetitors

BanCompetitors(competitors, tag="base", threshold=0.5, redact=False)

Filters prompts that mention disallowed competitor names.

BanTopics

BanTopics(topics, tag="base", threshold=0.5, mode="blacklist")

Blocks any prompt requesting sensitive or restricted themes.

Code

Code(languages, tag="base", threshold=0.5, is_blocked=True)

Detects programming languages you want to quarantine before use.

Gibberish

Gibberish(tag="base", threshold=0.5)

Keeps garbage prompts from wasting cycles and tokens.

Language

Language(valid_languages, tag="base", threshold=0.5)

Ensures incoming prompts match your allowed language list.

NSFW

NSFW(tag="base", threshold=0.5)

Screens explicit or harmful requests before they enter workflows.

PromptInjection

PromptInjection(tag="base", threshold=0.5)

Neutralizes adversarial jailbreak attempts embedded in prompts.

Toxicity

Toxicity(tag="base", threshold=0.5)

Detects hateful, harassing, or unsafe user inputs.

Output scanners

BanCode

BanCode(tag="base", threshold=0.5)

Strips executable code blocks before responses reach users.

BanCompetitors

BanCompetitors(competitors, tag="base", threshold=0.5, redact=False)

Removes mentions of competitor brands from generated output.

BanTopics

BanTopics(topics, tag="base", threshold=0.5, mode="blacklist")

Keeps restricted subject matter out of final responses.

Bias

Bias(tag="base", threshold=0.5)

Identifies biased or unfair statements before delivery.

Code

Code(languages, tag="base", threshold=0.5, is_blocked=True)

Flags unauthorized code languages within model responses.

FactualConsistency

FactualConsistency(tag="base", minimum_score=0.5)

Compares answers against sources to highlight hallucinations.

Gibberish

Gibberish(tag="base", threshold=0.5)

Prevents meaningless responses from reaching customers.

Language

Language(valid_languages, tag="base", threshold=0.5)

Enforces that output languages stay within approved options.

LanguageSame

LanguageSame(tag="base", threshold=0.5)

Verifies the reply matches the customer’s original language.

MaliciousURL

MaliciousURL(tag="base", threshold=0.5)

Scans for phishing or malicious links before they’re sent.

NoRefusal

NoRefusal(tag="base", threshold=0.5)

Catches unnecessary refusals so you can trigger fallbacks.

NSFW

NSFW(tag="base", threshold=0.5)

Blocks explicit or brand-unsafe completion content.

Toxicity

Toxicity(tag="base", threshold=0.5)

Removes hateful or abusive language before it leaves the agent.

BanCode

BanCode(tag="base", threshold=0.5)

Strips executable code blocks before responses reach users.

BanCompetitors

BanCompetitors(competitors, tag="base", threshold=0.5, redact=False)

Removes mentions of competitor brands from generated output.

BanTopics

BanTopics(topics, tag="base", threshold=0.5, mode="blacklist")

Keeps restricted subject matter out of final responses.

Bias

Bias(tag="base", threshold=0.5)

Identifies biased or unfair statements before delivery.

Code

Code(languages, tag="base", threshold=0.5, is_blocked=True)

Flags unauthorized code languages within model responses.

FactualConsistency

FactualConsistency(tag="base", minimum_score=0.5)

Compares answers against sources to highlight hallucinations.

Gibberish

Gibberish(tag="base", threshold=0.5)

Prevents meaningless responses from reaching customers.

Language

Language(valid_languages, tag="base", threshold=0.5)

Enforces that output languages stay within approved options.

LanguageSame

LanguageSame(tag="base", threshold=0.5)

Verifies the reply matches the customer’s original language.

MaliciousURL

MaliciousURL(tag="base", threshold=0.5)

Scans for phishing or malicious links before they’re sent.

NoRefusal

NoRefusal(tag="base", threshold=0.5)

Catches unnecessary refusals so you can trigger fallbacks.

NSFW

NSFW(tag="base", threshold=0.5)

Blocks explicit or brand-unsafe completion content.

Toxicity

Toxicity(tag="base", threshold=0.5)

Removes hateful or abusive language before it leaves the agent.

Why teams choose TestSavant

Guardrails that collaborate with your stack, not against it.

Every SDK surface is designed for fast approvals, low latency, and audit-ready evidence. Deploy guardrails once, enforce across every channel.

Unified Guardrails

Centralize policy rules across chat, agents, retrieval flows, and API tools. No more subtle drift between teams.

  • • Declarative policies with per-route overrides
  • • Works across OpenAI, Anthropic, Azure OpenAI, Vertex
  • • Integrates with eval or red-team pipelines

Defense-in-Depth

Detect prompt injection, refusals, brand safety violations, and more with layered scanners.

  • • Pre and post execution checkpoints
  • • Out-of-box scanners + custom scoring plug-ins
  • • Alerts via Slack, PagerDuty, or webhook

Evidence, Automated

Generate policy logs mapped to SOC2, ISO, and EU AI Act controls—straight from SDK usage.

  • • Structured JSON and CSV exports
  • • Redlines highlighted for compliance review
  • • Link into TestSavant Studio for audits

Enterprise-Ready

Role-based access, SSO, secrets management, and regional data residency baked in.

  • • Fine-grained API keys with rotation hooks
  • • Region pinning (US / EU / APAC)
  • • Single-tenant deployments on request

Observability Included

Stream guardrail outcomes to dashboards and incident tooling with zero extra code.

  • • Connectors for Datadog, Splunk, Snowflake
  • • Real-time dashboards with anomaly alerts
  • • Automatic retention policies

Test-Driven Guardrails

Replay production incidents, run eval suites, and auto-tune thresholds with Studio.

  • • Import red-team transcripts directly
  • • Version guardrails alongside feature flags
  • • Promote changes with confidence scores
Quick Start

Drop-in guardrails in under five minutes

Pick your runtime, add scanners, and start shipping safer prompts immediately.

Python · Input Scanners

from testsavant.guard import InputGuard
from testsavant.guard.input_scanners import PromptInjection, Toxicity

input_guard = InputGuard(api_key="YOUR_API_KEY", project_id="YOUR_PROJECT_ID")
input_guard.add_scanner(PromptInjection(tag="base", threshold=0.45))
input_guard.add_scanner(Toxicity(tag="brand", threshold=0.65))

prompt = "Write a short story about a friendly robot."
result = input_guard.scan(prompt)

if result.is_valid:
    print("Prompt is safe.")
else:
    print("Prompt failed guardrails:", result.results)
Integration walkthrough

Integrate the TestSavant SDK in four steps

Follow this path to wire guardrails into your stack and start enforcing policies the same afternoon.

Install the package

Add testsavant-sdk to your project via pip or npm and pull in the language helpers you need.

Configure your client

Drop in your API key and project ID, choose the policies to enforce, and set any environment-specific overrides.

Wrap your model calls

Use InputGuard and OutputGuard around prompts, completions, and tool responses to score every interaction.

Ship and monitor

Deploy with confidence, stream incidents to the console, and iterate on thresholds as your traffic patterns evolve.

Built for hybrid AI stacks

Route-aware guardrails without the glue code

Swap models, call tools, or fall back to retrieval without losing coverage. The SDK tracks risk posture end-to-end.

Input scanners

Shape inbound prompts before they reach your model, tool, or retrieval layer.

  • • Prompt injection & jailbreak blocks
  • • Personally identifiable information detection
  • • Domain restrictions & session fingerprints

Output scanners

Catch toxic, non-compliant, or low-quality completions before they reach an end user.

  • • Toxicity, hate, self-harm policy coverage
  • • Hallucination & factual consistency checks
  • • Sensitive code or credential leakage

Tooling & agents

Protect function calling, RAG, and multi-step agents with policy-aware guardrails.

  • • Tool permission filters
  • • Retrieval checks on embedded documents
  • • Session memory scrubbing & retention control

Fewer Incidents

Block prompt injection, policy evasions, and unsafe tool calls before they reach downstream systems.

  • • Catch risks before or after the model invocation
  • • Auto escalate to security or trust teams
  • • Reduce manual reviews by up to 60%

Cleaner Audits

Every decision is logged with the exact policy, signal, and remediation trail auditors expect.

  • • SOC2, ISO, HIPAA, EU AI Act friendly exports
  • • Evidence packs generated automatically
  • • Replay incidents alongside risk scores

Faster Delivery

Launch new prompts, tools, and agents with guardrails already validated through automated tests.

  • • Policy diffing and preview environments
  • • Git + CI integrations to block regressions
  • • Safe rollout toggles with instant rollback

FAQs

What is TestSavant SDK?

It’s a multi-language SDK that enforces guardrails around prompts, completions, function calls, and RAG pipelines. Policies are versioned with your codebase and approved once.

How do I install the SDK?

Install with pip install testsavant-sdk or npm install testsavant-sdk. Both ship with rich typings.

What risks can I scan for?

Prompt injection, toxicity, self-harm, PII, gibberish, hallucinations, policy breaches, credential leakage, source code exfiltration, and more. Add your own detectors via custom scanners.

Do I need an API key?

Yes. Create a project inside the TestSavant console to get API keys. Rotate them with the CLI, Terraform provider, or secret managers like Vault.

Will this handle production scale?

Absolutely. Teams rely on the SDK in high-volume support bots, search experiences, and agentic workflows with latency budgets under 15ms.

Put unified guardrails in front of every agent today

Safer inputs. Safer outputs. Consistent policy. Clear proof for every review.