BanCode
BanCode(tag="base", threshold=0.5)
Strips executable code blocks before responses reach users.
BanCompetitors
BanCompetitors(competitors, tag="base", threshold=0.5, redact=False)
Removes mentions of competitor brands from generated output.
BanTopics
BanTopics(topics, tag="base", threshold=0.5, mode="blacklist")
Keeps restricted subject matter out of final responses.
Bias
Bias(tag="base", threshold=0.5)
Identifies biased or unfair statements before delivery.
Code
Code(languages, tag="base", threshold=0.5, is_blocked=True)
Flags code written in disallowed programming languages within model responses.
FactualConsistency
FactualConsistency(tag="base", minimum_score=0.5)
Compares answers against sources to highlight hallucinations.
Gibberish
Gibberish(tag="base", threshold=0.5)
Prevents meaningless responses from reaching customers.
Language
Language(valid_languages, tag="base", threshold=0.5)
Ensures that responses are written only in approved languages.
LanguageSame
LanguageSame(tag="base", threshold=0.5)
Verifies the reply matches the customer’s original language.
MaliciousURL
MaliciousURL(tag="base", threshold=0.5)
Scans for phishing or malicious links before they’re sent.
NoRefusal
NoRefusal(tag="base", threshold=0.5)
Catches unnecessary refusals so you can trigger fallbacks.
NSFW
NSFW(tag="base", threshold=0.5)
Blocks explicit or brand-unsafe completion content.
Toxicity
Toxicity(tag="base", threshold=0.5)
Removes hateful or abusive language before it leaves the agent.
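Most of the signatures above share a threshold parameter that defaults to 0.5. The listing does not say how that value is applied, so the following is only a minimal sketch of one plausible reading: each scanner produces a risk score between 0 and 1, and a response fails when the score reaches the threshold. SketchScanner, keyword_score, and run_scanners are hypothetical stand-ins written for this illustration. They are not the library's classes, and the keyword scorer is a toy replacement for whatever model the real scanners use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SketchScanner:
    """Hypothetical stand-in for a scanner such as Toxicity or BanCompetitors."""
    name: str
    score_fn: Callable[[str], float]  # returns a risk score in [0.0, 1.0]
    threshold: float = 0.5            # mirrors the default shown in the signatures

    def scan(self, output: str) -> Tuple[bool, float]:
        score = self.score_fn(output)
        # Assumption: a score at or above the threshold fails the check.
        return score < self.threshold, score


def keyword_score(keywords: List[str]) -> Callable[[str], float]:
    """Toy scorer: fraction of the given keywords that appear in the text."""
    def score(text: str) -> float:
        lowered = text.lower()
        hits = sum(1 for k in keywords if k.lower() in lowered)
        return hits / len(keywords) if keywords else 0.0
    return score


def run_scanners(
    scanners: List[SketchScanner], output: str
) -> Tuple[bool, Dict[str, Tuple[bool, float]]]:
    """Run every scanner and report overall validity plus per-scanner results."""
    results = {s.name: s.scan(output) for s in scanners}
    return all(ok for ok, _ in results.values()), results


scanners = [
    SketchScanner("BanCompetitors", keyword_score(["acme", "globex"])),
    SketchScanner("Toxicity", keyword_score(["idiot", "stupid"])),
]

ok, details = run_scanners(scanners, "Our plan costs less than Acme's.")
print(ok, details)
```

Under this reading, lowering a scanner's threshold makes it stricter and raising it makes it more permissive. Whether the real scanners compare with "at or above" or "strictly above" is not stated in the listing, so treat the comparison here as an assumption.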