Content Intelligence Platform
A multi-tenant SaaS that monitors real-time industry news, detects high-impact events, and generates SEO-optimised content before competitors react. Built on Kubernetes with Claude AI at its core.
Why this platform exists
Most businesses know they should publish about industry news. They don't — because the workflow is broken:
News breaks at 9am
A reporter reads about it. Maybe.
Writer assigned at 11am
They research, draft, edit. Takes 3–4 hours.
Approval chain — next day
Manager reviews. Revisions. Legal check. More revisions.
Published 48+ hours later
Google has already indexed 50 competitor articles. The SEO window is closed.
The SEO advantage belongs to whoever publishes first with quality content. Every hour of delay is lost organic ranking opportunity. This platform compresses the window from 48+ hours to under 4 hours.
System Architecture
Four microservices communicate via Redis Streams — a durable, ordered message queue. No service calls another directly over HTTP. This means if one crashes, the others continue and no data is lost.
ingestion-worker
- Polls RSS feeds every 2 minutes
- Runs Gate 1 (zero-cost rules)
- Deduplicates by URL hash
- Quarantines broken sources
- Publishes to news.filtered
intelligence-worker
- Consumes news.filtered
- Runs Gates 2, 3, 4
- Competitor gap analysis
- Evidence gathering (Serper)
- Publishes to content.drafts
approval-service
- Web dashboard for clients
- Email digest with approve/reject links
- Admin panel for operators
- JWT auth (cookie + Bearer)
- Publishes to content.approved
publisher-worker
- Consumes content.approved
- WordPress REST API (Markdown→HTML)
- Dev.to API (native Markdown)
- Retry with exponential backoff
- Records to publications table
Redis Stream topology
Services never call each other's HTTP endpoints. Everything goes through Redis Streams. If intelligence-worker crashes, ingestion-worker keeps writing to the stream. When intelligence-worker restarts, it resumes exactly where it left off — no messages lost.
Why the 4-Gate Funnel Exists
Claude Sonnet costs ~$0.015 per content generation call. Without filtering, running every ingested article through Sonnet would cost ~$450/client/month. With the funnel: $2.55/client/month.
| Gate | Method | Volume | Cost/call | Monthly |
|---|---|---|---|---|
| Gate 1 — Rules | Pure Python | 30,000 articles | $0.00 | $0.00 |
| Gate 2 — Haiku | Claude Haiku (batched) | 3,000 articles | $0.0001 | $0.30 |
| Gate 3 — Signal | Postgres + pytrends | 600 articles | $0.00 | $0.00 |
| Gate 4 — Sonnet | Claude Sonnet | 150 articles | $0.015 | $2.25 |
| Total | | | | ~$2.55 |
Each gate must be cheaper than the next. You never spend $0.015 on something that a $0.00 rule would have caught. The gates are deliberately ordered from cheapest to most expensive.
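The totals in the table can be reproduced with a few lines of arithmetic. This is only a sketch: the volumes and per-call costs are copied from the table, and the function names are illustrative rather than part of the platform.

```python
# Gate volumes and per-call costs as listed in the table above.
GATES = [
    ("Gate 1 - Rules", 30_000, 0.0),
    ("Gate 2 - Haiku", 3_000, 0.0001),
    ("Gate 3 - Signal", 600, 0.0),
    ("Gate 4 - Sonnet", 150, 0.015),
]


def monthly_cost(gates):
    """Sum volume x cost-per-call over all gates."""
    return sum(volume * cost for _, volume, cost in gates)


def unfiltered_cost(gates):
    """Cost if every ingested article went straight to the last gate."""
    ingested = gates[0][1]
    last_gate_cost = gates[-1][2]
    return ingested * last_gate_cost


funnel = monthly_cost(GATES)
baseline = unfiltered_cost(GATES)
print(f"funnel: ${funnel:.2f}/month, unfiltered: ${baseline:.2f}/month")
```

Almost all of the remaining spend sits in Gate 4, which is why the three gates before it exist.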
Gate 1 — Rules Engine
File: services/ingestion-worker/stages/gate1_rules.py
Zero-cost filter. Runs entirely in Python — no database queries, no API calls, no network. Eliminates ~90% of articles before any AI is touched. Six rules run in order; the first failure short-circuits (fast reject).
1. Minimum content length: Drop articles with fewer than 50 chars of title+summary. These are usually empty RSS entries, tracking pixels, or fetcher errors with no useful content.
2. Source trust: Each source has a trust_score (0.0–1.0) set by the admin. Sources below 0.4 are dropped. This prevents spam aggregators from polluting the pipeline.
3. Recency: Drop articles older than 48 hours (configurable). Old news generates low-value content and hurts SEO freshness signals. The check compares published_at against a UTC cutoff.
4. Hard exclusions: If the article matches any of the client's excluded topics, drop it, even if a keyword also matches. Example: a mortgage broker has "security" as a keyword but "gaming" excluded. "Gaming security" gets dropped.
5. Urgency override: Breaking news bypasses the keyword check entirely. If the title or summary contains "breaking", "emergency", etc., it passes Gate 1 regardless of keyword match. Real-time events shouldn't wait for keyword list tuning.
6. Keyword match: The article must contain at least one of the client's configured keywords (case-insensitive substring search). It is the last and most expensive check, reached only if all previous rules passed.
The actual code
from datetime import datetime, timedelta, timezone

class Gate1Rules:
    def __init__(self, min_content_length=50, source_trust_min=0.4,
                 max_age_hours=48, urgency_keywords=None):
        # Store the thresholds that check() reads on every call.
        self.min_content_length = min_content_length
        self.source_trust_min = source_trust_min
        self.max_age_hours = max_age_hours
        # Lowercase once here so check() does not repeat the work per article.
        # The set also removes duplicate keywords.
        self.urgency_keywords = set(k.lower() for k in (urgency_keywords or []))

    def check(self, item, keywords, excluded):
        text = f"{item.get('title', '')} {item.get('summary', '')}".strip()
        # Rule 1: minimum content length
        if len(text) < self.min_content_length:
            return False, "too_short"
        # Rule 2: source trust score, fail-open (default 1.0 if missing)
        if item.get("trust_score", 1.0) < self.source_trust_min:
            return False, "low_trust_source"
        # Rule 3: recency check
        pub = item.get("published_at")
        if pub:
            if pub.tzinfo is None:  # naive datetime: assume UTC
                pub = pub.replace(tzinfo=timezone.utc)
            cutoff = datetime.now(timezone.utc) - timedelta(hours=self.max_age_hours)
            if pub < cutoff:
                return False, "stale"
        text_lower = text.lower()  # compute once, use below
        # Rule 4: hard exclusions (checked BEFORE keyword match)
        for exc in (excluded or []):
            if exc.lower() in text_lower:
                return False, f"excluded:{exc}"
        # Rule 5: urgency override, breaking news bypasses keyword check
        if self.urgency_keywords and any(kw in text_lower for kw in self.urgency_keywords):
            return True, "urgency_override"
        # Rule 6: keyword match, the core relevance gate
        if not any(kw.lower() in text_lower for kw in (keywords or [])):
            return False, "no_keyword_match"
        return True, "passed"
Length and trust checks are placed first because they are the cheapest: neither scans the article text for substrings. Exclusions run before keywords so that a forbidden topic can't slip through on a keyword match. The urgency override is placed after exclusions, so even breaking news gets dropped if it matches an exclusion.
Gate 2 — Haiku Relevance Scoring
File: services/intelligence-worker/stages/gate2_relevance.py
Uses Claude Haiku (the cheapest Anthropic model) to score article relevance on a 0–100 scale. Three cost optimisations make this viable at scale: batching, prompt caching, and minimal input.
Three cost optimisations
- Batching: 8 articles per API call. Instead of 100 calls for 100 articles, you make 13 calls (12 full batches plus one batch of 4). The model scores every article in a batch in one response using index-based JSON.
- Prompt caching: The system prompt (client profile) is identical across all batches in a run. Anthropic caches it after the first call. ~80% cost saving on subsequent calls.
- Minimal input: Only the title and the first 200 chars of the summary are sent to the model. Not the full article, just enough context for relevance scoring.
Each client can use a different Gate 2 model (GPT-4o-mini, Gemini Flash, DeepSeek). The provider abstraction makes this transparent — same interface regardless of provider.
How batching works
import json

BATCH_SIZE = 8   # 8 articles per API call (sweet spot: larger batches confuse indexing)
MIN_SCORE = 60   # articles below this score are dropped

def _score_batch(provider, articles, client_profile, prompt_template):
    # Build the batch text: each article gets an index number.
    # Haiku uses these indexes in its JSON response: {"scores": [{"index": 0, "score": 78, ...}]}
    articles_text = "\n---\n".join([
        f"[{i}] TITLE: {a['title']}\nSUMMARY: {(a.get('summary') or '')[:200]}"
        for i, a in enumerate(articles)
    ])
    # The system prompt contains the CLIENT'S PROFILE, which is the same for all batches in a run.
    # Caching this is the key cost saving.
    system_prompt = prompt_template.format(
        industry_type=client_profile.get("industry_type", "general"),
        target_geo=", ".join(client_profile.get("target_geo") or ["global"]),
        keywords=", ".join(client_profile.get("keywords") or []),
        excluded_topics=", ".join(client_profile.get("excluded_topics") or []),
    )
    # cache_system=True adds cache_control to the system prompt.
    # After the first API call, Anthropic serves the system prompt from cache.
    response = provider.complete(
        system=system_prompt,
        user=f'Score each article 0-100. Return JSON: {{"scores": [{{"index": 0, "score": N, "matched_keywords": []}}]}}\n\n{articles_text}',
        max_tokens=256,
        cache_system=True,  # ← the magic flag
    )
    result = json.loads(response.text)
    scores = {s["index"]: s for s in result.get("scores", [])}
    # Filter: keep only articles at or above the MIN_SCORE threshold
    passed = []
    for i, article in enumerate(articles):
        score = scores.get(i, {}).get("score", 0)
        if score >= MIN_SCORE:
            article["relevance_score"] = score
            article["matched_keywords"] = scores[i].get("matched_keywords", [])
            passed.append(article)
    return passed

def run_gate2(provider, articles, client_profile, prompt_template):
    passed = []
    # Iterate in steps of BATCH_SIZE: 0, 8, 16, 24, ...
    for i in range(0, len(articles), BATCH_SIZE):
        batch = articles[i:i + BATCH_SIZE]
        passed.extend(_score_batch(provider, batch, client_profile, prompt_template))
    return passed
Without batching: 100 calls × full prompt = high cost.
With batching: 13 calls × (system prompt cached after call 1) = ~80% saving on 12 of those 13 calls.
Result: Gate 2 costs ~$0.30/month per client — pennies.
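Because Gate 2 relies on the model returning index-based JSON, it may be worth parsing that response defensively. The sketch below is one possible approach, not the platform's code: parse_scores is a hypothetical helper that drops malformed entries and out-of-range indexes instead of raising.

```python
import json


def parse_scores(raw_text: str, batch_len: int) -> dict[int, int]:
    """Map article index -> score, skipping anything malformed."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    entries = data.get("scores")
    if not isinstance(entries, list):
        return {}

    scores: dict[int, int] = {}
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        index, score = entry.get("index"), entry.get("score")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(index, bool) or not isinstance(index, int):
            continue
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            continue
        # Keep only indexes that exist in this batch and scores on the 0-100 scale.
        if 0 <= index < batch_len and 0 <= score <= 100:
            scores[index] = int(score)
    return scores
```

An article whose index is missing from the result would then be treated as unscored, which matches the default score of 0 used in _score_batch.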
Gate 3 — Signal Detection
File: services/intelligence-worker/stages/signal_detection.py
Zero-cost filter that combines two independent signals. Both are free to compute. An article needs a high combined score before we spend $0.015 on Sonnet generation.
Signal 1: Source Spread (our own data)
Counts how many distinct sources covered the same topic in the last 2 hours. Uses PostgreSQL full-text search on our own news_items table — zero external calls.
import re

def _source_spread(conn, title: str, hours: int = 2) -> int:
    # Take up to 5 alphanumeric words longer than 3 characters from the title.
    # The length filter drops short stop-words ("the", "and", "for") that match everything.
    # Restricting to word characters keeps tsquery operators (&, |, :, !) out of the query.
    words = [w for w in re.findall(r"\w+", title) if len(w) > 3][:5]
    if not words:
        return 1  # nothing to search on: count only the article's own source
    # OR query: article matches if ANY keyword appears (broad catch)
    tsquery = " | ".join(words)
    with conn.cursor() as cur:
        # make_interval takes the hour count as a real parameter instead of
        # substituting it inside a quoted interval literal.
        cur.execute("""
            SELECT COUNT(DISTINCT source_id)
            FROM news_items
            WHERE to_tsvector('english', title) @@ to_tsquery('english', %s)
              AND published_at > NOW() - make_interval(hours => %s)
        """, (tsquery, hours))
        return cur.fetchone()[0] or 1

# Map raw count to a 0–100 score
def _spread_score(count: int) -> int:
    if count >= 5: return 100  # 5+ sources = definitely breaking
    if count >= 3: return 60   # trending across outlets
    if count >= 2: return 40   # gaining traction
    return 20                  # isolated report
Signal 2: Google Trends SEO Opportunity
Queries Google Trends for search interest on the client's matched keywords, geo-filtered to their location. Results are cached in Redis for 24 hours, so the same keyword and geo pair costs one API call per day.
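The 24-hour cache can be pictured as a simple get-or-fetch wrapper. The sketch below uses an in-memory dict so it runs on its own; the platform stores these entries in Redis. The TrendsCache class, the key format, and the fetch callable (a stand-in for the Google Trends lookup) are all assumptions made for illustration.

```python
import time
from typing import Callable


class TrendsCache:
    """Get-or-fetch cache keyed by (keyword, geo) with a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str, str], int],
                 ttl_seconds: int = 24 * 3600,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch          # stand-in for the real Trends lookup
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store: dict[str, tuple[float, int]] = {}

    def get(self, keyword: str, geo: str) -> int:
        # Normalise case so "AU"/"au" and "Fixed Rate"/"fixed rate" share an entry.
        key = f"trends:{geo.upper()}:{keyword.lower()}"
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh: no external call
        value = self._fetch(keyword, geo)
        self._store[key] = (now + self._ttl, value)
        return value
```

With Redis, the same effect would usually come from writing the value with an expiry of 24 hours and treating a missing key as a cache miss.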
Combined Score Formula
trend_score = (spread_score * 0.6) + (seo_opportunity * 0.4)
# 60/40 weighting: spread is more reliable (our own data)
# Trends complements with demand-side intent but can be rate-limited
# Urgency detection overrides the score threshold entirely:
# - "breaking" / "emergency" in title → urgency = "breaking" → Gate 3 bypassed
# - 3+ sources covering topic → urgency = "high" → Gate 3 bypassed
If Google Trends is unavailable, seo_opportunity defaults to 50 (neutral). A Trends outage never blocks content generation. This is called "fail-open" design — the default is to continue, not to stop.
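Put together, the scoring rules above might look like the following sketch. The 60/40 weights, the neutral default of 50, and the two urgency conditions come from the text; the function names are illustrative, and min_score is left as a parameter because the source does not state the threshold value.

```python
from typing import Optional

NEUTRAL_SEO = 50                            # fail-open default when Trends is unavailable
URGENCY_WORDS = ("breaking", "emergency")   # the two examples named in the text


def combined_score(spread_score: float, seo_opportunity: Optional[float] = None) -> float:
    """60/40 weighted score; a missing Trends value falls back to neutral."""
    seo = NEUTRAL_SEO if seo_opportunity is None else seo_opportunity
    return spread_score * 0.6 + seo * 0.4


def urgency(title: str, source_count: int) -> str:
    """Classify urgency from the title wording and the number of sources."""
    lowered = title.lower()
    if any(word in lowered for word in URGENCY_WORDS):
        return "breaking"
    if source_count >= 3:
        return "high"
    return "normal"


def passes_gate3(title: str, source_count: int, spread_score: float,
                 seo_opportunity: Optional[float], min_score: float) -> bool:
    """Urgent items bypass the threshold; everything else must reach min_score."""
    if urgency(title, source_count) != "normal":
        return True
    return combined_score(spread_score, seo_opportunity) >= min_score
```

Passing None for seo_opportunity models a Trends outage: the article is still scored, just with a neutral demand signal.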
Gate 4 — Sonnet Content Generation
File: services/intelligence-worker/stages/gate4_generation.py
The most expensive step and the core product. Claude Sonnet generates 6 content formats in one API call, selecting the best angle from 8 options based on the news type, competitor gaps, and client voice.
What one Gate 4 call produces
| Output | Description |
|---|---|
| blog.title | SEO H1: question format, keyword in first 8 words, year suffix |
| blog.slug | URL slug derived from title |
| blog.meta_description | 150–160 chars for Google SERPs |
| blog.body_markdown | 1,200–3,500 word article with mandatory section structure |
| blog.faq_schema | Exactly 5 Q&A pairs for Google FAQ rich results |
| linkedin_post | Platform-optimised, shorter format |
| twitter_thread | Array of tweets (ready for Twitter/X API) |
| newsletter_snippet | 2–3 sentences for Facebook / email |
| selected_angle | Which of the 8 angles Claude chose (stored for analytics) |
Geo-skip logic
Before generating, the model checks if the news is actually relevant to the client's geography. If not, it outputs a skip signal instead of a draft — saving $0.015 and preventing irrelevant content.
# If Claude decides the news is irrelevant to the client's geo:
{"selected_angle": "skip", "reason": "geo_not_impacted"}
# If it decides to generate:
{
"selected_angle": "local_impact",
"blog": {
"title": "Why should Australian mortgage brokers reconsider fixed rates in 2026?",
"slug": "australian-mortgage-brokers-fixed-rates-2026",
"meta_description": "The RBA's latest decision changes the fixed vs variable ...",
"body_markdown": "...(full article 1200-3500 words)...",
"keywords": ["mortgage broker", "fixed rate", "RBA 2026"],
"faq_schema": [{"question": "...", "answer": "..."}, ...]
},
"linkedin_post": "...",
"twitter_thread": ["tweet 1", "tweet 2", ...],
"newsletter_snippet": "..."
}
Mandatory article structure
Every generated article must contain these sections in order. Claude is explicitly instructed to follow this structure — it's part of the Gate 4 system prompt stored in values.yaml.
sections = [
"Quick Answer (40–60 words, no heading, standalone prose)",
"What You Will Learn (4–6 bullets)",
"What Is [Topic]? (80–120 words)",
"Why Does [Problem] Happen? (100–150 words, 4–6 bullets)",
"At-a-Glance Summary (Markdown table, 5–8 rows)",
"How to [Solve It] (200–300 words, numbered H3 steps)",
"What Happens If You Ignore This? (80–120 words, 3–5 bullets)",
"", # Pexels image placeholder
"Common Mistakes to Avoid (table: Mistake | Why | What to Do Instead)",
"Expert Tips (100–150 words, ≥2 tips with measurable checks)",
"", # second image
"Frequently Asked Questions (exactly 5 FAQs)",
"Key Takeaways (60–80 words, 4–5 bullets)",
"References (3–5 entries as [Title](URL))",
]
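Since the structure is enforced only by the prompt, a post-generation check can catch drafts that drift from it. The sketch below is an assumption-heavy illustration: it presumes each section appears as a Markdown heading with exactly the fixed titles listed above, and it skips the sections whose titles contain placeholders such as [Topic].

```python
# Section titles from the list above that do not contain placeholders.
FIXED_HEADINGS = [
    "What You Will Learn",
    "At-a-Glance Summary",
    "Common Mistakes to Avoid",
    "Expert Tips",
    "Frequently Asked Questions",
    "Key Takeaways",
    "References",
]


def structure_problems(body_markdown: str) -> list[str]:
    """Return a list of missing or out-of-order fixed sections (empty if fine)."""
    headings = [
        line.lstrip("#").strip()
        for line in body_markdown.splitlines()
        if line.startswith("#")
    ]
    problems = []
    position = -1  # index of the furthest fixed heading seen so far
    for expected in FIXED_HEADINGS:
        try:
            found = headings.index(expected)
        except ValueError:
            problems.append(f"missing: {expected}")
            continue
        if found < position:
            problems.append(f"out of order: {expected}")
        position = max(position, found)
    return problems
```

A draft with a non-empty problem list could be regenerated or flagged for the reviewer before it reaches the approval queue.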
Ingestion Worker
File: services/ingestion-worker/main.py
Runs continuously as a Kubernetes Deployment (not a CronJob — it needs sub-minute responsiveness). Every 2 minutes it polls all active RSS sources for all clients and runs Gate 1.
The poll loop
import asyncio
import hashlib

async def poll_loop():
    # In-memory set used here for brevity; the hash is also stored in news_items.
    seen_hashes = set()
    while True:
        # Fetch all active, non-quarantined sources from Postgres
        sources = get_active_sources(conn)
        for source in sources:
            articles = fetch_rss(source["feed_url"])  # parse RSS/Atom feed
            for article in articles:
                # sha256 needs bytes, so encode the normalized URL first
                url_hash = hashlib.sha256(normalize_url(article["url"]).encode()).hexdigest()
                # Deduplication: skip if we've already processed this URL
                if url_hash in seen_hashes:
                    continue
                seen_hashes.add(url_hash)
                # Gate 1: zero-cost rules filter (per-client)
                for client in source["clients"]:
                    passes, reason = gate1.check(article, client["keywords"], client["excluded"])
                    if passes:
                        # Publish to Redis Stream for intelligence-worker to consume.
                        # article["id"] is assumed to be assigned when the item is stored.
                        redis.xadd("news.filtered", {
                            "article_id": article["id"],
                            "client_id": client["id"],
                            "reason": reason,
                        })
        await asyncio.sleep(POLL_INTERVAL_SECONDS)  # default: 120s
Source quarantine system
Every RSS source is tracked for consecutive failures. After 3 failures, it's quarantined with exponential backoff. The system automatically tries to find a replacement feed.
| Quarantine # | Duration | Recovery |
|---|---|---|
| 1st time | 6 hours | Auto-retry after expiry |
| 2nd time | 12 hours | Auto-retry after expiry |
| 3rd time | 24 hours | Auto-retry after expiry |
| 4th time | 48 hours | Auto-retry after expiry |
| 5th time | 96 hours | Auto-retry after expiry |
| 6th+ | 168 hours (7 days) | Manual restore from admin |
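The durations in the table double from a 6-hour base and are capped at 7 days. A minimal sketch of that schedule follows; the constant and function names are illustrative, not taken from the codebase.

```python
BASE_HOURS = 6             # first quarantine
MAX_HOURS = 168            # 7-day cap
MANUAL_RESTORE_FROM = 6    # from the 6th quarantine on, an admin must restore


def quarantine_hours(quarantine_count: int) -> int:
    """Quarantine duration for the Nth quarantine of a source (1-based)."""
    if quarantine_count < 1:
        raise ValueError("quarantine_count starts at 1")
    if quarantine_count >= MANUAL_RESTORE_FROM:
        return MAX_HOURS
    return BASE_HOURS * 2 ** (quarantine_count - 1)


def needs_manual_restore(quarantine_count: int) -> bool:
    """True once automatic retries stop and an admin has to step in."""
    return quarantine_count >= MANUAL_RESTORE_FROM
```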
3-tier replacement feed discovery
When a source is quarantined, the system immediately searches for a replacement — no admin intervention required.
1. Scrapes the dead source's homepage for <link rel="alternate"> RSS tags. Also probes common paths: /feed, /rss, /rss.xml, /atom.xml.
2. Searches news.google.com/rss/search?q={source_name}+{industry}. Extracts publisher domains from results, probes top 8 for native RSS feeds. No API key needed.
3. Falls back to the default_sources DB table: curated industry sources not already assigned to this client. Always available, always a working feed.
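The first discovery step can be sketched with the standard library alone. This example only builds the list of candidate feed URLs from a homepage's HTML; fetching and validating each candidate is left out. The function and class names are hypothetical, and filtering on the RSS/Atom MIME types is an assumption about how the alternate links are recognised.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
COMMON_PATHS = ["/feed", "/rss", "/rss.xml", "/atom.xml"]  # paths named in the text


class _FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that declare a feed."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        feed_type = (attr.get("type") or "").lower()
        if "alternate" in rel and feed_type in FEED_TYPES and attr.get("href"):
            self.hrefs.append(attr["href"])


def candidate_feed_urls(homepage_url: str, html: str) -> list[str]:
    """Declared feeds first, then the common fallback paths, without duplicates."""
    parser = _FeedLinkParser()
    parser.feed(html)
    declared = [urljoin(homepage_url, href) for href in parser.hrefs]
    probes = [urljoin(homepage_url, path) for path in COMMON_PATHS]
    ordered: list[str] = []
    for url in declared + probes:
        if url not in ordered:
            ordered.append(url)
    return ordered
```

Declared feeds are tried before guessed paths because a publisher's own link tag is the stronger signal.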
URL deduplication
import hashlib
from urllib.parse import parse_qsl, urlencode, urlparse

def normalize_url(url: str) -> str:
    """Strip UTM/tracking params so the same article isn't processed twice
    if it appears with different tracking params in different RSS feeds."""
    parsed = urlparse(url)
    # Keep only non-tracking query params (strip utm_*, fbclid, etc.).
    # Note: these are prefix matches, so "ref" also strips params such as "referrer".
    clean_params = [(k, v) for k, v in parse_qsl(parsed.query)
                    if not k.startswith(("utm_", "fbclid", "gclid", "ref"))]
    return parsed._replace(query=urlencode(clean_params)).geturl()

# URL hash is stored in news_items table: SHA256 of the normalized URL
url_hash = hashlib.sha256(normalize_url(article["url"]).encode()).hexdigest()
Intelligence Worker
File: services/intelligence-worker/main.py
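The brain of the platform. Consumes the news.filtered Redis Stream and orchestrates the full Gates 2–4 pipeline for each article. Uses Redis consumer groups so that each message is delivered to only one worker in the group and no message is lost if a worker crashes and restarts mid-batch. Delivery is at-least-once: a message that was delivered but never acknowledged can be processed again after a restart, so processing should be safe to repeat.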
Pipeline orchestration
async def process_message(msg, client_id, article):
    client = get_client_profile(client_id)
    # 1. Gate 2: Haiku relevance scoring (batched, prompt-cached)
    relevant = run_gate2(provider, [article], client, PROMPT_RELEVANCE)
    if not relevant:
        ack(msg)
        return
    # 2. Gate 3: Signal detection (spread + Google Trends)
    signal = detect_signal(conn, article["title"], client["target_geo"])
    if signal.trend_score < GATE3_MIN_TREND_SCORE and signal.urgency == "normal":
        ack(msg)
        return
    # 3. Competitor analysis: what angles have competitors taken?
    comp = analyze_competitors(conn, article, client)
    # comp.avoid_angles = ["local_impact", "action_list"]
    # comp.trend_score_boost = 15 (first-mover bonus)
    # 4. Evidence pipeline
    enrichment = quick_enrich(article, client["keywords"])       # Tier 1: Serper
    evidence = gather_evidence(article, enrichment, llm_haiku)   # Tier 2: deep pack
    # 5. Topic clustering: find related published articles for internal links
    cluster = get_cluster_links(conn, article, client_id)
    # 6. Gate 4: Sonnet generation
    draft = run_gate4(
        provider=llm_sonnet,
        article=article,
        client=client,
        comp_analysis=comp,
        evidence_pack=evidence,
        cluster_links=cluster,
    )
    if draft.get("selected_angle") == "skip":
        ack(msg)
        return  # geo_not_impacted: skip silently
    # 7. Save to Postgres, publish to content.drafts stream
    save_draft(conn, draft, client_id)
    redis.xadd("content.drafts", {"draft_id": draft["id"], "client_id": client_id})
    ack(msg)  # ← critical: only ack AFTER successful save
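With consumer groups, Redis tracks which messages have been delivered but not yet acknowledged (ACK'd). If the worker crashes between processing and ACK'ing, the message stays pending in Redis and can be read again after the worker restarts, either by re-reading the consumer's own pending entries or by claiming them from another consumer. No message is permanently lost, which makes the pipeline crash-safe.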
Draft limits per plan
Before Gate 4, the worker checks the client's daily and weekly draft limits (from PLAN_LIMITS_JSON in the ConfigMap). This prevents the pipeline from generating more content than the client can review.
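A limit check of this kind might look like the sketch below. The source does not show the shape of PLAN_LIMITS_JSON, so the plan names, the "daily"/"weekly" keys, and the numbers are all hypothetical, as is the choice to refuse generation for an unknown plan.

```python
import json

# Hypothetical example value; the real ConfigMap content is not shown in the source.
PLAN_LIMITS_JSON = '{"starter": {"daily": 2, "weekly": 10}, "pro": {"daily": 5, "weekly": 25}}'


def can_generate(plan: str, drafts_today: int, drafts_this_week: int,
                 limits_json: str = PLAN_LIMITS_JSON) -> bool:
    """True if one more draft fits within both the daily and weekly limits."""
    limits = json.loads(limits_json).get(plan)
    if limits is None:
        # Assumption: an unknown plan gets no drafts rather than unlimited drafts.
        return False
    return drafts_today < limits["daily"] and drafts_this_week < limits["weekly"]
```

Running this check before Gate 4 means a client at their limit costs nothing in Sonnet calls.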
Approval Service
File: services/approval-service/main.py
FastAPI web application that serves the client dashboard, admin panel, and email approval workflow. Clients never see raw AI output — everything goes through human approval first.
The approval workflow
1. Daily digest email: A CronJob triggers /send-digest each morning. For each client with pending drafts, an email is sent with approve/reject/edit links for each draft.
2. HMAC-signed links (no login required): Each approve/reject/edit link contains an HMAC-SHA256 token. Clients can approve content from their email inbox without logging in. Links expire after 7 days.
3. Optional editing: The edit link opens a tabbed editor: blog post (with character counters), LinkedIn post, Twitter thread. Clients can tweak the AI output before publishing.
4. Publishes to Redis Stream: On approval, the service writes to content.approved. Publisher-worker picks this up and distributes to WordPress, Dev.to, etc.
HMAC token format
import hashlib
import hmac
import os
import time

REVIEW_SECRET = os.environ["REVIEW_SECRET"]  # from K8s Secret

def create_review_token(draft_id: str, action: str) -> str:
    """Create a signed URL token. Format: {signature}:{action}:{expiry}"""
    expiry = int(time.time()) + 7 * 24 * 3600  # 7 days from now
    payload = f"{draft_id}:{action}:{expiry}"
    sig = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{sig}:{action}:{expiry}"

def verify_review_token(token: str, draft_id: str) -> tuple[bool, str]:
    """Verify a token from an email link. Returns (valid, action)."""
    try:
        sig, action, expiry = token.split(":")
        if int(expiry) < time.time():
            return False, ""  # expired
        # Recompute the signature over the draft_id from the URL plus the token fields
        payload = f"{draft_id}:{action}:{expiry}"
        expected = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
        # Constant-time comparison prevents timing attacks
        if not hmac.compare_digest(sig, expected):
            return False, ""  # tampered
        return True, action
    except Exception:
        return False, ""  # malformed token
Publisher Worker
File: services/publisher-worker/main.py
Consumes content.approved and distributes to all publishing platforms the client has configured. WordPress always publishes first — its URL becomes the canonical URL for all subsequent platforms.
Retry logic
MAX_RETRIES = 3
# Seconds to wait after each failed attempt (roughly exponential backoff).
# The last value is never used: the final attempt fails straight into the dead-letter queue.
RETRY_DELAYS = [5, 15, 30]
for attempt in range(MAX_RETRIES):
    try:
        result = publisher.publish(draft, config)
        # Record success in publications table
        record_publication(conn, draft_id, platform, "published", result["url"])
        break
    except Exception as e:
        if attempt == MAX_RETRIES - 1:
            # All retries exhausted: send to dead-letter queue
            redis.xadd("content.failed", {"draft_id": draft_id, "error": str(e)})
            record_publication(conn, draft_id, platform, "failed", error=str(e))
        else:
            time.sleep(RETRY_DELAYS[attempt])
WordPress publisher — Markdown to HTML
import markdown
import requests

def publish(self, draft: dict, config: dict) -> dict:
    blog = draft["blog"]
    body = blog["body_markdown"]
    # Strip the FAQ section from body. It is added separately as structured HTML
    # to prevent duplicates (once inline, once as schema markup at the bottom).
    body_without_faq = strip_faq_section(body)
    # Convert Markdown to HTML (using markdown library)
    html_body = markdown.markdown(
        body_without_faq,
        extensions=["fenced_code", "tables", "nl2br"]
    )
    # Append FAQ as structured HTML (better for Google FAQ rich results)
    if blog.get("faq_schema"):
        html_body += build_faq_html(blog["faq_schema"])
    # WordPress REST API call
    response = requests.post(
        f"{config['site_url']}/wp-json/wp/v2/posts",
        auth=(config["username"], config["app_password"]),  # Application Passwords
        json={
            "title": blog["title"],
            "slug": blog["slug"],
            "content": html_body,
            "excerpt": blog.get("meta_description", ""),
            "status": "publish",
            "categories": resolve_categories(config),
        },
        timeout=30,  # don't let a hung site block the worker indefinitely
    )
    # Raise on 4xx/5xx so the caller's retry logic sees the failure
    response.raise_for_status()
    return {"url": response.json()["link"]}
Redis Streams — The Message Bus
Redis Streams are the backbone of inter-service communication. They're more than a pub/sub queue — they're a durable, ordered, consumer-group-aware log of events.
Why Streams instead of HTTP?
| Property | Direct HTTP calls | Redis Streams |
|---|---|---|
| Crash safety | ❌ Request lost if receiver is down | ✅ Message waits until consumer is ready |
| At-least-once delivery | ❌ Manual retry logic needed | ✅ Built-in — unacked messages re-delivered |
| Decoupling | ❌ Sender must know receiver's address | ✅ Services only know the stream name |
| Backpressure | ❌ Fast sender overwhelms slow receiver | ✅ Stream buffers the backlog; the consumer pulls at its own pace |
| Audit trail | ❌ No built-in history | ✅ Stream is an ordered log (inspectable) |
Consumer groups explained
from redis.exceptions import ResponseError

# Create consumer group (run at startup; safe to repeat)
try:
    redis.xgroup_create("news.filtered", "intelligence-workers", id="0", mkstream=True)
except ResponseError as e:
    # BUSYGROUP means the group already exists, e.g. after a pod restart
    if "BUSYGROUP" not in str(e):
        raise
# Read NEW messages (">" means "messages never delivered to any consumer in this group")
messages = redis.xreadgroup(
    groupname="intelligence-workers",
    consumername="worker-pod-1",
    streams={"news.filtered": ">"},
    count=10,
    block=5000,  # block for up to 5 seconds waiting for new messages
)
# Process each message...
for stream_name, msg_list in (messages or []):
    for msg_id, fields in msg_list:
        try:
            process(fields)
            # ACK only AFTER successful processing.
            # If this line is never reached (crash), the message stays in the pending list.
            redis.xack("news.filtered", "intelligence-workers", msg_id)
        except Exception as e:
            # Don't ACK on failure: the message stays pending and can be retried
            log.error("Processing failed: %s", e)
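When a message is delivered but not yet ACK'd, Redis holds it in the Pending Entries List (PEL). If the worker crashes, these messages stay in the PEL. A read with ">" never returns them, so on restart the worker must first re-read its own pending entries (XREADGROUP with ID 0 under the same consumer name) or claim entries left behind by another consumer (XAUTOCLAIM). This is how the pipeline survives pod crashes without losing messages.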
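Recovering pending entries on startup could be sketched as follows. The drain_pending function is hypothetical; it assumes a redis-py style client (xreadgroup and xack with the same arguments as in the example above) and a consumer name that stays the same across restarts.

```python
def drain_pending(client, stream, group, consumer, handler, batch=10):
    """Re-process entries delivered to `consumer` but never ACK'd.

    Returns the number of entries recovered. If `handler` raises, the
    exception propagates and the entry stays pending for the next attempt.
    """
    recovered = 0
    while True:
        # ID "0" (instead of ">") asks for this consumer's own pending entries.
        reply = client.xreadgroup(
            groupname=group,
            consumername=consumer,
            streams={stream: "0"},
            count=batch,
        )
        entries = [entry for _, msgs in (reply or []) for entry in msgs]
        if not entries:
            return recovered  # pending list is empty: safe to switch to ">"
        for msg_id, fields in entries:
            handler(fields)
            client.xack(stream, group, msg_id)
            recovered += 1
```

Calling this once before the main read loop would let a restarted worker finish what it had in flight. If pod names change on restart, the consumer name changes too, and claiming entries with XAUTOCLAIM would be needed instead.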
Inspecting streams from kubectl
# How many messages are waiting in the pipeline?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen news.filtered
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.drafts
# How many messages are in the dead-letter queue?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.failed
# Inspect last 5 messages in a stream
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xrevrange news.filtered + - COUNT 5
Multi-Provider AI Architecture
File: services/intelligence-worker/providers/
Every AI call goes through a provider abstraction layer. The rest of the codebase calls provider.complete(system, user) — it never knows or cares which underlying model or API it's using.
The provider interface
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class LLMResponse:
    text: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0   # Anthropic prompt caching
    cache_write_tokens: int = 0

class LLMProvider(ABC):
    @property
    @abstractmethod
    def model_id(self) -> str: ...

    @abstractmethod
    def complete(
        self,
        system: str,
        user: str,
        max_tokens: int = 512,
        cache_system: bool = False,  # Anthropic-specific, ignored by others
    ) -> LLMResponse: ...
Provider factory — model name → provider instance
def get_provider(model: str) -> LLMProvider:
    """Return the correct provider instance based on model name prefix."""
    api_key_map = {
        "claude-": ("ANTHROPIC_API_KEY", AnthropicProvider),
        "gpt-": ("OPENAI_API_KEY", OpenAIProvider),
        "o1-": ("OPENAI_API_KEY", OpenAIProvider),
        "o3-": ("OPENAI_API_KEY", OpenAIProvider),
        "gemini-": ("GOOGLE_API_KEY", GoogleProvider),
        "deepseek-": ("DEEPSEEK_API_KEY", DeepSeekProvider),
    }
    for prefix, (env_var, ProviderClass) in api_key_map.items():
        if model.startswith(prefix):
            api_key = os.environ.get(env_var)
            if not api_key:
                raise EnvironmentError(f"{env_var} not set for model {model!r}")
            return ProviderClass(api_key=api_key, model=model)
    raise ValueError(f"Unknown model: {model!r}")
Per-client model override
Each client row has gate2_model and gate4_model columns. If set, they override the platform defaults. If the API key for a client's configured model isn't set, the platform logs a warning and falls back to the platform default (Anthropic), so one misconfigured client can't break the entire pipeline.
def get_client_provider(client, gate: str) -> LLMProvider:
    """Get the provider for a client, with fallback to platform default."""
    model_field = f"gate{gate}_model"
    client_model = client.get(model_field)
    if client_model:
        try:
            return get_provider(client_model)
        except EnvironmentError:
            # API key not set: log a warning and fall through to the platform default
            log.warning("Client %s: %s API key missing, falling back to Anthropic",
                        client["id"][:8], client_model)
    # Platform default from ConfigMap
    default = RELEVANCE_MODEL if gate == "2" else GENERATION_MODEL
    return get_provider(default)
| Provider | Models | Env var required |
|---|---|---|
| Anthropic (default) | claude-haiku-4-5, claude-sonnet-4-6 | ANTHROPIC_API_KEY |
| OpenAI | gpt-4o, gpt-4o-mini, o1-*, o3-* | OPENAI_API_KEY |
| Google | gemini-2.0-flash, etc. | GOOGLE_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | DEEPSEEK_API_KEY |
Prompt Caching — 80% Cost Saving
Anthropic's prompt caching lets you mark a portion of the prompt as "cache this". On subsequent API calls with the same cached prefix, Anthropic serves it from cache at ~10% of the normal input token cost.
How the AnthropicProvider implements it
def complete(self, system: str, user: str, max_tokens: int = 512,
             cache_system: bool = False) -> LLMResponse:
    # Without caching: system is just a string
    # With caching: wrap it in a block with cache_control
    if cache_system:
        system_block = [{
            "type": "text",
            "text": system,
            "cache_control": {"type": "ephemeral"},  # ← this is the magic
        }]
    else:
        system_block = system  # plain string, no caching
    response = self._client.messages.create(
        model=self._model,
        max_tokens=max_tokens,
        system=system_block,
        messages=[{"role": "user", "content": user}],
    )
    return LLMResponse(
        text=response.content[0].text,
        input_tokens=response.usage.input_tokens,
        output_tokens=response.usage.output_tokens,
        # These fields tell you how the caching is performing:
        cache_read_tokens=getattr(response.usage, "cache_read_input_tokens", 0) or 0,
        cache_write_tokens=getattr(response.usage, "cache_creation_input_tokens", 0) or 0,
    )
Why it saves ~80% for Gate 2
# Batch 1: system prompt is WRITTEN to cache (cache writes are billed at a premium over the normal input rate)
# cache_write_tokens = 800 (the client profile system prompt)
# input_tokens = 1200 (the 8 article titles; cached tokens are reported separately)
# Batch 2–13: system prompt is READ from cache (10% of normal price)
# cache_read_tokens = 800 (same system prompt, served from cache)
# input_tokens = 1200 (only the 8 article titles)
# Net saving: 12 batches × 800 tokens × 90% discount = 8,640 tokens saved
# At $0.00025/1K input tokens (Haiku): saves ~$0.002/run/client
# Across 365 days (one run per day): ~$0.73/year/client from caching alone
Caching only pays off when the system prompt is identical across multiple calls made within the cache lifetime (about 5 minutes for an ephemeral entry, refreshed on each hit) and is long enough to meet the model's minimum cacheable prompt length. Gate 2 fits this pattern: the same client profile is repeated across 13 batches. Gate 4 doesn't cache its system prompt because it varies per article (different evidence pack, different competitor context).
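The size of the saving depends on how much of each request is system prompt. The sketch below works through the token counts used in the example (800 system tokens, 1,200 article tokens, 13 batches). The 10% read rate comes from the text; the 1.25x write rate is an assumption about Anthropic's pricing for short-lived cache entries and should be checked against current pricing.

```python
def input_cost_units(system_tokens: int, user_tokens: int, batches: int,
                     cached: bool,
                     write_multiplier: float = 1.25,   # assumed cache-write premium
                     read_multiplier: float = 0.10) -> float:
    """Input cost of a run, measured in full-price token equivalents."""
    if not cached:
        return batches * (system_tokens + user_tokens)
    # First call writes the system prompt to cache; the rest read it back.
    first = system_tokens * write_multiplier + user_tokens
    rest = (batches - 1) * (system_tokens * read_multiplier + user_tokens)
    return first + rest


plain = input_cost_units(800, 1200, 13, cached=False)
cached = input_cost_units(800, 1200, 13, cached=True)
saving = 1 - cached / plain
print(f"uncached: {plain:.0f}, cached: {cached:.0f}, saving: {saving:.0%}")
```

With these particular token counts the reduction in total input cost comes out at roughly a third, because the uncached article text still makes up most of each request. The saving approaches the 90% discount only as the system prompt grows relative to the per-batch article text.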
Auth & Security
File: services/approval-service/auth.py
The auth system has zero external dependencies — no PyJWT, no authlib. Everything is implemented with Python's standard library. This keeps the container image lean and eliminates supply-chain risk from auth libraries.
JWT implementation from scratch
# JWT format: base64url(header) . base64url(payload) . base64url(signature)
import hashlib
import hmac
import json
import time
from base64 import urlsafe_b64decode, urlsafe_b64encode
from typing import Optional

# JWT_SECRET and JWT_EXPIRY_MINS are assumed to be module-level settings defined elsewhere.

def _b64(data: bytes) -> str:
    # URL-safe base64 with "=" padding stripped (JWT spec requires unpadded)
    return urlsafe_b64encode(data).rstrip(b"=").decode()

def _unb64(data: str) -> bytes:
    # Inverse of _b64: restore the stripped "=" padding, then decode.
    # Called by decode_jwt; a minimal version, since the excerpt does not show it.
    return urlsafe_b64decode(data + "=" * (-len(data) % 4))

def create_jwt(client_id: str, email: str) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({
        "client_id": client_id,
        "email": email,
        "exp": int(time.time()) + JWT_EXPIRY_MINS * 60,
        "iat": int(time.time()),
    }).encode())
    signing_input = f"{header}.{payload}"
    sig = _b64(hmac.new(
        JWT_SECRET.encode(),
        signing_input.encode(),
        hashlib.sha256,
    ).digest())
    return f"{signing_input}.{sig}"

def decode_jwt(token: str) -> Optional[dict]:
    try:
        header, payload, sig = token.split(".")
        signing_input = f"{header}.{payload}"
        expected = _b64(hmac.new(JWT_SECRET.encode(), signing_input.encode(), hashlib.sha256).digest())
        # Constant-time comparison prevents timing side-channel attacks
        if not hmac.compare_digest(sig, expected):
            return None  # tampered token
        data = json.loads(_unb64(payload))
        if data.get("exp", 0) < time.time():
            return None  # expired
        return data
    except Exception:
        return None  # never raises: bad tokens always return None
Password hashing
def hash_password(password: str) -> str:
    """Hash a password for storage. Format: {hex_salt}:{hex_hash}"""
    salt = secrets.token_hex(16)  # 16 bytes of cryptographic randomness
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return f"{salt}:{h.hex()}"

def _verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return hmac.compare_digest(h.hex(), hex_hash)  # constant-time

# 260,000 iterations: ~100ms on a modern CPU.
# This means an attacker can only try ~10 passwords/second per core.
# bcrypt would also work; PBKDF2 is chosen because it's in Python stdlib (no dependency).
Timing attack prevention
_DUMMY_HASH = "dummy:000000000000000000000000000000000000000000000000000000000000000"

def _dummy_hash_check(password: str) -> None:
    """Run a full PBKDF2 computation even for non-existent users.

    Without this: login for unknown@email.com returns in 1ms (DB miss).
                  login for real@email.com returns in 100ms (hash computed).
    An attacker measures the difference to discover which emails are registered.
    With this: both paths take ~100ms regardless. Side channel eliminated.
    """
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", 260_000)
    # Result is discarded; we only run this for its timing effect
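To show where the dummy computation sits in a login flow, here is a minimal sketch. It assumes an in-memory `USERS` dict in place of the real database lookup, and `authenticate` is a hypothetical name; the hashing helpers repeat the functions above so the block runs on its own.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 260_000  # same work factor as the functions above


def hash_password(password: str) -> str:
    salt = secrets.token_hex(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return f"{salt}:{digest.hex()}"


def verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return hmac.compare_digest(digest.hex(), hex_hash)


def dummy_hash_check(password: str) -> None:
    # Same PBKDF2 cost as a real verification; the result is thrown away
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", ITERATIONS)


# Hypothetical user store standing in for the clients table
USERS = {
    "real@example.com": {
        "client_id": "tenant-a",
        "password_hash": hash_password("correct horse"),
    }
}


def authenticate(email: str, password: str) -> Optional[str]:
    """Return the client_id on success, None on any failure."""
    user = USERS.get(email)
    if user is None:
        dummy_hash_check(password)  # unknown email still pays the hashing cost
        return None
    if not verify_password(password, user["password_hash"]):
        return None
    return user["client_id"]


print(authenticate("real@example.com", "correct horse"))
print(authenticate("real@example.com", "wrong password"))
print(authenticate("unknown@example.com", "anything"))
```

Both failure paths return the same value after roughly the same amount of work, so neither the response body nor the response time reveals whether the email exists.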
Cookie security
response.set_cookie(
    key="ci_session",
    value=token,
    httponly=True,   # JS cannot read this cookie, which protects against XSS token theft
    secure=True,     # browser only sends it over HTTPS, which prevents network sniffing
    samesite="lax",  # withheld on cross-site POSTs and subrequests, still sent on top-level GET navigations (CSRF mitigation)
    max_age=JWT_EXPIRY_MINS * 60,
)
Security summary
| Threat | Mitigation |
|---|---|
| XSS token theft | httponly=True on JWT cookie |
| Network sniffing | secure=True — HTTPS only |
| CSRF | samesite="lax" + form tokens |
| Timing attacks (login) | Dummy PBKDF2 hash for missing users |
| Timing attacks (comparison) | hmac.compare_digest everywhere |
| Password cracking | PBKDF2-SHA256, 260k iterations, per-user salt |
| Forged approval links | HMAC-SHA256 signed, 7-day expiry |
| Cross-tenant data leak | client_id from JWT only — never from request body |
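The table lists HMAC-SHA256 signed approval links with a 7-day expiry, but the link format itself is not shown. The sketch below is one way such a link could be signed and checked; the URL layout, `LINK_SECRET`, `sign_link`, and `verify_link` are all assumptions for illustration, not the platform's actual scheme.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

LINK_SECRET = b"demo-link-secret"    # assumption: a server-side secret
LINK_TTL_SECONDS = 7 * 24 * 60 * 60  # the 7-day expiry from the table


def _signature(draft_id: str, action: str, expires: int) -> str:
    # The signature covers the draft, the action, and the expiry together,
    # so none of them can be changed without invalidating the link
    message = f"{draft_id}:{action}:{expires}".encode()
    return hmac.new(LINK_SECRET, message, hashlib.sha256).hexdigest()


def sign_link(draft_id: str, action: str, now: Optional[float] = None) -> str:
    issued = now if now is not None else time.time()
    expires = int(issued) + LINK_TTL_SECONDS
    sig = _signature(draft_id, action, expires)
    return f"/drafts/{draft_id}/{action}?expires={expires}&sig={sig}"  # hypothetical path


def verify_link(draft_id: str, action: str, expires: int, sig: str,
                now: Optional[float] = None) -> bool:
    if not hmac.compare_digest(sig, _signature(draft_id, action, expires)):
        return False  # forged or altered link
    current = now if now is not None else time.time()
    return expires >= current  # reject links past their expiry


link = sign_link("draft-123", "approve", now=1_000_000)
query = parse_qs(urlsplit(link).query)
expires, sig = int(query["expires"][0]), query["sig"][0]

print(verify_link("draft-123", "approve", expires, sig, now=1_000_000))
print(verify_link("draft-123", "reject", expires, sig, now=1_000_000))
print(verify_link("draft-123", "approve", expires, sig, now=expires + 1))
```

Binding the action into the signed message means an "approve" link cannot be rewritten into a "reject" link, and vice versa.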
Multi-tenancy — Tenant Isolation
Every table with client data has a client_id UUID column. The critical rule: client_id is sourced from the signed JWT only. It is never trusted from the request body.
The middleware pattern
# Every protected route extracts client_id from the JWT:
@router.get("/dashboard")
def dashboard(request: Request):
    client = require_client(request)  # decodes JWT, returns payload dict
    client_id = client["client_id"]   # ← from the verified JWT payload, not request params
    # All DB queries are scoped to this client_id
    drafts = get_drafts(conn, client_id)  # SELECT ... WHERE client_id = %s
    return render("dashboard.html", drafts)

# What an attacker CANNOT do:
#   GET /dashboard?client_id= → client_id from URL is IGNORED
#   POST /approve with body {"client_id": "..."} → body client_id is IGNORED
# The client_id is read exclusively from the signed cookie/Bearer JWT.
Database schema (tenant isolation)
-- One row per tenant
CREATE TABLE clients (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    business_name TEXT NOT NULL,
    email         TEXT UNIQUE NOT NULL,
    industry_type TEXT,
    target_geo    JSONB,
    keywords      JSONB,
    active        BOOLEAN DEFAULT TRUE
);

-- All tenant-scoped tables have client_id FK
CREATE TABLE content_drafts (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id     UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
    title         TEXT,
    body_markdown TEXT,
    status        TEXT DEFAULT 'pending',  -- pending / approved / rejected / published
    created_at    TIMESTAMPTZ DEFAULT NOW()
);

-- news_items is SHARED across all clients (Gate 1 runs once per article)
-- client_relevance maps articles to clients (Gate 2 runs per-client)
CREATE TABLE client_relevance (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id       UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
    news_item_id    UUID NOT NULL REFERENCES news_items(id),
    relevance_score INT,
    processed       BOOLEAN DEFAULT FALSE
);
One bug that trusts client_id from the request body instead of the JWT could leak every tenant's data to any other tenant. The enforcement is at the application layer — there's no database-level row security (yet). Every developer must follow the pattern: always get client_id from the decoded JWT.
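The pattern can be demonstrated end to end with a small runnable sketch. It makes several assumptions: `sqlite3` stands in for PostgreSQL (so placeholders are `?` rather than `%s`), the table is reduced to three columns, and `handle_dashboard` is a hypothetical handler that receives an already-verified JWT payload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_drafts ("
    "id INTEGER PRIMARY KEY, client_id TEXT NOT NULL, title TEXT)"
)
conn.executemany(
    "INSERT INTO content_drafts (client_id, title) VALUES (?, ?)",
    [("tenant-a", "Draft A1"), ("tenant-a", "Draft A2"), ("tenant-b", "Draft B1")],
)


def get_drafts(conn: sqlite3.Connection, client_id: str) -> list[str]:
    # client_id is always a bound parameter and always part of the WHERE clause
    rows = conn.execute(
        "SELECT title FROM content_drafts WHERE client_id = ? ORDER BY id",
        (client_id,),
    )
    return [title for (title,) in rows]


def handle_dashboard(jwt_payload: dict, request_params: dict) -> list[str]:
    # request_params is accepted but never consulted for the tenant id
    return get_drafts(conn, jwt_payload["client_id"])


# tenant-a's token, with a request that tries to name tenant-b
print(handle_dashboard({"client_id": "tenant-a"}, {"client_id": "tenant-b"}))
```

The request claims `tenant-b`, but only `tenant-a` rows come back because the handler reads the tenant id from the JWT payload alone.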
Evidence Pipeline
Before Gate 4 runs, the platform searches for real supporting evidence so Claude can cite actual sources and verified figures — not hedged vague claims.
Tier 1 — Quick Serper enrichment
A short Serper API search on {article title} + {top 3 client keywords} returns 5 snippets. Injected into Gate 4 as lightweight context. Degrades to [] if SERPER_API_KEY is not set.
import os
import requests

def quick_enrich(article: dict, keywords: list[str]) -> list[dict]:
    """Fetch 5 Serper snippets for evidence context. Degrades gracefully."""
    api_key = os.environ.get("SERPER_API_KEY")
    if not api_key:
        return []  # feature disabled: Gate 4 runs without enrichment
    query = f"{article['title']} {' '.join(keywords[:3])}"
    try:
        resp = requests.post(
            "https://google.serper.dev/search",
            headers={"X-API-KEY": api_key},
            json={"q": query, "num": 5},
            timeout=10,
        )
        resp.raise_for_status()  # treat HTTP errors like any other failure
        results = resp.json().get("organic", [])
        return [{"title": r["title"], "snippet": r["snippet"],
                 "url": r["link"], "source": r.get("displayLink")}
                for r in results]
    except Exception:
        return []  # any error → degrade gracefully, never block Gate 4
Tier 2 — Deep Haiku evidence gathering
When enabled, Haiku runs 5–10 targeted searches, fetches and strips HTML from source pages, then classifies each source and extracts claims with confidence levels.
# Haiku produces a structured evidence pack:
evidence_pack = {
    "verified_claims": [
        {
            "claim": "The RBA raised rates by 25bps to 4.35%",
            "source": "RBA official statement",
            "confidence": "high",
            "safe_phrasing": "According to the RBA's official statement...",
        }
    ],
    "claims_to_avoid": [
        {
            "claim": "Rates will fall by end of 2024",
            "reason": "Prediction without verifiable source",
        }
    ],
    "recommended_references": [
        {"title": "RBA Rate Decision - May 2026", "url": "https://rba.gov.au/..."}
    ],
    "source_classifications": [
        {"url": "...", "type": "government_or_regulator", "allowed_to_use": True}
    ],
}
# This entire pack is injected into the Gate 4 system prompt.
# Gate 4 is instructed: "Only use statistics and dates from VERIFIED CLAIMS.
# Never use anything in CLAIMS TO AVOID."
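How the pack is turned into prompt text is not shown. One plausible approach, sketched here under assumptions, is to render the two claim lists as labelled plain-text sections followed by the instruction quoted above. The function name `render_evidence_block` and the exact layout are hypothetical.

```python
def render_evidence_block(pack: dict) -> str:
    """Render an evidence pack as plain-text sections for a system prompt."""
    lines = ["VERIFIED CLAIMS:"]
    for item in pack.get("verified_claims", []):
        lines.append(
            f'- {item["claim"]} '
            f'[source: {item["source"]}; confidence: {item["confidence"]}]'
        )
        lines.append(f'  Suggested phrasing: {item["safe_phrasing"]}')
    lines.append("")
    lines.append("CLAIMS TO AVOID:")
    for item in pack.get("claims_to_avoid", []):
        lines.append(f'- {item["claim"]} [reason: {item["reason"]}]')
    lines.append("")
    lines.append(
        "Only use statistics and dates from VERIFIED CLAIMS. "
        "Never use anything in CLAIMS TO AVOID."
    )
    return "\n".join(lines)


sample_pack = {
    "verified_claims": [
        {
            "claim": "The RBA raised rates by 25bps to 4.35%",
            "source": "RBA official statement",
            "confidence": "high",
            "safe_phrasing": "According to the RBA's official statement...",
        }
    ],
    "claims_to_avoid": [
        {
            "claim": "Rates will fall by end of 2024",
            "reason": "Prediction without verifiable source",
        }
    ],
}

block = render_evidence_block(sample_pack)
print(block)
```

Keeping the section labels identical to the wording of the instruction gives the model an unambiguous anchor for what "VERIFIED CLAIMS" and "CLAIMS TO AVOID" refer to.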
Content Angles — The Core Differentiator
Competitors rewrite news. This platform generates opinionated, differentiated content. Claude selects the best angle for each article+client combination from 8 options — and avoids angles that competitors have already taken.
Competitor angle avoidance
import re
from typing import Optional

# Competitor angles are inferred from title patterns (no AI cost).
# The first matching angle wins, so expert_commentary is listed first: otherwise its
# "what .+ means for the industry" pattern is shadowed by the broader local_impact
# pattern "what .+ means for .+" and can never match.
ANGLE_PATTERNS = {
    "expert_commentary": [r"why .+ matters", r"what .+ means for the industry"],
    "local_impact": [r"what .+ means for .+", r"how .+ affects .+"],
    "action_list": [r"\d+ things? (to|you should)", r"what (to do|businesses should)"],
    "contrarian": [r"why .+ (is|might be) wrong", r"the truth about"],
    "faq_explainer": [r"everything you need to know", r"what is .+ and why"],
}

def infer_competitor_angle(title: str) -> Optional[str]:
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None

# Result: competitor analysis returns avoid_angles = ["local_impact", "faq_explainer"]
# These are injected into the Gate 4 prompt:
#   "DO NOT use local_impact: Competitor X already published that angle.
#    DO NOT use faq_explainer: Competitor Y already published that angle.
#    Choose a different angle that provides unique value."
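The step from competitor titles to prompt lines can be sketched as follows. This is an illustration under assumptions: `build_avoid_instructions` is a hypothetical helper, the input is a simple competitor-to-title mapping, and only two of the angle patterns are repeated to keep the block short.

```python
import re
from typing import Optional

# Subset of the angle patterns shown above, repeated so the block runs on its own
ANGLE_PATTERNS = {
    "local_impact": [r"what .+ means for .+", r"how .+ affects .+"],
    "faq_explainer": [r"everything you need to know", r"what is .+ and why"],
}


def infer_competitor_angle(title: str) -> Optional[str]:
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None


def build_avoid_instructions(competitor_titles: dict[str, str]) -> list[str]:
    """Map competitor -> title into one prompt line per angle already taken."""
    taken: dict[str, str] = {}
    for competitor, title in competitor_titles.items():
        angle = infer_competitor_angle(title)
        if angle is not None and angle not in taken:
            taken[angle] = competitor  # credit the first competitor seen
    lines = [
        f"DO NOT use {angle}: {competitor} already published that angle."
        for angle, competitor in taken.items()
    ]
    if lines:
        lines.append("Choose a different angle that provides unique value.")
    return lines


instructions = build_avoid_instructions({
    "Competitor X": "What the rate rise means for Sydney renters",
    "Competitor Y": "Everything you need to know about the rate rise",
    "Competitor Z": "Rate rise announced",
})
print("\n".join(instructions))
```

Titles that match no pattern, like the third one here, contribute nothing, so the prompt only restricts angles with evidence of competitor coverage.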
H1 title rules (hardcoded in the prompt)
H1_RULES = """
H1 TITLE RULES (mandatory):
- Must be a question format: How / Why / What / When / Should / Can / Is
- Primary keyword must appear within the first 8 words
- Include "2026" for informational, how-to, FAQ, and local SEO articles
FORBIDDEN TITLE PATTERNS (never use these):
- "5 things", "5 steps", "10 ways" (numbered lists)
- "Understanding [topic]"
- "Everything you need to know about"
- "changes everything" / "ultimate guide"
- "What X needs to know" (where X = the reader's role)
"""
# Examples of good titles:
# "Why should Australian mortgage brokers rethink fixed rates in 2026?"
# "How does the RBA cash rate affect first-home buyers in Brisbane in 2026?"
# "How should cybersecurity teams respond to the new APRA ruling in 2026?"
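Because the title rules are mechanical, a generated H1 could also be checked after the fact. The source only states that the rules live in the prompt, so the validator below is an assumption: `check_h1` is a hypothetical helper, the regexes are one interpretation of the forbidden patterns, and the "2026" rule is left out because it depends on the article type.

```python
import re

QUESTION_WORDS = {"how", "why", "what", "when", "should", "can", "is"}

# One regex per forbidden pattern listed in H1_RULES (interpretation, not spec)
FORBIDDEN_PATTERNS = [
    r"\b\d+ (things|steps|ways)\b",         # "5 things", "5 steps", "10 ways"
    r"^understanding\b",                    # "Understanding [topic]"
    r"everything you need to know about",
    r"changes everything",
    r"ultimate guide",
    r"^what .+ needs? to know",             # "What X needs to know"
]


def check_h1(title: str, primary_keyword: str) -> list[str]:
    """Return a list of rule violations; an empty list means the title passes."""
    problems = []
    words = title.lower().split()
    lowered = " ".join(words)
    if not words or words[0] not in QUESTION_WORDS:
        problems.append("does not start with a question word")
    # Interpreted as: the whole keyword phrase sits inside the first 8 words
    if primary_keyword.lower() not in " ".join(words[:8]):
        problems.append("primary keyword missing from first 8 words")
    if any(re.search(p, lowered) for p in FORBIDDEN_PATTERNS):
        problems.append("matches a forbidden title pattern")
    return problems


print(check_h1(
    "Why should Australian mortgage brokers rethink fixed rates in 2026?",
    "mortgage brokers",
))
print(check_h1("5 things every broker should do", "broker"))
```

A check like this would not replace the prompt rules; it would only flag drafts whose titles slipped past them.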
"5 things" articles are commoditised — every content farm produces them. "Understanding X" signals generic educational content, not actionable expert advice. "Everything you need to know" is overused and Google's helpful content guidelines penalise these patterns. The question-format rule is based on SEO data showing that question-format titles consistently outperform declarative titles for featured snippets.