Reading This Codebase From Zero
You don't need to know Python or web development to follow this guide — but the deep-dive sections below assume you do. This section bridges that gap using real code from this repo. Every example is copy-pasted from an actual file, with line-by-line annotations.
Python syntax, using this repo as the textbook
Python reads close to English. The whole codebase is built from a small number of repeating shapes: imports, functions, dictionaries, lists, and conditionals. Once you recognise these five shapes, you can read almost any file in this repo.
1. Imports — "borrow code other people wrote"
Every file starts by importing the tools it needs. services/ingestion-worker/sources/rss.py is a small, complete file — a good first read:
import logging # log parse failures and malformed feeds without crashing
import feedparser # universal RSS/Atom parser — handles most feed quirks
from dateutil import parser as dateparser, tz
import feedparser means "load the feedparser library, and refer to it as feedparser below." from dateutil import parser as dateparser, tz means "from the dateutil library, take just the parser and tz pieces — and call parser by the nickname dateparser instead, because parser is a generic name that would be confusing on its own."
2. Functions — "a named, reusable recipe"
A function is defined with def name(inputs) -> output_type:. Everything indented underneath belongs to the function.
def fetch(source: dict, max_articles: int = 20) -> list[dict]:
    """
    Fetch articles from one RSS/Atom feed and return normalised dicts.
    """
    try:
        resp = _requests.get(source["url"], timeout=15)
        feed = feedparser.parse(resp.content)
    except Exception as e:
        log.warning("RSS parse failed %s: %s", source["url"], e)
        return []
def fetch(source: dict, max_articles: int = 20) -> list[dict]:
Defines a function named fetch. It takes one required input source (a dict — see below) and one optional input max_articles which defaults to 20 if the caller doesn't supply it. -> list[dict] says: "this function returns a list of dicts." These type hints don't change how Python runs — they're documentation that tools like editors and Claude can check.
try / except
"Try to do this; if it throws an error, do that instead." Here: try to fetch and parse the feed. If the network call fails for any reason (timeout, DNS failure, bad SSL cert — all caught by Exception), log a warning and return an empty list rather than crashing the whole worker.
return []
return sends a value back to whoever called this function. [] is an empty list. The caller (in main.py) gets back zero articles for this source and moves on to the next one — one broken RSS feed never takes down the whole ingestion run.
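The same three ideas (a default argument, try / except, and returning an empty list on failure) can be tried without a network connection. The sketch below is illustrative only: load_numbers and its inputs are hypothetical names, not code from this repo.

```python
def load_numbers(raw: str, limit: int = 3) -> list[int]:
    """Parse comma-separated integers; return [] if anything is malformed."""
    try:
        numbers = [int(part) for part in raw.split(",")]
    except ValueError:
        # Same idea as fetch(): hand back nothing rather than crash the caller.
        return []
    return numbers[:limit]  # limit is 3 unless the caller supplies another value


print(load_numbers("4,8,15,16"))           # default limit applies
print(load_numbers("4,8,15,16", limit=2))  # caller overrides the default
print(load_numbers("4,eight,15"))          # malformed input gives []
```

Note that this sketch catches only ValueError, while fetch catches the broader Exception because a network call can fail in many different ways.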
3. Dictionaries — "a labelled box of values"
A dict (dictionary) is Python's version of a JSON object: a set of key: value pairs, accessed by name with square brackets. This is the single most common data shape in the codebase — every database row, every API request body, and every config object is a dict.
articles.append({
"source_id": source["id"], # UUID — foreign key to sources table
"source_name": source["name"], # human-readable name — used in drop logs
"client_id": source["client_id"], # UUID — which client owns this source
"trust_score": source["trust_score"], # 0.0–1.0 — used by Gate 1 rule 2
"client_keywords": source["keywords"] or [], # TEXT[] from DB
"client_excluded": source["excluded_topics"] or [], # TEXT[] from DB
"url": url,
"title": title,
"summary": summary[:2000], # cap to prevent very long articles
"published_at": published_at, # datetime or None
})
source["id"] means "look up the value stored under the key "id" in the dict called source." The whole { ... } block is building a new dict — a normalised "article" record — by pulling values out of source (the RSS source config) and combining them with values parsed from the feed (url, title, etc). source["keywords"] or [] means "use source["keywords"] if it's not empty/None, otherwise use an empty list" — a one-line default. summary[:2000] means "take only the first 2000 characters of summary" — this is Python's slice syntax.
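To see the three pieces of syntax in isolation (key lookup, the `or []` default, and slicing), here is a minimal sketch. The dict contents are made-up sample values, not rows from the real sources table.

```python
source = {"id": "a1b2", "name": "Skift", "keywords": None}

# Square brackets look up a key; asking for a missing key raises KeyError.
name = source["name"]

# `or []` swaps in an empty list when the stored value is None (or empty).
keywords = source["keywords"] or []

# Slicing keeps at most the first N characters and never fails on short text.
summary = "x" * 5000
capped = summary[:2000]
short = "hello"[:2000]

print(name, keywords, len(capped), short)
```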
4. Lists and loops — "do this for every item"
articles = []
for entry in feed.entries[:max_articles]:
    url = entry.get("link", "").strip()
    title = entry.get("title", "").strip()
    if not url or not title:
        continue  # skip this entry and move to the next
    articles.append({ ... })
return articles
articles = []
Creates an empty list — an ordered, growable collection. Think of it as an empty shopping basket that articles.append(...) will add items to, one per loop iteration.
for entry in feed.entries[:max_articles]:
"For each item in this collection, run the indented block once with entry set to that item." feed.entries[:max_articles] is a slice: "give me only the first max_articles entries" (20 by default), so a feed with 1,000 backlogged articles doesn't flood the pipeline on first run.
entry.get("link", "").strip()
.get("link", "") is the safe version of entry["link"] — if the key "link" doesn't exist, return "" (empty string) instead of crashing. .strip() removes leading/trailing whitespace. Chaining methods like this — .get(...).strip() — is everywhere in this codebase.
if not url or not title: continue
not url is True when url is an empty string. continue means "stop processing this item, jump straight to the next loop iteration." So: any feed entry missing a link or title is silently skipped.
Almost every function in this codebase follows the same shape: receive a dict (or list of dicts) → loop / transform / filter → return a dict (or list of dicts). Once you see this, files like gate1_rules.py, gate2_relevance.py, and the admin routes in admin.py stop looking like a wall of text and start looking like the same five Lego bricks rearranged.
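As a self-contained illustration of that "list of dicts in, loop and filter, list of dicts out" shape, the sketch below reuses the pattern from fetch on hand-written data. keep_titled and the sample entries are hypothetical, written only for this guide.

```python
def keep_titled(entries: list[dict], max_items: int = 20) -> list[dict]:
    """Return cleaned copies of the entries that have both a link and a title."""
    kept = []
    for entry in entries[:max_items]:
        url = entry.get("link", "").strip()
        title = entry.get("title", "").strip()
        if not url or not title:
            continue  # incomplete entry: skip it and move on
        kept.append({"url": url, "title": title})
    return kept


entries = [
    {"link": " https://example.com/a ", "title": "First"},
    {"title": "No link here"},                        # missing key
    {"link": "https://example.com/c", "title": "  "},  # whitespace-only title
]
result = keep_titled(entries)
print(result)
```

Only the first entry survives: the second has no "link" key, and the third has a title that is empty once .strip() removes the whitespace.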
How FastAPI receives an HTTP request
"The frontend" in this platform is just HTML pages rendered by the server (no separate React/Vue app). When you click a button or submit a form in your browser, it sends an HTTP request over the network to approval-service, which is a Python process running FastAPI. FastAPI's job is: match the incoming request to the right Python function, run it, and turn whatever it returns into an HTTP response.
Anatomy of a route
@router.post("/admin/sources/{source_id}/toggle")
def admin_toggle_source(request: Request, source_id: str, client_id: str = Form(...)):
    _require_admin(request)
    ...
    return RedirectResponse(f"/admin/clients/{client_id}?flash=Source+updated", status_code=303)
@router.post("/admin/sources/{source_id}/toggle")
This line — a decorator — is FastAPI's routing table entry. It says: "when an HTTP POST request arrives at a URL matching /admin/sources/<anything>/toggle, call the function below it." The {source_id} part is a path parameter — a placeholder that captures whatever string is in that position of the URL.
def admin_toggle_source(request: Request, source_id: str, client_id: str = Form(...)):
FastAPI inspects this function's parameters and fills them in automatically: source_id: str comes from the {source_id} in the URL path; client_id: str = Form(...) means "read a field named client_id out of the submitted HTML form body" (the ... means it's required); request: Request gives access to cookies — used here to check the admin session.
return RedirectResponse(f"/admin/clients/{client_id}?flash=Source+updated", status_code=303)
Whatever a route function returns becomes the HTTP response. RedirectResponse(..., status_code=303) tells the browser "go fetch this other URL instead" — which is how a form submission ends up showing you the updated page. f"...{client_id}..." is an f-string: any {expression} inside is replaced with its value, so client_id = "f73c..." produces /admin/clients/f73c...?flash=Source+updated.
Where do GET and POST come from?
Every HTTP request has a method (GET, POST, etc.) and a path (the URL after the domain). Your browser sends:
GET: Triggered by typing a URL, clicking a link, or the browser loading a page. No body, just "give me this page." Routes decorated @router.get(...) typically SELECT from Postgres and return a rendered HTML page (response_class=HTMLResponse).
POST: Triggered by an HTML <form method="POST"> being submitted, or by JavaScript's fetch() called with method "POST". Has a body containing the form fields. Routes decorated @router.post(...) typically INSERT/UPDATE Postgres, then redirect.
In many modern apps, "frontend" means a separate JavaScript app (React, Vue) that calls a JSON API. This platform doesn't do that for the admin UI. The "frontend" is HTML files in services/approval-service/templates/, rendered by the same Python process that talks to Postgres. The browser only ever talks to one thing: the FastAPI app, over plain HTTP, exchanging full HTML pages (and form submissions) — not JSON. The api-gateway service is the exception: it serves JSON to API clients (e.g. a future mobile app or third-party integration), authenticated with JWTs instead of cookies.
How a request finds its function
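The matching step can be pictured with a toy router written in plain Python. This is a sketch of the idea only, not FastAPI's actual implementation (the real routing comes from Starlette, which FastAPI is built on, and also handles type conversion, form parsing, and much more). The names route, dispatch, and ROUTES are invented for this example.

```python
import re

ROUTES = []  # each entry: (HTTP method, compiled URL pattern, handler function)


def route(method: str, path: str):
    """Register a handler; each {name} segment becomes a named capture group."""
    regex = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", path)

    def decorator(func):
        ROUTES.append((method, re.compile(f"^{regex}$"), func))
        return func

    return decorator


@route("POST", "/admin/sources/{source_id}/toggle")
def toggle_source(source_id: str) -> str:
    return f"toggled {source_id}"


def dispatch(method: str, path: str) -> str:
    """Call the first handler whose method and URL pattern both match."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            return handler(**match.groupdict())  # captured segments become arguments
    return "404 Not Found"


print(dispatch("POST", "/admin/sources/a1b2/toggle"))
print(dispatch("GET", "/admin/sources/a1b2/toggle"))
```

The decorator only records the function in a table. Nothing runs until a request arrives and dispatch finds the entry whose method and pattern both match, which is why the GET request above gets a 404 even though the path is identical.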
How the backend talks to Postgres
Every piece of data this platform knows about — clients, sources, news items, drafts — lives in one PostgreSQL database. Python talks to Postgres using a library called psycopg2, which lets you send raw SQL strings and get rows back as Python tuples.
Step 1 — open a connection
def get_db():
    # Open a new Postgres connection using the DATABASE_URL from the K8s Secret.
    # connect_timeout=10 prevents the handler from hanging indefinitely if the DB
    # is temporarily unreachable. Without it, psycopg2 blocks forever and
    # Cloudflare returns a 524 to the browser.
    # Caller is responsible for calling conn.close(); always use try/finally.
    return psycopg2.connect(DATABASE_URL, connect_timeout=10)
DATABASE_URL is a single string like postgresql://user:password@host:5432/content_intelligence — it encodes the username, password, host, port, and database name. It's injected as an environment variable from a Kubernetes Secret (never hardcoded, never in Git — see the Configuration table in Multi-tenancy).
Step 2 — get a cursor, run SQL, read results
conn = get_db()
try:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT id, source_type, name, url, trust_score, active, ...
            FROM sources WHERE client_id = %s ORDER BY source_type, name
        """, (client_id,))
        sources = cur.fetchall()
finally:
    conn.close()
conn = get_db()
Opens a connection — think of it as opening a phone line to the database. conn is now a Python object you can send commands through.
with conn.cursor() as cur:
A cursor is the thing that actually sends SQL and receives results, scoped to this connection. with ... as cur: is a context manager — it guarantees the cursor is cleaned up automatically when the indented block ends, even if an error occurs.
cur.execute("SELECT ... WHERE client_id = %s", (client_id,))
Sends the SQL string to Postgres. The %s is a placeholder — psycopg2 safely substitutes the value from the tuple (client_id,) in its place. This is the only safe way to insert variables into SQL. Never build SQL with f-strings/string concatenation — that's how SQL injection vulnerabilities happen.
sources = cur.fetchall()
Retrieves every row the query matched, as a Python list of tuples — e.g. [(id1, "rss", "Skift", "https://...", 0.8, True, ...), (id2, ...), ...]. Each tuple's values are in the same order as the columns listed in SELECT.
try / finally: conn.close()
Database connections are a limited resource: Postgres only accepts a fixed number at once, so connections that are opened and never closed eventually use them all up. finally guarantees conn.close() runs whether the try block succeeded or raised an error.
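The difference between a placeholder and string-building is easiest to see by running both against a hostile value. The sketch below uses Python's built-in sqlite3 module as a stand-in so it runs without a Postgres server; note that sqlite3 writes placeholders as ? where psycopg2 uses %s. The table and values are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
try:
    conn.execute("CREATE TABLE sources (name TEXT, client_id TEXT)")
    conn.execute("INSERT INTO sources VALUES ('Skift', 'c1'), ('Other', 'c2')")

    hostile = "c1' OR '1'='1"

    # Unsafe: the value is pasted into the SQL text, so its quote ends the
    # string literal early and the rest is executed as SQL.
    unsafe_sql = f"SELECT name FROM sources WHERE client_id = '{hostile}'"
    leaked = conn.execute(unsafe_sql).fetchall()

    # Safe: the value travels separately and is only ever compared as data.
    safe = conn.execute(
        "SELECT name FROM sources WHERE client_id = ?", (hostile,)
    ).fetchall()
finally:
    conn.close()

print(len(leaked), "rows leaked by the f-string query")
print(len(safe), "rows returned by the placeholder query")
```

The f-string query returns every row in the table, including another client's source. The placeholder query returns nothing, because no client_id is literally equal to that odd string.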
Writing data: INSERT / UPDATE + commit
with conn.cursor() as cur:
    cur.execute(
        "UPDATE sources SET active = NOT active WHERE id=%s RETURNING active",
        (source_id,),
    )
    new_state = cur.fetchone()[0]
conn.commit()
By default, psycopg2 wraps every connection in a transaction. UPDATE/INSERT/DELETE statements only become permanent once you call conn.commit() — until then, they're invisible to every other connection (including psql in another terminal). RETURNING active is a Postgres feature that returns the new value of the row you just updated, in the same round-trip — avoiding a separate SELECT afterwards. cur.fetchone()[0] grabs the first column of the first (only) returned row.
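The "invisible until commit" behaviour can be observed with two connections. This sketch again uses the built-in sqlite3 module and a temporary file as a stand-in for Postgres; the locking internals differ between the two databases, but what the second connection sees before and after the commit is the same. The table and ids are made up.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    writer = sqlite3.connect(path)  # plays the role of the route handler
    reader = sqlite3.connect(path)  # plays the role of psql in another terminal
    try:
        writer.execute("CREATE TABLE sources (id TEXT, active INTEGER)")
        writer.execute("INSERT INTO sources VALUES ('a1b2', 1)")
        writer.commit()

        # Flip the flag, but do not commit yet.
        writer.execute("UPDATE sources SET active = NOT active WHERE id = ?", ("a1b2",))
        before_commit = reader.execute("SELECT active FROM sources").fetchall()

        writer.commit()
        after_commit = reader.execute("SELECT active FROM sources").fetchall()
    finally:
        writer.close()
        reader.close()

print("before commit:", before_commit)
print("after commit: ", after_commit)
```

A forgotten conn.commit() therefore fails quietly: the handler's own connection sees the change, the redirect succeeds, and the change is discarded when the connection closes.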
The shape of a row, end to end
cur.execute("""
SELECT id, source_type, name, url, trust_score, active, last_fetched, ...
FROM sources WHERE client_id = %s ORDER BY source_type, name
""", (client_id,))
sources = cur.fetchall()
# sources = [
# ("a1b2...", "rss", "Skift", "https://skift.com/feed", 0.8, True, ...),
# ("c3d4...", "rss", "Hospitality Net", "https://...", 0.7, True, ...),
# ]
Notice the SQL column order — id, source_type, name, url, trust_score, active, ... — matches the order of values inside each tuple. This ordering becomes important in the next section, where Jinja2 unpacks these tuples by position.
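Because tuples carry no column names, position is the only thing linking a value to its meaning. The sketch below shows what happens when the two get out of step, using shortened, invented rows. The dict-based variant at the end is one possible defence and is not something this repo is known to do.

```python
rows = [
    ("a1b2", "rss", "Skift", 0.8),
    ("c3d4", "rss", "Hospitality Net", 0.7),
]

# Correct: names are listed in the same order as the SELECT columns.
names = [name for (sid, stype, name, trust) in rows]

# Out of sync: two names swapped. No error is raised; the data is just mislabelled.
wrong_names = [name for (sid, name, stype, trust) in rows]

# One defensive option: pair every value with an explicit column name.
columns = ("id", "source_type", "name", "trust_score")
as_dicts = [dict(zip(columns, row)) for row in rows]

print(names)
print(wrong_names)
print(as_dicts[0]["name"])
```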
Jinja2: turning Python data into HTML
Once a route function has its data (a list of tuples from Postgres), it needs to turn that into an HTML page the browser can display. Jinja2 is a templating engine: you write an .html file with normal HTML, plus special {{ ... }} and {% ... %} tags that Python fills in at render time.
Returning a template from a route
FastAPI routes with response_class=HTMLResponse call templates.TemplateResponse(...), passing a context dict — every key in that dict becomes a variable available inside the template:
return templates.TemplateResponse("admin_client_edit.html", {
"request": request,
"client": client,
"sources": sources, # the list of tuples from cur.fetchall()
"now": datetime.now(timezone.utc),
})
Looping over rows: {% for %}
The template receives sources — a list of tuples, one per source row. {% for %} unpacks each tuple's positional values into named variables, in the exact same order as the SQL SELECT:
{% for sid, stype, sname, surl, strust, sactive, slast, serr, serrcount,
serrat, ssugg, squarantined, squarcount, sauto in sources %}
{% if stype != 'competitor' %}
<tr>
<td>{{ sname }}</td>
<td>{{ surl }}</td>
<td>{{ strust }}</td>
<td>
{% if sactive %}<span class="badge green">On</span>
{% else %}<span class="badge red">Off</span>{% endif %}
</td>
</tr>
{% endif %}
{% endfor %}
{% for sid, stype, sname, ... in sources %}
This is identical in spirit to Python's for x in list: — but it destructures each tuple into 14 named variables in one go, exactly like Python's sid, stype, sname = some_tuple. sid is column 1 (id), stype is column 2 (source_type), and so on — matching the SQL SELECT id, source_type, name, ... from the previous section position-for-position. If the SQL column order and this list ever get out of sync, variables silently point at the wrong data — a common bug source.
{{ sname }}
Double curly braces output a value as text into the HTML. If sname is "Skift", the rendered HTML contains <td>Skift</td>. Jinja2 automatically escapes special characters (so a source named <script> renders as harmless text, not executable HTML) — this is the main defence against stored XSS.
{% if sactive %} ... {% else %} ... {% endif %}
A conditional, just like Python's if/else. sactive is the Python boolean True or False coming straight from the Postgres active column. Depending on its value, one of two badges is rendered.
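To make the escaping step concrete, here is a rough stand-in written with Python's standard library rather than Jinja2 itself. It assumes autoescaping is switched on for the template environment, and render_cell is a hypothetical helper; Jinja2's own escaping differs in small details such as how quote characters are encoded.

```python
from html import escape


def render_cell(value: str) -> str:
    """Roughly what {{ value }} does with autoescaping on: escape, then insert."""
    return f"<td>{escape(value)}</td>"


safe_cell = render_cell("Skift")
hostile_cell = render_cell("<script>alert(1)</script>")

print(safe_cell)
print(hostile_cell)
```

The angle brackets in the hostile name become &lt; and &gt;, so the browser displays them as text instead of starting a script element.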
Forms: how a click becomes a POST request
Every button that changes data is inside an HTML <form>. The browser doesn't need any JavaScript to send a POST — submitting a form is a built-in browser feature:
<form class="inline-form" method="POST" action="/admin/sources/{{ sid }}/toggle">
<input type="hidden" name="client_id" value="{{ client.id }}">
<button class="btn btn-sm">Toggle</button>
</form>
method="POST" action="/admin/sources/{{ sid }}/toggle"
{{ sid }} is filled in at render time with this row's actual source UUID — e.g. /admin/sources/a1b2c3.../toggle. When the button is clicked, the browser sends an HTTP POST to exactly that URL — which is the URL pattern matched by @router.post("/admin/sources/{source_id}/toggle") from the FastAPI section above. The {{ sid }} in the template and the {source_id} in the route decorator are how the rendered HTML "knows" which Python function will handle the click.
<input type="hidden" name="client_id" value="{{ client.id }}">
A hidden field — invisible to the user, but still submitted with the form. This is how client_id ends up available as client_id: str = Form(...) in the Python route, without the user seeing or typing it.
SQL column order (admin.py) → Jinja2 unpacking order (admin_client_edit.html's {% for %}) → rendered {{ sid }} values inside form action URLs → browser POST → FastAPI {source_id} path parameter → new SQL UPDATE. Five files, one thread. The next section walks this exact thread, click by click.
Two different ways code runs in this platform
So far, every example reacted to a browser click — code runs, sends a response, and stops, waiting for the next request. But three of the five services (ingestion-worker, intelligence-worker, publisher-worker) never receive HTTP requests at all. They run forever in a loop. Both patterns matter, and confusing them is a common source of "wait, who calls this function?" confusion when reading the code.
Request/response: The process sits idle until an HTTP request arrives, runs one route function, returns a response, then goes back to idle. Like a shop assistant who only acts when a customer walks up to the counter. Driven entirely by FastAPI's router; there is no while True loop in this code.
Background worker: The process runs an infinite while True: loop from the moment it starts. No browser is involved. Like a security guard doing rounds every few minutes, whether or not anything has happened. Driven by time.sleep() (polling on a timer) or redis.xreadgroup(..., block=5000) (waiting on a queue).
What a worker's main loop actually looks like
while True:
    try:
        # block=5000 → wait up to 5 seconds for new messages before looping.
        # count=5 → process up to 5 messages per batch.
        # ">" → only undelivered messages (not pending/unacked ones).
        messages = r.xreadgroup(
            CONSUMER_GROUP, CONSUMER_NAME,
            {STREAM_IN: ">"},
            count=5,
            block=5000,
        )
    except redislib.ConnectionError as e:
        log.error("Redis connection lost: %s — retrying in 5s", e)
        time.sleep(5)
        continue
    for stream, msgs in messages:
        for msg_id, fields in msgs:
            process_message(fields)  # runs Gates 2-4 for one news item
while True:
An infinite loop — runs forever until the process is killed (e.g. by Kubernetes during a deploy). This is the entire "main program" for a worker service — there's no router, no incoming connections to wait for.
r.xreadgroup(..., block=5000)
block=5000 means "wait up to 5000 milliseconds (5 seconds) for a new message to appear on the news.filtered Redis Stream — if one arrives sooner, return immediately; if none arrives, return empty after 5s and loop again." This is not a busy-loop burning CPU — the process is asleep, parked on this call, until Redis wakes it up or the timeout passes.
for stream, msgs in messages: for msg_id, fields in msgs:
A nested loop — Redis can return messages from multiple streams, and each stream can return multiple messages in one batch (count=5). The inner loop processes each message one at a time by calling process_message(fields) — this is where Gate 2 → Gate 3 → Gate 4 actually run for that news item.
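The "block with a timeout, then loop again" rhythm can be reproduced without Redis. The sketch below swaps the Redis Stream for Python's in-process queue.Queue, whose get(timeout=...) waits in a similar way. drain is a hypothetical function, and unlike a real worker it stops after a couple of empty polls so that the example terminates.

```python
import queue


def drain(inbox: queue.Queue, max_idle_polls: int = 2, timeout: float = 0.05) -> list[str]:
    """Process messages until the inbox stays empty for max_idle_polls polls."""
    processed = []
    idle_polls = 0
    while idle_polls < max_idle_polls:  # a real worker would use `while True`
        try:
            # Waits (asleep, not spinning) until a message arrives or the timeout passes.
            message = inbox.get(timeout=timeout)
        except queue.Empty:
            idle_polls += 1  # nothing arrived: go round the loop again
            continue
        idle_polls = 0
        processed.append(message.upper())  # stand-in for process_message(fields)
    return processed


inbox = queue.Queue()
for item in ("first", "second"):
    inbox.put(item)

result = drain(inbox)
print(result)
```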
When you're reading admin.py, every variable (client_id, request) exists only for the duration of one HTTP request — created fresh each time, discarded after the response is sent. When you're reading intelligence-worker/main.py, variables like conn (the DB connection) and last_gate5_rerun_poll persist across thousands of loop iterations for the lifetime of the pod — which is why you'll see explicit reconnect logic (conn = get_db() inside an except block) that a request/response handler never needs: a worker can't just "wait for the next request" to get a fresh connection, because there is no next request — there's only the next loop iteration, which it has to survive itself.
Browser → FastAPI route → psycopg2 → Postgres → Jinja2 → HTML (request/response), and while True: → Redis Stream → Gates 2-4 → Postgres (background worker) — these are the only two "shapes" of execution anywhere in this codebase. Every file you open is one function inside one of these two shapes. The walkthrough below ties the request/response shape together with a real click, end to end.
End to end: clicking "Toggle" on a source
This is the smallest complete action in the admin dashboard: on a client's edit page, every RSS source has a Toggle button that switches it on/off without a page reload-from-scratch. It touches all four pieces from the Fundamentals sections above — HTML form, FastAPI route, psycopg2/Postgres, Jinja2 re-render — in under 10 lines of real code each. Step through it below.
The URL in step 2 (/admin/sources/<id>/toggle) is constructed by Jinja2 in step 1 from {{ sid }}, and is the same string that FastAPI's @router.post("/admin/sources/{source_id}/toggle") pattern matches in step 3 — that's the entire "connection" between frontend and backend: a shared URL string. Similarly, the SQL column order in step 5 is what determines the variable order in the {% for %} loop back in step 7. Nothing here is magic — it's strings and lists, lining up by convention.
Content Intelligence Platform
A multi-tenant SaaS that monitors real-time industry news, detects high-impact events, and generates SEO-optimised content before competitors react. Built on Kubernetes with Claude AI at its core.
Why this platform exists
Most businesses know they should publish about industry news. They don't — because the workflow is broken:
News breaks at 9am
A reporter reads about it. Maybe.
Writer assigned at 11am
They research, draft, edit. Takes 3–4 hours.
Approval chain — next day
Manager reviews. Revisions. Legal check. More revisions.
Published 48+ hours later
Google has already indexed 50 competitor articles. The SEO window is closed.
The SEO advantage belongs to whoever publishes first with quality content. Every hour of delay is lost organic ranking opportunity. This platform compresses the window from 48+ hours to under 4 hours.
System Architecture
Four microservices communicate via Redis Streams — a durable, ordered message queue. No service calls another directly over HTTP. This means if one crashes, the others continue and no data is lost.
ingestion-worker
- Polls RSS feeds every 2 minutes
- Runs Gate 1 (zero-cost rules)
- Deduplicates by URL hash
- Quarantines broken sources
- Publishes to news.filtered
intelligence-worker
- Consumes news.filtered
- Runs Gates 2, 3, 4
- Competitor gap analysis
- Evidence gathering (Serper)
- Publishes to content.drafts
approval-service
- Web dashboard for clients
- Email digest with approve/reject links
- Admin panel for operators
- JWT auth (cookie + Bearer)
- Publishes to content.approved
publisher-worker
- Consumes content.approved
- WordPress REST API (Markdown→HTML)
- Dev.to API (native Markdown)
- Retry with exponential backoff
- Records to publications table
Redis Stream topology
Services never call each other's HTTP endpoints. Everything goes through Redis Streams. If intelligence-worker crashes, ingestion-worker keeps writing to the stream. When intelligence-worker restarts, it resumes exactly where it left off — no messages lost.
Why the 4-Gate Funnel Exists
Claude Sonnet costs ~$0.015 per content generation call. Without filtering, running every ingested article through Sonnet would cost ~$450/client/month. With the funnel: ~$3.33/client/month (the original $2.55 funnel cost, plus Gate 3.5 ICP-fit scoring and Gate 5's automated review/auto-revision — see below).
| Gate | Method | Volume | Cost/call | Monthly |
|---|---|---|---|---|
| Gate 1 — Rules | Pure Python | 30,000 articles | $0.00 | $0.00 |
| Gate 2 — Haiku | Claude Haiku (batched) | 3,000 articles | $0.0001 | $0.30 |
| Gate 3 — Signal | Postgres + pytrends | 600 articles | $0.00 | $0.00 |
| Gate 3.5 — ICP-Fit | Claude Haiku | 150 articles | $0.0003 | $0.045 |
| Gate 4 — Sonnet | Claude Sonnet | 150 articles | $0.015 | $2.25 |
| Gate 5 — cheap_review (round 1) | Claude Haiku | 150 drafts | $0.0003 | $0.045 |
| Gate 5 — targeted_revision | Claude Sonnet | ~45 drafts (30% in 70-84 band) | $0.015 | $0.675 |
| Gate 5 — cheap_review (round 2) | Claude Haiku | ~45 drafts (only if revised) | $0.0003 | $0.0135 |
| Total | | | | ~$3.33 |
Each gate must be cheaper than the next. You never spend $0.015 on something that a $0.00 rule would have caught. The gates are deliberately ordered from cheapest to most expensive. Gate 5's targeted_revision is the one exception — it's a Sonnet call, but it only runs for the ~30% of drafts that scored 70-84 with no critical issues, and it replaces what would otherwise be manual admin editing time.
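The totals can be checked with a few lines of arithmetic. The volumes and per-call prices below are copied from the table above; only the variable names are new.

```python
# (stage, calls per month, cost per call in USD)
STAGES = [
    ("Gate 1 rules", 30_000, 0.0),
    ("Gate 2 Haiku", 3_000, 0.0001),
    ("Gate 3 signal", 600, 0.0),
    ("Gate 3.5 ICP-fit", 150, 0.0003),
    ("Gate 4 Sonnet", 150, 0.015),
    ("Gate 5 review round 1", 150, 0.0003),
    ("Gate 5 targeted revision", 45, 0.015),
    ("Gate 5 review round 2", 45, 0.0003),
]

funnel_total = sum(calls * cost for _, calls, cost in STAGES)
unfiltered_total = 30_000 * 0.015  # every ingested article sent straight to Sonnet

print(f"funnel:     ${funnel_total:.2f} per client per month")
print(f"unfiltered: ${unfiltered_total:.2f} per client per month")
```

The two Sonnet rows (Gate 4 and targeted revision) account for $2.925 of the roughly $3.33 total, which is why the funnel concentrates on shrinking the number of articles that reach Sonnet.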
Gate 1 — Rules Engine
File: services/ingestion-worker/stages/gate1_rules.py
Zero-cost filter. Runs entirely in Python — no database queries, no API calls, no network. Eliminates ~90% of articles before any AI is touched. Six rules run in order; the first failure short-circuits (fast reject).
Rule 1: minimum content length. Drop articles with fewer than 50 chars of title+summary. These are usually empty RSS entries, tracking pixels, or fetcher errors with no useful content.
Rule 2: source trust score. Each source has a trust_score (0.0–1.0) set by the admin. Sources below 0.4 are dropped. Prevents spam aggregators from polluting the pipeline.
Rule 3: recency. Drop articles older than 48 hours (configurable). Old news generates low-value content and hurts SEO freshness signals. Compares published_at against a UTC cutoff.
Rule 4: hard exclusions. If the article matches any of the client's excluded topics, drop it, even if a keyword also matches. Example: a mortgage broker has "security" as a keyword but "gaming" excluded. "Gaming security" gets dropped.
Rule 5: urgency override. Breaking news bypasses the keyword check entirely. If the title or summary contains "breaking", "emergency", etc., it passes Gate 1 regardless of keyword match. Real-time events shouldn't wait for keyword list tuning.
Rule 6: keyword match. The article must contain at least one of the client's configured keywords. Case-insensitive substring search. This is the last and most expensive check, reached only if all previous rules passed.
The actual code
from datetime import datetime, timedelta, timezone

class Gate1Rules:
    def __init__(self, min_content_length=50, source_trust_min=0.4,
                 max_age_hours=48, urgency_keywords=None):
        # Store the thresholds that check() reads below.
        self.min_content_length = min_content_length
        self.source_trust_min = source_trust_min
        self.max_age_hours = max_age_hours
        # Lowercase once here so check() can compare against lowercased text;
        # using a set also removes duplicate keywords.
        self.urgency_keywords = set(k.lower() for k in (urgency_keywords or []))

    def check(self, item, keywords, excluded):
        text = f"{item.get('title', '')} {item.get('summary', '')}".strip()
        # Rule 1: minimum content length
        if len(text) < self.min_content_length:
            return False, "too_short"
        # Rule 2: source trust score, fail-open (default 1.0 if missing)
        if item.get("trust_score", 1.0) < self.source_trust_min:
            return False, "low_trust_source"
        # Rule 3: recency check (skipped when published_at is missing)
        pub = item.get("published_at")
        if pub:
            if pub.tzinfo is None:  # naive datetime: assume UTC
                pub = pub.replace(tzinfo=timezone.utc)
            cutoff = datetime.now(timezone.utc) - timedelta(hours=self.max_age_hours)
            if pub < cutoff:
                return False, "stale"
        text_lower = text.lower()  # compute once, use below
        # Rule 4: hard exclusions (checked BEFORE keyword match)
        for exc in (excluded or []):
            if exc.lower() in text_lower:
                return False, f"excluded:{exc}"
        # Rule 5: urgency override, breaking news bypasses keyword check
        if self.urgency_keywords and any(kw in text_lower for kw in self.urgency_keywords):
            return True, "urgency_override"
        # Rule 6: keyword match, the core relevance gate
        if not any(kw.lower() in text_lower for kw in (keywords or [])):
            return False, "no_keyword_match"
        return True, "passed"
Length and trust checks are placed first because they are the cheapest: neither has to search the article text for substrings. Exclusions run before keywords so that a forbidden topic can't slip through on a keyword match. Urgency override is placed after exclusions, so even breaking news gets dropped if it matches an exclusion.
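The effect of that ordering can be seen with a stripped-down, two-rule version of the check. This is a simplified sketch for illustration, not the real Gate1Rules class; it keeps only the exclusion and keyword rules.

```python
def check(text: str, keywords: list[str], excluded: list[str]) -> tuple[bool, str]:
    """Two-rule sketch: exclusions are tested before keywords."""
    text_lower = text.lower()
    for exc in excluded:
        if exc.lower() in text_lower:
            return False, f"excluded:{exc}"
    if not any(kw.lower() in text_lower for kw in keywords):
        return False, "no_keyword_match"
    return True, "passed"


verdict = check("Gaming security flaw hits consoles", ["security"], ["gaming"])
other = check("Bank security review published", ["security"], ["gaming"])

print(verdict)
print(other)
```

Both titles contain the keyword "security", but the first is rejected before the keyword rule is ever evaluated. Because matching is by substring, a short exclusion term also matches inside longer words, so exclusion lists are best kept specific.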
Gate 2 — Haiku Relevance Scoring
File: services/intelligence-worker/stages/gate2_relevance.py
Uses Claude Haiku (the cheapest Anthropic model) to score article relevance on a 0–100 scale. Three cost optimisations make this viable at scale: batching, prompt caching, and minimal input.
Three cost optimisations
Batching: 8 articles per API call. Instead of 100 calls for 100 articles, you make 13 calls. The model scores all 8 articles in one response using index-based JSON.
Prompt caching: The system prompt (client profile) is identical across all batches in a run. Anthropic caches it after the first call. ~80% cost saving on subsequent calls.
Minimal input: Only title + first 200 chars of summary are sent to the model. Not the full article, just enough context for relevance scoring.
Each client can use a different Gate 2 model (GPT-4o-mini, Gemini Flash, DeepSeek). The provider abstraction makes this transparent — same interface regardless of provider.
How batching works
import json

BATCH_SIZE = 8   # 8 articles per API call (sweet spot; larger batches confuse indexing)
MIN_SCORE = 60   # articles below this score are dropped

def _score_batch(provider, articles, client_profile, prompt_template):
    # Build the batch text; each article gets an index number.
    # Haiku uses these indexes in its JSON response: {"scores": [{"index": 0, "score": 78, ...}]}
    articles_text = "\n---\n".join([
        f"[{i}] TITLE: {a['title']}\nSUMMARY: {(a.get('summary') or '')[:200]}"
        for i, a in enumerate(articles)
    ])
    # The system prompt contains the CLIENT'S PROFILE, which is the same for all
    # batches in a run. Caching this is the key cost saving.
    system_prompt = prompt_template.format(
        industry_type=client_profile.get("industry_type", "general"),
        target_geo=", ".join(client_profile.get("target_geo") or ["global"]),
        keywords=", ".join(client_profile.get("keywords") or []),
        excluded_topics=", ".join(client_profile.get("excluded_topics") or []),
    )
    # cache_system=True adds cache_control to the system prompt.
    # After the first API call, Anthropic serves the system prompt from cache.
    response = provider.complete(
        system=system_prompt,
        user=f'Score each article 0-100. Return JSON: {{"scores": [{{"index": 0, "score": N, "matched_keywords": []}}]}}\n\n{articles_text}',
        max_tokens=256,
        cache_system=True,  # ← the magic flag
    )
    result = json.loads(response.text)
    # Index the scores by article position; articles the model skipped score 0.
    scores = {s["index"]: s for s in result.get("scores", [])}
    # Filter: keep only articles at or above the MIN_SCORE threshold
    passed = []
    for i, article in enumerate(articles):
        score = scores.get(i, {}).get("score", 0)
        if score >= MIN_SCORE:
            article["relevance_score"] = score
            article["matched_keywords"] = scores[i].get("matched_keywords", [])
            passed.append(article)
    return passed

def run_gate2(provider, articles, client_profile, prompt_template):
    passed = []
    # Iterate in steps of BATCH_SIZE: 0, 8, 16, 24, ...
    for i in range(0, len(articles), BATCH_SIZE):
        batch = articles[i:i + BATCH_SIZE]
        passed.extend(_score_batch(provider, batch, client_profile, prompt_template))
    return passed
Without batching: 100 calls × full prompt = high cost.
With batching: 13 calls × (system prompt cached after call 1) = ~80% saving on 12 of those 13 calls.
Result: Gate 2 costs ~$0.30/month per client — pennies.
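The "100 articles become 13 calls" figure follows directly from the slicing loop. Here is a minimal sketch of just the chunking step, with integers standing in for article dicts; batches is a hypothetical helper name.

```python
import math

BATCH_SIZE = 8


def batches(items: list, size: int = BATCH_SIZE) -> list[list]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


articles = list(range(100))  # placeholder for 100 article dicts
chunks = batches(articles)

print(len(chunks), "API calls")              # same as math.ceil(100 / 8)
print(len(chunks[-1]), "articles in the last batch")
```

The final batch is smaller than the rest because 100 is not a multiple of 8. Slicing past the end of a list is safe in Python, so the loop needs no special case for it.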
Gate 3 — Signal Detection
File: services/intelligence-worker/stages/signal_detection.py
Zero-cost filter that combines two independent signals. Both are free to compute. An article needs a high combined score before we spend $0.015 on Sonnet generation.
Signal 1: Source Spread (our own data)
Counts how many distinct sources covered the same topic in the last 2 hours. Uses PostgreSQL full-text search on our own news_items table — zero external calls.
def _source_spread(conn, title: str, hours: int = 2) -> int:
    # Take up to 5 words longer than 3 characters from the title for full-text matching.
    # Dropping short words removes most stop-words ("the", "and", "for"), which match everything.
    words = [w.strip(".,!?\"'") for w in title.split() if len(w) > 3][:5]
    # OR query: article matches if ANY keyword appears (broad catch)
    tsquery = " | ".join(words)
    with conn.cursor() as cur:
        # make_interval keeps the placeholder outside a quoted SQL literal,
        # so the driver binds `hours` as a normal parameter.
        cur.execute("""
            SELECT COUNT(DISTINCT source_id)
            FROM news_items
            WHERE to_tsvector('english', title) @@ to_tsquery('english', %s)
              AND published_at > NOW() - make_interval(hours => %s)
        """, (tsquery, hours))
        return cur.fetchone()[0] or 1
# Map raw count to a 0–100 score
def _spread_score(count: int) -> int:
    if count >= 5: return 100  # 5+ sources = definitely breaking
    if count >= 3: return 60   # trending across outlets
    if count >= 2: return 40   # gaining traction
    return 20                  # isolated report
Signal 2: Google Trends SEO Opportunity
Queries Google Trends for search interest on the client's matched keywords, geo-filtered to their location. Redis-cached for 24 hours — same keyword pair = one API call.
Combined Score Formula
trend_score = (spread_score * 0.6) + (seo_opportunity * 0.4)
# 60/40 weighting: spread is more reliable (our own data)
# Trends complements with demand-side intent but can be rate-limited
# Urgency detection overrides the score threshold entirely:
# - "breaking" / "emergency" in title → urgency = "breaking" → Gate 3 bypassed
# - 3+ sources covering topic → urgency = "high" → Gate 3 bypassed
If Google Trends is unavailable, seo_opportunity defaults to 50 (neutral). A Trends outage never blocks content generation. This is called "fail-open" design — the default is to continue, not to stop.
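The formula and the fail-open default can be combined into one small function. This is a sketch under stated assumptions rather than the repo's implementation: the name `trend_score`, the `NEUTRAL_SEO` constant, and the use of `None` to signal a Trends outage are all hypothetical.

```python
from typing import Optional

NEUTRAL_SEO = 50  # fail-open default used when Google Trends is unavailable


def spread_score(count: int) -> int:
    """Map a distinct-source count to the 0-100 scale described above."""
    if count >= 5:
        return 100
    if count >= 3:
        return 60
    if count >= 2:
        return 40
    return 20


def trend_score(source_count: int, seo_opportunity: Optional[float] = None) -> float:
    """Combine both signals with the 60/40 weighting.

    Passing None for seo_opportunity models a Trends outage (an assumption
    made for this sketch): the neutral value is used instead of failing.
    """
    seo = NEUTRAL_SEO if seo_opportunity is None else seo_opportunity
    return spread_score(source_count) * 0.6 + seo * 0.4


print(trend_score(5))        # five sources, Trends unavailable
print(trend_score(1, 100))   # isolated report, very high search interest
```

Because the neutral default contributes a fixed 20 points, an outage shifts every score by the same amount and never removes an article on its own.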
Gate 3.5 — ICP-Fit Scoring
File: services/intelligence-worker/stages/icp_fit.py
Gates 1–3 ask "is this story newsworthy?" Gate 3.5 asks a sharper question: is this story worth a full article for this client's specific audience? A story can pass every prior gate — high source spread, strong SEO opportunity, on-topic for the industry — and still be a poor fit for a client whose Ideal Client Profile (ICP) targets, say, first-time homebuyers rather than property investors. Gate 3.5 runs after evidence gathering and before Gate 4, using Claude Haiku against the client's ICP block (the same _build_icp_block() used by Gate 4 and Gate 5).
What it returns
| Field | Meaning |
|---|---|
| audience_relevance | 0.0–1.0: does this matter to the ICP's primary audience? |
| business_relevance | 0.0–1.0: does this connect to the client's business goal? |
| actionability | 0.0–1.0: can the reader actually do something with this? |
| risk_level | low / medium / high; informational only, logged for review |
| recommended_angle | ICP-informed angle suggestion, passed into Gate 4's icp_fit context |
| claims_to_use | Evidence pack claims re-ranked/filtered for this audience; replaces use_these_claims before Gate 4 sees it |
| should_generate | audience_relevance ≥ 0.6 AND actionability ≥ 0.5. If false, this news_item/client pair is dropped before Gate 4 ever runs |
The drop decision
If should_generate is false, the pipeline marks the news item processed for this client and returns — the same outcome as a Gate 4 geo_not_impacted skip, but caught one step earlier and one model tier cheaper ($0.0003 Haiku vs $0.015 Sonnet). This is the same "cheaper gate catches what the more expensive gate would have caught" principle that justifies the whole funnel.
On any exception, Gate 3.5 returns should_generate=True, claims_to_use unchanged, recommended_angle="" — Gate 4 runs exactly as it would have without Gate 3.5. A Gate 3.5 failure never blocks generation; at worst, it costs an extra Sonnet call that Gate 3.5 would otherwise have saved.
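The drop rule and the fail-open behaviour can be sketched together. The thresholds and the default values come from the text above; the function names and the `score_fn` callable (standing in for the Haiku call) are hypothetical.

```python
def should_generate(audience_relevance: float, actionability: float) -> bool:
    """Apply the Gate 3.5 rule: both thresholds must be met."""
    return audience_relevance >= 0.6 and actionability >= 0.5


def icp_fit_or_fail_open(score_fn, claims: list[str]) -> dict:
    """Run score_fn; on any exception return the pass-through default.

    score_fn is a stand-in for the model call (an assumption of this sketch).
    It is expected to return a dict with the fields listed in the table above.
    """
    try:
        result = score_fn()
        return {
            "should_generate": should_generate(
                result["audience_relevance"], result["actionability"]
            ),
            "claims_to_use": result.get("claims_to_use", claims),
            "recommended_angle": result.get("recommended_angle", ""),
        }
    except Exception:
        # Fail open: Gate 4 runs exactly as it would have without Gate 3.5.
        return {"should_generate": True, "claims_to_use": claims, "recommended_angle": ""}


def broken_provider():
    raise TimeoutError("provider unavailable")


print(icp_fit_or_fail_open(broken_provider, ["claim A"]))
```

A malformed response (for example a missing key) takes the same path as a network error, which keeps the failure handling in one place.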
Gate 3.5 reuses gate2_provider — whatever model (Haiku, GPT-4o-mini, DeepSeek) is configured for Gate 2 relevance scoring also runs ICP-fit scoring and Gate 5's cheap_review. One per-client setting, three cheap-tier stages.
Gate 4 — Sonnet Content Generation
File: services/intelligence-worker/stages/gate4_generation.py
The most expensive step and the core product. Claude Sonnet generates 6 content formats in one API call, selecting the best angle from 8 options based on the news type, competitor gaps, and client voice.
What one Gate 4 call produces
| Output | Description |
|---|---|
| blog.title | SEO H1: question format, keyword in first 8 words, year suffix |
| blog.slug | URL slug derived from title |
| blog.meta_description | 150–160 chars for Google SERPs |
| blog.body_markdown | 1,200–3,500 word article with mandatory section structure |
| blog.faq_schema | Exactly 5 Q&A pairs for Google FAQ rich results |
| linkedin_post | Platform-optimised, shorter format |
| selected_angle | Which of the 8 angles Claude chose (stored for analytics) |
Geo-skip logic
Before generating, the model checks if the news is actually relevant to the client's geography. If not, it outputs a skip signal instead of a draft — saving $0.015 and preventing irrelevant content.
# If Claude decides the news is irrelevant to the client's geo:
{"selected_angle": "skip", "reason": "geo_not_impacted"}
# If it decides to generate:
{
"selected_angle": "local_impact",
"blog": {
"title": "Why should Australian mortgage brokers reconsider fixed rates in 2026?",
"slug": "australian-mortgage-brokers-fixed-rates-2026",
"meta_description": "The RBA's latest decision changes the fixed vs variable ...",
"body_markdown": "...(full article 1200-3500 words)...",
"keywords": ["mortgage broker", "fixed rate", "RBA 2026"],
"faq_schema": [{"question": "...", "answer": "..."}, ...]
},
"linkedin_post": "..."
}
Mandatory article structure
Every generated article must contain these sections in order. Claude is explicitly instructed to follow this structure — it's part of the Gate 4 system prompt stored in values.yaml.
sections = [
"Quick Answer (40–60 words, no heading, standalone prose)",
"What You Will Learn (4–6 bullets)",
"What Is [Topic]? (80–120 words)",
"Why Does [Problem] Happen? (100–150 words, 4–6 bullets)",
"At-a-Glance Summary (Markdown table, 5–8 rows)",
"How to [Solve It] (200–300 words, numbered H3 steps)",
"What Happens If You Ignore This? (80–120 words, 3–5 bullets)",
"", # Pexels image placeholder
"Common Mistakes to Avoid (table: Mistake | Why | What to Do Instead)",
"Expert Tips (100–150 words, ≥2 tips with measurable checks)",
"", # second image
"Frequently Asked Questions (exactly 5 FAQs)",
"Key Takeaways (60–80 words, 4–5 bullets)",
"References (3–5 entries as [Title](URL))",
]
Gate 5 — Automated QA Review + Targeted Auto-Revision
Files: services/intelligence-worker/stages/rules_validator.py, services/intelligence-worker/stages/cheap_review.py, services/intelligence-worker/stages/targeted_revision.py
Every Gate 4 draft passes through an automated QA pass before a human ever sees it. Unlike Gates 1–3, Gate 5 doesn't filter volume — it classifies each of the 3–5 daily drafts so admins know which ones are safe to approve quickly and which need a closer look. As of PR3, Gate 5 can also fix a draft itself: a single targeted Sonnet revision pass for drafts that are "almost there" (score 70-84, no critical issues), followed by a second review to confirm the fix worked.
Three-stage pipeline
| Stage | Method | Cost | Runs when |
|---|---|---|---|
| rules_validator | Pure Python | $0.00 | Always: word count, heading structure, FAQ count, required sections present |
| cheap_review (round 1) | Claude Haiku | ~$0.0003 | Always: fabricated stats, eligibility-scoping errors, audience_fit vs. ICP, avoid_these_claims leakage, off-topic drift, brand safety |
| targeted_revision | Claude Sonnet | ~$0.01–0.02 | Only if round 1 scored 70-84 with no critical issues, auto-revision is enabled, and this draft hasn't been auto-revised yet |
| cheap_review (round 2) | Claude Haiku | ~$0.0003 | Only immediately after a successful targeted_revision, to confirm the fix actually cleared the bar |
Classification thresholds (PR3)
# Any "critical" issue (factual_risk, fabrication, off_topic, brand_safety)
# → hard-blocks the draft until an admin overrides with a written reason
if has_critical:
    return "blocked_factual_review", score
# Review itself errored: high-risk industries fail safe to admin review
if review_failed:
    if industry_type in {"mortgage", "finance", "health", "cybersecurity"}:
        return "needs_admin_review", None
    return "review_failed", None
if not passed:
    return "needs_admin_review", score
# Clean bill of health: safe to fast-track
if score is not None and score >= 85 and not has_major:
    return "ready_for_approval", score
# NEW in PR3: "almost there", worth one automated fix attempt
if score is not None and score >= 70:
    return "needs_auto_revision", score
return "needs_admin_review", score
needs_auto_revision is never persisted: _run_gate5 resolves it immediately, in the same pass:
- Auto-revision enabled + not yet attempted → run targeted_revision, then re-run cheap_review (round 2) and re-classify:
  - round 2 = ready_for_approval → final status auto_revised
  - round 2 = blocked_factual_review → stays blocked_factual_review (the fix introduced or exposed a critical issue)
  - anything else → needs_admin_review (no second auto-revision attempt)
- Auto-revision disabled, unconfigured, or targeted_revision itself failed → needs_admin_review
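The resolution rules for needs_auto_revision can be written as a single function. This is an illustrative sketch, not the body of _run_gate5: the function name and both callables are hypothetical, and routing an already-revised draft to needs_admin_review is an assumption based on the "hasn't been auto-revised yet" condition in the stage table.

```python
from typing import Callable, Optional


def resolve_auto_revision(
    enabled: bool,
    already_revised: bool,
    revise: Callable[[], Optional[dict]],
    review_round2: Callable[[dict], str],
) -> str:
    """Return the final review status for a draft classified needs_auto_revision."""
    if not enabled or already_revised:
        return "needs_admin_review"
    revised = revise()  # stands in for targeted_revision; None means it failed
    if revised is None:
        return "needs_admin_review"
    status = review_round2(revised)  # stands in for cheap_review round 2 + classify
    if status == "ready_for_approval":
        return "auto_revised"
    if status == "blocked_factual_review":
        return "blocked_factual_review"
    return "needs_admin_review"  # no second auto-revision attempt


print(resolve_auto_revision(True, False, lambda: {"id": 1}, lambda d: "ready_for_approval"))
```

Every branch returns a status that can be stored, which is why needs_auto_revision itself never has to be persisted.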
targeted_revision — minimal-edit Sonnet pass
Given the specific issues cheap_review flagged (severity, category, location, description, suggested fix), Sonnet rewrites only blog.title, blog.meta_description, blog.body_markdown, and blog.faq — every other field (slug, LinkedIn post, keywords, image prompt, selected angle) is preserved verbatim. The result replaces the draft in place via update_draft, and a second content_reviews row (review_round=2, review_type="auto_fix") records what changed and what it cost.
If targeted_revision throws for any reason — API error, malformed JSON, missing fields — it returns None and the original draft is left completely untouched. The caller then routes the draft to needs_admin_review as if auto-revision had never been attempted. This stage can never lose or corrupt a draft.
ICP-awareness across Gate 5 (PR3)
Both cheap_review and targeted_revision now receive the client's ICP block (_build_icp_block — the same primary/secondary audience, pain points, business goal, and preferred/avoided angles that inform Gate 4 and Gate 3.5). This adds a new audience_fit issue category to cheap_review: a draft can score well on facts and structure but still miss the mark if it's written for the wrong reader. When audience_fit issues are flagged, targeted_revision is explicitly instructed to re-pitch the affected sections at the ICP's actual audience. audience_fit is always major or minor — never critical, so it can trigger auto-revision but never a hard block on its own.
Admin-curated factual guardrails (PR3)
The new content_guardrails table lets an admin promote a recurring Gate 5 finding into a standing rule — e.g. "Do not describe the First Home Guarantee as available to all buyers; it is means-tested and place-restricted." Rules are scoped by nullable client_id/industry_type (NULL = applies regardless of that dimension) and are injected into every future Gate 4 prompt for the matching client/industry as {factual_guardrails_block} — closing the loop from "Gate 5 caught this once" to "Gate 4 never makes this mistake again."
Hard-block + admin override
A draft with review_status = blocked_factual_review cannot be approved normally — the Approve button is replaced with an "Approve anyway" flow that requires the admin to pick a rejection reason or write an override note before the draft can publish. This keeps factually risky drafts out of the approval queue by default while still leaving a human the final say.
Admin controls
| Control | Where | Effect |
|---|---|---|
| GATE5_ENABLED | Helm env var | Kills the entire Gate 5 pass. Restart required. |
| gate5_auto_revision_enabled | platform_settings table, toggle on /admin/quality-review | Kills only the targeted_revision sub-step; rules_validator + cheap_review still run. No restart needed. |
| auto_revision_count badge | content_drafts column, shown on the draft edit page | Tells the admin this draft was auto-fixed and re-reviewed; both review rounds are visible in the Gate 5 review history. |
If the Haiku review call fails for any reason, Gate 5 doesn't block the pipeline — the draft still reaches the approval queue. For high-risk verticals (mortgage, finance, health, cybersecurity) a failed review routes to needs_admin_review instead of being silently waved through. The same fail-open principle applies to targeted_revision: any failure leaves the original draft untouched.
Gate 5 (all three stages) also runs on drafts produced by cluster synthesis — when a follow-up story enriches an existing draft (update_draft) or generates a linked update article, that output goes through the exact same rules_validator → cheap_review → targeted_revision → cheap_review pipeline before reaching the approval queue.
Ingestion Worker
File: services/ingestion-worker/main.py
Runs continuously as a Kubernetes Deployment (not a CronJob — it needs sub-minute responsiveness). Every 2 minutes it polls all active RSS sources for all clients and runs Gate 1.
The poll loop
async def poll_loop():
    while True:
        # Fetch all active, non-quarantined sources from Postgres
        sources = get_active_sources(conn)
        for source in sources:
            articles = fetch_rss(source["feed_url"])  # parse RSS/Atom feed
            for article in articles:
                # sha256 hashes bytes, so the normalized URL is encoded first
                url_hash = hashlib.sha256(normalize_url(article["url"]).encode()).hexdigest()
                # Deduplication: skip if we've already processed this URL
                if url_hash in seen_hashes:
                    continue
                # Gate 1: zero-cost rules filter (per-client)
                for client in source["clients"]:
                    passes, reason = gate1.check(article, client["keywords"], client["excluded"])
                    if passes:
                        # Publish to Redis Stream for intelligence-worker to consume
                        redis.xadd("news.filtered", {
                            "article_id": article_id,
                            "client_id": client["id"],
                            "reason": reason,
                        })
        await asyncio.sleep(POLL_INTERVAL_SECONDS)  # default: 120s
Source quarantine system
Every RSS source is tracked for consecutive failures. After 3 failures, it's quarantined with exponential backoff. The system automatically tries to find a replacement feed.
| Quarantine # | Duration | Recovery |
|---|---|---|
| 1st time | 6 hours | Auto-retry after expiry |
| 2nd time | 12 hours | Auto-retry after expiry |
| 3rd time | 24 hours | Auto-retry after expiry |
| 4th time | 48 hours | Auto-retry after expiry |
| 5th time | 96 hours | Auto-retry after expiry |
| 6th+ | 168 hours (7 days) | Manual restore from admin |
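The durations in the table double each time and are capped at seven days. Here is a minimal sketch of that schedule. The function names and constants are hypothetical, and the doubling rule is inferred from the table rather than copied from the repo.

```python
BASE_HOURS = 6    # duration of the first quarantine
MAX_HOURS = 168   # cap: 7 days


def quarantine_hours(quarantine_number: int) -> int:
    """Return the quarantine duration for the Nth quarantine (1-based)."""
    if quarantine_number < 1:
        raise ValueError("quarantine_number starts at 1")
    return min(BASE_HOURS * 2 ** (quarantine_number - 1), MAX_HOURS)


def needs_manual_restore(quarantine_number: int) -> bool:
    """From the 6th quarantine on, an admin has to restore the source."""
    return quarantine_number >= 6


for n in range(1, 8):
    print(n, quarantine_hours(n), needs_manual_restore(n))
```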
3-tier replacement feed discovery
When a source is quarantined, the system immediately searches for a replacement — no admin intervention required.
Tier 1: Scrapes the dead source's homepage for <link rel="alternate"> RSS tags. Also probes common paths: /feed, /rss, /rss.xml, /atom.xml.
Tier 2: Searches news.google.com/rss/search?q={source_name}+{industry}. Extracts publisher domains from results, probes top 8 for native RSS feeds. No API key needed.
Tier 3: Falls back to the default_sources DB table, which holds curated industry sources not already assigned to this client. Always available, always a working feed.
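The three strategies are tried in order, and the first one that yields a feed wins. Here is a small sketch of that fallthrough. Each tier is a hypothetical zero-argument callable, and treating an exception in one tier as "try the next one" is an assumption of this sketch.

```python
from typing import Callable, Iterable, Optional


def discover_replacement(tiers: Iterable[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each discovery tier in order; return the first feed URL found."""
    for tier in tiers:
        try:
            feed_url = tier()
        except Exception:
            continue  # assumption: a failing tier falls through to the next
        if feed_url:
            return feed_url
    return None


def homepage_scrape() -> Optional[str]:
    return None  # placeholder: no <link rel="alternate"> tag found


def news_search() -> Optional[str]:
    raise ConnectionError("search unavailable")  # placeholder failure


def default_sources() -> Optional[str]:
    return "https://example.com/feed"  # placeholder curated feed


print(discover_replacement([homepage_scrape, news_search, default_sources]))
```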
URL deduplication
def normalize_url(url: str) -> str:
"""Strip UTM/tracking params so the same article isn't processed twice
if it appears with different tracking params in different RSS feeds."""
from urllib.parse import urlparse, urlencode, parse_qsl
parsed = urlparse(url)
# Keep only non-tracking query params (strip utm_*, fbclid, etc.)
clean_params = [(k, v) for k, v in parse_qsl(parsed.query)
if not k.startswith(("utm_", "fbclid", "gclid", "ref"))]
return parsed._replace(query=urlencode(clean_params)).geturl()
# URL hash is stored in news_items table — SHA256 of the normalized URL
url_hash = hashlib.sha256(normalize_url(article["url"]).encode()).hexdigest()
Intelligence Worker
File: services/intelligence-worker/main.py
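The brain of the platform. Consumes the news.filtered Redis Stream and orchestrates the full Gates 2–4 pipeline for each article. Uses Redis consumer groups so each message is delivered to only one worker in the group and stays pending until it is acknowledged, so a crash mid-batch does not lose it. Delivery is at-least-once rather than exactly-once: a message that was processed but never acknowledged can be handled a second time after a restart.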
Pipeline orchestration
async def process_message(msg, client_id, article):
client = get_client_profile(client_id)
# 1. Gate 2 — Haiku relevance scoring (batched, prompt-cached)
relevant = run_gate2(provider, [article], client, PROMPT_RELEVANCE)
if not relevant:
ack(msg); return
# 2. Gate 3 — Signal detection (spread + Google Trends)
signal = detect_signal(conn, article["title"], client["target_geo"])
if signal.trend_score < GATE3_MIN_TREND_SCORE and signal.urgency == "normal":
ack(msg); return
# 3. Competitor analysis — what angles have competitors taken?
comp = analyze_competitors(conn, article, client)
# comp.avoid_angles = ["local_impact", "action_list"]
# comp.trend_score_boost = 15 (first-mover bonus)
# 4. Evidence pipeline
enrichment = quick_enrich(article, client["keywords"]) # Tier 1: Serper
evidence = gather_evidence(article, enrichment, llm_haiku) # Tier 2: deep pack
# 5. Topic clustering — find related published articles for internal links
cluster = get_cluster_links(conn, article, client_id)
# 6. Gate 4 — Sonnet generation
draft = run_gate4(
provider=llm_sonnet,
article=article,
client=client,
comp_analysis=comp,
evidence_pack=evidence,
cluster_links=cluster,
)
if draft.get("selected_angle") == "skip":
ack(msg); return # geo_not_impacted — skip silently
# 7. Save to Postgres, publish to content.drafts stream
save_draft(conn, draft, client_id)
redis.xadd("content.drafts", {"draft_id": draft["id"], "client_id": client_id})
ack(msg) # ← critical: only ack AFTER successful save
With consumer groups, Redis tracks which messages have been acknowledged (ACK'd). If the worker crashes between processing and ACK'ing, the message stays pending in Redis and can be read again once the worker restarts. No message is ever permanently lost, so the pipeline is crash-safe.
Draft limits per plan
Before Gate 4, the worker checks the client's daily and weekly draft limits (from PLAN_LIMITS_JSON in the ConfigMap). This prevents the pipeline from generating more content than the client can review.
Approval Service
File: services/approval-service/main.py
FastAPI web application that serves the client dashboard, admin panel, and email approval workflow. Clients never see raw AI output — everything goes through human approval first.
The approval workflow
Daily digest email
A CronJob triggers /send-digest each morning. For each client with pending drafts, an email is sent with approve/reject/edit links for each draft.
HMAC-signed links (no login required)
Each approve/reject/edit link contains an HMAC-SHA256 token. Clients can approve content from their email inbox without logging in. Links expire after 7 days.
Optional editing
The edit link opens a tabbed editor: blog post (with character counters), LinkedIn post. Clients can tweak the AI output before publishing.
Publishes to Redis Stream
On approval, the service writes to content.approved. Publisher-worker picks this up and distributes to WordPress, Dev.to, etc.
HMAC token format
REVIEW_SECRET = os.environ["REVIEW_SECRET"] # from K8s Secret
def create_review_token(draft_id: str, action: str) -> str:
"""Create a signed URL token. Format: {signature}:{action}:{expiry}"""
expiry = int(time.time()) + 7 * 24 * 3600 # 7 days from now
payload = f"{draft_id}:{action}:{expiry}"
sig = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
return f"{sig}:{action}:{expiry}"
def verify_review_token(token: str, draft_id: str) -> tuple[bool, str]:
"""Verify a token from an email link. Returns (valid, action)."""
try:
sig, action, expiry = token.split(":")
if int(expiry) < time.time():
return False, "" # expired
payload = f"{draft_id}:{action}:{expiry}"
expected = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
# Constant-time comparison — prevents timing attacks
if not hmac.compare_digest(sig, expected):
return False, "" # tampered
return True, action
except Exception:
return False, ""
Publisher Worker
File: services/publisher-worker/main.py
Consumes content.approved and distributes to all publishing platforms the client has configured. WordPress always publishes first — its URL becomes the canonical URL for all subsequent platforms.
Retry logic
MAX_RETRIES = 3
RETRY_DELAYS = [5, 15, 30] # seconds — exponential-ish backoff
for attempt in range(MAX_RETRIES):
try:
result = publisher.publish(draft, config)
# Record success in publications table
record_publication(conn, draft_id, platform, "published", result["url"])
break
except Exception as e:
if attempt == MAX_RETRIES - 1:
# All retries exhausted — send to dead-letter queue
redis.xadd("content.failed", {"draft_id": draft_id, "error": str(e)})
record_publication(conn, draft_id, platform, "failed", error=str(e))
else:
time.sleep(RETRY_DELAYS[attempt])
WordPress publisher — Markdown to HTML
def publish(self, draft: dict, config: dict) -> dict:
blog = draft["blog"]
body = blog["body_markdown"]
# Strip the FAQ section from body — it's added separately as structured HTML
# to prevent duplicates (once inline, once as schema markup at the bottom).
body_without_faq = strip_faq_section(body)
# Convert Markdown to HTML (using markdown library)
html_body = markdown.markdown(
body_without_faq,
extensions=["fenced_code", "tables", "nl2br"]
)
# Append FAQ as structured HTML (better for Google FAQ rich results)
if blog.get("faq_schema"):
html_body += build_faq_html(blog["faq_schema"])
# WordPress REST API call
response = requests.post(
f"{config['site_url']}/wp-json/wp/v2/posts",
auth=(config["username"], config["app_password"]), # Application Passwords
json={
"title": blog["title"],
"slug": blog["slug"],
"content": html_body,
"excerpt": blog.get("meta_description", ""),
"status": "publish",
"categories": resolve_categories(config),
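},
timeout=30,  # don't let an unresponsive WordPress site stall the publisher
)
response.raise_for_status()  # raise on 4xx/5xx so the retry loop sees a clear error
return {"url": response.json()["link"]}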
Redis Streams — The Message Bus
Redis Streams are the backbone of inter-service communication. They're more than a pub/sub queue — they're a durable, ordered, consumer-group-aware log of events.
Why Streams instead of HTTP?
| Property | Direct HTTP calls | Redis Streams |
|---|---|---|
| Crash safety | ❌ Request lost if receiver is down | ✅ Message waits until consumer is ready |
| At-least-once delivery | ❌ Manual retry logic needed | ✅ Built-in — unacked messages re-delivered |
| Decoupling | ❌ Sender must know receiver's address | ✅ Services only know the stream name |
| Backpressure | ❌ Fast sender overwhelms slow receiver | ✅ Slow consumer naturally applies backpressure |
| Audit trail | ❌ No built-in history | ✅ Stream is an ordered log (inspectable) |
Consumer groups explained
# Create consumer group (run once at startup)
redis.xgroup_create("news.filtered", "intelligence-workers", id="0", mkstream=True)
# Read NEW messages (> means "messages after my last position")
messages = redis.xreadgroup(
groupname="intelligence-workers",
consumername="worker-pod-1",
streams={"news.filtered": ">"},
count=10,
block=5000, # block for up to 5 seconds waiting for new messages
)
# Process each message...
for stream_name, msg_list in (messages or []):
    for msg_id, fields in msg_list:
        try:
            process(fields)
            # ACK only AFTER successful processing
            # If this line is never reached (crash), the message stays pending and can be re-read
            redis.xack("news.filtered", "intelligence-workers", msg_id)
        except Exception as e:
            # Don't ACK on failure: the message stays in the pending list for a later retry
            log.error("Processing failed: %s", e)
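When a message is delivered but not yet ACK'd, Redis holds it in the Pending Entries List (PEL). If the worker crashes, these messages stay in the PEL. A read with ">" only returns messages that were never delivered, so after a restart the worker has to re-read its own pending entries (XREADGROUP with ID 0 instead of ">") or claim another consumer's entries with XAUTOCLAIM. This is how the pipeline survives pod crashes with zero data loss.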
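Recovering pending entries on startup could look like the sketch below. It assumes a redis-py style client whose xreadgroup and xack methods take the same arguments as in the example above. `drain_pending` and `handler` are hypothetical names, and this is not code from the repo.

```python
def drain_pending(client, stream: str, group: str, consumer: str,
                  handler, count: int = 10) -> int:
    """Re-process this consumer's unacknowledged entries; return how many were ACK'd."""
    acked = 0
    start_id = "0"  # "0" = this consumer's pending entries; ">" would mean new messages only
    while True:
        response = client.xreadgroup(
            groupname=group,
            consumername=consumer,
            streams={stream: start_id},
            count=count,
        )
        entries = response[0][1] if response else []
        if not entries:
            return acked  # nothing pending after start_id: safe to switch to ">"
        for msg_id, fields in entries:
            try:
                handler(fields)
                client.xack(stream, group, msg_id)
                acked += 1
            except Exception:
                pass  # leave the entry pending for a later attempt
            # Advance past this entry so a repeatedly failing message
            # cannot keep the loop re-reading it forever.
            start_id = msg_id
```

Calling this once before entering the normal ">" read loop gives the worker a chance to finish whatever it was handling when it went down. Entries owned by a consumer name that never comes back would still need XAUTOCLAIM.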
Inspecting streams from kubectl
# How many messages are waiting in the pipeline?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen news.filtered
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.drafts
# How many messages are in the dead-letter queue?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.failed
# Inspect last 5 messages in a stream
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xrevrange news.filtered + - COUNT 5
Multi-Provider AI Architecture
File: services/intelligence-worker/providers/
Every AI call goes through a provider abstraction layer. The rest of the codebase calls provider.complete(system, user) — it never knows or cares which underlying model or API it's using.
The provider interface
from abc import ABC, abstractmethod
from dataclasses import dataclass
@dataclass
class LLMResponse:
text: str
input_tokens: int
output_tokens: int
cache_read_tokens: int = 0 # Anthropic prompt caching
cache_write_tokens: int = 0
class LLMProvider(ABC):
@property
@abstractmethod
def model_id(self) -> str: ...
@abstractmethod
def complete(
self,
system: str,
user: str,
max_tokens: int = 512,
cache_system: bool = False, # Anthropic-specific, ignored by others
) -> LLMResponse: ...
Provider factory — model name → provider instance
def get_provider(model: str) -> LLMProvider:
"""Return the correct provider instance based on model name prefix."""
api_key_map = {
"claude-": ("ANTHROPIC_API_KEY", AnthropicProvider),
"gpt-": ("OPENAI_API_KEY", OpenAIProvider),
"o1-": ("OPENAI_API_KEY", OpenAIProvider),
"o3-": ("OPENAI_API_KEY", OpenAIProvider),
"gemini-": ("GOOGLE_API_KEY", GoogleProvider),
"deepseek-": ("DEEPSEEK_API_KEY", DeepSeekProvider),
}
for prefix, (env_var, ProviderClass) in api_key_map.items():
if model.startswith(prefix):
api_key = os.environ.get(env_var)
if not api_key:
raise EnvironmentError(f"{env_var} not set for model {model!r}")
return ProviderClass(api_key=api_key, model=model)
raise ValueError(f"Unknown model: {model!r}")
Two layers of model selection — both editable with no restart
Every Gate that calls an LLM resolves its provider through the same two-layer fallback. Layer 1 is a per-client override (clients.gate2_model / clients.gate4_model, set from each client's "LLM models" tab). Layer 2 is a platform-wide default, stored in platform_settings (default_gate2_model / default_gate4_model) and editable from /admin/quality-review — this replaces what used to be a Helm-only, restart-required setting (RELEVANCE_MODEL / GENERATION_MODEL), which now only acts as the last-resort fallback if no platform_settings row exists yet.
def _resolve_default_model(conn, setting_key, env_default) -> str:
"""platform_settings override of the Helm env var — no restart needed."""
with conn.cursor() as cur:
cur.execute("SELECT value FROM platform_settings WHERE key = %s", (setting_key,))
row = cur.fetchone()
return row[0] if row else env_default
# Layer 2: admin-edited platform default, falling back to the Helm env var
gate2_default = _resolve_default_model(conn, "default_gate2_model", GATE2_MODEL)
# Layer 1: per-client override wins if set
try:
gate2_provider = get_provider(client.get("gate2_model") or gate2_default)
except EnvironmentError as exc:
# configured provider's API key missing — fall back to the Helm default
log.warning("Gate2 provider key missing (%s) — falling back to %s", exc, GATE2_MODEL)
gate2_provider = get_provider(GATE2_MODEL)
One setting, multiple gates
| Setting | Gates it controls |
|---|---|
| gate2_model / default_gate2_model (cheap tier, Haiku by default) | Gate 2 relevance scoring, Gate 3.5 ICP-fit, Gate 5 cheap_review (both rounds) |
| gate4_model / default_gate4_model (quality tier, Sonnet by default) | Gate 4 generation, Gate 5 targeted_revision |
| Provider | Models | Env var required |
|---|---|---|
| Anthropic (default) | claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7 | ANTHROPIC_API_KEY |
| OpenAI | gpt-4o, gpt-4o-mini, o1-*, o3-* | OPENAI_API_KEY |
| Google | gemini-2.5-pro, etc. | GOOGLE_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | DEEPSEEK_API_KEY |
On /admin/quality-review, the "LLM model defaults" card lets an admin change the platform-wide Gate 2/3.5/5-review and Gate 4/5-revision models in two dropdowns. Changes take effect on the next pipeline run, with no pod restart. The per-client "LLM models" tab (on each client's profile) overrides these per-client, e.g. routing one budget client through deepseek-chat while everyone else uses Anthropic.
Prompt Caching — 80% Cost Saving
Anthropic's prompt caching lets you mark a portion of the prompt as "cache this". On subsequent API calls with the same cached prefix, Anthropic serves it from cache at ~10% of the normal input token cost.
How the AnthropicProvider implements it
def complete(self, system: str, user: str, max_tokens: int = 512,
cache_system: bool = False) -> LLMResponse:
# Without caching: system is just a string
# With caching: wrap it in a block with cache_control
if cache_system:
system_block = [{
"type": "text",
"text": system,
"cache_control": {"type": "ephemeral"}, # ← this is the magic
}]
else:
system_block = system # plain string — no caching
response = self._client.messages.create(
model=self._model,
max_tokens=max_tokens,
system=system_block,
messages=[{"role": "user", "content": user}],
)
return LLMResponse(
text=response.content[0].text,
input_tokens=response.usage.input_tokens,
output_tokens=response.usage.output_tokens,
# These fields tell you how the caching is performing:
cache_read_tokens = getattr(response.usage, "cache_read_input_tokens", 0) or 0,
cache_write_tokens = getattr(response.usage, "cache_creation_input_tokens", 0) or 0,
)
Why it saves ~80% for Gate 2
# Batch 1 — system prompt is WRITTEN to cache (full price for system tokens)
# cache_write_tokens = 800 (the client profile system prompt)
# input_tokens = 800 + 1200 (system + 8 article titles)
# Batch 2–13 — system prompt is READ from cache (10% of normal price)
# cache_read_tokens = 800 (same system prompt, served from cache)
# input_tokens = 1200 (only the 8 article titles — system not billed at full rate)
# Net saving: 12 batches × 800 tokens × 90% discount = 8,640 tokens saved
# At $0.00025/1K input tokens (Haiku): saves ~$0.002/run/client
# Across 365 days: ~$0.73/year/client from caching alone
Caching only pays off when the system prompt is identical across multiple calls made while the cache entry is still live (ephemeral entries last five minutes by default and are refreshed each time they are read). Gate 2 qualifies perfectly: the same client profile is repeated across 13 batches. Gate 4 doesn't cache its system prompt because it varies per article (different evidence pack, different competitor context).
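The saving estimate above can be reproduced in a few lines. All inputs are the figures quoted in the comments (800 system tokens, 13 batches, a 90% discount on cached reads, $0.00025 per 1K input tokens). One run per day is an assumption implied by the "365 days" figure.

```python
SYSTEM_TOKENS = 800          # client-profile system prompt
BATCHES = 13                 # Gate 2 calls per run
CACHE_READ_DISCOUNT = 0.90   # cached reads are billed at ~10% of the normal rate
PRICE_PER_1K = 0.00025       # input price per 1K tokens quoted above
RUNS_PER_YEAR = 365          # assumption: one run per day

cached_batches = BATCHES - 1  # the first batch writes the cache, the rest read it
tokens_saved = cached_batches * SYSTEM_TOKENS * CACHE_READ_DISCOUNT
saving_per_run = tokens_saved / 1000 * PRICE_PER_1K
saving_per_year = saving_per_run * RUNS_PER_YEAR

print(round(tokens_saved))         # full-price token equivalents avoided per run
print(round(saving_per_run, 5))    # dollars per run per client
print(round(saving_per_year, 2))   # dollars per year per client
```

Without intermediate rounding the yearly figure comes out slightly higher than $0.73, because that number was derived from the rounded $0.002 per run.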
Auth & Security
File: services/approval-service/auth.py
The auth system has zero external dependencies — no PyJWT, no authlib. Everything is implemented with Python's standard library. This keeps the container image lean and eliminates supply-chain risk from auth libraries.
JWT implementation from scratch
# JWT format: base64url(header) . base64url(payload) . base64url(signature)
def _b64(data: bytes) -> str:
# URL-safe base64 with "=" padding stripped (JWT spec requires unpadded)
return urlsafe_b64encode(data).rstrip(b"=").decode()
def create_jwt(client_id: str, email: str) -> str:
header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = _b64(json.dumps({
"client_id": client_id,
"email": email,
"exp": int(time.time()) + JWT_EXPIRY_MINS * 60,
"iat": int(time.time()),
}).encode())
signing_input = f"{header}.{payload}"
sig = _b64(hmac.new(
JWT_SECRET.encode(),
signing_input.encode(),
hashlib.sha256,
).digest())
return f"{signing_input}.{sig}"
def decode_jwt(token: str) -> Optional[dict]:
try:
header, payload, sig = token.split(".")
signing_input = f"{header}.{payload}"
expected = _b64(hmac.new(JWT_SECRET.encode(), signing_input.encode(), hashlib.sha256).digest())
# Constant-time comparison — prevents timing side-channel attacks
if not hmac.compare_digest(sig, expected):
return None # tampered token
data = json.loads(_unb64(payload))
if data.get("exp", 0) < time.time():
return None # expired
return data
except Exception:
return None # never raises — bad tokens always return None
Password hashing
def hash_password(password: str) -> str:
    """Hash a password for storage. Format: {hex_salt}:{hex_hash}"""
    salt = secrets.token_hex(16)  # 16 bytes of cryptographic randomness
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return f"{salt}:{h.hex()}"

def _verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return hmac.compare_digest(h.hex(), hex_hash)  # constant-time
# 260,000 iterations: ~100ms on a modern CPU.
# This means an attacker can only try ~10 passwords/second per core.
# bcrypt would also work — PBKDF2 is chosen because it's in Python stdlib (no dependency).
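The ~100ms figure depends on the hardware the service runs on. The following is a small sketch for measuring it locally; pbkdf2_ms is a hypothetical helper and the password and salt are throwaway values.

```python
import hashlib
import time


def pbkdf2_ms(iterations: int, rounds: int = 3) -> float:
    """Return the best-of-N wall-clock time, in milliseconds, for one PBKDF2-SHA256 hash."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"example-password", b"example-salt", iterations)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


if __name__ == "__main__":
    ms = pbkdf2_ms(260_000)
    print(f"260,000 iterations: {ms:.0f} ms per hash, "
          f"about {1000 / ms:.0f} guesses/second per core")
```

If the measured time is far below 100ms on production hardware, the per-core guess rate available to an attacker rises by the same factor.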
Timing attack prevention
_DUMMY_HASH = "dummy:000000000000000000000000000000000000000000000000000000000000000"
def _dummy_hash_check(password: str) -> None:
    """Run a full PBKDF2 computation even for non-existent users.

    Without this: login for unknown@email.com returns in 1ms (DB miss).
                  login for real@email.com returns in 100ms (hash computed).
    An attacker measures the difference to discover which emails are registered.
    With this: both paths take ~100ms regardless. Side channel eliminated.
    """
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", 260_000)
    # Result is discarded; we only run this for its timing effect
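The following sketch shows how a login check might combine the real verification with the dummy one so that both branches cost one PBKDF2 run. The authenticate function and the in-memory users dict are hypothetical stand-ins for the repo's route and database lookup.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 260_000  # same iteration count as the hashing code above


def hash_password(password: str) -> str:
    salt = secrets.token_hex(16)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return f"{salt}:{h.hex()}"


def verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return hmac.compare_digest(h.hex(), hex_hash)


def dummy_hash_check(password: str) -> None:
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", ITERATIONS)


def authenticate(users: dict, email: str, password: str) -> bool:
    """Hypothetical login check; `users` maps email -> stored '{salt}:{hash}' string."""
    stored = users.get(email)
    if stored is None:
        dummy_hash_check(password)  # burn the same CPU time as a real verification
        return False
    return verify_password(password, stored)
```

Both the unknown-email branch and the known-email branch run exactly one PBKDF2 computation, so response time no longer reveals whether the address is registered.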
Cookie security
response.set_cookie(
key = "ci_session",
value = token,
httponly = True, # JS cannot read this cookie — protects against XSS token theft
secure = True, # browser only sends it over HTTPS — prevents network sniffing
samesite = "lax", # withheld on cross-site POSTs and subrequests, still sent on top-level GET navigations: CSRF mitigation
max_age = JWT_EXPIRY_MINS * 60,
)
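To see what these flags look like on the wire, here is a sketch that builds the equivalent Set-Cookie value with Python's standard-library http.cookies module. It is independent of whichever web framework the service uses; session_cookie_header is an illustrative name.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token: str, max_age_seconds: int) -> str:
    """Return the value of a Set-Cookie header for the session token."""
    cookie = SimpleCookie()
    cookie["ci_session"] = token
    morsel = cookie["ci_session"]
    morsel["httponly"] = True   # not readable from JavaScript
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # withheld on cross-site POSTs and subrequests
    morsel["max-age"] = max_age_seconds
    morsel["path"] = "/"
    return morsel.OutputString()


if __name__ == "__main__":
    print(session_cookie_header("header.payload.signature", 3600))
```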
Security summary
| Threat | Mitigation |
|---|---|
| XSS token theft | httponly=True on JWT cookie |
| Network sniffing | secure=True — HTTPS only |
| CSRF | samesite="lax" + form tokens |
| Timing attacks (login) | Dummy PBKDF2 hash for missing users |
| Timing attacks (comparison) | hmac.compare_digest everywhere |
| Password cracking | PBKDF2-SHA256, 260k iterations, per-user salt |
| Forged approval links | HMAC-SHA256 signed, 7-day expiry |
| Cross-tenant data leak | client_id from JWT only — never from request body |
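The table lists HMAC-signed approval links with a 7-day expiry, but the signing code is not shown in this section. The following is a sketch of one way such a link token could work. The token layout, LINK_SECRET, sign_approval and verify_approval are all assumptions for illustration, not the repo's implementation.

```python
import hashlib
import hmac
import time
from typing import Optional

LINK_SECRET = b"demo-link-secret"      # placeholder; a real service would load this from config
LINK_TTL_SECONDS = 7 * 24 * 60 * 60    # 7-day expiry, as stated in the table


def sign_approval(draft_id: str, now: Optional[float] = None) -> str:
    """Return '<draft_id>.<expires>.<hex signature>'."""
    issued = time.time() if now is None else now
    expires = int(issued + LINK_TTL_SECONDS)
    message = f"{draft_id}.{expires}"
    sig = hmac.new(LINK_SECRET, message.encode(), hashlib.sha256).hexdigest()
    return f"{message}.{sig}"


def verify_approval(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the draft id if the token is authentic and unexpired, else None."""
    try:
        draft_id, expires, sig = token.rsplit(".", 2)
        expected = hmac.new(
            LINK_SECRET, f"{draft_id}.{expires}".encode(), hashlib.sha256
        ).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return None  # forged or altered link
        current = time.time() if now is None else now
        if int(expires) < current:
            return None  # link has expired
        return draft_id
    except Exception:
        return None
```

Because the expiry timestamp is part of the signed message, an attacker cannot extend a link's lifetime without invalidating the signature.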
Multi-tenancy — Tenant Isolation
Every table with client data has a client_id UUID column. The critical rule: client_id is sourced from the signed JWT only. It is never trusted from the request body.
The middleware pattern
# Every protected route extracts client_id from the JWT:
@router.get("/dashboard")
def dashboard(request: Request):
    client = require_client(request)  # decodes JWT, returns payload dict
    client_id = client["client_id"]   # taken from the verified JWT payload, not request params
    # All DB queries are scoped to this client_id
    drafts = get_drafts(conn, client_id)  # SELECT ... WHERE client_id = %s
    return render("dashboard.html", drafts)
# What an attacker CANNOT do:
# GET /dashboard?client_id= → client_id from URL is IGNORED
# POST /approve with body {"client_id": "..."} → body client_id is IGNORED
# The client_id is read exclusively from the signed cookie/Bearer JWT.
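The body of require_client is not shown here. The following is a framework-free sketch of the behaviour described above, assuming the token arrives either in the ci_session cookie or in an Authorization: Bearer header with a lower-cased header name. The decoder is passed in as a parameter so the sketch stays self-contained; NotAuthenticated and extract_token are hypothetical names.

```python
from typing import Callable, Mapping, Optional


class NotAuthenticated(Exception):
    """Raised when no valid session token is present."""


def extract_token(cookies: Mapping[str, str], headers: Mapping[str, str]) -> Optional[str]:
    # Prefer the session cookie, then fall back to a Bearer token.
    token = cookies.get("ci_session")
    if token:
        return token
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[7:].strip()
    return None


def require_client(
    cookies: Mapping[str, str],
    headers: Mapping[str, str],
    decode: Callable[[str], Optional[dict]],
) -> dict:
    """Return the verified JWT payload or raise; never reads query or body params."""
    token = extract_token(cookies, headers)
    payload = decode(token) if token else None
    if payload is None or "client_id" not in payload:
        raise NotAuthenticated("missing or invalid session")
    return payload
```

Nothing in this function accepts a client_id argument, which is the point: the only way to obtain one is through a token that passes signature verification.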
Database schema (tenant isolation)
-- One row per tenant
CREATE TABLE clients (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
business_name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
industry_type TEXT,
target_geo JSONB,
keywords JSONB,
active BOOLEAN DEFAULT TRUE
);
-- All tenant-scoped tables have client_id FK
CREATE TABLE content_drafts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
title TEXT,
body_markdown TEXT,
status TEXT DEFAULT 'pending', -- pending / approved / rejected / published
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- news_items is SHARED across all clients (Gate 1 runs once per article)
-- client_relevance maps articles to clients (Gate 2 runs per-client)
CREATE TABLE client_relevance (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
news_item_id UUID NOT NULL REFERENCES news_items(id),
relevance_score INT,
processed BOOLEAN DEFAULT FALSE
);
One bug that trusts client_id from the request body instead of the JWT could leak every tenant's data to any other tenant. The enforcement is at the application layer — there's no database-level row security (yet). Every developer must follow the pattern: always get client_id from the decoded JWT.
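Because enforcement lives in application code, an isolation check is worth automating. The sketch below uses an in-memory SQLite table as a stand-in for the Postgres schema above, so the placeholder is ? rather than %s. get_drafts mirrors the call in the dashboard route, but its body here is an assumption, and make_demo_db is purely illustrative.

```python
import sqlite3


def get_drafts(conn: sqlite3.Connection, client_id: str) -> list:
    # client_id is always a bound parameter taken from the decoded JWT.
    return conn.execute(
        "SELECT id, title FROM content_drafts WHERE client_id = ? ORDER BY id",
        (client_id,),
    ).fetchall()


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content_drafts ("
        "id INTEGER PRIMARY KEY, client_id TEXT NOT NULL, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO content_drafts (client_id, title) VALUES (?, ?)",
        [("tenant-a", "A1"), ("tenant-a", "A2"), ("tenant-b", "B1")],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(get_drafts(db, "tenant-a"))  # only tenant-a rows
    print(get_drafts(db, "tenant-b"))  # only tenant-b rows
```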
Evidence Pipeline
Before Gate 4 runs, the platform searches for real supporting evidence so Claude can cite actual sources and verified figures rather than vague, hedged claims.
Tier 1 — Quick Serper enrichment
A short Serper API search on {article title} + {top 3 client keywords} returns up to 5 snippets, which are injected into Gate 4 as lightweight context. The function returns [] if SERPER_API_KEY is not set.
import os
import requests

def quick_enrich(article: dict, keywords: list[str]) -> list[dict]:
    """Fetch 5 Serper snippets for evidence context. Degrades gracefully."""
    api_key = os.environ.get("SERPER_API_KEY")
    if not api_key:
        return []  # feature disabled: Gate 4 runs without enrichment
    query = f"{article['title']} {' '.join(keywords[:3])}"
    try:
        resp = requests.post(
            "https://google.serper.dev/search",
            headers={"X-API-KEY": api_key},
            json={"q": query, "num": 5},
            timeout=10,
        )
        results = resp.json().get("organic", [])
        return [{"title": r["title"], "snippet": r["snippet"],
                 "url": r["link"], "source": r.get("displayLink")}
                for r in results]
    except Exception:
        return []  # any error → degrade gracefully, never block Gate 4
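In quick_enrich, a single result without a "snippet" key raises KeyError inside the list comprehension, so the whole batch degrades to []. The following sketch splits the query building and response mapping into pure functions that can be tested without network access. build_query and parse_organic are hypothetical helpers, and the more tolerant parsing is an alternative, not the repo's behaviour.

```python
def build_query(article: dict, keywords: list) -> str:
    """Article title followed by at most three client keywords."""
    return f"{article['title']} {' '.join(keywords[:3])}".strip()


def parse_organic(payload: dict, limit: int = 5) -> list:
    """Map a search response payload to snippet dicts, skipping unusable entries."""
    snippets = []
    for r in payload.get("organic", [])[:limit]:
        if not r.get("title") or not r.get("link"):
            continue  # skip this entry instead of failing the whole batch
        snippets.append({
            "title": r["title"],
            "snippet": r.get("snippet", ""),
            "url": r["link"],
            "source": r.get("displayLink"),
        })
    return snippets
```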
Tier 2 — Deep Haiku evidence gathering
When enabled, Haiku runs 5–10 targeted searches, fetches and strips HTML from source pages, then classifies each source and extracts claims with confidence levels.
# Haiku produces a structured evidence pack:
evidence_pack = {
"verified_claims": [
{
"claim": "The RBA raised rates by 25bps to 4.35%",
"source": "RBA official statement",
"confidence": "high",
"safe_phrasing": "According to the RBA's official statement...",
}
],
"claims_to_avoid": [
{
"claim": "Rates will fall by end of 2024",
"reason": "Prediction without verifiable source"
}
],
"recommended_references": [
{"title": "RBA Rate Decision — May 2026", "url": "https://rba.gov.au/..."}
],
"source_classifications": [
{"url": "...", "type": "government_or_regulator", "allowed_to_use": True}
]
}
# This entire pack is injected into the Gate 4 system prompt.
# Gate 4 is instructed: "Only use statistics and dates from VERIFIED CLAIMS.
# Never use anything in CLAIMS TO AVOID."
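How the pack is turned into prompt text is not shown in this section. The following is a minimal sketch of one possible rendering, using the section labels quoted in the instruction above. render_evidence and the exact line layout are assumptions.

```python
def render_evidence(pack: dict) -> str:
    """Render an evidence pack as plain-text prompt sections."""
    lines = ["VERIFIED CLAIMS:"]
    for c in pack.get("verified_claims", []):
        lines.append(
            f'- {c["claim"]} (source: {c["source"]}, confidence: {c["confidence"]})'
        )
    lines.append("CLAIMS TO AVOID:")
    for c in pack.get("claims_to_avoid", []):
        lines.append(f'- {c["claim"]} (reason: {c["reason"]})')
    return "\n".join(lines)


if __name__ == "__main__":
    demo_pack = {
        "verified_claims": [{
            "claim": "The RBA raised rates by 25bps to 4.35%",
            "source": "RBA official statement",
            "confidence": "high",
        }],
        "claims_to_avoid": [{
            "claim": "Rates will fall by end of 2024",
            "reason": "Prediction without verifiable source",
        }],
    }
    print(render_evidence(demo_pack))
```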
Content Angles — The Core Differentiator
Competitors rewrite news. This platform generates opinionated, differentiated content. Claude selects the best angle for each article+client combination from 8 options — and avoids angles that competitors have already taken.
Competitor angle avoidance
# Competitor angles are inferred from title patterns, at no AI cost
import re
from typing import Optional

ANGLE_PATTERNS = {
    "local_impact": [r"what .+ means for .+", r"how .+ affects .+"],
    "action_list": [r"\d+ things? (to|you should)", r"what (to do|businesses should)"],
    "contrarian": [r"why .+ (is|might be) wrong", r"the truth about"],
    "faq_explainer": [r"everything you need to know", r"what is .+ and why"],
    "expert_commentary": [r"why .+ matters", r"what .+ means for the industry"],
}

def infer_competitor_angle(title: str) -> Optional[str]:
    # First match wins, in dict order. Note that any title matching
    # "what .+ means for the industry" also matches the earlier local_impact
    # pattern "what .+ means for .+", so it is classified as local_impact.
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None
# Result: competitor analysis returns avoid_angles = ["local_impact", "faq_explainer"]
# These are injected into the Gate 4 prompt:
# "DO NOT use local_impact — Competitor X already published that angle.
# DO NOT use faq_explainer — Competitor Y already published that angle.
# Choose a different angle that provides unique value."
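The step from individual titles to the avoid list and the prompt lines is not shown. The sketch below illustrates how it might be assembled, using a two-entry subset of the patterns above. taken_angles and avoidance_prompt are hypothetical names, and the prompt wording only approximates the text quoted above.

```python
import re
from typing import Optional

# Subset of the patterns shown above, enough for the demonstration.
ANGLE_PATTERNS = {
    "local_impact": [r"what .+ means for .+", r"how .+ affects .+"],
    "faq_explainer": [r"everything you need to know", r"what is .+ and why"],
}


def infer_competitor_angle(title: str) -> Optional[str]:
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None


def taken_angles(competitor_titles: dict) -> dict:
    """Map each detected angle to the first competitor seen using it."""
    taken = {}
    for competitor, titles in competitor_titles.items():
        for title in titles:
            angle = infer_competitor_angle(title)
            if angle and angle not in taken:
                taken[angle] = competitor
    return taken


def avoidance_prompt(taken: dict) -> str:
    lines = [
        f"DO NOT use {angle}: {competitor} already published that angle."
        for angle, competitor in taken.items()
    ]
    lines.append("Choose a different angle that provides unique value.")
    return "\n".join(lines)
```

The keys of the returned dict correspond to the avoid_angles list, and titles that match no pattern are simply ignored.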
H1 title rules (hardcoded in the prompt)
H1_RULES = """
H1 TITLE RULES (mandatory):
- Must be a question format: How / Why / What / When / Should / Can / Is
- Primary keyword must appear within the first 8 words
- Include "2026" for informational, how-to, FAQ, and local SEO articles
FORBIDDEN TITLE PATTERNS (never use these):
- "5 things", "5 steps", "10 ways" (numbered lists)
- "Understanding [topic]"
- "Everything you need to know about"
- "changes everything" / "ultimate guide"
- "What X needs to know" (where X = the reader's role)
"""
# Examples of good titles:
# "Why should Australian mortgage brokers rethink fixed rates in 2026?"
# "How does the RBA cash rate affect first-home buyers in Brisbane in 2026?"
# "How should cybersecurity teams respond to the new APRA ruling in 2026?"
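The rules live in the prompt, so nothing in this section enforces them on Claude's output. The following sketch shows a post-generation check. The regexes are one interpretation of the rule text, "question format" is approximated by checking the first word, and h1_violations is a hypothetical helper.

```python
import re

QUESTION_WORDS = ("how", "why", "what", "when", "should", "can", "is")

# One possible encoding of the forbidden patterns listed in H1_RULES.
FORBIDDEN = [
    r"\b\d+ (things|steps|ways)\b",
    r"^understanding\b",
    r"everything you need to know",
    r"changes everything",
    r"ultimate guide",
    r"^what .+ needs? to know",
]


def h1_violations(title: str, primary_keyword: str, require_year: bool = True) -> list:
    """Return a list of rule violations; an empty list means the title passes."""
    problems = []
    lower = title.lower().strip()
    words = lower.split()
    if not words or words[0] not in QUESTION_WORDS:
        problems.append("not a question format")
    if primary_keyword.lower() not in " ".join(words[:8]):
        problems.append("primary keyword not in first 8 words")
    if require_year and "2026" not in lower:
        problems.append("missing 2026")
    for pattern in FORBIDDEN:
        if re.search(pattern, lower):
            problems.append(f"forbidden pattern: {pattern}")
    return problems
```

A title that fails can be sent back for regeneration with the violation list appended to the prompt.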
The reasoning behind each rule: "5 things" articles are commoditised, since every content farm produces them. "Understanding X" signals generic educational content rather than actionable expert advice. "Everything you need to know" is overused, and formulaic titles of this kind sit poorly with Google's helpful content guidance. The question-format rule rests on SEO data indicating that question-format titles tend to outperform declarative titles for featured snippets.