Fundamentals — Start Here

Reading This Codebase From Zero

You don't need to know Python or web development to follow this guide — but the deep-dive sections below assume you do. This section bridges that gap using real code from this repo. Every example is copy-pasted from an actual file, with line-by-line annotations.

Python syntax, using this repo as the textbook

Python reads close to English. The whole codebase is built from a small number of repeating shapes: imports, functions, dictionaries, lists, and conditionals. Once you recognise these five shapes, you can read almost any file in this repo.

1. Imports — "borrow code other people wrote"

Every file starts by importing the tools it needs. services/ingestion-worker/sources/rss.py is a small, complete file — a good first read:

services/ingestion-worker/sources/rss.py
import logging         # log parse failures and malformed feeds without crashing

import feedparser      # universal RSS/Atom parser — handles most feed quirks
from dateutil import parser as dateparser, tz
How to read this

import feedparser means "load the feedparser library, and refer to it as feedparser below." from dateutil import parser as dateparser, tz means "from the dateutil library, take just the parser and tz pieces — and call parser by the nickname dateparser instead, because parser is a generic name that would be confusing on its own."
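The same two import forms can be tried with nothing but the standard library. In this small sketch, json and datetime are stand-ins chosen for illustration; they are not modules taken from rss.py:

```python
# Form 1: load a whole module and refer to it by its own name.
import json

# Form 2: take specific pieces from a module, renaming one of them.
# "datetime" inside the datetime module is a generic name, so it gets a nickname.
from datetime import datetime as dt, timezone

stamp = dt(2024, 1, 1, tzinfo=timezone.utc)   # uses the nickname "dt"
encoded = json.dumps({"published_at": stamp.isoformat()})
```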

2. Functions — "a named, reusable recipe"

A function is defined with def name(inputs) -> output_type:. Everything indented underneath belongs to the function.

services/ingestion-worker/sources/rss.py
def fetch(source: dict, max_articles: int = 20) -> list[dict]:
    """
    Fetch articles from one RSS/Atom feed and return normalised dicts.
    """
    try:
        resp = _requests.get(source["url"], timeout=15)
        feed = feedparser.parse(resp.content)
    except Exception as e:
        log.warning("RSS parse failed %s: %s", source["url"], e)
        return []

def fetch(source: dict, max_articles: int = 20) -> list[dict]:

Defines a function named fetch. It takes one required input source (a dict — see below) and one optional input max_articles which defaults to 20 if the caller doesn't supply it. -> list[dict] says: "this function returns a list of dicts." These type hints don't change how Python runs — they're documentation that tools like editors and Claude can check.

try / except

"Try to do this; if it throws an error, do that instead." Here: try to fetch and parse the feed. If the network call fails for any reason (timeout, DNS failure, bad SSL cert — all caught by Exception), log a warning and return an empty list rather than crashing the whole worker.

return []

return sends a value back to whoever called this function. [] is an empty list. The caller (in main.py) gets back zero articles for this source and moves on to the next one — one broken RSS feed never takes down the whole ingestion run.
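The fail-soft pattern can be tried without a network. In this minimal sketch, load_titles, fetch_titles, and the two fake sources are hypothetical stand-ins for the real feed download, so every name and value here is an assumption, not code from the repo:

```python
def load_titles(source: dict) -> list[str]:
    """Hypothetical loader: pretends to download a feed and may raise."""
    if source.get("broken"):
        raise TimeoutError("feed did not respond")
    return source["titles"]


def fetch_titles(source: dict, max_articles: int = 20) -> list[str]:
    """Return up to max_articles titles, or [] if anything goes wrong."""
    try:
        titles = load_titles(source)
    except Exception as e:
        # Same idea as fetch(): report the problem, return an empty list.
        print(f"load failed for {source['url']}: {e}")
        return []
    return titles[:max_articles]


sources = [
    {"url": "https://example.com/a", "titles": ["one", "two", "three"]},
    {"url": "https://example.com/b", "broken": True},
]

# max_articles is optional: omitted in the first call, supplied in the second.
all_titles = fetch_titles(sources[0])
two_titles = fetch_titles(sources[0], max_articles=2)
none_titles = fetch_titles(sources[1])   # broken source: [] instead of a crash
```

Because the broken source yields an empty list instead of an exception, a caller looping over many sources can carry on to the next one.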

3. Dictionaries — "a labelled box of values"

A dict (dictionary) is Python's version of a JSON object: a set of key: value pairs, accessed by name with square brackets. This is the single most common data shape in the codebase: article records, source configs, and API request bodies are all dicts. One exception you'll meet below: in admin.py, database rows arrive from psycopg2 as tuples.

services/ingestion-worker/sources/rss.py
articles.append({
    "source_id":   source["id"],            # UUID — foreign key to sources table
    "source_name": source["name"],          # human-readable name — used in drop logs
    "client_id":   source["client_id"],     # UUID — which client owns this source
    "trust_score": source["trust_score"],   # 0.0–1.0 — used by Gate 1 rule 2

    "client_keywords": source["keywords"] or [],          # TEXT[] from DB
    "client_excluded": source["excluded_topics"] or [],   # TEXT[] from DB

    "url":          url,
    "title":        title,
    "summary":      summary[:2000],   # cap to prevent very long articles
    "published_at": published_at,     # datetime or None
})
How to read this

source["id"] means "look up the value stored under the key "id" in the dict called source." The whole { ... } block is building a new dict — a normalised "article" record — by pulling values out of source (the RSS source config) and combining them with values parsed from the feed (url, title, etc). source["keywords"] or [] means "use source["keywords"] if it's not empty/None, otherwise use an empty list" — a one-line default. summary[:2000] means "take only the first 2000 characters of summary" — this is Python's slice syntax.
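These dict idioms are easy to try in isolation. The sketch below uses a made-up source dict (the values are illustrative assumptions) to show a key lookup, the or [] default, what happens when a key is missing, and slicing:

```python
source = {"id": "a1b2", "name": "Skift", "keywords": None}

name = source["name"]                   # lookup by key
keywords = source["keywords"] or []     # None is "falsy", so the fallback [] is used
missing = source.get("trust_score")     # .get() returns None for an absent key

try:
    source["trust_score"]               # square brackets raise KeyError instead
    raised = False
except KeyError:
    raised = True

summary = "x" * 5000
capped = summary[:2000]                 # slice: the first 2000 characters only
```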

4. Lists and loops — "do this for every item"

services/ingestion-worker/sources/rss.py
articles = []

for entry in feed.entries[:max_articles]:
    url   = entry.get("link", "").strip()
    title = entry.get("title", "").strip()

    if not url or not title:
        continue  # skip this entry and move to the next

    articles.append({ ... })

return articles

articles = []

Creates an empty list — an ordered, growable collection. Think of it as an empty shopping basket that articles.append(...) will add items to, one per loop iteration.

for entry in feed.entries[:max_articles]:

"For each item in this collection, run the indented block once with entry set to that item." feed.entries[:max_articles] is a slice. With the default max_articles of 20 it means "give me only the first 20 entries," so a feed with 1,000 backlogged articles doesn't flood the pipeline on first run.

entry.get("link", "").strip()

.get("link", "") is the safe version of entry["link"] — if the key "link" doesn't exist, return "" (empty string) instead of crashing. .strip() removes leading/trailing whitespace. Chaining methods like this — .get(...).strip() — is everywhere in this codebase.

if not url or not title: continue

not url is True when url is an empty string. continue means "stop processing this item, jump straight to the next loop iteration." So: any feed entry missing a link or title is silently skipped.
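The loop can be run on its own against hand-written data. In this sketch the entries list is a made-up stand-in for feed.entries, and max_articles is set to 3 so the slice is visible:

```python
entries = [
    {"link": " https://example.com/1 ", "title": "First"},
    {"title": "No link here"},                          # missing "link"
    {"link": "https://example.com/3", "title": "  "},   # title is only whitespace
    {"link": "https://example.com/4", "title": "Fourth"},
]
max_articles = 3

articles = []
for entry in entries[:max_articles]:     # only the first 3 entries are visited
    url = entry.get("link", "").strip()
    title = entry.get("title", "").strip()
    if not url or not title:
        continue                         # skip incomplete entries
    articles.append({"url": url, "title": title})
```

Only the first entry survives: the second has no link, the third has a title that is empty after .strip(), and the fourth is never visited because of the slice.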

The pattern that repeats everywhere

Almost every function in this codebase follows the same shape: receive a dict (or list of dicts) → loop / transform / filter → return a dict (or list of dicts). Once you see this, files like gate1_rules.py, gate2_relevance.py, and the admin routes in admin.py stop looking like a wall of text and start looking like the same five Lego bricks rearranged.

How FastAPI receives an HTTP request

"The frontend" in this platform is just HTML pages rendered by the server (no separate React/Vue app). When you click a button or submit a form in your browser, it sends an HTTP request over the network to approval-service, which is a Python process running FastAPI. FastAPI's job is: match the incoming request to the right Python function, run it, and turn whatever it returns into an HTTP response.

Anatomy of a route

services/approval-service/admin.py
@router.post("/admin/sources/{source_id}/toggle")
def admin_toggle_source(request: Request, source_id: str, client_id: str = Form(...)):
    _require_admin(request)
    ...
    return RedirectResponse(f"/admin/clients/{client_id}?flash=Source+updated", status_code=303)

@router.post("/admin/sources/{source_id}/toggle")

This line — a decorator — is FastAPI's routing table entry. It says: "when an HTTP POST request arrives at a URL matching /admin/sources/<anything>/toggle, call the function below it." The {source_id} part is a path parameter — a placeholder that captures whatever string is in that position of the URL.

def admin_toggle_source(request: Request, source_id: str, client_id: str = Form(...)):

FastAPI inspects this function's parameters and fills them in automatically: source_id: str comes from the {source_id} in the URL path; client_id: str = Form(...) means "read a field named client_id out of the submitted HTML form body" (the ... means it's required); request: Request gives access to cookies — used here to check the admin session.

return RedirectResponse(f"/admin/clients/{client_id}?flash=Source+updated", status_code=303)

Whatever a route function returns becomes the HTTP response. RedirectResponse(..., status_code=303) tells the browser "go fetch this other URL instead" — which is how a form submission ends up showing you the updated page. f"...{client_id}..." is an f-string: any {expression} inside is replaced with its value, so client_id = "f73c..." produces /admin/clients/f73c...?flash=Source+updated.
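To make the path-parameter idea concrete, here is a toy matcher built on Python's re module. It is a sketch of the concept only, not how FastAPI is implemented, and compile_route is a hypothetical helper invented for this example:

```python
import re


def compile_route(pattern: str) -> re.Pattern:
    """Turn '/a/{name}/b' into a regex with one named group per {placeholder}."""
    # re.split with a capture group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>[^/]+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$")


route = compile_route("/admin/sources/{source_id}/toggle")

match = route.match("/admin/sources/a1b2c3/toggle")
path_params = match.groupdict()          # captured placeholder values

no_match = route.match("/admin/sources/a1b2c3/delete")

# f-string: {client_id} is replaced by the variable's value.
client_id = "f73c"
redirect_to = f"/admin/clients/{client_id}?flash=Source+updated"
```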

Where do GET and POST come from?

Every HTTP request has a method (GET, POST, etc.) and a path (the URL after the domain). Your browser sends:

GET: reading a page

Triggered by typing a URL, clicking a link, or the browser loading a page. No body — just "give me this page." Routes decorated @router.get(...) typically SELECT from Postgres and return a rendered HTML page (response_class=HTMLResponse).

POST: submitting a form / taking an action

Triggered by an HTML <form method="POST"> being submitted, or JavaScript's fetch(). Has a body containing the form fields. Routes decorated @router.post(...) typically INSERT/UPDATE Postgres, then redirect.

"Frontend talking to backend" — there is no separate frontend server

In many modern apps, "frontend" means a separate JavaScript app (React, Vue) that calls a JSON API. This platform doesn't do that for the admin UI. The "frontend" is HTML files in services/approval-service/templates/, rendered by the same Python process that talks to Postgres. The browser only ever talks to one thing: the FastAPI app, over plain HTTP, exchanging full HTML pages (and form submissions) — not JSON. The api-gateway service is the exception: it serves JSON to API clients (e.g. a future mobile app or third-party integration), authenticated with JWTs instead of cookies.

How a request finds its function

Browser: clicks "Toggle" button
HTTP: POST /admin/sources/<id>/toggle
FastAPI router: matches decorator → function
admin_toggle_source(): runs the Python code
Response: redirect → browser re-renders

How the backend talks to Postgres

Every piece of data this platform knows about — clients, sources, news items, drafts — lives in one PostgreSQL database. Python talks to Postgres using a library called psycopg2, which lets you send raw SQL strings and get rows back as Python tuples.

Step 1 — open a connection

services/approval-service/admin.py
def get_db():
    # Open a new Postgres connection using the DATABASE_URL from the K8s Secret.
    # connect_timeout=10 prevents the handler from hanging indefinitely if the DB
    # is temporarily unreachable — without this, psycopg2 blocks forever and
    # Cloudflare returns a 524 to the browser.
    # Caller is responsible for calling conn.close() — always use try/finally.
    return psycopg2.connect(DATABASE_URL, connect_timeout=10)

DATABASE_URL is a single string like postgresql://user:password@host:5432/content_intelligence — it encodes the username, password, host, port, and database name. It's injected as an environment variable from a Kubernetes Secret (never hardcoded, never in Git — see the Configuration table in Multi-tenancy).
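To see how one string carries all five pieces, the standard library can take it apart. This is only an illustration using the example value above; the real DATABASE_URL comes from the environment, and psycopg2 does its own parsing:

```python
from urllib.parse import urlsplit

# Example value only, copied from the paragraph above.
database_url = "postgresql://user:password@host:5432/content_intelligence"

parts = urlsplit(database_url)
settings = {
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,                  # parsed as an int
    "dbname": parts.path.lstrip("/"),    # path is "/content_intelligence"
}
```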

Step 2 — get a cursor, run SQL, read results

services/approval-service/admin.py
conn = get_db()
try:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT id, source_type, name, url, trust_score, active, ...
            FROM sources WHERE client_id = %s ORDER BY source_type, name
        """, (client_id,))
        sources = cur.fetchall()
finally:
    conn.close()

conn = get_db()

Opens a connection — think of it as opening a phone line to the database. conn is now a Python object you can send commands through.

with conn.cursor() as cur:

A cursor is the thing that actually sends SQL and receives results, scoped to this connection. with ... as cur: is a context manager — it guarantees the cursor is cleaned up automatically when the indented block ends, even if an error occurs.

cur.execute("SELECT ... WHERE client_id = %s", (client_id,))

Sends the SQL string to Postgres. The %s is a placeholder — psycopg2 safely substitutes the value from the tuple (client_id,) in its place. This is the only safe way to insert variables into SQL. Never build SQL with f-strings/string concatenation — that's how SQL injection vulnerabilities happen.

sources = cur.fetchall()

Retrieves every row the query matched, as a Python list of tuples — e.g. [(id1, "rss", "Skift", "https://...", 0.8, True, ...), (id2, ...), ...]. Each tuple's values are in the same order as the columns listed in SELECT.

try / finally: conn.close()

Database connections are a limited resource: Postgres accepts only a fixed number at once (its max_connections setting), so connections that are opened and never closed eventually lock out new ones. finally guarantees conn.close() runs whether the try block succeeded or raised an error.
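The whole connect, cursor, execute, fetch, close sequence can be tried without a Postgres server. The sketch below swaps in Python's built-in sqlite3 module as a stand-in for psycopg2, which is an assumption made purely so the example runs anywhere. Note one real difference: sqlite3 uses ? as its placeholder where psycopg2 uses %s. The table and rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")      # throwaway in-memory database
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE sources (id TEXT, client_id TEXT, name TEXT)")
    cur.executemany(
        "INSERT INTO sources VALUES (?, ?, ?)",
        [("s1", "c1", "Skift"), ("s2", "c1", "Hospitality Net"), ("s3", "c2", "Other")],
    )

    # Parameterised query: the value travels separately from the SQL text.
    cur.execute("SELECT id, name FROM sources WHERE client_id = ? ORDER BY name", ("c1",))
    rows = cur.fetchall()               # list of tuples, in SELECT column order

    # A hostile value is compared as plain data, so it matches nothing.
    hostile = "c1' OR '1'='1"
    cur.execute("SELECT id, name FROM sources WHERE client_id = ?", (hostile,))
    hostile_rows = cur.fetchall()
finally:
    conn.close()                        # runs on success and on error
```

Had the hostile string been pasted into the SQL text with an f-string, its quote characters would have become part of the query itself, which is exactly what a placeholder prevents.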

Writing data: INSERT / UPDATE + commit

services/approval-service/admin.py
with conn.cursor() as cur:
    cur.execute(
        "UPDATE sources SET active = NOT active WHERE id=%s RETURNING active",
        (source_id,),
    )
    new_state = cur.fetchone()[0]
conn.commit()
Don't forget conn.commit()

By default, psycopg2 opens a transaction automatically when the first statement runs on a connection. UPDATE/INSERT/DELETE statements only become permanent once you call conn.commit(); until then, they're invisible to every other connection (including psql in another terminal). RETURNING active is a Postgres feature that returns the new value of the row you just updated, in the same round-trip, avoiding a separate SELECT afterwards. cur.fetchone()[0] grabs the first column of the first (only) returned row.
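The "invisible until commit" behaviour can be observed with two connections. This sketch again uses sqlite3 as a stand-in for psycopg2 and Postgres (an assumption for portability; the two databases isolate transactions differently internally, but the visible effect shown here is the same). It leaves out RETURNING, which older SQLite versions do not support, and the table and IDs are made up:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE sources (id TEXT PRIMARY KEY, active INTEGER)")
writer.execute("INSERT INTO sources VALUES (?, ?)", ("a1b2", 1))
writer.commit()

reader = sqlite3.connect(db_path)       # a second, independent connection


def read_active(conn):
    rows = conn.execute("SELECT active FROM sources WHERE id = ?", ("a1b2",)).fetchall()
    return rows[0][0]


# The UPDATE implicitly opens a transaction on the writer connection.
writer.execute("UPDATE sources SET active = NOT active WHERE id = ?", ("a1b2",))
seen_before_commit = read_active(reader)   # reader still sees the old value
writer.commit()
seen_after_commit = read_active(reader)    # now the change is visible

writer.close()
reader.close()
```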

The shape of a row, end to end

services/approval-service/admin.py — SELECT then unpack
cur.execute("""
    SELECT id, source_type, name, url, trust_score, active, last_fetched, ...
    FROM sources WHERE client_id = %s ORDER BY source_type, name
""", (client_id,))
sources = cur.fetchall()
# sources = [
#   ("a1b2...", "rss", "Skift", "https://skift.com/feed", 0.8, True, ...),
#   ("c3d4...", "rss", "Hospitality Net", "https://...", 0.7, True, ...),
# ]

Notice the SQL column order — id, source_type, name, url, trust_score, active, ... — matches the order of values inside each tuple. This ordering becomes important in the next section, where Jinja2 unpacks these tuples by position.

Jinja2: turning Python data into HTML

Once a route function has its data (a list of tuples from Postgres), it needs to turn that into an HTML page the browser can display. Jinja2 is a templating engine: you write an .html file with normal HTML, plus special {{ ... }} and {% ... %} tags that Python fills in at render time.

Returning a template from a route

FastAPI routes with response_class=HTMLResponse call templates.TemplateResponse(...), passing a context dict — every key in that dict becomes a variable available inside the template:

services/approval-service/admin.py (simplified)
return templates.TemplateResponse("admin_client_edit.html", {
    "request": request,
    "client": client,
    "sources": sources,   # the list of tuples from cur.fetchall()
    "now": datetime.now(timezone.utc),
})

Looping over rows: {% for %}

The template receives sources — a list of tuples, one per source row. {% for %} unpacks each tuple's positional values into named variables, in the exact same order as the SQL SELECT:

services/approval-service/templates/admin_client_edit.html
{% for sid, stype, sname, surl, strust, sactive, slast, serr, serrcount,
       serrat, ssugg, squarantined, squarcount, sauto in sources %}
{% if stype != 'competitor' %}
<tr>
  <td>{{ sname }}</td>
  <td>{{ surl }}</td>
  <td>{{ strust }}</td>
  <td>
    {% if sactive %}<span class="badge green">On</span>
    {% else %}<span class="badge red">Off</span>{% endif %}
  </td>
</tr>
{% endif %}
{% endfor %}

{% for sid, stype, sname, ... in sources %}

This is identical in spirit to Python's for x in list: — but it destructures each tuple into 14 named variables in one go, exactly like Python's sid, stype, sname = some_tuple. sid is column 1 (id), stype is column 2 (source_type), and so on — matching the SQL SELECT id, source_type, name, ... from the previous section position-for-position. If the SQL column order and this list ever get out of sync, variables silently point at the wrong data — a common bug source.
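The "silently wrong" failure mode is easy to reproduce in plain Python. The sketch below uses a made-up four-column row rather than the real 14-column one:

```python
# One row as it would come back for: SELECT id, source_type, name, active
row = ("a1b2", "rss", "Skift", True)

# Unpacking in the same order as the SELECT: every name gets the right value.
sid, stype, sname, sactive = row

# Same row, but the names are listed as if the query were
# SELECT id, name, source_type, active. Python raises no error.
sid2, sname2, stype2, sactive2 = row

# Only a mismatch in the NUMBER of values is caught.
try:
    sid3, stype3, sname3 = row
    count_error = None
except ValueError as e:
    count_error = str(e)
```

After the second unpacking, sname2 holds "rss" and stype2 holds "Skift": nothing crashes, the page just shows the wrong data.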

{{ sname }}

Double curly braces output a value as text into the HTML. If sname is "Skift", the rendered HTML contains <td>Skift</td>. With autoescaping on (the default in the Jinja2Templates helper that FastAPI provides), Jinja2 escapes special characters, so a source named <script> renders as harmless text, not executable HTML. This is the main defence against stored XSS.
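What "escaping" does to a value can be shown with the standard library's html.escape. It is used here as a stand-in to illustrate the idea; Jinja2 has its own escaping machinery, and the source name is a made-up hostile value:

```python
from html import escape

source_name = "<script>alert(1)</script>"

# The angle brackets become entities, so the browser shows them as text.
cell = f"<td>{escape(source_name)}</td>"
```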

{% if sactive %} ... {% else %} ... {% endif %}

A conditional, just like Python's if/else. sactive is the Python boolean True or False coming straight from the Postgres active column. Depending on its value, one of two badges is rendered.

Forms: how a click becomes a POST request

Every button that changes data is inside an HTML <form>. The browser doesn't need any JavaScript to send a POST — submitting a form is a built-in browser feature:

services/approval-service/templates/admin_client_edit.html
<form class="inline-form" method="POST" action="/admin/sources/{{ sid }}/toggle">
  <input type="hidden" name="client_id" value="{{ client.id }}">
  <button class="btn btn-sm">Toggle</button>
</form>

method="POST" action="/admin/sources/{{ sid }}/toggle"

{{ sid }} is filled in at render time with this row's actual source UUID — e.g. /admin/sources/a1b2c3.../toggle. When the button is clicked, the browser sends an HTTP POST to exactly that URL — which is the URL pattern matched by @router.post("/admin/sources/{source_id}/toggle") from the FastAPI section above. The {{ sid }} in the template and the {source_id} in the route decorator are how the rendered HTML "knows" which Python function will handle the click.

<input type="hidden" name="client_id" value="{{ client.id }}">

A hidden field — invisible to the user, but still submitted with the form. This is how client_id ends up available as client_id: str = Form(...) in the Python route, without the user seeing or typing it.
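It can help to see what the browser actually puts on the wire for this form. By default an HTML form is sent as application/x-www-form-urlencoded, which the standard library can both produce and parse. The IDs below are made-up placeholders:

```python
from urllib.parse import parse_qs, urlencode

sid = "a1b2c3"
client_id = "f73c"

# What the rendered template asks the browser to do.
method = "POST"
path = f"/admin/sources/{sid}/toggle"          # from action="..."
body = urlencode({"client_id": client_id})     # from the hidden <input>

# What a server recovers from that body.
fields = parse_qs(body)
```

The path carries source_id and the body carries client_id, which matches the two ways the route function receives them.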

The full loop, named

SQL column order (admin.py) → Jinja2 unpacking order (admin_client_edit.html's {% for %}) → rendered {{ sid }} values inside form action URLs → browser POST → FastAPI {source_id} path parameter → new SQL UPDATE. Two files, one thread. The end-to-end walkthrough later in this guide follows this exact thread, click by click.

Two different ways code runs in this platform

So far, every example reacted to a browser click: code runs, sends a response, and stops, waiting for the next request. But three of the five services (ingestion-worker, intelligence-worker, publisher-worker) never receive HTTP requests at all. They run forever in a loop. Both patterns matter, and mixing them up is a common cause of "wait, who calls this function?" confusion when reading the code.

Request/Response (approval-service, api-gateway)

The process sits idle until an HTTP request arrives, runs one route function, returns a response, then goes back to idle. Like a shop assistant who only acts when a customer walks up to the counter. Driven entirely by FastAPI's router — there is no while True loop in this code.

Background Worker (ingestion-, intelligence-, publisher-worker)

The process runs an infinite while True: loop from the moment it starts. No browser is involved. Like a security guard doing rounds every few minutes, whether or not anything has happened. Driven by time.sleep() (polling on a timer) or redis.xreadgroup(..., block=5000) (waiting on a queue).

What a worker's main loop actually looks like

services/intelligence-worker/main.py
while True:
    try:
        # block=5000 → wait up to 5 seconds for new messages before looping.
        # count=5    → process up to 5 messages per batch.
        # ">"        → only undelivered messages (not pending/unacked ones).
        messages = r.xreadgroup(
            CONSUMER_GROUP, CONSUMER_NAME,
            {STREAM_IN: ">"},
            count=5,
            block=5000,
        )
    except redislib.ConnectionError as e:
        log.error("Redis connection lost: %s — retrying in 5s", e)
        time.sleep(5)
        continue

    for stream, msgs in messages:
        for msg_id, fields in msgs:
            process_message(fields)   # runs Gates 2-4 for one news item

while True:

An infinite loop — runs forever until the process is killed (e.g. by Kubernetes during a deploy). This is the entire "main program" for a worker service — there's no router, no incoming connections to wait for.

r.xreadgroup(..., block=5000)

block=5000 means "wait up to 5000 milliseconds (5 seconds) for a new message to appear on the news.filtered Redis Stream — if one arrives sooner, return immediately; if none arrives, return empty after 5s and loop again." This is not a busy-loop burning CPU — the process is asleep, parked on this call, until Redis wakes it up or the timeout passes.

for stream, msgs in messages: for msg_id, fields in msgs:

A nested loop — Redis can return messages from multiple streams, and each stream can return multiple messages in one batch (count=5). The inner loop processes each message one at a time by calling process_message(fields) — this is where Gate 2 → Gate 3 → Gate 4 actually run for that news item.
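The "block until something arrives or the timeout passes" shape can be reproduced with the standard library's queue module. This is a sketch of the loop shape only: queue.Queue stands in for the Redis Stream, the item names are made up, and the idle counter exists solely so the demo terminates (a real worker keeps looping):

```python
import queue

inbox = queue.Queue()                 # stand-in for the Redis Stream
for item in ("item-1", "item-2", "item-3"):
    inbox.put(item)

processed = []
idle_polls = 0

while True:
    try:
        # Blocks (sleeping, not spinning) until an item arrives or 0.1s passes.
        msg = inbox.get(timeout=0.1)
    except queue.Empty:
        idle_polls += 1
        if idle_polls >= 2:           # demo only: a real worker never exits here
            break
        continue
    processed.append(msg)             # stand-in for process_message(fields)
```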

Why this split matters for tenant data

When you're reading admin.py, every variable (client_id, request) exists only for the duration of one HTTP request: created fresh each time, discarded after the response is sent. When you're reading intelligence-worker/main.py, variables like conn (the DB connection) and last_gate5_rerun_poll persist across thousands of loop iterations for the lifetime of the pod. That is why you'll see explicit reconnect logic (conn = get_db() inside an except block) that a request/response handler never needs. A worker can't just "wait for the next request" to get a fresh connection, because there is no next request. There's only the next loop iteration, which it has to survive itself.

You now have the full picture

Browser → FastAPI route → psycopg2 → Postgres → Jinja2 → HTML (request/response), and while True: → Redis Stream → Gates 2-4 → Postgres (background worker) — these are the only two "shapes" of execution anywhere in this codebase. Every file you open is one function inside one of these two shapes. The walkthrough below ties the request/response shape together with a real click, end to end.

End to end: clicking "Toggle" on a source

This is the smallest complete action in the admin dashboard: on a client's edit page, every RSS source has a Toggle button that switches it on or off and then redirects back to the client page. It touches all four pieces from the Fundamentals sections above (HTML form, FastAPI route, psycopg2/Postgres, Jinja2 re-render) in under 10 lines of real code each. Step through it below.

Browser: admin_client_edit.html
HTTP POST: /admin/sources/<id>/toggle
FastAPI route: admin_toggle_source()
psycopg2: get_db() + cursor
Postgres: sources table
Redirect: 303 → back to browser
What to notice as you click through

The URL in step 2 (/admin/sources/<id>/toggle) is constructed by Jinja2 in step 1 from {{ sid }}, and is the same string that FastAPI's @router.post("/admin/sources/{source_id}/toggle") pattern matches in step 3 — that's the entire "connection" between frontend and backend: a shared URL string. Similarly, the SQL column order in step 5 is what determines the variable order in the {% for %} loop back in step 7. Nothing here is magic — it's strings and lists, lining up by convention.

Platform Learning Guide

Content Intelligence Platform

A multi-tenant SaaS that monitors real-time industry news, detects high-impact events, and generates SEO-optimised content before competitors react. Built on Kubernetes with Claude AI at its core.

Cost gates: 4
AI cost / client / month: ~$3.33
News → Published: ~4h
Drafts / client / day: 3–5

Why this platform exists

Most businesses know they should publish about industry news. They don't — because the workflow is broken:

1. News breaks at 9am

A reporter reads about it. Maybe.

2. Writer assigned at 11am

They research, draft, edit. Takes 3–4 hours.

3. Approval chain, next day

Manager reviews. Revisions. Legal check. More revisions.

4. Published 48+ hours later

Google has already indexed 50 competitor articles. The SEO window is closed.

The Insight

The SEO advantage belongs to whoever publishes first with quality content. Every hour of delay is lost organic ranking opportunity. This platform compresses the window from 48+ hours to under 4 hours.

What the platform compresses

News breaks: RSS / Google News
Detected & scored: < 5 minutes
Draft generated: < 2 minutes
Human approves: 1 click
Live on WordPress: seconds

System Architecture

Four microservices communicate via Redis Streams — a durable, ordered message queue. No service calls another directly over HTTP. This means if one crashes, the others continue and no data is lost.

Python ingestion-worker
  • Polls RSS feeds every 2 minutes
  • Runs Gate 1 (zero-cost rules)
  • Deduplicates by URL hash
  • Quarantines broken sources
  • Publishes to news.filtered
Python intelligence-worker
  • Consumes news.filtered
  • Runs Gates 2, 3, 4
  • Competitor gap analysis
  • Evidence gathering (Serper)
  • Publishes to content.drafts
FastAPI approval-service
  • Web dashboard for clients
  • Email digest with approve/reject links
  • Admin panel for operators
  • JWT auth (cookie + Bearer)
  • Publishes to content.approved
Python publisher-worker
  • Consumes content.approved
  • WordPress REST API (Markdown→HTML)
  • Dev.to API (native Markdown)
  • Retry with exponential backoff
  • Records to publications table

Redis Stream topology

news.filtered: ingestion-worker produces, intelligence-worker consumes
content.drafts: intelligence-worker produces, approval-service notified
content.approved: approval-service produces, publisher-worker consumes
news.failed: dead letter for ingestion errors (replay with scripts/replay-failed.py)
content.failed: dead letter for generation/publishing errors
Design Principle: No Direct HTTP Between Services

Services never call each other's HTTP endpoints. Everything goes through Redis Streams. If intelligence-worker crashes, ingestion-worker keeps writing to the stream. When intelligence-worker restarts, it resumes exactly where it left off — no messages lost.
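Why a crashed consumer loses nothing can be shown with a toy model: an append-only list plus a remembered read position per consumer group. ToyStream is a hypothetical class written for this sketch; it is not Redis and it omits things real Redis Streams track, such as delivered-but-unacknowledged messages:

```python
class ToyStream:
    """Append-only log plus one remembered read position per consumer group."""

    def __init__(self):
        self.entries = []
        self.offsets = {}             # group name -> index of next unread entry

    def add(self, message):
        self.entries.append(message)

    def read(self, group, count):
        start = self.offsets.get(group, 0)
        batch = self.entries[start:start + count]
        self.offsets[group] = start + len(batch)
        return batch


stream = ToyStream()
stream.add("article-1")
first = stream.read("intelligence", count=5)

# The consumer "crashes"; the producer keeps appending meanwhile.
stream.add("article-2")
stream.add("article-3")

# The consumer restarts and reads from its remembered position.
after_restart = stream.read("intelligence", count=5)
```

The key point is that the read position lives with the stream, not inside the consumer process, so a restart picks up exactly where the last read ended.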

Why the 4-Gate Funnel Exists

Claude Sonnet costs ~$0.015 per content generation call. Without filtering, running every ingested article through Sonnet would cost ~$450/client/month. With the funnel: ~$3.33/client/month (the original $2.55 funnel cost, plus Gate 3.5 ICP-fit scoring and Gate 5's automated review/auto-revision — see below).

1,000/day → Gate 1 — Rules Engine (drops ~90%): $0.00
~100/day → Gate 2 — Haiku (drops ~80%): $0.0001/call
~20/day → Gate 3 — Signal (drops ~75%): $0.00
~5/day → Gate 3.5 — ICP-Fit (Haiku), audience/business re-rank: $0.0003/call
~5/day → Gate 4 — Sonnet: $0.015/call
~5/day → Gate 5 — Review (Haiku) + Auto-Revision (Sonnet, ~30% of drafts): $0.0003 + ~$0.015
Gate | Method | Volume (per month) | Cost/call | Monthly
Gate 1 — Rules | Pure Python | 30,000 articles | $0.00 | $0.00
Gate 2 — Haiku | Claude Haiku (batched) | 3,000 articles | $0.0001 | $0.30
Gate 3 — Signal | Postgres + pytrends | 600 articles | $0.00 | $0.00
Gate 3.5 — ICP-Fit | Claude Haiku | 150 articles | $0.0003 | $0.045
Gate 4 — Sonnet | Claude Sonnet | 150 articles | $0.015 | $2.25
Gate 5 — cheap_review (round 1) | Claude Haiku | 150 drafts | $0.0003 | $0.045
Gate 5 — targeted_revision | Claude Sonnet | ~45 drafts (30% in 70-84 band) | $0.015 | $0.675
Gate 5 — cheap_review (round 2) | Claude Haiku | ~45 drafts (only if revised) | $0.0003 | $0.0135
Total | | | | ~$3.33
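The totals can be checked with a few lines of arithmetic. The volumes and per-call prices below are copied from the table above; the short row labels are shorthand for this sketch:

```python
# (gate, monthly volume, cost per call in USD), copied from the table above.
GATES = [
    ("Gate 1 rules",              30_000, 0.0),
    ("Gate 2 Haiku",               3_000, 0.0001),
    ("Gate 3 signal",                600, 0.0),
    ("Gate 3.5 ICP-fit",             150, 0.0003),
    ("Gate 4 Sonnet",                150, 0.015),
    ("Gate 5 review round 1",        150, 0.0003),
    ("Gate 5 targeted revision",      45, 0.015),
    ("Gate 5 review round 2",         45, 0.0003),
]

monthly = {name: volume * cost for name, volume, cost in GATES}
funnel_total = sum(monthly.values())

# The "original funnel" figure: Gate 2 plus Gate 4 only.
original_funnel = monthly["Gate 2 Haiku"] + monthly["Gate 4 Sonnet"]

# Baseline: every ingested article goes straight to Sonnet.
unfiltered_total = 30_000 * 0.015
```

Rounded to cents, funnel_total is 3.33, original_funnel is 2.55, and unfiltered_total is 450.00, matching the figures quoted in this section.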
Key Design Rule

Each gate must be cheaper than the next. You never spend $0.015 on something that a $0.00 rule would have caught. The gates are deliberately ordered from cheapest to most expensive. Gate 5's targeted_revision is the one exception — it's a Sonnet call, but it only runs for the ~30% of drafts that scored 70-84 with no critical issues, and it replaces what would otherwise be manual admin editing time.

Gate 1 — Rules Engine

File: services/ingestion-worker/stages/gate1_rules.py

Zero-cost filter. Runs entirely in Python — no database queries, no API calls, no network. Eliminates ~90% of articles before any AI is touched. Six rules run in order; the first failure short-circuits (fast reject).

1. Minimum content length

Drop articles with fewer than 50 chars of title+summary. These are usually empty RSS entries, tracking pixels, or fetcher errors with no useful content.

2. Source trust score

Each source has a trust_score (0.0–1.0) set by the admin. Sources below 0.4 are dropped. Prevents spam aggregators from polluting the pipeline.

3. Recency check

Drop articles older than 48 hours (configurable). Old news generates low-value content and hurts SEO freshness signals. Compares published_at against a UTC cutoff.

4. Hard exclusions

If the article matches any of the client's excluded topics, drop it, even if a keyword also matches. Example: a mortgage broker has "security" as a keyword but "gaming" excluded. "Gaming security" gets dropped.

5. Urgency override

Breaking news bypasses the keyword check entirely. If the title or summary contains "breaking", "emergency", etc., it passes Gate 1 regardless of keyword match. Real-time events shouldn't wait for keyword list tuning.

6. Keyword match

The article must contain at least one of the client's configured keywords. Case-insensitive substring search. The last and most expensive check, reached only if every earlier rule passed and the urgency override did not already accept the article.

The actual code

gate1_rules.py Python
from datetime import datetime, timedelta, timezone

class Gate1Rules:
    def __init__(self, min_content_length=50, source_trust_min=0.4,
                 max_age_hours=48, urgency_keywords=None):
        # Thresholds read by check() below.
        self.min_content_length = min_content_length
        self.source_trust_min = source_trust_min
        self.max_age_hours = max_age_hours
        # Lowercase and de-duplicate once here, so check() doesn't redo it per article.
        self.urgency_keywords = set(k.lower() for k in (urgency_keywords or []))

    def check(self, item, keywords, excluded):
        text = f"{item.get('title', '')} {item.get('summary', '')}".strip()

        # Rule 1: minimum content length
        if len(text) < self.min_content_length:
            return False, "too_short"

        # Rule 2: source trust score — fail-open (default 1.0 if missing)
        if item.get("trust_score", 1.0) < self.source_trust_min:
            return False, "low_trust_source"

        # Rule 3: recency check
        pub = item.get("published_at")
        if pub:
            if pub.tzinfo is None:           # feedparser returns naive datetimes
                pub = pub.replace(tzinfo=timezone.utc)
            cutoff = datetime.now(timezone.utc) - timedelta(hours=self.max_age_hours)
            if pub < cutoff:
                return False, "stale"

        text_lower = text.lower()            # compute once, use below

        # Rule 4: hard exclusions (checked BEFORE keyword match)
        for exc in (excluded or []):
            if exc.lower() in text_lower:
                return False, f"excluded:{exc}"

        # Rule 5: urgency override — breaking news bypasses keyword check
        if self.urgency_keywords and any(kw in text_lower for kw in self.urgency_keywords):
            return True, "urgency_override"

        # Rule 6: keyword match — the core relevance gate
        if not any(kw.lower() in text_lower for kw in (keywords or [])):
            return False, "no_keyword_match"

        return True, "passed"
Why rules run in this specific order

Length and trust checks are placed first because they are the cheapest: one length comparison and one number comparison, with no substring searches over the article text. Exclusions run before keywords so that a forbidden topic can't slip through on a keyword match. Urgency override is placed after exclusions, so even breaking news gets dropped if it matches an exclusion.
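The effect of this ordering is easiest to see with a few inputs. The sketch below is a condensed, standalone version of rules 4 to 6 only; check_order is a hypothetical helper written for this example, and the headlines and keyword lists are made up:

```python
def check_order(text, keywords, excluded, urgency_keywords):
    """Rules 4 to 6 only, in the same order as Gate1Rules.check()."""
    text_lower = text.lower()
    for exc in excluded:                                       # Rule 4: exclusions
        if exc.lower() in text_lower:
            return False, f"excluded:{exc}"
    if any(kw in text_lower for kw in urgency_keywords):       # Rule 5: urgency
        return True, "urgency_override"
    if not any(kw.lower() in text_lower for kw in keywords):   # Rule 6: keywords
        return False, "no_keyword_match"
    return True, "passed"


keywords, excluded, urgent = ["security"], ["gaming"], {"breaking"}

r1 = check_order("Gaming security flaw found", keywords, excluded, urgent)
r2 = check_order("Breaking: gaming servers down", keywords, excluded, urgent)
r3 = check_order("Breaking: rates cut", keywords, excluded, urgent)
r4 = check_order("New security guidance", keywords, excluded, urgent)
r5 = check_order("Local bakery opens", keywords, excluded, urgent)
```

r1 and r2 are both rejected as excluded:gaming (an exclusion beats both a keyword match and urgency), r3 passes via urgency_override without any keyword, r4 passes on its keyword, and r5 fails with no_keyword_match.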

Gate 2 — Haiku Relevance Scoring

File: services/intelligence-worker/stages/gate2_relevance.py

Uses Claude Haiku (the cheapest Anthropic model) to score article relevance on a 0–100 scale. Three cost optimisations make this viable at scale: batching, prompt caching, and minimal input.

Three cost optimisations

1. Batching

8 articles per API call. Instead of 100 calls for 100 articles, you make 13 calls. The model scores all 8 articles in one response using index-based JSON.

2. Prompt Caching

The system prompt (client profile) is identical across all batches in a run. The first call writes it to Anthropic's prompt cache and later calls read it back at a reduced input-token rate, which saves roughly 80% of the cost on subsequent calls.

3. Minimal Input

Only title + first 200 chars of summary are sent to the model. Not the full article — just enough context for relevance scoring.

Per-client model override

Each client can use a different Gate 2 model (GPT-4o-mini, Gemini Flash, DeepSeek). The provider abstraction makes this transparent — same interface regardless of provider.

How batching works

gate2_relevance.py Python
BATCH_SIZE = 8    # 8 articles per API call (sweet spot — larger batches confuse indexing)
MIN_SCORE  = 60   # articles below this score are dropped

def _score_batch(provider, articles, client_profile, prompt_template):
    # Build the batch text — each article gets an index number.
    # Haiku uses these indexes in its JSON response: {"scores": [{"index": 0, "score": 78, ...}]}
    articles_text = "\n---\n".join([
        f"[{i}] TITLE: {a['title']}\nSUMMARY: {(a.get('summary') or '')[:200]}"
        for i, a in enumerate(articles)
    ])

    # The system prompt contains the CLIENT'S PROFILE — same for all batches in a run.
    # Caching this is the key cost saving.
    system_prompt = prompt_template.format(
        industry_type    = client_profile.get("industry_type", "general"),
        target_geo       = ", ".join(client_profile.get("target_geo") or ["global"]),
        keywords         = ", ".join(client_profile.get("keywords") or []),
        excluded_topics  = ", ".join(client_profile.get("excluded_topics") or []),
    )

    # cache_system=True adds cache_control to the system prompt.
    # After the first API call, Anthropic serves the system prompt from cache.
    response = provider.complete(
        system=system_prompt,
        user=f'Score each article 0-100. Return JSON: {{"scores": [{{"index": 0, "score": N, "matched_keywords": []}}]}}\n\n{articles_text}',
        max_tokens=256,
        cache_system=True,   # ← the magic flag
    )

    result = json.loads(response.text)
    scores = {s["index"]: s for s in result.get("scores", [])}

    # Filter: keep only articles above MIN_SCORE threshold
    passed = []
    for i, article in enumerate(articles):
        score = scores.get(i, {}).get("score", 0)
        if score >= MIN_SCORE:
            article["relevance_score"]  = score
            article["matched_keywords"] = scores[i].get("matched_keywords", [])
            passed.append(article)
    return passed

def run_gate2(provider, articles, client_profile, prompt_template):
    passed = []
    # Iterate in steps of BATCH_SIZE: 0, 8, 16, 24, ...
    for i in range(0, len(articles), BATCH_SIZE):
        batch = articles[i:i + BATCH_SIZE]
        passed.extend(_score_batch(provider, batch, client_profile, prompt_template))
    return passed
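
The json.loads(response.text) call in _score_batch assumes the model always returns bare, valid JSON. A minimal sketch of a more defensive parser follows. It assumes (this is not confirmed by the repo) that an unusable reply should yield no scores rather than raise, and parse_scores is a hypothetical helper name.

```python
import json

def parse_scores(text: str) -> dict[int, dict]:
    """Return {index: score_entry}; an empty dict if the reply is unusable."""
    # Models sometimes wrap the JSON in extra text, so keep only the
    # outermost {...} span before parsing.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        result = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return {}
    entries = result.get("scores")
    if not isinstance(entries, list):
        return {}
    # Keep only well-formed entries that carry an integer index.
    return {
        e["index"]: e
        for e in entries
        if isinstance(e, dict) and isinstance(e.get("index"), int)
    }
```

With this shape, an article whose index is missing from the result falls through to the existing scores.get(i, {}).get("score", 0) default and is dropped, which is the same outcome as a low score.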
Token cost breakdown for 100 articles/day

Without batching: 100 calls × full prompt = high cost.
With batching: 13 calls × (system prompt cached after call 1) = ~80% saving on 12 of those 13 calls.
Result: Gate 2 costs ~$0.30/month per client — pennies.
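
The call count quoted above is plain ceiling division. A small sketch, using hypothetical helper names, that mirrors the range(0, len(articles), BATCH_SIZE) slicing in run_gate2:

```python
import math

def batch_count(n_articles: int, batch_size: int = 8) -> int:
    """Number of API calls needed to score n_articles in fixed-size batches."""
    return math.ceil(n_articles / batch_size)

def batch_sizes(n_articles: int, batch_size: int = 8) -> list[int]:
    """Size of each batch, matching range(0, n, batch_size) slicing."""
    return [min(batch_size, n_articles - i)
            for i in range(0, n_articles, batch_size)]

print(batch_count(100))       # 13
print(batch_sizes(100)[-1])   # 4 (twelve full batches of 8, then one of 4)
```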

Gate 3 — Signal Detection

File: services/intelligence-worker/stages/signal_detection.py

Zero-cost filter that combines two independent signals. Both are free to compute. An article needs a high combined score before we spend $0.015 on Sonnet generation.

Signal 1: Source Spread (our own data)

Counts how many distinct sources covered the same topic in the last 2 hours. Uses PostgreSQL full-text search on our own news_items table — zero external calls.

signal_detection.py Python
def _source_spread(conn, title: str, hours: int = 2) -> int:
    # Take up to 5 meaningful words from the title for full-text matching.
    # Strip punctuation first, then drop short words ("the", "and", "for"),
    # which would otherwise match almost everything.
    cleaned = [w.strip(".,!?\"'") for w in title.split()]
    words = [w for w in cleaned if len(w) > 3][:5]
    if not words:
        return 1    # nothing to search on: treat as an isolated report

    # OR query: article matches if ANY keyword appears (broad catch)
    tsquery = " | ".join(words)

    with conn.cursor() as cur:
        # Pass the hour count as a real parameter instead of placing %s
        # inside a quoted SQL literal.
        cur.execute("""
            SELECT COUNT(DISTINCT source_id)
            FROM news_items
            WHERE to_tsvector('english', title) @@ to_tsquery('english', %s)
              AND published_at > NOW() - make_interval(hours => %s)
        """, (tsquery, hours))
        return cur.fetchone()[0] or 1

# Map raw count to a 0–100 score
def _spread_score(count: int) -> int:
    if count >= 5:  return 100   # 5+ sources = definitely breaking
    if count >= 3:  return 60    # trending across outlets
    if count >= 2:  return 40    # gaining traction
    return 20                    # isolated report

Signal 2: Google Trends SEO Opportunity

Queries Google Trends for search interest on the client's matched keywords, geo-filtered to their location. Redis-cached for 24 hours — same keyword pair = one API call.

Combined Score Formula

signal_detection.py Python
trend_score = (spread_score * 0.6) + (seo_opportunity * 0.4)
# 60/40 weighting: spread is more reliable (our own data)
# Trends complements with demand-side intent but can be rate-limited

# Urgency detection overrides the score threshold entirely:
# - "breaking" / "emergency" in title → urgency = "breaking" → Gate 3 bypassed
# - 3+ sources covering topic       → urgency = "high"     → Gate 3 bypassed
Fail-Open Design

If Google Trends is unavailable, seo_opportunity defaults to 50 (neutral). A Trends outage never blocks content generation. This is called "fail-open" design — the default is to continue, not to stop.
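
A worked example of the 60/40 formula together with the neutral fallback. This is a sketch only: representing "Trends unavailable" as None is an assumption, and the function names are illustrative.

```python
from typing import Optional

NEUTRAL_SEO = 50  # fail-open default when Google Trends is unavailable

def spread_score(count: int) -> int:
    """Map a distinct-source count to a 0-100 score (same bands as above)."""
    if count >= 5:
        return 100
    if count >= 3:
        return 60
    if count >= 2:
        return 40
    return 20

def trend_score(source_count: int, seo_opportunity: Optional[float] = None) -> float:
    """Combine spread (60%) and SEO opportunity (40%); None means no Trends data."""
    seo = NEUTRAL_SEO if seo_opportunity is None else seo_opportunity
    return spread_score(source_count) * 0.6 + seo * 0.4

# Three sources, Trends down: 60 * 0.6 + 50 * 0.4 = 56
print(round(trend_score(3), 1))
# Five sources, strong search interest: 100 * 0.6 + 80 * 0.4 = 92
print(round(trend_score(5, 80), 1))
```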

Gate 3.5 — ICP-Fit Scoring

File: services/intelligence-worker/stages/icp_fit.py

Gates 1–3 ask "is this story newsworthy?" Gate 3.5 asks a sharper question: is this story worth a full article for this client's specific audience? A story can pass every prior gate — high source spread, strong SEO opportunity, on-topic for the industry — and still be a poor fit for a client whose Ideal Client Profile (ICP) targets, say, first-time homebuyers rather than property investors. Gate 3.5 runs after evidence gathering and before Gate 4, using Claude Haiku against the client's ICP block (the same _build_icp_block() used by Gate 4 and Gate 5).

What it returns

| Field | Meaning |
| --- | --- |
| audience_relevance | 0.0–1.0: does this matter to the ICP's primary audience? |
| business_relevance | 0.0–1.0: does this connect to the client's business goal? |
| actionability | 0.0–1.0: can the reader actually do something with this? |
| risk_level | low / medium / high; informational only, logged for review |
| recommended_angle | ICP-informed angle suggestion, passed into Gate 4's icp_fit context |
| claims_to_use | Evidence pack claims re-ranked/filtered for this audience; replaces use_these_claims before Gate 4 sees it |
| should_generate | True when audience_relevance ≥ 0.6 AND actionability ≥ 0.5; if false, this news_item/client pair is dropped before Gate 4 ever runs |

The drop decision

If should_generate is false, the pipeline marks the news item processed for this client and returns — the same outcome as a Gate 4 geo_not_impacted skip, but caught one step earlier and one model tier cheaper ($0.0003 Haiku vs $0.015 Sonnet). This is the same "cheaper gate catches what the more expensive gate would have caught" principle that justifies the whole funnel.

Fail-Open Design

On any exception, Gate 3.5 returns should_generate=True, claims_to_use unchanged, recommended_angle="" — Gate 4 runs exactly as it would have without Gate 3.5. A Gate 3.5 failure never blocks generation; at worst, it costs an extra Sonnet call that Gate 3.5 would otherwise have saved.
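
The drop rule and the fail-open fallback can be sketched together. The class and function names here are hypothetical, and score_fn stands in for whatever actually calls the model; only the two thresholds and the fallback values come from the description above.

```python
from dataclasses import dataclass, field

@dataclass
class IcpFit:
    audience_relevance: float
    actionability: float
    recommended_angle: str = ""
    claims_to_use: list = field(default_factory=list)

    @property
    def should_generate(self) -> bool:
        # Both thresholds must hold for the item to reach Gate 4.
        return self.audience_relevance >= 0.6 and self.actionability >= 0.5

def icp_fit_fail_open(score_fn, claims: list) -> tuple[bool, list, str]:
    """Run score_fn(); on any exception return the fail-open defaults."""
    try:
        fit = score_fn()
        return fit.should_generate, fit.claims_to_use, fit.recommended_angle
    except Exception:
        # Gate 4 then runs exactly as if Gate 3.5 did not exist.
        return True, claims, ""
```

The fallback deliberately errs toward spending a Sonnet call: a scoring outage costs money, not content.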

Same model knob as Gate 2

Gate 3.5 reuses gate2_provider — whatever model (Haiku, GPT-4o-mini, DeepSeek) is configured for Gate 2 relevance scoring also runs ICP-fit scoring and Gate 5's cheap_review. One per-client setting, three cheap-tier stages.

Gate 4 — Sonnet Content Generation

File: services/intelligence-worker/stages/gate4_generation.py

The most expensive step and the core product. Claude Sonnet generates 6 content formats in one API call, selecting the best angle from 8 options based on the news type, competitor gaps, and client voice.

What one Gate 4 call produces

| Output | Description |
| --- | --- |
| blog.title | SEO H1: question format, keyword in first 8 words, year suffix |
| blog.slug | URL slug derived from title |
| blog.meta_description | 150–160 chars for Google SERPs |
| blog.body_markdown | 1,200–3,500 word article with mandatory section structure |
| blog.faq_schema | Exactly 5 Q&A pairs for Google FAQ rich results |
| linkedin_post | Platform-optimised, shorter format |
| selected_angle | Which of the 8 angles Claude chose (stored for analytics) |

Geo-skip logic

Before generating, the model checks if the news is actually relevant to the client's geography. If not, it outputs a skip signal instead of a draft — saving $0.015 and preventing irrelevant content.

gate4_generation.py (output schema) JSON
# If Claude decides the news is irrelevant to the client's geo:
{"selected_angle": "skip", "reason": "geo_not_impacted"}

# If it decides to generate:
{
  "selected_angle": "local_impact",
  "blog": {
    "title":            "Why should Australian mortgage brokers reconsider fixed rates in 2026?",
    "slug":             "australian-mortgage-brokers-fixed-rates-2026",
    "meta_description": "The RBA's latest decision changes the fixed vs variable ...",
    "body_markdown":    "...(full article 1200-3500 words)...",
    "keywords":         ["mortgage broker", "fixed rate", "RBA 2026"],
    "faq_schema":       [{"question": "...", "answer": "..."}, ...]
  },
  "linkedin_post":      "..."
}

Mandatory article structure

Every generated article must contain these sections in order. Claude is explicitly instructed to follow this structure — it's part of the Gate 4 system prompt stored in values.yaml.

Mandatory section order
sections = [
    "Quick Answer (40–60 words, no heading, standalone prose)",
    "What You Will Learn (4–6 bullets)",
    "What Is [Topic]? (80–120 words)",
    "Why Does [Problem] Happen? (100–150 words, 4–6 bullets)",
    "At-a-Glance Summary (Markdown table, 5–8 rows)",
    "How to [Solve It] (200–300 words, numbered H3 steps)",
    "What Happens If You Ignore This? (80–120 words, 3–5 bullets)",
    "",          # Pexels image placeholder
    "Common Mistakes to Avoid (table: Mistake | Why | What to Do Instead)",
    "Expert Tips (100–150 words, ≥2 tips with measurable checks)",
    "",          # second image
    "Frequently Asked Questions (exactly 5 FAQs)",
    "Key Takeaways (60–80 words, 4–5 bullets)",
    "References (3–5 entries as [Title](URL))",
]

Gate 5 — Automated QA Review + Targeted Auto-Revision

Files: services/intelligence-worker/stages/rules_validator.py, services/intelligence-worker/stages/cheap_review.py, services/intelligence-worker/stages/targeted_revision.py

Every Gate 4 draft passes through an automated QA pass before a human ever sees it. Unlike Gates 1–3, Gate 5 doesn't filter volume — it classifies each of the 3–5 daily drafts so admins know which ones are safe to approve quickly and which need a closer look. As of PR3, Gate 5 can also fix a draft itself: a single targeted Sonnet revision pass for drafts that are "almost there" (score 70-84, no critical issues), followed by a second review to confirm the fix worked.

Three-stage pipeline

| Stage | Method | Cost | Runs when |
| --- | --- | --- | --- |
| rules_validator | Pure Python | $0.00 | Always. Checks word count, heading structure, FAQ count, required sections present |
| cheap_review (round 1) | Claude Haiku | ~$0.0003 | Always. Checks fabricated stats, eligibility-scoping errors, audience_fit vs. ICP, avoid_these_claims leakage, off-topic drift, brand safety |
| targeted_revision | Claude Sonnet | ~$0.01–0.02 | Only if round 1 scored 70-84 with no critical issues, auto-revision is enabled, and this draft hasn't been auto-revised yet |
| cheap_review (round 2) | Claude Haiku | ~$0.0003 | Only immediately after a successful targeted_revision; confirms the fix actually cleared the bar |
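
To make the zero-cost first stage concrete, here is a minimal sketch of two of the rules_validator checks, using the limits stated for Gate 4 output (1,200–3,500 words, exactly 5 FAQs). The function name and issue strings are hypothetical; heading and section checks are omitted because their exact format is not shown here.

```python
def validate_rules(blog: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the draft passed."""
    issues = []

    # Word count of the Markdown body, split on whitespace.
    words = len((blog.get("body_markdown") or "").split())
    if not 1200 <= words <= 3500:
        issues.append(f"word_count_out_of_range:{words}")

    # Gate 4 is instructed to produce exactly 5 FAQ pairs.
    faq_count = len(blog.get("faq_schema") or [])
    if faq_count != 5:
        issues.append(f"faq_count:{faq_count}")

    return issues
```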

Classification thresholds (PR3)

intelligence-worker/main.py — _classify() Python
# Any "critical" issue (factual_risk, fabrication, off_topic, brand_safety)
# → hard-blocks the draft until an admin overrides with a written reason
if has_critical:
    return "blocked_factual_review", score

# Review itself errored — high-risk industries fail safe to admin review
if review_failed:
    if industry_type in {"mortgage", "finance", "health", "cybersecurity"}:
        return "needs_admin_review", None
    return "review_failed", None

if not passed:
    return "needs_admin_review", score

# Clean bill of health — safe to fast-track
if score is not None and score >= 85 and not has_major:
    return "ready_for_approval", score

# NEW in PR3 — "almost there", worth one automated fix attempt
if score is not None and score >= 70:
    return "needs_auto_revision", score

return "needs_admin_review", score
needs_auto_revision is never persisted

_run_gate5 resolves it immediately, in the same pass:

  • Auto-revision enabled + not yet attempted → run targeted_revision, then re-run cheap_review (round 2) and re-classify:
    • round 2 = ready_for_approval → final status auto_revised
    • round 2 = blocked_factual_review → stays blocked_factual_review (the fix introduced or exposed a critical issue)
    • anything else → needs_admin_review (no second auto-revision attempt)
  • Auto-revision disabled, unconfigured, or targeted_revision itself failed → needs_admin_review
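
The resolution rules above can be read as a small decision function. This is a sketch, not the body of _run_gate5: the function name and the revise/review_again callables are hypothetical stand-ins for targeted_revision and the round-2 cheap_review plus re-classification.

```python
def resolve_needs_auto_revision(enabled, already_revised, revise, review_again):
    """Turn the transient needs_auto_revision status into a final status."""
    if not enabled or already_revised:
        return "needs_admin_review"

    revised = revise()              # returns None if the revision failed
    if revised is None:
        return "needs_admin_review"

    round2 = review_again(revised)  # status from re-classifying round 2
    if round2 == "ready_for_approval":
        return "auto_revised"
    if round2 == "blocked_factual_review":
        return "blocked_factual_review"
    return "needs_admin_review"     # no second auto-revision attempt
```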

targeted_revision — minimal-edit Sonnet pass

Given the specific issues cheap_review flagged (severity, category, location, description, suggested fix), Sonnet rewrites only blog.title, blog.meta_description, blog.body_markdown, and blog.faq_schema. Every other field (slug, LinkedIn post, keywords, image prompt, selected angle) is preserved verbatim. The result replaces the draft in place via update_draft, and a second content_reviews row (review_round=2, review_type="auto_fix") records what changed and what it cost.

Fail-closed-to-original

If targeted_revision throws for any reason — API error, malformed JSON, missing fields — it returns None and the original draft is left completely untouched. The caller then routes the draft to needs_admin_review as if auto-revision had never been attempted. This stage can never lose or corrupt a draft.

ICP-awareness across Gate 5 (PR3)

Both cheap_review and targeted_revision now receive the client's ICP block (_build_icp_block — the same primary/secondary audience, pain points, business goal, and preferred/avoided angles that inform Gate 4 and Gate 3.5). This adds a new audience_fit issue category to cheap_review: a draft can score well on facts and structure but still miss the mark if it's written for the wrong reader. When audience_fit issues are flagged, targeted_revision is explicitly instructed to re-pitch the affected sections at the ICP's actual audience. audience_fit is always major or minor — never critical, so it can trigger auto-revision but never a hard block on its own.

Admin-curated factual guardrails (PR3)

The new content_guardrails table lets an admin promote a recurring Gate 5 finding into a standing rule — e.g. "Do not describe the First Home Guarantee as available to all buyers; it is means-tested and place-restricted." Rules are scoped by nullable client_id/industry_type (NULL = applies regardless of that dimension) and are injected into every future Gate 4 prompt for the matching client/industry as {factual_guardrails_block} — closing the loop from "Gate 5 caught this once" to "Gate 4 never makes this mistake again."
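
The nullable-scope matching can be sketched as follows. Rules are shown as plain dicts, and the rule_text key and both function names are assumptions made for illustration; only the client_id/industry_type NULL semantics come from the description above.

```python
def guardrail_applies(rule: dict, client_id: str, industry_type: str) -> bool:
    """A NULL (None) scope column matches anything; a set one must match exactly."""
    client_ok = rule.get("client_id") is None or rule["client_id"] == client_id
    industry_ok = (rule.get("industry_type") is None
                   or rule["industry_type"] == industry_type)
    return client_ok and industry_ok

def build_guardrails_block(rules: list[dict], client_id: str, industry_type: str) -> str:
    """Render the matching rules as a bullet list for the Gate 4 prompt."""
    return "\n".join(
        f"- {r['rule_text']}"
        for r in rules
        if guardrail_applies(r, client_id, industry_type)
    )
```

A rule with both columns NULL therefore applies to every client, while one with only industry_type set applies to every client in that industry.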

Hard-block + admin override

A draft with review_status = blocked_factual_review cannot be approved normally — the Approve button is replaced with an "Approve anyway" flow that requires the admin to pick a rejection reason or write an override note before the draft can publish. This keeps factually risky drafts out of the approval queue by default while still leaving a human the final say.

Admin controls

| Control | Where | Effect |
| --- | --- | --- |
| GATE5_ENABLED | Helm env var | Kills the entire Gate 5 pass. Restart required. |
| gate5_auto_revision_enabled | platform_settings table, toggle on /admin/quality-review | Kills only the targeted_revision sub-step; rules_validator + cheap_review still run. No restart needed. |
| auto_revision_count badge | content_drafts column, shown on the draft edit page | Tells the admin this draft was auto-fixed and re-reviewed; both review rounds are visible in the Gate 5 review history. |
Fail-Open Design

If the Haiku review call fails for any reason, Gate 5 doesn't block the pipeline — the draft still reaches the approval queue. For high-risk verticals (mortgage, finance, health, cybersecurity) a failed review routes to needs_admin_review instead of being silently waved through. The same fail-open principle applies to targeted_revision: any failure leaves the original draft untouched.

Same pipeline for cluster synthesis

Gate 5 (all three stages) also runs on drafts produced by cluster synthesis — when a follow-up story enriches an existing draft (update_draft) or generates a linked update article, that output goes through the exact same rules_validator → cheap_review → targeted_revision → cheap_review pipeline before reaching the approval queue.

Ingestion Worker

File: services/ingestion-worker/main.py

Runs continuously as a Kubernetes Deployment (not a CronJob — it needs sub-minute responsiveness). Every 2 minutes it polls all active RSS sources for all clients and runs Gate 1.

The poll loop

ingestion-worker/main.py (simplified) Python
async def poll_loop():
    while True:
        # Fetch all active, non-quarantined sources from Postgres
        sources = get_active_sources(conn)

        for source in sources:
            articles = fetch_rss(source["feed_url"])   # parse RSS/Atom feed

            for article in articles:
                # sha256() takes bytes, so the normalized URL is encoded first
                url_hash = sha256(normalize_url(article["url"]).encode()).hexdigest()

                # Deduplication: skip if we've already processed this URL
                if url_hash in seen_hashes:
                    continue

                # Gate 1: zero-cost rules filter (per-client)
                for client in source["clients"]:
                    passes, reason = gate1.check(article, client["keywords"], client["excluded"])
                    if passes:
                        # Publish to Redis Stream for intelligence-worker to consume
                        redis.xadd("news.filtered", {
                            "article_id": article_id,
                            "client_id":  client["id"],
                            "reason":     reason,
                        })

        await asyncio.sleep(POLL_INTERVAL_SECONDS)   # default: 120s

Source quarantine system

Every RSS source is tracked for consecutive failures. After 3 failures, it's quarantined with exponential backoff. The system automatically tries to find a replacement feed.

| Quarantine # | Duration | Recovery |
| --- | --- | --- |
| 1st time | 6 hours | Auto-retry after expiry |
| 2nd time | 12 hours | Auto-retry after expiry |
| 3rd time | 24 hours | Auto-retry after expiry |
| 4th time | 48 hours | Auto-retry after expiry |
| 5th time | 96 hours | Auto-retry after expiry |
| 6th+ | 168 hours (7 days) | Manual restore from admin |
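
The durations in this schedule follow a doubling rule with a 7-day cap. A minimal sketch that reproduces the table; the function and constant names are illustrative, not taken from the repo:

```python
BASE_HOURS = 6     # first quarantine
MAX_HOURS = 168    # cap: 7 days

def quarantine_hours(quarantine_number: int) -> int:
    """Duration of the Nth quarantine: 6h doubling each time, capped at 168h."""
    if quarantine_number < 1:
        raise ValueError("quarantine_number starts at 1")
    return min(BASE_HOURS * 2 ** (quarantine_number - 1), MAX_HOURS)

def needs_manual_restore(quarantine_number: int) -> bool:
    """From the 6th quarantine on, an admin has to restore the source."""
    return quarantine_number >= 6

print([quarantine_hours(n) for n in range(1, 7)])  # [6, 12, 24, 48, 96, 168]
```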

3-tier replacement feed discovery

When a source is quarantined, the system immediately searches for a replacement — no admin intervention required.

T1: Same-site alternate URLs

Scrapes the dead source's homepage for <link rel="alternate"> RSS tags. Also probes common paths: /feed, /rss, /rss.xml, /atom.xml.

T2: Google News RSS search

Searches news.google.com/rss/search?q={source_name}+{industry}. Extracts publisher domains from results, probes top 8 for native RSS feeds. No API key needed.

T3: Platform default sources

Falls back to the default_sources DB table: curated industry sources not already assigned to this client. Always available, always a working feed.
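
As an illustration of the first tier, the sketch below collects candidate feed URLs from a homepage's HTML using only the standard library. It is an assumption-laden outline: the names are hypothetical, fetching the page and probing each candidate are left out, and the real discovery code may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

COMMON_FEED_PATHS = ["/feed", "/rss", "/rss.xml", "/atom.xml"]
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (tag == "link"
                and (attr.get("rel") or "").lower() == "alternate"
                and (attr.get("type") or "").lower() in FEED_TYPES
                and attr.get("href")):
            self.hrefs.append(attr["href"])

def candidate_feed_urls(homepage_url: str, homepage_html: str = "") -> list[str]:
    """Advertised feeds first, then the common paths, de-duplicated in order."""
    parser = FeedLinkParser()
    parser.feed(homepage_html)
    candidates = [urljoin(homepage_url, href) for href in parser.hrefs]
    candidates += [urljoin(homepage_url, path) for path in COMMON_FEED_PATHS]
    return list(dict.fromkeys(candidates))
```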

URL deduplication

ingestion-worker/main.py Python
def normalize_url(url: str) -> str:
    """Strip UTM/tracking params so the same article isn't processed twice
    if it appears with different tracking params in different RSS feeds."""
    from urllib.parse import urlparse, urlencode, parse_qsl
    parsed = urlparse(url)
    # Keep only non-tracking query params (strip utm_*, fbclid, etc.)
    clean_params = [(k, v) for k, v in parse_qsl(parsed.query)
                    if not k.startswith(("utm_", "fbclid", "gclid", "ref"))]
    return parsed._replace(query=urlencode(clean_params)).geturl()

# URL hash is stored in news_items table — SHA256 of the normalized URL
url_hash = hashlib.sha256(normalize_url(article["url"]).encode()).hexdigest()

Intelligence Worker

File: services/intelligence-worker/main.py

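The brain of the platform. Consumes the news.filtered Redis Stream and orchestrates the full Gates 2–4 pipeline for each article. Uses Redis consumer groups so that each message is delivered to only one worker in the group and an unacknowledged message is never lost, even if the worker crashes and restarts mid-batch. Delivery is at-least-once, so a message can be processed again after a crash.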

Pipeline orchestration

intelligence-worker/main.py (simplified flow) Python
async def process_message(msg, client_id, article):
    client = get_client_profile(client_id)

    # 1. Gate 2 — Haiku relevance scoring (batched, prompt-cached)
    relevant = run_gate2(provider, [article], client, PROMPT_RELEVANCE)
    if not relevant:
        ack(msg); return

    # 2. Gate 3 — Signal detection (spread + Google Trends)
    signal = detect_signal(conn, article["title"], client["target_geo"])
    if signal.trend_score < GATE3_MIN_TREND_SCORE and signal.urgency == "normal":
        ack(msg); return

    # 3. Competitor analysis — what angles have competitors taken?
    comp = analyze_competitors(conn, article, client)
    # comp.avoid_angles = ["local_impact", "action_list"]
    # comp.trend_score_boost = 15 (first-mover bonus)

    # 4. Evidence pipeline
    enrichment = quick_enrich(article, client["keywords"])          # Tier 1: Serper
    evidence   = gather_evidence(article, enrichment, llm_haiku)    # Tier 2: deep pack

    # 5. Topic clustering — find related published articles for internal links
    cluster = get_cluster_links(conn, article, client_id)

    # 6. Gate 4 — Sonnet generation
    draft = run_gate4(
        provider=llm_sonnet,
        article=article,
        client=client,
        comp_analysis=comp,
        evidence_pack=evidence,
        cluster_links=cluster,
    )

    if draft.get("selected_angle") == "skip":
        ack(msg); return    # geo_not_impacted — skip silently

    # 7. Save to Postgres, publish to content.drafts stream
    save_draft(conn, draft, client_id)
    redis.xadd("content.drafts", {"draft_id": draft["id"], "client_id": client_id})
    ack(msg)  # ← critical: only ack AFTER successful save
Why consumer groups matter

With consumer groups, Redis tracks which messages have been delivered but not yet acknowledged (ACK'd). If the worker crashes between processing and ACK'ing, the message stays pending in Redis and can be read again once the worker restarts. No message is ever permanently lost, so the pipeline is crash-safe.

Draft limits per plan

Before Gate 4, the worker checks the client's daily and weekly draft limits (from PLAN_LIMITS_JSON in the ConfigMap). This prevents the pipeline from generating more content than the client can review.

Approval Service

File: services/approval-service/main.py

FastAPI web application that serves the client dashboard, admin panel, and email approval workflow. Clients never see raw AI output — everything goes through human approval first.

The approval workflow

1. Daily digest email

A CronJob triggers /send-digest each morning. For each client with pending drafts, an email is sent with approve/reject/edit links for each draft.

2. HMAC-signed links (no login required)

Each approve/reject/edit link contains an HMAC-SHA256 token. Clients can approve content from their email inbox without logging in. Links expire after 7 days.

3. Optional editing

The edit link opens a tabbed editor: blog post (with character counters), LinkedIn post. Clients can tweak the AI output before publishing.

4. Publishes to Redis Stream

On approval, the service writes to content.approved. Publisher-worker picks this up and distributes to WordPress, Dev.to, etc.

HMAC token format

main.py — HMAC approval links Python
import hashlib
import hmac
import os
import time

REVIEW_SECRET = os.environ["REVIEW_SECRET"]  # from K8s Secret

def create_review_token(draft_id: str, action: str) -> str:
    """Create a signed URL token. Format: {signature}:{action}:{expiry}"""
    expiry = int(time.time()) + 7 * 24 * 3600    # 7 days from now
    payload = f"{draft_id}:{action}:{expiry}"
    sig = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{sig}:{action}:{expiry}"

def verify_review_token(token: str, draft_id: str) -> tuple[bool, str]:
    """Verify a token from an email link. Returns (valid, action)."""
    try:
        sig, action, expiry = token.split(":")
        if int(expiry) < time.time():
            return False, ""     # expired

        payload = f"{draft_id}:{action}:{expiry}"
        expected = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()

        # Constant-time comparison — prevents timing attacks
        if not hmac.compare_digest(sig, expected):
            return False, ""    # tampered

        return True, action
    except Exception:
        return False, ""

Publisher Worker

File: services/publisher-worker/main.py

Consumes content.approved and distributes to all publishing platforms the client has configured. WordPress always publishes first — its URL becomes the canonical URL for all subsequent platforms.
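
The WordPress-first rule can be sketched as an ordering step followed by passing the resulting URL along. The platform identifiers ("wordpress", "devto") and the publish_fn callable are hypothetical placeholders for the real publisher classes.

```python
def publish_order(platforms: list[str]) -> list[str]:
    """WordPress first (if configured); other platforms keep their order."""
    # sorted() is stable and False sorts before True, so only WordPress moves.
    return sorted(platforms, key=lambda p: p != "wordpress")

def publish_all(platforms: list[str], publish_fn) -> dict[str, str]:
    """Publish in order, handing the WordPress URL to later platforms."""
    canonical_url = None
    results = {}
    for platform in publish_order(platforms):
        url = publish_fn(platform, canonical_url)
        results[platform] = url
        if platform == "wordpress":
            canonical_url = url   # becomes the canonical URL from here on
    return results
```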

Retry logic

publisher-worker/main.py Python
MAX_RETRIES  = 3
RETRY_DELAYS = [5, 15, 30]    # seconds — exponential-ish backoff

for attempt in range(MAX_RETRIES):
    try:
        result = publisher.publish(draft, config)
        # Record success in publications table
        record_publication(conn, draft_id, platform, "published", result["url"])
        break

    except Exception as e:
        if attempt == MAX_RETRIES - 1:
            # All retries exhausted — send to dead-letter queue
            redis.xadd("content.failed", {"draft_id": draft_id, "error": str(e)})
            record_publication(conn, draft_id, platform, "failed", error=str(e))
        else:
            time.sleep(RETRY_DELAYS[attempt])

WordPress publisher — Markdown to HTML

publishers/wordpress.py Python
def publish(self, draft: dict, config: dict) -> dict:
    blog = draft["blog"]
    body = blog["body_markdown"]

    # Strip the FAQ section from body — it's added separately as structured HTML
    # to prevent duplicates (once inline, once as schema markup at the bottom).
    body_without_faq = strip_faq_section(body)

    # Convert Markdown to HTML (using markdown library)
    html_body = markdown.markdown(
        body_without_faq,
        extensions=["fenced_code", "tables", "nl2br"]
    )

    # Append FAQ as structured HTML (better for Google FAQ rich results)
    if blog.get("faq_schema"):
        html_body += build_faq_html(blog["faq_schema"])

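    # WordPress REST API call
    response = requests.post(
        f"{config['site_url']}/wp-json/wp/v2/posts",
        auth=(config["username"], config["app_password"]),  # Application Passwords
        json={
            "title":   blog["title"],
            "slug":    blog["slug"],
            "content": html_body,
            "excerpt": blog.get("meta_description", ""),
            "status":  "publish",
            "categories": resolve_categories(config),
        },
        timeout=30,   # don't hang forever on an unresponsive site
    )
    # Raise on 4xx/5xx so the caller's retry loop sees the real failure
    # instead of a KeyError on the missing "link" field
    response.raise_for_status()
    return {"url": response.json()["link"]}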

Redis Streams — The Message Bus

Redis Streams are the backbone of inter-service communication. They're more than a pub/sub queue — they're a durable, ordered, consumer-group-aware log of events.

Why Streams instead of HTTP?

| Property | Direct HTTP calls | Redis Streams |
| --- | --- | --- |
| Crash safety | ❌ Request lost if receiver is down | ✅ Message waits until consumer is ready |
| At-least-once delivery | ❌ Manual retry logic needed | ✅ Built-in: unacked messages stay pending and can be re-read |
| Decoupling | ❌ Sender must know receiver's address | ✅ Services only know the stream name |
| Backpressure | ❌ Fast sender overwhelms slow receiver | ✅ Slow consumer naturally applies backpressure |
| Audit trail | ❌ No built-in history | ✅ Stream is an ordered log (inspectable) |

Consumer groups explained

How consumer groups work Python / Redis CLI
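# Create consumer group. This raises a BUSYGROUP ResponseError if the group
# already exists, so catch that error when running it on every startup.
redis.xgroup_create("news.filtered", "intelligence-workers", id="0", mkstream=True)

# Read NEW messages (">" means "messages never delivered to any consumer in this group")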
messages = redis.xreadgroup(
    groupname="intelligence-workers",
    consumername="worker-pod-1",
    streams={"news.filtered": ">"},
    count=10,
    block=5000,   # block for up to 5 seconds waiting for new messages
)

# Process each message...
for stream_name, msg_list in (messages or []):
    for msg_id, fields in msg_list:
        try:
            process(fields)
            # ACK only AFTER successful processing.
            # If this line is never reached (crash), the message stays in the
            # group's pending list and can be read again with ID "0" or claimed.
            redis.xack("news.filtered", "intelligence-workers", msg_id)
        except Exception as e:
            # Don't ACK on failure: the message stays pending for a later retry
            log.error("Processing failed: %s", e)
Pending Entries List (PEL)

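When a message is delivered but not yet ACK'd, Redis holds it in the Pending Entries List. If the worker crashes, these messages stay in the PEL. A read with the > ID returns only new messages, so after a restart a consumer retrieves its own pending entries by reading with ID 0, or another consumer takes them over with XCLAIM / XAUTOCLAIM. This is how the pipeline survives pod crashes with zero data loss.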
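
Reading a consumer's own pending entries uses the ID 0 in place of >. The sketch below shows one way to drain them at startup. It assumes a redis-py style client (xreadgroup and xack with the same arguments as in the example above); drain_pending and handler are hypothetical names, and this is not necessarily how the worker in this repo does it.

```python
def drain_pending(client, stream, group, consumer, handler, count=10):
    """Re-process this consumer's delivered-but-unacked messages, then ACK them."""
    handled = 0
    while True:
        # ID "0" returns entries from this consumer's PEL, not new messages.
        reply = client.xreadgroup(
            groupname=group,
            consumername=consumer,
            streams={stream: "0"},
            count=count,
        )
        entries = reply[0][1] if reply else []
        if not entries:
            return handled   # PEL is empty: safe to switch to ">"
        for msg_id, fields in entries:
            handler(fields)                      # an exception here stops the drain
            client.xack(stream, group, msg_id)   # ACK removes it from the PEL
            handled += 1
```

This only recovers messages held under the same consumer name. If pod names change across restarts, entries left by a dead consumer need to be claimed by another one, for example with XAUTOCLAIM.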

Inspecting streams from kubectl

Bash
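# How many entries does each stream hold? XLEN counts every stored entry,
# including ones already acknowledged, unless the stream has been trimmed.
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen news.filtered
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.drafts

# How many delivered-but-unacknowledged messages is the worker group holding?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xpending news.filtered intelligence-workers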

# How many messages are in the dead-letter queue?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.failed

# Inspect last 5 messages in a stream
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xrevrange news.filtered + - COUNT 5

Multi-Provider AI Architecture

File: services/intelligence-worker/providers/

Every AI call goes through a provider abstraction layer. The rest of the codebase calls provider.complete(system, user) — it never knows or cares which underlying model or API it's using.

The provider interface

providers/base.py Python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class LLMResponse:
    text:               str
    input_tokens:       int
    output_tokens:      int
    cache_read_tokens:  int = 0   # Anthropic prompt caching
    cache_write_tokens: int = 0

class LLMProvider(ABC):
    @property
    @abstractmethod
    def model_id(self) -> str: ...

    @abstractmethod
    def complete(
        self,
        system:       str,
        user:         str,
        max_tokens:   int  = 512,
        cache_system: bool = False,   # Anthropic-specific, ignored by others
    ) -> LLMResponse: ...

Provider factory — model name → provider instance

providers/factory.py Python
def get_provider(model: str) -> LLMProvider:
    """Return the correct provider instance based on model name prefix."""
    api_key_map = {
        "claude-":     ("ANTHROPIC_API_KEY", AnthropicProvider),
        "gpt-":        ("OPENAI_API_KEY",    OpenAIProvider),
        "o1-":         ("OPENAI_API_KEY",    OpenAIProvider),
        "o3-":         ("OPENAI_API_KEY",    OpenAIProvider),
        "gemini-":     ("GOOGLE_API_KEY",    GoogleProvider),
        "deepseek-":   ("DEEPSEEK_API_KEY",  DeepSeekProvider),
    }

    for prefix, (env_var, ProviderClass) in api_key_map.items():
        if model.startswith(prefix):
            api_key = os.environ.get(env_var)
            if not api_key:
                raise EnvironmentError(f"{env_var} not set for model {model!r}")
            return ProviderClass(api_key=api_key, model=model)

    raise ValueError(f"Unknown model: {model!r}")

Two layers of model selection — both editable with no restart

Every gate that calls an LLM resolves its provider through the same fallback chain. Layer 1 is a per-client override (clients.gate2_model / clients.gate4_model, set from each client's "LLM models" tab). Layer 2 is a platform-wide default, stored in platform_settings (default_gate2_model / default_gate4_model) and editable from /admin/quality-review. Layer 2 replaces what used to be a Helm-only, restart-required setting (RELEVANCE_MODEL / GENERATION_MODEL); that env var is now only the last-resort fallback, used when no platform_settings row exists yet.

intelligence-worker/main.py Python
def _resolve_default_model(conn, setting_key, env_default) -> str:
    """platform_settings override of the Helm env var — no restart needed."""
    with conn.cursor() as cur:
        cur.execute("SELECT value FROM platform_settings WHERE key = %s", (setting_key,))
        row = cur.fetchone()
    return row[0] if row else env_default

# Layer 2: admin-edited platform default, falling back to the Helm env var
gate2_default = _resolve_default_model(conn, "default_gate2_model", GATE2_MODEL)

# Layer 1: per-client override wins if set
try:
    gate2_provider = get_provider(client.get("gate2_model") or gate2_default)
except EnvironmentError as exc:
    # configured provider's API key missing — fall back to the Helm default
    log.warning("Gate2 provider key missing (%s) — falling back to %s", exc, GATE2_MODEL)
    gate2_provider = get_provider(GATE2_MODEL)
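
The precedence can be reduced to a single expression. The sketch below is not taken from the repo: pick_model is a hypothetical name, and it assumes that an empty string and None both mean "not set", which matches the `or` chain used above.

```python
from typing import Optional


def pick_model(client_override: Optional[str],
               platform_default: Optional[str],
               env_default: str) -> str:
    """Return the first configured model name, most specific first."""
    # Empty strings and None both count as "not set".
    return client_override or platform_default or env_default


# Per-client override wins over everything else.
print(pick_model("deepseek-chat", "claude-sonnet-4-6", "claude-haiku-4-5"))
# No override: the admin-edited platform default applies.
print(pick_model(None, "claude-sonnet-4-6", "claude-haiku-4-5"))
# Nothing configured yet: fall back to the Helm env var value.
print(pick_model(None, None, "claude-haiku-4-5"))
```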

One setting, multiple gates

| Setting | Tier | Gates it controls |
|---|---|---|
| gate2_model / default_gate2_model | cheap tier, Haiku by default | Gate 2 relevance scoring, Gate 3.5 ICP-fit, Gate 5 cheap_review (both rounds) |
| gate4_model / default_gate4_model | quality tier, Sonnet by default | Gate 4 generation, Gate 5 targeted_revision |

| Provider | Models | Env var required |
|---|---|---|
| Anthropic (default) | claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7 | ANTHROPIC_API_KEY |
| OpenAI | gpt-4o, gpt-4o-mini, o1-*, o3-* | OPENAI_API_KEY |
| Google | gemini-2.5-pro, etc. | GOOGLE_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | DEEPSEEK_API_KEY |

Admin UI: /admin/quality-review

The "LLM model defaults" card lets an admin change the platform-wide Gate 2/3.5/5-review and Gate 4/5-revision models in two dropdowns — takes effect on the next pipeline run, no pod restart. The per-client "LLM models" tab (on each client's profile) overrides these per-client, e.g. routing one budget client through deepseek-chat while everyone else uses Anthropic.

Prompt Caching — 80% Cost Saving

Anthropic's prompt caching lets you mark a portion of the prompt as "cache this". On subsequent API calls with the same cached prefix, Anthropic serves it from cache at ~10% of the normal input token cost.

How the AnthropicProvider implements it

providers/anthropic_provider.py Python
def complete(self, system: str, user: str, max_tokens: int = 512,
             cache_system: bool = False) -> LLMResponse:

    # Without caching: system is just a string
    # With caching: wrap it in a block with cache_control
    if cache_system:
        system_block = [{
            "type": "text",
            "text": system,
            "cache_control": {"type": "ephemeral"},  # ← this is the magic
        }]
    else:
        system_block = system   # plain string — no caching

    response = self._client.messages.create(
        model=self._model,
        max_tokens=max_tokens,
        system=system_block,
        messages=[{"role": "user", "content": user}],
    )

    return LLMResponse(
        text=response.content[0].text,
        input_tokens=response.usage.input_tokens,
        output_tokens=response.usage.output_tokens,
        # These fields tell you how the caching is performing:
        cache_read_tokens  = getattr(response.usage, "cache_read_input_tokens", 0) or 0,
        cache_write_tokens = getattr(response.usage, "cache_creation_input_tokens", 0) or 0,
    )

Why it saves ~80% for Gate 2

Token flow for 100 articles, 13 batches
# Batch 1: system prompt is WRITTEN to cache (billed at a 25% premium over normal input)
# cache_write_tokens = 800   (the client profile system prompt)
# input_tokens       = 1200  (the 8 article titles; cached tokens are reported separately)

# Batches 2–13: system prompt is READ from cache (10% of normal price)
# cache_read_tokens  = 800   (same system prompt, served from cache)
# input_tokens       = 1200  (only the 8 article titles)

# System prompt without caching: 13 × 800                       = 10,400 token-equivalents
# System prompt with caching:    800 × 1.25 + 12 × 800 × 0.10   =  1,960 token-equivalents
# Net saving: 8,440 token-equivalents, about 81% of the system-prompt cost
# At $0.00025/1K input tokens (Haiku): saves ~$0.002/run/client
# Across 365 days at one run per day: ~$0.77/year/client from caching alone
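
The arithmetic can be checked with a short sketch. The function name is hypothetical, and the multipliers are assumptions taken from Anthropic's published pricing for the default cache at the time of writing: 1.25× the base input price for a cache write and 0.10× for a cache read.

```python
def system_prompt_cost(batches: int, system_tokens: int,
                       write_multiplier: float = 1.25,
                       read_multiplier: float = 0.10) -> tuple[float, float]:
    """Return (uncached, cached) system-prompt cost in token-equivalents."""
    uncached = batches * system_tokens
    # The first batch writes the cache; every later batch reads it.
    cached = (system_tokens * write_multiplier
              + (batches - 1) * system_tokens * read_multiplier)
    return uncached, cached


uncached, cached = system_prompt_cost(batches=13, system_tokens=800)
saving = 1 - cached / uncached
print(f"uncached={uncached:.0f} cached={cached:.0f} saving={saving:.0%}")
# uncached=10400 cached=1960 saving=81%
```

The roughly 80% figure applies to the system-prompt portion only. The 1,200 article-title tokens in each batch are billed at the normal rate whether caching is on or off.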
When to use cache_system=True

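Only when the system prompt is identical across multiple calls made within the cache lifetime (five minutes by default, refreshed on each hit). Gate 2 qualifies: the same client profile is repeated across 13 batches. Gate 4 doesn't cache its system prompt because it varies per article (different evidence pack, different competitor context). Anthropic also skips caching for prefixes shorter than a model-specific minimum (1,024 tokens or more, depending on the model), so confirm that cache_write_tokens is non-zero on the first batch; the 800-token figure above is illustrative.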

Auth & Security

File: services/approval-service/auth.py

The auth system has zero external dependencies — no PyJWT, no authlib. Everything is implemented with Python's standard library. This keeps the container image lean and eliminates supply-chain risk from auth libraries.

JWT implementation from scratch

auth.py — HS256 JWT (no PyJWT) Python
# JWT format: base64url(header) . base64url(payload) . base64url(signature)

def _b64(data: bytes) -> str:
    # URL-safe base64 with "=" padding stripped (JWT spec requires unpadded)
    return urlsafe_b64encode(data).rstrip(b"=").decode()

def create_jwt(client_id: str, email: str) -> str:
    header  = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({
        "client_id": client_id,
        "email":     email,
        "exp":       int(time.time()) + JWT_EXPIRY_MINS * 60,
        "iat":       int(time.time()),
    }).encode())

    signing_input = f"{header}.{payload}"
    sig = _b64(hmac.new(
        JWT_SECRET.encode(),
        signing_input.encode(),
        hashlib.sha256,
    ).digest())

    return f"{signing_input}.{sig}"

def decode_jwt(token: str) -> Optional[dict]:
    try:
        header, payload, sig = token.split(".")
        signing_input = f"{header}.{payload}"
        expected = _b64(hmac.new(JWT_SECRET.encode(), signing_input.encode(), hashlib.sha256).digest())

        # Constant-time comparison — prevents timing side-channel attacks
        if not hmac.compare_digest(sig, expected):
            return None   # tampered token

        data = json.loads(_unb64(payload))
        if data.get("exp", 0) < time.time():
            return None   # expired

        return data
    except Exception:
        return None       # never raises — bad tokens always return None
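
The following self-contained sketch exercises the same sign-and-verify logic end to end, including a forged token. Several pieces are assumptions rather than repo code: the JWT_SECRET and JWT_EXPIRY_MINS values are placeholders, `_sign` is a hypothetical helper added to avoid repetition, and `_unb64` (called in auth.py but not shown above) is one plausible implementation.

```python
import hashlib
import hmac
import json
import time
from base64 import urlsafe_b64decode, urlsafe_b64encode
from typing import Optional

JWT_SECRET = "example-secret-not-for-production"  # placeholder value
JWT_EXPIRY_MINS = 60                               # placeholder value


def _b64(data: bytes) -> str:
    return urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    # Assumed implementation: restore the "=" padding that _b64 stripped.
    return urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    digest = hmac.new(JWT_SECRET.encode(), signing_input.encode(), hashlib.sha256).digest()
    return _b64(digest)


def create_jwt(client_id: str, email: str) -> str:
    now = int(time.time())
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"client_id": client_id, "email": email,
                               "exp": now + JWT_EXPIRY_MINS * 60, "iat": now}).encode())
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_sign(signing_input)}"


def decode_jwt(token: str) -> Optional[dict]:
    try:
        header, payload, sig = token.split(".")
        if not hmac.compare_digest(sig, _sign(f"{header}.{payload}")):
            return None
        data = json.loads(_unb64(payload))
        return data if data.get("exp", 0) >= time.time() else None
    except Exception:
        return None


token = create_jwt("client-123", "owner@example.com")
claims = decode_jwt(token)

# Forgery attempt: swap in another tenant's payload but keep the old signature.
head, body, sig = token.split(".")
forged_body = _b64(json.dumps({"client_id": "client-999", "exp": 9999999999}).encode())
forged = f"{head}.{forged_body}.{sig}"

print(claims["client_id"], decode_jwt(forged))
```

The forged token is rejected because the signature covers the header and payload together, so changing either one invalidates it.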

Password hashing

auth.py — PBKDF2 password hashing Python
def hash_password(password: str) -> str:
    """Hash a password for storage. Format: {hex_salt}:{hex_hash}"""
    salt = secrets.token_hex(16)   # 16 bytes of cryptographic randomness
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return f"{salt}:{h.hex()}"

def _verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return hmac.compare_digest(h.hex(), hex_hash)  # constant-time

# 260,000 iterations: ~100ms on a modern CPU.
# This means an attacker can only try ~10 passwords/second per core.
# bcrypt would also work — PBKDF2 is chosen because it's in Python stdlib (no dependency).

Timing attack prevention

auth.py — dummy hash for non-existent users Python
_DUMMY_HASH = "dummy:000000000000000000000000000000000000000000000000000000000000000"

def _dummy_hash_check(password: str) -> None:
    """Run a full PBKDF2 computation even for non-existent users.

    Without this: login for unknown@email.com returns in 1ms (DB miss).
                  login for real@email.com returns in 100ms (hash computed).
    An attacker measures the difference to discover which emails are registered.

    With this: both paths take ~100ms regardless. Side channel eliminated.
    """
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", 260_000)
    # Result is discarded — we only run this for its timing effect
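
A sketch of where the dummy check sits in a login flow is shown below. The authenticate function and the in-memory users dict are hypothetical; the real service looks users up in the database. The hashing helpers repeat the logic from the blocks above so the snippet runs on its own.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 260_000


def hash_password(password: str) -> str:
    salt = secrets.token_hex(16)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return f"{salt}:{h.hex()}"


def verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return hmac.compare_digest(h.hex(), hex_hash)


def dummy_hash_check(password: str) -> None:
    # Same work factor as a real verification; the result is thrown away.
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", ITERATIONS)


def authenticate(users: dict[str, str], email: str, password: str) -> bool:
    """Hypothetical login check: `users` maps email -> stored hash."""
    stored = users.get(email)
    if stored is None:
        dummy_hash_check(password)   # unknown email still costs one PBKDF2 run
        return False
    return verify_password(password, stored)


users = {"real@email.com": hash_password("correct horse")}
print(authenticate(users, "real@email.com", "correct horse"))   # True
print(authenticate(users, "real@email.com", "wrong"))           # False
print(authenticate(users, "unknown@email.com", "anything"))     # False
```

Both failure paths run exactly one PBKDF2 computation, so the response time does not reveal whether the email exists.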

Cookie security

auth.py — secure cookie configuration Python
response.set_cookie(
    key      = "ci_session",
    value    = token,
    httponly = True,       # JS cannot read this cookie — protects against XSS token theft
    secure   = True,       # browser only sends it over HTTPS — prevents network sniffing
    samesite = "lax",      # not sent on cross-site POSTs or subrequests, only on top-level navigations: CSRF protection
    max_age  = JWT_EXPIRY_MINS * 60,
)

Security summary

| Threat | Mitigation |
|---|---|
| XSS token theft | httponly=True on JWT cookie |
| Network sniffing | secure=True (HTTPS only) |
| CSRF | samesite="lax" + form tokens |
| Timing attacks (login) | Dummy PBKDF2 hash for missing users |
| Timing attacks (comparison) | hmac.compare_digest everywhere |
| Password cracking | PBKDF2-SHA256, 260k iterations, per-user salt |
| Forged approval links | HMAC-SHA256 signed, 7-day expiry |
| Cross-tenant data leak | client_id from JWT only, never from request body |

Multi-tenancy — Tenant Isolation

Every table with client data has a client_id UUID column. The critical rule: client_id is sourced from the signed JWT only. It is never trusted from the request body.

The middleware pattern

How client_id is enforced in every route Python
# Every protected route extracts client_id from the JWT:
@router.get("/dashboard")
def dashboard(request: Request):
    client = require_client(request)         # decodes JWT, returns payload dict
    client_id = client["client_id"]          # ← from the verified JWT payload, not request params

    # All DB queries are scoped to this client_id
    drafts = get_drafts(conn, client_id)     # SELECT ... WHERE client_id = %s
    return render("dashboard.html", drafts)

# What an attacker CANNOT do:
# GET /dashboard?client_id=  → client_id from URL is IGNORED
# POST /approve with body {"client_id": "..."}  → body client_id is IGNORED
# The client_id is read exclusively from the signed cookie/Bearer JWT.
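
The rule can be illustrated without a web framework. Everything in this sketch is hypothetical (resolve_client_id, Unauthorized, and fake_decode, which stands in for decode_jwt); only the ci_session cookie name comes from the code above. It covers the cookie path only, not Bearer tokens.

```python
from typing import Callable, Optional


class Unauthorized(Exception):
    """Hypothetical error raised when no valid session is present."""


def resolve_client_id(cookies: dict, query: dict, body: dict,
                      decode: Callable[[str], Optional[dict]]) -> str:
    """Return the tenant id from the session token, ignoring caller-supplied ids."""
    claims = decode(cookies.get("ci_session", ""))
    if claims is None:
        raise Unauthorized("missing or invalid session")
    # query and body are deliberately never consulted for client_id.
    return claims["client_id"]


def fake_decode(token: str) -> Optional[dict]:
    # Stand-in for decode_jwt: accepts exactly one known token.
    return {"client_id": "tenant-a"} if token == "valid-token" else None


tenant = resolve_client_id(
    cookies={"ci_session": "valid-token"},
    query={"client_id": "tenant-b"},   # attacker-supplied, ignored
    body={"client_id": "tenant-c"},    # attacker-supplied, ignored
    decode=fake_decode,
)
print(tenant)   # tenant-a
```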

Database schema (tenant isolation)

migrations/001_initial.sql (simplified) SQL
-- One row per tenant
CREATE TABLE clients (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    business_name TEXT NOT NULL,
    email         TEXT UNIQUE NOT NULL,
    industry_type TEXT,
    target_geo    JSONB,
    keywords      JSONB,
    active        BOOLEAN DEFAULT TRUE
);

-- All tenant-scoped tables have client_id FK
CREATE TABLE content_drafts (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id     UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
    title         TEXT,
    body_markdown TEXT,
    status        TEXT DEFAULT 'pending',    -- pending / approved / rejected / published
    created_at    TIMESTAMPTZ DEFAULT NOW()
);

-- news_items is SHARED across all clients (Gate 1 runs once per article)
-- client_relevance maps articles to clients (Gate 2 runs per-client)
CREATE TABLE client_relevance (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id       UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
    news_item_id    UUID NOT NULL REFERENCES news_items(id),
    relevance_score INT,
    processed       BOOLEAN DEFAULT FALSE
);
The Critical Isolation Invariant

One bug that trusts client_id from the request body instead of the JWT could leak every tenant's data to any other tenant. The enforcement is at the application layer — there's no database-level row security (yet). Every developer must follow the pattern: always get client_id from the decoded JWT.

Evidence Pipeline

Before Gate 4 runs, the platform searches for real supporting evidence so Claude can cite actual sources and verified figures rather than vague, hedged claims.

Tier 1 — Quick Serper enrichment

A short Serper API search on {article title} + {top 3 client keywords} returns up to 5 snippets, which are injected into Gate 4 as lightweight context. The step degrades to [] if SERPER_API_KEY is not set or the request fails.

stages/enrichment.py Python
def quick_enrich(article: dict, keywords: list[str]) -> list[dict]:
    """Fetch 5 Serper snippets for evidence context. Degrades gracefully."""
    api_key = os.environ.get("SERPER_API_KEY")
    if not api_key:
        return []    # feature disabled — Gate 4 runs without enrichment

    query = f"{article['title']} {' '.join(keywords[:3])}"
    try:
        resp = requests.post(
            "https://google.serper.dev/search",
            headers={"X-API-KEY": api_key},
            json={"q": query, "num": 5},
            timeout=10,
        )
        results = resp.json().get("organic", [])
        return [{"title": r["title"], "snippet": r["snippet"],
                 "url": r["link"], "source": r.get("displayLink")}
                for r in results]
    except Exception:
        return []    # any error → degrade gracefully, never block Gate 4

Tier 2 — Deep Haiku evidence gathering

When enabled, Haiku runs 5–10 targeted searches, fetches and strips HTML from source pages, then classifies each source and extracts claims with confidence levels.

stages/evidence_gathering.py — what it produces Python
# Haiku produces a structured evidence pack:
evidence_pack = {
    "verified_claims": [
        {
            "claim":      "The RBA raised rates by 25bps to 4.35%",
            "source":     "RBA official statement",
            "confidence": "high",
            "safe_phrasing": "According to the RBA's official statement...",
        }
    ],
    "claims_to_avoid": [
        {
            "claim":  "Rates will fall by end of 2024",
            "reason": "Prediction without verifiable source"
        }
    ],
    "recommended_references": [
        {"title": "RBA Rate Decision — May 2026", "url": "https://rba.gov.au/..."}
    ],
    "source_classifications": [
        {"url": "...", "type": "government_or_regulator", "allowed_to_use": True}
    ]
}

# This entire pack is injected into the Gate 4 system prompt.
# Gate 4 is instructed: "Only use statistics and dates from VERIFIED CLAIMS.
#                        Never use anything in CLAIMS TO AVOID."
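
How the pack becomes prompt text is not shown in this section. The sketch below is one possible rendering, assuming a plain bulleted layout under the two section names the Gate 4 instruction refers to; render_evidence is a hypothetical name, not the repo's function.

```python
def render_evidence(pack: dict) -> str:
    """Hypothetical formatter: turn an evidence pack into prompt text."""
    lines = ["VERIFIED CLAIMS:"]
    for item in pack.get("verified_claims", []):
        lines.append(f"- {item['claim']} (source: {item['source']}, "
                     f"confidence: {item['confidence']})")
    lines.append("CLAIMS TO AVOID:")
    for item in pack.get("claims_to_avoid", []):
        lines.append(f"- {item['claim']} (reason: {item['reason']})")
    return "\n".join(lines)


pack = {
    "verified_claims": [{
        "claim": "The RBA raised rates by 25bps to 4.35%",
        "source": "RBA official statement",
        "confidence": "high",
    }],
    "claims_to_avoid": [{
        "claim": "Rates will fall by end of 2024",
        "reason": "Prediction without verifiable source",
    }],
}
prompt_block = render_evidence(pack)
print(prompt_block)
```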

Content Angles — The Core Differentiator

Competitors rewrite news. This platform generates opinionated, differentiated content. Claude selects the best angle for each article+client combination from 8 options — and avoids angles that competitors have already taken.

local_impact
News that directly affects the client's geography. NOT for global events.
action_list
News requiring immediate business response from the reader.
contrarian
The consensus headline misses a deeper or opposite implication.
faq_explainer
Complex topic the client's audience needs to understand clearly.
educational
Complex concept for a non-expert audience — explain from first principles.
expert_commentary
High-profile event — positions the client as an industry authority.
emotional_hook
News with direct personal or financial impact on the reader.
opinionated
Significant event — a bold take that builds client authority and recall.

Competitor angle avoidance

stages/competitor_analysis.py Python
# Competitor angles are inferred from title patterns — no AI cost
ANGLE_PATTERNS = {
    "local_impact":       [r"what .+ means for .+", r"how .+ affects .+"],
    "action_list":        [r"\d+ things? (to|you should)", r"what (to do|businesses should)"],
    "contrarian":         [r"why .+ (is|might be) wrong", r"the truth about"],
    "faq_explainer":      [r"everything you need to know", r"what is .+ and why"],
    "expert_commentary":  [r"why .+ matters", r"what .+ means for the industry"],
}

def infer_competitor_angle(title: str) -> Optional[str]:
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None

# Result: competitor analysis returns avoid_angles = ["local_impact", "faq_explainer"]
# These are injected into the Gate 4 prompt:
# "DO NOT use local_impact — Competitor X already published that angle.
#  DO NOT use faq_explainer — Competitor Y already published that angle.
#  Choose a different angle that provides unique value."
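
The sketch below runs the same pattern table over a few made-up competitor titles to show how an avoid list could be built. The avoid_angles helper and the sample titles are hypothetical; ANGLE_PATTERNS and infer_competitor_angle are repeated from above so the snippet is runnable.

```python
import re
from typing import Optional

ANGLE_PATTERNS = {
    "local_impact":       [r"what .+ means for .+", r"how .+ affects .+"],
    "action_list":        [r"\d+ things? (to|you should)", r"what (to do|businesses should)"],
    "contrarian":         [r"why .+ (is|might be) wrong", r"the truth about"],
    "faq_explainer":      [r"everything you need to know", r"what is .+ and why"],
    "expert_commentary":  [r"why .+ matters", r"what .+ means for the industry"],
}


def infer_competitor_angle(title: str) -> Optional[str]:
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None


def avoid_angles(competitor_titles: list[str]) -> list[str]:
    """Hypothetical helper: unique inferred angles, in first-seen order."""
    seen: list[str] = []
    for title in competitor_titles:
        angle = infer_competitor_angle(title)
        if angle and angle not in seen:
            seen.append(angle)
    return seen


titles = [
    "What the rate rise means for Brisbane home buyers",
    "Everything you need to know about the cash rate",
    "What the rate rise means for the industry",
    "RBA lifts cash rate",
]
print(avoid_angles(titles))   # ['local_impact', 'faq_explainer']
```

Note that the third title is classified as local_impact, not expert_commentary. Dictionaries iterate in insertion order and the first match wins, so the broader "what .+ means for .+" pattern shadows "what .+ means for the industry".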

H1 title rules (hardcoded in the prompt)

Gate 4 system prompt — title constraints
H1_RULES = """
H1 TITLE RULES (mandatory):
- Must be a question format: How / Why / What / When / Should / Can / Is
- Primary keyword must appear within the first 8 words
- Include "2026" for informational, how-to, FAQ, and local SEO articles

FORBIDDEN TITLE PATTERNS (never use these):
- "5 things", "5 steps", "10 ways" (numbered lists)
- "Understanding [topic]"
- "Everything you need to know about"
- "changes everything" / "ultimate guide"
- "What X needs to know" (where X = the reader's role)
"""

# Examples of good titles:
# "Why should Australian mortgage brokers rethink fixed rates in 2026?"
# "How does the RBA cash rate affect first-home buyers in Brisbane in 2026?"
# "How should cybersecurity teams respond to the new APRA ruling in 2026?"
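
The mechanical parts of these rules can also be checked in code after generation. The sketch below is a hypothetical linter, not something the repo is stated to contain: title_problems, QUESTION_STARTS, and the regular expressions are illustrative approximations of the prompt rules. It does not check the keyword-position or "2026" rules, which need per-article context.

```python
import re

QUESTION_STARTS = ("how", "why", "what", "when", "should", "can", "is")
FORBIDDEN = [
    r"\b\d+ (things|steps|ways)\b",
    r"^understanding\b",
    r"everything you need to know about",
    r"changes everything",
    r"ultimate guide",
    r"^what .+ needs? to know",
]


def title_problems(title: str) -> list[str]:
    """Hypothetical linter: list the H1 rules a title breaks (empty = passes)."""
    lower = title.lower().strip()
    problems = []
    # A question must open with one of the allowed words and end with "?".
    if lower.split(" ", 1)[0] not in QUESTION_STARTS or not lower.endswith("?"):
        problems.append("not a question")
    problems += [f"forbidden pattern: {p}" for p in FORBIDDEN if re.search(p, lower)]
    return problems


good = title_problems("Why should Australian mortgage brokers rethink fixed rates in 2026?")
bad = title_problems("10 ways the RBA decision changes everything")
print(good)       # []
print(len(bad))   # 3
```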
Why these specific forbidden patterns?

"5 things" articles are commoditised — every content farm produces them. "Understanding X" signals generic educational content, not actionable expert advice. "Everything you need to know" is overused and Google's helpful content guidelines penalise these patterns. The question-format rule is based on SEO data showing that question-format titles consistently outperform declarative titles for featured snippets.