Platform Learning Guide

Content Intelligence
Platform

A multi-tenant SaaS that monitors real-time industry news, detects high-impact events, and generates SEO-optimised content before competitors react. Built on Kubernetes with Claude AI at its core.

Cost gates: 4
AI cost / client / month: $2.55
News → Published: ~4h
Drafts / client / day: 3–5

Why this platform exists

Most businesses know they should publish about industry news. They don't — because the workflow is broken:

1. News breaks at 9am
   A reporter reads about it. Maybe.

2. Writer assigned at 11am
   They research, draft, edit. Takes 3–4 hours.

3. Approval chain: next day
   Manager reviews. Revisions. Legal check. More revisions.

4. Published 48+ hours later
   Google has already indexed 50 competitor articles. The SEO window is closed.

The Insight

The SEO advantage belongs to whoever publishes first with quality content. Every hour of delay is lost organic ranking opportunity. This platform compresses the window from 48+ hours to under 4 hours.

What the platform compresses

News breaks: RSS / Google News
Detected & scored: < 5 minutes
Draft generated: < 2 minutes
Human approves: 1 click
Live on WordPress: seconds

System Architecture

Four microservices communicate via Redis Streams, a durable, ordered message log. No service calls another directly over HTTP, so if one crashes the others keep running and its unread messages wait in the stream until it restarts.

Python ingestion-worker
  • Polls RSS feeds every 2 minutes
  • Runs Gate 1 (zero-cost rules)
  • Deduplicates by URL hash
  • Quarantines broken sources
  • Publishes to news.filtered
Python intelligence-worker
  • Consumes news.filtered
  • Runs Gates 2, 3, 4
  • Competitor gap analysis
  • Evidence gathering (Serper)
  • Publishes to content.drafts
FastAPI approval-service
  • Web dashboard for clients
  • Email digest with approve/reject links
  • Admin panel for operators
  • JWT auth (cookie + Bearer)
  • Publishes to content.approved
Python publisher-worker
  • Consumes content.approved
  • WordPress REST API (Markdown→HTML)
  • Dev.to API (native Markdown)
  • Retry with exponential backoff
  • Records to publications table

Redis Stream topology

news.filtered: ingestion-worker produces, intelligence-worker consumes
content.drafts: intelligence-worker produces, approval-service notified
content.approved: approval-service produces, publisher-worker consumes
news.failed: dead letter for ingestion errors (replay with scripts/replay-failed.py)
content.failed: dead letter for generation/publishing errors

Design Principle: No Direct HTTP Between Services

Services never call each other's HTTP endpoints. Everything goes through Redis Streams. If intelligence-worker crashes, ingestion-worker keeps writing to the stream. When intelligence-worker restarts, it resumes exactly where it left off — no messages lost.

Why the 4-Gate Funnel Exists

Claude Sonnet costs ~$0.015 per content generation call. Without filtering, running every ingested article through Sonnet would cost ~$450/client/month. With the funnel: $2.55/client/month.

Articles in | Gate | Drops | Cost
1,000/day | Gate 1 (Rules Engine) | ~90% | $0.00
~100/day | Gate 2 (Haiku) | ~80% | $0.0001 per article
~20/day | Gate 3 (Signal) | ~75% | $0.00
~5/day | Gate 4 (Sonnet) | (generates drafts) | $0.015 per call

Gate | Method | Monthly volume | Cost per item | Monthly cost
Gate 1 (Rules) | Pure Python | 30,000 articles | $0.00 | $0.00
Gate 2 (Haiku) | Claude Haiku (batched) | 3,000 articles | $0.0001 | $0.30
Gate 3 (Signal) | Postgres + pytrends | 600 articles | $0.00 | $0.00
Gate 4 (Sonnet) | Claude Sonnet | 150 articles | $0.015 | $2.25
Total | | | | ~$2.55

Key Design Rule

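No article should reach a paid call that a cheaper check could have rejected: you never spend $0.015 on something that a $0.00 rule would have caught. The gates run roughly from cheapest to most expensive. The one exception is Gate 3, which has no per-call fee but runs after Gate 2, since its Trends lookup uses the client's matched keywords.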
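
The funnel arithmetic can be checked with a short sketch. It assumes a 30-day month and uses the pass rates and per-item costs from the table above; the function and constant names are illustrative and do not come from the platform's code.

```python
ARTICLES_PER_DAY = 1_000
DAYS_PER_MONTH = 30          # assumption: a 30-day month, as in the table

# (gate name, fraction of incoming items that pass, cost per item in USD)
GATES = [
    ("Gate 1 rules",  0.10, 0.0),
    ("Gate 2 Haiku",  0.20, 0.0001),
    ("Gate 3 signal", 0.25, 0.0),
    ("Gate 4 Sonnet", 1.00, 0.015),
]


def monthly_cost(articles_per_day, gates, days=DAYS_PER_MONTH):
    """Charge each gate for the volume that reaches it, then shrink the volume."""
    volume = articles_per_day * days
    rows, total = [], 0.0
    for name, pass_rate, cost_per_item in gates:
        cost = volume * cost_per_item
        rows.append((name, round(volume), round(cost, 2)))
        total += cost
        volume *= pass_rate
    return round(total, 2), rows


total, rows = monthly_cost(ARTICLES_PER_DAY, GATES)
# Cost if every ingested article went straight to Sonnet:
unfiltered = round(ARTICLES_PER_DAY * DAYS_PER_MONTH * 0.015, 2)

for name, volume, cost in rows:
    print(f"{name:<14} {volume:>6} items  ${cost:.2f}")
print(f"funnel total ${total:.2f} vs unfiltered ${unfiltered:.2f}")
```

Changing a single pass rate in GATES shows how sensitive the monthly total is to each gate.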

Gate 1 — Rules Engine

File: services/ingestion-worker/stages/gate1_rules.py

Zero-cost filter. Runs entirely in Python — no database queries, no API calls, no network. Eliminates ~90% of articles before any AI is touched. Six rules run in order; the first failure short-circuits (fast reject).

1. Minimum content length
   Drop articles with fewer than 50 chars of title+summary. These are usually empty RSS entries, tracking pixels, or fetcher errors with no useful content.

2. Source trust score
   Each source has a trust_score (0.0–1.0) set by the admin. Sources below 0.4 are dropped. Prevents spam aggregators from polluting the pipeline.

3. Recency check
   Drop articles older than 48 hours (configurable). Old news generates low-value content and hurts SEO freshness signals. Compares published_at against a UTC cutoff.

4. Hard exclusions
   If the article matches any of the client's excluded topics, drop it, even if a keyword also matches. Example: a mortgage broker has "security" as a keyword but "gaming" excluded. "Gaming security" gets dropped.

5. Urgency override
   Breaking news bypasses the keyword check entirely. If the title contains "breaking", "emergency", etc., it passes Gate 1 regardless of keyword match. Real-time events shouldn't wait for keyword list tuning.

6. Keyword match
   The article must contain at least one of the client's configured keywords. Case-insensitive substring search. The last and most expensive check, only reached if all previous rules passed.

The actual code

gate1_rules.py Python
from datetime import datetime, timedelta, timezone


class Gate1Rules:
    def __init__(self, min_content_length=50, source_trust_min=0.4,
                 max_age_hours=48, urgency_keywords=None):
        # Store the thresholds; check() reads them on every call.
        self.min_content_length = min_content_length
        self.source_trust_min = source_trust_min
        self.max_age_hours = max_age_hours
        # Lowercase and de-duplicate the urgency keywords once, up front,
        # so the hot path does no per-call normalisation.
        self.urgency_keywords = set(k.lower() for k in (urgency_keywords or []))

    def check(self, item, keywords, excluded):
        text = f"{item.get('title', '')} {item.get('summary', '')}".strip()

        # Rule 1: minimum content length
        if len(text) < self.min_content_length:
            return False, "too_short"

        # Rule 2: source trust score, fail-open (default 1.0 if missing)
        if item.get("trust_score", 1.0) < self.source_trust_min:
            return False, "low_trust_source"

        # Rule 3: recency check
        pub = item.get("published_at")
        if pub:
            if pub.tzinfo is None:           # naive datetimes are treated as UTC
                pub = pub.replace(tzinfo=timezone.utc)
            cutoff = datetime.now(timezone.utc) - timedelta(hours=self.max_age_hours)
            if pub < cutoff:
                return False, "stale"

        text_lower = text.lower()            # compute once, use below

        # Rule 4: hard exclusions (checked BEFORE keyword match)
        for exc in (excluded or []):
            if exc.lower() in text_lower:
                return False, f"excluded:{exc}"

        # Rule 5: urgency override, breaking news bypasses the keyword check
        if self.urgency_keywords and any(kw in text_lower for kw in self.urgency_keywords):
            return True, "urgency_override"

        # Rule 6: keyword match, the core relevance gate
        if not any(kw.lower() in text_lower for kw in (keywords or [])):
            return False, "no_keyword_match"

        return True, "passed"

Why rules run in this specific order

Length and trust checks come first because they need no substring scanning of the article text. Exclusions run before keywords so that a forbidden topic can't slip through on a keyword match. The urgency override sits after exclusions, so even breaking news is dropped if it matches an exclusion.
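
The effect of that ordering can be seen in a standalone sketch of rules 4 to 6 only. The function name topic_rules, the sample headlines, and the default urgency keywords are illustrative assumptions; this is not the platform's Gate1Rules class.

```python
def topic_rules(text, keywords, excluded, urgency_keywords=("breaking", "emergency")):
    """Rules 4-6 in order: exclusions, then urgency override, then keyword match."""
    text_lower = text.lower()
    for exc in excluded:
        if exc.lower() in text_lower:
            return False, f"excluded:{exc}"
    if any(kw in text_lower for kw in urgency_keywords):
        return True, "urgency_override"
    if any(kw.lower() in text_lower for kw in keywords):
        return True, "passed"
    return False, "no_keyword_match"


# The mortgage broker example: "security" is a keyword, "gaming" is excluded.
keywords, excluded = ["security"], ["gaming"]
results = {
    "keyword_and_exclusion": topic_rules("Gaming security flaw found", keywords, excluded),
    "urgent_without_keyword": topic_rules("Breaking: central bank cuts rates", keywords, excluded),
    "urgent_but_excluded": topic_rules("Breaking: gaming giant hacked", keywords, excluded),
    "unrelated": topic_rules("Local bakery opens a new shop", keywords, excluded),
}
for case, outcome in results.items():
    print(case, outcome)
```

Swapping the first two checks would let the "urgent_but_excluded" headline through, which is the case the ordering is meant to prevent.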

Gate 2 — Haiku Relevance Scoring

File: services/intelligence-worker/stages/gate2_relevance.py

Uses Claude Haiku (the cheapest Anthropic model) to score article relevance on a 0–100 scale. Three cost optimisations make this viable at scale: batching, prompt caching, and minimal input.

Three cost optimisations

1. Batching

8 articles per API call. Instead of 100 calls for 100 articles, you make 13 calls. The model scores all 8 articles in one response using index-based JSON.

2. Prompt Caching

The system prompt (client profile) is identical across all batches in a run. Anthropic caches it after the first call. ~80% cost saving on subsequent calls.

3. Minimal Input

Only title + first 200 chars of summary are sent to the model. Not the full article — just enough context for relevance scoring.

Per-client model override

Each client can use a different Gate 2 model (GPT-4o-mini, Gemini Flash, DeepSeek). The provider abstraction makes this transparent — same interface regardless of provider.

How batching works

gate2_relevance.py Python
import json

BATCH_SIZE = 8    # 8 articles per API call (sweet spot; larger batches confuse indexing)
MIN_SCORE  = 60   # articles below this score are dropped

def _score_batch(provider, articles, client_profile, prompt_template):
    # Build the batch text — each article gets an index number.
    # Haiku uses these indexes in its JSON response: {"scores": [{"index": 0, "score": 78, ...}]}
    articles_text = "\n---\n".join([
        f"[{i}] TITLE: {a['title']}\nSUMMARY: {(a.get('summary') or '')[:200]}"
        for i, a in enumerate(articles)
    ])

    # The system prompt contains the CLIENT'S PROFILE — same for all batches in a run.
    # Caching this is the key cost saving.
    system_prompt = prompt_template.format(
        industry_type    = client_profile.get("industry_type", "general"),
        target_geo       = ", ".join(client_profile.get("target_geo") or ["global"]),
        keywords         = ", ".join(client_profile.get("keywords") or []),
        excluded_topics  = ", ".join(client_profile.get("excluded_topics") or []),
    )

    # cache_system=True adds cache_control to the system prompt.
    # After the first API call, Anthropic serves the system prompt from cache.
    response = provider.complete(
        system=system_prompt,
        user=f'Score each article 0-100. Return JSON: {{"scores": [{{"index": 0, "score": N, "matched_keywords": []}}]}}\n\n{articles_text}',
        max_tokens=256,
        cache_system=True,   # ← the magic flag
    )

    result = json.loads(response.text)
    scores = {s["index"]: s for s in result.get("scores", [])}

    # Filter: keep only articles above MIN_SCORE threshold
    passed = []
    for i, article in enumerate(articles):
        score = scores.get(i, {}).get("score", 0)
        if score >= MIN_SCORE:
            article["relevance_score"]  = score
            article["matched_keywords"] = scores[i].get("matched_keywords", [])
            passed.append(article)
    return passed

def run_gate2(provider, articles, client_profile, prompt_template):
    passed = []
    # Iterate in steps of BATCH_SIZE: 0, 8, 16, 24, ...
    for i in range(0, len(articles), BATCH_SIZE):
        batch = articles[i:i + BATCH_SIZE]
        passed.extend(_score_batch(provider, batch, client_profile, prompt_template))
    return passed

Token cost breakdown for 100 articles/day

Without batching: 100 calls, each paying full price for the system prompt.
With batching and caching: 13 calls. The first writes the system prompt to the cache; the other 12 read it back at a fraction of the normal input price, for a saving of ~80% on those calls.
Result: Gate 2 costs ~$0.30/month per client.
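
A rough check of the ~80% figure, as a sketch rather than a billing model. It assumes Anthropic's published multipliers for the default cache (writes billed at 1.25x the base input price, reads at 0.10x), that all batches arrive while the cache entry is still alive, and that the system prompt is long enough to be cacheable. It covers the system-prompt tokens only; the per-batch article text is always billed in full.

```python
import math

CACHE_WRITE_MULTIPLIER = 1.25   # assumption: default 5-minute cache write price
CACHE_READ_MULTIPLIER = 0.10    # assumption: cache read price


def batches_needed(n_articles: int, batch_size: int = 8) -> int:
    """Number of API calls when articles are scored batch_size at a time."""
    return math.ceil(n_articles / batch_size)


def system_prompt_saving(n_calls: int) -> float:
    """Fraction saved on system-prompt input tokens when calls 2..n hit the cache."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    uncached = float(n_calls)                                   # n full-price sends
    cached = CACHE_WRITE_MULTIPLIER + (n_calls - 1) * CACHE_READ_MULTIPLIER
    return 1 - cached / uncached


calls = batches_needed(100)
saving = system_prompt_saving(calls)
print(f"{calls} calls, {saving:.0%} saved on system-prompt tokens")
```

With a single call the saving is negative, because the cache write costs more than a plain send; caching only pays off from the second call onward.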

Gate 3 — Signal Detection

File: services/intelligence-worker/stages/signal_detection.py

Zero-cost filter that combines two independent signals. Both are free to compute. An article needs a high combined score before we spend $0.015 on Sonnet generation.

Signal 1: Source Spread (our own data)

Counts how many distinct sources covered the same topic in the last 2 hours. Uses PostgreSQL full-text search on our own news_items table — zero external calls.

signal_detection.py Python
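import re

def _source_spread(conn, title: str, hours: int = 2) -> int:
    # Take up to 5 words longer than 3 characters from the title.
    # The length filter drops most stop-words ("the", "and", "for"), which match everything.
    # Keeping only word characters stops punctuation from breaking to_tsquery syntax.
    words = [w for w in re.findall(r"\w+", title) if len(w) > 3][:5]
    if not words:
        return 1    # nothing to match on: treat as an isolated report

    # OR query: an article matches if ANY word appears (broad catch)
    tsquery = " | ".join(words)

    with conn.cursor() as cur:
        # Multiply a fixed interval by the parameter instead of putting the
        # placeholder inside a quoted literal, where it is not a real bind parameter.
        cur.execute("""
            SELECT COUNT(DISTINCT source_id)
            FROM news_items
            WHERE to_tsvector('english', title) @@ to_tsquery('english', %s)
              AND published_at > NOW() - %s * INTERVAL '1 hour'
        """, (tsquery, hours))
        return cur.fetchone()[0] or 1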

# Map raw count to a 0–100 score
def _spread_score(count: int) -> int:
    if count >= 5:  return 100   # 5+ sources = definitely breaking
    if count >= 3:  return 60    # trending across outlets
    if count >= 2:  return 40    # gaining traction
    return 20                    # isolated report

Signal 2: Google Trends SEO Opportunity

Queries Google Trends for search interest on the client's matched keywords, geo-filtered to their location. Results are cached in Redis for 24 hours, so the same keyword and geo pair costs at most one API call per day.

Combined Score Formula

signal_detection.py Python
trend_score = (spread_score * 0.6) + (seo_opportunity * 0.4)
# 60/40 weighting: spread is more reliable (our own data)
# Trends complements with demand-side intent but can be rate-limited

# Urgency detection overrides the score threshold entirely:
# - "breaking" / "emergency" in title → urgency = "breaking" → Gate 3 bypassed
# - 3+ sources covering topic       → urgency = "high"     → Gate 3 bypassed

Fail-Open Design

If Google Trends is unavailable, seo_opportunity defaults to 50 (neutral). A Trends outage never blocks content generation. This is called "fail-open" design — the default is to continue, not to stop.
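
The scoring and the fail-open default fit together as in the sketch below. It reuses the 60/40 weights and the spread thresholds from the text; the names combined_score and NEUTRAL_SEO_SCORE are illustrative, and the Gate 3 threshold is left out because its value is not given here.

```python
from typing import Optional

NEUTRAL_SEO_SCORE = 50   # fail-open default when Google Trends is unavailable


def spread_score(source_count: int) -> int:
    """Map the number of distinct sources to a 0-100 score."""
    if source_count >= 5:
        return 100
    if source_count >= 3:
        return 60
    if source_count >= 2:
        return 40
    return 20


def combined_score(source_count: int, seo_opportunity: Optional[float] = None) -> float:
    """60/40 blend of spread and SEO opportunity; None means Trends failed."""
    seo = NEUTRAL_SEO_SCORE if seo_opportunity is None else seo_opportunity
    return spread_score(source_count) * 0.6 + seo * 0.4


breaking = combined_score(5, seo_opportunity=80)   # wide coverage, strong demand
trends_down = combined_score(2)                    # Trends outage: neutral 50 is used
isolated = combined_score(1, seo_opportunity=0)    # one source, no search interest
print(breaking, trends_down, isolated)
```

Passing None rather than 0 for a failed lookup keeps an outage distinct from a genuine "no search interest" result.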

Gate 4 — Sonnet Content Generation

File: services/intelligence-worker/stages/gate4_generation.py

The most expensive step and the core product. Claude Sonnet generates 6 content formats in one API call, selecting the best angle from 8 options based on the news type, competitor gaps, and client voice.

What one Gate 4 call produces

Output | Description
blog.title | SEO H1: question format, keyword in first 8 words, year suffix
blog.slug | URL slug derived from title
blog.meta_description | 150–160 chars for Google SERPs
blog.body_markdown | 1,200–3,500 word article with mandatory section structure
blog.faq_schema | Exactly 5 Q&A pairs for Google FAQ rich results
linkedin_post | Platform-optimised, shorter format
twitter_thread | Array of tweets (ready for Twitter/X API)
newsletter_snippet | 2–3 sentences for Facebook / email
selected_angle | Which of the 8 angles Claude chose (stored for analytics)

Geo-skip logic

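As the first step of the Gate 4 call, the model checks whether the news is actually relevant to the client's geography. If not, it returns a short skip signal instead of a draft. The call's input tokens are still billed, but the skip avoids the cost of generating a full article and keeps irrelevant content out of the review queue.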

gate4_generation.py (output schema) JSON
# If Claude decides the news is irrelevant to the client's geo:
{"selected_angle": "skip", "reason": "geo_not_impacted"}

# If it decides to generate:
{
  "selected_angle": "local_impact",
  "blog": {
    "title":            "Why should Australian mortgage brokers reconsider fixed rates in 2026?",
    "slug":             "australian-mortgage-brokers-fixed-rates-2026",
    "meta_description": "The RBA's latest decision changes the fixed vs variable ...",
    "body_markdown":    "...(full article 1200-3500 words)...",
    "keywords":         ["mortgage broker", "fixed rate", "RBA 2026"],
    "faq_schema":       [{"question": "...", "answer": "..."}, ...]
  },
  "linkedin_post":      "...",
  "twitter_thread":     ["tweet 1", "tweet 2", ...],
  "newsletter_snippet": "..."
}
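
On the worker side, telling the two output shapes apart might look like the sketch below. parse_gate4_output and its required-key check are hypothetical helpers written for illustration; only the field names come from the schema above.

```python
import json

# Top-level keys a generated draft is expected to carry, per the schema above.
REQUIRED_KEYS = ("blog", "linkedin_post", "twitter_thread", "newsletter_snippet")


def parse_gate4_output(raw: str):
    """Return the draft dict, or None when the model emitted a skip signal."""
    data = json.loads(raw)
    if data.get("selected_angle") == "skip":
        return None
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"draft is missing keys: {missing}")
    return data


skip = parse_gate4_output('{"selected_angle": "skip", "reason": "geo_not_impacted"}')
draft = parse_gate4_output(json.dumps({
    "selected_angle": "local_impact",
    "blog": {"title": "Example title", "slug": "example-title"},
    "linkedin_post": "example post",
    "twitter_thread": ["tweet 1", "tweet 2"],
    "newsletter_snippet": "example snippet",
}))
print(skip, draft["selected_angle"])
```

Raising on a malformed draft, rather than returning None, keeps a truncated model response from being mistaken for a deliberate skip.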

Mandatory article structure

Every generated article must contain these sections in order. Claude is explicitly instructed to follow this structure — it's part of the Gate 4 system prompt stored in values.yaml.

Mandatory section order
sections = [
    "Quick Answer (40–60 words, no heading, standalone prose)",
    "What You Will Learn (4–6 bullets)",
    "What Is [Topic]? (80–120 words)",
    "Why Does [Problem] Happen? (100–150 words, 4–6 bullets)",
    "At-a-Glance Summary (Markdown table, 5–8 rows)",
    "How to [Solve It] (200–300 words, numbered H3 steps)",
    "What Happens If You Ignore This? (80–120 words, 3–5 bullets)",
    "",          # Pexels image placeholder
    "Common Mistakes to Avoid (table: Mistake | Why | What to Do Instead)",
    "Expert Tips (100–150 words, ≥2 tips with measurable checks)",
    "",          # second image
    "Frequently Asked Questions (exactly 5 FAQs)",
    "Key Takeaways (60–80 words, 4–5 bullets)",
    "References (3–5 entries as [Title](URL))",
]

Ingestion Worker

File: services/ingestion-worker/main.py

Runs continuously as a Kubernetes Deployment rather than a CronJob, so the poll loop and its in-memory state (such as the seen URL hashes) persist between polls. Every 2 minutes it polls all active RSS sources for all clients and runs Gate 1.

The poll loop

ingestion-worker/main.py (simplified) Python
async def poll_loop():
    while True:
        # Fetch all active, non-quarantined sources from Postgres
        sources = get_active_sources(conn)

        for source in sources:
            articles = fetch_rss(source["feed_url"])   # parse RSS/Atom feed

            for article in articles:
                # sha256 needs bytes, so encode the normalized URL first
                url_hash = hashlib.sha256(normalize_url(article["url"]).encode()).hexdigest()

                # Deduplication: skip if we've already processed this URL
                if url_hash in seen_hashes:
                    continue
                seen_hashes.add(url_hash)   # assumes seen_hashes is a set

                # save_article is a hypothetical helper standing in for whatever
                # persists the item to news_items and returns its id.
                article_id = save_article(conn, article, url_hash)

                # Gate 1: zero-cost rules filter (per-client)
                for client in source["clients"]:
                    passes, reason = gate1.check(article, client["keywords"], client["excluded"])
                    if passes:
                        # Publish to Redis Stream for intelligence-worker to consume
                        redis.xadd("news.filtered", {
                            "article_id": article_id,
                            "client_id":  client["id"],
                            "reason":     reason,
                        })

        await asyncio.sleep(POLL_INTERVAL_SECONDS)   # default: 120s

Source quarantine system

Every RSS source is tracked for consecutive failures. After 3 failures, it's quarantined with exponential backoff. The system automatically tries to find a replacement feed.

Quarantine # | Duration | Recovery
1st time | 6 hours | Auto-retry after expiry
2nd time | 12 hours | Auto-retry after expiry
3rd time | 24 hours | Auto-retry after expiry
4th time | 48 hours | Auto-retry after expiry
5th time | 96 hours | Auto-retry after expiry
6th+ | 168 hours (7 days) | Manual restore from admin
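
The schedule in the table is a doubling backoff with a 7-day cap, which can be written as a one-line formula. This is a sketch that reproduces the table; the function and constant names are illustrative, not the worker's actual code.

```python
BASE_HOURS = 6          # first quarantine
CAP_HOURS = 168         # 7 days, reached at the 6th quarantine
AUTO_RETRY_LIMIT = 5    # quarantines 1-5 auto-retry; later ones need an admin


def quarantine_hours(quarantine_count: int) -> int:
    """Duration of the n-th quarantine: 6h doubled each time, capped at 168h."""
    if quarantine_count < 1:
        raise ValueError("quarantine_count starts at 1")
    return min(BASE_HOURS * 2 ** (quarantine_count - 1), CAP_HOURS)


def needs_manual_restore(quarantine_count: int) -> bool:
    """True once the source has exhausted its automatic retries."""
    return quarantine_count > AUTO_RETRY_LIMIT


schedule = [quarantine_hours(n) for n in range(1, 8)]
print(schedule)
```

Uncapped, the sixth step would be 192 hours; the cap is what brings it down to the 168 hours shown in the table.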

3-tier replacement feed discovery

When a source is quarantined, the system immediately searches for a replacement — no admin intervention required.

T1. Same-site alternate URLs
    Scrapes the dead source's homepage for <link rel="alternate"> RSS tags. Also probes common paths: /feed, /rss, /rss.xml, /atom.xml.

T2. Google News RSS search
    Searches news.google.com/rss/search?q={source_name}+{industry}. Extracts publisher domains from results, probes top 8 for native RSS feeds. No API key needed.

T3. Platform default sources
    Falls back to the default_sources DB table: curated industry sources not already assigned to this client. Always available, always a working feed.
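
A sketch of the first tier using only the standard library. It builds the list of candidate feed URLs from a homepage's HTML and the common paths listed above, without making any network requests. Filtering on the RSS/Atom MIME types is an assumption, and tier1_candidates is an illustrative name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

COMMON_FEED_PATHS = ("/feed", "/rss", "/rss.xml", "/atom.xml")
FEED_MIME_TYPES = {"application/rss+xml", "application/atom+xml"}  # assumed filter


class _AlternateLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that declare a feed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_MIME_TYPES and attr.get("href"):
            self.hrefs.append(attr["href"])


def tier1_candidates(homepage_url: str, homepage_html: str) -> list:
    """Declared feeds first, then common paths; absolute URLs, de-duplicated."""
    parser = _AlternateLinkParser()
    parser.feed(homepage_html)
    declared = [urljoin(homepage_url, href) for href in parser.hrefs]
    probed = [urljoin(homepage_url, path) for path in COMMON_FEED_PATHS]
    seen, ordered = set(), []
    for url in declared + probed:
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


html = (
    '<html><head>'
    '<link rel="alternate" type="application/rss+xml" href="/news/feed.xml">'
    '<link rel="stylesheet" href="/site.css">'
    '</head></html>'
)
candidates = tier1_candidates("https://example.com/blog/", html)
print(candidates)
```

Each candidate would still need to be fetched and parsed before it is accepted as a replacement feed.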

URL deduplication

ingestion-worker/main.py Python
import hashlib
from urllib.parse import urlparse, urlencode, parse_qsl

# Tracking params matched by exact name; anything starting with "utm_" is also stripped.
TRACKING_PARAMS = {"fbclid", "gclid", "ref"}

def normalize_url(url: str) -> str:
    """Strip UTM/tracking params so the same article isn't processed twice
    if it appears with different tracking params in different RSS feeds."""
    parsed = urlparse(url)
    # Keep only non-tracking query params. Matching "ref" by exact name avoids
    # also dropping unrelated params such as "reference" or "refresh".
    clean_params = [(k, v) for k, v in parse_qsl(parsed.query)
                    if not k.startswith("utm_") and k not in TRACKING_PARAMS]
    return parsed._replace(query=urlencode(clean_params)).geturl()

# URL hash is stored in news_items table: SHA256 of the normalized URL
url_hash = hashlib.sha256(normalize_url(article["url"]).encode()).hexdigest()

Intelligence Worker

File: services/intelligence-worker/main.py

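The brain of the platform. Consumes the news.filtered Redis Stream and orchestrates the full Gates 2–4 pipeline for each article. Uses Redis consumer groups so each message goes to only one worker in the group and stays pending until it is acknowledged. If the worker crashes mid-batch, its unacknowledged messages can be recovered, so delivery is at-least-once: a message may occasionally be processed twice, but it is not lost.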

Pipeline orchestration

intelligence-worker/main.py (simplified flow) Python
async def process_message(msg, client_id, article):
    client = get_client_profile(client_id)

    # 1. Gate 2: Haiku relevance scoring (prompt-cached). This simplified flow
    #    passes a single article; run_gate2 batches whatever list it is given.
    relevant = run_gate2(provider, [article], client, PROMPT_RELEVANCE)
    if not relevant:
        ack(msg); return

    # 2. Gate 3 — Signal detection (spread + Google Trends)
    signal = detect_signal(conn, article["title"], client["target_geo"])
    if signal.trend_score < GATE3_MIN_TREND_SCORE and signal.urgency == "normal":
        ack(msg); return

    # 3. Competitor analysis — what angles have competitors taken?
    comp = analyze_competitors(conn, article, client)
    # comp.avoid_angles = ["local_impact", "action_list"]
    # comp.trend_score_boost = 15 (first-mover bonus)

    # 4. Evidence pipeline
    enrichment = quick_enrich(article, client["keywords"])          # Tier 1: Serper
    evidence   = gather_evidence(article, enrichment, llm_haiku)    # Tier 2: deep pack

    # 5. Topic clustering — find related published articles for internal links
    cluster = get_cluster_links(conn, article, client_id)

    # 6. Gate 4 — Sonnet generation
    draft = run_gate4(
        provider=llm_sonnet,
        article=article,
        client=client,
        comp_analysis=comp,
        evidence_pack=evidence,
        cluster_links=cluster,
    )

    if draft.get("selected_angle") == "skip":
        ack(msg); return    # geo_not_impacted — skip silently

    # 7. Save to Postgres, publish to content.drafts stream
    save_draft(conn, draft, client_id)
    redis.xadd("content.drafts", {"draft_id": draft["id"], "client_id": client_id})
    ack(msg)  # ← critical: only ack AFTER successful save
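
Why consumer groups matter

With consumer groups, Redis tracks which messages have been delivered but not yet acknowledged (ACK'd). If the worker crashes between processing and ACK'ing, the message stays pending. Redis does not push it again on its own: the restarted worker must re-read its pending entries (XREADGROUP with ID 0) or claim them with XAUTOCLAIM. With that recovery step in place, no message is permanently lost and the pipeline is crash-safe.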

Draft limits per plan

Before Gate 4, the worker checks the client's daily and weekly draft limits (from PLAN_LIMITS_JSON in the ConfigMap). This prevents the pipeline from generating more content than the client can review.

Approval Service

File: services/approval-service/main.py

FastAPI web application that serves the client dashboard, admin panel, and email approval workflow. Clients never see raw AI output — everything goes through human approval first.

The approval workflow

1. Daily digest email
   A CronJob triggers /send-digest each morning. For each client with pending drafts, an email is sent with approve/reject/edit links for each draft.

2. HMAC-signed links (no login required)
   Each approve/reject/edit link contains an HMAC-SHA256 token. Clients can approve content from their email inbox without logging in. Links expire after 7 days.

3. Optional editing
   The edit link opens a tabbed editor: blog post (with character counters), LinkedIn post, Twitter thread. Clients can tweak the AI output before publishing.

4. Publishes to Redis Stream
   On approval, the service writes to content.approved. Publisher-worker picks this up and distributes to WordPress, Dev.to, etc.

HMAC token format

main.py — HMAC approval links Python
import hashlib
import hmac
import os
import time

REVIEW_SECRET = os.environ["REVIEW_SECRET"]  # from K8s Secret

def create_review_token(draft_id: str, action: str) -> str:
    """Create a signed URL token. Format: {signature}:{action}:{expiry}"""
    expiry = int(time.time()) + 7 * 24 * 3600    # 7 days from now
    payload = f"{draft_id}:{action}:{expiry}"
    sig = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{sig}:{action}:{expiry}"

def verify_review_token(token: str, draft_id: str) -> tuple[bool, str]:
    """Verify a token from an email link. Returns (valid, action)."""
    try:
        sig, action, expiry = token.split(":")
        if int(expiry) < time.time():
            return False, ""     # expired

        payload = f"{draft_id}:{action}:{expiry}"
        expected = hmac.new(REVIEW_SECRET.encode(), payload.encode(), hashlib.sha256).hexdigest()

        # Constant-time comparison — prevents timing attacks
        if not hmac.compare_digest(sig, expected):
            return False, ""    # tampered

        return True, action
    except Exception:
        return False, ""

Publisher Worker

File: services/publisher-worker/main.py

Consumes content.approved and distributes to all publishing platforms the client has configured. WordPress always publishes first — its URL becomes the canonical URL for all subsequent platforms.

Retry logic

publisher-worker/main.py Python
MAX_RETRIES  = 3
RETRY_DELAYS = [5, 15, 30]    # seconds between attempts; with 3 attempts only the first two are used

for attempt in range(MAX_RETRIES):
    try:
        result = publisher.publish(draft, config)
        # Record success in publications table
        record_publication(conn, draft_id, platform, "published", result["url"])
        break

    except Exception as e:
        if attempt == MAX_RETRIES - 1:
            # All retries exhausted — send to dead-letter queue
            redis.xadd("content.failed", {"draft_id": draft_id, "error": str(e)})
            record_publication(conn, draft_id, platform, "failed", error=str(e))
        else:
            time.sleep(RETRY_DELAYS[attempt])
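
The same retry pattern can be pulled into a small helper that is easy to unit-test, because the sleep function is injectable. This is a sketch: publish_with_retry is a hypothetical name, and the dead-letter write is left to the caller, who would catch the final exception.

```python
import time


def publish_with_retry(publish, delays=(5, 15), sleep=time.sleep):
    """Call publish() up to len(delays) + 1 times, sleeping between attempts.

    Returns the first successful result; re-raises the last error if every
    attempt fails.
    """
    attempts = len(delays) + 1
    for attempt in range(attempts):
        try:
            return publish()
        except Exception:
            if attempt == attempts - 1:
                raise                      # retries exhausted: let the caller dead-letter it
            sleep(delays[attempt])


# Usage with a publisher that fails twice, then succeeds. A list's append
# stands in for time.sleep, so the example runs instantly.
calls = {"count": 0}
slept = []


def flaky_publish():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return {"url": "https://example.com/post"}


result = publish_with_retry(flaky_publish, delays=(5, 15), sleep=slept.append)
print(result, slept)
```

Deriving the attempt count from the delays avoids a delay list and a retry constant drifting out of sync.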

WordPress publisher — Markdown to HTML

publishers/wordpress.py Python
def publish(self, draft: dict, config: dict) -> dict:
    blog = draft["blog"]
    body = blog["body_markdown"]

    # Strip the FAQ section from body — it's added separately as structured HTML
    # to prevent duplicates (once inline, once as schema markup at the bottom).
    body_without_faq = strip_faq_section(body)

    # Convert Markdown to HTML (using markdown library)
    html_body = markdown.markdown(
        body_without_faq,
        extensions=["fenced_code", "tables", "nl2br"]
    )

    # Append FAQ as structured HTML (better for Google FAQ rich results)
    if blog.get("faq_schema"):
        html_body += build_faq_html(blog["faq_schema"])

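    # WordPress REST API call
    response = requests.post(
        f"{config['site_url']}/wp-json/wp/v2/posts",
        auth=(config["username"], config["app_password"]),  # Application Passwords
        json={
            "title":   blog["title"],
            "slug":    blog["slug"],
            "content": html_body,
            "excerpt": blog.get("meta_description", ""),
            "status":  "publish",
            "categories": resolve_categories(config),
        },
        timeout=30,   # illustrative value; without a timeout a stalled site blocks the worker
    )
    # Surface 4xx/5xx responses as exceptions so the retry loop sees the real
    # error instead of a KeyError on the missing "link" field.
    response.raise_for_status()
    return {"url": response.json()["link"]}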

Redis Streams — The Message Bus

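Redis Streams are the backbone of inter-service communication. Unlike Redis pub/sub, which drops a message when no subscriber is listening, a stream is an ordered, consumer-group-aware log that keeps events until they are trimmed. It survives a Redis restart as long as persistence (AOF or RDB) is enabled.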

Why Streams instead of HTTP?

Property | Direct HTTP calls | Redis Streams
Crash safety | ❌ Request lost if receiver is down | ✅ Message waits until consumer is ready
At-least-once delivery | ❌ Manual retry logic needed | ✅ Unacked messages stay pending and can be re-read or claimed
Decoupling | ❌ Sender must know receiver's address | ✅ Services only know the stream name
Backpressure | ❌ Fast sender overwhelms slow receiver | ✅ Stream buffers the backlog; the consumer reads at its own pace
Audit trail | ❌ No built-in history | ✅ Stream is an ordered log (inspectable)

Consumer groups explained

How consumer groups work Python / Redis CLI
# Create the consumer group at startup. XGROUP CREATE fails with BUSYGROUP if
# the group already exists, so tolerate that error on restarts.
try:
    redis.xgroup_create("news.filtered", "intelligence-workers", id="0", mkstream=True)
except Exception as e:
    if "BUSYGROUP" not in str(e):
        raise

# Read NEW messages (">" means "messages never delivered to this group before")
messages = redis.xreadgroup(
    groupname="intelligence-workers",
    consumername="worker-pod-1",
    streams={"news.filtered": ">"},
    count=10,
    block=5000,   # block for up to 5 seconds waiting for new messages
)

# Process each message...
for stream_name, msg_list in (messages or []):
    for msg_id, fields in msg_list:
        try:
            process(fields)
            # ACK only AFTER successful processing.
            # If this line is never reached (crash), the message stays in the PEL.
            redis.xack("news.filtered", "intelligence-workers", msg_id)
        except Exception as e:
            # Don't ACK on failure: the message stays pending. Reading with ">"
            # will not return it again; re-read with ID "0" or use XAUTOCLAIM.
            log.error("Processing failed: %s", e)

Pending Entries List (PEL)

When a message is delivered but not yet ACK'd, Redis holds it in the Pending Entries List. If the worker crashes, these messages stay in the PEL. They are not pushed again automatically: on restart the worker re-reads its own pending entries by calling XREADGROUP with ID 0 instead of >, and entries left behind by a consumer that never returns can be taken over with XAUTOCLAIM. This is how the pipeline survives pod crashes without losing messages.
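
Recovering pending entries at startup might look like the sketch below. drain_pending is a hypothetical helper; it assumes a redis-py style client whose xreadgroup and xack methods take the arguments shown. InMemoryPending is a stand-in used here so the example runs without a Redis server.

```python
def drain_pending(client, stream, group, consumer, handler, batch=10):
    """Re-process this consumer's pending entries before it reads new messages."""
    recovered = 0
    while True:
        # ID "0" returns entries already delivered to this consumer but never ACK'd.
        response = client.xreadgroup(group, consumer, {stream: "0"}, count=batch)
        entries = response[0][1] if response else []
        if not entries:
            return recovered
        for msg_id, fields in entries:
            handler(fields)                      # an exception here leaves the entry pending
            client.xack(stream, group, msg_id)
            recovered += 1


class InMemoryPending:
    """Stand-in for a Redis client: holds one consumer's pending entries."""

    def __init__(self, pending):
        self.pending = dict(pending)

    def xreadgroup(self, groupname, consumername, streams, count=None, block=None):
        stream = next(iter(streams))
        return [[stream, list(self.pending.items())[:count]]]

    def xack(self, name, groupname, *ids):
        return sum(1 for msg_id in ids if self.pending.pop(msg_id, None) is not None)


client = InMemoryPending({"1-0": {"article_id": "a1"}, "2-0": {"article_id": "a2"}})
processed = []
recovered = drain_pending(
    client, "news.filtered", "intelligence-workers", "worker-pod-1", processed.append
)
print(recovered, processed)
```

Because a recovered message may already have been partly processed before the crash, the handler should be idempotent, for example by checking whether the draft already exists before saving it.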

Inspecting streams from kubectl

Bash
# How many messages are waiting in the pipeline?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen news.filtered
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.drafts

# How many messages are in the dead-letter queue?
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xlen content.failed

# Inspect last 5 messages in a stream
kubectl exec -n content-intelligence deploy/ci-redis -- redis-cli xrevrange news.filtered + - COUNT 5

Multi-Provider AI Architecture

File: services/intelligence-worker/providers/

Every AI call goes through a provider abstraction layer. The rest of the codebase calls provider.complete(system, user) — it never knows or cares which underlying model or API it's using.

The provider interface

providers/base.py Python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class LLMResponse:
    text:               str
    input_tokens:       int
    output_tokens:      int
    cache_read_tokens:  int = 0   # Anthropic prompt caching
    cache_write_tokens: int = 0

class LLMProvider(ABC):
    @property
    @abstractmethod
    def model_id(self) -> str: ...

    @abstractmethod
    def complete(
        self,
        system:       str,
        user:         str,
        max_tokens:   int  = 512,
        cache_system: bool = False,   # Anthropic-specific, ignored by others
    ) -> LLMResponse: ...

Provider factory — model name → provider instance

providers/factory.py Python
import os

def get_provider(model: str) -> LLMProvider:
    """Return the correct provider instance based on model name prefix."""
    api_key_map = {
        "claude-":     ("ANTHROPIC_API_KEY", AnthropicProvider),
        "gpt-":        ("OPENAI_API_KEY",    OpenAIProvider),
        "o1-":         ("OPENAI_API_KEY",    OpenAIProvider),
        "o3-":         ("OPENAI_API_KEY",    OpenAIProvider),
        "gemini-":     ("GOOGLE_API_KEY",    GoogleProvider),
        "deepseek-":   ("DEEPSEEK_API_KEY",  DeepSeekProvider),
    }

    for prefix, (env_var, ProviderClass) in api_key_map.items():
        if model.startswith(prefix):
            api_key = os.environ.get(env_var)
            if not api_key:
                raise EnvironmentError(f"{env_var} not set for model {model!r}")
            return ProviderClass(api_key=api_key, model=model)

    raise ValueError(f"Unknown model: {model!r}")

Per-client model override

Each client row has gate2_model and gate4_model columns. If set, they override the platform defaults. If the API key for the configured model isn't set, the platform logs a warning and falls back to the platform default (Anthropic), so one misconfigured client cannot break the entire pipeline.

intelligence-worker/main.py Python
def get_client_provider(client, gate: str) -> LLMProvider:
    """Get the provider for a client, with fallback to platform default."""
    model_field = f"gate{gate}_model"
    client_model = client.get(model_field)

    if client_model:
        try:
            return get_provider(client_model)
        except EnvironmentError:
            # API key not set: log a warning, then fall through to the platform default
            log.warning("Client %s: %s API key missing, falling back to Anthropic",
                        client["id"][:8], client_model)

    # Platform default from ConfigMap
    default = RELEVANCE_MODEL if gate == "2" else GENERATION_MODEL
    return get_provider(default)

Provider | Models | Env var required
Anthropic (default) | claude-haiku-4-5, claude-sonnet-4-6 | ANTHROPIC_API_KEY
OpenAI | gpt-4o, gpt-4o-mini, o1-*, o3-* | OPENAI_API_KEY
Google | gemini-2.0-flash, etc. | GOOGLE_API_KEY
DeepSeek | deepseek-chat, deepseek-reasoner | DEEPSEEK_API_KEY

Prompt Caching — 80% Cost Saving

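Anthropic's prompt caching lets you mark a prefix of the prompt as cacheable. The first call writes that prefix to the cache at a small premium over the normal input price. Subsequent API calls with an identical prefix, made within the cache lifetime, read it from cache at ~10% of the normal input token cost.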

How the AnthropicProvider implements it

providers/anthropic_provider.py Python
def complete(self, system: str, user: str, max_tokens: int = 512,
             cache_system: bool = False) -> LLMResponse:

    # Without caching: system is just a string
    # With caching: wrap it in a block with cache_control
    if cache_system:
        system_block = [{
            "type": "text",
            "text": system,
            "cache_control": {"type": "ephemeral"},  # ← this is the magic
        }]
    else:
        system_block = system   # plain string — no caching

    response = self._client.messages.create(
        model=self._model,
        max_tokens=max_tokens,
        system=system_block,
        messages=[{"role": "user", "content": user}],
    )

    return LLMResponse(
        text=response.content[0].text,
        input_tokens=response.usage.input_tokens,
        output_tokens=response.usage.output_tokens,
        # These fields tell you how the caching is performing:
        cache_read_tokens  = getattr(response.usage, "cache_read_input_tokens", 0) or 0,
        cache_write_tokens = getattr(response.usage, "cache_creation_input_tokens", 0) or 0,
    )

Why it saves ~80% of Gate 2's system-prompt cost

Token flow for 100 articles, 13 batches
# Batch 1: system prompt is WRITTEN to cache (billed at the cache-write rate,
#          slightly above the normal input price)
# cache_write_tokens = 800   (the client profile system prompt)
# input_tokens       = 1200  (8 article titles; the API reports cached tokens separately)

# Batches 2-13: system prompt is READ from cache (10% of normal price)
# cache_read_tokens  = 800   (same system prompt, served from cache)
# input_tokens       = 1200  (only the 8 article titles)

# Net saving: 12 batches × 800 tokens × 90% discount = 8,640 full-price tokens saved
# That is ~83% of the 10,400 system-prompt tokens an uncached run would bill,
# or about a third of the run's 26,000 total input tokens.
# At $0.00025/1K input tokens (Haiku): saves ~$0.002/run/client
# At one run per day for 365 days: ~$0.73/year/client from caching alone

When to use cache_system=True

Only when the system prompt is identical across multiple calls made within the cache lifetime (5 minutes by default, refreshed on each cache hit), and long enough to meet the model's minimum cacheable prompt length. Gate 2 qualifies: the same client profile is repeated across 13 batches. Gate 4 doesn't cache its system prompt because it varies per article (different evidence pack, different competitor context).
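
The arithmetic in the token flow can be checked with a short calculation. This is an illustrative sketch only: the function name is hypothetical, and the 1.25x cache-write multiplier is an assumption based on Anthropic's published pricing for the default cache at the time of writing (check current pricing). The 13 batches, 800 system tokens and 1,200 title tokens come from the token flow above.

```python
def gate2_input_cost(batches: int, system_tokens: int, user_tokens: int,
                     write_mult: float = 1.25, read_mult: float = 0.10) -> dict:
    """Input cost of one Gate 2 run, in units of full-price input tokens.

    write_mult and read_mult are ASSUMED price multipliers for cache writes
    and cache reads relative to the normal input price.
    """
    # Without caching, every batch pays full price for the system prompt.
    system_uncached = batches * system_tokens
    # With caching, batch 1 writes the prompt and the remaining batches read it.
    system_cached = (system_tokens * write_mult
                     + (batches - 1) * system_tokens * read_mult)
    # Article titles differ per batch, so they are always billed at full price.
    user_total = batches * user_tokens

    saved = system_uncached - system_cached
    return {
        "system_uncached": system_uncached,
        "system_cached": system_cached,
        "system_saving_pct": 100 * saved / system_uncached,
        "total_saving_pct": 100 * saved / (system_uncached + user_total),
    }


costs = gate2_input_cost(batches=13, system_tokens=800, user_tokens=1200)
print(f"system-prompt saving:   {costs['system_saving_pct']:.0f}%")
print(f"whole-run input saving: {costs['total_saving_pct']:.0f}%")
```

Under these assumptions the saving is roughly 80% of the system-prompt cost but only about a third of the run's total input cost, because the article titles are billed at full price in every batch.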

Auth & Security

File: services/approval-service/auth.py

The auth system has zero external dependencies: no PyJWT, no authlib. Everything is implemented with Python's standard library. This keeps the container image lean and removes third-party auth packages from the supply chain.

JWT implementation from scratch

auth.py — HS256 JWT (no PyJWT) Python
# JWT format: base64url(header) . base64url(payload) . base64url(signature)

def _b64(data: bytes) -> str:
    # URL-safe base64 with "=" padding stripped (JWT spec requires unpadded)
    return urlsafe_b64encode(data).rstrip(b"=").decode()

def create_jwt(client_id: str, email: str) -> str:
    header  = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({
        "client_id": client_id,
        "email":     email,
        "exp":       int(time.time()) + JWT_EXPIRY_MINS * 60,
        "iat":       int(time.time()),
    }).encode())

    signing_input = f"{header}.{payload}"
    sig = _b64(hmac.new(
        JWT_SECRET.encode(),
        signing_input.encode(),
        hashlib.sha256,
    ).digest())

    return f"{signing_input}.{sig}"

def decode_jwt(token: str) -> Optional[dict]:
    try:
        header, payload, sig = token.split(".")
        signing_input = f"{header}.{payload}"
        expected = _b64(hmac.new(JWT_SECRET.encode(), signing_input.encode(), hashlib.sha256).digest())

        # Constant-time comparison — prevents timing side-channel attacks
        if not hmac.compare_digest(sig, expected):
            return None   # tampered token

        data = json.loads(_unb64(payload))
        if data.get("exp", 0) < time.time():
            return None   # expired

        return data
    except Exception:
        return None       # never raises — bad tokens always return None

Password hashing

auth.py — PBKDF2 password hashing Python
def hash_password(password: str) -> str:
    """Hash a password for storage. Format: {hex_salt}:{hex_hash}"""
    salt = secrets.token_hex(16)   # 16 bytes of cryptographic randomness
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return f"{salt}:{h.hex()}"

def _verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 260_000)
    return hmac.compare_digest(h.hex(), hex_hash)  # constant-time

# 260,000 iterations: ~100ms on a modern CPU.
# This means an attacker can only try ~10 passwords/second per core.
# bcrypt would also work — PBKDF2 is chosen because it's in Python stdlib (no dependency).

Timing attack prevention

auth.py — dummy hash for non-existent users Python
_DUMMY_HASH = "dummy:000000000000000000000000000000000000000000000000000000000000000"

def _dummy_hash_check(password: str) -> None:
    """Run a full PBKDF2 computation even for non-existent users.

    Without this: login for unknown@email.com returns in 1ms (DB miss).
                  login for real@email.com returns in 100ms (hash computed).
    An attacker measures the difference to discover which emails are registered.

    With this: both paths take ~100ms regardless. Side channel eliminated.
    """
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", 260_000)
    # Result is discarded — we only run this for its timing effect
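
To show where the dummy check fits, here is a hedged sketch of a login function built from the helpers above. The authenticate function and the in-memory USERS dict are hypothetical stand-ins for the real route handler and database lookup, which this guide does not show.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 260_000


def hash_password(password: str) -> str:
    salt = secrets.token_hex(16)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return f"{salt}:{h.hex()}"


def _verify_password(password: str, stored_hash: str) -> bool:
    salt, hex_hash = stored_hash.split(":", 1)
    h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), ITERATIONS)
    return hmac.compare_digest(h.hex(), hex_hash)


def _dummy_hash_check(password: str) -> None:
    hashlib.pbkdf2_hmac("sha256", password.encode(), b"dummy", ITERATIONS)


# Hypothetical user store standing in for the clients table.
USERS = {
    "real@email.com": {"client_id": "c-123",
                       "password_hash": hash_password("correct horse")},
}


def authenticate(email: str, password: str) -> Optional[str]:
    """Return the client_id on success, None on any failure."""
    user = USERS.get(email)
    if user is None:
        # Unknown email: burn the same PBKDF2 cost as a real verification,
        # so response time does not reveal whether the account exists.
        _dummy_hash_check(password)
        return None
    if not _verify_password(password, user["password_hash"]):
        return None
    return user["client_id"]
```

Both failure paths return the same value after one PBKDF2 computation, so the caller sees neither a timing difference nor a different error for unknown emails versus wrong passwords.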

Cookie security

auth.py — secure cookie configuration Python
response.set_cookie(
    key      = "ci_session",
    value    = token,
    httponly = True,       # JS cannot read this cookie — protects against XSS token theft
    secure   = True,       # browser only sends it over HTTPS — prevents network sniffing
    samesite = "lax",      # withheld on cross-site POSTs and subrequests, still sent on top-level GET navigations; mitigates CSRF
    max_age  = JWT_EXPIRY_MINS * 60,
)

Security summary

Threat | Mitigation
XSS token theft | httponly=True on JWT cookie
Network sniffing | secure=True (HTTPS only)
CSRF | samesite="lax" + form tokens
Timing attacks (login) | Dummy PBKDF2 hash for missing users
Timing attacks (comparison) | hmac.compare_digest everywhere
Password cracking | PBKDF2-SHA256, 260k iterations, per-user salt
Forged approval links | HMAC-SHA256 signed, 7-day expiry
Cross-tenant data leak | client_id from JWT only, never from request body

Multi-tenancy — Tenant Isolation

Every table with client data has a client_id UUID column. The critical rule: client_id is sourced from the signed JWT only. It is never trusted from the request body.

The middleware pattern

How client_id is enforced in every route Python
# Every protected route extracts client_id from the JWT:
@router.get("/dashboard")
def dashboard(request: Request):
    client = require_client(request)         # decodes JWT, returns payload dict
    client_id = client["client_id"]          # ← from the verified JWT payload, not request params

    # All DB queries are scoped to this client_id
    drafts = get_drafts(conn, client_id)     # SELECT ... WHERE client_id = %s
    return render("dashboard.html", drafts)

# What an attacker CANNOT do:
# GET /dashboard?client_id=...  → client_id from URL is IGNORED
# POST /approve with body {"client_id": "..."}  → body client_id is IGNORED
# The client_id is read exclusively from the signed cookie/Bearer JWT.

Database schema (tenant isolation)

migrations/001_initial.sql (simplified) SQL
-- One row per tenant
CREATE TABLE clients (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    business_name TEXT NOT NULL,
    email         TEXT UNIQUE NOT NULL,
    industry_type TEXT,
    target_geo    JSONB,
    keywords      JSONB,
    active        BOOLEAN DEFAULT TRUE
);

-- All tenant-scoped tables have client_id FK
CREATE TABLE content_drafts (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id     UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
    title         TEXT,
    body_markdown TEXT,
    status        TEXT DEFAULT 'pending',    -- pending / approved / rejected / published
    created_at    TIMESTAMPTZ DEFAULT NOW()
);

-- news_items is SHARED across all clients (Gate 1 runs once per article)
-- client_relevance maps articles to clients (Gate 2 runs per-client)
CREATE TABLE client_relevance (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id       UUID NOT NULL REFERENCES clients(id) ON DELETE CASCADE,
    news_item_id    UUID NOT NULL REFERENCES news_items(id),
    relevance_score INT,
    processed       BOOLEAN DEFAULT FALSE
);

The Critical Isolation Invariant

One bug that trusts client_id from the request body instead of the JWT could leak every tenant's data to any other tenant. The enforcement is at the application layer — there's no database-level row security (yet). Every developer must follow the pattern: always get client_id from the decoded JWT.
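
The invariant can be demonstrated with a small, self-contained sketch. It uses an in-memory sqlite3 database as a stand-in for the platform's Postgres database (so placeholders are ? rather than %s), and the dashboard_rows function and its arguments are hypothetical; only the rule it illustrates comes from the text above.

```python
import sqlite3


def get_drafts(conn: sqlite3.Connection, client_id: str) -> list[tuple]:
    # Every tenant query carries a WHERE client_id = ? clause.
    return conn.execute(
        "SELECT id, title FROM content_drafts WHERE client_id = ? ORDER BY id",
        (client_id,),
    ).fetchall()


def dashboard_rows(jwt_payload: dict, request_params: dict,
                   conn: sqlite3.Connection) -> list[tuple]:
    # client_id comes from the verified JWT payload only.
    # request_params is accepted but deliberately never consulted for it.
    client_id = jwt_payload["client_id"]
    return get_drafts(conn, client_id)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_drafts ("
    "id INTEGER PRIMARY KEY, client_id TEXT NOT NULL, title TEXT)"
)
conn.executemany(
    "INSERT INTO content_drafts (client_id, title) VALUES (?, ?)",
    [("tenant-a", "A1"), ("tenant-b", "B1"), ("tenant-a", "A2")],
)

# Tenant A is logged in but tries to pass tenant B's id in the request.
rows = dashboard_rows({"client_id": "tenant-a"}, {"client_id": "tenant-b"}, conn)
titles = [title for _, title in rows]
print(titles)
```

Because the query function takes client_id as a required argument and the handler sources it from one place, a spoofed request parameter has no path into the WHERE clause.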

Evidence Pipeline

Before Gate 4 runs, the platform searches for real supporting evidence so Claude can cite actual sources and verified figures instead of falling back on vague, hedged claims.

Tier 1 — Quick Serper enrichment

A short Serper API search on {article title} + {top 3 client keywords} returns 5 snippets. Injected into Gate 4 as lightweight context. Degrades to [] if SERPER_API_KEY is not set.

stages/enrichment.py Python
import os

import requests


def quick_enrich(article: dict, keywords: list[str]) -> list[dict]:
    """Fetch 5 Serper snippets for evidence context. Degrades gracefully."""
    api_key = os.environ.get("SERPER_API_KEY")
    if not api_key:
        return []    # feature disabled: Gate 4 runs without enrichment

    query = f"{article['title']} {' '.join(keywords[:3])}"
    try:
        resp = requests.post(
            "https://google.serper.dev/search",
            headers={"X-API-KEY": api_key},
            json={"q": query, "num": 5},
            timeout=10,
        )
        resp.raise_for_status()    # treat HTTP errors (bad key, rate limit) as failures
        results = resp.json().get("organic", [])
        # snippet is read with .get() so one sparse result doesn't discard the rest
        return [{"title": r["title"], "snippet": r.get("snippet", ""),
                 "url": r["link"], "source": r.get("displayLink")}
                for r in results]
    except Exception:
        return []    # any error: degrade gracefully, never block Gate 4

Tier 2 — Deep Haiku evidence gathering

When enabled, Haiku runs 5–10 targeted searches, fetches and strips HTML from source pages, then classifies each source and extracts claims with confidence levels.

stages/evidence_gathering.py — what it produces Python
# Haiku produces a structured evidence pack:
evidence_pack = {
    "verified_claims": [
        {
            "claim":      "The RBA raised rates by 25bps to 4.35%",
            "source":     "RBA official statement",
            "confidence": "high",
            "safe_phrasing": "According to the RBA's official statement...",
        }
    ],
    "claims_to_avoid": [
        {
            "claim":  "Rates will fall by end of 2024",
            "reason": "Prediction without verifiable source"
        }
    ],
    "recommended_references": [
        {"title": "RBA Rate Decision — May 2026", "url": "https://rba.gov.au/..."}
    ],
    "source_classifications": [
        {"url": "...", "type": "government_or_regulator", "allowed_to_use": True}
    ]
}

# This entire pack is injected into the Gate 4 system prompt.
# Gate 4 is instructed: "Only use statistics and dates from VERIFIED CLAIMS.
#                        Never use anything in CLAIMS TO AVOID."
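
As an illustration of the injection step, the sketch below flattens an evidence pack into labelled prompt sections. The render_evidence_section function and the exact section layout are assumptions for illustration; the guide specifies only that the pack is injected and that Gate 4 is told to respect VERIFIED CLAIMS and CLAIMS TO AVOID.

```python
def render_evidence_section(pack: dict) -> str:
    """Format an evidence pack as plain-text prompt sections (hypothetical layout)."""
    lines = ["VERIFIED CLAIMS:"]
    for item in pack.get("verified_claims", []):
        lines.append(
            f"- {item['claim']} "
            f"(source: {item['source']}; confidence: {item['confidence']})"
        )
        lines.append(f"  Suggested phrasing: {item['safe_phrasing']}")

    lines += ["", "CLAIMS TO AVOID:"]
    for item in pack.get("claims_to_avoid", []):
        lines.append(f"- {item['claim']} (reason: {item['reason']})")

    lines += ["", "RECOMMENDED REFERENCES:"]
    for ref in pack.get("recommended_references", []):
        lines.append(f"- {ref['title']}: {ref['url']}")

    return "\n".join(lines)


# Sample data mirrors the evidence pack shown above.
sample_pack = {
    "verified_claims": [{
        "claim": "The RBA raised rates by 25bps to 4.35%",
        "source": "RBA official statement",
        "confidence": "high",
        "safe_phrasing": "According to the RBA's official statement...",
    }],
    "claims_to_avoid": [{
        "claim": "Rates will fall by end of 2024",
        "reason": "Prediction without verifiable source",
    }],
    "recommended_references": [],
}

prompt_section = render_evidence_section(sample_pack)
print(prompt_section)
```

Keeping the section headers identical to the wording used in the Gate 4 instruction makes it unambiguous which list each rule refers to.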

Content Angles — The Core Differentiator

Competitors rewrite news. This platform generates opinionated, differentiated content. Claude selects the best angle for each article+client combination from 8 options — and avoids angles that competitors have already taken.

local_impact: News that directly affects the client's geography. NOT for global events.
action_list: News requiring immediate business response from the reader.
contrarian: The consensus headline misses a deeper or opposite implication.
faq_explainer: Complex topic the client's audience needs to understand clearly.
educational: Complex concept for a non-expert audience, explained from first principles.
expert_commentary: High-profile event that positions the client as an industry authority.
emotional_hook: News with direct personal or financial impact on the reader.
opinionated: Significant event that calls for a bold take building client authority and recall.

Competitor angle avoidance

stages/competitor_analysis.py Python
import re
from typing import Optional

# Competitor angles are inferred from title patterns, so there is no AI cost.
# Order matters: the first matching angle wins. expert_commentary is listed
# first because "what .+ means for the industry" would otherwise always be
# claimed by the broader local_impact pattern "what .+ means for .+".
ANGLE_PATTERNS = {
    "expert_commentary":  [r"why .+ matters", r"what .+ means for the industry"],
    "local_impact":       [r"what .+ means for .+", r"how .+ affects .+"],
    "action_list":        [r"\d+ things? (to|you should)", r"what (to do|businesses should)"],
    "contrarian":         [r"why .+ (is|might be) wrong", r"the truth about"],
    "faq_explainer":      [r"everything you need to know", r"what is .+ and why"],
}

def infer_competitor_angle(title: str) -> Optional[str]:
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None

# Result: competitor analysis returns avoid_angles = ["local_impact", "faq_explainer"]
# These are injected into the Gate 4 prompt:
# "DO NOT use local_impact — Competitor X already published that angle.
#  DO NOT use faq_explainer — Competitor Y already published that angle.
#  Choose a different angle that provides unique value."
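
A hedged sketch of how the avoid_angles list might be assembled from competitor titles. The pattern table is copied from the excerpt above; collect_avoid_angles and the sample titles are hypothetical and only illustrate the idea.

```python
import re
from typing import Optional

ANGLE_PATTERNS = {
    "local_impact":       [r"what .+ means for .+", r"how .+ affects .+"],
    "action_list":        [r"\d+ things? (to|you should)", r"what (to do|businesses should)"],
    "contrarian":         [r"why .+ (is|might be) wrong", r"the truth about"],
    "faq_explainer":      [r"everything you need to know", r"what is .+ and why"],
    "expert_commentary":  [r"why .+ matters", r"what .+ means for the industry"],
}


def infer_competitor_angle(title: str) -> Optional[str]:
    title_lower = title.lower()
    for angle, patterns in ANGLE_PATTERNS.items():
        if any(re.search(p, title_lower) for p in patterns):
            return angle
    return None


def collect_avoid_angles(competitor_titles: list[str]) -> list[str]:
    """Return each inferred angle once, in order of first appearance."""
    avoid: list[str] = []
    for title in competitor_titles:
        angle = infer_competitor_angle(title)
        if angle is not None and angle not in avoid:
            avoid.append(angle)
    return avoid


# Hypothetical competitor headlines for one news event.
titles = [
    "What the RBA rate rise means for Brisbane buyers",
    "Everything you need to know about the cash rate",
    "RBA lifts cash rate to 4.35%",          # plain news rewrite: no angle inferred
    "What the rate rise means for renters",  # duplicate angle, listed once
]
avoid_angles = collect_avoid_angles(titles)
print(avoid_angles)
```

Titles that match no pattern contribute nothing, so a competitor who only rewrites the headline does not block any angle.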

H1 title rules (hardcoded in the prompt)

Gate 4 system prompt — title constraints
H1_RULES = """
H1 TITLE RULES (mandatory):
- Must be a question format: How / Why / What / When / Should / Can / Is
- Primary keyword must appear within the first 8 words
- Include "2026" for informational, how-to, FAQ, and local SEO articles

FORBIDDEN TITLE PATTERNS (never use these):
- "5 things", "5 steps", "10 ways" (numbered lists)
- "Understanding [topic]"
- "Everything you need to know about"
- "changes everything" / "ultimate guide"
- "What X needs to know" (where X = the reader's role)
"""

# Examples of good titles:
# "Why should Australian mortgage brokers rethink fixed rates in 2026?"
# "How does the RBA cash rate affect first-home buyers in Brisbane in 2026?"
# "How should cybersecurity teams respond to the new APRA ruling in 2026?"
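
The title rules are enforced through the prompt, but they are also mechanical enough to check after generation. The sketch below is one possible post-generation check, not part of the platform as described: check_h1, the regexes, and the requirement that a question title ends with "?" are all assumptions. The "2026" rule is left out because it depends on the article type.

```python
import re

QUESTION_STARTERS = ("how", "why", "what", "when", "should", "can", "is")

# Assumed regex renderings of the forbidden patterns listed in H1_RULES.
FORBIDDEN_PATTERNS = [
    r"\b\d+ (things|steps|ways)\b",
    r"^understanding\b",
    r"everything you need to know about",
    r"changes everything",
    r"ultimate guide",
    r"^what .+ needs? to know\b",
]


def check_h1(title: str, primary_keyword: str) -> list[str]:
    """Return a list of rule violations; an empty list means the title passes."""
    problems = []
    lower = title.lower().strip()
    words = lower.split()

    if not words or words[0] not in QUESTION_STARTERS or not lower.endswith("?"):
        problems.append("not in question format")

    # The primary keyword must appear within the first 8 words.
    if primary_keyword.lower() not in " ".join(words[:8]):
        problems.append("primary keyword not in first 8 words")

    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, lower):
            problems.append(f"forbidden pattern: {pattern}")
    return problems


good = check_h1(
    "Why should Australian mortgage brokers rethink fixed rates in 2026?",
    "mortgage brokers",
)
bad = check_h1("What brokers need to know about the rate rise?", "rate rise")
print(good)
print(bad)
```

A check like this could be used to trigger a regeneration or flag a draft for the reviewer when the model drifts from the prompt's title rules.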

Why these specific forbidden patterns?

"5 things" articles are commoditised: every content farm produces them. "Understanding X" signals generic educational content, not actionable expert advice. "Everything you need to know" is overused, and formulaic titles like these risk being treated as unhelpful under Google's helpful content guidelines. The question-format rule reflects SEO data suggesting that question-format titles tend to outperform declarative titles for featured snippets.