ADR-016: Adapter Pipeline (Workers + Queues + Go Outline Parser)

  • Status: Accepted
  • Date: 2026-02-04

Context

We ingest course catalogs, prerequisites, and outlines from multiple institutions. Some sources are HTML; others include PDFs and unstructured documents.

Constraints:

  • Cloudflare Workers request handlers have tight CPU limits.
  • Cloudflare Queue consumers can perform heavier processing than HTTP handlers.
  • We want a hybrid approach: Workers for orchestration and ingestion, and a Go service for complex outline parsing.

Decision

Adapter architecture

  • Adapter Workers (TypeScript): Handle orchestration, HTTP fetching, lightweight parsing, queue production
  • Go Outline Parser Service: Handles heavy PDF/unstructured document parsing, invoked by Cloudflare Queue consumers

Folder structure

tutornexus/
├── apps/
│ ├── web/ (React frontend)
│ └── api/ (Workers backend)
├── services/
│ ├── mcp/ (Go MCP server - subtree)
│ └── adapters/ (TypeScript adapter Workers)
├── tools/
│ └── cli/ (Rust CLI - subtree)
├── docs/
└── storage/ (R2 for raw artifacts)

Worker-based ingestion

  • Each institution adapter runs as a separate Cloudflare Worker module.
  • Scheduled runs are triggered by Cron Triggers.
  • Adapters enqueue work items to Cloudflare Queues.
  • Queue consumers fetch source documents and parse/normalize results.
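
The flow above (Cron trigger, enqueue, consume) can be sketched roughly as follows. This is a minimal illustration, not the project's actual adapter code: the `FetchJob` message shape, the `INGEST_QUEUE` binding name, and the `makeAdapter` factory are all hypothetical, and the small `QueueLike`/`MessageLike`/`BatchLike` interfaces are local stand-ins for the Cloudflare runtime types so the sketch stays self-contained.

```typescript
// Local stand-ins for the Cloudflare runtime types (assumed shapes, kept
// minimal so the sketch compiles without external type packages).
interface QueueLike<T> {
  send(body: T): Promise<void>;
}
interface MessageLike<T> {
  body: T;
  ack(): void;
  retry(): void;
}
interface BatchLike<T> {
  messages: MessageLike<T>[];
}

// Hypothetical work item; the ADR does not define the message shape.
interface FetchJob {
  institution: string;
  url: string;
}

interface Env {
  INGEST_QUEUE: QueueLike<FetchJob>; // hypothetical binding name
}

// Builds the handler object for one institution adapter.
// `handle` is where fetching and parsing/normalizing would happen.
function makeAdapter(
  institution: string,
  seedUrls: string[],
  handle: (job: FetchJob) => Promise<void>,
) {
  return {
    // Cron-triggered run: enqueue one job per source URL.
    async scheduled(_controller: unknown, env: Env): Promise<void> {
      for (const url of seedUrls) {
        await env.INGEST_QUEUE.send({ institution, url });
      }
    },

    // Queue consumer: process each message, acknowledging successes and
    // asking the queue to redeliver failures.
    async queue(batch: BatchLike<FetchJob>, _env: Env): Promise<void> {
      for (const msg of batch.messages) {
        try {
          await handle(msg.body);
          msg.ack();
        } catch {
          msg.retry();
        }
      }
    },
  };
}

// A Worker module would end with something like:
//   export default makeAdapter("sfu", [...], fetchAndParse);
```

Acknowledging messages individually means one failing document does not force redelivery of the whole batch.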

Outline parsing (Go service)

Queue consumers invoke a dedicated Go service for:

  • PDFs
  • Unstructured outlines
  • Expensive parsing tasks
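
One way a queue consumer might decide whether to parse a document in the Worker or hand it to the Go service is sketched below. The routing rule is an assumption for illustration: the ADR does not define what counts as "unstructured" or "expensive", so the `MAX_INLINE_BYTES` threshold and the "anything that is not HTML is unstructured" rule are placeholders.

```typescript
type ParseRoute = "worker" | "outline-parser";

// Assumed threshold; the ADR does not say what counts as "expensive".
const MAX_INLINE_BYTES = 512 * 1024;

// Picks a parsing route from the response's Content-Type and body size.
function chooseParseRoute(contentType: string, byteLength: number): ParseRoute {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();

  // PDFs always go to the Go service.
  if (mime === "application/pdf") return "outline-parser";

  // For this sketch, anything that is not HTML is treated as unstructured.
  if (mime !== "text/html") return "outline-parser";

  // Large HTML documents are treated as expensive to parse in the Worker.
  return byteLength > MAX_INLINE_BYTES ? "outline-parser" : "worker";
}
```

For example, `chooseParseRoute("text/html; charset=utf-8", 20_000)` stays in the Worker, while any `application/pdf` response is routed to the outline parser.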

Deployment:

  • Render free tier for dev/staging
  • Fly.io for production

Storage

  • R2: Store large raw artifacts (HTML, PDFs, images)
  • Adapter-specific D1 (tn-adapter-*):
    • Raw fetch metadata (URLs, ETags, timestamps)
    • Parse outputs and adapter-local indices
    • Cached normalized data
  • Main D1 (tn-courses): Normalized course catalog
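
To illustrate how the R2 artifacts and the adapter-local fetch metadata could relate, here is a small sketch. The `FetchRecord` row shape, the `raw/<institution>/<hash>.<ext>` key layout, and both helper names are hypothetical; the ADR only states that URLs, ETags, and timestamps are recorded. `If-None-Match` is the standard HTTP header for revalidating against a stored ETag.

```typescript
// Hypothetical row in an adapter-specific D1 table (tn-adapter-*).
interface FetchRecord {
  url: string;
  etag: string | null; // ETag from the last successful fetch, if any
  fetchedAt: string;   // RFC 3339 timestamp
  contentHash: string; // hash of the raw body
  r2Key: string;       // where the raw artifact lives in R2
}

// Assumed key layout: raw/<institution>/<contentHash>.<extension>
function rawArtifactKey(institution: string, contentHash: string, extension: string): string {
  return `raw/${institution}/${contentHash}.${extension}`;
}

// Builds conditional request headers so an unchanged source can answer
// 304 Not Modified instead of resending the full body.
function conditionalHeaders(previous: FetchRecord | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (previous && previous.etag) {
    headers["If-None-Match"] = previous.etag;
  }
  return headers;
}
```

Keying the artifact by content hash means a re-fetch of unchanged content maps to the same R2 object rather than creating a new one.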

Reliability and idempotency

  • Queue jobs are idempotent and keyed by a stable content hash.
  • Retries must not create duplicate normalized records.
  • Failed jobs:
    • Retry N times with exponential backoff
    • Then move to dead-letter queue (DLQ) for manual review
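
The retry and idempotency rules above can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the base delay, cap, and function names are hypothetical, the retry limit N is left as a parameter, and the in-memory `Set` stands in for what would need to be a durable uniqueness check (for example, a unique key on the content hash in D1).

```typescript
type Outcome = "applied" | "duplicate" | "retry" | "dead-letter";

interface JobResult {
  outcome: Outcome;
  delaySeconds?: number; // set only when outcome is "retry"
}

// Exponential backoff: base * 2^attempt, capped. Both defaults are assumptions.
function backoffSeconds(attempt: number, baseSeconds = 30, capSeconds = 3600): number {
  return Math.min(capSeconds, baseSeconds * 2 ** attempt);
}

// Returns a runner that applies each content hash at most once.
// `attempt` counts retries already made (0 for the first delivery).
function createProcessor(maxRetries: number) {
  // In-memory stand-in for a durable "already applied" check.
  const applied = new Set<string>();

  return function run(contentHash: string, attempt: number, work: () => void): JobResult {
    // A redelivered job whose content was already applied is a no-op.
    if (applied.has(contentHash)) return { outcome: "duplicate" };

    try {
      work();
    } catch {
      // Out of retries: hand off to the dead-letter queue for manual review.
      return attempt >= maxRetries
        ? { outcome: "dead-letter" }
        : { outcome: "retry", delaySeconds: backoffSeconds(attempt) };
    }

    applied.add(contentHash);
    return { outcome: "applied" };
  };
}
```

With the assumed defaults, the delays for attempts 0 through 3 are 30, 60, 120, and 240 seconds. On Cloudflare Queues the retry limit and dead-letter queue are normally set in the consumer configuration, so a consumer would mainly need the duplicate check and the delay calculation.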

Initial adapters (Phase 1)

Adapter                 Institution                   Notes
tn-adapter-sfu          Simon Fraser University       Priority
tn-adapter-langara      Langara College               Priority
tn-adapter-ubc          University of BC              Priority
tn-adapter-douglas      Douglas College               Priority
tn-adapter-tru          Thompson Rivers University    Priority
tn-adapter-bctransfer   BC Transfer Guide             Transfer data

Additional adapters may be added in future phases.

Canonical normalized schema

Unified schema for course data with optional fields:

interface NormalizedCourse {
  courseRef: string;           // "institution:code" (e.g., "sfu:CMPT 120")
  institution: string;         // "sfu", "langara", etc.
  code: string;                // "CMPT 120"
  title: string;
  description: string;
  credits: number;             // Original credits
  equivalentCredits?: number;  // Credits when transferred
  prerequisites: string[];     // Course references
  corequisites: string[];
  antirequisites: string[];
  equivalentTo?: string[];     // Equivalent courses
  conditions?: string;         // "Grade of B or better"
  validFrom?: string;          // Term code or date
  validUntil?: string;
  updatedAt: string;           // RFC 3339
  sourceUrls: string[];        // For evidence
}

interface NormalizedOutline {
  courseRef: string;
  learningOutcomes: string[];
  gradingScheme: {
    component: string;
    weight: number;
  }[];
  topics: string[];
  requiredTexts?: string[];
  recommendedTexts?: string[];
  policies?: string;
}
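
Because `courseRef` ties courses, outlines, and prerequisite lists together, adapters would likely need to build and split it consistently. The helpers below are a sketch of the "institution:code" format shown in the schema. The function names are hypothetical, and the normalization rules (lower-casing the institution, upper-casing the code, collapsing whitespace) are assumptions that the ADR does not specify.

```typescript
interface CourseRefParts {
  institution: string;
  code: string;
}

// Builds a courseRef such as "sfu:CMPT 120".
// Normalization rules here are assumed, not taken from the ADR.
function formatCourseRef(institution: string, code: string): string {
  const inst = institution.trim().toLowerCase();
  const normalizedCode = code.trim().replace(/\s+/g, " ").toUpperCase();
  return `${inst}:${normalizedCode}`;
}

// Splits a courseRef at the first colon; returns null if either side is empty.
function parseCourseRef(ref: string): CourseRefParts | null {
  const i = ref.indexOf(":");
  if (i <= 0 || i === ref.length - 1) return null;
  return { institution: ref.slice(0, i), code: ref.slice(i + 1) };
}
```

Splitting at the first colon keeps any later colons inside the course code, and returning `null` lets callers reject malformed references before they reach the normalized catalog.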

Consequences

  • More moving parts (Workers + Queues + Go service).
  • Better reliability and performance for heavy parsing.
  • Clear separation between ingestion (adapter DBs) and product catalog (tn-courses).
  • Raw artifacts in R2 reduce D1 storage costs.

Alternatives considered

  • All-Go scrapers: higher operational cost and a poorer fit with the Workers-native stack.
  • All-Workers parsing: limited by Workers runtime CPU limits for heavy PDF/unstructured parsing.

Implementation notes

  • Adapter schema may evolve; optional fields allow backward compatibility.
  • The dead-letter queue requires monitoring and a manual intervention process.
  • BC Transfer Guide adapter requires attribution in UI.