Skip to main content

ADR-018: Quotas and Entitlements

  • Status: Accepted
  • Date: 2026-02-04

Context

We need quotas to control cost and abuse across text and voice.

Decisions already made:

  • Anonymous: 10 prompts/day per IP
  • Free signed-in: 100 prompts/day per user
  • Paid: 500 prompts/day per user
  • Voice is not available to anonymous users
  • Subscriptions are managed via Polar ADR-010

Decision

Quota tiers

TierPrompts/dayVoiceAccess
Anonymous10NoIP-hashed
Free100Yes (counts as prompts)Authenticated user
Paid500Yes (counts as prompts)Authenticated + Polar subscription

What is counted

  • Prompt: Any user-originated request that triggers an LLM call.
  • Voice request: Counts as 1 prompt (voice seconds tracked separately for analytics).
  • Streaming response: Still counts as 1 prompt total, not per-token.
  • Partial/retry requests: Free up to 3 attempts per user action. After 3 retries, additional attempts count as new prompts.

Reset boundary

  • UTC midnight: Quotas reset at 00:00 UTC each day.
  • Rationale: Simpler implementation, consistent behavior across timezones.

BYOK policy

  • BYOK usage is unlimited (user pays for their own API usage).
  • Quota limits apply to Tutor Nexus platform calls (RAG, MCP tools, etc.), not to user's own LLM API calls.
  • Rationale: BYOK users already pay for API costs; limiting their usage adds friction without reducing platform costs.

Voice entitlements

  • Voice requests count against the same prompt quota.
  • No separate voice minute allocation.
  • Rationale: Simpler quota management.

Enforcement

  • Authoritative quota usage stored in D1 (tn-sessions).
  • Preflight checks: DO performs quota verification before accepting requests.
  • Rate limiting: DO enforces per-connection rate limits.
  • Concurrency safety: Usage increments use idempotent keys (prevents double-counting).

Identity keys

TierKey typeExample
AnonymousIP hash (privacy-preserving)sha256(ip + salt)
FreeUser ID (ULID)01HXYZ...
PaidUser ID + subscription ID01HXYZ... + sub_ABC...

Quota tracking schema

interface QuotaUsage {
id: ULID;
userId?: ULID; // null for anonymous
ipHash?: string; // null for authenticated
tier: "anonymous" | "free" | "paid";
date: string; // YYYY-MM-DD (UTC)
promptsUsed: number;
lastUpdated: string; // RFC3339
}

interface QuotaLimit {
tier: string;
promptsPerDay: number;
voiceEnabled: boolean;
}

API response for quota check

interface QuotaStatus {
tier: string;
promptsRemaining: number;
promptsTotal: number;
resetsAt: string; // RFC3339 UTC
voiceEnabled: boolean;
}

Example responses

Within quota:

{
"tier": "free",
"promptsRemaining": 85,
"promptsTotal": 100,
"resetsAt": "2026-02-05T00:00:00Z",
"voiceEnabled": true
}

Over quota:

{
"code": "RATE_LIMITED",
"message": "Daily prompt limit exceeded",
"details": {
"reason": "User has used 100/100 prompts today",
"retryAfter": 43200,
"limit": {
"current": 100,
"max": 100,
"window": "day"
}
},
"requestId": "01JXYZ123ABC"
}

Consequences

  • Clear, predictable quota behavior for users.
  • Simpler implementation (UTC midnight reset).
  • BYOK users have frictionless experience.
  • Voice integrated naturally into existing quota system.

Alternatives considered

  • Token-based quotas: more accurate but harder to explain and implement early.
  • Local timezone reset: better UX but more complex implementation.
  • BYOK quotas: adds friction without platform benefit.

Implementation notes

  • IP hash should use per-deployment salt for privacy.
  • Quota checks must be fast (cache frequently accessed data in DO).
  • Consider graduated rate limiting (warn at 80%, block at 100%).