# WagerBabe Scaling Initiative - Product Requirements Document

**Author:** babe

**Date:** January 13, 2025 (Created) | **Updated:** January 13, 2025
**Version:** 1.1 (Updated with Phase 1 Weeks 1-2 completion status)

---

## Executive Summary

WagerBabe is a modern sports betting platform built to scale from 145 to 50,000 concurrent users over 1-2 years. The platform has secured committed agents ready to migrate their user bases from legacy competitors (primarily Buckeye), but technical infrastructure capacity is the current bottleneck.

This PRD defines requirements for a 4-phase enterprise-grade infrastructure scaling initiative that enables controlled user onboarding while maintaining mobile-first performance, agent hub functionality, and FAANG-level operational excellence. With funding and marketing secured, the focus is pure execution: build infrastructure capacity that gates user growth, not the reverse.

### What Makes This Special

**Mobile-First, Built by Agents for Agents:** Unlike legacy competitors stuck with desktop-first UIs from 2010, WagerBabe is built from the ground up for mobile (68% of traffic) by people who actually understand agent workflows. The result is a modern, fast, intuitive platform that delights both end users and the agents managing their books - creating a sustainable competitive moat through superior user experience.

---

## Current Status (January 13, 2025)

**Project Completion:** 80% Complete
**Current Capacity:** 10,000-15,000 Concurrent Users
**Performance Achievement:** 20x improvement from baseline

### Phase 1 Progress: Weeks 1-2 COMPLETE

**Week 1 COMPLETE: Database & Connection Pooling**

  • Database connection pooling (asyncpg: 10-50 connections)
  • PgBouncer architecture validated
  • 200x capacity increase (50 -> 10,000-15,000 users)

**Week 2 COMPLETE: Redis Distributed Caching**
  • HOT/WARM/COLD tiered caching strategies implemented
  • 95%+ cache hit ratio achieved
  • API response time: 2-5ms (cached), 50-100ms (uncached)
  • Sub-10ms JWT authentication via Redis session cache
  • 98% reduction in external API calls (Odds API cost optimization)

### Current Focus: Weeks 3-4 (In Progress)

**Week 3 URGENT: Database Crisis Resolution**
  • CRITICAL: Archive historical odds (database at 97.4% capacity -> target <70%)
  • ⏳ Create materialized views for sidebar optimization (<100ms target)
  • ⏳ Query optimization (eliminate N+1 queries, add missing indexes)

**Week 4 PLANNED: Validation & Performance Tuning**
  • Load testing at 10k-15k concurrent users
  • Virtual scrolling enhancements for sports sidebar
  • Mobile performance validation (Lighthouse Mobile Score >90)
  • Final Phase 1 sign-off before agent onboarding begins

### What's Complete (80%)

**Core Platform Features:**
  • Authentication & user management (JWT + Better Auth + Redis session caching)
  • Agent Hub (customer management, cashier, Tuesday settlements, reporting)
  • Betting features (odds display, betting slip, parlay calculation, real-time updates)
  • Sports catalog & navigation (virtual scrolling sidebar, filtering, prioritization)
  • WebSocket real-time odds updates (30s latency for live games)

**Infrastructure & Performance:**
  • 10,000-15,000 concurrent user capacity (20x baseline improvement)
  • Redis distributed caching (95%+ hit ratio, HOT/WARM/COLD strategies)
  • Database connection pooling (asyncpg with 10-50 connections)
  • API performance: <200ms p95 latency
  • Zero-downtime deployments (Railway infrastructure)

### What's Remaining (20%)

**Critical Blockers:**
  • Database archival (97.4% capacity - URGENT Week 3 priority)
  • Sidebar query optimization (materialized views for <100ms loads)

**⏳ Performance Validation:**
  • Load testing to validate 10k-15k capacity under sustained load
  • Mobile performance metrics validation (Lighthouse CI)
  • Virtual scrolling performance tuning

**Future Phases (Not Blocking Launch):**
  • Phase 2: WebSocket horizontal scaling (25k users)
  • Phase 3: Read replicas, CDN (50k users)
  • Phase 4: Multi-region, microservices (100k+ users)

For detailed architecture and scaling decisions, see: docs/architecture.md

---

## Project Classification

**Technical Type:** Full-Stack Web Application (Next.js 15 + FastAPI)
**Domain:** Sports Betting Platform
**Complexity:** Enterprise-grade scaling with high reliability requirements

**Architecture:**
  • Client: Next.js 15 with React 19, mobile-first responsive design, PWA capabilities
  • Server: FastAPI (Python 3.13) with JWT authentication
  • Database: PostgreSQL via Supabase (managed service)
  • Caching: Redis 5.0+ for tiered caching and WebSocket pub/sub
  • Deployment: Railway (application) + Supabase (database) + Redis Cloud

**Business Model:**
  • $4-6 PPH (per head) revenue from agent fees
  • 2 master agents committed with sub-agent networks
  • Target: 10k users in 2-3 months, 50k users in 1-2 years
  • Infrastructure cost: <1% of revenue at scale (99.5% gross margin)

---

## Success Criteria

### Phase 1 Target (2-3 Months): 1,000-10,000 Users

**What Success Looks Like:** Success means agents confidently migrate their users because the platform is rock-solid reliable. Zero tolerance for critical bugs during onboarding. Each 1,000-user batch proves the system can handle more before the next wave arrives.

**Technical Success Indicators:**
  • API Performance: 95% of requests complete in <200ms
  • Cache Efficiency: >90% cache hit rate (minimal API quota usage)
  • Database Health: All queries <50ms at p95
  • System Stability: 99.9% uptime (≤43 minutes downtime/month)
  • Zero Critical Incidents: No data loss, no prolonged outages during onboarding

**Business Success Indicators:**
  • Controlled Onboarding: Successfully onboard 10,000 users in 1,000-user batches
  • Validation Gates: 3-7 day monitoring windows between batches show stable performance
  • Agent Satisfaction: No major complaints about platform stability or performance
  • User Retention: >85% week-over-week active users
  • Revenue Realization: $40k-60k/month ($4-6 PPH × 10k users)

**The Win:** Agents trust the platform enough to keep migrating users, and users prefer WagerBabe over Buckeye because it's faster and easier to use.

### Long-Term Target (1-2 Years): 50,000 Users

**Technical Success Indicators:**
  • Concurrent Capacity: Support 50,000 concurrent users
  • WebSocket Scale: 50k+ simultaneous connections
  • API Performance: <200ms response time under full load
  • Enterprise SLA: 99.9% uptime with <15 minute recovery time (RTO)

**Business Success Indicators:**
  • Market Position: Recognized competitive alternative to Buckeye
  • Revenue: $200k-300k/month ($4-6 PPH × 50k users)
  • Infrastructure Efficiency: <1% of revenue spent on infrastructure ($1,364/mo = 0.5%)
  • Agent Network: Multiple established agent partnerships beyond initial 2 master agents

**The Win:** WagerBabe is the modern platform agents recommend to each other, and infrastructure scales seamlessly as demand grows.

### Business Metrics

**Growth Tracking:**
  • Weekly Active Users (WAU)
  • User onboarding rate (users/week)
  • Agent acquisition rate (new agents/month)

**Technical Health:**
  • System uptime percentage
  • API response time (p50, p95, p99)
  • Error rate (5xx responses)
  • Cache hit rate
  • Database query performance

**Business Health:**
  • Monthly Recurring Revenue (MRR)
  • Revenue per user (actual PPH)
  • Agent satisfaction scores (qualitative)
  • User retention rate (week-over-week)
  • Infrastructure cost as % of revenue

---

## Product Scope

### MVP - Phase 1 Foundation (Weeks 1-2)

**Core Objective:** Scale infrastructure to reliably support 1,000 concurrent users while maintaining all existing features and performance.

**What Must Work:**

**Existing Features (Must Maintain):**
  1. Mobile-First Betting Experience - The competitive advantage vs Buckeye
     • Fast load times (<3s initial, <1s navigation)
     • Smooth scrolling sidebar with virtual scrolling for 100+ leagues
     • Instant betting slip interactions
     • Real-time odds updates
  2. Sports Sidebar & Navigation
     • Filter to only bettable sports
     • Prioritize American sports (NFL, NBA, MLB, NHL)
     • Game counts per league
     • Fast loading (<100ms cached, <300ms fresh)
  3. Real-Time Odds Display
     • Live odds updates via WebSocket (30s latency for live games)
     • Tiered caching (live 30s, upcoming 5min, scheduled 30min)
     • Multiple sportsbook odds comparison
  4. Betting Slip Functionality
     • Add/remove bets smoothly
     • Calculate parlays correctly
     • Submit bets reliably (zero data loss)
  5. Agent Hub - Critical for agent retention
     • Customer management (CRUD operations)
     • Cashier interface (balance management)
     • Tuesday settlement cycles
     • Basic reporting
  6. Authentication & User Management
     • JWT-based secure authentication
     • Agent vs user role separation
     • Session management

**New Infrastructure (Phase 1 Deliverables):**

**Week 1: Database & Sidebar Optimization**
  • Enhanced sidebar service (filtering, prioritization)
  • Database materialized view for sidebar aggregation
  • Sidebar API enhancements
  • PgBouncer setup (10k connections -> 100 actual DB connections)
  • Virtual scrolling in sidebar component
  • Sidebar TanStack Query hook optimization

**Week 2: Tiered Caching & API Efficiency**
  • Game status classifier (live, upcoming, scheduled)
  • Tiered Redis caching (30s to 30min based on game status)
  • Request batching & deduplication
  • Database query optimization
  • Dynamic TanStack Query configuration
  • Archive old odds_history data (database at 97.4% - URGENT)

**Phase 1 Success Criteria:**
  • Sidebar loads <100ms (cached)
  • API usage <3,000 req/min (50% of free tier limit)
  • Cache hit rate >90%
  • Database queries <50ms (p95)
  • Support 500 concurrent DB connections via PgBouncer
  • All existing features work perfectly at 1,000 users
  • Database storage freed up (archive historical odds)

**Onboarding Process:**
  • Agents create user accounts one-by-one via Agent Hub
  • Simple form: username, password, contact info
  • No bulk import needed yet (deferred to Phase 2+)
  • Validation: Test with 10-100 users -> monitor 3-7 days -> onboard first 1,000 batch -> monitor 7 days

### Growth Features - Phases 2-3 (Months 1-2)

**Phase 2 (Weeks 3-4): Real-Time & Background Jobs -> 10,000 users**
  • WebSocket horizontal scaling with Redis pub/sub
  • Celery background workers for async job processing
  • Priority queues (high: live odds, medium: upcoming, low: scheduled)
  • Distributed tracing (Datadog APM or Jaeger)
  • On-call rotation with PagerDuty
  • Canary deployments for major changes
  • Feature flags for risky features
  • Hourly database snapshots
  • Automated load testing in CI/CD

**Phase 3 (Month 2): Enterprise-Grade Operations -> 25,000 users**
  • CDN for static assets (Cloudflare Pro)
  • Advanced monitoring/APM (Datadog or similar)
  • Read replicas (route reads to replicas, writes to primary)
  • SOC 2 Type II preparation
  • Penetration testing (external firm)
  • Automated rollback on error rate spike
  • Blue/green deployment infrastructure
  • Chaos engineering experiments
  • Multi-region planning

### Vision Features - Phase 4 (Months 3-4)

**Advanced Scaling -> 50,000 users**
  • Event streaming (Kafka/RabbitMQ) for real-time data pipelines
  • CQRS pattern (Command Query Responsibility Segregation)
  • Microservices architecture (if needed - evaluate carefully)
  • Multi-region active-passive deployment
  • Advanced chaos engineering (GameDay exercises)
  • Data sharding strategy (shard by agent_id)
  • Predictive capacity planning models
  • Predictive pre-caching (cache popular games before users request)

**Explicitly Out of Scope (All Phases):**
  • Compliance/regulatory features (age verification, state restrictions) - owner handles separately
  • Payment processing integration (separate initiative)
  • Native mobile apps (PWA sufficient for now)
  • Social features (sharing bets, leaderboards)
  • Live streaming integration
  • In-play betting automation

---

## Full-Stack Web Application Architecture

### API Specification

**Server Architecture:**
  • Framework: FastAPI (Python 3.13)
  • Base URL: http://localhost:8000 (dev), production URL TBD
  • API Documentation: Auto-generated at /docs (Swagger UI)
  • Versioning: /api/v1/ prefix for all endpoints

**Endpoint Structure:**

```
/api/v1/
├── /auth/                  # Authentication endpoints
│   ├── POST /login
│   ├── POST /signup
│   ├── POST /logout
│   └── GET /me
│
├── /agent/                 # Agent-only endpoints (require agent role)
│   ├── /dashboard/         # Agent metrics and overview
│   ├── /clients/           # Client management (CRUD)
│   ├── /reports/           # Performance reports
│   ├── /hub/               # Agent Hub features
│   └── /cashier/           # Balance management, settlements
│
├── /user/                  # User-only endpoints
│   ├── /profile/           # User profile management
│   └── /history/           # Bet history
│
├── /dashboard/             # Dashboard data (role-based)
├── /betting/               # Betting operations
│   ├── GET /odds           # Fetch odds
│   ├── POST /bets          # Place bets
│   └── GET /slip           # Betting slip state
│
└── /shared/                # Shared resources
    ├── /sports/            # Sports catalog
    └── /leagues/           # League information
```

- **Content-Type:** `application/json`
- **Authentication:** JWT token in `Authorization: Bearer <token>` header
- **Error Format:** Consistent error responses with `error`, `message`, `details`
- **Rate Limiting:** Per-endpoint rate limits (documented in API spec)

**Key Performance Requirements:**
- API response time: <200ms (p95)
- Concurrent connections: Support via connection pooling
- Caching: Redis-backed caching for frequently accessed data

### Authentication & Authorization

**Authentication Model:**
- **Type:** JWT (JSON Web Tokens) with Supabase Auth integration
- **Token Lifetime:** Access tokens expire after 1 hour, refresh tokens after 7 days
- **Storage:** Tokens stored in httpOnly cookies (client-side)
- **Session Management:** Auto-refresh mechanism for seamless UX

**Authorization Roles:**

    User Roles:
    ├── User (default)    # End users placing bets
    ├── Agent             # Bookmakers managing customer books
    └── Admin (future)    # Platform administrators
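
The role list above, together with the access rules this section gives for agent, user, and shared endpoints, can be sketched as a small check on the decoded JWT payload. This is a minimal illustration, not the platform's actual middleware: the `can_access` helper and the endpoint group labels are hypothetical, and it assumes that "authenticated user" means any caller with a recognized role.

```python
from enum import Enum
from typing import Optional


class Role(str, Enum):
    USER = "user"
    AGENT = "agent"
    ADMIN = "admin"  # listed as "future" in the role list


def can_access(endpoint_group: str, jwt_payload: Optional[dict]) -> bool:
    """Return True if a decoded JWT payload may call an endpoint group.

    endpoint_group is a hypothetical label: "agent", "user", or "shared".
    """
    if jwt_payload is None:
        # No valid token means no access at all.
        return False
    role = jwt_payload.get("role")
    if role not in {r.value for r in Role}:
        return False
    if endpoint_group == "agent":
        # Agent endpoints require role "agent" in the JWT payload.
        return role == Role.AGENT.value
    # Assumption: user and shared endpoints accept any authenticated caller.
    return endpoint_group in {"user", "shared"}
```

In a FastAPI service a check like this would typically sit in a dependency that runs on every request, so that agent routes reject tokens without the agent role before any handler code executes.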

- Role-based access control (RBAC)
- Agent endpoints require `role: agent` in JWT payload
- User endpoints require authenticated user
- Shared endpoints accessible by any authenticated user

**Security Requirements:**
- Passwords hashed with bcrypt; minimum password length is 8 characters
- JWT signed with secret key (rotated every 90 days)
- Rate limiting on auth endpoints (5 attempts/min for login)
- HTTPS enforced in production
- CORS configured for client domain only

### Platform Requirements

**Client Platform:**
- **Framework:** Next.js 15 with React 19
- **Target Browsers:** Chrome, Safari, Firefox, Edge (last 2 versions)
- **Mobile Support:** iOS 14+, Android 10+
- **PWA:** Progressive Web App capabilities for app-like experience

**Mobile-First Design:**
- **Primary Platform:** Mobile (68% of traffic)
- **Breakpoints:**
  - Small mobile: 320px
  - Large mobile: 375px
  - Tablet: 768px
  - Desktop: 1024px+

**Performance Targets (Mobile):**
- First Contentful Paint (FCP): <1.5s on 3G
- Largest Contentful Paint (LCP): <2.5s on 3G
- Time to Interactive (TTI): <3.5s on 3G
- Cumulative Layout Shift (CLS): <0.1
- Lighthouse Mobile Score: >90

**Device Features Used:**
- Touch interactions (tap, swipe, long-press)
- Responsive viewport
- Local storage for caching
- Service worker for offline capability (Phase 2+)
- Push notifications (Phase 3+)

**Browser Storage:**
- LocalStorage: User preferences, cached odds
- SessionStorage: Temporary betting slip state
- IndexedDB: Offline data cache (Phase 2+)

---

## User Experience Principles

**Design Philosophy:** Modern, fast, mobile-first. The UI should feel like a 2025 app, not a 2010 desktop site. Every interaction should be instant and intuitive - users should never wait or wonder what to do next.

**Visual Personality:**
- **Modern & Clean:** Minimalist design with plenty of whitespace, no clutter
- **Professional:** Trust-building through polished, consistent design
- **Fast:** Visual feedback happens instantly (<50ms touch response)
- **Accessible:** High contrast, readable fonts, touch-friendly targets (min 44px)

**Core UX Principles:**

1. **Mobile-First Everything**
   - Design for thumb-reach zones (bottom navigation, key actions within thumb range)
   - Touch targets minimum 44px × 44px
   - Swipe gestures for common actions
   - Optimized for one-handed use
2. **Speed is a Feature**
   - Instant visual feedback on all interactions
   - Optimistic UI updates (update UI immediately, sync server later)
   - Skeleton screens while loading (no spinners)
   - Perceived performance matters as much as actual performance
3. **Progressive Disclosure**
   - Show only what's needed at each step
   - Advanced features hidden until needed
   - Clear navigation hierarchy (don't bury features)
4. **Error Prevention Over Error Handling**
   - Disable invalid actions before users try them
   - Inline validation as users type
   - Confirmation for destructive actions
   - Clear, actionable error messages when things go wrong

### Key Interactions

**Betting Flow (Primary User Journey):**
1. **Browse Odds:** Sidebar navigation -> select sport -> select league -> view games
2. **Add to Slip:** Tap odds -> bet added to slip with visual confirmation
3. **Review Slip:** Bottom sheet slides up showing all bets, calculated payout
4. **Place Bet:** Single tap to submit -> optimistic UI update -> server confirmation

**Agent Dashboard (Secondary User Journey):**
1. **View Overview:** Quick metrics on landing (revenue, active users, pending settlements)
2. **Manage Clients:** Search/filter -> tap client -> view details -> perform actions
3. **Process Settlements:** Tuesday cycle reminder -> review amounts -> process payments
4. **Generate Reports:** Select date range -> choose metrics -> export/view

**Navigation Patterns:**
- **Mobile:** Bottom navigation bar (Home, Odds, Slip, Profile)
- **Desktop:** Left sidebar with collapsible sections
- **Search:** Prominent search for finding specific games/leagues
- **Filters:** Slide-out panel for advanced filtering

**Visual Feedback:**
- **Tap:** Subtle highlight + haptic feedback (mobile)
- **Success:** Green checkmark + brief message
- **Error:** Red indicator + clear explanation
- **Loading:** Skeleton screens (not spinners)
- **Real-time Updates:** Smooth animations for odds changes

**Accessibility:**
- Semantic HTML for screen readers
- Keyboard navigation support
- ARIA labels where needed
- High contrast mode support
- Font scaling support

---

## Functional Requirements

_This section defines WHAT capabilities the system must have. Each requirement is implementation-agnostic and testable. These requirements drive all downstream work: UX design, architecture, epic breakdown, and implementation._

### User Account & Authentication

**FR1:** Users can create accounts with email and password
**FR2:** Users can log in securely and maintain sessions across devices
**FR3:** Users can log out and invalidate their session
**FR4:** Users can reset passwords via email verification
**FR5:** System distinguishes between User and Agent roles with appropriate permissions
**FR6:** Sessions auto-refresh transparently to maintain seamless user experience

### Sports Catalog & Navigation

**FR7:** Users can browse available sports with active betting markets
**FR8:** System filters to show only sports with bettable games
**FR9:** Users can view leagues within each sport
**FR10:** System prioritizes American sports (NFL, NBA, MLB, NHL) in navigation
**FR11:** Users can see game counts per league for context
**FR12:** System supports virtual scrolling for smooth browsing of 100+ leagues

### Odds Display & Real-Time Updates

**FR13:** Users can view current odds for upcoming and live games
**FR14:** System displays odds from multiple sportsbooks for comparison
**FR15:** Users receive real-time odds updates via WebSocket connection
**FR16:** System updates live game odds with 30-second latency
**FR17:** System updates upcoming game odds with 5-minute latency
**FR18:** System updates scheduled game odds with 30-minute latency
**FR19:** Users can see when odds were last updated

### Betting Slip & Wager Placement

**FR20:** Users can add selected odds to their betting slip
**FR21:** Users can remove bets from their betting slip
**FR22:** Users can view all bets in their slip with calculated potential payout
**FR23:** System calculates parlay payouts correctly for multiple bets
**FR24:** Users can submit bets with zero data loss guarantee
**FR25:** System provides instant visual confirmation when bets are placed
**FR26:** Users can view their bet history

### Agent Customer Management

**FR27:** Agents can create new user accounts for their customers
**FR28:** Agents can view a list of all their customers
**FR29:** Agents can search and filter their customer list
**FR30:** Agents can view detailed information for individual customers
**FR31:** Agents can update customer information
**FR32:** Agents can deactivate customer accounts
**FR33:** Agents can reactivate previously deactivated customers

### Agent Cashier & Financial Operations

**FR34:** Agents can view customer account balances
**FR35:** Agents can credit customer accounts
**FR36:** Agents can debit customer accounts
**FR37:** System logs all financial transactions with timestamp and agent ID
**FR38:** Agents can view transaction history for each customer
**FR39:** Agents can view aggregated financial summaries

### Agent Settlement Processing

**FR40:** System supports Tuesday settlement cycles
**FR41:** Agents can view pending settlements for the week
**FR42:** Agents can process settlements with payment method tracking
**FR43:** System tracks settlement compliance and completion
**FR44:** Agents receive reminders for upcoming Tuesday settlements

### Agent Reporting & Analytics

**FR45:** Agents can view dashboard metrics (revenue, active users, pending settlements)
**FR46:** Agents can generate performance reports for specified date ranges
**FR47:** Agents can view customer betting activity summaries
**FR48:** Agents can export reports in common formats

### User Profile Management

**FR49:** Users can view their profile information
**FR50:** Users can update their profile information
**FR51:** Users can manage their account preferences

### Infrastructure & Performance (User-Facing Capabilities)

**FR52:** System loads initial page in <3 seconds on mobile
**FR53:** System navigates between pages in <1 second
**FR54:** System caches frequently accessed data for fast retrieval
**FR55:** System works reliably on mobile devices (iOS 14+, Android 10+)
**FR56:** System works reliably on desktop browsers (Chrome, Safari, Firefox, Edge)
**FR57:** System provides offline capability for viewing cached odds (Phase 2+)

### System Administration & Monitoring (Internal Capabilities)

**FR58:** System maintains structured logs with trace IDs for debugging
**FR59:** System tracks key performance metrics in real-time dashboards
**FR60:** System sends alerts when performance degrades or errors spike
**FR61:** System supports zero-downtime deployments
**FR62:** System supports rollback to previous version within 5 minutes

### Data Management

**FR63:** System archives historical odds data older than 7 days
**FR64:** System automatically deletes archived odds from primary database
**FR65:** System maintains bet data indefinitely for settlement purposes
**FR66:** System backs up database data with point-in-time recovery capability

### Scaling & Load Management (Phase 1)

**FR67:** System supports 1,000 concurrent users with stable performance
**FR68:** System pools database connections to handle 10,000 client connections via 100 actual DB connections
**FR69:** System caches sidebar data with materialized views for <100ms load times
**FR70:** System classifies games by status (live, upcoming, scheduled) for tiered caching
**FR71:** System batches and deduplicates API requests to external odds providers

### Advanced Scaling (Phase 2-4)

**FR72:** System supports horizontal scaling of WebSocket connections via Redis pub/sub (Phase 2)
**FR73:** System processes background jobs asynchronously with priority queues (Phase 2)
**FR74:** System distributes read queries to read replicas to reduce primary database load (Phase 3)
**FR75:** System serves static assets via CDN for reduced server load (Phase 3)
**FR76:** System supports multi-region deployment for geographic redundancy (Phase 4)
**FR77:** System shards database by agent_id for horizontal scaling beyond 50k users (Phase 4)

### Operational Excellence (Phase 2-4)

**FR78:** System provides distributed tracing to track requests across services (Phase 2)
**FR79:** System supports feature flags for gradual rollout of new features (Phase 2)
**FR80:** System supports canary deployments for risk mitigation (Phase 2)
**FR81:** System automatically rolls back deployments when error rate spikes (Phase 3)
**FR82:** System supports blue/green deployments for zero-downtime releases (Phase 3)

---

## Non-Functional Requirements

### Performance

**Mobile Performance (Primary Platform):**
- **NFR-P1:** First Contentful Paint (FCP) <1.5s on 3G networks
- **NFR-P2:** Largest Contentful Paint (LCP) <2.5s on 3G networks
- **NFR-P3:** Time to Interactive (TTI) <3.5s on 3G networks
- **NFR-P4:** Cumulative Layout Shift (CLS) <0.1
- **NFR-P5:** Lighthouse Mobile Score >90

**API Performance:**
- **NFR-P6:** 95% of API requests complete in <200ms
- **NFR-P7:** 99% of API requests complete in <500ms
- **NFR-P8:** Database queries execute in <50ms at p95
- **NFR-P9:** Cache hit rate maintained above 90%

**UI Responsiveness:**
- **NFR-P10:** Touch interactions respond in <50ms
- **NFR-P11:** Page navigation completes in <1 second
- **NFR-P12:** Betting slip updates render in <100ms

**Concurrent Capacity:**
- **NFR-P13:** Phase 1 supports 1,000 concurrent users
- **NFR-P14:** Phase 2 supports 10,000 concurrent users
- **NFR-P15:** Phase 3 supports 25,000 concurrent users
- **NFR-P16:** Phase 4 supports 50,000 concurrent users

### Security

**Authentication & Authorization:**
- **NFR-S1:** All passwords hashed with bcrypt (cost factor ≥12)
- **NFR-S2:** JWT tokens signed with RS256 algorithm
- **NFR-S3:** Access tokens expire after 1 hour maximum
- **NFR-S4:** Refresh tokens expire after 7 days maximum
- **NFR-S5:** Failed login attempts rate-limited to 5 per minute per IP

**Data Protection:**
- **NFR-S6:** All production traffic uses HTTPS (TLS 1.2+)
- **NFR-S7:** Sensitive data (passwords, tokens) never logged
- **NFR-S8:** Financial transactions logged with tamper-evident audit trail
- **NFR-S9:** Database connections encrypted in transit

**API Security:**
- **NFR-S10:** API endpoints protected by rate limiting (100 req/min per user)
- **NFR-S11:** Agent endpoints validate JWT and agent role on every request
- **NFR-S12:** CORS configured to allow requests only from approved domains

**Dependency Security:**
- **NFR-S13:** Automated dependency scanning runs weekly (Snyk/Dependabot)
- **NFR-S14:** Critical vulnerabilities patched within 7 days
- **NFR-S15:** High vulnerabilities patched within 30 days

### Scalability

**Horizontal Scaling:**
- **NFR-SC1:** Application servers scale horizontally (add instances as needed)
- **NFR-SC2:** WebSocket servers scale horizontally via Redis pub/sub (Phase 2)
- **NFR-SC3:** Background workers scale horizontally (add Celery workers) (Phase 2)

**Database Scaling:**
- **NFR-SC4:** Database connection pooling via PgBouncer (10k client -> 100 DB connections)
- **NFR-SC5:** Read queries distributed to read replicas (Phase 3)
- **NFR-SC6:** Write queries to primary database only

**Caching Strategy:**
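
As a rough illustration of the tiered caching this PRD specifies (live 30s, upcoming 5min, scheduled 30min), the Python sketch below maps a game's status to a cache TTL. The `classify_game` and `cache_ttl` names, the explicit `in_progress` flag, and the 24-hour cutoff between "upcoming" and "scheduled" are assumptions made for the example; the PRD does not define them.

```python
from datetime import datetime, timedelta, timezone

# TTLs taken from this PRD: live 30s, upcoming 5min, scheduled 30min.
TTL_SECONDS = {"live": 30, "upcoming": 300, "scheduled": 1800}

# Assumption: a game is "upcoming" if it starts within the next 24 hours.
UPCOMING_WINDOW = timedelta(hours=24)


def classify_game(start_time: datetime, now: datetime, in_progress: bool) -> str:
    """Classify a game as live, upcoming, or scheduled."""
    if in_progress:
        return "live"
    if start_time - now <= UPCOMING_WINDOW:
        return "upcoming"
    return "scheduled"


def cache_ttl(start_time: datetime, now: datetime, in_progress: bool) -> int:
    """Return the cache TTL in seconds for a game's odds."""
    return TTL_SECONDS[classify_game(start_time, now, in_progress)]


now = datetime(2025, 1, 13, 12, 0, tzinfo=timezone.utc)
print(cache_ttl(now + timedelta(hours=2), now, in_progress=False))  # 300
print(cache_ttl(now + timedelta(days=3), now, in_progress=False))   # 1800
```

Keeping the classifier separate from the TTL table means the freshness tiers can be tuned without touching the classification logic.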
- **NFR-SC7:** Tiered caching based on data freshness requirements (30s to 30min TTL)
- **NFR-SC8:** Redis cluster for high-availability caching (Phase 3)
- **NFR-SC9:** CDN caching for static assets (Phase 3)

**Resource Efficiency:**
- **NFR-SC10:** Infrastructure costs remain <1% of revenue at all scales
- **NFR-SC11:** API quota usage stays within provider limits via aggressive caching

### Reliability & Availability

**Uptime Targets:**
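
The downtime allowances quoted for the uptime targets in this PRD follow from simple arithmetic. A minimal sketch, assuming a 30-day month (the function name is hypothetical):

```python
def downtime_budget_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of allowed downtime in a period for a given uptime target."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - uptime_percent / 100)


print(round(downtime_budget_minutes(99.9), 1))   # 43.2
print(round(downtime_budget_minutes(99.95), 1))  # 21.6
```

These values match the "≤43 minutes" and "≤21 minutes" figures the PRD uses, which round the budget down.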
- **NFR-R1:** Phase 1: 99.9% uptime per month (≤43 minutes downtime)
- **NFR-R2:** Phase 2+: 99.95% uptime per month (≤21 minutes downtime)
- **NFR-R3:** Zero data loss on bet submissions (synchronous writes with transaction logs)

**Recovery Objectives:**
- **NFR-R4:** Recovery Time Objective (RTO): <2 hours for Phase 1, <30 minutes for Phase 2+
- **NFR-R5:** Recovery Point Objective (RPO): Zero data loss for bets, <5 minutes for user data
- **NFR-R6:** Database backups: Daily (Phase 1), Hourly (Phase 2+)

**Error Handling:**
- **NFR-R7:** API error rate <2% of total requests
- **NFR-R8:** Failed requests return meaningful error messages with actionable guidance
- **NFR-R9:** System degrades gracefully under load (slow responses better than crashes)

**Monitoring & Alerting:**
- **NFR-R10:** Critical alerts (service down, error rate >5%) page on-call within 1 minute
- **NFR-R11:** High alerts (error rate >2%, latency >1s) notify via Slack/email within 5 minutes
- **NFR-R12:** All alerts include runbook link with remediation steps

### Deployment & Release

**Deployment Safety:**
- **NFR-D1:** Zero-downtime deployments (users never see maintenance pages)
- **NFR-D2:** Rollback to previous version completes in <5 minutes
- **NFR-D3:** Database migrations run while system is live (no downtime required)

**Release Process:**
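
One way to picture the staged canary percentages in this PRD (5% -> 25% -> 50% -> 100%) is deterministic bucketing of users by a hash of their ID. This is only a sketch: per-user hashing, the `in_canary` helper, and the bucket scheme are assumptions, and the deployment platform may split traffic differently.

```python
import hashlib

# Rollout stages from the PRD's canary requirement.
CANARY_STAGES = [5, 25, 50, 100]


def in_canary(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the current canary percentage.

    Each user maps to a stable bucket in 0..99, so a user admitted at an
    early stage stays admitted as the percentage grows.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


for stage in CANARY_STAGES:
    admitted = sum(in_canary(f"user-{i}", stage) for i in range(1000))
    print(f"{stage}% stage: {admitted} of 1000 sample users on the new version")
```

Because the bucket depends only on the user ID, a user does not flip between old and new versions across requests during a rollout.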
- **NFR-D4:** Phase 1: Manual deployments with validation checklist
- **NFR-D5:** Phase 2: Canary releases (5% -> 25% -> 50% -> 100% over 60 minutes)
- **NFR-D6:** Phase 3: Automated rollback when error rate exceeds threshold

**Feature Flags:**
- **NFR-D7:** Risky features deployed behind feature flags (Phase 2)
- **NFR-D8:** Features can be disabled instantly without deployment

### Data Management

**Retention & Archival:**
- **NFR-DM1:** Bet data retained indefinitely (regulatory and settlement requirements)
- **NFR-DM2:** Odds data retained 7 days in hot storage, archived to cold storage thereafter
- **NFR-DM3:** User activity logs retained 30 days in hot storage, 90 days archived
- **NFR-DM4:** Financial audit logs retained 1 year minimum

**Backup & Recovery:**
- **NFR-DM5:** Database backups tested monthly via restore drill
- **NFR-DM6:** Backup retention: 30 days hot, 90 days archived
- **NFR-DM7:** Point-in-time recovery capability for last 7 days (Phase 2+)

### Operational Excellence

**Observability:**
- **NFR-O1:** Structured logging with trace IDs for request correlation
- **NFR-O2:** Real-time dashboards update every 1 minute
- **NFR-O3:** Distributed tracing tracks requests across all services (Phase 2)

**Load Testing:**
- **NFR-O4:** Automated load tests run weekly at 1.5x current peak load
- **NFR-O5:** Pre-deployment load tests at 2x target capacity for each phase
- **NFR-O6:** Load test scenarios: normal, peak (2x), spike (5x), sustained (7 days)

**Disaster Recovery:**
- **NFR-O7:** Quarterly disaster recovery drills (database restore, full service recovery)
- **NFR-O8:** Runbooks documented and tested for all common failure scenarios
- **NFR-O9:** On-call engineer available 24/7 with <15 minute response time (Phase 2)

**Chaos Engineering (Phase 3+):**
- **NFR-O10:** Quarterly GameDay exercises (intentional production failures)
- **NFR-O11:** Chaos experiments validate resilience (kill random instance, delay DB queries, etc.)

---

## Implementation Planning

This PRD contains **82 Functional Requirements** and **80 Non-Functional Requirements** that must be decomposed into implementable epics and stories.

**Requirements must be broken down because:**
- Claude Code has a 200k token context limit
- Epic breakdown creates bite-sized, testable stories
- Stories enable parallel development and progress tracking
- Clear acceptance criteria emerge from requirement decomposition

**Next Step:** Run the epic breakdown workflow to transform these requirements into actionable development tasks organized by phase.

---

## References

**Supporting Documentation:**
- **Product Brief:** [product-brief-wagerbabe-2025-01-13.md](product-brief-wagerbabe-2025-01-13.md)
- **Architecture Document:** [architecture.md](architecture.md) - Complete tech stack, scaling decisions, implementation patterns NEW
- **PRD Validation Report:** [PRD-VALIDATION-REPORT.md](PRD-VALIDATION-REPORT.md) - Current status vs requirements NEW
- **Scaling Roadmap:** [docs/scaling/SCALING_ROADMAP.md](scaling/SCALING_ROADMAP.md)
- **Current State Analysis:** [docs/scaling/CURRENT_STATE.md](scaling/CURRENT_STATE.md)
- **Cost Analysis:** [docs/scaling/COST_ANALYSIS.md](scaling/COST_ANALYSIS.md)
- **Architecture Decisions:** [docs/scaling/ARCHITECTURE_DECISIONS.md](scaling/ARCHITECTURE_DECISIONS.md)

**Key Findings from Analysis:**
- Infrastructure costs scale sub-linearly (per-user cost drops 73% from 100 to 50k users)
- **Current capacity:** 10,000-15,000 concurrent users (20x improvement achieved), database at 97.4% full
- **Original baseline:** ~50 concurrent users (pre-Phase 1 Weeks 1-2)
- Target capacity: 50,000 concurrent users
- ROI: 146-220x at full scale
- Managed services (Railway + Supabase) preferred for developer productivity

---

## Next Steps

**Immediate Actions:**

1. **Epic & Story Breakdown** (REQUIRED)
   - Run: `workflow create-epics-and-stories`
   - Transform 162 requirements into implementable stories
   - Organize by 4-phase roadmap (Phase 1-4)
   - Create acceptance criteria for each story
2. **Architecture Document** COMPLETE
   - See: [architecture.md](architecture.md)
   - Documents current tech stack and scaling decisions
   - Defines implementation patterns for AI agent consistency
   - References Phase 1-4 scaling roadmap
3. **UX Design** COMPLETE
   - See: [ux-design-specification.md](ux-design-specification.md)
   - Mobile-first user flows documented
   - Interaction patterns defined (betting slip, sidebar, agent hub)
   - Visual mockups: [ux-color-themes.html](ux-color-themes.html), [ux-design-directions.html](ux-design-directions.html)
4. **Implementation** (After Epic Breakdown)
   - Follow 4-phase roadmap
   - Validate each phase with 1,000-user batch testing
   - Monitor SLOs and error budgets
   - Maintain 99.9% uptime during rollout

---

_This PRD captures the essence of **WagerBabe Scaling Initiative** - transforming technical capacity from a bottleneck into a competitive advantage by enabling agents to migrate users from legacy platforms (Buckeye) to a modern, mobile-first betting experience with enterprise-grade operational excellence._

_Created through collaborative discovery between babe and AI Product Manager._

**Document Status:** Requirements Complete | 80% Implemented
**Total Requirements:** 162 (82 Functional + 80 Non-Functional)
**Current Phase:** Phase 1 Week 3-4 (Database archival + optimization)
**Next Milestone:** Phase 1 sign-off before agent onboarding ---