# Corpus API Specification

Base URL: `/api/corpus`

## Overview

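Read-only API for navigating the podcast corpus hierarchy (feeds, episodes, chapters, topics, and people). It is designed for AI agents and applications that need to explore podcast content efficiently. The one `POST` endpoint, `/people/episodes`, takes its query in the request body and returns data like the `GET` endpoints.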

**Rate Limit:** 200 requests/minute per IP
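
A client can stay under this limit by spacing its requests. The sketch below is one possible approach, not part of the API: the `Throttle` class and its parameters are hypothetical names, and it assumes that keeping at least 60/200 = 0.3 seconds between calls is enough to stay within the per-minute budget.

```python
import time


class Throttle:
    """Spaces calls at least `min_interval` seconds apart (client-side only)."""

    def __init__(self, per_minute=200, clock=time.monotonic, sleep=time.sleep):
        # 200 requests/minute comes from the spec; the even spacing is an assumption.
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until the next request may be sent."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each request keeps a single-threaded client at or below the documented rate.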

---

## Endpoints

### Stats & Documentation

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/stats` | Corpus-wide statistics |
| GET | `/spec` | This API specification (markdown) |

### Feeds (Podcasts)

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/feeds` | List all feeds |
| GET | `/feeds/:feedId` | Get single feed |
| GET | `/feeds/:feedId/stats` | Get feed statistics (episode, chapter, and paragraph counts; a quick depth check) |
| GET | `/feeds/:feedId/episodes` | List episodes for feed |

### Episodes

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/episodes/:guid` | Get single episode |
| GET | `/episodes/:guid/chapters` | List chapters for episode |

### Topics

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/topics` | Aggregated topics from chapter keywords |

### People (Creators & Guests)

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/people` | List/search people (creators + guests) |
| POST | `/people/episodes` | Get episodes featuring a person |

---

## Common Query Parameters

| Param | Type | Default | Max | Description |
|-------|------|---------|-----|-------------|
| `limit` | number | 50 | 200 | Results per page |
| `page` | number | 1 | - | Page number |
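
As an illustration of the table above, a client might normalize these two parameters before sending them. This is a sketch under assumptions: `paging_params` is a hypothetical helper, and clamping out-of-range values on the client (rather than letting the server reject or adjust them) is a choice made here, since the spec does not say how the server treats them.

```python
from urllib.parse import urlencode

DEFAULT_LIMIT = 50  # from the table above
MAX_LIMIT = 200     # from the table above


def paging_params(limit=DEFAULT_LIMIT, page=1):
    """Return `limit` and `page` as a query string, clamped to the documented range."""
    # Assumption: values below 1 are meaningless, so clamp them to 1.
    limit = max(1, min(int(limit), MAX_LIMIT))
    page = max(1, int(page))
    return urlencode({"limit": limit, "page": page})
```

The result can be appended to any list endpoint, for example `"/api/corpus/feeds?" + paging_params(limit=100)`.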

---

## Endpoint Details

### GET /stats

Returns corpus-wide statistics.

**Response:**
```json
{
  "feeds": { "total": 10 },
  "episodes": { "total": 5000 },
  "chapters": { "total": 50000 },
  "paragraphs": { "total": 500000 },
  "people": { "creators": 10, "guests": 500, "total": 510 },
  "topics": { "total": 2000 }
}
```

---

### GET /feeds/:feedId/stats

Returns statistics for a specific feed. Agents can use it to gauge how much content a feed holds before searching it.

**Response:**
```json
{
  "feedId": "1015378",
  "title": "What Bitcoin Did",
  "episodeCount": 824,
  "chapterCount": 3200,
  "paragraphCount": 45000,
  "dateRange": { "earliest": "2018-11-01", "latest": "2026-02-06" },
  "generatedAt": "2026-02-09T..."
}
```
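
One way an agent might use this response for a depth check is to derive simple ratios from the counts. The sketch below is illustrative only: `depth_summary` and the ratio names are hypothetical and are not fields returned by the API.

```python
def depth_summary(stats):
    """Derive per-episode and per-chapter ratios from a feed stats response."""
    episodes = stats["episodeCount"]
    chapters = stats["chapterCount"]
    paragraphs = stats["paragraphCount"]
    return {
        # Guard against empty feeds to avoid dividing by zero.
        "chaptersPerEpisode": chapters / episodes if episodes else 0.0,
        "paragraphsPerChapter": paragraphs / chapters if chapters else 0.0,
    }
```

For the example response above, this gives roughly 3.9 chapters per episode and about 14 paragraphs per chapter.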

---

### GET /feeds

List all podcast feeds.

**Query params:**
- `limit` (default: 50)
- `page` (default: 1)

**Response:**
```json
{
  "data": [
    {
      "feedId": "123",
      "title": "Lex Fridman Podcast",
      "author": "Lex Fridman",
      "description": "...",
      "episodeCount": 481,
      "imageUrl": "https://..."
    }
  ],
  "pagination": { "page": 1, "totalPages": 1, "totalCount": 10, "limit": 50, "hasMore": false }
}
```
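
The `pagination.hasMore` flag can drive a simple loop over all pages. The following is a minimal sketch, assuming the caller supplies a `fetch_page(page, limit)` function (a hypothetical name) that performs the HTTP request and returns the decoded JSON body in the shape shown above.

```python
def iter_items(fetch_page, limit=50):
    """Yield every item from a paginated list endpoint.

    `fetch_page(page, limit)` is supplied by the caller and must return a dict
    with "data" and "pagination" keys, as in the response above.
    """
    page = 1
    while True:
        body = fetch_page(page, limit)
        yield from body["data"]
        if not body["pagination"]["hasMore"]:
            break
        page += 1
```

Keeping the HTTP call outside the generator means the same loop works for `/feeds`, `/topics`, and `/people`, and it can be exercised with a stub in tests.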

---

### GET /feeds/:feedId/episodes

List episodes for a specific feed.

**Query params:**
- `limit` (default: 50)
- `page` (default: 1)
- `sort`: `newest` (default) or `oldest`
- `minDate`: ISO 8601 date string (optional)
- `maxDate`: ISO 8601 date string (optional)
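
The parameters above can be combined into a request URL as sketched below. `episodes_url` is a hypothetical helper, and it assumes date-only values such as `2024-01-01` are accepted for `minDate` and `maxDate`; the spec only says "ISO date string".

```python
from datetime import date
from urllib.parse import quote, urlencode

BASE_PATH = "/api/corpus"  # base URL from the top of this spec


def episodes_url(feed_id, sort="newest", min_date=None, max_date=None, limit=50, page=1):
    """Build the path and query string for GET /feeds/:feedId/episodes."""
    if sort not in ("newest", "oldest"):
        raise ValueError("sort must be 'newest' or 'oldest'")
    params = {"limit": limit, "page": page, "sort": sort}
    # Optional filters are omitted entirely when not set.
    if min_date is not None:
        params["minDate"] = min_date.isoformat()
    if max_date is not None:
        params["maxDate"] = max_date.isoformat()
    return f"{BASE_PATH}/feeds/{quote(str(feed_id), safe='')}/episodes?{urlencode(params)}"
```

For example, `episodes_url("1015378", sort="oldest", min_date=date(2024, 1, 1))` requests the oldest episodes first, starting from the given date.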

---

### GET /topics

Aggregated topics from chapter keywords.

**Query params:**
- `feedId`: Filter by feed (optional)
- `limit` (default: 50)
- `page` (default: 1)

**Response:**
```json
{
  "data": [
    {
      "keyword": "artificial intelligence",
      "count": 150,
      "feeds": [{ "feedId": "123", "title": "Lex Fridman Podcast" }],
      "sampleEpisodes": [{ "guid": "...", "title": "..." }]
    }
  ],
  "pagination": { ... }
}
```

---

### GET /people

List/search people (creators and guests).

**Query params:**
- `guestsOnly`: boolean (default: false) - Exclude hosts/creators
- `search`: string - Partial name match (case-insensitive)
- `feedId`: string - Filter to specific podcast
- `limit` (default: 50)
- `page` (default: 1)

**Response:**
```json
{
  "data": [
    {
      "name": "Elon Musk",
      "role": "guest",
      "appearances": 3,
      "feeds": [{ "feedId": "123", "title": "Lex Fridman Podcast" }],
      "recentEpisodes": [{ "guid": "...", "title": "...", "publishedDate": "..." }]
    }
  ],
  "pagination": { ... }
}
```

---

### POST /people/episodes

Get episodes featuring a specific person.

**Request body:**
```json
{
  "name": "Elon Musk",
  "guestsOnly": true,
  "feedId": "123",
  "limit": 50,
  "page": 1
}
```

**Response:**
```json
{
  "data": [
    {
      "guid": "abc123",
      "title": "#252 – Elon Musk: SpaceX, Mars...",
      "feedId": "123",
      "feedTitle": "Lex Fridman Podcast",
      "publishedDate": "2024-03-15",
      "role": "guest",
      "imageUrl": "https://..."
    }
  ],
  "pagination": { ... },
  "query": { "name": "Elon Musk", "guestsOnly": true }
}
```
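
Because this endpoint takes a JSON body, the request has to be built differently from the `GET` endpoints. The sketch below uses only the Python standard library. The host `https://example.com` is a placeholder, `people_episodes_request` is a hypothetical helper, and treating `feedId` as optional is an assumption, since the spec does not state which body fields are required.

```python
import json
from urllib.request import Request

BASE_URL = "https://example.com/api/corpus"  # placeholder host, not from the spec


def people_episodes_request(name, guests_only=False, feed_id=None, limit=50, page=1):
    """Build (but do not send) a POST /people/episodes request."""
    payload = {"name": name, "guestsOnly": guests_only, "limit": limit, "page": page}
    if feed_id is not None:  # assumption: feedId may be omitted
        payload["feedId"] = feed_id
    return Request(
        f"{BASE_URL}/people/episodes",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it; the response body then decodes with `json.load`.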

---

## Error Responses

All errors follow this format:
```json
{
  "error": "Error type",
  "message": "Human-readable message",
  "details": "Technical details (when available)"
}
```

Common HTTP status codes:
- `400` - Bad request (invalid parameters)
- `404` - Resource not found
- `429` - Rate limit exceeded
- `500` - Server error
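
A client might turn an error response into a message and a retry decision as sketched here. `parse_error` is a hypothetical helper, and treating `429` and `500` as retryable is a client-side policy assumed for illustration, not something the spec prescribes.

```python
import json

RETRYABLE_STATUSES = {429, 500}  # assumption: worth retrying after a pause


def parse_error(status, raw_body):
    """Return (message, retryable) for a non-2xx response."""
    try:
        body = json.loads(raw_body)
    except (TypeError, ValueError):
        body = {}  # body was missing or not JSON
    if not isinstance(body, dict):
        body = {}
    # Prefer the human-readable message, then the error type, then the status code.
    message = body.get("message") or body.get("error") or f"HTTP {status}"
    if body.get("details"):
        message = f"{message} ({body['details']})"
    return message, status in RETRYABLE_STATUSES
```

Falling back to the bare status code keeps the helper usable even when a proxy or gateway returns a non-JSON error page.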
