# Folderbot Architecture

## Overview

Folderbot is a Telegram bot that gives users an LLM-powered assistant with access to their personal folder. The architecture follows a layered design:

```mermaid
flowchart LR
    TG[Telegram] <-->|messages| TB[TelegramBot]
    TB <-->|chat loop| LLC[LLMClient]
    LLC <-->|tool dispatch| FT[FolderTools]
    LLC -.-|structured extraction| I[instructor]
```

## Request Lifecycle

The full lifecycle of a user message, from Telegram to response:

```mermaid
sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TH as TelegramHandler
    participant SN as StatusNotifier
    participant SM as SessionManager
    participant LC as LLMClient
    participant I as instructor
    participant FT as FolderTools

    U->>TG: Send message
    TG->>TH: handle_message()

    Note over TH: Accumulate pending messages<br/>Cancel in-flight task if any

    TH->>TH: _start_processing()
    TH->>SN: start() → typing indicator

    TH->>SM: get_history(user_id)
    SM-->>TH: conversation history

    TH->>LC: chat(message, context, history)

    loop Agent Loop (max 10 iterations)
        LC->>I: create_with_completion()<br/>response_model=AgentResponse
        I-->>LC: AgentResponse

        alt Has answer, no tool calls
            LC->>LC: Hallucination guard check
            LC-->>TH: (answer, tools_used, topic, usage)
        else Has tool calls
            loop For each tool call
                LC->>SN: update(tool_name)
                SN->>TG: Edit status message

                alt ask_user tool
                    LC->>TH: on_ask_user callback
                    TH->>TG: Show interactive UI
                    U->>TG: Tap button / send text
                    TG->>TH: Resolve Future
                    TH-->>LC: User answer
                else Regular tool
                    LC->>FT: execute_async(name, args)
                    FT-->>LC: ToolResult
                end
            end
            Note over LC: Append results to<br/>gathered_context, loop
        end
    end

    TH->>SM: save_message (user + assistant)
    TH->>SM: record_token_usage
    TH->>SN: stop() → delete status
    TH->>TG: Reply with response
    TG->>U: Display answer
```

### Message Accumulation and Cancellation

When a user sends multiple messages quickly, Folderbot accumulates them instead of processing each one independently:

```mermaid
flowchart TD
    M1[Message 1 arrives] --> P1[Add to pending_messages]
    P1 --> T1[Start processing task]

    M2[Message 2 arrives<br/>while processing] --> C[Cancel current task]
    C --> R[Restore in-progress<br/>messages to pending]
    M2 --> P2[Add to pending_messages]
    R --> P2
    P2 --> T2[Start new task with<br/>all accumulated messages]

    T2 --> J[Messages joined with newline<br/>sent as single LLM request]

    style C fill:#c44,stroke:#333,color:#fff
    style J fill:#4a9,stroke:#333,color:#fff
```
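The accumulate-and-cancel flow above can be sketched with plain `asyncio`. This is a minimal illustration, not the bot's actual code: the `MessageAccumulator` class and its `add`/`wait` methods are hypothetical names, and `process` stands in for whatever coroutine sends the joined text to the LLM.

```python
import asyncio


class MessageAccumulator:
    """Hypothetical helper; the real TelegramBot may be organised differently."""

    def __init__(self, process):
        self._process = process    # async callable that receives the joined text
        self.pending = []          # messages not yet handed to a task
        self._in_progress = []     # messages owned by the running task
        self._task = None

    async def add(self, text):
        if self._task is not None and not self._task.done():
            # A newer message arrived: cancel the in-flight task.
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            # Restore the messages the cancelled task was working on.
            self.pending = self._in_progress + self.pending
            self._in_progress = []
        self.pending.append(text)
        # Hand every accumulated message to one new task, joined by newlines.
        self._in_progress, self.pending = self.pending, []
        self._task = asyncio.create_task(self._run("\n".join(self._in_progress)))

    async def _run(self, joined):
        await self._process(joined)
        self._in_progress = []

    async def wait(self):
        if self._task is not None:
            await self._task
```

With this sketch, two messages sent in quick succession reach `process` once, as `"first\nsecond"`, rather than as two separate requests.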

## Core Components

### Telegram Handler (`telegram_handler.py`)

The `TelegramBot` class manages:
- **Message handling**: Accumulates user messages, cancels in-flight requests on new input
- **Command handlers**: `/start`, `/clear`, `/new`, `/status`, `/files`, `/tasks`
- **Document uploads**: Stores files and makes them available to tools
- **Photo handling**: Downloads photos, saves to uploads, encodes as base64, and passes to LLM as multimodal image blocks for vision analysis
- **ask_user UI**: Renders interactive Telegram widgets (inline keyboards, location pickers) when the LLM needs user input
- **Scheduler integration**: Sends messages from background tasks
- **File watcher**: Notifies users of file changes

### LLM Client (`llm_client.py`)

The `LLMClient` is **backend-agnostic** and builds on the `instructor` package:
- Supports any LLM provider via `instructor.from_provider("provider/model")`
- Uses **structured extraction** (`AgentResponse` model) instead of native tool_use
- Tools are described in the system prompt text, because instructor already occupies the provider's tools parameter for structured extraction
- **Multimodal support**: Photo messages are encoded as base64 image blocks in the user message. Image format is provider-aware (`_format_image_block`): Anthropic uses native `image` blocks with `base64` source, OpenAI uses `image_url` with data URIs. The provider is detected from the model string prefix (e.g. `anthropic/...`).
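As a rough illustration of the provider-aware image formatting described in the last bullet, the sketch below builds both block shapes. The function name mirrors `_format_image_block`, but the signature, the `media_type` default, and the choice to fall back to the OpenAI-style block for any non-Anthropic prefix are assumptions.

```python
import base64


def format_image_block(model, image_bytes, media_type="image/jpeg"):
    """Build a provider-specific image block (sketch; signature is assumed)."""
    data = base64.b64encode(image_bytes).decode("ascii")
    provider = model.split("/", 1)[0]   # provider prefix, e.g. "anthropic"
    if provider == "anthropic":
        # Anthropic: native image block with a base64 source.
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        }
    # Assumption: every other provider accepts the OpenAI-style data URI.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{data}"},
    }
```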

### Agent Loop

The core loop in `LLMClient.chat()`:

```mermaid
flowchart TD
    A[Build messages:<br/>history + user message<br/>+ gathered tool results] --> B[Call LLM via instructor<br/>create_with_completion<br/>response_model=AgentResponse]
    B --> C{Answer provided?}
    C -->|yes| D[Return answer]
    C -->|no, tool calls| E{ask_user?}
    E -->|yes| F[Pause loop,<br/>wait for user via callback]
    F --> G[Append result to<br/>gathered_context]
    E -->|no| H[Dispatch to FolderTools]
    H --> G
    G --> A

    style D fill:#4a9,stroke:#333,color:#fff
```
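The loop in the diagram can be sketched as follows. To stay self-contained, this version is synchronous, uses dataclasses as stand-ins for the Pydantic models, and takes the LLM call and the tool dispatcher as plain callables. `run_agent_loop`, `call_llm`, and `dispatch_tool` are hypothetical names, and the message returned when the iteration limit is reached is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_ITERATIONS = 10  # the limit shown in the request lifecycle diagram


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


@dataclass(frozen=True)
class Response:
    tool_calls: list = field(default_factory=list)
    answer: Optional[str] = None


def run_agent_loop(call_llm, dispatch_tool, user_message):
    """Call the LLM until it answers, feeding tool results back each round."""
    gathered_context = []
    tools_used = []
    for _ in range(MAX_ITERATIONS):
        response = call_llm(user_message, gathered_context)
        if response.answer is not None and not response.tool_calls:
            return response.answer, tools_used
        for call in response.tool_calls:
            result = dispatch_tool(call.name, call.arguments)
            gathered_context.append(f"{call.name} -> {result}")
            tools_used.append(call.name)
    # Assumption: give up with a fixed message once the limit is reached.
    return "Stopped after reaching the iteration limit.", tools_used
```

Because the results live in `gathered_context` rather than in provider-specific tool messages, the same loop works with any backend that can return the structured response.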

### Structured Response Models

```python
class ToolCallRequest(BaseModel, frozen=True):
    name: str          # Tool name
    arguments: dict    # Tool arguments

class AgentResponse(BaseModel, frozen=True):
    tool_calls: list[ToolCallRequest]  # Tools to execute
    answer: str | None                  # Final answer (when done)
    topic: str                          # Conversation topic label

class AskUserRequest(BaseModel, frozen=True):
    question: str      # Question to display
    options: list[str]  # Button labels
    input_type: str    # choice | confirm | text | location
```

## Tool System

### Registration

Tools are registered via the `@folder_bot.tool()` decorator with typed Pydantic request/response models:

```python
@folder_bot.tool(name="read_file", request_type=ReadFileRequest, response_type=ReadFileResponse)
async def read_file(request, context):
    ...
```
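One way such a decorator could work is a registry that maps tool names to handlers and their request/response types. The sketch below is an assumption about the mechanism, not the project's implementation: `ToolRegistry` is a hypothetical class, dataclasses replace the Pydantic models to keep the example dependency-free, and the `echo` tool exists only for illustration.

```python
import asyncio
from dataclasses import dataclass


class ToolRegistry:
    """Hypothetical stand-in for the object behind `folder_bot`."""

    def __init__(self):
        self._tools = {}

    def tool(self, name, request_type, response_type):
        def decorator(func):
            # Remember the handler together with its typed models.
            self._tools[name] = (func, request_type, response_type)
            return func
        return decorator

    async def execute_async(self, name, arguments, context=None):
        func, request_type, response_type = self._tools[name]
        request = request_type(**arguments)   # build the typed request
        response = await func(request, context)
        if not isinstance(response, response_type):
            raise TypeError(f"{name} returned {type(response).__name__}")
        return response


@dataclass(frozen=True)
class EchoRequest:
    text: str


@dataclass(frozen=True)
class EchoResponse:
    text: str


folder_bot = ToolRegistry()


@folder_bot.tool(name="echo", request_type=EchoRequest, response_type=EchoResponse)
async def echo(request, context):
    return EchoResponse(text=request.text.upper())
```

Calling `asyncio.run(folder_bot.execute_async("echo", {"text": "hi"}))` then returns an `EchoResponse` whose `text` is `"HI"`.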

### Tool Categories

| Category | Tools |
|----------|-------|
| File operations | `list_files`, `read_file`, `read_files`, `search_files`, `write_file` |
| Web | `web_search` (Google Custom Search API), `web_fetch`, `get_weather` |
| Scheduler | `schedule_task`, `list_tasks`, `cancel_task`, `get_task_results` |
| Uploads | `list_uploads`, `delete_upload`, `send_upload` |
| Visualization | `plot_chart` (matplotlib, sends PNG to user) |
| Calendar | `calendar_add`, `calendar_list`, `calendar_upcoming`, `calendar_update`, `calendar_delete` |
| Todo | `todo_add`, `todo_list`, `todo_update`, `todo_remove` |
| Topics | `list_topics`, `get_full_history`, `reorganize_topics` |
| Stats | `get_token_usage`, `token_stats`, `read_activity_log` |
| Notifications | `enable_file_notifications`, `disable_file_notifications`, `get_file_notification_status` |
| Utilities | `send_message`, `get_time`, `compare_numbers`, `shuffle_list`, `sort_list`, `random_choice`, `random_number` |
| Interactive | `ask_user` (handled in agent loop, not FolderTools) |

### Tool Configuration

Tools can have their own configuration via `[tools.<name>]` sections in `config.toml`:

```toml
[tools.web_search]
google_api_key = "..."
google_cx = "..."
```

Tools access their config via `get_tool_config(context, "tool_name")`, which returns that tool's config dict. Custom tools receive the full `tools_config` dict in their constructor.

### Services Pattern

Tools receive dependencies through `BotContext.services`:
- `FolderServices`: Root path, config, path validation, `get_tool_config()`
- `SchedulerServices`: Task creation and management
- `UploadServices`: File upload storage and retrieval

```mermaid
classDiagram
    class BotContext {
        +services
        +user_id
    }
    class FolderServices {
        +root_path
        +config
        +validate_path()
    }
    class SchedulerServices {
        +create_task()
        +cancel_task()
        +list_tasks()
    }
    class UploadServices {
        +uploads_dir: Path
        +send_document(chat_id, path, filename)
        +chat_id: int
        +session_manager
    }

    BotContext --> FolderServices
    BotContext --> SchedulerServices
    BotContext --> UploadServices
```

## ask_user: Interactive User Input

The `ask_user` tool enables the LLM to pause its agent loop and request interactive input from the user via native Telegram UI.

### Flow

```mermaid
sequenceDiagram
    participant LLM as LLMClient
    participant CB as on_ask_user callback
    participant TB as TelegramBot
    participant TG as Telegram
    participant U as User

    LLM->>LLM: AgentResponse with<br/>tool_call name="ask_user"
    LLM->>CB: on_ask_user(AskUserRequest)
    CB->>TB: _handle_ask_user()
    TB->>TB: Create asyncio.Future
    TB->>TG: Send UI (keyboard / text prompt)
    TG->>U: Display interactive widget

    U->>TG: Tap button / send text / share location
    TG->>TB: CallbackQuery / Message
    TB->>TB: Resolve Future with answer

    TB-->>CB: Return answer string
    CB-->>LLM: Answer added to gathered_context
    LLM->>LLM: Continue agent loop
```

### Input Types

| Type | Telegram UI | Resolution |
|------|------------|------------|
| `choice` | Inline keyboard (one button per option) | CallbackQueryHandler |
| `confirm` | Inline keyboard (Yes/No row) | CallbackQueryHandler |
| `text` | Plain text question | Next text message intercepted |
| `location` | Reply keyboard with location button | Location MessageHandler |

### Key Design Decisions

- **`ask_user` is NOT a registered FolderBot tool** — it's handled specially in the agent loop because it requires async user interaction
- **`asyncio.Future`** for pause/resume — the agent loop awaits a Future that Telegram handlers resolve
- **Index-based callback data** (`ask:user_id:index`) keeps each button's `callback_data` within Telegram's 64-byte limit, regardless of how long the option label is
- **120-second timeout** prevents the agent loop from hanging indefinitely
- **Backend-agnostic** — the LLM client knows nothing about Telegram; the callback is injected by the handler
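The Future-based pause/resume, the index-based callback data, and the timeout can be combined in a small sketch. `AskUserBroker` and its methods are hypothetical names, and returning `None` on timeout is an assumption about how the real handler reports an unanswered question.

```python
import asyncio

ASK_TIMEOUT_SECONDS = 120  # timeout named in the design decisions above


class AskUserBroker:
    """Hypothetical helper pairing pending questions with Telegram callbacks."""

    def __init__(self):
        self._pending = {}  # user_id -> (future, options)

    @staticmethod
    def callback_data(user_id, options):
        # Only the option index travels in the payload, never the label.
        return [f"ask:{user_id}:{index}" for index in range(len(options))]

    async def ask(self, user_id, options, timeout=ASK_TIMEOUT_SECONDS):
        future = asyncio.get_running_loop().create_future()
        self._pending[user_id] = (future, options)
        try:
            return await asyncio.wait_for(future, timeout)
        except asyncio.TimeoutError:
            return None  # assumption: an unanswered question yields None
        finally:
            self._pending.pop(user_id, None)

    def resolve(self, data):
        """Called from the callback-query handler with e.g. 'ask:42:1'."""
        _, user_id, index = data.split(":")
        entry = self._pending.get(int(user_id))
        if entry is None or entry[0].done():
            return False
        future, options = entry
        future.set_result(options[int(index)])
        return True
```

The agent loop awaits `ask(...)` while the Telegram handler calls `resolve(...)` when a button is tapped, so neither side needs to know about the other's internals.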

## Session Management

- **SQLite-backed** via `SessionManager`
- Stores conversation history per user (role, content, timestamp, topic)
- Tracks version notifications, file notification preferences, uploads
- Records token usage per LLM call (input/output tokens, model, topic)

```mermaid
erDiagram
    USER ||--o{ CONVERSATION_HISTORY : has
    USER ||--o{ UPLOAD : stores
    USER {
        int user_id
        bool file_notifications
        string last_version_notified
    }
    CONVERSATION_HISTORY {
        int user_id
        string role
        string content
        datetime timestamp
        string topic
    }
    UPLOAD {
        int user_id
        string filename
        blob data
    }
```

## Topic-Based Conversation Management

Each message is tagged with a **topic** label (e.g. "weather", "recipes", "project planning") assigned by the LLM via the `AgentResponse.topic` field. Topics enable multi-threaded conversations:

- **Topic-aware history**: `build_topic_history()` always includes the last 4 messages for immediate context, then backfills the remaining character budget with same-topic messages from older history
- **`list_topics` tool**: Lets the user ask "what conversations am I having?" — returns topic names, message counts, and last activity
- **Backward compatible**: Old messages without a topic field default to `"general"`

```mermaid
flowchart LR
    H[Full History] --> R[Last 4 messages<br/>recency window]
    H --> B[Older same-topic<br/>messages backfill]
    R --> M[Merged history<br/>sent to LLM]
    B --> M
```
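The recency-plus-backfill selection can be sketched as below. Only the function name and the 4-message window come from the description above; the message shape (dicts with `content` and an optional `topic`), the `char_budget` parameter, and the newest-first backfill order are assumptions.

```python
def build_topic_history(history, topic, char_budget, recent_count=4):
    """Return the recency window plus older same-topic messages that fit."""
    recent = history[-recent_count:]
    older = history[:-recent_count]
    remaining = char_budget - sum(len(m["content"]) for m in recent)

    backfill = []
    for message in reversed(older):  # newest of the older messages first
        # Old messages without a topic field count as "general".
        if message.get("topic", "general") != topic:
            continue
        if len(message["content"]) > remaining:
            break
        backfill.append(message)
        remaining -= len(message["content"])

    backfill.reverse()  # restore chronological order
    return backfill + recent
```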

## Voice Transcription

Voice messages and audio files are transcribed **locally** at the Telegram handler layer using **faster-whisper** (CTranslate2). The LLM receives plain text — it doesn't need to know the input was audio.

- No API key required — runs entirely on-device via CTranslate2 (up to 4x faster than openai-whisper)
- Pre-built wheels with GPU support (CUDA) — `pip install` just works, no build flags needed
- Model configurable via `whisper_model` config key (default: `"base"`)
- Handles both `filters.VOICE` (voice messages) and `filters.AUDIO` (audio files)
- Transcription runs in a thread (`asyncio.to_thread`) to avoid blocking the event loop
- Models are auto-downloaded from Hugging Face Hub and cached after first load

```mermaid
sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TB as TelegramBot
    participant W as faster-whisper (local)
    participant LP as Message Pipeline

    U->>TG: Send voice message / audio file
    TG->>TB: handle_voice()
    TB->>TG: Download audio bytes
    TB->>W: transcribe_audio(bytes, model_name)
    Note over W: Writes to temp file,<br/>runs model.transcribe(),<br/>joins segments
    W-->>TB: TranscriptionResult(text)
    TB->>LP: Add text to pending_messages
    TB->>LP: _start_processing()
    LP->>LP: Normal LLM chat flow
```
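The temp-file and thread hand-off steps can be sketched without depending on the model itself. In this illustration the blocking transcriber is passed in as `transcribe_path`, a hypothetical callable that takes a file path and yields segment texts. The signature therefore differs from the `transcribe_audio(bytes, model_name)` call in the diagram, and the `.ogg` suffix is an assumption.

```python
import asyncio
import os
import tempfile


def transcribe_bytes(audio, transcribe_path, suffix=".ogg"):
    """Write audio to a temp file, transcribe it, and join the segment texts."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(audio)
        path = handle.name
    try:
        return " ".join(text.strip() for text in transcribe_path(path)).strip()
    finally:
        os.unlink(path)  # always clean up the temp file


async def transcribe_audio(audio, transcribe_path):
    # Run the blocking work in a thread so the event loop stays responsive.
    return await asyncio.to_thread(transcribe_bytes, audio, transcribe_path)


# With faster-whisper installed, transcribe_path could look like this
# (an assumption about the wiring, not taken from the project):
#   model = WhisperModel("base")
#   transcribe_path = lambda p: (s.text for s in model.transcribe(p)[0])
```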

## Self-Update Mechanism

The bot can automatically check PyPI for newer versions and upgrade itself:

- **`folderbot update`** CLI command: checks PyPI JSON API, runs `pip install --upgrade`, restarts the systemd service if running
- **Systemd timer**: `folderbot-update.timer` runs `folderbot update` every 5 minutes
- Installed/managed alongside the main service via `folderbot service install/enable/start`

```mermaid
flowchart LR
    T[systemd timer<br/>every 5min] --> U[folderbot update]
    U --> P{PyPI newer?}
    P -->|no| D[Done]
    P -->|yes| I[pip install --upgrade]
    I --> R[systemctl restart folderbot]
```
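The "PyPI newer?" decision can be sketched as a pure function over the JSON document that PyPI serves at `https://pypi.org/pypi/<project>/json`, where the latest release is under `info.version`. The helper names are hypothetical, and the version parsing here only handles plain dotted numeric versions such as `1.10.0`. A real implementation would more likely use a proper version parser.

```python
import json


def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0); pre-release tags are not handled."""
    return tuple(int(part) for part in version.split("."))


def latest_version(pypi_json):
    """Extract the latest release from a PyPI JSON API response body."""
    return json.loads(pypi_json)["info"]["version"]


def needs_update(installed, pypi_json):
    return parse_version(latest_version(pypi_json)) > parse_version(installed)
```

Comparing tuples rather than strings matters here: as strings, `"1.9.3"` sorts after `"1.10.0"`, which would hide the newer release.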

## Todo Management

Markdown-backed task tracking via `TodoStore`. Todos are stored in a human-readable `.md` file (default: `.folderbot/todos.md`), editable with any text editor. Atomic writes via `os.replace` prevent corruption.

```python
@dataclass(frozen=True)
class TodoItem:
    id: int
    user_id: int
    title: str
    description: str
    status: str          # todo | in_progress | done
    effort: str          # tiny | small | medium | large | epic
    tags: list[str]
    created_at: str
    updated_at: str
    completed_at: str | None
```
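The atomic write mentioned above typically follows a write-to-temp-then-rename pattern. The sketch below shows that pattern with `os.replace`. The function name is hypothetical, and the explicit `fsync` is an extra safety step that the real `TodoStore` may or may not perform.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Replace `path` with `text` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic swap of old and new content
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```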

### Markdown Format

Uses GFM checkboxes with todo.txt conventions for tags (`+tag`) and `key:value` metadata:

```markdown
# Todos

- [ ] Buy groceries +shopping +errands
  effort: small
  Milk and eggs.
  <!-- id:1 user:42 created:2026-02-19T10:00:00 updated:2026-02-19T10:00:00 -->

- [x] Write report +work
  effort: large
  Quarterly report.
  <!-- id:2 user:42 created:2026-02-18T08:00:00 updated:2026-02-19T12:00:00 completed:2026-02-19T12:00:00 -->
```

- `- [S] Title +tags` — status chars: ` ` = todo, `~` = in_progress, `x` = done
- Tags use `+tagname` convention (todo.txt style), inline on the task line
- `effort: level` indented below (omitted when "medium", the default)
- Description as indented free text after effort line
- System metadata (id, user, timestamps) in an HTML comment — hidden in rendered views
- IDs computed from `max(ids) + 1` (no separate tracker needed)
- Items ordered by `created_at`
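To make the format concrete, here is a minimal parser for a single entry, following the rules listed above. It is a sketch rather than the `TodoStore` parser: the function names are hypothetical, the result is a plain dict, starting IDs at 1 for an empty file is an assumption, and edge cases such as a title word beginning with `+` are not handled.

```python
import re

STATUS_BY_CHAR = {" ": "todo", "~": "in_progress", "x": "done"}
TASK_LINE = re.compile(r"^- \[(?P<char>[ ~x])\] (?P<rest>.+)$")
META = re.compile(r"<!--\s*(?P<body>.*?)\s*-->")


def parse_todo(block):
    """Parse one todo entry (task line plus indented lines) into a dict."""
    lines = block.strip("\n").splitlines()
    match = TASK_LINE.match(lines[0])
    if match is None:
        raise ValueError(f"not a task line: {lines[0]!r}")
    words = match["rest"].split()
    item = {
        "status": STATUS_BY_CHAR[match["char"]],
        "title": " ".join(w for w in words if not w.startswith("+")),
        "tags": [w[1:] for w in words if w.startswith("+")],
        "effort": "medium",  # default when no effort line is present
    }
    description = []
    for line in lines[1:]:
        text = line.strip()
        meta = META.fullmatch(text)
        if meta:
            # key:value pairs; split on the first colon so timestamps survive
            for pair in meta["body"].split():
                key, _, value = pair.partition(":")
                item[key] = value
        elif text.startswith("effort:"):
            item["effort"] = text.split(":", 1)[1].strip()
        elif text:
            description.append(text)
    item["description"] = " ".join(description)
    if "id" in item:
        item["id"] = int(item["id"])
    return item


def next_id(items):
    """IDs are max(ids) + 1; an empty list starts at 1 (assumption)."""
    return max((item["id"] for item in items), default=0) + 1
```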

### Filtering

The `todo_list` tool supports filtering by status, max effort level, tag, and text search. Completed tasks are hidden by default. The effort filter enables queries like "what can I do in 30 minutes?" (`max_effort="small"` returns `tiny` + `small` tasks).
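The effort filter relies on the effort levels being ordered. A minimal sketch of the filtering logic, assuming todos are plain dicts and using hypothetical parameter names (only `max_effort` appears in the text above):

```python
EFFORT_ORDER = ["tiny", "small", "medium", "large", "epic"]


def filter_todos(items, status=None, max_effort=None, tag=None, text=None):
    """Filter todos; completed items are hidden unless status asks for them."""
    limit = EFFORT_ORDER.index(max_effort) if max_effort else len(EFFORT_ORDER) - 1
    needle = text.lower() if text else None
    selected = []
    for item in items:
        if status is None and item["status"] == "done":
            continue  # default view hides completed tasks
        if status is not None and item["status"] != status:
            continue
        if EFFORT_ORDER.index(item["effort"]) > limit:
            continue  # more effort than the caller allows
        if tag is not None and tag not in item["tags"]:
            continue
        if needle and needle not in item["title"].lower():
            continue
        selected.append(item)
    return selected
```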

## Calendar

SQLite-backed event storage via `CalendarStore`. Supports adding, listing, updating, and deleting events. The `calendar_upcoming` tool returns events within a configurable time window, useful for "what's coming up this week?" queries.

## Token Usage Tracking

Every LLM call records input and output token counts in a `token_usage` SQLite table. The `get_token_usage` tool lets users query their consumption by period (today, week, month).

- `LLMClient.chat()` uses `create_with_completion()` to get raw completion metadata
- Token counts are accumulated across all agent loop iterations
- `TelegramBot._process_message()` calls `session_manager.record_token_usage()` after each chat
- Records are scoped per user, model, and topic
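A `token_usage` table along these lines could back the recording and querying described above. The column names, the helper signatures, and the use of a trailing time window (`-1 day`) instead of calendar periods are all assumptions for illustration.

```python
import sqlite3


def create_table(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS token_usage (
               user_id INTEGER NOT NULL,
               model TEXT NOT NULL,
               topic TEXT NOT NULL,
               input_tokens INTEGER NOT NULL,
               output_tokens INTEGER NOT NULL,
               created_at TEXT NOT NULL DEFAULT (datetime('now'))
           )"""
    )


def record_token_usage(conn, user_id, model, topic, input_tokens, output_tokens):
    conn.execute(
        "INSERT INTO token_usage (user_id, model, topic, input_tokens, output_tokens)"
        " VALUES (?, ?, ?, ?, ?)",
        (user_id, model, topic, input_tokens, output_tokens),
    )
    conn.commit()


def usage_since(conn, user_id, since="-1 day"):
    """Sum a user's tokens over a trailing window given as an SQLite modifier."""
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(output_tokens), 0)"
        " FROM token_usage"
        " WHERE user_id = ? AND created_at >= datetime('now', ?)",
        (user_id, since),
    ).fetchone()
    return {"input": row[0], "output": row[1]}
```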

## Multi-Bot Service Support

The CLI supports running multiple bot instances as separate systemd services:

- `folderbot service install --bot notes` creates `folderbot-notes.service` with `ExecStart=folderbot run --bot notes`
- All service commands (`enable`, `start`, `stop`, etc.) accept `--bot NAME`
- The update timer remains shared across all bot instances
- Config uses the existing `bots` TOML section for per-bot overrides

## Hallucination Guard

A heuristic check (`_claims_tool_use()`) detects when the LLM claims to have performed an action (e.g., "I've updated your file") without actually calling any tools. When triggered, a system warning is injected and the LLM retries.

```mermaid
flowchart TD
    A[LLM returns answer] --> B{_claims_tool_use?}
    B -->|no| C[Accept answer]
    B -->|yes| D{Any tools actually called?}
    D -->|yes| C
    D -->|no| E[Inject warning into context]
    E --> F[Retry LLM call]

    style C fill:#4a9,stroke:#333,color:#fff
    style E fill:#c44,stroke:#333,color:#fff
```
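A heuristic of this kind might be a simple pattern match on first-person action claims. The regular expression and verb list below are illustrative guesses, not the contents of `_claims_tool_use()`, and `needs_retry` is a hypothetical helper that mirrors the decision in the flowchart.

```python
import re

# Assumed verb list; the real heuristic may look for different phrasing.
ACTION_CLAIM = re.compile(
    r"\bI(?:'ve| have)?\s+(?:just\s+)?"
    r"(?:updated|created|saved|deleted|written|wrote|scheduled|added|removed|sent)\b",
    re.IGNORECASE,
)


def claims_tool_use(answer):
    """True when the answer says an action was performed."""
    return ACTION_CLAIM.search(answer) is not None


def needs_retry(answer, tools_used):
    # Only suspicious when an action is claimed but no tool actually ran.
    return claims_tool_use(answer) and not tools_used
```

A heuristic like this will produce false positives and negatives, which is why the flowchart only retries when no tool was called at all.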

## Roadmap

### Sandboxed Python Execution (`run_python` tool)

Allow the LLM to write and execute arbitrary Python code in a Docker container for tasks that require computation (e.g. generating a Brownian motion path, numerical simulations, data transformations).

**Design:**
- New `run_python` tool that accepts Python code and optional pip requirements
- Executes in a Docker container with strict isolation: no network, no volume mounts other than the output directory, read-only root filesystem, memory/CPU limits
- A designated output directory inside the container is mapped to a temp dir on host
- After execution: captures stdout/stderr, sends any generated files (images, CSVs) back to the user via Telegram
- Timeout to prevent runaway processes

```mermaid
sequenceDiagram
    participant LLM as LLMClient
    participant FT as FolderTools
    participant D as Docker Container
    participant TG as Telegram

    LLM->>FT: run_python(code, requirements)
    FT->>D: Create container<br/>(no network, mem limit)
    Note over D: pip install requirements<br/>Execute code<br/>Write files to /output
    D-->>FT: stdout + /output files
    FT->>TG: send_document(files)
    FT-->>LLM: ToolResult(stdout)
```

### LLM-Powered Todo Extraction with Section-Level Cache

Extract todos from **any** markdown file in the folder tree using LLM-based parsing, not just the structured `todos.md`. A SQLite cache layer avoids redundant LLM calls via per-section content hashing.

**Architecture:**

```
Markdown files (source of truth)
    │
    ├── split by headings (deterministic, cheap)
    │
    ├── per-section hash → compare with cache
    │       │
    │       ├── hash match → use cached extraction (free)
    │       │
    │       └── hash mismatch → diff old vs new content
    │               │
    │               ├── trivial change (status flip) → update programmatically
    │               │
    │               └── ambiguous change → targeted LLM call with old+new content
    │
    └── SQLite cache table:
            (file_path, section_index, content_hash, raw_content, extracted_json)
```

**Key design decisions:**
- **Section-level granularity**: hash and cache each heading-delimited section independently, so editing one todo in a 200-item file only re-processes that section
- **Store raw content**: enables diffing old vs new to detect the nature of changes (status transitions, title edits, etc.) without LLM
- **LLM as fallback**: simple/structured changes handled programmatically; LLM only called for ambiguous edits in unstructured files
- **File discovery**: scan folder tree using existing ReadRules include/exclude patterns
- **Write routing**: bot-created todos go to the central `todos.md` by default; when the context points to a specific project, the LLM identifies the right project file
- **Rich extraction schema**: title, description, status, effort, deadline, priority, progress, time estimates, dependencies
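The deterministic half of this design (split by headings, hash each section, compare with the cache) can be sketched as below. The function names are hypothetical, the cache is reduced to a dict of `section_index -> content_hash`, SHA-256 is an assumed hash choice, and the heading regex does not account for `#` lines inside fenced code blocks.

```python
import hashlib
import re

HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_sections(markdown):
    """Split a document into heading-delimited sections (preamble included)."""
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # text before the first heading is its own section
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b] for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s.strip()]


def section_hash(section):
    return hashlib.sha256(section.encode("utf-8")).hexdigest()


def changed_sections(markdown, cache):
    """Return indexes of sections whose hash differs from the cached one."""
    return [
        index
        for index, section in enumerate(split_sections(markdown))
        if cache.get(index) != section_hash(section)
    ]
```

Only the indexes returned by `changed_sections` would need the diff step or, for ambiguous edits, a targeted LLM call.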

### Lightweight Voice Transcription

Replace `faster-whisper` (CTranslate2 + PyTorch, ~2GB) with `pywhispercpp` (whisper.cpp/GGML, ~4MB). The goal is the same transcription quality with a drastically smaller install, plus native Apple Silicon support via CoreML.

### Homebrew Formula

Provide a Homebrew tap for macOS users: `brew install folderbot`. Would handle Python/venv setup and launchd service configuration (macOS equivalent of the current systemd integration).