# Folderbot Architecture

## Overview

Folderbot is a Telegram bot that gives users an LLM-powered assistant with access to their personal folder. The architecture follows a layered design:

```mermaid
flowchart LR
    TG[Telegram] <-->|messages| TB[TelegramBot]
    TB <-->|chat loop| LLC[LLMClient]
    LLC <-->|tool dispatch| FT[FolderTools]
    LLC -.-|structured extraction| I[instructor]
```

## Request Lifecycle

The full lifecycle of a user message, from Telegram to response:

```mermaid
sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TH as TelegramHandler
    participant SN as StatusNotifier
    participant SM as SessionManager
    participant LC as LLMClient
    participant I as instructor
    participant FT as FolderTools

    U->>TG: Send message
    TG->>TH: handle_message()
    Note over TH: Accumulate pending messages<br/>Cancel in-flight task if any
    TH->>TH: _start_processing()
    TH->>SN: start() → typing indicator
    TH->>SM: get_history(user_id)
    SM-->>TH: conversation history
    TH->>LC: chat(message, context, history)

    loop Agent Loop (max 10 iterations)
        LC->>I: create_with_completion()<br/>response_model=AgentResponse
        I-->>LC: AgentResponse
        alt Has answer, no tool calls
            LC->>LC: Hallucination guard check
            LC-->>TH: (answer, tools_used, topic, usage)
        else Has tool calls
            loop For each tool call
                LC->>SN: update(tool_name)
                SN->>TG: Edit status message
                alt ask_user tool
                    LC->>TH: on_ask_user callback
                    TH->>TG: Show interactive UI
                    U->>TG: Tap button / send text
                    TG->>TH: Resolve Future
                    TH-->>LC: User answer
                else Regular tool
                    LC->>FT: execute_async(name, args)
                    FT-->>LC: ToolResult
                end
            end
            Note over LC: Append results to<br/>gathered_context, loop
        end
    end

    TH->>SM: save_message (user + assistant)
    TH->>SM: record_token_usage
    TH->>SN: stop() → delete status
    TH->>TG: Reply with response
    TG->>U: Display answer
```

### Message Accumulation and Cancellation

When a user sends multiple messages quickly, Folderbot accumulates them instead of processing each one independently:

```mermaid
flowchart TD
    M1[Message 1 arrives] --> P1[Add to pending_messages]
    P1 --> T1[Start processing task]
    M2[Message 2 arrives<br/>while processing] --> C[Cancel current task]
    C --> R[Restore in-progress<br/>messages to pending]
    M2 --> P2[Add to pending_messages]
    R --> P2
    P2 --> T2[Start new task with<br/>all accumulated messages]
    T2 --> J[Messages joined with newline<br/>sent as single LLM request]

    style C fill:#c44,stroke:#333,color:#fff
    style J fill:#4a9,stroke:#333,color:#fff
```

## Core Components

### Telegram Handler (`telegram_handler.py`)

The `TelegramBot` class manages:

- **Message handling**: Accumulates user messages, cancels in-flight requests on new input
- **Command handlers**: `/start`, `/clear`, `/new`, `/status`, `/files`, `/tasks`
- **Document uploads**: Stores files and makes them available as tools
- **Photo handling**: Downloads photos, saves to uploads, encodes as base64, and passes to LLM as multimodal image blocks for vision analysis
- **ask_user UI**: Renders interactive Telegram widgets (inline keyboards, location pickers) when the LLM needs user input
- **Scheduler integration**: Sends messages from background tasks
- **File watcher**: Notifies users of file changes

### LLM Client (`llm_client.py`)

The `LLMClient` is **backend-agnostic** using the `instructor` package:

- Supports any LLM provider via `instructor.from_provider("provider/model")`
- Uses **structured extraction** (`AgentResponse` model) instead of native tool_use
- Tools are described in the system prompt text (instructor occupies the tools parameter)
- **Multimodal support**: Photo messages are encoded as base64 image blocks in the user message. Image format is provider-aware (`_format_image_block`): Anthropic uses native `image` blocks with `base64` source, OpenAI uses `image_url` with data URIs. The provider is detected from the model string prefix (e.g. `anthropic/...`).

### Agent Loop

The core loop in `LLMClient.chat()`:

```mermaid
flowchart TD
    A[Build messages:<br/>history + user message<br/>+ gathered tool results] --> B[Call LLM via instructor.create<br/>response_model=AgentResponse]
    B --> C{Answer provided?}
    C -->|yes| D[Return answer]
    C -->|no, tool calls| E{ask_user?}
    E -->|yes| F[Pause loop,<br/>wait for user via callback]
    F --> G[Append result to<br/>gathered_context]
    E -->|no| H[Dispatch to FolderTools]
    H --> G
    G --> A

    style D fill:#4a9,stroke:#333,color:#fff
```

### Structured Response Models

```python
from pydantic import BaseModel


class ToolCallRequest(BaseModel, frozen=True):
    name: str        # Tool name
    arguments: dict  # Tool arguments


class AgentResponse(BaseModel, frozen=True):
    tool_calls: list[ToolCallRequest]  # Tools to execute
    answer: str | None                 # Final answer (when done)
    topic: str                         # Conversation topic label


class AskUserRequest(BaseModel, frozen=True):
    question: str       # Question to display
    options: list[str]  # Button labels
    input_type: str     # choice | confirm | text | location
```

## Tool System

### Registration

Tools are registered via the `@folder_bot.tool()` decorator with typed Pydantic request/response models:

```python
@folder_bot.tool(name="read_file", request_type=ReadFileRequest, response_type=ReadFileResponse)
async def read_file(request, context):
    ...
```

### Tool Categories

| Category | Tools |
|----------|-------|
| File operations | `list_files`, `read_file`, `read_files`, `search_files`, `write_file` |
| Web | `web_search` (Google Custom Search API), `web_fetch`, `get_weather` |
| Scheduler | `schedule_task`, `list_tasks`, `cancel_task`, `get_task_results` |
| Uploads | `list_uploads`, `delete_upload`, `send_upload` |
| Visualization | `plot_chart` (matplotlib, sends PNG to user) |
| Calendar | `calendar_add`, `calendar_list`, `calendar_upcoming`, `calendar_update`, `calendar_delete` |
| Todo | `todo_add`, `todo_list`, `todo_update`, `todo_remove` |
| Topics | `list_topics`, `get_full_history`, `reorganize_topics` |
| Stats | `get_token_usage`, `token_stats`, `read_activity_log` |
| Notifications | `enable_file_notifications`, `disable_file_notifications`, `get_file_notification_status` |
| Utilities | `send_message`, `get_time`, `compare_numbers`, `shuffle_list`, `sort_list`, `random_choice`, `random_number` |
| Interactive | `ask_user` (handled in agent loop, not FolderTools) |

### Tool Configuration

Tools can have their
own configuration via `[tools.]` sections in `config.toml`:

```toml
[tools.web_search]
google_api_key = "..."
google_cx = "..."
```

Tools access their config via `get_tool_config(context, "tool_name")`, which returns the tool's config dict. Custom tools receive the full `tools_config` dict in their constructor.

### Services Pattern

Tools receive dependencies through `BotContext.services`:

- `FolderServices`: Root path, config, path validation, `get_tool_config()`
- `SchedulerServices`: Task creation and management
- `UploadServices`: File upload storage and retrieval

```mermaid
classDiagram
    class BotContext {
        +services
        +user_id
    }
    class FolderServices {
        +root_path
        +config
        +validate_path()
    }
    class SchedulerServices {
        +create_task()
        +cancel_task()
        +list_tasks()
    }
    class UploadServices {
        +uploads_dir: Path
        +send_document(chat_id, path, filename)
        +chat_id: int
        +session_manager
    }
    BotContext --> FolderServices
    BotContext --> SchedulerServices
    BotContext --> UploadServices
```

## ask_user: Interactive User Input

The `ask_user` tool enables the LLM to pause its agent loop and request interactive input from the user via native Telegram UI.

### Flow

```mermaid
sequenceDiagram
    participant LLM as LLMClient
    participant CB as on_ask_user callback
    participant TB as TelegramBot
    participant TG as Telegram
    participant U as User

    LLM->>LLM: AgentResponse with<br/>tool_call name="ask_user"
    LLM->>CB: on_ask_user(AskUserRequest)
    CB->>TB: _handle_ask_user()
    TB->>TB: Create asyncio.Future
    TB->>TG: Send UI (keyboard / text prompt)
    TG->>U: Display interactive widget
    U->>TG: Tap button / send text / share location
    TG->>TB: CallbackQuery / Message
    TB->>TB: Resolve Future with answer
    TB-->>CB: Return answer string
    CB-->>LLM: Answer added to gathered_context
    LLM->>LLM: Continue agent loop
```

### Input Types

| Type | Telegram UI | Resolution |
|------|------------|------------|
| `choice` | Inline keyboard (one button per option) | CallbackQueryHandler |
| `confirm` | Inline keyboard (Yes/No row) | CallbackQueryHandler |
| `text` | Plain text question | Next text message intercepted |
| `location` | Reply keyboard with location button | Location MessageHandler |

### Key Design Decisions

- **`ask_user` is NOT a registered FolderBot tool**: it is handled specially in the agent loop because it requires async user interaction
- **`asyncio.Future`** for pause/resume: the agent loop awaits a Future that Telegram handlers resolve
- **Index-based callback data** (`ask:user_id:index`) keeps the payload within Telegram's 64-byte `callback_data` limit, regardless of how long the option labels are
- **120-second timeout** prevents the agent loop from hanging indefinitely
- **Backend-agnostic**: the LLM client knows nothing about Telegram; the callback is injected by the handler

## Session Management

- **SQLite-backed** via `SessionManager`
- Stores conversation history per user (role, content, timestamp, topic)
- Tracks version notifications, file notification preferences, uploads
- Records token usage per LLM call (input/output tokens, model, topic)

```mermaid
erDiagram
    USER ||--o{ CONVERSATION_HISTORY : has
    USER ||--o{ UPLOAD : stores
    USER {
        int user_id
        bool file_notifications
        string last_version_notified
    }
    CONVERSATION_HISTORY {
        int user_id
        string role
        string content
        datetime timestamp
        string topic
    }
    UPLOAD {
        int user_id
        string filename
        blob data
    }
```

## Topic-Based Conversation Management

Each message is tagged with a
**topic** label (e.g. "weather", "recipes", "project planning") assigned by the LLM via the `AgentResponse.topic` field. Topics enable multi-threaded conversations:

- **Topic-aware history**: `build_topic_history()` always includes the last 4 messages for immediate context, then backfills the remaining character budget with same-topic messages from older history
- **`list_topics` tool**: Lets the user ask "what conversations am I having?" and returns topic names, message counts, and last activity
- **Backward compatible**: Old messages without a topic field default to `"general"`

```mermaid
flowchart LR
    H[Full History] --> R[Last 4 messages<br/>recency window]
    H --> B[Older same-topic<br/>messages backfill]
    R --> M[Merged history<br/>sent to LLM]
    B --> M
```

## Voice Transcription

Voice messages and audio files are transcribed **locally** at the Telegram handler layer using **faster-whisper** (CTranslate2). The LLM receives plain text; it does not need to know the input was audio.

- No API key required: runs entirely on-device via CTranslate2 (up to 4x faster than openai-whisper)
- Pre-built wheels with GPU support (CUDA): `pip install` just works, no build flags needed
- Model configurable via `whisper_model` config key (default: `"base"`)
- Handles both `filters.VOICE` (voice messages) and `filters.AUDIO` (audio files)
- Transcription runs in a thread (`asyncio.to_thread`) to avoid blocking the event loop
- Models are auto-downloaded from Hugging Face Hub and cached after first load

```mermaid
sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TB as TelegramBot
    participant W as faster-whisper (local)
    participant LP as Message Pipeline

    U->>TG: Send voice message / audio file
    TG->>TB: handle_voice()
    TB->>TG: Download audio bytes
    TB->>W: transcribe_audio(bytes, model_name)
    Note over W: Writes to temp file,<br/>runs model.transcribe(),<br/>joins segments
    W-->>TB: TranscriptionResult(text)
    TB->>LP: Add text to pending_messages
    TB->>LP: _start_processing()
    LP->>LP: Normal LLM chat flow
```

## Self-Update Mechanism

The bot can automatically check PyPI for newer versions and upgrade itself:

- **`folderbot update`** CLI command: checks PyPI JSON API, runs `pip install --upgrade`, restarts the systemd service if running
- **Systemd timer**: `folderbot-update.timer` runs `folderbot update` every 5 minutes
- Installed/managed alongside the main service via `folderbot service install/enable/start`

```mermaid
flowchart LR
    T[systemd timer<br/>every 5min] --> U[folderbot update]
    U --> P{PyPI newer?}
    P -->|no| D[Done]
    P -->|yes| I[pip install --upgrade]
    I --> R[systemctl restart folderbot]
```

## Todo Management

Markdown-backed task tracking via `TodoStore`. Todos are stored in a human-readable `.md` file (default: `.folderbot/todos.md`), editable with any text editor. Atomic writes via `os.replace` prevent corruption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TodoItem:
    id: int
    user_id: int
    title: str
    description: str
    status: str  # todo | in_progress | done
    effort: str  # tiny | small | medium | large | epic
    tags: list[str]
    created_at: str
    updated_at: str
    completed_at: str | None
```

### Markdown Format

Uses GFM checkboxes with todo.txt conventions for tags (`+tag`) and `key:value` metadata:

```markdown
# Todos

- [ ] Buy groceries +shopping +errands
  effort: small
  Milk and eggs.

- [x] Write report +work
  effort: large
  Quarterly report.
```

- `- [S] Title +tags`, where `S` is the status char: ` ` = todo, `~` = in_progress, `x` = done
- Tags use `+tagname` convention (todo.txt style), inline on the task line
- `effort: level` indented below (omitted when "medium", the default)
- Description as indented free text after effort line
- System metadata (id, user, timestamps) in an HTML comment, hidden in rendered views
- IDs computed from `max(ids) + 1` (no separate tracker needed)
- Items ordered by `created_at`

### Filtering

The `todo_list` tool supports filtering by status, max effort level, tag, and text search. Completed tasks are hidden by default. The effort filter enables queries like "what can I do in 30 minutes?" (`max_effort="small"` returns `tiny` + `small` tasks).

## Calendar

SQLite-backed event storage via `CalendarStore`. Supports adding, listing, updating, and deleting events. The `calendar_upcoming` tool returns events within a configurable time window, useful for "what's coming up this week?" queries.

## Token Usage Tracking

Every LLM call records input and output token counts in a `token_usage` SQLite table.
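
As a rough illustration of per-call recording, the sketch below uses Python's built-in `sqlite3`. The column names and the `record_usage` / `usage_since` helpers are assumptions made for illustration; they are not Folderbot's actual schema or API.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: the real token_usage columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS token_usage (
    user_id       INTEGER NOT NULL,
    model         TEXT    NOT NULL,
    topic         TEXT    NOT NULL,
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    created_at    REAL    NOT NULL  -- Unix timestamp (seconds)
)
"""


def record_usage(conn, user_id, model, topic, input_tokens, output_tokens, when=None):
    """Insert one row per chat call (hypothetical helper)."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO token_usage VALUES (?, ?, ?, ?, ?, ?)",
        (user_id, model, topic, input_tokens, output_tokens, when.timestamp()),
    )
    conn.commit()


def usage_since(conn, user_id, since):
    """Return (input_total, output_total) for one user's rows at or after `since`."""
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(output_tokens), 0) "
        "FROM token_usage WHERE user_id = ? AND created_at >= ?",
        (user_id, since.timestamp()),
    ).fetchone()
    return row[0], row[1]
```

A period such as "today" or "week" would then map to a `since` value, for example `datetime.now(timezone.utc) - timedelta(days=7)`.
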
The `get_token_usage` tool lets users query their consumption by period (today, week, month).

- `LLMClient.chat()` uses `create_with_completion()` to get raw completion metadata
- Token counts are accumulated across all agent loop iterations
- `TelegramBot._process_message()` calls `session_manager.record_token_usage()` after each chat
- Records are scoped per user, model, and topic

## Multi-Bot Service Support

The CLI supports running multiple bot instances as separate systemd services:

- `folderbot service install --bot notes` creates `folderbot-notes.service` with `ExecStart=folderbot run --bot notes`
- All service commands (`enable`, `start`, `stop`, etc.) accept `--bot NAME`
- The update timer remains shared across all bot instances
- Config uses the existing `bots` TOML section for per-bot overrides

## Hallucination Guard

A heuristic check (`_claims_tool_use()`) detects when the LLM claims to have performed an action (e.g., "I've updated your file") without actually calling any tools. When triggered, a system warning is injected and the LLM retries.

```mermaid
flowchart TD
    A[LLM returns answer] --> B{_claims_tool_use?}
    B -->|no| C[Accept answer]
    B -->|yes| D{Any tools actually called?}
    D -->|yes| C
    D -->|no| E[Inject warning into context]
    E --> F[Retry LLM call]

    style C fill:#4a9,stroke:#333,color:#fff
    style E fill:#c44,stroke:#333,color:#fff
```

## Roadmap

### Sandboxed Python Execution (`run_python` tool)

Allow the LLM to write and execute arbitrary Python code in a Docker container for tasks that require computation (e.g. generating a Brownian motion path, numerical simulations, data transformations).

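
A minimal sketch of how such an invocation might be assembled from Python. The `build_docker_command` and `run_sandboxed` helpers, the `python:3.12-slim` image, the limit values, and the `/output` mount point are illustrative assumptions; the flags themselves (`--network none`, `--read-only`, `--memory`, `--cpus`, `--volume`) are standard `docker run` options. Installing pip requirements is not covered here.

```python
import subprocess
import tempfile
from pathlib import Path


def build_docker_command(code: str, output_dir: Path,
                         image: str = "python:3.12-slim",  # assumed image
                         memory: str = "256m", cpus: str = "1.0") -> list[str]:
    """Build an isolated `docker run` argument list (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                 # no network access
        "--read-only",                       # read-only root filesystem
        "--memory", memory,                  # memory limit
        "--cpus", cpus,                      # CPU limit
        "--volume", f"{output_dir}:/output", # only writable path, backed by a host temp dir
        image, "python", "-c", code,
    ]


def run_sandboxed(code: str, timeout_s: float = 30.0) -> tuple[str, str, dict[str, bytes]]:
    """Run `code` and return (stdout, stderr, generated files). Requires Docker."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        proc = subprocess.run(
            build_docker_command(code, out),
            capture_output=True, text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired on overrun
        )
        # Read generated files before the temp dir is cleaned up.
        files = {p.name: p.read_bytes() for p in out.iterdir() if p.is_file()}
    return proc.stdout, proc.stderr, files
```

Passing the code with `python -c` avoids a second mount for the script. Note that the `subprocess` timeout only stops the Docker client process; a real implementation would likely also need to stop the container itself.
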
**Design:**

- New `run_python` tool that accepts Python code and optional pip requirements
- Executes in a Docker container with strict isolation: no network, no volume mounts, read-only root filesystem, memory/CPU limits
- A designated output directory inside the container is mapped to a temp dir on host
- After execution: captures stdout/stderr, sends any generated files (images, CSVs) back to the user via Telegram
- Timeout to prevent runaway processes

```mermaid
sequenceDiagram
    participant LLM as LLMClient
    participant FT as FolderTools
    participant D as Docker Container
    participant TG as Telegram

    LLM->>FT: run_python(code, requirements)
    FT->>D: Create container<br/>(no network, mem limit)
    Note over D: pip install requirements<br/>Execute code<br/>Write files to /output
    D-->>FT: stdout + /output files
    FT->>TG: send_document(files)
    FT-->>LLM: ToolResult(stdout)
```

### LLM-Powered Todo Extraction with Section-Level Cache

Extract todos from **any** markdown file in the folder tree using LLM-based parsing, not just the structured `todos.md`. A SQLite cache layer avoids redundant LLM calls via per-section content hashing.

**Architecture:**

```
Markdown files (source of truth)
    │
    ├── split by headings (deterministic, cheap)
    │
    ├── per-section hash → compare with cache
    │       │
    │       ├── hash match → use cached extraction (free)
    │       │
    │       └── hash mismatch → diff old vs new content
    │               │
    │               ├── trivial change (status flip) → update programmatically
    │               │
    │               └── ambiguous change → targeted LLM call with old+new content
    │
    └── SQLite cache table:
        (file_path, section_index, content_hash, raw_content, extracted_json)
```

**Key design decisions:**

- **Section-level granularity**: hash and cache each heading-delimited section independently, so editing one todo in a 200-item file only re-processes that section
- **Store raw content**: enables diffing old vs new to detect the nature of changes (status transitions, title edits, etc.) without LLM
- **LLM as fallback**: simple/structured changes handled programmatically; LLM only called for ambiguous edits in unstructured files
- **File discovery**: scan folder tree using existing ReadRules include/exclude patterns
- **Write routing**: bot-created todos go to central `todos.md`; LLM identifies the right project file based on context
- **Rich extraction schema**: title, description, status, effort, deadline, priority, progress, time estimates, dependencies

### Lightweight Voice Transcription

Replace `faster-whisper` (CTranslate2 + PyTorch, ~2GB) with `pywhispercpp` (whisper.cpp/GGML, ~4MB). Same transcription quality, drastically smaller install. Native Apple Silicon support via CoreML.

### Homebrew Formula

Provide a Homebrew tap for macOS users: `brew install folderbot`. It would handle Python/venv setup and launchd service configuration (the macOS equivalent of the current systemd integration).