# Folderbot Architecture
## Overview
Folderbot is a Telegram bot that gives users an LLM-powered assistant with access to their personal folder. The architecture follows a layered design:
```mermaid
flowchart LR
TG[Telegram] <-->|messages| TB[TelegramBot]
TB <-->|chat loop| LLC[LLMClient]
LLC <-->|tool dispatch| FT[FolderTools]
LLC -.-|structured extraction| I[instructor]
```
## Request Lifecycle
The full lifecycle of a user message, from Telegram to response:
```mermaid
sequenceDiagram
participant U as User
participant TG as Telegram
participant TH as TelegramHandler
participant SN as StatusNotifier
participant SM as SessionManager
participant LC as LLMClient
participant I as instructor
participant FT as FolderTools
U->>TG: Send message
TG->>TH: handle_message()
Note over TH: Accumulate pending messages<br/>Cancel in-flight task if any
TH->>TH: _start_processing()
TH->>SN: start() → typing indicator
TH->>SM: get_history(user_id)
SM-->>TH: conversation history
TH->>LC: chat(message, context, history)
loop Agent Loop (max 10 iterations)
LC->>I: create_with_completion()<br/>response_model=AgentResponse
I-->>LC: AgentResponse
alt Has answer, no tool calls
LC->>LC: Hallucination guard check
LC-->>TH: (answer, tools_used, topic, usage)
else Has tool calls
loop For each tool call
LC->>SN: update(tool_name)
SN->>TG: Edit status message
alt ask_user tool
LC->>TH: on_ask_user callback
TH->>TG: Show interactive UI
U->>TG: Tap button / send text
TG->>TH: Resolve Future
TH-->>LC: User answer
else Regular tool
LC->>FT: execute_async(name, args)
FT-->>LC: ToolResult
end
end
Note over LC: Append results to<br/>gathered_context, loop
end
end
TH->>SM: save_message (user + assistant)
TH->>SM: record_token_usage
TH->>SN: stop() → delete status
TH->>TG: Reply with response
TG->>U: Display answer
```
### Message Accumulation and Cancellation
When a user sends multiple messages quickly, Folderbot accumulates them instead of processing each one independently:
```mermaid
flowchart TD
M1[Message 1 arrives] --> P1[Add to pending_messages]
P1 --> T1[Start processing task]
M2[Message 2 arrives<br/>while processing] --> C[Cancel current task]
C --> R[Restore in-progress<br/>messages to pending]
M2 --> P2[Add to pending_messages]
R --> P2
P2 --> T2[Start new task with<br/>all accumulated messages]
T2 --> J[Messages joined with newline<br/>sent as single LLM request]
style C fill:#c44,stroke:#333,color:#fff
style J fill:#4a9,stroke:#333,color:#fff
```
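
The accumulate-and-cancel behavior in the diagram can be sketched with plain `asyncio`. This is a minimal illustration rather than Folderbot's actual handler: the `MessageAccumulator` class and its `process` callback are hypothetical names, and only the pending buffer, the task cancellation, and the newline join come from the description above.

```python
import asyncio


class MessageAccumulator:
    """Hypothetical sketch of accumulate-and-cancel (not Folderbot's real class)."""

    def __init__(self, process):
        self._process = process        # async callable that receives the joined text
        self.pending = []              # messages waiting to be processed
        self._in_progress = []         # messages handed to the current task
        self._task = None

    async def handle_message(self, text):
        if self._task is not None and not self._task.done():
            # A request is still running: cancel it and wait for it to unwind.
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            # Restore the interrupted messages so nothing the user typed is lost.
            self.pending = self._in_progress + self.pending
            self._in_progress = []
        self.pending.append(text)
        self._start_processing()

    def _start_processing(self):
        # Move everything accumulated so far into a single request.
        self._in_progress, self.pending = self.pending, []
        joined = "\n".join(self._in_progress)
        self._task = asyncio.create_task(self._process(joined))
```

If a second message arrives while the first is still being processed, the callback ends up being completed once, with both messages joined by a newline.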
## Core Components
### Telegram Handler (`telegram_handler.py`)
The `TelegramBot` class manages:
- **Message handling**: Accumulates user messages, cancels in-flight requests on new input
- **Command handlers**: `/start`, `/clear`, `/new`, `/status`, `/files`, `/tasks`
- **Document uploads**: Stores uploaded files and makes them available to the upload tools
- **Photo handling**: Downloads photos, saves to uploads, encodes as base64, and passes to LLM as multimodal image blocks for vision analysis
- **ask_user UI**: Renders interactive Telegram widgets (inline keyboards, location pickers) when the LLM needs user input
- **Scheduler integration**: Sends messages from background tasks
- **File watcher**: Notifies users of file changes
### LLM Client (`llm_client.py`)
The `LLMClient` is **backend-agnostic**, built on the `instructor` package:
- Supports any LLM provider via `instructor.from_provider("provider/model")`
- Uses **structured extraction** (`AgentResponse` model) instead of native tool_use
- Tools are described in the system prompt text, because `instructor` already occupies the provider's `tools` parameter for structured extraction
- **Multimodal support**: Photo messages are encoded as base64 image blocks in the user message. Image format is provider-aware (`_format_image_block`): Anthropic uses native `image` blocks with `base64` source, OpenAI uses `image_url` with data URIs. The provider is detected from the model string prefix (e.g. `anthropic/...`).
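
The provider-aware image formatting described above could look roughly like the sketch below. `format_image_block` is a hypothetical stand-in for `_format_image_block`; the real signature is not shown in this document. Treating every non-Anthropic provider as OpenAI-style is also an assumption.

```python
import base64


def format_image_block(model: str, image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Build a multimodal image block for the provider named in the model prefix."""
    data = base64.b64encode(image_bytes).decode("ascii")
    provider = model.split("/", 1)[0]  # e.g. "anthropic" from "anthropic/..."
    if provider == "anthropic":
        # Anthropic: native image block with a base64 source.
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        }
    # Assumption: all other providers accept the OpenAI-style data URI.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{data}"},
    }
```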
### Agent Loop
The core loop in `LLMClient.chat()`:
```mermaid
flowchart TD
A[Build messages:<br/>history + user message<br/>+ gathered tool results] --> B[Call LLM via instructor.create<br/>response_model=AgentResponse]
B --> C{Answer provided?}
C -->|yes| D[Return answer]
C -->|no, tool calls| E{ask_user?}
E -->|yes| F[Pause loop,<br/>wait for user via callback]
F --> G[Append result to<br/>gathered_context]
E -->|no| H[Dispatch to FolderTools]
H --> G
G --> A
style D fill:#4a9,stroke:#333,color:#fff
```
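
A stripped-down version of this loop, with the LLM call and tool dispatch injected as callables, might look like the following. It is a sketch under assumptions: `run_agent_loop`, `ToolCall`, and `Response` are illustrative names (the real models are listed in the next section), and the fallback message at the iteration limit is invented for the example. The limit of 10 iterations comes from the request lifecycle diagram.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ITERATIONS = 10  # matches "max 10 iterations" in the lifecycle diagram


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


@dataclass(frozen=True)
class Response:
    tool_calls: list
    answer: Optional[str] = None


async def run_agent_loop(call_llm, dispatch_tool, on_ask_user, user_message):
    """Call the LLM until it answers, feeding tool results back each round."""
    gathered_context = []
    for _ in range(MAX_ITERATIONS):
        response = await call_llm(user_message, gathered_context)
        if response.answer is not None and not response.tool_calls:
            return response.answer
        for call in response.tool_calls:
            if call.name == "ask_user":
                # Pause the loop until the user replies via the injected callback.
                result = await on_ask_user(call.arguments)
            else:
                result = await dispatch_tool(call.name, call.arguments)
            gathered_context.append(f"{call.name}: {result}")
    # Hypothetical fallback; the real behavior at the limit is not documented here.
    return "Stopped after reaching the iteration limit."
```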
### Structured Response Models
```python
from pydantic import BaseModel


class ToolCallRequest(BaseModel, frozen=True):
    name: str        # Tool name
    arguments: dict  # Tool arguments


class AgentResponse(BaseModel, frozen=True):
    tool_calls: list[ToolCallRequest]  # Tools to execute
    answer: str | None                 # Final answer (when done)
    topic: str                         # Conversation topic label


class AskUserRequest(BaseModel, frozen=True):
    question: str       # Question to display
    options: list[str]  # Button labels
    input_type: str     # choice | confirm | text | location
```
## Tool System
### Registration
Tools are registered via the `@folder_bot.tool()` decorator with typed Pydantic request/response models:
```python
@folder_bot.tool(name="read_file", request_type=ReadFileRequest, response_type=ReadFileResponse)
async def read_file(request, context):
    ...
```
### Tool Categories
| Category | Tools |
|----------|-------|
| File operations | `list_files`, `read_file`, `read_files`, `search_files`, `write_file` |
| Web | `web_search` (Google Custom Search API), `web_fetch`, `get_weather` |
| Scheduler | `schedule_task`, `list_tasks`, `cancel_task`, `get_task_results` |
| Uploads | `list_uploads`, `delete_upload`, `send_upload` |
| Visualization | `plot_chart` (matplotlib, sends PNG to user) |
| Calendar | `calendar_add`, `calendar_list`, `calendar_upcoming`, `calendar_update`, `calendar_delete` |
| Todo | `todo_add`, `todo_list`, `todo_update`, `todo_remove` |
| Topics | `list_topics`, `get_full_history`, `reorganize_topics` |
| Stats | `get_token_usage`, `token_stats`, `read_activity_log` |
| Notifications | `enable_file_notifications`, `disable_file_notifications`, `get_file_notification_status` |
| Utilities | `send_message`, `get_time`, `compare_numbers`, `shuffle_list`, `sort_list`, `random_choice`, `random_number` |
| Interactive | `ask_user` (handled in agent loop, not FolderTools) |
### Tool Configuration
Tools can have their own configuration via `[tools.TOOL_NAME]` sections in `config.toml`:
```toml
[tools.web_search]
google_api_key = "..."
google_cx = "..."
```
Tools read their config via `get_tool_config(context, "tool_name")`, which returns that tool's config dict. Custom tools receive the full `tools_config` dict in their constructor.
### Services Pattern
Tools receive dependencies through `BotContext.services`:
- `FolderServices`: Root path, config, path validation, `get_tool_config()`
- `SchedulerServices`: Task creation and management
- `UploadServices`: File upload storage and retrieval
```mermaid
classDiagram
class BotContext {
+services
+user_id
}
class FolderServices {
+root_path
+config
+validate_path()
}
class SchedulerServices {
+create_task()
+cancel_task()
+list_tasks()
}
class UploadServices {
+uploads_dir: Path
+send_document(chat_id, path, filename)
+chat_id: int
+session_manager
}
BotContext --> FolderServices
BotContext --> SchedulerServices
BotContext --> UploadServices
```
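
What `validate_path()` checks is not spelled out here. A plausible reading is that it confines tool access to the bot's root folder. The sketch below shows that idea as a standalone function; the signature and the error type are assumptions, not the `FolderServices` API.

```python
from pathlib import Path


def validate_path(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting anything that escapes it."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    # resolve() collapses ".." segments and symlinks before the containment check.
    if not candidate.is_relative_to(root):
        raise ValueError(f"{relative!r} is outside the bot's folder")
    return candidate
```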
## ask_user: Interactive User Input
The `ask_user` tool enables the LLM to pause its agent loop and request interactive input from the user via native Telegram UI.
### Flow
```mermaid
sequenceDiagram
participant LLM as LLMClient
participant CB as on_ask_user callback
participant TB as TelegramBot
participant TG as Telegram
participant U as User
LLM->>LLM: AgentResponse with<br/>tool_call name="ask_user"
LLM->>CB: on_ask_user(AskUserRequest)
CB->>TB: _handle_ask_user()
TB->>TB: Create asyncio.Future
TB->>TG: Send UI (keyboard / text prompt)
TG->>U: Display interactive widget
U->>TG: Tap button / send text / share location
TG->>TB: CallbackQuery / Message
TB->>TB: Resolve Future with answer
TB-->>CB: Return answer string
CB-->>LLM: Answer added to gathered_context
LLM->>LLM: Continue agent loop
```
### Input Types
| Type | Telegram UI | Resolution |
|------|------------|------------|
| `choice` | Inline keyboard (one button per option) | CallbackQueryHandler |
| `confirm` | Inline keyboard (Yes/No row) | CallbackQueryHandler |
| `text` | Plain text question | Next text message intercepted |
| `location` | Reply keyboard with location button | Location MessageHandler |
### Key Design Decisions
- **`ask_user` is NOT a registered FolderBot tool** — it's handled specially in the agent loop because it requires async user interaction
- **`asyncio.Future`** for pause/resume — the agent loop awaits a Future that Telegram handlers resolve
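- **Index-based callback data** (`ask:user_id:index`) keeps payloads within Telegram's 64-byte `callback_data` limit, because each button carries an option index rather than the option label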
- **120-second timeout** prevents the agent loop from hanging indefinitely
- **Backend-agnostic** — the LLM client knows nothing about Telegram; the callback is injected by the handler
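
The Future-based pause/resume, the index-based callback data, and the timeout can be combined in a small broker object. This is a sketch under assumptions: `AskUserBroker` and its method names are hypothetical, and returning `None` on timeout is a choice made for the example rather than documented behavior.

```python
import asyncio

ASK_TIMEOUT_SECONDS = 120  # from the design decisions above


class AskUserBroker:
    """Hypothetical helper pairing pending questions with button presses."""

    def __init__(self, timeout=ASK_TIMEOUT_SECONDS):
        self._timeout = timeout
        self._pending = {}  # user_id -> (future, options)

    @staticmethod
    def callback_data(user_id, index):
        # Only the index travels through Telegram, so long labels never hit the limit.
        return f"ask:{user_id}:{index}"

    async def ask(self, user_id, options):
        """Called from the agent loop; suspends until resolve() or the timeout."""
        future = asyncio.get_running_loop().create_future()
        self._pending[user_id] = (future, options)
        try:
            return await asyncio.wait_for(future, self._timeout)
        except asyncio.TimeoutError:
            return None  # assumption: the caller treats None as "no answer"
        finally:
            self._pending.pop(user_id, None)

    def resolve(self, data):
        """Called from the Telegram callback handler with the button's payload."""
        _, user_id, index = data.split(":")
        entry = self._pending.get(int(user_id))
        if entry is None or entry[0].done():
            return False
        future, options = entry
        future.set_result(options[int(index)])
        return True
```

Because the broker only deals in user IDs, indices, and strings, the agent loop that awaits `ask()` never has to import anything Telegram-specific.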
## Session Management
- **SQLite-backed** via `SessionManager`
- Stores conversation history per user (role, content, timestamp, topic)
- Tracks version notifications, file notification preferences, uploads
- Records token usage per LLM call (input/output tokens, model, topic)
```mermaid
erDiagram
USER ||--o{ CONVERSATION_HISTORY : has
USER ||--o{ UPLOAD : stores
USER {
int user_id
bool file_notifications
string last_version_notified
}
CONVERSATION_HISTORY {
int user_id
string role
string content
datetime timestamp
string topic
}
UPLOAD {
int user_id
string filename
blob data
}
```
## Topic-Based Conversation Management
Each message is tagged with a **topic** label (e.g. "weather", "recipes", "project planning") assigned by the LLM via the `AgentResponse.topic` field. Topics enable multi-threaded conversations:
- **Topic-aware history**: `build_topic_history()` always includes the last 4 messages for immediate context, then backfills the remaining character budget with same-topic messages from older history
- **`list_topics` tool**: Lets the user ask "what conversations am I having?" — returns topic names, message counts, and last activity
- **Backward compatible**: Old messages without a topic field default to `"general"`
```mermaid
flowchart LR
H[Full History] --> R[Last 4 messages<br/>recency window]
H --> B[Older same-topic<br/>messages backfill]
R --> M[Merged history<br/>sent to LLM]
B --> M
```
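
One way to read the recency-window-plus-backfill rule is sketched below. The `Message` dataclass, the function signature, and the choice to backfill newest-first and stop at the first message that does not fit are all assumptions. Only the 4-message window, the character budget, and the `"general"` default come from the text above.

```python
from dataclasses import dataclass

RECENT_WINDOW = 4  # "always includes the last 4 messages"


@dataclass(frozen=True)
class Message:
    role: str
    content: str
    topic: str = "general"  # old messages without a topic default to "general"


def build_topic_history(history, topic, char_budget):
    """Return the recent window plus older same-topic messages that fit the budget."""
    recent = history[-RECENT_WINDOW:]
    older = history[:-RECENT_WINDOW]
    remaining = char_budget - sum(len(m.content) for m in recent)
    backfill = []
    # Walk older messages newest-first so the most relevant context wins.
    for message in reversed(older):
        if message.topic != topic:
            continue
        if len(message.content) > remaining:
            break
        backfill.append(message)
        remaining -= len(message.content)
    # Restore chronological order before handing the history to the LLM.
    return list(reversed(backfill)) + recent
```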
## Voice Transcription
Voice messages and audio files are transcribed **locally** at the Telegram handler layer using **faster-whisper** (CTranslate2). The LLM receives plain text — it doesn't need to know the input was audio.
- No API key required — runs entirely on-device via CTranslate2 (up to 4x faster than openai-whisper)
- Pre-built wheels with GPU support (CUDA) — `pip install` just works, no build flags needed
- Model configurable via `whisper_model` config key (default: `"base"`)
- Handles both `filters.VOICE` (voice messages) and `filters.AUDIO` (audio files)
- Transcription runs in a thread (`asyncio.to_thread`) to avoid blocking the event loop
- Models are auto-downloaded from Hugging Face Hub and cached after first load
```mermaid
sequenceDiagram
participant U as User
participant TG as Telegram
participant TB as TelegramBot
participant W as faster-whisper (local)
participant LP as Message Pipeline
U->>TG: Send voice message / audio file
TG->>TB: handle_voice()
TB->>TG: Download audio bytes
TB->>W: transcribe_audio(bytes, model_name)
Note over W: Writes to temp file,<br/>runs model.transcribe(),<br/>joins segments
W-->>TB: TranscriptionResult(text)
TB->>LP: Add text to pending_messages
TB->>LP: _start_processing()
LP->>LP: Normal LLM chat flow
```
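
The temp-file, transcribe, and join steps in the diagram can be sketched as follows. The function names and the `.ogg` suffix are assumptions, and the `model` argument is duck-typed: anything with a `transcribe(path)` method returning `(segments, info)` works, which is the shape of faster-whisper's `WhisperModel.transcribe`.

```python
import asyncio
import os
import tempfile


def transcribe_audio(audio_bytes: bytes, model) -> str:
    """Write the audio to a temp file, transcribe it, and join the segment texts."""
    with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as tmp:
        tmp.write(audio_bytes)
        path = tmp.name
    try:
        segments, _info = model.transcribe(path)
        # Segments may be produced lazily, so consume them before deleting the file.
        return " ".join(segment.text.strip() for segment in segments)
    finally:
        os.unlink(path)


async def transcribe_async(audio_bytes: bytes, model) -> str:
    """Run the blocking transcription in a worker thread."""
    return await asyncio.to_thread(transcribe_audio, audio_bytes, model)
```

With faster-whisper installed, the model would be created once, for example `WhisperModel("base")` imported from `faster_whisper`, and reused across calls.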
## Self-Update Mechanism
The bot can automatically check PyPI for newer versions and upgrade itself:
- **`folderbot update`** CLI command: checks PyPI JSON API, runs `pip install --upgrade`, restarts the systemd service if running
- **Systemd timer**: `folderbot-update.timer` runs `folderbot update` every 5 minutes
- Installed/managed alongside the main service via `folderbot service install/enable/start`
```mermaid
flowchart LR
T[systemd timer<br/>every 5min] --> U[folderbot update]
U --> P{PyPI newer?}
P -->|no| D[Done]
P -->|yes| I[pip install --upgrade]
I --> R[systemctl restart folderbot]
```
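
The check-then-upgrade step can be sketched with the standard library alone. The function names are hypothetical, the package name `folderbot` on PyPI is assumed from the CLI name, and the version comparison only handles plain numeric releases such as `1.10.2`. The service restart is left out.

```python
import json
import subprocess
import sys
import urllib.request
from importlib.metadata import version


def parse_version(text: str) -> tuple:
    """Turn '1.10.2' into (1, 10, 2). Assumes plain numeric release versions."""
    return tuple(int(part) for part in text.split("."))


def is_newer(latest: str, installed: str) -> bool:
    return parse_version(latest) > parse_version(installed)


def latest_pypi_version(package: str) -> str:
    """Read the latest release number from the PyPI JSON API."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]["version"]


def update(package: str = "folderbot") -> bool:
    """Upgrade via pip when PyPI has a newer release; return True if upgraded."""
    latest = latest_pypi_version(package)
    if not is_newer(latest, version(package)):
        return False
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--upgrade", package],
        check=True,
    )
    return True
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"1.10.0"` sorts before `"1.9.3"`.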
## Todo Management
Markdown-backed task tracking via `TodoStore`. Todos are stored in a human-readable `.md` file (default: `.folderbot/todos.md`), editable with any text editor. Atomic writes via `os.replace` prevent corruption.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TodoItem:
    id: int
    user_id: int
    title: str
    description: str
    status: str  # todo | in_progress | done
    effort: str  # tiny | small | medium | large | epic
    tags: list[str]
    created_at: str
    updated_at: str
    completed_at: str | None
```
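
The atomic write mentioned above follows a common pattern: write the new contents to a temporary file in the same directory, then swap it into place with `os.replace`. The helper below is a sketch of that pattern, not `TodoStore`'s actual code, and `atomic_write_text` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Replace `path` with `text` so readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```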
### Markdown Format
Uses GFM checkboxes with todo.txt conventions for tags (`+tag`) and `key:value` metadata:
```markdown
# Todos
- [ ] Buy groceries +shopping +errands
  effort: small
  Milk and eggs.
- [x] Write report +work
  effort: large
  Quarterly report.
```
- `- [S] Title +tags` — status chars: ` ` = todo, `~` = in_progress, `x` = done
- Tags use `+tagname` convention (todo.txt style), inline on the task line
- `effort: level` indented below (omitted when "medium", the default)
- Description as indented free text after effort line
- System metadata (id, user, timestamps) in an HTML comment — hidden in rendered views
- IDs computed from `max(ids) + 1` (no separate tracker needed)
- Items ordered by `created_at`
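
Parsing the task line itself is simple enough to show in a few lines. The regex and the returned dict shape below are assumptions for illustration. The status characters and the `+tag` convention are taken from the list above, and the metadata lines (effort, description, HTML comment) are not handled.

```python
import re

STATUS_BY_CHAR = {" ": "todo", "~": "in_progress", "x": "done"}
TASK_LINE = re.compile(r"^- \[([ ~x])\] (.+)$")


def parse_task_line(line: str):
    """Parse '- [S] Title +tags' into status, title, and tags; None if no match."""
    match = TASK_LINE.match(line.rstrip())
    if match is None:
        return None
    status_char, rest = match.groups()
    words = rest.split()
    # A tag is any word that starts with "+" and has at least one more character.
    tags = [w[1:] for w in words if len(w) > 1 and w.startswith("+")]
    title = " ".join(w for w in words if not (len(w) > 1 and w.startswith("+")))
    return {"status": STATUS_BY_CHAR[status_char], "title": title, "tags": tags}
```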
### Filtering
The `todo_list` tool supports filtering by status, max effort level, tag, and text search. Completed tasks are hidden by default. The effort filter enables queries like "what can I do in 30 minutes?" (`max_effort="small"` returns `tiny` + `small` tasks).
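
The filters compose naturally as a chain of early exits. In the sketch below, todos are plain dicts and `filter_todos` is a hypothetical function. The effort ordering and the hidden-by-default rule for completed tasks come from the text above, while limiting the text search to titles is an assumption.

```python
EFFORT_ORDER = ("tiny", "small", "medium", "large", "epic")


def filter_todos(todos, status=None, max_effort=None, tag=None, text=None):
    """Return the todos that pass every filter that was supplied."""
    result = []
    for todo in todos:
        if status is None and todo["status"] == "done":
            continue  # completed tasks are hidden unless a status is requested
        if status is not None and todo["status"] != status:
            continue
        if max_effort is not None and (
            EFFORT_ORDER.index(todo["effort"]) > EFFORT_ORDER.index(max_effort)
        ):
            continue
        if tag is not None and tag not in todo["tags"]:
            continue
        if text is not None and text.lower() not in todo["title"].lower():
            continue
        result.append(todo)
    return result
```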
## Calendar
SQLite-backed event storage via `CalendarStore`. Supports adding, listing, updating, and deleting events. The `calendar_upcoming` tool returns events within a configurable time window, useful for "what's coming up this week?" queries.
## Token Usage Tracking
Every LLM call records input and output token counts in a `token_usage` SQLite table. The `get_token_usage` tool lets users query their consumption by period (today, week, month).
- `LLMClient.chat()` uses `create_with_completion()` to get raw completion metadata
- Token counts are accumulated across all agent loop iterations
- `TelegramBot._process_message()` calls `session_manager.record_token_usage()` after each chat
- Records are scoped per user, model, and topic
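
A table and query of roughly this shape would support the per-period lookups. The column names, the function signatures, and the mapping from period names to SQLite date modifiers are assumptions. Note that `CURRENT_TIMESTAMP` is UTC, so "today" in this sketch means the current UTC day.

```python
import sqlite3

# Assumed schema; the source only says input/output tokens, model, and topic
# are recorded per user.
SCHEMA = """
CREATE TABLE IF NOT EXISTS token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    model TEXT NOT NULL,
    topic TEXT NOT NULL DEFAULT 'general',
    input_tokens INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""

PERIOD_MODIFIERS = {"today": "start of day", "week": "-7 days", "month": "-1 month"}


def record_token_usage(conn, user_id, model, topic, input_tokens, output_tokens):
    """Store the totals accumulated over one chat() call."""
    conn.execute(
        "INSERT INTO token_usage (user_id, model, topic, input_tokens, output_tokens) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, model, topic, input_tokens, output_tokens),
    )
    conn.commit()


def get_token_usage(conn, user_id, period):
    """Sum a user's tokens since the start of the requested period."""
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(output_tokens), 0) "
        "FROM token_usage WHERE user_id = ? AND created_at >= datetime('now', ?)",
        (user_id, PERIOD_MODIFIERS[period]),
    ).fetchone()
    return {"input_tokens": row[0], "output_tokens": row[1]}
```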
## Multi-Bot Service Support
The CLI supports running multiple bot instances as separate systemd services:
- `folderbot service install --bot notes` creates `folderbot-notes.service` with `ExecStart=folderbot run --bot notes`
- All service commands (`enable`, `start`, `stop`, etc.) accept `--bot NAME`
- The update timer remains shared across all bot instances
- Config uses the existing `bots` TOML section for per-bot overrides
## Hallucination Guard
A heuristic check (`_claims_tool_use()`) detects when the LLM claims to have performed an action (e.g., "I've updated your file") without actually calling any tools. When triggered, a system warning is injected and the LLM retries.
```mermaid
flowchart TD
A[LLM returns answer] --> B{_claims_tool_use?}
B -->|no| C[Accept answer]
B -->|yes| D{Any tools actually called?}
D -->|yes| C
D -->|no| E[Inject warning into context]
E --> F[Retry LLM call]
style C fill:#4a9,stroke:#333,color:#fff
style E fill:#c44,stroke:#333,color:#fff
```
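
The actual heuristic inside `_claims_tool_use()` is not shown in this document. As an illustration of the idea only, a regex over first-person action claims could drive the same decision as the flowchart. The pattern, the verb list, and the `needs_retry` helper are all assumptions.

```python
import re

# Assumed verb list; the real heuristic may look for different phrasing.
CLAIM_PATTERN = re.compile(
    r"\bI(?:['’]ve| have)?\s+(?:just\s+|already\s+)?"
    r"(?:updated|saved|created|deleted|written|wrote|scheduled|added|sent)\b",
    re.IGNORECASE,
)


def claims_tool_use(answer: str) -> bool:
    """True when the answer reads like the assistant performed an action."""
    return CLAIM_PATTERN.search(answer) is not None


def needs_retry(answer: str, tools_used: list) -> bool:
    """Retry only when an action is claimed but no tool was actually called."""
    return claims_tool_use(answer) and not tools_used
```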
## Roadmap
### Sandboxed Python Execution (`run_python` tool)
Allow the LLM to write and execute arbitrary Python code in a Docker container for tasks that require computation (e.g. generating a Brownian motion path, numerical simulations, data transformations).
**Design:**
- New `run_python` tool that accepts Python code and optional pip requirements
- Executes in a Docker container with strict isolation: no network, no volume mounts apart from the output directory described in the next point, read-only root filesystem, memory/CPU limits
- A designated output directory inside the container is mapped to a temp dir on host
- After execution: captures stdout/stderr, sends any generated files (images, CSVs) back to the user via Telegram
- Timeout to prevent runaway processes
```mermaid
sequenceDiagram
participant LLM as LLMClient
participant FT as FolderTools
participant D as Docker Container
participant TG as Telegram
LLM->>FT: run_python(code, requirements)
FT->>D: Create container<br/>(no network, mem limit)
Note over D: pip install requirements<br/>Execute code<br/>Write files to /output
D-->>FT: stdout + /output files
FT->>TG: send_document(files)
FT-->>LLM: ToolResult(stdout)
```
### LLM-Powered Todo Extraction with Section-Level Cache
Extract todos from **any** markdown file in the folder tree using LLM-based parsing, not just the structured `todos.md`. A SQLite cache layer avoids redundant LLM calls via per-section content hashing.
**Architecture:**
```
Markdown files (source of truth)
│
├── split by headings (deterministic, cheap)
│
├── per-section hash → compare with cache
│     │
│     ├── hash match → use cached extraction (free)
│     │
│     └── hash mismatch → diff old vs new content
│           │
│           ├── trivial change (status flip) → update programmatically
│           │
│           └── ambiguous change → targeted LLM call with old+new content
│
└── SQLite cache table:
      (file_path, section_index, content_hash, raw_content, extracted_json)
```
**Key design decisions:**
- **Section-level granularity**: hash and cache each heading-delimited section independently, so editing one todo in a 200-item file only re-processes that section
- **Store raw content**: enables diffing old vs new to detect the nature of changes (status transitions, title edits, etc.) without LLM
- **LLM as fallback**: simple/structured changes handled programmatically; LLM only called for ambiguous edits in unstructured files
- **File discovery**: scan folder tree using existing ReadRules include/exclude patterns
- **Write routing**: bot-created todos go to the central `todos.md` by default; when context points to a specific project, the LLM identifies the right project file instead
- **Rich extraction schema**: title, description, status, effort, deadline, priority, progress, time estimates, dependencies
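
The deterministic half of this design (split by headings, hash each section, compare with the cache) is cheap to prototype. The sketch below assumes ATX-style `#` headings and SHA-256 hashes, and compares sections by index as in the proposed cache key. All function names are hypothetical, and heading-like lines inside fenced code blocks are not special-cased.

```python
import hashlib
import re

HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_sections(markdown: str) -> list:
    """Split a document into heading-delimited sections, preamble included."""
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # text before the first heading is its own section
    bounds = starts + [len(markdown)]
    sections = [markdown[a:b] for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s.strip()]


def section_hashes(markdown: str) -> list:
    return [hashlib.sha256(s.encode("utf-8")).hexdigest() for s in split_sections(markdown)]


def changed_sections(old: str, new: str) -> list:
    """Indices of sections in `new` whose hash differs from the cached version."""
    old_hashes = section_hashes(old)
    return [
        index
        for index, digest in enumerate(section_hashes(new))
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]
```

Only the indices returned by `changed_sections` would need the diff or LLM path; every other section can reuse its cached extraction.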
### Lightweight Voice Transcription
Replace `faster-whisper` (CTranslate2 + PyTorch, ~2GB) with `pywhispercpp` (whisper.cpp/GGML, ~4MB). Same transcription quality, drastically smaller install. Native Apple Silicon support via CoreML.
### Homebrew Formula
Provide a Homebrew tap for macOS users: `brew install folderbot`. Would handle Python/venv setup and launchd service configuration (macOS equivalent of the current systemd integration).