Folderbot Architecture

Overview

Folderbot is a Telegram bot that gives users an LLM-powered assistant with access to their personal folder. The architecture follows a layered design:

```mermaid
flowchart LR
    TG[Telegram] <-->|messages| TB[TelegramBot]
    TB <-->|chat loop| LLC[LLMClient]
    LLC <-->|tool dispatch| FT[FolderTools]
    LLC -.-|structured extraction| I[instructor]
```

Request Lifecycle

The full lifecycle of a user message, from Telegram to response:

```mermaid
sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TH as TelegramHandler
    participant SN as StatusNotifier
    participant SM as SessionManager
    participant LC as LLMClient
    participant I as instructor
    participant FT as FolderTools

    U->>TG: Send message
    TG->>TH: handle_message()

    Note over TH: Accumulate pending messages<br/>Cancel in-flight task if any

    TH->>TH: _start_processing()
    TH->>SN: start() → typing indicator

    TH->>SM: get_history(user_id)
    SM-->>TH: conversation history

    TH->>LC: chat(message, context, history)

    loop Agent Loop (max 10 iterations)
        LC->>I: create_with_completion()<br/>response_model=AgentResponse
        I-->>LC: AgentResponse

        alt Has answer, no tool calls
            LC->>LC: Hallucination guard check
            LC-->>TH: (answer, tools_used, topic, usage)
        else Has tool calls
            loop For each tool call
                LC->>SN: update(tool_name)
                SN->>TG: Edit status message

                alt ask_user tool
                    LC->>TH: on_ask_user callback
                    TH->>TG: Show interactive UI
                    U->>TG: Tap button / send text
                    TG->>TH: Resolve Future
                    TH-->>LC: User answer
                else Regular tool
                    LC->>FT: execute_async(name, args)
                    FT-->>LC: ToolResult
                end
            end
            Note over LC: Append results to<br/>gathered_context, loop
        end
    end

    TH->>SM: save_message (user + assistant)
    TH->>SM: record_token_usage
    TH->>SN: stop() → delete status
    TH->>TG: Reply with response
    TG->>U: Display answer
```

Message Accumulation and Cancellation

When a user sends multiple messages quickly, Folderbot accumulates them instead of processing each one independently:

```mermaid
flowchart TD
    M1[Message 1 arrives] --> P1[Add to pending_messages]
    P1 --> T1[Start processing task]

    M2[Message 2 arrives<br/>while processing] --> C[Cancel current task]
    C --> R[Restore in-progress<br/>messages to pending]
    M2 --> P2[Add to pending_messages]
    R --> P2
    P2 --> T2[Start new task with<br/>all accumulated messages]

    T2 --> J[Messages joined with newline<br/>sent as single LLM request]

    style C fill:#c44,stroke:#333,color:#fff
    style J fill:#4a9,stroke:#333,color:#fff
```
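The accumulate-and-cancel behaviour above can be sketched with plain asyncio. This is a minimal sketch under assumptions: the class name `MessageAccumulator`, its `add()` method and the injected `process` coroutine are hypothetical. The real logic lives in TelegramBot's `handle_message()` / `_start_processing()`.

```python
import asyncio


class MessageAccumulator:
    """Hypothetical sketch: batch rapid messages, cancelling in-flight work."""

    def __init__(self, process):
        self._process = process            # async callable taking the joined text
        self.pending: list[str] = []       # messages not yet handed to a task
        self._in_progress: list[str] = []  # messages owned by the running task
        self._task: asyncio.Task | None = None

    def add(self, text: str) -> asyncio.Task:
        if self._task is not None and not self._task.done():
            self._task.cancel()
            # Put the interrupted batch back in front of the new message.
            self.pending = self._in_progress + self.pending
            self._in_progress = []
        self.pending.append(text)
        return self._start_processing()

    def _start_processing(self) -> asyncio.Task:
        self._in_progress, self.pending = self.pending, []
        joined = "\n".join(self._in_progress)  # one LLM request for the batch
        self._task = asyncio.create_task(self._run(joined))
        return self._task

    async def _run(self, joined: str):
        result = await self._process(joined)
        self._in_progress = []  # only reached if the task was not cancelled
        return result
```

A second message that arrives while the first is still being processed cancels it. Both texts are then sent together as `"hello\nworld"`.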

Core Components

Telegram Handler (`telegram_handler.py`)

The TelegramBot class manages:

Message handling: Accumulates user messages, cancels in-flight requests on new input
Command handlers: /start, /clear, /new, /status, /files, /tasks
Document uploads: Stores uploaded files and makes them available to the upload tools
Photo handling: Downloads photos, saves to uploads, encodes as base64, and passes to LLM as multimodal image blocks for vision analysis
ask_user UI: Renders interactive Telegram widgets (inline keyboards, location pickers) when the LLM needs user input
Scheduler integration: Sends messages from background tasks
File watcher: Notifies users of file changes

LLM Client (`llm_client.py`)

The LLMClient is backend-agnostic; it talks to models through the instructor package:

Supports any provider that instructor supports, via instructor.from_provider("provider/model")
Uses structured extraction (AgentResponse model) instead of native tool_use
Tools are described in the system prompt text, because instructor itself uses the provider's tools parameter to enforce the AgentResponse schema
Multimodal support: Photo messages are encoded as base64 image blocks in the user message. Image format is provider-aware (_format_image_block): Anthropic uses native image blocks with base64 source, OpenAI uses image_url with data URIs. The provider is detected from the model string prefix (e.g. anthropic/...).

Agent Loop

The core loop in LLMClient.chat():

```mermaid
flowchart TD
    A[Build messages:<br/>history + user message<br/>+ gathered tool results] --> B[Call LLM via instructor<br/>create_with_completion()<br/>response_model=AgentResponse]
    B --> C{Answer provided?}
    C -->|yes| D[Return answer]
    C -->|no, tool calls| E{ask_user?}
    E -->|yes| F[Pause loop,<br/>wait for user via callback]
    F --> G[Append result to<br/>gathered_context]
    E -->|no| H[Dispatch to FolderTools]
    H --> G
    G --> A

    style D fill:#4a9,stroke:#333,color:#fff
```

Structured Response Models

```python
from pydantic import BaseModel


class ToolCallRequest(BaseModel, frozen=True):
    name: str          # Tool name
    arguments: dict    # Tool arguments


class AgentResponse(BaseModel, frozen=True):
    tool_calls: list[ToolCallRequest]  # Tools to execute
    answer: str | None                 # Final answer (when done)
    topic: str                         # Conversation topic label


class AskUserRequest(BaseModel, frozen=True):
    question: str       # Question to display
    options: list[str]  # Button labels
    input_type: str     # choice | confirm | text | location
```

Tool System

Registration

Tools are registered via the @folder_bot.tool() decorator with typed Pydantic request/response models:

```python
@folder_bot.tool(name="read_file", request_type=ReadFileRequest, response_type=ReadFileResponse)
async def read_file(request, context):
    ...
```
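To show what a decorator like this typically does, here is a minimal, hypothetical registry. `ToolRegistry` and its `execute_async()` are illustrative stand-ins rather than FolderBot's code. Dataclasses replace the Pydantic request/response models so that the sketch has no dependencies. The idea is the same: register the handler under a name, build a typed request from the LLM's raw arguments, and check the response type.

```python
import asyncio
from dataclasses import dataclass


class ToolRegistry:
    """Hypothetical stand-in for the object behind @folder_bot.tool()."""

    def __init__(self):
        self.tools = {}

    def tool(self, name, request_type, response_type):
        def register(func):
            self.tools[name] = (func, request_type, response_type)
            return func  # the function stays directly callable
        return register

    async def execute_async(self, name, args, context=None):
        func, request_type, response_type = self.tools[name]
        request = request_type(**args)  # raw LLM arguments -> typed request
        response = await func(request, context)
        if not isinstance(response, response_type):
            raise TypeError(f"{name} returned {type(response).__name__}")
        return response


folder_bot = ToolRegistry()


@dataclass(frozen=True)
class ReadFileRequest:
    path: str


@dataclass(frozen=True)
class ReadFileResponse:
    content: str


@folder_bot.tool(name="read_file", request_type=ReadFileRequest, response_type=ReadFileResponse)
async def read_file(request, context):
    return ReadFileResponse(content=f"<contents of {request.path}>")  # placeholder body
```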

Tool Categories

| Category | Tools |
|---|---|
| File operations | `list_files`, `read_file`, `read_files`, `search_files`, `write_file` |
| Web | `web_search` (Google Custom Search API), `web_fetch`, `get_weather` |
| Scheduler | `schedule_task`, `list_tasks`, `cancel_task`, `get_task_results` |
| Uploads | `list_uploads`, `delete_upload`, `send_upload` |
| Visualization | `plot_chart` (matplotlib, sends PNG to user) |
| Calendar | `calendar_add`, `calendar_list`, `calendar_upcoming`, `calendar_update`, `calendar_delete` |
| Todo | `todo_add`, `todo_list`, `todo_update`, `todo_remove` |
| Topics | `list_topics`, `get_full_history`, `reorganize_topics` |
| Stats | `get_token_usage`, `token_stats`, `read_activity_log` |
| Notifications | `enable_file_notifications`, `disable_file_notifications`, `get_file_notification_status` |
| Utilities | `send_message`, `get_time`, `compare_numbers`, `shuffle_list`, `sort_list`, `random_choice`, `random_number` |
| Interactive | `ask_user` (handled in agent loop, not FolderTools) |

Tool Configuration

Tools can have their own configuration via [tools.<name>] sections in config.toml:

```toml
[tools.web_search]
google_api_key = "..."
google_cx = "..."
```

Tools access their config via get_tool_config(context, "tool_name") which returns the tool’s config dict. Custom tools receive the full tools_config dict in their constructor.

Services Pattern

Tools receive dependencies through BotContext.services:

FolderServices: Root path, config, path validation, get_tool_config()
SchedulerServices: Task creation and management
UploadServices: File upload storage and retrieval

```mermaid
classDiagram
    class BotContext {
        +services
        +user_id
    }
    class FolderServices {
        +root_path
        +config
        +validate_path()
    }
    class SchedulerServices {
        +create_task()
        +cancel_task()
        +list_tasks()
    }
    class UploadServices {
        +uploads_dir: Path
        +send_document(chat_id, path, filename)
        +chat_id: int
        +session_manager
    }

    BotContext --> FolderServices
    BotContext --> SchedulerServices
    BotContext --> UploadServices
```

ask_user: Interactive User Input

The ask_user tool enables the LLM to pause its agent loop and request interactive input from the user via native Telegram UI.

Flow

```mermaid
sequenceDiagram
    participant LLM as LLMClient
    participant CB as on_ask_user callback
    participant TB as TelegramBot
    participant TG as Telegram
    participant U as User

    LLM->>LLM: AgentResponse with<br/>tool_call name="ask_user"
    LLM->>CB: on_ask_user(AskUserRequest)
    CB->>TB: _handle_ask_user()
    TB->>TB: Create asyncio.Future
    TB->>TG: Send UI (keyboard / text prompt)
    TG->>U: Display interactive widget

    U->>TG: Tap button / send text / share location
    TG->>TB: CallbackQuery / Message
    TB->>TB: Resolve Future with answer

    TB-->>CB: Return answer string
    CB-->>LLM: Answer added to gathered_context
    LLM->>LLM: Continue agent loop
```

Input Types

| Type | Telegram UI | Resolution |
|---|---|---|
| `choice` | Inline keyboard (one button per option) | CallbackQueryHandler |
| `confirm` | Inline keyboard (Yes/No row) | CallbackQueryHandler |
| `text` | Plain text question | Next text message intercepted |
| `location` | Reply keyboard with location button | Location MessageHandler |

Key Design Decisions

ask_user is NOT a registered FolderBot tool — it’s handled specially in the agent loop because it requires async user interaction
asyncio.Future for pause/resume — the agent loop awaits a Future that Telegram handlers resolve
Index-based callback data (ask:user_id:index) keeps each button payload within Telegram's 64-byte callback_data limit, which long option labels could otherwise exceed
120-second timeout prevents the agent loop from hanging indefinitely
Backend-agnostic — the LLM client knows nothing about Telegram; the callback is injected by the handler

Session Management

SQLite-backed via SessionManager
Stores conversation history per user (role, content, timestamp, topic)
Tracks version notifications, file notification preferences, uploads
Records token usage per chat request (input/output tokens summed over the agent loop's LLM calls, model, topic)

```mermaid
erDiagram
    USER ||--o{ CONVERSATION_HISTORY : has
    USER ||--o{ UPLOAD : stores
    USER {
        int user_id
        bool file_notifications
        string last_version_notified
    }
    CONVERSATION_HISTORY {
        int user_id
        string role
        string content
        datetime timestamp
        string topic
    }
    UPLOAD {
        int user_id
        string filename
        blob data
    }
```

Topic-Based Conversation Management

Each message is tagged with a topic label (e.g. “weather”, “recipes”, “project planning”) assigned by the LLM via the AgentResponse.topic field. Topics enable multi-threaded conversations:

Topic-aware history: build_topic_history() always includes the last 4 messages for immediate context, then backfills the remaining character budget with same-topic messages from older history
list_topics tool: Lets the user ask “what conversations am I having?” — returns topic names, message counts, and last activity
Backward compatible: Old messages without a topic field default to "general"

```mermaid
flowchart LR
    H[Full History] --> R[Last 4 messages<br/>recency window]
    H --> B[Older same-topic<br/>messages backfill]
    R --> M[Merged history<br/>sent to LLM]
    B --> M
```
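The recency-window-plus-backfill strategy could look roughly like this. The `Msg` dataclass and this signature of `build_topic_history()` are hypothetical. The sketch walks older history newest-first and stops at the first same-topic message that no longer fits the character budget. That stopping rule is one possible choice; skipping an oversized message and continuing would also satisfy the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    role: str
    content: str
    topic: str = "general"  # untagged legacy messages fall back to "general"


def build_topic_history(history: list[Msg], topic: str, char_budget: int,
                        recent: int = 4) -> list[Msg]:
    """Sketch: keep the last `recent` messages, then backfill same-topic ones."""
    recent_msgs = history[-recent:]
    older = history[:-recent]
    budget = char_budget - sum(len(m.content) for m in recent_msgs)
    backfill: list[Msg] = []
    # Newest-first, so the closest same-topic context is kept first.
    for msg in reversed(older):
        if msg.topic != topic:
            continue
        if len(msg.content) > budget:
            break
        backfill.append(msg)
        budget -= len(msg.content)
    # Chronological order: backfilled messages, then the recency window.
    return list(reversed(backfill)) + recent_msgs
```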

Voice Transcription

Voice messages and audio files are transcribed locally at the Telegram handler layer using faster-whisper (CTranslate2). The LLM receives plain text — it doesn’t need to know the input was audio.

No API key required — runs entirely on-device via CTranslate2 (up to 4x faster than openai-whisper)
Pre-built CTranslate2 wheels with CUDA support, so pip install needs no build flags; GPU inference additionally requires NVIDIA's cuBLAS and cuDNN libraries
Model configurable via whisper_model config key (default: "base")
Handles both filters.VOICE (voice messages) and filters.AUDIO (audio files)
Transcription runs in a thread (asyncio.to_thread) to avoid blocking the event loop
Models are auto-downloaded from Hugging Face Hub and cached after first load

```mermaid
sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TB as TelegramBot
    participant W as faster-whisper (local)
    participant LP as Message Pipeline

    U->>TG: Send voice message / audio file
    TG->>TB: handle_voice()
    TB->>TG: Download audio bytes
    TB->>W: transcribe_audio(bytes, model_name)
    Note over W: Writes to temp file,<br/>runs model.transcribe(),<br/>joins segments
    W-->>TB: TranscriptionResult(text)
    TB->>LP: Add text to pending_messages
    TB->>LP: _start_processing()
    LP->>LP: Normal LLM chat flow
```
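The temp-file, transcribe and join step could be written as follows.

- faster-whisper's `WhisperModel.transcribe()` returns a `(segments, info)` pair, and `segments` is a lazy generator, so the transcription only happens while it is consumed.
- The helper names and the fixed `.ogg` suffix are assumptions.
- The model is passed in so the sketch can be exercised without downloading weights. `load_model()` shows the real constructor call but is not used by the other functions.

```python
import asyncio
import os
import tempfile


def transcribe_audio(audio: bytes, model) -> str:
    """Write bytes to a temp file, run model.transcribe(), join segment texts."""
    with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as tmp:
        tmp.write(audio)
        path = tmp.name
    try:
        segments, _info = model.transcribe(path)
        # Consuming the generator is what actually runs the model.
        return " ".join(seg.text.strip() for seg in segments).strip()
    finally:
        os.unlink(path)


def load_model(name: str = "base"):
    from faster_whisper import WhisperModel  # weights are fetched and cached on first use
    return WhisperModel(name)


async def transcribe_in_thread(audio: bytes, model) -> str:
    # Offload the blocking, compute-heavy call so the event loop stays responsive.
    return await asyncio.to_thread(transcribe_audio, audio, model)
```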

Self-Update Mechanism

The bot can automatically check PyPI for newer versions and upgrade itself:

folderbot update CLI command: checks the PyPI JSON API and, if a newer release exists, runs pip install --upgrade and restarts the systemd service if it is running
Systemd timer: folderbot-update.timer runs folderbot update every 5 minutes
Installed/managed alongside the main service via folderbot service install/enable/start

```mermaid
flowchart LR
    T[systemd timer<br/>every 5min] --> U[folderbot update]
    U --> P{PyPI newer?}
    P -->|no| D[Done]
    P -->|yes| I[pip install --upgrade]
    I --> R[systemctl restart folderbot]
```
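A rough sketch of the update check, using only the standard library.

- The version lookup reads `info.version` from PyPI's JSON endpoint (`https://pypi.org/pypi/<package>/json`).
- The comparison is deliberately naive and only handles plain `X.Y.Z` versions. A real implementation would more likely use `packaging.version.Version`.
- The service name and the use of a system-level `systemctl` (not `--user`) are assumptions.

```python
import importlib.metadata
import json
import subprocess
import sys
import urllib.request

PYPI_URL = "https://pypi.org/pypi/{package}/json"


def latest_version(package: str = "folderbot", timeout: float = 10.0) -> str:
    with urllib.request.urlopen(PYPI_URL.format(package=package), timeout=timeout) as resp:
        return json.load(resp)["info"]["version"]


def version_tuple(version: str) -> tuple[int, ...]:
    # Simplification: plain X.Y.Z only (no pre-release/dev/post suffixes).
    return tuple(int(part) for part in version.split("."))


def needs_update(installed: str, latest: str) -> bool:
    return version_tuple(latest) > version_tuple(installed)


def run_update(package: str = "folderbot", service: str = "folderbot") -> bool:
    installed = importlib.metadata.version(package)
    if not needs_update(installed, latest_version(package)):
        return False
    subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", package], check=True)
    # Restart only if the unit is active (assumes a system-level unit).
    if subprocess.run(["systemctl", "is-active", "--quiet", service]).returncode == 0:
        subprocess.run(["systemctl", "restart", service], check=True)
    return True
```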

Todo Management

Markdown-backed task tracking via TodoStore. Todos are stored in a human-readable .md file (default: .folderbot/todos.md), editable with any text editor. Atomic writes via os.replace prevent corruption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TodoItem:
    id: int
    user_id: int
    title: str
    description: str
    status: str          # todo | in_progress | done
    effort: str          # tiny | small | medium | large | epic
    tags: list[str]
    created_at: str
    updated_at: str
    completed_at: str | None
```
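The atomic write mentioned above usually follows the write-to-temp-then-rename pattern. The file is written to a temporary file in the same directory, flushed to disk, and swapped in with `os.replace`. A reader therefore sees either the old file or the new one, never a half-written one. The function name is hypothetical, and the atomicity guarantee holds on POSIX when source and target are on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit the disk first
        os.replace(tmp_name, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_name)  # do not leave stray temp files behind
        raise
```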

Markdown Format

Uses GFM checkboxes with todo.txt conventions for tags (+tag) and key:value metadata:

```markdown
# Todos

- [ ] Buy groceries +shopping +errands
  effort: small
  Milk and eggs.
  <!-- id:1 user:42 created:2026-02-19T10:00:00 updated:2026-02-19T10:00:00 -->

- [x] Write report +work
  effort: large
  Quarterly report.
  <!-- id:2 user:42 created:2026-02-18T08:00:00 updated:2026-02-19T12:00:00 completed:2026-02-19T12:00:00 -->
```

`- [S] Title +tags`: the status character S is a space for todo, `~` for in_progress, and `x` for done
Tags use the +tagname syntax (borrowed from todo.txt's +project notation), inline on the task line
An indented `effort:` line gives the effort level (omitted when it is “medium”, the default)
Description as indented free text after effort line
System metadata (id, user, timestamps) in an HTML comment — hidden in rendered views
IDs computed from max(ids) + 1 (no separate tracker needed)
Items ordered by created_at
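A single task block in this format can be parsed with a few regular expressions. This is a sketch rather than TodoStore's actual parser: `parse_task()` and `next_id()` are hypothetical helpers. Metadata values are kept as raw strings, and the HTML-comment keys (`created`, `completed`) are not mapped onto the dataclass field names.

```python
import re

TASK_RE = re.compile(r"^- \[(?P<mark>[ ~x])\] (?P<rest>.*)$")
META_RE = re.compile(r"<!--(?P<body>.*?)-->")
STATUS = {" ": "todo", "~": "in_progress", "x": "done"}


def parse_task(block: str) -> dict:
    lines = block.splitlines()
    head = TASK_RE.match(lines[0])
    if head is None:
        raise ValueError(f"not a task line: {lines[0]!r}")
    words = head["rest"].split()
    item = {
        "status": STATUS[head["mark"]],
        "title": " ".join(w for w in words if not w.startswith("+")),
        "tags": [w[1:] for w in words if w.startswith("+")],
        "effort": "medium",  # default when no effort: line is present
    }
    description: list[str] = []
    for raw in lines[1:]:
        line = raw.strip()
        meta = META_RE.fullmatch(line)
        if meta:
            # key:value pairs; split on the first colon so timestamps survive.
            for pair in meta["body"].split():
                key, _, value = pair.partition(":")
                item[key] = value
        elif line.startswith("effort:"):
            item["effort"] = line.split(":", 1)[1].strip()
        elif line:
            description.append(line)
    item["description"] = "\n".join(description)
    return item


def next_id(items: list[dict]) -> int:
    # IDs derived from the file itself: max(ids) + 1, starting at 1.
    return max((int(i["id"]) for i in items), default=0) + 1
```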

Filtering

The todo_list tool supports filtering by status, max effort level, tag, and text search. Completed tasks are hidden by default. The effort filter enables queries like “what can I do in 30 minutes?” (max_effort="small" returns tiny + small tasks).
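The "max effort" filter works because effort levels are ordered, so "at most small" means "rank ≤ rank of small". This is a sketch under assumptions: the parameter names of this `filter_todos()` are hypothetical, and todos are plain dicts here. Completed tasks are hidden unless a status is requested explicitly or `include_done` is set.

```python
EFFORT_ORDER = ["tiny", "small", "medium", "large", "epic"]
RANK = {level: i for i, level in enumerate(EFFORT_ORDER)}


def filter_todos(items, status=None, max_effort=None, tag=None, text=None,
                 include_done=False):
    out = []
    for item in items:
        if status is not None:
            if item["status"] != status:
                continue
        elif not include_done and item["status"] == "done":
            continue  # completed tasks hidden by default
        if max_effort is not None and RANK[item["effort"]] > RANK[max_effort]:
            continue
        if tag is not None and tag not in item["tags"]:
            continue
        haystack = f'{item["title"]} {item["description"]}'.lower()
        if text is not None and text.lower() not in haystack:
            continue
        out.append(item)
    return out
```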

Calendar

SQLite-backed event storage via CalendarStore. Supports adding, listing, updating, and deleting events. The calendar_upcoming tool returns events within a configurable time window, useful for “what’s coming up this week?” queries.
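An "upcoming" query is a half-open time-window filter. The `events` table below is a hypothetical schema, not CalendarStore's. It stores start times as ISO-8601 text in a single timezone, so plain string comparison orders them correctly.

```python
import sqlite3
from datetime import datetime, timedelta


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: naive ISO-8601 start times, all in one timezone.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, start_time TEXT)"
    )


def upcoming_events(conn: sqlite3.Connection, user_id: int, now: datetime, days: int = 7):
    start = now.isoformat(timespec="seconds")
    end = (now + timedelta(days=days)).isoformat(timespec="seconds")
    return conn.execute(
        "SELECT title, start_time FROM events"
        " WHERE user_id = ? AND start_time >= ? AND start_time < ?"
        " ORDER BY start_time",
        (user_id, start, end),
    ).fetchall()
```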

Token Usage Tracking

Input and output token counts from every LLM call are summed per chat request and stored in a token_usage SQLite table. The get_token_usage tool lets users query their consumption by period (today, week, month).

LLMClient.chat() uses create_with_completion() to get raw completion metadata
Token counts are accumulated across all agent loop iterations
TelegramBot._process_message() calls session_manager.record_token_usage() after each chat
Records are scoped per user, model, and topic
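A minimal sketch of accumulating usage across loop iterations and recording it once per chat.

- The usage field names on raw completions differ by provider: OpenAI uses `usage.prompt_tokens` / `usage.completion_tokens`, while Anthropic uses `usage.input_tokens` / `usage.output_tokens`. The sketch therefore takes plain integers.
- The table layout and the helper names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        # Called once per agent-loop iteration.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS token_usage ("
        " user_id INTEGER, model TEXT, topic TEXT,"
        " input_tokens INTEGER, output_tokens INTEGER,"
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )


def record_token_usage(conn, user_id: int, model: str, topic: str, usage: Usage) -> None:
    conn.execute(
        "INSERT INTO token_usage (user_id, model, topic, input_tokens, output_tokens)"
        " VALUES (?, ?, ?, ?, ?)",
        (user_id, model, topic, usage.input_tokens, usage.output_tokens),
    )


def usage_since(conn, user_id: int, since: str) -> tuple[int, int]:
    return conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(output_tokens), 0)"
        " FROM token_usage WHERE user_id = ? AND created_at >= ?",
        (user_id, since),
    ).fetchone()
```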

Multi-Bot Service Support

The CLI supports running multiple bot instances as separate systemd services:

folderbot service install --bot notes creates folderbot-notes.service with ExecStart=folderbot run --bot notes
All service commands (enable, start, stop, etc.) accept --bot NAME
The update timer remains shared across all bot instances
Config uses the existing bots TOML section for per-bot overrides

Hallucination Guard

A heuristic check (_claims_tool_use()) detects when the LLM claims to have performed an action (e.g., “I’ve updated your file”) without actually calling any tools. When triggered, a system warning is injected and the LLM retries.

```mermaid
flowchart TD
    A[LLM returns answer] --> B{_claims_tool_use?}
    B -->|no| C[Accept answer]
    B -->|yes| D{Any tools actually called?}
    D -->|yes| C
    D -->|no| E[Inject warning into context]
    E --> F[Retry LLM call]

    style C fill:#4a9,stroke:#333,color:#fff
    style E fill:#c44,stroke:#333,color:#fff
```
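One way to implement such a heuristic is a regular expression over first-person action claims, combined with the "no tools used" check. The verb list and pattern here are illustrative guesses; the actual `_claims_tool_use()` rules are not described in this document. The pattern also accepts a typographic apostrophe, since LLM output often contains one.

```python
import re

# Illustrative pattern: "I've updated", "I have saved", "I wrote", ...
_CLAIM_RE = re.compile(
    r"\bI(?:[’']ve| have)? "
    r"(?:updated|created|saved|written|wrote|deleted|added|scheduled|sent|changed)\b",
    re.IGNORECASE,
)


def claims_tool_use(answer: str) -> bool:
    return bool(_CLAIM_RE.search(answer))


def needs_retry(answer: str, tools_used: list[str]) -> bool:
    # Only suspicious if the model claims an action but called no tools.
    return claims_tool_use(answer) and not tools_used
```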

Roadmap

Sandboxed Python Execution (`run_python` tool)

Allow the LLM to write and execute arbitrary Python code in a Docker container for tasks that require computation (e.g. generating a Brownian motion path, numerical simulations, data transformations).

Design:

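New run_python tool that accepts Python code and optional pip requirements
Executes in a Docker container with strict isolation: no network, no host mounts other than the output directory described next, read-only root filesystem, memory/CPU limits
A designated output directory inside the container is mapped to a temp dir on the host
After execution: captures stdout/stderr and sends any generated files (images, CSVs) back to the user via Telegram
Timeout to prevent runaway processes
Open question: installing requirements needs network access and a writable filesystem, so they would have to be installed in a separate step (or baked into the image) before the isolated run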

```mermaid
sequenceDiagram
    participant LLM as LLMClient
    participant FT as FolderTools
    participant D as Docker Container
    participant TG as Telegram

    LLM->>FT: run_python(code, requirements)
    FT->>D: Create container<br/>(no network, mem limit)
    Note over D: pip install requirements<br/>Execute code<br/>Write files to /output
    D-->>FT: stdout + /output files
    FT->>TG: send_document(files)
    FT-->>LLM: ToolResult(stdout)
```
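The isolation constraints map onto standard `docker run` flags: `--network none`, `--read-only`, `--memory`, `--cpus`, and a single `-v` bind mount for `/output`. The image name, limits and helper names below are placeholder assumptions. Requirements are omitted because the container has no network.

Killing the `docker` CLI on timeout may not stop the container itself. A fuller version would name the container and `docker kill` it.

```python
import subprocess
import tempfile
from pathlib import Path


def docker_run_args(code: str, output_dir: str, image: str = "python:3.12-slim",
                    memory: str = "256m", cpus: str = "1") -> list[str]:
    return [
        "docker", "run", "--rm",
        "--network", "none",            # no network access
        "--read-only",                  # read-only root filesystem
        "--tmpfs", "/tmp",              # writable scratch space only
        "--memory", memory,
        "--cpus", cpus,
        "-v", f"{output_dir}:/output",  # the only host mount: generated files
        image,
        "python", "-c", code,
    ]


def run_python(code: str, timeout: float = 30.0):
    with tempfile.TemporaryDirectory() as out:
        proc = subprocess.run(docker_run_args(code, out), capture_output=True,
                              text=True, timeout=timeout)
        # Collect files before the temp dir is cleaned up.
        files = {p.name: p.read_bytes() for p in Path(out).iterdir() if p.is_file()}
    return proc.stdout, proc.stderr, files
```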

LLM-Powered Todo Extraction with Section-Level Cache

Extract todos from any markdown file in the folder tree using LLM-based parsing, not just the structured todos.md. A SQLite cache layer avoids redundant LLM calls via per-section content hashing.

Architecture:

```
Markdown files (source of truth)
    │
    ├── split by headings (deterministic, cheap)
    │
    ├── per-section hash → compare with cache
    │       │
    │       ├── hash match → use cached extraction (free)
    │       │
    │       └── hash mismatch → diff old vs new content
    │               │
    │               ├── trivial change (status flip) → update programmatically
    │               │
    │               └── ambiguous change → targeted LLM call with old+new content
    │
    └── SQLite cache table:
            (file_path, section_index, content_hash, raw_content, extracted_json)
```

Key design decisions:

Section-level granularity: hash and cache each heading-delimited section independently, so editing one todo in a 200-item file only re-processes that section
Store raw content: enables diffing old vs new to detect the nature of changes (status transitions, title edits, etc.) without LLM
LLM as fallback: simple/structured changes handled programmatically; LLM only called for ambiguous edits in unstructured files
File discovery: scan folder tree using existing ReadRules include/exclude patterns
Write routing: bot-created todos go to the central todos.md unless the LLM identifies a more appropriate project file from context
Rich extraction schema: title, description, status, effort, deadline, priority, progress, time estimates, dependencies
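Because this item is still on the roadmap, the following is only a sketch of how the cache layer could work. It uses the table columns listed above and assumes SHA-256 hashes and ATX (`#`) headings; headings inside code fences are not handled. A section whose hash changed is returned with its old raw content so it can be diffed. `is_status_flip()` spots the "trivial change" case that needs no LLM call. Keying on `section_index` means inserting a new section shifts, and so invalidates, every section after it.

```python
import hashlib
import re
import sqlite3

HEADING_RE = re.compile(r"^#{1,6}\s", re.MULTILINE)
CHECKBOX_RE = re.compile(r"^(\s*- \[)[ x~](\].*)$")


def split_sections(markdown: str) -> list[str]:
    starts = [m.start() for m in HEADING_RE.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # preamble before the first heading
    bounds = starts + [len(markdown)]
    return [markdown[a:b] for a, b in zip(bounds, bounds[1:]) if markdown[a:b].strip()]


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def init_cache(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS todo_cache (file_path TEXT, section_index INTEGER,"
        " content_hash TEXT, raw_content TEXT, extracted_json TEXT,"
        " PRIMARY KEY (file_path, section_index))"
    )


def sections_to_extract(conn, file_path: str, markdown: str):
    """Return (index, old_raw_or_None, new_raw) for sections whose hash changed."""
    changed = []
    for i, section in enumerate(split_sections(markdown)):
        row = conn.execute(
            "SELECT content_hash, raw_content FROM todo_cache"
            " WHERE file_path = ? AND section_index = ?", (file_path, i)).fetchone()
        if row is None or row[0] != _digest(section):
            changed.append((i, None if row is None else row[1], section))
    return changed


def store(conn, file_path: str, index: int, section: str, extracted_json: str) -> None:
    conn.execute("INSERT OR REPLACE INTO todo_cache VALUES (?, ?, ?, ?, ?)",
                 (file_path, index, _digest(section), section, extracted_json))


def is_status_flip(old: str, new: str) -> bool:
    """True if the only differences are checkbox status characters."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    if old == new or len(old_lines) != len(new_lines):
        return False
    mask = lambda line: CHECKBOX_RE.sub(r"\1?\2", line)
    return all(mask(a) == mask(b) for a, b in zip(old_lines, new_lines))
```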

Lightweight Voice Transcription

Replace faster-whisper (CTranslate2 + PyTorch, ~2GB) with pywhispercpp (whisper.cpp/GGML, ~4MB). Same transcription quality, drastically smaller install. Native Apple Silicon support via CoreML.

Homebrew Formula

Provide a Homebrew tap for macOS users: brew install folderbot. Would handle Python/venv setup and launchd service configuration (macOS equivalent of the current systemd integration).