Folderbot Architecture

Overview

Folderbot is a Telegram bot that gives users an LLM-powered assistant with access to their personal folder. The architecture follows a layered design:

        flowchart LR
    TG[Telegram] <-->|messages| TB[TelegramBot]
    TB <-->|chat loop| LLC[LLMClient]
    LLC <-->|tool dispatch| FT[FolderTools]
    LLC -.-|structured extraction| I[instructor]
    

Request Lifecycle

The full lifecycle of a user message, from Telegram to response:

        sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TH as TelegramBot
    participant SN as StatusNotifier
    participant SM as SessionManager
    participant LC as LLMClient
    participant I as instructor
    participant FT as FolderTools

    U->>TG: Send message
    TG->>TH: handle_message()

    Note over TH: Accumulate pending messages<br/>Cancel in-flight task if any

    TH->>TH: _start_processing()
    TH->>SN: start() → typing indicator

    TH->>SM: get_history(user_id)
    SM-->>TH: conversation history

    TH->>LC: chat(message, context, history)

    loop Agent Loop (max 10 iterations)
        LC->>I: create_with_completion()<br/>response_model=AgentResponse
        I-->>LC: AgentResponse

        alt Has answer, no tool calls
            LC->>LC: Hallucination guard check
            LC-->>TH: (answer, tools_used, topic, usage)
        else Has tool calls
            loop For each tool call
                LC->>SN: update(tool_name)
                SN->>TG: Edit status message

                alt ask_user tool
                    LC->>TH: on_ask_user callback
                    TH->>TG: Show interactive UI
                    U->>TG: Tap button / send text
                    TG->>TH: Resolve Future
                    TH-->>LC: User answer
                else Regular tool
                    LC->>FT: execute_async(name, args)
                    FT-->>LC: ToolResult
                end
            end
            Note over LC: Append results to<br/>gathered_context, loop
        end
    end

    TH->>SM: save_message (user + assistant)
    TH->>SM: record_token_usage
    TH->>SN: stop() → delete status
    TH->>TG: Reply with response
    TG->>U: Display answer
    

Message Accumulation and Cancellation

When a user sends multiple messages quickly, Folderbot accumulates them instead of processing each one independently:

        flowchart TD
    M1[Message 1 arrives] --> P1[Add to pending_messages]
    P1 --> T1[Start processing task]

    M2[Message 2 arrives<br/>while processing] --> C[Cancel current task]
    C --> R[Restore in-progress<br/>messages to pending]
    M2 --> P2[Add to pending_messages]
    R --> P2
    P2 --> T2[Start new task with<br/>all accumulated messages]

    T2 --> J[Messages joined with newline<br/>sent as single LLM request]

    style C fill:#c44,stroke:#333,color:#fff
    style J fill:#4a9,stroke:#333,color:#fff
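
The accumulate-and-cancel flow above can be sketched with plain asyncio. This is a minimal illustration, not the handler's actual code: the class name `PendingQueue`, its attributes, and the `process` callback are hypothetical stand-ins for the per-user state the diagram describes.

```python
import asyncio


class PendingQueue:
    """Hypothetical stand-in for the handler's per-user accumulation state."""

    def __init__(self, process):
        self._process = process              # async callable taking the joined text
        self.pending: list[str] = []         # messages not yet handed to a task
        self._in_progress: list[str] = []    # messages owned by the running task
        self._task: asyncio.Task | None = None

    async def on_message(self, text: str) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()              # cancel the in-flight request
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            # Restore the interrupted messages ahead of the new one.
            self.pending = self._in_progress + self.pending
        self.pending.append(text)
        self._in_progress, self.pending = self.pending, []
        # All accumulated messages go out as a single newline-joined request.
        self._task = asyncio.create_task(self._process("\n".join(self._in_progress)))


async def _demo() -> list[str]:
    seen: list[str] = []

    async def process(text: str) -> None:
        await asyncio.sleep(0.05)            # pretend to call the LLM
        seen.append(text)

    queue = PendingQueue(process)
    await queue.on_message("first")
    await queue.on_message("second")         # arrives while "first" is in flight
    await queue._task
    return seen


processed = asyncio.run(_demo())
```

In the demo only one request completes, and it carries both messages joined by a newline.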
    

Core Components

Telegram Handler (telegram_handler.py)

The TelegramBot class manages:

  • Message handling: Accumulates user messages, cancels in-flight requests on new input

  • Command handlers: /start, /clear, /new, /status, /files, /tasks

  • Document uploads: Stores uploaded files and makes them available to tools

  • Photo handling: Downloads photos, saves to uploads, encodes as base64, and passes to LLM as multimodal image blocks for vision analysis

  • ask_user UI: Renders interactive Telegram widgets (inline keyboards, location pickers) when the LLM needs user input

  • Scheduler integration: Sends messages from background tasks

  • File watcher: Notifies users of file changes

LLM Client (llm_client.py)

LLMClient is backend-agnostic: it reaches every provider through the instructor package:

  • Supports any LLM provider via instructor.from_provider("provider/model")

  • Uses structured extraction (AgentResponse model) instead of native tool_use

  • Tools are described in the system prompt text, because instructor already uses the provider's tools parameter for structured output

  • Multimodal support: Photo messages are encoded as base64 image blocks in the user message. Image format is provider-aware (_format_image_block): Anthropic uses native image blocks with base64 source, OpenAI uses image_url with data URIs. The provider is detected from the model string prefix (e.g. anthropic/...).

Agent Loop

The core loop in LLMClient.chat():

        flowchart TD
    A[Build messages:<br/>history + user message<br/>+ gathered tool results] --> B[Call LLM via instructor<br/>create_with_completion<br/>response_model=AgentResponse]
    B --> C{Answer provided?}
    C -->|yes| D[Return answer]
    C -->|no, tool calls| E{ask_user?}
    E -->|yes| F[Pause loop,<br/>wait for user via callback]
    F --> G[Append result to<br/>gathered_context]
    E -->|no| H[Dispatch to FolderTools]
    H --> G
    G --> A

    style D fill:#4a9,stroke:#333,color:#fff
    

Structured Response Models

from pydantic import BaseModel

class ToolCallRequest(BaseModel, frozen=True):
    name: str          # Tool name
    arguments: dict    # Tool arguments

class AgentResponse(BaseModel, frozen=True):
    tool_calls: list[ToolCallRequest]  # Tools to execute
    answer: str | None                 # Final answer (when done)
    topic: str                         # Conversation topic label

class AskUserRequest(BaseModel, frozen=True):
    question: str       # Question to display
    options: list[str]  # Button labels
    input_type: str     # choice | confirm | text | location

Tool System

Registration

Tools are registered via the @folder_bot.tool() decorator with typed Pydantic request/response models:

@folder_bot.tool(name="read_file", request_type=ReadFileRequest, response_type=ReadFileResponse)
async def read_file(request, context):
    ...
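
A decorator of this shape is usually backed by a small registry. The sketch below shows one way it could work; `ToolRegistry`, its `execute` method, and the `EchoRequest` type are hypothetical, and the real `folder_bot` object may store or dispatch tools differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

ToolFn = Callable[..., Awaitable[object]]


class ToolRegistry:
    """Hypothetical registry; illustrates the decorator pattern only."""

    def __init__(self) -> None:
        self.tools: dict[str, tuple[ToolFn, type, type]] = {}

    def tool(self, name: str, request_type: type, response_type: type):
        def decorator(fn: ToolFn) -> ToolFn:
            if name in self.tools:
                raise ValueError(f"tool {name!r} is already registered")
            self.tools[name] = (fn, request_type, response_type)
            return fn                        # the function itself stays unchanged
        return decorator

    async def execute(self, name: str, raw_args: dict, context: object) -> object:
        fn, request_type, _ = self.tools[name]
        # Build the typed request from the raw arguments the LLM supplied.
        return await fn(request_type(**raw_args), context)


@dataclass
class EchoRequest:
    text: str


registry = ToolRegistry()


@registry.tool(name="echo", request_type=EchoRequest, response_type=str)
async def echo(request, context):
    return request.text.upper()


result = asyncio.run(registry.execute("echo", {"text": "hi"}, None))
```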

Tool Categories

| Category | Tools |
|---|---|
| File operations | list_files, read_file, read_files, search_files, write_file |
| Web | web_search (Google Custom Search API), web_fetch, get_weather |
| Scheduler | schedule_task, list_tasks, cancel_task, get_task_results |
| Uploads | list_uploads, delete_upload, send_upload |
| Visualization | plot_chart (matplotlib, sends PNG to user) |
| Calendar | calendar_add, calendar_list, calendar_upcoming, calendar_update, calendar_delete |
| Todo | todo_add, todo_list, todo_update, todo_remove |
| Topics | list_topics, get_full_history, reorganize_topics |
| Stats | get_token_usage, token_stats, read_activity_log |
| Notifications | enable_file_notifications, disable_file_notifications, get_file_notification_status |
| Utilities | send_message, get_time, compare_numbers, shuffle_list, sort_list, random_choice, random_number |
| Interactive | ask_user (handled in agent loop, not FolderTools) |

Tool Configuration

Tools can have their own configuration via [tools.<name>] sections in config.toml:

[tools.web_search]
google_api_key = "..."
google_cx = "..."

Tools read their configuration with get_tool_config(context, "tool_name"), which returns that tool's config dict. Custom tools receive the full tools_config dict in their constructor.

Services Pattern

Tools receive dependencies through BotContext.services:

  • FolderServices: Root path, config, path validation, get_tool_config()

  • SchedulerServices: Task creation and management

  • UploadServices: File upload storage and retrieval

        classDiagram
    class BotContext {
        +services
        +user_id
    }
    class FolderServices {
        +root_path
        +config
        +validate_path()
    }
    class SchedulerServices {
        +create_task()
        +cancel_task()
        +list_tasks()
    }
    class UploadServices {
        +uploads_dir: Path
        +send_document(chat_id, path, filename)
        +chat_id: int
        +session_manager
    }

    BotContext --> FolderServices
    BotContext --> SchedulerServices
    BotContext --> UploadServices
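
The path validation that FolderServices provides might look like the sketch below. The signature is an assumption (the real `validate_path()` is a method and may apply further rules such as include/exclude patterns); the point is only the containment check against the root.

```python
from pathlib import Path


def validate_path(root: Path, user_path: str) -> Path:
    """Resolve user_path under root and reject anything that escapes it."""
    root = root.resolve()
    candidate = (root / user_path).resolve()     # collapses ".." and symlinks
    if not candidate.is_relative_to(root):       # Path.is_relative_to: Python 3.9+
        raise PermissionError(f"{user_path!r} is outside the folder root")
    return candidate


root = Path.cwd()
inside = validate_path(root, "notes/todo.md")

try:
    validate_path(root, "../secret.txt")
    escaped = True
except PermissionError:
    escaped = False
```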
    

ask_user: Interactive User Input

The ask_user tool enables the LLM to pause its agent loop and request interactive input from the user via native Telegram UI.

Flow

        sequenceDiagram
    participant LLM as LLMClient
    participant CB as on_ask_user callback
    participant TB as TelegramBot
    participant TG as Telegram
    participant U as User

    LLM->>LLM: AgentResponse with<br/>tool_call name="ask_user"
    LLM->>CB: on_ask_user(AskUserRequest)
    CB->>TB: _handle_ask_user()
    TB->>TB: Create asyncio.Future
    TB->>TG: Send UI (keyboard / text prompt)
    TG->>U: Display interactive widget

    U->>TG: Tap button / send text / share location
    TG->>TB: CallbackQuery / Message
    TB->>TB: Resolve Future with answer

    TB-->>CB: Return answer string
    CB-->>LLM: Answer added to gathered_context
    LLM->>LLM: Continue agent loop
    

Input Types

| Type | Telegram UI | Resolution |
|---|---|---|
| choice | Inline keyboard (one button per option) | CallbackQueryHandler |
| confirm | Inline keyboard (Yes/No row) | CallbackQueryHandler |
| text | Plain text question | Next text message intercepted |
| location | Reply keyboard with location button | Location MessageHandler |

Key Design Decisions

  • ask_user is NOT a registered FolderBot tool — it’s handled specially in the agent loop because it requires async user interaction

  • asyncio.Future for pause/resume — the agent loop awaits a Future that Telegram handlers resolve

  • Index-based callback data (ask:user_id:index) keeps each button payload within Telegram's 64-byte callback_data limit, however long the option label is

  • 120-second timeout prevents the agent loop from hanging indefinitely

  • Backend-agnostic — the LLM client knows nothing about Telegram; the callback is injected by the handler
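
The Future-based pause/resume, the index-based callback data, and the timeout can be sketched together. This is a minimal illustration without any Telegram dependency: the module-level `pending` dict and the function names are hypothetical, and `on_callback_query` stands in for whatever the real CallbackQueryHandler does.

```python
import asyncio

# user_id -> (future awaiting the answer, option labels shown to the user)
pending: dict[int, tuple[asyncio.Future, list[str]]] = {}


def callback_data(user_id: int, index: int) -> str:
    """Encode the option index, not its label, so the payload stays short."""
    return f"ask:{user_id}:{index}"


async def ask_user(user_id: int, options: list[str], timeout: float = 120.0) -> str:
    future = asyncio.get_running_loop().create_future()
    pending[user_id] = (future, options)
    try:
        return await asyncio.wait_for(future, timeout)   # agent loop pauses here
    except asyncio.TimeoutError:
        return "(no answer: the question timed out)"
    finally:
        pending.pop(user_id, None)


def on_callback_query(data: str) -> None:
    """Called when a button is tapped; resolves the waiting Future."""
    _, user_id, index = data.split(":")
    future, options = pending[int(user_id)]
    if not future.done():
        future.set_result(options[int(index)])


async def _demo() -> str:
    task = asyncio.create_task(ask_user(42, ["Yes", "No"]))
    await asyncio.sleep(0)                  # let ask_user register its Future
    on_callback_query(callback_data(42, 1))
    return await task


answer = asyncio.run(_demo())
```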

Session Management

  • SQLite-backed via SessionManager

  • Stores conversation history per user (role, content, timestamp, topic)

  • Tracks version notifications, file notification preferences, uploads

  • Records token usage per LLM call (input/output tokens, model, topic)

        erDiagram
    USER ||--o{ CONVERSATION_HISTORY : has
    USER ||--o{ UPLOAD : stores
    USER {
        int user_id
        bool file_notifications
        string last_version_notified
    }
    CONVERSATION_HISTORY {
        int user_id
        string role
        string content
        datetime timestamp
        string topic
    }
    UPLOAD {
        int user_id
        string filename
        blob data
    }
    

Topic-Based Conversation Management

Each message is tagged with a topic label (e.g. “weather”, “recipes”, “project planning”) assigned by the LLM via the AgentResponse.topic field. Topics enable multi-threaded conversations:

  • Topic-aware history: build_topic_history() always includes the last 4 messages for immediate context, then backfills the remaining character budget with same-topic messages from older history

  • list_topics tool: Lets the user ask “what conversations am I having?” — returns topic names, message counts, and last activity

  • Backward compatible: Old messages without a topic field default to "general"

        flowchart LR
    H[Full History] --> R[Last 4 messages<br/>recency window]
    H --> B[Older same-topic<br/>messages backfill]
    R --> M[Merged history<br/>sent to LLM]
    B --> M
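
The recency window plus same-topic backfill can be sketched as below. The `Message` stand-in, the parameter names, and the choice to measure the budget in characters of `content` are assumptions for illustration; the real build_topic_history() may count or order messages differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:                   # stand-in for a stored history row
    role: str
    content: str
    topic: str = "general"       # old rows without a topic fall back to "general"


def build_topic_history(
    history: list[Message], topic: str, char_budget: int, recent: int = 4
) -> list[Message]:
    """Keep the last `recent` messages, then backfill older same-topic ones."""
    tail = history[-recent:]
    remaining = char_budget - sum(len(m.content) for m in tail)
    backfill: list[Message] = []
    for msg in reversed(history[:-recent]):      # newest older message first
        if msg.topic != topic:
            continue
        if len(msg.content) > remaining:
            break                                # budget exhausted
        backfill.append(msg)
        remaining -= len(msg.content)
    return list(reversed(backfill)) + tail       # back to chronological order


history = [
    Message("user", "rain tomorrow?", "weather"),
    Message("user", "pasta recipe", "recipes"),
    Message("user", "and on Friday?", "weather"),
    Message("user", "a", "recipes"),
    Message("assistant", "b", "recipes"),
    Message("user", "c", "recipes"),
    Message("assistant", "d", "recipes"),
]
roomy = [m.content for m in build_topic_history(history, "weather", char_budget=100)]
tight = [m.content for m in build_topic_history(history, "weather", char_budget=20)]
```

With a roomy budget both older weather messages are pulled in; with a tight one only the most recent of them fits.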
    

Voice Transcription

Voice messages and audio files are transcribed locally at the Telegram handler layer using faster-whisper (CTranslate2). The LLM receives plain text — it doesn’t need to know the input was audio.

  • No API key required — runs entirely on-device via CTranslate2 (up to 4x faster than openai-whisper)

  • Pre-built wheels with GPU support (CUDA) — pip install just works, no build flags needed

  • Model configurable via whisper_model config key (default: "base")

  • Handles both filters.VOICE (voice messages) and filters.AUDIO (audio files)

  • Transcription runs in a thread (asyncio.to_thread) to avoid blocking the event loop

  • Models are auto-downloaded from Hugging Face Hub and cached after first load

        sequenceDiagram
    participant U as User
    participant TG as Telegram
    participant TB as TelegramBot
    participant W as faster-whisper (local)
    participant LP as Message Pipeline

    U->>TG: Send voice message / audio file
    TG->>TB: handle_voice()
    TB->>TG: Download audio bytes
    TB->>W: transcribe_audio(bytes, model_name)
    Note over W: Writes to temp file,<br/>runs model.transcribe(),<br/>joins segments
    W-->>TB: TranscriptionResult(text)
    TB->>LP: Add text to pending_messages
    TB->>LP: _start_processing()
    LP->>LP: Normal LLM chat flow
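
The temp-file and worker-thread steps in the diagram can be sketched as follows. The function names and the injectable `transcribe_file` callable are assumptions made so the example runs without the model installed; `whisper_transcribe_file` shows how faster-whisper's `WhisperModel.transcribe()` could plug in, and a real bot would cache the loaded model rather than rebuild it per call.

```python
import asyncio
import os
import tempfile
from typing import Callable


def _transcribe_blocking(audio: bytes, transcribe_file: Callable[[str], str]) -> str:
    """Write the audio to a temp file and run the (blocking) model on it."""
    with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as tmp:
        tmp.write(audio)
        path = tmp.name
    try:
        return transcribe_file(path).strip()
    finally:
        os.unlink(path)                      # always clean up the temp file


async def transcribe_audio(audio: bytes, transcribe_file: Callable[[str], str]) -> str:
    # Run in a worker thread so the event loop keeps serving other updates.
    return await asyncio.to_thread(_transcribe_blocking, audio, transcribe_file)


def whisper_transcribe_file(path: str, model_name: str = "base") -> str:
    """faster-whisper backend; imported lazily because it is optional here."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name)
    segments, _info = model.transcribe(path)     # segments is a lazy generator
    return " ".join(segment.text.strip() for segment in segments)


# Demo with a fake backend, so no model download is needed.
text = asyncio.run(transcribe_audio(b"fake-bytes", lambda path: " hello world "))
```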
    

Self-Update Mechanism

The bot can automatically check PyPI for newer versions and upgrade itself:

  • folderbot update CLI command: checks PyPI JSON API, runs pip install --upgrade, restarts the systemd service if running

  • Systemd timer: folderbot-update.timer runs folderbot update every 5 minutes

  • Installed/managed alongside the main service via folderbot service install/enable/start

        flowchart LR
    T[systemd timer<br/>every 5min] --> U[folderbot update]
    U --> P{PyPI newer?}
    P -->|no| D[Done]
    P -->|yes| I[pip install --upgrade]
    I --> R[systemctl restart folderbot]
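
The "PyPI newer?" decision can be sketched as below. The helper names are hypothetical, and the version parser is deliberately naive: it handles plain dotted releases only, whereas a real implementation would more likely rely on a proper version parser. The URL is the public PyPI JSON endpoint.

```python
import json
import urllib.request


def parse_version(version: str) -> tuple[int, ...]:
    """Parse plain dotted versions such as '1.4.2' (no pre-release handling)."""
    return tuple(int(part) for part in version.split("."))


def latest_version(package: str) -> str:
    """Fetch the newest released version from the PyPI JSON API."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]["version"]


def needs_upgrade(installed: str, latest: str) -> bool:
    # Compare numerically: as strings, "0.9.3" would sort after "0.10.0".
    return parse_version(latest) > parse_version(installed)
```

A caller would run `needs_upgrade(current, latest_version("folderbot"))` and only then invoke pip and restart the service.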
    

Todo Management

Markdown-backed task tracking via TodoStore. Todos are stored in a human-readable .md file (default: .folderbot/todos.md), editable with any text editor. Atomic writes via os.replace prevent corruption.

from dataclasses import dataclass

@dataclass(frozen=True)
class TodoItem:
    id: int
    user_id: int
    title: str
    description: str
    status: str          # todo | in_progress | done
    effort: str          # tiny | small | medium | large | epic
    tags: list[str]
    created_at: str
    updated_at: str
    completed_at: str | None

Markdown Format

Uses GFM checkboxes with todo.txt conventions for tags (+tag) and key:value metadata:

# Todos

- [ ] Buy groceries +shopping +errands
  effort: small
  Milk and eggs.
  <!-- id:1 user:42 created:2026-02-19T10:00:00 updated:2026-02-19T10:00:00 -->

- [x] Write report +work
  effort: large
  Quarterly report.
  <!-- id:2 user:42 created:2026-02-18T08:00:00 updated:2026-02-19T12:00:00 completed:2026-02-19T12:00:00 -->

  • - [S] Title +tags, where the status character S is a space for todo, ~ for in_progress, or x for done

  • Tags use +tagname convention (todo.txt style), inline on the task line

  • effort: level indented below (omitted when “medium”, the default)

  • Description as indented free text after effort line

  • System metadata (id, user, timestamps) in an HTML comment — hidden in rendered views

  • IDs computed from max(ids) + 1 (no separate tracker needed)

  • Items ordered by created_at
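
A parser for the conventions listed above could look like this. It is a sketch under assumptions: the function names are hypothetical, it covers only the task line and the metadata comment, and it treats a space inside the brackets as the todo status.

```python
import re

STATUS_BY_CHAR = {" ": "todo", "~": "in_progress", "x": "done"}
TASK_LINE = re.compile(r"^- \[(?P<status>[ ~x])\] (?P<rest>.+)$")
META = re.compile(r"<!--\s*(?P<pairs>.*?)\s*-->")


def _is_tag(word: str) -> bool:
    return word.startswith("+") and len(word) > 1


def parse_task_line(line: str) -> dict:
    """Split '- [S] Title +tag' into status, title, and tags."""
    match = TASK_LINE.match(line)
    if match is None:
        raise ValueError(f"not a task line: {line!r}")
    words = match["rest"].split()
    return {
        "status": STATUS_BY_CHAR[match["status"]],
        "title": " ".join(w for w in words if not _is_tag(w)),
        "tags": [w[1:] for w in words if _is_tag(w)],
    }


def parse_metadata(line: str) -> dict:
    """Read key:value pairs from the HTML comment; values stay strings."""
    match = META.search(line)
    pairs = match["pairs"].split() if match else []
    return dict(pair.split(":", 1) for pair in pairs)


def next_id(existing_ids: list[int]) -> int:
    """IDs are max(ids) + 1, starting at 1 for an empty file."""
    return max(existing_ids, default=0) + 1


task = parse_task_line("- [ ] Buy groceries +shopping +errands")
meta = parse_metadata("  <!-- id:1 user:42 created:2026-02-19T10:00:00 -->")
```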

Filtering

The todo_list tool supports filtering by status, max effort level, tag, and text search. Completed tasks are hidden by default. The effort filter enables queries like “what can I do in 30 minutes?” (max_effort="small" returns tiny + small tasks).

Calendar

SQLite-backed event storage via CalendarStore. Supports adding, listing, updating, and deleting events. The calendar_upcoming tool returns events within a configurable time window, useful for “what’s coming up this week?” queries.

Token Usage Tracking

Every LLM call records input and output token counts in a token_usage SQLite table. The get_token_usage tool lets users query their consumption by period (today, week, month).

  • LLMClient.chat() uses create_with_completion() to get raw completion metadata

  • Token counts are accumulated across all agent loop iterations

  • TelegramBot._process_message() calls session_manager.record_token_usage() after each chat

  • Records are scoped per user, model, and topic
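
The table and the period query can be sketched with the standard-library sqlite3 module. The column names, the `created_at` default, and the mapping from period names to SQLite date modifiers are assumptions for illustration; only the table name and the per-user, per-model, per-topic scoping come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE token_usage (
           user_id       INTEGER NOT NULL,
           model         TEXT    NOT NULL,
           topic         TEXT    NOT NULL,
           input_tokens  INTEGER NOT NULL,
           output_tokens INTEGER NOT NULL,
           created_at    TEXT    NOT NULL DEFAULT (datetime('now'))
       )"""
)

# Period name -> SQLite date modifier applied to 'now' (hypothetical mapping).
PERIOD_START = {"today": "start of day", "week": "-7 days", "month": "-1 month"}


def record_token_usage(user_id, model, topic, input_tokens, output_tokens) -> None:
    conn.execute(
        "INSERT INTO token_usage (user_id, model, topic, input_tokens, output_tokens)"
        " VALUES (?, ?, ?, ?, ?)",
        (user_id, model, topic, input_tokens, output_tokens),
    )


def get_token_usage(user_id: int, period: str) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) summed over the given period."""
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(output_tokens), 0)"
        " FROM token_usage WHERE user_id = ? AND created_at >= datetime('now', ?)",
        (user_id, PERIOD_START[period]),
    ).fetchone()
    return row[0], row[1]


record_token_usage(42, "provider/model", "weather", 1200, 300)
record_token_usage(42, "provider/model", "recipes", 800, 200)
record_token_usage(7, "provider/model", "general", 50, 5)
weekly = get_token_usage(42, "week")
```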

Multi-Bot Service Support

The CLI supports running multiple bot instances as separate systemd services:

  • folderbot service install --bot notes creates folderbot-notes.service with ExecStart=folderbot run --bot notes

  • All service commands (enable, start, stop, etc.) accept --bot NAME

  • The update timer remains shared across all bot instances

  • Config uses the existing bots TOML section for per-bot overrides

Hallucination Guard

A heuristic check (_claims_tool_use()) detects when the LLM claims to have performed an action (e.g., “I’ve updated your file”) without actually calling any tools. When triggered, a system warning is injected and the LLM retries.

        flowchart TD
    A[LLM returns answer] --> B{_claims_tool_use?}
    B -->|no| C[Accept answer]
    B -->|yes| D{Any tools actually called?}
    D -->|yes| C
    D -->|no| E[Inject warning into context]
    E --> F[Retry LLM call]

    style C fill:#4a9,stroke:#333,color:#fff
    style E fill:#c44,stroke:#333,color:#fff
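
The decision in the flowchart can be sketched as a regex heuristic. The phrase list below is purely illustrative and not the patterns `_claims_tool_use()` actually uses; the function names are likewise hypothetical.

```python
import re

# Illustrative patterns only: first-person claims of a completed action.
ACTION_CLAIM = re.compile(
    r"\bI(?:'ve| have)?\s+(?:just\s+)?"
    r"(?:updated|saved|created|deleted|written|wrote|scheduled|added|removed)\b",
    re.IGNORECASE,
)


def claims_tool_use(answer: str) -> bool:
    # Normalize the typographic apostrophe so "I’ve" and "I've" both match.
    return ACTION_CLAIM.search(answer.replace("\u2019", "'")) is not None


def should_retry(answer: str, tools_used: list[str]) -> bool:
    """Retry only when an action is claimed but no tool actually ran."""
    return claims_tool_use(answer) and not tools_used
```

An answer such as "I've updated your file." triggers a retry when `tools_used` is empty, and is accepted once a tool such as write_file has actually run.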
    

Roadmap

Sandboxed Python Execution (run_python tool)

Allow the LLM to write and execute arbitrary Python code in a Docker container for tasks that require computation (e.g. generating a Brownian motion path, numerical simulations, data transformations).

Design:

  • New run_python tool that accepts Python code and optional pip requirements

  • Executes in a Docker container with strict isolation: no network, no volume mounts other than the output directory, read-only root filesystem, memory/CPU limits

  • A designated output directory inside the container is mapped to a temp dir on host

  • After execution: captures stdout/stderr, sends any generated files (images, CSVs) back to the user via Telegram

  • Timeout to prevent runaway processes

        sequenceDiagram
    participant LLM as LLMClient
    participant FT as FolderTools
    participant D as Docker Container
    participant TG as Telegram

    LLM->>FT: run_python(code, requirements)
    FT->>D: Create container<br/>(no network, mem limit)
    Note over D: pip install requirements<br/>Execute code<br/>Write files to /output
    D-->>FT: stdout + /output files
    FT->>TG: send_document(files)
    FT-->>LLM: ToolResult(stdout)
    

LLM-Powered Todo Extraction with Section-Level Cache

Extract todos from any markdown file in the folder tree using LLM-based parsing, not just the structured todos.md. A SQLite cache layer avoids redundant LLM calls via per-section content hashing.

Architecture:

Markdown files (source of truth)
    │
    ├── split by headings (deterministic, cheap)
    │
    ├── per-section hash → compare with cache
    │       │
    │       ├── hash match → use cached extraction (free)
    │       │
    │       └── hash mismatch → diff old vs new content
    │               │
    │               ├── trivial change (status flip) → update programmatically
    │               │
    │               └── ambiguous change → targeted LLM call with old+new content
    │
    └── SQLite cache table:
            (file_path, section_index, content_hash, raw_content, extracted_json)

Key design decisions:

  • Section-level granularity: hash and cache each heading-delimited section independently, so editing one todo in a 200-item file only re-processes that section

  • Store raw content: enables diffing old vs new to detect the nature of changes (status transitions, title edits, etc.) without LLM

  • LLM as fallback: simple/structured changes handled programmatically; LLM only called for ambiguous edits in unstructured files

  • File discovery: scan folder tree using existing ReadRules include/exclude patterns

  • Write routing: bot-created todos go to the central todos.md by default; when the context points to a specific project, the LLM identifies the right project file instead

  • Rich extraction schema: title, description, status, effort, deadline, priority, progress, time estimates, dependencies
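
The deterministic part of this design, splitting by headings and comparing per-section hashes, can be sketched as below. The function names and the in-memory `cache` dict (standing in for the SQLite table) are assumptions; the sketch also ignores headings that appear inside code blocks.

```python
import hashlib
import re

HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_sections(markdown: str) -> list[str]:
    """Split at each heading; text before the first heading forms its own section."""
    starts = [m.start() for m in HEADING.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(markdown)]
    return [markdown[a:b] for a, b in zip(bounds, bounds[1:]) if markdown[a:b].strip()]


def section_hash(section: str) -> str:
    return hashlib.sha256(section.encode("utf-8")).hexdigest()


def stale_sections(markdown: str, cache: dict[int, str]) -> list[int]:
    """Return indexes of sections whose hash differs from the cached one."""
    return [
        index
        for index, section in enumerate(split_sections(markdown))
        if cache.get(index) != section_hash(section)
    ]


old = "# Work\n- [ ] report\n# Home\n- [ ] laundry\n"
new = "# Work\n- [ ] report\n# Home\n- [x] laundry\n"
cache = {i: section_hash(s) for i, s in enumerate(split_sections(old))}
changed = stale_sections(new, cache)
```

Only the edited section is reported as stale, so only that section would need a diff or a targeted LLM call.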

Lightweight Voice Transcription

Replace faster-whisper (CTranslate2 + PyTorch, ~2GB) with pywhispercpp (whisper.cpp/GGML, ~4MB). Same transcription quality, drastically smaller install. Native Apple Silicon support via CoreML.

Homebrew Formula

Provide a Homebrew tap for macOS users: brew install folderbot. Would handle Python/venv setup and launchd service configuration (macOS equivalent of the current systemd integration).