Folderbot Architecture
Overview
Folderbot is a Telegram bot that gives users an LLM-powered assistant with access to their personal folder. The architecture follows a layered design:
flowchart LR
TG[Telegram] <-->|messages| TB[TelegramBot]
TB <-->|chat loop| LLC[LLMClient]
LLC <-->|tool dispatch| FT[FolderTools]
LLC -.-|structured extraction| I[instructor]
Request Lifecycle
The full lifecycle of a user message, from Telegram to response:
sequenceDiagram
participant U as User
participant TG as Telegram
participant TH as TelegramHandler
participant SN as StatusNotifier
participant SM as SessionManager
participant LC as LLMClient
participant I as instructor
participant FT as FolderTools
U->>TG: Send message
TG->>TH: handle_message()
Note over TH: Accumulate pending messages<br/>Cancel in-flight task if any
TH->>TH: _start_processing()
TH->>SN: start() → typing indicator
TH->>SM: get_history(user_id)
SM-->>TH: conversation history
TH->>LC: chat(message, context, history)
loop Agent Loop (max 10 iterations)
LC->>I: create_with_completion()<br/>response_model=AgentResponse
I-->>LC: AgentResponse
alt Has answer, no tool calls
LC->>LC: Hallucination guard check
LC-->>TH: (answer, tools_used, topic, usage)
else Has tool calls
loop For each tool call
LC->>SN: update(tool_name)
SN->>TG: Edit status message
alt ask_user tool
LC->>TH: on_ask_user callback
TH->>TG: Show interactive UI
U->>TG: Tap button / send text
TG->>TH: Resolve Future
TH-->>LC: User answer
else Regular tool
LC->>FT: execute_async(name, args)
FT-->>LC: ToolResult
end
end
Note over LC: Append results to<br/>gathered_context, loop
end
end
TH->>SM: save_message (user + assistant)
TH->>SM: record_token_usage
TH->>SN: stop() → delete status
TH->>TG: Reply with response
TG->>U: Display answer
Message Accumulation and Cancellation
When a user sends multiple messages quickly, Folderbot accumulates them instead of processing each one independently:
flowchart TD
M1[Message 1 arrives] --> P1[Add to pending_messages]
P1 --> T1[Start processing task]
M2[Message 2 arrives<br/>while processing] --> C[Cancel current task]
C --> R[Restore in-progress<br/>messages to pending]
M2 --> P2[Add to pending_messages]
R --> P2
P2 --> T2[Start new task with<br/>all accumulated messages]
T2 --> J[Messages joined with newline<br/>sent as single LLM request]
style C fill:#c44,stroke:#333,color:#fff
style J fill:#4a9,stroke:#333,color:#fff
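The accumulate-and-cancel behaviour above can be sketched with plain asyncio. This is a minimal illustration, not Folderbot's actual handler: the `MessageAccumulator` class, its attribute names, and the `demo_accumulation` helper are hypothetical, and only the overall pattern (cancel, restore in-progress messages, join with newlines) comes from the diagram.

```python
import asyncio


class MessageAccumulator:
    """Hypothetical sketch of the accumulate-and-cancel pattern."""

    def __init__(self, process):
        self._process = process            # async callable taking the joined text
        self._pending: list[str] = []      # messages waiting to be processed
        self._in_progress: list[str] = []  # messages owned by the running task
        self._task: asyncio.Task | None = None

    async def handle_message(self, text: str) -> None:
        if self._task is not None and not self._task.done():
            # A new message arrived mid-flight: cancel and wait for the task to unwind.
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            # Restore the messages the cancelled task had already taken.
            self._pending = self._in_progress + self._pending
            self._in_progress = []
        self._pending.append(text)
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        # Take ownership of everything pending and send it as one request.
        self._in_progress, self._pending = self._pending, []
        await self._process("\n".join(self._in_progress))
        self._in_progress = []


async def demo_accumulation() -> list[str]:
    seen: list[str] = []

    async def slow_llm(text: str) -> None:
        await asyncio.sleep(0.05)  # stands in for a slow LLM call
        seen.append(text)

    acc = MessageAccumulator(slow_llm)
    await acc.handle_message("first")
    await asyncio.sleep(0)             # let the first task start
    await acc.handle_message("second")  # cancels the first task
    await acc._task
    return seen
```

In this sketch the first request never completes; the second task processes both messages as a single newline-joined string.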
Core Components
Telegram Handler (telegram_handler.py)
The TelegramBot class manages:
Message handling: Accumulates user messages, cancels in-flight requests on new input
Command handlers: /start, /clear, /new, /status, /files, /tasks
Document uploads: Stores files and makes them available as tools
Photo handling: Downloads photos, saves to uploads, encodes as base64, and passes to LLM as multimodal image blocks for vision analysis
ask_user UI: Renders interactive Telegram widgets (inline keyboards, location pickers) when the LLM needs user input
Scheduler integration: Sends messages from background tasks
File watcher: Notifies users of file changes
LLM Client (llm_client.py)
The LLMClient is backend-agnostic and built on the instructor package:
Supports any LLM provider via instructor.from_provider("provider/model")
Uses structured extraction (AgentResponse model) instead of native tool_use
Tools are described in the system prompt text (instructor occupies the tools parameter)
Multimodal support: Photo messages are encoded as base64 image blocks in the user message. The image format is provider-aware (_format_image_block): Anthropic uses native image blocks with a base64 source, while OpenAI uses image_url with data URIs. The provider is detected from the model string prefix (e.g. anthropic/...).
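The provider-aware image formatting described above might look roughly like the following. This is a sketch under assumptions: the function name `format_image_block`, its signature, and the default media type are illustrative and may differ from the real `_format_image_block`; the two block shapes follow the Anthropic and OpenAI message formats.

```python
import base64


def format_image_block(model: str, image_bytes: bytes, media_type: str = "image/jpeg") -> dict:
    """Build a provider-specific image content block (illustrative sketch)."""
    data = base64.b64encode(image_bytes).decode("ascii")
    provider = model.split("/", 1)[0]  # e.g. "anthropic" from "anthropic/..."
    if provider == "anthropic":
        # Anthropic: native image block with a base64 source
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        }
    # OpenAI-style: image_url carrying a data URI (assumed fallback for other providers)
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{data}"},
    }
```

Treating every non-Anthropic provider as OpenAI-compatible is an assumption of this sketch, not something the document states.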
Agent Loop
The core loop in LLMClient.chat():
flowchart TD
A[Build messages:<br/>history + user message<br/>+ gathered tool results] --> B[Call LLM via instructor.create<br/>response_model=AgentResponse]
B --> C{Answer provided?}
C -->|yes| D[Return answer]
C -->|no, tool calls| E{ask_user?}
E -->|yes| F[Pause loop,<br/>wait for user via callback]
F --> G[Append result to<br/>gathered_context]
E -->|no| H[Dispatch to FolderTools]
H --> G
G --> A
style D fill:#4a9,stroke:#333,color:#fff
Structured Response Models
from pydantic import BaseModel

class ToolCallRequest(BaseModel, frozen=True):
    name: str        # Tool name
    arguments: dict  # Tool arguments

class AgentResponse(BaseModel, frozen=True):
    tool_calls: list[ToolCallRequest]  # Tools to execute
    answer: str | None                 # Final answer (when done)
    topic: str                         # Conversation topic label

class AskUserRequest(BaseModel, frozen=True):
    question: str       # Question to display
    options: list[str]  # Button labels
    input_type: str     # choice | confirm | text | location
Tool System
Registration
Tools are registered via the @folder_bot.tool() decorator with typed Pydantic request/response models:
@folder_bot.tool(name="read_file", request_type=ReadFileRequest, response_type=ReadFileResponse)
async def read_file(request, context):
    ...
Tool Categories
| Category | Tools |
|---|---|
| File operations | |
| Web | |
| Scheduler | |
| Uploads | |
| Visualization | |
| Calendar | |
| Todo | |
| Topics | |
| Stats | |
| Notifications | |
| Utilities | |
| Interactive | |
Tool Configuration
Tools can have their own configuration via [tools.<name>] sections in config.toml:
[tools.web_search]
google_api_key = "..."
google_cx = "..."
Tools access their config via get_tool_config(context, "tool_name") which returns the tool’s config dict. Custom tools receive the full tools_config dict in their constructor.
Services Pattern
Tools receive dependencies through BotContext.services:
FolderServices: Root path, config, path validation, get_tool_config()
SchedulerServices: Task creation and management
UploadServices: File upload storage and retrieval
classDiagram
class BotContext {
+services
+user_id
}
class FolderServices {
+root_path
+config
+validate_path()
}
class SchedulerServices {
+create_task()
+cancel_task()
+list_tasks()
}
class UploadServices {
+uploads_dir: Path
+send_document(chat_id, path, filename)
+chat_id: int
+session_manager
}
BotContext --> FolderServices
BotContext --> SchedulerServices
BotContext --> UploadServices
ask_user: Interactive User Input
The ask_user tool enables the LLM to pause its agent loop and request interactive input from the user via native Telegram UI.
Flow
sequenceDiagram
participant LLM as LLMClient
participant CB as on_ask_user callback
participant TB as TelegramBot
participant TG as Telegram
participant U as User
LLM->>LLM: AgentResponse with<br/>tool_call name="ask_user"
LLM->>CB: on_ask_user(AskUserRequest)
CB->>TB: _handle_ask_user()
TB->>TB: Create asyncio.Future
TB->>TG: Send UI (keyboard / text prompt)
TG->>U: Display interactive widget
U->>TG: Tap button / send text / share location
TG->>TB: CallbackQuery / Message
TB->>TB: Resolve Future with answer
TB-->>CB: Return answer string
CB-->>LLM: Answer added to gathered_context
LLM->>LLM: Continue agent loop
Input Types
| Type | Telegram UI | Resolution |
|---|---|---|
| choice | Inline keyboard (one button per option) | CallbackQueryHandler |
| confirm | Inline keyboard (Yes/No row) | CallbackQueryHandler |
| text | Plain text question | Next text message intercepted |
| location | Reply keyboard with location button | Location MessageHandler |
Key Design Decisions
ask_user is NOT a registered FolderBot tool: it is handled specially in the agent loop because it requires async user interaction
asyncio.Future for pause/resume: the agent loop awaits a Future that the Telegram handlers resolve
Index-based callback data (ask:user_id:index) keeps callback payloads within Telegram’s 64-byte limit
120-second timeout prevents the agent loop from hanging indefinitely
Backend-agnostic: the LLM client knows nothing about Telegram; the callback is injected by the handler
Session Management
SQLite-backed via SessionManager
Stores conversation history per user (role, content, timestamp, topic)
Tracks version notifications, file notification preferences, uploads
Records token usage per LLM call (input/output tokens, model, topic)
erDiagram
USER ||--o{ CONVERSATION_HISTORY : has
USER ||--o{ UPLOAD : stores
USER {
int user_id
bool file_notifications
string last_version_notified
}
CONVERSATION_HISTORY {
int user_id
string role
string content
datetime timestamp
string topic
}
UPLOAD {
int user_id
string filename
blob data
}
Topic-Based Conversation Management
Each message is tagged with a topic label (e.g. “weather”, “recipes”, “project planning”) assigned by the LLM via the AgentResponse.topic field. Topics enable multi-threaded conversations:
Topic-aware history: build_topic_history() always includes the last 4 messages for immediate context, then backfills the remaining character budget with same-topic messages from older history
list_topics tool: Lets the user ask “what conversations am I having?” and returns topic names, message counts, and last activity
Backward compatible: Old messages without a topic field default to "general"
flowchart LR
H[Full History] --> R[Last 4 messages<br/>recency window]
H --> B[Older same-topic<br/>messages backfill]
R --> M[Merged history<br/>sent to LLM]
B --> M
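One way to express the recency window plus same-topic backfill is sketched below. The signature, the 2000-character budget, the choice to count the recent messages against that budget, and the dict-based message shape are all assumptions for illustration; the real `build_topic_history()` may differ.

```python
def build_topic_history(history, topic, recent_count=4, char_budget=2000):
    """history: list of dicts with 'content' and an optional 'topic', oldest first."""
    recent = history[-recent_count:]
    older = history[:-recent_count]
    remaining = char_budget - sum(len(m["content"]) for m in recent)
    backfill = []
    # Walk older messages newest-first so the closest same-topic context wins.
    for msg in reversed(older):
        if msg.get("topic", "general") != topic:
            continue
        if len(msg["content"]) > remaining:
            break
        backfill.append(msg)
        remaining -= len(msg["content"])
    # Restore chronological order before merging with the recency window.
    return list(reversed(backfill)) + recent


example_history = [
    {"content": "soup recipe", "topic": "recipes"},
    {"content": "rain today", "topic": "weather"},
    {"content": "a"},
    {"content": "b"},
    {"content": "c"},
    {"content": "d", "topic": "recipes"},
]
merged = build_topic_history(example_history, "recipes")
```

Here the older weather message is skipped, while the older recipes message is pulled in ahead of the four most recent messages.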
Voice Transcription
Voice messages and audio files are transcribed locally at the Telegram handler layer using faster-whisper (CTranslate2). The LLM receives plain text — it doesn’t need to know the input was audio.
No API key required: runs entirely on-device via CTranslate2 (up to 4x faster than openai-whisper)
Pre-built wheels with GPU support (CUDA): pip install just works, no build flags needed
Model configurable via the whisper_model config key (default: "base")
Handles both filters.VOICE (voice messages) and filters.AUDIO (audio files)
Transcription runs in a thread (asyncio.to_thread) to avoid blocking the event loop
Models are auto-downloaded from Hugging Face Hub and cached after first load
sequenceDiagram
participant U as User
participant TG as Telegram
participant TB as TelegramBot
participant W as faster-whisper (local)
participant LP as Message Pipeline
U->>TG: Send voice message / audio file
TG->>TB: handle_voice()
TB->>TG: Download audio bytes
TB->>W: transcribe_audio(bytes, model_name)
Note over W: Writes to temp file,<br/>runs model.transcribe(),<br/>joins segments
W-->>TB: TranscriptionResult(text)
TB->>LP: Add text to pending_messages
TB->>LP: _start_processing()
LP->>LP: Normal LLM chat flow
Self-Update Mechanism
The bot can automatically check PyPI for newer versions and upgrade itself:
folderbot update CLI command: checks the PyPI JSON API, runs pip install --upgrade, and restarts the systemd service if it is running
Systemd timer: folderbot-update.timer runs folderbot update every 5 minutes
Installed and managed alongside the main service via folderbot service install/enable/start
flowchart LR
T[systemd timer<br/>every 5min] --> U[folderbot update]
U --> P{PyPI newer?}
P -->|no| D[Done]
P -->|yes| I[pip install --upgrade]
I --> R[systemctl restart folderbot]
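The "PyPI newer?" decision could be implemented along these lines. This is a sketch, not the actual `folderbot update` code: the function names are hypothetical, the package name `folderbot` is assumed from the CLI name, and the version comparison only handles plain dotted releases (no pre-release or local version tags).

```python
import json
import urllib.request


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '1.4.2' (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(latest: str, installed: str) -> bool:
    # Tuple comparison avoids the string-comparison trap where "1.10" < "1.9".
    return parse_version(latest) > parse_version(installed)


def fetch_latest_version(package: str = "folderbot") -> str:
    """Read the latest release from the PyPI JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]["version"]
```

A real implementation would more likely rely on a full version parser so that pre-releases and post-releases compare correctly.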
Todo Management
Markdown-backed task tracking via TodoStore. Todos are stored in a human-readable .md file (default: .folderbot/todos.md), editable with any text editor. Atomic writes via os.replace prevent corruption.
from dataclasses import dataclass

@dataclass(frozen=True)
class TodoItem:
    id: int
    user_id: int
    title: str
    description: str
    status: str  # todo | in_progress | done
    effort: str  # tiny | small | medium | large | epic
    tags: list[str]
    created_at: str
    updated_at: str
    completed_at: str | None
Markdown Format
Uses GFM checkboxes with todo.txt conventions for tags (+tag) and key:value metadata:
# Todos
- [ ] Buy groceries +shopping +errands
  effort: small
  Milk and eggs.
  <!-- id:1 user:42 created:2026-02-19T10:00:00 updated:2026-02-19T10:00:00 -->
- [x] Write report +work
  effort: large
  Quarterly report.
  <!-- id:2 user:42 created:2026-02-18T08:00:00 updated:2026-02-19T12:00:00 completed:2026-02-19T12:00:00 -->
- [S] Title +tags: S is the status character, where a space = todo, ~ = in_progress, x = done
Tags use the +tagname convention (todo.txt style), inline on the task line
effort: level indented below the task line (omitted when “medium”, the default)
Description as indented free text after the effort line
System metadata (id, user, timestamps) in an HTML comment, hidden in rendered views
IDs computed from max(ids) + 1 (no separate tracker needed)
Items ordered by created_at
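To make the task-line convention concrete, here is a small parser for the first line of an item. It is an illustrative sketch only: `parse_task_line` is a hypothetical name, it ignores the effort, description, and metadata lines, and the real `TodoStore` parsing may be stricter or more lenient.

```python
import re

STATUS_BY_CHAR = {" ": "todo", "~": "in_progress", "x": "done"}
TASK_LINE = re.compile(r"^- \[(?P<char>[ ~x])\] (?P<rest>.*)$")


def is_tag(word: str) -> bool:
    return word.startswith("+") and len(word) > 1


def parse_task_line(line: str) -> dict:
    """Split '- [S] Title +tags' into status, title, and tags."""
    match = TASK_LINE.match(line)
    if match is None:
        raise ValueError(f"not a task line: {line!r}")
    words = match["rest"].split()
    return {
        "status": STATUS_BY_CHAR[match["char"]],
        "title": " ".join(w for w in words if not is_tag(w)),
        "tags": [w[1:] for w in words if is_tag(w)],
    }
```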
Filtering
The todo_list tool supports filtering by status, max effort level, tag, and text search. Completed tasks are hidden by default. The effort filter enables queries like “what can I do in 30 minutes?” (max_effort="small" returns tiny + small tasks).
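The max-effort filter amounts to comparing positions in an ordered list of effort levels. A minimal sketch, assuming items are plain dicts with an `effort` key (the real tool works on `TodoItem` objects and combines this with the other filters):

```python
EFFORT_LEVELS = ["tiny", "small", "medium", "large", "epic"]


def filter_by_max_effort(items: list[dict], max_effort: str) -> list[dict]:
    """Keep items whose effort is at or below max_effort."""
    limit = EFFORT_LEVELS.index(max_effort)
    return [item for item in items if EFFORT_LEVELS.index(item["effort"]) <= limit]


todos = [
    {"title": "Reply to email", "effort": "tiny"},
    {"title": "Buy groceries", "effort": "small"},
    {"title": "Write report", "effort": "large"},
]
quick_wins = filter_by_max_effort(todos, "small")
```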
Calendar
SQLite-backed event storage via CalendarStore. Supports adding, listing, updating, and deleting events. The calendar_upcoming tool returns events within a configurable time window, useful for “what’s coming up this week?” queries.
Token Usage Tracking
Every LLM call records input and output token counts in a token_usage SQLite table. The get_token_usage tool lets users query their consumption by period (today, week, month).
LLMClient.chat() uses create_with_completion() to get raw completion metadata
Token counts are accumulated across all agent loop iterations
TelegramBot._process_message() calls session_manager.record_token_usage() after each chat
Records are scoped per user, model, and topic
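A possible shape for the token_usage table and its per-user totals is sketched below with the standard-library `sqlite3` module. The column names, the `created_at` default, and the helper functions are assumptions; only the table name and the per-user, per-model, per-topic scoping come from the description above.

```python
import sqlite3


def create_usage_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per chat, scoped by user, model, and topic.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS token_usage (
            user_id INTEGER NOT NULL,
            model TEXT NOT NULL,
            topic TEXT NOT NULL,
            input_tokens INTEGER NOT NULL,
            output_tokens INTEGER NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def record_token_usage(conn, user_id, model, topic, input_tokens, output_tokens) -> None:
    conn.execute(
        "INSERT INTO token_usage (user_id, model, topic, input_tokens, output_tokens) "
        "VALUES (?, ?, ?, ?, ?)",
        (user_id, model, topic, input_tokens, output_tokens),
    )


def usage_totals(conn, user_id: int, since: str = "1970-01-01") -> tuple[int, int]:
    """Sum input and output tokens for one user from a given timestamp onward."""
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(output_tokens), 0) "
        "FROM token_usage WHERE user_id = ? AND created_at >= ?",
        (user_id, since),
    ).fetchone()
    return row[0], row[1]
```

Passing a different `since` value is how a period filter such as today, week, or month could be layered on top.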
Multi-Bot Service Support
The CLI supports running multiple bot instances as separate systemd services:
folderbot service install --bot notes creates folderbot-notes.service with ExecStart=folderbot run --bot notes
All service commands (enable, start, stop, etc.) accept --bot NAME
The update timer remains shared across all bot instances
Config uses the existing bots TOML section for per-bot overrides
Hallucination Guard
A heuristic check (_claims_tool_use()) detects when the LLM claims to have performed an action (e.g., “I’ve updated your file”) without actually calling any tools. When triggered, a system warning is injected and the LLM retries.
flowchart TD
A[LLM returns answer] --> B{_claims_tool_use?}
B -->|no| C[Accept answer]
B -->|yes| D{Any tools actually called?}
D -->|yes| C
D -->|no| E[Inject warning into context]
E --> F[Retry LLM call]
style C fill:#4a9,stroke:#333,color:#fff
style E fill:#c44,stroke:#333,color:#fff
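A heuristic of this kind can be as simple as a regular expression over first-person action claims. The pattern and verb list below are illustrative guesses, not the actual `_claims_tool_use()` implementation, and `needs_retry` is a hypothetical helper that combines the two checks from the flowchart.

```python
import re

# Assumed phrasing: first-person claims of a completed action.
ACTION_CLAIM = re.compile(
    r"\bI(?:['’]ve| have)\s+(?:updated|created|saved|deleted|written|added|scheduled)\b",
    re.IGNORECASE,
)


def claims_tool_use(answer: str) -> bool:
    """Return True when the answer asserts that an action was performed."""
    return ACTION_CLAIM.search(answer) is not None


def needs_retry(answer: str, tools_used: list[str]) -> bool:
    # Retry only when an action is claimed but no tool was actually called.
    return claims_tool_use(answer) and not tools_used
```

A regex like this will miss paraphrased claims and may flag harmless sentences, which is why the check is treated as a heuristic that triggers a retry and does not reject the answer outright.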
Roadmap
Sandboxed Python Execution (run_python tool)
Allow the LLM to write and execute arbitrary Python code in a Docker container for tasks that require computation (e.g. generating a Brownian motion path, numerical simulations, data transformations).
Design:
New run_python tool that accepts Python code and optional pip requirements
Executes in a Docker container with strict isolation: no network, no host volume mounts apart from the output directory, read-only root filesystem, memory/CPU limits
A designated output directory inside the container is mapped to a temp dir on host
After execution: captures stdout/stderr, sends any generated files (images, CSVs) back to the user via Telegram
Timeout to prevent runaway processes
sequenceDiagram
participant LLM as LLMClient
participant FT as FolderTools
participant D as Docker Container
participant TG as Telegram
LLM->>FT: run_python(code, requirements)
FT->>D: Create container<br/>(no network, mem limit)
Note over D: pip install requirements<br/>Execute code<br/>Write files to /output
D-->>FT: stdout + /output files
FT->>TG: send_document(files)
FT-->>LLM: ToolResult(stdout)
LLM-Powered Todo Extraction with Section-Level Cache
Extract todos from any markdown file in the folder tree using LLM-based parsing, not just the structured todos.md. A SQLite cache layer avoids redundant LLM calls via per-section content hashing.
Architecture:
Markdown files (source of truth)
    │
    ├── split by headings (deterministic, cheap)
    │
    ├── per-section hash → compare with cache
    │       │
    │       ├── hash match → use cached extraction (free)
    │       │
    │       └── hash mismatch → diff old vs new content
    │               │
    │               ├── trivial change (status flip) → update programmatically
    │               │
    │               └── ambiguous change → targeted LLM call with old+new content
    │
    └── SQLite cache table:
        (file_path, section_index, content_hash, raw_content, extracted_json)
Key design decisions:
Section-level granularity: hash and cache each heading-delimited section independently, so editing one todo in a 200-item file only re-processes that section
Store raw content: enables diffing old vs new to detect the nature of changes (status transitions, title edits, etc.) without LLM
LLM as fallback: simple/structured changes handled programmatically; LLM only called for ambiguous edits in unstructured files
File discovery: scan folder tree using existing ReadRules include/exclude patterns
Write routing: bot-created todos go to the central todos.md; the LLM identifies the right project file based on context
Rich extraction schema: title, description, status, effort, deadline, priority, progress, time estimates, dependencies
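The section-level hashing idea can be sketched in a few lines. Since this feature is still on the roadmap, everything here is hypothetical: the function names, the use of SHA-256, and comparing sections by index are assumptions, and the splitter does not skip `#` lines inside fenced code blocks.

```python
import hashlib
import re


def split_sections(markdown: str) -> list[str]:
    """Split before each ATX heading; text before the first heading is its own section."""
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [part for part in parts if part]


def section_hashes(markdown: str) -> list[str]:
    return [hashlib.sha256(s.encode("utf-8")).hexdigest() for s in split_sections(markdown)]


def changed_sections(old: str, new: str) -> list[int]:
    """Indices of sections in `new` whose hash differs from the cached version."""
    old_hashes = section_hashes(old)
    new_hashes = section_hashes(new)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]


old_doc = "# Home\n- [ ] fix tap\n# Work\n- [ ] write report\n"
new_doc = "# Home\n- [ ] fix tap\n# Work\n- [x] write report\n"
dirty = changed_sections(old_doc, new_doc)
```

Only the second section is flagged in this example, so only that section would need the diff or LLM step.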
Lightweight Voice Transcription
Replace faster-whisper (CTranslate2 + PyTorch, ~2GB) with pywhispercpp (whisper.cpp/GGML, ~4MB). Same transcription quality, drastically smaller install. Native Apple Silicon support via CoreML.
Homebrew Formula
Provide a Homebrew tap for macOS users: brew install folderbot. Would handle Python/venv setup and launchd service configuration (macOS equivalent of the current systemd integration).