How We Tamed a Thousand Telegram Chats and Achieved a 98% Client SLA

Telegram has long been our primary communication channel with clients. It's fast and familiar, and it doesn't require separate portals, logins and passwords, or forms. Clients write to us as easily as they write to friends, and engineers respond with screenshots, logs, and terminal commands.

But this convenience has a downside. When chats grow from dozens to hundreds and thousands, chaos starts to win. At some point, we realized: either we teach Telegram to operate by SLA rules, or we will stop adhering to those rules ourselves.

In this article, we explain how we support pilot and VIP clients directly in Telegram: without a classic Service Desk, but with a measurable, controllable SLA.

Where the task originated

Over the years, we have accumulated more than a thousand group Telegram chats with pilot and VIP clients. Along with them, three systemic pains emerged.

Messages get lost
When discussions are happening simultaneously in dozens of chats, it’s easy to overlook that no one has replied in one of them for two days.

A 60-minute SLA is impossible to calculate
Telegram doesn’t know what a “ticket,” “first response,” or “overdue” means. Attempts to keep track in Excel break down at around a hundred chats and a couple of duty shifts.

Managers need numbers
The phrase “everything seems fine” doesn’t work at the quarterly board meeting. Facts are needed: percentages, average response time, monthly dynamics.

At the same time, we didn’t want to give up on Telegram because clients love it. So, we needed to add discipline on top of the familiar interface.

Antidote: A Python bot that turns chat into a ticket

We wrote a small microservice — a Telegram bot that views each chat as a separate task and automatically measures response time.

The logic turned out to be simple.

How the bot reacts to a new message

  1. Receives updates via the Bot API.

  2. Checks the author of the message. The system has two user groups, support and sales, which hold the Telegram IDs of MULTIFACTOR employees. If the author is not in either group, the bot treats the message as a customer inquiry.

  3. If a client writes and the chat has no open task, a task is created and the SLA timer starts (it runs until the first response).

  4. If support or an administrator responds, the timer stops and the task is considered closed.

  5. A record is saved in PostgreSQL: chat_id, creation time, status, timestamps.

  6. Reminders are planned:

    • after 45 minutes: yellow level and a ⚡ emoji in the duty chat

    • after 55 minutes: orange level and a ❗ emoji

    • at 60 minutes: red level, a ⚠️ emoji, and an SLA MISS mark.
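The steps above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the in-memory dictionary stands in for the PostgreSQL table, and the names STAFF_IDS, Task, handle_message, and reminder_times are hypothetical. Reminder times here use plain clock minutes and ignore the working-hours schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical employee Telegram IDs (the "support" and "sales" groups).
STAFF_IDS = {1001, 1002}

# Reminder offsets from the list above: minutes after creation -> level.
REMINDERS = [(45, "yellow"), (55, "orange"), (60, "red")]


@dataclass
class Task:
    chat_id: int
    created_at: datetime
    first_reply_at: Optional[datetime] = None
    status: str = "open"


# Stand-in for the tasks table: one open task per chat at most.
open_tasks: dict[int, Task] = {}


def handle_message(chat_id: int, user_id: int, sent_at: datetime) -> Optional[Task]:
    """Open a task on a client message, close it on a staff reply."""
    task = open_tasks.get(chat_id)
    if user_id in STAFF_IDS:
        if task is not None:
            # First staff reply stops the timer and closes the task.
            task.first_reply_at = sent_at
            task.status = "closed"
            del open_tasks[chat_id]
        return task
    if task is None:
        # Client message with no open task: create one and start the SLA.
        task = Task(chat_id=chat_id, created_at=sent_at)
        open_tasks[chat_id] = task
    return task


def reminder_times(task: Task) -> list[tuple[datetime, str]]:
    """Return the moments at which each reminder level should fire."""
    return [(task.created_at + timedelta(minutes=m), level) for m, level in REMINDERS]
```

A follow-up client message in the same chat returns the already open task instead of creating a second one.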

The bot does not go silent when it hits Telegram limits: it honors the Retry-After value returned by flood control and retries requests that fail with 429 Too Many Requests.
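One way to handle this is sketched below, under the assumption that the sender returns the parsed Bot API JSON response. On a 429 error the Bot API includes a retry_after value (in seconds) in the parameters field of the response. The function name send_with_retry and the injected send and sleep callables are hypothetical.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], dict],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it succeeds, honoring retry_after on 429 responses."""
    response: dict = {}
    for attempt in range(1, max_attempts + 1):
        response = send()
        # Success, or an error that is not flood control: return as is.
        if response.get("ok") or response.get("error_code") != 429:
            return response
        if attempt < max_attempts:
            # Wait as long as Telegram asks; fall back to 1 s if the field is absent.
            delay = response.get("parameters", {}).get("retry_after", 1)
            sleep(delay)
    return response
```

Injecting sleep keeps the function easy to test and lets an asynchronous bot substitute its own waiting strategy.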

As a result, every new client inquiry (when there is no open task) becomes a ticket, even if it is written in a regular group.

Counting working hours, not calendar minutes

The next pitfall is working hours. If the SLA is counted in calendar time, messages sent at night instantly ruin the statistics.

We have set a support schedule:

  • Monday – Friday: 03:00–23:00 (UTC+3)

  • Saturday – Sunday: 10:00–19:00

If a message arrives at night, the timer starts at the first minute of the next working window.
If the 60-minute limit runs past the end of the working day, the remainder carries over to the next working window.

The bot takes holidays from a public .ics calendar, caches it for a day, and accounts for them just like weekends.
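A day-long cache for the holiday calendar could look roughly like this. It is a deliberately naive sketch: it only reads DTSTART lines and ignores recurrence rules and multi-day events, which a real .ics parser would handle. The names HolidayCache and parse_holiday_dates, and the injected fetch callable that downloads the calendar text, are hypothetical.

```python
import time
from datetime import date, datetime
from typing import Callable, Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # the calendar is cached for a day


def parse_holiday_dates(ics_text: str) -> set[date]:
    """Collect event start dates from DTSTART lines (naive parsing)."""
    days = set()
    for line in ics_text.splitlines():
        if line.startswith("DTSTART"):
            # Works for "DTSTART;VALUE=DATE:20250101" and "DTSTART:20250101T000000Z".
            value = line.rsplit(":", 1)[-1].strip()
            days.add(datetime.strptime(value[:8], "%Y%m%d").date())
    return days


class HolidayCache:
    def __init__(self, fetch: Callable[[], str], clock: Callable[[], float] = time.time):
        self._fetch = fetch  # hypothetical: returns the raw .ics text
        self._clock = clock
        self._loaded_at: Optional[float] = None
        self._days: set[date] = set()

    def holidays(self) -> set[date]:
        """Return cached holiday dates, refreshing them once the TTL expires."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= CACHE_TTL_SECONDS:
            self._days = parse_holiday_dates(self._fetch())
            self._loaded_at = now
        return self._days
```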

As a result, SLA reflects the actual work of the team, not the random time of message sending.
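The working-time rule can be illustrated with a small deadline calculator. This is a sketch under stated assumptions: datetimes are naive local times in UTC+3, holidays use the weekend window (our reading of "just like weekends"), and the names window_for and sla_deadline are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Support windows from the schedule above, as local wall-clock time.
WEEKDAY_WINDOW = (time(3, 0), time(23, 0))
WEEKEND_WINDOW = (time(10, 0), time(19, 0))


def window_for(day: date, holidays=frozenset()):
    """Return (start, end) datetimes of the support window on the given day."""
    # Assumption: holidays follow the weekend schedule.
    is_short_day = day.weekday() >= 5 or day in holidays
    start, end = WEEKEND_WINDOW if is_short_day else WEEKDAY_WINDOW
    return datetime.combine(day, start), datetime.combine(day, end)


def sla_deadline(received_at: datetime, minutes: int = 60, holidays=frozenset()) -> datetime:
    """Add `minutes` of working time to `received_at`, skipping closed hours."""
    remaining = timedelta(minutes=minutes)
    cursor = received_at
    while True:
        start, end = window_for(cursor.date(), holidays)
        cursor = max(cursor, start)  # night message: the timer starts when the window opens
        if cursor < end:
            available = end - cursor
            if remaining <= available:
                return cursor + remaining
            remaining -= available  # carry the remainder over to the next window
        cursor = datetime.combine(cursor.date() + timedelta(days=1), time(0, 0))
```

For example, a message received on a Monday at 22:30 uses 30 working minutes that evening, so its deadline falls at 03:30 on Tuesday.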

What the service stack consists of

The architecture turned out to be compact, without heavy ITSM systems.

Listener
Receives Telegram updates. We use pure long polling (getUpdates), but can easily switch to webhook if necessary.
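The core of a long-polling loop is the offset bookkeeping: getUpdates confirms an update once it is called with an offset greater than that update's update_id. A minimal sketch, with the network call abstracted behind a hypothetical get_updates callable that returns the result list of updates:

```python
from typing import Callable, Optional


def poll_once(
    get_updates: Callable[[Optional[int]], list],
    handle: Callable[[dict], None],
    offset: Optional[int] = None,
) -> Optional[int]:
    """Fetch one batch of updates, handle each, return the offset for the next call."""
    for update in get_updates(offset):
        handle(update)
        # Passing update_id + 1 next time marks this update as confirmed.
        offset = update["update_id"] + 1
    return offset
```

The listener would call poll_once in a loop, feeding the returned offset back in; an empty batch leaves the offset unchanged.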

SLA Engine
Calculates working minutes and runs the asynchronous timer scheduler. All task states are stored in the tasks table.

Notifier
Sends ⚡❗⚠️ notifications to the duty chat at 45, 55, and 60-minute marks.

Mailbox Watch
Monitors the mailbox. If an email sits unanswered for more than 30 minutes, the bot writes: “Check if the ticket has gone to Yandex.Tracker.”

Reporter
Reports directly in Telegram:

  • /slastat: summary since the first day of the month

  • /report: any period

  • automatic CSV export once a quarter

Role Manager
Roles are assigned through an InlineKeyboard menu, and only bot administrators are allowed to do so.

DB Migrations
Handles schema evolution without downtime. We use Alembic: the schema version is stored in the alembic_version table, and alembic upgrade head runs in CI before deployment, which keeps roll-outs zero-downtime.

How the bot works

Let’s consider a hypothetical scenario in which the support team does not notice a client’s message right away.

12:00 — the client writes: “The service is not working.”
12:00 — the bot logs the task and sets timers for 45/55/60 minutes.
12:45 — the bot writes in the duty chat: ⚡ 15 minutes left.
12:55 — the bot: ❗ 5 minutes left.
12:57 — support replies to the client.
12:57 — the bot logs the task closure and the fact that SLA was not violated.

If there is no response by the 60-minute mark, the chat is marked as 🔴 SLA MISS and the task is stored in the database with overdue=true, so it counts as overdue in the reports. No discussions or justifications.

Bot rights and Telegram privacy mode
To allow the bot to see regular messages in groups, we disabled privacy mode and granted the necessary rights to read messages.

The “anonymous admin” case in groups
We took into account the “anonymous admin” mode: when a message is sent “on behalf of the group”, the bot still recognizes it as a response from support and closes the task.
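A check of this kind might look as follows. In the Bot API, a message posted by an anonymous group administrator carries a sender_chat field that refers to the group itself. The function name is_staff_reply and the STAFF_IDS set are hypothetical; the message is the raw dictionary from an update.

```python
# Hypothetical employee Telegram IDs (the "support" and "sales" groups).
STAFF_IDS = {1001, 1002}


def is_staff_reply(message: dict) -> bool:
    """Decide whether a group message counts as a response from support."""
    chat_id = message["chat"]["id"]
    sender_chat = message.get("sender_chat")
    if sender_chat is not None and sender_chat["id"] == chat_id:
        # Sent "on behalf of the group" by an anonymous administrator.
        return True
    sender = message.get("from") or {}
    return sender.get("id") in STAFF_IDS
```

Comparing sender_chat with the chat itself avoids treating posts from a linked channel as staff replies, since those carry the channel's ID instead.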

Resilience to restarts
After a restart, the bot resumes SLA tracking for all open tasks from the database and continues the countdown without losing state.
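Resuming can be reduced to recomputing, for each open task, which reminders are still due. The sketch below makes several assumptions: it uses plain clock minutes rather than working minutes, it fires missed reminders immediately (delay 0), and it relies on a hypothetical record of levels already sent so that nothing is repeated after the restart.

```python
from datetime import datetime, timedelta

# Reminder thresholds in minutes after task creation.
REMINDER_MINUTES = {"yellow": 45, "orange": 55, "red": 60}


def pending_reminders(created_at: datetime, now: datetime, sent_levels=frozenset()):
    """Return (level, seconds_until_due) for reminders that still have to fire."""
    pending = []
    for level, minutes in REMINDER_MINUTES.items():
        if level in sent_levels:
            continue  # already delivered before the restart
        due_at = created_at + timedelta(minutes=minutes)
        # A reminder missed during downtime is due immediately.
        delay = max((due_at - now).total_seconds(), 0.0)
        pending.append((level, delay))
    return pending
```

On startup, the bot would run this for every open task loaded from the database and schedule a timer for each returned delay.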

Honest clarification about metrics (FRT vs MTTR)
We measure First Response Time — the time until the first response. We do not count MTTR in the classical sense because a “solution” may require several iterations of correspondence.

Protection against duplicates and races
Task creation is idempotent: a chat never ends up with two "open" tasks, even under races or parallel updates.
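One common way to get this guarantee is a partial unique index, so the database itself rejects a second open task. The sketch below uses SQLite from the standard library purely as a runnable stand-in; PostgreSQL supports the same kind of partial unique index, with INSERT ... ON CONFLICT DO NOTHING in place of INSERT OR IGNORE. The index name and the open_task helper are hypothetical, and the article does not say which mechanism the bot actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        chat_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
    """
)
# At most one row with status = 'open' per chat; closed rows are unrestricted.
conn.execute(
    "CREATE UNIQUE INDEX one_open_task_per_chat ON tasks (chat_id) WHERE status = 'open'"
)


def open_task(chat_id: int, created_at: str) -> bool:
    """Insert an open task; return False if the chat already has one."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO tasks (chat_id, status, created_at) VALUES (?, 'open', ?)",
        (chat_id, created_at),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because the constraint lives in the database, two workers processing the same update in parallel cannot both succeed.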

Important feature
The bot does not send messages to client chats — only to duty or service ones.

Reporting without unnecessary BI tools

We consciously did not integrate Grafana or a separate BI.

Managers receive numbers directly in Telegram:

  • number of inquiries

  • First Response Time

  • percentage of responses within SLA

  • list of "hot" chats
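The first three figures can be derived from the task records alone. A minimal sketch, assuming each task is a pair of creation time and first reply time (None if unanswered), that unanswered tasks count as missed, and that response time is measured in plain clock minutes rather than working minutes; the name sla_summary is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

SLA = timedelta(minutes=60)


def sla_summary(tasks):
    """Summarize tasks given as (created_at, first_reply_at or None) pairs."""
    tasks = list(tasks)
    # First Response Time for every task that received a reply.
    answered = [reply - created for created, reply in tasks if reply is not None]
    within = sum(1 for frt in answered if frt <= SLA)
    return {
        "inquiries": len(tasks),
        "avg_frt_minutes": mean(d.total_seconds() for d in answered) / 60 if answered else None,
        "within_sla_pct": round(100 * within / len(tasks), 1) if tasks else None,
    }
```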

Financial analysts receive a CSV file once a quarter with the following columns: chat_id, created_at, first_reply_at, overdue. This is sufficient for calculating KPIs.
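Producing such a file takes only the standard library. A short sketch, assuming each task is available as a dictionary keyed by the four column names; the function name export_csv is hypothetical.

```python
import csv
import io

COLUMNS = ["chat_id", "created_at", "first_reply_at", "overdue"]


def export_csv(rows) -> str:
    """Render task rows (dicts keyed by COLUMNS) as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

The resulting text can then be sent to the analysts as a document attachment or written to a file.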

What automation has provided

After implementing the bot, the numbers changed quickly and visibly:

  • 98% of inquiries receive a first response within the 60-minute SLA (previously around 70%)

  • The average First Response Time has decreased by approximately 40%

  • Support stopped setting manual reminders and alarms: the bot does this more accurately and around the clock

The result

At some point, we stopped perceiving Telegram as an informal communication channel and acknowledged the obvious: if support lives in it, then metrics, accountability, and automation should also exist within it. We did not build another Service Desk or force clients to change their habits; we simply added a layer of logic that transforms the flow of messages into a managed process. As a result, engineers continue to communicate in a familiar interface, while managers receive transparent numbers.

The main conclusion turned out to be simple: even a thousand Telegram chats can operate predictably if they have clear rules and automatic control.
