SimWork
A cloud-based platform for managing SMS delivery, bridging the gap between software and hardware (SIM-banks). It handles everything from direct and broadcast messaging to complex delivery strategies, complete with regional configuration, deep analytics, and a decentralized infrastructure.
This project represents a unique engineering journey, covering a massive technical spectrum: from building a distributed decentralized architecture and processing graph data in real-time, to low-level hardware control and implementing chaos engineering for rock-solid resilience.
The Mission
A client approached us needing a robust SMS messaging system for their customers. With only a minimal set of formal requirements to start from, we built the system from the ground up, discovering the nuances of the domain as we went and letting them shape the final vision.
R&D Challenges
Our first hurdle was controlling the SIM-bank hardware via COM ports. We found an open-source library that looked promising, but it turned out to be outdated and abandoned. That, combined with the scarcity of good documentation for the necessary AT commands, convinced us to roll up our sleeves and implement our own custom solution.
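To give a flavor of what that low-level control looks like, here is a minimal sketch of driving a GSM modem with AT commands through the serialport package. The port path, baud rate, and fixed delays are illustrative; a real driver waits for the modem's "OK" and ">" prompts instead of sleeping.

```js
// Minimal sketch: sending one SMS via AT commands (illustrative values).
const { SerialPort } = require('serialport');
const { ReadlineParser } = require('@serialport/parser-readline');

const port = new SerialPort({ path: '/dev/ttyUSB0', baudRate: 115200 });
const parser = port.pipe(new ReadlineParser({ delimiter: '\r\n' }));

parser.on('data', (line) => console.log('modem:', line));

// AT+CMGF=1 switches the modem to text mode; AT+CMGS starts a message,
// whose body is terminated with Ctrl+Z (0x1A).
port.write('AT+CMGF=1\r');
setTimeout(() => port.write('AT+CMGS="+15551234567"\r'), 500);
setTimeout(() => port.write('Hello from SimWork\x1A'), 1000);
```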
We also faced significant variability in which features each carrier supported. We needed a universal, highly customizable solution that could absorb all this complexity under the hood while keeping things simple for the users (node operators).
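One way to tame that variability, shown here as a simplified sketch with hypothetical field names, is to describe each carrier declaratively and keep the sending core generic:

```js
// Hypothetical carrier profile: operator differences live in data,
// so the core sending logic stays universal.
const carrierProfiles = {
  'example-carrier': {
    encoding: 'gsm7',            // fall back to 'ucs2' for non-Latin text
    maxSegments: 5,              // some carriers silently drop longer chains
    supportsConcatenation: true,
    throttlePerMinute: 10,       // per-SIM rate limit to avoid carrier blocks
    initCommands: ['AT+CMGF=1', 'AT+CSCS="GSM"'],
  },
};
```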
Scaling was another challenge. The system needed to operate across multiple regions and countries. Each software instance had to be capable of communicating and working within a unified flow, while also retaining the ability to function independently if needed.
Feature-wise, we had to support a wide array of messaging capabilities: direct messages, broadcasts, message series, long message splitting, and real-time SMS templating.
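As an example of the logic involved, here is a naive long-message splitter. The limits come from the GSM spec: a single GSM-7 message holds 160 characters, while concatenated parts carry a UDH header and hold 153 each (70 and 67 respectively for UCS-2 text).

```js
// Naive sketch of long message splitting; real splitting must also count
// GSM-7 extension characters (e.g. '€') as two septets.
function splitMessage(text, { single = 160, segment = 153 } = {}) {
  if (text.length <= single) return [text];
  const parts = [];
  for (let i = 0; i < text.length; i += segment) {
    parts.push(text.slice(i, i + segment));
  }
  return parts;
}
```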
On top of that, we built a comprehensive web interface for management and monitoring.
Instance Architecture
The core is a Node.js application (a single node of our distributed system) connected to a web management interface via WebSockets.
Mass messaging generates massive amounts of real-time data. To keep operators in the loop, the system has to report the status of every modem and the overall system health (speed, success and failure rates, etc.) at all times. We designed a flexible and efficient graph-based data model to store and transfer this data, ensuring real-time updates are granular and bandwidth-efficient.
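The exact schema is ours, but the idea can be sketched roughly like this: entities are graph nodes, relations are edges, and only the changed fields travel over the WebSocket as small patches rather than full snapshots.

```js
// Illustrative shape of the graph model (node kinds and fields are examples).
const graph = {
  nodes: {
    'simbank:2': { kind: 'simbank', status: 'online' },
    'modem:17': { kind: 'modem', status: 'online', signal: 21 },
  },
  edges: [{ from: 'simbank:2', to: 'modem:17', rel: 'hosts' }],
};

// A granular update touches one node and one field, not the whole graph.
function sendPatch(ws, nodeId, changes) {
  ws.send(JSON.stringify({ type: 'patch', nodeId, changes }));
}
// sendPatch(socket, 'modem:17', { status: 'busy' });
```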
Instance core stack:
- Node.js
- PostgreSQL (local database)
- Symbiote.js (local web interface)
- serialport (npm package for serial port communication)
- RabbitMQ (AMQP client for getting messaging jobs and posting stats)
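A rough sketch of the job intake loop, using the amqplib client (queue names and payload fields here are illustrative):

```js
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://broker.example.com');
  const ch = await conn.createChannel();
  await ch.assertQueue('sms.jobs', { durable: true });
  await ch.assertQueue('sms.stats', { durable: true });
  ch.prefetch(1); // take one job at a time per instance

  ch.consume('sms.jobs', async (msg) => {
    const job = JSON.parse(msg.content.toString());
    // ...dispatch job.text to a free modem here...
    ch.ack(msg);
    ch.sendToQueue(
      'sms.stats',
      Buffer.from(JSON.stringify({ jobId: job.id, status: 'sent' }))
    );
  });
}

main().catch(console.error);
```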
Deployment & Live Updates
Our nodes are designed to work anywhere—in any network segment, within virtual private networks, or behind firewalls. We use reverse tunneling to establish secure connections to each node for control and update deployment.
We built in the ability to switch application versions on the fly and roll back in seconds if anything goes wrong. The system also supports running different versions simultaneously across the network.
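One common pattern that enables this kind of instant switch and rollback is an atomic symlink swap between versioned release directories; a sketch of the idea (paths are illustrative, not our actual layout):

```js
// Each release lives in its own directory; `current` points at one of them.
// Rolling back is just repointing the link and restarting the process.
const fs = require('fs');

function activate(version) {
  const target = `/opt/simwork/releases/${version}`;
  fs.rmSync('/opt/simwork/current.tmp', { force: true });
  fs.symlinkSync(target, '/opt/simwork/current.tmp');
  fs.renameSync('/opt/simwork/current.tmp', '/opt/simwork/current'); // atomic on POSIX
}

// activate('1.4.2'); // switch
// activate('1.4.1'); // roll back in seconds
```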
Resilience & Chaos Engineering
Given the system's complexity—especially with hardware involved—we adopted a chaos engineering approach to continuously test its resilience. We designed the architecture to be adaptable, ensuring it can handle unexpected issues and changes without breaking a sweat.
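In practice, that can be as simple as wrapping critical operations in a fault injector, so recovery paths are exercised continuously rather than only during real outages. A toy sketch:

```js
// With a small probability, an operation fails artificially,
// forcing retry and failover logic to prove itself every day.
function withChaos(fn, failureRate = 0.01) {
  return async (...args) => {
    if (Math.random() < failureRate) {
      throw new Error('chaos: injected modem failure');
    }
    return fn(...args);
  };
}

// const sendSms = withChaos(realSendSms); // wrap the real operation
```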
Platform API
Initially, we relied on RabbitMQ for node interconnection and external communication. However, we realized we needed a unified orchestration layer to make the system more accessible to external developers and integrators.
We created a Platform API that offers full control over the system using a familiar RESTful approach. It serves as a single entry point for everything: gathering stats, planning campaigns, reserving resources, managing users and organizations, and handling security.
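A hypothetical call against it might look like this; the endpoint path and payload fields are illustrative, not the actual contract (global fetch assumes Node 18+):

```js
const BASE = 'https://api.simwork.example.com/v1';

async function planCampaign(token) {
  const res = await fetch(`${BASE}/campaigns`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ template: 'Hello, {{name}}!', region: 'eu-west' }),
  });
  return res.json();
}
```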
This is one of our ongoing projects, and we are actively supporting and developing it further.