Tanla Improvement Center: Celebrating the Indian Spirit of Innovation and Tech, Taking on the World
At Tanla, we firmly believe in the philosophy of Kaizen - the idea that continuous, incremental improvements can lead to remarkable progress over time. This approach to constant improvement is woven into the very fabric of our company culture and is reflected in the innovative CPaaS solutions that we offer to businesses worldwide. By adhering to the ‘one improvement every day’ philosophy in our core values, we are always pushing the boundaries of what is possible in cloud communications.
Introducing the Tanla Improvement Center
Have you ever wondered how our platforms manage to handle billions of transactions every day? It's not magic – it's proactive monitoring and continuous improvement. Enter the Tanla Improvement Center – one of the secret ingredients behind our ability to deliver cutting-edge communication solutions to businesses around the world.
At the heart of the Improvement Center is the mission to enable companies and empower customers. Each "garage" or hub within Tanla produces its own product or platform (AI/ML, Conversational, Compliance, Innovation) which must be tracked and monitored for scalability. That is where the Improvement Center comes in, providing proactive monitoring of the network, infrastructure, software quality, security, and more, to ensure that everything is in place for our platforms to scale.
But it's not just about monitoring - the Center also helps each garage continuously improve on what it is already doing. Using the latest tech stack, dashboards, and tools, we monitor all platforms through a single pane of glass, providing businesses with real-time insights into how each platform is performing. This allows businesses to keep track of their KPIs, ensuring that they can deliver on the promises made to their customers.
Furthermore, the Tanla Improvement Center takes charge of detecting and resolving bottlenecks that might hinder a platform's ability to scale. If an incident comes up, the Center ensures that it is rectified as quickly as possible. Additionally, it automates processes that would take a long time to execute manually, significantly reducing the platform's downtime. Together, these measures help ensure that customers' SLAs are met.
One of the critical roles the Center plays is tracking the cumulative performance of the entire platform. With multiple large campaigns simultaneously pushing millions of traffic units per hour, the Center ensures that the platform scales according to the demands of the business. The Center's DB team ensures that the database is ready to handle the required transactions per second (TPS), while the application and infrastructure teams also attest to their readiness. The Tanla Improvement Center compiles all these insights (database, infra, and systems) and monitors their cumulative performance.
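One way to picture the cumulative readiness check described above is as a bottleneck calculation: the platform as a whole can only sustain the TPS of its weakest layer. The sketch below is illustrative only. The `LayerReadiness` type, the `headroom` margin, and the function names are assumptions made for this example, not a description of Tanla's actual tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerReadiness:
    """TPS figure attested by the team that owns one layer (hypothetical type)."""
    layer: str            # e.g. "database", "application", "infrastructure"
    sustainable_tps: int  # TPS the owning team says the layer can sustain


def platform_capacity(layers: List[LayerReadiness]) -> Tuple[str, int]:
    """Return the bottleneck layer and the TPS the whole platform can sustain."""
    if not layers:
        raise ValueError("at least one layer is required")
    bottleneck = min(layers, key=lambda layer: layer.sustainable_tps)
    return bottleneck.layer, bottleneck.sustainable_tps


def can_handle(layers: List[LayerReadiness], expected_tps: int,
               headroom: float = 0.2) -> bool:
    """True if the weakest layer covers expected_tps plus a safety margin.

    The 20% default headroom is an assumed value for illustration.
    """
    _, capacity = platform_capacity(layers)
    return capacity >= expected_tps * (1 + headroom)
```

With attested figures of 50,000 TPS for the database, 40,000 for the application, and 60,000 for infrastructure, the application layer is the bottleneck, so a campaign expected to peak at 35,000 TPS would be flagged before launch under the assumed 20% headroom.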
The Origins of the Tanla Improvement Center
Before we built the Improvement Center, our processes were fragmented. We had NOC (Network Operations Center) teams and L1/L2 support, with DevOps primarily focused on automating product builds and releases. But as the company grew, a gap began to emerge between what our operations team expected from our products and what our product development teams were delivering in terms of use cases. It became clear that we needed to bridge this gap and get our traditionally siloed development and operations teams working together. And so, the Tanla Improvement Center was born.
With the help of DevOps and SRE, the Improvement Center is bringing teams together to work towards our shared goal of continuous improvement. We have also established a process that requires our product development teams to adhere to an SLA and ensure scalability before a product goes live. Given the critical nature of our platforms, especially for customers in the banking and finance sector, security is paramount. We need to ensure compliance with industry regulations (such as GDPR for European markets) and pass audits. From security to scalability, the Improvement Center is driving best practices across our organization, resulting in reliable, robust products that meet customer SLAs and KPIs.
Our Vision for the Improvement Center
With a dedicated team of specialists in network, infrastructure, applications, storage, and security, the Center brings together all aspects of the technology stack under one roof. In the past, teams were siloed and issues were escalated between departments. The Center instead offers a collaborative approach, with our NOC, Site Reliability Engineering (SRE), and DevOps teams all working together towards a common goal. Our ultimate vision for the Improvement Center is AI/ML Ops: operations driven by AI and machine learning.
We have adopted a two-pronged approach to realize these goals.
- Providing best-in-class customer service
In addition to striving for zero issues across all of our platforms, we are also striving for lightning-fast resolution times if an incident does arise. We built the Improvement Center so that every issue, other than those requiring code-level changes, can be fixed within the Center itself. For instance, the NOC monitoring team can do basic troubleshooting and fix issues like VPN tunnel failures or ISP changes. This reduces the impact window to just five minutes, as opposed to the 30 minutes it previously took to raise child tickets with the network team and wait for a fix.
Additionally, when the NOC team detects storage-level issues, the storage specialist sitting within the Improvement Center can fix the problem right away, without having to raise a ticket in JIRA or call someone in a remote location. This allows us to bring down the resolution time for our customers significantly. Today, it sometimes takes 2-3 hours for services to be restored after a critical incident, and we are aiming to bring that down to less than 30 minutes.
Another way in which we are delivering exceptional customer service is through our self-healing capabilities. Simply put, self-healing is the ability of our systems to automatically resolve issues and restore service without any manual intervention. This means that if an application crashes or a VPN tunnel goes down, our system can quickly detect the issue and take the necessary steps to resolve it. By using runbooks or scripts, our systems can automatically execute the necessary steps to bring services back up and running. This not only minimizes the downtime for our customers but also reduces the need for manual intervention, saving valuable time and resources. With self-healing, we have been able to bring down outage times to less than five minutes, which is a significant improvement from the 30 to 45-minute outages we experienced before.
- Predictive and Proactive Monitoring
Our network, bandwidth, infrastructure, virtualization, Kubernetes, and application stack are constantly monitored to ensure smooth operation and quick detection of any potential issues. And that is not all: from the Improvement Center, we now monitor security in real time as well. By using modern tools for network and infra monitoring, application and performance monitoring, incident management and service discovery, and security, we can proactively detect security breaches and become instantly aware of any non-compliance.
This is a far cry from the reactive approach we used to have where we only discovered issues after external audits. With these new-age tools, which come equipped with AI capabilities, we can detect and solve problems much faster, bringing down our incident response time to a minimum.
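The self-healing behaviour described under the first prong (detect a failure, run a runbook, confirm that service is restored) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `SelfHealer` class, the `max_attempts` limit, and the idea of registering health checks and runbooks by name are all hypothetical and are not taken from Tanla's actual implementation.

```python
from typing import Callable, Dict, List

HealthCheck = Callable[[], bool]     # returns True when the service is healthy
Runbook = List[Callable[[], None]]   # ordered remediation steps


class SelfHealer:
    """Hypothetical runbook-driven remediation loop."""

    def __init__(self, health_checks: Dict[str, HealthCheck],
                 runbooks: Dict[str, Runbook], max_attempts: int = 3) -> None:
        self.health_checks = health_checks
        self.runbooks = runbooks
        self.max_attempts = max_attempts  # assumed retry limit

    def heal(self, service: str) -> str:
        """Return "healthy", "recovered", or "escalate" for one service."""
        check = self.health_checks[service]
        if check():
            return "healthy"
        runbook = self.runbooks.get(service)
        if runbook is None:
            # No automated fix is known, so a human has to take over.
            return "escalate"
        for _ in range(self.max_attempts):
            for step in runbook:
                step()
            if check():
                return "recovered"
        # The runbook did not restore service within the retry limit.
        return "escalate"
```

In a setup like this, a failed VPN tunnel would be registered with a check that probes the tunnel and a runbook that restarts it. Anything the runbook cannot restore is escalated to the specialists in the Center rather than retried indefinitely.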
How Does the Improvement Center Work?
When a new platform is being developed in one of our garages, the Tanla Improvement Center springs into action. Since we are already monitoring six to seven platforms, we know what might go wrong right from day one. Therefore, we ensure that compliance and security processes are taken care of even before the new platform goes live. We only move a product to production after we have verified that it meets our standards. Once the product goes live, it is monitored 24/7 by our teams, and the same compliance, security, and monitoring stack that we use for existing platforms is onboarded to the new platform.
To showcase each platform's performance to the engineering team, we publish monthly dashboards showing uptime, utilization, and whether the platform is delivering customer KPIs. We also track the number and types of tickets opened by customers, repeat tickets, and open tickets without root cause analysis. These insights are included in the health dashboard or service availability report, which is published by the Improvement Center team. With these reports, we can help the engineering team identify and address issues and reduce downtime, resulting in a better experience for our customers.
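The figures in such a report are simple to derive once downtime and ticket data are available. The sketch below shows one possible way to compute them. The `Ticket` fields and the definition of a "repeat ticket" (any ticket beyond the first for the same platform and issue type) are assumptions made for this example, not Tanla's actual schema or definition.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Ticket:
    """Hypothetical ticket record for illustration."""
    platform: str
    issue_type: str
    is_open: bool
    root_cause: Optional[str] = None  # None means no RCA recorded yet


def uptime_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Share of the reporting period during which the platform was up."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def ticket_summary(tickets: List[Ticket]) -> Dict[str, object]:
    """Summarise tickets by type, repeats, and open tickets lacking an RCA."""
    per_issue = Counter((t.platform, t.issue_type) for t in tickets)
    return {
        "total": len(tickets),
        "by_type": dict(Counter(t.issue_type for t in tickets)),
        # Assumed definition: every ticket after the first for the same
        # platform and issue type counts as a repeat.
        "repeat_tickets": sum(n - 1 for n in per_issue.values()),
        "open_without_rca": sum(
            1 for t in tickets if t.is_open and not t.root_cause
        ),
    }
```

For a 30-day month (43,200 minutes), 43.2 minutes of downtime works out to 99.9% uptime.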
What It Is Like to Work at the Improvement Center
Working at the Tanla Improvement Center is like being part of a futuristic space mission – there are dashboards, screens, graphs, and expert professionals everywhere you look. Every day, we monitor multiple platforms simultaneously, identifying critical issues and prioritizing them based on their impact on the overall business. Our team is currently made up of 18 members, and we have a seating capacity of 24. To provide efficient 24/7 support, we will need to expand to a 40-member team.
We now have visibility into where problems are occurring, which was not the case before. We can track incidents and identify the root causes, thanks to the weekly reports we publish to all stakeholders, including product, engineering, QA, and operations.
During our Friday meetings, we discuss the previous week's incidents, root causes, and platform performance, as well as the tasks that need to be done to improve performance across all functions. This level of transparency has helped us break down incidents and improve our platform's overall performance.
We also publish monthly and weekly dashboards to track which platform is having the most issues. This way, teams can take responsibility for improving their platform's performance, which has resulted in a reduction in incident time. We are always striving for excellence, and we are committed to delivering the best platform out there.
Overall, working at the Improvement Center is both challenging and rewarding.
Looking Ahead
We have some exciting plans in store for the Improvement Center. Although we have only been in operation for about 3.5 to 4 months, we have a clear understanding of what is expected from us. Our goal is for all platforms to run with zero incidents and zero impact on customers. We are constantly analyzing the history of each platform's health, taking note of how many times it has gone down and how many hours were impacted. This helps us to identify the gaps and weak points in each platform, and we are always working to improve upon them.
To achieve this goal, we are implementing new-age tools which will help us to seamlessly integrate and provide engineers with root cause analysis, allowing them to easily identify where the problem lies. Whether the issue is at the network level or application level, the tools we are bringing on board will enable us to quickly pinpoint the source of the problem.
Over the next six months, we plan to integrate these tools, build a specialist team, and improve our processes. By doing so, we are confident that we will be able to continue providing our customers with the high level of service they have come to expect from us.
Conclusion
The future looks bright for the Improvement Center. As we look forward to National Technology Day, we are incredibly excited by all of the breakthroughs, innovative processes and paradigm shifts that the Center is fostering. As an incubator for our best and brightest minds, the Improvement Center is all set to turn into a bastion of our philosophy of constant improvement that has enabled us to become an industry leader in the CPaaS space.
So, if you are looking for a challenging and exciting place to work, where you can make a real difference, look no further than the Tanla Improvement Center. We are always looking for talented individuals to join our team, and together we can continue building the future of cloud communications!