How to Troubleshoot TCP Performance for Your SaaS and Cloud Applications

Guest article

From time to time, we like to ask our partners to share their ITOps expertise and best practices with you. Today, Boris Rogier, Director of Business Development at Accedian, discusses TCP performance for your SaaS and cloud applications. Read on!

As Director of Business Development, Boris is responsible for leading innovation around Accedian’s network and application performance solutions for enterprise IT. He applies more than 15 years of IT operations, network, and application development experience to advise organizations across all verticals on best practices to optimize performance in multi-cloud, virtualized, and software-as-a-service (SaaS) infrastructure environments. Boris holds business law and economy & finance degrees from EDHEC Business School and Institut d’Etudes politiques de Bordeaux.

Troubleshooting TCP performance in complex IT environments that integrate SaaS and cloud-hosted applications can be quite challenging. These applications often degrade because of unhealthy TCP relationships (sessions) between clients and servers across physical, SaaS, and cloud infrastructure. The way TCP sessions are set up and torn down directly affects SaaS and cloud performance and the user experience, especially when hosts are overloaded and messages are dropped. A persistent increase in the number of TCP zero window (0-Win) events or duplicate acknowledgements (DupAck) is typically a good indicator that end users are suffering from degraded performance. With a structured approach, detecting and resolving poor TCP/IP performance in SaaS and cloud applications becomes far more straightforward. It leads to quicker resolution of network, server, and application degradations, and keeps dysfunctional relationships from ruining your users’ day, and your own.
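To make the idea of a "persistent increase" concrete, here is a minimal Python sketch. It assumes you already collect 0-Win or DupAck counts per fixed interval (for example, per minute) from your monitoring tool. The function name, the baseline length, the multiplier, and the run length are all illustrative assumptions, not values from any product or standard, so tune them to your own traffic.

```python
from statistics import mean


def persistent_increase(counts, baseline_len=5, factor=2.0, min_run=3):
    """Return True when the most recent samples all sit well above baseline.

    counts       -- per-interval event counts, oldest first (assumed input)
    baseline_len -- how many of the oldest samples form the baseline (assumption)
    factor       -- multiple of the baseline mean treated as "elevated" (assumption)
    min_run      -- how many consecutive recent samples must be elevated (assumption)
    """
    if len(counts) < baseline_len + min_run:
        return False  # not enough history to judge
    baseline = mean(counts[:baseline_len])
    # Use a floor of 1 event so a near-zero baseline does not flag noise.
    threshold = max(baseline * factor, 1.0)
    return all(c > threshold for c in counts[-min_run:])


# Hypothetical per-minute counts for one client/server pair.
zero_win_counts = [0, 1, 0, 1, 0, 2, 5, 9, 12]
dup_ack_counts = [3, 4, 3, 5, 4, 4, 3, 12, 4]

print(persistent_increase(zero_win_counts))  # sustained rise
print(persistent_increase(dup_ack_counts))   # single spike only
```

A single spike is deliberately ignored here: only a run of elevated intervals counts as persistent, which matches the intent of looking for sustained degradation rather than one-off bursts.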

Trouble with your TCP performance? Find the root cause in just 6 easy steps.

Finding the root cause of TCP performance issues impacting SaaS and cloud applications can be challenging and time-consuming. The following 6 steps help you speed up this process and include actionable pointers toward the “low-hanging fruit” when looking for ways to mitigate TCP performance issues and improve the SaaS and cloud application user experience.

Step 1. Start by ruling out an overloaded client or server side by taking a look at the number of 0-Win events. If these events are coming in rapidly, you may want to involve the respective desktop or system administrator(s) and have a look at the workload on these hosts.

Step 2. If the number of 0-Win events is close to zero, then most likely the TCP transmission problem is somewhere on the network path between the client and server side. If both are within the same subnet, it should be fairly easy to figure out where the delays and/or drops are coming from. A quick look at the MAC tables from the connected network devices should tell you which devices and interfaces are involved.

Step 3. If the client and server side are not within the same subnet, it means that one or more routers (or similar devices) are involved. Start by finding the intermediate subnets, devices, and interfaces by looking at the MAC addresses and routing tables of the designated gateway on the client and server side. This should tell you which other routers and interfaces are actively involved in sending and receiving messages.

Step 4. If it turns out that both MAC addresses point to the same routing device, then most likely that routing device has too many things to do besides routing messages. For example, the device may actually be a firewall with many (perhaps too many) policies. Or it may be a load balancer running CPU-intensive tasks such as intrusion detection and prevention (IDS/IPS), SSL offloading, or data compression. This is probably a good time to involve the system administrator of these devices.

Step 5. However, if the two MAC addresses point to different routing devices, then most likely one or more WAN connections are involved in reaching the cloud or SaaS application. If the links are redundant, check the load-sharing algorithm on the routers. Modern IP routers and switches support packet-based load sharing. While this is a very effective way of sharing load, it can have unexpected side effects: packets belonging to the same session may travel over different paths and arrive out of order. Reordering requires additional processing time on the hosts and can itself trigger duplicate acknowledgements.

Step 6. Once you understand which devices and interfaces sit between the client and server side, start looking at things like CPU and memory utilization, frame drops, CRC errors, buffer overflows, and interface utilization. These indicators help you pinpoint where packets are being dropped and, therefore, where retransmissions are adding delay.
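The decision flow of Steps 1 to 5 can be sketched as a small Python function. This is only an illustration of the reasoning, not a diagnostic tool: the function names, the 0-Win threshold, and the idea of passing in the gateway MAC addresses by hand are all assumptions, and in practice you would gather these inputs from your capture and from the MAC and routing tables described above. Only the standard-library `ipaddress` module is used.

```python
import ipaddress


def same_subnet(client_ip, server_ip, prefix_len):
    """True if the server address falls inside the client's subnet."""
    client_net = ipaddress.ip_interface(f"{client_ip}/{prefix_len}").network
    return ipaddress.ip_address(server_ip) in client_net


def triage(zero_win_per_min, client_ip, server_ip, prefix_len,
           client_gw_mac, server_gw_mac, zero_win_threshold=1.0):
    """Map observations to the step that applies (threshold is an assumption)."""
    # Step 1: frequent 0-Win events point at an overloaded host.
    if zero_win_per_min >= zero_win_threshold:
        return "Step 1: check workload on the client and server hosts"
    # Step 2: few 0-Win events and a shared subnet point at the local path.
    if same_subnet(client_ip, server_ip, prefix_len):
        return "Step 2: inspect MAC tables of devices on the shared subnet"
    # Steps 3-4: both sides use the same gateway, so one router is in play.
    if client_gw_mac.lower() == server_gw_mac.lower():
        return "Step 4: one routing device; check firewall or load balancer load"
    # Step 5: different gateways suggest WAN links and load sharing.
    return "Step 5: several routers; check WAN links and load-sharing algorithm"


# Hypothetical observations for one client/server pair.
print(triage(0.0, "10.0.0.5", "10.0.1.20", 24,
             "AA:BB:CC:00:00:01", "aa:bb:cc:00:00:02"))
```

Whatever the function returns, Step 6 still applies afterwards: once the devices on the path are known, their CPU, memory, drop, and error counters are what confirm the diagnosis.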

How can a unified N/APM monitoring solution help you troubleshoot TCP performance?

When you need to perform these steps regularly, consider deploying a wire data analytics monitoring solution. The topology capabilities of such solutions typically automate device discovery between two hosts by translating the contents of MAC and routing tables into a topology map. They can also automate the analysis and reporting of TCP metrics for each session (SYN, SYN-ACK, RST, 0-Win, and more), which allows you to isolate problems quickly without having to perform manual packet analysis. Learn more about troubleshooting TCP in our guide, 5 Steps to Troubleshoot SaaS Applications using TCP Analysis.
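To give a feel for what per-session TCP metrics look like, here is a minimal Python sketch that counts SYN, SYN-ACK, RST, and 0-Win events per session. It does not reflect how any particular monitoring product works. It assumes the packets have already been decoded into simple tuples of (source endpoint, destination endpoint, TCP flags byte, advertised window); that record layout and the function name are hypothetical. The flag bit values are the standard TCP header bits, and packets carrying SYN or RST are left out of the 0-Win count as a simplifying assumption.

```python
from collections import Counter, defaultdict

# Standard TCP header flag bits.
SYN, RST, ACK = 0x02, 0x04, 0x10


def summarize_sessions(packets):
    """Count handshake, reset, and zero-window events per session.

    packets -- iterable of (src, dst, flags, window) tuples (assumed layout)
    """
    stats = defaultdict(Counter)
    for src, dst, flags, window in packets:
        # Both directions of a conversation share one session key.
        session = tuple(sorted((src, dst)))
        counters = stats[session]
        if (flags & SYN) and (flags & ACK):
            counters["SYN-ACK"] += 1
        elif flags & SYN:
            counters["SYN"] += 1
        if flags & RST:
            counters["RST"] += 1
        elif window == 0 and not (flags & SYN):
            # Receiver advertises no buffer space left.
            counters["0-WIN"] += 1
    return stats


# Hypothetical decoded packets for one client/server session.
sample = [
    ("10.0.0.5:50000", "203.0.113.10:443", SYN, 64240),
    ("203.0.113.10:443", "10.0.0.5:50000", SYN | ACK, 65535),
    ("10.0.0.5:50000", "203.0.113.10:443", ACK, 64240),
    ("10.0.0.5:50000", "203.0.113.10:443", ACK, 0),
    ("10.0.0.5:50000", "203.0.113.10:443", ACK, 0),
    ("203.0.113.10:443", "10.0.0.5:50000", RST, 0),
]

for session, counters in summarize_sessions(sample).items():
    print(session, dict(counters))
```

Keeping the counters per session rather than per host is what makes it possible to single out the one unhealthy client/server relationship among many healthy ones.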
