Updated on 2022-07-13 GMT+08:00

Basic Concepts

Topology

A topology graphically displays the call and dependency relationships between applications. It is composed of circles, lines with arrows, and resources. Each circle represents a service, and each section of a circle represents an instance. The fraction in each circle indicates the number of active instances/total number of instances. The values below the fraction indicate the service latency, number of calls, and number of errors, respectively.

Each line with an arrow represents a call relationship, and thicker lines indicate more calls. The values next to each line indicate the throughput and overall latency, respectively. Throughput is the number of calls in a specified time range.

Application Performance Index (Apdex) is used to quantify user satisfaction with application performance. Different colors indicate different Apdex value ranges, helping you quickly detect and locate performance problems.
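The counting behind the circles and lines can be sketched in a few lines of Python. This is a hypothetical illustration, not how APM builds its topology: the `CallRecord` type, the helper functions, and the linear mapping from call count to line width are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class CallRecord(NamedTuple):
    """One observed call between two services (hypothetical record format)."""
    caller: str
    callee: str
    timestamp: float  # seconds


def instance_fraction(active: int, total: int) -> str:
    """Text shown inside a service circle: active instances/total instances."""
    return f"{active}/{total}"


def edge_throughput(records: Iterable[CallRecord], start: float, end: float) -> Counter:
    """Throughput per (caller, callee) line: number of calls within [start, end)."""
    counts: Counter = Counter()
    for r in records:
        if start <= r.timestamp < end:
            counts[(r.caller, r.callee)] += 1
    return counts


def line_widths(counts: Counter, min_width: float = 1.0,
                max_width: float = 8.0) -> Dict[Tuple[str, str], float]:
    """Assumed linear scaling so that lines with more calls are drawn thicker."""
    if not counts:
        return {}
    peak = max(counts.values())
    return {edge: min_width + (max_width - min_width) * n / peak
            for edge, n in counts.items()}
```

For example, two `gateway` > `order` calls and one `order` > `payment` call inside a 60-second window give throughputs of 2 and 1, so the first line is drawn twice as far above the minimum width as the second.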

Transaction

A transaction is usually an HTTP request, covering the complete process: user request > web server > database > web server > response to user. From the user's perspective, a transaction is a single task that the user completes by using an application. For example, in an e-commerce application, querying a product is a transaction, and so is making a payment.

Tracing

APM traces and records service calls, and visually presents the execution tracks and statuses of service requests in distributed systems, so that you can quickly locate performance bottlenecks and faults.

Application

An application is a group of identical or similar services categorized based on service requirements. You can put services that fulfill the same function into one application for performance management. For example, you can put the account, product, and payment services into the Mall application.

Apdex

Apdex is an open standard developed by the Apdex Alliance. It defines a standard method for measuring application performance by converting application response times into a user satisfaction score ranging from 0 to 1.

  • Apdex principle

Apdex defines the optimal threshold (T) for the application response time. T is determined by the performance evaluation personnel based on performance expectations. Based on the actual response time and T, user experience can be categorized as follows:

Satisfied: indicates that the actual response time is less than or equal to T. For example, if T is 1.5s and the actual response time is 1s, the user is satisfied.

Tolerating: indicates that the actual response time is greater than T but less than or equal to 4T. For example, if T is 1s, the upper threshold of tolerable response time is 4s.

Frustrated: indicates that the actual response time is greater than 4T. For example, if T is 1s, a response time of 5s frustrates the user.

  • Apdex calculation method

    In APM, the Apdex threshold T is the value configured in Setting Apdex Thresholds, and the response time compared against T is the service latency. The Apdex value ranges from 0 to 1 and is calculated as follows:

    Apdex = (Number of satisfied samples + Number of tolerating samples x 0.5)/Total number of samples

Apdex indicates application performance status, that is, user satisfaction with application performance. Different colors indicate different Apdex ranges, as shown in Table 1.

Table 1 Apdex description

Apdex | Color | Description
0.75 ≤ Apdex ≤ 1 | Green | Fast response; good user experience
0.3 ≤ Apdex < 0.75 | Yellow | Slow response; fair user experience
0 ≤ Apdex < 0.3 | Red | Very slow response; poor user experience

  • Configuring an Apdex threshold

For details about how to configure an Apdex threshold, see Setting Apdex Thresholds.
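The categorization rules, the Apdex formula, and the color ranges in Table 1 can be combined into a short Python sketch. It is illustrative only: the function names are hypothetical, response times and T are assumed to use the same unit, and the sketch does not claim to reproduce APM's internal computation.

```python
from typing import Iterable

SATISFIED, TOLERATING, FRUSTRATED = "satisfied", "tolerating", "frustrated"


def classify(response_time: float, t: float) -> str:
    """Categorize one sample against the Apdex threshold T."""
    if response_time <= t:
        return SATISFIED
    if response_time <= 4 * t:
        return TOLERATING
    return FRUSTRATED


def apdex(response_times: Iterable[float], t: float) -> float:
    """Apdex = (satisfied samples + tolerating samples x 0.5) / total samples."""
    samples = list(response_times)
    if not samples:
        raise ValueError("Apdex is undefined without samples")
    categories = [classify(rt, t) for rt in samples]
    satisfied = categories.count(SATISFIED)
    tolerating = categories.count(TOLERATING)
    return (satisfied + tolerating * 0.5) / len(samples)


def apdex_color(score: float) -> str:
    """Map an Apdex value to the color ranges listed in Table 1."""
    if not 0 <= score <= 1:
        raise ValueError("Apdex must be between 0 and 1")
    if score >= 0.75:
        return "Green"
    if score >= 0.3:
        return "Yellow"
    return "Red"
```

With T = 1s, the samples 0.5s, 1s, 2s, 4s, and 5s contain two satisfied, two tolerating, and one frustrated sample, so Apdex = (2 + 2 x 0.5)/5 = 0.6, which falls in the yellow range.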

TP99 Latency

TP99 latency is the minimum time within which 99% of requests are completed. In APM, latency refers to TP99 latency.

Example: Assume that there are 100 requests, which take 1s, 2s, 3s, 4s, ..., 98s, 99s, and 100s, respectively. For 99% of the requests (99 requests) to be completed, at least 99s is required. Therefore, the TP99 latency is 99s.

Calculation: Sort all requests by time consumed in ascending order. TP99 latency = Time consumed by the Nth request, where N is 99% x Total number of requests, rounded to an integer.
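The calculation can be sketched in Python as follows. The function name is hypothetical, and because the text only says that N is rounded, the sketch assumes round-half-up; APM's exact rounding rule may differ.

```python
from typing import Sequence


def tp99(latencies: Sequence[float]) -> float:
    """Return the time consumed by the Nth fastest request, N = round(99% x total)."""
    if not latencies:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies)  # ascending by time consumed
    # Round half up with integer arithmetic to avoid floating-point error.
    # For any non-empty input this keeps 1 <= n <= len(ordered).
    n = (99 * len(ordered) + 50) // 100
    return ordered[n - 1]  # the Nth request, counted from 1
```

Applied to the 100 requests in the example (1s to 100s, in any order), N = 99 and the function returns 99.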

Overall Latency/Service Latency

Latency is the period from when a request is initiated to when a response is received. In APM, the overall latency is the total time consumed by a request, and the service latency is the time consumed by the service itself. The relationship is as follows: Service latency = Overall latency – Latency for calling other services. For example, assume that service A calls service B, and service B calls service C, as shown in the following figure:

  • Service A: Overall latency = Ta; Service latency = Ta – Tb1 – Tb2 – Tc
  • Service B: Overall latency = Tb1 + Tb2 + Tc; Service latency = Tb1 + Tb2
  • Service C: Overall latency = Tc; Service latency = Tc
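The A > B > C example can be reproduced with a small call tree in Python. This is a sketch under stated assumptions: the `Span` class is hypothetical, downstream calls are assumed to be synchronous and non-overlapping, and the values Ta = 10, Tb1 = 2, Tb2 = 1, and Tc = 3 are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One service call in a trace (hypothetical model)."""
    service: str
    overall: float  # overall latency, including time spent in downstream calls
    children: List["Span"] = field(default_factory=list)

    def service_latency(self) -> float:
        # Service latency = Overall latency - latency for calling other services.
        return self.overall - sum(child.overall for child in self.children)


# Made-up timings: Ta = 10, Tb1 = 2, Tb2 = 1, Tc = 3.
c = Span("C", 3.0)                    # overall = Tc
b = Span("B", 2.0 + 1.0 + 3.0, [c])   # overall = Tb1 + Tb2 + Tc
a = Span("A", 10.0, [b])              # overall = Ta
```

Here `a.service_latency()` is 10 - 6 = 4 (Ta – Tb1 – Tb2 – Tc), `b.service_latency()` is 3 (Tb1 + Tb2), and `c.service_latency()` is 3 (Tc), matching the three formulas above.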

Probes

Probes use bytecode enhancement to track calls and generate data. The data is collected by the ICAgent and then displayed on the UI. If the memory detection mechanism is enabled and the instance's memory usage is too high, probes enter the hibernation state, that is, they stop collecting data. For details, see How Does APM Collect Probe Data?

Mesh

The Istio mesh obtains the inbound and outbound data of applications in non-intrusive mode. Then, the ICAgent and the Cloud Container Engine (CCE) Istio mixer obtain and process the mesh data and report it to APM. You can enable Istio to collect mesh data. For details, see How Does APM Collect Mesh Data?

ICAgent

ICAgent is the collection agent of APM. It runs on the server where applications are deployed and collects the data obtained by probes in real time. For details about the data collected and how it is used, see APM Service Agreement. Installing the ICAgent is a prerequisite for using APM.