Cisco Unified Service Monitor 8.0 White Paper

Introduction

This white paper explains what voice quality is and describes the different methods used to
measure it.

Voice over IP (VoIP) is now a mature technology. It has been widely adopted by customers who want
the cost savings it offers as well as a range of advanced features that improve efficiency. Voice
quality is the qualitative and quantitative measure of sound and conversation quality on an IP
phone call. Voice quality measurement describes and evaluates the clarity and intelligibility of a
voice conversation.

The shift from the traditional time-division multiplexing (TDM) world to a packet-based IP telephony
solution poses challenges for voice quality. Unlike data, which is bursty in nature and tolerant of
delay and packet loss, voice and video are extremely sensitive to jitter, packet loss, and delay. In a
converged network, where voice, video, and data share the same infrastructure, that infrastructure
must be reliable and scalable, and it must offer different levels of service for technologies such
as voice, video, wireless, and data.

Voice Impairment Parameters

Because voice is a real-time application, the network must meet strict service-level agreements
(SLAs). The primary voice impairment parameters are jitter, packet loss, and delay.

Packet Loss
In data networks, even if a few packets are lost during transmission, TCP retransmits them and
reassembles the stream, so the user does not notice any difference. Voice is different. When voice
packets are transmitted across the IP backbone, missing packets distort the voice heard on the
receiving end, and retransmitting them is useless because a retransmitted packet would arrive too
late to be played in sequence. Occasional packet loss is tolerable, but the loss of several
consecutive voice packets can degrade the overall quality of the transmitted voice.
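
The distinction between occasional and consecutive loss can be made concrete with a short sketch.
The Python example below is illustrative only and is not taken from any Cisco product: the function
names (loss_bursts, loss_summary) are hypothetical, and it assumes that each voice packet carries an
increasing sequence number and that sequence-number wraparound can be ignored.

```python
def loss_bursts(received_seq, first_seq, last_seq):
    """Return the length of each run of consecutive missing sequence numbers.

    Assumption: sequence numbers increase by 1 per packet and do not wrap
    within the range [first_seq, last_seq].
    """
    received = set(received_seq)
    bursts = []
    run = 0
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            if run:
                bursts.append(run)  # a run of losses just ended
            run = 0
        else:
            run += 1  # another consecutive missing packet
    if run:
        bursts.append(run)  # the stream ended during a loss run
    return bursts


def loss_summary(received_seq, first_seq, last_seq):
    """Summarize overall loss and the longest run of consecutive losses."""
    bursts = loss_bursts(received_seq, first_seq, last_seq)
    expected = last_seq - first_seq + 1
    lost = sum(bursts)
    return {
        "expected": expected,
        "lost": lost,
        "loss_pct": 100.0 * lost / expected,
        "longest_burst": max(bursts, default=0),
    }


if __name__ == "__main__":
    # Packets 3, 6, 7, and 8 never arrived.
    print(loss_summary([1, 2, 4, 5, 9, 10], first_seq=1, last_seq=10))
```

Two streams with the same loss percentage can sound very different. In this sketch, longest_burst
is the figure that separates an isolated loss from a run of consecutive losses.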

Jitter
Delay variation, or jitter, occurs when voice packets arrive at the destination at irregular
intervals. This can happen because of the connectionless nature of IP. Depending on the
congestion and load on the network, the arrival rate of these packets at the destination may vary.
The devices on the receiving end should be capable of buffering these packets and playing them
back to the user at a consistent interframe interval. The buffers that do this are called dejitter
buffers. A dejitter buffer adds a forced delay to every VoIP packet received, typically 20 to
60 milliseconds (ms), with a default of 60 ms. This delay is commonly called the playout delay.
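
The effect of the playout delay can be sketched with a small simulation. The Python example below
is a simplified illustration, not the behavior of any particular phone or gateway: it assumes a
fixed (non-adaptive) buffer, a constant packet interval of 20 ms, and a playout schedule anchored
to the arrival of the first packet. The function name late_packets and the sample arrival times are
hypothetical.

```python
def late_packets(arrival_ms, packet_interval_ms=20, playout_delay_ms=60):
    """Return the indices of packets that arrive after their playout time.

    arrival_ms[i] is the arrival time, in ms, of the i-th packet sent.
    Assumption: packet 0 is played playout_delay_ms after it arrives, and
    each later packet is played one packet interval after the previous one.
    """
    first_playout = arrival_ms[0] + playout_delay_ms
    late = []
    for i, arrived in enumerate(arrival_ms):
        playout_time = first_playout + i * packet_interval_ms
        if arrived > playout_time:
            late.append(i)  # too late to play; the buffer must discard it
    return late


if __name__ == "__main__":
    # Packets were sent every 20 ms but arrive at irregular intervals.
    arrivals = [100, 125, 170, 155, 260]
    print(late_packets(arrivals, playout_delay_ms=60))  # deeper buffer
    print(late_packets(arrivals, playout_delay_ms=20))  # shallower buffer
```

With these sample arrivals, the 60 ms buffer loses only the last packet, and the 20 ms buffer also
discards packet 2. A deeper buffer absorbs more jitter, but every packet is then held longer before
playback, which adds to the end-to-end delay.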

Delay
Delay is the finite amount of time it takes a packet to reach the receiving endpoint after being
transmitted from the sending endpoint. In the case of voice, this is the amount of time it takes for a
sound to travel from the speaker’s mouth to the listener’s ear. Delay (or latency) does not affect
voice fidelity, but extended network delay makes echo more noticeable. Network delay is not a
direct cause of echo, but it does amplify the perception of any echo present in the media path.
Extremely long delays can also lead to "collisions" in the conversation, when both parties end up
speaking at the same time.