HP (Hewlett-Packard) RM500SL User Manual

RM500SL User’s Guide Version 2.8

19.5 Speech signal analysis

FastFacts 19.5: Speech signal analysis

One of the most-used measures of a speech signal is the long-term average
speech spectrum (LTASS). This is a 1/3 octave spectrum averaged over a
sufficiently long portion of the speech material to provide a stable curve. In
practice, a 10-second average meets this requirement; for this reason, all
RM500SL passages are at least 10 seconds long.
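
As an illustration of the averaging described above, the following minimal sketch computes a long-term average level per 1/3 octave band from a series of short-term band levels. It is not the RM500SL's implementation. The function names, the dictionary layout and the choice to average in the power domain (rather than averaging dB values directly) are assumptions made for this example.

```python
import math


def power_average_db(levels_db):
    """Average a list of dB levels in the power domain; return the result in dB.

    Assumption: levels are averaged as powers, not as dB values.
    """
    powers = [10 ** (level / 10) for level in levels_db]
    return 10 * math.log10(sum(powers) / len(powers))


def ltass(short_term_levels):
    """Return {band centre frequency in Hz: long-term average level in dB}.

    `short_term_levels` is a hypothetical input layout: a dict that maps each
    1/3 octave band centre frequency to its list of short-term levels in dB.
    """
    return {
        band: power_average_db(levels)
        for band, levels in short_term_levels.items()
    }
```

For example, `ltass({1000: [50, 60]})` gives a 1000 Hz band level of roughly 57.4 dB, which is closer to the higher of the two short-term levels because the average is taken over power.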

The dynamic nature of speech is often characterized by the distribution of short-
term levels in each 1/3 octave band. These levels are determined by calculating a
spectrum for each of a series of short time periods within the passage. Historically,
time periods of 120, 125 or 128 ms have been used. The RM500SL uses a 128
ms time period, resulting in 100 levels (or samples) in each 1/3 octave band for a
12.8 second passage. The level in each band that is exceeded by 1% of the
samples (called either the 1st or 99th percentile) has historically been referred to as
the speech peak for that band. The curve for these 1% levels is approximately 12
dB above the LTASS. The level in each band that is exceeded by 70% of the
samples (called either the 70th or 30th percentile) has historically been called the
valley of speech for that band. The curve for these 70% levels is approximately 18
dB below the LTASS. The region between these two curves is often called the
speech region, speech envelope or speech “banana”. The speech envelope, when
derived in this way, has significance in terms of both speech detection and speech
understanding. Generally, speech will be detectable if the 1% level is at or near
threshold. The Speech Intelligibility Index (SII) is maximized when the entire
speech envelope (idealized as a 30 dB range) is above (masked) threshold. This
will not be an SII of 100% (or 1) because of loudness distortion factors, but higher
SII values will not produce significantly higher scores on most test material. The
speech-reception threshold (SRT) is attained when the LTASS is at threshold
(approximately, depending on test material and the individual).
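
The percentile analysis described in this section can be sketched as follows. This is a minimal illustration, not the RM500SL's implementation: all function names are hypothetical, and the exceedance level is found by simple ranking of the samples in one band, which is only one of several ways to define a percentile. The idealized envelope uses the approximate offsets quoted above (12 dB above and 18 dB below the LTASS).

```python
def window_count(passage_ms, window_ms=128):
    """Number of whole analysis windows that fit in a passage."""
    return passage_ms // window_ms


def level_exceeded_by(levels_db, percent):
    """Return the level in one band that is exceeded by `percent` % of samples.

    Ranking approach (an assumption for this sketch): sort from highest to
    lowest and skip the top `percent` % of the samples.
    """
    ordered = sorted(levels_db, reverse=True)   # highest level first
    k = round(percent * len(ordered) / 100)     # samples allowed above result
    k = min(k, len(ordered) - 1)                # keep the index in range
    return ordered[k]


def idealized_envelope(ltass_db):
    """Return (valley, peak) of the idealized 30 dB speech envelope in dB."""
    return ltass_db - 18.0, ltass_db + 12.0
```

With a 12.8 second passage, `window_count(12800)` returns 100. Calling `level_exceeded_by(levels, 1)` on the 100 levels of one band then gives the speech peak, and `level_exceeded_by(levels, 70)` gives the valley of speech for that band.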