Extension of ITU-T P.863 for multidimensional assessment of degradations in telephony speech signals up to fullband

Publisher:
International Telecommunication Union-T

Document status:

Active

Format:

Electronic (PDF)

This Recommendation describes models whose purpose is to predict perceptual dimensions of degradations linked to overall speech quality in narrowband (NB) to fullband (FB) telecommunication scenarios. The models provide more detailed information about individual quality dimensions as additional information to the ITU-T P.863 overall mean opinion score (MOS). Perceptual dimensions of degradations may originate from all speech processing components usually considered for telecommunications in clean and noisy conditions. The models predict these dimensions as they are assessed in a listening-only test context, in accordance with Annex A.

In contrast to [ITU-T P.863], the models described in this Recommendation have only one operational mode, in which degraded speech samples are scored against an FB reference signal and the perceptual dimension scores are predicted on a corresponding scale. The models provide an estimate of the colouration, discontinuity, noisiness and sub-optimum loudness of the degraded speech sample. These four dimensions are not identical to the seven dimensions listed in [ITU-T P.806]. Instead, the four dimensions predicted by the models described in this Recommendation target subjective judgements obtained in a test carried out according to Annex A.

The term "telecommunication scenario" mentioned in the first paragraph covers all transmission technologies in current use:

– Public switched networks (e.g., fixed wire public switched telephone network (PSTN), global system for mobile communications, wideband (WB) code division multiple access, code division multiple access (CDMA), voice over long-term evolution and voice over new radio);

– Push-over-cellular, voice over Internet protocol (VoIP) and PSTN-to-VoIP interconnections, terrestrial trunked radio; and

– Commonly used speech processing components (e.g., coder–decoders (codecs), noise reduction systems, adaptive gain control, comfort noise and other types of voice enhancement devices) and their combinations.

Tables 1 to 4 of [ITU-T P.863] list test factors, coding technologies and applications to which that Recommendation applies, either in the sense that they have been included in the requirement specification and have been tested accordingly, that they are not intended to be used or that further investigation or validation is necessary. Unless specified otherwise in this Recommendation, the limitations in [ITU-T P.863] also apply to the models specified in this Recommendation, as they are partially based on internal parameters of the ITU-T P.863 model.

In addition to the commonly used ITU-T and ETSI speech codecs, other coding technologies, as specified by the 3rd Generation Partnership Project 2 and used in CDMA networks, have been considered in the training and selection data. Furthermore, codecs used in broadcasting services with speech-based content have also been taken into account, e.g., Moving Picture Experts Group-1 audio layer 3 (MP3) or advanced audio coding.

Other technologies or components such as speech storage formats or non-telephony applications such as public safety networks or professional mobile radio connections have not been assessed for the described models, and thus lie outside the scope of this Recommendation.

The acoustical path to and from an actually used terminal (acoustical insertion and acoustical capturing) may affect the colouration or noisiness of the degraded signal, and its consideration is foreseen by the described model. The score targeted by the model's prediction stems from a diotic presentation of a mono signal, meaning that the same signal is played at each ear in the listening context that the model tries to predict.
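As an informal illustration of the presentation mode described above (not part of the Recommendation), the following minimal Python sketch turns a mono signal into two-channel frames in which both ears receive the same samples. The function name `to_diotic` and the sample values are hypothetical and chosen only for this example.

```python
def to_diotic(mono):
    """Return (left, right) frames that carry the same mono sample in both channels."""
    return [(sample, sample) for sample in mono]


# Hypothetical mono samples, used only for illustration.
mono_signal = [0.0, 0.25, -0.5, 1.0]
stereo_frames = to_diotic(mono_signal)

# In this sketch the left and right channels are identical by construction.
assert all(left == right for left, right in stereo_frames)
```

A presentation in which the two ears receive different signals would not match the listening context that the models target.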

Dimensions of speech quality that cannot be assessed in a listening-only context, such as conversational aspects and talking quality, lie outside the scope of this Recommendation. The described model considers noises and their influence on perceptual quality dimensions in a listening-only context similar to the one described in [ITU-T P.800] (test cabinet specifications, etc.). The prediction of quality as it can be perceived in a noisy listening environment and the related binaural effects lie outside the scope of this Recommendation.

Non-steady, fluctuating noises can be seen as degradations on the discontinuity scale.

NOTE – Examples of non-steady, fluctuating noises are footsteps and beeps such as those from an alarm clock. While a human listener can recognize those noises as natural, the models described in this Recommendation treat them as interruptions and count them as degradations on the discontinuity scale. This is an inherent limitation of the models described in this Recommendation, which have no knowledge of the original background noise.

As is the case for [b-ITU-T P.862] and [ITU-T P.863], the approach of the models described in this Recommendation is called "full-reference" or "double-ended", which means that the quality prediction is based on the comparison between an undistorted reference signal and the received signal to be scored.

Edition :	24#
File Size :	1 file
Number of Pages :	33
Published :	05/01/2024

ITU-T P.863.2 PDF
