Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob?

Citation

Charles V. Wright, Lucas Ballard, Fabian Monrose, Gerald M. Masson (Johns Hopkins Univ), 16th USENIX Security Symposium, Boston, MA, 2007.

Abstract

Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Due to the potential transit of sensitive conversations over untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Explicitly, we use the length of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66% accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification (e.g., "Is this a Spanish or English conversation?") rate of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP.

Annotations

To save bandwidth, VoIP codecs can use variable bit rate (VBR) encoding, which spends fewer bits on “easy” sounds and more bits on “harder” sounds.

Different languages are encoded with different mixes of bit rates, so the distribution of bit rates carries information about the language.

Packet sizes can be used as a predictor of the bit rate used to encode the corresponding frame.

Current recommendations for encrypting VoIP traffic (length-preserving stream ciphers) do not conceal the size of the messages.
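The point about length-preserving ciphers can be checked with a toy example. The sketch below is an illustration only, not the cipher used in any VoIP standard: the `keystream` helper (SHA-256 in counter mode), the key, and the frame sizes are all assumptions made up for this note, and nonce handling is omitted.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not for real use."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def encrypt(key: bytes, frame: bytes) -> bytes:
    """XOR the frame with the keystream; the output has exactly len(frame) bytes."""
    ks = keystream(key, len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))

# Hypothetical VBR frame sizes in bytes (made-up values).
frame_sizes = [10, 20, 38, 62]
frames = [bytes(n) for n in frame_sizes]
cipher_lengths = [len(encrypt(b"demo-key", f)) for f in frames]

# An eavesdropper sees the same lengths the codec produced.
print(cipher_lengths == frame_sizes)
```

Whatever the keystream, XOR encryption maps an n-byte frame to an n-byte ciphertext, so the sequence of packet lengths survives encryption unchanged.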

Experimental data: Oregon Graduate Institute Center for Speech Learning and Understanding corpus of telephone speech in 21 languages, from 2,066 native speakers, sampled at 8 kHz.

voip-1.png

Classifier: χ² classifier. Compute intraclass and interclass variability.
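A minimal sketch of a χ²-style classifier, assuming a nearest-model rule under a symmetric χ² distance between normalized histograms. The paper's exact statistic and its use of intraclass and interclass variability may differ; the function names and the language models below are hypothetical, with made-up numbers.

```python
def chi2_distance(p: dict, q: dict) -> float:
    """Symmetric chi-square distance between two normalized histograms."""
    d = 0.0
    for k in set(p) | set(q):
        a, b = p.get(k, 0.0), q.get(k, 0.0)
        if a + b > 0:
            d += (a - b) ** 2 / (a + b)
    return d

def classify(sample: dict, models: dict) -> str:
    """Return the label whose model histogram is closest to the sample."""
    return min(models, key=lambda lang: chi2_distance(sample, models[lang]))

# Hypothetical per-language distributions over packet lengths in bytes.
models = {
    "english": {20: 0.5, 38: 0.3, 62: 0.2},
    "spanish": {20: 0.2, 38: 0.3, 62: 0.5},
}
sample = {20: 0.45, 38: 0.35, 62: 0.20}
print(classify(sample, models))
```

The histogram keys here are single packet lengths; the same functions work unchanged if the keys are tuples of consecutive lengths.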

n-grams: sequences of n consecutive packet lengths, used to estimate the distribution of packet lengths.
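As a rough illustration of the n-gram idea, the sketch below slides a window of n packet lengths over a trace and normalizes the counts into an empirical distribution. The function name and the example trace are assumptions for this note, not taken from the paper.

```python
from collections import Counter

def ngram_distribution(lengths, n):
    """Empirical distribution over n-grams of consecutive packet lengths."""
    grams = [tuple(lengths[i:i + n]) for i in range(len(lengths) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}

# Hypothetical trace of encrypted packet lengths in bytes.
trace = [20, 38, 38, 20, 38, 38]
dist = ngram_distribution(trace, 3)
for gram, prob in sorted(dist.items()):
    print(gram, prob)
```

Larger n captures more context per feature but spreads the same amount of data over many more possible n-grams, which is one reason to group similar n-grams together.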

voip-2.png

Tri-grams

voip-3.png

4-grams with grouping.

Results: for 14 of the 21 languages, identification accuracy is greater than 90%; overall 21-way accuracy is 66%. Binary classification accuracy is 86.6%.
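The "almost 14-fold improvement over random guessing" quoted in the abstract follows from comparing the 21-way accuracy with a uniform guess over 21 languages:

```latex
\[
  p_{\text{random}} = \frac{1}{21} \approx 0.048,
  \qquad
  \frac{0.66}{p_{\text{random}}} = 0.66 \times 21 = 13.86 \approx 14 .
\]
```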

Mitigation: use a constant bit rate (CBR) codec for VoIP, or pad packets before encryption to a common block size.
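The effect of padding can be sketched by rounding each frame length up to a multiple of the block size and counting how many distinct lengths remain visible. The frame sizes and block sizes below are made-up values, and `padded_length` is a hypothetical helper, not part of any VoIP library.

```python
def padded_length(n: int, block: int) -> int:
    """Round a positive length n up to the next multiple of block."""
    return -(-n // block) * block  # ceiling division

# Hypothetical VBR frame sizes in bytes.
frame_sizes = [10, 20, 38, 62]

for block in (16, 64):
    padded = [padded_length(n, block) for n in frame_sizes]
    overhead = sum(padded) - sum(frame_sizes)
    print(block, padded, "distinct lengths:", len(set(padded)), "extra bytes:", overhead)
```

With these example numbers, a small block still leaves several distinguishable lengths, while a block large enough to cover every frame hides the length entirely at the cost of more bandwidth.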

Topic revision: r1 - 05 Sep 2007 - 14:14:17 - QiLiao
 