Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob?
Citation
Charles V. Wright, Lucas Ballard, Fabian Monrose, Gerald M. Masson (Johns Hopkins Univ), 16th USENIX Security Symposium, Boston, MA, 2007.
Abstract
Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Due to the potential transit of sensitive conversations over untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Explicitly, we use the length of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66% accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification (e.g., "Is this a Spanish or English conversation?") rate of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP.
Annotations
To save bandwidth, VBR (variable bit rate) codecs encode frames with fewer bits for “easy” sounds and more bits for “harder” sounds.
Different languages are encoded at different bit rates.
Packet sizes can be used as a predictor of the bit rate used to encode the corresponding frame.
Current recommendations for encrypting VoIP traffic (length-preserving stream ciphers) do not conceal the size of the messages.
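The length leak described above can be illustrated with a minimal sketch. It assumes a toy XOR keystream built from SHA-256 in counter mode, which stands in for a real length-preserving stream cipher and is not secure. The frame sizes, key, and function names are hypothetical and are not taken from the paper or from any VoIP stack.

```python
import hashlib


def keystream(key: bytes, nbytes: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only, not secure)."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])


def stream_encrypt(key: bytes, packet_index: int, payload: bytes) -> bytes:
    """XOR the payload with a per-packet keystream; output length equals input length."""
    per_packet_key = key + packet_index.to_bytes(4, "big")
    ks = keystream(per_packet_key, len(payload))
    return bytes(p ^ k for p, k in zip(payload, ks))


# Hypothetical VBR frame sizes in bytes (assumed values for illustration).
frame_sizes = [10, 20, 28, 20, 38, 10]
payloads = [bytes(size) for size in frame_sizes]

ciphertexts = [stream_encrypt(b"demo-key", i, p) for i, p in enumerate(payloads)]

# An eavesdropper cannot read the ciphertexts but still sees their lengths.
observed_sizes = [len(c) for c in ciphertexts]
print(observed_sizes)
```

The observed sizes match the plaintext frame sizes exactly, so the sequence of bit rates chosen by the codec remains visible after encryption.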
Experiment data: Oregon Graduate Institute Center for Speech Learning and Understanding corpus of telephone speech in 21 languages from 2,066 native speakers, sampled at 8 kHz.
Classifier: χ² classifier. Compute intraclass and interclass variability.
n-grams: sequences of n consecutive packet lengths, used to estimate the distribution.
Tri-grams; 4-grams with grouping.
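The n-gram idea can be sketched as follows. This is a simplified illustration, not the paper's classifier: it builds an n-gram distribution over packet lengths per language and assigns an unknown sample to the model with the smallest χ² distance. It omits the intraclass and interclass variability weighting and the grouping step. The language labels, packet-length sequences, and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[int, ...]


def ngrams(lengths: Sequence[int], n: int) -> List[NGram]:
    """Return all runs of n consecutive packet lengths."""
    return [tuple(lengths[i:i + n]) for i in range(len(lengths) - n + 1)]


def distribution(lengths: Sequence[int], n: int) -> Dict[NGram, float]:
    """Estimate the relative frequency of each n-gram in a length sequence."""
    counts = Counter(ngrams(lengths, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def chi_square_distance(p: Dict[NGram, float], q: Dict[NGram, float]) -> float:
    """Symmetric chi-square distance between two n-gram distributions."""
    distance = 0.0
    for gram in set(p) | set(q):
        a, b = p.get(gram, 0.0), q.get(gram, 0.0)
        distance += (a - b) ** 2 / (a + b)  # a + b > 0 for every gram in the union
    return distance


def classify(sample: Sequence[int], models: Dict[str, Dict[NGram, float]], n: int) -> str:
    """Pick the language whose model is closest to the sample's distribution."""
    sample_dist = distribution(sample, n)
    return min(models, key=lambda lang: chi_square_distance(sample_dist, models[lang]))


# Hypothetical training sequences of encrypted packet lengths (bytes).
training = {
    "lang_a": [20, 20, 38, 20, 20, 38, 20, 20, 38, 20],
    "lang_b": [10, 28, 10, 28, 10, 28, 10, 28, 10, 28],
}
models = {lang: distribution(seq, 3) for lang, seq in training.items()}

unknown = [20, 38, 20, 20, 38, 20, 20]
predicted = classify(unknown, models, 3)
print(predicted)
```

Larger n captures longer temporal patterns but spreads the observations over many more possible n-grams, which is one plausible reason to group symbols when moving from tri-grams to 4-grams.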
Results: for 14 of the 21 languages, identification accuracy is greater than 90%; overall 21-way accuracy is 66%. Binary classification accuracy is 86.6%.
Mitigation: use a constant bit rate (CBR) codec for VoIP, or encrypt with padding to a common block size.
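The padding mitigation can be sketched as below, assuming each payload is rounded up to a multiple of a chosen block size before encryption. The frame sizes, block sizes, and helper name are hypothetical. A real scheme would also need a way to strip the padding on receipt, which this sketch leaves out.

```python
from typing import List


def padded_length(payload_len: int, block_size: int) -> int:
    """Round a payload length up to the next multiple of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-payload_len // block_size) * block_size  # ceiling division


# Hypothetical VBR frame sizes in bytes (assumed values for illustration).
frame_sizes: List[int] = [10, 20, 28, 20, 38, 10]

# A small block size hides some variation but still leaks several distinct sizes.
coarse = [padded_length(size, 16) for size in frame_sizes]

# A block size at least as large as the biggest frame makes every packet identical in length.
uniform = [padded_length(size, 40) for size in frame_sizes]

# Extra bytes sent on the wire when every packet has the same length.
overhead_bytes = sum(uniform) - sum(frame_sizes)

print(coarse)
print(uniform)
print(overhead_bytes)
```

The trade-off is bandwidth: the closer the padded lengths come to a single value, the less an eavesdropper learns, but the more of the VBR savings are given back.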