Generative audio

From Wikipedia, the free encyclopedia
[Figure: audio curves]

Generative audio refers to the creation of audio files from databases of audio clips.[citation needed] It differs from the synthesized voices of assistants such as Apple's Siri or Amazon's Alexa, which stitch together a collection of fragments on demand.

Generative audio uses neural networks to learn the statistical properties of an audio source and then reproduce those properties.[1]
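
The learn-then-reproduce idea can be illustrated with a much simpler stand-in than a neural network. The sketch below is an assumption made for illustration, not the method described in the cited source: it fits a two-coefficient linear autoregressive model to a pure sine tone by least squares, then regenerates the tone from the fitted coefficients and two seed samples. The tone parameters and the function names `make_tone`, `fit_ar2`, and `generate` are hypothetical.

```python
import math


def make_tone(freq_hz=440.0, rate_hz=8000, n=400):
    """Stand-in 'audio source': a pure sine tone (assumed for illustration)."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n)]


def fit_ar2(x):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2].

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(x)):
        p1, p2, y = x[n - 1], x[n - 2], x[n]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y
        r2 += p2 * y
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    return a1, a2


def generate(a1, a2, seed, n):
    """Reproduce a signal from the learned coefficients and two seed samples."""
    out = list(seed)
    while len(out) < n:
        out.append(a1 * out[-1] + a2 * out[-2])
    return out[:n]


tone = make_tone()
a1, a2 = fit_ar2(tone)
generated = generate(a1, a2, tone[:2], len(tone))
```

A neural model plays the same role as `fit_ar2` here, but with far more parameters and a nonlinear predictor, which is what lets it capture sources more complex than a single tone.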

Implications

With this technology, a person's voice can be replicated to speak phrases that the person may never have spoken. A synthetic version of a public figure's voice could therefore be used against them.[2]

Technology

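Generative audio can be produced with a generative adversarial network (GAN), a deep learning technique in which two models are trained against each other: a generator produces candidate audio, and a discriminator tries to distinguish it from real recordings. The competition pushes the generator toward realistic output.[3]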
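
The adversarial training loop can be sketched on scalar values instead of audio waveforms. Everything in the example below is an assumption made for illustration: the "real" data are draws from a Gaussian, the generator is a linear map of noise, the discriminator is a single logistic unit, and the gradients are written out by hand. It shows only the alternating update pattern and makes no claim about convergence; toy GANs of this kind may oscillate.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on scalar samples."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.gauss(2.0, 0.5)  # assumed "real" data distribution
        z = rng.gauss(0.0, 1.0)     # noise fed to the generator
        fake = a * z + b

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = -(1.0 - d_real) * real + d_fake * fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(fake) (non-saturating loss),
        # using the freshly updated discriminator.
        d_fake = sigmoid(w * fake + c)
        grad_a = -(1.0 - d_fake) * w * z
        grad_b = -(1.0 - d_fake) * w
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), (w, c)


(gen_a, gen_b), (disc_w, disc_c) = train_toy_gan()
```

In an audio setting, both models would be deep networks operating on waveforms or spectrograms, and the updates would be computed by automatic differentiation instead of by hand.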

References

  1. "Fake news: you ain't seen nothing yet". The Economist. July 2017. Retrieved 2017-07-01.
  2. Zotkin, D. N.; Shamma, S. A.; Ru, P.; Duraiswami, R.; Davis, L. S. (April 2003). "Pitch and timbre manipulations using cortical representation of sound". 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03). Vol. 5. pp. V–517–20. doi:10.1109/ICASSP.2003.1200020. ISBN 978-0-7803-7663-2. S2CID 10372569.
  3. Mobin, Shariq (October 2016). "Voice Conversion using Convolutional Neural Networks". arXiv:1610.08927 [stat.ML].