This is a very interesting paper in which the authors try to bring amateur singing up to a professional level using machine learning. I see less machine learning work in the audio realm, so papers that tackle something related to sound always stand out to me. A variational autoencoder (VAE) is trained on data consisting of both amateur and professional singing. It works by encoding a recording of singing into a smaller latent representation and then expanding that representation to reproduce the original recording. During training, the pitch is fed into the encoder as an additional input, and the decoder is also given the pitch when it reconstructs the recording.
Because the pitch is fed into the autoencoder separately, it can be changed after the autoencoder has been trained. The authors can take a recording of amateur singing and feed in the correct pitches for the song the singer was attempting. This allows a recording to sound professional while largely keeping the overall sound of the original singer.
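To make the pitch-swapping idea concrete, here is a minimal sketch of a pitch-conditioned autoencoder in NumPy. It is not the authors' model: the layer sizes, the single linear layers, the random untrained weights, and the names `encode`, `sample`, and `decode` are all assumptions made for illustration. It only shows the mechanics of encoding with the sung pitch and decoding with a different one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_MEL = 8      # features per spectrogram frame
N_LATENT = 3   # size of the latent representation
N_PITCH = 1    # one pitch value per frame

# Untrained, randomly initialised linear layers (assumption: a real model
# would be a trained deep network).
W_enc = rng.normal(size=(N_MEL + N_PITCH, 2 * N_LATENT)) * 0.1
W_dec = rng.normal(size=(N_LATENT + N_PITCH, N_MEL)) * 0.1

def encode(frames, pitch):
    """Map frames plus pitch to the mean and log-variance of the latent."""
    h = np.concatenate([frames, pitch], axis=1) @ W_enc
    return h[:, :N_LATENT], h[:, N_LATENT:]

def sample(mu, log_var):
    """Reparameterisation trick used during training: z = mu + sigma * eps."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z, pitch):
    """Reconstruct frames from the latent, conditioned on a pitch value."""
    return np.concatenate([z, pitch], axis=1) @ W_dec

frames = rng.normal(size=(5, N_MEL))   # stand-in for an amateur recording
sung_pitch = np.full((5, 1), 0.9)      # pitch the singer actually produced
target_pitch = np.full((5, 1), 1.0)    # pitch the song calls for

mu, log_var = encode(frames, sung_pitch)
z_train = sample(mu, log_var)          # noisy latent, as used in training
z = mu                                 # at inference, use the mean directly

reconstruction = decode(z, sung_pitch)  # same pitch in and out
corrected = decode(z, target_pitch)     # same latent, corrected pitch
```

The latent `z` is computed once and reused, so whatever it captures about the singer carries over, and only the pitch input to the decoder changes.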
Because training amounts to reproducing the original recordings, the authors can also include some unlabeled data without pitch values. This gives the model more practice at reproducing existing songs without requiring labeled data.
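The reason no labels are needed for reconstruction is that the training target is the input itself. A rough sketch of a typical VAE objective is below, assuming squared reconstruction error and a standard normal prior on the latent. The summary does not say which loss the paper uses, and the function name `vae_loss` is hypothetical. How the missing pitch input is handled for unlabeled recordings is not covered here.

```python
import numpy as np

def vae_loss(frames, reconstruction, mu, log_var):
    """Reconstruction error plus KL divergence to a standard normal prior.

    Assumes a diagonal Gaussian latent described by mu and log_var.
    Note that the only "label" is the input frames themselves.
    """
    # Squared error between the input and its reconstruction, per example.
    recon = np.mean(np.sum((frames - reconstruction) ** 2, axis=1))
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims.
    kl = np.mean(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return recon + kl

# Tiny example with made-up values.
x = np.ones((2, 4))
zero_latent = np.zeros((2, 3))

perfect = vae_loss(x, x, zero_latent, zero_latent)
off_by_one = vae_loss(x, x + 1.0, zero_latent, zero_latent)
```

With a perfect reconstruction and a latent that matches the prior, both terms vanish. An error of 1 in each of the 4 features adds 4 to the loss.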
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Jinglin Liu and Chengxi Li and Yi Ren and Zhiying Zhu and Zhou Zhao