AI4Bharat - Fastspeech2 Model using Hybrid Segmentation (HS): Text to Speech Model

Text-to-speech models trained using FastPitch and HiFi-GAN vocoder, separately for each language. Supports both 'female' and 'male' voices.

AI4Bharat
Nikhil_Narasimhan

About Model

This repository contains a FastSpeech2 model for 16 Indian languages, with both male and female voices, implemented using Hybrid Segmentation (HS) for speech synthesis. The model generates mel-spectrograms from text input, and these can then be used to synthesize speech. FastSpeech2 (FS2) is composed of 6 feed-forward Transformer blocks with multi-head self-attention and 1D convolution on both the phoneme encoder and the mel-spectrogram decoder.
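
To make the block structure described above more concrete, here is a minimal NumPy sketch of one feed-forward Transformer block (multi-head self-attention followed by 1D convolutions, each with a residual connection and layer normalization) and a stack of 6 such blocks. It is an illustration only, not this repository's code: the names `FFTBlock`, `build_stack`, and `run_stack`, the layer sizes, the kernel sizes, and the random weights are all assumptions chosen to keep the example small, and the layer normalization omits learned scale and shift.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each frame over its feature dimension (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conv1d_same(x, w, b):
    """1D convolution over time with 'same' padding.

    x: (T, C_in), w: (K, C_in, C_out) with odd K, b: (C_out,)
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    t = x.shape[0]
    out = np.zeros((t, w.shape[2]))
    for i in range(k):
        out += xp[i:i + t] @ w[i]
    return out + b


class FFTBlock:
    """Hypothetical feed-forward Transformer block: self-attention + 1D conv."""

    def __init__(self, d_model, n_heads, d_conv, kernel_size, rng):
        assert d_model % n_heads == 0 and kernel_size % 2 == 1
        s = 0.02  # assumed init scale, for illustration only
        self.n_heads = n_heads
        self.w_qkv = rng.normal(0, s, (d_model, 3 * d_model))
        self.w_out = rng.normal(0, s, (d_model, d_model))
        self.conv1_w = rng.normal(0, s, (kernel_size, d_model, d_conv))
        self.conv1_b = np.zeros(d_conv)
        self.conv2_w = rng.normal(0, s, (1, d_conv, d_model))
        self.conv2_b = np.zeros(d_model)

    def self_attention(self, x):
        t, d = x.shape
        h = self.n_heads
        q, k, v = np.split(x @ self.w_qkv, 3, axis=-1)
        # Split features into heads: (T, d) -> (h, T, d / h)
        q, k, v = (a.reshape(t, h, d // h).transpose(1, 0, 2) for a in (q, k, v))
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d // h)  # (h, T, T)
        ctx = softmax(scores) @ v                            # (h, T, d / h)
        return ctx.transpose(1, 0, 2).reshape(t, d) @ self.w_out

    def __call__(self, x):
        # Sub-layer 1: multi-head self-attention with residual + layer norm
        x = layer_norm(x + self.self_attention(x))
        # Sub-layer 2: two 1D convolutions with ReLU, residual + layer norm
        hidden = np.maximum(conv1d_same(x, self.conv1_w, self.conv1_b), 0.0)
        return layer_norm(x + conv1d_same(hidden, self.conv2_w, self.conv2_b))


def build_stack(n_blocks=6, d_model=32, n_heads=2, d_conv=64, kernel_size=3, seed=0):
    """Build a stack of blocks; all sizes here are assumed toy values."""
    rng = np.random.default_rng(seed)
    return [FFTBlock(d_model, n_heads, d_conv, kernel_size, rng) for _ in range(n_blocks)]


def run_stack(blocks, x):
    for block in blocks:
        x = block(x)
    return x


if __name__ == "__main__":
    blocks = build_stack()
    phoneme_embeddings = np.random.default_rng(1).normal(size=(10, 32))  # 10 frames
    hidden_states = run_stack(blocks, phoneme_embeddings)
    print(hidden_states.shape)
```

Each block keeps the sequence length and feature size unchanged, which is what allows several blocks to be stacked. In a full FastSpeech2 system the stack would sit alongside components not shown here, such as the variance adaptor and the vocoder.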