Bhashini - FastSpeech2 Model using Hybrid Segmentation (HS)

Text-to-speech models trained using FastPitch and HiFi-GAN vocoder, separately for each language. Supports both 'female' and 'male' voices.

  • Digital India BHASHINI Division
  • BHASHINI_shailendra

About Model

This repository contains FastSpeech2 models for 16 Indian languages, each with male and female voices, implemented using Hybrid Segmentation (HS) for speech synthesis. The models generate mel-spectrograms from text input and can be used to synthesize speech.
FastSpeech2 (FS2) is composed of 6 feed-forward Transformer blocks with multi-head self-attention and 1D convolution on both the phoneme encoder and the mel-spectrogram decoder. In each feed-forward Transformer block, the hidden size of the multi-head attention is 256 and the number of heads is 2. The two-layer convolution network uses kernel sizes of 9 and 1, with input/output channel sizes of 256/1024 for the first layer and 1024/256 for the second. The duration predictor and the variance adaptor are each composed of a stack of convolution layers followed by a final linear projection layer. The duration predictor has 2 convolution layers and the variance adaptor has 5; in both, the kernel size is 3, the input/output size of every layer is 256/256, and the dropout rate is 0.5.
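
As a reading aid, the hyperparameters described above can be collected into a small configuration sketch. This is not taken from the repository: the class names, field names, and helper functions (FFTBlockConfig, PredictorConfig, conv1d_params, ffn_params) are hypothetical and chosen for illustration only. The parameter count covers just the two-layer convolution network in one block and assumes standard 1D convolutions with bias.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FFTBlockConfig:
    """Hypothetical container for one feed-forward Transformer block."""
    hidden_size: int = 256            # hidden size of multi-head attention
    num_heads: int = 2                # number of attention heads
    conv_filter_size: int = 1024      # channels between the two convolutions
    conv_kernel_sizes: tuple = (9, 1)  # kernel sizes of the two conv layers

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the hidden dimension.
        return self.hidden_size // self.num_heads


@dataclass(frozen=True)
class PredictorConfig:
    """Hypothetical container for a predictor's convolution stack."""
    num_layers: int
    kernel_size: int = 3
    channels: int = 256   # input/output size of every layer
    dropout: float = 0.5


def conv1d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights plus biases of a standard 1D convolution (assumes bias)."""
    return in_ch * out_ch * kernel + out_ch


def ffn_params(cfg: FFTBlockConfig) -> int:
    """Parameters in the two-layer convolution network of one block."""
    k1, k2 = cfg.conv_kernel_sizes
    first = conv1d_params(cfg.hidden_size, cfg.conv_filter_size, k1)
    second = conv1d_params(cfg.conv_filter_size, cfg.hidden_size, k2)
    return first + second


NUM_FFT_BLOCKS = 6  # as stated in the description
block = FFTBlockConfig()
duration_predictor = PredictorConfig(num_layers=2)
variance_adaptor = PredictorConfig(num_layers=5)

if __name__ == "__main__":
    print("head dimension:", block.head_dim)
    print("conv network parameters per block:", ffn_params(block))
```

With these values each attention head has dimension 128, and the 256/1024 and 1024/256 convolutions account for most of a block's weights because of the kernel size of 9 in the first layer.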

Metadata

  • MIT
  • IIT Madras
  • Speech Synthesis (TTS) Model
  • Open
  • Digital India BHASHINI Division
  • Sector Agnostic
  • 05/03/25 15:22:37
  • Admin
  • 286.72 MB

Activity Overview

  • Downloads 10
  • Views 216
  • File Size 286.72 MB

Tags

  • Multilingual
  • NLP
  • Text Processing
  • Transformer
  • Text to Speech
  • Language Detection

License Control

MIT

Version Control

Version 2 (286.72 MB)
  • admin · 1 month(s) ago
    • Fastspeech2_HS-main.zip