Research Domain Expert - AI/ML/NLP

A pioneer in AI-driven knowledge automation

Full-time | Contract

Atlanta, GA

Job description

The Role
Cantai is developing a high-fidelity DiffSinger engine. We need an end-to-end AI Audio Specialist to take raw vocal recordings and turn them into production-ready singing voice models.

You will be responsible for executing the full technical pipeline—from data strategy to final acoustic model inference—ensuring state-of-the-art output.

Core Responsibilities

End-to-End Pipeline Execution: Independently manage the full workflow: Data Prep → Training → Inference.
Advanced Data Engineering: Implement and refine automated alignment tools (e.g., Montreal Forced Aligner) to label phonemes, pitch, and duration. Note: This role requires hands-on verification of data accuracy.
Model Training: Train and fine-tune DiffSinger acoustic models and variance adaptors for specific target voices.
Vocoder Integration: Train or adapt neural vocoders (HiFi-GAN, NSF, etc.) for maximum audio fidelity.
Optimization: Troubleshoot common synthesis issues (slurring, robotic pitch, metallic artifacts) and iterate on hyperparameters to fix them.

Requirements

Proven DiffSinger Experience: You must have previously trained DiffSinger or a closely related singing voice synthesis (SVS) architecture.
Data Processing Mastery: Expert proficiency with Python, MFA (Montreal Forced Aligner), Praat, and phoneme dictionaries.
Deep Learning Stack: Strong grasp of PyTorch and GPU environment management.
Audio Domain Knowledge: You understand DSP basics (spectrograms, mel-bands, F0 estimation) and musical concepts (pitch, vibrato, timing).

Bonus Points

Experience with OpenVPI, NNSVS, or Fish Diffusion.
Background in music theory or vocal pedagogy.
Experience exporting models for production (ONNX/C++).

Job Types: Full-time, Contract

Pay: $80.00 - $150.00 per hour

Expected hours: 30 – 40 per week

Benefits:

Flexible schedule
Professional development assistance

Work Location: Remote
