Futureverse, a leading AI and metaverse technology and content company, today announced the publication of research in the advancement of music AI with the launch of JEN 1, an unprecedented universal high-fidelity model for text-to-music generation.
Despite significant advances in AI modalities such as imagery and text, generating high-fidelity, realistic music continues to present distinct and complex challenges. Through today’s publication of its research paper, Futureverse details a highly efficient approach that demonstrates higher-quality outputs than the state-of-the-art baselines previously released by Google and Meta, such as MusicLM and MusicGen.
“JEN 1: TEXT-GUIDED UNIVERSAL MUSIC GENERATION WITH OMNIDIRECTIONAL DIFFUSION MODELS,” written by Dr. Alex Wang, Patrick Li, Boyu Chen, Yao Yao, Allen Wang and Yikai Wang of Futureverse’s Altered State Machine innovation team, details the intricacy of sound. Unlike speech, which occupies a more limited spectrum, music spans a wide frequency range from low to high pitches; it therefore requires higher sampling rates (44.1kHz) than linguistic content (16kHz) to capture its subtleties, which increases the data processing requirements for training. Low-quality or degraded training audio can significantly harm the output. Additionally, the combination of multiple instruments and complex melodic arrangements results in intricate sonic structures that necessitate refined training objectives.
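As a rough illustration of the data-rate gap the paper alludes to (this arithmetic is ours, not Futureverse’s), the per-second sample counts at the two rates can be compared directly:

```python
# Illustrative arithmetic: raw samples per second at common audio rates.
MUSIC_RATE_HZ = 44_100   # CD-quality rate typical for music
SPEECH_RATE_HZ = 16_000  # rate typical for speech models

music_samples = MUSIC_RATE_HZ * 1   # samples in one second of mono music
speech_samples = SPEECH_RATE_HZ * 1  # samples in one second of mono speech

print(music_samples)                   # 44100
print(speech_samples)                  # 16000
print(music_samples / speech_samples)  # 2.75625
```

So a music model must process roughly 2.8x more raw samples per second than a 16kHz speech model, before even accounting for stereo channels, which double the figure again.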
JEN 1 is extensively evaluated against state-of-the-art baselines, both computationally on objective metrics and through human evaluations conducted by an advisory board of respected music industry A&Rs, to be announced soon. Results demonstrate that JEN 1 produces music of perceptually higher quality (85.7/100) compared to current best methods (83.8/100). Futureverse has released a variety of early audio demos showcasing JEN 1’s approach.
“Human sensitivity to musical dissonance demands high precision in music generation. We have been working deeply in this space for the last two years. We are incredibly proud to publish a first look into our team’s significant progress in the advancement of music AI that will benefit creators and progress in the music industry,” said Shara Senderoff and Aaron McDonald, Co-Founders of Futureverse.
Futureverse Co-Founder, Shara Senderoff is a staple in the music technology space. Prior to launching Futureverse, Senderoff co-founded music tech venture fund, Raised In Space, with industry titan Scooter Braun and has been a prominent thought-leader for the innovation of music copyrights, royalties and revenue streams through approaches in blockchain and web3. Shara has been featured on Billboard’s “40 Under 40,” esteemed “Women in Music,” and Rolling Stone’s “Future 25.” McDonald, whose career includes 20 years in technology as an engineer, product developer, and business leader with portfolios worth over $1B, has focused on building Futureverse to empower developers and users to create and engage with interoperable content and applications previously unavailable within the metaverse. Together, Senderoff and McDonald have been investing in the web3 and blockchain space for over six years.
The intersection of text and music, known as text-to-music generation, bridges free-form text prompts and musical compositions. The introduction of JEN 1 is only the beginning of what Futureverse is tackling, as advancement in this fast-moving space shows no sign of slowing. JEN 1 performs various generation tasks, including text-guided music generation, music inpainting, and music continuation, while generating high-fidelity 48kHz stereo audio and maintaining computational efficiency.
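To make the inpainting and continuation tasks concrete, one common way to picture them is as filling masked spans of an audio signal. The sketch below is our own simplified framing using NumPy, not Futureverse’s implementation; the clip is random placeholder data and the model call is omitted:

```python
import numpy as np

SAMPLE_RATE = 48_000  # JEN 1 outputs 48kHz stereo audio

# Hypothetical 2-second stereo clip: shape (channels, samples).
audio = np.random.randn(2, 2 * SAMPLE_RATE).astype(np.float32)

# Inpainting: mask out a 0.5s span in the middle. A generative model
# would regenerate this span conditioned on the surrounding audio
# (and, for text-guided models, a text prompt).
start, end = SAMPLE_RATE, SAMPLE_RATE + SAMPLE_RATE // 2
mask = np.ones(audio.shape[1], dtype=bool)
mask[start:end] = False

masked_audio = audio * mask  # zeros mark the span the model must fill in

# Continuation is the same idea with the mask covering the clip's tail,
# so the model extends the music beyond what already exists.
```

Framing several tasks as variants of one masking scheme is what lets a single model handle generation, inpainting, and continuation without separate architectures.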