Inside CulturaFX: Riding the Wave of Generative AI to Shape the Future of Ethical Audio
Working at a generative AI startup is an adventure filled with peaks and valleys, much like riding a rollercoaster blindfolded — you’re never quite sure of the next twist or turn. Yet, amidst the unpredictability, the moments when groundbreaking research emerges make all the challenges worthwhile.
On April 24, at Florida Gulf Coast University’s Design Day, six software engineering students — Samantha Walsh, Andrew Krupp, Ben Castro, Erick Rodriguez, Rose Meyers, and Tayler Bachmann — will present a poster showcasing their senior project.
They conducted their research in collaboration with TulipAI’s culturaFX, focusing on the ethical sourcing of audio to enhance AI’s cultural authenticity, including genres like Mariachi music. This project is an integral component of our ongoing research and development efforts.
This research underscores the critical role of diverse datasets in developing AI that authentically reproduces culturally rich audio, boosting product quality and relevance in both commercial and public media industries.
Their efforts have tackled critical gaps in AI-generated audio, such as the lack of cultural resonance, providing insights that will guide the future development of advanced AI applications. These applications are designed to authentically reflect the rich spectrum of human experiences and traditions, thereby supporting the ongoing evolution of culturaFX.
“I am a first-generation Hispanic American, born in the United States to immigrant parents; my father hails from Cuba, and my mother from Mexico,” Castro says.
“In my senior year, I had the opportunity to collaborate with TulipAI, whose cultural values and ethical standards resonated strongly with mine. I conducted research on leveraging AI models to generate mariachi music. Delving into the intricacies of Mariachi music, I aimed to develop a model capable of recognizing the instruments integral to its composition. I believed that to authentically replicate Mariachi music, the model must first master the sounds of these traditional instruments,” he adds.
Founded in June 2023, TulipAI has launched the culturaFX initiative, which is currently in its research and development phase. This initiative is dedicated to mastering the capture and ethical utilization of global sounds for generative audio AI.
Our goal is to develop culturaFX into a leading platform for AI sound creation, equipping creators with innovative tools to generate culturally rich audio content through text prompts. This platform is designed to be adaptable across various media, including podcasts, films, games, and smart devices, providing flexible and comprehensive solutions for audio content generation.
Aligned with founder Davar Ardalan’s vision of innovation and inclusivity, we are planning to open-source our technology and introduce a revenue-sharing model with audio contributors who assist in training our model. This strategy is designed to enhance the platform’s authenticity and competitiveness in the market.
Projected to launch in late 2025 as a B2B service, with plans to expand to B2C later, culturaFX aims to satisfy the escalating demand for genuine and ethically sourced content. By collaborating with cultural experts, the platform is designed to ensure authenticity and integrity — crucial elements for success in the rapidly expanding generative AI content creation market, which is expected to swell from USD 15.2 billion in 2024 to USD 175.3 billion by 2033.
While TulipAI strives to pioneer ethical sourcing of audio, it is crucial to acknowledge the reality that major industry players such as Amazon, Spotify, Google, and Meta have already established significant footholds by training their audio models on extensive datasets.
Specifically, Google researchers highlighted in a January 2023 paper the details of their MusicLM model, which was trained on approximately 280,000 hours of music from the Free Music Archive.
They noted that the model’s generated samples might reflect biases present in the training data, raising critical questions about the appropriateness of using such models for music generation related to cultures underrepresented in the data and concerns about cultural appropriation and misappropriation of creative content.
As AI systems like Google’s MusicLM reveal, through findings by their own researchers, that biases in training datasets can lead to cultural misrepresentation and appropriation, industries must prioritize the development of diverse and inclusive audio datasets to mitigate these risks and to create sound that is truly moving and rich.
Guided by our TulipAI advisors and mentors, we recognize the challenges posed by biases in AI training datasets. Addressing these challenges requires strategic partnerships with organizations that possess extensive sound libraries. This strategy not only solidifies our commitment to cultural authenticity but also enhances our product offerings, enabling us to deliver meaningful content in a competitive market.
This cooperative approach provides mutual benefits, allowing our partners to improve their offerings and culturaFX to evolve, anticipating and meeting diverse user needs in an ethical, growth-oriented ecosystem.
TulipAI’s founder, Davar Ardalan, brings a rich background in blending AI with creative content generation. Her experience spans over five years in the AI sector, creating AI-driven content solutions for academia and virtual assistants for podcasts. Ardalan’s prior leadership roles at National Geographic, NPR News, and as a White House Presidential Innovation Fellow highlight her commitment to AI initiatives.
Despite her vast experience, she recognizes a commonly overlooked aspect in the realm of ethical data practices, a focus she emphasizes in her startup work. Her research on Cultural AI, which was featured in the New York Times in March 2022, highlights the role of AI in preserving cultural heritage. This commitment to ethical and culturally sensitive AI development has resonated strongly with leaders in the field, including Dr. Rafael Pérez y Pérez, a prominent AI scientist from Mexico.
In a heartfelt note to the TulipAI team, Dr. Pérez y Pérez expressed his enthusiasm for their research on mariachi music — a subject of deep personal and cultural importance to him, as his aunt was María de Lourdes Pérez López, a renowned Mexican traditional singer. “The next time you visit Mexico, you must look for my aunt’s statue in Plaza Garibaldi,” he wrote, emphasizing the significance of mariachi music in his life and the crucial role of their project in preserving such cultural expressions.
As the field of generative AI grows, acknowledging these efforts adds an invaluable layer of authenticity and support, enriching our research as we explore the potential to honor and extend rich musical traditions.
“All of you have an important challenge before you,” says Pérez y Pérez. “As we can see in the material that you shared with me, it is not easy to reproduce good mariachi music. Definitely, I share with you the importance of preserving the authenticity of diverse cultural expressions around the world. Particularly, nowadays.”
Ardalan, who also judges the Webby Awards in the AI category, is co-authoring “AI and Community,” due for 2025 publication with Taylor & Francis. She recently completed “The Sound of AI: AI Leadership Blueprint Course,” further honing her skills in agile AI project management to ensure AI integration is responsible and culturally informed.
The startup also benefited from the insights of Reza Moradinezhad, Founding AI Scientist from Drexel University, and Aidan Singh, Machine Learning Scientist from Cornell Tech, who advance ethical AI practices by integrating advanced technology with cultural insights.
While TulipAI is in a period of transition, the foundation laid by industry leader Andy LaMora, strategic advisor Christine Johnson, and contributors from the AI Audio Council such as Scott Dudley, Hansdale Hsu, Javier Perez, and Maria Codino, remains a vital part of its journey. Their expertise has helped shape the evolution of culturaFX, underscoring TulipAI’s ongoing commitment to ethical AI development.
For those interested in a deeper dive into the technical aspects and detailed methodologies of our project, the remainder of this post will provide an extensive exploration. If you’re looking for broader insights, feel free to consider the conclusions already presented as a comprehensive summary.
FGCU Collaboration Fuels the Evolution of culturaFX
In November 2023, TulipAI enhanced its research capabilities by partnering with senior software engineering students from FGCU, led by Dr. Fernando Gonzalez, Chair of FGCU’s Computing and Software Engineering Department. This collaboration provided the students with a senior project diving into generative AI technology, enriching their education and preparing them to thrive in the evolving tech landscape.
“Mrs. Ardalan sought expertise for her project, culturaFX, and approached our university’s Software Engineering capstone program. She presented her vision to the senior students, attracting a dedicated team despite competition from numerous organizations seeking student collaborators,” explained Dr. Gonzalez.
The TulipAI team, with its roots in diverse fields such as National Geographic, NPR News, and Trustworthy AI in academia, brings a wealth of experience in AI, audio, and culture.
With culturaFX, TulipAI leveraged advanced tools such as Meta’s AudioGen and MusicGen generative audio models and the Hugging Face ecosystem. During this collaboration, the students improved their skills through an iterative design process, working across disciplines with AI scientists, machine learning engineers, content creators, sound engineers, composers, and a cultural anthropologist. They identified key gaps in publicly available cultural audio datasets and AI audio models, which drove further research to enhance culturaFX’s capabilities.
Samantha Walsh, the product manager, played a crucial role in coordinating between stakeholders and the FGCU team. Her management of tasks and timelines facilitated smooth project progress, while her proactive participation in regular meetings to address concerns and gather feedback was vital for moving the project forward.
“Working as a product manager at TulipAI has been incredibly rewarding. Navigating through challenges, managing priorities, and meeting stringent deadlines alongside a collaborative team has been enriching,” reflected Samantha Walsh.
Davar Ardalan commended the collaborative effort and shared future plans: “The research and development period working with the students has been outstanding. We are planning to transition culturaFX to an open-source model by next year, which is essential for incorporating global perspectives and achieving a culturally rich sound in this AI era.”
The Florida Gulf Coast University (FGCU) students involved in this project are the future developers, researchers, and scientists who will be building software and designing AI systems. Their participation has provided them with invaluable lessons in iterative development, user-centered design, and ethical AI, preparing them to lead in the tech industry.
In early Fall, Ardalan teamed up with Vanessa Terry, a business cultural anthropologist, to develop a new taxonomy for culturally relevant AI-generated audio. This initiative aimed to fill a noticeable gap in current research by exploring cultural and environmental subtleties within various audio applications.
During this project, Terry and the TulipAI team conducted multiple workshops and strategy sessions to refine their method of identifying and documenting cultural elements in audio clips, such as language, music, and family interactions. This led to a custom taxonomy that organizes dataset collection and improves accuracy in processing and replicating culturally rich audio content.
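To make the idea of such a taxonomy concrete, here is a minimal sketch of how cultural annotations for a single audio clip might be represented in code. The field names (category, region, instruments, and so on) are illustrative assumptions, not TulipAI’s actual schema.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class CulturalAudioAnnotation:
    """Illustrative record for one clip in a cultural-audio taxonomy.

    Field names are hypothetical; the real culturaFX taxonomy may differ.
    """
    clip_id: str
    source_url: str
    license: str                      # e.g. "CC BY 4.0"
    category: str                     # e.g. "Market Melodies", "Festival Vibes"
    region: str                       # e.g. "Jalisco, Mexico"
    languages: list[str] = field(default_factory=list)
    instruments: list[str] = field(default_factory=list)
    social_context: list[str] = field(default_factory=list)  # e.g. "family interaction"
    notes: str = ""

example = CulturalAudioAnnotation(
    clip_id="fs-000123",
    source_url="https://freesound.org/s/000123/",   # placeholder URL
    license="CC BY 4.0",
    category="Festival Vibes",
    region="Guadalajara, Mexico",
    languages=["Spanish"],
    instruments=["vihuela", "guitarrón", "trumpet"],
    social_context=["street festival", "crowd singing"],
    notes="Ambient recording; mariachi ensemble audible in background.",
)

# Print the annotation as JSON, the kind of record a dataset catalog could store.
print(json.dumps(asdict(example), indent=2, ensure_ascii=False))
```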
Following this work, in December 2023 TulipAI initiated an audio dataset project focusing on two specific cultural sound categories: “Market Melodies” and “Festival Vibes.” This project, involving team members Ben Castro, Tayler Bachmann, Rose Meyers, and product manager Samantha Walsh, aimed to create a representative audio dataset from these themes.
The team gathered ambient sounds from lively markets in Mexico, China, and India, and from various cultural festivals. All audio collected was appropriately licensed under Creative Commons through freesound.org. After collection, the audio files were downloaded, stored, and systematically cataloged in TulipAI’s Freesound Audio Training Dataset Spreadsheet with extensive descriptions, metadata, tags, and URLs.
The project placed a strong emphasis on collaboration and sharing, requiring regular updates and the use of a shared workspace to enhance team communication and feedback. The deliverables include a sample audio dataset of market and festival sounds along with completed datasheets for each file.
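As an illustration of how such a catalog can be assembled, the sketch below queries Freesound’s public API (v2) for market sounds and writes each result’s description, tags, license, and URL to a CSV so licensing can be verified per clip. It assumes you have a Freesound API key; the endpoint and field names follow the public API documentation as we understand it, while the query terms and output columns are our own illustrative choices, not TulipAI’s actual pipeline.

```python
import csv
import requests

API_KEY = "YOUR_FREESOUND_API_KEY"  # assumption: a key obtained from freesound.org
SEARCH_URL = "https://freesound.org/apiv2/search/text/"

def search_sounds(query: str, page_size: int = 25) -> list[dict]:
    """Return one page of Freesound text-search results for a query."""
    params = {
        "query": query,
        "fields": "id,name,description,tags,license,url,duration",
        "page_size": page_size,
        "token": API_KEY,
    }
    response = requests.get(SEARCH_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("results", [])

def write_catalog(rows: list[dict], path: str) -> None:
    """Write the results to a CSV catalog, one row per sound."""
    columns = ["id", "name", "duration", "license", "url", "tags", "description"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for row in rows:
            writer.writerow({
                "id": row["id"],
                "name": row["name"],
                "duration": row.get("duration", ""),
                "license": row["license"],
                "url": row["url"],
                "tags": ";".join(row.get("tags", [])),
                "description": (row.get("description") or "").replace("\n", " "),
            })

if __name__ == "__main__":
    results = search_sounds("mexican market ambience")
    write_catalog(results, "market_melodies_catalog.csv")
    print(f"Cataloged {len(results)} sounds.")
```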
During the Fall and Spring semesters, FGCU software engineering student Ben Castro focused on capturing the lively atmospheres of Mariachi festivals, using Meta’s AudioGen model to explore traditional Mariachi instruments like the Vihuela, Guitarrón, and Trumpet. Two generative audio models were tested: AudioGen, which excels at creating sound effects, and MusicGen, which is better suited to musical sounds.
This summary outlines Castro’s findings from experiments using both AudioGen and MusicGen, aiming to improve the generation of culturally accurate Mariachi music by focusing on the AI’s ability to authentically replicate the sound of a key Mariachi instrument, the Vihuela.
The Approach:
Ben Castro’s research began with the intention of teaching an AI model to recognize and accurately generate the sound of the Vihuela, a fundamental instrument in Mariachi music. The premise was that authentically understanding and replicating its instruments is essential for generating true Mariachi music. Here we will focus on the Vihuela.
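A minimal sketch of how such prompt experiments can be run with Meta’s AudioCraft library is shown below. The model names and API calls follow AudioCraft’s published interface as we understand it; the prompts mirror the experiment labels described in the findings, while the model sizes and clip durations are arbitrary illustrative choices, not the exact settings used in this research.

```python
# pip install audiocraft  (Meta's AudioCraft; also requires ffmpeg)
from audiocraft.models import AudioGen, MusicGen
from audiocraft.data.audio import audio_write

PROMPTS = [
    "vihuela playing",
    "vihuela playing, solo plucked strings, no other instruments",
    "vihuela playing mariachi music",
]

def generate_and_save(model, prompts, prefix: str) -> None:
    """Generate one clip per prompt and write each to disk as a WAV file."""
    wavs = model.generate(prompts)  # tensor of shape [batch, channels, samples]
    for idx, wav in enumerate(wavs):
        audio_write(f"{prefix}_{idx}", wav.cpu(), model.sample_rate, strategy="loudness")

# Sound-effect-oriented model.
audiogen = AudioGen.get_pretrained("facebook/audiogen-medium")
audiogen.set_generation_params(duration=5)
generate_and_save(audiogen, PROMPTS, "audiogen_vihuela")

# Music-oriented model.
musicgen = MusicGen.get_pretrained("facebook/musicgen-small")
musicgen.set_generation_params(duration=8)
generate_and_save(musicgen, PROMPTS, "musicgen_vihuela")
```

Listening back to the generated clips and comparing them against reference recordings of the Vihuela is what produced the qualitative findings summarized next.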
Findings from AudioGen Experiments:
Vihuela Playing: The model produced a mix of sounds where string-like tones were noticeable but not distinctly identifiable as a Vihuela. Other instruments such as violins or cellos were also suggested in the audio mix.
Instrument Focused Vihuela Playing: Outputs included a variety of instruments creating a Caribbean-like rhythmic foundation with some string elements, but the Vihuela was not clearly discernible.
Vihuela Playing Mariachi Music: The audio included faint guitar echoes and a noticeable drum presence, but the Vihuela was again indistinct.
Findings from MusicGen Experiments:
Vihuela Playing: The output inaccurately replicated the Vihuela, producing sounds more akin to a violin or other bowed string instruments rather than the plucked string sound typical of the Vihuela.
Instrument Focused Vihuela Playing: The model failed to produce relevant sounds, generating futuristic and unrelated audio outputs instead.
Vihuela Playing Mariachi Music: Generated audio was repetitive and string-like but did not authentically represent the Vihuela or Mariachi music.
The experiments highlighted significant challenges in getting AI models to accurately recognize and replicate the unique sounds of the Vihuela. While the models could generate string-like sounds, they consistently failed to capture the distinct characteristics of the Vihuela necessary for authentic Mariachi music. The research suggests that further model training and refinement are needed, focusing more on the nuances of Mariachi instrumentation.
Overall Outcome
Overall, MusicGen exhibits mixed success in generating audio clips that accurately represent different musical instruments and styles. While it manages to produce clear audio for instruments like the guitarrón and guitar, it often struggles to capture the distinctive sounds and nuances of instruments such as the vihuela and trumpet. Additionally, its ability to replicate specific musical styles, like mariachi music, is limited, with generated clips often lacking the depth and authenticity needed to fully embody the genre.
Despite its capabilities in generating instrument-like sounds, MusicGen falls short in capturing the richness and complexity of musical expression across various contexts, suggesting a need for further refinement and improvement in its algorithms.
Comparison Between AudioGen and MusicGen
In comparing the performance of AudioGen and MusicGen in generating audio representations of various instruments and musical styles, distinct patterns emerge. AudioGen exhibits a more consistent ability to recognize and reproduce the sounds of certain instruments like the trumpet and violin, often capturing their distinctive timbres and characteristics with relative accuracy.
However, challenges persist in controlling the output and maintaining coherence, as random sounds frequently overshadow the intended musical elements. On the other hand, MusicGen demonstrates a mixed performance across different instruments, with varying degrees of success in accurately representing their sounds.
While it may produce clear audio for some instruments like the guitarrón, it often struggles with others, such as the vihuela and trumpet, generating sounds that deviate from their expected qualities. Both models encounter difficulties in replicating specific musical styles, like mariachi music, often failing to capture the depth and authenticity needed to fully embody the genre.
These findings underscore both the potential and limitations of AI in achieving authentic cultural fidelity in audio, signaling the need for continuous improvements to more accurately replicate the intricate subtleties of Mariachi music. These efforts aim to transform audio content creation and consumption, ensuring each sound authentically represents the vast diversity of human experience.
Reza Moradinezhad, TulipAI’s Founding AI Scientist, an expert in Trustworthy AI, and an Associate Professor at Drexel University, also emphasized the project’s significance: “CulturaFX is a testament to the power of merging AI with cultural understanding. The dedication of the FGCU team to researching an AI product that is both technologically advanced and culturally sensitive is groundbreaking.”
By February 2024, in collaboration with the TulipAI Team, the FGCU team achieved a milestone with the creation of TulipAI’s CulturaFX clickable design prototype. This prototype exemplifies what happens when AI innovation meets audio creativity, serving as a model for a future intuitive and culturally rich platform that enhances user engagement with AI-generated soundscapes.
The aesthetic and functionality of CulturaFX were shaped by the vision of Founder Davar Ardalan, UI/UX Designer Clinton Murphy, and FGCU students, including Front End Developer Erick Rodriguez. Erick explored ways to integrate TulipAI’s branding into CulturaFX, striving to create a platform that is not only functional but also visually and culturally appealing.
A significant challenge the team faced was ensuring the platform’s accessibility to all users. Careful consideration was given to how the use of certain colors might affect individuals with visual impairments. By incorporating inclusive design principles, CulturaFX aims to be accessible to a diverse range of users, thus enhancing its impact.
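As one concrete example of the kind of accessibility check this involves, the sketch below computes the WCAG 2.1 contrast ratio between a foreground and a background color. The colors shown are placeholders, not CulturaFX’s actual palette.

```python
def _channel(value: int) -> float:
    """Convert an 8-bit sRGB channel to linear light per WCAG 2.1."""
    c = value / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, per the WCAG definition."""
    r, g, b = (_channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors (1.0 to 21.0)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Placeholder brand colors: dark red text on an off-white background.
ratio = contrast_ratio((120, 20, 40), (250, 248, 245))
print(f"Contrast ratio: {ratio:.2f} (WCAG AA requires >= 4.5 for normal text)")
```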
Navigating Investment Challenges to Support Ethical Data Sourcing
Throughout this collaboration, the students were fully immersed in the dynamic startup culture, experiencing firsthand the highs of AI research breakthroughs and the typical challenges of startup life.
These ranged from resource allocation and strategic adjustments in response to new partnerships to development delays and communication issues exacerbated by relying solely on virtual tools like Slack. Each challenge not only tested their problem-solving skills and adaptability but also deepened their commitment to the project’s success.
Additionally, the team faced unexpected shifts in partnership dynamics, notably with Google Cloud Startup and Zazmic, as well as minimal support from Hugging Face. These challenges provided invaluable lessons and significantly impacted the project’s timeline, offering the team profound insights into the complexities of developing cutting-edge technology in a rapidly evolving field.
As culturaFX progresses in the competitive generative AI market, securing investment remains a pivotal challenge, particularly due to its commitment to ethically sourced content. The initiative to develop an AI Sound Creation Hub that emphasizes cultural authenticity and ethical practices has introduced unique financing hurdles. TulipAI has faced several setbacks, including:
Google News Initiative Pre-Launch Lab Rejection: Despite a compelling proposal to launch an ethically sourced audio platform in September 2023, TulipAI was not selected, which underscored the difficulty of penetrating well-established funding avenues with a strong ethical focus.
NSF Proposal Decline: The National Science Foundation declined a proposal for the “SBIR Phase I: Soundscapes” project. Although the commitment to cultural diversity and ethical sourcing was recognized, the proposal was criticized for its lack of competitiveness and heavy reliance on third-party technologies.
These rejections highlight the intricacies of funding innovative projects within the realm of generative AI that prioritize ethical data sourcing. They also mirror a broader market trend that favors short-term gains over investments in sustainable and socially responsible AI solutions. Despite initial investor interest, the for-profit model of culturaFX often sees a decline in enthusiasm once the long-term nature of the ethical commitments is fully understood.
To address investment challenges, TulipAI is focusing on the long-term benefits of ethical AI practices. The company collaborates with partners and investors who value advanced AI systems that enhance content with cultural depth. By promoting a globally aware ecosystem, TulipAI aims to foster a market that prioritizes ethical growth and sustainable practices, thereby attracting investments that emphasize long-term value.
Breakthroughs in AI Audio Research
CulturaFX is advancing AI technology to better capture and reflect global cultural sounds. Aidan Singh, a Machine Learning Engineer and Masters student at Cornell Tech, is central to these enhancements. His work involves refining AI models to better handle diverse cultural audio data, aiming to ensure that the technology respects and amplifies cultural authenticity.
Under Singh and Moradinezhad’s guidance, the team is exploring new data collection methods and fine-tuning techniques. These efforts are crucial for accurately replicating complex sounds, such as those in Mariachi music. Singh’s academic background enriches his practical work, enabling him to bring the latest research into play. For instance, his insights led to adopting Meta’s Audiocraft for further development and customization of the project.
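To illustrate one piece of that data-preparation pipeline, the sketch below builds a JSON-lines manifest of audio clips with their durations and sample rates, the general shape of dataset descriptors used when training or fine-tuning audio models such as those in AudioCraft. The exact field names and directory layout are assumptions and should be checked against the training framework’s documentation.

```python
import json
from pathlib import Path

import soundfile as sf  # pip install soundfile

def build_manifest(audio_dir: str, manifest_path: str) -> int:
    """Scan a directory of WAV files and write one JSON object per line."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path in sorted(Path(audio_dir).glob("*.wav")):
            info = sf.info(str(wav_path))
            entry = {
                # Field names are illustrative; verify against the framework's docs.
                "path": str(wav_path),
                "duration": info.frames / info.samplerate,
                "sample_rate": info.samplerate,
                "channels": info.channels,
            }
            out.write(json.dumps(entry) + "\n")
            count += 1
    return count

if __name__ == "__main__":
    n = build_manifest("data/mariachi_clips", "data/mariachi_manifest.jsonl")
    print(f"Wrote {n} manifest entries.")
```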
Singh has also engaged with the AI Audio Council to refine culturaFX’s approach, ensuring it aligns with sound designers’ needs. This includes identifying gaps in datasets and improving the platform’s marketability by focusing on unique, culturally specific sounds.
In collaboration with Florida Gulf Coast University students, he has used sound heuristics to guide the sourcing of culturally rich audio from sites like freesound.org, and authored scripts for downloading public domain sounds from archive.gov.
Looking forward, TulipAI plans to enhance its models using prompt engineering to address specific cultural audio needs, and potentially use these insights to secure further funding and develop a proof of concept that meets the needs of its users.
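As a simple illustration of what prompt engineering for culturally specific audio can look like, the sketch below assembles text-to-audio prompts from structured fields such as genre, instruments, and setting. The template and vocabulary are illustrative assumptions, not a TulipAI specification.

```python
def build_audio_prompt(genre: str, instruments: list[str], setting: str, mood: str = "") -> str:
    """Compose a text-to-audio prompt from structured cultural descriptors."""
    parts = [
        f"{genre} performed on {', '.join(instruments)}",
        f"recorded in a {setting}",
    ]
    if mood:
        parts.append(f"{mood} mood")
    return ", ".join(parts)

prompt = build_audio_prompt(
    genre="mariachi music",
    instruments=["vihuela", "guitarrón", "two trumpets"],
    setting="lively outdoor plaza",
    mood="festive",
)
print(prompt)
# -> mariachi music performed on vihuela, guitarrón, two trumpets,
#    recorded in a lively outdoor plaza, festive mood
```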
Positioning TulipAI for Long-Term Success
AI systems such as Google’s MusicLM highlight the risk of cultural misrepresentation due to biases in training datasets. To counter this, it’s crucial for industries to develop diverse and inclusive audio datasets. This not only prevents cultural appropriation but also enriches the sound quality.
Stakeholders must understand that a strategic approach in development, enriched by user feedback and real-world trials, enhances both product reliability and relevance. Collaborating with technology partners and sound library companies can amplify these benefits, leveraging shared insights to refine sound technologies.
This strategic approach is essential for building a resilient and flexible growth economy in the rapidly evolving tech world, enriched by partnerships and a commitment to ethical AI practices.
Lessons for Industries Utilizing Generative AI Audio
TulipAI’s culturaFX project highlights key lessons for industries reliant on culturally rich audio, such as public media, podcast companies, gaming, and smart device manufacturers.
Ethical considerations are paramount in AI audio usage, as biases in training datasets can lead to cultural misrepresentation. To combat this, it’s crucial to develop diverse and inclusive audio datasets that authentically represent global cultures.
Moreover, engaging in new forms of collaboration, especially with entities that maintain ethically-sourced sound libraries, not only strengthens cultural sensitivity but also expands market reach by appealing to a diverse audience.
Empowering Cultural Communities Through Generative AI Audio
For cultural communities with a rich heritage of sound, the advent of generative AI audio offers both challenges and significant opportunities. These communities must be actively involved in the creation and management of AI technologies that utilize their cultural sounds, ensuring accurate and respectful representation.
There’s also a vital opportunity for these communities to establish transparent, ethical practices in AI development. By forming partnerships with ethically-minded organizations like TulipAI, communities can protect their intellectual and cultural properties, potentially creating their own LLMs to generate revenue.
Such proactive engagement not only safeguards cultural heritage but also ensures it enhances the technological landscape positively.
Advancing Ethical Practices for Sound Designers and Producers
The evolution of AI in audio production poses unique challenges and opportunities for sound designers and producers. Addressing potential cultural biases is critical, necessitating the creation of diverse audio datasets in collaboration with cultural communities.
These efforts ensure sound authenticity and involve communities in revenue-sharing from data use. Furthermore, forming partnerships provides sound professionals with advanced tools to explore new creative territories while respecting cultural integrity.
By promoting an environment where technology amplifies creativity without compromising ethics, sound designers and producers can help shape a future where AI not only replicates but enriches human creativity in an inclusive manner.
This content was created with the help of artificial intelligence, which assisted in organizing the narrative, checking grammar, and summarizing key information to improve clarity and flow. Additionally, the title of the blog post has been updated, which explains why the URL differs from the title.