Taukeer Alam, while speaking his language, Van Gujjari. Photo by Subhashish Panigrahi in CC BY-SA 4.0.
This post is part of Global Voices’ April 2026 Spotlight series, “Human perspectives on AI.” This series will offer insight into how AI is being used in global majority countries, how its use and implementation are affecting individual communities, what this AI experiment might mean for future generations, and more.
Because the majority of the world’s languages are primarily spoken rather than written, their speakers’ knowledge is also oral. Yet Wikipedia and most publications require written citations.
The OpenSpeaks Archives, a digital language archive started in 2024, helps Wikimedians cite Indigenous oral knowledge. This project’s technical and educational infrastructure helps community-based language archivists document, transcribe, and archive their languages. It now hosts nearly 20 languages from India, Nepal, and Sri Lanka.
This interview series captures the stories of some of the OpenSpeaks Archives collaborators. Subhashish Panigrahi, on behalf of Rising Voices, interviewed Taukeer Alam, an Indian conservationist and a speaker of the Van Gujjari language, via voice call. Van Gujjari is a vulnerable indigenous language spoken by the Van Gujjar, a nomadic Muslim community living mostly in the northern Indian state of Uttarakhand. The following video interview with Taukeer Alam by Subhashish Panigrahi was conducted for the documentary “MarginalizedAadhaar” on OpenSpeaks Archives. It is available under the Creative Commons license BY-SA 4.0.
Rising Voices (RV): How do you see books, audio, and video for documenting your language?
Taukeer Alam (TA): Audio and video are the best mediums for documenting Van Gujjari because they capture our emotions through voice, expressions, and body language. These are very hard to capture in a book. There is another difficulty with books, especially for Indigenous languages with their own sound patterns and ways of speaking. We cannot write them fully as they are: the exact way a word is pronounced, or how the sounds are released.
A word in Hindi, Van Gujjari or Punjabi might be spelled the same way, but the tone, the feeling, and the rhythm can be completely different. Audio and video show these differences clearly and feel more “pure”. In a book, any reader from any region will read the word in their own way, with their own intonation. They will not understand how that particular word is spoken in our community, how high or low the tone goes, what expression accompanies it, or what emotional weight it carries. A book is simply not as ‘alive’ as audio and video, which capture many layers of meaning beyond the words.
Rising Voices (RV): For different generations in your community, what works better: books or audio/video?
TA: Before first-generation college learners like myself, there was no literacy in our community. Literacy did not exist for a long time, as our people were restricted from migrating freely, relocated, or, as nomadic tribes, settled in new places. In such a situation, written material does not do very much; audio and video work better for all age groups, as people cannot read. With literacy increasing among children, written material is beginning to matter. If the small stories, short narratives, or interview excerpts we collect from the community are turned into books and used to educate children, they will become very valuable. It will help children’s contextual understanding if the knowledge comes from their own roots. It connects their education to their lived reality.
For older people, audio and video always work better, because they are not going to learn to read at this stage of life. If I convert my interview with a 60- or 70-year-old into a book and give it to a 40-year-old who cannot read, the book is of no use to them. For them, it will only work in interview form, whether audio or video. So different forms are needed for different generations; at least in my community, that is how I see it at the moment.
RV: What do you recommend for documented material accessibility?
TA: First of all, knowledge and information should be accessible, whatever form it is in. We have to identify which mediums the community already uses and then make the material available there. If they use YouTube a lot, then interviews should be available on YouTube. If they are more inclined towards reading books, then the same content should be accessible through books.
Secondly, we need to share the documentation with the community as quickly as possible. The sooner we go to the community, record, transcribe, translate, and share the material back, the more likely we are to preserve the quality of the knowledge. For example, I transcribed some folk songs recently. When I tried to translate them, I realized that for some songs, we no longer know the real meaning. People are still singing them, but those who truly understood the meaning have already passed away. Now others are singing just by hearing them, without knowing the literal meaning, the emotions behind the song, why it was sung, or in what situations it was created or performed. The full knowledge behind the song is gone.
So two things are vital: the material should be accessible in the language and format people use, and documentation of endangered languages should happen as early as possible. The longer we wait, the more the quality of the knowledge degrades, until it vanishes.
RV: How should a language like Van Gujjari be documented, and community documentation capacity be built?
TA: Language documentation must be participatory, involving the community’s youth, who represent the community’s future. They must be kept in the loop and shown the process, the methodology, what to pay attention to, and how we frame questions. If they learn this, they can later carry on the work themselves.
Secondly, the material must be accessible to the community. If it is documented but remains far away from them, then its benefit is limited. They should be able to see, hear, and use it.
Third and last, we should use reasonably good-quality equipment. If we are interviewing a 70- or 80-year-old, we do not know if we will ever again meet a person that old with that knowledge. The data we capture from them is very precious; it should be of good quality so that it lasts and remains useful.
RV: Do you worry about community data misuse or exploitation through AI?
TA: Yes, I do. I worry about what will happen when community knowledge, which is very deep, is published. Before sharing it anywhere, we have to think carefully. We may document and record it with good intentions, but what happens to it later? How much right will the community retain over it?
Medicinal knowledge, rituals, and other practices are considered private or are seen as the community’s collective asset. People have lived with them, evolved them, and learned from them over thousands of years. The fear of such unique and sensitive knowledge being marketed or abused persists.
Ideally, before making such knowledge public, some kind of protection should be in place, clearly recognizing it as community knowledge. The publication should acknowledge the community’s rights, and proper consent and recognition should be needed before the content enters another domain. Then anyone using it would have to say, ‘This comes from the Van Gujjar community,’ and respect that.
Even if someone does not acknowledge it, we could still claim later that we documented and published it in, say, 2024. So it is ours and we have rights over it. Others should use it only according to the permissions given by the community. So I have no objection to documentation itself; the main questions are how and where the material is stored, and how much control and rights the community retains over it.
The interview with Alam resulted in “Maari Jaban Maari Birsa” (meaning “our language, our culture”), a language documentation project at OpenSpeaks Archives in 2024.




