Developing Audiobooks for Tech Topics

My history…

As a developer, I often struggled to find time to learn text-to-speech (TTS) technology, but I was determined to do it. I started with a simple project to get some hands-on experience and learned a bit of the theory along the way. However, I faced many challenges, especially with GPU resources on my computer. Despite the difficulties, I learned a lot, and I’m excited to share my experiences with you. Here is my project.

Challenge 1: Handling the TTS model

One of the biggest challenges I faced was experimenting with different models, often late into the night. I installed the CUDA SDK and spent hours waiting for packages to install, only to encounter errors that would break the setup. I even tried using two different computers to resolve the issues. Two years ago, I spent a lot of time playing video games and had an NVIDIA GPU, but I hadn’t installed the necessary SDK at that time.

TTS models

Challenge 2: PyTorch problems

If you’ve read my previous posts, you’ll know I worked on a project using PyTorch. At the time, I hadn’t fully grasped the basics, and I ran into some challenges with the model I was using, facebook/mms-tts-spa. One major issue was with the SciPy library: my audio data was in tensor format, and I wasn’t sure how to convert a tensor back to audio. After some research, I discovered another library, Torchaudio, which turned out to be more suitable for my needs. I learned the parameters it required, but then faced another challenge: I didn’t have an audio backend installed, so I set up PySoundFile. The tensor I was working with was 2D, and I made a small mistake by converting it to 1D. Luckily, I realized my error and fixed it in just 2 minutes.
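To make the shape issue concrete, here is a minimal sketch, not the exact code from my project. It assumes that torchaudio.save wants a 2D tensor laid out as (channels, frames), which is its default channels-first layout. The helper names as_channels_first and save_wav are hypothetical, and the 16000 Hz sample rate and 440 Hz test tone are placeholders rather than values taken from the model.

```python
import math

import torch


def as_channels_first(waveform: torch.Tensor) -> torch.Tensor:
    """Return a 2D (channels, frames) float tensor on the CPU.

    Hypothetical helper: it undoes the "flattened to 1D" mistake by
    adding the channel dimension back when it is missing.
    """
    waveform = waveform.detach().cpu().float()
    if waveform.dim() == 1:
        # (frames,) -> (1, frames)
        waveform = waveform.unsqueeze(0)
    if waveform.dim() != 2:
        raise ValueError(
            f"expected a 1D or 2D tensor, got shape {tuple(waveform.shape)}"
        )
    return waveform


def save_wav(path: str, waveform: torch.Tensor, sample_rate: int) -> None:
    """Write the tensor to an audio file with torchaudio."""
    # Imported here so the shape helper above also works without torchaudio.
    import torchaudio

    torchaudio.save(path, as_channels_first(waveform), sample_rate)


if __name__ == "__main__":
    sample_rate = 16000  # assumed value; use the rate your model reports
    t = torch.arange(sample_rate) / sample_rate
    tone = 0.3 * torch.sin(2 * math.pi * 440 * t)  # one second, 1D on purpose
    save_wav("tone.wav", tone, sample_rate)
```

With a real model you would pass its output tensor and its own sample rate instead of the test tone. Saving still needs an audio backend that torchaudio can use, which is why I had to install one.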

I took this course

Challenge 3: PDF rendering

Five months ago, I encountered a problem with rendering PDFs in web browsers on Android devices. I remembered a workaround using the UART parse method, but it wasn’t the best solution. I wanted to find a better approach, so I tried Mozilla’s PDF.js and spent many hours on it. Unfortunately, it had issues on some devices. Although I really wanted to contribute, and there’s still a lot of work to be done there, I eventually had to give up on PDF.js and use another technique instead.

HTML Basics

Link to my project: techaudio

Off-topic

Lastly, if you have a dog, please remember to take good care of them.🐶