Go Simple Example: Generate Audio Stories with Google Gemini, TTS, and Cloudflare R2

SeongKuk Han - Feb 27 - Dev Community

I'm currently working on a side project about language learning. The main features include generating content with AI and converting text into audio files. To store the audio files, I also need cloud storage.

Cost was my main priority because I figured switching between cloud platforms wouldn't be too difficult.

In the end, I chose Google Gemini, Google TTS, and Cloudflare R2. They provide API documentation and examples, but I found some parts lacking, so I decided to write a post about it. I used Go, and this covers only basic usage.

For Google Gemini and TTS, I call the REST API directly. Google provides client libraries, but I found plain HTTP requests more convenient than setting up a library.

  1. Google Gemini – Send a prompt and receive a response.
  2. Google TTS – Send text and receive an audio file.
  3. Cloudflare R2 – Store an audio file in binary format in Cloudflare.
  4. Final Code

1. Google Gemini

package api

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"

    "github.com/spf13/viper"
)

type Part struct {
    Text string `json:"text"`
}

type Content struct {
    Parts []Part `json:"parts"`
}

type Candidates struct {
    Content Content `json:"content"`
}

type PromptResult struct {
    Candidates []Candidates `json:"candidates"`
}

func Prompt(prompt string) (*PromptResult, error) {
    // I use viper to manage environment variables; you can replace this with your own API key.
    url := fmt.Sprintf("https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=%s", viper.GetString("GOOGLE_CLOUD_API_KEY"))

    // This example sends a single prompt, but the request body can carry more information.
    data, err := json.Marshal(map[string]interface{}{
        "contents": []map[string]interface{}{{
            "parts": []map[string]interface{}{{
                "text": prompt,
            }},
        }},
    })

    if err != nil {
        return nil, err
    }

    req, err := http.NewRequest("POST", url, bytes.NewBuffer(data))

    if err != nil {
        return nil, err
    }

    req.Header.Add("Content-Type", "application/json")

    // Request the API
    res, err := http.DefaultClient.Do(req)

    if err != nil {
        return nil, err
    }

    defer res.Body.Close()

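    resBody, err := io.ReadAll(res.Body)

    if err != nil {
        return nil, err
    }

    // A non-200 status means the body holds an error payload, not candidates.
    if res.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("gemini request failed: %s: %s", res.Status, resBody)
    }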

    var promptResult PromptResult

    // Parse the result
    err = json.Unmarshal(resBody, &promptResult)

    if err != nil {
        return nil, err
    }

    return &promptResult, nil
}
package main

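import (
    "fmt"
    "log"

    "github.com/hsk-kr/tutorial/lib/api"
    "github.com/spf13/viper"
)

func main() {
    viper.SetConfigFile(".env")
    if err := viper.ReadInConfig(); err != nil {
        log.Fatal(err)
    }

    promptResult, err := api.Prompt("Generate a short story for kids")
    if err != nil {
        log.Fatal(err)
    }

    // Guard against an empty response before indexing into it.
    if len(promptResult.Candidates) == 0 || len(promptResult.Candidates[0].Content.Parts) == 0 {
        log.Fatal("no text in response")
    }

    fmt.Println(promptResult.Candidates[0].Content.Parts[0].Text)
}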
% go run main.go
Barnaby Bumble, a fuzzy, striped bee with a wobbly stinger, was known throughout Honeycomb Hollow for one thing: he was terribly afraid of heights.

“Buzz off, Barnaby!” the other young bees would tease, zooming past him as they practiced their loop-de-loops around the tallest sunflower stalks. Barnaby would cling tightly to the petals of a daisy, his tiny heart thumping like a hummingbird's wings.

He longed to fly high. He dreamed of seeing the whole meadow spread out below him, a carpet of shimmering colors. But every time he tried to climb a little higher, a dizzy feeling would overwhelm him, and he’d tumble back down, buzzing with fear.

One sunny morning, Mrs. Higgins, the wise old queen bee, announced a very important task. “The Queen Clover is blooming!” she declared. “Her nectar is extra sweet and good for making the best honey. But she’s blooming on the very highest hill, atop the tallest thistle! Someone brave and strong must bring her nectar back to the hive.”

All the young bees buzzed excitedly, eager to volunteer. Barnaby, however, shrunk back, his stripes seeming to fade to gray. He knew he couldn't possibly fly that high.

But then, he saw little Penelope Petal, a tiny bee with a torn wing. She looked longingly at the queen, but her wing flapped weakly. Penelope was too small and hurt to make the journey.

Barnaby felt a surge of courage. He knew he couldn’t let Penelope down, and he knew how important the Queen Clover nectar was. Taking a deep breath, he buzzed forward.

"Mrs. Higgins," he stammered, "I... I want to try."

Mrs. Higgins smiled kindly. "Are you sure, Barnaby? It's a long way up."

"I'll do my best," he promised, his voice trembling only a little.

He took off, his wings beating harder than ever. The air rushed past him, and his head began to spin. He looked down and saw the hive shrinking below. Fear prickled his antennae.

But then, he thought of Penelope and the delicious honey they could make. He focused on the top of the thistle, a tiny purple dot in the distance.

He flew on, one wing beat at a time. He rested on fluffy clouds of milkweed seeds, took tiny sips of dew, and told himself, "Just a little further, Barnaby. Just a little further."

Finally, after what seemed like forever, he reached the top of the thistle. There, bathed in sunshine, was the Queen Clover, her petals glistening with sweet nectar. Barnaby carefully collected the precious liquid into his pollen baskets.

The journey back was easier. He was filled with a sense of accomplishment, and the fear had almost completely vanished. He even managed to do a little wiggle in the air, just for fun!

When Barnaby landed at the hive, he was greeted with cheers. Penelope Petal buzzed around him, her eyes shining with gratitude. Mrs. Higgins beamed.

"Barnaby Bumble," she declared, "you are braver and stronger than you know! You not only brought us the nectar of the Queen Clover, but you also showed us that even the smallest bee can overcome their biggest fears."

From that day on, Barnaby Bumble was no longer known for being afraid of heights. He was known for his courage, his kindness, and the most delicious Queen Clover honey in all of Honeycomb Hollow. And every now and then, you might even see him doing a little loop-de-loop around the tallest sunflower.

You can find more details about the API here: https://ai.google.dev/gemini-api/docs.

In this example, a single prompt is sent, and a response is received. The response contains more than just text, but for simplicity, I defined the struct to handle only the text and printed it to the console.
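Indexing Candidates[0].Content.Parts[0] panics if the response comes back without candidates or parts. One way to keep that check in a single place is a small helper. This is only a sketch: firstText is a hypothetical name, not part of any Google library, and the JSON body in main is a made-up sample rather than a real API response.

```go
package main

import (
    "encoding/json"
    "errors"
    "fmt"
)

// Same struct shapes as in the Prompt example above.
type Part struct {
    Text string `json:"text"`
}

type Content struct {
    Parts []Part `json:"parts"`
}

type Candidates struct {
    Content Content `json:"content"`
}

type PromptResult struct {
    Candidates []Candidates `json:"candidates"`
}

// firstText (hypothetical helper) returns the text of the first part of the
// first candidate, or an error instead of panicking when either slice is empty.
func firstText(r *PromptResult) (string, error) {
    if r == nil || len(r.Candidates) == 0 {
        return "", errors.New("no candidates in response")
    }
    parts := r.Candidates[0].Content.Parts
    if len(parts) == 0 {
        return "", errors.New("first candidate has no parts")
    }
    return parts[0].Text, nil
}

func main() {
    // Trimmed-down sample body for illustration only.
    body := []byte(`{"candidates":[{"content":{"parts":[{"text":"Once upon a time"}]}}]}`)

    var result PromptResult
    if err := json.Unmarshal(body, &result); err != nil {
        fmt.Println("parse error:", err)
        return
    }

    text, err := firstText(&result)
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(text)
}
```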

I use Viper to manage environment variables, but you can test the code by replacing the API key with your own.
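If you would rather not pull in Viper, the key can also be read with os.Getenv. The Gemini REST API also accepts the key in an x-goog-api-key header, which keeps it out of the URL. The sketch below only builds the request and does not send it; newPromptRequest is a hypothetical helper name, and the environment variable name is the one used in the .env file above.

```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

const geminiURL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

// newPromptRequest (hypothetical helper) builds the same request body as
// Prompt, but takes the key as an argument and sends it in a header.
func newPromptRequest(apiKey, prompt string) (*http.Request, error) {
    data, err := json.Marshal(map[string]interface{}{
        "contents": []map[string]interface{}{{
            "parts": []map[string]interface{}{{"text": prompt}},
        }},
    })
    if err != nil {
        return nil, err
    }

    req, err := http.NewRequest(http.MethodPost, geminiURL, bytes.NewReader(data))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("x-goog-api-key", apiKey)
    return req, nil
}

func main() {
    apiKey := os.Getenv("GOOGLE_CLOUD_API_KEY")
    if apiKey == "" {
        fmt.Println("GOOGLE_CLOUD_API_KEY is not set")
        return
    }

    req, err := newPromptRequest(apiKey, "Generate a short story for kids")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    // The request is only built here; pass it to http.DefaultClient.Do to send it.
    fmt.Println(req.Method, req.URL.Host)
}
```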


2. Google TTS

package api

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "errors"
    "fmt"
    "io"
    "net/http"
    "strings"

    "github.com/spf13/viper"
)

type Voice struct {
    LanguageCodes          []string `json:"languageCodes"`
    Name                   string   `json:"name"`
    SsmlGender             Gender   `json:"ssmlGender"` // "MALE" or "FEMALE"
    NaturalSampleRateHertz int      `json:"naturalSampleRateHertz"`
}

type VoiceSelectionParam struct {
    LanguageCode string `json:"languageCode"`
    Name         string `json:"name"`
    SsmlGender   Gender `json:"ssmlGender"`
}

type VoicesResponse struct {
    Voices []Voice `json:"voices"`
}

type AudioEncoding string

type Gender string

const (
    MALE   Gender = "MALE"
    FEMALE Gender = "FEMALE"
)

const (
    LINEAR16 AudioEncoding = "LINEAR16"
    MP3      AudioEncoding = "MP3"
    OGG_OPUS AudioEncoding = "OGG_OPUS"
    MULAW    AudioEncoding = "MULAW"
    ALAW     AudioEncoding = "ALAW"
)

func convertVoiceToVoiceSelectionParam(voice Voice) (*VoiceSelectionParam, error) {
    voiceSelectionParam := new(VoiceSelectionParam)

    // len of a nil slice is 0, so one check covers both cases.
    if len(voice.LanguageCodes) == 0 {
        return nil, errors.New("empty LanguageCodes")
    }

    voiceSelectionParam.LanguageCode = voice.LanguageCodes[0]
    voiceSelectionParam.Name = voice.Name
    voiceSelectionParam.SsmlGender = voice.SsmlGender

    return voiceSelectionParam, nil
}

func GetVoiceList(languageCode string) ([]Voice, error) {
    url := fmt.Sprintf("https://texttospeech.googleapis.com/v1/voices?languageCode=%s&key=%s", languageCode, viper.GetString("GOOGLE_CLOUD_API_KEY"))

    req, err := http.NewRequest("GET", url, nil)

    if err != nil {
        return nil, err
    }

    req.Header.Add("Content-Type", "application/json")

    res, err := http.DefaultClient.Do(req)

    if err != nil {
        return nil, err
    }

    defer res.Body.Close()

    resBody, err := io.ReadAll(res.Body)

    if err != nil {
        return nil, err
    }

    var voicesRes VoicesResponse
    err = json.Unmarshal(resBody, &voicesRes)

    if err != nil {
        return nil, err
    }

    return voicesRes.Voices, nil
}

func ConvertTextToAudio(input string, voice Voice) ([]byte, error) {
    url := fmt.Sprintf("https://texttospeech.googleapis.com/v1/text:synthesize?key=%s", viper.GetString("GOOGLE_CLOUD_API_KEY"))

    voiceSelectionParam, err := convertVoiceToVoiceSelectionParam(voice)
    if err != nil {
        return nil, err
    }

    data, err := json.Marshal(map[string]interface{}{
        "input":       map[string]string{"text": input},
        "voice":       voiceSelectionParam,
        "audioConfig": map[string]string{"audioEncoding": string(OGG_OPUS)},
    })

    if err != nil {
        return nil, err
    }

    req, err := http.NewRequest("POST", url, bytes.NewBuffer(data))

    if err != nil {
        return nil, err
    }

    req.Header.Add("Content-Type", "application/json")

    res, err := http.DefaultClient.Do(req)

    if err != nil {
        return nil, err
    }

    defer res.Body.Close()

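    body, err := io.ReadAll(res.Body)
    if err != nil {
        return nil, err
    }

    // Surface API errors instead of reporting them as missing audio content.
    if res.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("text-to-speech request failed: %s: %s", res.Status, body)
    }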

    var result map[string]interface{}
    if err := json.Unmarshal(body, &result); err != nil {
        return nil, err
    }

    audioContent, ok := result["audioContent"].(string)
    if !ok {
        return nil, errors.New("no audio content found in response")
    }

    audioData, err := base64.StdEncoding.DecodeString(audioContent)
    if err != nil {
        return nil, err
    }

    return audioData, nil
}

func getFirstXVoice(voices []Voice, strToFind string, gender Gender) *Voice {
    for i, v := range voices {
        if strings.Contains(strings.ToLower(v.Name), strToFind) && v.SsmlGender == gender {
            return &voices[i]
        }
    }

    return nil
}

func GetFirstStandardVoice(voices []Voice, gender Gender) *Voice {
    return getFirstXVoice(voices, "standard", gender)
}

func GetFirstWavenetVoice(voices []Voice, gender Gender) *Voice {
    return getFirstXVoice(voices, "wavenet", gender)
}

func GetFirstNeuralVoice(voices []Voice, gender Gender) *Voice {
    return getFirstXVoice(voices, "neural", gender)
}
package main

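import (
    "log"
    "os"

    "github.com/hsk-kr/tutorial/lib/api"
    "github.com/spf13/viper"
)

func main() {
    viper.SetConfigFile(".env")
    if err := viper.ReadInConfig(); err != nil {
        log.Fatal(err)
    }

    voices, err := api.GetVoiceList("en-US")
    if err != nil {
        log.Fatal(err)
    }

    // GetFirstWavenetVoice returns nil when no voice matches.
    voice := api.GetFirstWavenetVoice(voices, api.MALE)
    if voice == nil {
        log.Fatal("no matching voice found")
    }

    bAudio, err := api.ConvertTextToAudio("By the way, I am using neovim.", *voice)
    if err != nil {
        log.Fatal(err)
    }

    if err := os.WriteFile("./audio.opus", bAudio, 0644); err != nil {
        log.Fatal(err)
    }
}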

After running the program, you will find the audio file in the same directory, named audio.opus.

The example relies on three functions:

  • GetVoiceList – Retrieves the list of voices supported by the API.
  • GetFirstWavenetVoice – As of February 27, 2025, there are three types of voices: wavenet, standard, and neural. Each type includes multiple voices, but since that’s not a priority for me, I created a function to simply get the first voice of a given type.
  • ConvertTextToAudio – Takes text and a voice as parameters and returns the result as []byte. The function requests the audio file in OGG_OPUS format since I plan to use it in a web environment. However, you can use any supported format.
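The synthesize response carries the audio as a base64 string in audioContent, while failed requests return a JSON error object instead. Below is a minimal sketch of decoding both cases with a typed struct rather than a generic map. decodeAudio is a hypothetical helper name, the struct covers only the fields the sketch needs, and the sample bodies are made up for illustration.

```go
package main

import (
    "encoding/base64"
    "encoding/json"
    "errors"
    "fmt"
)

// synthesizeResponse mirrors only the fields this sketch needs.
type synthesizeResponse struct {
    AudioContent string `json:"audioContent"`
    Error        *struct {
        Code    int    `json:"code"`
        Message string `json:"message"`
        Status  string `json:"status"`
    } `json:"error"`
}

// decodeAudio (hypothetical helper) returns the decoded audio bytes, or an
// error that includes the API's own message when the body is an error payload.
func decodeAudio(body []byte) ([]byte, error) {
    var res synthesizeResponse
    if err := json.Unmarshal(body, &res); err != nil {
        return nil, err
    }
    if res.Error != nil {
        return nil, fmt.Errorf("tts error %d (%s): %s", res.Error.Code, res.Error.Status, res.Error.Message)
    }
    if res.AudioContent == "" {
        return nil, errors.New("no audio content found in response")
    }
    return base64.StdEncoding.DecodeString(res.AudioContent)
}

func main() {
    // Sample bodies for illustration; "aGVsbG8=" is base64 for "hello".
    ok := []byte(`{"audioContent":"aGVsbG8="}`)
    bad := []byte(`{"error":{"code":400,"message":"invalid voice","status":"INVALID_ARGUMENT"}}`)

    audio, err := decodeAudio(ok)
    fmt.Println(len(audio), err)

    _, err = decodeAudio(bad)
    fmt.Println(err)
}
```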

If you follow the documentation, you might notice that it lacks some important details. For example, I came across the audio_encoding parameter and wanted to check which formats were supported, but the parameter was shown as plain black text with no link to its reference page. Eventually, I found the page by searching for it from the top of the documentation.
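Once you know which encoding you requested, you usually also need a matching file extension and content type when saving or serving the file. The mapping below is an assumption based on common MIME types, not something taken from the Text-to-Speech documentation, and it covers only three of the encodings listed in the constants above. infoFor and fileInfo are hypothetical names.

```go
package main

import "fmt"

type AudioEncoding string

const (
    LINEAR16 AudioEncoding = "LINEAR16"
    MP3      AudioEncoding = "MP3"
    OGG_OPUS AudioEncoding = "OGG_OPUS"
)

// fileInfo holds an assumed file extension and content type for an encoding.
type fileInfo struct {
    Ext         string
    ContentType string
}

// Assumed mapping; adjust it to whatever your player or CDN expects.
var encodingInfo = map[AudioEncoding]fileInfo{
    LINEAR16: {Ext: ".wav", ContentType: "audio/wav"},
    MP3:      {Ext: ".mp3", ContentType: "audio/mpeg"},
    OGG_OPUS: {Ext: ".ogg", ContentType: "audio/ogg"},
}

// infoFor reports the assumed extension and content type, and false when
// the encoding has no entry in the table.
func infoFor(enc AudioEncoding) (fileInfo, bool) {
    info, ok := encodingInfo[enc]
    return info, ok
}

func main() {
    for _, enc := range []AudioEncoding{OGG_OPUS, MP3, "UNKNOWN"} {
        if info, ok := infoFor(enc); ok {
            fmt.Printf("%s -> audio%s (%s)\n", enc, info.Ext, info.ContentType)
        } else {
            fmt.Printf("%s -> no mapping\n", enc)
        }
    }
}
```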


3. Cloudflare R2

package api

import (
    "bytes"
    "context"
    "errors"
    "fmt"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials"
    "github.com/aws/aws-sdk-go-v2/feature/s3/manager"
    "github.com/aws/aws-sdk-go-v2/service/s3"
    "github.com/google/uuid"
    "github.com/spf13/viper"
)

type Storage struct {
    client     *s3.Client
    uploader   *manager.Uploader
    bucketName string
}

func (s *Storage) Init() error {
    cfg, err := config.LoadDefaultConfig(context.TODO(),
        config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(viper.GetString("CLOUDFLARE_R2_ACCESS_KEY_ID"), viper.GetString("CLOUDFLARE_R2_SECRET_ACCESS_KEY"), "")),
        config.WithRegion("auto"),
    )
    if err != nil {
        return err
    }

    s.bucketName = viper.GetString("CLOUDFLARE_R2_BUCKET_NAME")
    s.client = s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.BaseEndpoint = aws.String(fmt.Sprintf("https://%s.r2.cloudflarestorage.com", viper.GetString("CLOUDFLARE_R2_ACCOUNT_ID")))
    })

    s.uploader = manager.NewUploader(s.client)
    return nil
}

func (s *Storage) Put(data []byte) (string, error) {
    if s.client == nil {
        return "", errors.New("client is nil")
    }

    objectKey, err := uuid.NewUUID()
    if err != nil {
        return "", err
    }

    bucket := aws.String(s.bucketName)
    key := aws.String(objectKey.String())
    ctx := context.Background()

    input := &s3.PutObjectInput{
        Bucket: bucket,
        Key:    key,
        Body:   bytes.NewReader(data),
    }
    output, err := s.uploader.Upload(ctx, input)
    if err != nil {
        return "", err
    }

    err = s3.NewObjectExistsWaiter(s.client).Wait(ctx, &s3.HeadObjectInput{
        Bucket: bucket,
        Key:    key,
    }, time.Minute)

    if err != nil {
        return "", err
    }

    return *output.Key, nil
}
package main

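import (
    "log"

    "github.com/hsk-kr/tutorial/lib/api"
    "github.com/spf13/viper"
)

func main() {
    viper.SetConfigFile(".env")
    if err := viper.ReadInConfig(); err != nil {
        log.Fatal(err)
    }

    voices, err := api.GetVoiceList("en-US")
    if err != nil {
        log.Fatal(err)
    }

    voice := api.GetFirstWavenetVoice(voices, api.MALE)
    if voice == nil {
        log.Fatal("no matching voice found")
    }

    bAudio, err := api.ConvertTextToAudio("By the way, I am using neovim.", *voice)
    if err != nil {
        log.Fatal(err)
    }

    storage := new(api.Storage)
    if err := storage.Init(); err != nil {
        log.Fatal(err)
    }
    if _, err := storage.Put(bAudio); err != nil {
        log.Fatal(err)
    }
}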

Cloudflare R2 exposes an S3-compatible API, so you can talk to it with the AWS SDK for Go by pointing the S3 client at the R2 endpoint.

You can find more examples in the AWS documentation, https://docs.aws.amazon.com/code-library/latest/ug/go_2_s3_code_examples.html.

Uploading an object requires a configured client and uploader first, so I wrapped that setup and the upload itself in a Storage struct.
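One benefit of keeping the upload behind a method is that the rest of the program can depend on a small interface and be tested without touching R2. The sketch below is an assumption about how that could look, not part of the code above: ObjectStore, memoryStore, and saveAudio are hypothetical names, and the key is a random hex string from the standard library rather than a UUID.

```go
package main

import (
    "crypto/rand"
    "encoding/hex"
    "errors"
    "fmt"
)

// ObjectStore (hypothetical) has the same Put signature as Storage above.
type ObjectStore interface {
    Put(data []byte) (string, error)
}

// memoryStore keeps objects in a map and stands in for R2 in tests.
type memoryStore struct {
    objects map[string][]byte
}

func newMemoryStore() *memoryStore {
    return &memoryStore{objects: make(map[string][]byte)}
}

func (m *memoryStore) Put(data []byte) (string, error) {
    if len(data) == 0 {
        return "", errors.New("empty object")
    }
    // Random 16-byte key, hex-encoded; used here instead of the uuid package.
    raw := make([]byte, 16)
    if _, err := rand.Read(raw); err != nil {
        return "", err
    }
    key := hex.EncodeToString(raw)
    // Copy the bytes so the caller can reuse its slice.
    m.objects[key] = append([]byte(nil), data...)
    return key, nil
}

// saveAudio depends only on the interface, so it works with either store.
func saveAudio(store ObjectStore, audio []byte) (string, error) {
    return store.Put(audio)
}

func main() {
    store := newMemoryStore()
    key, err := saveAudio(store, []byte("fake audio bytes"))
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println("stored", len(store.objects[key]), "bytes under key", key)
}
```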

After running the program, you should see the object successfully uploaded to Cloudflare.

[Image: uploaded object]

To test whether the audio file loads correctly in a web environment, I placed the object URL in an <audio> tag, but it didn't work. It turns out that even though I had set the file to be publicly accessible, I still needed to use a proxy to access it. This makes sense from a security perspective, as access must be explicitly allowed in the settings.

If you want to access the files temporarily for testing purposes, you can enable the r2.dev development URL in the bucket settings and use that link.
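Whichever public base URL you end up with, the object URL is just that base joined with the key returned by Put. Here is a small sketch using url.JoinPath (available since Go 1.19). The base URL and the key below are placeholders, not real values, and objectURL is a hypothetical helper.

```go
package main

import (
    "fmt"
    "net/url"
)

// objectURL (hypothetical helper) joins a public base URL and an object key,
// taking care of slashes between the two.
func objectURL(baseURL, key string) (string, error) {
    return url.JoinPath(baseURL, key)
}

func main() {
    // Placeholder base URL and key for illustration only.
    u, err := objectURL("https://pub-example.r2.dev", "11111111-2222-3333-4444-555555555555")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    // Print a tag you could paste into a test page.
    fmt.Printf("<audio controls src=%q></audio>\n", u)
}
```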


4. Final Code

package main

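import (
    "log"

    "github.com/hsk-kr/tutorial/lib/api"
    "github.com/spf13/viper"
)

func main() {
    viper.SetConfigFile(".env")
    if err := viper.ReadInConfig(); err != nil {
        log.Fatal(err)
    }
    promptResult, err := api.Prompt("Say something short in German")
    if err != nil {
        log.Fatal(err)
    }
    if len(promptResult.Candidates) == 0 || len(promptResult.Candidates[0].Content.Parts) == 0 {
        log.Fatal("no text in response")
    }
    generatedText := promptResult.Candidates[0].Content.Parts[0].Text

    voices, err := api.GetVoiceList("de-DE")
    if err != nil {
        log.Fatal(err)
    }
    voice := api.GetFirstWavenetVoice(voices, api.MALE)
    if voice == nil {
        log.Fatal("no matching voice found")
    }
    bAudio, err := api.ConvertTextToAudio(generatedText, *voice)
    if err != nil {
        log.Fatal(err)
    }

    storage := new(api.Storage)
    if err := storage.Init(); err != nil {
        log.Fatal(err)
    }
    if _, err := storage.Put(bAudio); err != nil {
        log.Fatal(err)
    }
}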

This final program ties the three steps together: it generates text with Google Gemini, converts the text into an audio file, and stores that file in Cloudflare R2.


I hope you find this helpful.

Happy Coding!
