I built an AI-powered children's book generator that creates fully illustrated stories based on user input. The user provides a brief prompt outlining the desired storyline, such as:
Create a fun story about a bird who was afraid to fly.
In response, an entire illustrated book is generated!
Users can also explore and read stories created by others. All stories are public and anonymous.
Infrastructure: The Terraform and stack files used to deploy the application on a Docker Swarm cluster.
Each subfolder includes instructions for running the project locally. Setup is straightforward: everything is containerized, so running `docker compose up` is all that's needed.
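For example (the exact service names and any required environment variables are defined per subfolder, so this is just the general shape):

```shell
# from the subfolder you want to run
docker compose up        # build images if needed and start all services

# or run detached in the background
docker compose up -d
```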
Warning
You need a good NVIDIA GPU to run this project!
For a more detailed overview, including screenshots, you can read the submission sent to the challenge here:
ComfyUI was used to generate the logo, the book cover, and every page, based on the prompts generated by Ollama.
The main base model was juggernaut-xl, alongside a few LoRAs; they can all be found in the project workflow file.
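As a rough illustration, here is how a backend can queue a workflow on ComfyUI's HTTP API. This is a minimal sketch, not the project's actual code: it assumes a ComfyUI instance on the default port 8188, the helper names are mine, and the real workflow JSON would come from the project's workflow file.

```python
import json
import urllib.request

# Assumed default ComfyUI endpoint; adjust host/port for your deployment.
COMFYUI_URL = "http://localhost:8188/prompt"

def build_payload(workflow: dict, client_id: str = "book-generator") -> bytes:
    """Wrap a ComfyUI workflow graph in the JSON body the /prompt endpoint expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_workflow(workflow: dict) -> dict:
    """POST the workflow to ComfyUI and return its response (contains a prompt id)."""
    req = urllib.request.Request(
        COMFYUI_URL,
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In practice the workflow dict is the exported node graph (base model, LoRAs, sampler settings), with only the text prompt swapped in per page.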
Stack
Overview of other technologies used:
PHP / Laravel / FrankenPHP / WebSocket
VueJs / Typescript / Tailwind
Docker / Docker Swarm / Traefik / Redis
Terraform / Vultr GPU Cloud
Technical Description
The process for book generation is as follows:
1. The user creates a prompt, which is optional. If left blank, Ollama generates a new story on its own.
2. A preliminary check ensures the prompt does not contain violent or otherwise inappropriate content for children. If such content is found, the generation process is aborted immediately. (prompt)
3. If the user provided input, the 10 records most similar to the user's prompt are retrieved based on their embeddings. If no input was provided, the 10 embeddings closest to each other are retrieved instead. This prevents the LLM from generating duplicate content: in earlier tests it frequently produced stories about "Max and the paintbrush". With this control, if another paintbrush story is generated, it at least won't be about Max.
4. The LLM is then instructed to create a story with at least 10 paragraphs, ensuring that it doesn't closely resemble any of the top 10 stories already in the database. (prompt)
5. Once the main story is created, each paragraph on its own doesn't give an image generation model enough context to stay consistent across pages. To address this, the LLM is prompted again to generate a context-rich description for each paragraph, including details about the story, the main characters, their gender, and other relevant information. (prompt)
6. These prompts are then sent to ComfyUI, which generates the necessary assets.
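The embedding-similarity retrieval described above can be sketched in a few lines. This is a simplified in-memory version using cosine similarity; the real app presumably runs this as a vector query in the database, and the function names here are illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_similar(query: list[float], stories: dict[str, list[float]], k: int = 10) -> list[str]:
    """Return the k story ids whose embeddings are closest to the query prompt.

    These stories are then handed to the LLM as examples it must NOT duplicate.
    """
    ranked = sorted(
        stories.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [story_id for story_id, _ in ranked[:k]]
```

Doing this inside Postgres (e.g. with a vector extension) avoids pulling every embedding into application memory, which is presumably why the project keeps embeddings in the database.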
Final Thoughts
I certainly learned a lot through this experience. Until now, I hadn’t had much exposure to or understanding of vector embeddings, but now it has finally clicked.
Also, knowing that with pgai I can interact with LLMs directly from SQL gives me more ideas than I have time to execute. I had also never configured a server with NVIDIA GPUs, and now I understand many of the challenges involved. My current setup is a Docker Swarm cluster with three nodes: one each for ComfyUI, Ollama, and the CPU-based apps. Getting NVIDIA GPU support running on Swarm was troublesome, but it was a valuable learning experience.
I intend to keep this demo app running until the end of the challenge; after that, anyone curious to see it can host it on their own computer. All the Docker files and instructions for running it are in the GitHub repository.
Prize Qualifications
I believe my submission qualifies for the following additional prize categories: