An app to create a storybook/flipbook using an LLM and a Stable Diffusion model. You give it a rough story idea and it generates a full story with illustrations.
This project uses an LLM to generate a story and a diffusion model to generate the illustrations. It's a deliberately simple use case, meant to demonstrate how easy it is to integrate Cloudflare Workers AI into apps. The project uses two models:
@hf/thebloke/mistral-7b-instruct-v0.1-awq to generate the story and extract a suitable illustration description
@cf/stabilityai/stable-diffusion-xl-base-1.0 to generate the illustration
This is not the first time I have worked with Cloudflare Workers, but it is the first time I have tried the Workers AI services. I have to say there are plenty of models to choose from, and inference is quite fast too. With easy integration through the Cloudflare AI SDK and easy deployment with wrangler, developing this app has been really fun.
The idea behind this app is to create a “benchmark” of how easy it is to use Cloudflare AI compared to other services. So I built the simplest app I could imagine: (1) create stories using an LLM and (2) create illustrations from those stories.
Multiple Models
To achieve this, I used two AI models:
Text Generation: @hf/thebloke/mistral-7b-instruct-v0.1-awq to generate the story and extract a suitable illustration description.
Text-to-Image: @cf/stabilityai/stable-diffusion-xl-base-1.0 to generate the illustration (a rough sketch of how the two calls fit together follows below).
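To give a concrete picture, here is a minimal sketch of how the two model calls could be wired together in a single Pages Function. The file path, the Env shape, the /api/story route, and the prompt wording are my own illustrative choices rather than the app's exact code; the env.AI.run calls follow the Workers AI binding API.

```ts
// functions/api/story.ts — minimal sketch (route, prompts, and parsing are illustrative)
interface Env {
  AI: Ai; // Workers AI binding (type from @cloudflare/workers-types)
}

export const onRequestPost: PagesFunction<Env> = async ({ request, env }) => {
  const { idea } = (await request.json()) as { idea: string };

  // 1) Text Generation: ask Mistral for a short story that ends with an
  //    "Illustration:" line describing one scene to draw.
  const llm = await env.AI.run("@hf/thebloke/mistral-7b-instruct-v0.1-awq", {
    messages: [
      {
        role: "system",
        content:
          "Write a short children's story based on the user's idea. " +
          "End with a single line starting with 'Illustration:' that describes one scene to draw.",
      },
      { role: "user", content: idea },
    ],
  });
  const story = (llm as { response?: string }).response ?? "";
  const illustration = story.split("Illustration:")[1]?.trim() || idea;

  // 2) Text-to-Image: turn the extracted description into a picture with SDXL.
  const image = await env.AI.run("@cf/stabilityai/stable-diffusion-xl-base-1.0", {
    prompt: illustration,
  });

  // Return the PNG bytes; the story text itself could go in a header or a second endpoint.
  return new Response(image, { headers: { "content-type": "image/png" } });
};
```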
I built the web app using Vite, React, and Mantine, and the backend using Cloudflare Pages Functions. Overall, the creation and integration process was smooth, and I can't appreciate enough how easy wrangler makes it to deploy the web app.
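On the React side, calling that backend is just a fetch. Here is a rough sketch, assuming the hypothetical /api/story route from the sketch above:

```ts
// Frontend sketch: request an illustrated page for a rough story idea.
// The /api/story path matches the hypothetical Pages Function above.
async function createIllustration(idea: string): Promise<string> {
  const res = await fetch("/api/story", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ idea }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const blob = await res.blob(); // PNG bytes from Stable Diffusion XL
  return URL.createObjectURL(blob); // usable directly as an <img> src
}
```

For deployment, a single wrangler pages deploy command publishes the built Vite output together with the Pages Functions.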
What I Learned
The Cloudflare platform has grown into a much larger and more complete ecosystem, offering developers a full set of tools to build and scale their apps. With the great developer experience offered by wrangler, developing apps on Cloudflare is a real pleasure.
However, I also ran into a limitation of the Mistral model: the limited number of output tokens. Sometimes the generated stories come back incomplete, but this can be mitigated by tuning the prompt to cap the number of sentences per story.
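For example, the length constraint can live entirely in the system prompt; the exact wording and sentence count below are just an illustration of the idea:

```ts
// Sketch: cap the story length in the prompt so the model's output token
// budget is less likely to cut a story off mid-sentence (values are illustrative).
const MAX_SENTENCES = 8;

function buildStoryMessages(idea: string) {
  return [
    {
      role: "system",
      content:
        `Write a children's story of at most ${MAX_SENTENCES} sentences ` +
        "and always finish the final sentence before stopping.",
    },
    { role: "user", content: idea },
  ];
}
```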
Other areas for improvement are implementing story continuation, in case the previous response is incomplete, and trying different art styles to make the illustrations more appealing.