Anything Can be Used for File Storage if You Use it Wrongly Enough

Nick Maloney - Jul 31 '23 - Dev Community

I recently came across the most excellent blog post “Anything can be a message queue if you use it wrongly enough” (it is a fun read, and I highly recommend checking it out). After reading it, it dawned on me that I had recently created an abomination that essentially does the inverse: I used a “message queue” (if you would please indulge me in considering Redis one) as a semi-ephemeral file storage service. Please do not use this in production. You have been warned!

While building a proof of concept (POC) for some fun with machine learning (ML) audio processing, I needed a quick, easy way to store processed files. One challenge was that files required multiple processing steps across different services, so local file storage was not the best option. The ideal solution would be something like S3, but given that this was purely a POC, and being a lazy developer, I was looking for something simple and easy. The processing was happening via the most excellent Dramatiq library, which was already using Redis. This got me thinking that Redis is actually pretty well suited to this use case: it stores binary data well, is relatively fast, is quite stable, and, with the expiration parameter of SET, it handles cleaning up old files.
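
To make the idea concrete, here is a minimal sketch of how processing steps on different services might pass a Redis key around instead of a file path. The helper names (`put_blob`, `get_blob`, `run_step`) are hypothetical and not part of Dramatiq or the original project; `client` is assumed to be anything with redis-py style `set(key, value, ex=...)` and `get(key)` methods, such as a `redis.Redis` instance.

```python
import hashlib
from typing import Callable

DAY_SECONDS = 24 * 60 * 60  # assumed retention window of 24 hours


def put_blob(client, data: bytes, ttl: int = DAY_SECONDS) -> str:
    """Store bytes under their SHA-256 hex digest and return that key."""
    key = hashlib.sha256(data).hexdigest()
    client.set(key, data, ex=ttl)  # ex = seconds until Redis expires the key
    return key


def get_blob(client, key: str) -> bytes:
    """Fetch bytes by key, failing loudly if the key expired or never existed."""
    data = client.get(key)
    if data is None:
        raise KeyError(key)
    return data


def run_step(client, input_key: str, transform: Callable[[bytes], bytes]) -> str:
    """Load a step's input, apply `transform`, and store the output under a new key."""
    return put_blob(client, transform(get_blob(client, input_key)))
```

Each service then only needs the key of the previous step's output, for example `key2 = run_step(r, key1, normalize)`, where `normalize` stands for whatever bytes-to-bytes function that step performs.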

For the sake of this example, we’ll create a simple Flask application that handles POST requests for file uploads and saves them to Redis, using a sha256 of the payload as the key:

from flask import Flask, request, Response, send_file
import base64
import datetime
import hashlib
import logging
import magic  # python-magic, used by the download endpoint
import mimetypes
import redis

app = Flask(__name__)
app.logger.setLevel(logging.DEBUG)

r = redis.Redis(host='localhost', port=6379, db=0)

@app.route("/", methods=["POST"])
def create() -> str:
  payload = request.files['file'].read()
  # Hash the raw bytes so the key matches the SHA-256 of the uploaded file.
  key = hashlib.sha256(payload).hexdigest()
  # Base64 keeps the stored value plain ASCII; it is decoded again on download.
  encoded_file = base64.b64encode(payload)
  expiration_time = datetime.timedelta(hours=24)
  # `ex` is the time-to-live in seconds; Redis deletes the key once it elapses.
  r.set(key, encoded_file, ex=int(expiration_time.total_seconds()))
  return key

The example above will save the data to Redis and return the key. It sets the file to expire in 24 hours.
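
One cost of the base64 step is size: every 3 bytes of input become 4 bytes of output, so the stored value is roughly a third larger than the file. The sketch below works through that arithmetic. `base64_size` and `fits_in_redis` are hypothetical helper names, and the 512 MB figure is the documented maximum size of a single Redis string value.

```python
import base64

REDIS_MAX_STRING_BYTES = 512 * 1024 * 1024  # Redis caps one string value at 512 MB


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes of input."""
    # Each group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_redis(raw_bytes: int, encoded: bool = True) -> bool:
    """Whether a payload of this size fits in a single Redis string value."""
    size = base64_size(raw_bytes) if encoded else raw_bytes
    return size <= REDIS_MAX_STRING_BYTES


if __name__ == "__main__":
    hundred_mb = 100 * 1024 * 1024
    print(base64_size(hundred_mb))   # bytes actually stored for a 100 MB upload
    print(fits_in_redis(hundred_mb))
    # Sanity check against the real encoder on a small payload.
    print(base64_size(10) == len(base64.b64encode(b"x" * 10)))
```

Because Redis values are binary-safe, storing the raw bytes instead (and skipping the decode on the way out) should avoid this overhead entirely, assuming the client is not configured to decode responses to text.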

To retrieve the files, we’ll use a few utilities for attempting to determine mime-type/extension of the saved files and use Flask’s send_file method for the response.

With the endpoint below, a GET request to /<key>, where <key> is the value returned by the upload, returns the file with the appropriate mime-type and extension.

@app.route("/<key>", methods=["GET"])
def show(key) -> Response:
  encoded_file = r.get(key)
  if encoded_file is None:
    return "File not found", 404
  decoded_file = base64.b64decode(encoded_file)
  # Sniff the MIME type from the file's leading bytes.
  mimetype = magic.from_buffer(decoded_file, mime=True)
  # guess_extension() includes the leading dot and returns None for unknown types.
  file_extension = mimetypes.guess_extension(mimetype) or ''
  temp_filename = f'{key}{file_extension}'

  with open(temp_filename, 'wb') as temp_file:
    temp_file.write(decoded_file)

  return send_file(temp_filename, mimetype=mimetype, as_attachment=True)
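
The endpoint above leaves a copy of every downloaded file on disk. As an alternative, here is a sketch that keeps the payload in memory. `build_download` is a hypothetical helper, and it takes the MIME type as an argument so that it depends only on the standard library.

```python
import io
import mimetypes
from typing import Tuple


def build_download(key: str, payload: bytes, mimetype: str) -> Tuple[io.BytesIO, str]:
    """Return an in-memory file object and a download name for `payload`."""
    # guess_extension() already includes the leading dot and returns None
    # for types it does not know, so fall back to no extension.
    extension = mimetypes.guess_extension(mimetype) or ""
    return io.BytesIO(payload), f"{key}{extension}"
```

In the Flask view, the result could then be passed along as `send_file(buffer, mimetype=mimetype, as_attachment=True, download_name=name)`. The `download_name` argument assumes Flask 2.0 or newer.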

…and there you have it. This ended up working quite well for an audio processing pipeline: it made it quick and easy to send audio files exceeding 100 MB. The Dramatiq library has the concept of storing results between tasks baked in, but it doesn’t cover sending results back over the wire to the browser. Again, do not use this in a production app, but I would absolutely use this technique again for future POC projects.
