Processing File Uploads with IBM Cloud Functions

Matt Hamilton - Jul 18 '20 - Dev Community

I've been doing a lot with IBM Cloud Functions recently. So far I've always been passing up simple query string values or JSON-encoded data. But what if you want to upload an actual binary file? The usual way, in the web world, is to use the multipart/form-data content type.

But how does Apache OpenWhisk (which IBM Cloud Functions is based on) handle this?

If you set the function to raw type, then rather than getting the JSON-decoded values passed to your function as a dictionary, you get a dictionary that gives you the raw data. The data itself is base64 encoded in __ow_body and the headers are in __ow_headers.
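As a rough illustration of what that looks like from inside the function, here is a minimal sketch. The dictionary is a made-up example (the boundary and body are invented for illustration), and raw_body and content_type are hypothetical helper names, not part of OpenWhisk:

```python
from base64 import b64decode

# Hypothetical example of the arguments a raw web action receives.
# The header value and body are invented for illustration.
example_args = {
    "__ow_headers": {"content-type": "multipart/form-data; boundary=XYZ"},
    "__ow_body": "aGVsbG8gd29ybGQ=",  # base64 of b"hello world"
}

def raw_body(args):
    """Return the request body as bytes by decoding __ow_body."""
    return b64decode(args.get("__ow_body", ""))

def content_type(args):
    """Return the Content-Type header, or an empty string if it is absent."""
    return args.get("__ow_headers", {}).get("content-type", "")
```

Everything that follows in this post is just a matter of handing those two values to a multipart parser.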

There are several ways to process this in Python. The requests_toolbelt package has a MultipartDecoder class. But I'm going to use one from the cgi library that is included in Python itself:

from cgi import parse_multipart, parse_header
from io import BytesIO
from base64 import b64decode

def main(args):
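    # Split the Content-Type header into its value and parameters
    # (the parameters include the multipart boundary)
    c_type, p_dict = parse_header(args['__ow_headers']['content-type'])
    # The raw request body arrives base64 encoded
    decoded_string = b64decode(args['__ow_body'])
    # parse_multipart wants the boundary as bytes and a CONTENT-LENGTH entry
    p_dict['boundary'] = bytes(p_dict['boundary'], "utf-8")
    p_dict['CONTENT-LENGTH'] = len(decoded_string)
    form_data = parse_multipart(BytesIO(decoded_string), p_dict)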

    # Do something with the data. In this simple example
    # we will just return a dict with the part name and 
    # length of the content of that part
    ret = {}
    for key, value in form_data.items():
        ret[key] = len(value[0])

    return ret

The parse_multipart function expects to find a CONTENT-LENGTH entry in the dictionary it is given, so we need to add that to p_dict before passing it to the parser. It also expects the boundary as bytes rather than a string, hence the conversion.
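One caveat worth knowing: the cgi module was deprecated in Python 3.11 and removed in Python 3.13, so the code above only works on older runtimes. As an alternative, here is a minimal sketch that does the same job with the standard library's email package. This is not the approach used in the rest of this post, parse_form is a hypothetical name, and I have only reasoned it through on small inputs, so treat it as a starting point:

```python
from base64 import b64decode
from email.parser import BytesParser
from email.policy import default

def parse_form(args):
    """Return {part name: content bytes} for a raw multipart/form-data request."""
    body = b64decode(args["__ow_body"])
    # The email parser needs the Content-Type header (which carries the
    # boundary) in front of the body so it knows how to split the parts.
    header = b"Content-Type: " + args["__ow_headers"]["content-type"].encode("utf-8")
    message = BytesParser(policy=default).parsebytes(header + b"\r\n\r\n" + body)

    parts = {}
    for part in message.iter_parts():
        # Each part names itself in its Content-Disposition header
        name = part.get_param("name", header="content-disposition")
        parts[name] = part.get_payload(decode=True)
    return parts
```

Unlike parse_multipart, this returns every part as bytes, including simple string fields, so decode those yourself if you need text.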

To create this cloud function, we need to pass the --web raw flag, which tells OpenWhisk not to try to parse the payload as JSON for us:

% ibmcloud fn action create upload upload.py --web raw

We can then get the URL for it:

% ibmcloud fn action get upload --url
ok: got action upload
https://eu-gb.functions.appdomain.cloud/api/v1/web/1d0ffa5a-835d-4c40-ac80-77ca4a35f028/upload

And we can then call it using cURL and upload some data to it. In this case we are passing both a simple string value (foo) and the contents of a binary WAV audio file:

% curl -F id=foo -F audio=@test.wav https://eu-gb.functions.appdomain.cloud/api/v1/web/1d0ffa5a-835d-4c40-ac80-77ca4a35f028/upload.json
{
  "audio": 3840102,
  "id": 3
}
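If you want to exercise the function locally before deploying it, you can fake what cURL and OpenWhisk do between them. The sketch below builds a multipart/form-data body by hand and wraps it in the same __ow_headers and __ow_body keys. make_raw_args is a hypothetical helper of my own, and it only approximates the request cURL sends (for instance, every file part is labelled application/octet-stream):

```python
from base64 import b64encode
from uuid import uuid4

def make_raw_args(fields, files=None):
    """Build an args dict resembling what a raw web action receives.

    fields maps part names to bytes values.
    files maps part names to (filename, content bytes) tuples.
    """
    boundary = uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in fields.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value,
        ]
    for name, (filename, content) in (files or {}).items():
        disposition = f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'
        chunks += [
            delimiter,
            disposition.encode("utf-8"),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    # The closing delimiter carries two extra dashes
    chunks += [delimiter + b"--", b""]
    body = b"\r\n".join(chunks)
    return {
        "__ow_headers": {"content-type": "multipart/form-data; boundary=" + boundary},
        "__ow_body": b64encode(body).decode("ascii"),
    }
```

You could then call something like main(make_raw_args({"id": b"foo"}, {"audio": ("test.wav", wav_bytes)})) in a Python shell, where wav_bytes is whatever file content you have loaded.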

It seems that IBM Cloud Functions has a payload limit of 5 MB. For anything larger than this, I guess you'd be better off uploading it to IBM Cloud Object Storage (COS) and then passing the URL of the uploaded item to the cloud function.
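I haven't built that version, but the function side could be as small as the sketch below. It assumes the caller passes a parameter named url (a name I've made up) and that the URL can be read without extra credentials. A real COS setup would more likely need authentication or the COS SDK, and you would want to restrict which URLs the function is willing to fetch:

```python
from urllib.request import urlopen

def main(args):
    """Fetch the object at args['url'] and report its size in bytes."""
    # 'url' is a hypothetical parameter name chosen for this sketch
    with urlopen(args["url"]) as response:
        data = response.read()

    # Do something with the data. As before, we just return its length.
    return {"length": len(data)}
```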
