Thursday, January 14, 2016

Web Application Data Input Validation and Easy Documenting (Flask) Routes in one Fell Swoop

One of the things that has always bugged me about input validation for web projects (in particular the one we are building at work) is that it is done inside the endpoint. When the validation is done programmatically inside the endpoint function, it is hard to get a good "definition" of the method -- i.e. there is no easy way to know exactly what the valid inputs are. The usual way that people solve this problem is through documentation. The trouble with normal function documentation is that it can get stale very easily (it's much too easy to forget to update the documentation when updating part of the code). As such, I wanted input validation to achieve two goals simultaneously:
  1. Input validation for HTTP parameters and JSON data
  2. API Documentation (that doesn't go stale)
For the sake of simplicity I will be using the Python micro-framework Flask for all of the examples below, but the same approach can be applied to pretty much any framework.

First, let's see two examples of things that are commonly done, validating HTTP parameters and validating JSON data:
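from flask import Flask, request

app = Flask(__name__)

@app.route('/foo')
def foo():
    # param1 is required
    try:
        param1 = request.args['param1']
    except KeyError:
        return "param1 missing", 400
    # param2 must parse as an integer
    try:
        param2 = int(request.args.get('param2'))
    except (TypeError, ValueError):
        return "param2 needs to be an int", 400
    # Do stuff

@app.route('/bar')
def bar():
    # Same checks, but against the JSON body
    try:
        param1 = request.json['param1']
    except (KeyError, TypeError):
        return "param1 missing", 400
    try:
        param2 = int(request.json.get('param2'))
    except (TypeError, ValueError):
        return "param2 needs to be an int", 400
    # Do stuff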
So, when one looks at those functions, the only way to know that param1 is required and param2 needs to be an integer is to actually read the code. Not only is this ugly and hard to maintain, it is also hard to understand.

One potential way to validate input JSON is to use JSON Schema (there is even a flask-jsonschema project for exactly this). The pitfall is that it depends on the JSON Schema standard, which has a couple of limitations (e.g. it is hard to do validation across attributes). Instead I turned to the object serialization library marshmallow, which has fantastic object validation methods. So, continuing with the above examples, we can create a marshmallow schema to define the valid input parameters.

from marshmallow import Schema, fields

class MySchema(Schema):
    param1 = fields.Str(required=True)
    param2 = fields.Int()
Now that we have a schema to help us validate the input, we need a way to actually apply it to the input. We can accomplish this by using a decorator that applies the given schema to the input parameters and/or the JSON data -- I happened to call this decorator ensure. The function is a bit long so I won't include it here, but it is available in full here. Since both routes happen to have the same validation needs, we simply need to say where to apply the schema.
@app.route('/foo')
@ensure(params=MySchema)
def foo():
    pass

@app.route('/bar')
@ensure(input=MySchema)
def bar():
    pass
While defining explicit marshmallow schemas in advance is great if the same schema is reused (as in the above example), it becomes painstaking when every route needs a different schema. With a little bit of coding, we can instead build schemas on the fly from a plain dictionary.
def build_schema_from_dict(d, allow_nested=True):
    """Build a Marshmallow schema based on a dictionary of parameters

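    :param d: The dict of parameters to use to build the Schema. Each
              value is a marshmallow field, a nested dict, or a tuple
              of ``(nested_dict, options_for_fields_Nested)``. Nested
              entries in ``d`` are replaced in place.
    :param allow_nested: Whether or not nested schemas are allowed. If
                         ``True`` then a fields.Nested() will be created
                         when there is a nested value.
    :return: A Marshmallow schema based on the dictionary
    """
    # items() works on Python 2 and 3; list() snapshots it before we
    # assign back into d
    for k, v in list(d.items()):
        if isinstance(v, tuple):
            schema = v[0]
            # Default to no options so opts is always defined
            opts = v[1] if len(v) > 1 else {}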
        elif isinstance(v, dict):
            schema = v
            opts = {}
        else:
            continue

        if not allow_nested:
            raise ValueError("Nested attributes not allowed.")

        # Recursively generate the nested schema(s)
        schema = build_schema_from_dict(schema)

        # Update the current dict with the Nested schema
        d[k] = fields.Nested(schema, **opts)

    return type('Schema', (Schema, ), d)
Combining the above function build_schema_from_dict and our ensure decorator, we can achieve the following:
@app.route('/foo')
@ensure(
    params={
        'param1': fields.Str(required=True),
        'param2': fields.Int()
    })
def foo():
    # Do stuff. Oh, and request.params is a dict that has all of the 
    # validated values from the request. yay!
    pass

@app.route('/bar')
@ensure(
    input={
        'param1': fields.Str(required=True),
        'param2': fields.Int()
    })
def bar():
    # Do stuff. Oh, and request.input is a dict that has all of the 
    # validated values from the request. yay!
    pass
Hurray! Now we have an easy-to-use way to both validate and document input by just decorating routes with the ensure decorator. Win, win!

If you look at the details of the ensure function (available in full here), you will see that I add the validated data to the request object, so that it is easy to get the correctly typed data inside the function.

Happy coding!