Part 4: Services

All external dependencies (filesystems, network clients, database connections, etc.) should be provided to transformations as services. This allows great flexibility, including the ability to test your transformations in isolation from the external world, and it makes life easier for the people running your infrastructure (and if you’re one of them, it’s also nice to treat yourself well).

In the last section, we used the fs service to access filesystems. We’ll now go further and switch our requests call to use the http service, so that the requests session can be swapped at runtime. We’ll use this to add an HTTP cache, which is a great way to avoid hammering a remote API.

Default services

By default, Bonobo provides only two services:

  • fs, used to access filesystems
  • http, used to make HTTP requests (a requests-compatible object)

Overriding services

You can override the default services, or define your own, by passing a dictionary as the services= argument of bonobo.run. A common pattern is to build that dictionary in a get_services function:

import requests

def get_services():
    # Build a session with a custom User-Agent, keeping the other default headers.
    http = requests.Session()
    http.headers.update({'User-Agent': 'Monkeys!'})
    return {
        'http': http
    }

Switching requests to use the service

Let’s replace the requests.get call we used in the first steps with a call to the http service:

from bonobo.config import use

@use('http')
def extract_fablabs(http):
    yield from http.get(FABLABS_API_URL).json().get('records')

Tadaa, done! You’re no longer tied to a specific implementation, but to whatever requests-compatible object the user wants to provide.
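Because the transformation only depends on whatever object is passed in as http, it can be exercised without touching the network. The sketch below is a minimal illustration and not part of Bonobo: FakeResponse, FakeSession and the sample URL are hypothetical stand-ins, and the function is written without the @use decorator so that it can be called directly with a plain argument.

```python
# Hypothetical placeholder; the tutorial defines the real FABLABS_API_URL earlier.
FABLABS_API_URL = 'https://example.com/api/fablabs'


class FakeResponse:
    """Stand-in for a requests response, exposing only .json()."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class FakeSession:
    """Stand-in for a requests session: records URLs and returns canned data."""

    def __init__(self, payload):
        self._payload = payload
        self.requested = []

    def get(self, url):
        self.requested.append(url)
        return FakeResponse(self._payload)


def extract_fablabs(http):
    # Same body as the transformation above, minus the @use('http') decorator.
    yield from http.get(FABLABS_API_URL).json().get('records')


fake_http = FakeSession({'records': [{'name': 'Lab A'}, {'name': 'Lab B'}]})
records = list(extract_fablabs(fake_http))
print(records)
```

The fake session also records which URLs were requested, so a test can check that the transformation called the expected endpoint.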

Adding cache

Let’s demonstrate the flexibility of this approach by adding some local cache for HTTP requests, to avoid hammering the API endpoint as we run our tests.

First, let’s install requests-cache:

$ pip install requests-cache

Then, let’s switch the implementation conditionally:

def get_services(use_cache=False):
    if use_cache:
        from requests_cache import CachedSession
        http = CachedSession('http.cache')
    else:
        import requests
        http = requests.Session()

    return {
        'http': http
    }
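The same swap works with any object that exposes the methods your transformations call. As a rough illustration of the idea (this is not how requests-cache is implemented), here is a sketch of a hand-written service that memoizes get calls in memory. MemoizingSession and CountingSession are hypothetical names, and the backend only counts calls instead of using the network.

```python
class MemoizingSession:
    """Hypothetical wrapper: caches the result of .get() by URL, in memory."""

    def __init__(self, session):
        self._session = session
        self._cache = {}

    def get(self, url):
        # Only hit the wrapped session the first time a URL is requested.
        if url not in self._cache:
            self._cache[url] = self._session.get(url)
        return self._cache[url]


class CountingSession:
    """Hypothetical backend that counts calls instead of using the network."""

    def __init__(self):
        self.calls = 0

    def get(self, url):
        self.calls += 1
        return 'response for ' + url


backend = CountingSession()
http = MemoizingSession(backend)

first = http.get('https://example.com/api')
second = http.get('https://example.com/api')
print(backend.calls)
```

An object like this could be returned under the 'http' key of the services dictionary, in place of the session objects used above.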

Then, in the main block, let’s add support for a --use-cache argument:

if __name__ == '__main__':
    parser = bonobo.get_argument_parser()
    parser.add_argument('--use-cache', action='store_true', default=False)

    with bonobo.parse_args(parser) as options:
        bonobo.run(get_graph(**options), services=get_services(**options))
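To see why the options can be unpacked straight into get_services, it helps to look at what the parser produces. The sketch below uses a plain argparse.ArgumentParser as a stand-in, on the assumption that the parser returned by bonobo.get_argument_parser() handles this argument the same way. It does not call Bonobo at all.

```python
import argparse

# Plain argparse parser, standing in for the one returned by
# bonobo.get_argument_parser() (assumed here to behave like argparse).
parser = argparse.ArgumentParser()
parser.add_argument('--use-cache', action='store_true', default=False)

# argparse turns '--use-cache' into the attribute name 'use_cache'.
with_cache = vars(parser.parse_args(['--use-cache']))
without_cache = vars(parser.parse_args([]))

print(with_cache)     # {'use_cache': True}
print(without_cache)  # {'use_cache': False}
```

The use_cache key matches the keyword parameter of get_services, which is what makes the get_services(**options) call work.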

And you’re done! You can now turn the cache on or off with the --use-cache argument on the command line when running your job.

Moving forward

You now know:

  • How to use builtin service implementations
  • How to override a service
  • How to define your own service
  • How to tune the default argument parser

It’s now time to jump to Part 5: Projects and Packaging.