Bonobo

Module:bonobo

Contains all the tools you need to get started with the framework, including (but not limited to) generic transformations, readers, writers, and tools for writing and executing graphs and jobs.

All objects in this module are considered stable and safe to use, and backward compatibility between versions is kept as high as possible.

class Graph(*chain)[source]

Bases: object

Represents a directed graph of nodes.

add_chain(*nodes, _input=<Begin>, _output=None, _name=None)[source]

Add a chain to this graph.

add_node(c)[source]

Add a node without connections to this graph and return its index.

copy()[source]
outputs_of(idx, create=False)[source]

Get a set of the outputs for a given node index.

graphviz
name = ''
topologically_sorted_indexes

Iterate in topological order, based on networkx’s topological_sort() function.
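The Graph methods above can be illustrated with a minimal sketch. It is not bonobo's implementation: MiniGraph, its BEGIN sentinel, and the use of Kahn's algorithm (instead of networkx) are assumptions made so the example runs with the standard library only.

```python
from collections import deque


class MiniGraph:
    """Hypothetical stand-in for a directed graph of nodes (not bonobo.Graph)."""

    BEGIN = -1  # assumed sentinel index playing the role of <Begin>

    def __init__(self):
        self.nodes = []                   # index -> node object
        self.edges = {self.BEGIN: set()}  # index -> set of output indexes

    def add_node(self, node):
        # Add a node without connections and return its index.
        self.nodes.append(node)
        idx = len(self.nodes) - 1
        self.edges[idx] = set()
        return idx

    def outputs_of(self, idx):
        # Set of output indexes for a given node index.
        return self.edges[idx]

    def add_chain(self, *nodes, _input=BEGIN):
        # Connect each node to the next one, starting from _input.
        previous = _input
        for node in nodes:
            idx = self.add_node(node)
            self.outputs_of(previous).add(idx)
            previous = idx
        return self

    def topologically_sorted_indexes(self):
        # Kahn's algorithm over the real nodes (the sentinel is skipped).
        indegree = {idx: 0 for idx in range(len(self.nodes))}
        for src, targets in self.edges.items():
            if src == self.BEGIN:
                continue
            for dst in targets:
                indegree[dst] += 1
        ready = deque(idx for idx, deg in indegree.items() if deg == 0)
        order = []
        while ready:
            idx = ready.popleft()
            order.append(idx)
            for dst in sorted(self.edges[idx]):
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        return order


graph = MiniGraph()
graph.add_chain("extract", "transform", "load")
fork = graph.add_node("audit")
graph.outputs_of(0).add(fork)  # "extract" also feeds "audit"
order = graph.topologically_sorted_indexes()
```

In this sketch every node index appears after all the indexes that feed it, which is the property a topological order guarantees.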

class CsvReader(*args, **kwargs)[source]

Bases: bonobo.nodes.io.file.FileReader, bonobo.nodes.io.csv.CsvHandler

Reads a CSV and yields the values as dicts.

Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘r’

  • output_fields (ensure_tuple) – Specify the field names of output lines. Mutually exclusive with “output_type”.
  • output_type – Specify the type of output lines. Mutually exclusive with “output_fields”.
  • delimiter (str) –
  • quotechar (str) –
  • escapechar (str) –
  • doublequote (str) –
  • skipinitialspace (str) –
  • lineterminator (str) –
  • quoting (int) –
  • headers
  • fields (ensure_tuple) –
  • skip (int) – If set and greater than zero, the reader will skip this amount of lines.
  • reader_factory

    Builds the CSV reader, i.e. an object we can iterate over, each iteration yielding one line of fields as an iterable.

    Defaults to the builtin csv.reader(…), but can be overridden to fit your special needs.

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured
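The partial-configuration behavior described above can be sketched with functools.partial alone. This is only an illustration of the idea, not bonobo's Configurable machinery: configure, make_reader, and REQUIRED are hypothetical names introduced for the example.

```python
import functools


def make_reader(path, delimiter):
    # Hypothetical "final instance": a plain dict stands in for a node.
    return {"path": path, "delimiter": delimiter}


def configure(factory, required, _final=False, **options):
    """Build the object if every required option is set, else return a partial."""
    missing = [name for name in required if name not in options]
    if missing and _final:
        # _final forces creation, so missing options become an error.
        raise TypeError("missing options: " + ", ".join(missing))
    if missing:
        # Not everything is known yet: remember what we have for later.
        return functools.partial(configure, factory, required, **options)
    return factory(**options)


REQUIRED = ("path", "delimiter")
partial_reader = configure(make_reader, REQUIRED, path="data.csv")
reader = partial_reader(delimiter=";")
```

Calling the partial object with the remaining options completes the configuration, while passing _final=True with options still missing raises instead of returning another partial.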

read(file, context, *, fs)[source]

Read the given file and yield its rows.

reader_factory

Builds the CSV reader, i.e. an object we can iterate over, each iteration yielding one line of fields as an iterable.

Defaults to the builtin csv.reader(…), but can be overridden to fit your special needs.

skip

If set and greater than zero, the reader will skip this amount of lines.
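As a rough illustration of what a CSV reader with a skip option does, the sketch below uses only the standard csv module. read_csv_rows is a hypothetical helper, and treating the first line after the skipped ones as the header row is an assumption, not documented CsvReader behavior.

```python
import csv
import io
import itertools


def read_csv_rows(file, skip=0, delimiter=","):
    reader = csv.reader(file, delimiter=delimiter)
    # Skip the requested number of leading lines, if any.
    rows = itertools.islice(reader, skip, None) if skip and skip > 0 else reader
    headers = next(rows, None)  # assumption: first remaining line holds headers
    if headers is None:
        return
    for row in rows:
        # Yield each line as a dict keyed by the header names.
        yield dict(zip(headers, row))


sample = io.StringIO("# comment line\nname,age\nada,36\nalan,41\n")
rows = list(read_csv_rows(sample, skip=1))
```

Values stay strings here because csv.reader performs no type conversion.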

class CsvWriter(*args, **kwargs)[source]

Bases: bonobo.nodes.io.file.FileWriter, bonobo.nodes.io.csv.CsvHandler

Parameters:
  • path (str) – Path to use within the provided filesystem.
  • writer_factory

    Builds the CSV writer, i.e. an object we can pass a field collection to, to be written as one line in the target file.

    Defaults to the builtin csv.writer(…).writerow, but can be overridden to fit your special needs.

  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘w+’

  • delimiter (str) –
  • quotechar (str) –
  • escapechar (str) –
  • doublequote (str) –
  • skipinitialspace (str) –
  • lineterminator (str) –
  • quoting (int) –
  • headers
  • fields (ensure_tuple) –

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

write(file, context, *values, fs)[source]

Write a row on the next line of opened file in context.

writer_factory

Builds the CSV writer, i.e. an object we can pass a field collection to, to be written as one line in the target file.

Defaults to the builtin csv.writer(…).writerow, but can be overridden to fit your special needs.
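The documented default, csv.writer(…).writerow, can be tried on its own with the standard library. In this sketch default_writer_factory is a hypothetical name, and the exact arguments bonobo passes to its factory are not shown in this reference, so the signature is an assumption.

```python
import csv
import io


def default_writer_factory(file, **csv_options):
    # Mirrors the documented default: csv.writer(...).writerow
    return csv.writer(file, **csv_options).writerow


buffer = io.StringIO()
write_row = default_writer_factory(buffer, delimiter=";", lineterminator="\n")
write_row(["name", "age"])  # one field collection becomes one line
write_row(["ada", 36])
output = buffer.getvalue()
```

A custom factory would only need to return a callable that accepts one field collection per call.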

class FileReader(*args, **kwargs)[source]

Bases: bonobo.nodes.io.base.Reader, bonobo.nodes.io.base.FileHandler

Component factory for file-like readers.

On its own, it can be used to read a file and yield one row per line, trimming the “eol” character at the end if present. Extending it is usually the right way to create more specific file readers (like json, csv, etc.)
Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘r’

  • output_fields (ensure_tuple) – Specify the field names of output lines. Mutually exclusive with “output_type”.
  • output_type – Specify the type of output lines. Mutually exclusive with “output_fields”.

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

read(file, *, fs)[source]

Read the given file and yield its rows.

mode

What mode to use for open() call.

Default: ‘r’

output
output_fields

Specify the field names of output lines. Mutually exclusive with “output_type”.

output_type

Specify the type of output lines. Mutually exclusive with “output_fields”.
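The "one row per line, trimming the eol character if present" behavior can be sketched with plain file objects. read_lines is a hypothetical helper that opens a local path directly, which is an assumption: the real component goes through the filesystem service named by the fs option.

```python
import os
import tempfile


def read_lines(path, eol="\n", encoding="utf-8", mode="r"):
    with open(path, mode, encoding=encoding) as file:
        for line in file:
            # Trim the eol character only when it is actually present.
            yield line[:-len(eol)] if line.endswith(eol) else line


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("first\nsecond\nlast without eol")
    lines = list(read_lines(path))
```

The last line has no trailing eol and is yielded unchanged.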

class FileWriter(*args, **kwargs)[source]

Bases: bonobo.nodes.io.base.Writer, bonobo.nodes.io.base.FileHandler

Component factory for file or file-like writers.

On its own, it can be used to write to a file, one line per row that comes into this component. Extending it is usually the right way to create more specific file writers (like json, csv, etc.)
Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘w+’

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

write(file, context, line, *, fs)[source]

Write a row on the next line of opened file in context.

mode

What mode to use for open() call.

Default: ‘w+’

class Filter(*args, **kwargs)[source]

Bases: bonobo.config.configurables.Configurable

Filter out hashes from the stream depending on the filter callable's return value when called with the current hash as parameter.

Can be used as a decorator on a filter callable.

filter

A callable used to filter lines.

If the callable returns a true-ish value, the input will be passed unmodified to the next items.

Otherwise, it will be dropped.

Parameters:filter

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

filter
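The filtering rule above (truthy result passes the input on unmodified, anything else drops it) amounts to the following sketch. apply_filter and is_adult are hypothetical names, and the rows are plain dicts chosen for the example.

```python
def apply_filter(predicate, rows):
    for row in rows:
        if predicate(row):
            yield row  # passed to the next item unmodified
        # otherwise the row is silently dropped


def is_adult(row):
    return row["age"] >= 18


rows = [{"name": "ada", "age": 36}, {"name": "tim", "age": 9}]
kept = list(apply_filter(is_adult, rows))
```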
class FixedWindow(*args, **kwargs)[source]

Bases: bonobo.config.configurables.Configurable

Transformation factory to create fixed windows of inputs, as lists.

For example, if the input is successively 1, 2, 3, 4, etc. and you pass it through a FixedWindow(2), you’ll get lists of elements 2 by 2: [1, 2], [3, 4], …
Parameters:length (int) –

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

buffer
length
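The windowing described for FixedWindow can be pictured with a small generator. fixed_window is a hypothetical function, and emitting the trailing, incomplete window as-is is an assumption; this reference does not say how the real node handles leftovers.

```python
def fixed_window(items, length):
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= length:
            yield buffer  # a full window of `length` items
            buffer = []
    if buffer:
        yield buffer  # assumption: emit the trailing partial window as-is


windows = list(fixed_window([1, 2, 3, 4, 5], 2))
```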
class JsonReader(*args, **kwargs)[source]

Bases: bonobo.nodes.io.json.JsonHandler, bonobo.nodes.io.file.FileReader

Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘r’

  • output_fields (ensure_tuple) – Specify the field names of output lines. Mutually exclusive with “output_type”.
  • output_type – Specify the type of output lines. Mutually exclusive with “output_fields”.
  • loader

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

read(file, *, fs)[source]

Read the given file and yield its rows.

loader
class JsonWriter(*args, **kwargs)[source]

Bases: bonobo.nodes.io.json.JsonHandler, bonobo.nodes.io.file.FileWriter

Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘w+’

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

write(file, context, *args, fs)[source]

Write a JSON row on the next line of the file pointed to by ctx.file.

Parameters:
  • ctx
  • row
envelope
class LdjsonReader(*args, **kwargs)[source]

Bases: bonobo.nodes.io.json.LdjsonHandler, bonobo.nodes.io.json.JsonReader

Read a stream of line-delimited JSON objects (one object per line).

Not to be confused with JSON-LD (where LD stands for linked data).
Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘r’

  • output_fields (ensure_tuple) – Specify the field names of output lines. Mutually exclusive with “output_type”.
  • output_type – Specify the type of output lines. Mutually exclusive with “output_fields”.
  • loader

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

read(file, *, fs)[source]

Read the given file and yield its rows.
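Line-delimited JSON means one complete JSON document per line, so a reader can decode the stream line by line. The sketch below shows that idea with the standard json module. read_ldjson is a hypothetical helper, and skipping blank lines is an assumption rather than documented LdjsonReader behavior.

```python
import io
import json


def read_ldjson(file):
    for line in file:
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            yield json.loads(line)  # each line is a standalone JSON document


sample = io.StringIO('{"id": 1, "name": "ada"}\n{"id": 2, "name": "alan"}\n\n')
objects = list(read_ldjson(sample))
```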

class LdjsonWriter(*args, **kwargs)[source]

Bases: bonobo.nodes.io.json.LdjsonHandler, bonobo.nodes.io.json.JsonWriter

Write a stream of line-delimited JSON objects (one object per line).

Not to be confused with JSON-LD (where LD stands for linked data).
Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • mode (str) –

    What mode to use for open() call.

    Default: ‘w+’

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

class Limit(*args, **kwargs)[source]

Bases: bonobo.config.configurables.Configurable

Creates a Limit() node that only lets the first n rows (defined by the limit option) go through, unmodified.

limit

Number of rows to let go through.

TODO: simplify into a closure building factory?

Parameters:limit

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

counter
limit
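A counter-based sketch of the Limit behavior follows. LimitSketch is a hypothetical class, not bonobo's node: it keeps its own counter attribute, whereas the real component lists counter and limit as separate members whose mechanics are not described here.

```python
class LimitSketch:
    def __init__(self, limit):
        self.limit = limit
        self.counter = 0

    def __call__(self, row):
        # Let the row through unmodified while the counter is below the limit.
        if self.counter < self.limit:
            self.counter += 1
            yield row


limit = LimitSketch(2)
passed = [out for row in "abcd" for out in limit(row)]
```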
class PickleReader(*args, **kwargs)[source]

Bases: bonobo.nodes.io.file.FileReader, bonobo.nodes.io.pickle.PickleHandler

Reads a Python pickle object and yields the items in dicts.

Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • output_fields (ensure_tuple) – Specify the field names of output lines. Mutually exclusive with “output_type”.
  • output_type – Specify the type of output lines. Mutually exclusive with “output_fields”.
  • fields (tuple) –
  • mode (str) –

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

read(file, context, *, fs)[source]

Read the given file and yield its rows.

mode
class PickleWriter(*args, **kwargs)[source]

Bases: bonobo.nodes.io.file.FileWriter, bonobo.nodes.io.pickle.PickleHandler

Parameters:
  • path (str) – Path to use within the provided filesystem.
  • eol (str) –

    Character to use as line separator.

    Default: ‘\n’

  • encoding (str) –

    Encoding.

    Default: ‘utf-8’

  • fs (str) –

    The filesystem instance to use.

    Default: ‘fs’

  • fields (tuple) –
  • mode (str) –

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

write(file, context, item, *, fs)[source]

Write a pickled item to the opened file.

mode
class PrettyPrinter(*args, **kwargs)[source]

Bases: bonobo.config.configurables.Configurable

Parameters:
  • filter

    A filter that determines what to print.

    Default is to ignore any key starting with an underscore and None values.

  • max_width (int) –

    If set, truncates the output values longer than this to this width.

    Default: 80

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

format_console(index, key, value, *, fields=None)[source]
format_quiet(index, key, value, *, fields=None)[source]
print_console(context, *args, **kwargs)[source]
print_jupyter(context, *args)[source]
print_quiet(context, *args, **kwargs)[source]
context
filter

A filter that determines what to print.

Default is to ignore any key starting with an underscore and None values.

max_width

If set, truncates the output values longer than this to this width.

Default: 80

class RateLimited(*args, **kwargs)[source]

Bases: bonobo.config.configurables.Configurable

Parameters:
  • handler
  • initial (int) –
  • period (int) –
  • amount (int) –

Custom instance builder. If not all options are fulfilled, will return a PartiallyConfigured instance which is just a functools.partial object that behaves like a Configurable instance.

The special _final argument can be used to force final instance to be created, or an error raised if options are missing.

Parameters:
  • args
  • _final – bool
  • kwargs
Returns:

Configurable or PartiallyConfigured

amount
bucket
handler
initial
period
run(graph, *, plugins=None, services=None, strategy=None)[source]

Main entry point of bonobo. It takes a graph and creates all the necessary plumbing around to execute it.

The only necessary argument is a Graph instance, containing the logic you actually want to execute.

By default, this graph will be executed using the “threadpool” strategy: each graph node will be wrapped in a thread, and executed in a loop until there is no more input to this node.

You can provide plugin factory objects in the plugins list. This function will also add the plugins needed for interactive console execution and Jupyter notebook execution if it detects that it is running in one of these contexts.

You’ll probably want to provide a services dictionary mapping service names to service instances.

Parameters:
  • graph (Graph) – The Graph to execute.
  • strategy (str) – The bonobo.execution.strategies.base.Strategy to use.
  • plugins (list) – The list of plugins to enhance execution.
  • services (dict) – The implementations of services this graph will use.
Return bonobo.execution.graph.GraphExecutionContext:
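The "one thread per node, looping until there is no more input" idea behind the default strategy can be sketched with the standard threading and queue modules. This is not bonobo's executor: run_chain, node_loop, and the END sentinel are hypothetical, and the sketch only handles a linear chain of generator functions.

```python
import queue
import threading

END = object()  # hypothetical sentinel meaning "no more input"


def node_loop(func, inbox, outbox):
    # Run one node: call func for every input until the sentinel arrives.
    while True:
        item = inbox.get()
        if item is END:
            break
        for result in func(item):
            outbox.put(result)
    outbox.put(END)  # tell the next node that this one is done


def run_chain(source, *nodes):
    """Execute a linear chain of generator functions, one thread per node."""
    queues = [queue.Queue() for _ in range(len(nodes) + 1)]
    threads = [
        threading.Thread(target=node_loop, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(nodes)
    ]
    for thread in threads:
        thread.start()
    for item in source:
        queues[0].put(item)
    queues[0].put(END)
    for thread in threads:
        thread.join()
    results = []
    while True:
        item = queues[-1].get()
        if item is END:
            return results
        results.append(item)


def double(x):
    yield x * 2


def keep_multiples_of_four(x):
    if x % 4 == 0:
        yield x


results = run_chain(range(5), double, keep_multiples_of_four)
```

Because each node has a single upstream queue, output order matches input order in this sketch; a graph with branches would need more bookkeeping.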
 
inspect(graph, *, plugins=None, services=None, strategy=None, format)[source]
create_strategy(name=None)[source]

Create a strategy, or just return it if it is already one.

Parameters:name
Returns:Strategy
open_fs(fs_url=None, *args, **kwargs)[source]

Wraps fs.opener.registry.Registry.open_fs, with default to local current working directory and expanding ~ in path.

Parameters:
  • fs_url (str) – A filesystem URL
  • parse_result (ParseResult) – A parsed filesystem URL.
  • writeable (bool) – True if the filesystem must be writeable.
  • create (bool) – True if the filesystem should be created if it does not exist.
  • cwd (str) – The current working directory (generally only relevant for OS filesystems).
  • default_protocol (str) – The protocol to use if one is not supplied in the FS URL (defaults to "osfs").
Returns:

fs.base.FS object

Format(**formats)[source]
OrderFields(fields)[source]

Transformation factory to reorder fields in a data stream.

Parameters:fields
Returns:callable
Rename(**translations)[source]
SetFields(fields)[source]

Transformation factory that sets the field names on first iteration, without touching the values.

Parameters:fields
Returns:callable
Tee(f)[source]
UnpackItems(*items, fields=None, defaults=None)[source]
>>> UnpackItems(0)
Parameters:
  • items
  • fields
  • defaults
Returns:

callable

count(counter)[source]
identity(x)[source]
noop(*args, **kwargs)[source]
get_examples_path(*pathsegments)[source]
open_examples_fs(*pathsegments)[source]
get_argument_parser(parser=None)[source]

Creates an argument parser with arguments to override the system environment.

Api:bonobo.get_argument_parser
Parameters:_parser
Returns:
parse_args(mixed=None)[source]

Context manager to extract and apply environment-related options from the provided argparser result.

A dictionary of unknown options will be yielded, so the remaining options can be used by the caller.

Api:bonobo.patch_environ
Parameters:mixed – ArgumentParser instance, Namespace, or dict.
Returns: