================
Welcome to Flupy
================

flupy is a lightweight library and CLI for implementing python data pipelines with a fluent interface.


Under the hood, flupy is built on generators. That means its pipelines evaluate lazily and use a constant amount of memory no matter how much data are being processed. This allows flupy to tackle Petabyte scale data manipulation as easily as it operates on a small list.

API
===
::

    import json
    from flupy import flu

    logs = open('logs.jl', 'r')

    error_count = (
        flu(logs)
        .map(lambda x: json.loads(x))
        .filter(lambda x: x['level'] == 'ERROR')
        .count()
    )

    print(error_count)
    # 14


CLI
===

The flupy library, and python runtime, are also accessible from `flu` command line utility::

    $ cat logs.txt | flu "_.filter(lambda x: x.startswith('ERROR'))"


For more information about the `flu` command see :doc:`command line <./cli>`.


Getting Started
===============

**Requirements**

Python 3.6+

**Installation**
::

    $ pip install flupy


Example
=======

Since 2008, what domains are our customers comming from?::


    from flupy import flu

    customers = [
        {'name': 'Jane', 'signup_year': 2018, 'email': 'jane@ibm.com'},
        {'name': 'Fred', 'signup_year': 2011, 'email': 'fred@google.com'},
        {'name': 'Lisa', 'signup_year': 2014, 'email': 'jane@ibm.com'},
        {'name': 'Jack', 'signup_year': 2007, 'email': 'jane@apple.com'},
    ]

    pipeline = (
        flu(customers)
        .filter(lambda x: x['signup_year'] > 2008)
        .map_item('email')
        .map(lambda x: x.partition('@')[2])
        .group_by() # defaults to identity
        .map(lambda x: (x[0], x[1].count()))
        .collect()
    )

    print(pipeline)
    # [('google.com', 1), ('ibm.com', 2)]