Siuba

Latest version: v0.4.4

0.0.18

Fixes

* filter now preserves column order, rather than moving grouping columns to the left (205)
* symbolic representations now correctly align on keywords (222)

Features

* **sql** supports custom join conditions via sql_on (202)
* **siuba.series.spec** now includes all Series methods, even unsupported ones (209)
* the spec is now also derived from the file `siuba/series/spec.yml` (211)
* siu **Symbolic** is no longer falsey (210)
* added new verb **top_n** (222)
* added vector functions **ceil_date and floor_date** to siuba.experimental.datetime (222)
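
For readers who have not met a top_n-style verb, the idea can be sketched in plain pandas. This is not siuba's implementation and does not show siuba's signature; the helper name `top_n_sketch` and the sample data are assumptions made for illustration.

```python
import pandas as pd

def top_n_sketch(df, n, wt):
    """Keep the n rows with the largest values in column `wt`, keeping ties."""
    # Hypothetical helper for illustration only; not part of siuba.
    return df.nlargest(n, wt, keep="all")

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [3, 9, 5, 9]})

# Rows "b" and "d" share the highest score, so both are kept.
top2 = top_n_sketch(df, 2, "score")
```

Whether siuba's verb treats ties the same way is not stated in these notes, so the `keep="all"` choice here is only one plausible behavior.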

QA

* re-enabled testing of example jupyter notebooks (206)

0.0.17

Fixes

* added more fast grouped method tests, and fixed fast summarize (197)

Features

* support prop argument in fct_lump (195)
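
As a rough illustration of what proportion-based lumping means, the sketch below replaces rare levels with a single "Other" level using plain pandas. The helper `lump_by_prop` is hypothetical, and the exact threshold rule used by siuba's fct_lump may differ.

```python
import pandas as pd

def lump_by_prop(values, prop, other="Other"):
    """Replace levels whose share of rows is below `prop` with `other`."""
    # Hypothetical helper for illustration only; not siuba's fct_lump.
    ser = pd.Series(values)
    share = ser.value_counts(normalize=True)
    rare = share[share < prop].index
    # Keep values that are not rare; swap the rest for the `other` label.
    return ser.where(~ser.isin(rare), other)

# "c" makes up 1/6 of the rows, which is below the 0.2 threshold.
lumped = lump_by_prop(["a", "a", "a", "b", "b", "c"], prop=0.2)
```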

0.0.16

Fixes

* if_else no longer tries to coerce the result to a new type at the end (179)
* removed psycopg2 dependency (it caused installs to fail if the user did not have postgres) (189)

0.0.15

Fixes nest raising the error "TypeError: copy() takes no keyword arguments". Nest now uses a more principled approach to splitting a grouped DataFrame, and creating a list of sub frames! (see 182)

Also fixed doc build, by not trying to run notebooks starting with `draft-`. (186)
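
The nest fix above describes splitting a grouped DataFrame into a list of sub frames. A minimal sketch of that idea in plain pandas might look like the following; it is an assumption about the general approach, not siuba's actual code, and the column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"g": ["x", "x", "y"], "val": [1, 2, 3]})

# Split the grouped frame into one sub frame per group, dropping the key column.
keys, frames = [], []
for key, sub in df.groupby("g"):
    keys.append(key)
    frames.append(sub.drop(columns="g").reset_index(drop=True))

# One row per group, with the sub frames stored in an object column.
nested = pd.DataFrame({"g": keys, "data": frames})
```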

0.0.14

New Feature: support user defined functions (146)

* Support for user defined functions (UDFs). Note that these require annotating the return type. For more on the theory behind these see [ADR-003](https://github.com/machow/siuba/blob/master/examples/architecture/003-fast-mutate.ipynb).

```python
from siuba.siu import symbolic_dispatch
from pandas.core.groupby import SeriesGroupBy, GroupBy
from pandas import Series

# Default implementation, used for a plain (ungrouped) Series
@symbolic_dispatch(cls = Series)
def cummean(x):
    """Return a same-length array, containing the cumulative mean."""
    return x.expanding().mean()


# Grouped implementation; note the required return type annotation
@cummean.register(SeriesGroupBy)
def _cummean_grouped(x) -> SeriesGroupBy:
    grouper = x.grouper
    # running count of non-missing entries within each group
    n_entries = x.obj.notna().groupby(grouper).cumsum()

    res = x.cumsum() / n_entries

    return res.groupby(grouper)

from siuba import _, mutate
from siuba.data import mtcars

# a pandas DataFrameGroupBy object
g_cyl = mtcars.groupby("cyl")

mutate(g_cyl, cumul_mean = cummean(_.hp))
```

* Support for many methods in vector.py, using UDFs (158)
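
The grouped cummean above relies on the identity "running sum divided by running count equals running mean". The check below verifies that identity with plain pandas on made-up data; it does not use siuba, and the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "g": ["a", "a", "b", "b", "b"],
    "score": [1.0, 3.0, 2.0, 4.0, 6.0],
})
grouped = df.groupby("g")["score"]

# Vectorized form: running sum divided by running count of non-missing values.
n_entries = df["score"].notna().astype("int64").groupby(df["g"]).cumsum()
fast = grouped.cumsum() / n_entries

# Reference form: expanding mean computed separately within each group.
slow = grouped.transform(lambda x: x.expanding().mean())
```

Both `fast` and `slow` hold the same per-group cumulative means, which is why the grouped UDF can avoid calling a Python function once per group.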

Bug Fixes

* Fix regression where .str wasn't being removed when processing siu expressions for SQL (159)
* Grouped filter now preserves order
* Verbs now tested to preserve original index (https://github.com/machow/siuba/commit/d938ab323e080832af8274f330c6562cf9b447b0)

Tests

* Add many more versions of python and pandas to travis CI test matrix (161)

0.0.13

**Features**

* Implementation of fast mutate, filter, and summarize using CallTreeLocal (134). For even just a couple thousand groups, the fast methods are close to optimal hand-written pandas, and the slow versions are almost 1000x slower :o.
* fixed current grouped pandas mutate to preserve row order (139)
* laid down tests of all supported series methods, currently skipping SQL backends (but ready to go!)
* put up some very basic documentation (145)
* wrote an ADR on the rationale for fast groupby (135)

Note that CallTreeLocal has new options, allowing it to look up entries based on chained attributes (e.g. look for an entry named "dt.year") and to override custom function calls.
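
To make the chained-attribute lookup concrete, here is a toy sketch of matching a dotted name such as "dt.year" against a registry. Everything in it (`REGISTRY`, `lookup_chain`, the return shape) is hypothetical and is not how CallTreeLocal is written; it only illustrates the lookup idea.

```python
# Toy registry mapping dotted attribute chains to replacement functions.
# All names here are hypothetical; this is not siuba's CallTreeLocal.
REGISTRY = {
    "dt.year": lambda: "translated dt.year",
    "mean": lambda: "translated mean",
}

def lookup_chain(attrs, registry):
    """Return (matched_name, remaining_attrs) for the longest dotted prefix found."""
    for end in range(len(attrs), 0, -1):
        name = ".".join(attrs[:end])
        if name in registry:
            return name, attrs[end:]
    # Nothing matched, so the whole chain is left untouched.
    return None, attrs

matched, rest = lookup_chain(["dt", "year"], REGISTRY)
unmatched, untouched = lookup_chain(["str", "upper"], REGISTRY)
```

In this toy, matching "dt.year" consumes both `dt` and `year` at once, which mirrors the reasoning given for dropping rm_attr in the breaking changes.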

I still need to finish support for user defined operations and some light siu refactoring.

**Breaking changes**

* Removed the rm_attr argument from CallTreeLocal, since converting subattrs like `dt.year` will consume `dt` anyway (can't imagine a situation where we'd want to keep it, and couldn't do that in the translator function)

**Demo**

```python
from siuba.experimental.pd_groups import fast_mutate, fast_filter, fast_summarize
from siuba import *
from siuba.data import mtcars

g_cars = mtcars.groupby(['cyl', 'gear'])

fast_mutate(g_cars, _.hp - _.hp.mean())
```
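
For comparison, the kind of hand-written pandas that the fast methods are measured against might look roughly like the sketch below. It uses a small made-up frame instead of mtcars and does not call siuba; which of these forms siuba generates internally is an assumption, not something these notes state.

```python
import pandas as pd

cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 6],
    "gear": [4, 4, 3, 3, 4],
    "hp": [90.0, 110.0, 100.0, 120.0, 150.0],
})
g_cars = cars.groupby(["cyl", "gear"])

# Vectorized: one grouped transform, then a single subtraction over all rows.
fast = cars["hp"] - g_cars["hp"].transform("mean")

# Per-group: a Python function is called once for every group.
slow = g_cars["hp"].transform(lambda x: x - x.mean())
```

Both forms give the same values; the per-group form pays Python call overhead for each group, which grows with the number of groups.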
