Koalas

Latest version: v1.8.2

0.30.0

Slice column selection support in loc

We continued to improve the `loc` indexer and added support for slice column selection (1351).

python
>>> from databricks import koalas as ks
>>> df = ks.DataFrame({'a':list('abcdefghij'), 'b':list('abcdefghij'), 'c': range(10)})
>>> df.loc[:, "b":"c"]
   b  c
0  a  0
1  b  1
2  c  2
3  d  3
4  e  4
5  f  5
6  g  6
7  h  7
8  i  8
9  j  9

Slice row selection support in loc for multi-index

We also added support for slices as the row selection in the `loc` indexer for multi-index (1344).

python
>>> from databricks import koalas as ks
>>> import pandas as pd
>>> df = ks.DataFrame({'a': range(3)}, index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c"), ("b", "d")]))
>>> df.loc[("a", "c"): "b"]
     a
a c  1
b d  2

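Iterable row selection support in iloc

We continued to improve the `iloc` indexer, which now accepts an iterable of positions as the row selection (1338). Note that in the example below the rows come back in index order, not in the order the positions were listed.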

python
>>> from databricks import koalas as ks
>>> df = ks.DataFrame({'a':list('abcdefghij'), 'b':list('abcdefghij')})
>>> df.iloc[[-1, 1, 2, 3]]
   a  b
1  b  b
2  c  c
3  d  d
9  j  j

Support of setting values via loc and iloc at Series

We added basic support for setting values via `loc` and `iloc` on Series (1367).

python
>>> from databricks import koalas as ks
>>> kser = ks.Series([1, 2, 3], index=["cobra", "viper", "sidewinder"])
>>> kser.loc[kser % 2 == 1] = -kser
>>> kser
cobra        -1
viper         2
sidewinder   -3

Other new features and improvements

We added the following new features:

DataFrame:

- `take` (1292)
- `eval` (1359)

Series:

- `dot` (1136)
- `take` (1357)
- `combine_first` (1290)

Index:

- `droplevel` (1340)
- `union` (1348)
- `take` (1357)
- `asof` (1350)

MultiIndex:

- `droplevel` (1340)
- `unique` (1342)
- `union` (1348)
- `take` (1357)
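
As a rough illustration of what several of these methods do, here is a sketch in plain pandas. It assumes the Koalas versions follow the pandas semantics of the same names, which is the stated goal of the project; row order in Koalas results may differ. All variable names are made up for the example.

```python
import pandas as pd

# pandas stand-ins for some of the newly added Koalas methods
# (assumption: Koalas mirrors these pandas semantics).
s1 = pd.Series([1.0, None, 3.0], index=["x", "y", "z"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["x", "y", "z"])

# combine_first: keep s1's values and fill its missing entries from s2.
combined = s1.combine_first(s2)

# dot: inner product of two Series aligned on their index.
product = s2.dot(pd.Series([1.0, 0.0, 2.0], index=["x", "y", "z"]))

# take: positional selection, here positions 2 and 0 in that order.
taken = s2.take([2, 0])

# droplevel removes one level of a MultiIndex; union merges two indexes.
midx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k", "n"])
dropped = midx.droplevel("n")
merged = pd.Index([1, 2, 3]).union(pd.Index([3, 4]))
```

With Koalas installed, the same calls should be available on `ks.Series`, `ks.Index` and `ks.MultiIndex` objects.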

Other improvements

- Compute Index.is_monotonic/Index.is_monotonic_decreasing in a distributed manner (1354)
- Fix SeriesGroupBy.apply() to respect various output (1339)
- Add the support for operations between different DataFrames in groupby() (1321)
- Explicitly disallow disabling numeric_only in stats APIs at DataFrame (1343)
- Fix index operator against Series and Frame to use iloc conditionally (1336)
- Make nunique in DataFrame return a Koalas DataFrame instead of a pandas one (1347)
- Fix MultiIndex.drop() to follow renaming et al. (1356)
- Add column axis in ks.concat (1349)
- Fix iloc for Series when the series is modified. (1368)
- Support MultiIndex for duplicated, drop_duplicates. (1363)
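
The column axis in `concat` places frames side by side instead of stacking them. A minimal pandas sketch of that behavior follows; it assumes `ks.concat(..., axis=1)` matches `pd.concat`, and the frames are invented for the example.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]}, index=["r1", "r2"])
right = pd.DataFrame({"b": [3, 4]}, index=["r1", "r2"])

# axis=1 aligns the frames on their index and joins the columns.
wide = pd.concat([left, right], axis=1)
```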

0.29.0

Slice support in `iloc`

We improved the `iloc` indexer to support slices as the row selection. (1335)

For example,

py
>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame({'a':list('abcdefghij')})
>>> kdf
   a
0  a
1  b
2  c
3  d
4  e
5  f
6  g
7  h
8  i
9  j
>>> kdf.iloc[2:5]
   a
2  c
3  d
4  e
>>> kdf.iloc[2:-3:2]
   a
2  c
4  e
6  g
>>> kdf.iloc[5:]
   a
5  f
6  g
7  h
8  i
9  j
>>> kdf.iloc[5:2]
Empty DataFrame
Columns: [a]
Index: []

Documentation

We added links to previous talks to our documentation. (1319)

The page collects useful talks from past events, and we will keep it updated.

https://koalas.readthedocs.io/en/latest/getting_started/videos.html

Other new features and improvements

We added the following new features:

DataFrame:

- `stack` (1329)

Series:

- `repeat` (1328)

Index:

- `difference` (1325)
- `repeat` (1328)

MultiIndex:

- `difference` (1325)
- `repeat` (1328)

Other improvements

- DataFrame.pivot should preserve the original index names. (1316)
- Fix _LocIndexerLike to handle a Series from index. (1315)
- Support MultiIndex in DataFrame.unstack. (1322)
- Support Spark UDT when converting from/to pandas DataFrame/Series. (1324)
- Allow negative numbers for head. (1330)
- Return a Koalas Series instead of a pandas Series in stats APIs at Koalas DataFrame (1333)
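
For readers unfamiliar with `repeat`, `difference`, and negative arguments to `head`, here is a small pandas sketch of the semantics. It assumes the Koalas implementations follow pandas; the data is invented.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])

# repeat: duplicate every element of a Series (or Index) n times.
repeated = pd.Series(["a", "b"]).repeat(2)

# difference: labels in the first index that are absent from the second.
diff = pd.Index([1, 2, 3, 4]).difference(pd.Index([2, 4]))

# head with a negative n returns everything except the last |n| rows.
all_but_last = df.head(-1)
```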

0.28.0

pandas 1.0 support

We added pandas 1.0 support (1197, 1299), so Koalas now works with pandas 1.0.

map_in_pandas

We implemented the `DataFrame.map_in_pandas` API (1276), which lets you apply an arbitrary function that takes and returns a pandas DataFrame to a Koalas DataFrame. See the example below:

python
>>> import databricks.koalas as ks
>>> df = ks.DataFrame({'A': range(2000), 'B': range(2000)})
>>> def query_func(pdf):
...     num = 1995
...     return pdf.query('A > @num')  # '@' refers to the local variable num
...
>>> df.map_in_pandas(query_func)
         A     B
1996  1996  1996
1997  1997  1997
1998  1998  1998
1999  1999  1999
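
To make the idea concrete, the following pandas-only sketch imitates the example above by applying the function to consecutive batches of rows and concatenating the results. The helper `map_in_batches` and its `batch_size` parameter are hypothetical and exist only for this illustration; they are not part of Koalas, and the real implementation runs on Spark.

```python
import pandas as pd


def map_in_batches(pdf, func, batch_size):
    """Hypothetical helper: run func on consecutive row batches of pdf
    and concatenate the per-batch results."""
    batches = [pdf.iloc[i:i + batch_size] for i in range(0, len(pdf), batch_size)]
    return pd.concat([func(batch) for batch in batches])


def query_func(pdf):
    num = 1995
    # '@num' tells pandas to look up the local Python variable.
    return pdf.query("A > @num")


df = pd.DataFrame({"A": range(2000), "B": range(2000)})
result = map_in_batches(df, query_func, batch_size=500)
```

One consequence of this model is that the function sees one batch at a time, so anything that depends on the whole dataset (for example `len(pdf)`) reflects the batch rather than the full DataFrame.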

Standardize code style using Black

As a development-only change, we added [Black](https://github.com/psf/black) integration (1301). All code is now formatted automatically by running `./dev/reformat`, and the style is checked as part of `./dev/lint-python`.

Other new features and improvements

We added the following new features:

DataFrame:

- `query` (1273)
- `unstack` (1295)

Other improvements

- Fix `DataFrame.describe()` to support multi-index columns. (1279)
- Add util function `validate_bool_kwarg` (1281)
- Rename data columns prior to filter to make sure the column names are as expected. (1283)
- Add an FAQ about Structured Streaming. (1298)
- Let extra options have higher priority to allow workarounds (1296)
- Implement the `keep` parameter for `drop_duplicates` (1303)
- Add a note when type hint is provided to DataFrame.apply (1310)
- Add a util method to verify temporary column names. (1262)

0.27.0

`head` ordering

Because Koalas does not guarantee row ordering, `head` may return rows from any partition of the distributed data, so its result is not deterministic, which can confuse users.

We added a configuration option, `compute.ordered_head` (1231). When it is set to `True`, Koalas applies the natural ordering before taking the rows, so the result matches pandas'. The default is `False` because the ordering adds performance overhead.

py
>>> import databricks.koalas as ks
>>> kdf = ks.DataFrame({'a': range(10)})
>>> pdf = kdf.to_pandas()
>>> pdf.head(3)
   a
0  0
1  1
2  2

>>> kdf.head(3)  # default: the rows returned can vary between calls
   a
5  5
6  6
7  7
>>> kdf.head(3)
   a
0  0
1  1
2  2

>>> ks.options.compute.ordered_head = True
>>> kdf.head(3)  # ordered: always the same rows as pandas
   a
0  0
1  1
2  2
>>> kdf.head(3)
   a
0  0
1  1
2  2
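
The difference between the two modes can be pictured with a toy model in plain Python. This is only a sketch of the idea, not how Koalas or Spark is implemented: partitions are modeled as lists, and the function `head` and its `ordered` flag are invented for the illustration.

```python
import random


def head(partitions, n, ordered):
    """Toy model: return the first n rows of a partitioned dataset."""
    rows = [row for part in partitions for row in part]
    if ordered:
        # Restore the natural order first (here the row value itself),
        # which costs a sort over all rows.
        rows = sorted(rows)
    return rows[:n]


partitions = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
random.shuffle(partitions)  # which partition comes first is not guaranteed

unordered = head(partitions, 3, ordered=False)  # [0, 1, 2] or [5, 6, 7]
ordered = head(partitions, 3, ordered=True)     # always [0, 1, 2]
```

The extra sort in the ordered branch stands in for the performance overhead that keeps the option off by default.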

GitHub Actions

We started trying to use GitHub Actions for CI. (1254, 1265, 1264, 1267, 1269)

Other new features and improvements

We added the following new feature:

DataFrame:

- `apply` (1259)

Other improvements

- Fix identical and equals for the comparison between the same object. (1220)
- Select the series correctly in SeriesGroupBy APIs (1224)
- Fix `DataFrame/Series.clip` function to preserve its index. (1232)
- Throw a better exception in `DataFrame.sort_values` when multi-index column is used (1238)
- Fix `fillna` not to change index values. (1241)
- Fix `DataFrame.__setitem__` with tuple-named Series. (1245)
- Fix `corr` to support multi-index columns. (1246)
- Fix the output of `print()` for Series to match pandas. (1250)
- Fix `fillna` to support partial column index for multi-index columns. (1244)
- Add as_index check logic to groupby parameter (1253)
- Raise NotImplementedError for elements that are not actually implemented. (1256)
- Fix `where` to support multi-index columns. (1249)

0.26.0

`iat` indexer

We continued to improve indexers. Now, `iat` indexer is supported too (1062).

python
>>> import databricks.koalas as ks
>>> df = ks.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30

>>> df.iat[1, 2]  # row position 1, column position 2
1

Other new features and improvements

We added the following new features:

koalas.Index:

- `equals` (1216)
- `identical` (1215)
- `is_all_dates` (1205)
- `append` (1163)
- `to_frame` (1187)

koalas.MultiIndex:

- `equals` (1216)
- `identical` (1215)
- `swaplevel` (1105)
- `is_all_dates` (1205)
- `is_monotonic_increasing` (1183)
- `is_monotonic_decreasing` (1183)
- `append` (1163)
- `to_frame` (1187)

koalas.DataFrameGroupBy:

- `describe` (1168)

Other improvements

- Change default write mode to overwrite to be consistent with pandas (1209)
- Prepare Spark 3 (1211, 1181)
- Fix `DataFrame.idxmin/idxmax`. (1198)
- Fix reset_index when the default index is "distributed-sequence". (1193)
- Fix column name as a tuple in multi column index (1191)
- Add favicon to doc (1189)
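
The distinction between `equals` and `identical` is easy to miss, so here is a pandas sketch of it together with `swaplevel` and `to_frame`. It assumes the Koalas methods behave like their pandas namesakes; the index contents are invented.

```python
import pandas as pd

a = pd.Index([1, 2, 3], name="x")
b = pd.Index([1, 2, 3], name="y")

same_values = a.equals(b)        # True: only the elements are compared
same_metadata = a.identical(b)   # False: the names differ

# swaplevel exchanges two levels of a MultiIndex.
midx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k", "n"])
swapped = midx.swaplevel(0, 1)

# to_frame turns an index into a DataFrame with one column per level.
frame = a.to_frame()
```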

0.25.0

`loc` and `iloc` indexers improvement

We improved `loc` and `iloc` indexers. Now, `loc` can support scalar values as indexers (1172).

python
>>> import databricks.koalas as ks
>>>
>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['sidewinder']
max_speed    7
shield       8
Name: sidewinder, dtype: int64
>>> df.loc['sidewinder', 'max_speed']
7

In addition, Series derived from a different Frame can be used as indexers (1155).

python
>>> import databricks.koalas as ks
>>>
>>> ks.options.compute.ops_on_diff_frames = True
>>>
>>> df1 = ks.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [100, 200, 300, 400, 500]},
...                    index=[20, 10, 30, 0, 50])
>>> df2 = ks.DataFrame({'A': [0, -1, -2, -3, -4], 'B': [-100, -200, -300, -400, -500]},
...                    index=[20, 10, 30, 0, 50])
>>> df1.A.loc[df2.A > -3].sort_index()
10    1
20    0
30    2

Lastly, when given a slice, `loc` now follows the natural order of the index, matching pandas' behavior (1159, 1174, 1179). See the example below.

python
>>> df = ks.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Other new features and improvements

We added the following new features:

koalas.Series:

- `get` (1153)

koalas.Index:

- `drop` (1117)
- `len` (1161)
- `set_names` (1134)
- `argmin` (1162)
- `argmax` (1162)

koalas.MultiIndex:

- `from_product` (1144)
- `drop` (1117)
- `len` (1161)
- `set_names` (1134)

Other improvements

- Add `from_pandas` support for Index/MultiIndex. (1170)
- Add a hidden column `__natural_order__`. (1146)
- Introduce `_LocIndexerLike` and consolidate some logic. (1149)
- Refactor `_LocIndexerLike.__getitem__`. (1152)
- Remove sort in `GroupBy._reduce_for_stat_function`. (1147)
- Randomize index in tests and fix some window-like functions. (1151)
- Explicitly don't support `Index.duplicated` (1131)
- Fix `DataFrame._repr_html_()`. (1177)
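
A brief pandas sketch of several of the methods added in this release, on the assumption that the Koalas versions follow the pandas behavior; the values are invented for the example.

```python
import pandas as pd

# from_product builds a MultiIndex from the Cartesian product of iterables.
midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]])

# set_names returns a copy of the index with new level names.
named = midx.set_names(["letter", "number"])

idx = pd.Index([30, 10, 20])
smallest_pos = idx.argmin()   # position of the smallest value
largest_pos = idx.argmax()    # position of the largest value
without_10 = idx.drop([10])   # remove the given labels

# Series.get returns a default instead of raising for a missing key.
s = pd.Series([1, 2], index=["x", "y"])
missing = s.get("z", default=-1)
```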
