* New main classes (`ImputationKernel`, `ImputedData`) replace (`KernelDataSet`, `MultipleImputedKernel`, `ImputedDataSet`, `MultipleImputedDataSet`).
* Data can now be referenced and imputed in place. This avoids a lot of memory allocation and is much faster.
* Data can now be completed in place. This allows only a single copy of the dataset to be in memory at any given time, even when performing multiple imputation.
* The mean_match_subset parameter has been replaced with data_subset. This subsets the data used to build the model as well as the candidates.
* More performance improvements around when data is copied and where it is stored.
* Raw data is now stored as the original object, so both pandas DataFrames and numpy ndarrays can be handled.
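In-place completion can be sketched roughly as follows. The sketch assumes a float numpy array where NaN marks a missing value. The helper names `missing_index` and `complete_in_place` are hypothetical, and this is not miceforest's implementation. Each imputed dataset stores only the values for the missing cells. Completing a dataset writes those values into the single raw array instead of building a copy.

```python
import numpy as np


def missing_index(data):
    """Return {column: row indices where the value is NaN} for a 2-D float array."""
    return {col: np.flatnonzero(np.isnan(data[:, col])) for col in range(data.shape[1])}


def complete_in_place(data, na_where, imputed):
    """Write one dataset's imputed values into the missing cells of `data`.

    Hypothetical helper. `imputed` maps column -> values for that column's
    missing rows only, so each imputed dataset stays small and the full
    matrix exists only once in memory.
    """
    for col, rows in na_where.items():
        if rows.size:
            data[rows, col] = imputed[col]
    return data


raw = np.array([[1.0, np.nan], [np.nan, 4.0], [3.0, 5.0]])
na_where = missing_index(raw)  # recorded once, before any cell is overwritten

# Two imputed datasets, each holding only the values for the missing cells.
datasets = [
    {0: np.array([2.0]), 1: np.array([4.5])},
    {0: np.array([2.5]), 1: np.array([5.5])},
]

complete_in_place(raw, na_where, datasets[1])  # raw now holds dataset 1
complete_in_place(raw, na_where, datasets[0])  # switch to dataset 0 without copying
```

The missing-cell locations are recorded before anything is overwritten. Because of this, switching from one imputed dataset to another is just another write into the same array.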
This release improved a number of areas:
* Huge performance improvements, especially when categorical variables are being imputed. These come from skipping candidate predictions when they aren't needed, using a much faster nearest-neighbors search, using numpy instead of pandas for internal indexing, and other changes.
* Ability to tune model parameters and use the best parameters in mice.
* Improvements to code layout: ImputationSchema was removed.
* Raw data is now stored as a numpy array to save space and improve indexing.
* Numpy arrays can be imputed, if you want to avoid pandas.
* Multiple built-in mean matching functions are available.
* Mean matching functions can handle most lightgbm objectives.
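Mean matching in its common predictive-mean-matching form can be sketched like this for a numeric variable. This is a generic numpy illustration, not one of miceforest's built-in mean matching functions. The function name `mean_match` and its arguments are assumptions. Each row with a missing value is matched to the `k` candidate rows whose model predictions are closest. The observed value of one of those candidates, chosen at random, becomes the imputation.

```python
import numpy as np


def mean_match(candidate_preds, candidate_values, missing_preds, k, rng):
    """Predictive mean matching for a numeric variable (hypothetical sketch).

    candidate_preds:  model predictions for rows whose value is observed
    candidate_values: the observed values of those same rows
    missing_preds:    model predictions for rows that need imputing
    Assumes at least one candidate row.
    """
    candidate_preds = np.asarray(candidate_preds, dtype=float)
    candidate_values = np.asarray(candidate_values)
    missing_preds = np.asarray(missing_preds, dtype=float)
    k = min(k, candidate_preds.size)

    # Absolute distance from every missing-row prediction to every candidate prediction.
    dist = np.abs(missing_preds[:, None] - candidate_preds[None, :])
    # Column indices of the k closest candidates per row (in no particular order).
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
    # Pick one of the k neighbours uniformly at random for each row.
    choice = rng.integers(0, k, size=missing_preds.size)
    picked = nearest[np.arange(missing_preds.size), choice]
    # Impute with the neighbour's observed value, not with the model's prediction.
    return candidate_values[picked]


rng = np.random.default_rng(0)
imputed = mean_match(
    candidate_preds=[1.0, 2.0, 10.0, 11.0],
    candidate_values=[100, 200, 1000, 1100],
    missing_preds=[1.4, 10.6],
    k=2,
    rng=rng,
)
```

The brute-force distance matrix above uses memory proportional to (missing rows × candidates). A tree-based search such as `scipy.spatial.KDTree` scales much better on large data. That is the kind of gain a faster neighbors search is meant to deliver.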
This is a major release, with breaking API changes:
* The random forest package is now lightgbm.
  - Much more lightweight (serialized kernels tend to be 5x smaller or more).
  - Much faster on big datasets (for comparable parameters).
  - More flexible: we can now use gbdt if we wish, and lightgbm offers more options in general.
* Added a mean_match_subset parameter, which can greatly speed up many processes.
* mean_match_candidates now accepts a dict covering only some variables, as long as its keys are a subset of the variables in variable_schema.
* Model parameters can be specified per variable or globally.
* The mean matching function can be overridden if the user wishes.
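One way to resolve per-variable and global settings is sketched below. It assumes plain dicts keyed by variable name. `resolve_model_params` and `resolve_mean_match_candidates` are hypothetical helpers, not miceforest functions. Per-variable parameters override the global ones. A partial mean_match_candidates dict falls back to a default for any variable it omits, and keys outside the variable list are rejected.

```python
def _check_keys(variables, mapping):
    """Reject keys that are not known variables (hypothetical validation step)."""
    unknown = set(mapping) - set(variables)
    if unknown:
        raise ValueError(f"not in variable_schema: {sorted(unknown)}")


def resolve_model_params(variables, global_params, variable_params=None):
    """Merge global model parameters with optional per-variable overrides."""
    variable_params = variable_params or {}
    _check_keys(variables, variable_params)
    # Later keys win, so a variable's own settings override the global ones.
    return {v: {**global_params, **variable_params.get(v, {})} for v in variables}


def resolve_mean_match_candidates(variables, candidates, default):
    """Expand an int or a partial dict into one candidate count per variable."""
    if isinstance(candidates, int):
        return {v: candidates for v in variables}
    _check_keys(variables, candidates)
    return {v: candidates.get(v, default) for v in variables}


params = resolve_model_params(
    ["a", "b"],
    global_params={"num_iterations": 48},
    variable_params={"b": {"num_iterations": 100, "num_leaves": 15}},
)
mmc = resolve_mean_match_candidates(["a", "b", "c"], {"a": 5}, default=3)
```

Here `params["a"]` keeps the global settings, `params["b"]` uses its own, and `mmc` gives variable `a` five candidates while `b` and `c` get the default.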
* Models from all iterations can be saved with save_models == 2.
* Kernel classes inherit from the base imputed classes, so the same methods can be called on imputed datasets obtained from impute_new_data().
* A time log was added.
* MultipleImputedDataSet is now a collection of ImputedDataSets with methods for comparing them. Subscripting gives the desired dataset.
* Tests were updated to be much more comprehensive.
* Datasets can now be added to and removed from a MultipleImputedDataSet/MultipleImputedKernel.
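A small container class can model a collection of imputed datasets that supports subscripting, adding, removing, and comparing. The class `ImputedDataSetCollection` and its `compare_means` method are hypothetical. Each dataset is assumed to be a dict mapping a variable name to a list of numbers.

```python
from statistics import mean


class ImputedDataSetCollection:
    """Hypothetical container of imputed datasets, keyed by an integer id."""

    def __init__(self):
        self._datasets = {}
        self._next_key = 0

    def append(self, dataset):
        """Add a dataset and return the key it can be subscripted with."""
        key = self._next_key
        self._datasets[key] = dataset
        self._next_key += 1
        return key

    def remove(self, key):
        """Drop a dataset; the keys of the remaining datasets do not change."""
        del self._datasets[key]

    def __getitem__(self, key):
        return self._datasets[key]

    def __len__(self):
        return len(self._datasets)

    def compare_means(self, variable):
        """Mean of `variable` in each dataset, to compare imputations side by side."""
        return {key: mean(ds[variable]) for key, ds in self._datasets.items()}


collection = ImputedDataSetCollection()
first = collection.append({"x": [1.0, 2.0, 3.0]})
second = collection.append({"x": [1.0, 2.0, 6.0]})
means = collection.compare_means("x")
collection.remove(first)
```

Datasets are keyed by a counter rather than by list position. As a result, removing one dataset does not renumber the others, and `collection[second]` still refers to the same data afterwards.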
Automatic testing, coverage, and formatting have been implemented. The code is (reasonably) bug-free.