Gimmemotifs

Latest version: v0.18.0

0.18.0

Added

- `gimme scan` and `gimme maelstrom` now accept a random seed for (most) operations
  - for (optimal) deterministic behaviour, delete the cache and then run the command with a seed
- `Scanner` now accepts a `np.random.RandomState` and `progress` on init.
  - `progress=None` (the default) should print progress bars to the command line only, not to file.
- `Scanner.set_genome` now accepts the optional argument `genomes_dir`
- `gimmemotifs.maelstrom.Moap.create` now accepts a `np.random.RandomState`.
- `gimmemotifs.maelstrom.run_maelstrom` now accepts a `np.random.RandomState`.
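
The seed-related additions above follow a common pattern: the caller creates one seeded random state and passes it into every step that samples. A minimal sketch of that pattern, using the standard library's `random.Random` as a stand-in for `np.random.RandomState`; the function `sample_background` is hypothetical and not part of gimmemotifs:

```python
import random


def sample_background(sequences, n, rng=None):
    """Pick n sequences at random (hypothetical stand-in for a sampling step)."""
    # Fall back to an unseeded generator when the caller does not supply one.
    rng = rng if rng is not None else random.Random()
    return rng.sample(sequences, n)


seqs = ["ACGT", "TTGA", "CCGG", "GATC", "AATT", "GGCC"]

# Two generators built from the same seed yield the same sample.
first = sample_background(seqs, 3, rng=random.Random(42))
second = sample_background(seqs, 3, rng=random.Random(42))
print(first == second)  # True
```

Cached results computed before seeding would not be affected by the seed, which is presumably why the note above recommends deleting the cache first.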

Changed

- `gimme diff` (`diff_plot()` to be exact) will now print to stdout, like all other functions
- now using the logger instead of print/sys.stderr.write in many more places
- string formatting now (mostly) done with f-strings
- refactored Fasta class
- split `scanner.py` into 3 submodules:
  - `scanner/__init__.py` with the exported functions
  - `scanner/base.py` with the Scanner class
  - `scanner/utils.py` with the rest
- `gimmemotifs/maelstrom.py` renamed to `gimmemotifs/maelstrom/__init__.py`
  - `rank.py` and `moap.py` are now submodules of maelstrom.

Fixed

- `gimme maelstrom` works with or without xgboost (but will give a warning without xgboost)
- fixed warning "in validate_matrix(): Row sums in df are not close to 1. Reormalizing rows..."
- fixed multiprocess.Pool Warnings
- fixed a pandas copywarning (in `gc_bin_bedfile()` to be exact)
- fixed warnings when leaving files open
- fixed deprecation warning in maelstrom (and in tests)
- fixed futurewarning in report.py
- silence warnings from external tools in motif prediction (`pp_predict_motifs()` to be exact)
- updated last references from `Motif.pwm_scan` and `Motif.pwm_scan_all` to `Motif.scan` and `Motif.scan_all` respectively
- typo in `gimme motifs` output ("%matches background" to "% matches background")
- `Scanner` now uses a cheaper method to determine a genome's identity
  - (filesize + name instead of the md5sum of the whole genome's contents)
- `gimme motifs` gives an informative error when `fraction` is not within 0-1.
- `gimme threshold` works again
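
The cheaper genome identity check mentioned above (filesize + name instead of an md5sum) can be illustrated as follows. This is a sketch of the general idea only; the function names and the fingerprint format are assumptions, not the code `Scanner` actually uses:

```python
import hashlib
import os
import tempfile


def cheap_identity(path):
    """Fingerprint from file name and size only; the contents are never read."""
    return f"{os.path.basename(path)}:{os.path.getsize(path)}"


def md5_identity(path, chunk_size=1 << 20):
    """Fingerprint from the md5sum of the whole file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    genome = os.path.join(tmp, "toy_genome.fa")
    with open(genome, "wb") as handle:
        handle.write(b">chr1\nACGTACGT\n")  # 15 bytes
    cheap = cheap_identity(genome)
    full = md5_identity(genome)

print(cheap)  # toy_genome.fa:15
```

The trade-off is that the cheap fingerprint costs a single `stat` call regardless of genome size, but it would not notice a file edited in place without a change in name or size.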

Removed

- removed old python2 code (scanning with MOODS & import shenanigans)

0.17.2

Changed

- made xgboost an optional dependency (to save space on bioconda)
- an existing config will now update available tools when accessed (e4b3275)
- applied the bioconda patch to compile_externals.py (11b0c2c)
- `coverage_table` and `combine_peaks` have their positional arguments under positional arguments (20819ee)
- `coverage_table` should be slightly faster now (20819ee)

Fixed

- biofluff dependency back in requirements
- pinned conda and mamba versions in `.travis.yaml`
  - temp fix until conda>=4.12 can install mamba properly
- documentation is working again!
- gimmemotifs now supports pandas >=1.3.0

Removed

- pyarrow dependency

0.17.1

Added

- `requirements.yaml` contains all conda dependencies.
  - packages available from one channel have been pinned (for solving speed)
  - packages have minimum versions where known (for solving speed)

Changed

- alphabetized tools everywhere (how could you live like that!?)
- updated `setup.py`
- updated installation instructions

Fixed

- Yamda is now recognized in the config
- most tools work with the editable installation again
- all tests work for unix
  - there were still some flakey values, where randomness is involved.
- background.py updated to work with the specified minimum `genomepy` version
- all `sphinx-build docs build` warnings
- motifs are required to have unique ids when clustering, thanks akmorrow13!
- motif2factors removes apostrophes so it won't crash :)
- removed a print

Removed

- a bunch of redundant requirement files.
- OSX tests. Possibly temporary.
  - The tests haven't been working for ages, so I have no idea where to begin.
  - and Travis asks 5x credits for OSX machines...

0.17.0

Added

* Added `--genomes_dir` argument to `gimme motif2factors`.
* Added `--version` flag.
* Function `sample()` for fast sequence sampling from a `Motif()` instance.
* Added JASPAR 2022 motif databases.
* Updated Homer motif database.
* Operators:
  * `+` - take the combination of two motifs (average), based on pfm, which means that motifs with higher counts will be weighed more heavily.
  * `&` - take the combination of two motifs (average), based on the ppm, which means that both motifs will be weighed equally.
  * `<<` - "shift" motif left (adding a non-informative position to the right side)
  * `>>` - "shift" motif right (adding a non-informative position to the left side)
  * `~` - reverse complement
  * `*` - multiply the pfm by a value
* Progress bar for scanning.
* `list_installed_libraries()` to list available motif libraries.
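
The difference between the `+` and `&` operators described above can be made concrete with plain lists. The sketch below is not the `Motif` implementation: it assumes two motifs of equal length that are already aligned, with each row holding counts in A, C, G, T order, and all function names are illustrative.

```python
def normalize(row):
    """Turn one row of counts into probabilities."""
    total = sum(row)
    return [value / total for value in row]


def combine_counts(pfm_a, pfm_b):
    """Like the `+` description: add counts, so the motif with more counts dominates."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(pfm_a, pfm_b)]


def combine_probabilities(pfm_a, pfm_b):
    """Like the `&` description: average probabilities, so both motifs weigh equally."""
    ppm_a = [normalize(row) for row in pfm_a]
    ppm_b = [normalize(row) for row in pfm_b]
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(ppm_a, ppm_b)]


def reverse_complement(pfm):
    """Reverse the positions and swap A<->T and C<->G (reverse each row)."""
    return [row[::-1] for row in reversed(pfm)]


motif_a = [[90, 0, 0, 10]]  # 100 sequences, mostly A
motif_b = [[0, 10, 0, 0]]   # 10 sequences, all C

by_counts = [normalize(row) for row in combine_counts(motif_a, motif_b)]
by_probs = combine_probabilities(motif_a, motif_b)

print([round(p, 3) for p in by_counts[0]])  # [0.818, 0.091, 0.0, 0.091]
print([round(p, 3) for p in by_probs[0]])   # [0.45, 0.5, 0.0, 0.05]
```

With counts, the 100-sequence motif dominates the result; with probabilities, the 10-sequence motif contributes just as much.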

Changed

* `Motif()` class completely restructured:
  * Split into multiple files with coherent function.
  * Uses `numpy.array` internally.
  * All functions that mention `pwm` renamed to `ppm` (position-probability matrix), as the definition of a PWM is usually a log-odds matrix, not a probability matrix.
  * `to_pwm()` is deprecated, use `to_ppm()` instead.
  * Changed functions `pwm_min_score()` and `pwm_max_score()` to properties `min_score` and `max_score`.
  * All internal data is correctly updated when `Motif()` is changed, for instance by trimming (218).
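
To illustrate the ppm/pwm distinction behind the rename above: a position-probability matrix holds per-position probabilities, while a PWM in the usual sense holds log-odds against a background. A minimal sketch, assuming a uniform background and an arbitrary pseudocount; the function name and both defaults are illustrative and not taken from gimmemotifs:

```python
import math


def ppm_to_log_odds(ppm, background=(0.25, 0.25, 0.25, 0.25), pseudocount=1e-4):
    """Convert probabilities to log2 odds relative to a background distribution."""
    pwm = []
    for row in ppm:
        # Smooth each row so that zero probabilities do not produce log(0).
        smoothed = [(p + pseudocount) / (1 + 4 * pseudocount) for p in row]
        pwm.append([math.log2(p / b) for p, b in zip(smoothed, background)])
    return pwm


ppm = [
    [1.0, 0.0, 0.0, 0.0],      # position 1: always A
    [0.25, 0.25, 0.25, 0.25],  # position 2: non-informative
]
pwm = ppm_to_log_odds(ppm)

# The best possible match score is the sum of the highest log-odds per position.
best = sum(max(row) for row in pwm)
print(round(best, 2))  # 2.0
```

A non-informative position scores close to zero for every nucleotide, which is why it contributes nothing to the best score here.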


Fixed

* `gimme motif2factors` can now unzip genome fastas.
* `gimme motif2factors` will sanitize genome names.
* Fixed bugs related to partial rerun of `gimme motif2factors`.
* Fixed unhandled `OSError` during installation on Mac.
* Fixed bug related to `RFE()` (226).
* Positional probability matrices now sum to 1 over all positions (209).
* Fixed issue with pandas >= 1.3.
* Fixed issue with `non_reducing_slice` import from pandas.
* Fix threshold calculation if more than 20,000 sequences are supplied.
* Fix issue with config file getting corrupted.
* Fix FPR threshold calculation.

Removed

0.16.1

Bugfix release.

Added

* Added warning when the number of sequences used for de novo motif prediction is low.

Fixed

* Fixed bug with `gimme motif2factors`.
* Fixed "Motif does not occur in motif database when running maelstrom" (192).
* Fixed bugs related to runs where no (significant) motifs are found.

0.16.0

Many bugfixes, thanks to kirbyziegler, irzhegalova, wangmhan, ClarissaFeuersteinAkgoz and fgualdr for reporting and proposing solutions!
Thanks to Maarten-vd-Sande for the speed improvements.

Added

* `gimme motif2factors` command to annotate a motif database with TFs from different species based on orthogroups.
* Informative error message with link to fix when cache is corrupted (running on a cluster).
* Print an informative error message if the input file is not in the correct format.

Changed

* Speed improvements to motif scanning, which is now up to 2X faster!
* Size of input regions is now automatically adjusted (123, 128, 129)
* Quantile normalization in `coverage_table` now uses multiple CPUs.

Fixed

* Fixes issue where % of motif occurrence would be incorrectly reported in `gimme maelstrom` output (162).
* Fix issues with running Trawler (181)
* Fix issues with running YAMDA (180)
* Fix issues with parsing XXmotif output (178)
* Fix issue where command line arguments (such as single strand) are ignored (177)
* Fix pyarrow dependency (176)
* The correct % of regions with motif is now reported (162)
* Fix issue with running `gimme motifs` with the HOMER database (135)
* Fix issue with the `--size` parameter in `gimme motifs`, which now works as expected (128)
