Datalad

Latest version: v1.0.2


0.10.1

This is a minor bugfix release.

Fixes

- Be able to use backports.lzma as a drop-in replacement for pyliblzma.
- Give help when not specifying a procedure name in `run-procedure`.
- Abort early when a downloader received no filename.
- Avoid `rerun` error when trying to unlock non-available files.

0.10.0

This release is a major leap forward in metadata support.

Major refactoring and deprecations

- Metadata
- Prior metadata provided by datasets under `.datalad/meta` is no
longer used or supported. Metadata must be reaggregated using the 0.10
version
- Metadata extractor types are no longer auto-guessed and must be
explicitly specified in `datalad.metadata.nativetype` config
(could contain multiple values)
- Metadata aggregation of a dataset hierarchy no longer updates all
datasets in the tree with new metadata. Instead, only the target
dataset is updated. This behavior can be changed via the --update-mode
switch. The new default prevents needless modification of (3rd-party)
subdatasets.
- Neuroimaging metadata support has been moved into a dedicated extension:
https://github.com/datalad/datalad-neuroimaging
- Crawler
- moved into a dedicated extension:
https://github.com/datalad/datalad-crawler
- `export_tarball` plugin has been generalized to `export_archive` and
can now also generate ZIP archives.
- By default a dataset X is now only considered to be a super-dataset of
another dataset Y if Y is also a registered subdataset of X.
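Since extractor types are no longer auto-guessed, enabling metadata extraction becomes a matter of dataset configuration. A minimal sketch of what the relevant section of `.datalad/config` (git-config syntax) might look like; the extractor names `bids` and `nifti1` are illustrative, as the exact set available depends on installed extensions:

```ini
# .datalad/config -- dataset-level configuration (git-config syntax)
[datalad "metadata"]
	# the option can be given multiple times to enable several extractors
	nativetype = bids
	nativetype = nifti1
```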

Fixes

A number of fixes did not make it into the 0.9.x series:

- Dynamic configuration overrides via the `-c` option were not in effect.
- `save` is now more robust with respect to invocation in subdirectories
of a dataset.
- `unlock` now reports correct paths when running in a dataset subdirectory.
- `get` is more robust to paths that contain symbolic links.
- Symlinks to subdatasets of a dataset are now correctly treated as symlinks,
and not as subdatasets.
- `add` now correctly saves staged subdataset additions.
- Running `datalad save` in a dataset no longer adds untracked content to the
dataset. In order to add content a path has to be given, e.g. `datalad save .`
- `wtf` now works reliably with a DataLad that wasn't installed from Git (but,
e.g., via pip)
- More robust URL handling in `simple_with_archives` crawler pipeline.

Enhancements and new features

- Support for DataLad extensions that can contribute API components from 3rd-party sources,
incl. commands, metadata extractors, and test case implementations.
See https://github.com/datalad/datalad-extension-template for a demo extension.
- Metadata (everything has changed!)
- Metadata extraction and aggregation is now supported for datasets and individual
files.
- Metadata query via `search` can now discover individual files.
- Extracted metadata can now be stored in XZ compressed files, is optionally
annexed (when exceeding a configurable size threshold), and obtained on
demand (new configuration option `datalad.metadata.create-aggregate-annex-limit`).
- Status and availability of aggregated metadata can now be reported via
`metadata --get-aggregates`
- New configuration option `datalad.metadata.maxfieldsize` to exclude too large
metadata fields from aggregation.
- The type of metadata is no longer guessed during metadata extraction. A new
configuration option `datalad.metadata.nativetype` was introduced to enable
one or more particular metadata extractors for a dataset.
- New configuration option `datalad.metadata.store-aggregate-content` to enable
the storage of aggregated metadata for dataset content (i.e. file-based metadata)
in contrast to just metadata describing a dataset as a whole.
- `search` was completely reimplemented. It offers three different modes now:
- 'egrep' (default): expression matching in a plain string version of metadata
- 'textblob': search a text version of all metadata using a fully featured
query language (fast indexing, good for keyword search)
- 'autofield': search an auto-generated index that preserves individual fields
of metadata that can be represented in a tabular structure (substantial
indexing cost, enables the most detailed queries of all modes)
- New extensions:
- [addurls][], an extension for creating a dataset (and possibly subdatasets)
from a list of URLs.
- export_to_figshare
- extract_metadata
- add_readme makes use of available metadata
- By default the wtf extension now hides sensitive information, which can be
included in the output by passing `--sensitive=some` or `--sensitive=all`.
- Reduced startup latency by only importing commands necessary for a particular
command line call.
- [create][]:
- `-d <parent> --nosave` now registers subdatasets, when possible.
- `--fake-dates` configures dataset to use fake-dates
- [run][] now provides a way for the caller to save the result when a
command has a non-zero exit status.
- `datalad rerun` now has a `--script` option that can be used to extract
previous commands into a file.
- A DataLad Singularity container is now available on
[Singularity Hub](https://singularity-hub.org/collections/667).
- More casts have been embedded in the [use case section of the documentation](http://docs.datalad.org/en/docs/usecases/index.html).
- `datalad --report-status` has a new value 'all' that can be used to
temporarily re-enable reporting that was disabled by configuration settings.
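The default 'egrep' search mode can be pictured as regular-expression matching against a flattened, plain-string rendering of each record's metadata. The following self-contained sketch illustrates that idea only; it is not DataLad's actual implementation, and the record fields are invented:

```python
import re

def flatten(metadata, prefix=""):
    """Flatten nested metadata into 'key: value' strings."""
    lines = []
    for key, value in metadata.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.extend(flatten(value, prefix=f"{path}."))
        else:
            lines.append(f"{path}: {value}")
    return lines

def egrep_search(records, pattern):
    """Return records where any flattened metadata line matches the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in records if any(rx.search(line) for line in flatten(r))]

records = [
    {"name": "ds1", "author": {"name": "Jane Doe"}},
    {"name": "ds2", "author": {"name": "John Roe"}},
]
print([r["name"] for r in egrep_search(records, "jane")])  # -> ['ds1']
```

The 'textblob' and 'autofield' modes trade higher indexing cost for richer queries over the same underlying metadata.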

0.9.3

Some important bug fixes which should improve usability

Fixes

- The `datalad-archives` special remote now locks while acquiring or
extracting an archive, which allows it to be used with the -J flag
for parallel operation
- Relaxed the requirement, introduced in 0.9.2, that git be configured for
DataLad operation -- now only a warning is issued
- `datalad ls` should now list the "authored date" and also work for datasets
with a detached HEAD
- `datalad save` will now save the original file as well if a file was
"git mv"ed, so you can now `datalad run git mv old new` and have the
changes recorded

Enhancements and new features

- The `--jobs` argument can now take the value `auto`, which decides on the
number of jobs depending on the number of available CPUs.
`git-annex` > 6.20180314 is recommended to avoid a regression with -J.
- Calls to the `RI` meta-constructor are now memoized, which should speed up
operation a bit
- The `DATALAD_SEED` environment variable can be used to seed Python's RNG
and provide reproducible UUIDs etc. (useful for testing and demos)
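The effect of seeding the RNG can be illustrated with plain Python: with a fixed seed, identifiers derived from the generator come out identical across runs. This is a generic sketch of the idea, not DataLad's internal code:

```python
import random
import uuid

def reproducible_uuid(rng):
    """Derive a UUID from a (possibly seeded) random number generator."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Same seed -> same sequence of UUIDs, handy for tests and demos.
a = reproducible_uuid(random.Random(42))
b = reproducible_uuid(random.Random(42))
print(a == b)  # -> True
```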

0.9.2

Largely a bugfix release with a few enhancements.

Fixes

- Execution of external commands (git) no longer gets stuck when they produce
lots of both stdout and stderr output, and no longer loses remaining
output in some cases
- Config overrides provided in the command line (-c) should now be
handled correctly
- Consider more remotes (not just the tracking one, which might not exist)
while installing subdatasets
- Compatibility with git 2.16 with some changed behaviors/annotations
for submodules
- Fail `remove` if `annex drop` failed
- Do not fail operating on files which start with dash (-)
- URL unquote paths within S3, URLs and DataLad RIs (///)
- In non-interactive mode fail if authentication/access fails
- Web UI:
- refactored a little to fix incorrect listing of submodules in
subdirectories
- now auto-focuses on search edit box upon entering the page
- Ensure that directories extracted from tarballs have the executable bit set

Enhancements and new features

- A log message and progress bar now inform the user if a tarball is to be
downloaded while getting specific files
(requires git-annex > 6.20180206)
- A dedicated `datalad rerun` command capable of rerunning entire
sequences of previously `run` commands.
**Reproducibility through VCS. Use `run` even if not interested in `rerun`**
- Alert the user if `git` is not yet configured but git operations
are requested
- Delay collection of previous ssh connections until it is actually
needed. Also, ':' is no longer required when specifying an ssh host
- AutomagicIO: Added proxying of isfile, lzma.LZMAFile and io.open
- Testing:
- added DATALAD_DATASETS_TOPURL=http://datasets-tests.datalad.org to
run tests against another website to not obscure access stats
- tests run against temporary HOME to avoid side-effects
- better unit-testing of interactions with special remotes
- CONTRIBUTING.md describes how to set up and use the `git-hub` tool to
"attach" commits to an issue, making it into a PR
- The DATALAD_USE_DEFAULT_GIT environment variable can be used to cause
DataLad to use the default git (not the one possibly bundled with git-annex)
- Be more robust while handling requests not supported by annex in
special remotes
- Use of `swallow_logs` in the code was refactored away -- fewer
mysteries now, just increase the logging level
- `wtf` plugin will report more information about environment, externals
and the system

0.9.1

Minor bugfix release

Fixes

- Should work correctly with subdatasets named as numbers or bool
values (also requires GitPython >= 2.1.6)
- Custom special remotes should work without crashing with
git-annex >= 6.20170924

0.9.0

Major refactoring and deprecations

- the `files` argument of [save][] has been renamed to `path` to be uniform with
any other command
- all major commands now implement more uniform API semantics and result reporting.
Functionality for modification detection of dataset content has been completely replaced
with a more efficient implementation
- [publish][] now features a `--transfer-data` switch that allows for an
unambiguous specification of whether to publish data -- independent of
the selection of which datasets to publish (which is done via their paths).
Moreover, [publish][] now transfers data before repository content is pushed.

Fixes

- [drop][] no longer errors when some subdatasets are not installed
- [install][] will no longer report nothing when a Dataset instance was
given as a source argument, but rather perform as expected
- [remove][] doesn't remove when some files of a dataset could not be dropped
- [publish][]
- no longer hides error during a repository push
- publish behaves "correctly" for `--since=` in considering only the
differences since the last "pushed" state
- data transfer handling while publishing with dependencies, to GitHub
- improved robustness with broken Git configuration
- [search][] should search for unicode strings correctly and not crash
- robustify git-annex special remotes protocol handling to allow for spaces in
the last argument
- The UI credentials interface now allows interrupting the entry with Ctrl-C
- should not fail while operating on submodules named with
numerics only or by bool (true/false) names
- crawl templates no longer override settings for `largefiles` if
specified in `.gitattributes`


Enhancements and new features

- **Exciting new feature** [run][] command to protocol execution of an external
command and rerun computation if desired.
See [screencast](http://datalad.org/features.html#reproducible-science)
- [save][] now uses Git for detecting which subdatasets need to be inspected for
potential changes, instead of performing a complete traversal of a dataset tree
- [add][] looks for changes relative to the last committed state of a dataset
to discover files to add more efficiently
- [diff][] can now report untracked files in addition to modified files
- [uninstall][] will check itself whether a subdataset is properly registered in a
superdataset, even when no superdataset is given in a call
- [subdatasets][] can now configure subdatasets for exclusion from recursive
installation (`datalad-recursiveinstall` submodule configuration property)
- precrafted pipelines of [crawl][] will no longer override the `annex.largefiles`
setting if any was set within `.gitattributes` (e.g. by `datalad create --text-no-annex`)
- framework for screencasts: `tools/cast*` tools and sample cast scripts under
`doc/casts` which are published at [datalad.org/features.html](http://datalad.org/features.html)
- new [project YouTube channel](https://www.youtube.com/channel/UCB8-Zf7D0DSzAsREoIt0Bvw)
- tests failing in direct and/or v6 modes marked explicitly
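Conceptually, `run` stores the executed command in the commit so that `rerun` can replay it later. The following toy sketch shows that record/replay idea in plain Python; the JSON record structure here is invented for illustration and is not DataLad's actual format (DataLad embeds the record in the commit message):

```python
import json
import subprocess
import sys

def run(cmd):
    """Execute a command and return a machine-readable 'run record'."""
    subprocess.run(cmd, check=True)
    # DataLad stores the record in version control; here it is just a JSON string.
    return json.dumps({"cmd": cmd})

def rerun(record):
    """Replay a previously recorded command."""
    subprocess.run(json.loads(record)["cmd"], check=True)

# Record a command that produces an output file, then replay it.
record = run([sys.executable, "-c", "open('out.txt', 'w').write('result')"])
rerun(record)  # re-executes the identical command
print(open("out.txt").read())  # -> result
```

Keeping the command itself under version control is what makes the later `rerun` possible without any extra bookkeeping by the user.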
