Enhancements and new features
- Command execution is now performed by a new `Runner` implementation that is
no longer based on the `asyncio` framework, which was found to exhibit
fragile performance in interaction with other `asyncio`-using code, such as
Jupyter notebooks. The new implementation is based on threads. It also supports
the specification of "protocols" that were introduced with the switch to the
`asyncio` implementation in 0.14.0. ([5667][])
- `clone` now supports arbitrary URL transformations based on regular
expressions. One or more transformation steps can be defined via
`datalad.clone.url-substitute.<label>` configuration settings. The feature can
be (and is now) used to support convenience mappings, such as
`https://osf.io/q8xnk/` (displayed in a browser window) to `osf://q8xnk`
(clonable via the `datalad-osf` extension). ([5749][])
- Homogenize SSH use and configurability between DataLad and git-annex, by
instructing git-annex to use DataLad's `sshrun` for SSH calls (instead of SSH
directly). ([5389][])
- The ORA special remote has received several new features:
  - It now supports a `push-url` setting as an alternative to `url` for write
    access. An analogous parameter was also added to `create-sibling-ria`.
    ([5420][], [5428][])
  - Access of RIA stores now performs homogeneous availability checks,
    regardless of access protocol. Before, broken HTTP-based access due to
    misspecified URLs could have gone unnoticed. ([5459][], [5672][])
  - Error reporting was introduced to inform about undesirable conditions in
    remote RIA stores. ([5683][])
- `create-sibling-ria` now supports `--alias` for the specification of a
convenience dataset alias name in a RIA store. ([5592][])
- Analogous to `git commit`, `save` now features an `--amend` mode to support
incremental updates of a dataset state. ([5430][])
- `run` now supports a dry-run mode that can be used to inspect the result of
parameter expansion on the effective command to ease the composition of more
complicated command lines. ([5539][])
- `run` now supports a `--assume-ready` switch to avoid the (possibly
expensive) preparation of inputs and outputs with large datasets that have
already been readied through other means. ([5431][])
- `update` now features `--how` and `--how-subds` parameters to configure how
an update shall be performed. Supported modes are `fetch` (unchanged
default), and `merge` (previously also possible via `--merge`), but also new
strategies like `reset` or `checkout`. ([5534][])
- `update` has a new `--follow=parentds-lazy` mode that only performs a fetch
operation in subdatasets when the desired commit is not yet present. During
recursive updates involving many subdatasets this can substantially speed up
performance. ([5474][])
- DataLad's command line API can now report the version for individual commands
via `datalad <cmd> --version`. The output has been homogenized to
`<providing package> <version>`. ([5543][])
- `create-sibling` now logs information on an auto-generated sibling name, in
the case that no `--name/-s` was provided. ([5550][])
- `create-sibling-github` has been updated to emit result records like any
standard DataLad command. Previously it was implemented as a "plugin", which
did not support all standard API parameters. ([5551][])
- `copy-file` now also works with content-less files in datasets on crippled
filesystems (adjusted mode), when a recent enough git-annex (8.20210428 or
later) is available. ([5630][])
- `addurls` can now be instructed how to behave in the event of file name
collision via a new parameter `--on-collision`. ([5675][])
- `addurls` reporting now informs which particular subdatasets were created.
([5689][])
- Credentials can now be provided or overwritten via all means supported by
`ConfigManager`. Importantly, `datalad.credential.<name>.<field>`
configuration settings and analogous specification via environment variables are
now supported (rather than custom environment variables only). Previous
specification methods are still supported too. ([5680][])
- A new `datalad.credentials.force-ask` configuration flag can now be used to
force re-entry of already known credentials. This simplifies credential
updates without having to use an approach native to individual credential
stores. ([5777][])
- Suppression of rendering repeated similar results is now configurable via the
configuration switches `datalad.ui.suppress-similar-results` (bool), and
`datalad.ui.suppress-similar-results-threshold` (int). ([5681][])
- The performance of `status` and similar functionality when determining local
file availability has been improved. ([5692][])
- `push` now renders a result summary on completion. ([5696][])
- A dedicated info log message indicates when dataset repositories are
subjected to an annex version upgrade. ([5698][])
- Error reporting improvements:
  - The `NoDatasetFound` exception now provides information for which purpose a
    dataset is required. ([5708][])
  - Wording of the `MissingExternalDependency` error was rephrased to account
    for cases of non-functional installations. ([5803][])
  - `push` reports when a `--to` parameter specification was (likely)
    forgotten. ([5726][])
  - Detailed information is now given when DataLad fails to obtain a lock for
    credential entry in a timely fashion. Previously only a generic debug log
    message was emitted. ([5884][])
  - Clarified error message when `create-sibling-gitlab` was called without
    `--project`. ([5907][])
- `add-readme` now provides a README template with more information on the
nature and use of DataLad datasets. A README file is no longer annex'ed by
default, but can be annex'ed using the new `--annex` switch. ([5723][], [5725][])
- `clean` now supports a `--dry-run` mode to inform about cleanable content.
([5738][])
- A new configuration setting `datalad.locations.locks` can be used to control
the placement of lock files. ([5740][])
- `wtf` now also reports branch names and states. ([5804][])
- `AnnexRepo.whereis()` now supports batch mode. ([5533][])
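
The regular-expression URL transformations described for `clone` above can be pictured with a small sketch. This is not DataLad's implementation and does not reproduce the syntax of `datalad.clone.url-substitute.<label>` values; the rule list `OSF_RULES` and the helper `apply_substitutions` are hypothetical names introduced only to show how one or more ordered substitution steps could turn a browser URL into a clonable one.

```python
import re

# Hypothetical rule list: ordered (pattern, replacement) pairs.
# The pattern is illustrative only, not a setting shipped with DataLad.
OSF_RULES = [
    (r"^https://osf\.io/([^/]+)/*$", r"osf://\1"),
]


def apply_substitutions(url, rules):
    """Apply each (pattern, replacement) step in order and return the result."""
    for pattern, replacement in rules:
        url = re.sub(pattern, replacement, url)
    return url


# A URL that matches no rule is returned unchanged.
print(apply_substitutions("https://osf.io/q8xnk/", OSF_RULES))
print(apply_substitutions("https://example.com/data", OSF_RULES))
```

Because each step receives the output of the previous one, several small rules can be chained instead of writing a single complicated expression.
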
Deprecations and removals
- The minimum supported git-annex version is now 8.20200309. ([5512][])
- ORA special remote configuration items `ssh-host`, and `base-path` are
deprecated. They are completely replaced by `ria+<protocol>://` URL
specifications. ([5425][])
- The deprecated `no_annex` parameter of `create()` was removed from the Python
API. ([5441][])
- The unused `GitRepo.pull()` method has been removed. ([5558][])
- Residual support for "plugins" (a mechanism used before DataLad supported
extensions) was removed. This includes the configuration switches
`datalad.locations.{system,user}-plugins`. ([5554][], [5564][])
- Several features and commands have been moved to the `datalad-deprecated`
  package. This package must now be installed to keep using this
  functionality.
  - The `publish` command. Use `push` instead. ([5837][])
  - The `ls` command. ([5569][])
  - The web UI that is deployable via `datalad create-sibling --ui`. ([5555][])
  - The "automagic IO" feature. ([5577][])
- `AnnexRepo.copy_to()` has been deprecated. The `push` command should be used
instead. ([5560][])
- `AnnexRepo.sync()` has been deprecated. `AnnexRepo.call_annex(['sync', ...])`
should be used instead. ([5461][])
- All `GitRepo.*_submodule()` methods have been deprecated and will be removed
in a future release. ([5559][])
- `create-sibling-github`'s `--dryrun` switch was deprecated; use `--dry-run` instead.
([5551][])
- The `datalad --pbs-runner` option has been deprecated; use `condor_run`
(or similar) instead. ([5956][])
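
Renamed switches such as `--dryrun` to `--dry-run` are typically kept working for a while with a warning. The following is a generic `argparse` sketch of that pattern, not the code used by `create-sibling-github`; the class `DeprecatedAlias`, the function `build_parser`, and the program name `demo` are hypothetical.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Behave like a boolean switch, but warn that the spelling is deprecated."""

    def __init__(self, option_strings, dest, **kwargs):
        # nargs=0 makes the option a flag that consumes no value.
        super().__init__(option_strings, dest, nargs=0, default=False, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated, use --dry-run instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, True)


def build_parser():
    """Return a parser that accepts the new switch and the deprecated alias."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--dry-run", dest="dry_run", action="store_true")
    # The old spelling writes to the same destination and is hidden from --help.
    parser.add_argument(
        "--dryrun", dest="dry_run", action=DeprecatedAlias, help=argparse.SUPPRESS
    )
    return parser


if __name__ == "__main__":
    warnings.simplefilter("always")
    print(build_parser().parse_args(["--dryrun"]).dry_run)
```

Both spellings set the same attribute, so downstream code only ever needs to check `dry_run`.
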
Fixes
- Prevent invalid declaration of publication dependencies for 'origin' on any
auto-detected ORA special remotes when cloning from a RIA store. An ORA
remote is now checked for whether it actually points to the RIA store the clone
was made from. ([5415][])
- The ORA special remote implementation has received several fixes:
  - It can now handle HTTP redirects. ([5792][])
  - It no longer fails when URL-type annex keys contain the '/' character.
    ([5823][])
  - It now properly supports the specification of usernames, passwords and
    ports in `ria+<protocol>://` URLs. ([5902][])
- It is now possible to specifically select the default (or generic) result
renderer via `datalad -f default` and with that override a `tailored` result
renderer that may be preconfigured for a particular command. ([5476][])
- Starting with 0.14.0, original URLs given to `clone` were recorded in a
subdataset record. This was initially done in a second commit, leading to
inflation of commits and slowdown in superdatasets with many subdatasets. Such
subdataset record annotation is now collapsed into a single commit.
([5480][])
- `run` no longer removes leading empty directories as part of the output
preparation. This was surprising behavior for commands that do not ensure on
their own that output directories exist. ([5492][])
- A potentially existing `message` property is no longer removed when using the
`json` or `json_pp` result renderer to avoid undesired withholding of
relevant information. ([5536][])
- `subdatasets` now reports `state=present`, rather than `state=clean`, for
installed subdatasets to complement `state=absent` reports for uninstalled
datasets. ([5655][])
- `create-sibling-ria` now executes commands with a consistent environment
setup that matches all other command execution in other DataLad commands.
([5682][])
- `save` no longer saves unspecified subdatasets when called with an explicit
path (list). The fix required a behavior change of
`GitRepo.get_content_info()` in its interpretation of `None` vs. `[]` path
argument values that now aligns the behavior of `GitRepo.diff|status()` with
their respective documentation. ([5693][])
- `get` now prefers the location of a subdataset that is recorded in a
superdataset's `.gitmodules` record. Previously, DataLad tried to obtain a
subdataset from an assumed checkout of the superdataset's origin. This new
default order is (re-)configurable via the
`datalad.get.subdataset-source-candidate-<priority-label>` configuration
mechanism. ([5760][])
- `create-sibling-gitlab` no longer skips the root dataset when `.` is given as
a path. ([5789][])
- `siblings` now rejects a value given to `--as-common-datasrc` that clashes
with the respective Git remote. ([5805][])
- The usage synopsis reported by `siblings` now lists all supported actions.
([5913][])
- `siblings` now renders non-ok results to avoid silent failure. ([5915][])
- `.gitattributes` file manipulations no longer leave the file without a
trailing newline. ([5847][])
- Prevent crash when trying to delete a non-existing keyring credential field.
([5892][])
- git-annex is no longer called with an unconditional `annex.retry=3`
configuration. Instead, this parameterization is now limited to `annex get`
and `annex copy` calls. ([5904][])
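
The narrowed use of `annex.retry=3` can be illustrated with a short sketch. It assumes the setting is passed through Git's generic `-c <name>=<value>` option; how DataLad actually composes its git-annex calls is not shown in these notes, and `RETRY_SUBCOMMANDS` and `build_annex_call` are hypothetical names.

```python
# Subcommands that transfer content and therefore benefit from retries,
# matching the `annex get` and `annex copy` calls named above.
RETRY_SUBCOMMANDS = {"get", "copy"}


def build_annex_call(subcommand, args=()):
    """Compose a git-annex command line, adding the retry setting selectively."""
    cmd = ["git"]
    if subcommand in RETRY_SUBCOMMANDS:
        # Assumption: the option is injected via Git's `-c` configuration override.
        cmd += ["-c", "annex.retry=3"]
    cmd += ["annex", subcommand, *args]
    return cmd


print(build_annex_call("get", ["file.dat"]))
print(build_annex_call("whereis", ["file.dat"]))
```
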
Tests
- `file://` URLs are no longer the predominant test case for `AnnexRepo`
functionality. A built-in HTTP server is now used in most cases. ([5332][])
---