Datalad-next

Latest version: v1.3.0

Page 1 of 4

1.3.0

💫 Enhancements and new features

- Code organization is adjusted to clearly indicate what is part of the
package's public Python API. Anything that can be imported directly from
the top-level of any sub-package is part of the public API.
As an example: `from datalad_next.runners import iter_git_subproc`
imports a part of the public API, but
`from datalad_next.runners.git import iter_git_subproc` does not.
See `README.md` for more information.
Fixes https://github.com/datalad/datalad-next/issues/613 via
https://github.com/datalad/datalad-next/pull/615 (by mih)
https://github.com/datalad/datalad-next/pull/617 (by mih)
https://github.com/datalad/datalad-next/pull/618 (by mih)
https://github.com/datalad/datalad-next/pull/619 (by mih)
https://github.com/datalad/datalad-next/pull/620 (by mih)
https://github.com/datalad/datalad-next/pull/621 (by mih)
https://github.com/datalad/datalad-next/pull/622 (by mih)
https://github.com/datalad/datalad-next/pull/623 (by mih)

- New `patched_env` context manager for patching a process'
environment. This avoids the need for importing `unittest` outside
test implementations.
Via https://github.com/datalad/datalad-next/pull/633 (by mih)

- `call_git...()` functions received a new `force_c_locale`
parameter. This can be set whenever Git output needs to be parsed
to force running the command with `LC_ALL=C`. Such an environment
manipulation is off by default and not done unconditionally to
let localized messaging through in a user's normal locale.
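
The two environment-related entries above can be illustrated with a minimal, standard-library-only sketch. It is not the datalad-next implementation: `patched_env_sketch` and `child_lc_all` are hypothetical names, and the real `patched_env` and `force_c_locale` signatures may differ. The sketch only shows the general idea of temporarily patching the process environment so that a child process runs with `LC_ALL=C`, without relying on `unittest`.

```python
import os
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def patched_env_sketch(**updates):
    """Temporarily apply `updates` to os.environ (None removes a variable).

    Hypothetical stand-in, not the datalad-next `patched_env`.
    """
    # remember the previous state, including "was not set" (None)
    saved = {name: os.environ.get(name) for name in updates}
    try:
        for name, value in updates.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value
        yield
    finally:
        # restore the previous state even if the body raised
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


def child_lc_all():
    """Report the LC_ALL value that a child process sees."""
    code = "import os; print(os.environ.get('LC_ALL', ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `child_lc_all()` inside `with patched_env_sketch(LC_ALL="C"):` runs the child in the C locale, while the caller's own environment is restored once the block exits.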

🐛 Bug Fixes

- `datalad-annex::` Git remote helper now tests for a repository
deposit, and distinguishes an absent remote repository deposit
from an empty repository deposit when cloning. This rectifies
confusing behavior (successful clones of empty repositories
from broken URLs), and also fixes the handling of subdataset clone
candidates in `get` (which failed to skip inaccessible
`datalad-annex::` URLs for the same reason).
Fixes https://github.com/datalad/datalad-next/issues/636 via
https://github.com/datalad/datalad-next/pull/638 (by mih)

📝 Documentation

- API docs have been updated to include all top-level symbols
of any sub-package, or in other words: the public API.
See https://github.com/datalad/datalad-next/pull/627 (by mih)

🏠 Internal

- The `tree` command no longer uses the `subdatasets` command
for queries, but employs the recently introduced `iter_submodules()`
for leaner operations.
See https://github.com/datalad/datalad-next/pull/628 (by mih)

- `call_git...()` functions are established as the only used abstraction
to interface with Git and git-annex commands outside the use in
DataLad's `Repo` classes. Any usage of DataLad's traditional
`Runner` functionality is discontinued.
Fixes https://github.com/datalad/datalad-next/issues/541 via
https://github.com/datalad/datalad-next/pull/632 (by mih)

- Type annotations have been added to the implementation of the
`uncurl` git-annex remote. A number of unhandled conditions have
been discovered and were rectified.

1.2.0

🐛 Bug Fixes

- Fix an invalid escape sequence in a regex that caused a syntax warning.
Fixes https://github.com/datalad/datalad-next/issues/602 via
https://github.com/datalad/datalad-next/pull/603 (by mih)

💫 Enhancements and new features

- Speed up of status reports for repositories with many submodules.
An early presence check for submodules skips unnecessary evaluation
steps. Fixes https://github.com/datalad/datalad-next/issues/606 via
https://github.com/datalad/datalad-next/pull/607 (by mih)

🏠 Internal

- Fix implementation error in `ParamDictator` class that caused a test
failure. The class itself is unused and has been scheduled for removal.
See https://github.com/datalad/datalad-next/issues/611 and
https://github.com/datalad/datalad-next/pull/610 (by christian-monch)

🛡 Tests

- Promote a previously internal fixture to provide a standard
`modified_dataset` fixture. This fixture is session-scoped, and
yields a dataset with many facets of modification, suitable for
testing change reporting. The fixture verifies that no
modifications have been applied to the testbed. (by mih)

- `iterable_subprocess` tests have been robustified to better handle the
observed diversity of execution environments. This addresses, for example,
https://bugs.debian.org/1061739.
https://github.com/datalad/datalad-next/pull/614 (by christian-monch)

1.1.0

💫 Enhancements and new features

- A new paradigm for subprocess execution is introduced. The main
workhorse is `datalad_next.runners.iter_subproc`. This is a
context manager that feeds input to subprocesses via iterables,
and also exposes their output as an iterable. The implementation
is based on https://github.com/uktrade/iterable-subprocess, and
a copy of it is now included in the sources. It has been modified
to work homogeneously on the Windows platform too.
This new implementation is leaner and more performant. Benchmarks
suggest that the execution of multi-step pipe connections of Git
and git-annex commands is within 5% of the runtime of their direct
shell-execution equivalent (outside Python).
See https://github.com/datalad/datalad-next/pull/538 (by mih),
https://github.com/datalad/datalad-next/pull/547 (by mih).

With this change a number of additional features have been added,
and internal improvements have been made. For example, any
use of `ThreadedRunner` has been discontinued. See
https://github.com/datalad/datalad-next/pull/539 (by christian-monch),
https://github.com/datalad/datalad-next/pull/545 (by christian-monch),
https://github.com/datalad/datalad-next/pull/550 (by christian-monch),
https://github.com/datalad/datalad-next/pull/573 (by christian-monch)

- A new `itertools` module was added. It provides implementations
of iterators that can be used in conjunction with `iter_subproc`
for standard tasks. This includes the itemization of output
(e.g., line-by-line) across chunks of bytes read from a process
(`itemize`), output decoding (`decode_bytes`), JSON-loading
(`json_load`), and helpers to construct more complex data flows
(`route_out`, `route_in`).

- The `more_itertools` package has been added as a new dependency.
It is used for `datalad-next` iterator implementations, but is also
ideal for client code that employs this new functionality.

- A new `iter_annexworktree()` provides the analog of `iter_gitworktree()`
for git-annex repositories.

- `iter_gitworktree()` has been reimplemented around `iter_subproc`. The
performance is substantially improved.

- `iter_gitworktree()` now also provides file pointers to
symlinked content. Fixes https://github.com/datalad/datalad-next/issues/553
via https://github.com/datalad/datalad-next/pull/555 (by mih)

- `iter_gitworktree()` and `iter_annexworktree()` now support single
directory (i.e., non-recursive) reporting too.
See https://github.com/datalad/datalad-next/pull/552

- A new `iter_gittree()` that wraps `git ls-tree` for iterating over
the content of a Git tree-ish.
https://github.com/datalad/datalad-next/pull/580 (by mih).

- A new `iter_gitdiff()` wraps `git diff-tree|files` and provides a flexible
basis for iteration over changesets.

- `PathBasedItem`, a dataclass that is the basis for many item types yielded
by iterators, now more strictly separates the `name` property from path semantics.
The name is a plain string, and an additional, explicit `path` property
provides it in the form of a `Path`. This simplifies code (the
`_ZipFileDirPath` utility class became obsolete and was removed), and
improves performance.
Fixes https://github.com/datalad/datalad-next/issues/554 and
https://github.com/datalad/datalad-next/issues/581 via
https://github.com/datalad/datalad-next/pull/583 (by mih)

- A collection of helpers for running Git commands has been added at
`datalad_next.runners.git`. Direct uses of datalad-core runners,
or `subprocess.run()` for this purpose have been replaced with calls
to these utilities.
https://github.com/datalad/datalad-next/pull/585 (by mih)

- The performance of `iter_gitworktree()` has been improved by about
10%. Fixes https://github.com/datalad/datalad-next/issues/540
via https://github.com/datalad/datalad-next/pull/544 (by mih).

- New `EnsureHashAlgorithm` constraint to automatically expose
and verify algorithm labels from `hashlib.algorithms_guaranteed`
Fixes https://github.com/datalad/datalad-next/issues/346 via
https://github.com/datalad/datalad-next/pull/492 (by mslw adswa)

- The `archivist` remote now supports archive type detection
from `*E`-type annex keys for `.tgz` archives too.
Fixes https://github.com/datalad/datalad-next/issues/517 via
https://github.com/datalad/datalad-next/pull/518 (by mih)

- `iter_zip()` uses a dedicated, internal `PurePath` variant to report on
directories (`_ZipFileDirPath`). This enables more straightforward
`item.name in zip_archive` tests, which require a trailing `/` for
directory-type archive members.
https://github.com/datalad/datalad-next/pull/430 (by christian-monch)

- A new `ZipArchiveOperations` class added support for ZIP files, and enables
their use together with the `archivist` git-annex special remote.
https://github.com/datalad/datalad-next/pull/578 (by christian-monch)

- `datalad ls-file-collection` has learned additional collection types:

- The new `zipfile` collection type that enables uniform reporting on
the additional archive type.

- The new `annexworktree` collection that enhances the `gitworktree`
collection by also reporting on annexed content, using the new
`iter_annexworktree()` implementation. It is about 15% faster than a
`datalad --annex basic --untracked no -e no -t eval`.

- The new `gittree` collection for listing any Git tree-ish.

- A new `iter_gitstatus()` can replace the functionality of
`GitRepo.diffstatus()` with a substantially faster implementation.
It also provides a novel `mono` recursion mode that completely
hides the notion of submodules and presents deeply nested
hierarchies of datasets as a single "monorepo".
https://github.com/datalad/datalad-next/pull/592 (by mih)

- A new `next-status` command provides a substantially faster
alternative to the datalad-core `status` command. It is closely
aligned to `git status` semantics, only reports changes (not repository
listings), and supports type change detection. Moreover, it exposes
the "monorepo" recursion mode, and single-directory reporting options
of `iter_gitstatus()`. It is the first command to use `dataclass`
instances as result types, rather than the traditional dictionaries.

- `SshUrlOperations` now supports non-standard SSH ports, non-default
user names, and custom identity file specifications.
Fixed https://github.com/datalad/datalad-next/issues/571 via
https://github.com/datalad/datalad-next/pull/570 (by mih)

- A new `EnsureRemoteName` constraint improves the parameter validation
of `create-sibling-webdav`. Moreover, the command has been uplifted
to support uniform parameter validation also for the Python API.
Missing required remotes, or naming conflicts are now detected and
reported immediately before the actual command implementation runs.
Fixes https://github.com/datalad/datalad-next/issues/193 via
https://github.com/datalad/datalad-next/pull/577 (by mih)

- `datalad_next.repo_utils` provides a collection of implementations
for common operations on Git repositories. Unlike the datalad-core
`Repo` classes, these implementations do not require a specific
data structure or object type beyond a `Path`.
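
The subprocess paradigm described at the top of this list (process output exposed as an iterable of byte chunks, then itemized line by line) can be sketched with the standard library alone. The following is an illustration of the idea only: `iter_output_chunks` and `itemize_lines` are hypothetical names, and they do not reproduce the signatures or behavior of `iter_subproc` or `itemize` in datalad-next.

```python
import subprocess
import sys


def iter_output_chunks(cmd, chunk_size=65536):
    """Yield raw byte chunks from a command's stdout as they are read."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    # leaving the `with` block waits for the process to finish
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


def itemize_lines(chunks, sep=b"\n"):
    """Re-assemble `sep`-delimited items from arbitrarily split chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # everything but the last piece is complete; the rest may continue
        *complete, pending = pending.split(sep)
        yield from complete
    if pending:
        # trailing item that was not terminated by a separator
        yield pending


# usage: a deliberately small chunk size splits lines across chunks
demo_cmd = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write(b'first\\nsecond\\n')",
]
demo_lines = list(itemize_lines(iter_output_chunks(demo_cmd, chunk_size=4)))
```

Because both functions are generators, no more than one chunk plus one incomplete item is held in memory at a time, which is the property that makes this style suitable for long-running commands.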

🐛 Bug Fixes

- Add patch to fix `update`'s target detection for adjusted mode datasets
that can crash under some circumstances.
See https://github.com/datalad/datalad/issues/7507, fixed via
https://github.com/datalad/datalad-next/pull/509 (by mih)

- Comparison with `is` and a literal was replaced with a proper construct.
While having no functional impact, it removes an ugly `SyntaxWarning`.
Fixed https://github.com/datalad/datalad-next/issues/526 via
https://github.com/datalad/datalad-next/pull/527 (by mih)

📝 Documentation

- The API documentation has been substantially extended. More already
documented API components are now actually rendered, and more documentation
has been written.

🏠 Internal

- Type annotations have been extended. The development workflows now inform
about type annotation issues for each proposed change.

- Constants have been migrated to `datalad_next.consts`.
https://github.com/datalad/datalad-next/pull/575 (by mih)

🛡 Tests

- A new test verifies compatibility with HTTP servers that do not report
download progress.
https://github.com/datalad/datalad-next/pull/369 (by christian-monch)

- The overall noise-level in the test battery output has been reduced
substantially. INFO log messages are no longer shown, and command result
rendering is largely suppressed. New test fixtures make it easier
to maintain tidier output: `reduce_logging`, `no_result_rendering`.
The contribution guide has been adjusted to encourage their use.

- Tests that require an unprivileged system account to run are now skipped
when executed as root. This fixes an issue of the Debian package.
https://github.com/datalad/datalad-next/pull/593 (by adswa)

1.0.2

🏠 Internal

- The `www-authenticate` dependency is dropped. The functionality is
replaced by a `requests`-based implementation of an alternative parser.
This trims the dependency footprint and facilitates Debian-packaging.
The previous test cases are kept and further extended.
Fixes https://github.com/datalad/datalad-next/issues/493 via
https://github.com/datalad/datalad-next/pull/495 (by mih)

🛡 Tests

- The test battery now honors the `DATALAD_TESTS_NONETWORK` environment
variable and downgrades by skipping any tests that require external
network access. (by mih)

1.0.1

🐛 Bug Fixes

- Fix f-string syntax in error message of the `uncurl` remote.
https://github.com/datalad/datalad-next/pull/455 (by christian-monch)

- `FileSystemItem.from_path()` now honors its `link_target` parameter, and
resolves a target for any symlink item conditional on this setting.
Previously, a symlink target was always resolved.
Fixes https://github.com/datalad/datalad-next/issues/462 via
https://github.com/datalad/datalad-next/pull/464 (by mih)

- Update the vendor installation of versioneer to v0.29. This
resolves an installation failure with Python 3.12 due to
the removal of an ancient class.
Fixes https://github.com/datalad/datalad-next/issues/475 via
https://github.com/datalad/datalad-next/pull/483 (by mih)

- Bump dependency on Python to 3.8. This is presently the oldest version
still supported upstream. However, some functionality already used
3.8 features, so this is also a bug fix.
Fixes https://github.com/datalad/datalad-next/issues/481 via
https://github.com/datalad/datalad-next/pull/486 (by mih)

💫 Enhancements and new features

- Patch datalad-core's `run` command to honor configuration defaults
for substitutions. This enables placeholders like `{python}` that
point to `sys.executable` by default, and need not be explicitly
defined in system/user/dataset configuration.
Fixes https://github.com/datalad/datalad-next/issues/478 via
https://github.com/datalad/datalad-next/pull/485 (by mih)
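
As a rough illustration of substitution defaults, the sketch below expands a `{python}` placeholder from a built-in default unless a configured value overrides it. This is an assumption-laden simplification: `DEFAULT_SUBSTITUTIONS` and `expand_command` are hypothetical names, and the actual patch to `run` works through DataLad's configuration system rather than a plain dictionary.

```python
import sys

# assumed default: only the `python` placeholder named in the entry above
DEFAULT_SUBSTITUTIONS = {"python": sys.executable}


def expand_command(template, configured=None):
    """Expand `{placeholder}`s; explicitly configured values win over defaults."""
    values = {**DEFAULT_SUBSTITUTIONS, **(configured or {})}
    return template.format(**values)
```

With no configuration, `expand_command("{python} script.py")` resolves to the running interpreter, while passing `{"python": "/opt/py"}` replaces the default.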

📝 Documentation

- Include `gitworktree` among the available file collection types
listed in `ls-file-collection`'s docstring. Fixes
https://github.com/datalad/datalad-next/issues/470 via
https://github.com/datalad/datalad-next/pull/471 (by mslw)

- The rendered API documentation now includes an entrypoint for the
runner-related functionality and documentation at
https://docs.datalad.org/projects/next/en/latest/generated/datalad_next.runners.html
Fixes https://github.com/datalad/datalad-next/issues/466 via
https://github.com/datalad/datalad-next/pull/467 (by mih)

🛡 Tests

- Simplified setup for subprocess test-coverage reporting. Standard
pytest-cov features are now employed, rather than the previous
approach that was adopted from datalad-core, which originated
in a time when testing was performed via nose.
Fixes https://github.com/datalad/datalad-next/issues/453 via
https://github.com/datalad/datalad-next/pull/457 (by mih)

1.0.0

This release represents a milestone in the development of the extension.
The package is reorganized to be a collection of more self-contained
mini-packages, each with its own set of tests.

Developer documentation and guidelines have been added to aid further
development. One particular goal is to establish datalad-next as a proxy
for importing datalad-core functionality for other extensions. Direct imports
from datalad-core can be minimized in favor of imports from datalad-next.
This helps identify functionality needed outside the core package,
and guides efforts for future improvements.

The 1.0 release marks the switch to a more standard approach to semantic
versioning. However, although substantial improvements have been made,
the 1.0 version in no way indicates a slowdown of development or a change in the
likelihood of (breaking) changes. They will merely become more easily
discoverable from the version label alone.

Notable high-level features introduced by this major release are:

- The new `UrlOperations` framework to provide a set of basic operations like
`download`, `upload`, `stat` for different protocols. This framework can be
thought of as a replacement for the "downloaders" functionality in
datalad-core -- although the feature list is not 100% overlapping. This new
framework is more easily extensible by 3rd-party code.

- The `Constraints` framework elevates parameter/input validation to the next
level. In contrast to datalad-core, declarative input validation is no longer
limited to the CLI. Instead, command parameters can now be validated regardless
of the entrypoint through which a command is used. They can be validated
individually, but also sets of parameters can be validated jointly to implement
particular interaction checks. All parameter validations can now be performed
exhaustively, to present a user with a complete list of validation errors, rather
than the fail-on-first-error method implemented exclusively in datalad-core.
Validation errors are now reported using a dedicated structured data type to aid
their communication via non-console interfaces.

- The `Credentials` system has been further refined with more homogenized
workflows and deeper integration into other subsystems. This release merely
represents a snapshot of continued development towards a standardization of
credential handling workflows.

- The annex remotes `uncurl` and `archivist` are replacements for the
datalad-core implementations `datalad` and `datalad-archive`. They offer
substantially improved configurability and leaner operation -- built on the
`UrlOperations` framework.

- A growing collection of iterators (see `iter_collections`) aims to provide
fast (and more Pythonic) operations on common data structures (Git worktrees,
directories, archives). They can be used as an alternative to the traditional
`Repo` classes (`GitRepo`, `AnnexRepo`) from datalad-core.

- Analogous to `UrlOperations`, the `ArchiveOperations` framework aims to provide
an abstraction for operations on different archive types (e.g., TAR). These
represent an alternative to the traditional implementations of
`ExtractedArchive` and `ArchivesCache` from datalad-core, and aim at leaner
resource footprints.

- The collection of runtime patches for datalad-core has been further expanded.
All patches are now individually documented, and applied using a set of standard
helpers (see http://docs.datalad.org/projects/next/en/latest/patches.html).
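
The difference between exhaustive validation and fail-on-first-error, as described for the `Constraints` framework above, can be sketched in plain Python. This is not the datalad-next API: `ValidationErrors`, `validate_all`, and `ensure_ordered` are hypothetical names chosen for illustration, and plain callables such as `int` stand in for real constraint objects.

```python
class ValidationErrors(Exception):
    """Raised once, carrying one message per failed parameter or joint check."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} validation error(s)")
        self.errors = errors


def validate_all(params, constraints, joint_checks=()):
    """Run every constraint, collecting all failures before raising."""
    validated, errors = {}, {}
    for name, constraint in constraints.items():
        try:
            validated[name] = constraint(params.get(name))
        except (TypeError, ValueError) as exc:
            errors[name] = str(exc)
    for names, check in joint_checks:
        # a joint check is only meaningful if its inputs validated individually
        if all(n in validated for n in names):
            try:
                check(*(validated[n] for n in names))
            except ValueError as exc:
                errors[tuple(names)] = str(exc)
    if errors:
        raise ValidationErrors(errors)
    return validated


def ensure_ordered(low, high):
    """Joint check: the lower bound must not exceed the upper bound."""
    if low > high:
        raise ValueError("low must not exceed high")
```

A caller that passes two invalid values receives a single exception whose `errors` mapping names both parameters, which is the structured, complete report the entry above refers to.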

For details, please see the changelogs of the 1.0.0 beta releases below.

💫 Enhancements and new features

- `TarArchiveOperations` is the first implementation of the `ArchiveOperations`
abstraction, providing archive handlers with a set of standard operations:
- `open` to get a file object for a particular archive member
- `__contains__` to check for the presence of a particular archive member
- `__iter__` to get an iterator for processing all archive members
https://github.com/datalad/datalad-next/pull/415 (by mih)
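
To make the three standard operations concrete, here is a minimal sketch of an archive handler built directly on the standard-library `tarfile` module. It is an assumption-based illustration, not `TarArchiveOperations` itself: the class name `TarHandlerSketch`, its constructor argument, and the helper `build_demo_archive` are hypothetical.

```python
import io
import tarfile
from pathlib import PurePosixPath


class TarHandlerSketch:
    """Hypothetical handler offering open(), `in`, and iteration for a TAR."""

    def __init__(self, fileobj):
        self._tar = tarfile.open(fileobj=fileobj, mode="r")

    def open(self, member):
        """Return a binary file object for a regular-file member."""
        fobj = self._tar.extractfile(str(PurePosixPath(member)))
        if fobj is None:
            raise ValueError(f"{member!r} is not a regular file")
        return fobj

    def __contains__(self, member):
        try:
            self._tar.getmember(str(PurePosixPath(member)))
            return True
        except KeyError:
            return False

    def __iter__(self):
        # member names inside a TAR archive always use POSIX conventions
        for info in self._tar:
            yield PurePosixPath(info.name)


def build_demo_archive():
    """Create a small in-memory TAR archive with a single file member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        payload = b"hello"
        info = tarfile.TarInfo("dir/hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer
```

With `handler = TarHandlerSketch(build_demo_archive())`, membership tests, iteration, and `handler.open("dir/hello.txt").read()` all go through the same small interface, which is the point of a shared abstraction across archive types.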

🐛 Bug Fixes

- Make `TarfileItem.name` be of type `PurePosixPath` to reflect the fact
that a TAR archive can contain members with names that cannot be represented
unmodified on a non-POSIX file system.
https://github.com/datalad/datalad-next/pull/422 (by mih)
An analogous change is done for `ZipfileItem.name`.
https://github.com/datalad/datalad-next/pull/409 (by christian-monch)

- Fix `git ls-files` parsing in `iter_gitworktree()` to be compatible with
file names that start with a `tab` character.
https://github.com/datalad/datalad-next/pull/421 (by christian-monch)
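
The motivation for `PurePosixPath` member names can be seen with a short standard-library example. The member name used here is made up for illustration: a backslash is an ordinary character in a POSIX file name, but a path separator under Windows path semantics.

```python
from pathlib import PurePosixPath, PureWindowsPath

# hypothetical archive member name containing a literal backslash
member_name = "data/a\\b.txt"

# POSIX semantics keep the backslash as part of the file name
posix_view = PurePosixPath(member_name)

# Windows semantics silently reinterpret it as a directory separator
windows_view = PureWindowsPath(member_name)
```

Here `posix_view.parts` has two components and round-trips to the original string, whereas `windows_view.parts` has three, so only the POSIX representation preserves the name as stored in the archive.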

📝 Documentation

- Expanded guidelines on test implementations.

- Add missing and fix wrong docstrings for HTTP/WebDAV server related fixtures.
https://github.com/datalad/datalad-next/pull/445 (by adswa)

🏠 Internal

- Deduplicate configuration handling code in annex remotes.
https://github.com/datalad/datalad-next/pull/440 (by adswa)

🛡 Tests

- New test fixtures have been introduced to replace traditional test helpers
from datalad-core:

- `datalad_interactive_ui` and `datalad_noninteractive_ui` for testing
user interactions. They replace `with_testsui`.
https://github.com/datalad/datalad-next/pull/427 (by mih)

- Expand test coverage for `create_sibling_webdav` to include recursive
operation.
https://github.com/datalad/datalad-next/pull/434 (by adswa)
