

  * No changes since RC1


  * Add support for Ubuntu 20.04 as base-distro for session images
  * Allow overriding of intrinsic compute devices (`cpu` and `mem`) using compute device plugins ([224](
  * Move invocation of user-defined bootstrap script from the container entrypoint to the krunner's main loop for better log visibility and ability to interrupt ([225](


  Breaking Changes
  * Apply the plugin API v2 -- all stat/error/accelerator plugins must be updated along with the agent ([222](
  * Allow kernel containers to know their identities via `BACKENDAI_KERNEL_ID` environment variable ([218](
  * Global configuration for agent/container
  - Global configuration on etcd overrides existing local configuration for agent/container. ([219](
  * Fix instability caused by stat-synchronizer processes under heavy loads by collecting statistics periodically only ([212](
  * Apply batching when producing "kernel_stat_sync" events to reduce manager loads and increase timeout for caching stats in Redis from 30 seconds to 2 minutes ([213](
  * Improve stability under heavily loaded scenarios ([214](
  - Skip lifecycle sync for already terminating kernels to reduce excessive Docker Engine overhead when a large number of kernels are being terminated
  - Increase timeout for container termination to 60 seconds during restarting kernels, by observing deletion latencies under heavy load tests
  * Prevent executing startup command multiple times for batch session. ([217](
  * Stabilize container lifecycle management and RPC exception handling with updated Callosum ([218](
  * Make it possible to add more backend implementations by generalizing importing and initialization of backend modules ([222](
  * Fix hang-up of service-port functionality of a session when one of its services starts but fails to initialize ([223](
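
For the `BACKENDAI_KERNEL_ID` entry above, a minimal sketch of how code running inside a kernel container might read its own identity. Only the variable name comes from the changelog; the helper name `get_kernel_id` is hypothetical, and the value is treated as an opaque string because its format is not stated here.

```python
import os


def get_kernel_id(environ=os.environ):
    """Return the kernel ID exposed to the container, or None if it is unset."""
    # BACKENDAI_KERNEL_ID is the variable named in the changelog entry.
    # Its value format is not documented there, so no parsing is attempted.
    return environ.get("BACKENDAI_KERNEL_ID")


if __name__ == "__main__":
    kernel_id = get_kernel_id()
    if kernel_id is None:
        print("BACKENDAI_KERNEL_ID is not set (probably not inside a kernel container)")
    else:
        print(f"running as kernel {kernel_id}")
```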


  Breaking Changes
  * Now it runs on Python 3.8 or higher only.
  * Now we support ROCM (Radeon Open Compute) accelerators via `` plugin.
  * Now our manager-to-agent RPC uses [Callosum]( instead of aiozmq, supporting Python 3.8 natively. ([157](
  * Support user-defined bootstrap script (e.g., this can be used to clone a git repo) ([161](
  * ResourceSlots are now more permissive. Agent still checks the validity of known slots but also allows zero-valued unknown slots as well. ([162](
  * All CLI commands are now accessible via ` ag` ([165](
  * Add support for pre-open service ports for user-written apps ([167](
  * Add a new "app" kernel-runner runtime type for GUI and application-only kernels ([189](
  * Mark kernel started after bootstrap script is executed ([190](
  * Generalize kernel-runner volume lists using plugin-like krunner package auto-detection ([198](
  * Using CentOS 6.10 as the base distribution for importing images is deprecated. ([189](
  * Detection for manager now works for HA setup seamlessly. (It now determines if at least one manager is running.) [lablup/backend.ai125](
  * Fix wrong ownership of .ssh and keypair files when the SSH keypair is set via the `internal_data` field of the kernel creation config.
  * Make scratch directory accesses for setup/tear-down fully asynchronous ([186](
  * Fix service-port parsing routines to recognize "vnc-web" service ports for GUI containers ([189](
  * Make the kernel runner's service parser consistent with documentation ([197](
  * Update test fixtures to work with pytest-asyncio 0.11 and use a separate registry state data file for each test session ([208](
  * Update test fixtures to work with pytest-asyncio 0.12 ([209](
  * Fix lifecycle-related code errors when handling results of batch-mode tasks ([210](
  * Now the kernel-runner runs on the prebuilt Python 3.8 mounted inside containers. ([189](
  * Adopt [towncrier]( to manage changelogs. ([196](
  * Update flake8 to a prerelease version supporting Python 3.8 syntax ([211](
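
The more permissive ResourceSlots rule described above (known slots are still validated, unknown slots are accepted only when zero-valued) can be sketched roughly as follows. The function name, the set of known slots, and the non-negativity check are all assumptions made for illustration and are not the agent's actual implementation.

```python
from decimal import Decimal

# Hypothetical set of slot names the agent knows about.
KNOWN_SLOTS = frozenset({"cpu", "mem"})


def check_slots(slots, known=KNOWN_SLOTS):
    """Validate known slots and tolerate unknown slots only if they are zero."""
    for name, value in slots.items():
        amount = Decimal(str(value))
        if name in known:
            # Assumed validity rule for known slots: amounts must not be negative.
            if amount < 0:
                raise ValueError(f"slot {name!r} must not be negative")
        elif amount != 0:
            # Unknown slots are allowed only when they request nothing.
            raise ValueError(f"unknown slot {name!r} must be zero-valued")
    return True
```

Under these assumptions, `check_slots({"cpu": 2, "mem": "1024", "cuda.device": 0})` passes, while a non-zero value for an unknown slot raises `ValueError`.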


  * Prevent executing startup command multiple times for batch session. ([217](
  * Fix the duplicate Docker event issue due to a misuse of the aiodocker API ([220](
  * Fix blocking of subsequent lifecycle event processing by a hanging-up handler ([221](


  * Improve stability under heavily loaded scenarios ([214](
  - Skip lifecycle sync for already terminating kernels to reduce excessive Docker Engine overhead when a large number of kernels are being terminated
  - Remove timeout for container termination for both self-terminated and user-requested cases
  - Increase timeout for container termination to 60 seconds during restarting kernels, by observing deletion latencies under heavy load tests


  * Fix lifecycle-related code errors when handling results of batch-mode tasks ([210](
  * Fix instability caused by stat-synchronizer processes under heavy loads by collecting statistics periodically only ([212](
  * Apply batching when producing "kernel_stat_sync" events to reduce manager loads and increase timeout for caching stats in Redis from 30 seconds to 2 minutes ([213](


  * Check closing of zmq sockets in code runner to avoid writing on invalid/closed sockets and indefinite waiting for the krunner execution results ([205](
  * Introduce "already-terminated" reason for `kernel_terminated` event ([206](


  * Ensure that unhandled exceptions in aiozmq.rpc handlers are always msgpack-serializable to keep the agent daemon logs clean ([199](
  * Add the force option when calling Docker's container deletion API to avoid rare "stop before removal" errors even when we try deletion after receiving the termination event ([200](
  * Keep the docker event processor running regardless of unexpected exceptions in the middle ([202](
  * Use destroy() of ZeroMQ context objects instead of term() to stabilize container removals ([203](
  * Delete auto-created/temporary volumes when deleting containers ([204](
  * Adopt towncrier for changelog management ([201](


  * FEATURE: Place bootstrap script upon session creation (174)
  * FIX: Update the restricted/reserved list of service port numbers, allowing privileged TCP ports
  to be used by the kernel image authors. (195)


  * HOTFIX: Regression in a majority of kernels due to a missing `self.loop` initialization.


  * IMPROVE: Support code-server 2.x (the web version of VSCode) as an intrinsic service port.
  (191, 194)
  * BACKPORT/IMPROVE: Keep track of service process lifecycle explicitly and allow service processes to
  be restarted if they terminate. (183)
  * FIX: Load that hooks user inputs only in the batch-mode execution.
  Previously it was loaded for all Python processes and prevented user prompts in pip and other tools.
  * BACKPORT/FIX: Allow users to edit/override their dotfiles when spawning new containers. (192)
  * FIX: Image list update regression due to a typo in the code (188)


  * IMPROVE/FIX: Rewrite container lifecycle management using queues and persistent states (187)
  * MAINTENANCE: Update dependency.


  * MAINTENANCE: Update dependency packages.


  * HOTFIX: Remove a duplicate code block causing agent startup crashes, which originated from
  a merge error.


  * FIX: Corruption of resource allocation maps due to abnormal termination of containers and/or
  race conditions of resource cleanup handlers. (180)
  * FIX: .bashrc not loaded by the tmux session which is default-enabled in the ttyd intrinsic app.


  * NEW: ttyd terminals now use a shared tmux session by default, making the container's shell session
  persistent across browser refreshes and intermittent connection failures, and also allowing
  pair-programming by sharing the ttyd session. (168, 178)


  * NEW: Add support for user-specific dotfiles population in session containers (166)
  * MAINTENANCE: Revamp the CI/CD pipelines (173)


  * FIX: Package version conflicts due to aiobotocore/botocore and python-dateutil, by removing code that is
  no longer used and the aiobotocore dependency.


  * FIX: Support host-to-container PID mapping in older Linux kernels (lower than 4.1) which do not
  provide the NSpid field in /proc task status.
  * FIX: Invalid ownership of several runtime-generated files in the container working directory such as
  SSH keypair and basic dotfiles, which may prevent containers from working properly.
  * MAINTENANCE: Update aiodocker to 0.17
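
For context on the PID-mapping fix above: on Linux 4.1 and later, `/proc/<pid>/status` contains an `NSpid:` line listing the process ID in each nested PID namespace, outermost first. The sketch below shows how that field could be parsed. The function name is hypothetical, and the fallback the agent uses on older kernels is not described in this changelog, so the sketch simply returns `None` when the field is absent.

```python
def container_pid_from_status(status_text):
    """Return the innermost-namespace PID from /proc/<pid>/status text.

    Returns None when the NSpid field is missing (kernels older than 4.1).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # Fields after the label are PIDs from the outermost namespace
            # to the innermost one; the last entry is the in-container PID.
            pids = [int(field) for field in line.split()[1:]]
            return pids[-1] if pids else None
    return None


if __name__ == "__main__":
    sample = "Name:\tpython\nPid:\t4321\nNSpid:\t4321\t17\n"
    print(container_pid_from_status(sample))
```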


  * IMPROVE: Skip containers and images with an unsupported (future) kernelspec version.


  * NEW: Provide some minimal basic dotfiles in kernel containers by default (.bashrc and .vimrc) (160)
  - Make the "ls" command always colorized using an alias.
  * NEW: Add support for keypair-specific SSH private key setup (158)


  * ROLLBACK: SFTP throughput optimization. It has caused PyCharm's helper upload failures for its
  remote interpreter and debugging support, while all other tested SFTP clients (Cyberduck, FileZilla)
  have worked flawlessly.
  - While we are investigating both the SSHJ library and dropbear part to find the root cause,
  the optimization is held back since working is better than fast.


  * FIX/IMPROVE: for kernel containers startup
  - Handle UID overlap (not only GID) correctly by renaming the image's existing account
  - Allow execution as root if the agent is configured to do so.
  - FIX: Ensure library preloads are not modifiable by the user accounts in kernels even when they unset
  "LD_PRELOAD" environment variable, by writing "/etc/" file as root.
  NOTE: Alpine-based images do not support this because musl-libc does not use /etc/ld* configurations
  but only depends on environment variables with a few hard-coded defaults.
  * FIX: Ensure dropbear (our intrinsic SSH daemon) keeps environment variables when users either open a
  new SSH session or execute a remote command.
  * FIX: Regression of the batch-mode execution API.
  * MAINTENANCE: Update dependencies and pin Trafaret to v1.x because the Trafaret v2.0 release breaks
  backward compatibility.


  * FIX: SFTP/SCP should work consistently in all images, even without `/usr/bin/scp` and `libcrypto`.
  Applied static builds of OpenSSH utilities with OpenSSL and zlib included.


  * OPTIMIZE: SFTP file transfers are now 3x faster by increasing the network buffer sizes used by
  * FIX: Regression of that caused failure of user/group creation, which resulted in
  inability to use the SSH service port due to missing username.


  * FIX: for kernel containers did not work properly when the container image has a user ID
  or group ID that overlaps with the given values or when the agent is configured to use root for
  containers.  This fixes kernel launches in macOS where the default user's group "staff" has the group
  ID 20 which overlaps with the group "dialout" in Ubuntu or "games" in CentOS.


  * FIX: SSH and SFTP support now works as expected in all types of kernels, including Alpine-based ones.
  The auto-generated keypair name is changed to "id_container" and now it uses RSA instead of ECDSA for
  better compatibility.
  * FIX: Handle a rare ProcessLookupError when cleaning up kernels and stat synchronizers,
  which caused the infinitely repeated warning "cannot read stats: sysfs unreadable for container xxxx".
  * FIX: Use the canonical, normalized version number for the setup dependency to silence
  pip warnings during installation.


  * FIX: Regression of code execution due to wrong-ordered arguments of code execution RPC call.
  * FIX: Potential memory leak and PID exhaustion due to improper termination of stat synchronizer
  and its logger processes.


  * FIX: In some kernels, git command has failed due to "undefined symbol: dlsym" error.
  It's fixed by adding `-ldl` option to the linker flag of libbaihook.
  * FIX: Reconnection and cancellation of etcd watchers used for manager launch detection


  This is the last preview, feature-freeze release for v19.09 series.
  Stability updates will follow in the v19.09.0 and possibly a few more v19.09.x releases.
  * NEW: Support batch tasks (148, lablup/backend.ai199)
  * NEW: Support image import tasks, with internal-purpose security flag implementations (149,
  * NEW: Intrinsic SSH support to any session, as "sshd" service port.
  The host key and user keypair is randomly generated.  To pin your own SSH keypair, create a
  ".ssh" user vfolder which will be automatically mounted to all your compute sessions.
  * NEW: Add support for a new service port: "sftp" for large-file transfers with vfolders using
  a special dedicated kernel.
  * NEW: Add support for a new service port: "vscode" to access Visual Studio Code running as a
  web application in the interactive sessions.  Note that the sessions running VSCode are recommended to
  have more than 2 GiB of free main memory. (147)
  * IMPROVE: Enable the debugger port in TensorBoard.  Note that this port is for private-use only
  so that a TensorFlow process can send debug-logging data to it in the same container.
  * IMPROVE: Add support for multiple TCP ports to be mapped for a single service.


  * Minor bug fixes
  * CHANGE: The default of "debug.coredump" config becomes false in the halfstack configuration.


  * NEW: Add a new intrinsic service port "ttyd" for all kernels, which provides a clean and slick
  web-based shell access.
  * NEW: Add support for sftp service if the kernel supports it (146).
  * FIX: Now "kernel_terminated" events carry the correct "reason" field, which is stored in the
  "status_info" in the manager's kernels table.
  * FIX: Avoid binary-level conflicts of Python library ( in containers due to
  "/opt/" mounts.  This had crashed some vendor-specific images which rely on
  Python 3.6.4 while our krunner daemon uses Python 3.6.8.
  * CHANGE: The agent-to-manager notifications use Redis instead of ZeroMQ (144,
  lablup/, lablup/, and make the agent survive
  intermittent Redis connection disruptions.


  * NEW: Add support for specifying shared memory for containers (lablup/backend.ai52, 140)
  * Internally applied static type checks to avoid potential bugs due to human mistakes. (138)
  Also refactored the codebase to split the manager-agent communication part and the kernel interaction
  part (which is now replaceable!) for extensible development.
  * Update dependencies including aiohttp 3.6, twine, setuptools, etc.


  * NEW: Add shared-memory stats
  * CHANGE: watcher commands are now executed with "sudo".


  * FIX: regression of batch-mode execution (file uploads to kernels) due to refactoring


  * FIX: Apply a keepalive messaging at the 10-sec interval for agent-container RPC connection to avoid
  kernel-enforced NAT connection tracker timeout (126, lablup/backend.ai46)
  This allows execution of very long computations (more than 5 days) without interruption as long as
  the idle timeout configuration allows.
  * FIX: When reading plugin configurations, merge scaling-group and global configurations correctly.
  * FIX: No longer change the fstab if mount operations fail. Also delete the unmounted folder
  if it is empty after unmount was successful.


  * NEW: Add support for running CentOS-based kernel images by adding CentOS 7.6-based builds for
  libbaihook and su-exec binaries.
  * NEW: watcher: Add support for fstab/mount/unmount management APIs for superadmins (134)
  * Improve stability of cancellation during shutdown via refactoring and let uvloop work more consistently
  with vanilla asyncio.  (133)
  - Now the agent daemon handles SIGINT and SIGTERM much more gracefully.
  - Upgrade aiotools to v0.8.2+
  - Rewrite kernel's `list_files` RPC call to work safer and faster (124).


  * FIX: TensorBoard startup error due to favoring IPv6 address
  * CHANGE: Internally restructured the codebase so that we can add different agent implementations
  easily in the future.  Kubernetes support is coming soon! (125)
  * Accept a wider range of `ai.backend.base-distro` image label values which do not
  include explicit version numbers.


  * CHANGE: Reduce the default websocket ping interval of Jupyter notebooks to 10 seconds
  to prevent intermittent connection losses in specific browser environments. (131)


  * NEW: Add support for watcher information reports (107)
  * Improve versioning of krunner volumes not to interfere with running containers
  when upgraded (120)
  * Add support for getting core dumps inside container as configuration options (114)
  * Fix missing instance ID for configuration scope maps (127)
  * Pin the pyzmq version to 18.1.0 (lablup/backend.ai47)


  * FIX: Disable trash bins in the Jupyter browsers (lablup/backend.ai45)
  * FIX: Revert "net.netfilter.nf_conntrack_tcp_timeout_established" in the recommended kernel parameters
  to the Linux kernel's default (5 days = 432000 seconds). (lablup/backend.ai46)
  * CHANGE: The CPU overcommit factor (previously fixed to 2) is now adjustable by the environment variable
  "BACKEND_CPU_OVERCOMMIT_FACTOR" and the default is now 1.
  * NEW: Add an option to change the underlying event loop implementation.
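
To illustrate the adjustable CPU overcommit factor above, here is a small sketch that reads `BACKEND_CPU_OVERCOMMIT_FACTOR` with the documented default of 1. The helper names, the integer parsing, and the idea that the factor simply multiplies the physical core count are assumptions; the changelog does not say how the agent applies the value internally.

```python
import os


def cpu_overcommit_factor(environ=os.environ):
    """Read the overcommit factor from the environment (default: 1)."""
    raw = environ.get("BACKEND_CPU_OVERCOMMIT_FACTOR", "1")
    factor = int(raw)  # assumption: the factor is a positive integer
    if factor < 1:
        raise ValueError("overcommit factor must be at least 1")
    return factor


def advertised_cpu_slots(physical_cores, environ=os.environ):
    """Hypothetical helper: number of CPU slots offered for scheduling."""
    return physical_cores * cpu_overcommit_factor(environ)


if __name__ == "__main__":
    print(advertised_cpu_slots(os.cpu_count() or 1))
```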


  * Include attached_devices in the kernel creation response (lablup/
  - Compute plugins now should implement `get_attached_devices()` method.
  * Improved support for separation of agent host and kernel (container) hosts
  * Add support for scaling-groups as configured by including them in heartbeats
  * Implement reserved resource slots for CPU and memory (110, 112)


  * CHANGE: Now krunner-env is served as local Docker volumes instead of dummy containers (117, 118)
  - This fixes infinite bloating of anonymous Docker volumes implicitly created from dummy containers
  which consumed the disk space indefinitely.
  - The agent auto-creates and auto-updates the krunner-env volumes. Separate Docker image deployment
  and manual image tagging are no longer required!
  - The krunner-env image archives are distributed as separate "{distro}" wheel
  * IMPROVED: Now the agent can be run *without* root, given that:
  - The docker socket is accessible by the agent's user permission.
  (usually you have to add the user to the "docker" system group)
  - container.stats-type is set to "docker".
  - The permission/ownership of /tmp/ and agent/event sockets inside it is writable by the
  user/group of the agent.
  - container.kernel-uid, container.kernel-gid is set to -1 or the same values that
  ai/backend/agent/ file stored in the disk has (e.g., inside virtualenv's site-packages
  * Also improved the clean up of scratch directories due to permission issues caused by bind-mounting
  files inside bind-mounted directories.


  - BREAKING CHANGE: The daemon configurations are read from TOML files and
  shared configurations are from the etcd. (112)
  - NEW: The agent now automatically determines the local agent IP address when:
  - etcd's "config/network/subnet/agent" is set to a non-zero network prefix
  - rpc-listen-addr is an empty string
  - Update Jupyter custom styles and resources
  - Update dependencies including uvloop
  - Add explicit timeout for service-port startup


  - Add support for live collection of node-level, per-device, and per-kernel resource metrics.
  - Include version and compute plugin information in heartbeats.
  - Make it possible to use specific IP address ranges for public ports of kernel containers.


  - Fix inability to delete files in the Jupyter file browser running in containers.


  - Add missing updates for Jupyter style resources to disable Jupyter cluster
  extension which is not compatible with us and to remove unused headers in the
  terminal window.


  - Fix permission handling for container-agent intercommunication socket which
  has prevented unexpected crashes of containers in certain conditions.
  - Mount hand-made tmp dirs only when custom tmpfs is enabled.
  - Update Jupyter style resources.


  - Fix handling of empty resource allocation when rescanning running containers.
  (The bug may happen when the CUDA plugin is installed in the nodes that do not have
  CUDA-capable GPUs.)


  - Minor updates to match with the manager changes.
  - Update dependency: aioredis


  - NEW: Add (official) support for TensorBoard with the default logdir:
  - CHANGE: Use the same "dev" krunner-env image tags for all pre-release and
  development versions to prevent hassles of tag renaming during development.
  - CHANGE: Now the idle timeout is applied per kernel to support
  lablup/ implementation.
  - CHANGE: Rename "--redis-auth" option to "--redis-password" and its
  environment variable equivalent as well.
  - Fix and update accelerator plugin support by adding an in-container socket
  which provides host-only-available information to in-container programs.
  - Apply a customized look-and-feel to Jupyter notebooks in Python-based containers.


  - NEW: A side-by-side watcher daemon (107)
  - It provides a separate channel for watching and controlling the agent
  even when the agent becomes unavailable (e.g., deadlock or internal crash).
  - It works best with a SystemD integration.
  - WARNING: Currently "reload" (agent restart without terminating running
  containers) has problems with PID tracking.  Finding solutions for this...
  - NEW: Support Redis/etcd authentication (lablup/
  - NOTE: Currently etcd authentication is *not* usable in productions due to
  a missing implementation of automatic refreshing auth tokens in the upstream
  etcd3 library.
  - NEW: Agent-level (system-wide) live statistics (101)
  - Fix detection of up-to-date local Docker image (105)
  - Fix ordering of prompt outputs and user input events in the query mode (106)


  - Make logs and error messages more detailed.
  - Implement RW/RO permissions when mounting vfolders (lablup/
  - Change statistics collector to use UNIX domain sockets, for specific environments
  where locally bound sockets are not accessible via network-local IP addresses.
  - Update Alpine-based kernel runners with a fix for uid-match functionality for them.
  - Fix some bugs related to allocation maps and ImageRef class.


  - NEW: Jupyter notebooks now have our Backend.AI logo and a slightly customized look.
  - Fix the jupyter notebook service-port to work with conda-based images,
  where "python -m jupyter notebook" does not work but "python -m notebook"
  - Let agent fail early and cleanly if there is an initialization error,
  for ease of debugging with supervisord.
  - Fix restoration of resource allocation maps upon agent restarts.


  - Handle failures of accelerator plugin initialization more gracefully.


  - Fix duplicate resource allocation when a compute device plugin defines
  multiple resource slots.
  - Fix handling multiple sets of docker container configuration arguments
  generated by different compute device plugins.


  - Restore support for fractionally scaled accelerators and a reusable
  FractionAllocMap class for them.
  - Fix a bug after automatically pull-updating kernel images from registries.
  - Fix heartbeat serialization error.


  - Add missing implementation for authenticated image pulls from private docker


  - BIG: Support dynamic resource slots and full private Docker registries. (98)
  - Expand support for various kernel environments: Python 2, R, Julia, JupyterHub


  - Replace "--skip-jail" option with "--sandbox-type", which now defaults to use
  Docker-provided sandboxing until we get our jail stabilized.


  - Fix missing stderr outputs in the query mode.  Now standard Python exception logs
  may contain ANSI color codes as `jupyter_client` automatically highlights them.


  - NEW: Rewrite the kernel image specification.  Now it is much easier to build
  your own kernel image by adding just a few more labels in Dockerfiles.
  - We now support official NVIDIA GPU Cloud images in this way.
  - We are now able to support Python 2.x kernels again!
  - Now agent/kernel-runner/jail/hook are all managed together and the kernel
  images are completely separated from their changes.
  - NEW: New command-line options
  - `--skip-jail`: disables our jail and falls back to the Docker's default seccomp
  filter.  Useful for troubleshooting with our jail.
  - `--jail-arg`: when using our jail, add extra command-line arguments to the jail
  by specifying this option multiple times.
  Note that options starting with dash must be prepended with an extra space to
  avoid parsing issues imposed by the Python's standard argparse module.
  - `--kernel-uid`: when the agent is executed as root, use this to make the kernel
  containers run as a specific user/UID.
  - `--scratch-in-memory`: moves the scratch and /tmp directories into a separate
  in-memory filesystem (tmpfs) to avoid inode/quota exhaustion issues in
  multi-tenant setups.
  This option is only available on Linux and the agent must be run as root. When
  used, the size of each directory is limited to 64 MiB. (In the future this will
  become configurable.)
  - CHANGE: The kernel runner now preserves container-defined environment variables.
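
The note on `--jail-arg` above refers to a general behaviour of Python's `argparse`: a value that starts with a dash is taken for another option, while a value with a leading space is accepted as an argument. The sketch below reproduces this with a stand-alone parser. The parser itself, the `prog` name, and the example jail flags are hypothetical; only the option name `--jail-arg` comes from the changelog.

```python
import argparse


def build_parser():
    """Build a stand-alone parser with a repeatable --jail-arg option."""
    parser = argparse.ArgumentParser(prog="agent-sketch")
    parser.add_argument(
        "--jail-arg",
        action="append",
        default=None,
        help="extra argument passed to the jail (may be repeated)",
    )
    return parser


def parse_jail_args(argv):
    """Return the jail arguments with the protective leading space removed."""
    namespace = build_parser().parse_args(argv)
    return [arg.strip() for arg in (namespace.jail_arg or [])]


if __name__ == "__main__":
    # The leading space keeps argparse from treating "-debug" as an option.
    print(parse_jail_args(["--jail-arg", " -debug", "--jail-arg", " --trace"]))
```

Passing `"-debug"` without the leading space makes `argparse` exit with an "expected one argument" error, which is the parsing issue the note warns about.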


  - Technical release to fix a packaging mistake in 18.12.0.


  - Version numbers now follow year.month releases like Docker.
  We plan to release stable versions every 3 months (e.g., 18.12, 19.03, ...).
  - NEW: Support TPU (Tensor Processing Units) on Google Clouds.
  - Clean up log messages for on-premise devops & IT admins.


  - NEW: Support specifying credentials for private Docker registries.
  - CHANGE: Now it prefers etcd-based docker registry configs over CLI arguments.


  - Technical release to fix the dependency version.


  - NEW: Support user-specified ranges for the service ports published by containers
  via the `--container-port-range` CLI argument for firewall-sensitive setups.
  (The default range is 30000-31000) (90)
  - CHANGE: The agent now automatically pulls the image if not available in the host.
  - CHANGE: The process monitoring tools will now show prettified process names for
  Backend.AI's daemon processes which exhibit the role and key configurations (e.g.,
  namespace) at a glance.
  - Improve support for using custom/private Docker registries.


  - NEW: App service ports!  You can start a compute session and directly connect to a
  service running inside it, such as Jupyter Notebook! (89)
  - Internal refactoring to clean up and fix bugs related to image name references.
  - Fix bugs in statistics collection.
  - Monitoring tools are separated as plugins.


  - Generalizes accelerator supports
  - Accelerators such as CUDA GPUs can be installed as a separate plugin (66)
  - Adds support for nvidia-docker v2 (64)
  - Adds support for allocation of multiple accelerators for one kernel container as
  well as partial shares of each accelerator (66)
  - Revamp the agent restart and kernel initialization processes (35, 73)
  - The view of the agent can be limited to specific CPU cores and GPUs
  using extra CLI arguments: `--limit-cpus`, `--limit-gpus` for
  debugging and performance benchmarks. (65)


  - Hotfix for handling of dotted image names when they are terminated.


  - Hotfix for handling subdirectories in batch-mode file uploads.


  - Fix vfolder mounts to use the configuration specified in the etcd.
  (No more fixed to "/mnt"!)


  - Fix occasional KeyError when destroying kernels. (56)
  - Deploy a debug log for occasional FileNotFoundError when uploading files
  in the batch mode. (57)


  - Fix wrong kernel_host sent back to the manager when not overridden.


  - Technical release to fix dependency version.


  - Technical release to update CI configuration.


  - Fix repeating docker event polling even when there are connection/client-side
  aiohttp errors.
  - Upgrade aiohttp to v3.0 release.
  - Improve dockerization. (55)
  - Improve inner beauty.


  - From this release, the manager and agent versions will go together, which indicates
  their compatibility, even when either one has relatively few improvements.
  - Include the exit code of the last executed in-kernel process when returning
  `build-finished` or `finished` results in the batch mode.
  - Improve logging to support rotating file-based logs.
  - Upgrade aiotools to v0.5.2 release.
  - Remove the image name prefix when reporting available images. (51)
  - Improve debug-kernel mode to mount host-side kernel runner source into the kernel
  containers so that they use the latest, editable source clone of the kernel runner.


  - Automatically assign the run ID if it is set to None when starting a run.
  - Pass environment variables in the start-config to the kernels via
  `/home/work/.config/environ.txt` file mounted inside kernels.
  - Include the list of kernel images available to the agent when sending
  heartbeats. (51)
  - Remove simplejson from dependencies in favor of the standard library.
  The stdlib has been updated to support all required features and use
  an internal C-based module for performance.


  - Update aioredis to v1.0.0 release.
  - Remove "mode" argument from completion RPC calls.
  - Fix a bug when terminating overlapped execute streams, which has caused
  indefinite hangs in the client side due to missing "finished" notification.


  - Implement virtual folder mounting (assuming /mnt is already configured)


  - Fix synchronization issues when restarting kernels
  - Improve "debug-kernel" mode to use the given kernel name


  - Fix a bug in duplicate-check of our Docker event stream monitoring coroutine


  - Fix automatic mounting of deeplearning-samples Docker volume for ML kernels
  - Stabilize statistics collection
  - Fix typos


  - Prevent duplicate Docker event generation
  - Various bug fixes and improvements (44, 45, 46, 47)


  - This release is replaced with v1.0.1 due to many bugs.
  - Rename the package to "Backend.AI" and the import path to `ai.backend.agent`
  - Rewrite interaction with the manager
  - Read configuration from etcd shared with the manager
  - Add FIFO-style scheduling of overlapped execution requests
  - Implement I/O and network statistic collection using sysfs


  - Fix and improve version reference mechanisms.
  - Fix a missing import error that went unnoticed during a hotfix cherry-pick


  - It now applies the same UID to the spawned containers if they have the "uid-match"
  feature label flag. (backported from develop)


  - Add missing "sorna-common" dependency and update other requirements.


  - Fix the wrong version range of an optional dependency package "datadog"


  - Improve packaging so that has the source list of dependencies
  whereas requirements.txt has additional/local versions from exotic
  - Support exception/event logging with Sentry and runtime statistics with Datadog.


  - Fix interactive user inputs in the batch-mode execution.


  - Add support for the batch-mode API with compiled languages such as
  - Add support for the file upload API for use with the batch-mode API.
  (up to 20 files per request and 1 MiB per each file)
  - Only files stored in "/home/work.output" directories of kernel containers
  are auto-uploaded to S3 as downloadable files, as now we rely on our
  dedicated multi-media output interfaces to show plots and other graphics.
  Previously, all non-hidden files in "/home/work" were uploaded.
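
The file upload limits stated above (up to 20 files per request, 1 MiB per file) could be enforced with a check along these lines. This is only an illustrative sketch: the function name, the mapping-based input, and the use of `ValueError` are assumptions and do not reflect the actual API handler.

```python
MAX_FILES_PER_REQUEST = 20          # limit stated in the changelog
MAX_FILE_SIZE = 1 * 1024 * 1024     # 1 MiB per file, as stated in the changelog


def validate_upload(files):
    """Check a mapping of file name -> bytes against the upload limits."""
    if len(files) > MAX_FILES_PER_REQUEST:
        raise ValueError(
            f"too many files: {len(files)} > {MAX_FILES_PER_REQUEST}"
        )
    for name, data in files.items():
        if len(data) > MAX_FILE_SIZE:
            raise ValueError(f"file {name!r} exceeds {MAX_FILE_SIZE} bytes")
    return True
```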


  - Fix a regression in console output streaming.


  - Add PyTorch support.
  - Upgrade aiohttp to v2 and relevant dependencies as well.


  - Update missing long_description.


  - Improve packaging: auto-converted as long description and unified
  requirements.txt and dependencies.


  - Fix sorna-common requirement version.


  - Separate console output formats for API v1 and v2.
  - Deprecate unused matching option for execution API.
  - Remove control messages in API responses.


  - PUSH/PULL-based kernel interaction protocol to support streaming outputs.
  This enables interactive input functions and streaming outputs for long-running code,
  and also makes kernel execution more resilient to network failures.
  (ZeroMQ's REQ/REP sockets break the system if any messages get dropped)


  - Fix a typo that generates errors during GPU kernel initialization.
  - Fix regression of '--agent-ip-override' cli option.


  - Minor internal polishing release.


  - Bump version to 0.8 to match with sorna-manager and sorna-client.
  - Fix events lost by HTTP connection timeouts when using `` from
  aiodocker.  (It is due to default 5-minute timeout set by aiohttp)
  - Correct task cancellation


  - Add new aliases for "git" kernel: "git-shell" and "shell"


  - Now it uses [`aiodocker`](
  instead of [`docker-py`]( to
  prevent timeouts with many concurrent requests.
  NOTE: You need to run `pip install -r requirements.txt` to install the
  non-pip (GitHub) version of aiodocker correctly, before running
  `pip install sorna-agent`.
  - Fix corner-case exceptions in statistics/heartbeats.


  - Increase docker API timeouts.
  - Fix heartbeats that stop working after kernel/agent timeouts.
  - Fix exception logging in the main server loop.


  - Hotfix for missing dependency: coloredlogs


  - `--agent-ip-override` CLI option to override the IP address of agent
  reported to the manager.


  - Add support for kernel restarts.
  Restarting preserves kernel metadata and its ID, but removes and recreates
  the working volume and the container itself.
  - Add `--debug` option to the CLI command.


  - Add support for GPU-enabled kernels (using
  [nvidia-docker plugin](
  The kernel images must be built upon nvidia-docker's base Ubuntu images and
  have the label "io.sorna.nvidia.enabled" set to `yes`.
  - Change the agent to add "lablup/" prefix when creating containers from
  kernel image names, to ease setup and running using the public docker
  repository.  (e.g., "lablup/kernel-python3" instead of "kernel-python3")
  - Change the prefix of kernel image labels from "com.lablup.sorna." to
  "io.sorna." for simplicity.
  - Increase the default idle timeout to 30 minutes for offline tutorial/workshops.
  - Limit the CPU cores available in kernel containers.
  It uses an optional "io.sorna.maxcores" label (default is 1 when not
  specified) to determine the requested number of CPU cores in kernels, with a
  hard limit of 4.
  NOTE: You will still see the full count of CPU cores of the underlying
  system when running `os.cpu_count()`, `multiprocessing.cpu_count()` or
  `os.sysconf("SC_NPROCESSORS_ONLN")` because the limit is enforced by the CPU
  affinity mask.  To get the correct result, try
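
As a hedged illustration of the core-limit behaviour described in this entry (and not necessarily the call the note originally pointed to), the sketch below derives the requested core count from the "io.sorna.maxcores" label with its default of 1 and hard limit of 4, and reads the number of CPUs actually usable by the current process from the affinity mask. The function names are hypothetical, and `os.sched_getaffinity` is only available on some platforms such as Linux, so a fallback to `os.cpu_count()` is included.

```python
import os

HARD_CORE_LIMIT = 4  # hard limit stated in the changelog


def requested_cores(labels):
    """Cores requested by an image: label value (default 1), capped at 4."""
    value = int(labels.get("io.sorna.maxcores", "1"))
    return min(value, HARD_CORE_LIMIT)


def usable_cpu_count():
    """Number of CPUs this process may run on, honouring the affinity mask."""
    if hasattr(os, "sched_getaffinity"):
        # 0 means "the calling process"; the result is a set of CPU indices.
        return len(os.sched_getaffinity(0))
    # Platforms without affinity support: fall back to the system-wide count.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("requested:", requested_cores({"io.sorna.maxcores": "8"}))
    print("usable:", usable_cpu_count())
```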


  - First public release.