Caper


1.2.0

Added `caper cleanup` to clean up outputs of a single workflow.
- Usage:
  - `caper cleanup [METADATA_JSON_FILE]`: Pass a `metadata.json` file as the first positional argument.
  - `caper cleanup [KEYWORD_TO_FIND_WORKFLOW]`: Pass a keyword to find a workflow on a running `caper server`.
- This **DELETE**s **ALL FILES** under the workflow's root directory (key `workflowRoot` in the metadata JSON). Deletion is done in an `rm -rf` fashion.
- This action is **NOT REVERSIBLE**. Use this at your own risk.
- Add `--delete` to actually delete all outputs; otherwise `caper cleanup` only performs a dry-run (see the example below).
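A minimal usage sketch (the metadata JSON path and keyword below are hypothetical placeholders):

```bash
# Dry-run: list what would be deleted under workflowRoot.
caper cleanup /scratch/me/out/my_wdl/12345678/metadata.json

# Or find the workflow by a keyword on a running caper server.
caper cleanup my_workflow_keyword

# Actually delete everything under workflowRoot (NOT reversible).
caper cleanup /scratch/me/out/my_wdl/12345678/metadata.json --delete
```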

Added `caper gcp_monitor` to monitor resources of workflows run on Google Cloud.
- Usage:
  - `caper gcp_monitor [METADATA_JSON_FILE]`: Pass a `metadata.json` file as the first positional argument.
  - `caper gcp_monitor [KEYWORD_TO_FIND_WORKFLOW]`: Pass a keyword to find a workflow on a running `caper server`.
- Output is in either TSV or JSON format.
  - TSV: a flattened version (with dot notation) of the JSON output.
  - JSON: more detailed.
- This feature is available for workflows run with Caper>=1.2.0 (see the example after this list).
- Outputs:
  - Task name / shard index / status / ...
  - Instance resource info: CPU, memory, Cromwell-mounted disk.
  - Statistics of resources: `mean`, `std`, `min`, `max` of each resource (memory, Cromwell-mounted disk usage, cpu_pct).
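A minimal invocation sketch (the metadata JSON path and keyword below are hypothetical placeholders):

```bash
# Monitor resources of a workflow from its metadata JSON.
caper gcp_monitor /scratch/me/out/my_wdl/12345678/metadata.json

# Or find the workflow by a keyword on a running caper server.
caper gcp_monitor my_workflow_keyword
```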

1.1.0

Users need to change the default Cromwell from 51 to 52.
- Edit `~/.caper/default.conf` to replace `51` with `52` for keys `cromwell` and `womtool` (see the example below).
- Or make a backup of the conf file and start over with `caper init YOUR_BACKEND`.
- See [README](README.md) for details about `caper init`.
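After the edit, the two keys in `~/.caper/default.conf` might look like the following. The JAR URLs are assumptions based on Cromwell's release asset naming; adjust them if you point to locally downloaded JARs.

```
cromwell=https://github.com/broadinstitute/cromwell/releases/download/52/cromwell-52.jar
womtool=https://github.com/broadinstitute/cromwell/releases/download/52/womtool-52.jar
```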

Upgraded Cromwell from 51 to 52
- Due to a change in the Google API, [Cromwell-51 will not work](https://github.com/broadinstitute/cromwell/releases/tag/52) after 8/20/2020.
- Recently-created service accounts will not work even before 8/20/2020.

Added a shell script to automate creating a Caper server instance on Google Cloud Platform
- Added `scripts/gcp_caper_server/create_instance.sh`

Lowered logging level for some annoying messages
- Exporting Google Cloud environment variables.

Bug fixes
- Fixed a double-slashed directory (`//`) in the `metadata.json` path on `gs://`.
- CLI can now catch `SIGTERM` for a graceful shutdown of the Cromwell Java thread.
- Can detect mutually exclusive parameters, e.g. `--singularity` and `--docker`.

1.0.0

Most parts of the code have been rewritten for Caper 1.0.0.

Upgraded Cromwell from 47 to 51.
- Metadata DB generated with Caper<1.0 will not work with Caper>=1.0.
- See [this note](https://github.com/broadinstitute/cromwell/releases/tag/49) for DB migration instructions.

Changed hashing strategy for all local backends (`local`, `slurm`, `sge`, `pbs`).
- Default hashing strategy changed from `file` (based on md5sum, which is expensive) to `path+modtime`.
- Changing the hashing strategy while keeping the same metadata DB will result in cache misses.

Changed duplication strategy for all local backends (`local`, `slurm`, `sge`, `pbs`).
- Default file duplication strategy changed from `hard-link` to `soft-link`.
- This is for filesystems (e.g. BeeGFS) that do not allow hard-linking.
- Caper<1.0 hard-linked input files even with `--soft-glob-output`.
- For Caper>=1.0, you still need to use `--soft-glob-output` for such filesystems.

Google Cloud Platform backend (`gcp`):
- Can use a service account instead of application default credentials (end user's auth).
  - Added `--gcp-service-account-key-json`.
  - Make sure that the service account has enough permissions (roles) on the Google Cloud Platform project (`--gcp-prj`). See [details](docs/conf_gcp.md#how-to-run-caper-with-a-service-account).
- Can use the Google Cloud Life Sciences API (v2beta) instead of the deprecated Google Cloud Genomics API (v2alpha1).
  - Added `--use-google-cloud-life-sciences`.
  - For `caper server/run`, you need to specify a region with `--gcp-region` to use the Life Sciences API. Check [supported regions](https://cloud.google.com/life-sciences/docs/concepts/locations). `--gcp-zones` will be ignored.
  - Make sure to enable `Google Cloud Life Sciences API` in the Google Cloud Platform console (APIs & Services -> `+` button on top).
  - Also, if you use a service account, add the role `Life Sciences Admin` to it.
  - We will deprecate the old `Genomics API` support; `Life Sciences API` will become the new default after the next 2-3 releases.
- Added [`memory-retry`](https://cromwell.readthedocs.io/en/stable/backends/Google/) to Caper (`gcp` backend only). See the example after this list.
  - Retries (controlled by `--max-retries`) on an instance with increased memory if a workflow fails due to an OOM (out-of-memory) error.
  - Comma-separated keys to detect an OOM error: `--gcp-prj-memory-retry-error-keys`.
  - Memory multiplier applied on each retry due to OOM: `--gcp-prj-memory-retry-multiplier`.
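A sketch of how these `gcp` backend options might be combined in a single run; the project ID, key file, region, and error keys below are hypothetical placeholders:

```bash
caper run my.wdl \
  --backend gcp \
  --gcp-prj my-gcp-project \
  --gcp-service-account-key-json ~/.keys/caper-service-account.json \
  --use-google-cloud-life-sciences \
  --gcp-region us-central1 \
  --max-retries 1 \
  --gcp-prj-memory-retry-error-keys OutOfMemory,Killed \
  --gcp-prj-memory-retry-multiplier 1.2
```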

Improved Python interface.
- Caper<1.0 was originally designed around the CLI.
- Caper>=1.0 is designed Python-interface-first; the CLI is built on top of that Python interface.
- Can retrieve `metadata.json` embedded with subworkflows' metadata JSON.

Better logging and troubleshooting.
- Cromwell's STDOUT/STDERR is written to `cromwell.out` by default (controlled by `--cromwell-stdout`).

Notes for devs

Server/run example:
```python
# Import path is an assumption; it may differ by Caper version.
from caper.caper_runner import CaperRunner

c = CaperRunner(
    local_loc_dir='/scratch/me/tmp_loc_dir',
    local_out_dir='/scratch/me/out',
    default_backend='Local')

# get server thread
th = c.server(port=8000)

# do something
while th.returncode is None:
    break

# stop the server
th.stop()

# wait
th.join()

# run example
metadata_dict = c.run('my.wdl', inputs='my_input.json', ...)
```

Client example
```python
# Import path is an assumption; it may differ by Caper version.
from caper.caper_client import CaperClientSubmit

cs = CaperClientSubmit(hostname='localhost', port=8000)

r = cs.submit('my.wdl', inputs='my_inputs.json', imports='my_imports.zip', ...)
workflow_id = r['id']

for m in cs.metadata([workflow_id], embed_subworkflow=True):
    # m is a metadata dict embedded with subworkflows' metadata JSON
    print(m['status'])
```


How to read from a conf file:
```python
from caper.caper_args import get_parser_and_defaults

# get both argparse.ArgumentParser (defaults updated with contents of conf_file)
# and conf_dict including key/value pairs in conf_file.
# each value is converted to a correct type (guessed from ArgumentParser's defaults).
parser, conf_dict = get_parser_and_defaults(conf_file='~/.caper/default.conf')

server_port_from_conf = conf_dict['port']
```

0.8.2.1

Bug fix for importing `autouri`.

```
Traceback (most recent call last):
  File "/opt/circleci/.pyenv/versions/3.7.0/bin/caper", line 11, in <module>
    from caper.caper import main
  File "/opt/circleci/.pyenv/versions/3.7.0/lib/python3.7/site-packages/caper/caper.py", line 32, in <module>
    from autouri import logger as autouri_logger
ImportError: cannot import name 'logger' from 'autouri'
```

0.8.2

> **IMPORTANT**: Users need to re-install Caper, which will automatically install the latest bug-fixed `pyhocon`. Or manually upgrade `pyhocon` to >= 0.3.53:

```bash
$ pip install pyhocon==0.3.54
```

If you want to keep using old Caper versions (<0.8.2), then downgrade `pyhocon` to < 0.3.53.

Bug fixes
- For Singularity users: Caper can now follow symlinks recursively in the input JSON to build `SINGULARITY_BINDPATH`.
- For cluster users (SLURM, SGE, PBS, ...): upgraded `pyhocon` to >= 0.3.53 to fix a bug in parsing escaped characters (e.g. `\\` and `\"`).

0.8.1.1

Fixed a dependency problem.
