Pyspacer

Latest version: v0.9.0


0.9.0

- Support for Python 3.8 and 3.9 has been dropped; support for Python 3.11 has been added.

- The accepted versions of torch and torchvision have been relaxed to accommodate Python 3.11. (torch: ==1.13.1 to >=1.13.1,<2.3; torchvision: ==0.14.1 to >=0.14.1,<0.18)

- `task_utils.preprocess_labels()` now offers three modes for splitting training annotations between the train, ref, and val sets. The differences between the three modes - `VECTORS`, `POINTS`, and `POINTS_STRATIFIED` - are explained in the `SplitMode` enum's comments. Additionally, all three modes now ensure that the ordering of the given training data has no effect on which data goes into the train, ref, and val sets.

The table below compares the three modes to the splitting functionality of earlier versions of pyspacer. Note that it's still possible to split train/ref/val yourself instead of letting pyspacer do it.

| Mode | Sets split in pyspacer | Order agnostic | Vectors can be split | Stratifies by label |
|-------------------|------------------------|----------------|----------------------|---------------------|
| 0.6.1 and earlier | Train/ref | No | No | No |
| 0.7.0 - 0.8.0 | Train/ref/val | No | No | No |
| VECTORS | Train/ref/val | Yes | No | No |
| POINTS | Train/ref/val | Yes | Yes | No |
| POINTS_STRATIFIED | Train/ref/val | Yes | Yes | Yes |
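
The two properties the new modes share - order-agnostic assignment and optional stratification by label - can be illustrated with a minimal, self-contained sketch. This is not pyspacer's actual implementation; the function name and fractions are illustrative only:

```python
import hashlib

def stable_split(points, val_fraction=0.1, ref_fraction=0.1, stratify=False):
    """Deterministically assign (point_id, label) pairs to train/ref/val sets.

    Assignment is keyed on a hash of each point's ID, so shuffling the
    input order never changes which set a point lands in. With
    stratify=True, the fractions are applied within each label separately
    (roughly the POINTS_STRATIFIED idea); otherwise this resembles POINTS.
    """
    def bucket(items):
        # Sort by hash of the ID for a stable, order-independent ordering.
        ranked = sorted(
            items,
            key=lambda p: hashlib.sha256(str(p[0]).encode()).hexdigest(),
        )
        n_val = max(1, int(len(ranked) * val_fraction))
        n_ref = max(1, int(len(ranked) * ref_fraction))
        return (ranked[n_val + n_ref:],       # train
                ranked[n_val:n_val + n_ref],  # ref
                ranked[:n_val])               # val

    if not stratify:
        return bucket(points)
    train, ref, val = [], [], []
    for label in sorted({label for _, label in points}):
        t, r, v = bucket([p for p in points if p[1] == label])
        train += t; ref += r; val += v
    return train, ref, val

points = [(i, "coral" if i % 2 else "sand") for i in range(40)]
train, ref, val = stable_split(points, stratify=True)
# Reversing the input order yields the exact same sets.
train2, ref2, val2 = stable_split(list(reversed(points)), stratify=True)
```

Because each point's destination depends only on its own ID, re-running training with the same annotations in a different order reproduces the same split.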

- The `train_classifier` task now accepts label IDs as either integers or strings, rather than integers only.

- The `train_classifier` task can now locally cache feature vectors loaded from remote storage, which can greatly speed up training from epoch 2 onward. This caching is optional and enabled by default, and the location of the cache directory is configurable.
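
The caching idea is the standard download-once pattern: check a local directory first and only hit remote storage on a miss. A minimal sketch (not pyspacer's actual code; the remote loader here is simulated):

```python
import os
import tempfile

def cached_load(key, remote_load, cache_dir):
    """Return the bytes for `key`, downloading only on a cache miss.

    `remote_load` is any callable that fetches the bytes (e.g. an S3
    download); later calls for the same key read from disk instead.
    """
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = remote_load(key)
    with open(path, "wb") as f:
        f.write(data)
    return data

downloads = []

def fake_remote_load(key):
    downloads.append(key)  # track how often we hit "remote storage"
    return b"feature vector for " + key.encode()

cache_dir = tempfile.mkdtemp()
first = cached_load("img1.featurevector", fake_remote_load, cache_dir)
second = cached_load("img1.featurevector", fake_remote_load, cache_dir)  # cache hit
```

In training, every epoch after the first becomes a pure disk read, which is why the speedup shows up from epoch 2 onward.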

0.8.0

- `ImageFeatures` with `valid_rowcol=False` are no longer supported for training. For now they are still supported for classification.

- S3 downloads are now always performed in the main thread, to prevent `RuntimeError: cannot schedule new futures after interpreter shutdown`.
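
The error in question comes from Python's `concurrent.futures`: once an executor (or the interpreter itself) is shutting down, no new work can be scheduled. An analogous failure can be reproduced with an explicitly shut-down executor:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown()

# Submitting after shutdown raises the same class of error that
# motivated moving S3 downloads to the main thread.
try:
    executor.submit(lambda: None)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
```

Performing the download synchronously in the main thread sidesteps the scheduling entirely, so the error cannot occur during interpreter teardown.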

- `S3Storage` and `storage_factory()` now use the parameter name `bucket_name` instead of `bucketname` to be consistent with other usages in pyspacer (by yeelauren).

- `URLStorage` downloads and existence checks now have an explicit timeout of 20 seconds (this is a timeout for continuous unresponsiveness, not for the whole response).

- EfficientNet feature extraction now uses CUDA if available (by yeelauren).

- Updates to pip-install dependencies:

  - Pillow: >=10.0.1 to >=10.2.0

0.7.0

- `TrainClassifierMsg`'s labels arguments have changed. Instead of `train_labels` and `val_labels`, it now takes a single argument `labels`: a `TrainingTaskLabels` object (essentially a bundle of three `ImageLabels` objects: the training set, reference set, and validation set).

- The new function `task_utils.preprocess_labels()` can be called before building a `TrainClassifierMsg` to 1) split a single `ImageLabels` instance into reasonably proportioned train/ref/val sets, 2) filter labels down to a desired set of classes, and 3) run error checks.
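
The shape of the change can be sketched with hypothetical stand-ins for the real classes (pyspacer's actual `ImageLabels` and `TrainingTaskLabels` live in the library; the field names and point format below are illustrative):

```python
from dataclasses import dataclass

# Illustrative stand-in: a mapping from image key to a list of
# (row, col, label) point annotations.
ImageLabels = dict

@dataclass
class TrainingTaskLabels:
    """Bundle of the three label sets that training now takes as one argument."""
    train: ImageLabels
    ref: ImageLabels
    val: ImageLabels

# Before 0.7.0 the caller passed train_labels and val_labels separately;
# now the three sets travel together in a single `labels` object.
labels = TrainingTaskLabels(
    train={"img1.jpg": [(10, 20, "coral")]},
    ref={"img2.jpg": [(5, 5, "sand")]},
    val={"img3.jpg": [(1, 2, "algae")]},
)
```

Bundling the sets into one object lets `preprocess_labels()` produce all three at once, whether the caller supplies a pre-made split or a single pool of labels to be split.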

- Removed the `MIN_TRAINIMAGES` config var. The minimum number of images for training is now 1 each for the train, ref, and val sets, or 3 total if leaving the split to pyspacer.

- Added `LOG_DESTINATION` and `LOG_LEVEL` config vars, providing configurable logging for test-suite runs or quick scripts.

- Logging statements throughout pyspacer's codebase now use module-name loggers rather than the root logger, allowing end-applications to keep their logs organized.
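
The change amounts to using `logging.getLogger(__name__)` instead of the root logger, so an end application can filter or reroute pyspacer's records by logger name. A minimal illustration (the module name below is made up):

```python
import logging

# A module-name logger: records are tagged with the module's dotted name,
# so an application can adjust just this library's verbosity.
logger = logging.getLogger("spacer.example_module")
logger.setLevel(logging.INFO)

records = []

class ListHandler(logging.Handler):
    """Collect records in a list so we can inspect their origin."""
    def emit(self, record):
        records.append(record)

logger.addHandler(ListHandler())
logger.info("extracting features")
```

Each captured record carries `record.name == "spacer.example_module"`, which is what makes per-library log organization possible; root-logger calls would all be tagged `"root"` instead.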

- Fixed bug where int config vars couldn't be configured through environment vars or secrets.json.
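
The underlying bug class is type coercion: environment variables always arrive as strings, so an int-typed setting needs an explicit conversion. A hedged sketch of the pattern (not pyspacer's actual config code; the variable name is hypothetical):

```python
import os

def get_config(name, default, cast=str):
    """Read a config var from the environment, coercing to `cast`.

    Environment variables are always strings, so an int setting must be
    converted explicitly; forgetting that conversion is exactly the kind
    of bug described above.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return cast(raw)
    except ValueError:
        raise ValueError(
            f"Config var {name} must be of type {cast.__name__}, got {raw!r}")

os.environ["SPACER_EXAMPLE_BATCH_SIZE"] = "64"  # hypothetical var name
batch_size = get_config("SPACER_EXAMPLE_BATCH_SIZE", 32, cast=int)
```

The same pattern applies to values read from a `secrets.json` file, where numbers may likewise be stored as strings.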

- Updated various error cases (mainly `SpacerInputError`s, asserts, and `ValueError`s) to use more descriptive error classes. The `SpacerInputError` class is no longer available.

0.6.1

- In 0.5.0, the hash check performed when loading a feature extractor was broken in two ways. First, the check itself raised an error. Second, if the hash check failed for a remotely loaded extractor file, a second load attempt would still allow extraction to proceed. This release fixes both problems.
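
The intended behavior can be sketched as: compute the file's digest, compare it to the expected hash, and refuse to proceed on a mismatch, including on retries. This is an illustrative SHA-256 sketch, not pyspacer's actual extractor-loading code:

```python
import hashlib

def verify_hash(data: bytes, expected_hex: str) -> None:
    """Raise if `data` does not hash (SHA-256) to `expected_hex`."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex:
        raise ValueError(
            f"Hash mismatch: expected {expected_hex}, got {actual}")

extractor_bytes = b"pretend extractor weights"
good = hashlib.sha256(extractor_bytes).hexdigest()

verify_hash(extractor_bytes, good)  # matching hash: passes silently

try:
    # A failed check must block extraction, even on a second attempt.
    verify_hash(extractor_bytes, "0" * 64)
    check_blocked = False
except ValueError:
    check_blocked = True
```

The key point of the fix is that a failed check leaves no cached state that a retry could silently succeed from.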

0.6.0

- Fixed `DummyExtractor` constructor so that `data_locations` defaults to an empty dict, not an empty list. This fixes serialization of an `ExtractFeaturesMsg` containing `DummyExtractor`.

- Updates to pip-install dependencies:

  - Pillow: >=9.0.1 to >=10.0.1

0.5.0

- Generalized feature extractor support by allowing use of any `FeatureExtractor` subclass instance, and extractor files loaded from anywhere (not just from CoralNet's S3 bucket, which requires CoralNet auth).

- In `ExtractFeaturesMsg` and `ClassifyImageMsg`, the parameter `feature_extractor_name` (a string) has been replaced with `extractor` (a `FeatureExtractor` instance).

- In `ExtractFeaturesReturnMsg`, `model_was_cached` has been replaced by `extractor_loaded_remotely`, because filesystem caching no longer applies to all extractor files (some may come from the filesystem in the first place).

- Config variable `LOCAL_MODEL_PATH` is now `EXTRACTORS_CACHE_DIR`. This is now used by any remote-loaded (S3 or URL based) extractor files. If extractor files are loaded from the filesystem, then it's now possible to run PySpacer without defining any config variable values.

- Added `AWS_REGION` config var, which is now required for S3 usage.

- Added `TEST_EXTRACTORS_BUCKET` and `TEST_BUCKET` config vars for unit tests, but these are not really usable by anyone besides core devs at the moment.

- Some raised errors' types have changed to PySpacer's own `ConfigError` or `HashMismatchError`, and there are cases where error-raising semantics/timing have changed slightly.
