Torchaudio

Latest version: v2.3.0

0.7.0

Highlights

Example Pipelines

torchaudio is expanding its support for models and [end-to-end applications](https://github.com/pytorch/audio/tree/master/examples). Please file an issue on [github](https://github.com/pytorch/audio/issues/new?template=questions-help-support.md) to provide feedback on them.

* **Speech Recognition:** Building on the addition of the Wav2Letter model for speech recognition in the last release, we added an example training pipeline for speech recognition that uses the LibriSpeech dataset.
* **Text-to-Speech:** With the goal of supporting text-to-speech applications, we added a vocoder based on the WaveRNN model. The WaveRNN model is based on the implementation from [this repository](https://github.com/fatchord/WaveRNN). The model was originally introduced in "Efficient Neural Audio Synthesis". We provide an example training pipeline in the example folder that uses the LibriTTS dataset added to torchaudio in this release.
* **Source Separation:** We also support source separation with the addition of the ConvTasNet model, based on the paper "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation." An example training pipeline is provided with the wsj0-mix dataset.

I/O Improvements

As you are likely already aware from the last release, we are in the process of making `sox_io` the new default backend. It ships with new features such as TorchScript support and performance improvements. If you want to benefit from these features now, we encourage you to migrate. For more information, see issue 903.

Backwards Incompatible Changes

* Switched all %-based string formatting to `str.format` to adopt changes in PyTorch, leading to improved error messages for TorchScript (850)
* Split `sox_utils.list_formats()` into separate functions for read and write formats (811)
* Made directory traversal order alphabetical and breadth-first, consistent across operating systems (814)
* Changed GTZAN so that it only traverses filenames belonging to the dataset (791)
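
To make the traversal-order change concrete, here is a minimal sketch of an alphabetical, breadth-first file walk using only the Python standard library. The function name `walk_files_bfs` and its `suffix` filter are hypothetical illustrations, not torchaudio's internal code.

```python
import os
from collections import deque


def walk_files_bfs(root, suffix):
    """Yield paths under root that end in suffix, breadth-first, alphabetical per directory."""
    queue = deque([root])
    while queue:
        directory = queue.popleft()
        # Sorting by name removes the dependence on the OS-specific order
        # in which os.scandir() happens to return entries.
        with os.scandir(directory) as it:
            entries = sorted(it, key=lambda e: e.name)
        # Emit every matching file at this level first...
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffix):
                yield entry.path
        # ...and only then schedule subdirectories, so shallower levels finish first.
        for entry in entries:
            if entry.is_dir():
                queue.append(entry.path)
```

On a tree containing `sub/deeper/d.wav` and `z/e.wav`, this order yields `e.wav` before `d.wav`, because every directory at one depth is finished before any deeper level is entered.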

New Features

* Added ConvTasNet model (920, 933) with pipeline (894)
* Added canonical pipeline with wav2letter (632)
* The WaveRNN model (705, 797, 801, 810, 836) is available with a canonical pipeline (749, 802, 831, 863)
* Added all three releases of the TEDLIUM dataset (882, 934, 945, 895)
* Added `VCTK_092` dataset (812)
* Added LibriTTS (790, 820)
* Added SPHERE support to `sox_io` backend (871)
* Added TorchScript-compatible sox effects (760)
* Added a flag to change the interface of the `soundfile` backend to one identical to that of the `sox_io` backend. (922)

Improvements

* Added `soundfile` compatibility backend. (922)
* Improved the speed of `torchaudio.compliance.kaldi.fbank` (947)
* Improved the speed of phaser (660)
* Added warning when a Mel filter is all zero (914)
* Added `pathlib.Path` support to `sox_io` backend (907)
* Simplified C++ registration with TORCH_LIBRARY (840)
* Merged sox effect and `sox_io` C++ implementation (779)
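
Why can a Mel filter be all zero? A hedged, pure-Python sketch of an HTK-style triangular Mel filterbank shows the case the new warning guards against: when `n_mels` is large relative to the number of frequency bins, the narrow low-frequency triangles can fall entirely between bins. The function `mel_filterbank` and its warning text are hypothetical; torchaudio builds its filterbank with tensors, and its exact scale and message may differ.

```python
import math
import warnings


def hz_to_mel(f):
    # HTK-style mel scale.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_freqs, n_mels, sample_rate, f_min=0.0, f_max=None):
    """Return an n_mels x n_freqs list of triangular filter weights (requires n_freqs >= 2)."""
    if f_max is None:
        f_max = sample_rate / 2.0
    # Centre frequencies of the STFT bins, evenly spaced from 0 Hz to Nyquist.
    all_freqs = [i * (sample_rate / 2.0) / (n_freqs - 1) for i in range(n_freqs)]
    # n_mels + 2 points evenly spaced on the mel scale give each filter
    # a left edge, a peak, and a right edge.
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    hz_pts = [mel_to_hz(m_min + k * (m_max - m_min) / (n_mels + 1)) for k in range(n_mels + 2)]
    fb = []
    for m in range(n_mels):
        left, centre, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        row = []
        for f in all_freqs:
            up = (f - left) / (centre - left)      # rising slope of the triangle
            down = (right - f) / (right - centre)  # falling slope of the triangle
            row.append(max(0.0, min(up, down)))
        fb.append(row)
    # A filter narrower than the bin spacing can miss every bin entirely.
    if any(all(w == 0.0 for w in row) for row in fb):
        warnings.warn("At least one mel filterbank row is all zero; "
                      "consider reducing n_mels or increasing n_freqs.")
    return fb
```

For example, `mel_filterbank(n_freqs=5, n_mels=40, sample_rate=16000)` triggers the warning, while `n_freqs=201, n_mels=10` at the same rate does not.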

Internal

* CI: Added test to validate torchscript backward compatibility (838)
* CI: Used mocked datasets to test CMUArctic (829), CommonVoice (827), Speech Commands (824), LJSpeech (826), LibriSpeech (825), YESNO (792, 832)
* CI: Made *nix unit test fail if C++ extension is not available (847, 849)
* CI: Separated I/O in testing. (813, 773, 783)
* CI: Added smoke tests to `sox_io` and `sox_effects` (806)
* CI: Refactored test utilities (805, 808, 809, 817, 822, 831)
* Doc: Added how to run tests (843)
* Doc: Added 0.6.0 to version matrix in README (833)

Bug Fixes

* Fixed device in interactive ASR example (900)
* Fixed incorrect extension parsing (885)
* Fixed dither with `noise_shaping = True` (865)
* Ran unit tests with a non-editable installation (845), and set `zip_safe = False` to disable egg installation (842)
* Sorted the GTZAN dataset and used on-the-fly data in the GTZAN test (819)

Deprecations

* Removed `istft` wrapper in favor of [torch.istft](https://pytorch.org/docs/master/generated/torch.istft.html#torch.istft). (841)
* Deprecated `SoxEffect` and `SoxEffectsChain` (787)
* I/O: Deprecated `sox` backend. (904)
* I/O: Deprecated the current interface of `soundfile`. (922)
* I/O: Deprecated `load_wav` functions. (905)

0.6.0

Highlights

torchaudio now includes a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for torchscript. torchaudio now also supports Windows, with the soundfile backend.

torchaudio requires Python 3.6 or later.

Backwards Incompatible Changes

* We reorganized the C++ resources (630) and replaced C++ bindings for sox_effects init/list/shutdown with torch binding (748).
* We removed code specific to Python 2 (691), and we no longer test against Python 2 (575) or 3.5 (577).

New Features

* We now support Windows. (604, 637, 642, 655, 743)
* We now have a model module which includes wav2letter. (462, 722)
* We added the GTZAN and CMU datasets. (668, 710)
* We now have the contrast functional (551), cvm (540), dcshift (558), overdrive (569), vad (578, 599), phaser (587, 607, 702), flanger (651, 702), biquad (661).
* We added a new sox_io backend (718, 728, 734, 727, 763, 752, 731, 732, 726, 780) that is compatible with torchscript with a new AudioMetaData class (761).
* MelSpectrogram now has power and normalized parameters (633), and slaney normalization (589, 641).
* lfilter now has a clamp option. (600)
* Griffin-Lim can now have zero momentum. (601)
* sliding_window_cmn now supports batching. (570)
* Downloaded datasets now verify checksums. (499)
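
The `lfilter` clamp option can be illustrated with a minimal pure-Python sketch of the direct-form IIR difference equation, assuming that the denominator `a` and numerator `b` coefficient lists have equal length and that clamping to [-1, 1] is applied to the finished output rather than inside the recursion. The function below is illustrative only and is not torchaudio's vectorized implementation.

```python
def lfilter(x, a, b, clamp=True):
    """Apply y[n] = (sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k]) / a[0] to the list x."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length (pad with zeros if needed)")
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if n - k < 0:
                break  # samples before the start of the signal are treated as zero
            acc += b[k] * x[n - k]          # feed-forward (numerator) term
            if k > 0:
                acc -= a[k] * y[n - k]      # feedback (denominator) term
        y.append(acc / a[0])
    if clamp:
        # Clamping happens after the recursion, so the feedback path is unaffected.
        y = [max(-1.0, min(1.0, v)) for v in y]
    return y
```

A resonant one-pole filter driven by a constant input grows past 1.0: `lfilter([1, 1, 1], [1, -0.9], [1, 0], clamp=False)` returns approximately `[1, 1.9, 2.71]`, while `clamp=True` returns `[1.0, 1.0, 1.0]`.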

Improvements

* We added ogg/vorbis/opus support to binary distribution (750, 755).
* We replaced the use of torch.norm in spectrogram to improve performance (747).
* We now use fused operations in lfilter for faster computation. (517, 564)
* STFT is now called directly from torchaudio. (531)
* We redesigned the backend mechanism to support torchscript by restructuring the code (695, 696, 700, 706, 707, 698) and adding dynamic listing (697).
* torchaudio can be built along with sox, or can use external sox. (625, 669, 739)
* We redesigned the sox_effects module. (708)
* We added more details to compilation instructions. (667)
* We updated the README with instructions on changing the backend. (553)
* We now have a version compatibility matrix in README. (685)
* We now use cmake to build third party libraries (753).
* We now use CircleCI instead of travis (576, 584, 598, 603, 636, 738) and we test on GPU (586, 777).
* We run the test suite against nightlies. (538, 678)
* We redesigned our test suite with new helper functions (514, 519, 521, 565, 616, 690, 692, 694), standard pytorch test utilities (513, 640, 643, 645, 646, 652, 650, 712), separated CPU and GPU tests (513, 528, 644), more descriptive names (532), clearer organization (539, 541, 542, 664, 672, 687, 703, 716, 732), standardized names (559), and backend-aware tests (719). This is detailed in a new README for testing (566, 759).
* We now support typing, for datasets (511, 522), for backends (527), for init (526), and inline (530), with mypy configuration (524, 544, 590).

Bug Fixes

* We removed in place operations so that Griffin-Lim can be backpropagated through. (730)
* We fixed kaldi MFCC on GPU. (681)
* We removed multiple definitions of SoxEffect in C++. (635)
* We fixed the docstring of masking. (612)
* We replaced views by reshape for batching. (594)
* We fixed missing conda environment when testing in python 3.8. (582)
* We ensured that sox is not exposed on Windows. (579)
* We corrected the instructions to install nightlies. (547, 552)
* We fixed the seed of mask_along_iid. (529)
* We correctly report GPU tests as skipped instead of passed. (516)

Deprecations

* Since sox_effects is now automatically initialized and shutdown (572, 693), we are deprecating these functions (709).
* ISTFT is migrating to torch. (523)

0.5.1

Highlights

* Updated pinned version of PyTorch to [`v1.5.1`](https://github.com/pytorch/pytorch/releases/tag/v1.5.1)

0.5.0

Highlights

torchaudio includes new transforms (e.g. Griffin-Lim and inverse Mel scale), new filters (e.g. all pass, fade, band pass/reject, band, treble, deemph, riaa), and datasets (LJ Speech and SpeechCommands).

Backwards Incompatible Changes

* torchaudio no longer supports python 2. We removed future and six imports. We added inline typing. (413, 478, 479, 482, 486)
* We fixed CommonVoice dataset download, and updated to the latest version. (498)
* We now skip data points with missing data in the VCTK dataset. (484)

New Features

* We now have the Vol transform and DB_to_amplitude. (468, 469)
* We now have the InverseMelScale transform. (448)
* We now have the Griffin-Lim functional. (365)
* We now support allpass, fade, bandpass, bandreject, band, treble, deemph, riaa. (444, 449, 464, 470, 508)
* We now offer LJSpeech and SpeechCommands datasets. (439, 437)
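
The decibel conversions behind `Vol` and `DB_to_amplitude` reduce to a few lines. Here is a minimal pure-Python sketch; the function names are hypothetical, and the `ref` and `power` arguments (`power=1.0` for power values, `0.5` for magnitudes) are assumed to mirror the parameters of torchaudio's `DB_to_amplitude` rather than copied from it.

```python
import math


def amplitude_to_db(x, multiplier=10.0, amin=1e-10, ref=1.0):
    """Convert a power value (multiplier=10) or magnitude value (multiplier=20) to dB re ref."""
    # Clamp to amin so that silence maps to a finite floor instead of -inf.
    return multiplier * (math.log10(max(x, amin)) - math.log10(max(ref, amin)))


def db_to_amplitude(x_db, ref=1.0, power=1.0):
    """Invert amplitude_to_db: power=1.0 undoes multiplier=10, power=0.5 undoes multiplier=20."""
    return ref * (10.0 ** (0.1 * x_db)) ** power


def apply_gain_db(samples, gain_db):
    """Scale samples by a gain in dB, i.e. by the amplitude factor 10 ** (gain_db / 20)."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]
```

Power values use 10 * log10 and magnitudes use 20 * log10, so `power=0.5` is what turns a magnitude in dB back into a linear amplitude; a gain of +20 dB multiplies samples by 10.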

Improvements

* We added inline typing to SoxEffects and Kaldi compliance. (490, 497)
* We refactored the tests. (480, 485, 496, 491, 501, 502, 503, 506, 507, 509)
* We now run tests with sox only when sox is available. (419)
* We extended batch support to MelScale, MelSpectrogram, MFCC, Resample. (391, 435)
* The speed of torchaudio.functional.istft was improved. (471)
* We now have transform and functional tests for AmplitudeToDB. (463)
* We now ignore pycharm and OSX files in git. (461)
* TimeStretch now has a batch test. (459)
* Docstrings in transforms were polished. (442)
* TimeStretch and AmplitudeToDB are now torch.nn.Module. (456)
* Resample is now jitable. (441)
* We support python 3.8. (397)
* Added a CUDA test for complex norm. (421)
* Dither is jitable with the latest version of pytorch. (417)
* Batching uses view instead of reshape. (409)
* We refactored the jitability test. (395)
* In .circleci, we removed a conditional block that wasn't doing anything. (399)
* We now have Windows CI for building. (394 and 398)
* We corrected the use of standard variable names in code. (393)
* We adopted native-Python code generation convention. (378)
* torchaudio.istft creates tensors directly on device. (377)
* torchaudio.compliance.kaldi.resample_waveform is now jitable. (362)
* The runtime of torchaudio.functional.lfilter was decreased. (374)

Bug Fixes

* We fixed flake8 errors. (504, 505)
* We fixed Windows test by only testing with cpu-only binaries. (489)
* Corrected spelling in docstrings for transforms.FrequencyMasking and transforms.TimeMasking. (474)
* In .circleci, we switched to use token for conda uploads. (460)
* The default value of dither parameter was changed. (453)
* TimeStretch moves device correctly. (457)
* Added the dev-other option to LibriSpeech. (433)
* In build script, we install the correct version of pytorch for pip. (412)
* Upgraded the dataset DeprecationWarning to a UserWarning so that the user gets the warning. (402)
* Make power of spectrogram a float to work with complex norm. (392)
* Fix random seed for flaky test_griffinlim test. (388)
* Apply 'nightly' branch filter to binary uploads. (385)
* Fixed build errors: added explicit utf8 decoration, added an explicit utf_8_encoder definition if not available, and explicitly cast to int. (380)

Deprecations

* None

0.4

* We introduce an interactive speech recognition demo. (266, 229, 248)
* SoX is now optional, and a new extensible backend dispatch mechanism exposes SoundFile as an alternative to SoX.
* The interface for datasets has been unified. This enables the addition of two large datasets: LibriSpeech and Common Voice.
* New filters such as biquad, data augmentations such as time and frequency masking, transforms such as gain and dither, and new feature computations such as deltas are now available.
* Transformations now support batches and are jitable.

We would like to thank again our contributors and the wider community for their significant contributions to this release. In particular we'd like to thank keunwoochoi, ksanjeevan, and all the other maintainers and contributors of torchaudio-contrib for their significant and valuable additions around augmentations (285) and batching (327).

Breaking Changes

* torchaudio now requires PyTorch 1.3.0 or newer, see https://pytorch.org/ for installation instructions. (#312)
* We make jit compilation optional for functions and use nn.Module where possible. (314, 326, 342, 369)
* By unifying the interface for datasets, we changed the interface for VCTK and YESNO (303, 316). In particular, the construction parameters `downsample`, `transform`, `target_transform`, and `return_dict` are being deprecated.
* SoxEffectsChain.EFFECTS_AVAILABLE was replaced by SoxEffectsChain().EFFECTS_AVAILABLE (355)
* This is the last version to support Python 2.

New Features

* SoX is now optional, and a new extensible backend dispatch mechanism exposes SoundFile as an alternative to SoX. This makes it possible to use torchaudio even when SoX or SoundFile are not installed or available. (355)
* We now have a unified dataset interface that loads in memory only one item at a time enabling new large datasets: LibriSpeech and CommonVoice. (303, 316, 330)
* We introduce a pitch detection algorithm: `torchaudio.functional.detect_pitch_frequency`. (313, 322)
* We offer data augmentations in `torchaudio.transforms`: `TimeStretch`, `FrequencyMasking`, `TimeMasking`. (285, 333, 348)
* We introduce a complex norm transform: `torchaudio.transforms.ComplexNorm`. (285, 333)
* We now have a new audio feature generation for computing deltas: `torchaudio.functional.compute_deltas`. (268, 326)
* We introduce `torchaudio.functional.gain` and `torchaudio.functional.dither` (319, 360). We welcome work to continue the effort to implement features available in SoX, see 260.
* We now include `equalizer_biquad` (315, 340), `lowpass_biquad`, `highpass_biquad` (275), `lfilter`, and `biquad` (275, 291, 326) in `torchaudio.functional`.
* MFCC is available as `torchaudio.functional.mfcc`. (228)
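
The delta features from `compute_deltas` follow the standard regression formula d_t = sum_{n=1}^{N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1}^{N} n^2), with N = (win_length - 1) / 2. Below is a hedged single-sequence sketch that assumes edge frames are replicated for padding and a default `win_length` of 5 (both assumptions here); torchaudio applies the computation along the time axis of a spectrogram tensor.

```python
def compute_deltas(frames, win_length=5):
    """Delta coefficients of a 1-D sequence, replicating edge frames as padding."""
    if win_length < 3:
        raise ValueError("win_length must be at least 3")
    n = (win_length - 1) // 2
    denom = 2 * sum(i * i for i in range(1, n + 1))
    last = len(frames) - 1

    def at(t):
        # Replicate padding: clamp the index into [0, last].
        return frames[min(max(t, 0), last)]

    return [
        sum(i * (at(t + i) - at(t - i)) for i in range(1, n + 1)) / denom
        for t in range(len(frames))
    ]
```

For a linear ramp `[0, 1, 2, 3, 4, 5, 6]` the interior deltas are exactly 1.0, while the first and last frames drop to 0.5 because replicate padding flattens the slope at the edges.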

Improvements

* We now support batching in transforms. (327, 337, 404)
* Functions are now jitable, and nn.Module is used where possible. (314, 326, 342, 362, 369, 395)
* Downloads of large files are now automatically resumed with new download function. (320)
* New tests for ISTFT are added. (279)
* We introduce nightly builds. (301)
* We now have smoke tests for builds. (346, 359)

Bug Fixes

* Fix mismatch between `MelScale` and librosa. (294)
* Fix `torchaudio.compliance.kaldi.resample_waveform` where internal variables were not moved to the GPU when used. (277)
* Fix a bug that occurred when importing torchaudio built outside of a git repository. (276)
* Fix `istft` where parameters were not created with the same `dtype` and on the same `device` as the tensor provided by the user. (264)
* Fix size mismatch when saving and loading from state dictionary (`load_state_dict`). (246)
* Clarified internal naming convention within transforms and functionals. (298)
* Fix build script to be more tolerant to download drops. (280, 284, 305)
* Correct documentation for SoxEffectsChain. (283)
* Fix resample error with cuda tensors. (277)
* Fix error when importing version outside of git. (276)
* Fix missing asound in linux build. (254)
* Fix deprecated torch. (254)
* Fix link in README. (253)
* Fix window device in ISTFT. (240)
* Documentation: Fix range in documentation for `torchaudio.load` to [-1, 1]. (283)

0.4.0

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.