gcgc

Latest version: v1.0.0

0.12.2

- Fix a bug in the case where a supplied token id overrides the default of an
inferred token.
- Add a `pad_at_end` boolean setting that pads at the end of the sequence when
True, and at the beginning when False.
- Add a dedicated `Vocab` object, which replaces the dictionary mapping strings
to integers.
- Update the tokenizer integration to override `convert_tokens_to_string`.
- Fix a bug when saving the Hugging Face tokenizer.
- Make the third-party dependencies "extras" during Python packaging.
- Add better testing and batch encoding operations.
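The `pad_at_end` setting above can be illustrated with a minimal sketch; the `pad` helper and its signature are hypothetical, not gcgc's actual API.

```python
def pad(ids, length, pad_id=0, pad_at_end=True):
    """Pad a list of token ids to `length`, at the end or at the beginning."""
    padding = [pad_id] * max(0, length - len(ids))
    return ids + padding if pad_at_end else padding + ids

pad([1, 2, 3], 5, pad_at_end=True)   # [1, 2, 3, 0, 0]
pad([1, 2, 3], 5, pad_at_end=False)  # [0, 0, 1, 2, 3]
```

Padding at the beginning can be preferable for models that attend to the end of the sequence; padding at the end keeps the original token positions stable.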

0.12.0

- Improved the docs to reflect the `SequenceTokenizerSpec` that was added in
0.11.0.
- Made max length optional for the tokenizer.
- Added a CLI that uses the SentencePiece library.
- Began versioning the Docker build, and made pushing easier during the build
process.
- Have the tokenizer resolve the named alphabets.
- Use Poetry, along with general updates to the build pipeline.

0.11.0

Added

- Added the `SequenceTokenizerSpec` object for specifying the tokenizer.
- Added `Vocab` object for storing the int to token, and token to int encodings.
- Added example of using tensorflow/keras together with gcgc.
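A `Vocab` object holding both the token-to-int and int-to-token encodings can be sketched as below; this is a dict-backed illustration under assumptions, and the class, its constructor, and the `encode`/`decode` names are not taken from gcgc's actual API.

```python
class Vocab:
    """Bidirectional mapping between tokens and integer ids."""

    def __init__(self, tokens):
        # Assign ids in the order the tokens are given.
        self.token_to_int = {tok: i for i, tok in enumerate(tokens)}
        self.int_to_token = {i: tok for tok, i in self.token_to_int.items()}

    def encode(self, tokens):
        return [self.token_to_int[t] for t in tokens]

    def decode(self, ids):
        return [self.int_to_token[i] for i in ids]

vocab = Vocab(["<pad>", "A", "C", "G", "T"])
vocab.encode(["A", "C", "G"])  # [1, 2, 3]
vocab.decode([1, 2, 3])        # ["A", "C", "G"]
```

Keeping both directions in one object avoids the two dictionaries drifting out of sync, which is the usual motivation for replacing a bare string-to-int dict.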

0.10.0

`gcgc` has been revamped quite a bit to better support existing NLP processing
pipelines without trying to do too much. See the docs for more information
about how this works.

0.9.1

0.9.0

Added

- Parser now outputs the length of the tensor not including padding. This is
useful for packing and length based iteration.
- Generating masked output from the parse_record method is now available.
- Alphabet can include an optional mask token.
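The unpadded length output described above can be sketched as follows; `unpadded_length` is an illustrative helper, not the parser's actual interface.

```python
def unpadded_length(ids, pad_id=0):
    """Length of the sequence excluding any trailing run of pad ids."""
    n = len(ids)
    while n > 0 and ids[n - 1] == pad_id:
        n -= 1
    return n

unpadded_length([5, 7, 9, 0, 0])  # 3
```

Knowing the true length lets downstream code pack sequences or iterate by length without re-scanning for the padding boundary.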

Changed

- Can now specify the kmer step size to use when supplying a kmer value.
- Renames `EncodedSeq.integer_encoded` to `EncodedSeq.get_integer_encoding`,
which takes a `kmer_step_size` to specify how large a step to take when
encoding.
- Add `parsed_seq_len` to the `SequenceParser` object to control how much
padding to apply to the end of the integer-encoded sequence. This is useful
since a batch of tensors is expected to have the same size.
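The kmer step size described above determines how far the window advances between successive kmers; a minimal sketch, assuming a simple sliding window (the `kmers` function is illustrative, not gcgc's API):

```python
def kmers(seq, kmer, step):
    """Return kmers of length `kmer`, advancing `step` characters each time."""
    return [seq[i:i + kmer] for i in range(0, len(seq) - kmer + 1, step)]

kmers("ACGTAC", 3, 1)  # ['ACG', 'CGT', 'GTA', 'TAC']
kmers("ACGTAC", 3, 3)  # ['ACG', 'TAC']
```

A step of 1 yields overlapping kmers, while a step equal to the kmer length yields non-overlapping chunks, so the choice trades sequence coverage against output length.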
