emmental Changelog

0.1.1
-------------------

Fixed
^^^^^

* `lorr1`_: Fix multiple wandb issues.
(`118 <https://github.com/senwu/emmental/pull/118>`_,
`119 <https://github.com/senwu/emmental/pull/119>`_)
* `senwu`_: Fix scikit-learn version.
(`120 <https://github.com/senwu/emmental/pull/120>`_)

0.1.0
-------------------

Deprecated
^^^^^^^^^^

* `senwu`_: Deprecate the `active` argument in the learner and loss function API, and
the `ignore_index` argument in the configuration.
(`107 <https://github.com/senwu/emmental/pull/107>`_)

Fixed
^^^^^

* `senwu`_: Fix the issue where metrics cannot be calculated when the scorer is None.
(`112 <https://github.com/senwu/emmental/pull/112>`_)
* `senwu`_: Fix the issue where Meta.config is None in collate_fn with
num_workers > 1 when using Python 3.8+ on macOS.
(`117 <https://github.com/senwu/emmental/pull/117>`_)

Added
^^^^^

* `senwu`_: Introduce two new classes, `Action` and `Batch`, to make the APIs more
modular and to make Emmental more extensible and easier to use for downstream tasks.
(`116 <https://github.com/senwu/emmental/pull/116>`_)

.. note::

1. We introduce two new classes, `Action` and `Batch`, to make the APIs more
modular.

- `Action` objects populate the `task_flow` sequence. Each has three attributes:
name (the name of the action), module (the module name of the action), and
inputs (the inputs to the action). Introducing a class for specifying actions
in the `task_flow` standardizes its definition. `Action` also gives users more
flexibility in specifying a task flow, since the inputs attribute now supports
a wider range of formats, as discussed in (2).

- `Batch` is the object returned from the Emmental `Scheduler`. Each `Batch`
object has six attributes: uids (uids of the samples), X_dict (input features
of the samples), Y_dict (output of the samples), task_to_label_dict (the task
to label mapping), data_name (name of the dataset that the samples come from),
and split (the split information). Defining the `Batch` class unifies and
standardizes the training scheduler interface by ensuring a consistent output
format for all schedulers.

2. We make the `task_flow` more flexible by supporting more formats for specifying
inputs to each module.

- It now supports a str as inputs (e.g., inputs="input1"), which means the output
  of `input1` is taken as the input for the current action.

- It also supports a list as inputs, whose elements can take three different
  formats:

  - x (x is a str), which takes the whole output of x as input: this lets users
    pass all outputs from one module to another without having to manually
    specify every input to the module.

  - (x, y) (y is an int), which takes the y-th output of x as input.

  - (x, y) (y is a str), which takes the output of x named y as input.

A few `emmental.Action` examples:

.. code:: python

    from emmental import Action as Act

    # inputs as a list of (name, key) or (name, index) pairs
    Act(name="input", module="input_module0", inputs=[("_input_", "data")])
    Act(name="input", module="input_module0", inputs=[("_input_", 0)])
    # inputs as a list with a bare name, or as a single str
    Act(name="input", module="input_module0", inputs=["_input_"])
    Act(name="input", module="input_module0", inputs="_input_")
    # the formats can be mixed within one list
    Act(
        name="input",
        module="input_module0",
        inputs=[("_input_", "data"), ("_input_", 1), "_input_"],
    )
    Act(name="input", module="input_module0", inputs=None)

The same design also applies to action_outputs. Here are a few examples:

.. code:: python

    action_outputs=[(f"{task_name}_pred_head", 0), ("_input_", "data"), f"{task_name}_pred_head"]
    action_outputs="_input_"
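
The input formats described in this note can be illustrated with a small, self-contained sketch. The `resolve_inputs` helper and the `outputs` dictionary below are hypothetical names used only for illustration, not Emmental's implementation. The sketch assumes that each action's output is stored either as a dict (selected by a str key) or as a list (selected by an int index), and it leaves out the `inputs=None` case.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

InputSpec = Union[str, Tuple[str, Union[int, str]]]


def resolve_inputs(
    outputs: Dict[str, Any],
    inputs: Union[str, Sequence[InputSpec]],
) -> List[Any]:
    """Hypothetical helper: gather the values selected by an inputs spec."""
    if isinstance(inputs, str):
        # A bare str is treated like a one-element list.
        inputs = [inputs]
    resolved = []
    for spec in inputs:
        if isinstance(spec, str):
            # "x": take the whole output of action x.
            resolved.append(outputs[spec])
        else:
            # ("x", y): take the y-th (int) or y-keyed (str) output of x.
            name, key = spec
            resolved.append(outputs[name][key])
    return resolved


# Outputs recorded so far, keyed by action name (made-up values).
outputs = {
    "_input_": {"data": [1.0, 2.0], "mask": [1, 1]},
    "encoder": ["hidden", "pooled"],
}

whole = resolve_inputs(outputs, "_input_")
by_key = resolve_inputs(outputs, [("_input_", "data")])
by_index = resolve_inputs(outputs, [("encoder", 1)])
mixed = resolve_inputs(outputs, [("_input_", "data"), ("encoder", 0), "encoder"])
```

Here `whole` holds the entire `_input_` output, while `by_key` and `by_index` each pick out a single element, mirroring the three list formats.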

0.0.9
-------------------

Added
^^^^^

* `senwu`_: Support wandb logging.
(`99 <https://github.com/senwu/emmental/pull/99>`_)
* `senwu`_: Fix the issue where the log writer cannot dump functions in Meta.config.
(`103 <https://github.com/senwu/emmental/pull/103>`_)
* `senwu`_: Add a `return_loss` argument to the model's predict and forward methods to
support cases where loss calculation is not possible or not needed.
(`105 <https://github.com/senwu/emmental/pull/105>`_)
* `lorr1`_ and `senwu`_: Add `skip_learned_data` to support skipping already-trained
data during learning.
(`101 <https://github.com/senwu/emmental/pull/101>`_,
`108 <https://github.com/senwu/emmental/pull/108>`_)

Fixed
^^^^^

* `senwu`_: Fix model learning failing on tasks whose dataloader provides no Y_dict,
such as contrastive learning.
(`105 <https://github.com/senwu/emmental/pull/105>`_)

0.0.8
-------------------

Added
^^^^^

* `senwu`_: Support fp16 optimization.
(`77 <https://github.com/SenWu/emmental/pull/77>`_)
* `senwu`_: Support distributed learning.
(`78 <https://github.com/SenWu/emmental/pull/78>`_)
* `senwu`_: Support datasets without labels.
(`79 <https://github.com/SenWu/emmental/pull/79>`_)
* `senwu`_: Support outputting the model's immediate_output.
(`80 <https://github.com/SenWu/emmental/pull/80>`_)

.. note::

To output the model's immediate_output, the user needs to specify which module
outputs to return in `EmmentalTask`'s `action_outputs`. It should be a pair of
task_flow name and index, or a list of such pairs. During the prediction phase, the
user needs to set `return_action_outputs=True` to get the outputs, where each key is
`{task_flow name}_{index}`.

.. code:: python

    from functools import partial

    from torch import nn

    from emmental import EmmentalTask, Scorer

    # ce_loss, output, and task_metrics are assumed to be defined by the user.
    task_name = "Task1"
    EmmentalTask(
        name=task_name,
        module_pool=nn.ModuleDict(
            {
                "input_module": nn.Linear(2, 8),
                f"{task_name}_pred_head": nn.Linear(8, 2),
            }
        ),
        task_flow=[
            {
                "name": "input",
                "module": "input_module",
                "inputs": [("_input_", "data")],
            },
            {
                "name": f"{task_name}_pred_head",
                "module": f"{task_name}_pred_head",
                "inputs": [("input", 0)],
            },
        ],
        loss_func=partial(ce_loss, task_name),
        output_func=partial(output, task_name),
        # Each pair is (task_flow name, index or key) of an output to return.
        action_outputs=[
            (f"{task_name}_pred_head", 0),
            ("_input_", "data"),
            (f"{task_name}_pred_head", 0),
        ],
        scorer=Scorer(metrics=task_metrics[task_name]),
    )

* `senwu`_: Support action output dict.
(`82 <https://github.com/SenWu/emmental/pull/82>`_)
* `senwu`_: Add a new argument `online_eval`. If `online_eval` is off, the model won't
return `probs`.
(`89 <https://github.com/SenWu/emmental/pull/89>`_)
* `senwu`_: Support multiple device training and inference.
(`91 <https://github.com/SenWu/emmental/pull/91>`_)

.. note::

To train a model on multiple devices, such as CPU and GPU, the user needs to specify
which module is on which device in `EmmentalTask`'s `module_device`. It is a
dictionary with the module name as key and the device number as value. During the
training and inference phases, Emmental automatically performs the forward pass
based on the module device information.

.. code:: python

    from functools import partial

    from torch import nn

    from emmental import EmmentalTask, Scorer

    # ce_loss, output, and task_metrics are assumed to be defined by the user.
    task_name = "Task1"
    EmmentalTask(
        name=task_name,
        module_pool=nn.ModuleDict(
            {
                "input_module": nn.Linear(2, 8),
                f"{task_name}_pred_head": nn.Linear(8, 2),
            }
        ),
        task_flow=[
            {
                "name": "input",
                "module": "input_module",
                "inputs": [("_input_", "data")],
            },
            {
                "name": f"{task_name}_pred_head",
                "module": f"{task_name}_pred_head",
                "inputs": [("input", 0)],
            },
        ],
        loss_func=partial(ce_loss, task_name),
        output_func=partial(output, task_name),
        action_outputs=[
            (f"{task_name}_pred_head", 0),
            ("_input_", "data"),
            (f"{task_name}_pred_head", 0),
        ],
        # Maps each module name to the device number it should run on.
        module_device={"input_module": -1, f"{task_name}_pred_head": 0},
        scorer=Scorer(metrics=task_metrics[task_name]),
    )
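
As a rough illustration of the device mapping, the sketch below turns a `module_device` dictionary into device names. It assumes that -1 denotes the CPU and that a non-negative number denotes a GPU index; the `to_device_name` helper is a hypothetical name and not part of Emmental.

```python
from typing import Dict


def to_device_name(device: int) -> str:
    """Hypothetical helper: map a device number to a PyTorch-style name."""
    # Assumption: -1 means CPU, n >= 0 means the n-th GPU.
    return "cpu" if device < 0 else f"cuda:{device}"


task_name = "Task1"
module_device: Dict[str, int] = {"input_module": -1, f"{task_name}_pred_head": 0}

# Resolve every module's device number to a readable device name.
placement = {name: to_device_name(dev) for name, dev in module_device.items()}
```

Under these assumptions, `input_module` would run on the CPU and the prediction head on the first GPU.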

* `senwu`_: Add require_prob_for_eval and require_pred_for_eval to optimize score
function performance.
(`92 <https://github.com/SenWu/emmental/pull/92>`_)

.. note::

While scoring the model, the current approach stores probs and preds, which can
require a lot of memory, especially for large datasets. The score function is also
used in training. To optimize the score function's performance, this PR introduces
two new arguments in `EmmentalTask`, `require_prob_for_eval` and
`require_pred_for_eval`, which automatically select whether to use `return_probs`
or `return_preds`.

.. code:: python

    from functools import partial

    from torch import nn

    from emmental import EmmentalTask, Scorer

    # ce_loss, output, and task_metrics are assumed to be defined by the user.
    task_name = "Task1"
    EmmentalTask(
        name=task_name,
        module_pool=nn.ModuleDict(
            {
                "input_module": nn.Linear(2, 8),
                f"{task_name}_pred_head": nn.Linear(8, 2),
            }
        ),
        task_flow=[
            {
                "name": "input",
                "module": "input_module",
                "inputs": [("_input_", "data")],
            },
            {
                "name": f"{task_name}_pred_head",
                "module": f"{task_name}_pred_head",
                "inputs": [("input", 0)],
            },
        ],
        loss_func=partial(ce_loss, task_name),
        output_func=partial(output, task_name),
        action_outputs=[
            (f"{task_name}_pred_head", 0),
            ("_input_", "data"),
            (f"{task_name}_pred_head", 0),
        ],
        module_device={"input_module": -1, f"{task_name}_pred_head": 0},
        # Declare whether evaluation needs probabilities and/or predictions.
        require_prob_for_eval=True,
        require_pred_for_eval=True,
        scorer=Scorer(metrics=task_metrics[task_name]),
    )

* `senwu`_: Support save and load optimizer and lr_scheduler checkpoints.
(`93 <https://github.com/SenWu/emmental/pull/93>`_)
* `senwu`_: Support step-based learning and add the arguments `start_step` and
`n_steps` to set the starting step and the total number of steps.
(`93 <https://github.com/SenWu/emmental/pull/93>`_)

Fixed
^^^^^

* `senwu`_: Fix customized optimizer support issue.
(`81 <https://github.com/SenWu/emmental/pull/81>`_)
* `senwu`_: Fix loss logging not accounting for task weight.
(`93 <https://github.com/SenWu/emmental/pull/93>`_)

0.0.7
-------------------

Added
^^^^^

* `senwu`_: Support gradient accumulation steps for machines that cannot run a large batch size.
(`74 <https://github.com/SenWu/emmental/pull/74>`_)
* `senwu`_: Support user specified parameter groups in optimizer.
(`74 <https://github.com/SenWu/emmental/pull/74>`_)

.. note::

When building the Emmental learner, the user can specify parameter groups for the
optimizer using
`emmental.Meta.config["learner_config"]["optimizer_config"]["parameters"]`,
which is a function that takes the model as input and outputs a list of parameter
groups. Otherwise, the learner creates a single parameter group with all parameters
in the model. Below is an example for optimizing BERT with Adam.

.. code:: python

    import emmental


    def grouped_parameters(model):
        # Parameters whose names match these patterns get no weight decay.
        no_decay = ["bias", "LayerNorm.weight"]
        return [
            {
                "params": [
                    p
                    for n, p in model.named_parameters()
                    if not any(nd in n for nd in no_decay)
                ],
                "weight_decay": emmental.Meta.config["learner_config"][
                    "optimizer_config"
                ]["l2"],
            },
            {
                "params": [
                    p
                    for n, p in model.named_parameters()
                    if any(nd in n for nd in no_decay)
                ],
                "weight_decay": 0.0,
            },
        ]

    # Register the grouping function in the optimizer configuration.
    emmental.Meta.config["learner_config"]["optimizer_config"][
        "parameters"
    ] = grouped_parameters
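
To see what such a function returns, the sketch below runs the same grouping logic on a stand-in object instead of a real model. `FakeModel`, its parameter names, and the `l2` argument (used here in place of the `emmental.Meta.config` lookup) are assumptions made purely for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple


class FakeModel:
    """Stand-in exposing named_parameters() like a torch.nn.Module."""

    def named_parameters(self) -> Iterator[Tuple[str, str]]:
        # Parameter values are placeholders; only the names matter here.
        yield "encoder.dense.weight", "W"
        yield "encoder.dense.bias", "b"
        yield "encoder.LayerNorm.weight", "g"


def grouped_parameters(model: Any, l2: float) -> List[Dict[str, Any]]:
    # Names matching these patterns go into the group without weight decay.
    no_decay = ["bias", "LayerNorm.weight"]
    named = list(model.named_parameters())
    return [
        {
            "params": [p for n, p in named if not any(nd in n for nd in no_decay)],
            "weight_decay": l2,
        },
        {
            "params": [p for n, p in named if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]


groups = grouped_parameters(FakeModel(), l2=0.01)
```

The first group collects the parameters that receive weight decay, and the second collects the bias and LayerNorm weights, which do not.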

Changed
^^^^^^^

* `senwu`_: Enabled "Type hints (PEP 484) support for the Sphinx autodoc extension."
(`69 <https://github.com/SenWu/emmental/pull/69>`_)
* `senwu`_: Refactor docstrings and enforce using flake8-docstrings.
(`69 <https://github.com/SenWu/emmental/pull/69>`_)

0.0.6
-------------------

Added
^^^^^

* `senwu`_: Support probabilistic gold label in scorer.
* `senwu`_: Add `add_tasks` to support adding one task or multiple tasks to the model.
* `senwu`_: Add `use_exact_log_path` to support using exact log path.

.. note::

When initializing Emmental, the extra argument `use_exact_log_path` makes it use the
exact log path.

.. code:: python

    import emmental

    # dirpath is the log directory chosen by the user.
    emmental.init(dirpath, use_exact_log_path=True)

Changed
^^^^^^^

* `senwu`_: Run evaluation only when evaluation is triggered.
