Dagster

0.4.0

**API Changes**

- There is now a new top-level configuration section ``storage`` which controls whether or not
execution should store intermediate values and the history of pipeline runs on the filesystem,
on S3, or in memory. The ``dagster`` CLI now includes options to list and wipe pipeline run
history. Facilities are provided for user-defined types to override the default serialization
used for storage.
- Similarly, there is a new configuration for ``RunConfig`` where the user can specify
intermediate value storage via an API.
- ``OutputDefinition`` now contains an explicit ``is_optional`` parameter and defaults to being
not optional.
- New functionality in ``dagster.check``: ``is_list``
- New functionality in ``dagster.seven``: py23-compatible ``FileNotFoundError``, ``json.dump``,
``json.dumps``.
- Dagster default logging is now multiline for readability.
- The ``Nothing`` type now allows dependencies to be constructed between solids that do not have
data dependencies.
- Many error messages have been improved.
- ``throw_on_user_error`` has been renamed to ``raise_on_error`` in all APIs, public and private
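
The ``dagster.seven`` entry above refers to py23 compatibility shims. As a rough illustration of the general pattern (not the actual ``dagster.seven`` source), a compatibility alias for ``FileNotFoundError`` might look like the sketch below. The names ``CompatFileNotFoundError`` and ``read_text_or_default`` are hypothetical and exist only for this example.

```python
# Hypothetical compat alias; this is NOT taken from dagster.seven.
try:
    CompatFileNotFoundError = FileNotFoundError  # Python 3: the builtin exists
except NameError:
    # Python 2 has no FileNotFoundError; a missing file raises IOError there.
    CompatFileNotFoundError = IOError


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path) as handle:
            return handle.read()
    except CompatFileNotFoundError:
        return default
```

Note that on Python 2 the ``IOError`` fallback is broader than a missing-file error, so code written this way would also swallow other I/O failures there.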

**GraphQL**

- The GraphQL layer has been extracted out of Dagit into a separate dagster-graphql package.
- ``startSubplanExecution`` has been replaced by ``executePlan``.
- ``startPipelineExecution`` now supports reexecution of pipeline subsets.

**Dagit**

- It is now possible to reexecute subsets of a pipeline run from Dagit.
- Dagit's `Execute` tab now opens runs in separate browser tabs and a new `Runs` tab allows you to
browse and view historical runs.
- Dagit no longer scaffolds configuration when creating new `Execute` tabs. This functionality will
be refined and revisited in the future.
- Dagit's `Explore` tab is more performant on large DAGs.
- The ``dagit -q`` command line flag has been deprecated in favor of a separate command-line
``dagster-graphql`` utility.
- The execute button is now greyed out when Dagit is offline.
- The Dagit UI now includes more contextual cues to make the solid in focus and its connections
more salient.
- Dagit no longer offers to open materializations on your machine. Clicking an on-disk
materialization now copies the path to your clipboard.
- Pressing Ctrl-Enter now starts execution in Dagit's Execute tab.
- Dagit properly shows List and Nullable types in the DAG view.

**Dagster-Airflow**

- Dagster-Airflow includes functions to dynamically generate containerized (``DockerOperator``-based)
and uncontainerized (``PythonOperator``-based) Airflow DAGs from Dagster pipelines and config.

**Libraries**

- Dagster integration code with AWS, Great Expectations, Pandas, Pyspark, Snowflake, and Spark
has been reorganized into a new top-level libraries directory. These modules are now
importable as ``dagster_aws``, ``dagster_ge``, ``dagster_pandas``, ``dagster_pyspark``,
``dagster_snowflake``, and ``dagster_spark``.
- Removed dagster-sqlalchemy and dagma

**Examples**

- Added the event-pipeline-demo, a realistic web event data pipeline using Spark and Scala.
- Added the Pyspark pagerank example, which demonstrates how to incrementally introduce dagster
into existing data processing workflows.

**Documentation**

- Docs have been expanded, reorganized, and reformatted.

0.2.8.post3

Hotfix to not put config values in error messages. Had to re-release because of packaging errors in the upload to PyPI (.pyc files or similar were included).

0.2.8.post0

Pushing an update because dagit 0.2.8 was getting out-of-date code.

0.2.8

- Version bump to deal with likely pypi issue around using a fourth-level version number
- Added more elegant syntax for building solid and context configs


0.2.7

The most notable improvements in this release are to dagit, chiefly hot reloading and in-browser rendering of Python errors. In addition, the ability to scaffold configs from the command line is the first fruit of the rearchitecting of the config system.

- Dagster improvements:
  - Added the scaffold_config command, which generates a template of the yaml file needed to drive the execution of a particular pipeline.
  - Added the ability to automatically serialize intermediate inputs as they flow between solids. Consider this alpha quality. It is currently hard-coded to write out to /tmp/dagster/runs/<<run_id>>

- Dagit improvements:
  - Hot reloading and in-browser rendering of Python errors.
  - Scrolling and performance improvements.
  - Keyboard shortcuts to navigate between solids using arrow keys.
  - In-app previews of notebooks for dagstermill solids.
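
To make the intermediate serialization item above more concrete, here is a minimal sketch of writing a value into a per-run directory and reading it back. It is an illustration only: the use of ``pickle``, the ``<run_id>/<solid_name>/<output_name>.pkl`` layout, and the helper names are all assumptions, not a description of how dagster does it.

```python
import os
import pickle
import tempfile
import uuid


def write_intermediate(base_dir, run_id, solid_name, output_name, value):
    """Pickle `value` under base_dir/run_id/solid_name (assumed layout)."""
    solid_dir = os.path.join(base_dir, run_id, solid_name)
    os.makedirs(solid_dir, exist_ok=True)
    path = os.path.join(solid_dir, output_name + ".pkl")
    with open(path, "wb") as handle:
        pickle.dump(value, handle)
    return path


def read_intermediate(path):
    """Load a value previously written by write_intermediate."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage: a temporary directory stands in for the hard-coded base path.
base_dir = tempfile.mkdtemp()
run_id = str(uuid.uuid4())  # assumed: a uuid string identifies the run
saved_path = write_intermediate(base_dir, run_id, "add_one", "result", {"num": 4})
restored = read_intermediate(saved_path)
```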

0.2.6

Changes:

- 'run_id' value automatically included in ExecutionContext context
stack. This is a uuid.
- Config system update:

This is a significant change in the config system. The top-level environment objects (and all descendants) are now part of the dagster type system. Unique types are generated on a per-pipeline basis. This unlocks a few things:

1. Entire yaml config files are now type-checked in the same fashion as the user-defined config.
2. One can now pass dictionaries to execute_pipeline that mimic the yaml files exactly. You no longer have to use the dagster.config APIs (although those still work).
3. The entire config system is queryable via graphql (and therefore shows up in dagit). This adds some noise to the type browser (we can mitigate that soon), but it will enable the building of a config editor that is fully aware of the dagster type system.
4. This has one *breaking* change. The yaml file's format has changed slightly.

Previously:

    context:
        name: context_name
        config: some_config_value

Now:

    context:
        context_name:
            config: some_config_value


BREAKING CHANGE: Config format change. See above.
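
For anyone holding old-style config as Python dictionaries, the format change described above can be sketched as a small conversion function. ``migrate_context_config`` is a hypothetical helper written for this note, not a dagster API, and it only handles the ``context`` section shown in the example.

```python
def migrate_context_config(old_env):
    """Convert the pre-0.2.6 context layout to the 0.2.6 layout.

    Old: {"context": {"name": "context_name", "config": ...}}
    New: {"context": {"context_name": {"config": ...}}}
    Environments that do not look like the old layout are returned unchanged.
    """
    new_env = dict(old_env)  # shallow copy so the caller's dict is untouched
    context = old_env.get("context")
    looks_old = (
        isinstance(context, dict)
        and "name" in context
        and not isinstance(context["name"], dict)  # a dict here means new layout
    )
    if looks_old:
        inner = {}
        if "config" in context:
            inner["config"] = context["config"]
        new_env["context"] = {context["name"]: inner}
    return new_env


old_style = {"context": {"name": "context_name", "config": "some_config_value"}}
new_style = migrate_context_config(old_style)
# new_style == {"context": {"context_name": {"config": "some_config_value"}}}
```

Running the function on an already-migrated dictionary leaves it as-is, so it is safe to apply more than once.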

0.2.5

Version bump to 0.2.5 (227)

- Added the Type Explorer in Dagit. You can now browse all the types
declared in a pipeline.
- Added the --watch/--no-watch flag to dagit. This allows you to turn
off watching in cases where there are too many files below the
current working directory.



0.2.4

This version bump contains a few changes (including one breaking
change).

- New, radically improved version of dagit. Vertical layout, and a
beautiful new design. H/T to bengotow for this spectacular work.
- All types now *require* names. This is a breaking change for
ConfigDictionary, which did not require a name. You will
have to change your calls to ConfigDictionary or
ConfigDefinition.config_dict to include a name that is unique to the
Pipeline.
- Solids default to take *no* config definition, rather than a config
definition typed as any.

0.2.3

The driving factor for this release is a bug in the command line interface in 0.2.2 (https://github.com/dagster-io/dagster/issues/207).

Other changes in this release:

- The CLI has changed slightly. Whenever dagit or dagster needs to
specify a function to load a repo or a pipeline, use the -n/--fn-name
flag combo. Before, this was split out into two different use cases in
dagster.
- We now have the ability to reuse a single solid definition multiple
times within the same pipeline using the SolidInstance API. See the
corresponding tutorial section for more details.
- Documentation continues to improve.

0.2.2

The first dot release! This brings up-to-date versions of dagster and dagit, both at 0.2.2. (I just skipped 0.2.1 of dagster so that dagit and dagster are in sync. I won’t get into why pypi is dumb and made me do that.)

There are virtually no changes to the python API. This update changes the CLI so that you can use it without the repository.yml file and without installed modules.

You can now use dagit (or dagster) like:

`dagit -f step_one.py -r define_pipeline`

to load the pipeline straightaway from a function rather than having to go through repositories and yaml files.
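
As a rough picture of what "load from a function in a file" involves, the sketch below imports a Python file by path and looks up a function by name using the standard library. It is not dagit's actual loader: ``load_function_from_file`` and the module name ``"loaded_module"`` are made up for illustration, and the code targets Python 3 only.

```python
import importlib.util


def load_function_from_file(python_file, fn_name):
    """Import `python_file` by path and return its callable named `fn_name`."""
    spec = importlib.util.spec_from_file_location("loaded_module", python_file)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load a module from {}".format(python_file))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    fn = getattr(module, fn_name)  # raises AttributeError if the name is missing
    if not callable(fn):
        raise TypeError("{} in {} is not callable".format(fn_name, python_file))
    return fn
```

With a file ``step_one.py`` that defines ``define_pipeline``, calling ``load_function_from_file("step_one.py", "define_pipeline")()`` would return whatever that function builds.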

0.2.0

This is the first "major" release of dagster meant for consumption. The public APIs in this release will be supported for some time.

New things in this release:

- Solids do not specify their dependencies anymore. They are more easily reusable between pipelines. Dependencies are now specified at the pipeline level.
- Solids support multiple outputs and branching
- Solids can take config, in addition to inputs and outputs.
- Sources and materializations have been eliminated as formal abstractions. Solids accepting configs enabled this.
- New configuration system with full type system instead of argument dictionary. Configs can be arbitrarily nested and support composite types.
- New result api
- New execution engine. This now does a compiler-esque pass where a new logical execution graph of nodes is generated from the logical definition files and config.
- Python 2.7, 3.5 and 3.6 now supported
- RepositoryDefinition has been added. pipelines.yml is gone
- Full documentation of all public APIs
- Multi-part tutorial that introduces all concepts.
- solid now must take an info object, which has config and context members. lambda_solid is for simple cases that do not require config and context.
- ... Much more
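
The first item in the list above (dependencies moving from solids to the pipeline) can be illustrated without any dagster code. In the toy sketch below, plain functions stand in for solids and a dictionary stands in for the pipeline-level dependency declaration. Everything here, including ``run_pipeline`` and the dictionary shape, is a hypothetical model of the idea rather than the dagster API. It needs Python 3.9+ for ``graphlib``.

```python
from graphlib import TopologicalSorter


# "Solids": plain functions that know nothing about each other.
def load_num():
    return 3


def add_one(num):
    return num + 1


def double(num):
    return num * 2


SOLIDS = {"load_num": load_num, "add_one": add_one, "double": double}


def run_pipeline(solids, dependencies):
    """Run solids in dependency order.

    `dependencies` maps solid name -> {input name: upstream solid name}.
    """
    graph = {name: set(dependencies.get(name, {}).values()) for name in solids}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        wiring = dependencies.get(name, {})
        inputs = {input_name: results[upstream] for input_name, upstream in wiring.items()}
        results[name] = solids[name](**inputs)
    return results


# The same solids, wired two different ways at the "pipeline" level.
first = run_pipeline(SOLIDS, {"add_one": {"num": "load_num"}, "double": {"num": "add_one"}})
second = run_pipeline(SOLIDS, {"double": {"num": "load_num"}, "add_one": {"num": "double"}})
# first["double"] == 8 and second["add_one"] == 7
```

Because the wiring lives outside the functions, the same three definitions can be reused in both orders without modification.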

0.1.6

Just starting to use the tag feature to mark releases for the first time.

This is the last version I will release before the major breaking change coming in 0.2.0, which will change the way that dependencies are configured and eliminate sources and materializations as formalized abstractions.