
Deploying API changes with Managed Federation and GraphOS

A federated GraphQL API is a multi-tiered architecture made up of independent components. A GraphQL request flows through multiple components to construct a response, which in turn flows back through those same components to the caller, as illustrated below:

[Diagram: Request and Response flow between the Client, the Router, Subgraph A, Subgraph B, and the backing data sources (REST API, Database, gRPC API, Third party API).]

Allowing teams to independently update the components they own is a key benefit of federation. However, understanding and de-risking changes to a component can be challenging, especially because changes might require cascading changes in other components. This is especially true for GraphQL schema definitions, which must be coordinated between subgraph services and the router.

This document covers the basic release steps for changes to an existing supergraph API. We will not go into depth on deploying changes to runtime code or runtime configuration. Specifically, we'll examine two scenarios, both of which involve introducing changes to an existing federated API: backward compatible subgraph schema changes and backward incompatible subgraph schema changes.

This guide assumes the general use of Apollo GraphOS and some of its specific features, including managed federation, build checks, contracts, and schema change notifications.

The release guidelines for each scenario should generally apply to all release management systems, whether we're deploying to Kubernetes, serverless, or bare metal. Some details might change depending on whether we have a manual deployment process, a continuous delivery process, or something in between.

Backward compatible subgraph schema changes

A change is backward compatible if it doesn’t affect existing operations. Some examples of backward compatible changes include:

  • Adding a type or field (unused until a client includes the fields in new operations)
  • Adding an optional argument to existing fields, ideally with a default value (the argument will have a null or default value until a client uses it in new operations)
  • Making a nullable field non-nullable (this restricts the set of possible output values)
  • Removing or changing unused schema elements (no existing operation will be broken)

NOTE

This also includes changes to schema metadata, like adding or changing deprecations or descriptions, but we won't consider these changes because they don’t affect runtime behavior.
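The rules in the list above can be sketched as a small classifier. This is a minimal illustration, not how GraphOS performs its checks: the dictionary-based schema model, the `classify_changes` function, and the `used` set of field coordinates are all hypothetical names introduced here, and arguments are left out to keep the sketch short.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema model: type name -> field name -> field type (e.g. "String!").
Schema = Dict[str, Dict[str, str]]


def classify_changes(old: Schema, new: Schema, used: Set[str]) -> List[Tuple[str, bool]]:
    """Return (description, is_backward_compatible) for each detected change.

    `used` holds "Type.field" coordinates selected by existing operations.
    """
    changes: List[Tuple[str, bool]] = []
    for type_name in sorted(set(old) | set(new)):
        old_fields = old.get(type_name, {})
        new_fields = new.get(type_name, {})
        for field_name in sorted(set(old_fields) | set(new_fields)):
            coord = f"{type_name}.{field_name}"
            before = old_fields.get(field_name)
            after = new_fields.get(field_name)
            if before == after:
                continue
            if before is None:
                # New fields are unused until a client adds them to an operation.
                changes.append((f"added {coord}", True))
            elif after is None:
                # Removal is only safe when no existing operation selects the field.
                changes.append((f"removed {coord}", coord not in used))
            elif after == before + "!":
                # Nullable to non-nullable narrows the set of possible outputs.
                changes.append((f"made {coord} non-nullable", True))
            else:
                # Any other type change is safe only if the field is unused.
                changes.append((f"changed {coord}", coord not in used))
    return changes


old_schema = {"Product": {"id": "ID!", "name": "String", "legacySku": "String"}}
new_schema = {"Product": {"id": "ID!", "name": "String!", "price": "Int"}}
report = classify_changes(old_schema, new_schema, used={"Product.id", "Product.name"})
```

With this sample data every change is reported as backward compatible, because `Product.legacySku` is not in the `used` set. Adding it to `used` flips the removal to incompatible.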

Release steps

The release steps for backward compatible changes are as follows:

  1. In the code repository, merge a changeset containing both the schema change and the corresponding resolvers to the release branch, triggering the release pipeline.
  2. The release pipeline performs prerelease steps like running tests and schema checks.
  3. The release pipeline builds the deployable artifacts:
    • The container, jar, or zip of the runtime code
    • Deployment manifests such as Kubernetes YAML or SAM templates
    • The subgraph schema document
  4. The release pipeline then triggers two "deployments" in sequence.
    1. A rolling deploy of the runtime code.
    2. A publish of the subgraph schema document to Apollo GraphOS, which in turn updates the router with a new supergraph schema (if the subgraph composes successfully; more on that later).
  5. The release pipeline performs postrelease steps such as smoke tests.
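The ordering of these steps can be sketched as follows. This is a simplified model under stated assumptions: `run_release` and its two boolean inputs are hypothetical stand-ins for a real pipeline, and the step labels are only descriptive strings.

```python
from typing import List, Tuple


def run_release(checks_pass: bool, composes: bool) -> Tuple[bool, List[str]]:
    """Walk the release steps in order.

    Returns (router_updated, steps), where `steps` records what the pipeline did.
    """
    steps: List[str] = ["prerelease: tests and checks"]
    if not checks_pass:
        steps.append("abort: prerelease checks failed")
        return False, steps

    steps.append("build: runtime artifact, deployment manifests, subgraph schema")
    steps.append("deploy: rolling deploy of runtime code")
    steps.append("publish: subgraph schema")
    if not composes:
        # The router keeps serving the previous supergraph schema.
        steps.append("composition failed: router not updated")
        return False, steps

    steps.append("router: new supergraph schema delivered")
    steps.append("postrelease: smoke tests")
    return True, steps


updated, steps = run_release(checks_pass=True, composes=True)
```

The point of the sketch is the sequencing: the runtime code is deployed before the schema is published, and a composition failure leaves the router on its previous schema even though the new code is already running.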

Sequencing schema publishing with service deployments

During Step 4, our system will most likely end up in an inconsistent state, where the router's version of the schema is out of sync with the subgraph's version.

When the subgraphs update first, a client could request the new schema elements, and the router will raise a validation error because it is not yet aware of the changes.

In practice, however, this rarely matters. Due to the declarative nature of GraphQL, clients must update their operations to use the new schema elements. Clients aren’t polling our schema's introspection response to discover new fields and immediately requesting them. (Also, Apollo recommends disabling introspection in production.) Only after the system reaches a consistent state will we announce the availability of these new elements to clients.

If we enable Schema Change Notifications, Apollo will automatically notify our colleagues that the new schema is available right after the router receives the updated schema.
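The inconsistent window described in this section can be illustrated with a toy model. The sets of field coordinates and the `router_validates` helper are hypothetical simplifications; a real router validates full operations against the supergraph schema.

```python
from typing import Set


def router_validates(requested_fields: Set[str], router_schema: Set[str]) -> bool:
    """Accept an operation only if every requested field exists in the router's schema copy."""
    return requested_fields <= router_schema


# The subgraph has already been deployed with the new field...
subgraph_schema = {"Product.id", "Product.name", "Product.price"}
# ...but the router has not yet received the updated schema.
router_schema = {"Product.id", "Product.name"}

existing_operation = {"Product.id", "Product.name"}
new_operation = {"Product.id", "Product.price"}

existing_ok = router_validates(existing_operation, router_schema)  # unaffected by the gap
new_ok_during_gap = router_validates(new_operation, router_schema)
new_ok_after_publish = router_validates(new_operation, subgraph_schema)
```

Existing operations keep working throughout. Only an operation that already uses the new field fails during the gap, which is why announcing new elements after the system is consistent avoids the problem.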

Concurrent subgraph schema changes

Most of the time we can focus on a single subgraph, but it is possible to encounter conflicts when two or more subgraphs change. Consider this sequence of events:

  1. Team A makes a change to Subgraph A. The build check succeeds in CI and they merge their code.
  2. Team B makes a change to Subgraph B. Their build check also succeeds in CI, even though it actually conflicts with the change in Subgraph A, because the build check runs against released schema definitions.
  3. Team B releases their change to production.
  4. Team A attempts to release their change, but their new schema fails to compose and the router does not update.

The solution is to encounter the build error much earlier in the process. As soon as Team A merges a changeset, Team B should run build checks against that change.

We can accomplish this by creating a special variant of our graph that tracks the main branch of each code repository. Apollo provides a default variant called current for just this use case, but any variant can be used for this purpose. This requires that:

  • Each repository immediately publishes the subgraph schema to the current variant each time we merge a change to the main branch, even if we won't release the change for a while.
  • We run build checks for a proposed change against the current variant (to catch build errors quickly) as well as our production variants (to run checks against live traffic).

If the build check against the current variant fails, we'll need to coordinate with another team to ensure that our subgraph schemas are compatible before we attempt to release changes to production.
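The sequence of events above, and how checking against the current variant catches it, can be sketched with a toy composition rule. Everything here is an assumption made for illustration: real composition has many more rules, and `compose_errors`, `build_check`, and the sample schemas are hypothetical.

```python
from typing import Dict, List

SubgraphSchema = Dict[str, str]      # "Type.field" -> field type
Variant = Dict[str, SubgraphSchema]  # subgraph name -> published schema


def compose_errors(variant: Variant) -> List[str]:
    """Toy rule: a coordinate defined by several subgraphs must have a single type."""
    seen: Dict[str, str] = {}
    errors: List[str] = []
    for name in sorted(variant):
        for coord, field_type in sorted(variant[name].items()):
            if coord in seen and seen[coord] != field_type:
                errors.append(f"{coord}: {seen[coord]} vs {field_type}")
            seen.setdefault(coord, field_type)
    return errors


def build_check(variant: Variant, subgraph: str, proposed: SubgraphSchema) -> List[str]:
    """Compose the variant with one subgraph's schema swapped for the proposed one."""
    return compose_errors({**variant, subgraph: proposed})


production: Variant = {"A": {"Product.id": "ID!"}, "B": {"Product.id": "ID!"}}

# Team A merged (but has not released) a change, so only `current` contains it.
current: Variant = {**production, "A": {"Product.id": "ID!", "Product.weight": "Int"}}

# Team B proposes a conflicting definition of the same field.
team_b_change = {"Product.id": "ID!", "Product.weight": "Float"}

errors_vs_production = build_check(production, "B", team_b_change)
errors_vs_current = build_check(current, "B", team_b_change)
```

The check against production passes because Team A's change is not released yet, while the check against current surfaces the conflict before either team reaches production.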

NOTE

The longer the time between merging and releasing our change, the more important it is to track changes using the current variant. On the other hand, if we have a continuous delivery release process and the time between merge and release is relatively short, teams are less likely to introduce changes that are incompatible with unreleased schemas.

Rollbacks

Once clients start using the new changes, rolling back a schema change is the same as making a backward incompatible schema change. This may be difficult or impossible to do depending on client usage. For this reason, it's important to test our new schema changes thoroughly before clients start using the new schema in their operations.

If clients are not yet using the changes, we most likely don't need to roll them back, because they have no effect until operations start using them.

If circumstances require a rollback, the safest process is to revert the change in the code repository and perform the full release process, publishing the subgraph and rolling out the new code together.

Staged schema change releases

Instead of attempting rollbacks of changes, we may want to release our changes in stages, such as alpha, beta, and generally-available (GA). The Contracts feature of GraphOS makes this process simple:

  1. Mark elements of our schema with the @tag directive for the targeted release:

    type Product {
      id: ID!
      newField: String @tag(name: "alpha")
    }
  2. Configure our GA variant to be a contract that filters out all schema elements tagged "alpha" or "beta".

  3. Configure a separate beta variant to exclude the "alpha" tag, and another alpha variant that doesn't exclude anything.

  4. Set up three separate routers configured to use the alpha, beta, and GA variants.

  5. Perform the standard release process. Because we've tagged our new field with "alpha", the schema for the beta and GA variants will not change. This gives us plenty of opportunity to test our change with alpha versions of our clients.

  6. Releasing the change to a wider audience is simply a matter of changing the tag to "beta" or removing it altogether.
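The effect of the exclusion rules in the steps above can be sketched as a simple filter. This models only the tag logic, not the Contracts feature itself: the `EXCLUDED_TAGS` table, the `contract_fields` function, and the one-tag-per-field model are hypothetical simplifications.

```python
from typing import Dict, Optional, Set

# Tags each contract variant excludes, following steps 2 and 3.
EXCLUDED_TAGS: Dict[str, Set[str]] = {
    "alpha": set(),
    "beta": {"alpha"},
    "ga": {"alpha", "beta"},
}


def contract_fields(fields: Dict[str, Optional[str]], variant: str) -> Set[str]:
    """Return the field names visible in a variant.

    `fields` maps a field name to its release tag; None means untagged.
    """
    excluded = EXCLUDED_TAGS[variant]
    return {name for name, tag in fields.items() if tag not in excluded}


product_fields: Dict[str, Optional[str]] = {"id": None, "newField": "alpha"}

alpha_view = contract_fields(product_fields, "alpha")
beta_view = contract_fields(product_fields, "beta")
ga_view = contract_fields(product_fields, "ga")

# Step 6: promoting the field is only a tag change.
promoted = {**product_fields, "newField": "beta"}
beta_view_after_promotion = contract_fields(promoted, "beta")
```

Here `newField` is visible only in the alpha view until its tag changes to "beta", at which point it also appears in the beta view. Removing the tag entirely makes it visible in all three.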

💡 TIP

Read more on contracts usage patterns.

💡 TIP

If you're an enterprise customer looking for more material on this topic, try the Enterprise best practices: Contracts course on Odyssey.

Not an enterprise customer? Learn about GraphOS for Enterprise.

Backward incompatible subgraph schema changes

Backward incompatible changes remove or alter schema elements in ways that would break existing operations.

Crucially, if a schema element is unused, removing or changing it does not qualify as a backward incompatible change.

Release steps

The release steps for backward incompatible changes are similar to those for backward compatible changes, only with some important pre-work:

  1. Use the @deprecated directive to communicate to clients that schema elements should no longer be relied on.
  2. Use Studio to track the usage of types and fields. Reach out to clients identified as performing the affected operations before releasing the change.
  3. Continue to track usage until it reaches zero (or a minimally acceptable level, for example, where a field is still in use only by an older version of a mobile application that is no longer supported).
  4. Perform the release steps for a backward compatible change.
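The decision in step 3 can be sketched as a check over per-client usage counts. The data shape and the `clients_to_contact` and `safe_to_remove` helpers are hypothetical; in practice the counts would come from the usage reporting in Studio.

```python
from typing import Dict, Set


def clients_to_contact(requests_by_client: Dict[str, int], unsupported: Set[str]) -> Dict[str, int]:
    """Clients still requesting the deprecated field, excluding unsupported ones."""
    return {
        client: count
        for client, count in requests_by_client.items()
        if count > 0 and client not in unsupported
    }


def safe_to_remove(requests_by_client: Dict[str, int], unsupported: Set[str]) -> bool:
    """The field can be removed once no supported client still requests it."""
    return not clients_to_contact(requests_by_client, unsupported)


# Hypothetical request counts for a deprecated field over the last reporting window.
usage = {"web@2.1": 0, "ios@1.0": 12, "android@3.4": 40}
unsupported_clients = {"ios@1.0"}  # an old app version we no longer support

outreach = clients_to_contact(usage, unsupported_clients)
ready = safe_to_remove(usage, unsupported_clients)
```

In this sample only `android@3.4` needs outreach. Traffic from the unsupported `ios@1.0` client is treated as the minimally acceptable remainder.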

💡 TIP

Consider enforcing that all requests to your router have the proper client identification headers. These identifiers enable the metrics that allow you to make potentially breaking changes with confidence. Read more in the Client Awareness documentation, and see Client Id Enforcement for an example implementation.
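As a rough sketch of what such enforcement checks, the function below reports which identification headers a request lacks. It assumes the default client awareness header names, `apollographql-client-name` and `apollographql-client-version`. The `missing_client_headers` helper is hypothetical and is not the Client Id Enforcement implementation referenced above.

```python
from typing import List, Mapping

# Default client awareness header names (assumed not to be customized).
REQUIRED_HEADERS = ("apollographql-client-name", "apollographql-client-version")


def missing_client_headers(headers: Mapping[str, str]) -> List[str]:
    """Return the required identification headers that are absent or blank."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return [name for name in REQUIRED_HEADERS if not normalized.get(name)]


identified = missing_client_headers(
    {"Apollographql-Client-Name": "web", "apollographql-client-version": "1.2.0"}
)
anonymous = missing_client_headers({"content-type": "application/json"})
```

A gateway or router plugin could reject requests for which this list is non-empty, so that every operation in the usage metrics is attributable to a named client and version.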

Sequencing schema publishing with service deployments

As with a backward compatible release, our system will most likely end up in an inconsistent state, where the router's version of the schema is out of sync with the subgraph's version.

When the schema updates in the router, it will no longer validate requests that include removed fields, nor respond with null values for fields that are now marked non-null.

But with the pre-work outlined above, we've either minimized or completely removed requests that cause these errors.

Rollbacks

Assuming we've performed the pre-work and reduced usage of the relevant types and fields before changing or removing them, rolling back our changes is effectively a backward compatible change.

If circumstances require a rollback, the safest process is to revert the change in the code repository and perform the full release process, publishing the subgraph and rolling out the new code together.

Key takeaways:

  • Isolate different types of changes from one another. When possible, don't release schema changes and implementation/configuration changes at the same time.
  • Publish subgraph schemas to the "current" variant immediately after merging code, and run build checks against the "current" variant to catch subgraph compatibility issues long before a production release. (The longer the delay between merge and deploy, the more important this step is.)
  • Sequencing schema publishing and service deployments isn't as critical in practice as we might think, especially if we use tools like deprecation workflows, usage reporting, and schema change notifications to manage client operations.
  • Think about our API as a document that always moves forward. While rollbacks of code and configuration must be supported, "rolling back" our published API creates confusion for clients and consumers. Instead of trying to support API rollbacks, consider implementing a staged rollout process for API changes.