Improving Apollo Gateway performance
If your supergraph gateway uses Apollo Server with the @apollo/gateway library, this article provides recommendations for improving your gateway's performance.
Consider moving to the Apollo Router
The Apollo Router is a high-throughput, low-latency Apollo Federation GraphQL gateway written in Rust. Its performance and consistency significantly exceed those of any Node.js-based gateway.
Migrating to Apollo Router from @apollo/gateway
Disable or reduce ftv1 inline tracing
Inline tracing (also known as federated tracing or ftv1) provides helpful field-level latency information in GraphOS Studio. However, it comes at a cost: trace information can be expensive to calculate, and it's attached to subgraph responses, which increases payload size.
In Apollo Server 3.6 and later, you can set the fieldLevelInstrumentation option to disable inline tracing entirely, or to limit tracing to a representative sample of all requests.
- During load testing, we recommend disabling tracing entirely.
- For normal production traffic, we recommend setting a low sample rate such as 0.01 (one percent of requests).
Usage Reporting plugin docs for fieldLevelInstrumentation
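One way to apply both recommendations from a single deployment is to derive the sample rate from the environment. The sketch below is only an illustration: the helper name and the LOAD_TEST and FTV1_SAMPLE_RATE environment variables are hypothetical, not part of Apollo Server. It returns 0 during load tests and otherwise falls back to 0.01.

```javascript
// Hypothetical helper: picks a sample rate for fieldLevelInstrumentation.
// LOAD_TEST and FTV1_SAMPLE_RATE are assumed variable names, not Apollo settings.
function fieldLevelSampleRate(env = process.env) {
  // Disable inline tracing entirely while load testing.
  if (env.LOAD_TEST === 'true') return 0;

  // Otherwise use the configured rate, defaulting to one percent of requests.
  const parsed = Number.parseFloat(env.FTV1_SAMPLE_RATE ?? '');
  if (Number.isNaN(parsed)) return 0.01;

  // Clamp to the valid range [0, 1].
  return Math.min(1, Math.max(0, parsed));
}

// Intended use with the Usage Reporting plugin (Apollo Server 3.6 or later):
//   ApolloServerPluginUsageReporting({
//     fieldLevelInstrumentation: fieldLevelSampleRate(),
//   })
```

Keeping the rate in configuration lets you turn tracing off for a load test without a code change.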
Change the default maxSockets setting
The maxSockets setting controls the maximum number of connections that Apollo Gateway's fetch implementation can create to each subgraph host and port. With higher values, fewer subgraph requests are queued in the gateway, but the event loop might become overloaded serializing responses from subgraphs.
In Apollo Gateway versions prior to v0.51.0, the default maxSockets setting is 15, which is too low for many systems. In later versions, the default is Infinity, which removes the queueing but makes the event loop overload described above more likely.
The following ApolloGateway constructor sets maxSockets to 50:

    import {ApolloGateway, RemoteGraphQLDataSource} from '@apollo/gateway';
    import fetcher from 'make-fetch-happen';

    const gateway = new ApolloGateway({
      buildService({name, url}) {
        return new RemoteGraphQLDataSource({
          name,
          url,
          // Allow at most 50 connections per subgraph host and port
          fetcher: fetcher.defaults({maxSockets: 50}),
        });
      },
    });
Configuring the subgraph fetcher
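To choose a starting value for maxSockets, a rough estimate based on Little's law (concurrency = throughput × latency) can help. The sketch below assumes each socket carries one request at a time and that average subgraph latency is stable. The function names are illustrative, and the result is an upper bound to validate with load tests rather than a recommended setting.

```javascript
// Upper bound on requests per second to ONE subgraph host and port,
// assuming one in-flight request per socket (an assumption of this sketch).
function maxRequestsPerSecond(maxSockets, avgLatencyMs) {
  if (!Number.isFinite(maxSockets)) return Infinity;
  return (maxSockets * 1000) / avgLatencyMs;
}

// Number of sockets needed to sustain a target throughput without queueing.
function socketsNeeded(targetRps, avgLatencyMs) {
  return Math.ceil((targetRps * avgLatencyMs) / 1000);
}

// With a 100 ms average subgraph latency:
console.log(maxRequestsPerSecond(15, 100)); // 150
console.log(maxRequestsPerSecond(50, 100)); // 500
console.log(socketsNeeded(500, 100)); // 50
```

Under these assumptions, a limit of 15 sockets caps a subgraph with 100 ms latency at roughly 150 requests per second, which illustrates why that default is too low for many systems.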
Instrument Gateway middleware and plugins
If you've installed middleware (such as Express middleware) or Apollo Server plugins, it's worth using a code profiling tool such as clinicjs to investigate potential issues, such as synchronous code blocking the main thread or expensive plugin functions.
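Before reaching for a full profiler, a lightweight wrapper can give a first indication of which middleware blocks the main thread. The following is a minimal sketch, assuming Express-style middleware with the (req, res, next) signature. The timeMiddleware name and the shape of the report object are hypothetical. It measures only the synchronous part of each call, which is the part that blocks the event loop.

```javascript
// Hypothetical wrapper: reports how long a middleware function runs
// synchronously. Work done later in callbacks or promises is not included.
function timeMiddleware(name, middleware, report = console.log) {
  return function timedMiddleware(req, res, next) {
    const start = process.hrtime.bigint();
    try {
      return middleware(req, res, next);
    } finally {
      // Convert nanoseconds to milliseconds.
      const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
      report({ name, durationMs });
    }
  };
}

// Intended use with an Express app (authMiddleware is a placeholder):
//   app.use(timeMiddleware('auth', authMiddleware));
```

A middleware that regularly reports several milliseconds of synchronous time is a good candidate for closer inspection with a profiler.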
Use OpenTelemetry tracing
OpenTelemetry is the CNCF standard for instrumenting distributed systems. By tracing Apollo Gateway, subgraph services, data sources, and any other systems in the request path, you might discover that the cause of performance issues occurs outside of Apollo Gateway.
OpenTelemetry in Apollo Gateway
Investigate subgraph latency
Try running load tests against the subgraphs directly. You can simulate requests from the Gateway by querying the _entities field:

    query MyQuery__subgraphA__0($representations: [_Any!]!) {
      _entities(representations: $representations) {
        ... on EntityA {
          fieldA
          fieldB
        }
        ... on EntityB {
          fieldC
          fieldD
        }
      }
    }
By testing _entities queries, you might discover that you need to implement a DataLoader in your reference resolvers.
The slower your subgraphs, the more time the Gateway spends keeping track of concurrent in-flight requests. The Gateway's maximum requests per second drops linearly as subgraph latencies rise.
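To send an _entities query like the one earlier in this section from a load testing tool, the request body needs a representations variable. Each representation is an object containing the entity's __typename and its key fields. The sketch below builds such a body. It assumes EntityA has a single key field named id, which is an illustrative choice, so substitute the key fields from your own schema.

```javascript
// The _entities query, trimmed to one entity type for brevity.
const ENTITIES_QUERY = `
  query MyQuery__subgraphA__0($representations: [_Any!]!) {
    _entities(representations: $representations) {
      ... on EntityA {
        fieldA
        fieldB
      }
    }
  }
`;

// Builds a GraphQL request body with one representation per key value.
// keyName is an assumption of this sketch (here: a single-field key).
function buildEntitiesRequest(typename, keyName, keyValues) {
  return {
    query: ENTITIES_QUERY,
    variables: {
      representations: keyValues.map((value) => ({
        __typename: typename,
        [keyName]: value,
      })),
    },
  };
}

// JSON body to POST to the subgraph's GraphQL endpoint.
const body = JSON.stringify(buildEntitiesRequest('EntityA', 'id', ['1', '2', '3']));
console.log(body);
```

Varying the number of representations per request shows how the subgraph behaves as batches grow, which is where a missing DataLoader usually becomes visible.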
Tweak CPU and memory limits
You might need to set higher CPU and memory limits on your Apollo Gateway pods. Determining the best limits for your system depends on a number of factors:
- Size of operations
- Cardinality of operations (number of unique shapes)
- Complexity of query plans
- Subgraph latency
- Use of shared caches
In general, look at increasing memory limits first, because Node.js is single-threaded and cannot use more than one CPU. That being said, we recommend keeping CPU utilization below 50%. At higher CPU utilization, you may start to see event loop delays increase. Monitoring both memory usage and CPU usage metrics can give a clearer picture of bottlenecks than code profiling alone.
If CPU throttling becomes a problem, consider removing the limits on your gateway pods as discussed in this blog post on Kubernetes resource limits.
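If your platform doesn't already expose these metrics, the Node.js process can report them itself. The following is a minimal sketch using the built-in process.cpuUsage() and process.memoryUsage() calls. The function names, the report shape, and the five-second interval are illustrative assumptions. CPU is expressed as a percentage of one core over the sampling window.

```javascript
// Converts a CPU time delta (microseconds) into a percentage of one core.
function cpuPercent(cpuDelta, elapsedMs) {
  const busyMicros = cpuDelta.user + cpuDelta.system;
  return (busyMicros / (elapsedMs * 1000)) * 100;
}

// Periodically reports CPU and resident memory for the current process.
function startResourceMonitor(intervalMs = 5000, report = console.log) {
  let lastUsage = process.cpuUsage();
  let lastTime = Date.now();

  const timer = setInterval(() => {
    const usage = process.cpuUsage();
    const now = Date.now();
    const delta = {
      user: usage.user - lastUsage.user,
      system: usage.system - lastUsage.system,
    };
    report({
      cpuPercent: cpuPercent(delta, now - lastTime),
      rssMb: process.memoryUsage().rss / 1024 / 1024,
    });
    lastUsage = usage;
    lastTime = now;
  }, intervalMs);

  // Don't keep the process alive just for monitoring.
  timer.unref();
  return timer;
}

// Example: 0.3 s of user time plus 0.2 s of system time over one second.
console.log(cpuPercent({ user: 300000, system: 200000 }, 1000)); // 50
```

Calling startResourceMonitor() once at startup is enough to begin reporting. Sustained readings above the 50% guideline suggest adding instances or raising limits.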
Check query plans
Query plans (the steps the Gateway takes to resolve an operation from multiple subgraphs) are almost always reasonable approximations of the original operation. They're also highly cacheable so they don't incur much of a performance penalty. But it's worth looking at them to make sure they're of an appropriate size and complexity. Fragment and abstract type usage may cause undesirable query plans in very specific cases.
    const gateway = new ApolloGateway({
      debug: true, // print query plans to stdout
    });
Add more instances
Apollo Gateway is designed to scale horizontally. If you've tried the previous strategies and performance still isn't where it should be, distribute the load across more instances of Apollo Gateway.