Apollo data federation for GraphQL schema-first
Data federation is the creation of a virtual database that aggregates data from distributed sources, giving them a common data model. It is an approach to data integration that provides a single source of data for front-end applications. It also gives backend developers flexibility in design and service isolation.
To get the most out of GraphQL, your organization should expose a single data graph that provides a unified interface for querying any combination of your backing data sources. However, it can be challenging to represent an enterprise-scale data graph with a single, monolithic GraphQL server.
To remedy this, you can divide your graph’s implementation across multiple composable services with Apollo Federation. Unlike other distributed GraphQL architectures (such as schema stitching), Apollo Federation uses a declarative programming model that enables each subgraph to implement only the part of your composed supergraph that it’s responsible for.
An Apollo Federation architecture consists of:
- A collection of subgraphs (usually represented by different back-end services) that each define a distinct GraphQL schema
- A gateway that composes the subgraphs into a federated data graph and executes queries across multiple subgraphs
            +--------------------+
            |  Federated schema  |
            |  (Apollo Gateway)  |
            +--------------------+
                      |
         +------------+-----------+
         |                        |
+--------+---------+     +--------+---------+
|  Library schema  |     |  Orders schema   |
|    (Stargate)    |     | (Node.js Apollo  |
|   Book/Reader    |     |     server)      |
+------------------+     +------------------+
GraphQL schema
To achieve data federation, schemas need to be created and annotated to indicate how ownership is distributed. Let’s look at an example with three core entities:
Book: A library stores books. The Book type uses a title and ISBN to uniquely identify each Book object, while also storing author information.
Reader: Readers read books and write reviews. Each Reader is uniquely identified by a name and user_id, while also storing birthdate, email addresses, street addresses, and the reviews that the reader has written. Each review consists of the book title, comment, rating, and review date.
Order: When a reader checks out books from the library, a record of the checkout_id, the reader, and the books checked out is stored, uniquely identified by the checkout_id.
These three domains could be owned by three separate engineering teams responsible for their own data sources, business logic, and corresponding microservices. In an unfederated implementation, a single team would have to own and implement this simple schema and the associated resolvers.
Stargate GraphQL schema
In this example, the Book and Reader schema are supplied from a Stargate instance as the first data source, and the schema is described in GraphQL schema-first.
scalar _FieldSet
directive @key(fields: _FieldSet!) on OBJECT | INTERFACE
directive @extends on OBJECT | INTERFACE
directive @external on FIELD_DEFINITION
directive @requires(fields: _FieldSet!) on FIELD_DEFINITION
directive @provides(fields: _FieldSet!) on FIELD_DEFINITION
JavaScript Apollo server schema
For Order, a simple JavaScript script, orders.js, will be used to define the Order schema, supply some resolvers, insert some data, and start an Apollo server as the second data source. Let’s break down the script into four parts.
First, set up the script to require apollo-server and @apollo/federation. Set the port for the server to 4001.
const { ApolloServer, gql } = require("apollo-server");
const { buildFederatedSchema } = require("@apollo/federation");
const port = 4001;
Next, we need to load the schema for this server. The schema consists of the object type Order, extensions of the object types Book and Reader gathered from the Stargate instance, and the queries used to fetch orders.
const typeDefs = gql`
# need to declare types used by cql
scalar Uuid
scalar Date
type Order @key(fields: "checkout_id reader books") {
checkout_id: Int!
reader: Reader
books: [Book]
}
# Stub of the book entity from Stargate:
# Add an extension for Order to find a checkout for a book
extend type Book @key(fields: "title isbn") {
title: String! @external
isbn: String @external
#author: [String]
orders: [Order]
}
# Stub of the reader entity from Stargate:
# Add an extension for Order to find a checkout for a reader
extend type Reader @key(fields: "name user_id") {
name: String! @external
user_id: Uuid! @external
orders: [Order]
}
extend type Query {
order(checkout_id: Int!): Order
orders: [Order]
}
`;
In order to fetch field values from the objects that belong to another service, an extension of that object must be included in the Apollo server. Two directives that pertain to data federation are used in this example: @key and @external.
The @key directive defines the combination of fields that uniquely identifies an object or interface and is used to fetch it. Only the primary key fields can be used in @key.
The @external directive marks a field as owned by another service.
In Stargate schema, you can use just |
In this example, for the Book object type, the fields title and isbn are defined in the @key directive as the fields that are required to uniquely identify a specific book (the primary key) and fetch the requested information from the object. The @external directive further marks the same fields as owned by the Stargate service, not the Apollo server. The Reader object type has a similar extension defined.
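To make the role of @key more concrete, the sketch below builds the kind of entity reference a federated service returns for a type it does not own: the type name plus the key fields, and nothing else. The helper name toRepresentation and the keyFields table are assumptions made for illustration; they are not part of Apollo Federation or Stargate.

```javascript
// Key fields per entity, copied from the @key directives in the schema above.
const keyFields = {
  Book: ["title", "isbn"],
  Reader: ["name", "user_id"],
};

// Hypothetical helper: reduce an object to its type name plus key fields.
function toRepresentation(typename, obj) {
  const representation = { __typename: typename };
  for (const field of keyFields[typename]) {
    if (!(field in obj)) {
      throw new Error(`Missing key field "${field}" for ${typename}`);
    }
    representation[field] = obj[field];
  }
  return representation;
}

const bookRef = toRepresentation("Book", {
  title: "Moby Dick",
  isbn: "978-0140861723",
  checkedOut: true, // not a key field, so it is dropped
});
console.log(bookRef);
```

A reference like this is enough for the gateway to ask the owning service for the remaining fields of the same object.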
Now let’s define the resolvers that this script will use. A resolver is a function that’s responsible for populating the data for a single field in your schema. Stargate resolves objects based on the schema supplied, but the JavaScript server requires resolver definitions.
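const resolvers = {
  Book: {
    // Orders that contain the given book; each order stores { title, isbn } objects.
    orders(book) {
      return orders.filter(({ books }) => books.some(b => b.title === book.title));
    }
  },
  Reader: {
    // Orders checked out by the given reader, matched on user_id.
    orders(reader) {
      return orders.filter(order => order.reader.user_id === reader.user_id);
    }
  },
  Order: {
    // Return Book references so the gateway can fetch the remaining fields from Stargate.
    books(order) {
      return order.books.map(({ title, isbn }) => ({ __typename: "Book", title, isbn }));
    }
  },
  Query: {
    order(_, args) {
      return orders.find(order => order.checkout_id === args.checkout_id);
    },
    orders() {
      return orders;
    }
  }
};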
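The resolver logic can be checked without starting a server. The sketch below is a standalone variant that assumes each order stores its books as { title, isbn } objects, as in the sample data later in this section. It calls the resolver functions directly with hand-written arguments, which is not how Apollo Server invokes them, but it is enough to exercise the filtering.

```javascript
// Sample data in the same shape as the orders array used by the service.
const orders = [
  {
    checkout_id: 1,
    reader: { name: "Herman Melville", user_id: "e0ec47e1-2b46-41ad-961c-70e6de629810" },
    books: [
      { title: "Moby Dick", isbn: "978-0140861723" },
      { title: "Pride and Prejudice", isbn: "" },
    ],
  },
  {
    checkout_id: 2,
    reader: { name: "Jane Doe", user_id: "f02e2894-db48-4347-8360-34f28f958590" },
    books: [{ title: "Native Son", isbn: "978-0061148507" }],
  },
];

const resolvers = {
  Book: {
    // Find every order whose book list contains the given title.
    orders(book) {
      return orders.filter(({ books }) => books.some(b => b.title === book.title));
    },
  },
  Query: {
    // Look up a single order by its checkout_id.
    order(_, args) {
      return orders.find(order => order.checkout_id === args.checkout_id);
    },
  },
};

const mobyOrders = resolvers.Book.orders({ title: "Moby Dick" });
console.log(mobyOrders.map(o => o.checkout_id));

const second = resolvers.Query.order(null, { checkout_id: 2 });
console.log(second.reader.name);
```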
Finally, the server is built from the federated schema and started, listening on port 4001. The orders array supplies the sample data that the service returns.
const server = new ApolloServer({
  schema: buildFederatedSchema([{ typeDefs, resolvers }])
});
server.listen({ port }).then(({ url }) => {
  console.log(`🚀 Orders service ready at ${url}`);
});
const orders = [
{ checkout_id: 1, reader: {name: "Herman Melville", user_id: "e0ec47e1-2b46-41ad-961c-70e6de629810"} , books: [ {title: "Moby Dick", isbn: "978-0140861723"}, {title: "Pride and Prejudice", isbn: "" } ] },
{ checkout_id: 2, reader: {name: "Jane Doe", user_id: "f02e2894-db48-4347-8360-34f28f958590"}, books: [ {title: "Native Son", isbn: "978-0061148507"}] }
];
The full script:
const { ApolloServer, gql } = require("apollo-server");
const { buildFederatedSchema } = require("@apollo/federation");
const port = 4001;
const typeDefs = gql`
# need to declare types used by cql
scalar Uuid
scalar Date
type Order @key(fields: "checkout_id reader books") {
checkout_id: Int!
reader: Reader
books: [Book]
}
# Stub of the book entity from Stargate:
# Add an extension for Order to find a checkout for a book
extend type Book @key(fields: "title isbn") {
title: String! @external
isbn: String @external
#author: [String]
orders: [Order]
}
# Stub of the reader entity from Stargate:
# Add an extension for Order to find a checkout for a reader
extend type Reader @key(fields: "name user_id") {
name: String! @external
user_id: Uuid! @external
orders: [Order]
}
extend type Query {
order(checkout_id: Int!): Order
orders: [Order]
}
`;
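const resolvers = {
  Book: {
    // Orders that contain the given book; each order stores { title, isbn } objects.
    orders(book) {
      return orders.filter(({ books }) => books.some(b => b.title === book.title));
    }
  },
  Reader: {
    // Orders checked out by the given reader, matched on user_id.
    orders(reader) {
      return orders.filter(order => order.reader.user_id === reader.user_id);
    }
  },
  Order: {
    // Return Book references so the gateway can fetch the remaining fields from Stargate.
    books(order) {
      return order.books.map(({ title, isbn }) => ({ __typename: "Book", title, isbn }));
    }
  },
  Query: {
    order(_, args) {
      return orders.find(order => order.checkout_id === args.checkout_id);
    },
    orders() {
      return orders;
    }
  }
};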
const server = new ApolloServer({
schema: buildFederatedSchema([
{
typeDefs,
resolvers
}
])
});
server.listen({ port }).then(({ url }) => {
  console.log(`🚀 Orders service ready at ${url}`);
});
const orders = [
{ checkout_id: 1, reader: {name: "Herman Melville", user_id: "e0ec47e1-2b46-41ad-961c-70e6de629810"} , books: [ {title: "Moby Dick", isbn: "978-0140861723"}, {title: "Pride and Prejudice", isbn: "" } ] },
{ checkout_id: 2, reader: {name: "Jane Doe", user_id: "f02e2894-db48-4347-8360-34f28f958590"}, books: [ {title: "Native Son", isbn: "978-0061148507"}] }
];
Installing Apollo gateway
Let’s look at a simple JavaScript script that runs the Apollo gateway. The explanation that follows the listing breaks the script down into three parts.
const { ApolloServer } = require("apollo-server");
const { ApolloGateway, RemoteGraphQLDataSource } = require("@apollo/gateway");
// The Stargate token that Apollo Gateway will use when it fetches the schema
// definitions.
// Note that this will only be used for internal queries; for user queries, the
// client must provide their own 'x-cassandra-token' HTTP header, and the
// gateway will forward it to Stargate.
const stargateIntrospectionToken = 'd5e7e1fb-399b-4f0b-964b-296ee97d59d3';
class StargateGraphQLDataSource extends RemoteGraphQLDataSource {
willSendRequest({ request, context }) {
const token = context.stargateToken
if (token != null) {
request.http.headers.set('x-cassandra-token', token);
}
}
}
const gateway = new ApolloGateway({
serviceList: [
// Stargate:
{ name: "library", url: "http://127.0.0.1:8080/graphql/library"},
// External service (mock):
{ name: "orders", url: "http://localhost:4001/graphql" }
],
introspectionHeaders: {
'x-cassandra-token': stargateIntrospectionToken
},
buildService({name, url}) {
if (name == "library") {
return new StargateGraphQLDataSource({url});
} else {
return new RemoteGraphQLDataSource({url});
}
},
// Experimental: Enabling this enables the query plan view in Playground.
__exposeQueryPlanExperimental: true,
});
(async () => {
const server = new ApolloServer({
gateway,
// Apollo Graph Manager (previously known as Apollo Engine)
// When enabled and an `ENGINE_API_KEY` is set in the environment,
// provides metrics, schema management and trace reporting.
engine: false,
// Subscriptions are unsupported but planned for a future Gateway version.
subscriptions: false,
context: ({ req, res }) => {
return {
stargateToken: req.headers['x-cassandra-token']
};
}
});
server.listen().then(({ url }) => {
    console.log(`🚀 Gateway ready at ${url}`);
});
})();
In the first part, the script requires Apollo server and Apollo gateway. In order for the gateway to access the Stargate data source, a token is required. That token is specified as stargateIntrospectionToken. The StargateGraphQLDataSource class forwards the client’s x-cassandra-token header to Stargate on user queries.
In the second part, a serviceList that defines the data sources lists the Stargate instance and the JavaScript Apollo server that are described above. The buildService function builds each service based on the name of the data source: library gets a StargateGraphQLDataSource, and any other name gets a plain RemoteGraphQLDataSource. Note that if more data sources are specified, you’ll need to change the logic that specifies which service is used for queries and mutations. An experimental flag is set to allow GraphQL Playground to display the query plan for the federated service.
In the third part, a new Apollo server, designated as a gateway, is started. Some settings are currently set to false; the comments in the code explain why. If the fetched data is from the Stargate instance, the token must be passed in the header of the request as x-cassandra-token.
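If more data sources are added, one possible way to keep the selection logic manageable is a lookup table keyed by service name. The sketch below is a minimal illustration under stated assumptions: the RemoteGraphQLDataSource class here is a stand-in so the example runs without @apollo/gateway installed, the dataSourceByName table is hypothetical, and a Map stands in for the HTTP headers object.

```javascript
// Stand-in for the class exported by @apollo/gateway, so the sketch runs on its own.
class RemoteGraphQLDataSource {
  constructor({ url }) {
    this.url = url;
  }
}

// Same idea as the gateway script: forward the client's Stargate token.
class StargateGraphQLDataSource extends RemoteGraphQLDataSource {
  willSendRequest({ request, context }) {
    if (context.stargateToken != null) {
      request.http.headers.set("x-cassandra-token", context.stargateToken);
    }
  }
}

// Hypothetical registry: service names that need a specialized data source.
const dataSourceByName = new Map([["library", StargateGraphQLDataSource]]);

function buildService({ name, url }) {
  // Fall back to the plain data source for any unregistered service.
  const DataSource = dataSourceByName.get(name) ?? RemoteGraphQLDataSource;
  return new DataSource({ url });
}

const library = buildService({ name: "library", url: "http://127.0.0.1:8080/graphql/library" });
const ordersService = buildService({ name: "orders", url: "http://localhost:4001/graphql" });

// Simulate header forwarding for a user query that carries a token.
const request = { http: { headers: new Map() } };
library.willSendRequest({ request, context: { stargateToken: "example-token" } });
console.log(request.http.headers.get("x-cassandra-token"));
```

With this layout, adding another Stargate-backed service means adding one entry to the table rather than another branch in buildService.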
Federated queries
Simple example
Now that we have two subgraphs supplied from two different data sources, and a gateway running, we can explore how data from all data sources can be returned in a query.
For example, to discover all the books that were checked out at the same time, and get all the book data, use this query in GraphQL Playground, pointed to the URL where the gateway is running (localhost:4000 in this case):
query getOrder {
order(checkout_id: 1) {
checkout_id
reader {
name
user_id
email
address {
city
}
}
books {
title
isbn
author
}
}
}
{
"data": {
"order": {
"checkout_id": 1,
"reader": {
"name": "Herman Melville",
"user_id": "e0ec47e1-2b46-41ad-961c-70e6de629810",
"email": [
"herman.melville@gmail.com",
"hermy@mobydick.org"
],
"address": [
{
"city": "Boston"
}
]
},
"books": [
{
"title": "Moby Dick",
"isbn": "978-0140861723",
"author": null
},
{
"title": "Pride and Prejudice",
"isbn": "",
"author": null
}
]
}
}
}
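Outside of GraphQL Playground, a similar query can be sent to the gateway as an HTTP POST that carries the x-cassandra-token header. The sketch below only assembles the request; the helper name buildGatewayRequest, the root path in the URL, and the token placeholder are assumptions for illustration, and the actual fetch call is left commented out because it needs a running gateway.

```javascript
// Assumed gateway address; the text above gives localhost:4000.
const GATEWAY_URL = "http://localhost:4000/";

const query = `
  query getOrder($id: Int!) {
    order(checkout_id: $id) {
      checkout_id
      reader { name user_id email }
      books { title isbn author }
    }
  }
`;

// Hypothetical helper: assemble the fetch() arguments for a GraphQL POST.
function buildGatewayRequest(query, variables, stargateToken) {
  return {
    url: GATEWAY_URL,
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        // The gateway forwards this header to Stargate for user queries.
        "x-cassandra-token": stargateToken,
      },
      body: JSON.stringify({ query, variables }),
    },
  };
}

const { url, options } = buildGatewayRequest(query, { id: 1 }, "<your Stargate token>");
console.log(url, options.headers);

// With the gateway running (and Node.js 18 or later for the global fetch):
// fetch(url, options).then(res => res.json()).then(console.log);
```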
While the result seems unremarkable, think about it for a moment. The Apollo server that returns an order supplies the reader name and user_id, but the email and address information is fetched from the Stargate instance. Likewise, the author field for the books returned in the order query is resolved by the Stargate instance (it is null in this sample response). Thus, the federated supergraph of Order, Book, and Reader, along with the resolvers in the Apollo server, fetches data from two different servers! That is a very valuable feature, especially for application developers who need to fetch and use data from data sources that they do not control.
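The two-step nature of that result can be sketched conceptually: fetch the order from the orders service, use the Reader key fields to look up the remaining fields in the library service, then merge. Both services below are in-memory stand-ins with hypothetical function names, and the email values are copied from the response above; the real gateway builds a query plan and resolves entities through the subgraphs rather than through functions like these.

```javascript
// Stand-in for the orders subgraph: it knows only the Reader key fields.
function fetchOrder(checkoutId) {
  const orders = [
    {
      checkout_id: 1,
      reader: {
        __typename: "Reader",
        name: "Herman Melville",
        user_id: "e0ec47e1-2b46-41ad-961c-70e6de629810",
      },
    },
  ];
  return orders.find(o => o.checkout_id === checkoutId);
}

// Stand-in for the library subgraph (Stargate): it owns the other Reader fields.
const readersByKey = new Map([
  [
    "Herman Melville|e0ec47e1-2b46-41ad-961c-70e6de629810",
    { email: ["herman.melville@gmail.com", "hermy@mobydick.org"] },
  ],
]);

function resolveReader(representation) {
  const key = `${representation.name}|${representation.user_id}`;
  return readersByKey.get(key) ?? {};
}

// Merge step: key fields from one service, remaining fields from the other.
function federatedOrder(checkoutId) {
  const order = fetchOrder(checkoutId);
  const { __typename, ...keyFields } = order.reader;
  return { ...order, reader: { ...keyFields, ...resolveReader(order.reader) } };
}

const result = federatedOrder(1);
console.log(JSON.stringify(result, null, 2));
```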