Configuration
Shotover proxy accepts two separate YAML-based configuration files: a configuration file specified by --config-file and a topology file specified by --topology-file.
configuration.yaml
The configuration file is used to change general behavior of Shotover. Currently it supports two values:
- main_log_level
- observability_interface
main_log_level
This is a single string that you can use to configure logging with Shotover. It supports env_filter style configuration and filtering syntax. Log levels and filters can be dynamically changed while Shotover is still running.
observability_interface
Shotover has an observability interface for you to collect Prometheus data from. This value defines the address and port for Shotover's observability interface. It is configured as a string in the format 127.0.0.1:8080 for IPv4 addresses or [2001:db8::1]:8080 for IPv6 addresses. More information is on the observability page.
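As a minimal sketch, a configuration file that sets both values might look like the following. The log filter string and the address are illustrative assumptions, not recommended settings. The filter uses the comma-separated `target=level` form of env_filter syntax, and the `shotover_proxy` target name is assumed here.

```yaml
---
# Default level for all targets is info; the (assumed) shotover_proxy target is raised to debug.
main_log_level: "info,shotover_proxy=debug"
# Address and port on which the observability interface listens (illustrative value).
observability_interface: "127.0.0.1:8080"
```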
topology.yaml
The topology file is currently the primary method for defining how Shotover behaves. Within the topology file you can configure sources, transforms and transform chains.
The below documentation shows you what each section does and runs through an entire example of a Shotover configuration file.
sources
The sources top level resource is a map of named sources to their definitions.
The sources section of the topology file allows you to specify a source or origin for requests. You can have multiple sources and even multiple sources of the same type. Each is named to allow you to easily reference it.
A source will generally represent a database protocol and will accept connections and queries from a compatible driver. For example the Redis source will accept connections from any Redis (RESP2) driver such as redis-py.
There is a special source type called an mpsc_chan source (named after the Rust multi-producer, single-consumer channel that backs its implementation). This source listens only to the topic with the configured name and passes the messages it receives from the channel on to its mapped transform chain.
There are many Transforms that will push a message to a given topic in different ways, enabling complex asynchronous topologies to be created.
yaml
---
# The source section
sources:
  # The configured name of the source
  my_named_redis_source:
    # The source type and any configuration needed for it
    # This will generally include a listen address and port
    Redis:
      listen_addr: "127.0.0.1:6379"
  # The configured name of the source
  my_cassandra_prod:
    # The source type and any configuration needed for it
    # This will generally include a listen address and port
    Cassandra:
      listen_addr: "127.0.0.1:9042"
  # The special mpsc_chan source, it will receive messages from a named topic
  mpsc_chan:
    Mpsc:
      topic_name: testtopic
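Because sources are keyed by name, several sources of the same type can sit side by side. The sketch below illustrates this with two Redis sources. The source names and the second port are hypothetical and chosen only for illustration.

```yaml
sources:
  # Two sources of the same type, told apart by name and listen address.
  # Both names are hypothetical.
  redis_app_a:
    Redis:
      listen_addr: "127.0.0.1:6379"
  redis_app_b:
    Redis:
      listen_addr: "127.0.0.1:6380"
```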
chain_config (Chain Configuration)
The chain_config top level resource is a map of named chains to their definitions.
The chain_config section of the configuration file allows you to name and define a transform chain. A transform chain is represented as an array of transforms and their respective configuration. The order in which transforms are listed in a chain is the order in which a query will traverse them. So the first transform in the chain will get the request from the source first and pass it to the second transform in the chain.
As each transform chain is synchronous, with each transform being able to call the next transform in its chain, the response from the upstream database or generated by a transform down the chain will be passed back up the chain, allowing each transform to handle the response.
The last transform in a chain should be a "terminating" transform. That is, one that passes the query on to the upstream database (e.g. CassandraSinkSingle) or one that returns a Response on its own (e.g. EchoSink).
For example:
yaml
chain_config:
  example_chain:
    - One
    - Two
    - Three
    - TerminatingTransform
A query from a client will go: Source -> One -> Two -> Three -> TerminatingTransform
The response (returned to the chain by the TerminatingTransform) will follow the reverse path: TerminatingTransform -> Three -> Two -> One -> Source
Under the hood, each transform is able to call its down-chain transform and wait on its response. Each Transform has its own set of configuration values, options and behavior. See Transforms for details.
The following example chain_config has the following chains:
- redis_chain - Consists of a Tee, a transform that will copy the query to the named topic and also pass the query down-chain to a terminating transform, RedisSinkSingle, which sends the query to a Redis server. Very similar to the tee Linux program.
- main_chain - Also consists of a Tee that will copy queries to the same topic as the redis_chain before sending the query on to a caching layer that will try to resolve the query from a Redis cache before finally sending the query to the destination Cassandra cluster via a CassandraSinkSingle.
yaml
# This example will replicate all commands to the DR datacenter on a best effort basis
---
chain_config:
  # The name of the first chain
  redis_chain:
    # The first transform in the chain, in this case it's the Tee transform
    - Tee:
        behavior: Ignore
        # The number of message batches that the tee can hold onto in its buffer of messages to send.
        # If they aren't sent quickly enough and the buffer is full then tee will drop new incoming messages.
        buffer_size: 10000
        # The child chain, that Tee will asynchronously pass requests to
        chain:
          - QueryTypeFilter:
              filter: Read
          - Coalesce:
              flush_when_buffered_message_count: 2000
          - QueryCounter:
              name: "DR chain"
          - RedisSinkCluster:
              first_contact_points: [ "127.0.0.1:2120", "127.0.0.1:2121", "127.0.0.1:2122", "127.0.0.1:2123", "127.0.0.1:2124", "127.0.0.1:2125" ]
              connect_timeout_ms: 3000
    # The rest of the chain, these transforms are blocking
    - QueryCounter:
        name: "Main chain"
    - RedisSinkCluster:
        first_contact_points: [ "127.0.0.1:2220", "127.0.0.1:2221", "127.0.0.1:2222", "127.0.0.1:2223", "127.0.0.1:2224", "127.0.0.1:2225" ]
        connect_timeout_ms: 3000
source_to_chain_mapping (Chain Mapping)
The source_to_chain_mapping top level resource is a map of source names to chain names. This is the binding that links a defined source to a chain and allows messages/queries generated by a source to traverse a given chain.
The below snippet would complete our entire example:
yaml
source_to_chain_mapping:
  redis_prod: redis_chain
This mapping would effectively create a solution in which:
- All Redis requests are first batched and then sent to a remote Redis cluster in another region. This happens asynchronously, and if the remote Redis cluster is unavailable it will not block operations to the current cluster.
- Subsequently, all Redis actions are identified based on command type, counted and provided as a set of metrics.
- The Redis request is then transformed into a cluster-aware request and routed to the correct node.
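To show how the three top level resources fit together, here is a condensed sketch of a single topology file. It assumes a Redis source named redis_prod, matching the key used in the mapping above. It shortens the contact point list and leaves out the Tee branch for brevity. Only transform names and fields that appear in the examples on this page are used.

```yaml
---
sources:
  # Assumed source definition; the name must match the key in source_to_chain_mapping.
  redis_prod:
    Redis:
      listen_addr: "127.0.0.1:6379"
chain_config:
  redis_chain:
    # Counts queries and exposes them as metrics.
    - QueryCounter:
        name: "Main chain"
    # Terminating transform: routes each request to the Redis cluster.
    - RedisSinkCluster:
        first_contact_points: [ "127.0.0.1:2220", "127.0.0.1:2221", "127.0.0.1:2222" ]
        connect_timeout_ms: 3000
source_to_chain_mapping:
  # source name: chain name
  redis_prod: redis_chain
```

Each key under source_to_chain_mapping should name an entry in sources, and each value should name an entry in chain_config.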