Snapshots
A snapshot is a copy of the state of the system of record’s data at a specific point in time.
A snapshot is requested by sending a message to the Kafka snapshot request topic. The snapshot agent creates the snapshot and signals completion by writing a message to the Kafka snapshot response topic.
A snapshot is not a representation of a single data file but rather reflects the current state of ALL of the system of record’s data files, or at least the subset of those files that has been configured for replication.
The data format within each snapshot depends on the intended use of the snapshot:
- For SDMS replication, a snapshot consists of a single TAR archive in network storage that in turn contains the data exported from each data file. The data from each file is exported as a counted file, which is then compressed into an LZ4 compressed file that is added to the TAR archive.
- For SQL replication, a snapshot consists of a directory in network storage that in turn also contains the data exported from each data file, but with each file's data saved as a Parquet file. Parquet is a columnar storage file format, commonly used in big data and analytics systems, that supports strong compression and encoding, reducing storage size and improving performance. Parquet files are schema based, meaning data types and structure are stored with the file, making them self-describing.
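The SDMS layout described above (one TAR archive holding one compressed member per data file) can be illustrated with a minimal Python sketch that uses only the standard library. The member names (ACCOUNT.lz4, CUSTOMER.lz4), the .lz4 naming convention, and both helper functions are hypothetical, and the payloads are placeholder bytes rather than real LZ4-compressed counted files. The actual archive naming and internal layout produced by the snapshot agent may differ.

```python
import io
import tarfile


def build_toy_snapshot(files: dict[str, bytes]) -> bytes:
    """Pack placeholder payloads into an in-memory TAR archive.

    Hypothetical helper: stands in for the archive the snapshot agent
    would write to network storage.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_snapshot_members(archive: bytes) -> list[tuple[str, int]]:
    """Return (member name, size in bytes) for each regular file in the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


# Assumed member names; the payloads are NOT real LZ4 data.
toy = build_toy_snapshot({
    "ACCOUNT.lz4": b"placeholder-1",
    "CUSTOMER.lz4": b"placeholder-22",
})
members = list_snapshot_members(toy)
for name, size in members:
    print(f"{name}: {size} bytes")
```

Listing the members this way is a quick check that a snapshot archive contains one entry for every data file configured for replication, without decompressing anything.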