
Table parts

What are table parts in ClickHouse?

A part is a physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key.


The data from each table in the ClickHouse MergeTree engine family is organized on disk as a collection of immutable data parts.

To illustrate this, we use this table (adapted from the UK property prices dataset) tracking the date, town, street, and price for sold properties in the United Kingdom:

You can query this table in our ClickHouse SQL Playground.

A data part is created whenever a set of rows is inserted into the table. The following diagram sketches this:


When a ClickHouse server processes the example insert with 4 rows (e.g., via an INSERT INTO statement) sketched in the diagram above, it performs several steps:

Sorting: The rows are sorted by the table's sorting key (town, street), and a sparse primary index is generated for the sorted rows. (A sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key.)

Splitting: The sorted data is split into columns.

Compression: Each column is compressed.

Writing to Disk: The compressed columns are saved as binary column files within a new directory representing the insert's data part. The sparse primary index is also compressed and stored in the same directory.

Depending on the table's specific engine, additional transformations may take place alongside sorting.
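The four steps above can be mimicked with a small toy model. This is only a sketch in Python, not how ClickHouse is implemented: the function name `build_part`, the sample rows, the tiny granularity of 2, and the use of zlib and JSON as stand-ins for the real codec and column file format are all assumptions made for illustration.

```python
import json
import zlib


def build_part(rows, sorting_key, granularity=2):
    """Toy model of the insert steps: sort, index, split, compress."""
    if not rows:
        raise ValueError("an insert needs at least one row")
    # 1. Sorting: order the rows by the sorting key columns.
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sorting_key))
    # Sparse primary index: one key entry per block of `granularity` rows.
    index = [
        tuple(ordered[i][c] for c in sorting_key)
        for i in range(0, len(ordered), granularity)
    ]
    # 2. Splitting: one list of values per column.
    columns = {name: [r[name] for r in ordered] for name in ordered[0]}
    # 3. Compression: each column is compressed on its own
    #    (zlib is a stand-in codec, chosen only for this sketch).
    files = {
        name: zlib.compress(json.dumps(values).encode())
        for name, values in columns.items()
    }
    # 4. "Writing to disk": return what the part's directory would hold.
    return {"index": index, "files": files, "rows": len(ordered)}


# Hypothetical sample rows, not taken from the real dataset.
rows = [
    {"date": "2024-01-03", "town": "London", "street": "Baker Street", "price": 950000},
    {"date": "2024-01-04", "town": "Bristol", "street": "Park Row", "price": 310000},
    {"date": "2024-01-05", "town": "London", "street": "Abbey Road", "price": 1200000},
    {"date": "2024-01-06", "town": "Leeds", "street": "Kirkgate", "price": 185000},
]
part = build_part(rows, sorting_key=("town", "street"))
prices = json.loads(zlib.decompress(part["files"]["price"]))
```

Because every column is stored and compressed separately, reading `price` back does not require touching `town`, `street`, or `date`, which is the point of the splitting step.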

Data parts are self-contained, including all metadata needed to interpret their contents without requiring a central catalog. Beyond the sparse primary index, parts contain additional metadata, such as secondary data skipping indexes, column statistics, checksums, min-max indexes (if partitioning is used), and more.

Part merges

To manage the number of parts per table, a background merge job periodically combines smaller parts into larger ones until they reach a configurable compressed size (typically ~150 GB). The source parts of a merge are marked as inactive and deleted after a configurable time interval. Over time, this process creates a hierarchical structure of merged parts, which is why it's called a MergeTree table.
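As a rough illustration, a single merge can be modeled as combining already-sorted parts into one larger sorted part. The `Part` class and `merge_parts` function below are hypothetical names for this sketch. The rule that the new level is one more than the highest source level is an assumption consistent with the level semantics described in this document, and the part names are illustrative.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    rows: list          # (town, street, price) tuples, sorted by (town, street)
    level: int = 0      # 0 means the part has never been merged
    active: bool = True


def merge_parts(parts, name):
    """Merge sorted parts into one larger sorted part (toy model)."""
    # Each input is already sorted, so a streaming k-way merge is enough;
    # no full re-sort of the combined rows is needed.
    merged_rows = list(
        heapq.merge(*(p.rows for p in parts), key=lambda r: r[:2])
    )
    # The source parts stay around but are no longer active.
    for p in parts:
        p.active = False
    return Part(name, merged_rows, level=max(p.level for p in parts) + 1)


a = Part("all_1_1_0", [("Bristol", "Park Row", 310000),
                       ("London", "Baker Street", 950000)])
b = Part("all_2_2_0", [("Leeds", "Kirkgate", 185000),
                       ("London", "Abbey Road", 1200000)])
merged = merge_parts([a, b], name="all_1_2_1")
```

Merging `merged` again with another part would yield a level-2 part, which is how the tree-like hierarchy of parts builds up over time.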


To minimize the number of initial parts and the overhead of merges, database clients are encouraged to either insert tuples in bulk, e.g. 20,000 rows at once, or to use the asynchronous insert mode, in which ClickHouse buffers rows from multiple incoming INSERTs into the same table and creates a new part only after the buffer size exceeds a configurable threshold or a timeout expires.
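The buffering behavior of asynchronous inserts can be sketched as follows. This is a simplified model, not ClickHouse code: the class `AsyncInsertBuffer`, its parameters, and the use of a row count as the size threshold are assumptions for illustration, and the clock is injectable only so the example is deterministic.

```python
import time


class AsyncInsertBuffer:
    """Toy buffer: many small inserts, few parts."""

    def __init__(self, max_rows, timeout_s, clock=time.monotonic):
        self.max_rows = max_rows      # size threshold (rows, for simplicity)
        self.timeout_s = timeout_s    # max time the oldest row may wait
        self.clock = clock
        self.rows = []
        self.first_row_at = None
        self.parts = []               # each flush creates one "part"

    def insert(self, rows):
        if not rows:
            return
        if not self.rows:
            self.first_row_at = self.clock()
        self.rows.extend(rows)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def tick(self):
        """Called periodically: flush if the oldest row waited too long."""
        if self.rows and self.clock() - self.first_row_at >= self.timeout_s:
            self.flush()

    def flush(self):
        if self.rows:
            self.parts.append(self.rows)
            self.rows = []
            self.first_row_at = None


now = [0.0]                               # fake clock for the demo
buf = AsyncInsertBuffer(max_rows=3, timeout_s=1.0, clock=lambda: now[0])
buf.insert([("London", "Baker Street")])
buf.insert([("Leeds", "Kirkgate")])
parts_before_threshold = len(buf.parts)   # still buffered, no part yet
buf.insert([("Bristol", "Park Row")])     # threshold reached: one part
buf.insert([("London", "Abbey Road")])
now[0] = 2.0
buf.tick()                                # timeout expired: second part
```

In this run, four separate inserts produce two parts instead of four, which is the effect the asynchronous mode is meant to achieve.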

Monitoring table parts

You can query the list of all currently existing active parts of our example table by using the virtual column _part:

The query above retrieves the names of directories on disk, with each directory representing an active data part of the table. The components of these directory names have specific meanings, which are documented here for those interested in exploring further.
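For orientation, part directory names such as `all_1_2_1` commonly follow the pattern partition ID, minimum block number, maximum block number, and merge level, optionally followed by a mutation version. The helper below is a hypothetical sketch for splitting such a name. It assumes the partition ID itself contains no underscore, and the linked documentation remains the authoritative description of the format.

```python
from typing import NamedTuple, Optional


class PartName(NamedTuple):
    partition_id: str
    min_block: int
    max_block: int
    level: int
    mutation: Optional[int] = None   # present only on mutated parts


def parse_part_name(name):
    """Split a part directory name into its components (toy parser)."""
    fields = name.split("_")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected part name: {name!r}")
    # First field is the partition ID; the rest are integers.
    return PartName(fields[0], *(int(f) for f in fields[1:]))


fresh = parse_part_name("all_1_1_0")    # one insert, never merged
merged = parse_part_name("all_1_2_1")   # covers blocks 1..2, merged once
```

A name whose minimum and maximum block numbers differ therefore indicates a part produced by merging several inserts.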

Alternatively, ClickHouse tracks info for all parts of all tables in the system.parts system table, and the following query returns for our example table above the list of all currently active parts, their merge level, and the number of rows stored in these parts:

The merge level is incremented by one with each additional merge on the part. A level of 0 indicates this is a new part that has not been merged yet.