Partition Merging: Compacting (Merging) SSTables
Cassandra does not do in-place writes or updates. Rather, it uses a log structured format. Writes are done to Memtables, which are periodically flushed to disk as SSTables. As a result of this approach, the number of SSTables grows over time.
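To make the write path concrete, here is a toy Python model (not Cassandra code). Every write lands in an in-memory memtable, which is flushed as a new, immutable, key-sorted SSTable. The class name ToyNode and the FLUSH_THRESHOLD entry count are hypothetical. Real Cassandra decides when to flush based on memtable memory use, not on a fixed number of entries.

```python
# Toy model of a log-structured write path (illustrative only, not Cassandra's code).
# FLUSH_THRESHOLD is a hypothetical entry count used to trigger flushes.
FLUSH_THRESHOLD = 3


class ToyNode:
    def __init__(self):
        self.memtable = {}   # mutable, in-memory: key -> value
        self.sstables = []   # immutable, key-sorted tuples "on disk"

    def write(self, key, value):
        # Writes and updates only touch the memtable; existing SSTables are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Persist the memtable as a brand-new sorted SSTable, then start a fresh memtable.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}


node = ToyNode()
for i in range(9):
    node.write(f"user{i % 4}", i)   # keys repeat, so updates end up in new SSTables
print(len(node.sstables))           # 3
```

Because updates never rewrite existing files, the repeatedly written key user0 ends up with a copy in every one of the three SSTables.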
Having multiple SSTables causes read operations to be less efficient as columns for an associated key may be spread over multiple SSTables. Cassandra uses Compaction to merge multiple SSTables into a single larger one. This recipe shows how to adjust two compaction settings: MinCompactionThreshold and MaxCompactionThreshold.
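The sketch below is again a simplified model with made-up data. It shows why multiple SSTables hurt reads and what compaction fixes. Each SSTable stores columns per partition key along with a write timestamp. A read has to merge the columns from every SSTable that contains the key, with the newest timestamp winning. Compaction performs the same merge once and produces a single SSTable. The helpers merge_row, read and compact are hypothetical.

```python
# Each SSTable maps partition key -> {column: (write_timestamp, value)}.
# The data is made up for illustration.
sstables = [
    {"alice": {"city": (1, "NYC"), "email": (1, "a@x")}},
    {"alice": {"city": (5, "Boston")}},          # a later update to one column
    {"bob": {"city": (2, "LA")}},
]


def merge_row(into, columns):
    """Fold columns into a row, keeping the cell with the newest timestamp."""
    for col, (ts, val) in columns.items():
        if col not in into or ts > into[col][0]:
            into[col] = (ts, val)


def read(key, tables):
    """Return the merged row and how many SSTables had to be consulted."""
    row, touched = {}, 0
    for table in tables:
        if key in table:
            touched += 1
            merge_row(row, table[key])
    return {col: val for col, (ts, val) in row.items()}, touched


def compact(tables):
    """Merge several SSTables into one key-sorted SSTable."""
    merged = {}
    for table in tables:
        for key, columns in table.items():
            merge_row(merged.setdefault(key, {}), columns)
    return dict(sorted(merged.items()))


print(read("alice", sstables))                 # 2 SSTables consulted
print(read("alice", [compact(sstables)]))      # 1 SSTable consulted
```

Both reads return the same row. After compaction, however, only one file has to be consulted.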
root@wash-i-16ca26c8-prod /scripts/db_reconfiguration $ ccm node1 nodetool getcompactionthreshold PortfolioDemo Portfolios
Current compaction thresholds for PortfolioDemo/Portfolios:
min = 4, max = 32
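Under STCS, min is the number of similar-sized SSTables that must accumulate before a minor compaction starts, and max caps how many SSTables are merged in one pass. Here is a minimal sketch of that gating rule, using a hypothetical pick_compaction helper. Real STCS also ranks buckets and SSTables before choosing what to compact. That ranking is simplified here to taking the first max entries.

```python
def pick_compaction(bucket, min_threshold=4, max_threshold=32):
    """Return the SSTables to merge from one bucket of similar-sized SSTables,
    or [] if the bucket has not reached min_threshold yet.

    Simplification: takes the first max_threshold entries instead of ranking them.
    """
    if len(bucket) < min_threshold:
        return []
    return bucket[:max_threshold]


print(pick_compaction(["s1", "s2", "s3"]))                          # [] (3 < min 4)
print(len(pick_compaction([f"s{i}" for i in range(40)])))           # 32 (capped at max)
print(pick_compaction(["s1", "s2", "s3", "s4"], min_threshold=6))   # [] with a higher min
```

Raising min lets more SSTables pile up before a compaction runs. Lowering it compacts more eagerly, which costs extra I/O but leaves fewer files for reads to check.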
If you use the Size-Tiered Compaction Strategy (STCS), you can end up with very large SSTables.
STCS starts a minor compaction when at least min_threshold (default 4) SSTables of similar size exist, combining them into one file, purging expired data and merging keys. Those merged files are themselves merged again once enough of them accumulate, so over time SSTables keep growing.
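The growth described above can be reproduced with a rough simulation. bucket_by_size is loosely modeled on the STCS grouping rule: an SSTable joins a bucket if its size falls between bucket_low (0.5) and bucket_high (1.5) times that bucket's average size. The simulation ignores the min_sstable_size rule for small files and assumes merged data never shrinks. The 100 MB flush size is hypothetical.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size (simplified STCS rule)."""
    buckets = []  # each entry: [average_size, [member sizes]]
    for size in sorted(sizes):
        for entry in buckets:
            avg, members = entry
            if avg * bucket_low < size < avg * bucket_high:
                members.append(size)
                entry[0] = sum(members) / len(members)  # update bucket average
                break
        else:
            buckets.append([size, [size]])  # no similar bucket: start a new one
    return [members for _, members in buckets]


def compact_until_stable(sizes, min_threshold=4, max_threshold=32):
    """Repeatedly merge any bucket holding >= min_threshold SSTables.

    Assumes no overwritten or expired data, so a merged SSTable's size is the
    sum of its inputs (real compactions often shrink data).
    """
    sizes = list(sizes)
    while True:
        eligible = [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]
        if not eligible:
            return sizes
        chosen = eligible[0][:max_threshold]
        for s in chosen:
            sizes.remove(s)
        sizes.append(sum(chosen))


sstables = []
for _ in range(64):               # 64 flushes of a hypothetical 100 MB memtable
    sstables.append(100)
    sstables = compact_until_stable(sstables)
print(sstables)                   # [6400]: all data ends up in one large SSTable
```

Each time four SSTables of one size tier accumulate, they merge into one SSTable four times as large. After 64 flushes, the tiers 100, 400 and 1600 MB have cascaded into a single 6400 MB file.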
With the Leveled Compaction Strategy (LCS), the sstable_size_in_mb option sets a target size for SSTables.
In general, SSTables will be at or below this size. The exception is a partition key that holds a lot of data ('wide rows'): a single partition is not split across SSTables, so it can push one file past the target.
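Here is a sketch of how a target size interacts with wide rows. It assumes a simplified output writer that only starts a new SSTable at a partition boundary. write_leveled_output, the 160 MB target and the partition sizes are illustrative assumptions, not Cassandra's exact splitting logic.

```python
def write_leveled_output(partitions, target_mb=160):
    """Split a key-ordered stream of (partition_key, size_mb) pairs into SSTables.

    A new SSTable is started only between partitions, so one huge partition
    can produce an SSTable larger than target_mb.
    """
    sstables, current, current_size = [], [], 0
    for key, size in partitions:
        # Close the current SSTable if adding this whole partition would exceed the target.
        if current and current_size + size > target_mb:
            sstables.append((current, current_size))
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        sstables.append((current, current_size))
    return sstables


# Made-up partition sizes in MB; p3 is a "wide row".
parts = [("p1", 60), ("p2", 70), ("p3", 500), ("p4", 40), ("p5", 50)]
for keys, size in write_leveled_output(parts):
    print(keys, size)
```

Two of the resulting SSTables stay under the 160 MB target. The one holding p3 is 500 MB, because that partition cannot be split.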
The Date-Tiered Compaction Strategy (DTCS) works similarly to STCS in that it merges files of similar size. However, it keeps data grouped in time order, and it has an option to stop compacting old data (max_sstable_age_days), which can be useful for time-series workloads.
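Here is a minimal sketch of the max_sstable_age_days idea. SSTables whose newest data is older than the cutoff are left alone, and the remaining ones are ordered by time so that data written around the same time is merged together. compaction_candidates and the sample timestamps are hypothetical. DTCS's actual tiered time windows are not modeled.

```python
from datetime import datetime, timedelta


def compaction_candidates(sstables, now, max_sstable_age_days):
    """Skip SSTables whose newest data is older than max_sstable_age_days,
    then order the rest by their newest timestamp (oldest first)."""
    cutoff = now - timedelta(days=max_sstable_age_days)
    fresh = [s for s in sstables if s["max_timestamp"] >= cutoff]
    return sorted(fresh, key=lambda s: s["max_timestamp"])


now = datetime(2016, 1, 31)
sstables = [
    {"name": "a", "max_timestamp": datetime(2015, 6, 1)},   # old: no longer compacted
    {"name": "b", "max_timestamp": datetime(2016, 1, 30)},
    {"name": "c", "max_timestamp": datetime(2016, 1, 20)},
]
print([s["name"] for s in compaction_candidates(sstables, now, max_sstable_age_days=30)])
```

With a 30-day limit, SSTable a is excluded from further compaction. Only c and b, in time order, remain candidates.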
The key is to pick the compaction strategy that suits your data, then tune its properties to fit your data model and environment.
-Thanks