File compression
You can compress or decompress files either with the mmchattr command or with the mmapplypolicy command with a MIGRATE rule. You can do the compression or decompression synchronously or defer it until a later call to mmrestripefile or mmrestripefs.
IBM Spectrum Scale™ V4.2 adds file compression to reduce the size of data at rest. File compression is intended primarily for cold data and favors saving space over access speed. File compression can be driven by policies that enable administrators to compress only files that have not been accessed for a specified time. Data is decompressed inline for each read access.
- Comparison with object compression
- When to use file compression
- Setting up file compression and decompression
- Warnings
- Reported size of compressed files
- Deferred file compression
- Indicators of file compression or decompression
- Updates to compressed files
- File compression and memory mapping
- File compression and direct I/O
- Backing up and restoring compressed files
- Limitations
Comparison with object compression
File compression is a different feature from object compression. Both features compress files, and both can be policy-driven. However, object compression is available only through Cluster Export Services (CES) and is done with the mmobj command. File compression is available outside CES and is done with the mmapplypolicy command (policy-driven) or the mmchattr command (direct). Also, with file compression you can defer the compression or decompression operation until a time when the system is not heavily loaded with processes and I/O. For more information about object compression, see the topic Administering storage policies for object storage.
When to use file compression
File compression in this release is designed to be used only for compressing cold data or write-once objects and files. Compressing other types of data can result in performance degradation. File compression uses the zlib data compression library and favors saving space over speed.
Setting up file compression and decompression
The sample script /usr/lpp/mmfs/samples/ilm/mmcompress.sample, installed with IBM Spectrum Scale, provides examples of how to compress or decompress a fileset or a directory tree.
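With the mmchattr command, the --compression option controls compression. The following command compresses a file:
mmchattr --compression yes trcrpt.150913.13.30.13.3518.txt
The following command decompresses the same file:
mmchattr --compression no trcrpt.150913.13.30.13.3518.txt
With the mmapplypolicy command, a MIGRATE rule with the COMPRESS keyword controls compression. The following rule migrates and compresses the files in the pool datapool whose names begin with green:
rule 'COMPR1' migrate from pool 'datapool' COMPRESS('yes') where name like 'green%'
The following rule migrates and decompresses the same set of files:
rule 'COMPR1' migrate from pool 'datapool' COMPRESS('no') where name like 'green%'
The following rules exclude .mpg and .jpg files from compression and compress files that have not been accessed for more than 30 days:
RULE 'NEVER_COMPRESS' EXCLUDE WHERE lower(NAME) LIKE '%.mpg' OR lower(NAME) LIKE '%.jpg'
RULE 'COMPRESS_COLD' MIGRATE COMPRESS('yes') WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > (INTERVAL '30' DAYS)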
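As a rough way to preview which files rules like NEVER_COMPRESS and COMPRESS_COLD would select, the following Python sketch applies the same two tests (file name suffix and access-time age) to a directory tree. It is an approximation for illustration only: the function name cold_candidates and its parameters are hypothetical, it is not part of IBM Spectrum Scale, and the policy engine remains the authority on what a rule actually matches.

```python
import os
import stat
import time

EXCLUDED_SUFFIXES = (".mpg", ".jpg")  # mirrors the NEVER_COMPRESS rule
COLD_AFTER_DAYS = 30                   # mirrors INTERVAL '30' DAYS


def cold_candidates(root, now=None, cold_after_days=COLD_AFTER_DAYS):
    """Yield regular files under root that are not .mpg/.jpg files and
    were last accessed more than cold_after_days ago."""
    now = time.time() if now is None else now
    cutoff_seconds = cold_after_days * 24 * 60 * 60
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive suffix test, like lower(NAME) LIKE '%.mpg'
            if name.lower().endswith(EXCLUDED_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files are considered
            if not stat.S_ISREG(st.st_mode):
                continue
            if now - st.st_atime > cutoff_seconds:
                yield path
```

Iterating over cold_candidates() on a directory of interest lists the files that are old enough and not excluded by suffix, which can help when checking a policy before it is applied.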
For more information, see the following topics:
- The topic mmchattr command in the IBM Spectrum Scale: Administration and Programming Reference
- Overview of policies in the IBM Spectrum Scale: Advanced Administration Guide
- Policy rules: Syntax in the IBM Spectrum Scale: Advanced Administration Guide
When you do file compression, you can defer the compression operation to a later time. For more information, see the subtopic Deferred file compression.
Warnings
- Doing file compression or decompression.
- Running the mmrestripefile command or the mmrestripefs command, either to complete a deferred file compression or decompression or for any other reason.
- Do not run file compression or decompression while an mmrestorefs command is running. This warning includes compression or decompression with the mmchattr command or with the mmapplypolicy command.
- Do not run the mmrestripefs or mmrestripefile command while an mmrestorefs command is running.
Reported size of compressed files
After a file is compressed, operating system commands, such as ls -l, display the uncompressed size. Use du or the GPFS™ command mmdf to display the actual, compressed size. You can also make the stat() system call to find how many blocks the file occupies.
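The stat() approach can be sketched in Python as follows. This is a minimal illustration, assuming a POSIX system on which st_blocks is reported in 512-byte units; the helper name size_report is hypothetical.

```python
import os


def size_report(path):
    """Return the logical size and the allocated size of a file, in bytes."""
    st = os.stat(path)
    return {
        # Size that commands such as ls -l display
        "logical_bytes": st.st_size,
        # Space the file occupies on disk (assumes 512-byte block units)
        "allocated_bytes": st.st_blocks * 512,
    }
```

For a compressed file, allocated_bytes would be expected to be smaller than logical_bytes. Sparse files can show the same pattern, so the difference alone does not prove that a file is compressed.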
Deferred file compression
By default, the command that launches a file compression or decompression does not return until after the compression or decompression operation is completed. However, with both the mmchattr command and the mmapplypolicy command, you can defer the compression or decompression operation and have the command return as soon as it completes any other operations. By deferring compression or decompression, you can complete the operation later when the system is not heavily loaded with processes or I/O.
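With the mmchattr command, the -I defer option defers the operation. For example, the following command marks the file for compression but defers the compression itself:
mmchattr -I defer --compression yes trcrpt.150913.13.30.13.3518.txt
With the mmapplypolicy command, the -I defer option defers compression or decompression as well as data movement or deletion. For example, the following command applies the rules in the file policyfile but defers the file operations that are specified in the rules, including compression or decompression:
mmapplypolicy fs1 -P policyfile -I defer
To complete a deferred compression or decompression, run the mmrestripefile command or the mmrestripefs command with the -z option. For example, the following command completes the deferred operation on the file:
mmrestripefile -z trcrpt.150913.13.30.13.3518.txt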
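The deferred workflow for a single file can be scripted. The following Python sketch only builds the two command lines from the options used in this subtopic and, by default, prints them instead of running them. The function names and the dry_run parameter are hypothetical and chosen for illustration.

```python
import shlex
import subprocess


def defer_compression_cmd(path):
    """Command that marks a file for compression without compressing it yet."""
    return ["mmchattr", "-I", "defer", "--compression", "yes", path]


def complete_compression_cmd(path):
    """Command that completes a deferred compression or decompression."""
    return ["mmrestripefile", "-z", path]


def run(cmd, dry_run=True):
    """Print the command line; execute it only when dry_run is False."""
    line = shlex.join(cmd)
    print(line)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return line
```

Keeping dry_run=True makes it possible to review the generated commands first, for example before scheduling the mmrestripefile step for a quiet period.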
Indicators of file compression or decompression
- COMPRESSED
- The mmlsattr command displays the COMPRESSED indicator on the Misc attributes line of its output. See the example of mmlsattr output in Figure 1. If the indicator is present, the file is compressed or is marked for deferred compression. If it is absent, the file is uncompressed or is marked for deferred decompression.
- This indicator reflects the state of the GPFS_IWINFLAG_COMPRESSED flag in the gpfs_iattr64_t structure of the inode of the file. For more information about this structure, see the topic gpfs_iattr64_t structure in the IBM Spectrum Scale: Administration and Programming Reference.
- illCompressed
- The mmlsattr command displays the illCompressed indicator on the flags line of its output. See Figure 1. If the indicator is present, the file is marked for compression or decompression but the operation is not completed. If it is absent, compression or decompression is completed.
- This indicator reflects the state of the GPFS_IAFLAG_ILLCOMPRESSED flag in the gpfs_iattr64_t structure of the inode of the file. For more information about this structure, see the topic gpfs_iattr64_t structure in the IBM Spectrum Scale: Administration and Programming Reference.
- Note: Some file system events can cause the illCompressed flag to be set. Consider the following examples:
- When data is written into an already compressed file, the existing data remains compressed but the new data is uncompressed. The illCompressed flag is set for this file.
- When a compressed file is memory-mapped, the memory-mapped area of the file is decompressed before it is read into memory. The illCompressed flag is set for this file.
Figure 1. Example of mmlsattr output
mmlsattr -L green02.51422500687
file name: green02.51422500687
metadata replication: 1 max 2
data replication: 2 max 2
immutable: no
appendOnly: no
flags: illCompressed
storage pool name: datapool
fileset name: root
snapshot name:
creation time: Wed Jan 28 19:05:45 2015
Misc attributes: ARCHIVE COMPRESSED
Encrypted: no
Together, the COMPRESSED and illCompressed indicators show the compressed or uncompressed state of the file. See the following table:
State of the file | COMPRESSED is displayed? | illCompressed is displayed? |
---|---|---|
Uncompressed. | No | No |
Decompression is not complete. | No | Yes |
Compressed. | Yes | No |
Compression is not complete. | Yes | Yes |
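The table can be applied mechanically to the text output of mmlsattr -L. The following Python sketch is one way to do so. The function name compression_state is hypothetical, and the parsing assumes that indicators on the flags line and the Misc attributes line are separated by white space or commas, as in Figure 1.

```python
STATES = {
    # (COMPRESSED displayed, illCompressed displayed) -> state of the file
    (False, False): "Uncompressed.",
    (False, True): "Decompression is not complete.",
    (True, False): "Compressed.",
    (True, True): "Compression is not complete.",
}


def compression_state(mmlsattr_output):
    """Map the two indicators in mmlsattr -L output to a file state."""
    compressed = False
    ill_compressed = False
    for line in mmlsattr_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        tokens = value.replace(",", " ").split()
        key = key.strip().lower()
        if key == "misc attributes":
            compressed = "COMPRESSED" in tokens
        elif key == "flags":
            ill_compressed = "illCompressed" in tokens
    return STATES[(compressed, ill_compressed)]
```

Applied to the output in Figure 1, where both indicators are displayed, the function returns "Compression is not complete."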
Updates to compressed files
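When data is written into a compressed file, the existing data remains compressed but the new data is uncompressed, and the illCompressed flag is set for the file. To recompress the file, run the mmrestripefile command with the -z option, as in the following example:
mmrestripefile -z trcrpt.150913.13.30.13.3518.txt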
The mmrestorefs command can cause a compressed file in the active file system to become decompressed if it is overwritten by the restore process. To recompress the file, run the mmrestripefile command with the -z option.
For more information, see the preceding subtopic Deferred file compression.
File compression and memory mapping
You can memory-map a file that is already compressed. The file system automatically decompresses the paged-in region and sets the illCompressed flag. To recompress the file, run the mmrestripefile command with the -z option.
As a convenience, the file system does not compress an uncompressed file or a partially decompressed file if the file is memory-mapped. Compressing the file would not be effective because memory mapping decompresses any compressed data in the regions that are paged in.
File compression and direct I/O
You can open a compressed file for direct I/O, but internally the direct I/O reads and writes are replaced by buffered decompressed I/O reads and writes.
As a convenience, the file system does not compress a file that is opened for direct I/O. Compressing the file would not be effective because direct I/O would be replaced by buffered decompressed I/O.
Backing up and restoring compressed files
Files are decompressed when they are moved out of storage that is directly managed by IBM Spectrum Scale. This fact affects file backups by products like IBM Spectrum Protect, Tivoli Storage Manager for Space Management (HSM), Linear Tape File System™ (LTFS), Transparent Cloud Tiering (TCT), and others. When you back up a file with these products, the file system decompresses the file data inline when it is read by the backup agent. The file system also sets the illCompressed flag in the file properties. The backed-up file data is not compressed.
When you restore a file to the IBM Spectrum Scale file system, the file data remains uncompressed but the illCompressed flag is still set. You can recompress the file by running mmrestripefs or mmrestripefile with the -z option.
Limitations
- File compression in this release is designed to be used only for compressing cold data or write-once objects and files. Compressing other types of data can result in performance degradation. File compression uses the zlib data compression library and favors saving space over speed.
- File compression processes consecutive segments of a file. For each segment, file compression calculates the potential savings in space. If the space savings are less than a certain threshold (10%), file compression does not compress the file segment but skips to the next segment.
- Direct I/O is not supported for compressed files.
- The following operations are not supported:
- Compressing files in snapshots
- Compressing a clone
- Compressing files in an AFM cache site or in an AFM-based asynchronous Disaster Recovery (DR) fileset.
- Compressing small files (files that consume fewer than two subblocks), including compressing small files into an inode.
- Compressing files other than regular files, such as directories.
- Compressing files in a File Placement Optimizer (FPO) environment or in horizontal storage pools.
- Cloning a compressed file
- On Windows:
- Compression or decompression with the mmapplypolicy command is not supported.
- Compression of files in Windows hyper allocation mode is not supported.
- The following Windows APIs are not supported:
- FSCTL_SET_COMPRESSION to enable/disable compression on a file
- FSCTL_GET_COMPRESSION to retrieve compression status of a file
- In Windows Explorer, in the Advanced Attributes window, the compression feature is not supported.
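The per-segment threshold described in the limitations above can be illustrated with a short Python sketch that uses the zlib module. This is not the file system's implementation. The segment size is a hypothetical value chosen for illustration, the function names are invented, and savings are measured here in bytes rather than in allocated disk space. Only the 10% threshold comes from the text above.

```python
import os
import zlib

SEGMENT_SIZE = 64 * 1024  # hypothetical segment size, for illustration only
MIN_SAVINGS = 0.10        # the 10% threshold mentioned in the limitations


def compress_segments(data, segment_size=SEGMENT_SIZE, min_savings=MIN_SAVINGS):
    """Return one (is_compressed, payload) tuple per consecutive segment."""
    result = []
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        packed = zlib.compress(segment)
        savings = 1 - len(packed) / len(segment)
        if savings < min_savings:
            # Too little gain: leave this segment uncompressed and move on
            result.append((False, segment))
        else:
            result.append((True, packed))
    return result


def decompress_segments(segments):
    """Rebuild the original data from the output of compress_segments."""
    return b"".join(zlib.decompress(p) if c else p for c, p in segments)
```

With this logic, a segment of zero bytes is stored compressed, while a segment of random bytes (as returned by os.urandom) is left as it is, because zlib cannot shrink it by 10%.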