File compression

You can compress or decompress files either with the mmchattr command or with the mmapplypolicy command with a MIGRATE rule. You can do the compression or decompression synchronously or defer it until a later call to mmrestripefile or mmrestripefs.

IBM Spectrum Scale™ V4.2 adds file compression to reduce the size of data at rest. File compression is intended primarily for cold data and favors saving space over access speed. File compression can be driven by policies that enable administrators to compress only files that have not been accessed for a specified time. Data is decompressed inline on each read access.

Comparison with object compression

File compression is a different feature from object compression. Both features compress files, and both can be policy-driven. However, object compression is available only through Cluster Export Services (CES) and is done with the mmobj command. File compression is available outside CES and is done with the mmapplypolicy command (policy-driven) or the mmchattr command (direct). Also, with file compression you can defer the compression or decompression operation until the system is not heavily loaded with processes and I/O. For more information about object compression, see the topic Administering storage policies for object storage.

When to use file compression

File compression in this release is designed to be used only for compressing cold data or write-once objects and files. Compressing other types of data can result in performance degradation. File compression uses the zlib data compression library and favors saving space over speed.
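The space-versus-speed trade-off of zlib can be seen with Python's standard zlib module. This is a minimal sketch only: the zlib compression level that IBM Spectrum Scale uses internally is not stated in this topic, so the levels compared below (1 and 9) and the helper name compare_levels are assumptions for illustration.

```python
import time
import zlib
from typing import Dict, Tuple


def compare_levels(data: bytes, levels=(1, 9)) -> Dict[int, Tuple[int, float]]:
    """Compress `data` at each zlib level; return {level: (compressed size, seconds)}.

    Illustrative only: the level IBM Spectrum Scale uses is not documented
    here, so level 1 (fastest) and level 9 (smallest output) are assumptions.
    """
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Every read of a compressed file must reverse this step, which is
        # why compression suits cold data that is rarely read.
        if zlib.decompress(packed) != data:
            raise RuntimeError("zlib round trip failed")
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    sample = b"2015-09-13 13:30:13 trace record\n" * 10_000
    for level, (size, secs) in compare_levels(sample).items():
        print(f"level {level}: {len(sample)} -> {size} bytes in {secs:.4f} s")
```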

Setting up file compression and decompression

The sample script /usr/lpp/mmfs/samples/ilm/mmcompress.sample, installed with IBM Spectrum Scale, provides examples of how to compress or decompress a fileset or a directory tree.

You can do file compression or decompression with either the mmchattr command or the mmapplypolicy command.
Note: File compression and decompression with the mmapplypolicy command is not supported on Windows.
With the mmchattr command, you specify the --compression option and the names of the files that you want to compress or decompress. For example, the following command compresses a file:
mmchattr --compression yes trcrpt.150913.13.30.13.3518.txt
The following command decompresses the same file:
mmchattr --compression no trcrpt.150913.13.30.13.3518.txt
For more information, see the topic mmchattr command in the IBM Spectrum Scale: Administration and Programming Reference.
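When compression is driven from a script, building the mmchattr command as an argument list rather than a shell string keeps unusual file names intact. The following is a minimal sketch; the helper name mmchattr_compression_cmd is hypothetical, and it uses only the --compression option shown above. It builds the command without running it.

```python
import shlex
from typing import List


def mmchattr_compression_cmd(path: str, compress: bool = True) -> List[str]:
    """Build the mmchattr argument list that compresses or decompresses one file.

    Hypothetical helper: it only constructs the list. On a node where mmchattr
    is installed, it could be run with subprocess.run(cmd, check=True).
    """
    return ["mmchattr", "--compression", "yes" if compress else "no", path]


if __name__ == "__main__":
    name = "trcrpt.150913.13.30.13.3518.txt"
    # Print the equivalent shell commands instead of executing them.
    print(shlex.join(mmchattr_compression_cmd(name)))
    print(shlex.join(mmchattr_compression_cmd(name, compress=False)))
```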
With the mmapplypolicy command, you create a MIGRATE rule that specifies the COMPRESS option and then run mmapplypolicy to apply the rule. For example, the following rule selects files in the storage pool datapool whose names begin with the string green and compresses them:
rule 'COMPR1' migrate from pool 'datapool' COMPRESS('yes') where name like 'green%'
The following rule migrates and decompresses the same set of files:
rule 'COMPR1' migrate from pool 'datapool' COMPRESS('no') where name like 'green%'
In the following example, the first rule excludes from compression any file whose name ends with .mpg or .jpg. The second rule automatically compresses any file that was not accessed in the last 30 days:
RULE 'NEVER_COMPRESS' EXCLUDE WHERE lower(NAME) LIKE '%.mpg' OR lower(NAME) LIKE '%.jpg'
RULE 'COMPRESS_COLD' MIGRATE COMPRESS('yes') WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > (INTERVAL '30' DAYS)
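If the age threshold or the list of excluded file types changes often, the two rules above can be generated rather than edited by hand. This is a minimal sketch; the helper name cold_compression_policy and its defaults are hypothetical and simply mirror the example. Because policy rules are evaluated in order, the EXCLUDE rule is emitted before the MIGRATE rule.

```python
from typing import Sequence


def cold_compression_policy(days: int = 30,
                            skip_exts: Sequence[str] = (".mpg", ".jpg")) -> str:
    """Render the EXCLUDE and MIGRATE ... COMPRESS('yes') rules shown above.

    Hypothetical helper: rule names and defaults copy the example and are not
    required by IBM Spectrum Scale.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    # Media formats such as .mpg and .jpg are already compressed, so zlib gains
    # little; exclude them first so that the MIGRATE rule never selects them.
    conditions = " OR ".join(f"lower(NAME) LIKE '%{ext.lower()}'" for ext in skip_exts)
    rules = []
    if conditions:
        rules.append(f"RULE 'NEVER_COMPRESS' EXCLUDE WHERE {conditions}")
    rules.append(
        "RULE 'COMPRESS_COLD' MIGRATE COMPRESS('yes') "
        f"WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > (INTERVAL '{days}' DAYS)"
    )
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    # Save this output to a file and apply it with: mmapplypolicy fs1 -P policyfile
    print(cold_compression_policy(days=30), end="")
```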
For more information, see the following help topics:
  • The topic mmchattr command in the IBM Spectrum Scale: Administration and Programming Reference
  • Overview of policies in the IBM Spectrum Scale: Advanced Administration Guide
  • Policy rules: Syntax in the IBM Spectrum Scale: Advanced Administration Guide

When you do file compression, you can defer the compression operation until a later time. For more information, see the subtopic Deferred file compression.

Warnings

Doing any of the following operations while the mmrestorefs command is running can corrupt file data:
  • Doing file compression or decompression.
  • Running the mmrestripefile command or the mmrestripefs command, either to complete a deferred file compression or decompression, or for any other reason.
Warning:
  • Do not run file compression or decompression while an mmrestorefs command is running. This warning includes compression or decompression with the mmchattr command or with the mmapplypolicy command.
  • Do not run the mmrestripefs or mmrestripefile command while an mmrestorefs command is running.

Reported size of compressed files

After a file is compressed, operating system commands such as ls -l display the uncompressed size. Use du to display the actual, compressed size of a file; the GPFS™ command mmdf reports the resulting space savings at the file system level. You can also call stat() to find how many blocks the file occupies.
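The stat() check can be sketched in Python, whose os.stat() wraps the same system call. This minimal sketch assumes that st_blocks is reported in 512-byte units, as on Linux; the helper name size_report is hypothetical. For a compressed file, the allocated bytes are expected to be well below the apparent size, although the exact figures depend on the block and subblock sizes of the file system.

```python
import os
import sys
from typing import Optional, Tuple


def size_report(path: str) -> Tuple[int, Optional[int]]:
    """Return (apparent size, allocated bytes) for `path` using stat().

    st_size is the logical length that ls -l shows; for a compressed file it is
    the uncompressed size. st_blocks counts 512-byte units on Linux and most
    UNIX systems and is what du is based on. It does not exist on Windows,
    where None is returned for the allocated size.
    """
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    return st.st_size, (blocks * 512 if blocks is not None else None)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        apparent, allocated = size_report(name)
        print(f"{name}: ls -l size {apparent} bytes, allocated {allocated} bytes")
```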

Deferred file compression

By default, the command that starts a file compression or decompression does not return until the operation is complete. However, with both the mmchattr command and the mmapplypolicy command, you can defer the compression or decompression operation and have the command return as soon as it completes any other operations. By deferring the operation, you can complete it later when the system is not heavily loaded with processes or I/O.

To defer the compression, with either command, specify the -I defer option. For example, the following command marks the specified file as needing compression but defers the compression operation:
mmchattr -I defer --compression yes trcrpt.150913.13.30.13.3518.txt
With the mmapplypolicy command, the -I defer option defers compression or decompression as well as data movement or deletion. For example, the following command applies the rules in the file policyfile but defers the file operations that are specified in the rules, including compression or decompression:
mmapplypolicy fs1 -P policyfile -I defer
To complete a deferred compression or decompression, run the mmrestripefile command or the mmrestripefs command with the -z option. (Do not run either of these commands if an mmrestorefs command is running. See the warnings in the preceding subtopic Warnings.) The following command completes the deferred compression or decompression of the specified file:
mmrestripefile -z trcrpt.150913.13.30.13.3518.txt
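The two-step deferred workflow (mark now, complete later) can be captured in a small helper that an administrator's scheduler might call. This is a minimal sketch; the helper name deferred_compression_plan is hypothetical, it only builds argument lists with the options shown above, and it does not check whether mmrestorefs is running. That check must be done separately before the second step.

```python
from typing import List


def deferred_compression_plan(path: str, compress: bool = True) -> List[List[str]]:
    """Return the commands for a deferred compression or decompression of one file.

    Hypothetical helper that only builds argument lists. Step 1 marks the file
    and returns quickly; step 2 does the actual work and is meant to run later,
    when the system is quiet. It does NOT verify that mmrestorefs is idle;
    confirm that separately before running step 2 (see Warnings).
    """
    mark = ["mmchattr", "-I", "defer", "--compression",
            "yes" if compress else "no", path]
    complete = ["mmrestripefile", "-z", path]
    return [mark, complete]


if __name__ == "__main__":
    plan = deferred_compression_plan("trcrpt.150913.13.30.13.3518.txt")
    for step, cmd in enumerate(plan, 1):
        print(f"step {step}: {' '.join(cmd)}")
```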

Indicators of file compression or decompression

The mmlsattr command displays two indicators that together describe the state of compression or decompression of the specified file:
COMPRESSED
The mmlsattr command displays the COMPRESSED indicator on the Misc attributes line of its output. See the example of mmlsattr output in Figure 1. If COMPRESSED is present, the file is compressed or is marked for deferred compression. If it is absent, the file is uncompressed or is marked for deferred decompression.

This indicator reflects the state of the GPFS_IWINFLAG_COMPRESSED flag in the gpfs_iattr64_t structure of the inode of the file. For more information about this structure, see the topic gpfs_iattr64_t_structure in the IBM Spectrum Scale: Administration and Programming Reference.

illCompressed
The mmlsattr command displays the illCompressed indicator on the flags line of its output. See Figure 1. If illCompressed is present, the file is marked for compression or decompression but the operation is not complete. If it is absent, compression or decompression is complete.

This indicator reflects the state of the GPFS_IAFLAG_ILLCOMPRESSED flag in the gpfs_iattr64_t structure of the inode of the file. For more information about this structure, see the topic gpfs_iattr64_t_structure in the IBM Spectrum Scale: Administration and Programming Reference.

Note: Some file system events can cause the illCompressed flag to be set. Consider the following examples:
  • When data is written into an already compressed file, the existing data remains compressed but the new data is uncompressed. The illCompressed flag is set for this file.
  • When a compressed file is memory-mapped, the memory-mapped area of the file is decompressed before it is read into memory. The illCompressed flag is set for this file.
For more information, see the subtopic Updates to compressed files.
In the following example, the output from the mmlsattr command includes both the COMPRESSED indicator and the illCompressed indicator. This combination indicates that the file is marked for compression but that compression is not completed:
Figure 1. Compression and decompression indicators
mmlsattr -L green02.51422500687
file name:            green02.51422500687
metadata replication: 1 max 2
data replication:     2 max 2
immutable:            no 
appendOnly:           no
flags:                illCompressed
storage pool name:    datapool
fileset name:         root
snapshot name:
creation time:        Wed Jan 28 19:05:45 2015
Misc attributes:      ARCHIVE COMPRESSED
Encrypted:            no
       

Together, the COMPRESSED and illCompressed indicators describe the compression state of the file, as shown in the following table:

Table 1. COMPRESSED and illCompressed indicators
State of the file                COMPRESSED displayed?   illCompressed displayed?
Uncompressed                     No                      No
Decompression is not complete    No                      Yes
Compressed                       Yes                     No
Compression is not complete      Yes                     Yes
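Scripts that audit many files can apply the table above to captured mmlsattr -L output. This is a minimal sketch; the function name compression_state is hypothetical, and it assumes the "flags:" and "Misc attributes:" line layout shown in Figure 1. Other releases might format the output differently, so the parser tolerates either spaces or commas between flag names.

```python
def compression_state(mmlsattr_output: str) -> str:
    """Map the text of `mmlsattr -L <file>` to a row of Table 1.

    Assumes the layout shown in Figure 1 ("flags:" and "Misc attributes:" lines).
    """
    flags, misc = set(), set()
    for line in mmlsattr_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        # Accept either spaces or commas between indicator names.
        words = set(value.replace(",", " ").split())
        if key == "flags":
            flags = words
        elif key == "Misc attributes":
            misc = words
    states = {
        (False, False): "Uncompressed",
        (False, True): "Decompression is not complete",
        (True, False): "Compressed",
        (True, True): "Compression is not complete",
    }
    return states[("COMPRESSED" in misc, "illCompressed" in flags)]
```

Applied to the output in Figure 1, the function returns "Compression is not complete".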

Updates to compressed files

When a compressed file is updated by a write operation, the file system automatically decompresses the region of the file that contains the affected data and sets the illCompressed flag. The file system then makes the update. To recompress the file, run the mmrestripefile command with the -z option, as in the following example:
mmrestripefile -z trcrpt.150913.13.30.13.3518.txt

The mmrestorefs command can cause a compressed file in the active file system to become decompressed if it is overwritten by the restore process. To recompress the file, run the mmrestripefile command with the -z option.

For more information, see the preceding subtopic Deferred file compression.

File compression and memory mapping

You can memory-map a file that is already compressed. The file system automatically decompresses the paged-in region and sets the illCompressed flag. To recompress the file, run the mmrestripefile command with the -z option.

As a convenience, the file system does not compress an uncompressed or partially decompressed file if the file is memory-mapped. Compressing the file would not be effective because memory mapping decompresses any compressed data in the regions that are paged in.

File compression and direct I/O

You can open a compressed file for Direct I/O, but internally the direct I/O reads and writes are replaced by buffered decompressed I/O reads and writes.

As a convenience, the file system does not compress a file that is opened for Direct I/O. Compressing the file would not be effective because direct I/O would be replaced by buffered decompressed I/O.

Backing up and restoring compressed files

Files are decompressed when they are moved out of storage that is directly managed by IBM Spectrum Scale. This fact affects file backups by products like IBM Spectrum Protect, Tivoli Storage Manager for Space Management (HSM), Linear Tape File System™ (LTFS), Transparent Cloud Tiering (TCT), and others. When you back up a file with these products, the file system decompresses the file data inline when it is read by the backup agent. The file system also sets the illCompressed flag in the file properties. The backed-up file data is not compressed.

When you restore a file to the IBM Spectrum Scale file system, the file data remains uncompressed but the illCompressed flag is still set. You can recompress the file by running mmrestripefs or mmrestripefile with the -z option.
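After a large restore, the files to recompress are those that show the illCompressed flag. The following minimal sketch selects them and builds one mmrestripefile -z command per file. The helper name recompress_commands, the input format (a mapping from path to the value of its "flags:" line from mmlsattr -L, collected separately), and the example paths are hypothetical.

```python
from typing import Dict, List


def recompress_commands(flags_by_path: Dict[str, str]) -> List[List[str]]:
    """Build one mmrestripefile -z command per file whose flags include illCompressed.

    `flags_by_path` maps a path to its mmlsattr -L "flags:" value (hypothetical
    input format). Do not run the resulting commands while mmrestorefs is running.
    """
    commands = []
    for path, flags in sorted(flags_by_path.items()):
        # Accept either spaces or commas between flag names (separator is an assumption).
        if "illCompressed" in flags.replace(",", " ").split():
            commands.append(["mmrestripefile", "-z", path])
    return commands


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    restored = {"/gpfs/fs1/a.log": "illCompressed", "/gpfs/fs1/b.log": ""}
    for cmd in recompress_commands(restored):
        print(" ".join(cmd))
```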

Limitations

File compression has the following limitations:
  • File compression in this release is designed to be used only for compressing cold data or write-once objects and files. Compressing other types of data can result in performance degradation. File compression uses the zlib data compression library and favors saving space over speed.
  • File compression processes consecutive segments of a file. For each segment, file compression calculates the potential space savings. If the savings are less than a threshold of 10%, file compression does not compress that segment and skips to the next one.
  • Direct I/O is not supported for compressed files.
  • The following operations are not supported:
    • Compressing files in snapshots
    • Compressing a clone
    • Compressing files in an AFM cache site or in an AFM-based asynchronous Disaster Recovery (DR) fileset.
    • Compressing small files (files that occupy fewer than two subblocks) or compressing small files into an inode.
    • Compressing files other than regular files, such as directories.
    • Compressing files in a File Placement Optimizer (FPO) environment or in horizontal storage pools.
    • Cloning a compressed file
  • On Windows:
    • Compression or decompression with the mmapplypolicy command is not supported.
    • Compression of files in Windows hyper allocation mode is not supported.
    • The following Windows APIs are not supported:
      • FSCTL_SET_COMPRESSION to enable/disable compression on a file
      • FSCTL_GET_COMPRESSION to retrieve compression status of a file
    • In Windows Explorer, in the Advanced Attributes window, the compression feature is not supported.
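The per-segment 10% rule in the limitations above can be modeled with Python's zlib module. This is an illustrative model only, not the IBM Spectrum Scale on-disk format: the segment size used here (SEGMENT_SIZE, 64 KiB) and the function name compress_segments are assumptions, because the actual segment size is not stated in this topic.

```python
import os
import zlib

SEGMENT_SIZE = 64 * 1024  # assumption: the real segment size is not documented here
MIN_SAVINGS = 0.10        # the 10% threshold described in the limitations


def compress_segments(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Model the per-segment decision: keep zlib output only if it saves at least 10%.

    Returns a list of (compressed?, stored_bytes) tuples, one per segment.
    """
    out = []
    for start in range(0, len(data), segment_size):
        segment = data[start:start + segment_size]
        packed = zlib.compress(segment)
        savings = 1 - len(packed) / len(segment)
        if savings < MIN_SAVINGS:
            out.append((False, segment))  # too little gain: store the segment as-is
        else:
            out.append((True, packed))
    return out


if __name__ == "__main__":
    # One highly compressible segment (zeros) followed by one incompressible segment.
    data = bytes(SEGMENT_SIZE) + os.urandom(SEGMENT_SIZE)
    for i, (was_compressed, stored) in enumerate(compress_segments(data)):
        print(f"segment {i}: compressed={was_compressed}, stored {len(stored)} bytes")
```

Random data typically grows slightly under zlib, so the second segment is stored uncompressed, which mirrors why already-compressed media gains little from file compression.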