HAMMER compression
naota
Short description: This project adds a compression feature to the HAMMER filesystem. HAMMER is the default filesystem on DragonFly. It provides fine-grained history/undo and historical snapshot features, so historical data is kept for a long period of time. Compression should be transparent to userspace. It should be possible to specify what to compress at several granularities: per-filesystem, per-directory, or per-file.
Name: Naohiro Aota
Email: naota@elisp.net
Phone number (include country and area code):
Website: http://elisp.net/
Project title: HAMMER compression
Description of project goals:
HAMMER is the default filesystem on DragonFly. It provides instant crash recovery, multi-volume file systems, integrity checking, fine-grained history/undo, networked mirroring, and historical snapshots.
HAMMER has a "dedup" feature: it checks the data blocks in the filesystem and merges duplicated blocks into one, saving disk space. In the same way, this project adds a "compression" feature to HAMMER: it checks whether a data block is uncompressed and is required to be compressed, and if so compresses it. What gets compressed should be configurable per-filesystem, per-directory, or per-file. I therefore also need to implement userland tools to specify what should be compressed and to determine how much disk space compression saves.
So the project steps are:
- Implement "compressed flag" metadata
- Implement an ioctl to dump hardcoded compressed data to a file
- Implement uncompression
- Implement a userland tool to search for and compress blocks
- Implement the ioctl (or a similar kernel interface) that compresses blocks on request
- Document the feature and the implementation
For this project, I'll use gzip to compress the data. To allow for future extension, I'll design the implementation so that compression methods can be replaced or added, and I'll also implement a "compression method" record.
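To illustrate the idea of a "compressed flag" combined with a replaceable compression method record, here is a minimal userland sketch in Python. It is not HAMMER code: the names `BlockRecord`, `CODECS`, `METHOD_NONE` and `METHOD_ZLIB` are hypothetical, the method ids are not real on-disk values, and Python's `zlib` module (DEFLATE, the algorithm gzip uses) stands in for the kernel-side compressor. Storing a block raw when compression does not shrink it is an assumption of this sketch, not something the plan above specifies.

```python
import zlib
from dataclasses import dataclass

# Hypothetical method identifiers; these are NOT actual HAMMER on-disk values.
METHOD_NONE = 0
METHOD_ZLIB = 1

# Registry mapping a method id to (compress, decompress) callables.
# Supporting another method would mean adding one entry here.
CODECS = {
    METHOD_NONE: (lambda data: data, lambda data: data),
    METHOD_ZLIB: (zlib.compress, zlib.decompress),
}


@dataclass
class BlockRecord:
    method: int      # which codec produced `payload`
    orig_len: int    # length of the uncompressed data
    payload: bytes   # bytes as they would be stored

    @property
    def compressed(self) -> bool:
        # The "compressed flag" is derived from the method record.
        return self.method != METHOD_NONE


def store_block(data: bytes, method: int = METHOD_ZLIB) -> BlockRecord:
    """Compress a block, falling back to raw storage if it does not shrink."""
    compress, _ = CODECS[method]
    packed = compress(data)
    if len(packed) >= len(data):
        return BlockRecord(METHOD_NONE, len(data), data)
    return BlockRecord(method, len(data), packed)


def load_block(rec: BlockRecord) -> bytes:
    """Return the original data, using the codec named in the record."""
    _, decompress = CODECS[rec.method]
    data = decompress(rec.payload)
    if len(data) != rec.orig_len:
        raise ValueError("length mismatch after decompression")
    return data
```

Because the reader looks up the codec from the record rather than assuming gzip, blocks written with different methods could coexist in one filesystem.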
About myself
Skills and Experience
- Kernel-related work
  - Fixed a Linux kconfig bug
  - Fixed/improved the Linux idr documentation
- DragonFly-related work
  - Ported Gentoo to DragonFly (last year's GSoC)
I build and test the Linux kernel on both x86 and amd64 machines. I'm also working on porting the FreeBSD kernel to the Gdium (MIPS).
Outside the kernel, I ported the Gentoo system to DragonFly BSD as last year's GSoC project and am now a Gentoo developer, so I have deep knowledge of system packages and toolchain components such as gcc, binutils, and the dynamic loader. I also work on some Emacs-related projects; for example, I'm a maintainer of emacs-w3m and navi2ch. I write code in Emacs Lisp, C, C++, Haskell, Perl, PHP, and Python. I love learning languages (including natural ones).
Project timeline broken down by week:
(Community Bonding Period):
Prepare for the coding work. Read the documentation and code on the following:
- HAMMER metadata (for CRC, compression recording)
- HAMMER data block implementation
- HAMMER dedup implementation
- How the HAMMER double-buffer works
Week 1:
Implement HAMMER block "compressed flag" metadata.
Week 2:
Implement an ioctl to dump hardcoded compressed data to a file. All blocks of the file should be marked "compressed". Confirm that reading from the file returns the compressed data and that the data can be uncompressed properly with the userland gzip tool.
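The Week 2 check can be sketched in userland as follows. This is only an illustration under assumptions: the helper names `make_test_payload` and `check_readback` are hypothetical, and Python's `gzip` module (which reads and writes the same gzip format as the command-line tool) stands in for both the hardcoded payload and the userland gzip tool.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def make_test_payload(plain: bytes) -> bytes:
    """Build a deterministic gzip payload, standing in for the hardcoded data."""
    # mtime=0 keeps the header identical between runs.
    return gzip.compress(plain, mtime=0)


def check_readback(readback: bytes, expected_plain: bytes) -> bool:
    """Return True if `readback` is gzip data that expands to `expected_plain`."""
    if not readback.startswith(GZIP_MAGIC):
        return False
    try:
        return gzip.decompress(readback) == expected_plain
    except (OSError, EOFError, zlib.error):
        # Corrupt or truncated stream.
        return False
```

In the real test, `readback` would be the bytes read from the file that the ioctl populated, before kernel-side uncompression exists.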
Week 3:
Begin implementing the kernel-side reading feature: uncompress the data when the file is read.
Week 4:
Finish implementing the reading feature. Confirm the data is properly uncompressed.
Write a userland tool that scans a PFS and requests compression of the blocks in its files.
Week 5:
Begin writing the kernel-side feature (ioctl) that compresses data blocks on request.
Week 6:
Finish the ioctl implementation. Confirm the following:
- file data blocks are marked "compressed"
- the data is actually compressed
- the uncompressed data can be read
Week 7:
Implement a tool to determine how well the data is compressed (or would be compressed).
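As a rough idea of what such a tool could compute, the sketch below estimates the savings for a byte string by compressing it block by block. The 16 KiB `BLOCK_SIZE` and the function names are assumptions for illustration only, `zlib` again stands in for the real compressor, and counting a block at its raw size when compression does not help is a choice made for this sketch.

```python
import zlib

BLOCK_SIZE = 16384  # hypothetical block size used only for this estimate


def estimate_savings(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (raw_bytes, stored_bytes) if `data` were compressed per block."""
    raw = 0
    stored = 0
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        raw += len(block)
        # A block that does not shrink is counted at its raw size.
        stored += min(len(block), len(zlib.compress(block)))
    return raw, stored


def ratio(raw: int, stored: int) -> float:
    """Stored size as a fraction of raw size (1.0 means no savings)."""
    return stored / raw if raw else 1.0
```

Running the same estimate on data that is not yet compressed would answer the "would be compressed" half of the question without modifying anything on disk.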
Week 8: (mid-term)
Mid-term evaluation. Reading and writing of gzip-compressed data should be done by this point, and it should be possible to determine how well the data is compressed. Benchmark the read speed from compressed data.
Week 9:
Improvement: implement CRC metadata. Writes should record the CRC and reads should verify it.
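The record-on-write, verify-on-read behaviour can be sketched like this. The `StoredBlock` type and function names are hypothetical, and computing the CRC over the compressed payload (rather than over the uncompressed data) is an assumption of this sketch; the plan above does not say which one the CRC covers.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredBlock:
    crc: int        # CRC-32 of `payload`, recorded at write time
    payload: bytes  # compressed bytes as stored


def write_block(data: bytes) -> StoredBlock:
    """Compress `data` and record a CRC of the stored bytes."""
    payload = zlib.compress(data)
    return StoredBlock(crc=zlib.crc32(payload), payload=payload)


def read_block(blk: StoredBlock) -> bytes:
    """Verify the recorded CRC before uncompressing."""
    if zlib.crc32(blk.payload) != blk.crc:
        raise ValueError("CRC mismatch: stored block is corrupt")
    return zlib.decompress(blk.payload)
```

Checking the CRC before uncompressing means a corrupt block is rejected without feeding damaged input to the decompressor.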
Week 10:
Improvement: implement compression of historical blocks only.
Week 11:
Continue implementing the historical-blocks-only compression feature.
Document the userland tools.
Week 12:
Document the kernel-side features, including HAMMER internals.
Week 13: (pencils down)
Document HAMMER internals.
I'm placing somewhat more emphasis on documentation. To help later HAMMER work, I'd like to write good documentation of HAMMER's internal data structures and their implementation.
