Skip to main content
  1. Projects/

huffman coding

251 words·2 mins· ·

A few weeks ago I read a couple of articles online about efficient data compression. It made me remember that during bachelor’s degree we saw Huffman coding, an algorithm that generates prefix-free binary codes starting from a set of characters and their frequencies. It can be used to efficiently compress file because the characters with the highest frequencies are converted into short bitstrings. I wanted to try to implement it from scratch using Go.

The project became a command line application (cli) for lossless file compression featuring vanilla Huffman coding. The cli has been developed using cobra.

To try Huffman coding I downloaded a .txt version of the Divine Comedy by Dante Alighieri and I tried to compress it using the cli.

divine-comedy

After compressing it with the command ./huffman c divine-comedy-it.txt the cli generates two files:

  • divine-comedy-it.txt.bin: the compressed file
  • divine-comedy-it.txt.metadata: contains the dictionary used to compress the original file

As shown below the .bin version is almost one half of the size (291KB) compared to the original file (530KB), in fact it is approximately 45% smaller.

term-screenshot

To decompress the file and get back the original version the cli provides a command which takes as inputs the compressed file and its metadata. You can even specify the output filename with the flag -o as shown below where -o output.txt is passed.

term-screenshot

Once the cli decompress the file you can see the file back to normal!

divine-comedy-decompressed

Repository on GitHub

micheledinelli/huffman-coding

Huffman coding implementation in Go for lossless file compression

Go
0
0