1.3.1 Compression, Encryption and Hashing

Compression: The method used to make files smaller by reducing the number of bits (1s and 0s) used to store the information.

  • Reduction of file size is useful when it comes to sharing and transmitting data over the internet. Images on websites need to be in a compressed format to allow a webpage to load at a faster rate. It will also allow files to download at a faster rate.
  • Helps to reduce the size of files so that more data can be stored on a storage device.

Lossy Compression: An algorithm is applied to remove unnecessary detail from the original file.

  • Some data is permanently lost, but enough remains for the file to stay useful. At moderate compression levels the difference is barely noticeable.
  • Lossy compression results in dramatic file size reduction.
  • Removes non-essential information from a file. The information lost in the process is not recoverable.
  • Used for image, video and sound files such as JPEG, MPEG and MP3.
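
The idea of discarding non-essential detail can be sketched in a few lines. This is only an illustration, assuming 8-bit samples (for example pixel brightness values) and a hypothetical scheme that keeps the top 4 bits of each one; real formats such as JPEG and MP3 use far more sophisticated methods.

```python
def quantise(samples, keep_bits=4):
    """Lossy step: keep only the top `keep_bits` bits of each 8-bit sample."""
    shift = 8 - keep_bits
    return [s >> shift for s in samples]

def reconstruct(codes, keep_bits=4):
    """Approximate the original samples; the discarded low bits cannot be recovered."""
    shift = 8 - keep_bits
    return [c << shift for c in codes]

original = [200, 201, 203, 198, 54, 55, 53]   # hypothetical 8-bit samples
codes = quantise(original)                    # each code now needs 4 bits, not 8
restored = reconstruct(codes)

print(codes)                 # [12, 12, 12, 12, 3, 3, 3]
print(restored)              # [192, 192, 192, 192, 48, 48, 48]
print(restored == original)  # False: the fine detail is permanently lost
```

The restored values are close to the originals but not identical, which is the trade-off lossy compression makes for a smaller file.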

Lossless Compression: An algorithm is used to retain all the information in a file while reducing its size.

  • Records patterns in the data rather than the data itself. Using these patterns and a set of instructions on how to apply them, the computer can reverse the procedure and reassemble an image, sound or text file exactly, with no data lost.
  • None of the original data is lost.
  • An algorithm can be used to perfectly restore the original file when needed.
  • This is useful for executable files, where all of the data is necessary.
  • Lossless compression typically gives a more moderate reduction in file size than lossy compression.
  • Used in ZIP, GIF & PNG.
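
The round trip described above can be checked with Python's standard `zlib` module, which implements DEFLATE, the method commonly used inside ZIP and PNG files. The sample text is an assumption chosen because it repeats heavily; less repetitive data would shrink less.

```python
import zlib

# Hypothetical sample data with plenty of repetition.
original = b"the cat sat on the mat. " * 50

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed size is much smaller
print(restored == original)            # True: every byte is recovered
```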

Run Length Encoding (RLE): This is a form of lossless compression that replaces runs of repeated values with a more compact representation. Each run is replaced by a code that records the value and the number of times it is repeated.

  • In images, adjacent pixels are likely to be similar colours but slightly different. Image compression algorithms often group these pixels together and give them an ‘average’ colour. Run Length Encoding can then be applied to the image. Used this way, the technique becomes lossy.
  • Each run is stored as the repeated item (a pixel, word or other grouping of bits) together with its number of occurrences. For example: 100 blue pixels can be stored as B100.
  • This is used in TIFF and BMP files.
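
A minimal sketch of run length encoding, assuming each pixel is represented by a single letter (B for blue, W for white, R for red). The function names are illustrative, and real file formats pack the runs into bytes rather than Python tuples.

```python
from itertools import groupby

def rle_encode(text):
    """Return a list of (symbol, run_length) pairs, one per run."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original string by repeating each symbol run_length times."""
    return "".join(symbol * count for symbol, count in pairs)

pixels = "BBBBBBWWWBBRRRR"
encoded = rle_encode(pixels)

print(encoded)                        # [('B', 6), ('W', 3), ('B', 2), ('R', 4)]
print(rle_decode(encoded) == pixels)  # True: RLE on its own is lossless
print(rle_encode("B" * 100))          # [('B', 100)], i.e. B100
```

RLE only helps when the data contains long runs; on data with few repeats the encoded form can be larger than the original.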

Dictionary-based Compression Techniques (Dictionary Coding): This is used to store text in a lossless format.

  • The compression algorithm searches through the text to find suitable entries in its own dictionary or it may use a known dictionary and translate the message accordingly.
  • Each word is replaced by the binary number of the word in a dictionary.
  • Each time it finds a new word which is not in its dictionary, it will add it to the dictionary and give it a binary number.
  • The word in the actual text is replaced with the binary number.
  • Benefit: It takes fewer bits to store a 2-digit binary number than a 10-letter word. No data is lost, so the coding is lossless.
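
The steps above can be sketched as follows. This is a simplified illustration that assumes words are separated by single spaces and that the dictionary is sent along with the codes; the function names and the sample sentence are made up for the example.

```python
def dict_encode(text):
    """Replace each word with its index in a dictionary built on the fly."""
    dictionary = {}
    codes = []
    for word in text.split():
        if word not in dictionary:
            # New word: add it and give it the next free number.
            dictionary[word] = len(dictionary)
        codes.append(dictionary[word])
    return dictionary, codes

def dict_decode(dictionary, codes):
    """Look each number up in the dictionary to restore the text."""
    words = {code: word for word, code in dictionary.items()}
    return " ".join(words[code] for code in codes)

message = ("ask not what your country can do for you "
           "ask what you can do for your country")
dictionary, codes = dict_encode(message)

print(codes)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 2, 8, 5, 6, 7, 3, 4]
print([format(code, "04b") for code in codes[:3]])   # ['0000', '0001', '0010']
print(dict_decode(dictionary, codes) == message)     # True
```

With 9 distinct words, each of the 17 words in the message fits in a 4-bit code, although the dictionary itself also has to be stored or already known to the receiver.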

Encryption: The transformation of data from one form to another to prevent an unauthorised third party from being able to understand it. Only the intended recipient will know how to decode the data.

  • It is a method of protecting data by scrambling the contents using an algorithm, which makes use of a key, so that data cannot be read unless the correct key is provided.

Encryption Keys: Many encryption methods depend upon keys, which must be kept secret. In some methods, such as RSA, the keys are derived from a pair of very large prime numbers. It would take an impractical amount of time to guess the key, so we say that the encryption is secure.

Symmetric (Private Key) Encryption: This uses the same key to encrypt and decrypt data. The key must therefore also be transferred to the same destination as the cipher text, a step known as key exchange. This is a security weakness: the key can be intercepted as easily as the cipher text, and anyone who holds both can decrypt the data.
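
The defining property, one key for both directions, can be shown with a toy XOR cipher. This is a teaching sketch only and is not secure; the key and message are made up, and real systems use vetted ciphers such as AES.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"shared-secret"         # both sender and recipient must hold this key
plaintext = b"Meet at noon"

ciphertext = xor_cipher(plaintext, key)
recovered = xor_cipher(ciphertext, key)

print(recovered == plaintext)                           # True
print(xor_cipher(ciphertext, b"wrong-key") == plaintext)  # False
```

Anyone who intercepts `key` can run the same function on the cipher text, which is why exchanging the key safely is the hard part of symmetric encryption.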

Asymmetric (Public Key) Encryption: This uses two separate but related keys. One key, known as the public key, is made public so that others wishing to send you data can use it to encrypt the data. The public key cannot decrypt the data. Instead, a second key, the private key, is known only to you and is used to decrypt it. It is computationally infeasible to deduce the private key from the public key.

  • However, a malicious third party impersonating a trusted individual could encrypt a message with your public key and send it to you. To prevent this, a message can be digitally ‘signed’ to authenticate the sender.
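
The relationship between the two keys can be illustrated with textbook RSA, one well-known public key scheme (the notes above do not name a specific algorithm, so RSA is an assumption here). The primes are tiny so the arithmetic is easy to follow; real keys use primes hundreds of digits long plus padding, and this sketch is not secure.

```python
# Toy key generation from two small primes (illustrative values only).
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: (e, n) is the public key
d = pow(e, -1, phi)        # private exponent 2753; needs Python 3.8+

message = 65               # a message encoded as a number smaller than n

ciphertext = pow(message, e, n)    # anyone can encrypt with the public key
recovered = pow(ciphertext, d, n)  # only the private key holder can decrypt

print(ciphertext)  # 2790
print(recovered)   # 65
```

With numbers this small an attacker could factorise `n` instantly and recover `d`; the security of the real scheme rests on that factorisation being impractical for very large primes.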

Hashing: This is the process of taking an input and performing a calculation on it that outputs a value of fixed size. The output is known as a hash.

  • The hash function is non-invertible. This means a hash cannot feasibly be reversed to get back to the original data that was inputted.
  • It is useful for storing passwords as hashes rather than as plain text, so that a hacker who obtains the stored values cannot read the passwords.

Cryptographic Hash Functions: A hash total is a mathematical value calculated from unencrypted message data. This value is also referred to as a checksum or digest.

  • The process is irreversible: the only practical way to find an input that produces a given hash is to try possible inputs until a match is found.
  • Since the hash total is generated from the entire message, even the slightest change in the message will produce a different total.
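
Both properties, the fixed output size and the sensitivity to small changes, can be observed with Python's standard `hashlib` module. SHA-256 is used here as one example of a cryptographic hash function; the notes do not specify a particular algorithm, and the messages are made up.

```python
import hashlib

def digest(message: str) -> str:
    """Return the SHA-256 hash of a message as a hexadecimal string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

short_digest = digest("hi")
long_digest = digest("hi" * 10_000)

# The output size is fixed regardless of input size (64 hex digits = 256 bits).
print(len(short_digest), len(long_digest))  # 64 64

# Changing one character gives a completely different digest.
a = digest("Pay Bob 10 pounds")
b = digest("Pay Bob 11 pounds")
print(a == b)   # False

# The same input always gives the same digest.
print(digest("hi") == short_digest)  # True
```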

Digital Signatures (or Hash Value): This is the equivalent of a handwritten signature or security stamp, but offers greater security.

  • The sender of the message uses their own private key to encrypt the hash total. The encrypted total becomes the digital signature since only the holder of the private key could have encrypted it.
  • The signature is attached to the message to be sent and the whole message including the digital signature is encrypted using the recipient’s public key before being sent.
  • The recipient decrypts the message using their private key, and decrypts the signature using the sender’s public key. The hash total is then reproduced based on the message data and if this matches the total in the digital signature, it is certain that the message genuinely came from the sender and that no parts of the message were changed during the transmission.
  • To ensure that the message could not be copied and resent at a later date, the time and date can be included in the original message, which if altered, would cause a different hash total to be generated.
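
The sign-and-verify steps can be sketched by combining a hash with textbook RSA. Everything specific here is an assumption for illustration: the key pair is built from two known Mersenne primes, the hash total is SHA-256 reduced modulo `n`, and the step of encrypting the whole message with the recipient's public key is left out. It is far too weak for real use, where a vetted cryptography library should be used instead.

```python
import hashlib

# Toy sender key pair (illustrative only; real keys are much larger).
p, q = 2**61 - 1, 2**89 - 1          # two Mersenne primes
n = p * q
e = 65537                            # sender's public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # sender's private exponent; Python 3.8+

def hash_total(message: bytes) -> int:
    """Hash the message and reduce it to a number smaller than n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Sender: encrypt the hash total with the private key."""
    return pow(hash_total(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Recipient: decrypt with the sender's public key and compare hash totals."""
    return pow(signature, e, n) == hash_total(message)

# A date and time inside the message guards against it being resent later.
message = b"2024-01-01T12:00 Pay Bob 10 pounds"
signature = sign(message)

print(verify(message, signature))                                 # True
print(verify(b"2024-01-01T12:00 Pay Bob 99 pounds", signature))   # False
print(verify(message, (signature + 1) % n))                       # False
```

The second check fails because the altered message produces a different hash total, and the third fails because a signature not made with the private key does not decrypt to the right value.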
