Wednesday, August 17, 2022

Encryption, Encoding, and Hashes, Oh My!


Maybe it's because many people have entered the cybersecurity field in the past few years, but I've been seeing basic security terms used incorrectly lately. As security professionals, we need to have a good handle on the fundamentals to communicate clearly and build trust. Some terms can be confusing at first blush. With this post I'll explain, at a high level, a few basic terms like encryption, encoding, and hashing that often get mixed up.

But first, let's talk about the CIA. No, not the Central Intelligence Agency. The CIA I'm referring to is Confidentiality, Integrity, and Availability, also known as the CIA triad. The CIA triad was pounded into my brain when I first got into application security and has stuck with me ever since. It is a fundamental concept everyone in cybersecurity should understand. When a vulnerability is exploited or a cyberattack is successful, it will negatively impact one or more of these areas.

  • Confidentiality means preventing data from being accessed or viewed unless it is properly authorized.
  • Integrity means protecting data against unauthorized changes. Without integrity, data is untrustworthy.
  • Availability means access to data and systems is maintained. A denial-of-service (DoS) attack aims to prevent or reduce availability.


Now let's dive into encryption, encoding, and hashing.

Encryption is typically used to protect confidentiality of data and often the integrity of data as well. Encrypted data is also known as ciphertext and it looks like gibberish to the human eye. The one thing that should always jump out to you when you hear the word "encryption" or "encrypted data" is that the data can be decrypted. Decrypting data means you're reversing the encryption. Decrypted data is also called plaintext or cleartext.

Data becomes encrypted by running it through an encryption algorithm using a key. A key is typically a random string of bytes of a fixed length, or is derived from a password. There are two main encryption types: symmetric and asymmetric. The main point to take away is that symmetric encryption uses the same key for encrypting and decrypting data, while asymmetric encryption uses a pair of keys, a public one and a private one. With asymmetric encryption, the public key is used to encrypt data and the private key is used to decrypt it. If you see the term "public-key cryptography" or "public key infrastructure", it means that asymmetric encryption is involved.
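To make the difference between the two types concrete, here is a minimal Python sketch. It is a toy for illustration only and should never be used to protect real data: the symmetric half is a one-time pad built from XOR, and the asymmetric half is textbook RSA with tiny, well-known example primes. The function and variable names are made up for this example, and it assumes Python 3.8 or later. Real systems should rely on a vetted cryptography library instead.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy one-time pad)."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Symmetric: the SAME key encrypts and decrypts.
plaintext = b"AppSec is fun!"
shared_key = secrets.token_bytes(len(plaintext))  # random bytes, kept secret
ciphertext = xor_bytes(plaintext, shared_key)     # encrypt
recovered = xor_bytes(ciphertext, shared_key)     # decrypt with the same key

# Asymmetric: the public key encrypts, the private key decrypts.
# Textbook RSA with tiny primes (assumption: illustrative, trivially breakable).
p, q = 61, 53
n = p * q                   # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                      # public exponent  -> public key is (n, e)
d = pow(e, -1, phi)         # private exponent -> private key is (n, d)

message = 65                        # a number smaller than n
encrypted = pow(message, e, n)      # anyone can do this with the public key
decrypted = pow(encrypted, d, n)    # only the private key holder can undo it

print(recovered == plaintext, decrypted == message)
```

In both halves the data comes back intact, which is the defining trait of encryption: with the right key, it is reversible.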

Some common symmetric algorithms include:

Some common asymmetric algorithms include:

Encoding means transforming data into a different format using a publicly known scheme. No key is involved, so anyone can decode it. Encoded data is not secure and should never be called encrypted data (even though it often looks like gibberish to humans). It offers no protection when it comes to the confidentiality, integrity, or availability of data.

There are many different types of encoding, such as:

Encoding does have valid and useful purposes. HTML and URL encoding are indispensable when it comes to web browsers and web applications. Base64-encoded data is represented with standard ASCII characters, so it's perfect for sending images or other binary data over a text-based system like email.
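As a rough illustration of HTML and URL encoding, the sketch below uses only Python's standard library. The sample strings are invented for the example. Notice that decoding requires no key or secret of any kind.

```python
import html
from urllib.parse import quote, unquote

raw_html = '<a href="/search?q=tom & jerry">'
raw_query = "tom & jerry"

# HTML encoding replaces characters that have special meaning in markup.
html_encoded = html.escape(raw_html)

# URL (percent) encoding replaces characters that are not safe in a URL.
url_encoded = quote(raw_query)

print(html_encoded)  # &lt;a href=&quot;/search?q=tom &amp; jerry&quot;&gt;
print(url_encoded)   # tom%20%26%20jerry

# Anyone can reverse either encoding without a key.
print(html.unescape(html_encoded) == raw_html)
print(unquote(url_encoded) == raw_query)
```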

Here's a Base64-encoded string as an example:

    QXBwU2VjIGlzIGZ1biE=

Keep in mind that if you see an equals sign (or two) at the end of a string, that's a strong indicator that the data is Base64 encoded.
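Here is a small Python sketch of where that padding comes from, using the standard library's base64 module. It decodes the example string without any key, then encodes three short made-up inputs to show when the equals signs appear.

```python
import base64

encoded = "QXBwU2VjIGlzIGZ1biE="

# Decoding needs no key or password; add print(decoded) to reveal the message.
decoded = base64.b64decode(encoded)

# Re-encoding the decoded bytes gives back the exact same string.
print(base64.b64encode(decoded).decode("ascii") == encoded)

# Base64 works on groups of 3 bytes. Padding is added only when the input
# length is not a multiple of 3.
for sample in (b"a", b"ab", b"abc"):
    print(len(sample), base64.b64encode(sample).decode("ascii"))
# 1 YQ==
# 2 YWI=
# 3 YWJj
```

So a trailing equals sign is a useful hint, but its absence does not rule out Base64.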

Hashing means that data is sent through a one-way, irreversible algorithm. The output is gibberish and unreadable to the human eye. No one should ever talk about "reversing" or "decrypting" a hash value. It can't be done. There is no encryption key. You can, however, try to crack a hashed value (a hashed value is often just called a "hash"). Cracking essentially involves hashing lots of candidate inputs, or looking them up in a big precomputed table, until one matches the target hash. There are many cracking tools available to help with such things.
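The idea behind cracking can be sketched in a few lines of Python. This is a simplified illustration, not how any particular cracking tool works: the helper names, the word list, and the choice of unsalted SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def crack(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each guess and compare; nothing is ever 'decrypted'."""
    for guess in candidates:
        if sha256_hex(guess) == target_hash:
            return guess
    return None


# Hypothetical word list; real ones contain millions of entries.
wordlist = ["password", "123456", "letmein", "qwerty"]

weak_hash = sha256_hex("letmein")
strong_hash = sha256_hex("a long and unusual passphrase 7!")

print(crack(weak_hash, wordlist))    # letmein
print(crack(strong_hash, wordlist))  # None
```

The hash is never reversed. The guess succeeds only because the original input happened to be in the list.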

Common hashing algorithms include:

  • MD5 (old and not secure)
  • SHA-1 (also not considered secure anymore)
  • SHA-2 (includes SHA-256 and SHA-512 among others)
  • Argon2 (considered best for protecting stored passwords)
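For a quick feel of how these behave, here is a short Python sketch using the standard hashlib module. The input strings are made up for the example. Argon2 is left out because it is not part of Python's standard library and needs a third-party package.

```python
import hashlib

data = b"AppSec is fun!"

# Each algorithm always produces output of a fixed size, whatever the input.
bit_lengths = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    digest = hashlib.new(name, data).digest()
    bit_lengths[name] = len(digest) * 8
    print(f"{name:>6}: {bit_lengths[name]} bits")

# The same input always gives the same hash...
same = hashlib.sha256(data).hexdigest() == hashlib.sha256(data).hexdigest()

# ...while changing a single character gives a completely different one.
changed = hashlib.sha256(b"AppSec is fun?").hexdigest()
different = changed != hashlib.sha256(data).hexdigest()

print(same, different)
```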

Finally, if you ever need to hash some data or want to encode or decode some data, take a look at this nice online utility. Use it to decode the example Base64-encoded string above!

I hope this article has been helpful to explain one tiny part of the cybersecurity ecosystem.

(This post was first published as a LinkedIn article.)