
Talk:Glossary of digital forensics terms

From Wikipedia, the free encyclopedia

Hashing


Whether comparing hash values from drive images or individual files, identical hash values DO NOT verify that the questioned file is 'identical' to the source. Hash values can only verify that the questioned file is NOT identical to the source file.

Matching hash values imply that the questioned file is identical to the source, but they do not verify or guarantee it. For instance, by the pigeonhole principle, at least two different files are guaranteed to generate identical hash values once a file system holds 2^(hash length in bits) + 1 different files. In the case of MD5, which produces a 128-bit hash, that would be 2^128 + 1 or, roughly speaking, 3.4 * 10^38 + 1, or even more roughly speaking, 340 billion billion billion billion + 1. Two different files generating the same hash value is called a collision, or in the vernacular it's more often known as a false positive. Granted, when two files generate identical hash values there is a very high probability that both files are identical, but it is not a verification.
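The counting argument above can be tried at a small scale. The following is a minimal sketch, not a forensic tool: `tiny_hash` is a hypothetical 8-bit hash made up for illustration by keeping only the first byte of an MD5 digest, so it has just 256 possible outputs. Feeding it 2^8 + 1 = 257 distinct inputs must then produce at least one collision.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: first byte of the MD5 digest (256 possible values)."""
    return hashlib.md5(data).digest()[0]


def find_collision(inputs):
    """Return the first pair of distinct inputs sharing a tiny_hash value, or None."""
    seen = {}  # hash value -> first input that produced it
    for item in inputs:
        h = tiny_hash(item)
        if h in seen:
            return seen[h], item
        seen[h] = item
    return None


# 257 distinct inputs against only 256 possible outputs: a collision is unavoidable.
inputs = [str(i).encode() for i in range(2**8 + 1)]
pair = find_collision(inputs)
print(pair)

# The same count for full 128-bit MD5.
print(2**128 + 1)  # 340282366920938463463374607431768211457
```

In practice the first collision in the 8-bit case shows up long before the 257th input; the pigeonhole count is only the point at which it can no longer be avoided.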

However, when the hash values of two files are different it's guaranteed (absent faulty calculations or faulty code implementation) that the two files are different. A hash is calculated by a known algorithm. The files are the input, the hash is the output, and the algorithm is of course a mathematical calculation.
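As a rough sketch of that asymmetry, the comparison below only ever reports "different" as a certainty. The helper names `md5_hex` and `compare` and the sample byte strings are assumptions made up for this example; only `hashlib.md5` comes from the Python standard library.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string; the same input always gives the same output."""
    return hashlib.md5(data).hexdigest()


def compare(source: bytes, questioned: bytes) -> str:
    """Report what a hash comparison can actually establish."""
    if md5_hex(source) != md5_hex(questioned):
        # A deterministic function cannot map one input to two outputs,
        # so differing digests prove the inputs differ.
        return "different"
    # Matching digests are consistent with identical inputs but do not prove it.
    return "not shown to differ"


source = b"evidence image contents"
altered = b"evidence image contentz"  # last byte changed

print(compare(source, altered))
print(compare(source, source))
```

To go from "not shown to differ" to "identical", one would still have to compare the inputs themselves byte for byte.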

x = our input (file)
y = our output (hash value)

Imagine then that our hashing algorithm is: y = x + 3. Therefore, with two files, if y1 ≠ y2 then the inputs (x1 and x2) must be different. But the opposite is not necessarily true. What makes this difficult for many to understand is that normally in math, and using our simplified equation, if y1 = y2 we can know that the inputs (x1 and x2) are identical. But in the case of hash values, when y1 = y2 we cannot conclude that x1 and x2 are identical. The reason for the difference is that while our example equation has no limit to outputs (y is unbounded), the possible outputs from a hashing algorithm are finite and constrained by the key space. Using our example, and in its simplest form, let's constrain y and say that y must be a whole number between 1 and 9, inclusive. Looking at our simple equation then, if x = 7 then y = 10, which falls outside that range, so we must use some process by which we can reduce y to a value within our constraints.

To do this, let's turn then to numerology; not an exact science and not what's used in hashing algorithms, but it's illustrative. In numerology, one finds the numerological value of a number by adding its individual digits, repeating until a single-digit number results. The same works for a list of numbers. For instance, the numerological value of 1, 8, and 12 is 3. We obtain this by adding the numbers and then the digits of the result until we have a single-digit number. Like this: 1+8+12 = 21, 2+1 = 3. We'll use this formula to constrain y. So, if x1 = 15 then the properly constrained y1 = 9 (15+3=18 and 1+8=9). But when x2 = 24 the constrained value of y2 is also 9 (24+3=27 and 2+7=9). While we can say, with absolute certainty, that y1 = y2, we cannot say (and in fact we know it not to be the case) that x1 = x2. It's for this simplified reason that we cannot say that matching hash values verify that two files are identical. — Preceding unsigned comment added by Pndfam05 (talk • contribs) 17:11, 20 June 2012 (UTC)
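The constrained example above can be written out as a short sketch. The function names `digital_root` and `toy_hash` are made up here, and the code assumes x is a positive whole number; it mirrors the y = x + 3 illustration, not any real hashing algorithm.

```python
def digital_root(n: int) -> int:
    """Add the digits of n repeatedly until a single digit remains."""
    while n > 9:
        n = sum(int(d) for d in str(n))
    return n


def toy_hash(x: int) -> int:
    """Illustrative 'hash': y = x + 3, constrained to 1..9 via the digital root."""
    return digital_root(x + 3)


print(toy_hash(15))  # 9  (15 + 3 = 18, 1 + 8 = 9)
print(toy_hash(24))  # 9  (24 + 3 = 27, 2 + 7 = 9)

# Every input below 40 that lands on the same output value 9.
print([x for x in range(1, 40) if toy_hash(x) == 9])  # [6, 15, 24, 33]
```

With only nine possible outputs, many different inputs share each one, which is the same situation as a real hash function on a much smaller scale.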

All absolutely true; and with good sourcing, worth including in the article about hashing. However, given the way in which forensic data is usually hashed, and the size of the relevant files, the possibility of collisions is extremely small. So for a one-line summary the shorthand is appropriate in explaining the concept. --Errant (chat!) 17:39, 20 June 2012 (UTC)

Hello fellow Wikipedians,

I have just modified one external link on Glossary of digital forensics terms. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 01:22, 20 October 2017 (UTC)