.NET Core, Hashes, and Windows vs Linux Line Endings
In this series I am sharing practical advice on how to solve problems I have stumbled upon when working on projects. In this particular post, I share a hashing problem that turned out to be a problem with line endings when running .NET Core on a Linux box.
In one project I needed to hash some XML so that I could quickly distinguish between portions of XML. The code looked very similar to this:
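using System;
using System.Security.Cryptography;
using System.Text;
using System.Xml.Linq;

private static string GetHash(string xmlText)
{
    var xml = XElement.Parse(xmlText);
    // Some xml operations omitted.
    var xmlOutput = xml.ToString();
    using (var sha = SHA512.Create())
    {
        // Hash the UTF-8 bytes and return them as an uppercase hex string.
        var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(xmlOutput));
        return BitConverter.ToString(hash).Replace("-", String.Empty);
    }
}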
Everything was fine and dandy until unit tests against this code started running on a Linux box. All of a sudden, I had a failing test that was asserting the generation of hashes.
// Expected hash
// A5160E8C189ACDCA1EF46BE0A1F2E40F278DC97...
// Actual hash
// 177603E9625A42B0BE47FC7ADD9788BCCB7F40E002...
When you are caught off guard like this, you start questioning everything - even the stability of SHA512 across Windows and Linux (which is silly).
The code where the actual difference occurs, as many of you might have guessed already, is XElement.ToString().
By default, this method formats the XML, which includes indenting it and adding line endings for the current platform. Of course, those are different on Windows ("\r\n") vs. Linux ("\n"). So hashing will produce different results on the two platforms, since you are essentially hashing two different inputs.
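To see the effect in isolation, here is a minimal sketch that hashes the same small XML string twice, once with Windows-style and once with Linux-style line endings. The class name LineEndingHashDemo, the helper Sha512Hex, and the sample XML are made up for illustration. The line endings are hard-coded, so the demo behaves the same on either platform.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class LineEndingHashDemo
{
    // Hypothetical helper: SHA-512 of the UTF-8 bytes, as an uppercase hex string.
    public static string Sha512Hex(string text)
    {
        using (var sha = SHA512.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(hash).Replace("-", string.Empty);
        }
    }

    public static void Main()
    {
        // Same markup, different line endings (sample data, not from the project).
        var windowsStyle = "<root>\r\n  <child>value</child>\r\n</root>";
        var linuxStyle = windowsStyle.Replace("\r\n", "\n");

        // The byte sequences differ, so the hashes differ as well.
        Console.WriteLine(Sha512Hex(windowsStyle) == Sha512Hex(linuxStyle)); // False
    }
}
```

The strings look identical in an editor, which is why this kind of failure is easy to miss.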
The solution is to account for this difference in the best way possible. In my case, the XML content was the important thing - not the formatting of that XML. The quick and easy solution was to simply remove formatting altogether by using .ToString(SaveOptions.DisableFormatting).
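As a rough sketch of that fix, the method from earlier could look like the following. The class name XmlHasher and the sample inputs are assumptions made for illustration, and the omitted XML operations from the original code are left out here too. It relies on XElement.Parse dropping insignificant whitespace by default (no LoadOptions.PreserveWhitespace), so differently indented inputs serialize to the same string.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Xml.Linq;

public static class XmlHasher
{
    public static string GetHash(string xmlText)
    {
        var xml = XElement.Parse(xmlText);

        // Serialize without indentation or added line breaks,
        // so no platform-specific line endings are written for formatting.
        var xmlOutput = xml.ToString(SaveOptions.DisableFormatting);

        using (var sha = SHA512.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(xmlOutput));
            return BitConverter.ToString(hash).Replace("-", string.Empty);
        }
    }

    public static void Main()
    {
        // Sample inputs (illustrative only): same content, different formatting.
        var indentedCrLf = "<root>\r\n  <child>value</child>\r\n</root>";
        var indentedLf = "<root>\n  <child>value</child>\n</root>";
        var compact = "<root><child>value</child></root>";

        Console.WriteLine(GetHash(indentedCrLf) == GetHash(compact)); // True
        Console.WriteLine(GetHash(indentedLf) == GetHash(compact));   // True
    }
}
```

Note that this only makes sense when formatting is irrelevant to what you are comparing, as it was in my case.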
And that is it. Thank you for tuning in.