Continuous Integration: Differencing .NET Assemblies – MVID regenerates for every compilation
Posted by archworx on January 22, 2007
Background: This post is about the challenge of differencing binaries to determine whether the assemblies on the testing environment are identical to the assemblies generated from the code on SourceSafe. Read on for more details.
The Problem: At the end of your development efforts, you typically need to subject your code base to testing, and then upon testing approval, you ship your code to the client. However, bear in mind the following:
- Your Code Base is on a Source Control engine
- Your binaries are not; they are on the development and testing servers. It doesn’t make sense from a configuration management perspective to store binaries on source control, because:
  - You can always (theoretically) regenerate them exactly from the source.
  - It is a huge burden to ensure object/source consistency.
  - Even if you do keep them, you’ll have to check that the source and object are indeed consistent, which is extra overhead.
- You typically need to ship to your client the binaries that are on your testing server – as those are the ones that have been approved by the testers.
- You can’t ship the source unless you are sure it generates the binaries the developers claim they have produced on the testing environment.
- So the obvious process is to retrieve your source code from your Source repository and recompile it.
Enter .NET assemblies, which put the following obstacles in the way of compiling the same source twice and getting exactly the same binary stream:
- By default, .NET assemblies change their version number every time you compile. This is a good thing, as it provides very good tracking of version numbers, something that is sadly lacking in many developers’ culture. However, it means that binary differencing will yield false positives.
- If you need your assembly to be hosted in the GAC, or otherwise want to sign it, the assembly must be strongly named. This can pose challenges if you sign with keys that are not properly controlled.
- The assembly header also contains a field called the “MVID” – which is the Module Version Identifier. This field’s purpose is solely to be unique for each time the module is compiled. This is a rather powerful concept, in the sense that this is the first time I’ve personally seen the concept of someone wanting to distinguish a compilation instance from another one, irrespective of the code being compiled itself.
The Solution: This article is about attempts to solve the three aspects of the problem described above. At this time, we have a simple solution and a workaround for the first two (the version number and the signatures), and there are hopeful indications that the MVID issue can be resolved too.
- Version Numbers – can be fixed explicitly by removing the “*” wildcard from the version string for release builds. You can find this field in the assembly info (the AssemblyVersion attribute).
- Strongly Named – let’s ignore this case temporarily.
- MVID – we believe this can be controlled via a compiler option, but I have yet to find it.
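As a rough illustration of the version-number point, a release build could be guarded by a small check that the assembly info no longer contains a wildcard version. The sketch below is only that: the function name `wildcard_versions` is hypothetical, and it assumes the version is declared with an `AssemblyVersion` or `AssemblyFileVersion` attribute in the usual C# or VB.NET attribute syntax.

```python
import re

# Matches e.g. [assembly: AssemblyVersion("1.0.*")] in C#
# or <Assembly: AssemblyVersion("1.0.*")> in VB.NET.
VERSION_ATTR = re.compile(
    r'Assembly(?:File)?Version(?:Attribute)?\s*\(\s*"([^"]*)"\s*\)'
)


def wildcard_versions(assembly_info_text):
    """Return the declared version strings that still contain a '*' wildcard.

    `assembly_info_text` is the content of an assembly info source file.
    The name and behaviour of this helper are illustrative assumptions.
    """
    found = []
    for line in assembly_info_text.splitlines():
        stripped = line.strip()
        # Skip attributes that are commented out (C# "//" or VB "'").
        if stripped.startswith("//") or stripped.startswith("'"):
            continue
        for version in VERSION_ATTR.findall(stripped):
            if "*" in version:
                found.append(version)
    return found
```

A build script could read the assembly info file, call `wildcard_versions` on its text, and refuse to produce a release build when the returned list is not empty.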
The rest of this post is mostly dedicated to discussing the MVID issue.
Intermediate Language Disassembly:
ildasm /text /all file.dll
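To watch the MVID change from one build to the next, the text produced by the command above can be scanned for the MVID. The sketch below assumes that ildasm prints the value as a comment line of the form `// MVID: {GUID}`; that layout is an assumption worth checking against a real dump, and the helper name `extract_mvid` is hypothetical.

```python
import re

# Assumed layout of the MVID comment in an ildasm text dump, e.g.
#   // MVID: {A224F460-A049-4A03-9E71-80A36DBBBCD3}
MVID_LINE = re.compile(
    r"^\s*//\s*MVID:\s*\{([0-9A-Fa-f-]{36})\}", re.MULTILINE
)


def extract_mvid(ildasm_text):
    """Return the first MVID found in the dump (upper-cased), or None."""
    match = MVID_LINE.search(ildasm_text)
    return match.group(1).upper() if match else None
```

Running this over the dumps of two builds of the same source and comparing the two results shows directly whether a given build setup regenerates the MVID.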
The MVID is used by the .NET CLR to determine whether or not to reload the precompiled assembly data.
This is to allow caching such precompiled data, and consequently ensuring cache integrity.
This would imply that the MVID is only useful when precompiled information exists in the assembly.
Typically precompilation only happens when you use NGEN.EXE.
Consequently, omitting the MVID, or generating the same MVID every time, is not necessarily a dangerous idea to contemplate.
Empirical observation has shown that NAnt manages to automagically generate the same MVID each time it recompiles, thus dispelling the myth that it must be unique for every “compilation”. There must be a way to mimic NAnt’s communication with the C# compiler, as it must be using it to do the compilation. There is no way that NAnt is faking a compilation. Or is there? 😉
The observations above are very encouraging, even to the point of suggesting extreme ideas such as:
1. Manually coercing the same GUID value for the MVID for otherwise identical compilations (by injecting it into the binary, for example), because this would theoretically not jeopardize the integrity of the NGEN-generated data.
2. Doing a manual textual comparison of the assembly’s code via ildasm /text and a script that conceals the MVID information.
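The second idea could look roughly like the following sketch, which masks the fields expected to vary and then compares two ildasm dumps line by line. It is a minimal illustration under stated assumptions, not a finished tool: the function names are hypothetical, the `// MVID:` and `// Image base:` comment layouts are assumptions about ildasm’s output, and a real dump may contain further volatile fields that would need to be added to the list.

```python
import difflib
import re

# Fields expected to differ between two builds of identical source.
# Both patterns are assumptions about how ildasm prints these values.
VOLATILE = [
    (re.compile(r"(//\s*MVID:\s*)\{[0-9A-Fa-f-]{36}\}"), r"\1{MVID}"),
    (re.compile(r"(//\s*Image base:\s*)0x[0-9A-Fa-f]+"), r"\g<1>0x0"),
]


def normalize(ildasm_text):
    """Return the dump as a list of lines with volatile fields masked."""
    lines = []
    for line in ildasm_text.splitlines():
        for pattern, replacement in VOLATILE:
            line = pattern.sub(replacement, line)
        lines.append(line.rstrip())
    return lines


def diff_dumps(tested_dump, rebuilt_dump):
    """Return unified-diff lines; an empty list means no real difference."""
    return list(
        difflib.unified_diff(
            normalize(tested_dump),
            normalize(rebuilt_dump),
            "testing",
            "rebuilt",
            lineterm="",
        )
    )
```

In use, one dump would come from the assembly on the testing server and the other from a fresh build of the source; an empty result from `diff_dumps` suggests that the two assemblies differ only in the masked fields.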