The mystery of stuck inactive msbuild.exe processes, locked Stylecop.dll, Nuget AccessViolationException and CI builds clashing with each other
MsbuildNugetStylecopMsbuild Problem Overview
Observations:
-
On our Jenkins build server, we were seeing lots of msbuild.exe processes (~100) hanging around after job completion with around 20mb memory usage and 0% CPU activity.
-
Builds using different versions of stylecop were intermittently failing:
workspace\packages\StyleCop.MSBuild.4.7.41.0\tools\StyleCop.targets(109,7): error MSB4131: The "ViolationCount" parameter is not supported by the "StyleCopTask" task. Verify the parameter exists on the task, and it is a gettable public instance property.
-
Nuget.exe was intermittently exiting with the following access violation error (0x0000005):
.\workspace\.nuget\nuget install .\workspace\packages.config -o .\workspace\packages" exited with code -1073741819.
MsBuild was launched in the following way via a Jenkins Matrix job, with 'BuildInParallel' enabled:
`msbuild /t:%Targets% /m
/p:Client=%Client%;LOCAL_BUILD=%LOCAL_BUILD%;BUILD_NUMBER=%BUILD_NUMBER%;
JOB_NAME=%JOB_NAME%;Env=%Env%;Configuration=%Configuration%;Platform=%Platform%;
Clean=%Clean%; %~dp0\_Jenkins\Build.proj`
Msbuild Solutions
Solution 1 - Msbuild
After a lot of digging around and trying various things to no effect, I eventually ended up creating a new minimal solution which reproduced the issue with very little else going on. The issue turned out to be caused by msbuild's multi-core parallelisation - the 'm' parameter.
- The 'm' parameter tells msbuild to spawn "nodes", these will remain alive after the build has ended, and are then re-used by new builds!
- The StyleCop 'ViolationCount' error was caused by a given build re-using an old version of the stylecop.dll from another build's workspace, where ViolationCount was not supported. This was odd, because the CI workspace only contained the new version. It seems that once the StyleCop.dll was loaded into a given MsBuild node, it would remain loaded for the next build. I can only assume this is because StyleCop loads some sort of singleton into the nodes processs? This also explains the file-locking between builds.
- The nuget access violation crash has now gone (with no other changes), so is evidently related to the above node re-use issue.
- As the 'm' parameter defaults to the number of cores - we were seeing 24 msbuild instances created on our build server for a given job.
The following posts were helpful:
- https://stackoverflow.com/questions/3919892/msbuild-exe-staying-open-locking-files
- http://www.hanselman.com/blog/FasterBuildsWithMSBuildUsingParallelBuildsAndMulticoreCPUs.aspx
- http://stylecop.codeplex.com/discussions/394606
- https://github.com/Glimpse/Glimpse/issues/115
- http://msdn.microsoft.com/en-us/library/vstudio/ms164311.aspx
The fix:
- Add the line
set MSBUILDDISABLENODEREUSE=1
to the batch file which launches msbuild - Launch msbuild with
/m:4 /nr:false
- The 'nr' paremeter tells msbuild to not use "Node Reuse" - so msbuild instances are closed after the build is completed and no longer clash with each other - resulting in the above errors.
- The 'm' parameter is set to 4 to stop too many nodes spawning per-job
Solution 2 - Msbuild
I had the same issue. One old reference I found was in csproj files
<PropertyGroup>
<StyleCopMSBuildTargetsFile>..\packages\StyleCop.MSBuild.4.7.48.0\tools\StyleCop.targets</StyleCopMSBuildTargetsFile>
Also, I deleted the entire "Packages" folder that's located in the same folder as sln file after I closed the visual studio. It triggered VS to rebuild the folder and let go of the cache of the old version of stylecop
Solution 3 - Msbuild
I've had the same issue for a while, builds were taking over 6 minutes to finish after some digging I found our it's node reuse fault so adding /m:4 /nr:false fixing my issue immediately