Adaptive Batch Sizes for Backend Processing


Most business systems include some form of backend processing. This could be report generation, data transformations, credit card processing, payment auditing, or countless other scenarios. It’s typical for these systems to pull records out of a queue, perform the necessary processing, and then move on to the next record. When possible, these systems are engineered to process more than one record at a time, reducing overhead and increasing efficiency. Each time a batch processing system is created though, we face a difficult question.

What is the best batch size?

This question is always hard to answer because we know that our development environment will differ from the production environment. To combat this problem, most developers define an environment variable or configuration setting that will control the batch size, and then hard-code a default value if the setting is not supplied. This provides a feeling of comfort that we can change the setting in production without having to update the code. But this approach falls short in many ways.
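
That conventional pattern usually looks something like this (a minimal sketch; the setting name and the default of 1000 are made up for illustration):

// Read the batch size from an environment variable, falling back to a
// hard-coded default when the setting is not supplied.
private static int GetConfiguredBatchSize() {
    int batchSize;
    string setting = Environment.GetEnvironmentVariable("STATS_BATCH_SIZE");

    if (!string.IsNullOrEmpty(setting) && int.TryParse(setting, out batchSize)) {
        return batchSize;
    }

    return 1000; // the hard-coded default
}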

NuGet Package Statistics

NuGet.org creates records every time a package is downloaded—this happens about 750,000 times per day or 8.5 times per second. The records are stored in the production database in a denormalized table where the raw values can easily be inserted at that pace. Then twice each day, we produce updated package download reports for every package with download activity since the last time its report was generated.

To generate these package download reports, we have backend processes that aggregate total download numbers, replicate the records into a warehouse database, and then purge records that are at least 7 days old and that have already been replicated. Each of these processes works against batches of records; choosing batch sizes for each of them was difficult.

Throughput Factors

When trying to select a batch size for each of these processes, we realized that there are lots of factors that come into play. Here are the variables that we found to have significant impact on throughput:

  1. Scale settings for our production database (SQL Azure)
  2. Scale settings for our warehouse database (SQL Azure)
  3. Virtual Machine specifications on our backend processing server (Azure VM)
  4. Current load on the production database
  5. Current load on the warehouse database
  6. Current load on the backend processing server (it performs lots of other backend jobs at the same time)
  7. Index fragmentation in the production database
  8. Index fragmentation in the warehouse database
  9. Number of records in the queue
  10. Network latency

Each time any of these factors changed, the previous choice we’d made for our batch sizes became stale. Every once in a while, a batch would fail, cause an error, and raise an operations alert. We would then file a bug: “Stats Replicator cannot process the current batch size without timing out.” There are two obvious fixes for the bug:

  1. Increase the timeout
  2. Reduce the batch size

Either of these “fixes” would get the job unstuck, but then it’s just a matter of time before the change is stale.

The Edge of Failure

Batch processing can be more efficient because it reduces overhead. There’s startup/shutdown time required for each iteration of the process. When you only pay the startup/shutdown cost once but process thousands of records, the savings can be significant. The bigger the batch, the more we save on overhead. But there’s usually a breaking point where giant batch sizes lead to failure. Finding the largest batch size that can be successfully processed often yields the best performance.

To make the backend processes for NuGet.org as efficient as possible at all times, I created an approach that discovers this breaking point and then automatically adapts batch sizes to achieve the best throughput attainable within the current environment.

Defining Batch Size Ranges

Instead of defining a single batch size setting to be used, the new approach uses a pair of parameters to specify the minimum and maximum batch sizes. These batch sizes aren’t guesses; they are objective numbers with meaning.

Minimum Batch Size

The minimum batch size is truly a minimum. If the system fails to process a batch of this size, it is considered an error and the process will crash. This will lead to an operations alert to inform the team that something is wrong.

Maximum Batch Size

The maximum batch size is the largest size that we would ever want to process at one time. This number can be selected based on the scenario, and it should take into account concerns like how you would debug a batch that encounters a bug. But this number should be as large as you’re comfortable with—don’t worry about what the system will be “capable of” handling—because all of the factors above affect that capability. If you scale your server up significantly, a previously unfathomable batch size may become not only possible, but preferable.
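
In code, that means the process takes a pair of settings rather than a single batch size (an illustrative sketch; the names and example values are mine, not the actual NuGet job settings):

// A batch of MinBatchSize that fails is a genuine error: crash and raise an alert.
// MaxBatchSize is simply the largest batch we are ever willing to process at once.
public class BatchSizeRange {
    public int MinBatchSize { get; set; }
    public int MaxBatchSize { get; set; }
}

// Example values matching the scenario examined later in this post.
var statisticsPurgeRange = new BatchSizeRange { MinBatchSize = 100, MaxBatchSize = 10000 };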

Sampling and Adapting

With a batch size range provided, we can now take samples of different batch sizes. This sampling will produce two important pieces of data:

  1. The edge of failure, where the batch succeeds but larger batch sizes fail (generally by exceeding a timeout period)
  2. The throughput measured for each sampled batch size, in terms of records per second

To accomplish the sampling, we take the following approach:

  1. Process the minimum batch size and record the throughput (records/second)
  2. Incrementally increase the batch size toward the maximum batch size, stepping by 10%

    batchSize = minBatchSize + ((maxBatchSize - minBatchSize) / 10 * samplesTaken);

  3. Record the throughput for each sample

    batchTimes[perSecond] = batchSize;

  4. If a batch size times out, record its time as Int32.MaxValue (making its measured throughput effectively zero) and decrease the maximum batch size by 33%

    maxBatchSize = maxBatchSize * 2 / 3;
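
Putting those steps together, the sampling pass looks roughly like this (a sketch, not the actual NuGet job code: minBatchSize and maxBatchSize come from the configured range, and ProcessBatch stands in for the real work and is assumed to throw a TimeoutException when a batch exceeds its timeout):

var batchTimes = new Dictionary<double, int>(); // records/second -> batch size

for (int samplesTaken = 0; samplesTaken <= 10; samplesTaken++) {
    int batchSize = minBatchSize + ((maxBatchSize - minBatchSize) / 10 * samplesTaken);

    var stopwatch = Stopwatch.StartNew();
    try {
        ProcessBatch(batchSize);
        stopwatch.Stop();

        double perSecond = batchSize / stopwatch.Elapsed.TotalSeconds;
        batchTimes[perSecond] = batchSize;
    }
    catch (TimeoutException) {
        // A timed-out batch is recorded with a time of Int32.MaxValue, which makes
        // its throughput effectively zero, and the ceiling shrinks by a third.
        batchTimes[(double)batchSize / Int32.MaxValue] = batchSize;
        maxBatchSize = maxBatchSize * 2 / 3;
    }
}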

Once we’ve finished taking our 11 samples (yes, 11, because fenceposts), we then use the sampling data to begin adapting our batch sizes. Each time we’re ready to process another batch, we calculate the next batch size to use. This calculation aims to find the best possible batch size, but we don’t simply want to choose the best batch size we’ve seen so far because there’s usually a batch size better than what we’ve already seen. Instead, we select the best 25% of our batches and then use the average batch size across them.

var bestBatches = batchTimes.OrderByDescending(b => b.Key).Take((int)Math.Ceiling(batchTimes.Count / 4.0));
var nextBatchSize = (int)bestBatches.Select(b => b.Value).Average();

We will then use this size to process the next batch. We’ll record its throughput and add it into our samples. As we continue to process more batches, we’ll have a larger pool of sample values to select our 25% best batches from, and we’ll be averaging out more batch sizes. But because previous batch sizes were selected based on the averages in the first place, the result is zeroing in on the batch size that yields the best throughput.

Examining the Numbers

Let’s take a look at how this can play out.

Configuration

  • Min Batch Size: 100
  • Max Batch Size: 10000
  • Timeout Period: 30 seconds

Initial Sampling

  1. Batch: 100; Time: 1 sec; Pace: 100/sec
  2. Batch: 1090; Time: 9 sec; Pace: 121/sec
  3. Batch: 2080; Time: 14 sec; Pace: 149/sec
  4. Batch: 3070; Time: 19 sec; Pace: 162/sec
  5. Batch: 4060; Time: 26 sec; Pace: 156/sec
  6. Batch: 5040; Time: TIMEOUT (Int32.MaxValue). Max set to 10000 * 2 / 3 = 6667
  7. Batch: 4042; Time: 25 sec; Pace: 161/sec
  8. Batch: 4699; Time: 29 sec; Pace: 162/sec
  9. Batch: 5356; Time: TIMEOUT (Int32.MaxValue). Max set to 6667 * 2 / 3 = 4445
  10. Batch: 4015; Time: 26 sec; Pace: 154/sec
  11. Batch: 4445; Time: 27 sec; Pace: 165/sec

Adapting

After taking these 11 samples, we’ve learned that we can’t seem to get past ~5000 records in a batch without timing out; the maximum successful batch was 4699 at 29 seconds (162/sec). But we also see that within the timeout period, larger batches are providing better throughput than smaller batches. The system will now automatically adapt to use this data.

The samples we've taken can be ordered like this:

  1. 4445 (165/sec)
  2. 4699 (162/sec)
  3. 3070 (162/sec)
  4. 4042 (161/sec)
  5. 4060 (156/sec)
  6. 4015 (154/sec)
  7. 2080 (149/sec)
  8. 1090 (121/sec)
  9. 100 (100/sec)
  10. 5040 (TIMEOUT)
  11. 5356 (TIMEOUT)

Considering the best 25% of these values (that will be the top 3), we calculate the average of the batch sizes to be 4071. That will be the next batch size. We’ll time that batch as well, and put its data into the sample set.
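
As a quick check, feeding the eleven samples into the same selection logic reproduces that number (a sketch using the rounded paces listed above; timed-out batches are given a pace of zero):

var samples = new[] {
    new { Pace = 100.0, Size = 100 },
    new { Pace = 121.0, Size = 1090 },
    new { Pace = 149.0, Size = 2080 },
    new { Pace = 162.0, Size = 3070 },
    new { Pace = 156.0, Size = 4060 },
    new { Pace = 0.0,   Size = 5040 },  // timeout
    new { Pace = 161.0, Size = 4042 },
    new { Pace = 162.0, Size = 4699 },
    new { Pace = 0.0,   Size = 5356 },  // timeout
    new { Pace = 154.0, Size = 4015 },
    new { Pace = 165.0, Size = 4445 }
};

var bestBatches = samples.OrderByDescending(s => s.Pace).Take((int)Math.Ceiling(samples.Length / 4.0));
var nextBatchSize = (int)bestBatches.Select(s => s.Size).Average();  // (4445 + 4699 + 3070) / 3, which averages to 4071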

As more batches are executed, we’ll see performance fluctuate and batch sizes vary a bit, but they ultimately narrow down to a small deviation. After around 100 iterations, the value becomes relatively static. So the next step is to guard against circumstances changing and our data becoming stale.

Periodic Resets

After around 100 iterations, we lose some of our ability to adapt. Even if the times start to get very bad for the batch size we’re zeroing in on, there’s too much data indicating that batch size should be efficient. The easiest way to combat this problem is to perform periodic resets. After 100 iterations, simply reset all sample data and start fresh—take 11 new samples and then run 89 more iterations afterward, adapting anew.

While this reset can lead to a few inefficient batches, it’s an important part of what makes the system fully reliable. If load on the production system or any of the other throughput factors changes, it won’t be long before we reset and discover that we need to change our target range.
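
A sketch of how such a reset might look in the processing loop (the counter and the idea of restoring the original maximum are my own illustration, not necessarily how the NuGet jobs do it):

private int _iterationsSinceReset = 0;

private void OnBatchCompleted() {
    _iterationsSinceReset++;

    if (_iterationsSinceReset >= 100) {
        // Throw away all sample data so the next 11 batches re-sample the range.
        _batchTimes.Clear();

        // Also restore the configured maximum so a ceiling that was lowered by
        // earlier timeouts can grow back if conditions have improved (an assumption).
        _maxBatchSize = _originalMaxBatchSize;

        _iterationsSinceReset = 0;
    }
}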

The Code

This approach is in use within a few of our backend processes around package statistics. The most straightforward example is the job that finds package statistics in the production database that have already been replicated over to the warehouse and can now be purged from the production database.

Interesting Methods

  • GetNextBatchSize
  • RecordSuccessfulBatchTime
  • RecordFailedBatchSize
  • PurgeCore
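
The actual implementations live in the NuGet.org backend job code, but a rough sketch of the shape of the first three methods might look like this (illustrative only; the field names are mine, and PurgeCore is omitted since it is just the domain-specific deletion of the already-replicated rows):

private readonly Dictionary<double, int> _batchTimes = new Dictionary<double, int>();
private int _minBatchSize;
private int _maxBatchSize;

private int GetNextBatchSize() {
    // During the initial sampling phase, walk from the minimum toward the maximum in 10% steps.
    if (_batchTimes.Count <= 10) {
        return _minBatchSize + ((_maxBatchSize - _minBatchSize) / 10 * _batchTimes.Count);
    }

    // After sampling, average the batch sizes of the best 25% of the samples taken so far.
    var bestBatches = _batchTimes
        .OrderByDescending(b => b.Key)
        .Take((int)Math.Ceiling(_batchTimes.Count / 4.0));

    return (int)bestBatches.Select(b => b.Value).Average();
}

private void RecordSuccessfulBatchTime(int batchSize, TimeSpan elapsed) {
    // Samples are keyed by throughput, in records per second.
    _batchTimes[batchSize / elapsed.TotalSeconds] = batchSize;
}

private void RecordFailedBatchSize(int batchSize) {
    // A timed-out batch counts as effectively zero throughput,
    // and the maximum batch size is reduced by a third.
    _batchTimes[(double)batchSize / Int32.MaxValue] = batchSize;
    _maxBatchSize = _maxBatchSize * 2 / 3;
}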

Benefits

The biggest benefit I've seen from this approach is that our production system stays alive and efficient all the time. We used to have to tweak the batch sizes pretty regularly. And when our statistics processing fell behind, it could take a long time to catch up because our batch sizes were conservative. Now, the batch sizes can get more aggressive automatically, while ensuring we avoid timeouts.

Overall, these processes are now much more hands-off. If we need to increase throughput, we can scale a server up and the process will automatically take advantage of the improvement and use bigger batch sizes if that yields better results. But if the system is under load, the process will automatically back off if smaller batch sizes are proving to run at a steady pace.


author: Jeff Handley | posted @ Tuesday, December 16, 2014 11:49 PM | Feedback (0)

Evolving NuGet's Code at Øredev 2014


I had the pleasure of attending Øredev 2014 and presenting two sessions about evolving NuGet's code. It was my first time attending the conference and it was a terrific experience.

My sessions were essentially 3-year retrospectives on NuGet's code, both server-side and client-side. I talked through how we built the projects, the initial goals and principles, what we've learned, and what our new principles are.

If you work in a monolithic codebase that you feel is hard to maintain and add features to, and you have that strong desire to throw it all away and start over, then you might be able to relate to the stories. We found, as many teams do, that there's never a good way to start over, and you have to find creative ways to replace subsystems of the giant beast. The sessions tell the story of how we've been doing that with NuGet.

Evolving the NuGet.org Architecture

Video

EVOLVING THE NUGET.ORG ARCHITECTURE from Øredev Conference on Vimeo.

Slides

NuGet 3.0 - Transitioning from OData to JSON-LD

Video

NuGet 3.0 – Transitioning from OData to JSON-LD from Øredev Conference on Vimeo.

Slides

author: Jeff Handley | posted @ Wednesday, December 10, 2014 12:00 AM | Feedback (0)

Volunteering with Cub Scouts


I've volunteered with the Cub Scouts for the last 5 years and it's been a great experience. As I'm winding down my role in the pack, I was asked to write a testimonial about how rewarding it has been to be involved. Here it is.

Spending my days working in an office on a computer, I generally declare that I'm not very handy. As my sons showed interest in building and fixing things, I would often tell them that I was not capable of doing it. But after 5 years of volunteering with Pack 561, I've learned that in fact, I am handy, and I can build and fix things!

You see, as adults, we tend to know our boundaries; we know what we can and cannot do. But our children are still learning their boundaries. And Cub Scouts are taught that in fact, there are no boundaries--you can do whatever you're interested in! If you want to learn how to whittle, you can! If you want to learn how to build a Derby car, you can! If you want to learn how to identify trees, birds, or animal poop, you can! You can tour a TV station. You can meet and interview the mayor. You can experience what it's like to sit in a fire truck or inside a jail cell. You can learn a little bit about a lot of things so that you can discover what you want to learn more about.

When I first started volunteering with Pack 561, I faced some boundaries. How could I possibly teach children how to build a Derby car when I've never built one myself? How could I possibly lead them on a trek at the Arboretum, identifying birds, when I can't tell the difference between a cardinal and a robin? I can't do these things and I'm too old to learn. But when I saw first-hand the eagerness to learn that our Scouts have, it rubbed off on me. I can learn how to build a Derby car! I can learn to identify birds! I can learn all kinds of things!

I may not have been in Scouts when I was a boy, but I've spent the last 5 years volunteering with the Scouts, and I've learned so much. But the most important thing I've learned is that I don't have as many boundaries as I thought I had, and I'm not too old to learn about all of the exciting topics that my sons want to learn about themselves. And just today, I worked with my sons to take apart and fix two of their toys that have been broken for years. I used to tell them that I didn't know how to build and fix things, but today I proved to them that life isn't about what you know, it's about what you are willing to learn.

Jeff Handley

Den Leader, Den 2 (2009-2015)
Den Leader, Den 4 (2011-2014)
Derby Car Workshop Host (2012-2014)
Webmaster (2013-2014)
Honorary Scout (2009-2015)

author: Jeff Handley | posted @ Friday, November 28, 2014 10:56 PM | Feedback (0)

A Fun ValidationAttribute Bug


I tweeted about a bug that I recently helped fix in System.ComponentModel.DataAnnotations.ValidationAttribute. As I said, it's a bug resulting from code I wrote for that class years ago. I was honestly surprised there was any interest in this, but there was! Since I piqued your interest, I thought it only fair that I quench your thirst and show you the details of the bug.

History and Backwards Compatibility

The first implementation of ValidationAttribute had a method with the following signature:

public abstract bool IsValid(object value);

 

Any class that inherited from ValidationAttribute had to override the IsValid method and put the validation logic in place.  This was fine and dandy until .NET 4.0 when I worked on a set of features to introduce context-aware validation attributes using new classes called ValidationContext and ValidationResult.  Using ValidationContext, validation attributes could perform complex business logic using application services or even calls into a database.  With this, we wanted to add an overload to IsValid to allow the following signature:

public abstract ValidationResult IsValid(object value, ValidationContext validationContext);
 

Of course we couldn’t add a new abstract method to an existing class, as that would break existing implementations.  So instead, we looked into adding the following:

public virtual ValidationResult IsValid(object value, ValidationContext validationContext) {
    ValidationResult result = ValidationResult.Success;
    
    if (!this.IsValid(value)) {
        string[] memberNames = validationContext.MemberName != null ? new string[] {
            validationContext.MemberName
        } : null;
        
        result = new ValidationResult(
            this.FormatErrorMessage(validationContext.DisplayName),
            memberNames);
    }

     return result;
}

 

This introduced a new problem: new attributes that want to use the ValidationContext must now override both overloads of IsValid, and that would be rather confusing.  We wanted new attributes to only have to override the ValidationContext-based IsValid overload, so we would document that the old boolean-based IsValid method should not be overridden and change it from abstract to virtual.  We’d change that method to the following:

public virtual bool IsValid(object value) {
    // Call the ValidationContext-based method and if it's successful, return true
    return this.IsValid(value, validationContext: null) == ValidationResult.Success;
}

 

That is as far as I had gotten in the code before I introduced the bug.  We’ll cover the bug next.

Ensuring One Overload is Overridden

This is an unusual situation.  We want to introduce a new overload that calls into the original method if it’s overridden.  But we want to make the original method virtual and have it call into the new overload if it’s overridden.

Let’s state that again, because it can be confusing:

  1. If the original method is overridden, have the new overload’s base implementation call into it
  2. If the new overload is overridden, have the original method’s base implementation call into it

A third way of stating it is:

  1. Allow implementers to override either method
  2. Call into the overridden method from whichever base implementation remains

Needless to say, there’s also a risk of the two base implementations calling each other in an endless loop, which we need to prevent.  The way I solved this was to use a private field and a lock to track whether a base implementation was in the middle of making a call to an overridden implementation.  You can see this in the .NET Framework reference source for the System.ComponentModel.DataAnnotations assembly’s ValidationAttribute class.  Here’s the snippet too:

/// <summary>
/// Gets the value indicating whether or not the specified <paramref name="value"/> is valid
/// with respect to the current validation attribute.
/// <para>
/// Derived classes should not override this method as it is only available for backwards compatibility.
/// Instead, implement <see cref="IsValid(object, ValidationContext)"/>.
/// </para>
/// </summary>
/// <remarks>
/// The preferred public entry point for clients requesting validation is the <see cref="GetValidationResult"/> method.
/// </remarks>
/// <param name="value">The value to validate</param>
/// <returns><c>true</c> if the <paramref name="value"/> is acceptable, <c>false</c> if it is not acceptable</returns>
/// <exception cref="InvalidOperationException"> is thrown if the current attribute is malformed.</exception>
/// <exception cref="NotImplementedException"> is thrown when neither overload of IsValid has been implemented
/// by a derived class.
/// </exception>
#if !SILVERLIGHT
public
#else
internal
#endif
virtual bool IsValid(object value) {
    lock (this._syncLock) {
        if (this._isCallingOverload) {
            throw new NotImplementedException(DataAnnotationsResources.ValidationAttribute_IsValid_NotImplemented);
        } else {
            this._isCallingOverload = true;

            try {
                return this.IsValid(value, null) == null;
            } finally {
                this._isCallingOverload = false;
            }
        }
    }
}

#if !SILVERLIGHT
/// <summary>
/// Protected virtual method to override and implement validation logic.
/// <para>
/// Derived classes should override this method instead of <see cref="IsValid(object)"/>, which is deprecated.
/// </para>
/// </summary>
/// <param name="value">The value to validate.</param>
/// <param name="validationContext">A <see cref="ValidationContext"/> instance that provides
/// context about the validation operation, such as the object and member being validated.</param>
/// <returns>
/// When validation is valid, <see cref="ValidationResult.Success"/>.
/// <para>
/// When validation is invalid, an instance of <see cref="ValidationResult"/>.
/// </para>
/// </returns>
/// <exception cref="InvalidOperationException"> is thrown if the current attribute is malformed.</exception>
/// <exception cref="NotImplementedException"> is thrown when <see cref="IsValid(object, ValidationContext)" />
/// has not been implemented by a derived class.
/// </exception>
#else
/// <summary>
/// Protected virtual method to override and implement validation logic.
/// </summary>
/// <param name="value">The value to validate.</param>
/// <param name="validationContext">A <see cref="ValidationContext"/> instance that provides
/// context about the validation operation, such as the object and member being validated.</param>
/// <returns>
/// When validation is valid, <see cref="ValidationResult.Success"/>.
/// <para>
/// When validation is invalid, an instance of <see cref="ValidationResult"/>.
/// </para>
/// </returns>
/// <exception cref="InvalidOperationException"> is thrown if the current attribute is malformed.</exception>
/// <exception cref="NotImplementedException"> is thrown when <see cref="IsValid(object, ValidationContext)" />
/// has not been implemented by a derived class.
/// </exception>
#endif
protected virtual ValidationResult IsValid(object value, ValidationContext validationContext) {
    lock (this._syncLock) {
        if (this._isCallingOverload) {
            throw new NotImplementedException(DataAnnotationsResources.ValidationAttribute_IsValid_NotImplemented);
        } else {
            this._isCallingOverload = true;

            try {
                ValidationResult result = ValidationResult.Success;

                if (!this.IsValid(value)) {
                    string[] memberNames = validationContext.MemberName != null ? new string[] { validationContext.MemberName } : null;
                    result = new ValidationResult(this.FormatErrorMessage(validationContext.DisplayName), memberNames);
                }
                return result;
            } finally {
                this._isCallingOverload = false;
            }
        }
    }
}

You’ll notice a fun detail in that for Silverlight code (we cross-compile this code to .NET and Silverlight), we made the original method internal instead of public, because it was a new class for Silverlight—therefore there was no reason to even introduce the method on the public surface area.  Instead, we’d only have the ValidationContext-based approach.

Locks Are Bad

So this is where the code landed.  I had it code reviewed by about a dozen smart people—all much smarter than me in fact.  We all felt sick to our stomachs about it, but we couldn’t think of a better way to accomplish it.  I got code review sign-off, checked in, and this code has been in place for several years now.

Recently though, our team that helps service older code found that this lock I created is quite the bottleneck when validating a lot of attributes.  I don’t know the specific scenario, but it doesn’t really matter—that lock that I used was a bad idea and it’s causing a performance bottleneck for some customers.  It needed to be fixed.

A Cleaner Approach

A couple of weeks ago, Miguel Lacouture worked up a much cleaner approach to solving this problem that doesn’t use locks.  He asked if I would code review his fix since I had written the original code.  I had some review feedback for him, but his approach seems significantly superior to what I had come up with long ago.  With the feedback I sent him, here’s the proposed new implementation for these two methods:

private bool _hasBaseIsValid = false;
private bool _hasBaseIsValidWithContext = false;

public virtual bool IsValid(object value) {
    // Track that this overload wasn't overridden
    if (!_hasBaseIsValid) {
        _hasBaseIsValid = true;
    }

    // That means that the other overload must be overridden
    // And if it hasn't been, then throw a NotImplementedException
    if (_hasBaseIsValidWithContext) {
        throw new NotImplementedException(DataAnnotationsResources.ValidationAttribute_IsValid_NotImplemented);
    }

    // We know the other overload was overridden
    // So call it to produce the result
    return this.IsValid(value, null) == null;
}

protected virtual ValidationResult IsValid(object value, ValidationContext validationContext) {
    // Track that this overload wasn't overridden
    if (!_hasBaseIsValidWithContext) {
        _hasBaseIsValidWithContext = true;
    }

    // That means that the other overload must be overridden
    // And if it hasn't been, then throw a NotImplementedException
    if (_hasBaseIsValid) {
        throw new NotImplementedException(DataAnnotationsResources.ValidationAttribute_IsValid_NotImplemented);
    }

    // We know the other overload was overridden
    // So call it to produce the result
    ValidationResult result = ValidationResult.Success;

    if (!this.IsValid(value)) {
        string[] memberNames = validationContext.MemberName != null ? new string[] {
            validationContext.MemberName
        } : null;

        result = new ValidationResult(
            this.FormatErrorMessage(validationContext.DisplayName),
            memberNames);
    }

    return result;
}

The general idea is that the lock was simply unnecessary because we can easily record which of the base implementations still get called, and then detect when both of them do.  The lock was guarding against multiple threads calling into one of the IsValid implementations while a bit was temporarily flipped to true during the call to the other overload.  But the new approach only ever switches the flags on, so multiple threads coming in won’t cause any problems.
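
To see how the flags play out, consider three hypothetical derived attributes (illustrative only, not code from the actual fix):

// Overrides only the old bool overload; the context-based base implementation
// detects that the bool base never runs and routes calls through the override.
public class LegacyRequiredAttribute : ValidationAttribute {
    public override bool IsValid(object value) {
        return value != null;
    }
}

// Overrides only the new context-based overload; the bool base implementation
// routes calls through it instead.
public class ContextAwareRequiredAttribute : ValidationAttribute {
    protected override ValidationResult IsValid(object value, ValidationContext validationContext) {
        return value != null
            ? ValidationResult.Success
            : new ValidationResult("A value is required.");
    }
}

// Overrides neither overload. Calling IsValid runs one base implementation,
// which calls the other; both flags end up set and NotImplementedException is thrown.
public class BrokenAttribute : ValidationAttribute {
}

With the old lock-based code, BrokenAttribute failed in the same way; the difference is that the new code reaches the same answers without taking a lock on every validation call.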

Look Good?

This code is unique—ensuring that 1 of 2 overloads is overridden—and the existing bug has been there for years.  What do you think of the new implementation?  See any holes in it?  Of course, Miguel and the rest of his team will test the heck out of the new implementation, but if you see any issues with it, let me know.

Kudos to Miguel for finding a novel approach to solving the problem.  I wish I’d thought of this approach back when I was writing this code for RIA Services.

author: Jeff Handley | posted @ Tuesday, April 15, 2014 1:29 AM | Feedback (2)

Common NuGet Misconceptions: Package Restore


Package Restore is one of NuGet’s most popular features.  This has been especially true since NuGet 2.7 introduced Automatic Package Restore in Visual Studio and the nuget.exe restore command for simple command-line package restoration.  We hear many compliments about the feature and how it is transforming the way developers reference 3rd party libraries.  However, we also hear quite a few “bug reports” about Package Restore failing to perform functions that it was never designed to perform.  This post clears up the most common misconceptions about Package Restore.

Content Files Are Not Restored

This is probably the number one misconception about NuGet all-up, let alone just Package Restore.  Many people think that NuGet Package Restore will bring back packages’ content files into their project folders when those content files have been omitted from source control.

It won’t.

Package Restore checks the solution’s /packages folder to ensure that every package referenced by the project(s) exists there.  Any packages that are missing are downloaded and unpacked into the /packages folder.  That is all.  Package Restore has never inspected the packages to see what content files they include and then copied those files into your project when they are missing.  And it won’t.

Why Not?

There are several reasons.  To be brief, I’ll cover them tersely here:

  1. Adding content files to your project is a design-time action, not a build-time action
  2. Some packages add content files that you’re expected to edit – while many packages’ content files are not meant to be edited, NuGet doesn’t presently have a way to differentiate the two scenarios
  3. Some packages use the source code transformation feature of NuGet to replace tokens in the content files with project properties, which wouldn’t be possible during Package Restore
  4. Packages’ content files can vary by target framework – knowing which target framework’s content files to copy into your project requires inspecting the project to know its target framework, which is beyond the scope and capability of Package Restore
  5. Some packages use install.ps1 PowerShell scripts to further manipulate content files after NuGet initially adds them to the project, and PowerShell scripts aren’t run during Package Restore either

Long story short, manipulating project content files is beyond the scope of Package Restore.  With the frequency at which I’ve been hearing about this misconception recently, I expect we’ll end up adding a feature to restore content files into projects at some point.  It looks like this issue on CodePlex is the best match if you’d like to upvote it.

PowerShell Scripts Aren’t Run

Wait a second!  Up in #5 above, did you say that install.ps1 PowerShell scripts aren’t run during Package Restore either‽

I sure did.  Misconception number two about Package Restore is that people expect it to run PowerShell scripts (specifically install.ps1) when restoring packages.

It won’t.

As stated above, Package Restore simply downloads and unpacks any missing packages in the /packages folder.  That is all.  Package Restore has never executed PowerShell scripts for packages after restoring them.  And it won’t.

Why Not?

Again, there are several reasons.  And again, here they are in terse form:

  1. Installing packages into a project is a design-time action, not a build-time action
  2. Install.ps1 scripts are meant to be run one time as a post-processing step of installing a package and aren’t meant to be run again
  3. Running an install.ps1 script requires that:
    1. The project be loaded in Visual Studio
    2. The NuGet PowerShell Console has been initialized, creating a PowerShell workspace primed with NuGet’s PowerShell module from the Visual Studio extension
    3. Visual Studio’s DTE object is exposed in the PowerShell workspace as well, so that the install.ps1 can access it to manipulate the project (or anything else in Visual Studio)

Besides the points made by bullets 1 and 2, the requirements for bullet 3 won’t be met either.  At least not from the nuget.exe restore command-line Package Restore.  The requirement would be met for Automatic Package Restore in Visual Studio, but again with bullets 1 and 2, there’s no need to execute the install.ps1 script again.

Now, there is a caveat around packages that contain an Init.ps1 script.  NuGet also doesn’t run any Init.ps1 scripts from packages after executing Package Restore in Visual Studio.  Instead, users have to re-open the solution to have the Init.ps1 scripts executed.  That is really just an oversight and a bug, and we plan to fix it in NuGet 2.9.  Here’s the issue on CodePlex for that.

Download and Unpack

NuGet Package Restore is in place to simply download and unpack any missing packages.  The primary reason for this is that projects hold references to assemblies inside the /packages folder.  Take a really basic package, NuGet.Core, for example.  This package includes NuGet.Core.dll in its \lib folder.  When you install the NuGet.Core package into a project, the project ends up with a reference to something like ../packages/NuGet.Core.2.7.2/lib/net40-Client/NuGet.Core.dll.  But if you decide to omit the /packages folder from source control, that reference will fail at build time.  To alleviate that, NuGet Package Restore runs before the build to download and unpack NuGet.Core into the /packages folder, putting NuGet.Core.dll in place before msbuild goes looking for it.

The end result is that your build-time reference needs are taken care of by Package Restore.  Project files are never modified.  Project contents are never touched or restored.  Install.ps1 scripts are never executed.  Your packages folder is simply rehydrated.  Nothing more.


author: Jeff Handley | posted @ Monday, December 9, 2013 4:09 PM | Feedback (4)