QuickReactions - Isomorphic Hello World with React and Node


While working through sample after sample for Node.js and React.js, I kept hitting a pattern that wasn’t very helpful. Instead of truly starting from scratch, the samples walked through cloning a working solution step by step. They’d start with “Step 1: paste this fully-working code into this file” and “Step 2: paste this fully-working code into this other file.” I had a hard time finding a breakdown of the concepts being applied.

I wanted to learn by starting truly from scratch and building the app up in logical, incremental steps. To accomplish the goal of learning this new material one concept at a time, I created a new project and then documented each new concept that was introduced in a giant README.md file. I then transformed the giant README into a 19-step tutorial web site using GitHub Pages.

If you are feeling overwhelmed trying to learn Node and React, you might benefit from this QuickReactions tutorial.

http://jeffhandley.github.io/QuickReactions/


author: Jeff Handley | posted @ Wednesday, May 20, 2015 10:23 AM | Feedback (1)

Following Passions (and Leaving Microsoft)


I decided to leave Microsoft; Friday, March 20th is my last day.

RIA

My family and I moved to Redmond almost 7 years ago so that I could join Microsoft. After 13 years in the industry, I had landed my dream job: creating a UI framework that enterprise application developers would use for their web applications. The project was Alexandria, which became WCF RIA Services, and it helped developers use Silverlight for Line of Business applications.

As I blogged about when I moved out here, I had put together a 5-year plan for how to get a job at Microsoft building UI frameworks, but I ended up getting that job within just a few months. I was thrilled to work on RIA and have the opportunity to create software that became part of the .NET Framework and shipped to my mom’s computer. It was an honor to take what I learned building user interfaces for dozens of enterprise applications and create a framework that countless developers could benefit from.

With my experience on RIA, I got a taste of delivering frameworks and tools to large developer audiences, and that became a new direction for me.

Growing Scope

While working on RIA Services, I watched NuGet come into existence and I was immediately sold. While NuGet was still quite nascent, I started pitching to the RIA team that we should abandon our MSI and instead ship RIA as a collection of NuGet packages. When I became the dev lead for the project, it was one of the first new efforts I invested time into. I even blogged about my excitement around NuGet. NuGet became what I wanted to work on.

By chance, shortly after that blog post, a re-org happened. NuGet was going to become part of my group—and it didn’t have a dev lead! I jumped at the opportunity to become the project’s dev lead; to pick up ownership of NuGet, I needed to take ASP.NET Web Pages and Razor as well. Sold! Suddenly, I was the dev lead for WCF RIA Services, WCF for Silverlight, NuGet, www.nuget.org, ASP.NET Web Pages, Razor, and a couple of other small projects. And there were 6 developers on my team. I immediately started working on how the team could become dedicated to NuGet.

NuGet

When I became the dev lead for NuGet, version 1.5 had just shipped. We then shipped 1.6, 1.7, 1.8, 2.0, and several more releases leading up to NuGet 2.8. For over 2 years, we averaged 11 weeks between RTM releases, with an average of 85 issues addressed in each release. At the same time, we completely redesigned the www.nuget.org gallery, re-implemented it from the ground up to run in Azure on the latest ASP.NET MVC bits, and we did the work in the open on GitHub.

NuGet grew and grew. Our usage was doubling time and time again. The project matured from being a “toy” that was used only for ASP.NET projects into something that almost every project system in Visual Studio was benefiting from. I spent a great deal of my time selling NuGet to teams and groups around the company, gaining broader and deeper adoption. It was exciting to watch the tables turn as we gained more acceptance. Over time, teams were coming to us instead of the other way around. Visual Studio started fixing bugs that made NuGet better. NuGet had arrived.

Integration

It was inevitable—NuGet’s users wanted it to become more natural. They wanted deep integration with the project systems—not just macros over top of VS actions. They wanted integration with the project templates. And with the build system. Across every aspect of the development lifecycle, NuGet should be there and be supported. NuGet needed to become part of the platform.

This is where we are today. NuGet is no longer a toy—it’s truly become a first-class aspect of how developers work on the Microsoft platform. There is still a lot of work to get done to accomplish the goals we’ve set, but I believe the direction is right and the project is on path to get there.

Rewarding Projects

When I recognized that NuGet was on path to become part of the platform, I started thinking about what would be next. What would be the next round of goals for the project? And secondarily, what would be the next round of goals for me? Don’t get me wrong, there is still a lot of work to be done for NuGet to reach those goals; the team and the project have plenty of room for improvement. But I started from the assumption that we would execute successfully. So I sought out which passions I wanted to follow as I reached my 20th anniversary in the field.

At Øredev 2014’s speakers’ dinner at City Hall in Malmö, I was talking with someone from Jayway about passions and what makes a project rewarding. She asked me what the most rewarding project I’d ever worked on was. My knee-jerk reaction was to name NuGet. But I held back and really thought about the question. Was NuGet really it? Was it RIA? Was it the web-based replacement for Ohio’s student information system mainframe? That project actually was more rewarding than NuGet! Was it Statsworld—the web-based fantasy football app that competed directly with CBS Sportsline? What about when I created a web-based system to run a cooking school for Procter & Gamble? Those were great too! And then I kept going back through my career until I decided what my most rewarding project really was—and the answer is surprising.

Impact on Individuals

My very first professional software project was in high school. I created a DOS-based CRM system for a math teacher’s husband’s lawn care company. Imagine QuickBooks, but running in DOS. I sold it to him with a bound user manual and a custom printer driver for his dot-matrix printer—for $100. It even had mouse support using a library I created in QuickBASIC. I think I made about $0.50/hour on that project and built it on my mom’s computer at her office, working nights after the office had closed.

When we first met, he asked if I could create something to print invoices so that he didn’t have to type each one by hand. I most certainly could. But I started asking him questions about what other routine tasks he had and I asserted that I could automate a great deal of his routine administrative work. When I delivered this software to him and trained him on it, his eyes lit up. A few weeks later when I was delivering a new round of floppy disks with some bug fixes, he told me I saved him about 40 hours per week.

That $100 DOS-based invoicing system for a self-employed lawn care professional is the most rewarding project of my career. That was my answer at Øredev and I knew then I needed to think more seriously about what was next for me.

Following Passions

Looking back on my career, I’ve always had a passion for interviewing business owners and employees and finding ways to simplify and automate their administrative tasks. In fact, after I completed that lawn care project, I dreamed of owning my own software company—it even had a (horrible) name: HANDLinc. Computer Programming. My junior year of high school, I was telling people that when I grew up I wanted to create software to help other people run their businesses. In 2000, I co-founded WeDoWebStuff.com and did just that. But somewhere between then and now, I lost sight of those objectives and found myself working on frameworks and tools for developers.

I have decided I want to return to building software for business owners and employees. I want to concentrate on user interfaces that simplify administrative tasks that cannot (yet) be automated. I want to work with non-developers and make their lives better and less frustrating—to make computers work for them instead of the other way around.

Concur

Friday, March 20, 2015 is my last day at Microsoft and my last day working on NuGet.

I start at Concur on Monday, March 23, 2015. I will be following my passions and I am very excited!

FAQ

  1. Are you going to stay involved in NuGet?
    • I don’t think so. I’m going to be focused on returning to a different kind of work—I have a lot to learn and remember.
  2. Who is taking over NuGet?
    • Yishai Galatzer is the new Engineering Manager for the NuGet team at Microsoft.
  3. Who should I connect with to talk about NuGet?
  4. Are you moving?
    • Nope. I’ll be working at Concur’s headquarters in downtown Bellevue, WA.

If you have other questions, feel free to reach out to me here or on Twitter (@jeffhandley).

author: Jeff Handley | posted @ Thursday, March 19, 2015 8:36 PM | Feedback (7)

Adaptive Batch Sizes for Backend Processing


Most business systems include some form of backend processing. This could be report generation, data transformations, credit card processing, payment auditing, or countless other scenarios. It’s typical for these systems to pull records out of a queue, perform the necessary processing, and then move on to the next record. When possible, these systems are engineered to process more than one record at a time, reducing overhead and increasing efficiency. Each time a batch processing system is created, though, we face a difficult question.

What is the best batch size?

This question is always hard to answer because we know that our development environment will differ from the production environment. To combat this problem, most developers define an environment variable or configuration setting that will control the batch size, and then hard-code a default value if the setting is not supplied. This provides a feeling of comfort that we can change the setting in production without having to update the code. But this approach falls short in many ways.
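
As a minimal sketch, that conventional pattern usually looks something like the snippet below; the setting name, the default value, and the use of ConfigurationManager are placeholders for illustration, not the actual NuGet.org code.

    // The conventional pattern: one configurable knob with a hard-coded fallback.
    int batchSize;
    if (!Int32.TryParse(ConfigurationManager.AppSettings["BatchSize"], out batchSize))
    {
        batchSize = 1000; // a guess that goes stale as soon as the environment changes
    }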

NuGet Package Statistics

NuGet.org creates records every time a package is downloaded—this happens about 750,000 times per day or 8.5 times per second. The records are stored in the production database in a denormalized table where the raw values can easily be inserted at that pace. Then twice each day, we produce updated package download reports for every package with download activity since the last time its report was generated.

To generate these package download reports, we have backend processes that aggregate total download numbers, replicate the records into a warehouse database, and then purge records that are at least 7 days old and that have already been replicated. Each of these processes works against batches of records; choosing batch sizes for each of them was difficult.

Throughput Factors

When trying to select a batch size for each of these processes, we realized that there are lots of factors that come into play. Here are the variables that we found to have significant impact on throughput:

  1. Scale settings for our production database (SQL Azure)
  2. Scale settings for our warehouse database (SQL Azure)
  3. Virtual Machine specifications on our backend processing server (Azure VM)
  4. Current load on the production database
  5. Current load on the warehouse database
  6. Current load on the backend processing server (it performs lots of other backend jobs at the same time)
  7. Index fragmentation in the production database
  8. Index fragmentation in the warehouse database
  9. Number of records in the queue
  10. Network latency

Each time any of these factors changed, the previous choice we’d made for our batch sizes became stale. Every once in a while, a batch would fail, cause an error, and raise an operations alert. We would then file a bug: “Stats Replicator cannot process the current batch size without timing out.” There are two obvious fixes for the bug:

  1. Increase the timeout
  2. Reduce the batch size

Either of these “fixes” would get the job unstuck, but then it’s just a matter of time before the change is stale.

The Edge of Failure

Batch processing can be more efficient because it reduces overhead. There’s startup/shutdown time required for each iteration of the process. When you only pay the startup/shutdown cost once but process thousands of records, the savings can be significant. The bigger the batch, the more we save on overhead. But there’s usually a breaking point where giant batch sizes lead to failure. Finding the largest batch size that can be successfully processed often yields the best performance.

To make the backend processes for NuGet.org as efficient as possible at all times, I created an approach that discovers this breaking point and then automatically adapts batch sizes to achieve the best throughput attainable within the current environment.

Defining Batch Size Ranges

Instead of defining a single batch size setting, the new approach uses a pair of parameters to specify the minimum and maximum batch sizes. These batch sizes aren’t guesses; they are objective numbers with meaning.

Minimum Batch Size

The minimum batch size is truly a minimum. If the system fails to process a batch of this size, it is considered an error and the process will crash. This will lead to an operations alert to inform the team that something is wrong.

Maximum Batch Size

The maximum batch size is the largest batch we would ever want processed at one time. This number can be selected based on the scenario, and it should take into account concerns like how hard it will be to debug a batch that hits a bug. But this number should be as large as you’re comfortable with—don’t worry about what the system will be “capable of” handling—because all of the factors above affect that capability. If you scale your server up significantly, a previously unfathomable batch size may become not only possible, but preferable.

Sampling and Adapting

With a batch size range provided, we can now take samples of different batch sizes. This sampling will produce two important pieces of data:

  1. The edge of failure, where the batch succeeds but larger batch sizes fail (generally by exceeding a timeout period)
  2. The throughput measured for each sampled batch size, in terms of records per second

To accomplish the sampling, we take the following approach:

  1. Process the minimum batch size and record the throughput (records/second)
  2. Incrementally increase the batch size toward the maximum batch size, stepping by 10% of the range between the minimum and maximum

    batchSize = minBatchSize + ((maxBatchSize - minBatchSize) / 10 * samplesTaken);

  3. Record the throughput for each sample

    batchTimes[perSecond] = batchSize;

  4. If a batch size times out, record Int32.MaxValue as its throughput to flag the failure (these samples are excluded when we later select the best batches) and decrease the maximum batch size by a third

    maxBatchSize = maxBatchSize * 2 / 3;

Once we’ve finished taking our 11 samples (yes, 11, because fenceposts), we use the sampling data to begin adapting our batch sizes. Each time we’re ready to process another batch, we calculate the next batch size to use. This calculation aims to find the best possible batch size, but we don’t simply pick the single best batch size we’ve seen so far; any one sample is noisy, and the true sweet spot is usually a size we haven’t tried yet. Instead, we select the best 25% of our successful batches and use the average batch size across them.

var successfulBatches = batchTimes.Where(b => b.Key != Int32.MaxValue).ToList();
var bestBatches = successfulBatches.OrderByDescending(b => b.Key)
                                   .Take((int)Math.Ceiling(successfulBatches.Count / 4.0));
var nextBatchSize = (int)bestBatches.Select(b => b.Value).Average();

We will then use this size to process the next batch, record its throughput, and add it to our samples. As we continue to process more batches, we’ll have a larger pool of samples to select our best 25% from, and we’ll be averaging across more batch sizes. But because previous batch sizes were themselves selected from those averages, the result is that we zero in on the batch size that yields the best throughput.

Examining the Numbers

Let’s take a look at how this can play out.

Configuration

  • Min Batch Size: 100
  • Max Batch Size: 10000
  • Timeout Period: 30 seconds

Initial Sampling

  1. Batch: 100; Time: 1 sec; Pace: 100/sec
  2. Batch: 1090; Time: 9 sec; Pace: 121/sec
  3. Batch: 2080; Time: 14 sec; Pace: 149/sec
  4. Batch: 3070; Time: 19 sec; Pace: 162/sec
  5. Batch: 4060; Time: 26 sec; Pace: 156/sec
  6. Batch: 5040; Time: TIMEOUT (Int32.MaxValue). Max set to 10000 * 2 / 3 = 6667
  7. Batch: 4042; Time: 25 sec; Pace: 161/sec
  8. Batch: 4699; Time: 29 sec; Pace: 162/sec
  9. Batch: 5356; Time: TIMEOUT (Int32.MaxValue). Max set to 6667 * 2 / 3 = 4445
  10. Batch: 4015; Time: 26 sec; Pace: 154/sec
  11. Batch: 4445; Time: 27 sec; Pace: 165/sec

Adapting

After taking these 11 samples, we’ve learned that we can’t seem to get past ~5000 records in a batch without timing out; the maximum successful batch was 4699 at 29 seconds (162/sec). But we also see that within the timeout period, larger batches are providing better throughput than smaller batches. The system will now automatically adapt to use this data.

The samples we've taken can be ordered like this:

  1. 4445 (165/sec)
  2. 4699 (162/sec)
  3. 3070 (162/sec)
  4. 4042 (161/sec)
  5. 4060 (156/sec)
  6. 4015 (154/sec)
  7. 2080 (149/sec)
  8. 1090 (121/sec)
  9. 100 (100/sec)
  10. 5040 (timed out; recorded as Int32.MaxValue)
  11. 5356 (timed out; recorded as Int32.MaxValue)

Considering the best 25% of the successful samples (the top 3), we calculate the average of those batch sizes: (4445 + 4699 + 3070) / 3 ≈ 4071. That will be the next batch size. We’ll time that batch as well and add its data to the sample set.

As more batches are executed, we’ll see performance fluctuate and batch sizes vary a bit, but they ultimately narrow down to a small deviation. After around 100 iterations, the value becomes relatively static. So the next step is to guard against circumstances changing and our data becoming stale.

Periodic Resets

After around 100 iterations, we lose some of our ability to adapt. Even if the times start to get very bad for the batch size we’re zeroing in on, there’s too much data indicating that batch size should be efficient. The easiest way to combat this problem is to perform periodic resets. After 100 iterations, simply reset all sample data and start fresh—take 11 new samples and then run 89 more iterations afterward, adapting anew.
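
In code, this guard can be as simple as a counter check before choosing each batch size. Here is a rough sketch; the iterationsSinceReset counter and the originalMaxBatchSize variable are hypothetical, while batchTimes, maxBatchSize, and samplesTaken are the values used in the sampling steps above.

    // Every 100 batches, discard the samples and re-discover the current edge of failure.
    if (++iterationsSinceReset > 100)
    {
        batchTimes.Clear();
        maxBatchSize = originalMaxBatchSize;
        samplesTaken = 0;
        iterationsSinceReset = 1;
    }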

While this reset can lead to a few inefficient batches, it’s an important part of what makes the system fully reliable. If load on the production system or any of the other throughput factors changes, it won’t be long before we reset and discover that we need to change our target range.

The Code

This approach is in use within a few of our backend processes around package statistics. The most straightforward example is the job that finds package statistics records that have already been replicated to the warehouse and can now be purged from the production database.

Interesting Methods

  • GetNextBatchSize
  • RecordSuccessfulBatchTime
  • RecordFailedBatchSize
  • PurgeCore
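
A condensed sketch of how these methods could fit together follows, based only on the approach described in this post rather than the actual NuGet.org source; the class name and fields are invented for illustration.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class AdaptiveBatchSizer
    {
        // Throughput (records/second) mapped to the batch size that produced it.
        // A timed-out batch is recorded under the Int32.MaxValue sentinel key.
        private readonly Dictionary<int, int> _batchTimes = new Dictionary<int, int>();

        private readonly int _minBatchSize;
        private readonly int _originalMaxBatchSize;
        private int _maxBatchSize;
        private int _samplesTaken;
        private int _iterationsSinceReset;

        public AdaptiveBatchSizer(int minBatchSize, int maxBatchSize)
        {
            _minBatchSize = minBatchSize;
            _originalMaxBatchSize = maxBatchSize;
            _maxBatchSize = maxBatchSize;
        }

        public int GetNextBatchSize()
        {
            // Periodic reset: throw away stale samples and re-discover the edge of failure.
            if (++_iterationsSinceReset > 100)
            {
                _batchTimes.Clear();
                _maxBatchSize = _originalMaxBatchSize;
                _samplesTaken = 0;
                _iterationsSinceReset = 1;
            }

            // Initial sampling: 11 samples, stepping from min toward max by 10% of the range.
            if (_samplesTaken <= 10)
            {
                return _minBatchSize + ((_maxBatchSize - _minBatchSize) / 10 * _samplesTaken);
            }

            // Adapting: average the batch sizes of the best ~25% of successful samples.
            var successful = _batchTimes.Where(b => b.Key != Int32.MaxValue).ToList();
            if (!successful.Any())
            {
                return _minBatchSize;
            }

            var bestBatches = successful
                .OrderByDescending(b => b.Key)
                .Take((int)Math.Ceiling(successful.Count / 4.0));

            return (int)bestBatches.Select(b => b.Value).Average();
        }

        public void RecordSuccessfulBatchTime(int batchSize, TimeSpan elapsed)
        {
            var perSecond = (int)(batchSize / elapsed.TotalSeconds);
            _batchTimes[perSecond] = batchSize;
            _samplesTaken++;
        }

        public void RecordFailedBatchSize(int batchSize)
        {
            // Flag the failure and pull the ceiling down by a third.
            _batchTimes[Int32.MaxValue] = batchSize;
            _maxBatchSize = _maxBatchSize * 2 / 3;
            _samplesTaken++;
        }
    }

In this sketch, PurgeCore would be the driver: it loops until the queue of replicated records is drained, asks GetNextBatchSize for a size, attempts to purge that many records, and reports the outcome back through RecordSuccessfulBatchTime or RecordFailedBatchSize.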

Benefits

The biggest benefit I've seen from this approach is that our production system stays alive and efficient all the time. We used to have to tweak the batch sizes pretty regularly. And when our statistics processing fell behind, it could take a long time to catch up because our batch sizes were conservative. Now, the batch sizes can get more aggressive automatically, while ensuring we avoid timeouts.

Overall, these processes are now much more hands-off. If we need to increase throughput, we can scale a server up and the process will automatically take advantage of the improvement, using bigger batch sizes if that yields better results. And if the system is under load, the process will automatically back off to smaller batch sizes when those prove to run at a steadier pace.


author: Jeff Handley | posted @ Tuesday, December 16, 2014 11:49 PM | Feedback (0)

Evolving NuGet's Code at Øredev 2014


I had the pleasure of attending Øredev 2014 and presenting two sessions about evolving NuGet's code. It was my first time attending the conference and it was a terrific experience.

My sessions were essentially 3-year retrospectives on NuGet's code, both server-side and client-side. I talked through how we built the projects, the initial goals and principles, what we've learned, and what our new principles are.

If you work in a monolithic codebase that you feel is hard to maintain and add features to, and you have that strong desire to throw it all away and start over, then you might be able to relate to the stories. We found, as many teams do, that there's never a good way to start over, and you have to find creative ways to replace subsystems of the giant beast. The sessions tell the story of how we've been doing that with NuGet.

Evolving the NuGet.org Architecture

Video

EVOLVING THE NUGET.ORG ARCHITECTURE from Øredev Conference on Vimeo.

Slides

NuGet 3.0 - Transitioning from OData to JSON-LD

Video

NuGet 3.0 – Transitioning from OData to JSON-LD from Øredev Conference on Vimeo.

Slides

author: Jeff Handley | posted @ Wednesday, December 10, 2014 12:00 AM | Feedback (0)

Volunteering with Cub Scouts


I've volunteered with the Cub Scouts for the last 5 years and it's been a great experience. As I'm winding down my role in the pack, I was asked to write a testimonial about how rewarding it has been to be involved. Here it is.

Spending my days working in an office on a computer, I generally declare that I'm not very handy. As my sons showed interest in building and fixing things, I would often tell them that I was not capable of doing it. But after 5 years of volunteering with Pack 561, I've learned that in fact, I am handy, and I can build and fix things!

You see, as adults, we tend to know our boundaries; we know what we can and cannot do. But our children are still learning their boundaries. And Cub Scouts are taught that in fact, there are no boundaries--you can do whatever you're interested in! If you want to learn how to whittle, you can! If you want to learn how to build a Derby car, you can! If you want to learn how to identify trees, birds, or animal poop, you can! You can tour a TV station. You can meet and interview the mayor. You can experience what it's like to sit in a fire truck or inside a jail cell. You can learn a little bit about a lot of things so that you can discover what you want to learn more about.

When I first started volunteering with Pack 561, I faced some boundaries. How could I possibly teach children how to build a Derby car when I've never built one myself? How could I possibly lead them on a trek at the Arboretum, identifying birds, when I can't tell the difference between a cardinal and a robin? I can't do these things and I'm too old to learn. But when I saw first-hand the eagerness to learn that our Scouts have, it rubbed off on me. I can learn how to build a Derby car! I can learn to identify birds! I can learn all kinds of things!

I may not have been in Scouts when I was a boy, but I've spent the last 5 years volunteering with the Scouts, and I've learned so much. But the most important thing I've learned is that I don't have as many boundaries as I thought I had, and I'm not too old to learn about all of the exciting topics that my sons want to learn about themselves. And just today, I worked with my sons to take apart and fix two of their toys that have been broken for years. I used to tell them that I didn't know how to build and fix things, but today I proved to them that life isn't about what you know; it's about what you are willing to learn.

Jeff Handley

Den Leader, Den 2 (2009-2015)
Den Leader, Den 4 (2011-2014)
Derby Car Workshop Host (2012-2014)
Webmaster (2013-2014)
Honorary Scout (2009-2015)

author: Jeff Handley | posted @ Friday, November 28, 2014 10:56 PM | Feedback (0)