David J. Paola

Generalization vs specialization in software

In societies, there are always generalists and specialists. Generalists can do lots of different things at a passable level of quality, and sometimes surprisingly well, while specialists excel at one particular thing.

Suppose we had a simple distributed system that processes different kinds of jobs. The way we usually build these is to have one component for each kind of job, or one component for each stage a job might currently be “in”. This has many benefits, one of which is handling load: the system is horizontally scalable, so if you need more “workers” for a specific kind of task, you can just spawn more components of that same kind.

This is akin to specialization – each component does one thing, and it does one thing well. Simple, and easy to debug.
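To make the specialized layout concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the job kinds (`resize`, `transcode`), the handler functions, and the `SpecialistPool` class are hypothetical, and real workers would be separate processes or machines rather than in-process functions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str], str]


# Hypothetical handlers, one per job kind (invented for this sketch).
def resize_image(payload: str) -> str:
    return f"resized:{payload}"


def transcode_video(payload: str) -> str:
    return f"transcoded:{payload}"


class SpecialistPool:
    """One pool of identical workers per job kind."""

    def __init__(self) -> None:
        self.workers: Dict[str, List[Handler]] = defaultdict(list)
        self._next: Dict[str, int] = defaultdict(int)

    def spawn(self, kind: str, handler: Handler, count: int = 1) -> None:
        # Horizontal scaling: add more workers of the same kind.
        self.workers[kind].extend([handler] * count)

    def submit(self, kind: str, payload: str) -> str:
        pool = self.workers.get(kind)
        if not pool:
            raise KeyError(f"no specialist for job kind {kind!r}")
        # Round-robin across the workers that handle this kind.
        index = self._next[kind] % len(pool)
        self._next[kind] += 1
        return pool[index](payload)


pool = SpecialistPool()
pool.spawn("resize", resize_image, count=2)
pool.spawn("transcode", transcode_video)
print(pool.submit("resize", "cat.png"))
```

Each kind of job is routed only to workers built for it, and load on one kind is handled by calling `spawn` again for that kind alone.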

Would there be any advantage to a system where some of the components are “generalists”? Meaning, they can handle any kind of job more quickly, but not necessarily as well? Perhaps there is a scenario in which network latency degrades, or there is congestion of some kind, and to compensate you need to get the job done more quickly at the cost of some fidelity.

You could then respond to intermittent degradation by routing more work to the “generalist” components, until the degrading condition is resolved and all the specialized workers can be fully utilized again.

(This assumes that the specialized components require “more” of something than the general ones: more CPU power, more time, more memory or bandwidth. That may or may not be true.) Any time one of these resources is degraded, bring in the generalists.
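The fallback idea above could be sketched roughly as follows. This is only one possible reading, not a worked-out design: the `Dispatcher` class, the latency budget, the `Result` type, and the fidelity numbers are all hypothetical placeholders, and "degraded" is reduced here to a single observed-latency check.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Result:
    output: str
    fidelity: float  # placeholder scale: 1.0 means full quality
    handled_by: str


# Hypothetical specialist: slower, but full fidelity.
def thumbnail_specialist(payload: str) -> Result:
    return Result(f"thumbnail-hq:{payload}", 1.0, "specialist")


# Hypothetical generalist: accepts any kind, cheaper, lower fidelity.
def generalist(kind: str, payload: str) -> Result:
    return Result(f"{kind}-quick:{payload}", 0.6, "generalist")


class Dispatcher:
    def __init__(
        self,
        specialists: Dict[str, Callable[[str], Result]],
        latency_budget_ms: float,
    ) -> None:
        self.specialists = specialists
        self.latency_budget_ms = latency_budget_ms

    def dispatch(
        self, kind: str, payload: str, observed_latency_ms: float
    ) -> Result:
        # Assumption: the system counts as degraded whenever the
        # observed latency exceeds the configured budget.
        if observed_latency_ms > self.latency_budget_ms:
            return generalist(kind, payload)
        return self.specialists[kind](payload)


dispatcher = Dispatcher({"thumbnail": thumbnail_specialist}, latency_budget_ms=200)
print(dispatcher.dispatch("thumbnail", "cat.png", observed_latency_ms=50).handled_by)
print(dispatcher.dispatch("thumbnail", "cat.png", observed_latency_ms=500).handled_by)
```

Under normal conditions the job goes to its specialist; once the observed latency exceeds the budget, the same job is handed to the generalist, and it returns to the specialist as soon as the condition clears.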