Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
@jschauma

From Company Closed to Open Source

Good news, everybody! Open Source is everywhere! Companies are opening up their tools, making them available to everybody on the internet. GitHub encourages forking and pull requests, bug fixes flow in, companies end up looking good, engineers have something to show off and be proud of, and everybody wins. Right?

Well, kind of. But I'm afraid that it doesn't quite work like that. I believe that "Open Source" is often approached or executed in the wrong way. A lot of new open source software is no longer managed or maintained as a project; the act of making public the source code to the software is regarded as sufficient; reports of problems (in the code or otherwise) are responded to with "well, fix it".

Why do companies open source software? For the most part, I believe, it is to make their engineers happy. Virtually all good software engineers, system administrators (nowadays frequently called "Site Reliability Engineers", another rant in waiting), and developers are using Open Source software and are sympathetic to if not even enthusiastic about the ideals behind it. They all want to contribute in their own way. At the same time, having significant open source projects under one's belt is a clear reputation (and ego) boost: see Coderwall or GitHub's activity feed and trends as examples of open source contributions as resume building factors.

As a result, engineers are pushing for their work to be open sourced. Management then follows: the good managers do whatever makes their engineers (and possibly them -- some managers are still or were at some point equally enthusiastic engineers) happy; the others need a little more convincing, but eventually succumb to the idea that open sourcing their tools will bring in a wave of bug fixes and quickly improve the software.

However, and I find this particularly interesting, companies have come to realize that code they release to the public should actually be very, very clean. It should be well-written and reliable, because, being public, it reflects on the company. Why can't we write all of our software according to such standards from the beginning?

Back in July, a blog entry entitled "Open Source at Netflix" put this succinctly:

'[...] the peer pressure from "Social Coding" has driven engineers to make sure code is clean and well structured, documentation is useful and up to date. What we've learned is that a component may be "Good enough for running in production, but not good enough for Github".'

This statement, "Good enough for production, not good enough for Github", illustrates how we tend to lie to ourselves about the quality of our software, praising a project as a big internal success and cornerstone of the infrastructure and facing reality only when we consider letting outsiders take a look.

We write our own tools in ways that we would not accept from the outside; we put hacks into production that make you cringe while at the same time brushing aside concerns ("Oh, yeah, this is god awful and broken. We really should fix this some time. But that would take a lot of effort, so let's instead focus on this other, much shinier project."). Yet we criticize and ridicule open source packages that don't work out of the box, come with no documentation or make assumptions about their environment that do not hold in ours. "Pffft, stupid hfrob, clearly written by morons."

So the fact that the desire to open source a project might cause a company to clean up their code should be a Good Thing. And it is, when that happens. But unfortunately, this quality discrepancy also works the other way around and keeps companies from making their code available for fear of it reflecting poorly on them. Not only that, a company may end up with a fork of their code base -- a clean one to be open sourced, and the one they are actually using. And therein lies a major headache: what is open sourced is often not what is actually executed.

To be sure, taking a software project that is intertwined with one's infrastructure and likely to have a number of integrations with code or systems that are not (now or ever) open sourced, and preparing it for release, is a LOT of work. Interfaces have to be abstracted, site-specific code blocks or configuration assumptions removed etc. etc. As a result, many projects never actually get out into the open ("It would be just too difficult to remove the $company-specific bits.") or what is released is not actually useful to others.
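To make "abstracting interfaces" and "removing configuration assumptions" a little more concrete, here is a minimal sketch of one way it might look. Everything in it is hypothetical and made up for illustration: the HostInventory interface, the StaticInventory backend, the HFROB_* environment variables and their defaults. The idea is only that the generic tool talks to an interface and reads its settings from the outside, so the $company-specific bits can live in a separate, private backend instead of in a private fork.

```python
from __future__ import annotations

import os
from typing import Mapping, Protocol


class HostInventory(Protocol):
    """Hypothetical interface: anything that can list the hosts in a role."""

    def hosts(self, role: str) -> list[str]: ...


class StaticInventory:
    """Generic backend that could ship with the public release.

    A company-internal backend (say, one querying a private CMDB) would
    implement the same hosts() method and stay out of the public code.
    """

    def __init__(self, mapping: Mapping[str, list[str]]) -> None:
        self._mapping = dict(mapping)

    def hosts(self, role: str) -> list[str]:
        return list(self._mapping.get(role, []))


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read site-specific values from the environment, not from the code.

    The variable names and defaults are assumptions for this sketch.
    """
    return {
        "domain": env.get("HFROB_DOMAIN", "example.com"),
        "ssh_user": env.get("HFROB_SSH_USER", "nobody"),
    }


def ssh_targets(
    inventory: HostInventory, role: str, settings: Mapping[str, str]
) -> list[str]:
    """Build user@fqdn targets without knowing where the hosts come from."""
    return [
        f"{settings['ssh_user']}@{host}.{settings['domain']}"
        for host in inventory.hosts(role)
    ]


if __name__ == "__main__":
    inventory = StaticInventory({"web": ["www1", "www2"]})
    for target in ssh_targets(inventory, "web", load_settings()):
        print(target)
```

The public and the private deployment then run the exact same ssh_targets() code; only the inventory backend and the environment differ.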

There are a number of projects released by various companies that do not grow an active user community or see actual use (much less contributions) by others, simply because in order to use them, one would need to have an infrastructure near identical to that of the releasing company. Mind you, releasing them is still great, as it offers insights into how one might want to do things (or not), but a company is often their own primary or sole user.

Which projects work as open source? Which projects attract enough other users to lead to contributions, improvements, bug fixes and an overall thriving community? I would argue that it is those projects that have been developed from scratch as public applications. Anticipated peer review improves code quality. (Internal peer review is much weaker, as GroupThink kicks in.) Software that is written for a broad audience, aware that other people might use it in different, unanticipated ways, is useful to others outside the company, and thus... finds use in the open community.

Clean, non-company-specific code is required for a successful open source project, but it's not sufficient. All the various factors I rambled on about regarding small system tools apply here as well: you need a simple user interface, accurate accompanying documentation, and decent packaging. For example, dumping over half a million lines of code onto GitHub without any kind of documentation as to what the system does, how to use it, etc. is not actually helping. (In fact, a cynic might speculate whether or not the release is the result of an engineer wishing to "take their code" with them as they prepare to leave the company.)

Finally, open sourcing code is not a one-time action, and, contrary to what I have observed in the last two or three years, does not relieve you from the responsibility to fix bugs, provide good documentation and respond to users. "It's open source -- if something about it bothers you, go fix it." is an approach to software ownership that raises the barrier to entry, to contributions, yet one that seems to be all too easily encouraged by the GitHub VCS maintenance model.

Yes, I actually do believe that owning software implies responsibility. Providing shoddy (or no) documentation means users will be unable to actually use your tools efficiently. Arguing "Use the source, Luke!" is disrespectful of other people's time -- yes, access to the source is great for debugging issues (or merely understanding the code if one wants to), but should not be a requirement to use the software. What's more, an attitude of "code or shut up" is (as I've argued previously) too easy an excuse used to dismiss bug reports or feature requests or to lazily release known buggy code ("if it bothers you, go fix it - you have the source"; see also this example of responding "pull requests welcome" to a report of an XSS).

This responsibility inherent in owning code does not mean that an open source developer has to implement any feature requested or even fix any bug reported, no matter in what manner it was reported. Reporting a bug or submitting a code contribution to an open source project, when done right, is a time consuming and labor intensive process, as is integrating a patch or evaluating and responding to bug reports. Both sides should be aware of this and act accordingly.

But once a project has been adopted, the maintainers of the code base do in fact have the responsibility to investigate certain issues and address others in a timely and professional manner. If you provide an operating system or a crucial system library, for example, then, yes, I do think you owe it to your users to not break certain things and to fix others. This becomes particularly relevant when it relates to security updates.

(I've mentioned this before, but I believe that this mind set of code ownership is one of the differences, to paint with a particularly broad brush, between the BSD and Linux camps: BSD people tend to approach software as something they own and will continue to maintain, Linux people as something they toss out there and others can fix if they don't like it. Yes, this is a generalization.)

Having said all this, here's a list of suggestions for a company to consider before releasing their code:

  • Make sure the system is not specific to your particular infrastructure; this avoids you/your company being the sole user of your software and is best achieved when the software is initially designed, as refactoring a working code base is laborious and easily leads to a public and a private fork in your organization. Note that this also helps you, as your infrastructure will change with time, and a component that can easily adapt will save you time and money in the long run.

  • Write all your code from the beginning as if it were to be open sourced; this improves the code quality and forces you to think about dependencies, interfaces and use cases outside your ecosystem.

  • Write your code such that it is generally useful (within the problem domain).

  • Write up and release documentation with the code; include manual pages for all tools, a high-level explanation of what the system does and, where applicable, architecture documents. This helps you maintain and use your own project internally as your infrastructure grows as well.

  • Provide your software in at least one common package format; this helps your users actually integrate your software into their systems, and it helps you correctly identify all dependencies.

  • Provide a way to discuss the software with you; include a mailing list or at least contact addresses.

  • Accept that not all users are coders; when receiving inquiries, suggestions, bug reports or feature requests, don't dismiss them with a reference to the code and a choice offered to the requester to implement it or to go fornicate themselves.

In short: abide by Kant's categorical imperative of software engineering: write and release your software in the same way that you want other projects to be written and released. It will make your code better, your engineers happier, your company more appealing. Everybody wins.

August 14th, 2012
