Open source software has been extremely successful at the technology level. Apache, Mozilla, Linux, Perl, Python: the list of stable and robust open source software is impressive and growing. So why is open source still considered fringe?
Let’s start with the poor discoverability of open source tools. How do you find the right tool for the job? More importantly, if you stumbled across the right tool would you be able to recognize it based on the information on the Web site? Take a random project on Sourceforge: Is it easy to recognize what the project does? Where it is going? What problem it was meant to solve?
These are the introductory questions that engineers ask when they first look at a project. Unfortunately, quality introductory documentation is what most open source projects lack. The API and user documentation that most projects have is largely ignored until engineers start using the tool.
The State of Open Source Introductory Documentation
I wanted to do an unscientific study, a survey, of the quality of introductory documentation for open source projects. To do this I took the top 20 entries in the ‘Development tools’ category of Sourceforge and analyzed their Web sites. The ground rules were fairly simple: The documentation should be located on the central site and should be reachable in no more than five clicks from the home page. These rules had little impact on my testing. I found that if documentation was available, it was centrally located and easy to find.
The focus of my analysis was introductory documentation. This is the type of documentation that you read before you start using the tool, including the introductory text on the home page of the site, the FAQ, and use scenarios or case studies. The survey did not include user’s guides or API documentation.
Because I used the top 20 projects, as opposed to a random sampling, you would expect my research to favor more heavily documented projects. This is borne out by the fact that all of the projects sampled were documented to some degree. However, after assessing the quality of the introductory documentation, it was clear that there is a lot of room for improvement.
All of the sites had some brief description of what function the tool performed. Only one in every 10 projects had a clear and concise statement of what problem the tool was meant to solve. Almost all provided a statement of the function of the tool, but only one of the 20 projects I looked at used any graphics to describe the function.
Only half of the projects had FAQs. Of those, fewer than half answered the simple question of what the tool was meant to do. Is the FAQ only intended for engineers already using the tool?
Half of the projects had some type of tutorial or sample code. Of those, only a third had documentation that provided scenarios for when the tool was useful and in what configuration. Only 15 percent of the projects gave information about the hardware or system requirements for the software.
Twenty percent of the sites used esoteric jargon or acronyms without definitions. This is assuming a great deal from the reader. Can we really assume that every reader can understand JXTA from the acronym alone?
The vast majority of the project sites used the front page to show the change log of the tool. Certainly this provides benefit to the engineer who is already using the tool, but it provides no benefit at all to someone interested in an introduction to the tool.
Will Documentation Help?
It’s one thing to prove that the documentation we currently have is poor; it would be much better to also prove that documentation would significantly improve adoption rates. Unfortunately, I have no direct data to support that. However, there are certain trends that clearly indicate, at least to me, that better documentation directly improves adoption rates.
Technology adoption is a well-understood trend. It starts with early adopters who tolerate the bugs and the bad interface to get access to leading edge tools. To move from early adopters to ubiquitous use involves making the tool easy to use and providing comprehensive and accurate documentation. It also involves communicating the benefits experienced by the early adopters to the potential users. This is the introductory information that is frequently lacking in open source projects.
We can also look at the world of commercial software. Software marketing professionals don’t start with what the tool does. They start with customer empathy by assuring the consumer that the vendor understands their problems. Then they show how their product provides an appropriate solution. “Here is your number one problem; this is how we fix it.” It’s a simple and effective approach and it’s not patented. Anyone can use it.
How to Write Documentation that Will Lead to New Users
If our intent is to encourage people to use these tools and to contribute their time to the development of these tools, we need to spend more energy on introductory documentation. Engineers who look at Sourceforge are looking either for solutions to problems or for interesting tools. In either case the Web site needs to convey the important details in a short form so that readers can gauge their level of interest. Obscurity in documentation benefits no one; it annoys potential users and non-users alike.
There are five key categories that you need to think about when introducing potential users to a project:
1. The top five (or so) problems the tool was meant to solve
What a tool does is only half the story. Tools exist for a reason: to solve problems. How to get a nail into two pieces of wood is a problem. The solution is a hammer. It’s difficult to explain a hammer without mentioning wood because the solution is directly related to the problem. Technology is just like that.
It’s so easy to get caught up in the how of technology that we often forget about the why. The why will tell your reader whether your tool will fix their problem directly, as a side effect, or not at all.
Using XSLT as an example we can phrase a problem statement this way:
“Procedural code that transforms XML into HTML is difficult to write and maintain. XSLT allows you to specify transformations of XML data into XML output formats (including HTML) in a simple, declarative manner.”
From this statement an engineer should be able to understand why XSLT was developed and how XSLT does its job. Understanding both the why and the how lets engineers know what falls in and out of the scope of the tool. When engineers understand the scope of a tool, they can judge whether they can use it for their project and whether they want to contribute to its development.
2. The standard usage patterns used to solve the top five problems
For each of the problem statements we should also be able to create a usage pattern that matches. This usage pattern explains at a reasonable level of detail the mechanics of the tool.
Using the XSLT problem statement as a starting point, we can phrase a usage pattern this way:
“The XSLT transform engine takes as input an XML stream that contains the data in addition to an XSLT style sheet that contains the transformation rules. The engine applies the style sheet to the data and creates an output stream. In this case the output stream is HTML.”
With this in mind an engineer can understand the proper and intended use of the tool. Successful tools are always developed with a use case in mind. Taking the simple case of a hammer and a nail, the proper use of a hammer is to hit the nail with the striking face of the hammer. You could use the side of a hammer to drive nails, but you would not be getting the benefits the design of the hammer provides. The same is true of software engineering tools: if you use them the right way, you get the benefit of all of the design and testing work. If you don’t, the results can be unexpected.
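The XSLT usage pattern quoted above can be sketched in a few lines. This is only an illustration, assuming the JAXP classes (`javax.xml.transform`) that ship with the JDK as the transform engine. The class name `XsltSketch`, the sample XML data, and the sample style sheet are all hypothetical and exist only for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical input data: an XML stream listing two projects.
    public static final String SAMPLE_XML =
        "<projects>"
      + "<project><name>Apache</name></project>"
      + "<project><name>Mozilla</name></project>"
      + "</projects>";

    // Hypothetical style sheet: declarative rules that turn the data into an HTML list.
    public static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/projects\">"
      + "<ul><xsl:apply-templates select=\"project\"/></ul>"
      + "</xsl:template>"
      + "<xsl:template match=\"project\">"
      + "<li><xsl:value-of select=\"name\"/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the style sheet to the XML data and returns the output stream as a string. */
    public static String transform(String xml, String xslt) {
        try {
            // The engine is configured with the style sheet (the transformation rules).
            Transformer engine = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter html = new StringWriter();
            // The engine reads the data stream and writes the output stream.
            engine.transform(new StreamSource(new StringReader(xml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

The two inputs and one output in `transform` correspond directly to the data stream, the style sheet, and the HTML output stream named in the usage pattern.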
3. The design parameters of the tool
The documentation of any database engine will tell you that the number of fields in a table is unlimited. But that’s not really true. Most database vendors designed their systems to handle up to about 100 fields, and the test cases usually hover at around 20 to 50 fields. With this in mind you can feel confident when your tables are small, and you can make sure to test the system when you have tables with an unusually large number of fields.
How this information is represented is highly dependent on the nature of the tool. Taking XSLT as an example, we may express the average size of the input and output test case files as well as the average number of transformations.
Marketing brochures never indicate the design parameters of a tool, because the vendors fear losing your business if you think the tool might under-perform with your requirements. Unfortunately these parameters often decide the success or failure of a deployment. In an open source world we don’t need to be so profit-driven, so we can avoid the inevitable support headaches that come when people push code too far by being up-front with the performance characteristics of our tools.
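For the XSLT example, one way to gather the kind of figures described above (input and output sizes of test cases) might look like the following sketch. It assumes the JDK's built-in JAXP transform engine. The class name `DesignParameterReport`, the style sheet, the synthetic inputs, and the record counts are hypothetical stand-ins for a project's real test cases.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DesignParameterReport {
    // Hypothetical style sheet standing in for a project's real transformation rules.
    public static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/projects\">"
      + "<ul><xsl:apply-templates select=\"project\"/></ul>"
      + "</xsl:template>"
      + "<xsl:template match=\"project\">"
      + "<li><xsl:value-of select=\"name\"/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Builds a synthetic XML input with the requested number of records. */
    public static String buildInput(int records) {
        StringBuilder xml = new StringBuilder("<projects>");
        for (int i = 0; i < records; i++) {
            xml.append("<project><name>p").append(i).append("</name></project>");
        }
        return xml.append("</projects>").toString();
    }

    /** Returns {input bytes, output bytes} for one synthetic test case. */
    public static long[] measure(int records) {
        try {
            Transformer engine = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            String input = buildInput(records);
            StringWriter output = new StringWriter();
            engine.transform(new StreamSource(new StringReader(input)),
                             new StreamResult(output));
            return new long[] {
                input.getBytes(StandardCharsets.UTF_8).length,
                output.toString().getBytes(StandardCharsets.UTF_8).length
            };
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical test-case sizes; a real project would report its own.
        for (int records : new int[] {10, 100, 1000}) {
            long[] sizes = measure(records);
            System.out.printf("%d records: %d bytes in, %d bytes out%n",
                              records, sizes[0], sizes[1]);
        }
    }
}
```

Numbers like these, published next to the introductory text, tell a reader roughly what input sizes the tool has actually been exercised with.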
4. The environment the tool was developed on and is tested on
Software needs to work in the real world. Knowing what environment the tool is written on and tested on is a big clue to where the tool will be most comfortable. Large projects like Mozilla have teams of people testing the code on multiple platforms. For those types of projects this information isn’t critical. For smaller-scale projects with just a few contributors, readers need to know what the code is being written on and tested with.
Designing a tool to be database independent is not the same as having automated tests that run every build of the code against each database for a thorough check. If the code is developed primarily on MySQL you can only rely on the code working on MySQL.
To ensure that the code works in the most reliable manner possible it is extremely important to know what platform was used to develop and test the open source software. Give your users the best chance to get the most out of your tool by giving them this vital information.
5. Graphics that explain the structure, architecture, or usage of the tool
If a picture is worth a thousand words, why don’t we use more of them? Only one of the 20 Web sites I looked at made any use of graphics to explain the structure, architecture, or usage of the development tool.
Even the most basic graphics are extremely compelling. The front page of UML2EJB contains a single graphic that describes the complete workflow cycle between the incoming UML and the EJB beans that are at the end of the cycle.
This is a simple graphic, but it speaks volumes. An engineer can quickly see that if they have XML and want HTML, XSLT can provide a solution.
Documenting these basics is not only valuable for potential users of the tool, it’s also valuable for the implementing engineers. Software engineering is not a discipline that favors ambiguity. Stating clearly what a tool does and does not do provides a method for judging what new features should be included and what should be left out. If it’s difficult to nail down what a tool should do, it is a potential sign of weakness in the requirements, architecture, or design of the tool.
Open source software is arguably more stable and reliable than its closed source equivalent. We need to build on that technical success by fixing the documentation problems of open source software. Writing reasonable and useful documentation is not difficult. It entails stepping into the shoes of your reader, introducing them to the basics of the software, addressing their concerns, and doing a little self-promotion. For good examples all we need to do is look at the documentation for successful commercial software.
Technical excellence is not everything. The technically superior tool can be outpaced in its adoption by an inferior tool with better support and documentation. To ensure the long-term success of open source we need to spend a considerable amount of our development time on both in-depth and introductory documentation.