Why your academic code needs a software licence
Tags: programming, musings, research
Regardless of the scientific discipline you are working in, you probably write some code to perform your experiments. This post aims to inform you about the need for adding a licence to your software. This step is often neglected, in particular because it involves legal considerations, which many perceive as tedious[1]. Since I am not a lawyer, I will go easy on the legalese and just provide a quick overview.
(If you are in a hurry, feel free to skip to the conclusion and its actionable items.)
Caveat lector: your university might require you to consult with them prior to releasing software into the wild. Make sure to check whether you are actually allowed to release your code without anyone checking out the legalities of the situation. This article is mostly written from a machine learning perspective, where people tend to be very generous in releasing their code, but often forget to choose an appropriate licence.
Why are software licences useful?
Software licences serve multiple purposes. Most importantly, they specify what kind of interactions you want people to have with your code. For example, are you okay with your code—and thus potentially your name—being used to advertise a product or service of someone else? Are you okay with people using the software commercially? What happens if your code does not work as expected and produces incorrect results?
A licence helps you cover all of these aspects, and many more. But the function of a licence is not only to protect you from bad things; it also defines what kind of community you want to build around your code. For example, how do you handle improvements of your code? How do you want to be attributed? How do you want others to be attributed? Can anyone contribute to your project?
Summary: A software licence clearly defines what people can and cannot do with your code. It helps you, your users, and your potential contributors.
Why is having no licence a bad thing?
You might think that, being the generous scientist that you are, your code should be released with essentially a ‘no strings attached’ policy. That is commendable[2], but the law, for once, is not on your side here: if you release code without a licence attached, the default interpretation in most (if not all) jurisdictions is that you retain all rights. This might not sound too bad, but there are often unintended consequences. For example, choosealicense.com has the following things to say about having no licence (emphasis mine):
When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you.
This is bad! Suppose that you revisit a project that you collaborated on with multiple people. After a few years, you have the urge to make this into a commercial tool. If you have no licence, you technically have to negotiate with all of your original contributors, or even your previous employers (remember that you might change university affiliations a few times). If you fail to do this, you make your life harder and, in the worst case, you may be subject to litigation.
However, you are not only impeding your future possibilities if you do not specify a licence, you are also limiting the options of others: code without a licence can be removed on a whim, and technically, everyone who builds on that code has to ask for permission and is not directly allowed to redistribute it. Again, as a worst-case scenario, you might have to retract code because you never had the right to use it in the first place! Plus, companies tend to avoid code without a licence like the plague because they have legal departments that know about the dire consequences of using code without the proper legal framework[3].
Summary: If you do not use a licence, you are only hurting yourself (or your future self) and the prospects of your project.
How do I license my code, then?
After all these warnings about unintended consequences, it is time for some good news: in this age, adding a licence to your code is as easy as it gets. Here are the main steps:
- First, you have to pick an appropriate licence—no worries, we will talk about this in a second!
- You now create a file called LICENSE[4] in your main code repository, copy the text of your chosen licence into it, and add your name to it.
- Optionally, you now add a text of the form ‘See LICENSE for details on how to use this code’ to your source code files. This is just a courtesy for your users; your selected licence still applies to all source files, but sometimes, adding such a disclaimer is a good reminder, in particular for large code bases.
- You publish the code. That’s it!
Sounds easy, right? In fact, it is so easy that GitHub even provides licence templates when setting up a repository. It does not even take a few minutes of your time—which is how it should be!
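The steps above can also be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the helper names `write_licence` and `add_notice` are hypothetical, the licence template is assumed to carry the placeholders YEAR and AUTHOR in its copyright line, and only Python source files are considered.

```python
from pathlib import Path

# Courtesy notice for source files (wording taken from the step above).
NOTICE = "# See LICENSE for details on how to use this code.\n"


def write_licence(repo: Path, template: str, year: int, author: str) -> Path:
    """Fill in the copyright line and write the LICENSE file (hypothetical helper)."""
    # Replace only the first occurrence of each placeholder, so that words such
    # as "AUTHORS" in the body of a licence are left untouched.
    text = template.replace("YEAR", str(year), 1).replace("AUTHOR", author, 1)
    path = repo / "LICENSE"
    path.write_text(text, encoding="utf-8")
    return path


def add_notice(repo: Path, pattern: str = "*.py") -> list[Path]:
    """Add the courtesy notice to matching files that do not contain it yet."""
    changed = []
    for source in sorted(repo.rglob(pattern)):
        lines = source.read_text(encoding="utf-8").splitlines(keepends=True)
        if NOTICE in lines:
            continue
        # Keep a shebang line, if present, as the very first line of the file.
        position = 1 if lines and lines[0].startswith("#!") else 0
        if position == 1 and not lines[0].endswith("\n"):
            lines[0] += "\n"
        lines.insert(position, NOTICE)
        source.write_text("".join(lines), encoding="utf-8")
        changed.append(source)
    return changed
```

Running `add_notice` a second time changes nothing, so it is safe to call whenever new files are added. The licence text itself should always be copied verbatim from a trusted source such as choosealicense.com rather than typed by hand.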
Which licence should I choose?
Now that you know how easy it is, which licence should you choose? For this, the website choosealicense.com is a treasure trove. Moreover, the website tldrlegal.com is also great in explaining the specifics of a licence in plain English. Please do not take my word for it and check all the licences I will subsequently mention!
Caveat lector: At this point, I have to stress that what comes next is my personal opinion and it is thus heavily biased towards a certain type of licence. I will try to back up my claims, though!
Avoid copyleft licences
As a first piece of advice, I would urge you to avoid so-called copyleft licences, like the GNU General Public Licence v3. I know that this is a controversial statement for some, but before you close this tab in anger, let me explain: a copyleft licence requires that all derivative works (other projects, updates, and extensions of your code) be released under the same licence as your original code. This sounds fair at first glance. After all, you made your code available for anyone else to be free to use, so others should do the same, right? While I agree with this sentiment, the GPL (and similar licences) makes it harder for private companies to engage with your project. There are subtle legal issues that often lead for-profit businesses to stay away from code like this, even if they would like to use it!
Note that I am not claiming that there are no successful industry contributions to GPL projects! I am merely pointing out that this licence raises the bar quite considerably for other institutions to engage with your code. Especially when you develop tools to be used by others, as is very common in bioinformatics and machine learning, you want to make it easy for others to build on your work. Do not take my word for it, though: if you are not convinced, read this article on licensing software in bioinformatics by C. Titus Brown, as well as this article on the troubles with licensing, written by Lior Pachter.
Summary: Avoid copyleft licences if you can (but they are still a lot better than having no licence at all).
Philosophical sidenote (you may skip this): it is one of my core tenets that I do not want to force people into a certain kind of desirable behaviour—such as sharing their code—but rather I want to convince them. The formulation of copyleft appears to be too stringent for my taste: I can easily imagine situations in which abiding by this licence is problematic for the other party, even if they want to abide by it! Thus, I am not a big proponent of these licences any more. When I originally got involved with Linux, BSD, and the open source movement, I liked the GPL a lot, but as I started using BSD more and more, my stance changed.
Use a permissive licence instead
So, what licence should you choose for your academic code, then? I prefer the so-called permissive licences, which are often also referred to as ‘BSD-style’ licences. Permissive licences have minimal restrictions as to how your software can be used, but they provide protection for you and make it possible for others to attribute your work! Here is my preferred permissive licence, the Revised BSD 3-Clause Licence:
Copyright (c) YEAR AUTHOR, all rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
I like this licence because it permits others[5] to use your software commercially, modify it, distribute it, and even provide a warranty for it, provided they include the copyright notice (in other words, they attribute your work) and the licence itself. Moreover, you cannot be held liable for any damages arising from the use of the code, and others cannot use your name to endorse any derived products (they can of course obtain permission from you). The last part is particularly interesting, as you never know how your software is going to be used.
There are variations of this licence, depending on the number of clauses that are being used. Other permissive licences include the MIT Licence, as well as the Apache Licence 2.0. The latter is slightly more complicated and requires users to be more specific concerning the changes they make to the software, but essentially, all of these licences permit commercial and non-commercial use. Picking any one of them is going to make everything easier for you and users of your project!
Software licences in the wild
The success of these permissive licences can easily be observed in the machine learning and data visualisation communities. Here are some examples of our most beloved projects and their respective licences:
- Keras: MIT Licence
- matplotlib: A BSD-style licence
- numpy: BSD 3-Clause Licence
- scikit-learn: BSD 3-Clause Licence
- seaborn: BSD 3-Clause Licence
- PyTorch: BSD 3-Clause Licence
- TensorFlow: Apache Licence 2.0
I think this establishes that permissive licences and vibrant communities can go hand in hand.
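If you want to check such a list against the packages installed on your own machine, the standard library's `importlib.metadata` module exposes the metadata that each package declares. The following is a minimal sketch: the helper names are hypothetical, and it assumes that a package records its licence in the `License-Expression` or `License` field or in a `License ::` trove classifier, which not every package does.

```python
from importlib import metadata


def licence_from_metadata(fields: list[tuple[str, str]]) -> list[str]:
    """Collect licence information from (key, value) metadata pairs."""
    found = []
    for key, value in fields:
        if key in ("License-Expression", "License") and value.strip():
            # Some packages paste the full licence text here; keep line one.
            found.append(value.strip().splitlines()[0])
        elif key == "Classifier" and value.startswith("License ::"):
            # Example: "License :: OSI Approved :: BSD License".
            found.append(value.split("::")[-1].strip())
    return found


def installed_licence(package: str) -> list[str]:
    """Return the declared licence entries of an installed package."""
    try:
        meta = metadata.metadata(package)
    except metadata.PackageNotFoundError:
        return []  # The package is not installed in this environment.
    return licence_from_metadata(list(meta.items()))
```

For example, `installed_licence("numpy")` returns whatever the installed version declares. Treat the result as a hint only: the licence file shipped with the project remains the authoritative source.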
Conclusion
I hope I provided an interesting glimpse into the wonderful world of licences. Whether you read the whole article or skipped to the end, I want everyone to take away these three statements:
- Your project needs a licence. It helps you and it helps others.
- Any licence (regardless of whether you choose to use a copyleft licence like the GPL or a permissive BSD licence) is infinitely better than no licence at all.
- When in doubt, aim for permissive licences for academic projects. The BSD 3-Clause Licence is a good choice because it requires people to credit you (the favourite currency in academia!), while preventing them from claiming that you endorse their product.
I wish you all the best for your projects, no matter their size and scope. Until next time!
[1] At least in this, law and mathematics are united: most people cannot fathom why one should be fascinated by the subject, even though, when you look more closely, it is really captivating. With apologies to my legally-educated friends, I like the analogy of thinking of a contract as ‘code that you run in your society’. In other words, contracts define how certain interactions between legal persons should work.
[2] And we will cover a licence that approaches this ideal rather well!
[3] Your university also has a legal department with lawyers that are very aware of these consequences. The pity is that almost no one thinks about consulting them! At this point, let me give a shout-out to the excellent Technology Transfer Office of ETH Zürich. Their employees are extremely well-versed in negotiations involving industry and academia, and they saved our bacon quite a few times now with their ability to navigate the murky waters of international copyright law.
[4] Unfortunately, the agreed-upon spelling is the American one here. Having been trained to write the Queen’s English, I often stumble over this.
[5] Remember that ‘others’ also includes future versions of yourself!