RESEARCH NOTE: Open Source Will Be Different in the Age of AI

By Jason Andersen - July 3, 2024

Hugging Face is one example of a community-based AI initiative that retains the flavor of traditional open source projects while accommodating the market realities of the age of AI.

There seems to be growing discontent in the open source orthodoxy. We’ve seen some dramatic shifts such as HashiCorp’s licensing change, a migration away from CentOS 7 due to its forced end of life, and even lingering questions about OpenAI’s commitment to openness. Events like these are prompting a lot of dialogue about where open source is headed, and in particular what open source could mean in the emerging AI market.

I think the answer is that open source is not going anywhere, but that market economics and the nature of AI technology will force open source to evolve. And while this could aggravate open source purists, taken in the aggregate this evolution is a very good thing for the technology industry.

Market Economics Is Now the Biggest Driver of Open Source Decisions

It is hard to overestimate the impact of capital on any market, and open source is no exception. Yet open source software has steadily become more difficult to monetize. In pre-cloud times, if 1% of your downloads were turning into revenue, you were doing great. When cloud providers started to host and support their own variants of open source projects, however, that yield rate went down because the cloud providers were not typically sharing download information with the owners of open source projects. Frequent forking of projects exacerbated the problem.

That download data was the lifeblood of open source companies because a download required a user to provide contact information and some level of context for the download. That quid pro quo helped sellers and marketers nurture the end user’s journey and build trust with the company, ideally leading eventually to some kind of purchase of a subscription or service. Sadly, the spirit of cooperation that had long animated open source projects gave way to the realities of market-based competition. You still had to share code in open source, but not user information.

As revenue became harder to generate for independent open source projects and their associated commercial entities, investor profiles changed. In the early days of a market there are many unknowns; in that setting, venture capital investors are often willing to take risks aimed at disrupting the status quo. However, as the market cools and companies either go public or transition to private equity ownership, the investor mandate shifts. The quest for disruptive innovation morphs into an insistence on operational excellence. This shift inevitably dictates more streamlined and efficient go-to-market plays. In the context of open source, better sales yield is a big reason why companies started to move to a Business Source License (BSL) approach. BSL prohibits hosting the code or using it in production without vendor approval (for example, in the form of direct monetization or some sort of partnership). So, while the source code is still publicly available, how it may be used is more strictly controlled.

A shift to a different type of license does not mean open source is ending. It just means that companies are becoming more pragmatic in their strategy relative to their aspirations, especially in the case of AI. For example, only companies with deep pockets can afford the go-to-market and R&D costs associated with building a general-purpose large language model like Llama. Smaller firms, by contrast, must facilitate a more direct path to monetization to maximize the next investment round. Banking on viral adoption and a slower, higher-cost selling model translates directly to lower R&D investment from more risk-averse investors.

That said, the smartest small AI firms are using permissive open source licensing for their APIs and SDKs, lowering the bar for cross-company integration. The logic is simple and well established: Making it easier to cooperate with others helps drive more pipeline for your core product. I predict that we will see more of these hybridized licensing strategies moving forward to advance open source AI collaboration.

Open Source Must Adapt to Vertical Industry Knowledge

The most successful open source projects have been horizontal infrastructure solutions that spread easily across many industries and contexts. The big winners among these projects had two key ingredients. The first was a product category that could be disrupted. For example, Linux was initially a pricing disruption aimed at the high-end Unix market, and Kubernetes was a technology disruption that addressed scaling inefficiencies with virtualization management. The second ingredient was skilled contributors. The major contributors to successful open source projects were typically people who intimately understood the challenges that IT practitioners faced, and they had the skills to create solutions for those IT users. They could walk the walk, so to speak.

The challenge is that the biggest opportunities in AI require deep industry domain knowledge. Even a chatbot will need a good deal of business and industry context to help a customer. Vertically aligned companies understand that their domain knowledge is their greatest asset. That provides a huge incentive for them to create their own AI solutions internally, but much less incentive to share those solutions in an open source way. So, while foundational technologies such as general-purpose LLMs and basic AI training principles work well in a horizontal sense, anything after that point will require different terms and conditions that respect intellectual know-how and trade secrets. Otherwise there will not be enough significant contributions to produce a disruptive solution (either via pricing or technology disruption).

What about retrieval-augmented generation? Although RAG helps protect proprietary data, it may not be enough because of performance problems or poor internal data curation (for example, multiple overlapping company policies). I believe that new open source approaches and communities such as Hugging Face are taking good steps to address these issues.

We Need to See a Return to Open Source Cooperation between Vendors and Practitioners

By this point, AI is clearly a megatrend, and technology companies are anxious to get on board and show leadership. But the way that AI is currently being marketed is antithetical to previous cloud and open source methods. Cloud and open source were all about practical approaches to solving problems and smartly scaling them over time. By contrast, large language models are being presented as huge efforts with racks full of dedicated gear and teams of people training and tuning models and results.

These big-bang market plays bring along a series of cultural, organizational, and financial challenges. AI is a very human-centric technology; any organization embracing it also has to deal with ethics and in some cases regulatory requirements. It’s worth thinking about the contrast with the early days of open source commercialization. One thing open source did well was provide support and education within the development community. Widespread comfort with open source technology was a direct result of practitioners rolling up their sleeves and proving that the technology works.

At this stage, unfortunately, AI vendors are more focused on the marketing wins than the proof points—and it shows in their messaging. We are all seeing a lot of products without a lot of defensible value points or selection criteria. (To be fair, the vendors are not completely to blame here. Companies buying AI are aware that by saying too much now, or having a public failure down the road, they are courting horrible PR.)

I believe that if we let the technical people do their jobs using proven community sharing and contribution methods (albeit maybe under some new rules), we will all benefit. We just need some more time to evolve and socialize those standards of practice.

To close, I believe there is a huge opportunity for open source approaches and open source communities to thrive in the next wave of technologies. However, as self-regulating bodies, they also need to understand that the market is in a different place with different demands today. My advice is to seriously consider how to be more inclusive and sensitive to the business drivers at the developer community level. This is a case where running the old playbook will not guarantee victory.

Jason Andersen

Jason Andersen is vice president and principal analyst covering application development platforms, technologies, and services. Jason brings over 25 years of experience in product management, product marketing, corporate strategy, sales, and business development at Red Hat, IBM, and Stratus to his work for MI&S and its advisory clients. Working both in the field and in the headquarters of some of the most innovative technology companies, Jason has a wealth of experience in building great products and driving their adoption across a broad spectrum of industries and use cases.