Buy the Model, Build the Meaning

TL;DR

Stop arguing build vs buy. The clean rule for mid-market manufacturing AI: do not pre-train your own models, and own the layer that holds what your data means. The model is a commodity that gets better while you sleep. The meaning is the moat. Call it the Tenant/Building rule: the model is the tenant, the meaning is the building. Change tenants freely; own the building.


Build vs buy is the wrong question. It paralyzes buyers because it treats AI as one decision when it is really two: which model you run, and who owns the meaning underneath it. Even the vendors admit the binary is fake – Scale frames the winner as “buy the build” (Scale AI), and Dataiku says the real answer is “finding the right mix” (Dataiku). True, and useless to a board. So here is the mix, as a rule you can hand someone.

One definition first, because the argument rides on it. The layer that holds what your data means has a plain industry name: the semantic layer. It is the governed set of definitions – what “on-time delivery” actually means, how “scrap rate” is calculated, whether a part in your ERP (the system that runs orders and finance) is the same part in your MES (the system that tracks the floor) – that sits between raw data and your AI (RSM US). Atlan describes it as translating technical fields into agreed business definitions so a question resolves to one logic (Atlan). It is the same pattern Palantir proved at the high end, where it is called an ontology and serves as “a single source of truth for the organization, not just in terms of data, but also in terms of logic” (Palantir). You do not need Palantir’s price tag to build the same idea over your own ERP. That is the build half of the rule: own the semantic layer.
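To make the idea concrete, here is a minimal sketch of what a first slice of a semantic layer could look like when written down as data. Everything in it is hypothetical: the part numbers, the names `ERP_TO_MES_PART`, `MetricDefinition`, and `resolve`, and the scrap-rate formula (scrapped units over units started) are assumptions for illustration, not a standard and not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical identity map: ERP part number -> MES part ID.
ERP_TO_MES_PART: Dict[str, str] = {
    "ERP-10042": "MES-A7",
    "ERP-10043": "MES-A8",
}


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str               # the agreed business wording
    compute: Callable[..., float]  # the single agreed calculation


def _scrap_rate(scrapped_units: int, started_units: int) -> float:
    # Assumed definition: scrapped units divided by units started.
    if started_units == 0:
        return 0.0
    return scrapped_units / started_units


SEMANTIC_LAYER: Dict[str, MetricDefinition] = {
    "scrap_rate": MetricDefinition(
        name="scrap_rate",
        description="Scrapped units divided by units started, per work order.",
        compute=_scrap_rate,
    ),
}


def resolve(metric: str) -> MetricDefinition:
    """Every caller (report, dashboard, AI prompt) gets the same definition."""
    return SEMANTIC_LAYER[metric]


def same_part(erp_part: str, mes_part: str) -> bool:
    """True if the ERP part number maps to the given MES part ID."""
    return ERP_TO_MES_PART.get(erp_part) == mes_part
```

The point of the sketch is the shape, not the tooling: one name resolves to one calculation, and the ERP-to-MES mapping is stated once instead of being re-guessed in every report.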

Why should you never pre-train your own model?

Because you cannot win that race, and you do not need to.

Pre-training a frontier model is a capital-intensive commodity business. The labs spend billions on training runs whose cost you could never recoup on a plant floor, and the result gets cheaper and better every quarter. That is something to rent, not build.

Fine-tuning – continuing an existing model’s training on your own documents so it speaks your vocabulary – is the more tempting trap, because it looks cheap. And to be fair, the running cost is low now: a small open-weight model with a LoRA adapter retrains in hours, for the price of a weekend and a credit card. The cost is not the problem. The problem is that the work does not compound. A fine-tuned model is something you must then keep current as your data drifts, govern so it does not quietly degrade, and re-validate every time the base model underneath it improves. And it will improve – a better base model laps your tuned one next quarter, and you start over. The semantic layer you build instead compounds: you add coverage and keep it current, and that upkeep is documentation work, not a compute bill that resets with every new model. Spend your scarce attention on the asset that gains value, not the one that resets.

There is a real exception. If a proprietary model is your actual product – a defect-detection system that is the reason customers choose you – then a custom model is the differentiator, and the upkeep is just the cost of your core business. Dataiku makes the same call: build when you need control or need to create differentiation a competitor cannot copy (Dataiku). Here is the test: if your AI capability is not why customers pick you, you are not the exception. Most mid-market manufacturers are not.

One more case looks like an exception but is not. If your plant data cannot leave the building – ITAR, supplier NDAs, an OT network that ops will never route to a cloud endpoint – the answer is usually a self-hosted open-weight model (Llama, Mistral, or similar). That is still buying the model. You just run it on your own iron.

Why must you own the semantic layer?

Because the model is interchangeable and your definitions are not.

Those definitions are your company. No vendor has them. No model knows them. If you do not own them, you are renting your own operating reality back from whoever stored it – and you learn the rent at renewal, when the cost of leaving is the cost of redefining your business from scratch.

The model, by contrast, is a tenant. Swap to a cheaper, smarter one next quarter and your meaning stays put (Strategy.com). That cuts both ways, and it is worth saying plainly: lean your whole stack on one frontier provider and you have moved the lock-in from your data to your model vendor. But the two locks are not the same size. Switching models costs you some prompt rework and a few capability gaps when a provider retires a version – annoying, bounded, weeks. Switching when a vendor owns your definitions costs you a rewrite of what every number means – quarters, and risky at every step. Owning the semantic layer is what keeps the expensive lock from ever closing: it lets you change providers, or drop in an open-weight model, without relearning your own numbers. You change tenants. You keep the deed.
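One way to picture the tenant swap is a thin interface between your definitions and whichever model you run. The sketch below is an assumption-laden illustration: `ModelClient`, `HostedModel`, `SelfHostedModel`, and the wording of the on-time-delivery definition are all hypothetical, and the two model classes are stand-ins that call no real provider API.

```python
from typing import Dict, Protocol

# The owned asset: definitions live outside any model or vendor.
# The wording here is an assumed example, not a recommended definition.
DEFINITIONS: Dict[str, str] = {
    "on_time_delivery": "Shipped on or before the customer-confirmed date.",
}


class ModelClient(Protocol):
    """Anything that can turn a prompt into text counts as a tenant."""

    def complete(self, prompt: str) -> str: ...


class HostedModel:
    """Stand-in for a rented frontier model (no real API is called)."""

    def complete(self, prompt: str) -> str:
        return f"[hosted] {prompt}"


class SelfHostedModel:
    """Stand-in for an open-weight model on your own hardware."""

    def complete(self, prompt: str) -> str:
        return f"[self-hosted] {prompt}"


def build_prompt(question: str, metric: str) -> str:
    # The definition is injected from the layer you own, not from the model.
    return f"Definition of {metric}: {DEFINITIONS[metric]}\nQuestion: {question}"


def ask(model: ModelClient, question: str, metric: str) -> str:
    return model.complete(build_prompt(question, metric))
```

Calling `ask(HostedModel(), ...)` and `ask(SelfHostedModel(), ...)` with the same question sends the same definition to both. Swapping the tenant changes one argument; `DEFINITIONS` does not move.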

What does owning it actually cost?

The first usable slice is weeks of focused effort with your operators – not a multi-year IT program, and not a weekend hack. Full coverage grows from there, one process at a time. That is the honest size of it.

The hard part is not technical. It is that the work has no obvious owner, so it becomes nobody’s job. dbt Labs, which sells tooling for exactly this, says there is no single right owner: it depends on your structure and governance (dbt Labs). Skip the work and the cost surfaces later, when an AI confidently reports the wrong margin to your board – which is an audit event, not an IT ticket. And every quarter you wait, a vendor is quietly defining those terms for you, inside their product, where you will pay to get them back.

How do you apply this when a vendor pitches you?

When the next vendor pitches a “complete AI solution,” ask one question: who owns the semantic layer when this contract ends? If the answer is them, you are not buying AI. You are leasing your own business definitions with a lock-in clause.

First move this week: take the three metrics that show up in every board deck, write down exactly how each is defined and calculated, and put the definitions in a file you control and version – one no vendor touches. If two departments fight over what a number means, and they will, that fight is the semantic layer work. The file just forces the conversation. It is the first brick of the building.
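That first file can be very plain. As a rough sketch, assuming you collect each department's current wording for a metric, a few lines of Python can show where the definitions disagree and produce text that diffs cleanly in version control. The sample rows, department names, and the helper names `find_conflicts` and `to_versionable_json` are hypothetical, and the sample is abbreviated to two metrics.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical input: how each department currently defines a metric.
DEPARTMENT_DEFINITIONS: List[Dict[str, str]] = [
    {"metric": "on_time_delivery", "department": "sales",
     "definition": "Shipped on or before the promised date."},
    {"metric": "on_time_delivery", "department": "operations",
     "definition": "Delivered on or before the requested date."},
    {"metric": "scrap_rate", "department": "operations",
     "definition": "Scrapped units divided by units started."},
]


def find_conflicts(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Return metrics that have more than one distinct definition."""
    by_metric = defaultdict(set)
    for row in rows:
        by_metric[row["metric"]].add(row["definition"])
    return {m: sorted(d) for m, d in by_metric.items() if len(d) > 1}


def to_versionable_json(rows: List[Dict[str, str]]) -> str:
    """Serialize with stable ordering so diffs in version control stay small."""
    ordered = sorted(rows, key=lambda r: (r["metric"], r["department"]))
    return json.dumps(ordered, indent=2, sort_keys=True)
```

Here `find_conflicts(DEPARTMENT_DEFINITIONS)` flags only `on_time_delivery`, which is exactly the conversation the file is meant to force. The output of `to_versionable_json` can be written to a file in a repository you control.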

Buy the model. It will be obsolete and cheaper next year, and that is good. Own the meaning. That is the asset that compounds.

Sources

  • Palantir, Models in the Ontology – https://www.palantir.com/docs/foundry/ontology/models
  • Scale AI, Build vs Buy – https://scale.com/guides/build-vs-buy
  • Dataiku, Build vs Buy for AI Agents – https://www.dataiku.com/stories/blog/build-vs-buy-for-ai-agents
  • Atlan, Semantic Layer – https://atlan.com/know/semantic-layer/
  • RSM US, Importance of the Semantic Layer for Trusted AI – https://rsmus.com/insights/services/digital-transformation/importance-of-semantic-layer-for-trusted-ai-solutions.html
  • Strategy.com, Vendor-Agnostic Semantic Layer – https://www.strategy.com/software/blog/how-to-choose-a-vendor-agnostic-semantic-layer-for-enterprise-data-stacks-in-2026
  • dbt Labs, Who Should Own the Semantic Layer? – https://www.getdbt.com/blog/semantic-layer-ownership