How I got the unfair reputation of disliking semantics

When people today talk about semantic search, they usually mean vector embeddings, RAG pipelines, and all that fancy LLM jazz. But back in 2011, at Globo.com, we were into a very different kind of “semantics”: the Semantic Web.

And believe it or not, despite my supposed reputation as a semantics hater, my first encounter with SPARQL was love at first sight.

Falling in Love with SPARQL

Before I even joined TechTudo, I had already been introduced to SPARQL at Globo.com. At the time, many of Globo’s products were experimenting with organizing their entities in ontologies stored in Virtuoso.

For the uninitiated, SPARQL is like SQL’s eccentric cousin who reads philosophy books and talks about relationships. The data it queries (RDF) isn’t rows and columns. It’s triples: subject, predicate, object. And you can chain these triples into hierarchies that go up, down, and sideways. It’s like the family tree of knowledge.
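A tiny sketch makes the triple idea concrete. This is plain Python, not a real triple store, and the entity names and the `broader` predicate are made up for illustration. It keeps triples as tuples and walks the hierarchy upward, which is roughly what a SPARQL 1.1 property path such as `?entity skos:broader+ ?ancestor` expresses in a single line.

```python
# Hypothetical triples as (subject, predicate, object) tuples.
# "broader" is an invented parent link, similar in spirit to skos:broader.
TRIPLES = {
    ("multifunction_printers", "broader", "printers"),
    ("multifunction_printers", "broader", "scanners"),
    ("printers", "broader", "hardware"),
    ("scanners", "broader", "hardware"),
    ("android", "broader", "operating_systems"),
    ("article_42", "mentions", "multifunction_printers"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def ancestors(entity, triples=TRIPLES):
    """Follow "broader" links one or more hops up the hierarchy."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for parent in objects(current, "broader", triples):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Chain two lookups: which entity does the article mention, and what sits above it?
for entity in objects("article_42", "mentions"):
    print(entity, sorted(ancestors(entity)))
```

Note that a single entity can point to two parents, which is exactly the kind of shape the printer debates later in this story were about.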

When I learned it, I thought: “Wow, this is beautiful. This is the future. I’m going to tell my grandchildren about this.”

But as in many relationships, the problems weren’t with SPARQL itself…

The Big Semantic Web Dream

The whole Virtuoso/SPARQL setup wasn’t there because TechTudo really needed it. No. It was part of Globo.com’s grand plan: connect all products (G1, GE, GShow, GloboVideos) into one giant semantic brain. Content would flow magically between them, like neurons firing in the collective consciousness of Globo.

In practice?

Well… politics happened. Business didn’t really want to share traffic. Everyone wanted to receive visitors, but no one wanted to give them away. So, in the end, the “semantic web of everything” was centralized, but its promised cross-domain benefit (sharing traffic between products) was never really put to use.

Meanwhile, we, humble TechTudo folks, were left dealing with the side effects of this dream.

The Return of the Big Oracle

Here’s the irony: just before Virtuoso, Globo.com had invested a ton of effort breaking down its big, scary Oracle monolith into smaller databases. This gave teams autonomy, agility, and sanity.

Then came Virtuoso. To be fair, the folks working on it, from the ontology design to the deployments, were incredibly talented and did an amazing job within the constraints. But no matter how good the people were, the centralized strategy itself was the problem: schema changes required massive coordination and overnight deployments, and they left us with less autonomy.

I still have an email from that era where the phrase “the return of the big Oracle” was coined. That was exactly what it felt like. Déjà vu, but with more RDF.

And since TechTudo was considered “the little sibling” compared to G1, GE, or GShow, guess whose schema requests always ended up at the bottom of the priority list? Yep. Ours.

Endless Debates About Printers and Android

Now, you might ask: “But did you really need to change the schema that often?” Oh yes.

At TechTudo we worked side by side with editors, and they constantly saw opportunities to improve navigation and SEO by tweaking hierarchies.

Examples:

Printers and Scanners were categories. But what about multifunction printers? Should they have their own specific category, or should entities like these belong to more than one category at the same time? Existential crisis in hardware form.
Android and iOS were operating systems. But many readers also thought Android was a type of phone. Technically wrong, but SEO doesn’t care about your feelings.

We had dozens of situations like this.

The problem? Every change required weeks of discussion because opportunities to actually implement changes were rare. It was like playing chess, but if you touched a piece, you had to wait two weeks to check the result.

The MySQL Rebellion

At some point, we thought: “Enough philosophy. Let’s get practical.”

So we prototyped a “diet version” of semantics using MySQL:

Hierarchies modeled with n-to-n (many-to-many) tables. Entities and content synced asynchronously into separate tables listing all of their ancestors (basically, pre-computed tags).

Did we lose SPARQL’s expressive power? Absolutely. Did we care? Nope. Because in practice, all we really needed was:

Give me all content for this entity
Give me the related entities of this entity
Give me some content from the related entities

With MySQL, we could do this quickly, and without begging another team for permission.
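Here is a minimal sketch of that “diet version”, using Python’s built-in `sqlite3` as a stand-in for MySQL so it runs anywhere. The table names, column names, sample entities, and the definition of “related” (entities sharing a parent) are hypothetical, not our actual schema. The sync step copies each piece of content’s entity plus all of its ancestors into a tag table, so the three queries above become plain lookups with no recursion at read time.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
-- n-to-n hierarchy: a child may have several parents
CREATE TABLE entity_parent (
    child_id  TEXT NOT NULL,
    parent_id TEXT NOT NULL,
    PRIMARY KEY (child_id, parent_id)
);
-- pre-computed tags: each content row tagged with its entity AND every ancestor
CREATE TABLE content_tag (
    content_id TEXT NOT NULL,
    entity_id  TEXT NOT NULL,
    PRIMARY KEY (content_id, entity_id)
);
"""

def ancestors(conn, entity_id):
    """Walk entity_parent upward, collecting every ancestor (multiple parents allowed)."""
    found, frontier = set(), [entity_id]
    while frontier:
        current = frontier.pop()
        rows = conn.execute(
            "SELECT parent_id FROM entity_parent WHERE child_id = ?", (current,)
        ).fetchall()
        for (parent,) in rows:
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

def tag_content(conn, content_id, entity_id):
    """Sync step (asynchronous in the real setup): store entity + ancestors as tags."""
    tags = {entity_id} | ancestors(conn, entity_id)
    conn.executemany(
        "INSERT OR IGNORE INTO content_tag (content_id, entity_id) VALUES (?, ?)",
        [(content_id, tag) for tag in sorted(tags)],
    )

def content_for(conn, entity_id):
    """All content for this entity, including content tagged on its descendants."""
    rows = conn.execute(
        "SELECT content_id FROM content_tag WHERE entity_id = ? ORDER BY content_id",
        (entity_id,),
    ).fetchall()
    return [r[0] for r in rows]

def related_entities(conn, entity_id):
    """Hypothetical definition of "related": entities sharing at least one parent."""
    rows = conn.execute(
        """SELECT DISTINCT sib.child_id
           FROM entity_parent AS me
           JOIN entity_parent AS sib ON sib.parent_id = me.parent_id
           WHERE me.child_id = ? AND sib.child_id <> me.child_id
           ORDER BY sib.child_id""",
        (entity_id,),
    ).fetchall()
    return [r[0] for r in rows]

def content_from_related(conn, entity_id, limit=5):
    """Some content tagged with any of the related entities."""
    related = related_entities(conn, entity_id)
    if not related:
        return []
    marks = ",".join("?" * len(related))
    rows = conn.execute(
        f"SELECT DISTINCT content_id FROM content_tag"
        f" WHERE entity_id IN ({marks}) ORDER BY content_id LIMIT ?",
        (*related, limit),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO entity_parent (child_id, parent_id) VALUES (?, ?)",
    [
        ("printers", "hardware"),
        ("scanners", "hardware"),
        ("multifunction_printers", "printers"),  # one entity, two parents
        ("multifunction_printers", "scanners"),
        ("laser_printers", "printers"),
        ("flatbed_scanners", "scanners"),
    ],
)
tag_content(conn, "review-mfp", "multifunction_printers")
tag_content(conn, "review-laser", "laser_printers")
tag_content(conn, "review-flatbed", "flatbed_scanners")

print(content_for(conn, "printers"))
print(related_entities(conn, "multifunction_printers"))
print(content_from_related(conn, "multifunction_printers"))
```

The multifunction printer simply gets two parents, so its content shows up under both Printers and Scanners. In this sketch, trying a different hierarchy means editing `entity_parent`, deleting the affected content’s tags, and re-running the tagging step; undoing it is the same operation in reverse. `INSERT OR IGNORE` is SQLite’s spelling; MySQL’s equivalent is `INSERT IGNORE`.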

And the best part: editors could experiment. Try a hierarchy change, see the results, undo if necessary. No more weeks of debates about printers’ identities. Just: test, learn, move on.

The vibe changed instantly. People got bolder, faster, happier. Autonomy is a hell of a drug.

The Evolution

Eventually, the Semantic Web team evolved their platform and moved part of it to Elasticsearch. With multi-value keyword fields, they could implement some of what we had hacked together in MySQL, and do it better.
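For comparison, here is a rough sketch of the same idea in Elasticsearch terms, written as Python dicts for the request bodies (actually sending them, e.g. to the `_search` endpoint, is left out). The field names, index layout, and documents are assumptions, not the Semantic Web team’s actual mapping. The point is that a keyword field can hold several values, and a `term` filter matches when any of them equals the term, so the ancestor list lives inside each document instead of a separate tag table.

```python
import json

# Hypothetical index mapping. Elasticsearch has no dedicated array type:
# a keyword field can simply hold a list of keywords.
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "entities": {"type": "keyword"},
        }
    }
}


def content_document(title, entity, ancestors):
    """Denormalized doc: the entity plus its ancestors, like the MySQL tag rows."""
    return {"title": title, "entities": sorted({entity, *ancestors})}


def content_for_entity(entity, size=10):
    """A term filter matches when ANY value of the multi-value field equals the term."""
    return {"size": size, "query": {"bool": {"filter": [{"term": {"entities": entity}}]}}}


def content_for_any(entities, size=5):
    """A terms filter matches documents tagged with at least one of the given entities."""
    return {"size": size, "query": {"bool": {"filter": [{"terms": {"entities": list(entities)}}]}}}


doc = content_document("MFP review", "multifunction_printers", ["printers", "scanners", "hardware"])
print(json.dumps(doc))
print(json.dumps(content_for_entity("printers")))
print(json.dumps(content_for_any(["laser_printers", "flatbed_scanners"])))
```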

Editing the schema was still a no-go, but for Globo’s major products, that wasn’t really a big issue. They didn’t need constant tweaks to their hierarchies the way TechTudo did. And hey, at least queries got faster. Baby steps.

Rumors of My “Hatred” for Semantics

Of course, the rumor mill started. “He doesn’t like semantics,” they said.


Totally unfair!

I loved SPARQL so much I even built a Ruby prototype (ActiveRecord-style) for it, and later a full proprietary Python library. I was a fanboy!

What I didn’t like was the centralized bottleneck: it was an autonomy problem.

SPARQL will always have a special place in my heart. But at TechTudo, the real win wasn’t about elegant queries. It was about giving editors and the team the courage to try, fail, and learn fast.

And yes… we did finally figure out what to do with printers.

tiago.rio.br