From Strings to Things: A Quick Primer on Semantic Search
In May of 2012, Google rolled out its Knowledge Graph, an AI-like semantic search engine that would forever shift the search paradigm by focusing on “things not strings.” These three simple words heralded a profound evolution in search, taking it from a static system that understood search queries as groups of keyword “strings” to a more dynamic, context-based system that could recognize and understand references to actual “things,” i.e., ideas or entities.
The dictionary defines the term entity as, “A thing with a distinct and independent existence.” This reminds me of the famous maxim of 17th-century philosopher and mathematician René Descartes, Cogito Ergo Sum, “I think therefore I am,” or “I think therefore I exist.” A simple yet profound phrase, Cogito Ergo Sum reflects the distinguishing characteristic elevating human evolution beyond mere animal instinct to higher-order self-awareness: the ability to understand that we actually exist. This fundamental notion has paved the way for us to rationalize and contextualize, to mentally classify the various objects and ideas we encounter in our daily existence and assign meaning to them.
A Smarter Search Engine
With Knowledge Graph, Google has sought to create a similar, higher-order Artificial Intelligence (AI), to imbue its search engine with the essentially humanlike ability to think semantically, or to derive contextual meaning from different words and symbols. Google showed off its AI chops in 2012, when the company’s secretive Google X lab was able to “teach” an artificial neural network of 16,000 interconnected processors to recognize “cat” simply by watching 10 million random YouTube thumbnails over the course of three days (not recommended at home). Deep learning algorithms such as the one powering Google’s cat experiment serve as the technological underpinning of Knowledge Graph and other semantic search engines.
This advancement, and others like it, is ushering in a dramatic paradigm shift in Search that is rapidly overturning the accepted conventions of SEO. Thanks to faster processing capabilities and advancements in machine learning, Search is finally moving away from static keyword strings, an approach that has relied on the link graph and keyword mentions to discern the relevance and ranking of content. In its place is a brave new world of semantic understanding and higher-order entity recognition, which enables the search engine to recognize user intent: to understand why a user is asking something. From here it is a short leap to anticipatory search. Once a search engine is able to consistently recognize (and to some extent understand) why you are asking something, it can begin anticipating what you will ask next.
To put it plainly, search engines are beginning to think like humans.
Today’s evolutionary search environment is putting keywords on notice. In a recent post for Search Engine Land, Paul Bruemmer of PB Communications interviewed Barbara Starr, a semantic strategist with some serious semantic search bona fides (she worked on the HPKB project, a DARPA research program to advance the technology of how computers acquire, represent and manipulate knowledge). When Paul asked Barbara why she thought we’ve been using keywords in search for so long, she replied (in part), “…keyword-only queries will ultimately virtually die out. They existed because it was a pain to type the entire query out in full, and there was no effective technology for voice recognition, touch screens, etc. However, at this point, search engines prefer full sentences or meaningful phrases as they give more context and information about user intent in a query…”
In the interview, Starr also pointed out how Moore’s Law fits into the equation: keywords were a useful workaround in an era of limited computing power, but exponentially faster processing speeds and storage capacities are now enabling us to take a more sophisticated approach to Search. For example, a user can now easily execute a voice-activated search query that taps into his or her search history and current geo-location to provide highly contextualized search results.
The Importance of Being an Entity
To be effective, semantic search relies on the culling and organization of huge amounts of data, which is where Google’s Knowledge Graph fits in. Having already mapped over 20 billion facts about the relationship between various objects, Knowledge Graph continues to gather more semantic data in its ongoing quest to think like a human. To accomplish this mighty feat, Google’s AI search engine has necessarily shifted its focus away from keyword strings to entity recognition. As Paul Bruemmer points out in the Search Engine Land article referenced earlier, “to optimize websites for search in the future, SEOs will need to create relevant, machine-recognizable ‘entities’ on webpages that answer well-refined, focused or narrowed queries.”
Imagine these entities as points on a very large and detailed map, crisscrossing with billions of links showing the connections between them all. This is essentially what your brain is doing every day, continually adding semantic information onto its own knowledge graph.
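To make the map metaphor a little more concrete, here is a minimal sketch in Python of entities stored as linked facts rather than as keyword strings. This is only an illustration of the general idea, not how Knowledge Graph is actually built; the facts, relation names and helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical facts, stored as (subject, relation, object) triples.
# Each triple is one "link" on the map between two "things".
TRIPLES = [
    ("René Descartes", "born_in", "France"),
    ("René Descartes", "occupation", "philosopher"),
    ("René Descartes", "known_for", "Cogito ergo sum"),
    ("France", "type", "country"),
]


def build_graph(triples):
    """Index triples by subject so each entity maps to its outgoing links."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def describe(graph, entity):
    """Return what the graph knows about one entity as a dictionary.

    Simplification: if a relation appears twice, only the last value is kept.
    """
    return dict(graph.get(entity, []))


graph = build_graph(TRIPLES)
print(describe(graph, "René Descartes"))
```

A keyword index would only know that the string “Descartes” appears on certain pages. A structure like this one can answer a question about the thing itself, such as where Descartes was born, by following a link.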
Search engines, though, need a little help. Instead of optimizing for keyword strings, webmasters and SEOs can use on-page schema markup (small bits of HTML code, or tags) to help semantic search engines like Knowledge Graph “understand” the things or entities referenced on specific web pages. Using this kind of structured data hierarchy helps search engines understand context, and thus form semantic connections, between keyword strings and real-world things.
By now you can probably see how using a structured data hierarchy such as schema markup can benefit end users, as it helps search engines provide more relevant answers to end-user search queries. It’s also useful for businesses, making it easier for search engines to recognize the “relevance” of a company’s web content and match it to specific queries.
For example, if I wanted Knowledge Graph and other semantic search engines to really recognize this blog as a discrete “entity” unto itself, I could go through the bother of executing a full schema markup for “blog,” which I must say is pretty darn comprehensive. By the end of it, the search engines would have a more three-dimensional understanding of this blog than I have of myself (which is a bit sad, if you think about it).
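For a rough idea of what a tiny slice of that markup could look like, here is a short Python sketch that builds a schema.org “Blog” description and wraps it in a script tag for a page’s HTML. It uses the JSON-LD syntax rather than inline HTML tags, which is simply my choice for the example. The blog name, URL, author and description are placeholders, and the two helper functions are hypothetical names of my own.

```python
import json


def blog_jsonld(name, url, author, description):
    """Build a schema.org Blog description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Blog",
        "name": name,
        "url": url,
        "description": description,
        "author": {"@type": "Person", "name": author},
    }


def to_script_tag(data):
    """Wrap JSON-LD data in the script tag that carries it inside a page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are placeholders, not details of any real blog.
markup = blog_jsonld(
    name="Example Blog",
    url="https://www.example.com/blog",
    author="Jane Doe",
    description="Posts about semantic search.",
)
print(to_script_tag(markup))
```

The full “Blog” type supports many more properties than the four shown here, which is where the “pretty darn comprehensive” part comes in.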
It’s Social, Too?
Just when you thought it was safe to go back onto the web, semantic search is, well, not just in search, but also in social. Graph Search is Facebook’s internal semantic search engine that operates in a similar fashion to Google’s Knowledge Graph. Yep, Zuck and Crew are mining virtually every nook and cranny of the over 1 billion Facebook profiles to serve up contextually relevant search results to you and all of your friends.
Wait a second, though, just how contextually relevant can a search engine be without factoring in human emotions? Don’t worry, Facebook has figured that one out too. Perhaps in answer to the W3C’s Emotion Markup Language recommendation, in the spring of 2013 Facebook rolled out its Share Moods app, giving users access to countless emoticons (love ’em) with which to express their feelings. This stroke of brilliance represents the gift that keeps on giving for Facebook and Graph Search, supplying the latter with a constant feed of structured behavioral data to improve its machine learning, and the former something new to peddle to advertisers and marketers for the right price.
Suffice it to say, this post just scratches the surface of semantic search. For a more comprehensive treatment of entity search, check out Justin Briggs’s treatment here. In addition, the writers for Moz and Search Engine Watch cover this important subject both well and often.
Cogito Ergo Sum
René Descartes is considered the father of modern western philosophy; I suppose he could also be regarded as the grandfather of semantic search. His Cogito Ergo Sum maxim pithily summarizes the key differentiating factor between our current keyword-string approach to search and a rapidly approaching future dominated by semantic search: the ability of search engines to think, and in doing so to recognize things not strings.
I think if Descartes were alive today, he’d be one badass SEO - or SSO (Semantic Search Optimizer) - or whatever we’re calling that person now…