SEO, The Science™ begins with understanding the basics of how search engines work.
One of the difficulties many people have in understanding how search engines work comes from subtle differences between the definitions or descriptions used by those in the search industry and what most people understand them to mean.
For instance, Google's results (what you see when you search) are considered its 'index', but not its 'database' (for lack of a better term). So when you place a "noindex" tag on a web page, the page is 'removed from the index', which translates to 'not returned in the results'.
What the preceding does not state is, 'The web page is not used in any way by Google.' In fact, Google employees have specifically stated the exact opposite: pages containing a "noindex" tag can, and do, pass PageRank, which indicates Google not only stores a copy of the page but also uses the information on the page for scoring purposes.
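To make the "noindex" tag concrete, the following is a minimal sketch that checks whether a page's source contains a robots meta tag with the "noindex" directive. The class name, function name and sample page are hypothetical and only serve the illustration; this is not how any search engine actually processes the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute is a comma separated list of directives.
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the page asks to be kept out of the results."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample page: kept out of the results, links still followed.
page = ('<html><head><meta name="robots" content="noindex, follow">'
        '</head><body></body></html>')
print(has_noindex(page))  # True
```

Note that the check only says the page asks not to be returned in the results. As described above, it says nothing about whether the page is stored or used for scoring.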
Both Yahoo! and MSN's 'Live Search' actually return references to the page in their indexes (results), but they do not return the actual page with the title and description; they simply show a URL.
This slight difference between the search industry's use of the word 'index' and the general population's understanding of it (most people would probably consider the 'index' to be all the information search engines have) has caused some confusion among webmasters and SEOs.
For this reason, the first portion of 'SEO, The Science' clarifies terminology. Part two then helps with understanding the basics of 'How Search Engines Work', and part three highlights some 'On-Page & On-Site' SEO techniques.
This web page is an original work.
© 2008 RankingLabs.Com. All Rights Reserved.
SEO, The Science TM 2008 RankingLabs.Com
Please do not duplicate or reproduce this web page without the express written permission of RankingLabs.Com.
Weblinks to other SEO Resources (including 'What's an SEO' and 'SEO Myths') at the conclusion of this web page.
The web is made up of websites. Websites are made up of documents. Documents are made up of pages. From a search engine perspective, the word 'website' can have multiple definitions depending on the number and type of documents a website contains. For 'visual' purposes, please review the following terms and definitions. The terms used may not exactly match the wording search engines actually use, but they should be a valid 'visual' representation of how websites may be viewed.
If you think of the distinctions of the word 'document' in terms of print it is easier to visualize:
* May also be referred to as an 'article', but 'brochure' provides a fitting 'visual' representation. Another 'visual' representation in print terms: a website could be a 'newspaper' (collection of documents, multiple topics), a 'newspaper section' (one general 'theme' or 'topic', multiple closely related topics), or a 'newspaper article' (single topic).
Any of the three definitions of 'document' in print terms could easily be what is referred to as a 'website', and unless there is a distinction, understanding the way search engines work is much more difficult.
In search engine processing a website would look something like the following.
Things to keep in mind:
The easiest way to explain how search engines function is: "Words are Math" | "Be Careful What You Say"
The determination of a 'topic' for a given document (web page(s), website) must be calculated mathematically. The only way to do so, beyond a keyword based 'direct word matching system' (boolean), is to treat words as variables, and associate them to each other, singularly and in groups.
So, what reads like language to us looks like a series of variables to a search engine's algorithm. Mathematical equations do not 'comprehend', but algorithms have begun to associate phrases (one or more words) based on their use and proximity (context) to other phrases within, or referenced by, a document (web page(s), website).
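One simple way to picture words being treated as variables is to count how often two words appear in the same document. The sketch below is a toy illustration under that assumption; the four sample 'documents' are made up, and real search engines use far more sophisticated models than raw co-occurrence counts.

```python
from collections import Counter
from itertools import combinations

# Made-up sample documents, one short string each.
documents = [
    "jet engine gas turbine aircraft",
    "car engine internal combustion",
    "search engine information retrieval",
    "gas turbine power generation engine",
]


def cooccurrence_counts(docs):
    """Count how many documents each pair of words shares."""
    counts = Counter()
    for text in docs:
        # Sort the unique words so each pair has one canonical order.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(documents)
print(counts[("engine", "turbine")])  # 2
print(counts[("engine", "search")])   # 1
```

In this tiny sample 'engine' appears alongside 'turbine' twice and alongside 'search' once, so the numbers alone, with no 'comprehension', start to separate the different kinds of engine.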
The one word phrase 'engine', for example, can have multiple meanings. An 'engine' can be understood to be used in an automobile (internal combustion), in aircraft ('jet', gas turbine), in power generation (gas turbine), for information retrieval (search), and even for calculations (Babbage's difference engine).
The only way to make an accurate determination of which type of engine is being referenced and return relevant documents (web pages) in a result set, without matching 'word-for-word' (boolean), is to associate phrases (contextual), which is what Google's newest algorithm does. Google even uses 'phrase predictability' to assist in determining results and eliminating spam. Basically, what search engines, starting with Google, have done, and are doing, is moving search to another level.
Search used to be 'word-for-word' (boolean) based, meaning if you typed 'Website SEO' into a search box, you would be returned references to documents (web pages) containing the terms 'Website', 'SEO' and 'Website SEO'; Main topic 'Website SEO'; emphasis on 'Website', the first word; secondary value of 'SEO'.
The new type of algorithm makes phrase based (contextual) associations possible. It can return documents (web pages) containing closely related phrases even when they do not contain the 'searched-on-phrase' at all, or when the searched-on terms make up such a small percentage of the text or inbound links that they could go unnoticed by visitors and the document (web page) would not be selected as one of the best results by a word matching (boolean) algorithm.
An example of the new contextual results would be searching for 'website rankings' and receiving results for documents (web pages) with the phrases 'SEO', 'search engine optimization', 'Webmaster World', 'Google Webmaster Central', 'Yahoo Search Blog' and so on, rather than boolean results for the words 'website rankings', 'improve website rankings', 'higher website rankings', 'better website rankings', 'how to improve website rankings'.
Another example, again using the word 'engine', is when a more specific search, 'car engine', is conducted. The new search results might be for 'Chevrolet V8', 'Ford 5.0 Liter', 'Dodge Hemi', 'Honda Hybrid', and so on, rather than returning results for 'carengine.com', 'car engines', 'car engine manual', 'car engine repair'.
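The difference between the two approaches in the 'car engine' example can be sketched in a few lines. The association table, page URLs and page text below are all hypothetical and written by hand; a real engine would derive such associations statistically rather than from a fixed list.

```python
# Hypothetical, hand-written association table (an assumption for this sketch).
ASSOCIATED_PHRASES = {
    "car engine": ["chevrolet v8", "ford 5.0 liter", "dodge hemi", "honda hybrid"],
}

# Hypothetical pages: URL path -> page text.
PAGES = {
    "/engines/repair": "Car engine repair manual and car engine parts.",
    "/chevy/v8": "The Chevrolet V8 small block: history and specifications.",
    "/cooking/pasta": "How to cook pasta in ten minutes.",
}


def boolean_match(query, pages):
    """Word-for-word: return pages whose text contains the exact query."""
    q = query.lower()
    return [url for url, text in pages.items() if q in text.lower()]


def contextual_match(query, pages, associations):
    """Phrase based: also accept pages containing an associated phrase."""
    phrases = [query.lower()] + associations.get(query.lower(), [])
    return [url for url, text in pages.items()
            if any(phrase in text.lower() for phrase in phrases)]


print(boolean_match("car engine", PAGES))
# ['/engines/repair']
print(contextual_match("car engine", PAGES, ASSOCIATED_PHRASES))
# ['/engines/repair', '/chevy/v8']
```

The 'Chevrolet V8' page never uses the words 'car engine', so the word matching function misses it, while the phrase based function returns it.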
The old version of SEO for the group of words 'search engine optimization', in addition to other factors (such as links), was to use 'the right combination' of the words 'search engine optimization' together as a group and 'the right combination' of variations of the words within the group ('search engine', 'search optimization', 'search', 'engine', 'optimization').
With the shift to phrase based technology (contextual search), ranking for a 'searched on phrase' requires associated phrases to be included, or referenced, both within the document (the collection of one or more web pages associated with the 'searched on phrase', or the website) and on the specific web page of the document you wish to rank.
The more search engines advance in relating associated phrases, the more difficult they will become to spam, because as phrase based technology is refined documents (web pages) containing truly unique, useful content will begin to float to the top, and those who still try to SEO documents (web pages) by repeating words and variations of words will find much less success.
Currently 'on-page text SEO' is shifting from the ability to reverse engineer the top ten websites for a set of words, get close to the 'magic formula' of repetitions and variations, then overpower them with links, to having the ability to create unique content which naturally includes related phrases and helps dictate visitor behavior by keeping them engaged in a website rather than clicking 'back' and finding another source of information.
Words are Math
ILLUSTRATION
Boolean (Word-to-Word): "Engine" = "Engine"
Possible Results:
Contextual (Phrase Assoc.): "Engine" = "Gas Turbine" "Internal Combustion" "Chevrolet V8" "Electric Hybrid" *
Possible Results:
* Contextual Searching is still new, and only limited results are showing as outlined above, but it is the direction search results are heading.
Understanding 'Words are Math': 'Be Careful What You Say'.
Much like everyday life, with search engines, the words you use define what you are trying to communicate. Unlike people, however, search engines are algorithms: they do not 'get the joke', they do not understand sarcasm, and they cannot infer from your personality or past experience. All they have to go by are the actual words you use within your document (web page(s), website).
In everyday life, you can say things you do not 'really mean', because based on tone, inflection, past experience, body language, and other 'outside factors', people can determine what you are trying to communicate (if they know you well enough). In everyday life, you can talk about things which are not 'who you are' or 'what you are', but with search engines, which only have words to work with, every word used 'counts' to define your document (web page(s), website) in some manner.
The words you use within a document (web page(s), website) determine 'topics', 'phrases', 'focus', 'related information', 'relevance' and more. For SEO, you should 'be yourself' to remain unique in your writing style, but when drafting a document (web page, collection of web pages, website) designed to rank as an 'authority' on a subject you should remain on topic, withhold the sarcasm and other text requiring 'external factors' to be correctly interpreted, and be concise, yet informative. Every word you place on a web page is used by search engines in an effort to make a determination of what the complete document (website, collection of web pages) and a specific web page are about, so, especially when it comes to SEO, 'Be Careful What You Say'.
Understanding how search engines are beginning to associate and relate phrases is only part of SEO. Another part of 'On-Page' or 'On-Site' SEO is being able to organize information and communicate with search engines, to assist them in generating an accurate determination of the topic(s) contained on a single page (URL), within a document (group of one or more URLs), and ultimately within a complete website (root and all associated URLs). The communication process starts with organization, and both begin at the top of the page, with the URL.
For simplicity, 'document' was previously limited (basically) to be a web page or website. In this section, on how to organize and communicate information to search engines, the word document should be considered the entirety of information on a given subject, and could be the entire website, but usually is not. Even small websites, such as RankingLabs.Com, often have multiple documents, and should be organized correctly.
Often when referring to web pages, people think of the URL (or URI) and the information presented when a browser requests a specific URL as the answer.
For SEO purposes, the two need to have a distinction. The URL is one piece of the information stored by search engines. The information it accesses is another. The two need to be thought of as separate and distinct pieces of information, which should 'work together' for proper website structuring and indexing. In the following text, the URL is the URL, and the web page is the information the URL accesses or presents when requested by a browser.
It is important to understand: A URL is what search engines store to reference a given piece of information. A URL is also what they associate inbound links to, what they associate outbound links from, and what they use in the results so visitors can find your website. URLs are important.
Many have probably read, or heard, at some point in time 'cool URLs don't change', but, from an SEO point of view, 'uncool URLs need to'. When structuring a website understand: search engines are well organized storage and retrieval systems, so a correct, logical URL structure can help identify not only what a web page is about, but which web pages should be considered part of a particular document, how a document(s) and web pages relate to each other, and how many unique documents are contained within a website.
When determining URLs it is easier if you visualize a bit. If complete URLs are the 'doors' to your website, then the words and information they contain are the 'keys'. If the key to the door is messy, difficult to understand, or contains unnecessary information it makes opening the door and evaluating the quality of the information presented more difficult for visitors and search engines alike.
Determining URLs is fairly easy if you simply ask yourself where you would store the information on your computer. Obviously, not everything gets stored in one folder, but a single page probably does not get stored 12 folders deep either, and it is very likely nothing would ever get stored in a folder, or saved as a file called prod_num=1-45N3X672YZ (it would probably be given a name).
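The 'where would I file this on my computer' test can be turned into a rough checklist. The sketch below flags URLs that are filed very deep or that look like parameters rather than names. The depth threshold of 3 and the example.com URLs are assumptions chosen for illustration, not rules stated by any search engine.

```python
from urllib.parse import urlsplit

MAX_DEPTH = 3  # assumed threshold for "too many folders deep"


def url_warnings(url):
    """Return a list of reasons a URL may be hard to read or file."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    warnings = []
    if len(segments) > MAX_DEPTH:
        warnings.append("deep path")
    if parts.query:
        warnings.append("query string")
    if any("=" in s for s in segments):
        warnings.append("parameter-style segment")
    return warnings


print(url_warnings("http://example.com/SEO/Science"))
# []
print(url_warnings("http://example.com/a/b/c/d/e?prod_num=1-45N3X672YZ"))
# ['deep path', 'query string']
```

A URL that produces no warnings is not automatically a good one; the check only catches the two problems described above.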
Using the RankingLabs.Com basic directory structure as an example: the 'SEO, The Science', web page is located at: RankingLabs.Com/SEO/Science, 'SEO, The Art' is located at: RankingLabs.Com/SEO/Art, and the Mod_Rewrite page is located at: RankingLabs.Com/Mod_Rewrite. The reason for the distinction in RankingLabs.Com's URLs is there are (basically) two different documents consisting of (basically) three 'internal pages'.
The 'internal pages' of the SEO document are filed in the SEO directory. Even though Mod_Rewrite can play a role in SEO, the Mod_Rewrite web page is not about SEO; it is about Mod_Rewrite, making it a unique one page document, which should be filed by itself. Likewise, this web page makes mention of Mod_Rewrite, but SEO is the main topic, so it does not belong in a Mod_Rewrite 'folder' or 'directory'.
If all three pages were distinct documents (topics), each should have its own directory (folder), or 'base' URL (page on the 'root' domain). If all three pages covered the topic of SEO directly, they should all be filed under SEO as three parts of a single document. The topic of each page dictates where it should be filed.
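The filing logic above can be sketched by grouping URLs on their first directory, so each group stands for one 'document'. The three URLs are the ones given in this section; the "http://" prefix and the function name are assumptions added so the example runs.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_directory(urls):
    """Group URLs by their first path segment (their 'folder')."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        key = segments[0] if segments else "(root)"
        groups[key].append(url)
    return dict(groups)


urls = [
    "http://RankingLabs.Com/SEO/Science",
    "http://RankingLabs.Com/SEO/Art",
    "http://RankingLabs.Com/Mod_Rewrite",
]

for directory, members in group_by_directory(urls).items():
    print(directory, len(members))
# SEO 2
# Mod_Rewrite 1
```

The output mirrors the structure described above: two documents, one made up of two pages filed under SEO and one single page document filed by itself.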
After organizing the location of information on a website, filing documents in the correct directories and creating clear, functional URLs it is important to look at on-page organization and communication.
On-page communication starts with the source code, specifically the information between the <head> and </head> tags, which is where the title, the description, the keywords tag, styles and style sheet references, JavaScript and a number of other elements are located. The title of a web page is one of the most important elements, and should be a clear, concise 'definition' or 'overview' of the information presented. The text between <title> and </title> is the text you are looking for.
In keeping with the book illustration at the beginning of this web page, it is important to understand that web pages are not like text pages; they are more like chapters, and should be titled as such. When reading a book, the beginning of each chapter has a brief, usually 3 to 10 word, description of the main topic of the chapter, and a web page should be titled the same way.
Within a single document, each web page title should be unique, and relate to the information the individual web page presents. The title should also generally relate to the overall document it is a portion of. When the text is accurate, it is usually a good idea to make sure the title tag is the first 'text' information below the <head> tag.
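The title guidelines above can be checked mechanically. The following minimal sketch extracts the title from a page's source, tests it against the 3 to 10 word range, and reports whether the title is the first element inside the head. The class name, function name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Records the <title> text and the order of tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = ""
        self.head_tags = []  # tags seen inside <head>, in source order

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            self.head_tags.append(tag)
            if tag == "title":
                self.in_title = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_title(html):
    parser = TitleParser()
    parser.feed(html)
    title = parser.title.strip()
    word_count = len(title.split())
    return {
        "title": title,
        "word_count_ok": 3 <= word_count <= 10,
        "title_first_in_head": parser.head_tags[:1] == ["title"],
    }


# Hypothetical sample page.
page = ('<html><head><title>SEO, The Science: How Search Engines Work</title>'
        '<meta name="description" content="An overview."></head>'
        '<body></body></html>')
print(check_title(page))
```

Checking that titles are unique across a document would need the same function run over every page, with the resulting titles compared.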
Next, you should look below the <body> tag to see which information is closest to the top of the web page. One of the drawbacks of HTML editors is that they are not designed to present your web page in a clear, logical order for text readers and search engines. They are designed to make your web page and website easy to create and to help you make it look good, but for SEO purposes it is important to have the main topic of the page and the bulk of the information displayed first.
Read through your source code until you see text you typed on the page. Usually it will be the text from the top of the far left section, which could be 'sub-topic' information rather than the main information on the page.
If what you read first is something other than the main topic, a person listening to your page through a text reader might be able to understand the information, or wait until they are through the menu to decide what the web page is about. To a search engine, however, it might appear to be a 'keyword' or 'key phrase' list, and could 'confuse' an algorithm into triggering a 'possible spam' flag, especially if it is a large 'navigation list' of keywords or key phrases.
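Reading the source 'until you see text you typed' can be approximated with a short script that lists the first pieces of text inside the body in source order, which is roughly what a text reader would encounter first. This is only a sketch: the names and the sample page are hypothetical, and it makes no claim about how a search engine reads a page.

```python
from html.parser import HTMLParser


class BodyTextParser(HTMLParser):
    """Collects text inside <body> in source order, skipping code."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.in_body and not self.skip_depth and text:
            self.chunks.append(text)


def first_text_chunks(html, limit=3):
    """Return the first few pieces of body text as they appear in the source."""
    parser = BodyTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks[:limit]


# Hypothetical page where a navigation list precedes the main heading.
page = """<html><head><title>Widget Care Guide</title></head><body>
<ul><li><a href="/a">Cheap Widgets</a></li><li><a href="/b">Best Widgets</a></li></ul>
<h1>Widget Care Guide</h1><p>How to look after a widget.</p>
</body></html>"""
print(first_text_chunks(page))
# ['Cheap Widgets', 'Best Widgets', 'Widget Care Guide']
```

In the sample, two navigation phrases come before the main topic, which is the situation described above.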
After the title is set and you have checked, and if necessary corrected, the load order of the page, look at the web page as an individual book for the purposes of determining headings. Headings should be written the same way as the title of the page: <h1> might be the title of the 'book', with <h2> through <h6> representing 'chapters'. Each should be a short, concise definition of the information contained in the subsequent text.
It is important to understand that 'headings' are defined with specific tags, from <h1> to <h6>, just as a paragraph is defined with the <p> tag. Tags should be placed on the page from <h1> to <h6>, indicating 'most important' to 'least important', and can be adjusted with CSS to fit the page visually. Resizing a paragraph tag with a style may not be counted, or interpreted, the same way by search engines, especially if it is done in an external style sheet, so the correct tags should be utilized.
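A page's heading structure can be listed with a short script, which makes it easy to see whether real heading tags are used and whether they run from most to least important. In the sketch below, treating a skipped level (for example an <h4> directly after an <h2>) as a problem is an assumption made for illustration, and all names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingParser(HTMLParser):
    """Collects (level, text) for every heading tag in source order."""

    def __init__(self):
        super().__init__()
        self.current = None  # [level, text] while inside a heading
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.current = [int(tag[1]), ""]

    def handle_data(self, data):
        if self.current is not None:
            self.current[1] += data

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.current is not None:
            self.headings.append((self.current[0], self.current[1].strip()))
            self.current = None


def heading_outline(html):
    parser = HeadingParser()
    parser.feed(html)
    return parser.headings


def skipped_levels(headings):
    """Return the text of headings that jump more than one level down."""
    problems = []
    previous = 0
    for level, text in headings:
        if level > previous + 1:
            problems.append(text)
        previous = level
    return problems


# Hypothetical sample markup.
page = ('<h1>SEO, The Science</h1><p>Intro text.</p>'
        '<h2>Words are Math</h2><h4>Illustration</h4>')
outline = heading_outline(page)
print(outline)
# [(1, 'SEO, The Science'), (2, 'Words are Math'), (4, 'Illustration')]
print(skipped_levels(outline))
# ['Illustration']
```

Text that only looks like a heading, such as a resized paragraph, never appears in the outline, which is one way to spot the styling issue mentioned above.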
Other 'On-Page' & 'On-Site' SEO considerations should include:
Read SEO, The Art™ on RankingLabs.Com, where the 'creative side' of SEO is outlined.
What's an SEO
(Google.Com)
Search Engine Optimization
(WikiPedia.Org)
SEO Myths
(Google Video)
Yahoo Search Blog
(YSearchBlog.Com)
Google SEO
(MattCutts.Com)
Search Engine Patents
(RankingLabs.Com)
