Duplicate Content is the Worst
Published by Spinutech on July 6, 2020
 
The Value of Content
To understand the impact of duplicate content, you first need to understand the value that content brings to a site. A piece of content can provide value in several ways. First, it can help your reader by answering a question or providing advice or instruction. Second, it can build your site’s authority, relevance, and trust in search: by addressing common search keywords, a piece of content can be served by a search engine as an answer to user queries. Third, it can increase your site’s authority on a given subject. Having multiple pieces of content that relate to a particular subject and address different intentions builds a site’s reputation.
As a content creator, your primary objective should be to provide value.
The main issue with content duplication is that duplicative content provides no value, either in search or within the site hierarchy.
Duplicate Content Within a Domain
Within a website’s information architecture, each page should represent a topic or idea. To provide value to both users and search engines, the topic or idea should be covered as thoroughly as possible.
When you have two identical pages without canonicalization, this is considered duplicate content. Canonicalization is a way of setting a “preferred” page for information, telling search engines which version of a duplicated page is the right one to crawl and to credit with links. Google’s duplicate content guidelines acknowledge that this can happen intentionally or unintentionally. Examples of unintentional duplicate content include:
- Print-friendly versions of a page, which contain the exact same content in a stripped-down design
- Product listings linked via multiple distinct URLs (sometimes caused by added URL parameters)
To fix these unintentional issues, Google’s duplicate content guidelines recommend specific technical SEO tactics, such as using 301 redirects or canonicalizing the content, to pass value to one page. Either tactic sets a “preferred version” of the page for a given topic.
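As a rough illustration of what canonicalizing a parameterized URL can look like, the Python sketch below strips parameters that do not change the page and builds the matching canonical link tag. The parameter list, the function names, and the example.com URL are assumptions made for illustration, not part of Google’s guidelines. Which parameters are safe to ignore depends on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that never change what the page shows.
# This list is hypothetical; build your own from your site's URLs.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "print", "sessionid"}


def canonical_url(url: str) -> str:
    """Return the URL with ignored parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # stable parameter order so equivalent URLs compare equal
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page's preferred URL."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://Example.com/shoes?print=1&utm_source=news&color=red"))
```

The tag belongs in the head of every duplicate version of the page, so that the print-friendly and parameterized versions all point to the same preferred URL.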
In SERPs (Search Engine Results Pages), Google will likely choose one page from a domain to display for a query based on quality indicators (such as inbound links) or manual canonicalization. This means that if you have two pages that cover the exact same topic within a domain without differentiation, only one of those pages will be served to searchers, and the two pages may end up splitting visibility between them.
Google has developed these processes over time, in large part to benefit users and make search results more valuable to searchers, but also to stop sites from manipulating search results to make themselves more visible. Google can also see intentional duplication of content as a deceptive attempt to gain more traffic and users by ranking multiple pages for the same targets. Over the years, through machine learning, Google and other search engines have become smarter about recognizing and filtering duplicate content.
A Note About Duplicate Targeting
Most of this information applies to true duplicate content, or word-for-word republication of the exact same information. However, there can also be an issue with duplication of targeting and intention. Within a site hierarchy, each page provides its own unique value, which rolls upward into the category and supports an overarching theme. When you create multiple pages targeting the exact same topic and intention, even if they’re not completely duplicative, the pages will still compete with each other for the attention of users and search engines. By combining information into one page per topic and intention, you ensure that the page provides the most value possible and is set up to rank as high as possible within search engine results.
Duplicate Content Across Separate Domains
Publishing the exact same content on your domain that appears on another domain without proper citation or attribution is plagiarism — which is just as damaging in website content writing as it is in print. If another site is copying your content without crediting you as the original source, take action. On the flip side, if you’re republishing substantive blocks of content taken from another domain without proper citation and without adding any valuable unique content, you are violating Google’s webmaster guidelines and potentially opening yourself to a penalty (or worse).
Questions to Ask Yourself
Many site owners and content creators get hung up on duplicate content. To avoid true duplication of content, ask yourself these questions prior to publishing:
- What purpose does the piece of content serve? Each piece of content created should have its own unique value. Otherwise, why publish it at all? To publish multiple pieces of content surrounding the same theme, try approaching a topic from multiple intentions, perspectives, or angles. How can you make the content unique or original by adding new perspectives, information from authoritative sources, or locally modified content?
- Can I accomplish my goals by simply linking to an existing piece of content? If your quandary is how to get a page to appear in multiple places, such as within the navigation, consider having one “home” for that piece of content and simply linking to it from other spots. If this is not possible, use canonicalization to tell search engines which version is the proper one.
- Should I be publishing this? Any time you create new content, you should be asking yourself what value it provides to your reader, to your site, and to search results. If the piece of content you are planning does not provide any value, rethink your topic and targeting.
Duplicate Content SEO Implications
Although the main reason to avoid duplicate content is that it provides no value to a user or to your site, there can be some implications for SEO as well. When you have duplicate content on a site, it can dilute the power of backlinks by distributing them across multiple versions of the content.
Some instances of duplicate content are caused by URL parameters. In these cases, there are specific strategies for addressing the duplication with Google. Work with an experienced SEO team to adjust automatic URL parameters and block crawling of parameterized duplicate content.
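Before adjusting anything, it helps to know which URLs are actually parameterized duplicates of each other. The Python sketch below is one hedged way to audit a list of crawled URLs: it groups URLs that differ only by parameters assumed not to affect content. The parameter names, the function names, and the sample URLs are hypothetical, and deciding which parameters really leave the content unchanged is a judgment call for your own site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only track or sort and never change the content.
NON_CONTENT_PARAMS = frozenset(
    {"utm_source", "utm_medium", "utm_campaign", "sort", "ref"}
)


def strip_params(url, ignored=NON_CONTENT_PARAMS):
    """Drop ignored query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value) for key, value in parse_qsl(parts.query) if key not in ignored
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def find_parameter_duplicates(urls):
    """Map each stripped URL to its variants, keeping only groups of two or more."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_params(url)].append(url)
    return {base: variants for base, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    crawled = [
        "https://example.com/p/1",
        "https://example.com/p/1?utm_source=x",
        "https://example.com/p/1?sort=asc&ref=home",
        "https://example.com/p/2?color=red",
    ]
    for base, variants in find_parameter_duplicates(crawled).items():
        print(base, "has", len(variants), "variants")
```

Each group in the result is a candidate for a single canonical URL, with the remaining variants canonicalized to it or blocked from crawling.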
Duplicate Content Checker
Content strategists use several tools to check for duplicate content. Some free tools, like Copyscape, are designed to check for plagiarism within text. Others, like Siteliner, help analyze a site’s duplicate content and links. Tools like Screaming Frog are designed for more in-depth analysis and crawl an entire site as a bot would. Raven Tools is a robust option that includes a duplicate content checker along with many other content and SEO tools. These more in-depth tools give a more robust crawl but also take some time to understand and interpret.
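To give a feel for what a duplicate content check measures, here is a minimal Python sketch that scores how much two blocks of text overlap. It compares sets of three-word sequences (often called shingles) using Jaccard similarity. This is a common textbook technique chosen for illustration, not a description of how Copyscape, Siteliner, Screaming Frog, or Raven Tools work internally, and the function names and the shingle size of three are assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}  # short text: treat the whole text as one shingle
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a or not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    page = "Each page should represent a single topic or idea."
    copy = "Each page should represent a single topic or idea!"
    print(similarity(page, copy))
```

A score near 1.0 suggests word-for-word duplication, while a low score between two pages that target the same topic points to the duplicate targeting issue instead, which a text comparison alone will not catch.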