Duplicate Content Issues: Exact Duplicates & Near Duplicates

When faced with any type of duplicate content, search engines get confused about which version of a page (URL) to crawl, index and rank for a query in the SERPs. Although both issues usually stem from accidental technical errors and are not penalized, exact duplicates and near duplicates can cause a lot of damage if not handled appropriately.

To avoid tanking in search engine rankings and wasting your crawl budget and link equity, detect and fix these types of duplicate content issues as soon as possible. Acting early prevents negative consequences and boosts your organic traffic by pointing it the right way.

Here’s how to swiftly find and resolve them.

Exact Duplicates vs Near Duplicates

The two main categories duplicate content typically falls into are pretty much self-explanatory. Exact duplicates are two or more URLs serving identical content, while near duplicates are pages that are “nearly identical”: multiple versions of the same piece of content with only minor differences.

Contrary to popular belief, content doesn’t need to be an exact match to be perceived as duplicate — if it’s similar enough, it will be considered as such, even though some things may differ.

Duplicate Content Detection: Finding Exact & Near Duplicates

While both exact duplicate and near-duplicate content can cause search ranking and visibility issues, each requires its own detection and handling approach.

Detecting Exact Duplicates

Pages that are exact duplicates (often the result of plagiarism, syndicated or scraped content, or mirroring) are easy to identify with standard checksumming techniques. You can use the free versions of various SEO audit tools such as Screaming Frog and Siteliner to crawl your website and detect any exact duplicate pages. The number of URLs or results available in the free mode is usually limited, but you can always sign up for the premium versions if needed.
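If you'd rather script a quick check yourself, here is a minimal sketch of the same checksumming idea in Python, assuming a handful of hypothetical URLs: each page body is downloaded and hashed, and any URLs that share a checksum are exact duplicates.

```python
# Minimal exact-duplicate check via checksumming (illustrative sketch).
# The URLs below are hypothetical placeholders; swap in pages from your own crawl.
import hashlib
from collections import defaultdict
from urllib.request import urlopen

urls = [
    "https://example.com/page",
    "https://example.com/page?ref=newsletter",
    "https://example.com/other-page",
]

def checksum(url: str) -> str:
    """Download the page and return a SHA-256 digest of its raw body."""
    with urlopen(url) as response:
        body = response.read()
    return hashlib.sha256(body).hexdigest()

# Group URLs by checksum; any group with more than one URL is an exact duplicate set.
groups = defaultdict(list)
for url in urls:
    groups[checksum(url)].append(url)

for digest, members in groups.items():
    if len(members) > 1:
        print(f"Exact duplicates ({digest[:12]}): {members}")
```

A full audit tool typically normalizes the markup (whitespace, tracking parameters, boilerplate) before hashing; this sketch hashes the raw response for simplicity.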

Detecting Near Duplicates

Identifying near duplicates is a bit trickier. Most tools, such as Screaming Frog, look only for exact duplicates by default, so you’ll probably need to enable near-duplicate checks manually. You can lower the similarity threshold (usually set to 90% by default) if you want to catch content with a lower percentage of similarity. Detecting near-duplicate content also requires running a crawl analysis before the reports are populated with usable data.
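To make the similarity threshold concrete, here is a rough sketch using Python’s built-in difflib, with two made-up text snippets and a 0.90 cut-off mirroring the 90% default mentioned above; it illustrates the idea rather than replacing a crawler’s near-duplicate report.

```python
# Rough near-duplicate check using a text similarity ratio (illustrative sketch).
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.90  # mirrors the 90% default used by many audit tools

def similarity(text_a: str, text_b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two extracted page texts."""
    return SequenceMatcher(None, text_a, text_b).ratio()

# Made-up page copy with only minor wording differences.
page_a = "Our blue widget ships worldwide and includes a two-year warranty."
page_b = "Our blue widget ships worldwide and includes a 2-year warranty."

score = similarity(page_a, page_b)
if score >= SIMILARITY_THRESHOLD:
    print(f"Near duplicates (similarity {score:.0%})")
else:
    print(f"Distinct enough (similarity {score:.0%})")
```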

Another thing to keep in mind is that data is only pulled from indexable URLs, so pages that are canonicalized to another URL won’t be included in the reports, even if they are exact or near duplicates.

Resolving Exact Duplicate & Near Duplicate Content Issues

The first step to fixing these types of duplicate content issues is to decide which version of the page you want to keep – opting for the better-performing one is considered best practice.

  • 301 Redirects

You can retire the duplicate pages and still have them boost the SEO of the primary page you chose to keep: consolidate exact or near duplicates into a single URL with permanent 301 redirects so that their link equity is passed on to it.

  • Rel Canonical

Another way to consolidate duplicate content is to place a rel=canonical link inside the <head> of the lower-performing exact or near duplicate page. It tells search engines that all of its link juice and ranking power should be attributed to the original (higher-performing) page, which carries a self-referential canonical tag. It’s similar to a 301 redirect, but easier to implement.

  • De-Indexing URLs

You can remove URLs that you don’t want indexed from your XML sitemap altogether, or manually set the meta robots tag of the pages you wish to exclude from search results to “noindex, follow”. Another helpful short-term strategy is to mark duplicate URLs as passive in Google Search Console’s parameter settings. Crawlers will ignore them and they won’t show up in search; however, the URL that you are keeping also won’t receive any SEO benefits from them. Whichever option you choose, it’s worth spot-checking that the fix is actually in place; one way to do that is sketched below.
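Here is a minimal verification sketch, assuming made-up URLs and the third-party requests package: for each duplicate it reports whether the URL 301-redirects to the primary page, declares a canonical pointing at it, or carries a noindex robots meta tag. The HTML patterns are deliberately naive and only meant for quick spot checks.

```python
# Sketch: audit which consolidation remedy is in place for each duplicate URL.
# URLs are hypothetical placeholders; requires the third-party "requests" package.
import re
import requests

PRIMARY = "https://example.com/primary-page"
DUPLICATES = [
    "https://example.com/primary-page?utm_source=newsletter",
    "https://example.com/old-copy-of-primary-page",
]

# Naive patterns for a sketch; a real audit should parse the HTML properly.
CANONICAL_RE = re.compile(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', re.I)
ROBOTS_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']', re.I)

for url in DUPLICATES:
    response = requests.get(url, allow_redirects=False, timeout=10)

    # 1. Permanent redirect pointing at the primary page?
    if response.status_code == 301 and response.headers.get("Location") == PRIMARY:
        print(f"{url} -> 301 to primary")
        continue

    html = response.text

    # 2. rel=canonical in the <head> pointing at the primary page?
    canonical = CANONICAL_RE.search(html)
    if canonical and canonical.group(1) == PRIMARY:
        print(f"{url} -> canonicalized to primary")
        continue

    # 3. Meta robots noindex keeping the page out of search results?
    robots = ROBOTS_RE.search(html)
    if robots and "noindex" in robots.group(1).lower():
        print(f"{url} -> noindex")
        continue

    print(f"{url} -> no remedy detected")
```

The audit tools mentioned earlier surface the same information at scale; a short script like this is simply a convenient way to confirm a handful of URLs after a fix goes live.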
