Discussion:
Duplicate Content Issue - URLs work with dots and colons in it
Baki Goxhaj
2013-08-29 09:51:38 UTC
Hi guys,

URLs that end in a dot, a colon, or several of them still work, and Google
treats them as duplicate content. Here is an example:

Original: http://ma.tt/2010/11/one-point-oh/
With a dot: http://ma.tt/2010/11/one-point-oh./
With a colon: http://ma.tt/2010/11/one-point-oh:/
With many dots: http://ma.tt/2010/11/one-point-oh...../

All of these URLs work, and if any of them is mistakenly linked on the site,
Google will consider it duplicate content.

Is this a bug? If not, why is this happening?

PS: I wrote about this in the support forums, but that post is full of
typos, which may be why I got no replies there:
http://wordpress.org/support/topic/dublicate-content-url-works-with-tots-and-columns-in-it
Kindly,

Baki Goxhaj
about.me/banago
Shea Bunge
2013-08-29 09:58:47 UTC
If you look at the source of that page, you will see that there is a <link rel="canonical"> pointing to the original post URL. This tells Google that the original URL is preferred, no matter what URL the page is accessed from.
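
To check this on any page, one could pull the canonical URL out of the served HTML. The following is only a rough sketch: the function name `find_canonical_url` and the sample markup are made up for illustration, and the regex-based parsing is a shortcut to keep the example dependency-free (a real tool would use an HTML parser).

```php
<?php
/**
 * Return the href of the first <link rel="canonical"> tag in an HTML string,
 * or null when no such tag is present.
 */
function find_canonical_url( string $html ): ?string {
    // Collect every <link ...> tag in the markup.
    if ( ! preg_match_all( '/<link\b[^>]*>/i', $html, $tags ) ) {
        return null;
    }
    foreach ( $tags[0] as $tag ) {
        // Accept single or double quotes around attribute values.
        $is_canonical = preg_match( '/\brel\s*=\s*["\']canonical["\']/i', $tag );
        if ( $is_canonical && preg_match( '/\bhref\s*=\s*["\']([^"\']+)["\']/i', $tag, $m ) ) {
            return html_entity_decode( $m[1] );
        }
    }
    return null;
}

// Hypothetical markup, as it might be served for both the clean and the dotted URL.
$html = "<head><link rel='canonical' href='http://ma.tt/2010/11/one-point-oh/' /></head>";
echo find_canonical_url( $html ), "\n";
```

If the dotted and the clean URL both return the same canonical href, search engines have what they need to treat them as one page.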


Abdussamad Abdurrazzaq
2013-08-29 11:03:41 UTC
Yep but only on single posts. What about categories?

http://wplancer.com/category/code..../

I suppose we would have to install one of those SEO plugins.
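
Short of a full SEO plugin, a small snippet could print a canonical tag on category archives. This is a minimal sketch, not how any particular plugin does it: the names `build_canonical_tag` and `print_category_canonical` are hypothetical, and paginated archives (page 2 and onwards) are ignored for brevity.

```php
<?php
/**
 * Build a canonical <link> tag for a URL (pure helper, no WordPress needed).
 */
function build_canonical_tag( string $url ): string {
    return '<link rel="canonical" href="' . htmlspecialchars( $url, ENT_QUOTES ) . '" />' . "\n";
}

/**
 * Print a canonical tag on category archives (hypothetical callback name).
 */
function print_category_canonical() {
    if ( ! is_category() ) {
        return;
    }
    // get_term_link() returns the clean archive URL, or a WP_Error on failure.
    $link = get_term_link( get_queried_object() );
    if ( ! is_wp_error( $link ) ) {
        echo build_canonical_tag( $link );
    }
}

// Register the hook only when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'wp_head', 'print_category_canonical' );
}
```

The tag is built from the term's own link rather than from the requested URL, so a request for /category/code..../ would still advertise the clean address.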

Baki Goxhaj
2013-08-31 10:02:26 UTC
@Shea,

rel="canonical" or not, this seems like an undesirable behaviour to have in
WordPress. The URLs should not work at all with those dots and colons -
shouldn't that be the case?

Kindly,

Baki Goxhaj
about.me/banago


Guus (IFS)
2013-08-31 10:35:19 UTC
I can't imagine any undesirable behaviour with canonicals. A canonical just
tells search engines which URL is the 'best' one for a page when several URLs
lead to the same page. It is a very useful way to prevent duplicate content
within a site.

What canonicals are is not immediately obvious to everybody, so that may be
your issue. If you need any help with that, just e-mail me.

David Ernst
2013-08-31 13:09:20 UTC
You'd prefer your visitors see a 404 page? What's the benefit?

Shea Bunge
2013-08-31 13:51:31 UTC
@Baki
I agree that this is strange behaviour and probably should be fixed. I was just pointing out that the rel="canonical" link fixes the issue with Google.


Baki Goxhaj
2013-09-18 10:35:41 UTC
Here is the solution I came up with to fix that strange behaviour:

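/**
 * Fix for URLs ending in dots and colons:
 * redirect them permanently to the clean URL.
 */
function redirect_dotted() {
    global $wp;

    // $wp->request holds the request path without leading or trailing slashes.
    $current_url = home_url( $wp->request );
    $last_char   = substr( $current_url, -1 );

    if ( $last_char === '.' || $last_char === ':' ) {
        // Strip every trailing dot and colon, then send a 301 redirect.
        $clean_url = rtrim( $current_url, '.:' );
        wp_redirect( $clean_url, 301 );
        exit;
    }
}
add_action( 'wp', 'redirect_dotted' );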
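
The trimming step can be tried outside WordPress as well. Below is a small sketch applied to the example paths from the start of this thread; the helper name `clean_trailing_punctuation` is made up for illustration, and the trailing slash is re-added on the assumption that the site's permalinks end in one.

```php
<?php
/**
 * Strip trailing slashes, dots and colons from a URL path,
 * then add back a single trailing slash.
 */
function clean_trailing_punctuation( string $path ): string {
    return rtrim( $path, '/.:' ) . '/';
}

// Example paths taken from the URLs quoted in this thread.
$examples = array(
    '/2010/11/one-point-oh/',
    '/2010/11/one-point-oh./',
    '/2010/11/one-point-oh:/',
    '/2010/11/one-point-oh...../',
);
foreach ( $examples as $path ) {
    echo $path, ' => ', clean_trailing_punctuation( $path ), "\n";
}
```

All four paths map to the same clean path, which is the target the 301 redirect should point at.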


Kindly,

Baki Goxhaj
about.me/banago