Discussion:
Non-ASCII characters in URLs and parsing those characters at the string level
Haluk Karamete
2014-09-09 22:03:47 UTC
First off, I need to show you which non-ASCII characters I'm talking about.

For instance, just type 'Slobodan Milosevic' into Google Search and go to
the first suggested Wikipedia link.

You will see that the URL contains very unusual characters that are well
beyond the common ASCII set. I'm simply curious whether WordPress supports that.

Though this is not a feature I particularly like (to say the least), I do
confess that I find it quite interesting from an HTTP point of view.

But my real question (or pain, to put it better) is this.
Say you are scraping that data and you come across that title with those
funny characters... and you want to create a tag out of that.

Is there a conversion function that I can pass that string to and get back
a translated version that uses only ASCII characters (code points below 128)?

So I pass in 'slobodan_milo%c5%a1evi%c4%87', and I get back the good old
'Slobodan Milosevic'

Does such a function exist? Or how do you deal with that situation?
Jason LeVan
2014-09-09 23:53:54 UTC
urldecode() mixed with remove_accents() perhaps?

https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L794
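
The two steps suggested here (percent-decode the URL fragment, then strip
the accents) can be sketched outside WordPress as well. What follows is only
a rough Python analogue, not the WordPress implementation: the helper name
to_ascii_title is hypothetical, and it uses Unicode decomposition where
remove_accents() has its own handling. Letters that have no decomposition
(for example 'đ' or 'ł') are simply dropped by this sketch, so treat it as an
illustration of the idea only.

```python
import unicodedata
from urllib.parse import unquote


def to_ascii_title(encoded: str) -> str:
    """Hypothetical helper: percent-decode, strip accents, title-case."""
    # Step 1: turn %c5%a1 etc. back into characters (unquote assumes UTF-8).
    decoded = unquote(encoded)
    # Step 2: split each accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", decoded)
    # Step 3: drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 4: discard anything that is still outside ASCII.
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Step 5: Wikipedia-style underscores become spaces; capitalize each word.
    return ascii_only.replace("_", " ").title()


print(to_ascii_title("slobodan_milo%c5%a1evi%c4%87"))  # prints: Slobodan Milosevic
```

In a WordPress context the equivalent would presumably be the pair named
above, urldecode() followed by remove_accents(), with the underscore and
capitalization handling added separately.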

___________________________________

Jason LeVan
_______________________________________________
wp-hackers mailing list
http://lists.automattic.com/mailman/listinfo/wp-hackers
Haluk Karamete
2014-09-09 23:59:15 UTC
I will look into that. Your suggestion looks very promising. Thank you for
that.
I also discovered this resource, http://www.acc.umu.se/~saasha/charsets/, for
my own DIY approach.