I don’t know how you are building the URLs, but all non-ASCII parts of a URL must be URL-encoded, AKA percent-encoded. The only exception is the domain name, which uses a different encoding (Punycode). The browser does the encoding for you if you don’t do it yourself. OTOH, the browser will in most cases show you the unencoded version of your characters, so you might not be aware that what is sent over the wire is URL-encoded.
E.g., your path is sent over the wire as /om-os/b%c3%a6redygtighed/socialt-ansvar, even if you see /om-os/bæredygtighed/socialt-ansvar in the address bar. You can check it with the developer tools. If you use Firefox, you will have to look at the Headers tab of the HTTP call’s details in the Network tab. Chrome, by contrast, also shows the URL-encoded form in the HTTP call’s summary row. That %c3%a6 in the path is the hex value of the two bytes, C3 and A6, that make up the UTF-8 encoding of the character æ.
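To make the byte-level claim concrete, here is a minimal sketch (not part of the original setup) that derives the percent-encoded form of æ from its UTF-8 bytes and compares it with what encodeURIComponent produces. The variable names are only illustrative.

```javascript
// UTF-8 bytes of 'æ': expected to be 0xC3 and 0xA6.
const bytes = new TextEncoder().encode('æ');

// Write each byte as two uppercase hex digits.
const hex = Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, '0'));

// Prefix each byte with '%' to get the percent-encoded form.
const percentEncoded = hex.map((h) => '%' + h).join('');

console.log(hex.join(' '));   // C3 A6
console.log(percentEncoded);  // %C3%A6
console.log(encodeURIComponent('æ') === percentEncoded); // true
```

Percent-encoding is case-insensitive in the hex digits, so %c3%a6 and %C3%A6 denote the same two bytes.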
You can even set your window.location.pathname programmatically to /om-os/bæredygtighed/socialt-ansvar, but when you read window.location.pathname afterwards, you will get it URL-encoded:
window.location.pathname = '/om-os/bæredygtighed/socialt-ansvar'
[...]
console.log(window.location.pathname)
/om-os/b%C3%A6redygtighed/socialt-ansvar
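The same behaviour can be reproduced without navigating anywhere by using the standard URL class, which is available in browsers and in Node.js. This is only a sketch: the host example.com is a placeholder, not your real domain.

```javascript
// example.com is a placeholder host, used only for illustration.
const url = new URL('https://example.com/');

// Assign the path with the raw, unencoded character...
url.pathname = '/om-os/bæredygtighed/socialt-ansvar';

// ...and read it back: the URL object stores it percent-encoded.
console.log(url.pathname);
// /om-os/b%C3%A6redygtighed/socialt-ansvar

// Decoding gives back the human-readable form.
console.log(decodeURIComponent(url.pathname));
// /om-os/bæredygtighed/socialt-ansvar
```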
I don’t know how your path flows into your breadcrumbs, but you can clearly reverse the URL-encoding before using your strings.
In JavaScript you normally do that with decodeURIComponent():
console.log(decodeURIComponent('b%c3%a6redygtighed'))
bæredygtighed
console.log(decodeURIComponent('/om-os/b%c3%a6redygtighed/socialt-ansvar'))
/om-os/bæredygtighed/socialt-ansvar
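Since I don’t know your breadcrumb code, the following is only a sketch of how the decoding could be applied there. The helper names safeDecode and pathToBreadcrumbs are hypothetical. It decodes each path segment separately, so an encoded slash (%2F) inside a segment is not mistaken for a separator, and it keeps the encoded form for the links.

```javascript
// Hypothetical helper: decodeURIComponent throws a URIError on malformed
// escapes (e.g. a lone '%'), so fall back to the raw text in that case.
function safeDecode(segment) {
  try {
    return decodeURIComponent(segment);
  } catch (e) {
    return segment;
  }
}

// Hypothetical helper: turn a pathname into breadcrumb entries.
// 'label' is the decoded text to display, 'href' keeps the encoded path.
function pathToBreadcrumbs(pathname) {
  const crumbs = [];
  let href = '';
  for (const segment of pathname.split('/')) {
    if (segment === '') continue; // skip leading/trailing/double slashes
    href += '/' + segment;
    crumbs.push({ label: safeDecode(segment), href });
  }
  return crumbs;
}

const crumbs = pathToBreadcrumbs('/om-os/b%C3%A6redygtighed/socialt-ansvar');
console.log(crumbs.map((c) => c.label).join(' > '));
// om-os > bæredygtighed > socialt-ansvar
```

In a browser you would pass window.location.pathname to the helper instead of the hard-coded string.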
In PHP you normally do that with urldecode, or with rawurldecode if the string may contain a literal +, which urldecode turns into a space:
$decoded = urldecode('b%c3%a6redygtighed'); // will contain 'bæredygtighed'
But it would be better if you could make your data flow in a way that avoids the encoding and decoding steps before reaching your breadcrumbs.