Solution 1 :

I don’t know how you are building the URLs, but all non-ASCII parts of a URL, except the domain name (which has its own encoding, Punycode), must be URL-encoded, also known as percent-encoded. If you don’t do it yourself, the browser does it for you. On the other hand, in most cases the browser shows you the unencoded version of your characters, so you might not be aware that what is sent over the wire is URL-encoded.

E.g., your path is sent over the wire as /om-os/b%c3%a6redygtighed/socialt-ansvar, even if you see /om-os/bæredygtighed/socialt-ansvar in the address bar. You can check this with the developer tools. In Firefox, you have to look at the Headers tab of the request’s details in the Network tab. Chrome, instead, also shows the URL-encoded form in the request’s summary row. The %c3%a6 in the path is the hex value of the two bytes, C3 and A6, that make up the UTF-8 encoding of the character æ (the hex digits of a percent-encoding are case-insensitive, so %c3%a6 and %C3%A6 are equivalent).
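To see where %C3%A6 comes from, here is a small JavaScript sketch (utf8Hex is just an illustrative helper name, not a built-in) that prints the UTF-8 bytes of æ and compares them with what encodeURIComponent() produces:

```javascript
// Illustrative helper: list the UTF-8 bytes of a string as uppercase hex.
// TextEncoder always encodes to UTF-8.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  // Format each byte as two hex digits, e.g. 0xC3 -> "C3".
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, '0'));
}

console.log(utf8Hex('æ'));            // [ 'C3', 'A6' ]
console.log(encodeURIComponent('æ')); // %C3%A6
```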

You can even set window.location.pathname programmatically to /om-os/bæredygtighed/socialt-ansvar, but when you read window.location.pathname afterwards (the assignment navigates to the new path), you get it URL-encoded:

window.location.pathname = '/om-os/bæredygtighed/socialt-ansvar'
// [...] the browser navigates to the new path
console.log(window.location.pathname)
// /om-os/b%C3%A6redygtighed/socialt-ansvar
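If you want to try this without navigating away from the page, the URL constructor (available in browsers and in Node.js) should give the same encoded pathname; example.com below is only a placeholder host:

```javascript
// Parse the path from the question with the WHATWG URL parser.
// The host is a placeholder; only the path matters here.
const url = new URL('https://example.com/om-os/bæredygtighed/socialt-ansvar');

// Like window.location.pathname, URL#pathname returns the percent-encoded form.
console.log(url.pathname); // /om-os/b%C3%A6redygtighed/socialt-ansvar
```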

I don’t know how your path flows into your breadcrumbs, but you can clearly reverse the URL-encoding before using the strings.

In JavaScript you normally do that with decodeURIComponent():

console.log(decodeURIComponent('b%c3%a6redygtighed'))
// bæredygtighed
console.log(decodeURIComponent('/om-os/b%c3%a6redygtighed/socialt-ansvar'))
// /om-os/bæredygtighed/socialt-ansvar
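Applied to the breadcrumb case, a minimal sketch could look like the following. breadcrumbLabels is a hypothetical helper, and the label formatting (upper-casing the first letter, keeping hyphens) is an assumption, since the question turns om-os into Om os but keeps the hyphen in Socialt-ansvar:

```javascript
// Hypothetical helper: turn an encoded pathname into breadcrumb labels.
// Assumption: a label is the decoded segment with its first letter
// upper-cased; adapt the formatting (e.g. hyphens) to your own rules.
function breadcrumbLabels(pathname) {
  return pathname
    .split('/')                          // split on the path separators
    .filter((segment) => segment !== '') // drop the empty leading segment
    .map((segment) => decodeURIComponent(segment))
    .map((label) => label.charAt(0).toUpperCase() + label.slice(1));
}

const labels = breadcrumbLabels('/om-os/b%C3%A6redygtighed/socialt-ansvar');
console.log(labels.join(' > ')); // Om-os > Bæredygtighed > Socialt-ansvar
```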

In PHP you normally do that with urldecode() or, for path segments, rawurldecode(), which, unlike urldecode(), does not turn + into a space:

$decoded = urldecode('b%c3%a6redygtighed'); // will contain 'bæredygtighed'

But it would be better if you could make your data flow in a way that avoids the encoding and decoding steps before reaching your breadcrumbs.

Solution 2 :

If you have not figured out the fix yet, here is something to add on top of what walter-tross already explained in the answer above.

For the given input, /om-os/bæredygtighed/socialt-ansvar, the output of the JavaScript encodeURI() method is:

/om-os/b%C3%A6redygtighed/socialt-ansvar

and the output of the encodeURIComponent() method is:

%2Fom-os%2Fb%C3%A6redygtighed%2Fsocialt-ansvar
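A quick way to reproduce both outputs, and to check that each function is undone by its decoding counterpart:

```javascript
const path = '/om-os/bæredygtighed/socialt-ansvar';

// encodeURI() keeps characters that have a meaning in a URL, such as /.
console.log(encodeURI(path));
// /om-os/b%C3%A6redygtighed/socialt-ansvar

// encodeURIComponent() also escapes /, because it treats its input as a
// single component (one path segment or one query value).
console.log(encodeURIComponent(path));
// %2Fom-os%2Fb%C3%A6redygtighed%2Fsocialt-ansvar

// Each function is undone by its decoding counterpart.
console.log(decodeURI(encodeURI(path)) === path);                   // true
console.log(decodeURIComponent(encodeURIComponent(path)) === path); // true
```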

Given the above, it appears that you are taking the breadcrumb input from the URL, which is encoded the way encodeURI() encodes: non-ASCII characters are percent-encoded, but the / separators are left intact, which is what still lets you split the path on the ‘/’ character.

The fix, as already noted, is to URL-decode the individual components with decodeURIComponent() (or decodeURI()) before using them as content.
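Decoding the individual components, after splitting, matters when a segment itself contains an encoded slash. A sketch with a made-up path, /a%2Fb/c, that shows what goes wrong if you decode the whole path first:

```javascript
// Made-up path whose first segment contains an encoded slash (%2F).
const pathname = '/a%2Fb/c';

// Split first, then decode each segment: the slash stays inside its segment.
const splitThenDecode = pathname.split('/').filter(Boolean).map(decodeURIComponent);
console.log(splitThenDecode); // [ 'a/b', 'c' ]

// Decoding the whole path first turns %2F into a real separator,
// so the split produces one segment too many.
const decodeThenSplit = decodeURIComponent(pathname).split('/').filter(Boolean);
console.log(decodeThenSplit); // [ 'a', 'b', 'c' ]

// decodeURI() leaves %2F encoded, because / is a reserved character.
console.log(decodeURI(pathname)); // /a%2Fb/c
```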

Problem :

I have a Norwegian URL path which looks like this: /om-os/bæredygtighed/socialt-ansvar

In my breadcrumb menu, I expect to see something like this:

Om os > Bæredygtighed > Socialt-ansvar

However, the æ is appearing as %c3%a6. So my breadcrumb looks like this:

Om os > B%c3%a6redygtighed > Socialt-ansvar

I have <meta charset="utf-8"> in the head, so why are these characters still appearing?

Comments

Comment posted by Andy

Where is the data for your breadcrumb coming from, which technology is generating your HTML, and server- or client-side? Is it built by means of JavaScript based on the location?
