
JSP and the Content Type charset and the Page Encoding attribute

The charset and the pageEncoding specified on a JSP page are very different things, but coders sometimes confuse them.

A JSP page can have this directive at the top:

<%@page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>

Why should one specify the charset twice?

The contentType charset tells the servlet container that runs the JSP (for example Tomcat) how to encode the text generated by the page when sending it to the browser. If it is not specified, the charset is assumed to be ISO-8859-1, so only Western European characters can be used in that page.
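To make the effect of the response charset concrete, here is a minimal Java sketch of what happens when text is turned into bytes. It is not what Tomcat does internally; the class name ResponseCharsetDemo and the method encode are made up for this illustration. It only shows that a character missing from ISO-8859-1 (the euro sign) is replaced with a question mark, while UTF-8 keeps it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any servlet container.
public class ResponseCharsetDemo {

    // Turns text into bytes, as must happen before anything is sent to the browser.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // "café €" written with Unicode escapes so the source file encoding does not matter.
        String text = "caf\u00e9 \u20ac";

        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        // ISO-8859-1 has no euro sign, so getBytes substitutes '?' for it.
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("last ISO-8859-1 byte is '?': " + (latin1[latin1.length - 1] == (byte) '?'));

        // UTF-8 can represent every character, at the cost of multi-byte sequences.
        System.out.println("UTF-8 bytes: " + utf8.length);
    }
}
```

The accented letter survives in both charsets, which is why the problem often goes unnoticed until someone types a character outside the Western European range.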

UTF-8 is the current best practice, but it leads to some other issues usually ignored by coders, especially if they come from the PHP world. You can read more about UTF-8 encoded form data and the problem with accented, Cyrillic, and Chinese characters.

The pageEncoding attribute is used to read the JSP correctly from the file system. A JSP is text, but a file on disk is just a sequence of bytes, so the container can read it correctly only if it knows which charset was used to save it.
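The same idea can be sketched in plain Java: the bytes of a file only become text once a charset is chosen. The class PageEncodingDemo and the method decode are hypothetical names for this example, and the byte array stands in for a JSP saved on disk as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the same bytes with two different charsets.
public class PageEncodingDemo {

    // Interprets raw file bytes as text using the charset the reader assumes.
    static String decode(byte[] fileBytes, Charset assumed) {
        return new String(fileBytes, assumed);
    }

    public static void main(String[] args) {
        // Simulates a file containing "café" saved as UTF-8 (the e-acute takes two bytes).
        byte[] onDisk = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(onDisk, StandardCharsets.UTF_8);
        String wrong = decode(onDisk, StandardCharsets.ISO_8859_1);

        // With the right charset the text has 4 characters.
        System.out.println("decoded as UTF-8: " + right.length() + " chars");
        // With the wrong charset each byte becomes its own character, so there are 5.
        System.out.println("decoded as ISO-8859-1: " + wrong.length() + " chars");
    }
}
```

Decoding with the wrong charset does not fail; it silently produces different characters, which is the usual source of garbled accented letters in a rendered page.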

The two encodings are independent: you can save JSPs on disk using UTF-8 and ask the container to communicate with the browser using another charset, such as ISO-8859-15.
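As a rough illustration of that independence, the following sketch reads bytes in one charset and writes them out in another. TranscodeDemo and transcode are invented names, and the code assumes the running JVM supports ISO-8859-15 (Charset.forName throws an exception otherwise).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one charset for reading, another for writing.
public class TranscodeDemo {

    // Decodes the input with one charset, then re-encodes the text with another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-15 is available in this JVM.
        Charset latin9 = Charset.forName("ISO-8859-15");

        // The euro sign as it would be stored in a UTF-8 file: three bytes.
        byte[] onDisk = "\u20ac".getBytes(StandardCharsets.UTF_8);

        // The same character as sent to the browser in ISO-8859-15: one byte, 0xA4.
        byte[] onWire = transcode(onDisk, StandardCharsets.UTF_8, latin9);

        System.out.printf("%d byte(s) on disk, %d byte(s) on the wire, value 0x%02X%n",
                onDisk.length, onWire.length, onWire[0] & 0xFF);
    }
}
```

The text is the same on both sides; only its byte representation changes, which is why the two settings can differ without conflict.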

More about charsets

More about charsets can be read (must be read) here.
