
  • UTF-8 form input garbled

    Hello,

    I'm having a strange problem with Spring 1.1 and UTF-8 form input, which occurs whether I use JSTL or FreeMarker 2.3 as the view.

    If I use a JSTL view with the following settings:
    Code:
    test.class=org.springframework.web.servlet.view.JstlView
    test.contentType=text/html; charset=utf-8
    test.url=/WEB-INF/jsp/test.jsp
    plus the following directive in my JSP page:
    Code:
    <%@ page language="java" contentType="text/html; charset=utf-8" %>
    When I submit the form to my controller with various non-Latin-1 characters, I get strange values set on my bean. The same occurs with a FreeMarker view set to UTF-8, using the following settings:
    Code:
    test.class=org.springframework.web.servlet.view.freemarker.FreeMarkerView
    test.exposeSpringMacroHelpers=true
    test.requestContextAttribute=rc
    test.contentType=text/html; charset=utf-8
    test.url=test.ftl
    Configured as:
    Code:
    <bean 
    	id="freemarkerConfig" 
    	class="org.springframework.web.servlet.view.freemarker.FreeMarkerConfigurer">
    	<property name="templateLoaderPath"><value>/WEB-INF/freemarker/</value></property> 
    	<property name="freemarkerSettings">
    		<props>
    			<prop key="default_encoding">UTF-8</prop>
    		</props>
    	</property>
    </bean>
    For example, if I submit the Russian characters "дор", what actually gets assigned in the bean's setter method is:
    2004-09-21 18:24:07,477 DEBUG [org.springframework.beans.BeanWrapperImpl] - Invoked write method [public void test.TestBean.setName(java.lang.String)] with value [до�?]

    This seems to happen consistently with certain Unicode characters only, while others always come through correctly.

    Best regards,
    Assaf

  • #2
    Well, one part of the mystery has been solved: I forgot to set log4j to write the log in UTF-8:
    log4j.appender.logfile.Encoding=UTF-8

    So log4j was writing in ANSI (or CP1252), but Eclipse was displaying this log in UTF-8! A mess, the result of which was that Eclipse was displaying most of the data correctly as UTF-8, but the database and webpage were storing/displaying them as ANSI.

    So this is my actual problem: if I enter a string like "fatigué" on my web page, what I get in the log, database, and following web page is "fatiguÃ©". The last two characters, representing bytes C3 and A9 in ANSI, are precisely the UTF-8 representation of e-acute (é).

    So, somewhere between posting my data from the web page and storing the string found in request.getParameter("name"), the UTF-8 is being converted into ANSI. Any ideas how to fix this?
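
To illustrate the byte-level mix-up described here, the following is a minimal standalone sketch (not taken from the application in question). It encodes a string as UTF-8 and then decodes those bytes as ISO-8859-1, which agrees with CP1252 for the two bytes C3 and A9. The class and method names are made up for the illustration, and Unicode escapes are used so the result does not depend on the source file or console encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    /** Encodes the text as UTF-8, then wrongly decodes those bytes as ISO-8859-1. */
    public static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Reverses misdecode: recovers the original bytes and decodes them as UTF-8. */
    public static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "fatigu\u00e9";            // "fatigué", 7 characters
        String garbled = misdecode(original);         // é becomes the pair U+00C3 U+00A9

        System.out.println(garbled.length());                           // 8
        System.out.println(garbled.equals("fatigu\u00c3\u00a9"));       // true
        System.out.println(repair(garbled).equals(original));           // true
    }
}
```

The round trip in `repair` only works because ISO-8859-1 maps every byte to a character; it is a diagnostic aid, not a substitute for decoding the request correctly in the first place.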

    BTW, the page is displaying UTF-8 correctly (it's picking up French accents from the resource bundle), it's just the form submit that's breaking.

    My charset configuration settings can be seen in the previous post.

    Best regards,
    Assaf


    • #3
      Try putting an accept-charset attribute on the form tag in your HTML:
      Code:
      <form action="" method="POST" accept-charset="UTF-8">
      ...
      </form>
      Regards,


      • #4
        The accept-charset attribute didn't work.
        However, I finally found something that did: the trick was to capture the request prior to binding and set the character encoding to the correct value.

        The only place I found for doing this in the workflow was by overriding the isFormSubmission method, setting the character encoding, and then calling the super method.

        Code:
        protected boolean isFormSubmission(HttpServletRequest request) {
        	try {
        		request.setCharacterEncoding("utf-8");
        	} catch (UnsupportedEncodingException uee) {
        		LOG.error(uee);
        		throw new RuntimeException(uee);
        	}
        	LOG.debug("encoding: " + request.getCharacterEncoding());
        	
        	return super.isFormSubmission(request);
        }
        However, it would seem simpler if this were a standard attribute of the SimpleFormController or one of its super-classes, which can be configured in the servlet context config file, and automatically gets set prior to form request binding.

        Best regards,
        Assaf


        • #5
          It strikes me that it's possibly a fault of the servlet container - which one are you using? I'll see how a few others behave with this too and see if there's any consistency.


          • #6
            The default request encoding according to the Servlet specification is ISO-8859-1. If the client doesn't send any charset information (and none of the major browsers do) then this is used, unless you explicitly set a different encoding, for example in a filter. Most browsers respond with the same charset your response was in, so what I'm doing at the moment is to send *only* UTF-8 in my responses, forcing the request into UTF-8 with a filter, and additionally using accept-charset. It seems to work so far.
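
As a rough illustration of why the request encoding matters, the sketch below decodes the same percent-encoded form value twice, once as UTF-8 and once as ISO-8859-1. It uses the JDK's `java.net.URLDecoder` as a stand-in for the container's parameter parsing, which is an assumption made for the demo: real containers have their own parsing code. The class name and the sample value (the UTF-8 percent-encoding of the Russian "дор" from the first post) are chosen for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {
    /** Decodes a percent-encoded form value using the named charset. */
    public static String decode(String encodedValue, String charsetName) {
        try {
            return URLDecoder.decode(encodedValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for the three Cyrillic letters when the page is UTF-8.
        String encoded = "%D0%B4%D0%BE%D1%80";

        String asUtf8 = decode(encoded, "UTF-8");
        String asLatin1 = decode(encoded, "ISO-8859-1");

        System.out.println(asUtf8.equals("\u0434\u043e\u0440")); // true: three Cyrillic letters
        System.out.println(asLatin1.length());                   // 6: one character per byte

        // The Latin-1 result still holds the original bytes, so it can be re-decoded.
        String recovered = new String(
                asLatin1.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        System.out.println(recovered.equals(asUtf8));            // true
    }
}
```

Calling `request.setCharacterEncoding("UTF-8")` before any parameter is read corresponds to choosing the first of the two decodings.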

            Hope this helps
            Carl-Eric


            • #7
              Originally posted by davison
              It strikes me that it's possibly a fault of the servlet container - which one are you using?
              Resin 3.0.7 - I wonder if the others are smarter about this?

              Carl-Eric, thanks for your suggestions. The filter works and seems cleaner than messing with isFormSubmission. For anybody else running into the same problem, here's a sample filter, with set-up in web.xml and code.
              Code:
              <filter>
              	<filter-name>charsetFilter</filter-name>
              	<filter-class>com.blah.blah.blah.CharsetFilter</filter-class>
              	<init-param>
              	  <param-name>requestEncoding</param-name>
              	  <param-value>UTF-8</param-value>
              	</init-param>
              </filter>
              
              <filter-mapping>
              	<filter-name>charsetFilter</filter-name>
              	<url-pattern>/*</url-pattern>
              </filter-mapping>
              Code:
              import java.io.IOException;

              import javax.servlet.Filter;
              import javax.servlet.FilterChain;
              import javax.servlet.FilterConfig;
              import javax.servlet.ServletException;
              import javax.servlet.ServletRequest;
              import javax.servlet.ServletResponse;

              public class CharsetFilter implements Filter {
              	private FilterConfig config;
              	// Used when no "requestEncoding" init-param is configured.
              	private String encoding = "UTF-8";
              	
              	/**
              	 * @see javax.servlet.Filter#destroy()
              	 */
              	public void destroy() {
              	}
              	
              	/**
              	 * Sets the character encoding on the request before any parameter is read.
              	 * @see javax.servlet.Filter#doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain)
              	 */
              	public void doFilter(ServletRequest request, ServletResponse response,
              			FilterChain chain) throws IOException, ServletException {
              		request.setCharacterEncoding(encoding);
              		chain.doFilter(request, response);
              	}
              	
              	/**
              	 * @see javax.servlet.Filter#init(javax.servlet.FilterConfig)
              	 */
              	public void init(FilterConfig config) throws ServletException {
              		this.config = config;
              		// Keep the default if the init-param is missing, rather than overwriting it with null.
              		String configured = config.getInitParameter("requestEncoding");
              		if (configured != null) {
              			this.encoding = configured;
              		}
              	}
              }
              Regarding accept-charset: I presume this is supposed to tell the browser that it should automatically reject any input not matching a particular charset (?). I don't find the definition in the W3C recommendations very clear. Anyway, it's completely ignored by Firefox 0.8: I tested by entering Russian characters on an ISO-8859-1 page, and they were all sent correctly. So for now, using "accept-charset" seems like needless typing...


              • #8
                I solved the UTF-8 problem in Tomcat 5 by adding the following lines to web.xml:
                Code:
                    <locale-encoding-mapping-list>
                        <locale-encoding-mapping>
                            <locale>en</locale>
                            <encoding>UTF-8</encoding>
                        </locale-encoding-mapping>
                        <locale-encoding-mapping>
                            <locale>no</locale>
                            <encoding>UTF-8</encoding>
                        </locale-encoding-mapping>
                        <locale-encoding-mapping>
                            <locale>ru</locale>
                            <encoding>UTF-8</encoding>
                        </locale-encoding-mapping>
                        <locale-encoding-mapping>
                            <locale>pl</locale>
                            <encoding>UTF-8</encoding>
                        </locale-encoding-mapping>
                    </locale-encoding-mapping-list>


                • #9
                  Since this thread almost satisfied my needs:

                  You don't have to write your own filter to set the character encoding. The Spring Framework (I'm using version 1.2.6) comes with 'org.springframework.web.filter.CharacterEncodingFilter' for this purpose.

                  Use it like this in your web.xml:

                  Code:
                  <filter>
                    <filter-name>charsetFilter</filter-name>
                    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
                    <init-param>
                      <param-name>encoding</param-name>
                      <param-value>UTF-8</param-value>
                    </init-param>
                  </filter>
                  
                  <filter-mapping>
                    <filter-name>charsetFilter</filter-name>
                    <url-pattern>/*</url-pattern>
                  </filter-mapping>
