Enums for a simple state machine

Just a few minutes for a simple exercise: showing how a Java enum can be used to implement a simple state machine. It is also a chance to take my latest HTML editor for a quick run on my blog...

Consider the problem of parsing an XHTML document to split it into three parts:

  1. a prolog, that is the portion from the beginning up to and including the line with the <body> element;
  2. a body, that is the portion between the <body> and </body> lines, both excluded;
  3. an epilog, that is the portion from the line with the </body> element, included, to the end.

For the sake of simplicity, I assume that the XHTML document is already well formatted, so that the <body> and </body> tags each sit on a line of their own and the file can be analysed with simple string manipulation instead of an XML API.

The above requirements can be described by means of this unit test:

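// Assumption: JUnit 4 and Hamcrest; the original post does not show its imports.
import org.junit.Test;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class HtmlDocumentTest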
  {
    private static final String ORIGINAL_PROLOG = "<html>\n"
                                                + "<head><meta name=\"prolog\"/></head>\n"
                                                + "<body>\n";
    private static final String ORIGINAL_BODY   = "body\n";
    private static final String ORIGINAL_EPILOG = "</body>\n"
                                                + "</html>\n";
    @Test
    public void must_properly_create_from_text()
      {
        // given
        final String text = ORIGINAL_PROLOG + ORIGINAL_BODY + ORIGINAL_EPILOG;
        // when
        final HtmlDocument result = HtmlDocument.createFromText(text);
        // then
        assertThat(result.getProlog(), is(ORIGINAL_PROLOG));
        assertThat(result.getBody(),   is(ORIGINAL_BODY));
        assertThat(result.getEpilog(), is(ORIGINAL_EPILOG));
      }
  }
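
The test only fixes the public shape of HtmlDocument: a static factory and three getters. As a minimal sketch of the value-holding part, assuming a plain immutable class (the field names and the package-private constructor are my assumptions here, only the getter names come from the test), it could look like this; the createFromText() factory is left out on purpose:

```java
// Hypothetical sketch: field names and constructor visibility are assumptions,
// only the getter names are taken from the unit test.
final class HtmlDocument
  {
    private final String prolog;
    private final String body;
    private final String epilog;

    HtmlDocument (final String prolog, final String body, final String epilog)
      {
        this.prolog = prolog;
        this.body = body;
        this.epilog = epilog;
      }

    String getProlog()
      {
        return prolog;
      }

    String getBody()
      {
        return body;
      }

    String getEpilog()
      {
        return epilog;
      }
  }
```

Keeping the three parts as immutable strings means that all the parsing logic can live in the factory method, and the object itself is trivially safe to share.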

Seen as a state machine, the problem has three states: PROLOG, BODY and EPILOG. The transition from the first to the second is triggered when a line containing <body> is seen, and the one from the second to the third when a line containing </body> is seen. The data part of the state can be modelled by three StringBuilders, in which the three portions of the text are accumulated.

The states can be implemented by a polymorphic enum with a single method, which receives the input line and the three StringBuilders holding the accumulated data, and returns the next state.

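import javax.annotation.Nonnull; // assumption: the JSR-305 annotation, the import is not shown in the original

enum State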
  {
    PROLOG
      {
        @Override
        State process (final @Nonnull String line,
                       final @Nonnull StringBuilder prologBuilder,
                       final @Nonnull StringBuilder bodyBuilder,
                       final @Nonnull StringBuilder epilogBuilder)
          {
            // The line carrying <body> still belongs to the prolog.
            prologBuilder.append(line).append("\n");
            return line.contains("<body") ? BODY : PROLOG;
          }
      },

    BODY
      {
        @Override
        State process (final @Nonnull String line,
                       final @Nonnull StringBuilder prologBuilder,
                       final @Nonnull StringBuilder bodyBuilder,
                       final @Nonnull StringBuilder epilogBuilder)
          {
            // The line carrying </body> is the first line of the epilog, not the last of the body.
            final boolean containsEndBody = line.contains("</body");
            (containsEndBody ? epilogBuilder : bodyBuilder).append(line).append("\n");
            return containsEndBody ? EPILOG : BODY;
          }
      },

    EPILOG
      {
        @Override
        State process (final @Nonnull String line,
                       final @Nonnull StringBuilder prologBuilder,
                       final @Nonnull StringBuilder bodyBuilder,
                       final @Nonnull StringBuilder epilogBuilder)
          {
            epilogBuilder.append(line).append("\n");
            return EPILOG;
          }
      };

    abstract State process (@Nonnull String line,
                            @Nonnull StringBuilder prologBuilder,
                            @Nonnull StringBuilder bodyBuilder,
                            @Nonnull StringBuilder epilogBuilder);
  }
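
The four-argument signature is a bit noisy. A possible variation, which is not the code used in this post, bundles the three builders into a small context object so that each state only receives the line and the data. The names Parts and LineState below are hypothetical, and the annotations are omitted to keep the sketch free of dependencies:

```java
// Hypothetical variation: Parts and LineState are illustrative names,
// not classes from the original source code.
final class Parts
  {
    final StringBuilder prolog = new StringBuilder();
    final StringBuilder body   = new StringBuilder();
    final StringBuilder epilog = new StringBuilder();
  }

enum LineState
  {
    PROLOG
      {
        @Override
        LineState process (final String line, final Parts parts)
          {
            // The <body> line is still part of the prolog.
            parts.prolog.append(line).append("\n");
            return line.contains("<body") ? BODY : PROLOG;
          }
      },

    BODY
      {
        @Override
        LineState process (final String line, final Parts parts)
          {
            // The </body> line already goes to the epilog.
            final boolean containsEndBody = line.contains("</body");
            (containsEndBody ? parts.epilog : parts.body).append(line).append("\n");
            return containsEndBody ? EPILOG : BODY;
          }
      },

    EPILOG
      {
        @Override
        LineState process (final String line, final Parts parts)
          {
            // Terminal state: everything else is epilog.
            parts.epilog.append(line).append("\n");
            return EPILOG;
          }
      };

    abstract LineState process (String line, Parts parts);
  }
```

Feeding the lines "<html>", "<body>", "text" and "</body>" one after the other, starting from PROLOG, walks through PROLOG, BODY, BODY and EPILOG. The behaviour is the same as with the three separate builders; only the signature is shorter.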

Given these premises, the state machine itself can be implemented with this simple for loop:

@Nonnull
public static HtmlDocument createFromText (final @Nonnull String text)
  {
    final StringBuilder prologBuilder = new StringBuilder();
    final StringBuilder bodyBuilder = new StringBuilder();
    final StringBuilder epilogBuilder = new StringBuilder();

    State state = State.PROLOG;

    for (final String line : text.split("\\n"))
      {
        state = state.process(line, prologBuilder, bodyBuilder, epilogBuilder);
      }

    return new HtmlDocument(prologBuilder.toString(),
                            bodyBuilder.toString(),
                            epilogBuilder.toString());
  }
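
One detail of this loop is worth keeping in mind: String.split() with a single argument drops trailing empty strings, and the states append "\n" to every line. The text is therefore reproduced exactly only when it ends with a single newline, as in the unit test. The small sketch below isolates that behaviour; SplitDemo and roundTrip() are hypothetical names used only for this illustration:

```java
// Hypothetical helper, not part of the original sources: it rebuilds a text
// the same way the loop above does (split on "\n", re-append "\n" to each line).
final class SplitDemo
  {
    static String roundTrip (final String text)
      {
        final StringBuilder builder = new StringBuilder();

        for (final String line : text.split("\\n"))
          {
            builder.append(line).append("\n");
          }

        return builder.toString();
      }

    public static void main (final String[] args)
      {
        // Text ending with one newline: reproduced exactly.
        System.out.println(roundTrip("a\nb\n").equals("a\nb\n"));
        // No final newline: one is added.
        System.out.println(roundTrip("a\nb").equals("a\nb\n"));
        // Trailing blank lines: dropped by split().
        System.out.println(roundTrip("a\nb\n\n\n").equals("a\nb\n"));
      }
  }
```

All three comparisons print true. For the well-formatted documents assumed in this post the difference does not matter, but if trailing blank lines had to be preserved, text.split("\\n", -1) keeps the trailing empty strings, at the price of having to handle the last, empty element separately.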

The whole working source code can be found in the classes HtmlDocument and HtmlDocumentTest at BitBucket.
