Warning: This document is for the development version of Rasa Core. The latest version is 0.8.6.

Custom Policies

The Policy is the core of your bot, and it really just has one important method:

def predict_action_probabilities(self, tracker, domain):
    # type: (DialogueStateTracker, Domain) -> List[float]

    return []

This uses the current state of the conversation (provided by the tracker) to choose the next action to take. The domain is there if you need it, but only some policy types make use of it. The returned array contains the probabilities for each action to be executed next. The action that is most likely will be executed.
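As a small illustration of the last point, choosing the action to execute amounts to taking the index of the largest probability. The sketch below uses plain Python lists; the helper name most_likely_action and the example numbers are made up for illustration and are not part of Rasa Core.

```python
def most_likely_action(probabilities):
    """Return the index of the highest probability (first one wins on ties)."""
    if not probabilities:
        raise ValueError("the policy returned no probabilities")
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# one entry per action in the domain; the values here are invented
example = [0.1, 0.0, 0.7, 0.2]
chosen = most_likely_action(example)
```

Here chosen is the index of the action that would be executed next.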

Let’s look at a simple example for a custom policy:

from rasa_core.policies import Policy
from rasa_core.actions.action import ACTION_LISTEN_NAME
from rasa_core import utils
import numpy as np

class SimplePolicy(Policy):
    def predict_action_probabilities(self, tracker, domain):
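        # map an intent name to the index of the action that answers it
        responses = {"greet": 3}

        if tracker.latest_action_name == ACTION_LISTEN_NAME:
            key = tracker.latest_message.intent["name"]
            # unknown intents fall back to the action with index 2
            action = responses[key] if key in responses else 2
            return utils.one_hot(action, domain.num_actions)
        else:
            # the bot has just acted: return all zeros
            return np.zeros(domain.num_actions)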

How does this work? When the controller processes a message from a user, it keeps asking the policy for the next most likely action using predict_action_probabilities and executes each predicted action in turn, until the predicted action is ActionListen. This breaks the loop and makes the bot await further instructions.
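The loop described above can be sketched in a few lines. This is a toy model under stated assumptions, not the Rasa Core controller: ToyTracker, toy_policy, handle_message and the choice of index 0 for the listen action are all hypothetical names and values introduced for this example.

```python
ACTION_LISTEN = 0  # assumed index of the listen action in this toy domain


class ToyTracker(object):
    """Hypothetical stand-in for the tracker: it only records executed actions."""

    def __init__(self):
        self.executed = []


def toy_policy(tracker):
    """Predict action 3 first, then hand control back by predicting listen."""
    probabilities = [0.0, 0.0, 0.0, 0.0]
    next_action = ACTION_LISTEN if tracker.executed else 3
    probabilities[next_action] = 1.0
    return probabilities


def handle_message(tracker, predict, max_steps=10):
    """Keep executing the most likely action until listen is predicted."""
    for _ in range(max_steps):
        probabilities = predict(tracker)
        action = probabilities.index(max(probabilities))
        tracker.executed.append(action)
        if action == ACTION_LISTEN:
            break
    return tracker.executed


run = handle_message(ToyTracker(), toy_policy)
```

The max_steps guard is an addition of this sketch; it only keeps a policy that never predicts listen from looping forever.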

In pseudocode, what the SimplePolicy above does is:

-> a new message has come in

if we were previously listening:
    return a canned response
else:
    we must have just said something, so let's Listen again

Note that the policy itself is stateless, and all the state is carried by the tracker object.

Creating Policies from Stories

Writing rules like those in the SimplePolicy above is not a great way to build a bot: it gets messy fast and is hard to debug. If you’ve found Rasa Core, it’s likely you’ve already tried this approach and were looking for something better. A good next step is to use our story framework to build a policy by giving it some example conversations. We won’t use machine learning yet; we will just create a policy which memorises these stories.

We can use the MemoizationPolicy to do this.

Here is the agent's train method, which trains the policies:

    def train(self,
              training_trackers,  # type: List[DialogueStateTracker]
              **kwargs  # type: **Any
              ):
        # type: (...) -> None
        """Train the policies / policy ensemble using dialogue data from file.

            :param training_trackers: trackers to train on
            :param kwargs: additional arguments passed to the underlying ML
                           trainer (e.g. keras parameters)
        """

        # deprecation tests
        if kwargs.get('featurizer') or kwargs.get('max_history'):
            raise Exception("Passing `featurizer` and `max_history` "
                            "to `agent.train(...)` is not supported anymore. "
                            "Pass appropriate featurizer "
                            "directly to the policy instead. More info "
                            "...")

        # TODO: DEPRECATED - remove in version 0.10
        if isinstance(training_trackers, string_types):
            # the user most likely passed in a file name to load training
            # data from
            logger.warning("Passing a file name to `agent.train(...)` is "
                           "deprecated. Rather load the data with "
                           "`data = agent.load_data(file_name)` and pass it "
                           "to `agent.train(data)`.")
            training_trackers = self.load_data(training_trackers)

        logger.debug("Agent trainer got kwargs: {}".format(kwargs))

        self.policy_ensemble.train(training_trackers, self.domain,
                                   **kwargs)

What the train() method does is the following:

  1. reads the stories from a file
  2. creates all possible dialogues from these stories
  3. creates the following variables:
    1. y - a 1D array representing all of the actions taken in the dialogues
    2. X - a 2D array where each row represents the state of the tracker when an action was taken
  4. calls the policy’s train() method to create a policy from these X, y state-action pairs (don’t mind the ensemble: it is just a collection of policies, so you can combine multiple policies and train them all at once)


In fact, the rows in X describe the state of the tracker when the previous max_history actions were taken. See Featurization for more details.
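To make the X, y construction more concrete, here is a minimal sketch of turning one dialogue into state-action pairs with a max_history window. It is not the Rasa Core featurizer: the function state_action_pairs, the use of strings as states, and the None padding are assumptions for illustration, whereas a real featurizer encodes each state as a numeric vector.

```python
def state_action_pairs(dialogue, max_history=2, padding=None):
    """Turn one dialogue into (X, y) rows.

    `dialogue` is a list of (state, action) tuples in conversation order.
    Each row of X holds the last `max_history` states up to and including
    the one in which the action was taken, left-padded for early turns.
    """
    X, y = [], []
    states = []
    for state, action in dialogue:
        states.append(state)
        window = states[-max_history:]
        window = [padding] * (max_history - len(window)) + window
        X.append(window)
        y.append(action)
    return X, y


# invented example dialogue: (state seen, action taken in that state)
dialogue = [("greet", "utter_greet"),
            ("utter_greet", "action_listen"),
            ("ask_weather", "utter_weather")]
X, y = state_action_pairs(dialogue)
```

Each entry of y lines up with the row of X that describes the conversation at the moment that action was chosen.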

For the MemoizationPolicy, the train() method just memorises the actions taken in the story, so that when your bot encounters an identical situation it will make the decision you intended.
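The idea of memorising can be sketched as a lookup table from an exact history of states to the action that followed it. This is an illustrative toy, not the MemoizationPolicy implementation: the class ToyMemoPolicy, its method signatures, and the example states and action indices are all hypothetical.

```python
class ToyMemoPolicy(object):
    """Hypothetical lookup-table policy; not the Rasa MemoizationPolicy itself."""

    def __init__(self):
        self.lookup = {}

    def train(self, X, y):
        # remember which action followed each exact history of states
        for history, action in zip(X, y):
            self.lookup[tuple(history)] = action

    def predict(self, history, num_actions):
        # one-hot on the memorised action, all zeros for an unseen history
        probabilities = [0.0] * num_actions
        action = self.lookup.get(tuple(history))
        if action is not None:
            probabilities[action] = 1.0
        return probabilities


policy = ToyMemoPolicy()
policy.train(X=[["listen", "greet"], ["greet", "utter_greet"]], y=[3, 0])
seen = policy.predict(["listen", "greet"], num_actions=4)
unseen = policy.predict(["listen", "goodbye"], num_actions=4)
```

A history that appeared in training gets a confident prediction, while an unseen one gets no preference at all, which is exactly the limitation the next section addresses.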

Generalising to new Dialogues

The stories data format gives you a compact way to describe a large number of possible dialogues without much effort. But humans are infinitely creative, and you could never hope to describe every possible dialogue programmatically. Even if you could, it probably wouldn’t fit in memory :)

So how do we create a policy which behaves well even in scenarios you haven’t thought of? We will try to achieve this generalisation by creating a policy based on Machine Learning.

You can use whichever machine learning library you like to train your policy. One implementation that ships with Rasa is the KerasPolicy, which uses Keras as a machine learning library to train your dialogue model. This class already implements the logic of persisting and reloading models.

By default, the KerasPolicy trains an LSTM-based model to fit the X, y data.

The model is defined here:

    def model_architecture(
            self,
            input_shape,  # type: Tuple[int, int]
            output_shape  # type: Tuple[int, Optional[int]]
    ):
        # type: (...) -> keras.models.Sequential
        """Build a keras model and return a compiled model."""

        from keras.models import Sequential
        from keras.layers import \
            Masking, LSTM, Dense, TimeDistributed, Activation

        # Build Model
        model = Sequential()

        # the shape of the y vector of the labels,
        # determines which output from rnn will be used
        # to calculate the loss
        if len(output_shape) == 1:
            # y is (num examples, num features) so
            # only the last output from the rnn is used to
            # calculate the loss
            model.add(Masking(mask_value=-1, input_shape=input_shape))
            model.add(LSTM(self.rnn_size, dropout=0.2))
            model.add(Dense(input_dim=self.rnn_size, units=output_shape[-1]))
        elif len(output_shape) == 2:
            # y is (num examples, max_dialogue_len, num features) so
            # all the outputs from the rnn are used to
            # calculate the loss, therefore a sequence is returned and
            # time distributed layer is used

            # the first value in input_shape is max dialogue_len,
            # it is set to None, to allow dynamic_rnn creation
            # during prediction
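            model.add(Masking(mask_value=-1,
                              input_shape=(None, input_shape[1])))
            model.add(LSTM(self.rnn_size, return_sequences=True, dropout=0.2))
            model.add(TimeDistributed(Dense(units=output_shape[-1])))
        else:
            raise ValueError("Cannot construct the model because "
                             "length of output_shape = {} "
                             "should be 1 or 2."
                             "".format(len(output_shape)))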




        return model

and the training is run here:

    def train(self,
              training_trackers,  # type: List[DialogueStateTracker]
              domain,  # type: Domain
              **kwargs  # type: **Any
              ):
        # type: (...) -> Dict[Text: Any]

        if kwargs.get('rnn_size') is not None:
            logger.debug("Parameter `rnn_size` is updated with {}"
                         "".format(kwargs.get('rnn_size')))
            self.rnn_size = kwargs.get('rnn_size')

        training_data = self.featurize_for_training(training_trackers,
                                                    domain,
                                                    **kwargs)

        shuffled_X, shuffled_y = training_data.shuffled_X_y()

        if self.model is None:
            self.model = self.model_architecture(shuffled_X.shape[1:],
                                                 shuffled_y.shape[1:])

        validation_split = kwargs.get("validation_split", 0.0)
        logger.info("Fitting model with {} total samples and a validation "
                    "split of {}".format(training_data.num_examples(),
                                         validation_split))
        # filter out kwargs that cannot be passed to fit
        params = self._get_valid_params(self.model.fit, **kwargs)

        self.model.fit(shuffled_X, shuffled_y, **params)
        # the default parameter for epochs in keras fit is 1
        self.current_epoch = kwargs.get("epochs", 1)
        logger.info("Done fitting keras policy model")

You can implement the model of your choice by overriding these methods.