As in existing statistical frameworks, OpenDial represents the dialogue state as a Bayesian network that is regularly updated with new observations and employed to calculate the utility of possible system actions. The key difference between OpenDial and traditional MDP/POMDP approaches is that the domain models (i.e. the transition, reward and observation models) are expressed via probabilistic rules instead of the usual representations for probability distributions and utility functions.
The rules are essentially high-level templates for probabilistic models. They provide an abstraction layer that allows the system designer to capture the domain in a concise and human-readable form, with a limited number of parameters. In other words, OpenDial can be seen as following a "structured POMDP" approach to dialogue. See Lison (2014) for a more detailed description of the formalism.
It depends on the planning horizon employed for your domain. In the most common case, the planning horizon is set to 1 (i.e. limited to the present time step). In this setting, the action selected by OpenDial simply corresponds to the one that maximises the total (expected) utility in the current dialogue state. It is, however, also possible to use planning horizons larger than 1, in which case online planning is performed to find the action that maximises the accumulated utility from the present time step up to the horizon limit (given a particular discount factor). Online planning is, however, a computationally expensive operation, due to the combinatorial explosion in the number of possible interaction paths to consider. It also requires the specification of a full transition model.
For most dialogue domains, we therefore recommend keeping the planning horizon at its default value and ensuring that the (handcrafted or learned) utilities reflect the long-term value of each action, not just its immediate reward.
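To make the horizon-1 case concrete, here is a minimal Java sketch of the selection step: each candidate action is scored by its expected utility under the current belief, and the highest-scoring action is chosen. It does not use OpenDial's API; the class name, the intention values MoveLeft/MoveRight, the actions and the utility numbers are all hypothetical. OpenDial itself derives these utilities from utility rules inside the Bayesian network.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of action selection with a planning horizon of 1. */
class HorizonOneSelection {

    /** Expected utility of an action: sum over states of P(state) * U(action, state). */
    static double expectedUtility(String action, Map<String, Double> belief,
                                  Map<String, Map<String, Double>> utility) {
        Map<String, Double> row = utility.getOrDefault(action, Collections.emptyMap());
        double total = 0.0;
        for (Map.Entry<String, Double> e : belief.entrySet()) {
            // Unspecified (action, state) pairs are treated as utility 0.
            total += e.getValue() * row.getOrDefault(e.getKey(), 0.0);
        }
        return total;
    }

    /** Returns the action with the highest expected utility in the current belief state. */
    static String selectAction(Map<String, Double> belief,
                               Map<String, Map<String, Double>> utility) {
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (String action : utility.keySet()) {
            double value = expectedUtility(action, belief, utility);
            if (value > bestValue) {
                bestValue = value;
                best = action;
            }
        }
        return best;
    }

    /** Small helper building a two-entry table. */
    static Map<String, Double> table(String k1, double v1, String k2, double v2) {
        Map<String, Double> m = new LinkedHashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    /** Hypothetical belief over the user intention. */
    static Map<String, Double> exampleBelief() {
        return table("MoveLeft", 0.6, "MoveRight", 0.4);
    }

    /** Hypothetical utilities U(action, intention), including a clarification request. */
    static Map<String, Map<String, Double>> exampleUtility() {
        Map<String, Map<String, Double>> u = new LinkedHashMap<>();
        u.put("Move(Left)", table("MoveLeft", 2.0, "MoveRight", -4.0));
        u.put("Move(Right)", table("MoveLeft", -4.0, "MoveRight", 2.0));
        u.put("AskRepeat", table("MoveLeft", 0.5, "MoveRight", 0.5));
        return u;
    }

    public static void main(String[] args) {
        System.out.println(selectAction(exampleBelief(), exampleUtility()));
    }
}
```

With this belief, neither movement is safe enough, so the clarification request wins: its expected utility is 0.5, against -0.4 for Move(Left) and -1.6 for Move(Right).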
Why is there a distinction between "predictive" variables X^p and normal state variables X in OpenDial?
We sometimes want to use probability rules to provide a prior prediction on a future, currently unobserved variable (for instance, to predict the next user dialogue act following a particular system response). The suffix ^p allows OpenDial to distinguish such predictions from actually observed values. Once the actual value is observed, the prediction acts as a prior for the observation. Technically, this is realised via an "equivalence node" inserted between the prediction and the observation; see Lison (2014), pp. 78-79 for details.
This explicit distinction between predictions and observations is necessary in OpenDial since both may be uncertain (and hence expressed as distinct probability distributions). Spoken dialogue systems must indeed frequently integrate observations that represent "soft" evidence, such as the ASR/NLU hypotheses for the user dialogue act.
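The effect of matching a prediction with an uncertain observation can be approximated as follows, assuming (as a simplification) that the equivalence node enforces strict equality between X^p and X and that both distributions are independent: the resulting distribution is then proportional to the product of the predicted and observed probabilities. This is a hypothetical Java sketch, not OpenDial's inference code; the class name and dialogue act values are illustrative, and Lison (2014) describes how the equivalence node is actually defined.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: combine a predicted distribution with an uncertain observation. */
class PredictionMerge {

    /**
     * Conditions on "prediction == observation", assuming independent distributions
     * and a strict match: P(x) is proportional to pred(x) * obs(x).
     */
    static Map<String, Double> merge(Map<String, Double> prediction,
                                     Map<String, Double> observation) {
        Map<String, Double> result = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : observation.entrySet()) {
            double p = e.getValue() * prediction.getOrDefault(e.getKey(), 0.0);
            if (p > 0.0) {
                result.put(e.getKey(), p);
                total += p;
            }
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("prediction and observation do not overlap");
        }
        // Normalise so that the resulting probabilities sum to 1.
        for (Map.Entry<String, Double> e : result.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> predicted = new LinkedHashMap<>();  // a_u^p
        predicted.put("Confirm", 0.7);
        predicted.put("Disconfirm", 0.3);
        Map<String, Double> observed = new LinkedHashMap<>();   // a_u from ASR/NLU
        observed.put("Confirm", 0.4);
        observed.put("Disconfirm", 0.6);
        System.out.println(merge(predicted, observed));
    }
}
```

Here the prediction pulls the ambiguous observation (0.4 vs 0.6) towards Confirm, which ends up at about 0.61 (0.28 / 0.46).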
Probability rules represent conditional probability models P(Y|X), where Y and X are arbitrary subsets of the state variables. In other words, probability rules express conditional, probabilistic relations between state variables. Utility rules, on the other hand, express the utility U(A,X) of particular system actions A depending on the state variables X. The utility rules encode the relative "desirability" (from the system's point of view) of executing particular actions depending on the current state.
In practice, probability rules are typically used to define the models used in language understanding (to capture e.g. the relation between the user utterances and their corresponding dialogue acts), dialogue state update (to capture the relation between the dialogue acts and other state variables such as the underlying user intentions) and in the prediction of future observations. The utility rules, for their part, are most often used in action selection (to find the next system action to execute) and generation (to find the best realisation of a particular communicative action).
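As an illustration of how a probability rule acts as a template, the sketch below hard-codes one hypothetical rule, "if a_u = Request(X) then i_u = X with probability theta", in plain Java. A single parameter theta covers every possible value of X, and the leftover probability mass goes to None. The rule, the variable values and the class name are assumptions for illustration, and the fallback to None when the condition does not match is a simplification; in OpenDial such rules are written in the XML domain file, not in Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical rule: if a_u = Request(X) then i_u = X with probability theta.
 * One parameter (theta) covers every possible value of X.
 */
class RequestRule {

    private static final Pattern REQUEST = Pattern.compile("Request\\((.+)\\)");

    /** Returns the distribution P(i_u | a_u = userAct) produced by the rule. */
    static Map<String, Double> apply(String userAct, double theta) {
        Map<String, Double> dist = new LinkedHashMap<>();
        Matcher m = REQUEST.matcher(userAct);
        if (m.matches()) {
            dist.put(m.group(1), theta);    // X is bound to the matched argument
            dist.put("None", 1.0 - theta);  // remaining probability mass
        } else {
            dist.put("None", 1.0);          // rule condition not satisfied
        }
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(apply("Request(Ball)", 0.9));
        System.out.println(apply("Request(Cylinder)", 0.9));
        System.out.println(apply("Greet", 0.9));
    }
}
```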
This can be done very easily: just add the starting system action in the initial dialogue state for the domain.
If your domain includes continuous distributions (such as unknown parameter values), OpenDial relies on sampling algorithms to perform probabilistic inference. Sampling algorithms are approximate, so the resulting probability values will always be somewhat imprecise, especially for multivariate continuous distributions, which are harder to sample. You can easily modify the number of samples in the Options -> Settings window of the GUI, or in the domain settings.
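The sketch below illustrates why sampled probabilities are imprecise and how precision improves with the number of samples. It is a generic Monte Carlo estimate of P(X > 1) for a standard normal X (true value about 0.1587), not OpenDial's own sampling algorithm; the class name and the seed are arbitrary. The standard error shrinks with the square root of the number of samples, so halving the error requires roughly four times as many samples.

```java
import java.util.Random;

/** Generic Monte Carlo illustration (not OpenDial's inference engine). */
class SamplingPrecision {

    /** Estimates P(X > threshold) for X ~ N(0, 1) from numSamples draws. */
    static double estimateTail(double threshold, int numSamples, long seed) {
        Random rng = new Random(seed);
        int hits = 0;
        for (int i = 0; i < numSamples; i++) {
            if (rng.nextGaussian() > threshold) {
                hits++;
            }
        }
        return (double) hits / numSamples;
    }

    /** Standard error of a probability p estimated from n independent samples. */
    static double standardError(double p, int n) {
        return Math.sqrt(p * (1.0 - p) / n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1000, 10000, 100000}) {
            double est = estimateTail(1.0, n, 42L);
            System.out.printf("n=%6d  estimate=%.4f  (std. error ~ %.4f)%n",
                    n, est, standardError(est, n));
        }
    }
}
```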
The None value is employed as a filler value to ensure that all probability distributions sum to 1.0. For instance, if a user utterance is expressed as an N-best list with 2 elements, "move left" with probability 0.6 and "mow the left" with probability 0.2, the distribution over possible user utterances will have a None value with the remaining probability 1.0 - 0.6 - 0.2 = 0.2, corresponding to an empty utterance.
For system actions, the None value represents the void action (i.e. do nothing).
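The None filler from the N-best example can be computed in a few lines. This hypothetical helper (the class and method names are not part of OpenDial's API) assigns the remaining probability mass of an N-best list to None:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: complete an N-best list with a None value. */
class NoneFiller {

    static Map<String, Double> withNone(Map<String, Double> nbest) {
        double total = 0.0;
        for (double p : nbest.values()) {
            total += p;
        }
        if (total > 1.0 + 1e-9) {
            throw new IllegalArgumentException("N-best probabilities exceed 1.0: " + total);
        }
        Map<String, Double> dist = new LinkedHashMap<>(nbest);
        double remainder = 1.0 - total;
        if (remainder > 1e-9) {
            // The missing mass corresponds to an empty utterance (or the void action).
            dist.merge("None", remainder, Double::sum);
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, Double> nbest = new LinkedHashMap<>();
        nbest.put("move left", 0.6);
        nbest.put("mow the left", 0.2);
        System.out.println(withNone(nbest));
    }
}
```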
By default, the dialogue state will only contain the most recent value of the user- or system-related variables such as u_u (user utterance), a_u (user dialogue act), a_m (system action) or u_m (system utterance). However, one can easily record longer dialogue histories by creating new variables that capture previous dialogue acts. For instance, we can create a new variable a_u-prev that contains the next-to-last dialogue act from the user, and specify the following rule in the domain model updating the user dialogue act:
The same operation can of course be done for other state variables.
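The ordering of this update matters: the old value of a_u must be copied to a_u-prev before a_u is overwritten. The following Java sketch shows this on a hypothetical dialogue state stored as a map from variable names to distributions; it mimics the effect of such a rule and is not how OpenDial represents its state internally.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical dialogue state: variable name -> distribution over values. */
class HistoryTracking {

    /** Copies the current a_u into a_u-prev, then stores the new user act. */
    static void updateUserAct(Map<String, Map<String, Double>> state,
                              Map<String, Double> newAct) {
        Map<String, Double> previous = state.get("a_u");
        if (previous != null) {
            state.put("a_u-prev", previous);  // must happen before a_u is overwritten
        }
        state.put("a_u", newAct);
    }

    /** Builds a distribution with a single certain value. */
    static Map<String, Double> single(String value) {
        Map<String, Double> dist = new LinkedHashMap<>();
        dist.put(value, 1.0);
        return dist;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> state = new HashMap<>();
        updateUserAct(state, single("Greet"));
        updateUserAct(state, single("Request(Ball)"));
        System.out.println("a_u = " + state.get("a_u") + ", a_u-prev = " + state.get("a_u-prev"));
    }
}
```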
Alternatively, if you want to keep track of the complete dialogue history (without limit on the number of elements), you can also define the history as a list, and insert a new element after each update:
For this last approach, you might need to reduce the number of hypotheses in the N-best lists in order to avoid a combinatorial explosion in the number of values in this history variable.
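The combinatorial explosion is easy to quantify. In this hypothetical Java sketch, the history is a distribution over lists of utterances, and each update extends every history with every N-best hypothesis: with k hypotheses per turn, the number of values grows as k^n after n turns. The class name and hypotheses are illustrative assumptions, not OpenDial code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a dialogue history kept as a distribution over lists. */
class HistoryExplosion {

    /** Extends every history with every N-best hypothesis (probabilities multiply). */
    static Map<List<String>, Double> append(Map<List<String>, Double> history,
                                            Map<String, Double> nbest) {
        Map<List<String>, Double> result = new HashMap<>();
        for (Map.Entry<List<String>, Double> h : history.entrySet()) {
            for (Map.Entry<String, Double> u : nbest.entrySet()) {
                List<String> extended = new ArrayList<>(h.getKey());
                extended.add(u.getKey());
                result.merge(extended, h.getValue() * u.getValue(), Double::sum);
            }
        }
        return result;
    }

    /** Applies the same N-best update for a number of turns, starting from an empty history. */
    static Map<List<String>, Double> run(Map<String, Double> nbest, int turns) {
        Map<List<String>, Double> history = new HashMap<>();
        history.put(new ArrayList<>(), 1.0);
        for (int t = 0; t < turns; t++) {
            history = append(history, nbest);
        }
        return history;
    }

    public static void main(String[] args) {
        Map<String, Double> threeBest = new LinkedHashMap<>();
        threeBest.put("yes", 0.5);
        threeBest.put("no", 0.3);
        threeBest.put("None", 0.2);
        Map<String, Double> oneBest = new LinkedHashMap<>();
        oneBest.put("yes", 1.0);
        System.out.println(run(threeBest, 4).size());  // 3^4 distinct histories
        System.out.println(run(oneBest, 4).size());    // a single history
    }
}
```

Reducing the N-best lists from 3 to 1 hypothesis in this toy setting shrinks the history variable from 81 values to 1.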
It is really easy to integrate OpenDial as part of another Java application. You simply need to add OpenDial to your classpath (if you don't need to directly access OpenDial's code, you can simply add the OpenDial JAR file and its dependencies to the classpath), instantiate a new DialogueSystem object, provide it with a dialogue domain (and possibly some additional modules), and start the system:
Once started, you can simply update the dialogue state using the methods addUserInput(...) and addContent(...) in DialogueSystem. You can also query the current dialogue state at any time using the getContent(...) methods. Check the Javadoc API for details on how to control and monitor the dialogue system.
As long as your programming language allows you to import Java classes (this is the case for e.g. Jython or Scala), you can even integrate OpenDial in applications written in languages other than Java.
Here is for instance how to start OpenDial using Jython:
OpenDial can connect two remote machines (on the same network) with one another. This is especially useful for Wizard-of-Oz experiments. To connect two OpenDial systems, use the following procedure:
OpenDial allows you to insert user inputs in an incremental manner. This can be useful if you have a speech recogniser that is able to output partial recognition hypotheses while the user is speaking. This way, you can get the dialogue system to react quickly and start processing the user inputs as soon as some partial hypotheses are available.
In practice, incremental user inputs are inserted via the method addIncrementalUserInput(...) in the class DialogueSystem. The method takes two arguments: a partial N-best list of user inputs, and a boolean flag followPrevious indicating whether the new content is a continuation of the previous hypotheses or constitutes a new utterance. Here is an example of how you can use the method:
The example starts by inserting a partial N-best list ["this is" (0.7), "these" (0.3)], and then expands these initial hypotheses with the N-best list ["a screw" (0.6), "" (0.4)]. The probabilities of the partial hypotheses are multiplied (e.g. 0.7 * 0.6 = 0.42), so the full N-best list for the utterance is ["this is a screw" (0.42), "this is" (0.28), "these a screw" (0.18), "these" (0.12)].
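The combination of partial N-best lists can be reproduced with a small buffer class. The class IncrementalNBest and its add method are hypothetical stand-ins that mimic the followPrevious semantics described above; they are not OpenDial's implementation, which operates on the dialogue state itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical buffer mimicking incremental N-best insertion. */
class IncrementalNBest {

    private Map<String, Double> current = new LinkedHashMap<>();

    /**
     * If followPrevious is false (or nothing is buffered), the partial list starts a
     * new utterance; otherwise every buffered hypothesis is extended with every new one.
     */
    void add(Map<String, Double> partial, boolean followPrevious) {
        if (!followPrevious || current.isEmpty()) {
            current = new LinkedHashMap<>(partial);
            return;
        }
        Map<String, Double> extended = new LinkedHashMap<>();
        for (Map.Entry<String, Double> prefix : current.entrySet()) {
            for (Map.Entry<String, Double> suffix : partial.entrySet()) {
                // trim() removes the trailing space when the continuation is empty.
                String joined = (prefix.getKey() + " " + suffix.getKey()).trim();
                extended.merge(joined, prefix.getValue() * suffix.getValue(), Double::sum);
            }
        }
        current = extended;
    }

    Map<String, Double> get() {
        return current;
    }

    public static void main(String[] args) {
        Map<String, Double> first = new LinkedHashMap<>();
        first.put("this is", 0.7);
        first.put("these", 0.3);
        Map<String, Double> second = new LinkedHashMap<>();
        second.put("a screw", 0.6);
        second.put("", 0.4);

        IncrementalNBest buffer = new IncrementalNBest();
        buffer.add(first, false);   // new utterance
        buffer.add(second, true);   // continuation of the previous hypotheses
        System.out.println(buffer.get());
    }
}
```

Calling add(..., false) again discards the accumulated hypotheses and starts a new utterance.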
When operating in this incremental mode, the probabilistic rules and the external modules are triggered as usual, but the state variables can be modified at any time to reflect the insertion of new incremental content. If you want to perform incremental updates of variables other than the user inputs, you can do so with the method addIncrementalContent(...).
All versions of OpenDial > 0.95 require Java 8 in order to compile. Simply download and install the Java 8 JDK to resolve the issue.
You should first check that the correct input mixer is selected in the Options, that the volume bar moves when you talk into the microphone, and that you are connected to the internet. You should also make sure that the speech input lasts longer than 2 seconds, as the Nuance API seems to have problems with shorter speech data.
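To check the duration of a recording before sending it, the number of bytes can be converted into seconds using the audio format. The sketch below relies on the standard javax.sound.sampled.AudioFormat class and assumes 16 kHz, 16-bit, mono PCM audio; the actual format depends on your input settings, and the class name is hypothetical. The 2-second threshold is the one mentioned above for the Nuance API.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper to check the duration of raw PCM speech data. */
class SpeechDurationCheck {

    /** Threshold mentioned for the Nuance API. */
    static final double MIN_SECONDS = 2.0;

    /** Duration in seconds = bytes / (bytes per frame) / (frames per second). */
    static double durationSeconds(long numBytes, AudioFormat format) {
        return (double) numBytes / format.getFrameSize() / format.getFrameRate();
    }

    static boolean isLongEnough(long numBytes, AudioFormat format) {
        return durationSeconds(numBytes, format) > MIN_SECONDS;
    }

    public static void main(String[] args) {
        // Assumed format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        long recorded = 48000;  // hypothetical size of a recording, in bytes
        System.out.printf("%.2f s, long enough: %b%n",
                durationSeconds(recorded, format), isLongEnough(recorded, format));
    }
}
```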
If the problem persists, look at the logs to see what may have gone wrong. If you get an unusual response status from the Nuance server, check the online documentation on the Nuance Mobile Developer website to get the meaning of that response status. Finally, you can directly listen to the last recorded speech input via the OpenDial GUI (go to the state monitor tab, right-click on the "s_u" node, and select "play sound"). If the sound is absent or distorted, this may indicate a problem with the sound capture on your machine.