In [10]:
# importing and defining relevant directories
import sys
import os
# pyseqlab root directory
pyseqlab_dir = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
# print("pyseqlab cloned dir:", pyseqlab_dir)
# inserting the pyseqlab directory to python's system path 
# if pyseqlab is already installed this could be commented out
sys.path.insert(0, pyseqlab_dir)
# current directory (tutorials)
tutorials_dir = os.path.join(pyseqlab_dir, 'tutorials')
# print("tutorials_dir:", tutorials_dir)
dataset_dir = os.path.join(tutorials_dir, 'datasets', 'conll2000')
# to use for customizing the display/format of the cells
from IPython.core.display import HTML
with open(os.path.join(tutorials_dir, 'pseqlab_base.css')) as f:
    css = "".join(f.readlines())
HTML(css)
Out[10]:

1. Objectives and goals

In this tutorial, we will learn about:

  • parsing, preparing and representing sequences in a training dataset for model building
  • the different CRF models available in the PySeqLab package
  • the process/workflow for building CRF models
  • training CRF models and evaluating model performance
  • reviving learned models and decoding new sequences

Reminder: To work with this tutorial interactively, we first need to clone the PySeqLab package locally and then navigate to [cloned_package_dir]/tutorials, where [cloned_package_dir] is the path to the cloned package folder (see the directory tree below).

├── pyseqlab
    ├── tutorials
    │   ├── datasets
    │   │   ├── conll2000
    │   │   ├── segments

We suggest going through the earlier tutorials:

  1. sequence_and_input_structure tutorial
  2. templates_and_feature_extraction tutorial

before continuing through this notebook. We will use part of the CoNLL 2000 training dataset throughout this tutorial.

As a reminder, the CoNLL 2000 task states:

Given a set of sentences (our sequences) where each sentence is composed of words and their corresponding part-of-speech, the goal is to predict the chunk/shallow parse label of every word in the sentence.

We start by parsing/reading the sentences in the training dataset into sequences.

In [3]:
# print("dataset_dir:", dataset_dir)
from pyseqlab.utilities import DataFileParser
# initialize a data file parser
dparser = DataFileParser()
# provide options to the parser such as the header info, the separator between columns and whether the y label is already present
# main means the header is found in the first line of the file
header = "main"
# y_ref is a boolean indicating if the label to predict is already found in the file
y_ref = True
# separator between the columns/observations
column_sep = " "
seqs = []
for seq in dparser.read_file(os.path.join(dataset_dir, 'train.txt'), header, y_ref=y_ref, column_sep = column_sep):
    seqs.append(seq)
    
# printing one sequence for display
print(seqs[0])
print("type(seq):", type(seqs[0]))
print("number of parsed sequences is: ", len(seqs))
Y sequence:
 ['B-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-SBAR', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'B-ADJP', 'B-PP', 'B-NP', 'B-NP', 'O', 'B-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-NP', 'I-NP', 'I-NP', 'O']
X sequence:
 {1: {'pos': 'NN', 'w': 'Confidence'}, 2: {'pos': 'IN', 'w': 'in'}, 3: {'pos': 'DT', 'w': 'the'}, 4: {'pos': 'NN', 'w': 'pound'}, 5: {'pos': 'VBZ', 'w': 'is'}, 6: {'pos': 'RB', 'w': 'widely'}, 7: {'pos': 'VBN', 'w': 'expected'}, 8: {'pos': 'TO', 'w': 'to'}, 9: {'pos': 'VB', 'w': 'take'}, 10: {'pos': 'DT', 'w': 'another'}, 11: {'pos': 'JJ', 'w': 'sharp'}, 12: {'pos': 'NN', 'w': 'dive'}, 13: {'pos': 'IN', 'w': 'if'}, 14: {'pos': 'NN', 'w': 'trade'}, 15: {'pos': 'NNS', 'w': 'figures'}, 16: {'pos': 'IN', 'w': 'for'}, 17: {'pos': 'NNP', 'w': 'September'}, 18: {'pos': ',', 'w': ','}, 19: {'pos': 'JJ', 'w': 'due'}, 20: {'pos': 'IN', 'w': 'for'}, 21: {'pos': 'NN', 'w': 'release'}, 22: {'pos': 'NN', 'w': 'tomorrow'}, 23: {'pos': ',', 'w': ','}, 24: {'pos': 'VB', 'w': 'fail'}, 25: {'pos': 'TO', 'w': 'to'}, 26: {'pos': 'VB', 'w': 'show'}, 27: {'pos': 'DT', 'w': 'a'}, 28: {'pos': 'JJ', 'w': 'substantial'}, 29: {'pos': 'NN', 'w': 'improvement'}, 30: {'pos': 'IN', 'w': 'from'}, 31: {'pos': 'NNP', 'w': 'July'}, 32: {'pos': 'CC', 'w': 'and'}, 33: {'pos': 'NNP', 'w': 'August'}, 34: {'pos': 'POS', 'w': "'s"}, 35: {'pos': 'JJ', 'w': 'near-record'}, 36: {'pos': 'NNS', 'w': 'deficits'}, 37: {'pos': '.', 'w': '.'}}
----------------------------------------
type(seq): <class 'pyseqlab.utilities.SequenceStruct'>
number of parsed sequences is:  8936

2. Supported models

Before going into the model building and training workflow, it is important to discuss and highlight the CRF models supported in PySeqLab. Currently, PySeqLab supports linear-chain (1) conditional random fields (CRFs) and (2) semi-Markov conditional random fields (semi-CRFs). Moreover, it supports first order models (i.e. modeling label patterns using at most two states/labels, $\le$ 2) and higher order models (i.e. modeling label patterns using more than two states, > 2). The table below provides an overview of the implemented CRF classes.

Model         | CRFs, first order ($\le$ 2) | CRFs, higher order (> 2) | semi-CRFs, first order ($\le$ 2) | semi-CRFs, higher order (> 2)
FirstOrderCRF | ✓ |   |   |
HOCRF         | ✓ | ✓ |   |
HOCRFAD       | ✓ | ✓ |   |
HOSemiCRF     |   |   | ✓ | ✓
HOSemiCRFAD   |   |   | ✓ | ✓

The displayed models are all based on the CRF formalism (undirected discriminative graphical models); the differences among them arise from: (1) the model order they support and/or (2) the algorithms used to estimate the probability of the sequences, which consequently affect the gradient and log-likelihood computation. Semi-CRFs are a generalization of CRFs, as they tackle sequence segmentation problems (i.e. using segments in which labels/tags extend across several consecutive observations of the input sequence). Hence, CRFs can be seen as a special case of semi-CRFs in which the segment length is 1 (i.e. each label is assigned to one observation; see (Sarawagi et al., 2005) for further discussion). Below is another table with pointers to the papers and literature on which the implemented CRF and semi-CRF models are based:

Model       | Short description | Reference
FirstOrderCRF | Original formulation of first-order CRFs | (Lafferty et al., 2001)
HOCRF       | Higher-order formulation of CRFs | (Ye et al., 2009)
HOSemiCRF   | Higher-order formulation of semi-CRFs | (Sarawagi et al., 2005; Cuong et al., 2014)
HOCRFAD     | Higher-order formulation of CRFs with an optimized forward-backward algorithm | (Ye et al., 2009; Vieira et al., 2016)
HOSemiCRFAD | Higher-order formulation of semi-CRFs with an optimized forward-backward algorithm | (Sarawagi et al., 2005; Cuong et al., 2014; Vieira et al., 2016)
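To make the segment notion concrete, here is a small illustrative sketch (not part of PySeqLab) that groups a BIO-labeled sequence, like the chunk labels shown earlier, into segments where each label spans one or more consecutive observations:

```python
def bio_to_segments(labels):
    """Convert a BIO label sequence into (tag, start, end) segments
    with 1-based, inclusive positions."""
    segments = []
    for i, label in enumerate(labels, start=1):
        if label == 'O':
            segments.append(('O', i, i))
        elif label.startswith('B-'):
            segments.append((label[2:], i, i))
        elif (label.startswith('I-') and segments
              and segments[-1][0] == label[2:] and segments[-1][2] == i - 1):
            # extend the previous segment by one observation
            tag, start, _ = segments[-1]
            segments[-1] = (tag, start, i)
        else:
            # orphan I- label; start a new segment
            segments.append((label[2:], i, i))
    return segments

print(bio_to_segments(['B-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP']))
# [('NP', 1, 1), ('PP', 2, 2), ('NP', 3, 4), ('VP', 5, 5)]
```

A CRF assigns one label per observation; a semi-CRF reasons directly over such multi-observation segments.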

Question: So, how do we choose what models to use?
Answer: It depends on the training data we have and the order of patterns we want to model.

Examples:

  • FirstOrderCRF could be used if our training data consists of sequences and we aim to model features that include at most one or two state/label transitions. Moreover, FirstOrderCRF naturally supports the inclusion of a __START__ state for building models with initial-label and label-transition features at the starting position of the sequences. For more info about extracting features that include the __START__ state, please see the templates_and_feature_extraction tutorial.

  • HOCRF and HOCRFAD could be used if our training data consists of sequences and we aim to model first-order and/or higher-order features (i.e. including label pattern transitions with more than two states). There are no restrictions on the label patterns when using these models. Additionally, both models are equivalent to FirstOrderCRF when we model features with first-order label patterns without including the __START__ state. However, there is a subtle difference between the HOCRF and HOCRFAD models: HOCRFAD provides a more efficient implementation of the forward-backward algorithm. In addition, HOCRFAD supports gradient-based training methods while HOCRF supports perceptron/search-based training methods (see the training section for more info). Hence, we suggest using HOCRFAD as the primary choice in such scenarios.

  • HOSemiCRF and HOSemiCRFAD could be used if our training data consists of segments or sequences and we aim to model first-order and/or higher-order features (i.e. including label pattern transitions with more than two states). There are no restrictions on the label patterns when using these models. However, there is a subtle difference between the HOSemiCRF and HOSemiCRFAD models: HOSemiCRFAD provides a more efficient implementation of the forward-backward algorithm. Both models support gradient-based and perceptron/search-based training methods. It is also worth mentioning that these models are of particular use when we are dealing with segments, even though they can handle sequences without any problem. Hence, we suggest using HOSemiCRFAD as the primary choice when we have segments and/or no restriction on the order of label pattern transitions.

Now that we know the different implementations of CRF models, it is time to talk about another set of closely related classes that are suffixed with ModelRepresentation (such as the FirstOrderCRFModelRepresentation class). These classes primarily hold relevant data structures that are used by the CRF models. Therefore, for every class representing a CRF model, we have a corresponding model representation class (i.e. FirstOrderCRF -- FirstOrderCRFModelRepresentation). Below is a table mapping between both sets of classes:

CRFs model CRFs model representation Module name
FirstOrderCRF FirstOrderCRFModelRepresentation pyseqlab.fo_crf
HOCRF HOCRFModelRepresentation pyseqlab.ho_crf
HOSemiCRF HOSemiCRFModelRepresentation pyseqlab.hosemi_crf
HOCRFAD HOCRFADModelRepresentation pyseqlab.ho_crf_ad
HOSemiCRFAD HOSemiCRFADModelRepresentation pyseqlab.hosemi_crf_ad

These two sets of classes are tied together: whenever we use a CRF model class, the corresponding model representation class should be used with it. Both sets of classes are very important and will come into play when we examine the model building workflow in the following section.
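The pairing can be captured in a small lookup; the mapping itself comes from the table above, while the dictionary and helper lookup are just illustrative:

```python
# mapping from each CRF model class to (module name, model representation class),
# taken from the table above
MODEL_REPRESENTATION = {
    'FirstOrderCRF': ('pyseqlab.fo_crf', 'FirstOrderCRFModelRepresentation'),
    'HOCRF': ('pyseqlab.ho_crf', 'HOCRFModelRepresentation'),
    'HOSemiCRF': ('pyseqlab.hosemi_crf', 'HOSemiCRFModelRepresentation'),
    'HOCRFAD': ('pyseqlab.ho_crf_ad', 'HOCRFADModelRepresentation'),
    'HOSemiCRFAD': ('pyseqlab.hosemi_crf_ad', 'HOSemiCRFADModelRepresentation'),
}

module, repr_class = MODEL_REPRESENTATION['HOCRFAD']
print(module, repr_class)  # pyseqlab.ho_crf_ad HOCRFADModelRepresentation
```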

3. Model building workflow

A typical process for building CRF models starting from the file comprising the training data (i.e. sequences or segments) is described as follows:

  1. We are given a dataset (i.e. file) comprising sequences or segments (see sequence_and_input_structure tutorial for a refresher).
  2. We read the file and generate sequences/segments.
  3. We decide on a data split strategy that yields a set of sequences for training the CRF model (training set) and another set for testing/validating it (testing/validation set).
  4. We process the training set (i.e. the sequences it comprises) and dump it on disk in a suitable format for later use in the learning framework.
  5. We build a model based on the processed training set.
  6. We train the model weights (i.e. estimate the feature weights) using one of the available training methods.
  7. We test the model by evaluating its decoding performance on the sequences in the test/validation set.
  8. The trained model is dumped on disk and can always be revived and used for decoding new sequences.

The listed points represent a common and recurrent workflow in sequence labeling and segmentation problems. For this reason, PySeqLab provides a GenericTrainingWorkflow class that takes care of tasks 1-8, which we explore further in this section.

But first, we describe the arguments passed to GenericTrainingWorkflow constructor:

  • aextractor_obj: initialized instance of GenericAttributeExtractor class/subclass (see attributes extraction in templates_and_feature_extraction tutorial)
  • fextractor_obj: initialized instance of FeatureExtractor class/subclass (see FOFeatureExtractor or HOFeatureExtractor classes in the templates_and_feature_extraction tutorial)
  • feature_filter_obj: None or an initialized instance of FeatureFilter class (see feature filter section in the templates_and_feature_extraction tutorial)
  • model_repr_class: the CRFs model representation class (see 2nd column in this table for model representation classes)
  • model_class: the CRFs model class (see 1st column in this table for model classes)
  • root_dir: string representing the directory/path where working directory will be created

3.1 Parsing sequences/segments

After initializing a GenericTrainingWorkflow instance, we use the seq_parsing_workflow(args) method, which handles steps (1 to 4) of the above process. This instance method has one argument and multiple keyword arguments.

Args:

  • split_options: dictionary specifying the data split strategy. The main key is 'method', which can take one of the following values: {'none', 'random', 'cross_validation', 'wsample'}. Depending on the chosen method, additional keys and values are specified. For example:

    To perform cross validation, we specify
      - 'cross_validation' for the method
      - the number of folds for k_fold

        options = {'method':'cross_validation',
                   'k_fold':number
                  }

    To perform random splitting, we specify
      - 'random' for the method
      - the number of splits for num_splits
      - the size of the training set (as a percentage) for trainset_size

        options = {'method':'random',
                   'num_splits':number,
                   'trainset_size':percentage
                  }

    To perform no splitting (i.e. use all data for training), we specify
      - 'none' for the method

        options = {'method':'none'
                  }

    To perform weighted sampling (based on the sequences' lengths), we specify
      - 'wsample' for the method
      - the size of the training set (as a percentage) for trainset_size

        options = {'method':'wsample',
                   'trainset_size':percentage
                  }

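As a rough illustration of the shape of the returned data split for the 'random' method, here is a sketch (not PySeqLab's actual implementation) that maps each split index to train/test lists of sequence ids, mirroring the output shown later:

```python
import random

def random_split(seq_ids, num_splits, trainset_size, seed=0):
    """Sketch of the 'random' strategy: shuffle the sequence ids, keep
    trainset_size percent for training and the rest for testing,
    repeated num_splits times."""
    rng = random.Random(seed)
    n_train = int(len(seq_ids) * trainset_size / 100)
    data_split = {}
    for i in range(num_splits):
        ids = list(seq_ids)
        rng.shuffle(ids)
        data_split[i] = {'train': ids[:n_train], 'test': ids[n_train:]}
    return data_split

split = random_split(range(1, 26), num_splits=3, trainset_size=80)
print(len(split[0]['train']), len(split[0]['test']))  # 20 5
```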
For the keyword arguments, we have two main ones depending on the input (file or sequences) we want to pass.

Keyword args:

  • seq_file: string representing the path to the file comprising the sequences/segments. For this option, we have to specify additionally the following keyword arguments:
    • data_parser_options: a dictionary defining the arguments of the read_file(args) method in the DataFileParser class (see sequence_and_input_structure tutorial for more info about this class).
    • num_seqs: the maximum number of sequences to read/parse from the file. By default it is equal to -1, which reads the whole file.

    • Example:
      parser_options = {'header': header,        # header argument in read_file(args)
                        'y_ref': y_ref,          # y_ref argument in read_file(args)
                        'column_sep': column_sep, # column_sep argument in read_file(args)
                        'seg_other_symbol': seg_other_symbol # seg_other_symbol argument in read_file(args)
                        }

      workflow.seq_parsing_workflow(split_options, seq_file=path,
                                    data_parser_options=parser_options,
                                    num_seqs=-1)
      

  • seqs: list of sequences that are instances of SequenceStruct class

3.2 Modeling setup

We start by initializing an instance of the GenericTrainingWorkflow class that uses the following setup:

  • Use GenericAttributeExtractor with attribute description dictionary equal to
    
                {'w':{'description': 'word observation track', 
                      'encoding':'categorical'},
                 'pos':{'description':'part of speech track',
                        'encoding':'categorical'}
                }
    
  • For both attribute tracks w and pos we generate the same feature template (template_XY) that guides/instructs feature extraction by:
    • Using a centered window of size 3 at each position in the sequence while extracting unigrams and bigrams and joining them with the current label/state.
  • For the label transition template (template_Y), we use the current label, and the previous and current labels, at each position in the sequence
  • Use first order feature extractor (FOFeatureExtractor)
  • Use first order CRFs model representation class (FirstOrderCRFModelRepresentation)
  • Use first order CRFs model class (FirstOrderCRF)

Then, we show the two approaches for using the seq_parsing_workflow(args) method. The first approach uses the seqs keyword argument while the second uses the seq_file keyword argument.

An example of the return value data_split for the different options we specified:

'method':'none' -- all training data is used

data_split:
{0: {'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}}

'method':'cross_validation' -- 5 folds

data_split: 
{0: {'train': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], 
     'test': [1, 2, 3, 4, 5]}, 
 1: {'train': [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], 
     'test': [6, 7, 8, 9, 10]}, 
 2: {'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], 
     'test': [11, 12, 13, 14, 15]}, 
 3: {'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 21, 22, 23, 24, 25], 
     'test': [16, 17, 18, 19, 20]},
 4: {'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 
     'test': [21, 22, 23, 24, 25]}
}

'method':'random' -- 3 splits with train set size 80%

data_split:
{0: {'train': [22, 3, 6, 24, 9, 13, 1, 8, 25, 16, 15, 14, 20, 2, 4, 10, 7, 19, 17, 11], 
     'test': [18, 21, 12, 5, 23]}, 
 1: {'train': [23, 12, 11, 17, 19, 9, 8, 24, 20, 10, 22, 18, 21, 25, 1, 4, 3, 7, 14, 6], 
     'test': [16, 2, 13, 5, 15]}, 
 2: {'train': [16, 9, 21, 17, 5, 18, 25, 20, 23, 13, 12, 24, 3, 14, 8, 19, 10, 22, 11, 7], 
     'test': [1, 2, 4, 6, 15]}
}

'method':'wsample' -- with train set size 80%

data_split:
{0: {'train': [7, 11, 17, 16, 14, 5, 2, 3, 4, 22, 1, 15, 12, 13, 20, 25, 24], 
     'test': [19, 10, 18, 6, 23, 8, 21, 9]}
}
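The 'cross_validation' output above has an easily checked structure: consecutive, equal-size test folds that partition the sequence ids, with each fold's train set being the remaining ids. A minimal sketch (not PySeqLab's internal code) reproducing that structure:

```python
def cross_validation_split(seq_ids, k_fold):
    """Sketch of the 'cross_validation' strategy: k consecutive equal-size
    test folds; the train set of each fold is the remaining ids."""
    ids = list(seq_ids)
    fold_size = len(ids) // k_fold
    data_split = {}
    for i in range(k_fold):
        test = ids[i * fold_size:(i + 1) * fold_size]
        train = [s for s in ids if s not in test]
        data_split[i] = {'train': train, 'test': test}
    return data_split

split = cross_validation_split(range(1, 26), 5)
print(split[0]['test'])  # [1, 2, 3, 4, 5], matching fold 0 above
```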

At this point, the sequences have been processed, assigned unique ids and dumped on disk. The parsed sequences are organized in a directory structure where a reference_corpus folder suffixed with a date/time string is created under the root directory we specified in the GenericTrainingWorkflow constructor. Under this folder, the process generates a folder for every sequence inside a global_features folder. Additionally, a log file named log.txt is added; it contains the logs generated during the parsing process and, once we build a CRF model, info about the total number of features created.

working_dir
│   ├── reference_corpus_2017_5_17-8_56_49_631884
│   │   ├── data_split
│   │   ├── global_features
│   │   │   ├── log.txt
│   │   │   ├── seq_1
│   │   │   ├── seq_10
│   │   │   ├── seq_11
│   │   │   ├── seq_12
│   │   │   ├── seq_13
│   │   │   ├── seq_14
│   │   │   ├── seq_15
│   │   │   ├── seq_16
│   │   │   ├── seq_17
│   │   │   ├── seq_18
│   │   │   ├── seq_19
│   │   │   ├── seq_2
│   │   │   ├── seq_20
│   │   │   ├── seq_21
│   │   │   ├── seq_22
│   │   │   ├── seq_23
│   │   │   ├── seq_24
│   │   │   ├── seq_25
│   │   │   ├── seq_3
│   │   │   ├── seq_4
│   │   │   ├── seq_5
│   │   │   ├── seq_6
│   │   │   ├── seq_7
│   │   │   ├── seq_8
│   │   │   ├── seq_9
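If we want to inspect the generated structure programmatically, a small helper along these lines (illustrative, not part of PySeqLab; folder names follow the tree above) locates the last reference_corpus_* folder in sorted order and counts the per-sequence folders:

```python
import glob
import os

def latest_corpus_seq_count(working_dir):
    """Count the seq_* folders under the last reference_corpus_* directory
    in lexicographically sorted order (the suffix is a date/time string)."""
    corpus_dirs = sorted(glob.glob(os.path.join(working_dir, 'reference_corpus_*')))
    if not corpus_dirs:
        return 0
    pattern = os.path.join(corpus_dirs[-1], 'global_features', 'seq_*')
    return len(glob.glob(pattern))
```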
In [4]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (10,6)
from pyseqlab.workflow import GenericTrainingWorkflow
from pyseqlab.features_extraction import FOFeatureExtractor
from pyseqlab.attributes_extraction import GenericAttributeExtractor
from pyseqlab.utilities import TemplateGenerator
from pyseqlab.fo_crf import FirstOrderCRF, FirstOrderCRFModelRepresentation
from pyseqlab.utilities import ReaderWriter
# attribute description for both w and pos attribute tracks
attr_desc = {'w':{'description': 'word observation track', 
                  'encoding':'categorical'
                 },
             'pos':{'description':'part of speech track',
                    'encoding':'categorical'}
            }
# initialize the attribute extractor instance
generic_attr_extractor = GenericAttributeExtractor(attr_desc)
print("attr_desc {}".format(generic_attr_extractor.attr_desc))
print()
# use same template for both tracks w and pos
template_XY = {}
template_gen = TemplateGenerator()
for track_name in ('w', 'pos'):
    template_gen.generate_template_XY(track_name, ('1-gram:2-grams', range(-1,2)), '1-state', template_XY)
template_Y = template_gen.generate_template_Y('1-state:2-states')
print("template_XY: ")
print(template_XY)
print()
print("template_Y:")
print(template_Y)
print()
# initialize first order feature extractor instance
fo_fe = FOFeatureExtractor(template_XY, template_Y, attr_desc, start_state=False)
# no feature filter 
fe_filter = None
working_dir = tutorials_dir
workflow = GenericTrainingWorkflow(generic_attr_extractor, fo_fe, fe_filter, 
                                   FirstOrderCRFModelRepresentation, FirstOrderCRF,
                                   working_dir)
attr_desc {'pos': {'encoding': 'categorical', 'repr_func': <bound method GenericAttributeExtractor._represent_categorical_attr of <pyseqlab.attributes_extraction.GenericAttributeExtractor object at 0x7fbd6bd41fd0>>, 'description': 'part of speech track'}, 'w': {'encoding': 'categorical', 'repr_func': <bound method GenericAttributeExtractor._represent_categorical_attr of <pyseqlab.attributes_extraction.GenericAttributeExtractor object at 0x7fbd6bd41fd0>>, 'description': 'word observation track'}}

template_XY: 
{'pos': {(0, 1): ((0,),), (0,): ((0,),), (-1,): ((0,),), (1,): ((0,),), (-1, 0): ((0,),)}, 'w': {(0, 1): ((0,),), (0,): ((0,),), (-1,): ((0,),), (1,): ((0,),), (-1, 0): ((0,),)}}

template_Y:
{'Y': [(0,), (-1, 0)]}

In [5]:
print("Using seqs keyword argument option")
print()
data_split_options = [{'method':'none'}, #  no splitting -- use all data
                      {'method':'cross_validation', 'k_fold':5}, # cross_validation 5-fold,
                      {'method':'random', 'num_splits':3, 'trainset_size':80}, #  3 random splits with train set 80%
                      {'method':'wsample', 'trainset_size':80}# weighted sample by sequence length
                     ]
for split_option in data_split_options:
    data_split = workflow.seq_parsing_workflow(split_option, 
                                               seqs=seqs[:25], 
                                               full_parsing = True)
    print("data_split: ", split_option['method'])
    print(data_split)
    print()
    print("-"*50)
Using seqs keyword argument option

dumping globalfeatures -- processed seqs:  1
dumping globalfeatures -- processed seqs:  2
dumping globalfeatures -- processed seqs:  3
dumping globalfeatures -- processed seqs:  4
dumping globalfeatures -- processed seqs:  5
dumping globalfeatures -- processed seqs:  6
dumping globalfeatures -- processed seqs:  7
dumping globalfeatures -- processed seqs:  8
dumping globalfeatures -- processed seqs:  9
dumping globalfeatures -- processed seqs:  10
dumping globalfeatures -- processed seqs:  11
dumping globalfeatures -- processed seqs:  12
dumping globalfeatures -- processed seqs:  13
dumping globalfeatures -- processed seqs:  14
dumping globalfeatures -- processed seqs:  15
dumping globalfeatures -- processed seqs:  16
dumping globalfeatures -- processed seqs:  17
dumping globalfeatures -- processed seqs:  18
dumping globalfeatures -- processed seqs:  19
dumping globalfeatures -- processed seqs:  20
dumping globalfeatures -- processed seqs:  21
dumping globalfeatures -- processed seqs:  22
dumping globalfeatures -- processed seqs:  23
dumping globalfeatures -- processed seqs:  24
dumping globalfeatures -- processed seqs:  25
data_split:  none
{0: {'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}}

--------------------------------------------------
dumping globalfeatures -- processed seqs:  1
dumping globalfeatures -- processed seqs:  2
dumping globalfeatures -- processed seqs:  3
dumping globalfeatures -- processed seqs:  4
dumping globalfeatures -- processed seqs:  5
dumping globalfeatures -- processed seqs:  6
dumping globalfeatures -- processed seqs:  7
dumping globalfeatures -- processed seqs:  8
dumping globalfeatures -- processed seqs:  9
dumping globalfeatures -- processed seqs:  10
dumping globalfeatures -- processed seqs:  11
dumping globalfeatures -- processed seqs:  12
dumping globalfeatures -- processed seqs:  13
dumping globalfeatures -- processed seqs:  14
dumping globalfeatures -- processed seqs:  15
dumping globalfeatures -- processed seqs:  16
dumping globalfeatures -- processed seqs:  17
dumping globalfeatures -- processed seqs:  18
dumping globalfeatures -- processed seqs:  19
dumping globalfeatures -- processed seqs:  20
dumping globalfeatures -- processed seqs:  21
dumping globalfeatures -- processed seqs:  22
dumping globalfeatures -- processed seqs:  23
dumping globalfeatures -- processed seqs:  24
dumping globalfeatures -- processed seqs:  25
data_split:  cross_validation
{0: {'test': [1, 2, 3, 4, 5], 'train': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}, 1: {'test': [6, 7, 8, 9, 10], 'train': [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}, 2: {'test': [11, 12, 13, 14, 15], 'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}, 3: {'test': [16, 17, 18, 19, 20], 'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 21, 22, 23, 24, 25]}, 4: {'test': [21, 22, 23, 24, 25], 'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]}}

--------------------------------------------------
dumping globalfeatures -- processed seqs:  1
dumping globalfeatures -- processed seqs:  2
dumping globalfeatures -- processed seqs:  3
dumping globalfeatures -- processed seqs:  4
dumping globalfeatures -- processed seqs:  5
dumping globalfeatures -- processed seqs:  6
dumping globalfeatures -- processed seqs:  7
dumping globalfeatures -- processed seqs:  8
dumping globalfeatures -- processed seqs:  9
dumping globalfeatures -- processed seqs:  10
dumping globalfeatures -- processed seqs:  11
dumping globalfeatures -- processed seqs:  12
dumping globalfeatures -- processed seqs:  13
dumping globalfeatures -- processed seqs:  14
dumping globalfeatures -- processed seqs:  15
dumping globalfeatures -- processed seqs:  16
dumping globalfeatures -- processed seqs:  17
dumping globalfeatures -- processed seqs:  18
dumping globalfeatures -- processed seqs:  19
dumping globalfeatures -- processed seqs:  20
dumping globalfeatures -- processed seqs:  21
dumping globalfeatures -- processed seqs:  22
dumping globalfeatures -- processed seqs:  23
dumping globalfeatures -- processed seqs:  24
dumping globalfeatures -- processed seqs:  25
data_split:  random
{0: {'train': [1, 22, 10, 3, 12, 16, 25, 15, 8, 7, 9, 13, 21, 19, 24, 17, 6, 14, 5, 20], 'test': [18, 2, 11, 4, 23]}, 1: {'train': [19, 10, 3, 5, 6, 24, 18, 20, 14, 12, 23, 8, 25, 13, 11, 1, 4, 15, 17, 2], 'test': [16, 9, 21, 22, 7]}, 2: {'train': [23, 13, 21, 10, 24, 15, 9, 22, 11, 6, 7, 3, 17, 25, 12, 1, 16, 8, 4, 2], 'test': [18, 19, 20, 5, 14]}}

--------------------------------------------------
dumping globalfeatures -- processed seqs:  1
dumping globalfeatures -- processed seqs:  2
dumping globalfeatures -- processed seqs:  3
dumping globalfeatures -- processed seqs:  4
dumping globalfeatures -- processed seqs:  5
dumping globalfeatures -- processed seqs:  6
dumping globalfeatures -- processed seqs:  7
dumping globalfeatures -- processed seqs:  8
dumping globalfeatures -- processed seqs:  9
dumping globalfeatures -- processed seqs:  10
dumping globalfeatures -- processed seqs:  11
dumping globalfeatures -- processed seqs:  12
dumping globalfeatures -- processed seqs:  13
dumping globalfeatures -- processed seqs:  14
dumping globalfeatures -- processed seqs:  15
dumping globalfeatures -- processed seqs:  16
dumping globalfeatures -- processed seqs:  17
dumping globalfeatures -- processed seqs:  18
dumping globalfeatures -- processed seqs:  19
dumping globalfeatures -- processed seqs:  20
dumping globalfeatures -- processed seqs:  21
dumping globalfeatures -- processed seqs:  22
dumping globalfeatures -- processed seqs:  23
dumping globalfeatures -- processed seqs:  24
dumping globalfeatures -- processed seqs:  25
data_split:  wsample
{0: {'train': [4, 1, 23, 15, 8, 20, 21, 25, 17, 16, 2, 5, 7, 18, 11, 10, 9], 'test': [6, 22, 12, 13, 19, 3, 14, 24]}}

--------------------------------------------------

In [6]:
print("Using seq_file keyword argument option")
print()
# get the path to CoNLL 2000 training file
train_file = os.path.join(dataset_dir, 'train.txt')
# parser options to read train_file
parser_options = {'header': 'main', # main means the header is found in the first line of the file
                  'y_ref': True, # y_ref is a boolean indicating if the label to predict is already found in the file
                  'column_sep': " ", # separator between the columns/observations
                  'seg_other_symbol': None
                  }
# parse only 25 sequences for comparison to our previous approach
num_seqs = 25
# use all passed data as training data 
data_split_options = [{'method':'none'}, #  no splitting -- use all data
                      {'method':'cross_validation', 'k_fold':5}, # cross_validation 5-fold,
                      {'method':'random', 'num_splits':3, 'trainset_size':80}, #  3 random splits with train set 80%
                      {'method':'wsample', 'trainset_size':80}# weighted sample by sequence length
                     ]
for split_option in data_split_options:
    data_split = workflow.seq_parsing_workflow(split_option, 
                                               seq_file=train_file,
                                               data_parser_options=parser_options,
                                               num_seqs=num_seqs,
                                               full_parsing = True)
    print("data_split: ", split_option['method'])
    print(data_split)
    print()
    print("-"*50)
    print()
Using seq_file keyword argument option

1 sequences have been processed
2 sequences have been processed
3 sequences have been processed
4 sequences have been processed
5 sequences have been processed
6 sequences have been processed
7 sequences have been processed
8 sequences have been processed
9 sequences have been processed
10 sequences have been processed
11 sequences have been processed
12 sequences have been processed
13 sequences have been processed
14 sequences have been processed
15 sequences have been processed
16 sequences have been processed
17 sequences have been processed
18 sequences have been processed
19 sequences have been processed
20 sequences have been processed
21 sequences have been processed
22 sequences have been processed
23 sequences have been processed
24 sequences have been processed
25 sequences have been processed
dumping globalfeatures -- processed seqs:  1
dumping globalfeatures -- processed seqs:  2
dumping globalfeatures -- processed seqs:  3
dumping globalfeatures -- processed seqs:  4
dumping globalfeatures -- processed seqs:  5
dumping globalfeatures -- processed seqs:  6
dumping globalfeatures -- processed seqs:  7
dumping globalfeatures -- processed seqs:  8
dumping globalfeatures -- processed seqs:  9
dumping globalfeatures -- processed seqs:  10
dumping globalfeatures -- processed seqs:  11
dumping globalfeatures -- processed seqs:  12
dumping globalfeatures -- processed seqs:  13
dumping globalfeatures -- processed seqs:  14
dumping globalfeatures -- processed seqs:  15
dumping globalfeatures -- processed seqs:  16
dumping globalfeatures -- processed seqs:  17
dumping globalfeatures -- processed seqs:  18
dumping globalfeatures -- processed seqs:  19
dumping globalfeatures -- processed seqs:  20
dumping globalfeatures -- processed seqs:  21
dumping globalfeatures -- processed seqs:  22
dumping globalfeatures -- processed seqs:  23
dumping globalfeatures -- processed seqs:  24
dumping globalfeatures -- processed seqs:  25
data_split:  none
{0: {'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}}

--------------------------------------------------

1 sequences have been processed
...
25 sequences have been processed
dumping globalfeatures -- processed seqs:  1
...
dumping globalfeatures -- processed seqs:  25
data_split:  cross_validation
{0: {'test': [1, 2, 3, 4, 5], 'train': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}, 1: {'test': [6, 7, 8, 9, 10], 'train': [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}, 2: {'test': [11, 12, 13, 14, 15], 'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}, 3: {'test': [16, 17, 18, 19, 20], 'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 21, 22, 23, 24, 25]}, 4: {'test': [21, 22, 23, 24, 25], 'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]}}

--------------------------------------------------

1 sequences have been processed
...
25 sequences have been processed
dumping globalfeatures -- processed seqs:  1
...
dumping globalfeatures -- processed seqs:  25
data_split:  random
{0: {'train': [12, 2, 4, 17, 16, 7, 25, 11, 22, 6, 24, 19, 15, 9, 13, 10, 1, 21, 5, 23], 'test': [8, 18, 3, 20, 14]}, 1: {'train': [20, 10, 12, 4, 1, 8, 16, 19, 13, 2, 6, 5, 11, 24, 18, 15, 3, 9, 7, 21], 'test': [17, 23, 25, 14, 22]}, 2: {'train': [5, 15, 16, 8, 4, 19, 12, 24, 9, 21, 18, 22, 25, 20, 13, 10, 14, 6, 17, 2], 'test': [11, 1, 3, 23, 7]}}

--------------------------------------------------

1 sequences have been processed
...
25 sequences have been processed
dumping globalfeatures -- processed seqs:  1
...
dumping globalfeatures -- processed seqs:  25
data_split:  wsample
{0: {'train': [6, 23, 22, 12, 15, 20, 25, 21, 7, 2, 16, 17, 3, 10, 11, 5, 24], 'test': [4, 1, 8, 13, 18, 19, 14, 9]}}

--------------------------------------------------
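Note that every data_split printed above is a plain Python dict mapping a fold number to 'train' (and, except for the 'none' method, 'test') lists of sequence ids. A minimal sanity check, sketched here in plain Python on a hypothetical split dict (check_split is not part of the PySeqLab API), verifies that each fold's train and test ids are disjoint and belong to the corpus:

```python
def check_split(data_split, all_ids):
    """Sanity-check a data_split dict: per fold, train/test ids must be
    disjoint and drawn only from the known sequence ids."""
    all_ids = set(all_ids)
    for fold, parts in data_split.items():
        train = set(parts.get('train', []))
        test = set(parts.get('test', []))
        assert not (train & test), "fold %s: train/test overlap" % fold
        assert (train | test) <= all_ids, "fold %s: unknown sequence ids" % fold
    return True

# illustrative 2-fold split over ids 1..6 (hypothetical values, not PySeqLab output)
split = {0: {'train': [3, 4, 5, 6], 'test': [1, 2]},
         1: {'train': [1, 2, 5, 6], 'test': [3, 4]}}
print(check_split(split, range(1, 7)))  # True
```

The same check applies directly to any of the splits generated by seq_parsing_workflow above.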

3.3 Building CRFs model

After parsing the sequences and generating a data_split (i.e. deciding which sequences will be used for training/building a CRFs model), we can proceed to build our first CRFs model. Using the build_crf_model(args) method, we can generate a CRFs model that will be an instance of FirstOrderCRF, since that is the class we passed to the GenericTrainingWorkflow constructor. The build_crf_model(args) method takes the following main arguments:

  • seqs_id: list of sequence ids referring to the sequences to be used for building the model. These ids are generated when the sequences are parsed and prepared, and they are found in the data_split variable.
  • folder_name: string representing the name of the folder that will be created under the reference_corpus_date/time directory. It holds additional processed sequence info generated during model building.

The method returns a CRFs model (see code) whose parameters (i.e. the weights of the features in the model) we will train later. Moreover, if we inspect the reference_corpus_date/time directory, we will notice a new folder whose name has a suffix equal to the folder_name we specified. More importantly, if we check log.txt in the global_features folder, we will see the generated log for the number of features created. Tree path/directory of the reference corpus:

working_dir
│   ├── reference_corpus_2017_5_17-8_56_49_631884
│   │   ├── data_split
│   │   ├── global_features
│   │   │   ├── log.txt
│   │   │   ├── seq_1
│   │   │   ├── seq_10
│   │   │   ├── seq_11
│   │   │   ├── seq_12
│   │   │   ├── seq_13
│   │   │   ├── seq_14
│   │   │   ├── seq_15
│   │   │   ├── seq_16
│   │   │   ├── seq_17
│   │   │   ├── seq_18
│   │   │   ├── seq_19
│   │   │   ├── seq_2
│   │   │   ├── seq_20
│   │   │   ├── seq_21
│   │   │   ├── seq_22
│   │   │   ├── seq_23
│   │   │   ├── seq_24
│   │   │   ├── seq_25
│   │   │   ├── seq_3
│   │   │   ├── seq_4
│   │   │   ├── seq_5
│   │   │   ├── seq_6
│   │   │   ├── seq_7
│   │   │   ├── seq_8
│   │   │   ├── seq_9
│   │   ├── model_activefeatures_f_0 

Inspecting log.txt under the global_features folder, we get the following excerpt:

---Preparing/parsing sequences--- starting time: 2017-05-17 08:56:49.635522 
Number of sequences prepared/parsed: 25 
---Preparing/parsing sequences--- end time: 2017-05-17 08:56:49.691309 


---Generating Global Features F_j(X,Y)--- starting time: 2017-05-17 08:56:49.703424 
Number of instances/training data processed: 25
---Generating Global Features F_j(X,Y)--- end time: 2017-05-17 08:56:49.901177 


---Constructing model--- starting time: 2017-05-17 08:56:49.912083 
Number of instances/training data processed: 25
Number of features: 3318 
Number of labels: 13 
---Constructing model--- end time: 2017-05-17 08:57:05.827288 

To verify and inspect these features, we can check the output of the code snippet below.

In [7]:
# use all passed data as training data -- no splitting
data_split_options = {'method':'none'}
data_split = workflow.seq_parsing_workflow(data_split_options, 
                                           seqs=seqs[:25],
                                           full_parsing=True)
print()
print("data_split: 'none' option ")
print(data_split)
print()

# build and return a CRFs model
# folder name will be f_0, i.e. fold 0
crf_m = workflow.build_crf_model(data_split[0]['train'], "f_0")
print()
print("type of built model:")
print(type(crf_m))
print()
dumping globalfeatures -- processed seqs:  1
...
dumping globalfeatures -- processed seqs:  25

data_split: 'none' option 
{0: {'train': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}}

constructing model -- processed seqs:  1
...
constructing model -- processed seqs:  25
identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25

type of built model:
<class 'pyseqlab.fo_crf.FirstOrderCRF'>

In [8]:
print("number of generated features:")
print(len(crf_m.model.modelfeatures_codebook))
print("features:")
print(crf_m.model.modelfeatures)
number of generated features:
3318
features:
{'B-ADVP|B-VP': Counter({'B-ADVP|B-VP': 1}), 'I-ADJP|O': Counter({'I-ADJP|O': 2}), 'B-ADVP|B-ADJP': Counter({'B-ADVP|B-ADJP': 1}), 'I-VP|B-ADVP': Counter({'I-VP|B-ADVP': 3}), 'I-SBAR': Counter({'w[-1]=even': 1, 'w[-1]|w[0]=even|if': 1, 'pos[1]=DT': 1, 'I-SBAR': 1, 'w[1]=the': 1, 'w[0]|w[1]=if|the': 1, 'pos[-1]=RB': 1, 'pos[-1]|pos[0]=RB|IN': 1, 'pos[0]=IN': 1, 'w[0]=if': 1, 'pos[0]|pos[1]=IN|DT': 1}), 'B-SBAR|B-ADJP': Counter({'B-SBAR|B-ADJP': 2}), 'B-ADJP|I-ADJP': Counter({'B-ADJP|I-ADJP': 3}), 'I-NP|B-VP': Counter({'I-NP|B-VP': 34}), 'B-NP|B-PP': Counter({'B-NP|B-PP': 9}), 'B-NP|B-SBAR': Counter({'B-NP|B-SBAR': 2}), 'B-NP|B-NP': Counter({'B-NP|B-NP': 3}), 'O|B-SBAR': Counter({'O|B-SBAR': 1}), 'I-NP|B-ADJP': Counter({'I-NP|B-ADJP': 1}), 'I-VP|I-VP': Counter({'I-VP|I-VP': 23}), 'I-NP|B-PP': Counter({'I-NP|B-PP': 38}), 'O|B-ADJP': Counter({'O|B-ADJP': 2}), 'B-NP|B-VP': Counter({'B-NP|B-VP': 24}), 'I-NP|O': Counter({'I-NP|O': 31}), 'B-NP': Counter({'B-NP': 173, 'pos[-1]=IN': 77, 'pos[0]=DT': 65, 'pos[1]=NN': 50, 'w[0]=the': 38, 'pos[-1]|pos[0]=IN|DT': 34, 'pos[0]|pos[1]=DT|NN': 28, 'pos[1]=JJ': 27, 'pos[0]=NNP': 24, 'pos[0]=NN': 23, 'pos[0]|pos[1]=DT|JJ': 21, 'pos[1]=NNP': 19, 'pos[-1]=,': 16, 'w[-1]=,': 16, 'w[0]=a': 15, 'pos[-1]=VB': 15, 'w[-1]=in': 14, 'pos[0]=NNS': 14, 'pos[-1]|pos[0]=IN|NNP': 13, 'pos[0]|pos[1]=NNP|NNP': 12, 'pos[-1]|pos[0]=IN|NN': 12, 'pos[0]=JJ': 12, 'w[-1]=of': 12, 'pos[1]=VBZ': 11, 'w[-1]=that': 11, 'pos[1]=NNS': 11, 'pos[1]=IN': 10, 'w[-1]=for': 9, 'pos[0]|pos[1]=JJ|NN': 8, 'pos[0]=PRP': 8, 'pos[-1]|pos[0]=VB|DT': 8, 'pos[-1]=VBD': 8, 'pos[1]=.': 7, 'w[1]=.': 7, 'pos[-1]=CC': 7, 'w[0]=sterling': 6, "w[0]='s": 6, 'pos[-1]|pos[0]=IN|NNS': 6, 'pos[-1]=VBZ': 6, 'pos[-1]=NNP': 6, 'w[1]=,': 6, 'pos[0]=POS': 6, 'pos[1]=,': 6, 'w[-1]=from': 6, 'w[-1]|w[0]=in|the': 5, 'w[-1]|w[0]=of|the': 5, 'w[0]=Mr.': 5, 'pos[-1]|pos[0]=NNP|POS': 5, 'pos[0]=PRP$': 5, 'w[0]=he': 5, 'pos[-1]=TO': 5, 'pos[0]=CD': 5, 'pos[0]|pos[1]=NN|NNS': 5, 'w[-1]=to': 5, 
'pos[0]|pos[1]=NN|IN': 5, 'pos[-1]|pos[0]=IN|PRP$': 4, 'pos[-1]|pos[0]=,|DT': 4, 'w[0]=there': 4, 'pos[1]=VBP': 4, 'pos[0]=EX': 4, 'pos[1]=#': 4, 'w[-1]=by': 4, 'pos[-1]=VBP': 4, 'pos[-1]=NN': 4, 'w[1]=is': 4, 'pos[0]|pos[1]=NN|NN': 4, 'pos[0]|pos[1]=PRP|VBZ': 4, 'pos[-1]|pos[0]=VBZ|DT': 4, 'w[1]=#': 4, 'w[-1]=said': 4, 'pos[0]|pos[1]=NNS|IN': 4, 'pos[1]=MD': 4, 'pos[0]|pos[1]=DT|NNS': 4, 'w[-1]=at': 4, 'w[1]=Lawson': 3, 'pos[-1]=VBN': 3, 'w[-1]=on': 3, 'w[1]=second': 3, 'w[-1]|w[0]=that|a': 3, 'w[-1]|w[0]=from|the': 3, 'w[0]|w[1]=the|trade': 3, 'w[1]=rates': 3, 'pos[1]=RB': 3, 'w[-1]|w[0]=for|sterling': 3, 'pos[0]|pos[1]=EX|VBZ': 3, 'w[0]|w[1]=the|second': 3, 'w[0]=his': 3, 'w[1]=pound': 3, 'pos[1]=CD': 3, 'pos[-1]|pos[0]=,|NNP': 3, 'w[1]=current': 3, 'pos[0]|pos[1]=DT|#': 3, 'w[1]=Dillow': 3, 'w[-1]|w[0]=,|he': 3, 'w[-1]|w[0]=,|the': 3, 'w[1]=of': 3, 'pos[0]|pos[1]=NN|,': 3, 'pos[1]=VBD': 3, 'pos[-1]|pos[0]=VBD|NNP': 3, 'w[1]=and': 3, 'pos[-1]|pos[0]=VB|NN': 3, 'w[0]|w[1]=there|is': 3, 'pos[0]|pos[1]=NNS|.': 3, 'w[0]=The': 3, 'pos[0]|pos[1]=DT|NNP': 3, 'w[1]=trade': 3, 'pos[0]|pos[1]=CD|NN': 3, 'w[-1]=is': 3, 'pos[-1]|pos[0]=IN|CD': 3, 'pos[0]|pos[1]=POS|NN': 3, 'w[-1]=and': 3, 'pos[-1]|pos[0]=,|JJ': 3, 'pos[-1]|pos[0]=,|PRP': 3, 'pos[1]=CC': 3, 'w[-1]=show': 3, 'w[1]=%': 3, 'w[0]|w[1]=the|pound': 3, 'w[-1]|w[0]=that|the': 2, 'pos[0]|pos[1]=NNP|POS': 2, 'w[1]=sharp': 2, 'w[0]|w[1]=the|current': 2, 'pos[-1]|pos[0]=VB|JJ': 2, 'w[0]|w[1]=Mr.|Lawson': 2, 'pos[0]=RB': 2, 'pos[-1]=VBG': 2, 'w[0]=consumer': 2, 'w[-1]|w[0]=,|Mr.': 2, 'w[0]=U.K.': 2, 'pos[0]|pos[1]=DT|CD': 2, 'w[0]=imports': 2, 'pos[0]|pos[1]=DT|VBZ': 2, 'w[1]=Mansion': 2, 'w[1]=economist': 2, 'pos[0]|pos[1]=NNS|MD': 2, 'w[-1]=than': 2, 'w[0]|w[1]=Mr.|Dillow': 2, 'w[-1]=``': 2, "w[-1]|w[0]=August|'s": 2, 'w[-1]=But': 2, 'pos[-1]|pos[0]=``|DT': 2, 'pos[0]|pos[1]=NN|VBZ': 2, 'w[0]=interest': 2, 'w[-1]=Lawson': 2, 'w[-1]|w[0]=by|the': 2, 'w[0]=exports': 2, 'w[0]=September': 2, 'pos[-1]|pos[0]=VB|NNS': 2, 
'w[1]=evidence': 2, 'w[-1]=take': 2, "w[-1]|w[0]=Lawson|'s": 2, 'w[0]|w[1]=a|substantial': 2, 'w[-1]=August': 2, 'w[0]|w[1]=the|#': 2, 'pos[0]|pos[1]=POS|JJ': 2, 'w[0]|w[1]=the|economy': 2, 'w[0]|w[1]=the|government': 2, 'w[0]|w[1]=the|data': 2, 'pos[-1]|pos[0]=TO|RB': 2, 'w[0]|w[1]=the|chancellor': 2, 'pos[-1]|pos[0]=CC|JJ': 2, 'w[-1]=if': 2, 'pos[0]|pos[1]=PRP|VBD': 2, 'w[-1]|w[0]=on|the': 2, 'w[1]=figures': 2, 'pos[-1]|pos[0]=TO|DT': 2, 'pos[0]|pos[1]=NNP|VBP': 2, 'pos[0]|pos[1]=NNP|.': 2, 'w[1]=Briscoe': 2, 'w[0]=another': 2, 'w[0]=their': 2, 'w[1]=bad': 2, 'w[0]=This': 2, 'w[1]=government': 2, 'w[-1]=but': 2, 'w[1]=firm': 2, 'w[0]=August': 2, 'w[1]=in': 2, 'w[0]=July': 2, 'pos[-1]|pos[0]=NN|NN': 2, 'w[1]=substantial': 2, 'w[1]=could': 2, 'w[0]=Midland': 2, 'pos[0]|pos[1]=PRP$|NNP': 2, 'pos[-1]=DT': 2, 'w[1]=chancellor': 2, 'pos[0]|pos[1]=JJ|NNP': 2, 'w[-1]|w[0]=to|a': 2, 'pos[0]|pos[1]=NNP|,': 2, 'pos[-1]=``': 2, 'pos[-1]|pos[0]=VBP|DT': 2, 'w[1]=has': 2, 'pos[-1]|pos[0]=IN|JJ': 2, 'w[0]=much': 2, 'pos[-1]|pos[0]=CC|NNS': 2, "w[1]='s": 2, 'w[-1]=increase': 2, 'w[1]=economy': 2, 'w[1]=U.K.': 2, 'w[1]=will': 2, 'w[1]=data': 2, 'w[0]|w[1]=his|Mansion': 2, 'w[0]|w[1]=interest|rates': 2, 'pos[1]=POS': 2, 'pos[-1]|pos[0]=DT|NN': 2, 'w[-1]|w[0]=show|a': 2, 'pos[0]|pos[1]=NNS|VBP': 2, 'pos[-1]|pos[0]=CC|DT': 2, 'w[0]|w[1]=some|rebound': 1, 'w[0]|w[1]=much|because': 1, 'w[1]=freefall': 1, 'pos[1]=NNPS': 1, 'w[-1]|w[0]=of|pressure': 1, 'w[0]=some': 1, 'w[1]=last': 1, 'w[-1]|w[0]=reduce|fears': 1, 'w[0]|w[1]=effect|.': 1, 'pos[0]|pos[1]=RB|JJ': 1, 'w[-1]=see': 1, 'pos[0]|pos[1]=CD|.': 1, 'w[0]=Chris': 1, 'w[1]=outlook': 1, 'w[1]=past': 1, 'w[0]=pressure': 1, 'w[-1]|w[0]=that|spending': 1, 'w[-1]|w[0]=into|the': 1, 'w[-1]|w[0]=Exchequer|Nigel': 1, 'pos[0]|pos[1]=PRP|JJ': 1, 'w[-1]|w[0]=,|overall': 1, 'w[0]|w[1]=the|U.K.': 1, 'w[1]=base': 1, 'w[-1]|w[0]=reckon|underlying': 1, 'pos[0]|pos[1]=JJ|NNS': 1, 'w[-1]=holding': 1, 'w[-1]=as': 1, 'w[-1]|w[0]=allow|the': 1, 
'w[1]=are': 1, 'w[-1]|w[0]=,|senior': 1, "w[-1]|w[0]=chancellor|'s": 1, 'w[1]=material': 1, 'w[0]|w[1]=July|and': 1, 'w[-1]|w[0]=-RRB-|deficit': 1, 'w[0]|w[1]=another|sharp': 1, 'w[-1]|w[0]=but|suggestions': 1, 'w[-1]|w[0]=increase|interest': 1, 'w[0]=Nigel': 1, 'w[-1]=chancellor': 1, 'pos[-1]|pos[0]=NN|JJ': 1, 'w[0]|w[1]=sterling|does': 1, 'w[1]=restated': 1, 'w[-1]|w[0]=is|little': 1, 'w[0]|w[1]=the|necessary': 1, 'w[0]=last': 1, 'w[0]=spending': 1, 'w[0]|w[1]=the|currency': 1, 'w[-1]=agree': 1, 'w[0]|w[1]=July|are': 1, 'w[0]=industry': 1, 'pos[-1]|pos[0]=,|EX': 1, 'w[-1]=with': 1, 'w[1]=very': 1, 'w[-1]=If': 1, 'w[1]=rather': 1, "w[0]|w[1]=August|'s": 1, 'w[1]=to': 1, 'w[0]|w[1]=a|bad': 1, 'w[1]=also': 1, 'w[-1]|w[0]=release|tomorrow': 1, 'w[-1]|w[0]=that|sterling': 1, 'w[1]=new': 1, 'pos[1]=TO': 1, 'w[0]|w[1]=there|could': 1, 'w[0]=Simon': 1, 'w[0]=Thursday': 1, 'w[0]|w[1]=it|clear': 1, 'pos[0]|pos[1]=PRP$|JJ': 1, 'w[0]|w[1]=a|freefall': 1, 'w[0]|w[1]=Baring|Brothers': 1, 'w[-1]|w[0]=increased|the': 1, 'w[1]=5.4': 1, 'w[0]=16': 1, 'w[1]=failure': 1, 'w[-1]=forecasts': 1, 'w[0]|w[1]=little|holding': 1, 'w[0]|w[1]=This|compares': 1, 'pos[0]=RBR': 1, 'pos[-1]|pos[0]=VBD|DT': 1, 'w[-1]|w[0]=prevent|a': 1, 'w[-1]|w[0]=noted|Simon': 1, 'w[0]|w[1]=August|.': 1, 'pos[-1]|pos[0]=VBG|NNS': 1, 'w[-1]|w[0]=for|August': 1, 'pos[-1]|pos[0]=NNP|NNP': 1, 'w[0]=fears': 1, 'w[0]=senior': 1, 'w[1]=expenditure': 1, 'w[-1]|w[0]=increase|base': 1, 'pos[-1]|pos[0]=NNS|RBR': 1, 'w[0]|w[1]=place|and': 1, 'w[0]|w[1]=Simon|Briscoe': 1, 'w[-1]|w[0]=But|consumer': 1, 'w[1]=highest': 1, 'w[-1]|w[0]=at|the': 1, 'w[-1]=expect': 1, 'w[0]|w[1]=imports|,': 1, 'w[0]|w[1]=U.K.|base': 1, 'pos[0]|pos[1]=DT|RB': 1, 'w[0]=any': 1, 'w[0]|w[1]=sterling|,': 1, 'w[0]|w[1]=3.8|%': 1, 'w[0]|w[1]=he|noted': 1, 'w[1]=noted': 1, 'w[0]|w[1]=the|same': 1, "w[0]|w[1]='s|restated": 1, 'w[0]=Sanjay': 1, 'w[0]|w[1]=Forecasts|for': 1, 'w[0]|w[1]=Thursday|,': 1, 'w[-1]|w[0]=at|Nomura': 1, 'w[-1]|w[0]=said|he': 1, 
'w[0]|w[1]=the|turnaround': 1, 'w[-1]|w[0]=because|investors': 1, 'w[-1]|w[0]=from|a': 1, 'pos[-1]|pos[0]=VBZ|JJ': 1, 'w[0]=services': 1, 'w[0]|w[1]=Nigel|Lawson': 1, 'w[0]=only': 1, 'w[0]|w[1]=the|risk': 1, 'w[-1]|w[0]=of|more': 1, 'pos[0]=VBG': 1, 'w[-1]=reckons': 1, 'w[-1]|w[0]=by|industry': 1, 'w[0]|w[1]=the|deficit': 1, 'w[0]|w[1]=economists|and': 1, 'w[-1]|w[0]=is|another': 1, 'w[0]=itself': 1, "w[0]|w[1]='s|failure": 1, 'w[-1]|w[0]=,|economists': 1, 'w[-1]=rose': 1, 'w[-1]|w[0]=in|exports': 1, 'w[0]|w[1]=he|believes': 1, 'w[-1]=reduce': 1, 'w[-1]|w[0]=and|a': 1, 'w[-1]|w[0]=at|their': 1, 'w[0]=release': 1, 'w[0]=as': 1, 'w[0]=October': 1, 'w[1]=policy': 1, 'w[0]=Confidence': 1, 'w[-1]|w[0]=to|as': 1, 'w[1]=risk': 1, 'w[-1]|w[0]=reckons|the': 1, 'w[-1]=Exchequer': 1, 'w[-1]=be': 1, 'w[0]=raw': 1, 'pos[0]|pos[1]=NN|.': 1, 'w[-1]=transforming': 1, 'w[-1]=In': 1, 'w[1]=further': 1, 'w[0]|w[1]=further|evidence': 1, 'w[1]=little': 1, 'w[-1]|w[0]=that|rates': 1, 'w[0]|w[1]=earlier|this': 1, 'w[-1]|w[0]=for|the': 1, 'pos[0]|pos[1]=JJ|IN': 1, 'w[-1]|w[0]=see|further': 1, 'w[0]|w[1]=1988|.': 1, 'w[1]=exchange': 1, 'w[-1]|w[0]=In|his': 1, 'w[-1]|w[0]=transforming|itself': 1, 'w[-1]|w[0]=in|eight': 1, 'pos[0]=JJR': 1, 'w[0]=little': 1, 'w[-1]|w[0]=said|there': 1, 'pos[-1]|pos[0]=,|NNS': 1, 'w[1]=risks': 1, 'pos[-1]|pos[0]=VBN|DT': 1, 'w[-1]=boost': 1, 'w[-1]|w[0]=in|raw': 1, 'w[-1]|w[0]=said|the': 1, 'pos[0]|pos[1]=POS|VBN': 1, 'w[0]=Analysts': 1, 'w[1]=over': 1, 'w[-1]|w[0]=for|Midland': 1, 'w[-1]=after': 1, 'w[0]|w[1]=as|little': 1, 'w[0]=exchange': 1, 'w[-1]|w[0]=than|Mr.': 1, 'w[0]|w[1]=suggestions|that': 1, 'w[1]=this': 1, 'w[0]|w[1]=a|#': 1, 'w[-1]|w[0]=on|services': 1, 'w[1]=for': 1, 'w[0]=No': 1, 'pos[0]|pos[1]=JJR|NN': 1, 'w[0]=rates': 1, 'w[-1]=-RRB-': 1, 'w[1]=remains': 1, 'w[0]|w[1]=fears|of': 1, 'w[0]|w[1]=a|unit': 1, 'w[0]|w[1]=raw|material': 1, 'w[0]|w[1]=he|remains': 1, 'pos[0]|pos[1]=NN|MD': 1, 'w[-1]=over': 1, 'w[0]|w[1]=the|past': 1, 
'w[-1]|w[0]=to|only': 1, 'w[0]|w[1]=pressure|,': 1, 'w[0]=3.8': 1, 'w[1]=clear': 1, 'w[-1]|w[0]=of|Midland': 1, 'w[-1]=advance': 1, 'w[0]=effect': 1, 'w[1]=sign': 1, "w[0]|w[1]='s|manufacturing": 1, 'pos[0]|pos[1]=NNP|CC': 1, 'w[0]|w[1]=a|1.6': 1, 'w[-1]|w[0]=up|3.8': 1, 'pos[0]|pos[1]=PRP$|JJS': 1, 'w[0]=further': 1, 'w[1]=high': 1, 'pos[1]=JJS': 1, 'w[0]=1988': 1, 'w[-1]|w[0]=at|Baring': 1, 'w[-1]=At': 1, 'w[-1]|w[0]=to|16': 1, 'w[0]|w[1]=a|year': 1, 'w[0]|w[1]=eight|years': 1, 'w[1]=goods': 1, 'w[0]=He': 1, 'w[1]=import': 1, 'w[-1]|w[0]=in|imports': 1, 'w[1]=down': 1, 'pos[1]=DT': 1, 'w[0]|w[1]=exports|.': 1, 'w[1]=Joshi': 1, 'w[-1]|w[0]=forecasts|a': 1, 'w[0]|w[1]=Midland|Montagu': 1, 'w[0]=Friday': 1, 'w[0]=continued': 1, 'pos[-1]|pos[0]=VBZ|NN': 1, 'w[-1]=increased': 1, 'w[0]|w[1]=positions|.': 1, 'w[1]=1.6': 1, 'pos[1]=VBN': 1, 'w[0]|w[1]=October|1988': 1, 'pos[0]|pos[1]=NNS|,': 1, 'pos[0]=WP': 1, 'w[1]=unit': 1, 'w[-1]|w[0]=least|some': 1, 'w[-1]|w[0]=rates|earlier': 1, 'w[-1]|w[0]=for|September': 1, 'w[-1]|w[0]=``|The': 1, 'w[-1]|w[0]=take|place': 1, 'w[1]=agree': 1, 'w[-1]|w[0]=if|the': 1, 'w[1]=after': 1, 'w[0]|w[1]=a|further': 1, 'w[0]|w[1]=any|new': 1, 'w[-1]|w[0]=show|the': 1, 'w[1]=third': 1, 'w[0]|w[1]=Sanjay|Joshi': 1, 'w[1]=awful': 1, 'w[1]=economists': 1, 'pos[-1]|pos[0]=VBP|VBG': 1, 'w[0]|w[1]=sterling|has': 1, 'pos[0]|pos[1]=NN|CC': 1, 'w[1]=first': 1, 'w[1]=1988': 1, 'w[0]|w[1]=their|highest': 1, 'w[0]|w[1]=more|import': 1, 'w[-1]|w[0]=from|July': 1, 'w[-1]=prevent': 1, 'w[-1]|w[0]=in|September': 1, 'w[1]=reckons': 1, 'w[-1]|w[0]=from|their': 1, 'w[0]|w[1]=their|current': 1, 'pos[0]|pos[1]=NNS|CC': 1, 'w[1]=deficit': 1, 'w[0]|w[1]=monetary|policy': 1, 'w[1]=audience': 1, 'pos[0]|pos[1]=RB|#': 1, 'w[1]=Brothers': 1, 'w[1]=do': 1, 'w[-1]|w[0]=of|October': 1, 'w[0]=overall': 1, 'w[-1]|w[0]=in|July': 1, 'w[0]|w[1]=few|economists': 1, 'w[-1]|w[0]=and|the': 1, 'w[1]=because': 1, 'w[-1]|w[0]=speech|last': 1, 'w[1]=reduction': 1, 
'w[-1]|w[0]=that|much': 1, 'w[-1]=made': 1, 'w[-1]=takes': 1, 'w[-1]|w[0]=of|1988': 1, 'w[-1]|w[0]=as|the': 1, 'w[0]=suggestions': 1, 'w[-1]|w[0]=given|continued': 1, 'w[1]=years': 1, 'w[0]=underlying': 1, 'w[-1]|w[0]=that|he': 1, 'w[0]=deficit': 1, 'w[1]=tomorrow': 1, 'w[0]=Britain': 1, 'pos[-1]|pos[0]=VBD|PRP': 1, 'w[1]=rate': 1, 'w[0]|w[1]=a|firm': 1, 'w[0]=more': 1, 'w[0]|w[1]=consumer|goods': 1, 'w[-1]=about': 1, 'w[-1]|w[0]=that|Britain': 1, 'w[1]=manufacturing': 1, 'pos[-1]|pos[0]=TO|CD': 1, 'w[0]|w[1]=trade|figures': 1, 'w[1]=Bank': 1, 'w[0]|w[1]=his|audience': 1, 'w[-1]|w[0]=but|few': 1, 'pos[-1]|pos[0]=CC|NN': 1, 'w[-1]|w[0]=,|a': 1, 'w[-1]|w[0]=agree|there': 1, 'w[0]|w[1]=only|#': 1, 'w[0]|w[1]=The|figures': 1, 'w[-1]|w[0]=reminded|his': 1, 'w[-1]=defend': 1, 'w[0]|w[1]=No|one': 1, 'w[0]=Forecasts': 1, 'w[0]=an': 1, 'w[1]=spending': 1, 'w[0]|w[1]=the|first': 1, 'w[0]|w[1]=itself|to': 1, 'w[0]|w[1]=underlying|support': 1, 'w[-1]=because': 1, 'w[-1]=adjusting': 1, 'w[0]|w[1]=industry|could': 1, 'w[0]=analysts': 1, 'w[-1]=rates': 1, 'w[0]=tomorrow': 1, 'pos[-1]|pos[0]=VBN|PRP': 1, 'w[0]|w[1]=he|reminded': 1, 'w[0]=no': 1, 'w[0]|w[1]=investors|will': 1, 'w[0]|w[1]=spending|rose': 1, 'w[0]=investors': 1, 'pos[0]|pos[1]=WP|RB': 1, 'w[0]|w[1]=who|also': 1, 'w[0]=Baring': 1, 'w[-1]|w[0]=,|who': 1, 'w[-1]|w[0]=of|monetary': 1, 'w[0]|w[1]=The|risks': 1, 'w[0]=European': 1, 'pos[-1]|pos[0]=VBD|CD': 1, 'w[-1]=released': 1, 'w[1]=currency': 1, 'w[-1]|w[0]=of|a': 1, 'w[-1]=reminded': 1, 'w[0]=few': 1, 'w[-1]|w[0]=released|Friday': 1, 'w[0]|w[1]=Confidence|in': 1, 'w[0]|w[1]=Mr.|Briscoe': 1, 'w[1]=reminded': 1, 'w[-1]|w[0]=without|a': 1, 'w[1]=flat': 1, 'w[1]=Thursday': 1, 'w[0]|w[1]=16|%': 1, 'pos[-1]|pos[0]=VBD|EX': 1, 'pos[0]|pos[1]=VBG|NN': 1, 'w[-1]|w[0]=expect|the': 1, 'w[0]|w[1]=the|third': 1, 'w[0]|w[1]=sterling|firm': 1, 'pos[-1]|pos[0]=JJS|DT': 1, 'pos[-1]=JJS': 1, 'w[0]=foreign': 1, 'w[0]|w[1]=sterling|over': 1, 'w[-1]|w[0]=by|exchange': 1, 
'w[0]|w[1]=sterling|of': 1, 'w[-1]|w[0]=after|August': 1, 'pos[0]|pos[1]=NNS|RB': 1, 'w[0]|w[1]=the|spending': 1, 'w[0]|w[1]=the|down': 1, 'w[0]|w[1]=Chris|Dillow': 1, 'w[-1]|w[0]=If|there': 1, 'w[1]=holding': 1, 'w[-1]|w[0]=announce|any': 1, 'w[-1]|w[0]=boost|exports': 1, 'pos[-1]|pos[0]=VBD|PRP$': 1, 'pos[0]|pos[1]=NNP|CD': 1, 'w[0]|w[1]=much|of': 1, 'w[1]=rebound': 1, 'w[0]|w[1]=the|impact': 1, 'w[-1]=least': 1, 'w[-1]=noted': 1, 'w[-1]|w[0]=At|the': 1, 'w[-1]|w[0]=,|there': 1, 'pos[0]|pos[1]=NNP|NNPS': 1, 'w[0]|w[1]=analysts|reckon': 1, 'w[0]=0.1': 1, 'w[0]|w[1]=a|5.4': 1, 'w[0]|w[1]=deficit|in': 1, 'pos[-1]|pos[0]=)|NN': 1, 'w[0]=monetary': 1, 'w[0]|w[1]=exports|after': 1, 'w[-1]=up': 1, 'w[0]|w[1]=he|is': 1, 'w[-1]|w[0]=and|foreign': 1, 'w[-1]|w[0]=in|his': 1, 'w[0]|w[1]=the|last': 1, 'w[1]=compares': 1, 'w[1]=necessary': 1, 'pos[0]|pos[1]=VBD|JJ': 1, 'w[0]|w[1]=base|rates': 1, 'w[1]=promise': 1, 'pos[-1]=)': 1, 'pos[-1]|pos[0]=NN|POS': 1, 'w[1]=August': 1, 'w[-1]=given': 1, 'w[1]=support': 1, 'w[-1]=Britain': 1, "w[0]|w[1]=Britain|'s": 1, 'w[1]=Research': 1, 'w[-1]=announce': 1, 'w[0]=base': 1, 'w[1]=Exchequer': 1, 'w[1]=moment': 1, 'w[1]=does': 1, 'pos[1]=PRP': 1, 'w[0]|w[1]=He|reckons': 1, 'w[-1]|w[0]=holding|sterling': 1, 'w[-1]|w[0]=with|a': 1, 'pos[-1]=NNS': 1, 'w[0]|w[1]=no|sign': 1, 'w[-1]|w[0]=But|analysts': 1, 'w[0]|w[1]=overall|evidence': 1, 'pos[0]|pos[1]=PRP$|NN': 1, "w[-1]|w[0]=Britain|'s": 1, 'w[-1]|w[0]=be|an': 1, 'w[1]=turnaround': 1, 'w[-1]|w[0]=made|it': 1, 'w[0]|w[1]=tomorrow|,': 1, 'w[0]|w[1]=U.K.|economist': 1, 'w[-1]=reckon': 1, 'w[-1]|w[0]=in|interest': 1, 'w[0]=positions': 1, 'pos[-1]|pos[0]=VBN|VBD': 1, 'w[0]|w[1]=September|.': 1, 'w[-1]=into': 1, 'w[0]|w[1]=rates|will': 1, 'w[0]|w[1]=senior|U.K.': 1, 'pos[0]|pos[1]=DT|PRP': 1, 'w[1]=same': 1, 'w[-1]|w[0]=if|trade': 1, 'w[-1]|w[0]=advance|much': 1, 'w[-1]|w[0]=takes|effect': 1, 'w[-1]|w[0]=about|the': 1, 'w[0]|w[1]=September|,': 1, 'w[0]|w[1]=Midland|Bank': 1, 'w[0]=trade': 1, 
..., 'w[0]|w[1]=exchange|rate': 1, 'w[0]|w[1]=foreign|exchange': 1, 'w[0]|w[1]=last|Thursday': 1}),
'O|B-PP': Counter({'O|B-PP': 3}), 'I-NP|B-ADVP': Counter({'I-NP|B-ADVP': 2}), 'O|B-VP': Counter({'O|B-VP': 8}),
'I-NP': Counter({'I-NP': 203, 'pos[0]=NN': 97, 'pos[-1]=DT': 64, 'pos[1]=NN': 50, 'pos[1]=IN': 41, 'pos[-1]=JJ': 38, 'w[-1]=the': 38, 'pos[0]|pos[1]=NN|IN': 34, 'pos[-1]|pos[0]=JJ|NN': 32, 'pos[-1]=NN': 29, 'pos[-1]|pos[0]=DT|NN': 29, 'pos[0]=JJ': 28, ...}),
'I-ADJP': Counter({'I-ADJP': 3, 'pos[-1]=RB': 3, 'pos[-1]|pos[0]=RB|JJ': 2, 'pos[0]=JJ': 2, 'w[-1]=fairly': 2, 'w[-1]=quite': 1, ...}),
'I-ADVP|B-NP': Counter({'I-ADVP|B-NP': 1}), 'B-SBAR|I-SBAR': Counter({'B-SBAR|I-SBAR': 1}), 'I-NP|B-SBAR': Counter({'I-NP|B-SBAR': 4}), 'O|O': Counter({'O|O': 14}), 'I-ADVP|B-PP': Counter({'I-ADVP|B-PP': 1}),
'B-VP': Counter({'B-VP': 69, 'pos[-1]=NN': 24, 'pos[0]=VBZ': 22, 'pos[1]=VB': 15, 'pos[-1]=NNS': 14, 'pos[0]=VBD': 13, 'pos[0]=VBP': 12, 'pos[0]=MD': 11, ...}),
'B-VP|B-NP': Counter({'B-VP|B-NP': 20}),
'O': Counter({'O': 81, 'pos[0]=,': 26, 'w[0]=,': 26, 'pos[0]=.': 25, 'w[0]=.': 25, 'pos[-1]=NN': 21, 'pos[-1]=NNP': 13, ...}),
'B-ADJP|O': Counter({'B-ADJP|O': 2}),
'I-VP': Counter({'I-VP': 54, 'pos[0]=VB': 27, 'w[-1]=to': 13, 'pos[-1]=TO': 13, 'pos[-1]|pos[0]=TO|VB': 12, 'pos[1]=VB': 12, 'pos[0]=VBN': 12, 'pos[-1]=MD': 11, ...}),
... (output truncated)
'pos[1]=NN': 3, 'w[-1]=be': 3, 'pos[0]|pos[1]=VB|VBN': 3, 'pos[-1]|pos[0]=VB|VBN': 3, 'pos[0]|pos[1]=VB|NN': 3, 'pos[0]|pos[1]=VB|JJ': 3, "w[-1]=n't": 3, 'w[-1]=could': 3, 'pos[-1]|pos[0]=VB|TO': 3, 'w[1]=the': 3, 'pos[0]|pos[1]=VB|IN': 3, 'w[1]=a': 3, 'pos[1]=JJ': 3, 'w[1]=that': 3, 'w[0]=want': 2, 'w[0]|w[1]=show|a': 2, 'pos[0]=VBG': 2, 'w[-1]=does': 2, 'w[1]=increase': 2, 'w[0]=narrow': 2, 'w[0]|w[1]=to|increase': 2, 'w[1]=by': 2, 'w[1]=further': 2, 'pos[-1]=VBP': 2, 'pos[-1]|pos[0]=VBZ|VBG': 2, 'w[1]=expected': 2, 'pos[-1]|pos[0]=MD|RB': 2, 'w[-1]|w[0]=want|to': 2, 'w[0]=show': 2, 'w[0]|w[1]=want|to': 2, 'w[0]=expected': 2, 'w[-1]=can': 2, 'pos[-1]|pos[0]=VBZ|RB': 2, 'w[-1]=want': 2, 'w[-1]|w[0]=to|show': 2, 'w[0]|w[1]=narrow|to': 2, 'pos[1]=NNS': 2, 'w[-1]|w[0]=to|increase': 2, 'pos[1]=PRP': 2, 'pos[0]|pos[1]=VB|NNS': 2, 'w[-1]|w[0]=will|want': 2, 'w[0]=increase': 2, 'w[0]=take': 2, 'w[1]=place': 1, "w[-1]|w[0]=do|n't": 1, 'w[0]|w[1]=widely|expected': 1, 'w[0]|w[1]=prevent|a': 1, 'w[-1]|w[0]=forced|to': 1, 'w[-1]=both': 1, 'pos[-1]|pos[0]=RB|VBN': 1, "w[-1]|w[0]=n't|suggest": 1, 'w[0]|w[1]=boost|exports': 1, 'w[-1]|w[0]=been|eroded': 1, 'w[-1]|w[0]=could|be': 1, "w[-1]|w[0]=n't|decline": 1, 'w[-1]|w[0]=is|widely': 1, 'w[0]|w[1]=be|expected': 1, 'w[-1]|w[0]=to|defend': 1, 'pos[-1]|pos[0]=VBZ|VB': 1, 'w[-1]|w[0]=helped|to': 1, 'w[-1]|w[0]=prepared|to': 1, 'w[-1]|w[0]=to|go': 1, 'w[-1]=being': 1, 'w[-1]|w[0]=to|announce': 1, 'w[1]=base': 1, 'w[0]|w[1]=to|prevent': 1, 'w[-1]|w[0]=expected|to': 1, 'w[0]|w[1]=made|it': 1, 'w[0]|w[1]=suggest|that': 1, 'w[-1]|w[0]=is|transforming': 1, 'w[0]=increased': 1, 'w[0]|w[1]=slowing|that': 1, 'w[1]=eroded': 1, 'w[0]|w[1]=announce|any': 1, 'w[0]|w[1]=to|take': 1, 'w[1]=decline': 1, 'pos[0]|pos[1]=VBN|DT': 1, 'w[1]=any': 1, 'w[0]|w[1]=allow|the': 1, 'w[-1]|w[0]=not|allow': 1, 'w[-1]=helped': 1, 'pos[-1]|pos[0]=VBP|RB': 1, 'pos[0]|pos[1]=RB|VBN': 1, 'w[-1]|w[0]=will|be': 1, 'w[0]=announce': 1, 'w[0]=suggest': 1, 
'w[-1]|w[0]=to|take': 1, 'w[-1]|w[0]=are|topped': 1, 'w[0]=helped': 1, 'w[0]=decline': 1, 'w[0]|w[1]=transforming|itself': 1, 'w[0]=widely': 1, 'w[-1]=prepared': 1, 'pos[0]|pos[1]=VBG|PRP': 1, 'pos[0]=DT': 1, 'w[0]|w[1]=to|go': 1, 'w[0]|w[1]=defend|the': 1, 'w[-1]|w[0]=is|slowing': 1, 'w[-1]|w[0]=does|take': 1, 'w[-1]|w[0]=has|increased': 1, 'w[-1]|w[0]=can|not': 1, 'w[1]=it': 1, 'pos[-1]=VBG': 1, "w[0]|w[1]=n't|suggest": 1, 'w[-1]=expected': 1, 'w[0]=slowing': 1, 'w[0]|w[1]=undermined|by': 1, 'w[0]=see': 1, 'w[0]=made': 1, 'w[-1]=do': 1, 'pos[-1]|pos[0]=VBN|VBN': 1, "w[-1]|w[0]=wo|n't": 1, 'w[-1]|w[0]=has|helped': 1, 'w[1]=suggest': 1, 'w[0]|w[1]=reduce|fears': 1, 'w[-1]|w[0]=fail|to': 1, 'pos[0]|pos[1]=VBN|JJR': 1, 'pos[-1]|pos[0]=VBP|VBN': 1, 'pos[0]|pos[1]=VBG|DT': 1, 'w[0]|w[1]=decline|further': 1, 'w[0]|w[1]=be|pushed': 1, 'pos[-1]|pos[0]=DT|VB': 1, 'w[1]=higher': 1, 'w[-1]|w[0]=both|ensure': 1, 'w[0]|w[1]=lead|to': 1, 'w[0]|w[1]=forced|to': 1, 'w[1]=exports': 1, 'w[0]=topped': 1, 'w[1]=itself': 1, 'w[0]|w[1]=ensure|that': 1, 'w[1]=see': 1, 'w[0]|w[1]=both|ensure': 1, "w[-1]|w[0]=n't|advance": 1, 'w[0]|w[1]=increased|the': 1, 'w[0]|w[1]=been|eroded': 1, 'w[0]=allow': 1, 'w[0]=forced': 1, 'w[-1]|w[0]=to|be': 1, 'w[0]|w[1]=pushed|higher': 1, 'w[1]=pushed': 1, 'w[1]=allow': 1, 'w[0]=been': 1, 'w[1]=much': 1, "w[-1]|w[0]=does|n't": 1, 'w[0]=not': 1, 'w[0]|w[1]=not|allow': 1, 'w[1]=as': 1, 'w[0]=both': 1, 'w[-1]=been': 1, 'w[0]|w[1]=take|another': 1, 'w[0]|w[1]=to|see': 1, 'w[-1]|w[0]=can|be': 1, 'pos[0]|pos[1]=VBN|RB': 1, 'w[1]=into': 1, 'w[-1]|w[0]=be|pushed': 1, 'w[-1]|w[0]=be|expected': 1, 'w[-1]|w[0]=be|undermined': 1, 'w[1]=an': 1, 'w[-1]=are': 1, 'w[-1]|w[0]=will|narrow': 1, 'w[0]|w[1]=see|further': 1, 'w[1]=fears': 1, 'w[0]|w[1]=eroded|by': 1, 'pos[1]=RB': 1, 'w[1]=advance': 1, 'w[-1]=fail': 1, 'w[1]=another': 1, 'w[0]|w[1]=topped|only': 1, 'w[0]=ensure': 1, 'w[0]=transforming': 1, 'w[0]=lead': 1, 'w[-1]|w[0]=has|made': 1, 'w[-1]|w[0]=should|reduce': 1, 
'w[0]|w[1]=be|undermined': 1, 'pos[1]=JJR': 1, 'w[-1]=widely': 1, 'w[0]=undermined': 1, 'w[0]=pushed': 1, 'pos[-1]=DT': 1, 'w[-1]|w[0]=to|see': 1, 'w[-1]=forced': 1, 'w[-1]|w[0]=being|forced': 1, 'w[1]=undermined': 1, 'w[-1]|w[0]=could|lead': 1, 'w[0]|w[1]=prepared|to': 1, 'pos[0]|pos[1]=VBN|PRP': 1, 'w[-1]=not': 1, 'w[1]=take': 1, 'w[0]|w[1]=advance|much': 1, 'w[-1]|w[0]=to|prevent': 1, 'w[0]=boost': 1, 'w[0]|w[1]=increase|interest': 1, "w[0]|w[1]=n't|decline": 1, 'w[0]|w[1]=expected|to': 1, 'w[0]|w[1]=helped|to': 1, 'w[1]=prevent': 1, 'w[0]=eroded': 1, 'w[0]|w[1]=to|show': 1, 'pos[0]|pos[1]=VBN|VBN': 1, 'w[0]=reduce': 1, 'pos[0]|pos[1]=DT|VB': 1, 'w[0]|w[1]=be|an': 1, 'w[-1]=should': 1, 'w[-1]|w[0]=is|prepared': 1, 'w[1]=only': 1, "w[0]|w[1]=n't|advance": 1, 'pos[-1]|pos[0]=TO|DT': 1, 'w[0]=prepared': 1, 'w[1]=show': 1, 'w[-1]|w[0]=to|both': 1, 'w[-1]=wo': 1, 'w[1]=ensure': 1, 'w[0]=prevent': 1, 'w[0]|w[1]=expected|as': 1, 'w[-1]|w[0]=to|boost': 1, 'w[0]=go': 1, 'w[0]|w[1]=take|place': 1, 'w[1]=go': 1, 'w[0]=advance': 1, 'w[0]=defend': 1, 'w[-1]|w[0]=could|narrow': 1, 'pos[-1]|pos[0]=VBG|VBN': 1, 'w[0]|w[1]=go|into': 1, 'w[1]=interest': 1, 'w[0]|w[1]=increase|base': 1, 'w[-1]|w[0]=has|been': 1, 'w[-1]|w[0]=widely|expected': 1}), 'B-SBAR': Counter({'B-SBAR': 17, 'pos[0]=IN': 16, 'w[0]=that': 10, 'pos[1]=DT': 6, 'pos[0]|pos[1]=IN|DT': 6, 'pos[-1]|pos[0]=NN|IN': 4, 'pos[-1]=NN': 4, 'w[1]=the': 3, 'w[0]|w[1]=that|a': 3, 'w[0]=if': 3, 'w[1]=a': 3, 'w[0]|w[1]=that|the': 2, 'pos[1]=NN': 2, 'pos[-1]|pos[0]=VBZ|IN': 2, 'pos[-1]=VB': 2, 'w[1]=necessary': 2, 'pos[-1]|pos[0]=JJ|IN': 2, 'pos[0]|pos[1]=IN|NN': 2, 'pos[1]=NNS': 2, 'pos[-1]|pos[0]=VB|IN': 2, 'pos[0]|pos[1]=IN|NNS': 2, 'pos[0]|pos[1]=IN|JJ': 2, 'w[0]|w[1]=if|necessary': 2, 'pos[-1]=VBZ': 2, 'pos[-1]=JJ': 2, 'pos[1]=JJ': 2, 'pos[0]|pos[1]=RB|IN': 1, 'w[0]|w[1]=that|even': 1, 'w[0]=even': 1, 'w[0]|w[1]=if|trade': 1, 'w[0]|w[1]=If|there': 1, 'w[1]=there': 1, 'w[-1]|w[0]=suggestions|that': 1, 'w[1]=much': 1, 
'pos[0]|pos[1]=IN|NNP': 1, 'w[0]|w[1]=that|Britain': 1, 'w[0]|w[1]=that|he': 1, 'pos[1]=NNP': 1, 'w[-1]=promise': 1, 'w[1]=Britain': 1, 'w[-1]|w[0]=ensure|that': 1, 'w[-1]|w[0]=sign|that': 1, 'pos[0]|pos[1]=IN|EX': 1, 'w[0]|w[1]=that|much': 1, 'w[-1]=``': 1, 'pos[-1]=JJR': 1, 'w[-1]|w[0]=``|If': 1, 'pos[-1]=VBN': 1, 'pos[1]=RB': 1, 'w[-1]|w[0]=that|even': 1, 'w[-1]=again': 1, 'pos[-1]|pos[0]=RB|IN': 1, 'w[-1]|w[0]=suggest|that': 1, 'w[-1]|w[0]=dive|if': 1, 'w[0]=because': 1, 'w[0]|w[1]=that|rates': 1, 'w[0]=as': 1, 'w[-1]=clear': 1, 'w[-1]=higher': 1, 'pos[-1]|pos[0]=JJR|IN': 1, 'w[-1]=ensure': 1, 'w[-1]=suggestions': 1, 'pos[-1]|pos[0]=VBD|IN': 1, 'pos[0]|pos[1]=IN|PRP': 1, 'pos[0]|pos[1]=IN|RB': 1, 'w[-1]=suggest': 1, 'w[-1]|w[0]=clear|that': 1, 'w[-1]|w[0]=warns|that': 1, 'w[-1]=warned': 1, 'w[-1]|w[0]=again|if': 1, 'w[1]=if': 1, 'w[-1]=sign': 1, 'w[1]=rates': 1, 'w[-1]|w[0]=believes|that': 1, 'w[0]|w[1]=because|investors': 1, 'w[1]=he': 1, 'w[-1]|w[0]=promise|that': 1, 'pos[1]=EX': 1, 'pos[0]=RB': 1, 'w[-1]=much': 1, 'w[-1]|w[0]=audience|that': 1, 'w[-1]=expected': 1, 'pos[1]=PRP': 1, 'pos[-1]=NNS': 1, 'pos[1]=IN': 1, 'pos[-1]=``': 1, 'w[-1]|w[0]=much|because': 1, 'w[-1]|w[0]=warned|that': 1, 'w[-1]=that': 1, 'w[-1]|w[0]=higher|if': 1, 'pos[-1]|pos[0]=VBN|IN': 1, 'w[1]=investors': 1, 'w[-1]|w[0]=expected|as': 1, 'pos[-1]=VBD': 1, 'w[1]=trade': 1, 'w[-1]=dive': 1, 'w[-1]=believes': 1, 'w[0]|w[1]=as|the': 1, 'pos[-1]=RB': 1, 'pos[-1]=IN': 1, 'pos[-1]|pos[0]=NNS|IN': 1, 'w[-1]=warns': 1, 'pos[-1]|pos[0]=``|IN': 1, 'w[0]|w[1]=even|if': 1, 'w[1]=even': 1, 'w[0]=If': 1, 'pos[-1]|pos[0]=IN|RB': 1, 'w[-1]=audience': 1}), 'I-VP|B-SBAR': Counter({'I-VP|B-SBAR': 3}), 'I-ADVP': Counter({'I-ADVP': 3, 'pos[0]=RB': 2, 'pos[1]=IN': 1, 'w[0]|w[1]=heavily|on': 1, 'w[1]=on': 1, 'pos[-1]|pos[0]=RB|RB': 1, 'w[-1]=very': 1, 'w[0]=quickly': 1, 'pos[1]=.': 1, 'w[0]=least': 1, 'pos[0]|pos[1]=JJS|DT': 1, 'w[0]|w[1]=quickly|.': 1, 'w[-1]|w[0]=at|least': 1, 'w[-1]|w[0]=very|heavily': 1, 
'w[0]=heavily': 1, 'w[0]|w[1]=least|some': 1, 'pos[1]=DT': 1, 'w[1]=.': 1, 'pos[0]|pos[1]=RB|IN': 1, 'w[-1]|w[0]=that|quickly': 1, 'pos[-1]=RB': 1, 'pos[-1]=IN': 1, 'pos[0]=JJS': 1, 'pos[-1]|pos[0]=IN|JJS': 1, 'pos[-1]=DT': 1, 'w[-1]=that': 1, 'w[1]=some': 1, 'w[-1]=at': 1, 'pos[0]|pos[1]=RB|.': 1, 'pos[-1]|pos[0]=DT|RB': 1}), 'B-ADJP|B-SBAR': Counter({'B-ADJP|B-SBAR': 2}), 'I-VP|B-PP': Counter({'I-VP|B-PP': 6}), 'B-NP|B-ADVP': Counter({'B-NP|B-ADVP': 1}), 'B-ADVP|B-SBAR': Counter({'B-ADVP|B-SBAR': 1}), 'B-ADVP|B-NP': Counter({'B-ADVP|B-NP': 1}), 'I-SBAR|B-NP': Counter({'I-SBAR|B-NP': 1}), 'B-ADVP|B-PP': Counter({'B-ADVP|B-PP': 1}), 'B-PP|B-PP': Counter({'B-PP|B-PP': 1}), 'B-PP|B-VP': Counter({'B-PP|B-VP': 1}), 'B-VP|B-ADVP': Counter({'B-VP|B-ADVP': 4}), 'B-PP|B-ADVP': Counter({'B-PP|B-ADVP': 1}), 'B-ADVP|O': Counter({'B-ADVP|O': 7}), 'B-ADVP|I-ADVP': Counter({'B-ADVP|I-ADVP': 3}), 'B-NP|B-ADJP': Counter({'B-NP|B-ADJP': 2}), 'I-VP|B-NP': Counter({'I-VP|B-NP': 18}), 'I-PP': Counter({'w[0]=than': 1, 'w[1]=consumer': 1, 'w[-1]|w[0]=rather|than': 1, 'pos[-1]=RB': 1, 'w[-1]=rather': 1, 'I-PP': 1, 'pos[0]|pos[1]=IN|NN': 1, 'pos[-1]|pos[0]=RB|IN': 1, 'pos[1]=NN': 1, 'pos[0]=IN': 1, 'w[0]|w[1]=than|consumer': 1}), 'B-ADJP|B-PP': Counter({'B-ADJP|B-PP': 4}), 'I-ADVP|O': Counter({'I-ADVP|O': 1}), 'B-PP|O': Counter({'B-PP|O': 1}), 'I-NP|I-NP': Counter({'I-NP|I-NP': 86}), 'I-PP|B-NP': Counter({'I-PP|B-NP': 1}), 'B-VP|B-ADJP': Counter({'B-VP|B-ADJP': 3}), 'B-ADJP': Counter({'B-ADJP': 12, 'pos[1]=IN': 6, 'pos[0]=JJ': 6, 'pos[0]|pos[1]=JJ|IN': 4, 'pos[0]=RB': 3, 'w[-1]=remains': 2, 'w[1]=for': 2, 'w[-1]|w[0]=remains|fairly': 2, 'pos[-1]|pos[0]=IN|JJ': 2, 'pos[-1]=NN': 2, 'w[-1]=if': 2, 'pos[-1]|pos[0]=VBZ|RB': 2, 'pos[0]|pos[1]=RB|JJ': 2, 'w[0]=necessary': 2, 'w[0]=fairly': 2, 'pos[-1]=IN': 2, 'pos[1]=JJ': 2, 'pos[-1]=VBZ': 2, 'w[-1]|w[0]=if|necessary': 2, 'w[1]=than': 1, 'w[0]|w[1]=bullish|for': 1, 'w[0]=quite': 1, 'pos[0]|pos[1]=$|CD': 1, 'w[-1]|w[0]=-LRB-|$': 1, 'w[0]=firm': 
1, 'w[-1]=still': 1, 'pos[-1]|pos[0]=(|$': 1, 'w[1]=that': 1, 'w[-1]=-LRB-': 1, 'w[-1]|w[0]=are|bullish': 1, 'w[-1]|w[0]=it|clear': 1, 'w[-1]|w[0]=,|due': 1, 'w[1]=pessimistic': 1, 'w[0]|w[1]=due|for': 1, 'pos[0]|pos[1]=JJR|IN': 1, 'w[0]|w[1]=higher|if': 1, 'pos[-1]|pos[0]=VBP|JJ': 1, 'w[-1]|w[0]=moment|other': 1, 'w[0]|w[1]=fairly|pessimistic': 1, 'pos[-1]=VBN': 1, 'pos[-1]|pos[0]=RB|RB': 1, 'w[0]|w[1]=$|3.2': 1, 'w[0]=clear': 1, 'w[-1]=,': 1, 'w[-1]=it': 1, 'w[1]=clouded': 1, 'w[0]|w[1]=firm|at': 1, 'w[-1]=pushed': 1, 'pos[-1]|pos[0]=PRP|JJ': 1, 'w[0]|w[1]=quite|strong': 1, 'w[-1]=sterling': 1, 'pos[0]|pos[1]=NN|IN': 1, 'w[0]=higher': 1, 'pos[-1]|pos[0]=NN|JJ': 1, 'w[-1]=are': 1, 'pos[-1]=RB': 1, 'w[0]|w[1]=fairly|clouded': 1, 'w[0]=due': 1, 'w[-1]|w[0]=still|quite': 1, 'w[1]=if': 1, 'w[0]=bullish': 1, 'w[-1]|w[0]=pushed|higher': 1, 'pos[-1]=,': 1, 'pos[-1]|pos[0]=VBN|JJR': 1, 'pos[1]=CD': 1, 'pos[0]=$': 1, 'pos[-1]=(': 1, 'pos[0]|pos[1]=JJ|TO': 1, 'pos[0]=JJR': 1, 'w[1]=strong': 1, 'w[-1]|w[0]=sterling|firm': 1, 'w[0]|w[1]=other|than': 1, 'pos[1]=VBN': 1, 'pos[0]|pos[1]=RB|VBN': 1, 'pos[-1]=VBP': 1, 'pos[1]=.': 1, 'pos[0]|pos[1]=JJ|.': 1, 'w[1]=3.2': 1, 'w[0]=other': 1, 'pos[-1]|pos[0]=NN|NN': 1, 'w[0]|w[1]=clear|that': 1, 'w[0]|w[1]=necessary|to': 1, 'w[1]=at': 1, 'w[1]=.': 1, 'pos[-1]|pos[0]=,|JJ': 1, 'w[-1]=moment': 1, 'w[0]=$': 1, 'w[1]=to': 1, 'w[0]|w[1]=necessary|.': 1, 'pos[0]=NN': 1, 'pos[-1]=PRP': 1, 'pos[1]=TO': 1}), 'O|B-NP': Counter({'O|B-NP': 28}), 'B-PP|I-PP': Counter({'B-PP|I-PP': 1}), 'B-VP|B-PP': Counter({'B-VP|B-PP': 5}), 'B-VP|I-VP': Counter({'B-VP|I-VP': 31}), 'B-ADJP|B-VP': Counter({'B-ADJP|B-VP': 1}), 'B-NP|I-NP': Counter({'B-NP|I-NP': 117}), 'B-VP|B-SBAR': Counter({'B-VP|B-SBAR': 3}), 'B-SBAR|B-NP': Counter({'B-SBAR|B-NP': 13}), 'B-VP|O': Counter({'B-VP|O': 3}), 'I-ADJP|B-PP': Counter({'I-ADJP|B-PP': 1}), 'B-PP': Counter({'B-PP': 72, 'pos[0]=IN': 64, 'pos[-1]=NN': 37, 'pos[-1]|pos[0]=NN|IN': 36, 'pos[1]=DT': 30, 'pos[0]|pos[1]=IN|DT': 28, 
'w[1]=the': 24, 'w[0]=in': 14, 'w[0]=of': 12, 'pos[0]|pos[1]=IN|NNP': 12, 'pos[1]=NNP': 12, 'pos[0]|pos[1]=IN|NN': 9, 'pos[1]=NN': 9, 'pos[-1]=NNS': 9, 'w[0]=for': 9, 'pos[-1]|pos[0]=NNS|IN': 7, 'w[0]=from': 7, 'w[1]=a': 6, 'pos[0]=TO': 5, 'w[0]|w[1]=in|the': 5, 'w[0]|w[1]=of|the': 5, 'w[0]=to': 5, 'pos[-1]=VBN': 4, 'pos[-1]|pos[0]=JJ|IN': 4, 'pos[0]|pos[1]=IN|NNS': 4, 'pos[-1]|pos[0]=VBN|IN': 4, 'pos[-1]=VB': 4, 'pos[1]=PRP$': 4, 'w[0]|w[1]=from|the': 4, 'w[1]=sterling': 4, 'pos[1]=NNS': 4, 'pos[0]|pos[1]=IN|PRP$': 4, 'pos[-1]=JJ': 4, 'w[0]=at': 4, 'w[0]=by': 4, 'w[-1]=%': 3, 'pos[1]=CD': 3, 'w[0]|w[1]=for|sterling': 3, 'pos[-1]|pos[0]=VB|TO': 3, 'w[-1]=economist': 3, 'pos[1]=IN': 3, 'w[0]=on': 3, 'w[-1]|w[0]=improvement|from': 2, 'w[1]=his': 2, 'w[-1]=deficit': 2, 'pos[0]|pos[1]=IN|CD': 2, 'w[-1]=sterling': 2, 'pos[-1]=RB': 2, 'w[-1]|w[0]=rise|in': 2, 'pos[0]|pos[1]=TO|DT': 2, 'pos[-1]|pos[0]=VBD|IN': 2, 'w[-1]=improvement': 2, 'pos[0]|pos[1]=TO|RB': 2, 'w[-1]=rise': 2, 'w[0]|w[1]=on|the': 2, 'w[1]=Midland': 2, 'w[-1]=evidence': 2, 'w[1]=September': 2, 'w[1]=July': 2, 'w[-1]|w[0]=economist|at': 2, 'w[-1]|w[0]=narrow|to': 2, 'w[1]=their': 2, 'pos[1]=RB': 2, 'pos[-1]|pos[0]=RB|IN': 2, 'w[0]|w[1]=to|a': 2, 'pos[0]=VBN': 2, 'pos[1]=JJ': 2, 'pos[0]|pos[1]=IN|JJ': 2, 'w[-1]=quarter': 2, 'w[0]=with': 2, 'w[1]=imports': 2, 'w[-1]=figures': 2, 'w[1]=August': 2, 'w[-1]=narrow': 2, 'w[-1]|w[0]=%|from': 2, 'pos[-1]=VBD': 2, 'w[0]|w[1]=by|the': 2, 'w[-1]=impact': 1, 'w[-1]|w[0]=,|given': 1, 'w[-1]|w[0]=compares|with': 1, 'w[0]=rather': 1, 'w[1]=than': 1, 'w[0]|w[1]=In|his': 1, 'w[-1]|w[0]=level|in': 1, 'w[-1]=reduction': 1, "w[-1]=''": 1, 'pos[0]|pos[1]=IN|IN': 1, 'w[-1]=second': 1, 'w[0]|w[1]=of|1988': 1, 'w[-1]|w[0]=rebound|in': 1, 'w[-1]|w[0]=deficit|of': 1, 'w[-1]=rebound': 1, 'w[-1]|w[0]=services|rather': 1, 'w[0]=without': 1, 'w[-1]=lot': 1, 'w[-1]=other': 1, 'w[0]|w[1]=to|only': 1, 'w[1]=eight': 1, 'w[1]=1988': 1, 'pos[0]|pos[1]=VBN|IN': 1, 'w[0]|w[1]=in|exports': 1, 
'w[1]=16': 1, 'w[0]|w[1]=Combined|with': 1, 'w[-1]|w[0]=heavily|on': 1, 'w[0]|w[1]=over|the': 1, 'w[-1]|w[0]=second|from': 1, 'pos[-1]=NNP': 1, 'w[0]|w[1]=with|a': 1, 'pos[-1]|pos[0]=NNS|TO': 1, 'w[1]=Nomura': 1, 'w[-1]=fears': 1, 'w[1]=more': 1, 'w[-1]=due': 1, 'pos[0]|pos[1]=TO|CD': 1, 'w[-1]=billion': 1, 'w[-1]|w[0]=much|of': 1, 'w[1]=with': 1, 'w[-1]|w[0]=sterling|over': 1, 'w[0]|w[1]=from|July': 1, 'w[-1]|w[0]=due|for': 1, 'pos[-1]|pos[0]=NN|TO': 1, 'w[0]|w[1]=for|September': 1, 'w[-1]|w[0]=Confidence|in': 1, 'w[0]|w[1]=for|the': 1, 'w[-1]=measures': 1, 'w[1]=monetary': 1, 'w[-1]|w[0]=evidence|of': 1, 'w[1]=Mr.': 1, 'w[-1]|w[0]=pessimistic|about': 1, 'w[0]|w[1]=of|monetary': 1, 'w[-1]|w[0]=measures|in': 1, 'w[1]=pressure': 1, 'w[-1]=Chancellor': 1, 'w[-1]=risk': 1, 'pos[-1]|pos[0]=VBG|IN': 1, 'w[-1]|w[0]=risk|of': 1, 'pos[-1]|pos[0]=VB|IN': 1, 'w[0]|w[1]=without|a': 1, 'w[-1]=lead': 1, 'w[0]|w[1]=for|release': 1, 'pos[-1]=,': 1, 'w[0]|w[1]=in|July': 1, 'w[-1]=registered': 1, 'w[-1]|w[0]=other|than': 1, 'w[-1]=went': 1, 'w[-1]=risks': 1, 'w[-1]=stockbuilding': 1, 'w[-1]=increase': 1, 'pos[-1]=VBP': 1, 'w[-1]=heavily': 1, 'w[-1]=go': 1, 'w[1]=exchange': 1, 'w[-1]|w[0]=%|in': 1, 'pos[-1]=VBG': 1, 'w[-1]=only': 1, 'w[-1]|w[0]=Forecasts|for': 1, 'w[-1]|w[0]=economist|for': 1, 'w[0]|w[1]=of|Midland': 1, 'w[-1]|w[0]=bullish|for': 1, 'w[-1]=Combined': 1, 'w[-1]|w[0]=firm|at': 1, 'w[-1]=unit': 1, 'pos[1]=VBD': 1, 'pos[-1]=CD': 1, "pos[-1]|pos[0]=''|IN": 1, 'w[0]|w[1]=than|Mr.': 1, 'w[0]|w[1]=at|Baring': 1, 'w[0]|w[1]=at|their': 1, 'w[-1]|w[0]=Chancellor|of': 1, 'w[-1]|w[0]=go|into': 1, 'pos[0]|pos[1]=VBN|VBD': 1, 'w[-1]|w[0]=unit|of': 1, 'w[1]=Baring': 1, 'w[0]|w[1]=given|continued': 1, 'w[-1]=undermined': 1, 'w[-1]=eroded': 1, 'w[0]|w[1]=in|September': 1, 'w[-1]|w[0]=quarter|from': 1, 'pos[1]=VBG': 1, 'pos[-1]|pos[0]=NNP|IN': 1, 'w[-1]|w[0]=went|on': 1, 'w[-1]|w[0]=registered|in': 1, 'pos[0]=RB': 1, 'w[-1]|w[0]=commitment|to': 1, 'w[0]|w[1]=by|industry': 1, 
'w[-1]|w[0]=figures|without': 1, 'w[-1]=rigor': 1, 'w[-1]|w[0]=rates|to': 1, 'w[0]|w[1]=for|imports': 1, 'w[0]|w[1]=in|raw': 1, 'w[1]=industry': 1, 'w[-1]|w[0]=evidence|on': 1, 'pos[-1]=VBZ': 1, 'w[0]=given': 1, 'w[-1]=outlook': 1, 'w[0]|w[1]=in|sterling': 1, 'w[-1]|w[0]=impact|of': 1, 'w[-1]=exports': 1, 'w[1]=adjusting': 1, 'w[-1]=firm': 1, 'w[-1]=support': 1, 'w[-1]=drop': 1, 'w[-1]=Forecasts': 1, 'w[0]|w[1]=into|the': 1, 'w[-1]|w[0]=fears|of': 1, 'w[-1]=bullish': 1, 'w[-1]|w[0]=are|at': 1, 'w[-1]|w[0]=deficit|in': 1, 'w[-1]|w[0]=billion|in': 1, 'pos[-1]|pos[0]=VBP|IN': 1, 'w[0]|w[1]=at|Nomura': 1, 'w[-1]|w[0]=risks|for': 1, 'w[-1]|w[0]=eroded|by': 1, 'w[-1]=Confidence': 1, 'w[-1]|w[0]=Combined|with': 1, 'w[0]|w[1]=of|a': 1, "w[-1]|w[0]=''|in": 1, 'w[0]|w[1]=with|at': 1, 'w[-1]|w[0]=rigor|of': 1, 'w[0]=into': 1, 'w[-1]=pessimistic': 1, 'w[1]=at': 1, 'w[-1]=are': 1, "pos[-1]=''": 1, 'w[0]|w[1]=of|October': 1, 'w[-1]=compares': 1, 'pos[0]|pos[1]=RB|IN': 1, 'w[-1]=rates': 1, 'w[-1]=much': 1, 'w[0]=Combined': 1, 'w[-1]=,': 1, 'w[0]|w[1]=in|imports': 1, 'w[0]|w[1]=in|eight': 1, 'w[-1]|w[0]=sterling|of': 1, 'w[0]|w[1]=at|the': 1, 'w[1]=as': 1, 'pos[-1]|pos[0]=VBZ|IN': 1, 'w[-1]|w[0]=drop|in': 1, 'w[-1]|w[0]=outlook|for': 1, 'w[1]=exports': 1, 'pos[1]=JJR': 1, 'pos[0]|pos[1]=IN|JJR': 1, 'w[-1]|w[0]=lot|of': 1, 'w[0]=over': 1, 'w[0]=before': 1, 'w[-1]=freefall': 1, 'w[-1]|w[0]=increase|from': 1, 'w[0]|w[1]=for|Midland': 1, 'w[-1]|w[0]=figures|for': 1, 'w[0]|w[1]=from|a': 1, 'w[0]=In': 1, 'w[0]|w[1]=At|the': 1, 'w[1]=raw': 1, 'w[0]|w[1]=on|services': 1, 'w[0]=after': 1, 'w[0]|w[1]=to|16': 1, 'w[1]=release': 1, 'w[-1]|w[0]=reported|for': 1, 'w[-1]=commitment': 1, 'w[-1]=reported': 1, 'w[0]|w[1]=to|as': 1, 'w[-1]=turnaround': 1, 'pos[-1]|pos[0]=NNS|RB': 1, 'w[0]|w[1]=about|the': 1, 'w[0]|w[1]=in|his': 1, 'w[-1]=services': 1, 'w[1]=only': 1, 'w[0]=about': 1, 'w[-1]|w[0]=freefall|in': 1, 'w[-1]|w[0]=undermined|by': 1, 'w[-1]|w[0]=reduction|in': 1, 'w[-1]|w[0]=quarter|of': 1, 
'w[1]=continued': 1, 'w[0]|w[1]=by|exchange': 1, 'w[1]=October': 1, 'w[0]=than': 1, 'w[-1]=level': 1, 'w[-1]|w[0]=exports|after': 1, 'w[0]|w[1]=before|adjusting': 1, 'w[0]|w[1]=rather|than': 1, 'w[0]|w[1]=of|pressure': 1, 'w[0]|w[1]=from|their': 1, 'w[1]=services': 1, 'w[0]|w[1]=for|August': 1, 'w[-1]|w[0]=support|for': 1, 'w[-1]|w[0]=stockbuilding|by': 1, 'w[0]|w[1]=of|more': 1, 'pos[-1]|pos[0]=CD|IN': 1, 'w[-1]|w[0]=only|by': 1, 'pos[-1]|pos[0]=,|VBN': 1, 'w[-1]|w[0]=turnaround|before': 1, 'w[0]=At': 1, 'w[-1]|w[0]=lead|to': 1, 'w[0]|w[1]=in|interest': 1, 'w[0]|w[1]=after|August': 1, 'pos[0]|pos[1]=IN|VBG': 1, 'w[1]=interest': 1}), 'I-NP|B-NP': Counter({'I-NP|B-NP': 7}), 'I-VP|B-ADJP': Counter({'I-VP|B-ADJP': 1}), 'B-PP|B-NP': Counter({'B-PP|B-NP': 67}), 'B-ADVP': Counter({'B-ADVP': 15, 'pos[0]=RB': 11, 'pos[0]|pos[1]=RB|,': 5, 'pos[1]=,': 5, 'w[1]=,': 5, 'pos[1]=RB': 3, 'pos[0]=IN': 2, 'pos[0]|pos[1]=RB|IN': 2, 'pos[0]|pos[1]=RB|RB': 2, 'pos[-1]|pos[0]=VBP|RB': 2, 'pos[-1]=VBP': 2, 'pos[1]=IN': 2, 'pos[1]=.': 2, 'w[1]=.': 2, 'pos[-1]|pos[0]=NN|RB': 1, 'w[0]=still': 1, 'w[0]|w[1]=Nevertheless|,': 1, 'w[0]|w[1]=Meanwhile|,': 1, 'w[0]=only': 1, 'w[-1]=topped': 1, 'w[-1]=decline': 1, 'w[-1]|w[0]=range|widely': 1, 'w[-1]|w[0]=was|up': 1, 'w[0]|w[1]=widely|,': 1, 'w[1]=if': 1, 'pos[-1]|pos[0]=IN|IN': 1, 'w[1]=forecasts': 1, 'w[1]=heavily': 1, 'w[-1]=who': 1, 'w[0]|w[1]=only|by': 1, 'w[-1]|w[0]=with|at': 1, 'w[0]|w[1]=Certainly|,': 1, 'w[-1]=are': 1, 'w[-1]=range': 1, 'pos[0]|pos[1]=IN|CD': 1, 'pos[-1]=VB': 1, 'pos[-1]=VBN': 1, 'w[0]=also': 1, 'w[1]=quickly': 1, 'w[0]=ago': 1, 'w[0]=again': 1, 'w[-1]|w[0]=year|ago': 1, 'w[-1]|w[0]=rates|again': 1, 'pos[-1]=NN': 1, 'w[-1]=is': 1, 'w[0]=further': 1, 'pos[-1]=IN': 1, 'pos[-1]=WP': 1, 'w[0]=Certainly': 1, 'w[-1]|w[0]=decline|further': 1, 'pos[0]|pos[1]=JJ|.': 1, 'pos[0]=DT': 1, 'w[0]=Nevertheless': 1, 'w[-1]|w[0]=topped|only': 1, 'w[1]=least': 1, 'w[0]|w[1]=again|if': 1, 'pos[-1]|pos[0]=VBD|IN': 1, 'pos[-1]|pos[0]=VBZ|RB': 
1, 'w[0]|w[1]=still|quite': 1, 'w[-1]|w[0]=who|also': 1, 'w[-1]|w[0]=is|still': 1, 'pos[-1]|pos[0]=VBG|DT': 1, 'w[1]=by': 1, 'w[0]=very': 1, 'w[0]=Meanwhile': 1, 'pos[1]=JJS': 1, 'w[-1]=year': 1, 'w[0]|w[1]=very|heavily': 1, 'w[-1]|w[0]=slowing|that': 1, 'w[0]=that': 1, 'pos[1]=CD': 1, 'pos[0]|pos[1]=RB|.': 1, 'w[-1]|w[0]=are|very': 1, 'pos[0]|pos[1]=DT|RB': 1, 'pos[-1]=VBG': 1, 'w[0]|w[1]=at|least': 1, 'pos[-1]|pos[0]=NNS|RB': 1, 'pos[-1]|pos[0]=VB|JJ': 1, 'w[0]|w[1]=also|forecasts': 1, 'w[0]=However': 1, 'w[0]|w[1]=ago|.': 1, 'pos[-1]=NNS': 1, 'pos[1]=VBZ': 1, 'w[1]=3.8': 1, 'w[0]|w[1]=However|,': 1, 'w[0]=widely': 1, 'w[-1]=was': 1, 'pos[-1]=VBZ': 1, 'w[0]|w[1]=further|.': 1, 'pos[0]=JJ': 1, 'w[-1]=with': 1, 'w[-1]=rates': 1, 'w[0]=at': 1, 'pos[-1]=VBD': 1, 'w[1]=quite': 1, 'w[0]|w[1]=that|quickly': 1, 'pos[0]|pos[1]=IN|JJS': 1, 'w[-1]=slowing': 1, 'pos[0]|pos[1]=RB|VBZ': 1, 'pos[-1]|pos[0]=VBN|RB': 1, 'w[0]|w[1]=up|3.8': 1, 'pos[-1]|pos[0]=WP|RB': 1, 'w[0]=up': 1}), 'B-SBAR|B-SBAR': Counter({'B-SBAR|B-SBAR': 1}), 'B-NP|O': Counter({'B-NP|O': 15})}

3.4 Training CRFs model (i.e. estimating parameters)

After building our CRFs model, we move to the parameter training/estimation phase. In this stage, we learn the parameters of the CRFs model so that, by the end of training, it can decode sequences (i.e. assign labels/tags using the observations/attributes tracks) with good performance. To do so, we use the train_model(args) method of the GenericTrainingWorkflow class. The method takes the following arguments:

  • seqs_id: a list of sequence ids referring to the sequences to be used for training the model. These ids refer to the same sequences we used for building the model and are found in the data_split variable.
  • crf_model: an instance of a CRFs model, such as the one returned by the build_crf_model(args) method we used earlier (see this section).
  • optimization_options: a dictionary specifying the training method and its corresponding options. We discuss this in detail below.
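As a sketch of how these pieces fit together (the names `workflow`, `crf_model` and `seqs_id` are assumed to come from the earlier model-building steps of this tutorial, and the option values are illustrative):

```python
# Illustrative sketch: `workflow`, `crf_model` and `seqs_id` are assumed to
# have been created in the earlier steps (seqs_id taken from data_split).
optimization_options = {
    'method': 'SGA-ADADELTA',      # one of the supported training methods
    'regularization_type': 'l2',   # L2 regularization (Gaussian prior)
    'regularization_value': 0.01,
    'num_epochs': 5,
}
# model_dir = workflow.train_model(seqs_id, crf_model, optimization_options)
```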

3.4.1 Training methods -- optimization options

PySeqLab supports multiple optimization options that are grouped into two categories:

  • gradient-based: these methods compute the gradient of the training objective over the sequences to learn the model parameters
  • perceptron/search-based: these methods use perceptron-like training and do not rely on the gradient to learn the model parameters

Generally, perceptron training methods are faster than gradient-based ones. However, gradient-based methods yield solutions (parameter estimates) that are closer to the optimum. This does not necessarily mean that such solutions result in better-performing models, but on average the gradient-based methods generate better estimates and hence better-performing models. NB: exceptions do arise, so the choice of category is problem specific/dependent.

The available training options are described in the following table:

Training option      | Perceptron/search based | Gradient based | Reference
---------------------|-------------------------|----------------|----------------------
COLLINS-PERCEPTRON   | ✓                       |                | (Collins, 2002)
SAPO                 | ✓                       |                | (Sun, 2015)
SGA                  |                         | ✓              | (Bottou et al., 2004)
SGA-ADADELTA         |                         | ✓              | (Zeiler, 2012)
SVRG                 |                         | ✓              | (Johnson et al., 2013)
L-BFGS-B             |                         | ✓              | (Byrd et al., 1995)

For each method, we can further specify its corresponding options to be used during training.

Training method Options
COLLINS-PERCEPTRON
        'method': 'COLLINS-PERCEPTRON'
        'num_epochs': integer
        'update_type':{'early', 'max-fast', 'max-exhaustive'}
        'shuffle_seq': boolean
        'beam_size': integer
        'avg_scheme': {'avg_error', 'avg_uniform'}
        'tolerance': float
SAPO

        'method': 'SAPO'
        'num_epochs': integer
        'update_type':'early'
        'shuffle_seq': boolean
        'beam_size': integer
        'topK': integer
        'tolerance': float
SGA

         'method': 'SGA'
         'regularization_type': {'l1', 'l2'}
         'regularization_value': float
         'num_epochs': integer
         'tolerance': float
         'learning_rate_schedule': {"bottu", "exponential_decay", "t_inverse", "constant"}
         't0': float
         'alpha': float
         'eta0': float
SGA-ADADELTA
         'method': 'SGA-ADADELTA'
         'regularization_type': {'l1', 'l2'}
         'regularization_value': float
         'num_epochs': integer
         'tolerance': float
         'rho': float
         'epsilon': float
SVRG

         'method': 'SVRG'
         'regularization_type': {'l1', 'l2'}
         'regularization_value': float
         'num_epochs': integer
         'tolerance': float
         'learning_rate_schedule': {"bottu", "exponential_decay", "t_inverse", "constant"}
         't0': float
         'alpha': float
         'eta0': float
L-BFGS-B
         'method': {'L-BFGS-B', 'BFGS'}
         'regularization_type': 'l2'
         'regularization_value': float
         (the options below are equivalent to the ones provided by the scipy.optimize package)
         'disp': boolean
         'maxcor': integer, 
         'ftol': float, 
         'gtol': float,
         'eps': float, 
         'maxls': integer,
         'maxiter': integer, 
         'maxfun': integer
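To make the tables concrete, here are two example `optimization_options` dictionaries, one per category (the values are illustrative, not recommended defaults):

```python
# Illustrative option dictionaries; the values are examples, not tuned defaults.
perceptron_options = {
    'method': 'COLLINS-PERCEPTRON',
    'num_epochs': 10,
    'update_type': 'max-fast',   # violation-fixing update when beam < full
    'shuffle_seq': True,
    'beam_size': 3,
    'avg_scheme': 'avg_error',
    'tolerance': 1e-6,
}
lbfgs_options = {
    'method': 'L-BFGS-B',
    'regularization_type': 'l2',
    'regularization_value': 1.0,
    # the remaining options are passed through to scipy.optimize
    'maxiter': 100,
    'disp': False,
}
```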

General comments on the options provided to perceptron/search-based methods:

  • num_epochs refers to the number of epochs/passes to go through the sequences/segments
  • shuffle_seq decides whether to shuffle the sequences in every epoch
  • tolerance is the threshold used as a stopping criterion: if the relative difference in the average decoding error across all sequences between the current and previous epoch falls below the threshold (tolerance), training stops.
  • update_type and beam_size are related options. PySeqLab supports inexact search via beam search (i.e. pruning states that fall off a specified beam size), allowing faster inference and training. This is implemented within the violation-fixing framework (see (Huang et al., 2012) for more details). Hence, if we want to prune states (i.e. use beam search) while learning the parameters, we specify the desired update_type based on the violation-fixing framework. If we use a full beam (i.e. no pruning), then update_type is irrelevant, as we perform a full update.
  • topK is an option available for the SAPO method. It specifies how many decoded sequences to use while learning the model parameters. For more info on the procedure, see (Sun, 2015).
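The effect of beam_size can be illustrated with a toy pruning function (a didactic sketch, not PySeqLab's internals): at each position, only the top-scoring states survive.

```python
def prune_to_beam(state_scores, beam_size):
    """Keep the `beam_size` highest-scoring states; a full beam keeps all."""
    ranked = sorted(state_scores, key=state_scores.get, reverse=True)
    return set(ranked[:beam_size])

# toy per-position state scores for a chunking tagger
scores = {'B-NP': 3.1, 'I-NP': 2.4, 'O': 0.2, 'B-VP': 1.7}
print(prune_to_beam(scores, 2))  # -> {'B-NP', 'I-NP'}
```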

Lastly, the learned parameters are the averages of the values across the training updates in all epochs. This provides a form of regularization and, as a result, reduces overfitting. As a reminder, many of these options have defaults; it suffices to specify the method and the rest is figured out automatically.
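The tolerance-based stopping criterion described above can be sketched as follows (a hedged illustration of the rule, not PySeqLab's actual code; here the metric is the average decoding error used by the perceptron methods):

```python
def should_stop(prev_metric, curr_metric, tolerance):
    """Stop when the relative change in the per-epoch metric (e.g. average
    decoding error) between consecutive epochs falls below `tolerance`."""
    if prev_metric == 0:
        return abs(curr_metric) < tolerance
    rel_diff = abs(curr_metric - prev_metric) / abs(prev_metric)
    return rel_diff < tolerance

# example: the error barely changed between epochs, so training would stop
print(should_stop(0.105, 0.1049, 1e-2))  # -> True
```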

General comments on the options provided to gradient-based methods:

  • num_epochs refers to the number of epochs/passes to go through the sequences/segments
  • shuffle_seq decides if to shuffle the sequences in every epoch
  • tolerance represents the threshold to check against as a stopping criterion. If the relative difference between the estimated average log-likelihood across all sequences between the current and previous epoch is below the threshold (tolerance), then we stop the training.
  • regularization_type and regularization_value are related options. PySeqLab supports maximum likelihood (MLE) and maximum a posteriori (MAP) estimation. This is done by offering two regularization schemes: (1) L2 regularization (i.e. assuming prior Gaussian distribution on the model weights) and (2) L1 regularization using the approach in (Tsuruoka et al., 2009). Hence, once we specify the type of regularization (i.e. regularization_type), we specify the regularization value (i.e. regularization_value)
  • learning_rate_schedule option for methods SGA and SVRG specifies the computation of the step size parameter multiplying the gradient. constant supports fixed step size while bottu, exponential_decay and t_inverse updates the step size iteratively each using different methods/formulas. All approaches require to specify the initial step size t0. Additionally, exponential_decay option requires the specification of alpha and eta0 (see Tsuruoka et al., 2009 for more info about exponential_decay approach).
  • rho and epsilon are options of the SGA-ADADELTA method. In this method, the step size is an adaptive vector in which each component represents a step size for the corresponding parameter of the CRF model. For more info regarding both options (rho and epsilon) and their role in computing the adaptive step size, consult the ADADELTA paper (Zeiler, 2012).

Again, as a reminder, many of these options are set by default; it is enough to specify the method and the rest is configured automatically.
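As an illustration, here is how the option dictionaries described above might be assembled for two of the gradient-based methods. The keys mirror the options discussed in this section; the chosen values are arbitrary examples for illustration, not recommended settings:

```python
# example option dictionaries for gradient-based training
# (values are illustrative, not tuned recommendations)
sga_options = {
    "method": "SGA",
    "num_epochs": 10,
    "tolerance": 1e-6,
    "regularization_type": "l2",
    "regularization_value": 0.01,
    "learning_rate_schedule": "exponential_decay",
    "t0": 0.1,     # initial step size, required by all schedules
    "alpha": 0.9,  # required by the exponential_decay schedule
    "eta0": 0.1,   # required by the exponential_decay schedule
}
adadelta_options = {
    "method": "SGA-ADADELTA",
    "num_epochs": 10,
    "rho": 0.95,      # decay constant in ADADELTA
    "epsilon": 1e-6,  # smoothing term in ADADELTA
}
```

Any key left out falls back to its default, so the minimal dictionary only needs the "method" entry.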

3.4.2 Learning/estimating model parameters

Now that we have learned about the available training methods, we proceed to train our constructed CRF model. We first use the gradient-based training methods {SGA, SGA-ADADELTA, SVRG}. We start by initializing the parameter weights to 0; once we run the train_model(args) method of the GenericTrainingWorkflow class, the optimization and parameter-estimation process starts. At the end of the process, we obtain new estimates for the weights, and the trained model is dumped to disk. Traversing the working directory (i.e. working_dir) we specified earlier, we find a newly created folder called models. In this folder, each training run produces a separate folder named with a generated date/time string. This folder comprises a model_parts folder and a log file named crf_training_log.txt. The model_parts folder holds the saved parts of the trained model, whereas the log file records the training process (such as the time of finishing an epoch, the estimated average log-likelihood, etc.). Additionally, if the training method is one of {SGA, SGA-ADADELTA, SVRG}, we get an additional file named avg_loglikelihood_training: a numpy array containing the estimated average log-likelihood for each epoch. We can plot this array for diagnostics (generally, it should be an increasing curve, as we are performing gradient ascent). In case we are using the L-BFGS-B method, we perform gradient descent instead, as we optimize the negative log-likelihood (see the scipy.optimize package for more info). The tree path visualization of the working directory is provided below:

working_dir
│   ├── reference_corpus_2017_5_17-8_56_49_631884
│   │   ├── data_split
│   │   ├── global_features
│   │   │   ├── log.txt
│   │   │   ├── seq_1
│   │   │   ├── seq_10
│   │   │   ├── seq_11
│   │   │   ├── seq_12
│   │   │   ├── seq_13
│   │   │   ├── seq_14
│   │   │   ├── seq_15
│   │   │   ├── seq_16
│   │   │   ├── seq_17
│   │   │   ├── seq_18
│   │   │   ├── seq_19
│   │   │   ├── seq_2
│   │   │   ├── seq_20
│   │   │   ├── seq_21
│   │   │   ├── seq_22
│   │   │   ├── seq_23
│   │   │   ├── seq_24
│   │   │   ├── seq_25
│   │   │   ├── seq_3
│   │   │   ├── seq_4
│   │   │   ├── seq_5
│   │   │   ├── seq_6
│   │   │   ├── seq_7
│   │   │   ├── seq_8
│   │   │   ├── seq_9
│   │   ├── model_activefeatures_f_0 
│   ├── models
│   │   ├── 2017_5_17-14_11_15_259071
│   │   │   ├── model_parts
│   │   │   ├── avg_loglikelihood_training
│   │   │   ├── crf_training_log.txt
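As a quick sanity check of the kind of diagnostic mentioned above, one can verify that the per-epoch average log-likelihood array is (roughly) non-decreasing. The values below are made up for illustration; in the tutorial the array would be loaded via ReaderWriter.read_data from the avg_loglikelihood_training file:

```python
# made-up per-epoch average log-likelihood values for illustration;
# in practice: avg_ll = ReaderWriter.read_data(
#     os.path.join(model_dir, 'avg_loglikelihood_training'))
avg_ll = [-310.2, -120.5, -95.1, -80.3, -79.9]

# for gradient ascent we expect a (mostly) non-decreasing curve
non_decreasing = all(b >= a for a, b in zip(avg_ll, avg_ll[1:]))
```

A curve that plateaus is expected near convergence; a consistently decreasing one usually signals a step size or regularization problem.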

Below is an example of training CRF models using the {SGA, SGA-ADADELTA, SVRG} methods. The training yields three models, each dumped to disk. For every model, we get a pointer to its model_parts folder that can be used later for reviving the model. We also plot the estimated average log-likelihood of each training method. Another training example, which uses the L-BFGS-B method, is provided below; there, the training method reports a success message (i.e. an optimal solution is found). As a result, we train four models in total using gradient-based methods.

In [9]:
optimization_options = {"method" : "",
                        "regularization_type": "l2",
                        "regularization_value":0,
                        }
# train models using SGA, SGA-ADADELTA, and SVRG methods
trained_models_dir = {}
for fold in data_split:
    train_seqs_id = data_split[fold]['train']
    for method in ("SGA-ADADELTA", "SGA", "SVRG"):
        optimization_options['method'] = method
        if(method in {'SGA', 'SGA-ADADELTA'}):
            num_epochs = 10
        else:
            num_epochs = 2
        optimization_options['num_epochs'] = num_epochs
        print("training using optimization options:")
        print(optimization_options)
        # make sure we are initializing the weights to be 0
        crf_m.weights.fill(0)
        model_dir = workflow.train_model(train_seqs_id, crf_m, optimization_options)
        print("*"*50)
        avg_ll = ReaderWriter.read_data(os.path.join(model_dir, 'avg_loglikelihood_training'))
        plt.plot(avg_ll[1:], label="method:{}, {}:{}".format(optimization_options['method'], 
                                                     optimization_options['regularization_type'],
                                                     optimization_options['regularization_value']))
        trained_models_dir[method] = model_dir
    plt.legend(loc='lower right')
    plt.xlabel('number of epochs')
    plt.ylabel('estimated average loglikelihood')
    eval_options = {'model_eval':True,
                    'metric':'f1',
                    'seqs_info':workflow.seqs_info}
training using optimization options:
{'num_epochs': 10, 'regularization_type': 'l2', 'regularization_value': 0, 'method': 'SGA-ADADELTA'}
k  0
[... per-sequence countdown (num seqs left: 24 ... 0) trimmed ...]
reldiff = 1.0
k  1
[... per-sequence countdown trimmed ...]
reldiff = 0.0807339930004063
k  2
[... per-sequence countdown trimmed ...]
reldiff = 0.07814365226061681
k  3
[... per-sequence countdown trimmed ...]
reldiff = 0.07272245811875455
k  4
[... per-sequence countdown trimmed ...]
reldiff = 0.06484294853523974
k  5
[... per-sequence countdown trimmed ...]
reldiff = 0.05726206987040894
k  6
[... per-sequence countdown trimmed ...]
reldiff = 0.05061699423333404
k  7
[... per-sequence countdown trimmed ...]
reldiff = 0.045473216413020094
k  8
[... per-sequence countdown trimmed ...]
reldiff = 0.04107757497659218
k  9
[... per-sequence countdown trimmed ...]
reldiff = 0.03775359687720264
**************************************************
training using optimization options:
{'num_epochs': 10, 'regularization_type': 'l2', 'regularization_value': 0, 'method': 'SGA'}
[... per-sequence countdown (num seqs left: 24 ... 0) trimmed ...]
reldiff = 1.0
[... per-sequence countdown trimmed ...]
reldiff = 0.36415121738832723
[... per-sequence countdown trimmed ...]
reldiff = 0.13723453323278387
[... per-sequence countdown trimmed ...]
reldiff = 0.07472777247369157
[... per-sequence countdown trimmed ...]
reldiff = 0.061716577197920224
[... per-sequence countdown trimmed ...]
reldiff = 0.04486993334964319
[... per-sequence countdown trimmed ...]
reldiff = 0.03795960507268705
[... per-sequence countdown trimmed ...]
reldiff = 0.029295375214934197
[... per-sequence countdown trimmed ...]
reldiff = 0.026372992322026723
[... per-sequence countdown trimmed ...]
reldiff = 0.02322538187415348
**************************************************
training using optimization options:
{'num_epochs': 2, 'regularization_type': 'l2', 'regularization_value': 0, 'method': 'SVRG'}
[... per-sequence countdown (num seqs left: 24 ... 0) trimmed ...]
reldiff = 1.0
stage 0
[... average gradient phase (24 ... 0 seqs left) and rounds 1-50 trimmed ...]
reldiff = 1.0
stage 1
[... average gradient phase (24 ... 0 seqs left) and rounds 1-50 trimmed ...]
reldiff = 0.14461038990143976
**************************************************
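The reldiff values printed in the log above come from the relative-difference stopping check described earlier. A plausible sketch of such a check is shown below; the function name is ours and PySeqLab's exact formula may differ:

```python
# hedged sketch of a relative-difference stopping criterion;
# the library's exact formula may differ
def rel_diff(curr, prev):
    """Relative difference between consecutive epoch log-likelihoods."""
    return abs(curr - prev) / (abs(curr) + abs(prev))

tolerance = 1e-8
# with nearly identical consecutive values the check fires and training stops
stop = rel_diff(-80.3, -80.3000001) < tolerance
```

The shrinking reldiff values in the log show the estimates stabilizing from one epoch to the next.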

In [10]:
# use L-BFGS-B method for training
optimization_options = {"method" : "L-BFGS-B",
                        "regularization_type": "l2",
                        "regularization_value":0,
                        }
# start with 0 weights 
crf_m.weights.fill(0)
model_dir = workflow.train_model(train_seqs_id, crf_m, optimization_options)
trained_models_dir['L-BFGS-B'] = model_dir
print("*"*50)
iteration  1
iteration  2
iteration  3
iteration  4
iteration  5
iteration  6
iteration  7
iteration  8
iteration  9
iteration  10
iteration  11
iteration  12
iteration  13
iteration  14
iteration  15
iteration  16
iteration  17
iteration  18
iteration  19
iteration  20
iteration  21
iteration  22
iteration  23
iteration  24
iteration  25
iteration  26
iteration  27
iteration  28
iteration  29
success:  True
**************************************************

For the perceptron-based training methods, we use (COLLINS-PERCEPTRON, SAPO) to train CRF models (see code). We specify the training to run for 10 epochs with full beam size (i.e. no pruning), and topK=5 (the number of decoded sequences to use) for the SAPO method. We also plot the estimated average decoding error for every epoch; the decoding error decreases, indicating that learning is progressing well. The estimated average decoding error is dumped to disk in a file named avg_decodingerror_training, the equivalent of the avg_loglikelihood_training file for the gradient-based methods. Another example using perceptron-based training is provided in this section. It uses beam search (i.e. pruning) with beam_size=5 (i.e. keeping at most 5 labels at every position) and update_type=early while decoding the given sequences. Again, consult the training options table for further training options.
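To make the beam_size option concrete, here is a toy sketch of beam pruning at a single position: only the highest-scoring labels (at most beam_size of them) survive to the next position. The label names and scores are invented for illustration:

```python
import heapq

# invented per-label scores at one position of a sequence
scores = {'B-NP': 2.1, 'I-NP': 1.7, 'B-VP': 0.4, 'O': -0.2, 'B-PP': 1.9, 'I-VP': 0.1}

beam_size = 5
# keep at most beam_size labels with the highest scores (the "beam")
beam = heapq.nlargest(beam_size, scores, key=scores.get)
```

With a full beam (beam_size equal to the number of labels) nothing is pruned, which recovers the exact decoding used in the first example above.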

In [11]:
# full beam -- using all states/labels detected in the training data
optimization_options = {"method" : "",
                        "num_epochs":10,
                       }
topK = 1
# train models using COLLINS-PERCEPTRON, SAPO
for fold in data_split:
    train_seqs_id = data_split[fold]['train']
    for method in ("COLLINS-PERCEPTRON", "SAPO"):
        optimization_options['method'] = method
        if(method == 'SAPO'):
            topK = 5
        optimization_options['topK'] = topK           
        print("training using optimization options:")
        print(optimization_options)
        # make sure we are initializing the weights to be 0
        crf_m.weights.fill(0)
        model_dir = workflow.train_model(train_seqs_id, crf_m, optimization_options)
        print("*"*50)
        avg_ll = ReaderWriter.read_data(os.path.join(model_dir, 'avg_decodingerror_training'))
        plt.plot(avg_ll[1:], label="method:{}, topK:{}".format(optimization_options['method'], topK))
        trained_models_dir[method] = model_dir
    plt.legend(loc='upper right')
    plt.xlabel('number of epochs')
    plt.ylabel('estimated decoding error')
trianing using optimization options:
{'num_epochs': 10, 'topK': 1, 'method': 'COLLINS-PERCEPTRON'}
sequences left 25
in full update routine ...
sequences left 24
in full update routine ...
sequences left 23
in full update routine ...
sequences left 22
in full update routine ...
sequences left 21
in full update routine ...
sequences left 20
in full update routine ...
sequences left 19
in full update routine ...
sequences left 18
in full update routine ...
sequences left 17
in full update routine ...
sequences left 16
in full update routine ...
sequences left 15
in full update routine ...
sequences left 14
in full update routine ...
sequences left 13
in full update routine ...
sequences left 12
in full update routine ...
sequences left 11
in full update routine ...
sequences left 10
in full update routine ...
sequences left 9
in full update routine ...
sequences left 8
in full update routine ...
sequences left 7
in full update routine ...
sequences left 6
in full update routine ...
sequences left 5
in full update routine ...
sequences left 4
in full update routine ...
sequences left 3
in full update routine ...
sequences left 2
in full update routine ...
sequences left 1
in full update routine ...
reldiff = 1.0
average error : [0, 0.35434711205734565]
self._exitloop False
sequences left 25
in full update routine ...
sequences left 24
in full update routine ...
sequences left 23
in full update routine ...
sequences left 22
in full update routine ...
sequences left 21
in full update routine ...
sequences left 20
in full update routine ...
sequences left 19
in full update routine ...
sequences left 18
in full update routine ...
sequences left 17
in full update routine ...
sequences left 16
in full update routine ...
sequences left 15
in full update routine ...
sequences left 14
in full update routine ...
sequences left 13
in full update routine ...
sequences left 12
in full update routine ...
sequences left 11
in full update routine ...
sequences left 10
in full update routine ...
sequences left 9
in full update routine ...
sequences left 8
in full update routine ...
sequences left 7
in full update routine ...
sequences left 6
in full update routine ...
sequences left 5
in full update routine ...
sequences left 4
in full update routine ...
sequences left 3
in full update routine ...
sequences left 2
in full update routine ...
sequences left 1
in full update routine ...
reldiff = 0.5155563599450866
average error : [0, 0.35434711205734565, 0.11326613073909479]
self._exitloop False
sequences left 25
in full update routine ...
sequences left 24
in full update routine ...
sequences left 23
in full update routine ...
sequences left 22
in full update routine ...
sequences left 21
in full update routine ...
sequences left 20
in full update routine ...
sequences left 19
in full update routine ...
sequences left 18
in full update routine ...
sequences left 17
in full update routine ...
sequences left 16
in full update routine ...
sequences left 15
in full update routine ...
sequences left 14
in full update routine ...
sequences left 13
in full update routine ...
sequences left 12
in full update routine ...
sequences left 11
in full update routine ...
sequences left 10
in full update routine ...
sequences left 9
in full update routine ...
sequences left 8
in full update routine ...
sequences left 7
in full update routine ...
sequences left 6
in full update routine ...
sequences left 5
in full update routine ...
sequences left 4
in full update routine ...
sequences left 3
in full update routine ...
sequences left 2
in full update routine ...
sequences left 1
in full update routine ...
reldiff = 0.4049275779718683
average error : [0, 0.35434711205734565, 0.11326613073909479, 0.04797510690904651]
self._exitloop False
sequences left 25
in full update routine ...
sequences left 24
in full update routine ...
sequences left 23
in full update routine ...
sequences left 22
in full update routine ...
sequences left 21
in full update routine ...
sequences left 20
in full update routine ...
sequences left 19
in full update routine ...
sequences left 18
in full update routine ...
sequences left 17
in full update routine ...
sequences left 16
in full update routine ...
sequences left 15
in full update routine ...
sequences left 14
in full update routine ...
sequences left 13
in full update routine ...
sequences left 12
in full update routine ...
sequences left 11
in full update routine ...
sequences left 10
in full update routine ...
sequences left 9
in full update routine ...
sequences left 8
in full update routine ...
sequences left 7
in full update routine ...
sequences left 6
in full update routine ...
sequences left 5
in full update routine ...
sequences left 4
in full update routine ...
sequences left 3
in full update routine ...
sequences left 2
in full update routine ...
sequences left 1
in full update routine ...
reldiff = 0.20651393091371315
average error : [0, 0.35434711205734565, 0.11326613073909479, 0.04797510690904651, 0.07294726029593968]
self._exitloop False
sequences left 25
in full update routine ...
sequences left 24
in full update routine ...
sequences left 23
in full update routine ...
sequences left 22
in full update routine ...
sequences left 21
in full update routine ...
sequences left 20
in full update routine ...
sequences left 19
in full update routine ...
sequences left 18
in full update routine ...
sequences left 17
in full update routine ...
sequences left 16
in full update routine ...
sequences left 15
in full update routine ...
sequences left 14
in full update routine ...
sequences left 13
in full update routine ...
sequences left 12
in full update routine ...
sequences left 11
in full update routine ...
sequences left 10
in full update routine ...
sequences left 9
in full update routine ...
sequences left 8
in full update routine ...
sequences left 7
in full update routine ...
sequences left 6
in full update routine ...
sequences left 5
in full update routine ...
sequences left 4
in full update routine ...
sequences left 3
in full update routine ...
sequences left 2
in full update routine ...
sequences left 1
in full update routine ...
reldiff = 0.5134288550887981
average error : [0, 0.35434711205734565, 0.11326613073909479, 0.04797510690904651, 0.07294726029593968, 0.023452725802725803]
self._exitloop False
sequences left 25
in full update routine ...
sequences left 24
in full update routine ...
sequences left 23
in full update routine ...
sequences left 22
in full update routine ...
sequences left 21
in full update routine ...
sequences left 20
in full update routine ...
sequences left 19
in full update routine ...
sequences left 18
in full update routine ...
sequences left 17
in full update routine ...
sequences left 16
in full update routine ...
sequences left 15
in full update routine ...
sequences left 14
in full update routine ...
sequences left 13
in full update routine ...
sequences left 12
in full update routine ...
sequences left 11
in full update routine ...
sequences left 10
in full update routine ...
sequences left 9
in full update routine ...
sequences left 8
in full update routine ...
sequences left 7
in full update routine ...
sequences left 6
in full update routine ...
sequences left 5
in full update routine ...
sequences left 4
in full update routine ...
sequences left 3
in full update routine ...
sequences left 2
in full update routine ...
sequences left 1
in full update routine ...
reldiff = 0.22356513756451268
average error : [0, 0.35434711205734565, 0.11326613073909479, 0.04797510690904651, 0.07294726029593968, 0.023452725802725803, 0.014882341261064667]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.5067569810161425
average error : [0, 0.35434711205734565, 0.11326613073909479, 0.04797510690904651, 0.07294726029593968, 0.023452725802725803, 0.014882341261064667, 0.004871794871794872]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
average error : [0, 0.35434711205734565, 0.11326613073909479, 0.04797510690904651, 0.07294726029593968, 0.023452725802725803, 0.014882341261064667, 0.004871794871794872, 0.0]
self._exitloop True
**************************************************
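The `reldiff` values printed above appear to be the symmetric relative difference between consecutive entries of the average-error list (for example, the last two entries reported in the run above, 0.023452725802725803 and 0.014882341261064667, yield the logged reldiff of 0.22356513756451268). A minimal sketch under that assumption; the formula is inferred from the logged numbers, not taken from the PySeqLab source:

```python
import math

def reldiff(prev_err, curr_err):
    # symmetric relative difference: |prev - curr| / (prev + curr)
    return abs(prev_err - curr_err) / (prev_err + curr_err)

# consecutive average-error values copied from the log above
assert math.isclose(reldiff(0.023452725802725803, 0.014882341261064667),
                    0.22356513756451268, rel_tol=1e-7)
assert math.isclose(reldiff(0.014882341261064667, 0.004871794871794872),
                    0.5067569810161425, rel_tol=1e-7)
```

When this quantity drops below the convergence tolerance (or the error itself reaches 0, as in the last epoch above), `self._exitloop` flips to `True` and training stops early.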
training using optimization options:
{'num_epochs': 10, 'topK': 5, 'method': 'SAPO'}
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 1.0
average error : [0, 0.36756856328769566]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.6187776152511111
average error : [0, 0.36756856328769566, 0.08656245486414232]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.35438439594021315
average error : [0, 0.36756856328769566, 0.08656245486414232, 0.04126307993028463]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.24820481678052284
average error : [0, 0.36756856328769566, 0.08656245486414232, 0.04126307993028463, 0.024852800052800057]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.558359815602826
average error : [0, 0.36756856328769566, 0.08656245486414232, 0.04126307993028463, 0.024852800052800057, 0.08769492966397564]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.6102869672131207
average error : [0, 0.36756856328769566, 0.08656245486414232, 0.04126307993028463, 0.024852800052800057, 0.08769492966397564, 0.02122345749250348]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.13876524851853037
average error : [0, 0.36756856328769566, 0.08656245486414232, 0.04126307993028463, 0.024852800052800057, 0.08769492966397564, 0.02122345749250348, 0.016051051051051052]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.5038536943085051
average error : [0, 0.36756856328769566, 0.08656245486414232, 0.04126307993028463, 0.024852800052800057, 0.08769492966397564, 0.02122345749250348, 0.016051051051051052, 0.005295508274231678]
self._exitloop False
... (per-sequence progress output elided: "in full update routine ..." / "sequences left N" lines) ...
average error : [0, 0.36756856328769566, 0.08656245486414232, 0.04126307993028463, 0.024852800052800057, 0.08769492966397564, 0.02122345749250348, 0.016051051051051052, 0.005295508274231678, 0.0]
self._exitloop True
**************************************************

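The output of the next cell distinguishes an "early update routine" from a "full update routine". This reflects the early-update strategy for beam search (in the style of Collins and Roark): if the gold-standard label prefix falls off the beam before the end of the sequence, the weights are updated at that point; if the gold path survives to the end, a regular full update is applied. A toy sketch of that decision logic, using a made-up local scoring table rather than PySeqLab's internal API:

```python
# Toy illustration of the early-update decision in beam search (not PySeqLab code).
# At each position we keep the `beam_size` highest-scoring label prefixes; if the
# gold prefix is no longer on the beam, an early update would be triggered there.

def beam_decode_with_early_update(scores, gold, beam_size):
    """scores[t][label] -> toy local score of `label` at position t.
    Returns ('early', t) if the gold prefix falls off the beam at position t,
    otherwise ('full', len(gold))."""
    beam = [()]  # start with the empty prefix
    for t, local in enumerate(scores):
        # expand every prefix on the beam with every candidate label
        candidates = [prefix + (label,) for prefix in beam for label in local]
        # keep the top `beam_size` prefixes by accumulated score
        candidates.sort(key=lambda p: sum(scores[i][lab] for i, lab in enumerate(p)),
                        reverse=True)
        beam = candidates[:beam_size]
        if gold[:t + 1] not in beam:
            return ('early', t)          # gold pruned -> early update here
    return ('full', len(gold))           # gold survived -> full update

# the gold path scores poorly at position 1, so a narrow beam prunes it
scores = [{'A': 2.0, 'B': 1.0}, {'A': 3.0, 'B': 0.1}]
gold = ('A', 'B')
print(beam_decode_with_early_update(scores, gold, beam_size=1))  # ('early', 1)
print(beam_decode_with_early_update(scores, gold, beam_size=4))  # ('full', 2)
```

As the logs below show, early epochs trigger mostly early updates; as the weights improve, the gold path stays on the beam longer and full updates dominate.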
In [12]:
# using beam search with a beam of size 5 and the early-update strategy
optimization_options = {"method": "",
                        "num_epochs": 10,
                        "beam_size": 5,
                        "update_type": 'early'
                        }
# train models using COLLINS-PERCEPTRON and SAPO
for fold in data_split:
    train_seqs_id = data_split[fold]['train']
    for method in ("COLLINS-PERCEPTRON", "SAPO"):
        optimization_options['method'] = method
        # SAPO updates using the top-5 decoded sequences,
        # while the structured perceptron uses only the best one
        topK = 5 if method == 'SAPO' else 1
        optimization_options['topK'] = topK
        print("training using optimization options:")
        print(optimization_options)
        # make sure we are initializing the weights to 0
        crf_m.weights.fill(0)
        model_dir = workflow.train_model(train_seqs_id, crf_m, optimization_options)
        print("*"*50)
        avg_ll = ReaderWriter.read_data(os.path.join(model_dir, 'avg_decodingerror_training'))
        plt.plot(avg_ll[1:], label="method:{}, topK:{}, beam_size:5".format(method, topK))
        trained_models_dir[method] = model_dir
    plt.legend(loc='upper right')
    plt.xlabel('number of epochs')
    plt.ylabel('estimated decoding error')
training using optimization options:
{'beam_size': 5, 'num_epochs': 10, 'topK': 1, 'update_type': 'early', 'method': 'COLLINS-PERCEPTRON'}
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 1.0
average error : [0, 0.6976666666666665]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.4077144081661351
average error : [0, 0.6976666666666665, 0.29353817235396185]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.33713468766872284
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.056677082089286616
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457, 0.12990708000838924]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.4852779451842126
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457, 0.12990708000838924, 0.04501920961920962]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.12363332258808224
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457, 0.12990708000838924, 0.04501920961920962, 0.05772136867881548]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.18861040669611542
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457, 0.12990708000838924, 0.04501920961920962, 0.05772136867881548, 0.03940274928891949]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.6100237760429721
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457, 0.12990708000838924, 0.04501920961920962, 0.05772136867881548, 0.03940274928891949, 0.009544042522765927]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.14720119729216474
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457, 0.12990708000838924, 0.04501920961920962, 0.05772136867881548, 0.03940274928891949, 0.009544042522765927, 0.012838827838827838]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.5527881867361767
average error : [0, 0.6976666666666665, 0.29353817235396185, 0.14551733201821457, 0.12990708000838924, 0.04501920961920962, 0.05772136867881548, 0.03940274928891949, 0.009544042522765927, 0.012838827838827838, 0.04457838457838458]
self._exitloop False
**************************************************
training using optimization options:
{'beam_size': 5, 'num_epochs': 10, 'topK': 5, 'update_type': 'early', 'method': 'SAPO'}
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 1.0
average error : [0, 0.5474681984681985]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.4295945587937496
average error : [0, 0.5474681984681985, 0.21843874360933188]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.15706049183805587
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168]
self._exitloop False
... (per-sequence progress output elided: "in early update routine ..." / "in full update routine ..." / "sequences left N" lines) ...
reldiff = 0.20990757105450653
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168, 0.1039191822416594]
self._exitloop False
in early update routine ...
sequences left 24
...
sequences left 0
reldiff = 0.35525340001973443
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168, 0.1039191822416594, 0.0494383850444971]
self._exitloop False
in full update routine ...
sequences left 24
...
sequences left 0
reldiff = 0.2153737434450575
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168, 0.1039191822416594, 0.0494383850444971, 0.031916647201568574]
self._exitloop False
in full update routine ...
sequences left 24
...
sequences left 0
reldiff = 0.09584283378687893
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168, 0.1039191822416594, 0.0494383850444971, 0.031916647201568574, 0.03868312990409764]
self._exitloop False
in full update routine ...
sequences left 24
...
sequences left 0
reldiff = 0.02333852961072602
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168, 0.1039191822416594, 0.0494383850444971, 0.031916647201568574, 0.03868312990409764, 0.040531892039338845]
self._exitloop False
in early update routine ...
sequences left 24
...
sequences left 0
reldiff = 0.025906060210627895
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168, 0.1039191822416594, 0.0494383850444971, 0.031916647201568574, 0.03868312990409764, 0.040531892039338845, 0.038484878815913295]
self._exitloop False
in full update routine ...
sequences left 24
...
sequences left 0
reldiff = 0.09653268202446676
average error : [0, 0.5474681984681985, 0.21843874360933188, 0.15913657790619168, 0.1039191822416594, 0.0494383850444971, 0.031916647201568574, 0.03868312990409764, 0.040531892039338845, 0.038484878815913295, 0.04670885879963043]
self._exitloop False
**************************************************
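The reldiff values printed at the end of each pass measure the relative change between successive average errors; plugging the logged numbers into |prev − curr| / (|prev| + |curr|) reproduces them. A minimal sketch of such a convergence check (the tolerance threshold below is a hypothetical value chosen for illustration, not PySeqLab's default):

```python
def reldiff(prev_err, curr_err):
    # relative difference between successive average errors;
    # this formula reproduces the reldiff values in the log above
    return abs(prev_err - curr_err) / (abs(prev_err) + abs(curr_err))

# average errors logged after the first three passes
avg_error = [0.5474681984681985, 0.21843874360933188, 0.15913657790619168]
tolerance = 1e-3  # hypothetical stopping threshold

for prev, curr in zip(avg_error, avg_error[1:]):
    rd = reldiff(prev, curr)
    print("reldiff =", rd)
    if rd < tolerance:
        break  # training would stop once the error plateaus
```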

4. Using trained CRFs model

The final part of the workflow is to evaluate the trained models on test/validation sequences (i.e. sequences that were not used for building and training the CRFs model). In our previous setup, we used all the sequences for training, but we can still measure the performance of the trained models on those same sequences. Although we are overfitting here, it is still worthwhile to compare the different training methods using a common performance measure.

The use_model(args) method in the GenericWorkflowTrainer class is used for:
A) reviving a trained model to decode sequences,
B) writing the decoded sequences to a file, and/or
C) evaluating the decoding performance based on a specified metric.

The arguments for the use_model(args) method are:

Args:

  • savedmodel_dir: the path to the trained model dumped on disk. In our setting/example, trained_models_dir contains the directories of the models we trained.
  • options: dictionary that specifies the options needed to perform tasks (A, B and C). Generally, we have two scenarios:
    • If we have already parsed and processed the sequences, as in our example, then we use the seqs_info key, passing the seqs_info instance attribute of our workflow trainer (see code example).

      Example:

      Sequences are parsed and processed (i.e. using seqs_info)
      options = {'seqs_info': dictionary comprising the processed info of the sequences saved on disk,
                 'seqbatch_size': integer, number of sequences in a batch to process,
                 'model_eval': boolean, deciding whether to evaluate performance,
                 'metric': {'f1', 'precision', 'recall', 'accuracy'},
                 'exclude_states': list of labels to exclude from the decoding evaluation,
                 'file_name': string, name of the file to which decoded sequences will be written,
                 'sep': string, separator between the columns
                       (i.e. the tracks, reference label and predicted label when written to file),
                 'beam_size': integer, the number of states to keep while decoding with beam search
                 }
      
    • If we have new sequences in a file, then we use the seq_file key, passing the path to the file containing the sequences to decode (see code example).

      Example:

      Sequences are still in a file (i.e. using seq_file)
      options = {'seq_file': path to the sequences file,
                 'data_parser_options': dictionary stating the options to be used by the parser
                                        for reading the seq_file (i.e. see here for more info),
                 'num_seqs': integer, specifying the maximum number of sequences to read from the given file,
                 'seqbatch_size': integer, number of sequences in a batch to process,
                 'model_eval': boolean, deciding whether to evaluate performance,
                 'metric': {'f1', 'precision', 'recall', 'accuracy'},
                 'exclude_states': list of labels to exclude from the decoding evaluation,
                 'file_name': string, name of the file to which decoded sequences will be written,
                 'sep': string, separator between the columns
                       (i.e. the tracks, reference label and predicted label when written to file),
                 'beam_size': integer, the number of states to keep while decoding with beam search
                 }
      

In [13]:
# case of using seqs_info
# use_options keys:
#   seqs_info -- we already parsed and processed the sequences
#   model_eval = True -- we want to evaluate model performance
#   metric = 'f1' -- F1 score
# the other keys are left unspecified; by default, the decoded sequences
# are not written to a file and the full beam is used while decoding
use_options = {'seqs_info': workflow.seqs_info,
               'model_eval': True,
               'metric': 'f1'
              }
print("Using seqs_info option")
for method, model_dir in trained_models_dir.items():
    perf = workflow.use_model(model_dir, use_options)
    print("model trained using {} method".format(method))
    metric_chosen = use_options['metric']
    print("metric:{}, value:{}".format(metric_chosen, perf[metric_chosen]))
    print()
Using seqs_info option
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.8366477272727272
model trained using SGA-ADADELTA method
metric:f1, value:0.8366477272727272

sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9730113636363636
model trained using COLLINS-PERCEPTRON method
metric:f1, value:0.9730113636363636

sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 1.0
model trained using L-BFGS-B method
metric:f1, value:1.0

sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9815340909090909
model trained using SVRG method
metric:f1, value:0.9815340909090909

sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9772727272727273
model trained using SAPO method
metric:f1, value:0.9772727272727273

sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9573863636363636
model trained using SGA method
metric:f1, value:0.9573863636363636
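Since every model decoded the same 25 training sequences, the reported F1 scores can be collected and ranked directly. The dictionary below simply restates the logged values; keep in mind these measure fit to the training data, not generalization:

```python
# F1 scores reported above for each training method
f1_scores = {
    'SGA-ADADELTA': 0.8366477272727272,
    'COLLINS-PERCEPTRON': 0.9730113636363636,
    'L-BFGS-B': 1.0,
    'SVRG': 0.9815340909090909,
    'SAPO': 0.9772727272727273,
    'SGA': 0.9573863636363636,
}
# rank methods from best to worst training-set F1
for method, f1 in sorted(f1_scores.items(), key=lambda kv: -kv[1]):
    print("{:<20} {:.4f}".format(method, f1))
```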

In [14]:
train_file = os.path.join(dataset_dir, 'train.txt')
# case of using seq_file
# use_options keys:
#   seq_file -- path to the CoNLL train.txt file
#   data_parser_options -- dictionary stating the options used to parse the train.txt file
#   model_eval = True -- we want to evaluate model performance
#   metric = 'f1' -- F1 score
# the other keys are left unspecified; by default, the decoded sequences
# are not written to a file and the full beam is used while decoding
parser_options = {'header': 'main', # 'main' means the header is found in the first line of the file
                  'y_ref': True, # boolean indicating if the label to predict is already found in the file
                  'column_sep': " ", # separator between the columns (words/observations and labels)
                  'seg_other_symbol': None # None means the observations are not segmented
                  }
use_options = {'seq_file':train_file,
               'data_parser_options':parser_options,
               'num_seqs':25,
               'model_eval':True,
               'metric':'f1'
              }
print("Using seq_file option")
for method, model_dir in trained_models_dir.items():
    perf = workflow.use_model(model_dir, use_options)
    print("model trained using {} method".format(method))
    metric_chosen = use_options['metric']
    print("metric:{}, value:{}".format(metric_chosen, perf[metric_chosen]))
    print()
Using seq_file option
identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.8366477272727272
model trained using SGA-ADADELTA method
metric:f1, value:0.8366477272727272

identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9730113636363636
model trained using COLLINS-PERCEPTRON method
metric:f1, value:0.9730113636363636

identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 1.0
model trained using L-BFGS-B method
metric:f1, value:1.0

identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9815340909090909
model trained using SVRG method
metric:f1, value:0.9815340909090909

identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9772727272727273
model trained using SAPO method
metric:f1, value:0.9772727272727273

identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left
f1 0.9573863636363636
model trained using SGA method
metric:f1, value:0.9573863636363636

A direct approach for using a trained CRFs model, without going through the GenericTrainingWorkflow class or its subclasses, is the generate_trained_model(args) function found in the utilities module. The function takes the following arguments:

  • model_parts_dir: path to the model_parts folder of a trained CRFs model (see here for a refresher)
  • aextractor_obj: initialized instance of a class/subclass of GenericAttributeExtractor

Below is an example showing how to revive a trained model (this one was trained using the COLLINS-PERCEPTRON method). Once we have the revived model, it is ready for decoding sequences. For that purpose, we use the main decoding method
decode_seqs(decoding_method, out_dir, **kwargs) (see the code below for a demo)

Args:

  1. decoding_method: string specifying the decoding method, one of {'viterbi', 'per_state_decoding'}. Since 'per_state_decoding' is not supported by all decoders, we primarily use 'viterbi' for now.
  2. out_dir: string representing the path to the output directory where sequence parsing will take place
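Since 'viterbi' does the heavy lifting here, it is worth recalling what it computes: the single highest-scoring label path under the model's scores. Below is a minimal, generic sketch of first-order Viterbi decoding over toy scores; this is an illustration only, not PySeqLab's implementation, and the function name and score dictionaries are hypothetical:

```python
def viterbi(obs_scores, trans_scores, labels):
    """Generic first-order Viterbi decoder (toy sketch, not PySeqLab's).

    obs_scores:   list of {label: score} dicts, one per position
    trans_scores: {(prev_label, label): score}
    """
    # best[t][y] = (score of best path ending in label y at position t, backpointer)
    best = [{y: (obs_scores[0].get(y, 0.0), None) for y in labels}]
    for t in range(1, len(obs_scores)):
        layer = {}
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p][0]
                  + trans_scores.get((p, y), 0.0)
                  + obs_scores[t].get(y, 0.0)) for p in labels),
                key=lambda item: item[1])
            layer[y] = (score, prev)
        best.append(layer)
    # backtrack from the highest-scoring final label
    y = max(best[-1], key=lambda lbl: best[-1][lbl][0])
    path = [y]
    for t in range(len(best) - 1, 0, -1):
        y = best[t][y][1]
        path.append(y)
    return list(reversed(path))

labels = ["B-NP", "I-NP"]
obs = [{"B-NP": 2.0}, {"I-NP": 1.5}, {"I-NP": 1.0}]
trans = {("B-NP", "I-NP"): 1.0, ("I-NP", "I-NP"): 1.0}
print(viterbi(obs, trans, labels))  # ['B-NP', 'I-NP', 'I-NP']
```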


Keyword arguments:

The main ones to specify are:

  • one of the following options for passing the sequences: {'seqs', 'seqs_dict', 'seqs_info'}
    • seqs: list of sequences that are instances of the SequenceStruct class
    • seqs_dict: dictionary mapping sequence ids (keys) to sequences that are instances of the SequenceStruct class (values)
    • seqs_info: dictionary containing info about sequences that are already parsed and processed
  • file_name: optional string giving the name of the file to which the CRFs model writes the decoded sequences
  • sep: string used as the separator between columns/observations when writing decoded sequences to the file specified by file_name
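Since file_name and sep control how decoded sequences are written out, reading such a file back is straightforward. Below is a minimal sketch, assuming one token per line with sep-separated columns and a blank line between sequences; this layout is an assumption for illustration, not a documented guarantee of decode_seqs, and the helper function is hypothetical:

```python
def parse_decoded_file(text, sep="\t"):
    """Split decoded output into sequences of column tuples.

    Assumes one token per line and a blank line between sequences --
    an assumption about the file layout, not a documented guarantee.
    """
    sequences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # blank line closes the current sequence
            if current:
                sequences.append(current)
                current = []
        else:
            current.append(tuple(line.split(sep)))
    if current:  # flush the last sequence if the file lacks a trailing blank
        sequences.append(current)
    return sequences
```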

Now that we know how to revive a CRFs model and use it for decoding sequences, we need to evaluate how good our decoding is. This is tackled using the SeqDecodingEvaluator class, whose constructor takes an instance of a CRFs model representation class (see this table for a refresher). The example below demonstrates the use of the SeqDecodingEvaluator class for evaluating the decoded sequences; in addition, it shows how to obtain and display the confusion matrix for every decoded label/state.

In [15]:
#print(trained_models_dir)
from pyseqlab.utilities import generate_trained_model
# revive a model that was trained using COLLINS-PERCEPTRON method
model_parts_dir = os.path.join(trained_models_dir['COLLINS-PERCEPTRON'], 'model_parts')
crf_percep = generate_trained_model(model_parts_dir, generic_attr_extractor)
print(crf_percep)
print()

# use viterbi for decoding
decoding_method = 'viterbi'
# use the same directory of the model
output_dir = trained_models_dir['COLLINS-PERCEPTRON']
# use tab as separator
sep = "\t"
decoded_sequences = crf_percep.decode_seqs(decoding_method, output_dir, seqs=seqs[:25], file_name='tutorial_seqs_decoding.txt', sep=sep)
print()
# display the decoded sequences
for seq_id in decoded_sequences:
    print("seq_id ", seq_id)
    print("predicted labels:")
    print(decoded_sequences[seq_id]['Y_pred'])
    print("reference labels:")
    print(decoded_sequences[seq_id]['seq'].flat_y)
    print("-"*50)
<pyseqlab.fo_crf.FirstOrderCRF object at 0x7fbd68351128>

identifying model active features -- processed seqs:  1
...
identifying model active features -- processed seqs:  25
sequence decoded -- 24 sequences are left
...
sequence decoded -- 0 sequences are left

seq_id  1
predicted labels:
['B-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-SBAR', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'B-ADJP', 'B-PP', 'B-NP', 'I-NP', 'O', 'I-VP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'B-NP', 'B-NP', 'I-NP', 'I-NP', 'O']
reference labels:
['B-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-SBAR', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'B-ADJP', 'B-PP', 'B-NP', 'B-NP', 'O', 'B-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-NP', 'I-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  2
predicted labels:
['O', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
reference labels:
['O', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  3
predicted labels:
['O', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-NP', 'I-NP', 'O']
reference labels:
['O', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  4
predicted labels:
['B-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'O', 'B-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'O']
reference labels:
['B-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'O', 'B-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'O']
--------------------------------------------------
seq_id  5
predicted labels:
['O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'B-ADVP', 'I-ADVP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'O', 'B-VP', 'B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
reference labels:
['O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'B-ADVP', 'I-ADVP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'O', 'B-VP', 'B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  6
predicted labels:
['O', 'B-SBAR', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'O', 'B-VP', 'B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
reference labels:
['O', 'B-SBAR', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'O', 'B-VP', 'B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  7
predicted labels:
['B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-ADVP', 'O', 'O', 'B-NP', 'I-NP', 'B-VP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'O']
reference labels:
['B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-ADVP', 'O', 'O', 'B-NP', 'I-NP', 'B-VP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'O', 'O', 'O', 'O', 'O', 'B-ADJP', 'O', 'O', 'O', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'O']
--------------------------------------------------
seq_id  8
predicted labels:
['B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'B-VP', 'I-VP', 'B-ADVP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'O']
reference labels:
['B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'B-VP', 'I-VP', 'B-ADVP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  9
predicted labels:
['B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-VP', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-SBAR', 'B-NP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'B-VP', 'I-VP', 'B-NP', 'O']
reference labels:
['B-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-VP', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-SBAR', 'B-NP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'B-VP', 'I-VP', 'B-NP', 'O']
--------------------------------------------------
seq_id  10
predicted labels:
['B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'B-VP', 'B-ADJP', 'I-ADJP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'O']
reference labels:
['B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'B-VP', 'B-ADJP', 'I-ADJP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'O', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  11
predicted labels:
['B-NP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O']
reference labels:
['B-NP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O']
--------------------------------------------------
seq_id  12
predicted labels:
['B-ADVP', 'O', 'B-NP', 'I-NP', 'B-VP', 'B-NP', 'B-VP', 'B-SBAR', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'B-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O']
reference labels:
['B-ADVP', 'O', 'B-NP', 'I-NP', 'B-VP', 'B-NP', 'B-VP', 'B-SBAR', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'B-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O']
--------------------------------------------------
seq_id  13
predicted labels:
['B-PP', 'B-PP', 'B-ADVP', 'I-ADVP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-PP', 'B-NP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
reference labels:
['B-PP', 'B-PP', 'B-ADVP', 'I-ADVP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'B-PP', 'B-NP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  14
predicted labels:
['B-NP', 'I-NP', 'O', 'B-NP', 'B-ADVP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-VP', 'B-SBAR', 'B-SBAR', 'I-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-ADJP', 'B-PP', 'B-NP', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'B-NP', 'B-SBAR', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-VP', 'B-NP', 'O']
reference labels:
['B-NP', 'I-NP', 'O', 'B-NP', 'B-ADVP', 'B-VP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-VP', 'B-SBAR', 'B-SBAR', 'I-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-ADJP', 'B-PP', 'B-NP', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'B-NP', 'B-SBAR', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-VP', 'B-NP', 'O']
--------------------------------------------------
seq_id  15
predicted labels:
['B-ADVP', 'O', 'B-NP', 'B-VP', 'O', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-PP', 'B-NP', 'I-NP', 'O']
reference labels:
['B-ADVP', 'O', 'B-NP', 'B-VP', 'O', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-PP', 'B-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  16
predicted labels:
['B-ADVP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'B-ADJP', 'I-ADJP', 'O']
reference labels:
['B-ADVP', 'O', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'B-ADJP', 'I-ADJP', 'O']
--------------------------------------------------
seq_id  17
predicted labels:
['B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-VP', 'B-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'B-SBAR', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-NP', 'O']
reference labels:
['B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'B-VP', 'B-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'B-SBAR', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-NP', 'O']
--------------------------------------------------
seq_id  18
predicted labels:
['B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'O']
reference labels:
['B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  19
predicted labels:
['O', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'B-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-ADVP', 'B-ADVP', 'O']
reference labels:
['O', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'B-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-ADVP', 'I-ADVP', 'O']
--------------------------------------------------
seq_id  20
predicted labels:
['B-NP', 'I-NP', 'B-VP', 'O', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-VP', 'B-ADVP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-ADVP', 'O']
reference labels:
['B-NP', 'I-NP', 'B-VP', 'O', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-VP', 'B-ADVP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-ADVP', 'O']
--------------------------------------------------
seq_id  21
predicted labels:
['B-NP', 'B-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O']
reference labels:
['B-NP', 'B-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'O']
--------------------------------------------------
seq_id  22
predicted labels:
['B-NP', 'I-NP', 'B-VP', 'B-NP', 'I-NP', 'B-VP', 'B-NP', 'I-NP', 'O', 'B-VP', 'B-ADVP', 'B-ADJP', 'I-ADJP', 'O', 'O', 'O', 'B-NP', 'B-SBAR', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'B-PP', 'I-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
reference labels:
['B-NP', 'I-NP', 'B-VP', 'B-NP', 'I-NP', 'B-VP', 'B-NP', 'I-NP', 'O', 'B-VP', 'B-ADVP', 'B-ADJP', 'I-ADJP', 'O', 'O', 'O', 'B-NP', 'B-SBAR', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'B-PP', 'B-NP', 'B-PP', 'I-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O']
--------------------------------------------------
seq_id  23
predicted labels:
['B-ADVP', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'B-ADJP', 'B-SBAR', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-ADVP', 'B-SBAR', 'B-ADJP', 'B-VP', 'I-VP', 'I-VP', 'B-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'B-ADVP', 'O']
reference labels:
['B-ADVP', 'O', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'B-ADJP', 'B-SBAR', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'B-ADVP', 'B-SBAR', 'B-ADJP', 'B-VP', 'I-VP', 'I-VP', 'B-SBAR', 'B-NP', 'I-NP', 'I-NP', 'B-VP', 'I-VP', 'B-NP', 'O', 'O', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'B-ADVP', 'O']
--------------------------------------------------
seq_id  24
predicted labels:
['B-NP', 'O', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-SBAR', 'B-NP', 'I-NP', 'O', 'B-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'O']
reference labels:
['B-NP', 'O', 'B-NP', 'B-VP', 'B-NP', 'I-NP', 'B-SBAR', 'B-NP', 'I-NP', 'O', 'B-VP', 'I-VP', 'I-VP', 'B-NP', 'I-NP', 'I-NP', 'B-PP', 'B-NP', 'I-NP', 'B-VP', 'I-VP', 'I-VP', 'B-PP', 'B-NP', 'I-NP', 'I-NP', 'O', 'O']
--------------------------------------------------
seq_id  25
predicted labels:
['B-NP', 'B-VP', 'B-NP', 'B-VP', 'B-NP', 'B-VP', 'B-NP', 'B-ADJP', 'B-PP', 'B-NP', 'I-NP', 'B-ADJP', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'B-SBAR', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'B-ADJP', 'B-SBAR', 'B-ADJP', 'O']
reference labels:
['B-NP', 'B-VP', 'B-NP', 'B-VP', 'B-NP', 'B-VP', 'B-NP', 'B-ADJP', 'B-PP', 'B-NP', 'I-NP', 'B-ADJP', 'B-PP', 'B-NP', 'I-NP', 'B-NP', 'I-NP', 'B-SBAR', 'B-NP', 'B-VP', 'I-VP', 'I-VP', 'B-ADJP', 'B-SBAR', 'B-ADJP', 'O']
--------------------------------------------------

In [16]:
from pyseqlab.crf_learning import SeqDecodingEvaluator
# initialize an evaluator
evaluator = SeqDecodingEvaluator(crf_percep.model)
# evaluate performance
Y_seqs_dict = GenericTrainingWorkflow.map_pred_to_ref_seqs(decoded_sequences)
taglevel_perf = evaluator.compute_states_confmatrix(Y_seqs_dict)
# each call below prints and returns the requested metric
perf = evaluator.get_performance_metric(taglevel_perf, "f1", exclude_states=[])
perf = evaluator.get_performance_metric(taglevel_perf, "precision", exclude_states=[])
perf = evaluator.get_performance_metric(taglevel_perf, "recall", exclude_states=[])
perf = evaluator.get_performance_metric(taglevel_perf, "accuracy", exclude_states=[])
print()
# demonstrate confusion matrix per label/state
for state, code in crf_percep.model.Y_codebook.items():
    print("confusion_matrix for state={}: ".format(state))
    print(taglevel_perf[code])
    print("-"*40)
f1 0.9730113636363636
precision 0.9730113636363636
recall 0.9730113636363636
accuracy 0.9961444805194806

confusion_matrix for state=B-VP: 
[[  68.    1.]
 [   1.  634.]]
----------------------------------------
confusion_matrix for state=I-ADJP: 
[[   3.    0.]
 [   0.  701.]]
----------------------------------------
confusion_matrix for state=I-SBAR: 
[[   1.    0.]
 [   0.  703.]]
----------------------------------------
confusion_matrix for state=I-PP: 
[[   1.    0.]
 [   0.  703.]]
----------------------------------------
confusion_matrix for state=O: 
[[  73.    1.]
 [   8.  622.]]
----------------------------------------
confusion_matrix for state=I-VP: 
[[  53.    1.]
 [   1.  649.]]
----------------------------------------
confusion_matrix for state=B-SBAR: 
[[  17.    0.]
 [   0.  687.]]
----------------------------------------
confusion_matrix for state=I-ADVP: 
[[   2.    0.]
 [   1.  701.]]
----------------------------------------
confusion_matrix for state=B-ADJP: 
[[  11.    0.]
 [   1.  692.]]
----------------------------------------
confusion_matrix for state=B-ADVP: 
[[  15.    1.]
 [   0.  688.]]
----------------------------------------
confusion_matrix for state=B-NP: 
[[ 170.    5.]
 [   3.  526.]]
----------------------------------------
confusion_matrix for state=I-NP: 
[[ 199.    9.]
 [   4.  492.]]
----------------------------------------
confusion_matrix for state=B-PP: 
[[  72.    1.]
 [   0.  631.]]
----------------------------------------
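Each matrix above is a 2x2 table for a single state, from which the per-state metrics can be recomputed by hand. Below is a minimal sketch using the B-VP counts, assuming the layout [[TP, FN], [FP, TN]]; the actual ordering used by SeqDecodingEvaluator may differ, so treat the layout as an assumption:

```python
# 2x2 confusion matrix for one state (B-VP above),
# assuming the layout [[TP, FN], [FP, TN]] -- an assumption.
tp, fn = 68.0, 1.0
fp, tn = 1.0, 634.0

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# accuracy counts true negatives too, which is why the printed
# accuracy (0.996) sits above the printed f1 (0.973)
accuracy = (tp + tn) / (tp + fn + fp + tn)
```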

5. Applications

There are still many aspects to explore and experiment with. We demonstrate the potential of the package through applications in three different areas:

  1. Natural language processing (e.g. shallow parsing, part-of-speech tagging, bio-medical named entity recognition)
  2. Wearable and sensor measurement sequences (human activity recognition)
  3. Classification of Eukaryotic splice-junction sequences

Each application has its own repository, including notebook tutorials covering:

  • statement of the problem
  • model building and training procedure
  • evaluation of the decoding performance
  • process of reviving and using the trained models

6. Literature & references

Bottou, L., & Le Cun, Y. (2004). Large Scale Online Learning. Advances in Neural Information Processing Systems, 16, 217–225.

Collins, M. (2002). Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing - EMNLP ’02 (pp. 1–8). doi:10.3115/1118693.1118694

Cuong, V. N., Ye, N., Lee, W. S., & Chieu, H. L. (2014). Conditional Random Field with High-order Dependencies for Sequence Labeling and Segmentation. Journal of Machine Learning Research, 15, 981–1009.

Huang, L., Fayong, S., & Guo, Y. (2012). Structured perceptron with inexact search. 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 142–151. Retrieved from http://dl.acm.org/citation.cfm?id=2382029.2382049

Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent using Predictive Variance Reduction. Advances in Neural Information Processing Systems 26, 1(3), 315–323.

Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., & Collier, N. (2004). Introduction to the Bio-entity Recognition Task at JNLPBA. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, 70–75. doi:10.3115/1567594.1567610

Lafferty, J., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML ’01 Proceedings of the Eighteenth International Conference on Machine Learning, 282–289.

Sarawagi, S., & Cohen, W. W. (2005). Semi-Markov Conditional Random Fields for Information Extraction. Advances in Neural Information Processing Systems, 1185–1192.

Soong, F. K., & Huang, E.-F. (1990). A tree-trellis based fast search for finding the N Best sentence hypotheses in continuous speech recognition. Proceedings of the Workshop on Speech and Natural Language - HLT ’90, 12–19. doi:10.3115/116580.116591

Sun, X. (2015). Towards Shockingly Easy Structured Classification: A Search-based Probabilistic Online Learning Framework. Retrieved from http://arxiv.org/abs/1503.08381

Tsuruoka, Y., Tsujii, J., & Ananiadou, S. (2009). Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 1, 477. doi:10.3115/1687878.1687946

Vieira, T., Cotterell, R., & Eisner, J. (2016). Speed-Accuracy Tradeoffs in Tagging with Variable-Order CRFs and Structured Sparsity. In EMNLP.

Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269. doi:10.1109/TIT.1967.1054010

Ye, N., Lee, W. S., Chieu, H. L., & Wu, D. (2009). Conditional Random Fields with High-Order Features for Sequence Labeling. Neural Information Processing Systems, 2, 2. Retrieved from http://www.comp.nus.edu.sg/~leews/publications/nips09_paper.pdf

Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. Retrieved from http://arxiv.org/abs/1212.5701
