A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation

Shankar Kumar and William Byrne
Center for Language and Speech Processing, Johns Hopkins University,
3400 North Charles Street, Baltimore, MD, 21218, USA
{skumar,byrne}@jhu.edu

Abstract

We present a derivation of the alignment template model for statistical machine translation and an implementation of the model using weighted finite state transducers. The approach we describe allows us to implement each constituent distribution of the model as a weighted finite state transducer or acceptor. We show that bitext word alignment and translation under the model can be performed with standard FSM operations involving these transducers. One of the benefits of using this framework is that it obviates the need to develop specialized search procedures, even for the generation of lattices or N-best lists of bitext word alignments and translation hypotheses. We evaluate the implementation of the model on the French-to-English Hansards task and report alignment and translation performance.

1 Introduction

The Alignment Template Translation Model (ATTM) (Och et al., 1999) has emerged as a promising modeling framework for statistical machine translation. The ATTM attempts to overcome the deficiencies of word-to-word translation models (Brown et al., 1993) through the use of phrasal translations. The overall model is based on a two-level alignment between the source and the target sentence: a phrase-level alignment between source and target phrases, and a word-level alignment between the words in these phrase pairs.

The goal of this paper is to reformulate the ATTM so that the operations we intend to perform under a statistical translation model, namely bitext word alignment and translation, can be implemented using standard weighted finite state transducer (WFST) operations.
Our main motivation for a WFST modeling framework lies in the resulting simplicity of the alignment and translation processes compared to dynamic programming or A* decoders. The WFST implementation allows us to use standard optimized algorithms available from an off-the-shelf FSM toolkit (Mohri et al., 1997). This avoids the need to develop specialized search procedures, even for the generation of lattices or N-best lists of bitext word alignment or translation hypotheses.

[Figure 1: ATTM Architecture.]

Weighted Finite State Transducers for Statistical Machine Translation (SMT) have been proposed in the literature to implement word-to-word translation models (Knight and Al-Onaizan, 1998) or to perform translation in an application domain such as the call routing task (Bangalore and Riccardi, 2001). One of the objectives of these approaches has been to provide an implementation for SMT that uses standard FSM algorithms to perform model computations and therefore make SMT techniques accessible to a wider community. Our WFST implementation of the ATTM has been developed with similar objectives.

We start off by presenting a derivation of the ATTM that identifies the conditional independence assumptions that underlie the model. The derivation allows us to specify each component distribution of the model and implement it as a weighted finite state transducer. We then show that bitext word alignment and translation can be performed with standard FSM operations involving these transducers. Finally, we report bitext word alignment and translation performance of the implementation on the Canadian French-to-English Hansards task.

Proceedings of HLT-NAACL 2003, Main Papers, pp. 63-70, Edmonton, May-June 2003

2 Alignment Template Translation Models

We present here a derivation of the alignment template translation model (ATTM) (Och et al., 1999; Och, 2002) and give an implementation of the model using weighted finite state transducers (WFSTs). The finite state modeling is performed using the AT&T FSM Toolkit (Mohri et al., 1997).

In this model, the translation of a source language sentence to a target language sentence is described by a joint probability distribution over all possible segmentations and alignments. This distribution is presented in Figure 1 and Equations 1-7:

P(f_1^J, v_1^K, a_1^K, z_1^K, u_1^K, e_1^I, K)   (1)
  = P(f_1^J)   (2)
  × P(K, v_1^K | f_1^J)   (3)
  × P(a_1^K | K, v_1^K, f_1^J)   (4)
  × P(z_1^K | a_1^K, K, v_1^K, f_1^J)   (5)
  × P(u_1^K | z_1^K, a_1^K, K, v_1^K, f_1^J)   (6)
  × P(e_1^I | u_1^K, z_1^K, a_1^K, K, v_1^K, f_1^J)   (7)

The components of the overall translation model are the source language model (Term 2), the source segmentation model (Term 3), the phrase permutation model (Term 4), the template sequence model (Term 5), the phrasal translation model (Term 6) and the target language model (Term 7). Each of these conditional distributions is modeled independently, and we now define each in turn and present its implementation as a weighted finite state acceptor or transducer.

We begin by distinguishing words and phrases. We assume that u is a phrase in the target language sentence that has length M and consists of the words e_1, ..., e_M. Similarly, a phrase v in the source language sentence contains the words f_0, f_1, ..., f_N, where f_0 is the NULL token. We assume that each word in each language can be assigned to a unique class, so that u unambiguously specifies a class sequence E_1^M and v specifies the class sequence F_0^N. Throughout the model, if a sentence f_1^J is segmented into phrases v_1^K, we write v_1^K = f_1^J to indicate that the words in the phrase sequence agree with the original sentence.
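These conventions (word classes, phrase class sequences, and the agreement condition v_1^K = f_1^J) can be sketched in a few lines of Python. The class labels and the example sentence below are illustrative assumptions, not data from the paper.

```python
# Sketch of the basic objects of Section 2: words, word classes, phrases,
# and the agreement condition v_1^K = f_1^J.
# The class labels below are invented for illustration.

WORD_CLASS = {"nous": "PRON", "avons": "VERB", "une": "DET",
              "inflation": "NOUN", "galopante": "ADJ"}

def class_sequence(phrase):
    """Map a phrase (a list of words) to its word-class sequence."""
    return [WORD_CLASS[w] for w in phrase]

def agrees(segmentation, sentence):
    """True iff concatenating the phrases reproduces the sentence."""
    return [w for phrase in segmentation for w in phrase] == sentence

sentence = "nous avons une inflation galopante".split()
segmentation = [["nous", "avons"], ["une", "inflation", "galopante"]]
print(agrees(segmentation, sentence))       # True
print(class_sequence(segmentation[1]))      # ['DET', 'NOUN', 'ADJ']
```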
Source Language Model: The model assigns probability to any sentence f_1^J in the source language; this probability is not actually needed by the translation process when f_1^J is given. As the first component in the model, a finite state acceptor F is constructed for f_1^J.

Source Segmentation Model: We introduce the phrase count random variable K, which specifies the number of phrases in a particular segmentation of the source language sentence. For a sentence of length J, there are C(J-1, K-1) ways to segment it into K phrases. Motivated by this, we choose the distribution P(K | J) as

P(K = k | J) = C(J-1, k-1) / 2^(J-1)   (8)

so that Σ_{k=1}^{J} P(K = k | J) = 1.

We construct a joint distribution over all phrase segmentations v_1^K as

P(K, v_1^K | f_1^J) = P(K | J) P(v_1^K | K, f_1^J)   (9)

where P(v_1^K | K, f_1^J) = α(K, f_1^J) Π_{k=1}^{K} P_u(v_k) if v_1^K = f_1^J, and 0 otherwise. The normalization constant α(K, f_1^J) is chosen so that Σ_{v_1^K} P(v_1^K | K, f_1^J) = 1. Here, P_u(v) is a "unigram" distribution over source language phrases; we assume that we have an inventory of phrases from which this quantity can be estimated. In this way, the likelihood of a particular segmentation is determined by the likelihood of the phrases that result.

We now describe the finite state implementation of the source segmentation model and show how to compute the most likely segmentation under the model, {K̂, v̂_1^K} = argmax_{K, v_1^K} P(K, v_1^K | f_1^J).

1. For each source language sentence f_1^J to be translated, we implement a weighted finite state transducer W that segments the sentence into all possible phrase sequences v_1^K permissible given the inventory of phrases. The score of a segmentation v_1^K under W is given by Π_{k=1}^{K} P_u(v_k). We then generate a lattice of segmentations of f_1^J (implemented as an acceptor S) by composing it with the transducer W, i.e. S = F ∘ W.
2. We then decompose S into disjoint subsets S_1, ..., S_J so that S_K contains all segmentations of the source language sentence with exactly K phrases. To construct S_K, we create an unweighted acceptor N_K that accepts any phrase sequence of length K; for efficiency, the phrase vocabulary is restricted to the phrases in S. S_K is then obtained by the finite state composition S_K = S ∘ N_K.

3. For each K = 1, ..., J, the normalization factors α(K, f_1^J) are obtained by summing the probabilities of all segmentations in S_K. This sum can be computed efficiently using lattice forward probabilities (Wessel et al., 1998). For a fixed K, the most likely segmentation in S_K is found as

v̂_1^K(K) = argmax_{v_1^K ∈ S_K} P(v_1^K | K, f_1^J)   (10)

4. Finally, we select the optimal segmentation as

K̂ = argmax_{K} P(K | J) P(v̂_1^K(K) | K, f_1^J)   (11)

A portion of the segmentation transducer W for the French sentence "nous avons une inflation galopante" is presented in Figure 2. When composed with F, W generates the following two phrase segmentations: "nous avons une_inflation_galopante" and "nous_avons_une inflation_galopante". The "_" symbol is used to indicate phrases formed by concatenation of consecutive words. The phrases specified by the source segmentation model remain in the order in which they appear in the source sentence.

[Figure 2: A portion of the phrase segmentation transducer for the sentence "nous avons une inflation galopante".]

Phrase Permutation Model: We now define a model for the reordering of phrase sequences as determined by the previous model. The phrase alignment sequence a_1^K specifies a reordering of the phrases into target language phrase order; the words within the phrases remain in the source language order. The phrase sequence v_1^K is reordered into v_{a_1}, v_{a_2}, ..., v_{a_K}. The phrase alignment sequence is modeled as a first order Markov process

P(a_1^K | K, v_1^K, f_1^J) = Π_{k=1}^{K} P(a_k | a_{k-1})   (12)

with a_0 = 0. The alignment sequence distribution is constructed to assign decreasing likelihood to phrase re-orderings that diverge from the original word order.
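For short sentences, the segmentation model of Equations 8-11 can be checked by brute-force enumeration rather than with transducers. The sketch below is a minimal illustration: the phrase inventory and the unigram probabilities P_u(v) are invented, and the per-K normalization α(K, f_1^J) of Equation 9 is omitted for simplicity.

```python
from math import comb

def p_k_given_j(k, j):
    """Equation 8: P(K = k | J) = C(J-1, k-1) / 2^(J-1)."""
    return comb(j - 1, k - 1) / 2 ** (j - 1)

def segmentations(words, inventory):
    """Yield every split of `words` into phrases drawn from `inventory`."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        phrase = " ".join(words[:i])
        if phrase in inventory:
            for rest in segmentations(words[i:], inventory):
                yield [phrase] + rest

# Invented unigram phrase probabilities P_u(v).
P_u = {"nous": 0.1, "avons": 0.1, "nous avons": 0.05,
       "une": 0.1, "inflation": 0.1, "galopante": 0.1,
       "inflation galopante": 0.04, "une inflation galopante": 0.02}

sentence = "nous avons une inflation galopante".split()
J = len(sentence)

# Score each segmentation by P(K|J) * prod_k P_u(v_k); the per-K
# normalization alpha(K, f) of Equation 9 is omitted in this toy argmax.
best = None
for seg in segmentations(sentence, P_u):
    score = p_k_given_j(len(seg), J)
    for v in seg:
        score *= P_u[v]
    if best is None or score > best[0]:
        best = (score, seg)

print(best[1])   # ['nous avons', 'une inflation galopante']
```

Under these toy numbers the two-phrase segmentation wins, since P(K | J) concentrates its mass on moderate K and longer phrases replace several unigram factors with a single one.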
Suppose v_{a_k} = v_i and v_{a_{k-1}} = v_j; we set the Markov chain probabilities as follows (Och et al., 1999):

P(a_k | a_{k-1}) ∝ p_0^{|i - j - 1|}   (13)

In the above equation, p_0 is a tuning factor, and we normalize the probabilities so that Σ_{a_k} P(a_k | a_{k-1}) = 1.

The finite state implementation of this model involves two acceptors. We first build an unweighted permutation acceptor P that contains all permutations of the phrase sequence v_1^K in the source language (Knight and Al-Onaizan, 1998). We note that a path through P corresponds to an alignment sequence a_1^K. Figure 3 shows the acceptor P for the source phrase sequence "nous avons une_inflation_galopante".

A source phrase sequence v_1^K of K phrases requires a permutation acceptor P of 2^K states. For long phrase sequences we compute the score p_0^{|a_k - a_{k-1} - 1|} for each arc and then prune the arcs by this score, i.e. phrase alignments containing the transition a_{k-1} → a_k are included only if this score is above a threshold. Pruning can therefore be applied while P is constructed.

[Figure 3: The permutation acceptor P for the source-language phrase sequence "nous avons une_inflation_galopante".]

The second acceptor Q in the implementation of the phrase permutation model assigns the alignment probabilities (Equation 13) to a given permutation of the source phrase sequence (Figure 4). In this example, the phrases in the source phrase sequence are v_1 (nous), v_2 (avons) and v_3 (une_inflation_galopante).
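The Markov chain of Equations 12 and 13 can be tabulated directly for a small K. The sketch below assumes a tuning factor p_0 = 0.5 (an illustrative value; the paper treats p_0 as a parameter to be tuned) and normalizes over all successor positions, leaving it to the permutation acceptor to enforce that each phrase is used exactly once.

```python
def jump_prob_table(K, p0=0.5):
    """Equation 13: P(a_k = i | a_{k-1} = j) proportional to p0^|i - j - 1|,
    normalized over the successor positions i = 1..K.  Row j = 0 plays the
    role of the sentence start (a_0 = 0).  p0 = 0.5 is an assumed value
    for the tuning factor."""
    table = {}
    for j in range(0, K + 1):
        scores = {i: p0 ** abs(i - j - 1) for i in range(1, K + 1)}
        z = sum(scores.values())
        table[j] = {i: s / z for i, s in scores.items()}
    return table

t = jump_prob_table(3)
# Monotone (left-to-right) jumps i = j + 1 always get the largest probability,
# so the identity permutation is the most likely alignment sequence.
print(t[0])
```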
We now show the computation of the alignment probabilities (Equation 13) in this example: each transition a_{k-1} → a_k, with v_{a_k} = v_i and v_{a_{k-1}} = v_j, receives the unnormalized score p_0^{|i - j - 1|}, and normalizing these scores over the successor positions gives the arc probabilities shown in Figure 4.

[Figure 4: Acceptor Q that assigns probabilities to permutations of the source language phrase sequence "nous avons une_inflation_galopante".]

Template Sequence Model: Here we describe the main component of the model. An alignment template z = (F_0^N, E_1^M, A) specifies the allowable alignments between the class sequences F_0^N and E_1^M. A is an (N+1) × M binary, 0/1-valued matrix which is constructed as follows: if F_j can be aligned to E_i, then A(j, i) = 1; otherwise A(j, i) = 0. This process may allow E_i to align with the NULL token F_0, i.e. A(0, i) = 1, so that words can be freely inserted in translation. Given a pair of class sequences F_0^N and E_1^M, we specify exactly one matrix A. We say that z = (F_0^N, E_1^M, A) is consistent with the target language phrase u and the source language phrase v if E_1^M is the class sequence for u and F_0^N is the class sequence for v.

In Section 4.1, we will outline a procedure to build a library of alignment templates from bitext word-level alignments. Each template z used in our model has an index in this template library.
Therefore any operation that involves a mapping to (from) template sequences will be implemented as a mapping to (from) a sequence of these indices.

We have described the segmentation and permutation processes that transform a source language sentence into phrases in target language phrase order. The next step is to generate a consistent sequence of alignment templates. We assume that the templates are conditionally independent of each other and depend only on the source language phrase which generated each of them:

P(z_1^K | a_1^K, K, v_1^K, f_1^J) = Π_{k=1}^{K} P(z_k | v_{a_k})   (14)

We will implement this model using a transducer Z that maps any permutation v_{a_1}, ..., v_{a_K} of the phrase sequence v_1^K into a template sequence z_1^K with probability as in Equation 14. For every phrase v, this transducer allows only the templates z that are consistent with v, each with probability P(z | v); i.e. Z enforces the consistency between each source phrase and alignment template.

Phrasal Translation Model: We assume that a target phrase is generated independently by each alignment template and source phrase:

P(u_1^K | z_1^K, a_1^K, K, v_1^K, f_1^J) = Π_{k=1}^{K} P(u_k | z_k, v_{a_k})   (15)

This allows us to describe the phrase-internal translation model P(u | z, v) as follows. We assume that each word in the target phrase is produced independently and that consistency is enforced between the words in u and the class sequence E_1^M, so that P(u | z, v) = 0 if E_1^M is not the class sequence of u. We now introduce the word alignment variables b_i, i = 1, ..., M, each of which indicates that e_i is aligned to f_{b_i} within u and v.
P(u | z, v) = Π_{i=1}^{M} P(e_i | z, v), where
P(e_i | z, v) = Σ_{j=0}^{N} P(e_i, b_i = j | z, v)
             = Σ_{j=0}^{N} P(e_i | b_i = j, z, v) P(b_i = j | z, v)
             = Σ_{j=0}^{N} t(e_i | f_j) P(b_i = j | A)   (16)

The term t(e_i | f_j) is a translation dictionary (Och and Ney, 2000), and P(b_i = j | A) is obtained as

P(b_i = j | A) = A(j, i) / Σ_{j'=0}^{N} A(j', i)   (17)

We have assumed that P(e_i | b_i = j, z, v) = t(e_i | f_j), i.e. that given the template, word alignments do not depend on the source language phrase.

For a given phrase v and a consistent alignment template z, a weighted acceptor Y can be constructed to assign probability to translated phrases according to Equations 16 and 17. Y is constructed from four component machines O, I, D and C, built as follows.

The first acceptor O implements the alignment matrix A. It has M+1 states, and between any pair of states i-1 and i each arc corresponds to a word alignment variable b_i = j. Therefore the number of transitions between states i-1 and i is equal to the number of non-zero values of A(j, i).
The b_i = j arc from state i-1 to i has probability P(b_i = j | A) (Equation 17).

The second machine I is an unweighted transducer that maps the index b_i = j within the phrase v to the corresponding word f_j.

The third transducer is the lexicon transducer D that maps the source word f_j to the target word e_i with probability t(e_i | f_j).

The fourth acceptor C is unweighted and allows all target word sequences e_1^M which can be specified by the class sequence E_1^M. C has M+1 states. The number of transitions between states i-1 and i is equal to the number of target language words with the class specified by E_i.

[Figure 5: Component transducers to construct the acceptor Y for an alignment template z.]

Figure 5 shows all four component FSTs for building the transducer Y corresponding to an alignment template from our library. Having built these four machines, we obtain Y as follows. We first compose the four transducers, project the resulting transducer onto the output labels, and determinize it under the (+, ×) semiring. This is implemented using the AT&T FSM tools as follows: fsmcompose O I D C | fsmproject -o | fsmrmepsilon | fsmdeterminize.

Given an alignment template z and a consistent source phrase v, we note that the composition and determinization operations assign the probability P(u | z, v) (Equation 16) to each consistent target phrase u. This summarizes the construction of a transducer for a single alignment template.

We now implement a transducer T that maps sequences of alignment templates to target language word sequences.
We identify all templates consistent with the phrases in the source language phrase sequence v_1^K. The transducer T is constructed via the FSM union operation over the transducers that implement these templates.

For the source phrase sequence "nous avons une_inflation_galopante", we show the transducer T in Figure 6. Our example library consists of three templates z_1, z_2 and z_3. z_1 maps the source word "nous" to the target word "we", z_2 maps the source word "avons" to the target phrase "have a", and z_3 maps the source phrase "une_inflation_galopante" to the target phrase "run away inflation"; each mapping is governed by the template's word alignment matrix.

[Figure 6: Transducer T that maps the source template sequence z_1 z_2 z_3 into target phrase sequences.]

T is built out of three component acceptors, one per template; the acceptor for z_3, for example, implements the mapping from the template z_3 and the source phrase v_3 to all consistent target phrases u_3.

Target Language Model: We specify this model as

P(e_1^I | u_1^K, z_1^K, a_1^K, K, v_1^K, f_1^J) = P(e_1^I) 1(e_1^I = u_1^K)

where the indicator function 1(e_1^I = u_1^K) enforces the requirement that the words in the translation agree with those in the phrase sequence. We note that P(e_1^I) is modeled as a standard backoff trigram language model (Stolcke, 2002). Such a language model can be easily compiled as a weighted finite state acceptor (Mohri et al., 2002).
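The phrase-internal translation probability of Equations 16 and 17 is a product over target positions of dictionary probabilities weighted by the columns of the alignment matrix. The sketch below scores the example pair "une inflation galopante" → "run away inflation"; the 0/1 matrix and the t(e | f) entries are illustrative assumptions.

```python
def phrase_translation_prob(e_words, f_words, A, t):
    """Equations 16-17:
    P(u | z, v) = prod_i sum_j t(e_i | f_j) * P(b_i = j | A), with
    P(b_i = j | A) = A[j][i] / sum_j' A[j'][i].
    f_words[0] is the NULL token; A is an (N+1) x M 0/1 matrix."""
    prob = 1.0
    for i, e in enumerate(e_words):
        col = sum(A[j][i] for j in range(len(f_words)))
        prob *= sum(t.get((e, f_words[j]), 0.0) * A[j][i] / col
                    for j in range(len(f_words)))
    return prob

f = ["NULL", "une", "inflation", "galopante"]   # source phrase v with NULL
e = ["run", "away", "inflation"]                # target phrase u
# Invented alignment matrix: rows = source words (incl. NULL), cols = target.
A = [[0, 1, 0],   # NULL      -> away
     [0, 0, 0],   # une
     [0, 0, 1],   # inflation -> inflation
     [1, 0, 0]]   # galopante -> run
# Invented translation dictionary entries t(e | f).
t = {("run", "galopante"): 0.5, ("away", "NULL"): 0.01,
     ("inflation", "inflation"): 0.85}

print(phrase_translation_prob(e, f, A, t))   # = 0.5 * 0.01 * 0.85
```

Because each column of A here has a single non-zero entry, every P(b_i = j | A) is 1 and the product reduces to the three dictionary probabilities.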
3 Alignment and Translation Via WFSTs

We will now describe how the alignment template translation model can be used to perform word-level alignment of bitexts and translation of source language sentences. Given a source language sentence f_1^J and a target sentence e_1^I, the word-to-word alignment between the sentences can be found as

{K̂, v̂_1^K, â_1^K, ẑ_1^K, b̂} = argmax_{K, v_1^K, a_1^K, z_1^K, b} P(e_1^I, u_1^K, z_1^K, a_1^K, v_1^K, K | f_1^J)

The variables a_1^K specify the alignment between source phrases and target phrases, while the variables b give the word-to-word alignment within the phrase sequences. Given a source language sentence f_1^J, the translation can be found as

ê_1^I = argmax_{e_1^I} max_{K, v_1^K, a_1^K, z_1^K, u_1^K} P(e_1^I, u_1^K, z_1^K, a_1^K, v_1^K, K | f_1^J)
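The paper performs these maximizations by composing the component machines and running standard FSM search; the same chain of models can be illustrated by brute-force enumeration on a toy instance. Everything below (phrase inventory, templates, probabilities) is invented for illustration, and the target language model term is omitted.

```python
from itertools import permutations

# Brute-force analogue of the FSM search in Section 3 on a toy model.
# All inventories and probabilities below are invented for illustration.

P_u = {"nous": 0.3, "avons": 0.3, "une inflation galopante": 0.4}
templates = {   # each source phrase -> list of (target phrase, P(z|v))
    "nous": [("we", 1.0)],
    "avons": [("have a", 1.0)],
    "une inflation galopante": [("run away inflation", 0.7),
                                ("runaway inflation", 0.3)],
}
p0 = 0.5

def perm_prob(align):
    """Equation 13 jump probabilities, normalized over all positions."""
    prob, prev, K = 1.0, 0, len(align)
    for a in align:
        scores = [p0 ** abs(i - prev - 1) for i in range(1, K + 1)]
        prob *= p0 ** abs(a - prev - 1) / sum(scores)
        prev = a
    return prob

phrases = ["nous", "avons", "une inflation galopante"]
best = None
for order in permutations(range(1, len(phrases) + 1)):
    hyps = [("", perm_prob(order))]          # (partial translation, score)
    for a in order:
        v = phrases[a - 1]
        hyps = [((h + " " + u).strip(), p * pz * P_u[v])
                for h, p in hyps for u, pz in templates[v]]
    for e, p in hyps:
        if best is None or p > best[1]:
            best = (e, p)

print(best[0])   # we have a run away inflation
```

The monotone phrase order wins because Equation 13 favors small jumps, and the higher-probability template is chosen for the final phrase; the FSM implementation reaches the same argmax by composition and shortest-path search instead of enumeration.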