rain1024 committed on
Commit
36a70ab
·
1 Parent(s): 0ec913f

Add references folder with research papers (markdown, tex, source files)

Files changed (32)
  1. references/2001.icml.lafferty/paper.md +1498 -0
  2. references/2001.icml.lafferty/paper.tex +1497 -0
  3. references/2014.eacl.nguyen/paper.md +420 -0
  4. references/2014.eacl.nguyen/paper.tex +419 -0
  5. references/2018.naacl.vu/paper.md +147 -0
  6. references/2018.naacl.vu/paper.tex +302 -0
  7. references/2018.naacl.vu/source/VnCoreNLP.bbl +169 -0
  8. references/2018.naacl.vu/source/VnCoreNLP.tex +302 -0
  9. references/2018.naacl.vu/source/naacl_natbib.bst +1552 -0
  10. references/2018.naacl.vu/source/naaclhlt2018.sty +543 -0
  11. references/2020.emnlp.nguyen/paper.md +123 -0
  12. references/2020.emnlp.nguyen/paper.tex +301 -0
  13. references/2020.emnlp.nguyen/source/acl_natbib.bst +1975 -0
  14. references/2020.emnlp.nguyen/source/emnlp2020.sty +560 -0
  15. references/2020.emnlp.nguyen/source/emnlp2020_PhoBERT.bbl +227 -0
  16. references/2020.emnlp.nguyen/source/emnlp2020_PhoBERT.tex +301 -0
  17. references/2021.naacl.nguyen/paper.md +167 -0
  18. references/2021.naacl.nguyen/paper.tex +641 -0
  19. references/2021.naacl.nguyen/source/acl_natbib.bst +1979 -0
  20. references/2021.naacl.nguyen/source/minted.sty +1212 -0
  21. references/2021.naacl.nguyen/source/naacl2021.bbl +180 -0
  22. references/2021.naacl.nguyen/source/naacl2021.sty +310 -0
  23. references/2021.naacl.nguyen/source/naacl2021.tex +641 -0
  24. references/2021.naacl.nguyen/source/refs.bib +625 -0
  25. references/README.md +43 -0
  26. references/python_crfsuite.md +131 -0
  27. references/research_vietnamese_pos/README.md +145 -0
  28. references/research_vietnamese_pos/bibliography.bib +171 -0
  29. references/research_vietnamese_pos/papers.md +378 -0
  30. references/research_vietnamese_pos/sota.md +95 -0
  31. references/underthesea.md +137 -0
  32. references/universal_dependencies.md +100 -0
references/2001.icml.lafferty/paper.md ADDED
@@ -0,0 +1,1498 @@
---
title: "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data"
authors:
- "John D. Lafferty"
- "Andrew McCallum"
- "Fernando C. N. Pereira"
year: 2001
venue: "ICML"
url: "https://dl.acm.org/doi/10.5555/645530.655813"
---

# Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

**John Lafferty**†∗ LAFFERTY@CS.CMU.EDU
**Andrew McCallum**∗† MCCALLUM@WHIZBANG.COM
**Fernando Pereira**∗‡ FPEREIRA@WHIZBANG.COM

∗ WhizBang! Labs–Research, 4616 Henry Street, Pittsburgh, PA 15213 USA
† School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA
‡ Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104 USA
**Abstract**

We present _conditional random fields_, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
**1. Introduction**

The need to segment and label sequences arises in many different problems in several scientific fields. Hidden Markov models (HMMs) and stochastic grammars are well understood and widely used probabilistic models for such problems. In computational biology, HMMs and stochastic grammars have been successfully used to align biological sequences, find sequences homologous to a known evolutionary family, and analyze RNA secondary structure (Durbin et al., 1998). In computational linguistics and computer science, HMMs and stochastic grammars have been applied to a wide variety of problems in text and speech processing, including topic segmentation, part-of-speech (POS) tagging, information extraction, and syntactic disambiguation (Manning & Schütze, 1999).
HMMs and stochastic grammars are generative models, assigning a joint probability to paired observation and label sequences; the parameters are typically trained to maximize the joint likelihood of training examples. To define a joint probability over observation and label sequences, a generative model needs to enumerate all possible observation sequences, typically requiring a representation in which observations are task-appropriate atomic entities, such as words or nucleotides. In particular, it is not practical to represent multiple interacting features or long-range dependencies of the observations, since the inference problem for such models is intractable.
This difficulty is one of the main motivations for looking at conditional models as an alternative. A conditional model specifies the probabilities of possible label sequences given an observation sequence. Therefore, it does not expend modeling effort on the observations, which at test time are fixed anyway. Furthermore, the conditional probability of the label sequence can depend on arbitrary, non-independent features of the observation sequence without forcing the model to account for the distribution of those dependencies. The chosen features may represent attributes at different levels of granularity of the same observations (for example, words and characters in English text), or aggregate properties of the observation sequence (for instance, text layout). The probability of a transition between labels may depend not only on the current observation, but also on past and future observations, if available. In contrast, generative models must make very strict independence assumptions on the observations, for instance conditional independence given the labels, to achieve tractability.
Maximum entropy Markov models (MEMMs) are conditional probabilistic sequence models that attain all of the above advantages (McCallum et al., 2000). In MEMMs, each source state[1] has an exponential model that takes the observation features as input, and outputs a distribution over possible next states. These exponential models are trained by an appropriate iterative scaling method in the maximum entropy framework. Previously published experimental results show MEMMs increasing recall and doubling precision relative to HMMs in a FAQ segmentation task.

[1] Output labels are associated with states; it is possible for several states to have the same label, but for simplicity in the rest of this paper we assume a one-to-one correspondence.
MEMMs and other non-generative finite-state models based on next-state classifiers, such as discriminative Markov models (Bottou, 1991), share a weakness we call here the _label bias problem_: the transitions leaving a given state compete only against each other, rather than against all other transitions in the model. In probabilistic terms, transition scores are the conditional probabilities of possible next states given the current state and the observation sequence. This per-state normalization of transition scores implies a “conservation of score mass” (Bottou, 1991) whereby all the mass that arrives at a state must be distributed among the possible successor states. An observation can affect which destination states get the mass, but not how much total mass to pass on. This causes a bias toward states with fewer outgoing transitions. In the extreme case, a state with a single outgoing transition effectively ignores the observation. In those cases, unlike in HMMs, Viterbi decoding cannot downgrade a branch based on observations after the branch point, and models with state-transition structures that have sparsely connected chains of states are not properly handled. The Markovian assumptions in MEMMs and similar state-conditional models insulate decisions at one state from future decisions in a way that does not match the actual dependencies between consecutive states.
This paper introduces _conditional random fields_ (CRFs), a sequence modeling framework that has all the advantages of MEMMs but also solves the label bias problem in a principled way. The critical difference between CRFs and MEMMs is that a MEMM uses per-state exponential models for the conditional probabilities of next states given the current state, while a CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence. Therefore, the weights of different features at different states can be traded off against each other.
We can also think of a CRF as a finite state model with unnormalized transition probabilities. However, unlike some other weighted finite-state approaches (LeCun et al., 1998), CRFs assign a well-defined probability distribution over possible labelings, trained by maximum likelihood or MAP estimation. Furthermore, the loss function is convex,[2] guaranteeing convergence to the global optimum. CRFs also generalize easily to analogues of stochastic context-free grammars that would be useful in such problems as RNA secondary structure prediction and natural language processing.

[2] In the case of fully observable states, as we are discussing here; if several states have the same label, the usual local maxima of Baum–Welch arise.
_Figure 1._ Label bias example, after (Bottou, 1991). For conciseness, we place observation–label pairs _o_:_l_ on transitions rather than states; a designated symbol represents the null output label.
We present the model, describe two training procedures and sketch a proof of convergence. We also give experimental results on synthetic data showing that CRFs solve the classical version of the label bias problem, and, more significantly, that CRFs perform better than HMMs and MEMMs when the true data distribution has higher-order dependencies than the model, as is often the case in practice. Finally, we confirm these results as well as the claimed advantages of conditional models by evaluating HMMs, MEMMs and CRFs with identical state structure on a part-of-speech tagging task.
**2. The Label Bias Problem**

Classical probabilistic automata (Paz, 1971), discriminative Markov models (Bottou, 1991), maximum entropy taggers (Ratnaparkhi, 1996), and MEMMs, as well as non-probabilistic sequence tagging and segmentation models with independently trained next-state classifiers (Punyakanok & Roth, 2001) are all potential victims of the label bias problem.
For example, Figure 1 represents a simple finite-state model designed to distinguish between the two words rib and rob. Suppose that the observation sequence is r i b. In the first time step, r matches both transitions from the start state, so the probability mass gets distributed roughly equally among those two transitions. Next we observe i. Both states 1 and 4 have only one outgoing transition. State 1 has seen this observation often in training, state 4 has almost never seen this observation; but like state 1, state 4 has no choice but to pass all its mass to its single outgoing transition, since it is not generating the observation, only conditioning on it. Thus, states with a single outgoing transition effectively ignore their observations. More generally, states with low-entropy next state distributions will take little notice of observations. Returning to the example, the top path and the bottom path will be about equally likely, independently of the observation sequence. If one of the two words is slightly more common in the training set, the transitions out of the start state will slightly prefer its corresponding transition, and that word’s state sequence will always win. This behavior is demonstrated experimentally in Section 5.
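The arithmetic of this failure is easy to reproduce. The sketch below is ours, not the paper's: it normalizes hypothetical transition scores per state, as a MEMM would, and shows that a state with a single successor assigns that successor probability 1 whatever the observation says.

```python
# Per-state normalization, as in a MEMM: each state's outgoing scores are
# normalized into a probability distribution over its successors.
def normalize(scores):
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}

# Hypothetical unnormalized scores for the branch states of Figure 1.
# State 1 (on the "rib" path) strongly prefers observation "i"; state 4
# (on the "rob" path) strongly prefers "o" -- but each has one successor.
scores = {
    ("state1", "i"): {"state2": 100.0},
    ("state1", "o"): {"state2": 0.01},
    ("state4", "i"): {"state5": 0.01},
    ("state4", "o"): {"state5": 100.0},
}

for (state, obs), successors in scores.items():
    print(state, obs, normalize(successors))
# Every printed distribution is {successor: 1.0}: normalization erases the
# observation preference, so the branch chosen at the start state can never
# be revised by later observations.
```

However strong the raw preference, per-state normalization divides it away; only the ratio among a state's own successors survives, which is exactly the "conservation of score mass" described above.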
Léon Bottou (1991) discussed two solutions for the label bias problem. One is to change the state-transition structure of the model. In the above example we could collapse states 1 and 4, and delay the branching until we get a discriminating observation. This operation is a special case of determinization (Mohri, 1997), but determinization of weighted finite-state machines is not always possible, and even when possible, it may lead to combinatorial explosion. The other solution mentioned is to start with a fully-connected model and let the training procedure figure out a good structure. But that would preclude the use of prior structural knowledge that has proven so valuable in information extraction tasks (Freitag & McCallum, 2000).
Proper solutions require models that account for whole state sequences at once by letting some transitions “vote” more strongly than others depending on the corresponding observations. This implies that score mass will not be conserved, but instead individual transitions can “amplify” or “dampen” the mass they receive. In the above example, the transitions from the start state would have a very weak effect on path score, while the transitions from states 1 and 4 would have much stronger effects, amplifying or damping depending on the actual observation, and a proportionally higher contribution to the selection of the Viterbi path.[3]
In the related work section we discuss other heuristic model classes that account for state sequences globally rather than locally. To the best of our knowledge, CRFs are the only model class that does this in a purely probabilistic setting, with guaranteed global maximum likelihood convergence.
**3. Conditional Random Fields**

In what follows, $\mathbf{X}$ is a random variable over data sequences to be labeled, and $\mathbf{Y}$ is a random variable over corresponding label sequences. All components $\mathbf{Y}_i$ of $\mathbf{Y}$ are assumed to range over a finite label alphabet $\mathcal{Y}$. For example, $\mathbf{X}$ might range over natural language sentences and $\mathbf{Y}$ range over part-of-speech taggings of those sentences, with $\mathcal{Y}$ the set of possible part-of-speech tags. The random variables $\mathbf{X}$ and $\mathbf{Y}$ are jointly distributed, but in a discriminative framework we construct a conditional model $p(\mathbf{Y} \mid \mathbf{X})$ from paired observation and label sequences, and do not explicitly model the marginal $p(\mathbf{X})$.

**Definition.** _Let $G = (V, E)$ be a graph such that $\mathbf{Y} = (\mathbf{Y}_v)_{v \in V}$, so that $\mathbf{Y}$ is indexed by the vertices of $G$. Then $(\mathbf{X}, \mathbf{Y})$ is a conditional random field in case, when conditioned on $\mathbf{X}$, the random variables $\mathbf{Y}_v$ obey the Markov property with respect to the graph: $p(\mathbf{Y}_v \mid \mathbf{X}, \mathbf{Y}_w, w \neq v) = p(\mathbf{Y}_v \mid \mathbf{X}, \mathbf{Y}_w, w \sim v)$, where $w \sim v$ means that $w$ and $v$ are neighbors in $G$._

Thus, a CRF is a random field globally conditioned on the observation $\mathbf{X}$. Throughout the paper we tacitly assume that the graph $G$ is fixed. In the simplest and most important example for modeling sequences, $G$ is a simple chain or line: $G = (V = \{1, 2, \ldots, m\},\ E = \{(i, i+1)\})$. $\mathbf{X}$ may also have a natural graph structure; yet in general it is not necessary to assume that $\mathbf{X}$ and $\mathbf{Y}$ have the same graphical structure, or even that $\mathbf{X}$ has any graphical structure at all. However, in this paper we will be most concerned with sequences $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n)$ and $\mathbf{Y} = (\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_n)$.

[3] Weighted determinization and minimization techniques shift transition weights while preserving overall path weight (Mohri, 2000); their connection to this discussion deserves further study.
If the graph $G = (V, E)$ of $\mathbf{Y}$ is a tree (of which a chain is the simplest example), its cliques are the edges and vertices. Therefore, by the fundamental theorem of random fields (Hammersley & Clifford, 1971), the joint distribution over the label sequence $\mathbf{Y}$ given $\mathbf{X}$ has the form given in (1) below.

As a particular case, we can construct an HMM-like CRF by defining one feature for each state pair $(y', y)$, and one feature for each state–observation pair $(y, x)$:

$$f_{y',y}(\langle u, v\rangle, \mathbf{y}|_{\langle u, v\rangle}, \mathbf{x}) = \delta(\mathbf{y}_u, y')\,\delta(\mathbf{y}_v, y)$$
$$g_{y,x}(v, \mathbf{y}|_v, \mathbf{x}) = \delta(\mathbf{y}_v, y)\,\delta(\mathbf{x}_v, x).$$

The corresponding parameters $\lambda_{y',y}$ and $\mu_{y,x}$ play a similar role to the (logarithms of the) usual HMM parameters $p(y' \mid y)$ and $p(x \mid y)$. Boltzmann chain models (Saul & Jordan, 1996; MacKay, 1996) have a similar form but use a single normalization constant to yield a joint distribution, whereas CRFs use the observation-dependent normalization $Z(\mathbf{x})$ for conditional distributions.

Although it encompasses HMM-like models, the class of conditional random fields is much more expressive, because it allows arbitrary dependencies on the observation
$$p_\theta(\mathbf{y} \mid \mathbf{x}) \;\propto\; \exp\left( \sum_{e \in E,\,k} \lambda_k f_k(e, \mathbf{y}|_e, \mathbf{x}) + \sum_{v \in V,\,k} \mu_k g_k(v, \mathbf{y}|_v, \mathbf{x}) \right), \tag{1}$$

where $\mathbf{x}$ is a data sequence, $\mathbf{y}$ a label sequence, and $\mathbf{y}|_S$ is the set of components of $\mathbf{y}$ associated with the vertices in subgraph $S$.
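To make the weighted sum in (1) concrete, here is a small sketch of ours (the sentence, labels, and weights are invented for illustration): HMM-like edge and vertex features over a chain, and the unnormalized score $\exp(\cdot)$ they assign to one labeling.

```python
import math

# Unnormalized chain-CRF score in the spirit of Eq. (1): sum weighted edge
# features f_k over edges (i-1, i) and weighted vertex features g_k over
# positions i, then exponentiate.
x = ["The", "dog", "barks"]
y = ["DET", "NOUN", "VERB"]

# Hypothetical weights: lambda over label pairs, mu over (label, word shape).
lam = {("DET", "NOUN"): 1.2, ("NOUN", "VERB"): 0.8}
mu = {("DET", "is_capitalized"): 0.5, ("NOUN", "lowercase"): 0.3}

def unnormalized_score(x, y):
    s = 0.0
    for i in range(1, len(y)):                     # edge features f_k
        s += lam.get((y[i - 1], y[i]), 0.0)
    for word, label in zip(x, y):                  # vertex features g_k
        shape = "is_capitalized" if word[0].isupper() else "lowercase"
        s += mu.get((label, shape), 0.0)
    return math.exp(s)

print(unnormalized_score(x, y))
# Proportional to p(y | x); the normalizer Z(x) would require summing this
# quantity over all label sequences for x.
```

Note that the vertex features inspect the raw word (here, its capitalization) without any generative model of words, which is exactly the freedom the conditional formulation buys.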
We assume that the _features_ $f_k$ and $g_k$ are given and fixed. For example, a Boolean vertex feature $g_k$ might be true if the word $\mathbf{X}_i$ is upper case and the tag $\mathbf{Y}_i$ is “proper noun.”

The parameter estimation problem is to determine the parameters $\theta = (\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots)$ from training data $\mathcal{D} = \{(\mathbf{x}^{(i)}, \mathbf{y}^{(i)})\}_{i=1}^{N}$ with empirical distribution $\tilde{p}(\mathbf{x}, \mathbf{y})$.
In Section 4 we describe an iterative scaling algorithm that maximizes the log-likelihood objective function $\mathcal{O}(\theta)$:

$$\mathcal{O}(\theta) \;=\; \sum_{i=1}^{N} \log p_\theta(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}) \;\propto\; \sum_{\mathbf{x},\mathbf{y}} \tilde{p}(\mathbf{x}, \mathbf{y}) \log p_\theta(\mathbf{y} \mid \mathbf{x}).$$
_Figure 2._ Graphical structures of simple HMMs (left), MEMMs (center), and the chain-structured case of CRFs (right) for sequences. An open circle indicates that the variable is not generated by the model.
sequence. In addition, the features do not need to completely specify a state or observation, so one might expect that the model can be estimated from less training data. Another attractive property is the convexity of the loss function; indeed, CRFs share all of the convexity properties of general maximum entropy models.
For the remainder of the paper we assume that the dependencies of $\mathbf{Y}$, conditioned on $\mathbf{X}$, form a chain. To simplify some expressions, we add special start and stop states $\mathbf{Y}_0 = \texttt{start}$ and $\mathbf{Y}_{n+1} = \texttt{stop}$. Thus, we will be using the graphical structure shown in Figure 2. For a chain structure, the conditional probability of a label sequence can be expressed concisely in matrix form, which will be useful in describing the parameter estimation and inference algorithms in Section 4. Suppose that $p_\theta(\mathbf{Y} \mid \mathbf{X})$ is a CRF given by (1). For each position $i$ in the observation sequence $\mathbf{x}$, we define the $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix random variable $M_i(\mathbf{x}) = [M_i(y', y \mid \mathbf{x})]$ by
Both algorithms are based on the improved iterative scaling (IIS) algorithm of Della Pietra et al. (1997); the proof technique based on auxiliary functions can be extended to show convergence of the algorithms for CRFs.
Iterative scaling algorithms update the weights as $\lambda_k \leftarrow \lambda_k + \delta\lambda_k$ and $\mu_k \leftarrow \mu_k + \delta\mu_k$ for appropriately chosen $\delta\lambda_k$ and $\delta\mu_k$. In particular, the IIS update $\delta\lambda_k$ for an edge feature $f_k$ is the solution of
$$\tilde{E}[f_k] \;\overset{\mathrm{def}}{=}\; \sum_{\mathbf{x},\mathbf{y}} \tilde{p}(\mathbf{x},\mathbf{y}) \sum_{i=1}^{n+1} f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x}) \;=\; \sum_{\mathbf{x},\mathbf{y}} \tilde{p}(\mathbf{x})\, p(\mathbf{y} \mid \mathbf{x}) \sum_{i=1}^{n+1} f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x})\, e^{\delta\lambda_k T(\mathbf{x},\mathbf{y})},$$

where $T(\mathbf{x},\mathbf{y})$ is the _total feature count_

$$T(\mathbf{x},\mathbf{y}) \;\overset{\mathrm{def}}{=}\; \sum_{i,k} f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x}) + \sum_{i,k} g_k(v_i, \mathbf{y}|_{v_i}, \mathbf{x}).$$
$$M_i(y', y \mid \mathbf{x}) = \exp\big(\Lambda_i(y', y \mid \mathbf{x})\big)$$
$$\Lambda_i(y', y \mid \mathbf{x}) = \sum_k \lambda_k f_k(e_i, \mathbf{Y}|_{e_i} = (y', y), \mathbf{x}) + \sum_k \mu_k g_k(v_i, \mathbf{Y}|_{v_i} = y, \mathbf{x}),$$
where $e_i$ is the edge with labels $(\mathbf{Y}_{i-1}, \mathbf{Y}_i)$ and $v_i$ is the vertex with label $\mathbf{Y}_i$. In contrast to generative models, conditional models like CRFs do not need to enumerate over all possible observation sequences $\mathbf{x}$, and therefore these matrices can be computed directly as needed from a given training or test observation sequence $\mathbf{x}$ and the parameter vector $\theta$. Then the normalization (partition function) $Z_\theta(\mathbf{x})$ is the $(\texttt{start}, \texttt{stop})$ entry of the product of these matrices:

$$Z_\theta(\mathbf{x}) = \big(M_1(\mathbf{x})\, M_2(\mathbf{x}) \cdots M_{n+1}(\mathbf{x})\big)_{\texttt{start},\,\texttt{stop}}.$$

Using this notation, the conditional probability of a label sequence $\mathbf{y}$ is written as
The equations for the vertex feature updates $\delta\mu_k$ have similar form.

However, efficiently computing the exponential sums on the right-hand sides of these equations is problematic, because $T(\mathbf{x},\mathbf{y})$ is a global property of $(\mathbf{x},\mathbf{y})$, and dynamic programming will sum over sequences with potentially varying $T$. To deal with this, the first algorithm, Algorithm S, uses a “slack feature.” The second, Algorithm T, keeps track of partial $T$ totals.

For Algorithm S, we define the _slack feature_ by
$$s(\mathbf{x}, \mathbf{y}) \;\overset{\mathrm{def}}{=}\; S \;-\; \sum_i \sum_k f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x}) \;-\; \sum_i \sum_k g_k(v_i, \mathbf{y}|_{v_i}, \mathbf{x}),$$
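The point of the slack feature is purely bookkeeping: it pads the total feature count of every labeling up to the same constant $S$. A tiny check of ours (with made-up binary feature values) shows the padded total is always $S$:

```python
# Illustrative check (not the paper's code): with the slack feature added,
# the total feature count T(x, y) equals the constant S for every labeling.
def total_count(feature_values):
    """T(x, y) without slack: the sum of all (binary) feature values."""
    return sum(feature_values)

S = 10  # chosen so the slack is nonnegative on every training example
for feature_values in ([1, 1, 0, 1], [1] * 7, [0, 0, 1]):
    t = total_count(feature_values)
    slack = S - t                 # value of the slack feature s(x, y)
    assert slack >= 0
    print(t + slack)              # prints 10 for every labeling
```

With $T(\mathbf{x},\mathbf{y})$ forced to the constant $S$, the factor $e^{\delta\lambda_k T}$ in the update equation no longer varies across sequences, which is what makes the closed-form updates below possible.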
$$p_\theta(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{\prod_{i=1}^{n+1} M_i(\mathbf{y}_{i-1}, \mathbf{y}_i \mid \mathbf{x})}{\Big(\prod_{i=1}^{n+1} M_i(\mathbf{x})\Big)_{\texttt{start},\,\texttt{stop}}},$$

where $\mathbf{y}_0 = \texttt{start}$ and $\mathbf{y}_{n+1} = \texttt{stop}$.
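The matrix form above is directly computable. The sketch below is ours, not the paper's code: it builds hypothetical random matrices $M_i$ (in a real CRF they would come from the weighted features), takes $Z_\theta(\mathbf{x})$ as the $(\texttt{start},\texttt{stop})$ entry of their product, and checks that the resulting $p_\theta(\mathbf{y} \mid \mathbf{x})$ sums to 1 over all labelings.

```python
import numpy as np

# For a chain CRF, each position i carries a |Y| x |Y| matrix M_i(x) of
# unnormalized scores; Z_theta(x) and p_theta(y | x) are matrix products.
rng = np.random.default_rng(0)
n_labels, n_pos = 3, 4
# Hypothetical M_i(x) = exp(Lambda_i); Lambda_i would be the weighted
# feature sums lambda_k f_k + mu_k g_k in a real model.
M = [np.exp(rng.normal(size=(n_labels, n_labels))) for _ in range(n_pos + 1)]

# Fold start and stop into label index 0 for brevity.
start = stop = 0
Z = np.linalg.multi_dot(M)[start, stop]   # (start, stop) entry of the product

def prob(y):
    """p_theta(y | x) for a labeling y, with y_0 = start, y_{n+1} = stop."""
    path = [start] + list(y) + [stop]
    score = 1.0
    for i, Mi in enumerate(M):
        score *= Mi[path[i], path[i + 1]]
    return score / Z

# The probabilities of all |Y|^n label sequences sum to 1.
total = sum(prob(y) for y in np.ndindex(*(n_labels,) * n_pos))
print(round(total, 6))  # 1.0
```

The sum equals 1 exactly because summing the path products over all labelings reproduces the same matrix product that defines $Z_\theta(\mathbf{x})$.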
**4. Parameter Estimation for CRFs**

We now describe two iterative scaling algorithms to find the parameter vector $\theta$ that maximizes the log-likelihood of the training data.

where $S$ is a constant chosen so that $s(\mathbf{x}^{(i)}, \mathbf{y}) \geq 0$ for all $\mathbf{y}$ and all observation vectors $\mathbf{x}^{(i)}$ in the training set, thus making $T(\mathbf{x}, \mathbf{y}) = S$. Feature $s$ is “global,” that is, it does not correspond to any particular edge or vertex.

For each index $i = 0, \ldots, n+1$ we now define the _forward vectors_ $\alpha_i(\mathbf{x})$ with base case
$$\alpha_0(y \mid \mathbf{x}) = \begin{cases} 1 & \text{if } y = \texttt{start} \\ 0 & \text{otherwise} \end{cases}$$

and recurrence

$$\alpha_i(\mathbf{x}) = \alpha_{i-1}(\mathbf{x})\, M_i(\mathbf{x}).$$

Similarly, the _backward vectors_ $\beta_i(\mathbf{x})$ are defined by
$\beta_k$ and $\gamma_k$ are the unique positive roots of the following polynomial equations

$$\sum_{t=0}^{T_{\max}} a_{k,t}\, \beta_k^{\,t} = \tilde{E} f_k, \qquad \sum_{t=0}^{T_{\max}} b_{k,t}\, \gamma_k^{\,t} = \tilde{E} g_k, \tag{2}$$
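Equation (2) asks for the unique positive root of a polynomial with nonnegative coefficients, which a plain Newton iteration finds quickly. The snippet below is our sketch, with made-up coefficients $a_{k,t}$ and a made-up target value standing in for $\tilde{E} f_k$:

```python
# Solve sum_t a[t] * beta**t = target for the unique positive root by
# Newton's method; nonnegative a[t] make the left side increasing for
# beta > 0, so the positive root is unique.
def positive_root(a, target, beta=1.0, iters=50):
    for _ in range(iters):
        value = sum(c * beta**t for t, c in enumerate(a))
        deriv = sum(t * c * beta**(t - 1) for t, c in enumerate(a) if t > 0)
        beta -= (value - target) / deriv
    return beta

a = [1.0, 0.0, 1.0]          # hypothetical coefficients a_{k,t}
target = 5.0                 # hypothetical expectation standing in for E~ f_k
beta = positive_root(a, target)
print(round(beta, 4))        # root of 1 + b^2 = 5, i.e. b = 2.0
```

Once the root $\beta_k$ is found, the corresponding weight update is its logarithm, by analogy with the closed-form $\delta\lambda_k$ of Algorithm S.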
$$\beta_{n+1}(y \mid \mathbf{x}) = \begin{cases} 1 & \text{if } y = \texttt{stop} \\ 0 & \text{otherwise} \end{cases}$$

and

$$\beta_i(\mathbf{x})^\top = M_{i+1}(\mathbf{x})\, \beta_{i+1}(\mathbf{x}).$$

With these definitions, the update equations are

$$\delta\lambda_k = \frac{1}{S} \log \frac{\tilde{E} f_k}{E f_k}, \qquad \delta\mu_k = \frac{1}{S} \log \frac{\tilde{E} g_k}{E g_k},$$
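The forward and backward recurrences are one line each in code. Under the same hypothetical matrix setup as before (our sketch, not the paper's code), a standard sanity check is that both recurrences recover the partition function $Z_\theta(\mathbf{x})$:

```python
import numpy as np

# Forward/backward vectors for a chain with matrices M_1..M_{n+1}; start
# and stop are folded into label index 0 for brevity.
rng = np.random.default_rng(1)
n_labels, n_pos = 3, 4
M = [np.exp(rng.normal(size=(n_labels, n_labels))) for _ in range(n_pos + 1)]

alpha = np.zeros(n_labels)
alpha[0] = 1.0                          # alpha_0(y) = 1 iff y = start
for Mi in M:
    alpha = alpha @ Mi                  # alpha_i = alpha_{i-1} M_i

beta = np.zeros(n_labels)
beta[0] = 1.0                           # beta_{n+1}(y) = 1 iff y = stop
for Mi in reversed(M):
    beta = Mi @ beta                    # beta_i^T = M_{i+1} beta_{i+1}

Z = np.linalg.multi_dot(M)[0, 0]        # (start, stop) entry of the product
print(np.isclose(alpha[0], Z), np.isclose(beta[0], Z))  # True True
```

Both checks hold because each recurrence simply accumulates the same matrix product from opposite ends; the products $\alpha_{i-1} M_i \beta_i$ in the expectations below then reuse these vectors instead of enumerating label sequences.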
where

$$E f_k = \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \sum_{i=1}^{n+1} \sum_{y',y} f_k(e_i, \mathbf{y}|_{e_i} = (y', y), \mathbf{x})\; \frac{\alpha_{i-1}(y' \mid \mathbf{x})\, M_i(y', y \mid \mathbf{x})\, \beta_i(y \mid \mathbf{x})}{Z_\theta(\mathbf{x})}$$

$$E g_k = \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \sum_{i=1}^{n} \sum_{y} g_k(v_i, \mathbf{y}|_{v_i} = y, \mathbf{x})\; \frac{\alpha_i(y \mid \mathbf{x})\, \beta_i(y \mid \mathbf{x})}{Z_\theta(\mathbf{x})}.$$
923
+ which can be easily computed by Newton’s method.
924
+
925
+
926
+ A single iteration of Algorithm S and Algorithm T has
927
+ roughly the same time and space complexity as the well
928
+ known Baum-Welch algorithm for HMMs. To prove convergence of our algorithms, we can derive an auxiliary
929
+ function to bound the change in likelihood from below; this
930
+ method is developed in detail by Della Pietra et al. (1997).
931
+ The full proof is somewhat detailed; however, here we give
932
+ an idea of how to derive the auxiliary function. To simplify
933
+ notation, we assume only edge features _fk_ with parameters
934
+ _λk_ .
935
+
936
+ Given two parameter settings _θ_ = ( _λ_ 1 _, λ_ 2 _, . . ._ ) and _θ_ _[￿]_ =
937
+ ( _λ_ 1 + _δλ_ 1 _, λ_ 2 + _δλ_ 2 _, . . ._ ), we bound from below the change
938
+ in the objective function with an _auxiliary function A_ ( _θ_ _[￿]_ _, θ_ )
939
+ as follows
940
+
941
+
942
+
943
+ _αi−_ 1( _y_ _[￿]_ _|_ **x** ) _Mi_ ( _y_ _[￿]_ _, y |_ **x** ) _βi_ ( _y |_ **x** )
944
+
945
+ _Zθ_ ~~(~~ ~~**x**~~ ~~)~~
946
+
947
+
948
+
949
+ _O_ ( _θ_ _[￿]_ ) _−O_ ( _θ_ ) =
950
+
951
+
952
+
953
+ ￿
954
+
955
+
956
+ **x** _,_ **y**
957
+
958
+
959
+
960
+ ￿ _p_ ( **x** _,_ **y** ) log _[p][θ][￿]_ [(] **[y]** _[ |]_ **[ x]** [)]
961
+
962
+
963
+
964
+ ~~_p_~~ _θ_ ~~(~~ ~~**y**~~ ~~**x**~~ ~~)~~
965
+ _|_
966
+
967
+
968
+
969
+ = ( _θ_ _[￿]_ _−_ _θ_ ) _·_ _Ef_ [￿] _−_
970
+
971
+
972
+
973
+ ￿ _p_ ( **x** ) log _[Z][θ][￿]_ [(] **[x]** [)]
974
+
975
+
976
+
977
+ _Zθ_ ~~(~~ ~~**x**~~ ~~)~~
978
+
979
+
980
+
981
+ _αi_ ( _y_ **x** ) _βi_ ( _y_ **x** )
982
+ _|_ _|_ .
983
+
984
+ _Zθ_ ~~(~~ ~~**x**~~ ~~)~~
985
+
986
+
987
+
988
+ _≥_ ( _θ_ _[￿]_ _−_ _θ_ ) _·_ _Ef_ [￿] _−_
989
+
990
+
991
+
992
+ ￿
993
+
994
+
995
+ **x**
996
+
997
+ ￿
998
+
999
+
1000
+ **x**
1001
+
1002
+
1003
+
1004
+ ￿ _p_ ( **x** ) _[Z][θ][￿]_ [(] **[x]** [)]
1005
+
1006
+
1007
+
1008
+ ￿ _p_ ( **x** )
1009
+
1010
+
1011
+
1012
+ _Zθ_ ~~(~~ ~~**x**~~ ~~)~~
1013
+
1014
+
1015
+
1016
+ ￿
1017
+
1018
+
1019
+
1020
+ **x**
1021
+
1022
+ ￿
1023
+
1024
+
1025
+ **x** _,_ **y** _,k_
1026
+
1027
+
1028
+
1029
+ ￿ _p_ ( **x** ) _pθ_ ( **y** **x** ) _[f][k]_ [(] **[x]** _[,]_ **[ y]** [)]
1030
+ _|_ _T_ ~~(~~ ~~**x**~~ ~~)~~ _[e][δλ][k][T]_ [ (] **[x]** [)]
1031
+
1032
+
1033
+
1034
+ The factors involving the forward and backward vectors in
1035
+ the above equations have the same meaning as for standard
1036
+ hidden Markov models. For example,
1037
+
1038
+ _pθ_ ( **Y** _i_ = _y_ **x** ) = _αi_ ( _y |_ **x** ) _βi_ ( _y |_ **x** )
1039
+ _|_ _Zθ_ ~~(~~ ~~**x**~~ ~~)~~
1040
+
1041
+
1042
+ is the marginal probability of label **Y** _i_ = _y_ given that the
1043
+ observation sequence is **x** . This algorithm is closely related
1044
+ to the algorithm of Darroch and Ratcliff (1972), and MART
1045
+ algorithms used in image reconstruction.
1046
+
1047
+
1048
+ The constant _S_ in Algorithm S can be quite large, since in
1049
+ practice it is proportional to the length of the longest training observation sequence. As a result, the algorithm may
1050
+ converge slowly, taking very small steps toward the maximum in each iteration. If the length of the observations **x** [(] _[i]_ [)]
1051
+
1052
+ and the number of active features varies greatly, a fasterconverging algorithm can be obtained by keeping track of
1053
+ feature totals for each observation sequence separately.
1054
+
1055
+
1056
+ Let _T_ ( **x** ) = maxdef **y** _T_ ( **x** _,_ **y** ). Algorithm T accumulates
1057
+
1058
+ feature expectations into counters indexed by _T_ ( **x** ). More
1059
+ specifically, we use the forward-backward recurrences just
1060
+ introduced to compute the expectations _ak,t_ of feature _fk_
1061
+ and _bk,t_ of feature _gk_ given that _T_ ( **x** ) = _t_ . Then our parameter updates are _δλk_ = log _βk_ and _δµk_ = log _γk_, where
1062
+
1063
+
1064
+
1065
+ = _δλ ·_ _Ef_ [￿] _−_
1066
+
1067
+
1068
+ _≥_ _δλ ·_ _Ef_ [￿] _−_
1069
+
1070
+
1071
+
1072
+ _pθ_ ( **y** **x** ) _e_ _[δλ][·][f]_ [(] **[x]** _[,]_ **[y]** [)]
1073
+ _|_
1074
+
1075
+
1076
+
1077
+ ￿
1078
+
1079
+
1080
+ **y**
1081
+
1082
+
1083
+
1084
+ =def _A_ ( _θ_ _[￿]_ _, θ_ )
1085
+
1086
+ where the inequalities follow from the convexity of _−_ log
1087
+ and exp. Differentiating _A_ with respect to _δλk_ and setting
1088
+ the result to zero yields equation (2).
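The α–β recurrences and the marginal formula above are straightforward to implement. The sketch below is ours, not the paper's implementation (all function and variable names are invented): given the transition matrices _Mi_(**x**), it computes the forward vectors, backward vectors, the normalization _Zθ_(**x**), and the per-position marginals _pθ_(**Y**_i_ = _y_ | **x**).

```python
def forward_backward(M, start, stop):
    """Forward-backward for a linear-chain CRF (pure-Python sketch).

    M is a list of n+1 non-negative matrices (lists of lists) over an
    augmented label set that includes the `start` and `stop` indices:
    M[i][y_prev][y] = M_{i+1}(y_prev, y | x).
    """
    k = len(M[0])
    n1 = len(M)  # n + 1 factors

    # Forward vectors: alpha_0 is the indicator of `start`,
    # then alpha_i = alpha_{i-1} M_i.
    alpha = [[0.0] * k for _ in range(n1 + 1)]
    alpha[0][start] = 1.0
    for i in range(n1):
        for y in range(k):
            alpha[i + 1][y] = sum(alpha[i][yp] * M[i][yp][y] for yp in range(k))

    # Backward vectors: beta_{n+1} is the indicator of `stop`,
    # then beta_i = M_{i+1} beta_{i+1}.
    beta = [[0.0] * k for _ in range(n1 + 1)]
    beta[n1][stop] = 1.0
    for i in range(n1 - 1, -1, -1):
        for yp in range(k):
            beta[i][yp] = sum(M[i][yp][y] * beta[i + 1][y] for y in range(k))

    Z = alpha[n1][stop]  # normalization Z_theta(x)

    # Marginals p(Y_i = y | x) = alpha_i(y) beta_i(y) / Z.
    marg = [[alpha[i][y] * beta[i][y] / Z for y in range(k)]
            for i in range(n1 + 1)]
    return alpha, beta, Z, marg
```

With start/stop folded into the label set, `marg[i]` sums to one at every position and `Z` equals the (start, stop) entry of the product of the _Mi_.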
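The Algorithm T update needs the unique positive root of the polynomial in equation (2). A minimal Newton iteration for that step might look like the following (a sketch under the assumption that the coefficients _ak,t_ are non-negative with at least one positive coefficient at degree ≥ 1, which makes the polynomial monotone increasing for _β_ > 0; names are ours):

```python
def newton_root(a, target, beta0=1.0, tol=1e-10, max_iter=100):
    """Solve sum_t a[t] * beta**t = target for the unique positive root.

    Assumes a[t] >= 0 (the accumulated expectations a_{k,t}) and that
    some a[t] > 0 for t >= 1, so the polynomial is increasing and convex
    on beta > 0 and Newton's method converges from a positive start.
    """
    beta = beta0
    for _ in range(max_iter):
        f = sum(c * beta**t for t, c in enumerate(a)) - target
        fp = sum(t * c * beta ** (t - 1) for t, c in enumerate(a) if t > 0)
        step = f / fp
        beta -= step
        if abs(step) < tol:
            break
    return beta
```

The parameter update is then _δλk_ = log _βk_ (and analogously _δµk_ = log _γk_).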
**5. Experiments**

We first discuss two sets of experiments with synthetic data that highlight the differences between CRFs and MEMMs. The first experiments are a direct verification of the label bias problem discussed in Section 2. In the second set of experiments, we generate synthetic data using randomly chosen hidden Markov models, each of which is a mixture of a first-order and second-order model. Competing _first-order_ models are then trained and compared on test data. As the data becomes more second-order, the test error rates of the trained models increase. This experiment corresponds to the common modeling practice of approximating complex local and long-range dependencies, as occur in natural data, by small-order Markov models. Our results clearly indicate that even when the models are parameterized in exactly the same way, CRFs are more robust to inaccurate modeling assumptions than MEMMs or HMMs, and resolve the label bias problem, which affects the performance of MEMMs. To avoid confusion of different effects, the MEMMs and CRFs in these experiments _do not_ use overlapping features of the observations. Finally, in a set of POS tagging experiments, we confirm the advantage of CRFs over MEMMs. We also show that the addition of overlapping features to CRFs and MEMMs allows them to perform much better than HMMs, as already shown for MEMMs by McCallum et al. (2000).

_[Figure 3: three scatter plots comparing pairs of models, each axis ranging over 0–60% error; the plot images did not survive extraction.]_

_Figure 3._ Plots of 2×2 error rates for HMMs, CRFs, and MEMMs on randomly generated synthetic data sets, as described in Section 5.2. As the data becomes "more second order," the error rates of the test models increase. As shown in the left plot, the CRF typically significantly outperforms the MEMM. The center plot shows that the HMM outperforms the MEMM. In the right plot, each open square represents a data set with _α_ < 1/2, and a solid circle indicates a data set with _α_ ≥ 1/2. The plot shows that when the data is mostly second order (_α_ ≥ 1/2), the discriminatively trained CRF typically outperforms the HMM. These experiments are not designed to demonstrate the advantages of the additional representational power of CRFs and MEMMs relative to HMMs.
**5.1 Modeling label bias**

We generate data from a simple HMM which encodes a noisy version of the finite-state network in Figure 1. Each state emits its designated symbol with probability 29/32 and any of the other symbols with probability 1/32. We train both an MEMM and a CRF with the same topologies on the data generated by the HMM. The observation features are simply the identity of the observation symbols. In a typical run using 2,000 training and 500 test samples, trained to convergence of the iterative scaling algorithm, the CRF error is 4.6% while the MEMM error is 42%, showing that the MEMM fails to discriminate between the two branches.
**5.2 Modeling mixed-order sources**

For these results, we use five labels, a-e ( _|Y|_ = 5), and 26 observation values, A-Z ( _|X|_ = 26); however, the results were qualitatively the same over a range of sizes for _Y_ and _X_. We generate data from a mixed-order HMM with state transition probabilities given by

$$p_\alpha(y_i \mid y_{i-1}, y_{i-2}) = \alpha\, p_2(y_i \mid y_{i-1}, y_{i-2}) + (1 - \alpha)\, p_1(y_i \mid y_{i-1})$$

and, similarly, emission probabilities given by

$$p_\alpha(x_i \mid y_i, x_{i-1}) = \alpha\, p_2(x_i \mid y_i, x_{i-1}) + (1 - \alpha)\, p_1(x_i \mid y_i).$$

Thus, for _α_ = 0 we have a standard first-order HMM. In order to limit the size of the Bayes error rate for the resulting models, the conditional probability tables _pα_ are constrained to be sparse. In particular, _pα_(· | _y_, _y′_) can have at most two nonzero entries, for each _y_, _y′_, and _pα_(· | _y_, _x′_) can have at most three nonzero entries for each _y_, _x′_. For each randomly generated model, a sample of 1,000 sequences of length 25 is generated for training and testing.
On each randomly generated training set, a CRF is trained using Algorithm S. (Note that since the length of the sequences and number of active features is constant, Algorithms S and T are identical.) The algorithm is fairly slow to converge, typically taking approximately 500 iterations for the model to stabilize. On the 500 MHz Pentium PC used in our experiments, each iteration takes approximately 0.2 seconds. On the same data an MEMM is trained using iterative scaling, which does not require forward-backward calculations, and is thus more efficient. The MEMM training converges more quickly, stabilizing after approximately 100 iterations. For each model, the Viterbi algorithm is used to label a test set; the experimental results do not significantly change when using forward-backward decoding to minimize the per-symbol error rate.

The results of several runs are presented in Figure 3. Each plot compares two classes of models, with each point indicating the error rate for a single test set. As _α_ increases, the error rates generally increase, as the first-order models fail to fit the second-order data. The figure compares models parameterized as _µy_, _λy′,y_, and _λy′,y,x_; results for models parameterized as _µy_, _λy′,y_, and _µy,x_ are qualitatively the same. As shown in the first graph, the CRF generally outperforms the MEMM, often by a wide margin of 10%–20% relative error. (The points for very small error rate, with _α_ < 0.01, where the MEMM does better than the CRF, are suspected to be the result of an insufficient number of training iterations for the CRF.)
| model | error | oov error |
|---|---|---|
| HMM | 5.69% | 45.99% |
| MEMM | 6.37% | 54.61% |
| CRF | 5.55% | 48.05% |
| MEMM+ | 4.81% | 26.99% |
| CRF+ | 4.27% | 23.76% |

+Using spelling features

_Figure 4._ Per-word error rates for POS tagging on the Penn treebank, using first-order models trained on 50% of the 1.1 million word corpus. The oov rate is 5.45%.
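The two error rates reported in Figure 4 can be computed with a few lines. The sketch below is ours (names invented): out-of-vocabulary (oov) words are test words that never appear in the training set, and their errors are tallied separately.

```python
def error_rates(predictions, gold, words, train_vocab):
    """Per-word error rate, with oov words (test words absent from
    train_vocab) reported separately, as in Figure 4."""
    total = wrong = oov_total = oov_wrong = 0
    for p, g, w in zip(predictions, gold, words):
        total += 1
        err = p != g
        wrong += err
        if w not in train_vocab:
            oov_total += 1
            oov_wrong += err
    return wrong / total, oov_wrong / max(oov_total, 1)
```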
**5.3 POS tagging experiments**

To confirm our synthetic data results, we also compared HMMs, MEMMs and CRFs on Penn treebank POS tagging, where each word in a given input sentence must be labeled with one of 45 syntactic tags.

We carried out two sets of experiments with this natural language data. First, we trained first-order HMM, MEMM, and CRF models as in the synthetic data experiments, introducing parameters _µy,x_ for each tag-word pair and _λy′,y_ for each tag-tag pair in the training set. The results are consistent with what is observed on synthetic data: the HMM outperforms the MEMM, as a consequence of the label bias problem, while the CRF outperforms the HMM. The error rates for training runs using a 50%-50% train-test split are shown in Figure 4; the results are qualitatively similar for other splits of the data. The error rates on out-of-vocabulary (oov) words, which are not observed in the training set, are reported separately.

In the second set of experiments, we take advantage of the power of conditional models by adding a small set of orthographic features: whether a spelling begins with a number or upper case letter, whether it contains a hyphen, and whether it ends in one of the following suffixes: -ing, -ogy, -ed, -s, -ly, -ion, -tion, -ity, -ies. Here we find, as expected, that both the MEMM and the CRF benefit significantly from the use of these features, with the overall error rate reduced by around 25%, and the out-of-vocabulary error rate reduced by around 50%.

One usually starts training from the all zero parameter vector, corresponding to the uniform distribution. However, for these datasets, CRF training with that initialization is much slower than MEMM training. Fortunately, we can use the optimal MEMM parameter vector as a starting point for training the corresponding CRF. In Figure 4, MEMM+ was trained to convergence in around 100 iterations. Its parameters were then used to initialize the training of CRF+, which converged in 1,000 iterations. In contrast, training of the same CRF from the uniform distribution had not converged even after 2,000 iterations.
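The orthographic feature set described above is simple to reproduce. A minimal sketch (ours; the feature names are invented, the tests themselves follow the list in the text):

```python
SUFFIXES = ("-ing", "-ogy", "-ed", "-s", "-ly", "-ion", "-tion", "-ity", "-ies")

def spelling_features(word):
    """Boolean orthographic features of a spelling: initial digit,
    initial upper-case letter, internal hyphen, and the listed suffixes."""
    feats = {
        "init_number": word[:1].isdigit(),
        "init_upper": word[:1].isupper(),
        "has_hyphen": "-" in word,
    }
    for suf in SUFFIXES:
        feats["suffix" + suf] = word.endswith(suf[1:])  # drop the leading "-"
    return feats
```

Each true feature fires as an overlapping observation feature for the MEMM and CRF models.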
**6. Further Aspects of CRFs**

Many further aspects of CRFs are attractive for applications and deserve further study. In this section we briefly mention just two.

Conditional random fields can be trained using the exponential loss objective function used by the AdaBoost algorithm (Freund & Schapire, 1997). Typically, boosting is applied to classification problems with a small, fixed number of classes; applications of boosting to sequence labeling have treated each label as a separate classification problem (Abney et al., 1999). However, it is possible to apply the parallel update algorithm of Collins et al. (2000) to optimize the per-sequence exponential loss. This requires a forward-backward algorithm to compute efficiently certain feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and backward accumulators.

Another attractive aspect of CRFs is that one can implement efficient feature selection and feature induction algorithms for them. That is, rather than specifying in advance which features of (**X**, **Y**) to use, we could start from feature-generating rules and evaluate the benefit of generated features automatically on data. In particular, the feature induction algorithms presented in Della Pietra et al. (1997) can be adapted to fit the dynamic programming techniques of conditional random fields.
**7. Related Work and Conclusions**

As far as we know, the present work is the first to combine the benefits of conditional models with the global normalization of random field models. Other applications of exponential models in sequence modeling have either attempted to build generative models (Rosenfeld, 1997), which involve a hard normalization problem, or adopted local conditional models (Berger et al., 1996; Ratnaparkhi, 1996; McCallum et al., 2000) that may suffer from label bias.

Non-probabilistic local decision models have also been widely used in segmentation and tagging (Brill, 1995; Roth, 1998; Abney et al., 1999). Because of the computational complexity of global training, these models are only trained to minimize the error of individual label decisions assuming that neighboring labels are correctly chosen. Label bias would be expected to be a problem here too.

An alternative approach to discriminative modeling of sequence labeling is to use a permissive generative model, which can only model local dependencies, to produce a list of candidates, and then use a more global discriminative model to rerank those candidates. This approach is standard in large-vocabulary speech recognition (Schwartz & Austin, 1993), and has also been proposed for parsing (Collins, 2000). However, these methods fail when the correct output is pruned away in the first pass.

Closest to our proposal are gradient-descent methods that adjust the parameters of all of the local classifiers to minimize a smooth loss function (e.g., quadratic loss) combining loss terms for each label. If state dependencies are local, this can be done efficiently with dynamic programming (LeCun et al., 1998). Such methods should alleviate label bias. However, their loss function is not convex, so they may get stuck in local minima.

Conditional random fields offer a unique combination of properties: discriminatively trained models for sequence segmentation and labeling; combination of arbitrary, overlapping and agglomerative observation features from both the past and future; efficient training and decoding based on dynamic programming; and parameter estimation guaranteed to find the global optimum. Their main current limitation is the slow convergence of the training algorithm relative to MEMMs, let alone to HMMs, for which training on fully observed data is very efficient. In future work, we plan to investigate alternative training methods such as the update methods of Collins et al. (2000) and refinements on using a MEMM as starting point as we did in some of our experiments. More general tree-structured random fields, feature induction methods, and further natural data evaluations will also be investigated.
**Acknowledgments**

We thank Yoshua Bengio, Léon Bottou, Michael Collins and Yann LeCun for alerting us to what we call here the label bias problem. We also thank Andrew Ng and Sebastian Thrun for discussions related to this work.
**References**

Abney, S., Schapire, R. E., & Singer, Y. (1999). Boosting applied to tagging and PP attachment. _Proc. EMNLP-VLC_. New Brunswick, New Jersey: Association for Computational Linguistics.

Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J. (1996). A maximum entropy approach to natural language processing. _Computational Linguistics_, _22_.

Bottou, L. (1991). _Une approche théorique de l'apprentissage connexionniste: Applications à la reconnaissance de la parole_. Doctoral dissertation, Université de Paris XI.

Brill, E. (1995). Transformation-based error-driven learning and natural language processing: a case study in part of speech tagging. _Computational Linguistics_, _21_, 543–565.

Collins, M. (2000). Discriminative reranking for natural language parsing. _Proc. ICML 2000_. Stanford, California.

Collins, M., Schapire, R., & Singer, Y. (2000). Logistic regression, AdaBoost, and Bregman distances. _Proc. 13th COLT_.

Darroch, J. N., & Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. _The Annals of Mathematical Statistics_, _43_, 1470–1480.

Della Pietra, S., Della Pietra, V., & Lafferty, J. (1997). Inducing features of random fields. _IEEE Transactions on Pattern Analysis and Machine Intelligence_, _19_, 380–393.

Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). _Biological sequence analysis: Probabilistic models of proteins and nucleic acids_. Cambridge University Press.

Freitag, D., & McCallum, A. (2000). Information extraction with HMM structures learned by stochastic optimization. _Proc. AAAI 2000_.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. _Journal of Computer and System Sciences_, _55_, 119–139.

Hammersley, J., & Clifford, P. (1971). Markov fields on finite graphs and lattices. Unpublished manuscript.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. _Proceedings of the IEEE_, _86_, 2278–2324.

MacKay, D. J. (1996). Equivalence of linear Boltzmann chains and hidden Markov models. _Neural Computation_, _8_, 178–181.

Manning, C. D., & Schütze, H. (1999). _Foundations of statistical natural language processing_. Cambridge, Massachusetts: MIT Press.

McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. _Proc. ICML 2000_ (pp. 591–598). Stanford, California.

Mohri, M. (1997). Finite-state transducers in language and speech processing. _Computational Linguistics_, _23_.

Mohri, M. (2000). Minimization algorithms for sequential transducers. _Theoretical Computer Science_, _234_, 177–201.

Paz, A. (1971). _Introduction to probabilistic automata_. Academic Press.

Punyakanok, V., & Roth, D. (2001). The use of classifiers in sequential inference. _NIPS 13_. Forthcoming.

Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. _Proc. EMNLP_. New Brunswick, New Jersey: Association for Computational Linguistics.

Rosenfeld, R. (1997). A whole sentence maximum entropy language model. _Proceedings of the IEEE Workshop on Speech Recognition and Understanding_. Santa Barbara, California.

Roth, D. (1998). Learning to resolve natural language ambiguities: A unified approach. _Proc. 15th AAAI_ (pp. 806–813). Menlo Park, California: AAAI Press.

Saul, L., & Jordan, M. (1996). Boltzmann chains and hidden Markov models. _Advances in Neural Information Processing Systems 7_. MIT Press.

Schwartz, R., & Austin, S. (1993). A comparison of several approximate algorithms for finding multiple (N-BEST) sentence hypotheses. _Proc. ICASSP_. Minneapolis, MN.
references/2001.icml.lafferty/paper.tex ADDED
@@ -0,0 +1,1497 @@
1
+ \documentclass[11pt]{article}
2
+ \usepackage[utf8]{inputenc}
3
+ \usepackage{amsmath,amssymb}
4
+ \usepackage{booktabs}
5
+ \usepackage{hyperref}
6
+
7
+ \begin{document}
8
+
9
+ \section*{\textbf{Conditional Random Fields: Probabilistic Models} \textbf{for Segmenting and Labeling Sequence Data}}
10
+
11
+ \textbf{John Lafferty} _[†∗]_ LAFFERTY@CS.CMU.EDU
12
+ \textbf{Andrew McCallum} _[∗†]_ MCCALLUM@WHIZBANG.COM
13
+ \textbf{Fernando Pereira} _[∗‡]_ FPEREIRA@WHIZBANG.COM
14
+ _∗_ WhizBang! Labs–Research, 4616 Henry Street, Pittsburgh, PA 15213 USA
15
+
16
+ _†_ School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA
17
+
18
+ _‡_ Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104 USA
19
+
20
+
21
+
22
+ \textbf{Abstract}
23
+
24
+
25
+ We present _conditional random fields_, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars
26
+ for such tasks, including the ability to relax
27
+ strong independence assumptions made in those
28
+ models. Conditional random fields also avoid
29
+ a fundamental limitation of maximum entropy
30
+ Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states
31
+ with few successor states. We present iterative
32
+ parameter estimation algorithms for conditional
33
+ random fields and compare the performance of
34
+ the resulting models to HMMs and MEMMs on
35
+ synthetic and natural-language data.
36
+
37
+
38
+ \textbf{1. Introduction}
39
+
40
+
41
+ The need to segment and label sequences arises in many
42
+ different problems in several scientific fields. Hidden
43
+ Markov models (HMMs) and stochastic grammars are well
44
+ understood and widely used probabilistic models for such
45
+ problems. In computational biology, HMMs and stochastic grammars have been successfully used to align biological sequences, find sequences homologous to a known
46
+ evolutionary family, and analyze RNA secondary structure
47
+ (Durbin et al., 1998). In computational linguistics and
48
+ computer science, HMMs and stochastic grammars have
49
+ been applied to a wide variety of problems in text and
50
+ speech processing, including topic segmentation, part-ofspeech (POS) tagging, information extraction, and syntactic disambiguation (Manning & Sch¨utze, 1999).
51
+
52
+
53
+ HMMs and stochastic grammars are generative models, assigning a joint probability to paired observation and label
54
+ sequences; the parameters are typically trained to maxi
55
+
56
+
57
+ mize the joint likelihood of training examples. To define
58
+ a joint probability over observation and label sequences,
59
+ a generative model needs to enumerate all possible observation sequences, typically requiring a representation
60
+ in which observations are task-appropriate atomic entities,
61
+ such as words or nucleotides. In particular, it is not practical to represent multiple interacting features or long-range
62
+ dependencies of the observations, since the inference problem for such models is intractable.
63
+
64
+
65
+ This difficulty is one of the main motivations for looking at
66
+ conditional models as an alternative. A conditional model
67
+ specifies the probabilities of possible label sequences given
68
+ an observation sequence. Therefore, it does not expend
69
+ modeling effort on the observations, which at test time
70
+ are fixed anyway. Furthermore, the conditional probability of the label sequence can depend on arbitrary, nonindependent features of the observation sequence without
71
+ forcing the model to account for the distribution of those
72
+ dependencies. The chosen features may represent attributes
73
+ at different levels of granularity of the same observations
74
+ (for example, words and characters in English text), or
75
+ aggregate properties of the observation sequence (for instance, text layout). The probability of a transition between
76
+ labels may depend not only on the current observation,
77
+ but also on past and future observations, if available. In
78
+ contrast, generative models must make very strict independence assumptions on the observations, for instance conditional independence given the labels, to achieve tractability.
79
+
80
+
81
Maximum entropy Markov models (MEMMs) are conditional probabilistic sequence models that attain all of the above advantages (McCallum et al., 2000). In MEMMs, each source state[1] has an exponential model that takes the observation features as input, and outputs a distribution over possible next states. These exponential models are trained by an appropriate iterative scaling method in the maximum entropy framework. Previously published experimental results show MEMMs increasing recall and doubling precision relative to HMMs in a FAQ segmentation task.

[1] Output labels are associated with states; it is possible for several states to have the same label, but for simplicity in the rest of this paper we assume a one-to-one correspondence.

MEMMs and other non-generative finite-state models based on next-state classifiers, such as discriminative Markov models (Bottou, 1991), share a weakness we call here the _label bias problem_: the transitions leaving a given state compete only against each other, rather than against all other transitions in the model. In probabilistic terms, transition scores are the conditional probabilities of possible next states given the current state and the observation sequence. This per-state normalization of transition scores implies a “conservation of score mass” (Bottou, 1991) whereby all the mass that arrives at a state must be distributed among the possible successor states. An observation can affect which destination states get the mass, but not how much total mass to pass on. This causes a bias toward states with fewer outgoing transitions. In the extreme case, a state with a single outgoing transition effectively ignores the observation. In those cases, unlike in HMMs, Viterbi decoding cannot downgrade a branch based on observations after the branch point, and models with state-transition structures that have sparsely connected chains of states are not properly handled. The Markovian assumptions in MEMMs and similar state-conditional models insulate decisions at one state from future decisions in a way that does not match the actual dependencies between consecutive states.

This paper introduces _conditional random fields_ (CRFs), a sequence modeling framework that has all the advantages of MEMMs but also solves the label bias problem in a principled way. The critical difference between CRFs and MEMMs is that a MEMM uses per-state exponential models for the conditional probabilities of next states given the current state, while a CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence. Therefore, the weights of different features at different states can be traded off against each other.

We can also think of a CRF as a finite state model with unnormalized transition probabilities. However, unlike some other weighted finite-state approaches (LeCun et al., 1998), CRFs assign a well-defined probability distribution over possible labelings, trained by maximum likelihood or MAP estimation. Furthermore, the loss function is convex,[2] guaranteeing convergence to the global optimum. CRFs also generalize easily to analogues of stochastic context-free grammars that would be useful in such problems as RNA secondary structure prediction and natural language processing.

[2] In the case of fully observable states, as we are discussing here; if several states have the same label, the usual local maxima of Baum-Welch arise.

_Figure 1._ Label bias example, after (Bottou, 1991). For conciseness, we place observation-label pairs o:l on transitions rather than states; the symbol ‘_’ represents the null output label.

We present the model, describe two training procedures and sketch a proof of convergence. We also give experimental results on synthetic data showing that CRFs solve the classical version of the label bias problem, and, more significantly, that CRFs perform better than HMMs and MEMMs when the true data distribution has higher-order dependencies than the model, as is often the case in practice. Finally, we confirm these results as well as the claimed advantages of conditional models by evaluating HMMs, MEMMs and CRFs with identical state structure on a part-of-speech tagging task.

\textbf{2. The Label Bias Problem}

Classical probabilistic automata (Paz, 1971), discriminative Markov models (Bottou, 1991), maximum entropy taggers (Ratnaparkhi, 1996), and MEMMs, as well as non-probabilistic sequence tagging and segmentation models with independently trained next-state classifiers (Punyakanok & Roth, 2001) are all potential victims of the label bias problem.

For example, Figure 1 represents a simple finite-state model designed to distinguish between the two words rib and rob. Suppose that the observation sequence is r i b. In the first time step, r matches both transitions from the start state, so the probability mass gets distributed roughly equally among those two transitions. Next we observe i. Both states 1 and 4 have only one outgoing transition. State 1 has seen this observation often in training, state 4 has almost never seen this observation; but like state 1, state 4 has no choice but to pass all its mass to its single outgoing transition, since it is not generating the observation, only conditioning on it. Thus, states with a single outgoing transition effectively ignore their observations. More generally, states with low-entropy next state distributions will take little notice of observations. Returning to the example, the top path and the bottom path will be about equally likely, independently of the observation sequence. If one of the two words is slightly more common in the training set, the transitions out of the start state will slightly prefer its corresponding transition, and that word’s state sequence will always win. This behavior is demonstrated experimentally in Section 5.

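The effect can be reproduced with a few lines of arithmetic. The sketch below is ours, with made-up transition probabilities, not numbers from the paper: it scores the two state paths of Figure 1 under a per-state-normalized model. The interior states each have a single outgoing transition and so contribute a factor of 1 whatever the observation, leaving the start-state preference to decide the path.

```python
# Hypothetical per-state-normalized scores for the Figure 1 network
# (illustrative numbers, not learned from data).

def next_state_probs(state, obs):
    # Start state: both branches begin with 'r', so the observation
    # cannot discriminate; assume "rib" was slightly more frequent
    # in training.
    if state == 0:
        return {1: 0.55, 4: 0.45}   # 1 = "rib" branch, 4 = "rob" branch
    # All other states have a single outgoing transition: per-state
    # normalization forces probability 1 regardless of the observation.
    single = {1: 2, 2: 3, 4: 5, 5: 3}
    return {single[state]: 1.0}

def path_prob(states, obs_seq):
    """Probability of a state path under the per-state-normalized model."""
    p = 1.0
    for s, s_next, o in zip(states, states[1:], obs_seq):
        p *= next_state_probs(s, o).get(s_next, 0.0)
    return p

rib_path = [0, 1, 2, 3]   # spells "rib"
rob_path = [0, 4, 5, 3]   # spells "rob"

# Even when the observation sequence is "rob", the "rib" path wins:
print(path_prob(rib_path, "rob"))  # 0.55
print(path_prob(rob_path, "rob"))  # 0.45
```

The interior observations `o` and `b` never change the comparison: the branch chosen at the start state always wins, which is exactly the label bias behavior described above.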
Léon Bottou (1991) discussed two solutions for the label bias problem. One is to change the state-transition structure of the model. In the above example we could collapse states 1 and 4, and delay the branching until we get a discriminating observation. This operation is a special case of determinization (Mohri, 1997), but determinization of weighted finite-state machines is not always possible, and even when possible, it may lead to combinatorial explosion. The other solution mentioned is to start with a fully-connected model and let the training procedure figure out a good structure. But that would preclude the use of prior structural knowledge that has proven so valuable in information extraction tasks (Freitag & McCallum, 2000).

Proper solutions require models that account for whole state sequences at once by letting some transitions “vote” more strongly than others depending on the corresponding observations. This implies that score mass will not be conserved, but instead individual transitions can “amplify” or “dampen” the mass they receive. In the above example, the transitions from the start state would have a very weak effect on path score, while the transitions from states 1 and 4 would have much stronger effects, amplifying or damping depending on the actual observation, and a proportionally higher contribution to the selection of the Viterbi path.[3]

In the related work section we discuss other heuristic model classes that account for state sequences globally rather than locally. To the best of our knowledge, CRFs are the only model class that does this in a purely probabilistic setting, with guaranteed global maximum likelihood convergence.

\textbf{3. Conditional Random Fields}

In what follows, $\mathbf{X}$ is a random variable over data sequences to be labeled, and $\mathbf{Y}$ is a random variable over corresponding label sequences. All components $\mathbf{Y}_i$ of $\mathbf{Y}$ are assumed to range over a finite label alphabet $\mathcal{Y}$. For example, $\mathbf{X}$ might range over natural language sentences and $\mathbf{Y}$ range over part-of-speech taggings of those sentences, with $\mathcal{Y}$ the set of possible part-of-speech tags. The random variables $\mathbf{X}$ and $\mathbf{Y}$ are jointly distributed, but in a discriminative framework we construct a conditional model $p(\mathbf{Y} \mid \mathbf{X})$ from paired observation and label sequences, and do not explicitly model the marginal $p(\mathbf{X})$.

\textbf{Definition.} _Let $G = (V, E)$ be a graph such that $\mathbf{Y} = (\mathbf{Y}_v)_{v \in V}$, so that $\mathbf{Y}$ is indexed by the vertices of $G$. Then $(\mathbf{X}, \mathbf{Y})$ is a conditional random field in case, when conditioned on $\mathbf{X}$, the random variables $\mathbf{Y}_v$ obey the Markov property with respect to the graph: $p(\mathbf{Y}_v \mid \mathbf{X}, \mathbf{Y}_w, w \neq v) = p(\mathbf{Y}_v \mid \mathbf{X}, \mathbf{Y}_w, w \sim v)$, where $w \sim v$ means that $w$ and $v$ are neighbors in $G$._

Thus, a CRF is a random field globally conditioned on the observation $\mathbf{X}$. Throughout the paper we tacitly assume that the graph $G$ is fixed. In the simplest and most important example for modeling sequences, $G$ is a simple chain or line: $G = (V = \{1, 2, \ldots, m\},\ E = \{(i, i+1)\})$. $\mathbf{X}$ may also have a natural graph structure; yet in general it is not necessary to assume that $\mathbf{X}$ and $\mathbf{Y}$ have the same graphical structure, or even that $\mathbf{X}$ has any graphical structure at all. However, in this paper we will be most concerned with sequences $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n)$ and $\mathbf{Y} = (\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_n)$.

[3] Weighted determinization and minimization techniques shift transition weights while preserving overall path weight (Mohri, 2000); their connection to this discussion deserves further study.

If the graph $G = (V, E)$ of $\mathbf{Y}$ is a tree (of which a chain is the simplest example), its cliques are the edges and vertices. Therefore, by the fundamental theorem of random fields (Hammersley & Clifford, 1971), the joint distribution over the label sequence $\mathbf{Y}$ given $\mathbf{X}$ has the form

$$p_\theta(\mathbf{y} \mid \mathbf{x}) \;\propto\; \exp\Bigg( \sum_{e \in E,\,k} \lambda_k\, f_k(e, \mathbf{y}|_e, \mathbf{x}) \;+\; \sum_{v \in V,\,k} \mu_k\, g_k(v, \mathbf{y}|_v, \mathbf{x}) \Bigg), \qquad (1)$$

where $\mathbf{x}$ is a data sequence, $\mathbf{y}$ a label sequence, and $\mathbf{y}|_S$ is the set of components of $\mathbf{y}$ associated with the vertices in subgraph $S$.

We assume that the _features_ $f_k$ and $g_k$ are given and fixed. For example, a Boolean vertex feature $g_k$ might be true if the word $\mathbf{X}_i$ is upper case and the tag $\mathbf{Y}_i$ is “proper noun.”

The parameter estimation problem is to determine the parameters $\theta = (\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots)$ from training data $\mathcal{D} = \{(\mathbf{x}^{(i)}, \mathbf{y}^{(i)})\}_{i=1}^N$ with empirical distribution $\tilde{p}(\mathbf{x}, \mathbf{y})$. In Section 4 we describe an iterative scaling algorithm that maximizes the log-likelihood objective function $\mathcal{O}(\theta)$:

$$\mathcal{O}(\theta) \;=\; \sum_{i=1}^N \log p_\theta(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}) \;\propto\; \sum_{\mathbf{x}, \mathbf{y}} \tilde{p}(\mathbf{x}, \mathbf{y}) \log p_\theta(\mathbf{y} \mid \mathbf{x}).$$

As a particular case, we can construct an HMM-like CRF by defining one feature for each state pair $(y', y)$, and one feature for each state-observation pair $(y, x)$:

$$f_{y',y}(\langle u, v \rangle, \mathbf{y}|_{\langle u, v \rangle}, \mathbf{x}) = \delta(\mathbf{y}_u, y')\, \delta(\mathbf{y}_v, y), \qquad g_{y,x}(v, \mathbf{y}|_v, \mathbf{x}) = \delta(\mathbf{y}_v, y)\, \delta(\mathbf{x}_v, x).$$

The corresponding parameters $\lambda_{y',y}$ and $\mu_{y,x}$ play a similar role to the (logarithms of the) usual HMM parameters $p(y' \mid y)$ and $p(x \mid y)$. Boltzmann chain models (Saul & Jordan, 1996; MacKay, 1996) have a similar form but use a single normalization constant to yield a joint distribution, whereas CRFs use the observation-dependent normalization $Z(\mathbf{x})$ for conditional distributions.

Although it encompasses HMM-like models, the class of conditional random fields is much more expressive, because it allows arbitrary dependencies on the observation sequence. In addition, the features do not need to specify completely a state or observation, so one might expect that the model can be estimated from less training data. Another attractive property is the convexity of the loss function; indeed, CRFs share all of the convexity properties of general maximum entropy models.

_Figure 2._ Graphical structures of simple HMMs (left), MEMMs (center), and the chain-structured case of CRFs (right) for sequences. An open circle indicates that the variable is not generated by the model.

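For intuition, equation (1) on a small chain can be evaluated by brute force: score every label sequence by its exponentiated feature sum and normalize over all label sequences. The sketch below is our illustration, with made-up indicator features and weights, not code from the paper.

```python
import itertools, math

# Toy chain CRF in the spirit of equation (1): edge features f_k on label
# pairs and vertex features g_k on (label, observation) pairs.
# Feature definitions and weights are illustrative.

LABELS = ["A", "B"]

def edge_feats(y_prev, y_cur):
    # One indicator feature per label pair, as in the HMM-like CRF.
    return {("f", y_prev, y_cur): 1.0}

def vertex_feats(y, x):
    return {("g", y, x): 1.0}

def score(y_seq, x_seq, weights):
    """Unnormalized log-score: sum of weighted edge and vertex features."""
    s = 0.0
    for i in range(1, len(y_seq)):
        for k, v in edge_feats(y_seq[i - 1], y_seq[i]).items():
            s += weights.get(k, 0.0) * v
    for y, x in zip(y_seq, x_seq):
        for k, v in vertex_feats(y, x).items():
            s += weights.get(k, 0.0) * v
    return s

def prob(y_seq, x_seq, weights):
    """p_theta(y | x) by brute-force normalization over label sequences."""
    z = sum(math.exp(score(ys, x_seq, weights))
            for ys in itertools.product(LABELS, repeat=len(x_seq)))
    return math.exp(score(y_seq, x_seq, weights)) / z

weights = {("g", "A", "a"): 2.0, ("g", "B", "b"): 2.0, ("f", "A", "B"): 1.0}
x = ["a", "b"]
total = sum(prob(ys, x, weights)
            for ys in itertools.product(LABELS, repeat=len(x)))
print(round(total, 6))  # 1.0 -- the distribution is properly normalized
```

Because the single normalization is over whole label sequences, every transition’s weight competes against all others, which is exactly the global trade-off described above; brute-force enumeration is of course only feasible for tiny chains.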
For the remainder of the paper we assume that the dependencies of $\mathbf{Y}$, conditioned on $\mathbf{X}$, form a chain. To simplify some expressions, we add special start and stop states $\mathbf{Y}_0 = \textrm{start}$ and $\mathbf{Y}_{n+1} = \textrm{stop}$. Thus, we will be using the graphical structure shown in Figure 2. For a chain structure, the conditional probability of a label sequence can be expressed concisely in matrix form, which will be useful in describing the parameter estimation and inference algorithms in Section 4. Suppose that $p_\theta(\mathbf{Y} \mid \mathbf{X})$ is a CRF given by (1). For each position $i$ in the observation sequence $\mathbf{x}$, we define the $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix random variable $M_i(\mathbf{x}) = [M_i(y', y \mid \mathbf{x})]$ by

$$M_i(y', y \mid \mathbf{x}) = \exp(\Lambda_i(y', y \mid \mathbf{x})), \qquad \Lambda_i(y', y \mid \mathbf{x}) = \sum_k \lambda_k\, f_k(e_i, \mathbf{Y}|_{e_i} = (y', y), \mathbf{x}) + \sum_k \mu_k\, g_k(v_i, \mathbf{Y}|_{v_i} = y, \mathbf{x}),$$

where $e_i$ is the edge with labels $(\mathbf{Y}_{i-1}, \mathbf{Y}_i)$ and $v_i$ is the vertex with label $\mathbf{Y}_i$. In contrast to generative models, conditional models like CRFs do not need to enumerate over all possible observation sequences $\mathbf{x}$, and therefore these matrices can be computed directly as needed from a given training or test observation sequence $\mathbf{x}$ and the parameter vector $\theta$. Then the normalization (partition function) $Z_\theta(\mathbf{x})$ is the $(\textrm{start}, \textrm{stop})$ entry of the product of these matrices:

$$Z_\theta(\mathbf{x}) = (M_1(\mathbf{x})\, M_2(\mathbf{x}) \cdots M_{n+1}(\mathbf{x}))_{\textrm{start}, \textrm{stop}}.$$

Using this notation, the conditional probability of a label sequence $\mathbf{y}$ is written as

$$p_\theta(\mathbf{y} \mid \mathbf{x}) = \frac{\prod_{i=1}^{n+1} M_i(\mathbf{y}_{i-1}, \mathbf{y}_i \mid \mathbf{x})}{\Big( \prod_{i=1}^{n+1} M_i(\mathbf{x}) \Big)_{\textrm{start}, \textrm{stop}}},$$

where $\mathbf{y}_0 = \textrm{start}$ and $\mathbf{y}_{n+1} = \textrm{stop}$.

\textbf{4. Parameter Estimation for CRFs}

We now describe two iterative scaling algorithms to find the parameter vector $\theta$ that maximizes the log-likelihood of the training data. Both algorithms are based on the improved iterative scaling (IIS) algorithm of Della Pietra et al. (1997); the proof technique based on auxiliary functions can be extended to show convergence of the algorithms for CRFs.

Iterative scaling algorithms update the weights as $\lambda_k \leftarrow \lambda_k + \delta\lambda_k$ and $\mu_k \leftarrow \mu_k + \delta\mu_k$ for appropriately chosen $\delta\lambda_k$ and $\delta\mu_k$. In particular, the IIS update $\delta\lambda_k$ for an edge feature $f_k$ is the solution of

$$\tilde{E}[f_k] \;\stackrel{\mathrm{def}}{=}\; \sum_{\mathbf{x}, \mathbf{y}} \tilde{p}(\mathbf{x}, \mathbf{y}) \sum_{i=1}^{n+1} f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x}) \;=\; \sum_{\mathbf{x}, \mathbf{y}} \tilde{p}(\mathbf{x})\, p(\mathbf{y} \mid \mathbf{x}) \sum_{i=1}^{n+1} f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x})\, e^{\delta\lambda_k T(\mathbf{x}, \mathbf{y})},$$

where $T(\mathbf{x}, \mathbf{y})$ is the _total feature count_

$$T(\mathbf{x}, \mathbf{y}) \;\stackrel{\mathrm{def}}{=}\; \sum_{i,k} f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x}) + \sum_{i,k} g_k(v_i, \mathbf{y}|_{v_i}, \mathbf{x}).$$

The equations for vertex feature updates $\delta\mu_k$ have similar form.

However, efficiently computing the exponential sums on the right-hand sides of these equations is problematic, because $T(\mathbf{x}, \mathbf{y})$ is a global property of $(\mathbf{x}, \mathbf{y})$, and dynamic programming will sum over sequences with potentially varying $T$. To deal with this, the first algorithm, Algorithm S, uses a “slack feature.” The second, Algorithm T, keeps track of partial $T$ totals.

For Algorithm S, we define the _slack feature_ by

$$s(\mathbf{x}, \mathbf{y}) \;\stackrel{\mathrm{def}}{=}\; S - \sum_i \sum_k f_k(e_i, \mathbf{y}|_{e_i}, \mathbf{x}) - \sum_i \sum_k g_k(v_i, \mathbf{y}|_{v_i}, \mathbf{x}),$$

where $S$ is a constant chosen so that $s(\mathbf{x}^{(i)}, \mathbf{y}) \geq 0$ for all $\mathbf{y}$ and all observation vectors $\mathbf{x}^{(i)}$ in the training set, thus making $T(\mathbf{x}, \mathbf{y}) = S$. Feature $s$ is “global,” that is, it does not correspond to any particular edge or vertex.

For each index $i = 0, \ldots, n+1$ we now define the _forward vectors_ $\alpha_i(\mathbf{x})$ with base case

$$\alpha_0(y \mid \mathbf{x}) = \begin{cases} 1 & \text{if } y = \text{start} \\ 0 & \text{otherwise} \end{cases}$$

and recurrence

$$\alpha_i(\mathbf{x}) = \alpha_{i-1}(\mathbf{x})\, M_i(\mathbf{x}).$$

Similarly, the _backward vectors_ $\beta_i(\mathbf{x})$ are defined by

$$\beta_{n+1}(y \mid \mathbf{x}) = \begin{cases} 1 & \text{if } y = \text{stop} \\ 0 & \text{otherwise} \end{cases}$$

and

$$\beta_i(\mathbf{x})^\top = M_{i+1}(\mathbf{x})\, \beta_{i+1}(\mathbf{x}).$$

With these definitions, the update equations are

$$\delta\lambda_k = \frac{1}{S} \log \frac{\tilde{E} f_k}{E f_k}, \qquad \delta\mu_k = \frac{1}{S} \log \frac{\tilde{E} g_k}{E g_k},$$

where

$$E f_k = \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \sum_{i=1}^{n+1} \sum_{y', y} f_k(e_i, \mathbf{y}|_{e_i} = (y', y), \mathbf{x})\, \frac{\alpha_{i-1}(y' \mid \mathbf{x})\, M_i(y', y \mid \mathbf{x})\, \beta_i(y \mid \mathbf{x})}{Z_\theta(\mathbf{x})},$$

$$E g_k = \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \sum_{i=1}^{n} \sum_{y} g_k(v_i, \mathbf{y}|_{v_i} = y, \mathbf{x})\, \frac{\alpha_i(y \mid \mathbf{x})\, \beta_i(y \mid \mathbf{x})}{Z_\theta(\mathbf{x})}.$$

The factors involving the forward and backward vectors in the above equations have the same meaning as for standard hidden Markov models. For example,

$$p_\theta(\mathbf{Y}_i = y \mid \mathbf{x}) = \frac{\alpha_i(y \mid \mathbf{x})\, \beta_i(y \mid \mathbf{x})}{Z_\theta(\mathbf{x})}$$

is the marginal probability of label $\mathbf{Y}_i = y$ given that the observation sequence is $\mathbf{x}$. This algorithm is closely related to the algorithm of Darroch and Ratcliff (1972), and MART algorithms used in image reconstruction.

The constant $S$ in Algorithm S can be quite large, since in practice it is proportional to the length of the longest training observation sequence. As a result, the algorithm may converge slowly, taking very small steps toward the maximum in each iteration. If the length of the observations $\mathbf{x}^{(i)}$ and the number of active features varies greatly, a faster-converging algorithm can be obtained by keeping track of feature totals for each observation sequence separately.

Let $T(\mathbf{x}) \stackrel{\mathrm{def}}{=} \max_{\mathbf{y}} T(\mathbf{x}, \mathbf{y})$. Algorithm T accumulates feature expectations into counters indexed by $T(\mathbf{x})$. More specifically, we use the forward-backward recurrences just introduced to compute the expectations $a_{k,t}$ of feature $f_k$ and $b_{k,t}$ of feature $g_k$ given that $T(\mathbf{x}) = t$. Then our parameter updates are $\delta\lambda_k = \log \beta_k$ and $\delta\mu_k = \log \gamma_k$, where $\beta_k$ and $\gamma_k$ are the unique positive roots to the following polynomial equations

$$\sum_{t=0}^{T_{\max}} a_{k,t}\, \beta_k^t = \tilde{E} f_k, \qquad \sum_{t=0}^{T_{\max}} b_{k,t}\, \gamma_k^t = \tilde{E} g_k, \qquad (2)$$

which can be easily computed by Newton’s method.

A single iteration of Algorithm S and Algorithm T has roughly the same time and space complexity as the well known Baum-Welch algorithm for HMMs. To prove convergence of our algorithms, we can derive an auxiliary function to bound the change in likelihood from below; this method is developed in detail by Della Pietra et al. (1997). The full proof is somewhat detailed; however, here we give an idea of how to derive the auxiliary function. To simplify notation, we assume only edge features $f_k$ with parameters $\lambda_k$.

Given two parameter settings $\theta = (\lambda_1, \lambda_2, \ldots)$ and $\theta' = (\lambda_1 + \delta\lambda_1, \lambda_2 + \delta\lambda_2, \ldots)$, we bound from below the change in the objective function with an _auxiliary function_ $A(\theta', \theta)$ as follows:

$$\begin{aligned} \mathcal{O}(\theta') - \mathcal{O}(\theta) &= \sum_{\mathbf{x}, \mathbf{y}} \tilde{p}(\mathbf{x}, \mathbf{y}) \log \frac{p_{\theta'}(\mathbf{y} \mid \mathbf{x})}{p_\theta(\mathbf{y} \mid \mathbf{x})} = (\theta' - \theta) \cdot \tilde{E} f - \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \log \frac{Z_{\theta'}(\mathbf{x})}{Z_\theta(\mathbf{x})} \\ &\geq (\theta' - \theta) \cdot \tilde{E} f - \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \frac{Z_{\theta'}(\mathbf{x})}{Z_\theta(\mathbf{x})} \\ &= \delta\lambda \cdot \tilde{E} f - \sum_{\mathbf{x}} \tilde{p}(\mathbf{x}) \sum_{\mathbf{y}} p_\theta(\mathbf{y} \mid \mathbf{x})\, e^{\delta\lambda \cdot f(\mathbf{x}, \mathbf{y})} \\ &\geq \delta\lambda \cdot \tilde{E} f - \sum_{\mathbf{x}, \mathbf{y}, k} \tilde{p}(\mathbf{x})\, p_\theta(\mathbf{y} \mid \mathbf{x})\, \frac{f_k(\mathbf{x}, \mathbf{y})}{T(\mathbf{x})}\, e^{\delta\lambda_k T(\mathbf{x})} \;\stackrel{\mathrm{def}}{=}\; A(\theta', \theta), \end{aligned}$$

where the inequalities follow from the convexity of $-\log$ and $\exp$. Differentiating $A$ with respect to $\delta\lambda_k$ and setting the result to zero yields equation (2).

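The matrix form above can be sketched directly: build each $M_i$ over the label set augmented with start and stop, take $Z_\theta(\mathbf{x})$ as the (start, stop) entry of the matrix product, and divide path products by it. The feature definitions and weights below are our illustrative assumptions, not the paper's.

```python
import math

# Sketch of the chain-CRF matrix form over states {start, A, B, stop}.
STATES = ["start", "A", "B", "stop"]

def M(i, x, n):
    """Matrix M_i(y', y | x): exp of weighted edge + vertex features.
    Weights are arbitrary illustrative values."""
    m = [[0.0] * len(STATES) for _ in STATES]
    for a, yp in enumerate(STATES):
        for b, y in enumerate(STATES):
            # Allowed transitions on the augmented chain.
            if i == 1:
                ok = yp == "start" and y in ("A", "B")
            elif i <= n:
                ok = yp in ("A", "B") and y in ("A", "B")
            else:  # i == n + 1: transition into the stop state
                ok = yp in ("A", "B") and y == "stop"
            if not ok:
                continue
            lam = 0.3 if (yp, y) == ("A", "B") else 0.0    # edge feature
            mu = 1.0 if i <= n and y == x[i - 1] else 0.0  # vertex feature
            m[a][b] = math.exp(lam + mu)
    return m

def matmul(P, Q):
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]

def Z(x):
    """Partition function: (start, stop) entry of M_1 ... M_{n+1}."""
    n = len(x)
    prod = M(1, x, n)
    for i in range(2, n + 2):
        prod = matmul(prod, M(i, x, n))
    return prod[STATES.index("start")][STATES.index("stop")]

def seq_prob(y, x):
    """p_theta(y | x): product of matrix entries along the path over Z(x)."""
    n = len(x)
    path = ["start"] + list(y) + ["stop"]
    num = 1.0
    for i in range(1, n + 2):
        num *= M(i, x, n)[STATES.index(path[i - 1])][STATES.index(path[i])]
    return num / Z(x)

x = ["A", "B", "A"]
total = sum(seq_prob((y1, y2, y3), x)
            for y1 in "AB" for y2 in "AB" for y3 in "AB")
print(round(total, 6))  # 1.0
```

Summing the path products via the matrix product is what makes $Z_\theta(\mathbf{x})$ computable in time linear in the sequence length, instead of exponential enumeration; the forward and backward vectors are just the partial row and column products of the same matrices.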
\textbf{5. Experiments}

We first discuss two sets of experiments with synthetic data that highlight the differences between CRFs and MEMMs. The first experiments are a direct verification of the label bias problem discussed in Section 2. In the second set of experiments, we generate synthetic data using randomly chosen hidden Markov models, each of which is a mixture of a first-order and second-order model. Competing _first-order_ models are then trained and compared on test data. As the data becomes more second-order, the test error rates of the trained models increase. This experiment corresponds to the common modeling practice of approximating complex local and long-range dependencies, as occur in natural data, by small-order Markov models. Our results clearly indicate that even when the models are parameterized in exactly the same way, CRFs are more robust to inaccurate modeling assumptions than MEMMs or HMMs, and resolve the label bias problem, which affects the performance of MEMMs. To avoid confusion of different effects, the MEMMs and CRFs in these experiments _do not_ use overlapping features of the observations. Finally, in a set of POS tagging experiments, we confirm the advantage of CRFs over MEMMs. We also show that the addition of overlapping features to CRFs and MEMMs allows them to perform much better than HMMs, as already shown for MEMMs by McCallum et al. (2000).

_Figure 3._ Plots of 2×2 error rates for HMMs, CRFs, and MEMMs on randomly generated synthetic data sets, as described in Section 5.2. As the data becomes “more second order,” the error rates of the test models increase. As shown in the left plot, the CRF typically significantly outperforms the MEMM. The center plot shows that the HMM outperforms the MEMM. In the right plot, each open square represents a data set with $\alpha < \frac{1}{2}$, and a solid circle indicates a data set with $\alpha \geq \frac{1}{2}$. The plot shows that when the data is mostly second order ($\alpha \geq \frac{1}{2}$), the discriminatively trained CRF typically outperforms the HMM. These experiments are not designed to demonstrate the advantages of the additional representational power of CRFs and MEMMs relative to HMMs.

+ \textbf{5.1 Modeling label bias}
1217
+
1218
+
1219
+ We generate data from a simple HMM which encodes a
1220
+ noisy version of the finite-state network in Figure 1. Each
1221
+ state emits its designated symbol with probability 29 _/_ 32
1222
+ and any of the other symbols with probability 1 _/_ 32. We
1223
+ train both an MEMM and a CRF with the same topologies
1224
+ on the data generated by the HMM. The observation features are simply the identity of the observation symbols.
1225
+ In a typical run using 2 _,_ 000 training and 500 test samples,
1226
+ trained to convergence of the iterative scaling algorithm,
1227
+ the CRF error is 4 _._ 6% while the MEMM error is 42%,
1228
+ showing that the MEMM fails to discriminate between the
1229
+ two branches.
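
The emission noise described above can be sketched in a few lines. The 29/32 noise level comes from the text; the symbol set is our assumption, taken from the rib/rob network of Figure 1 (not reproduced here), and the function name `emit` is ours:

```python
import random

# Symbols assumed from the rib/rob finite-state network of Figure 1.
SYMBOLS = ["r", "i", "o", "b"]

def emit(designated, rng):
    """Emit the state's designated symbol with probability 29/32,
    otherwise one of the other symbols uniformly at random."""
    if rng.random() < 29 / 32:
        return designated
    return rng.choice([s for s in SYMBOLS if s != designated])
```

Running this sampler along the two branches of the network produces the noisy training and test sequences used in the experiment.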
1230
+
1231
+
1232
+ \textbf{5.2 Modeling mixed-order sources}
1233
+
1234
+ For these results, we use five labels, a-e ( _|Y|_ = 5), and 26
1235
+ observation values, A-Z ( _|X|_ = 26); however, the results
1236
+ were qualitatively the same over a range of sizes for _Y_ and
1237
+ _X_ . We generate data from a mixed-order HMM with state
1238
+ transition probabilities given by _pα_ ( \textbf{y} _i_ _|_ \textbf{y} _i−_ 1 _,_ \textbf{y} _i−_ 2 ) = _α p_ 2 ( \textbf{y} _i_ _|_ \textbf{y} _i−_ 1 _,_ \textbf{y} _i−_ 2 ) + (1 _− α_ ) _p_ 1 ( \textbf{y} _i_ _|_ \textbf{y} _i−_ 1 ) and, similarly, emission probabilities given by _pα_ ( \textbf{x} _i_ _|_ \textbf{y} _i_ _,_ \textbf{x} _i−_ 1 ) = _α p_ 2 ( \textbf{x} _i_ _|_ \textbf{y} _i_ _,_ \textbf{x} _i−_ 1 ) + (1 _− α_ ) _p_ 1 ( \textbf{x} _i_ _|_ \textbf{y} _i_ ). Thus, for _α_ = 0 we have a standard first-order HMM. In order to limit the size
1239
+
1240
+
1241
+
1242
+ of the Bayes error rate for the resulting models, the conditional probability tables _pα_ are constrained to be sparse.
1243
+ In particular, _pα_ ( _· | y, y′_ ) can have at most two nonzero entries, for each _y, y′_, and _pα_ ( _· | y, x′_ ) can have at most three
+ nonzero entries for each _y, x′_ . For each randomly generated model, a sample of 1,000 sequences of length 25 is
1249
+ generated for training and testing.
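
As a rough sketch of this generating process (the label half only), the mixture of a second-order and a first-order table can be sampled as below. For brevity we use dense random tables rather than the sparse ones the paper constrains to, and all names here are ours:

```python
import random

LABELS = ["a", "b", "c", "d", "e"]

def random_table(keys, outcomes, rng):
    """Build a dense conditional probability table p(outcome | key)."""
    table = {}
    for k in keys:
        w = [rng.random() for _ in outcomes]
        z = sum(w)
        table[k] = {o: wi / z for o, wi in zip(outcomes, w)}
    return table

def sample_sequence(alpha, length, rng):
    """Sample labels from the mixture
    p_alpha(y_i | y_{i-1}, y_{i-2}) = alpha*p2(y_i | y_{i-1}, y_{i-2})
                                      + (1-alpha)*p1(y_i | y_{i-1})."""
    p1 = random_table(LABELS, LABELS, rng)
    p2 = random_table([(a, b) for a in LABELS for b in LABELS], LABELS, rng)
    seq = [rng.choice(LABELS), rng.choice(LABELS)]   # arbitrary first two labels
    for _ in range(length - 2):
        y1, y2 = seq[-1], seq[-2]
        weights = [alpha * p2[(y1, y2)][y] + (1 - alpha) * p1[y1][y]
                   for y in LABELS]
        seq.append(rng.choices(LABELS, weights=weights)[0])
    return seq
```

Setting `alpha=0` reduces the sampler to a plain first-order chain, matching the _α_ = 0 case in the text.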
1250
+
1251
+
1252
+ On each randomly generated training set, a CRF is trained
1253
+ using Algorithm S. (Note that since the length of the sequences and number of active features is constant, Algorithms S and T are identical.) The algorithm is fairly slow
1254
+ to converge, typically taking approximately 500 iterations
1255
+ for the model to stabilize. On the 500 MHz Pentium PC
1256
+ used in our experiments, each iteration takes approximately
1257
+ 0.2 seconds. On the same data an MEMM is trained using
1258
+ iterative scaling, which does not require forward-backward
1259
+ calculations, and is thus more efficient. The MEMM training converges more quickly, stabilizing after approximately
1260
+ 100 iterations. For each model, the Viterbi algorithm is
1261
+ used to label a test set; the experimental results do not significantly change when using forward-backward decoding
1262
+ to minimize the per-symbol error rate.
1263
+
1264
+
1265
+ The results of several runs are presented in Figure 3. Each
1266
+ plot compares two classes of models, with each point indicating the error rate for a single test set. As _α_ increases, the
1267
+ error rates generally increase, as the first-order models fail
1268
+ to fit the second-order data. The figure compares models
1269
+ parameterized as _µy_, _λy￿,y_, and _λy￿,y,x_ ; results for models
1270
+ parameterized as _µy_, _λy￿,y_, and _µy,x_ are qualitatively the
1271
+ same. As shown in the first graph, the CRF generally outperforms the MEMM, often by a wide margin of 10%–20%
1272
+ relative error. (The points for very small error rate, with
1273
+ _α <_ 0 _._ 01, where the MEMM does better than the CRF,
1274
+ are suspected to be the result of an insufficient number of
1275
+ training iterations for the CRF.)
1276
+
1277
+
1278
+ | model | error | oov error |
+ |---|---|---|
+ | HMM | 5.69% | 45.99% |
+ | MEMM | 6.37% | 54.61% |
+ | CRF | 5.55% | 48.05% |
+ | MEMM+ | 4.81% | 26.99% |
+ | CRF+ | 4.27% | 23.76% |
1282
+
1283
+
1284
+ +Using spelling features
1285
+
1286
+
1287
+ _Figure 4._ Per-word error rates for POS tagging on the Penn treebank, using first-order models trained on 50% of the 1.1 million
1288
+ word corpus. The oov rate is 5.45%.
1289
+
1290
+
1291
+ \textbf{5.3 POS tagging experiments}
1292
+
1293
+
1294
+ To confirm our synthetic data results, we also compared
1295
+ HMMs, MEMMs and CRFs on Penn treebank POS tagging, where each word in a given input sentence must be
1296
+ labeled with one of 45 syntactic tags.
1297
+
1298
+
1299
+ We carried out two sets of experiments with this natural
1300
+ language data. First, we trained first-order HMM, MEMM,
1301
+ and CRF models as in the synthetic data experiments, introducing parameters _µy,x_ for each tag-word pair and _λy￿,y_
1302
+ for each tag-tag pair in the training set. The results are consistent with what is observed on synthetic data: the HMM
1303
+ outperforms the MEMM, as a consequence of the label bias
1304
+ problem, while the CRF outperforms the HMM. The error rates for training runs using a 50%-50% train-test split
1305
+ are shown in Figure 4; the results are qualitatively similar for other splits of the data. The error rates on out-of-vocabulary (oov) words, which are not observed in the
1306
+ training set, are reported separately.
1307
+
1308
+
1309
+ In the second set of experiments, we take advantage of the
1310
+ power of conditional models by adding a small set of orthographic features: whether a spelling begins with a number or upper case letter, whether it contains a hyphen, and
1311
+ whether it ends in one of the following suffixes: -ing, -ogy, -ed, -s, -ly, -ion, -tion, -ity, -ies. Here we find, as
1312
+ expected, that both the MEMM and the CRF benefit significantly from the use of these features, with the overall error
1313
+ rate reduced by around 25%, and the out-of-vocabulary error rate reduced by around 50%.
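
The orthographic features listed above can be sketched as a simple extractor; the feature list comes straight from the text, while the function name and dictionary encoding are ours:

```python
# Suffixes named in the text.
SUFFIXES = ("-ing", "-ogy", "-ed", "-s", "-ly", "-ion", "-tion", "-ity", "-ies")

def spelling_features(word):
    """Binary orthographic features of a spelling, as used by MEMM+ and CRF+."""
    feats = {
        "starts_with_digit": word[0].isdigit(),
        "starts_with_upper": word[0].isupper(),
        "contains_hyphen": "-" in word,
    }
    for suf in SUFFIXES:
        feats["ends_" + suf] = word.endswith(suf.lstrip("-"))
    return feats
```

Because conditional models need not model feature dependencies, such overlapping features can simply be added alongside the word identity.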
1314
+
1315
+
1316
+ One usually starts training from the all zero parameter vector, corresponding to the uniform distribution. However,
1317
+ for these datasets, CRF training with that initialization is
1318
+ much slower than MEMM training. Fortunately, we can
1319
+ use the optimal MEMM parameter vector as a starting
1320
+ point for training the corresponding CRF. In Figure 4,
+ MEMM+ was trained to convergence in around 100 iterations. Its parameters were then used to initialize the training of CRF+, which converged in 1,000 iterations. In contrast, training of the same CRF from the uniform distribution had not converged even after 2,000 iterations.
1322
+
1323
+
1324
+
1325
+ \textbf{6. Further Aspects of CRFs}
1326
+
1327
+
1328
+ Many further aspects of CRFs are attractive for applications and deserve further study. In this section we briefly
1329
+ mention just two.
1330
+
1331
+
1332
+ Conditional random fields can be trained using the exponential loss objective function used by the AdaBoost algorithm (Freund & Schapire, 1997). Typically, boosting is
1333
+ applied to classification problems with a small, fixed number of classes; applications of boosting to sequence labeling
1334
+ have treated each label as a separate classification problem
1335
+ (Abney et al., 1999). However, it is possible to apply the
1336
+ parallel update algorithm of Collins et al. (2000) to optimize the per-sequence exponential loss. This requires a
1337
+ forward-backward algorithm to compute efficiently certain
1338
+ feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and
1339
+ backward accumulators.
1340
+
1341
+
1342
+ Another attractive aspect of CRFs is that one can implement efficient feature selection and feature induction algorithms for them. That is, rather than specifying in advance which features of ( \textbf{X} _,_ \textbf{Y} ) to use, we could start from
1343
+ feature-generating rules and evaluate the benefit of generated features automatically on data. In particular, the feature induction algorithms presented in Della Pietra et al.
1344
+ (1997) can be adapted to fit the dynamic programming
1345
+ techniques of conditional random fields.
1346
+
1347
+
1348
+ \textbf{7. Related Work and Conclusions}
1349
+
1350
+
1351
+ As far as we know, the present work is the first to combine
1352
+ the benefits of conditional models with the global normalization of random field models. Other applications of exponential models in sequence modeling have either attempted
1353
+ to build generative models (Rosenfeld, 1997), which involve a hard normalization problem, or adopted local conditional models (Berger et al., 1996; Ratnaparkhi, 1996;
1354
+ McCallum et al., 2000) that may suffer from label bias.
1355
+
1356
+
1357
+ Non-probabilistic local decision models have also been
1358
+ widely used in segmentation and tagging (Brill, 1995;
1359
+ Roth, 1998; Abney et al., 1999). Because of the computational complexity of global training, these models are only
1360
+ trained to minimize the error of individual label decisions
1361
+ assuming that neighboring labels are correctly chosen. Label bias would be expected to be a problem here too.
1362
+
1363
+
1364
+ An alternative approach to discriminative modeling of sequence labeling is to use a permissive generative model,
1365
+ which can only model local dependencies, to produce a
1366
+ list of candidates, and then use a more global discriminative model to rerank those candidates. This approach is
1367
+ standard in large-vocabulary speech recognition (Schwartz
1368
+ & Austin, 1993), and has also been proposed for parsing
1369
+ (Collins, 2000). However, these methods fail when the correct output is pruned away in the first pass.
1370
+
1371
+
1372
+ Closest to our proposal are gradient-descent methods that
1373
+ adjust the parameters of all of the local classifiers to minimize a smooth loss function (e.g., quadratic loss) combining loss terms for each label. If state dependencies are local, this can be done efficiently with dynamic programming
1374
+ (LeCun et al., 1998). Such methods should alleviate label
1375
+ bias. However, their loss function is not convex, so they
1376
+ may get stuck in local minima.
1377
+
1378
+
1379
+ Conditional random fields offer a unique combination of
1380
+ properties: discriminatively trained models for sequence
1381
+ segmentation and labeling; combination of arbitrary, overlapping and agglomerative observation features from both
1382
+ the past and future; efficient training and decoding based
1383
+ on dynamic programming; and parameter estimation guaranteed to find the global optimum. Their main current limitation is the slow convergence of the training algorithm
1384
+ relative to MEMMs, let alone to HMMs, for which training
1385
+ on fully observed data is very efficient. In future work, we
1386
+ plan to investigate alternative training methods such as the
1387
+ update methods of Collins et al. (2000) and refinements on
1388
+ using a MEMM as starting point as we did in some of our
1389
+ experiments. More general tree-structured random fields,
1390
+ feature induction methods, and further natural data evaluations will also be investigated.
1391
+
1392
+
1393
+ \textbf{Acknowledgments}
1394
+
1395
+
1396
+ We thank Yoshua Bengio, Léon Bottou, Michael Collins
1397
+ and Yann LeCun for alerting us to what we call here the label bias problem. We also thank Andrew Ng and Sebastian
1398
+ Thrun for discussions related to this work.
1399
+
1400
+
1401
+ \textbf{References}
1402
+
1403
+
1404
+ Abney, S., Schapire, R. E., & Singer, Y. (1999). Boosting
1405
+
1406
+ applied to tagging and PP attachment. _Proc. EMNLP-_
1407
+ _VLC_ . New Brunswick, New Jersey: Association for
1408
+ Computational Linguistics.
1409
+ Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J.
1410
+
1411
+ (1996). A maximum entropy approach to natural language processing. _Computational Linguistics_, _22_ .
1412
+ Bottou, L. (1991). _Une approche théorique de_
+ _l’apprentissage connexionniste: Applications à la_
+ _reconnaissance de la parole_ . Doctoral dissertation, Université
+ de Paris XI.
1416
+ Brill, E. (1995). Transformation-based error-driven learn
1417
+ ing and natural language processing: a case study in part
1418
+ of speech tagging. _Computational Linguistics_, _21_, 543–
1419
+ 565.
1420
+ Collins, M. (2000). Discriminative reranking for natural
1421
+
1422
+ language parsing. _Proc. ICML 2000_ . Stanford, California.
1423
+ Collins, M., Schapire, R., & Singer, Y. (2000). Logistic re
1424
+ gression, AdaBoost, and Bregman distances. _Proc. 13th_
1425
+ _COLT_ .
1426
+ Darroch, J. N., & Ratcliff, D. (1972). Generalized iterative
1427
+
1428
+
1429
+
1430
+ scaling for log-linear models. _The Annals of Mathemat-_
1431
+ _ical Statistics_, _43_, 1470–1480.
1432
+ Della Pietra, S., Della Pietra, V., & Lafferty, J. (1997). In
1433
+ ducing features of random fields. _IEEE Transactions on_
1434
+ _Pattern Analysis and Machine Intelligence_, _19_, 380–393.
1435
+ Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998).
1436
+
1437
+ _Biological sequence analysis: Probabilistic models of_
1438
+ _proteins and nucleic acids_ . Cambridge University Press.
1439
+ Freitag, D., & McCallum, A. (2000). Information extrac
1440
+ tion with HMM structures learned by stochastic optimization. _Proc. AAAI 2000_ .
1441
+ Freund, Y., & Schapire, R. (1997). A decision-theoretic
1442
+
1443
+ generalization of on-line learning and an application to
1444
+ boosting. _Journal of Computer and System Sciences_, _55_,
1445
+ 119–139.
1446
+ Hammersley, J., & Clifford, P. (1971). Markov fields on
1447
+
1448
+ finite graphs and lattices. Unpublished manuscript.
1449
+ LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998).
1450
+
1451
+ Gradient-based learning applied to document recognition. _Proceedings of the IEEE_, _86_, 2278–2324.
1452
+ MacKay, D. J. (1996). Equivalence of linear Boltzmann
1453
+
1454
+ chains and hidden Markov models. _Neural Computation_,
1455
+ _8_, 178–181.
1456
+ Manning, C. D., & Schütze, H. (1999). _Foundations of sta-_
1457
+
1458
+ _tistical natural language processing_ . Cambridge Massachusetts: MIT Press.
1459
+ McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum
1460
+
1461
+ entropy Markov models for information extraction and
1462
+ segmentation. _Proc. ICML 2000_ (pp. 591–598). Stanford, California.
1463
+ Mohri, M. (1997). Finite-state transducers in language and
1464
+
1465
+ speech processing. _Computational Linguistics_, _23_ .
1466
+ Mohri, M. (2000). Minimization algorithms for sequential
1467
+
1468
+ transducers. _Theoretical Computer Science_, _234_, 177–
1469
+ 201.
1470
+ Paz, A. (1971). _Introduction to probabilistic automata_ .
1471
+ Academic Press.
1472
+ Punyakanok, V., & Roth, D. (2001). The use of classifiers
1473
+
1474
+ in sequential inference. _NIPS 13_ . Forthcoming.
1475
+ Ratnaparkhi, A. (1996). A maximum entropy model for
1476
+
1477
+ part-of-speech tagging. _Proc. EMNLP_ . New Brunswick,
1478
+ New Jersey: Association for Computational Linguistics.
1479
+ Rosenfeld, R. (1997). A whole sentence maximum entropy
1480
+
1481
+ language model. _Proceedings of the IEEE Workshop on_
1482
+ _Speech Recognition and Understanding_ . Santa Barbara,
1483
+ California.
1484
+ Roth, D. (1998). Learning to resolve natural language am
1485
+ biguities: A unified approach. _Proc. 15th AAAI_ (pp. 806–
1486
+ 813). Menlo Park, California: AAAI Press.
1487
+ Saul, L., & Jordan, M. (1996). Boltzmann chains and hid
1488
+ den Markov models. _Advances in Neural Information_
1489
+ _Processing Systems 7_ . MIT Press.
1490
+ Schwartz, R., & Austin, S. (1993). A comparison of several
1491
+
1492
+ approximate algorithms for finding multiple (N-BEST)
1493
+ sentence hypotheses. _Proc. ICASSP_ . Minneapolis, MN.
1494
+
1495
+
1496
+
1497
+ \end{document}
references/2014.eacl.nguyen/paper.md ADDED
@@ -0,0 +1,420 @@
1
+ ---
2
+ title: "RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger"
3
+ authors:
4
+ - "Dat Quoc Nguyen"
5
+ - "Dai Quoc Nguyen"
6
+ - "Dang Duc Pham"
7
+ - "Son Bao Pham"
8
+ year: 2014
9
+ venue: "EACL 2014 Demonstrations"
10
+ url: "https://aclanthology.org/E14-2005/"
11
+ ---
12
+
13
+ # **RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger**
14
+
15
+ **Dat Quoc Nguyen** [1] and **Dai Quoc Nguyen** [1] and **Dang Duc Pham** [2] and **Son Bao Pham** [1]
16
+
17
+ 1
18
+ Faculty of Information Technology
19
+ University of Engineering and Technology
20
+ Vietnam National University, Hanoi
21
+ {datnq, dainq, sonpb}@vnu.edu.vn
22
+ 2
23
+ L3S Research Center, Germany
24
+ pham@L3S.de
25
+
26
+
27
+
28
+ **Abstract**
29
+
30
+
31
+ This paper describes our robust, easy-to-use and language-independent toolkit
32
+ named RDRPOSTagger, which employs
33
+ an error-driven approach to automatically
34
+ construct a Single Classification Ripple
35
+ Down Rules tree of transformation rules
36
+ for POS tagging task. During the demonstration session, we will run the tagger on
37
+ data sets in 15 different languages.
38
+
39
+
40
+ **1** **Introduction**
41
+
42
+
43
+ As one of the most important tasks in Natural
44
+ Language Processing, Part-of-speech (POS) tagging is to assign a tag representing its lexical
45
+ category to each word in a text. Recently, POS
46
+ taggers employing machine learning techniques
47
+ are still mainstream toolkits obtaining state-of-the-art performances [1] . However, most of them are
48
+ time-consuming in learning process and require a
49
+ powerful computer for possibly training machine
50
+ learning models.
51
+ Turning to rule-based approaches, the most
52
+ well-known method is proposed by Brill (1995).
53
+ He proposed an approach to automatically learn
54
+ transformation rules for the POS tagging problem.
55
+ In the Brill’s tagger, a new selected rule is learned
56
+ on a context that is generated by all previous rules,
57
+ where a following rule will modify the outputs of
58
+ all the preceding rules. Hence, this procedure makes it difficult to control the interactions among
59
+ a large number of rules.
60
+ Our RDRPOSTagger is presented to overcome
61
+ the problems mentioned above. The RDRPOSTagger exploits a failure-driven approach to automatically restructure transformation rules in the
62
+ form of a Single Classification Ripple Down Rules
63
+ (SCRDR) tree (Richards, 2009). It accepts interactions between rules, but a rule only changes the
64
+
65
+
66
+ 1
67
+ http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)
68
+
69
+
70
+ 17
71
+
72
+
73
+
74
+ outputs of some previous rules in a controlled context. All rules are structured in a SCRDR tree
75
+ which allows a new exception rule to be added
76
+ when the tree returns an incorrect classification.
77
+ A specific description of our new RDRPOSTagger
78
+ approach is detailed in (Nguyen et al., 2011).
79
+ Packaged in a 0.6MB zip file, implementations
80
+ in Python and Java can be found at the tagger’s
81
+ website _http://rdrpostagger.sourceforge.net/_ . The
82
+ following items exhibit properties of the tagger:
83
+
84
+ _•_ The RDRPOSTagger is easy to configure and
85
+ train. There are only two threshold parameters utilized to learn the rule-based model. Besides, the
86
+ tagger is very simple to use with standard input
87
+ and output, having clear usage and instructions
88
+ available on its website.
89
+
90
+ _•_ The RDRPOSTagger is language independent.
91
+ This POS tagging toolkit has been successfully
92
+ applied to English and Vietnamese. To train the
93
+ toolkit for other languages, users just provide a
94
+ lexicon of words and the most frequent associated
95
+ tags. Moreover, it can be easily combined with existing POS taggers to reach an even better result.
96
+
97
+ _•_ The RDRPOSTagger obtains very competitive
98
+ accuracies. On Penn WSJ Treebank corpus (Marcus et al., 1993), taking WSJ sections 0-18 as the
99
+ training set, the tagger achieves a competitive performance compared to other state-of-the-art English POS taggers on the test set of WSJ sections
100
+ 22-24. For Vietnamese, it outperforms all previous machine learning-based POS tagging systems
101
+ to obtain the highest result to date on the Vietnamese Treebank corpus (Nguyen et al., 2009).
102
+
103
+ _•_ The RDRPOSTagger is fast. For instance in
104
+ English, the time [2] taken to train the tagger on
105
+ the WSJ sections 0-18 is **40** minutes. The tagging
106
+ speed on the test set of the WSJ sections 22-24 is
107
+ **2800** words/second for the latest implementation in Python, whilst it is **92k** words/second
108
+
109
+
110
+ 2Training and tagging times are computed on a Windows7 OS computer of Core 2Duo 2.4GHz & 3GB of memory.
111
+
112
+
113
+
114
+ _Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics_, pages 17–20,
115
+ Gothenburg, Sweden, April 26-30 2014. © 2014 Association for Computational Linguistics
116
+
117
+
118
+ _Figure 1:_ A part of our SCRDR tree for English POS tagging.
119
+
120
+
121
+
122
+ for the implementation in Java.
123
+
124
+
125
+ **2** **SCRDR methodology**
126
+
127
+
128
+ A SCRDR tree (Richards, 2009) is a binary tree
129
+ with two distinct types of edges. These edges are
130
+ typically called _except_ and _if-not_ edges. Associated with each node in a tree is a _rule_ . A rule has
131
+ the form: _if α then β_ where _α_ is called the _condi-_
132
+ _tion_ and _β_ is referred to as the _conclusion_ .
133
+ Cases in SCRDR are evaluated by passing a
134
+ case to the root of the tree. At any node in the
135
+ tree, if the condition of a node _N_ ’s rule is satisfied by the case, the case is passed on to the exception child of _N_ using the _except_ link if it exists.
136
+ Otherwise, the case is passed on to the _N_ ’s _if-not_
137
+ child. The conclusion given by this process is the
138
+ conclusion from the last node in the SCRDR tree
139
+ which _fired_ (satisfied by the case). To ensure that
140
+ a conclusion is always given, the root node typically contains a trivial condition which is always
141
+ satisfied. This node is called the _default_ node.
142
+ A new node containing a new rule (i.e. a new exception rule) is added to an SCRDR tree when the
143
+ evaluation process returns the _wrong_ conclusion.
144
+ The new node is attached to the last node in the
145
+ evaluation path of the given case with the _except_
146
+ link if the last node is the _fired_ one. Otherwise, it
147
+ is attached with the _if-not_ link.
148
+ For example, with the SCRDR tree in Figure 1, given a case _“as/IN investors/NNS an-
149
+ _ticipate/VB a/DT recovery/NN”_ where _“antici-_
150
+ _pate/VB”_ is the current word and tag pair, the case
151
+ satisfies the conditions of the rules at nodes (0),
152
+ (1) and (3), it then is passed to the node (6) (utilizing except links). As the case does not satisfy the
153
+ condition of the rule at node (6), it will be transferred to node (7) using if-not link. Since the case
154
+ does not fulfill the conditions of the rules at nodes
155
+ (7) and (8), we have the evaluation path (0)-(1)-(3)-(6)-(7)-(8) with fired node (3). Therefore, the
156
+ tag for _“anticipate”_ is concluded as “VBP”.
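
The evaluation walk just described can be sketched as a small recursive data structure; the class and function names below are ours, and conditions are modeled as plain predicates over a case:

```python
class Node:
    """One SCRDR rule `if condition then conclusion`, with its two children."""
    def __init__(self, condition, conclusion, except_child=None, if_not_child=None):
        self.condition = condition            # predicate over a case
        self.conclusion = conclusion
        self.except_child = except_child      # followed when the rule fires
        self.if_not_child = if_not_child      # followed when it does not

def evaluate(root, case):
    """Return the conclusion of the last node on the evaluation path that fired."""
    node, conclusion = root, None
    while node is not None:
        if node.condition(case):
            conclusion = node.conclusion
            node = node.except_child
        else:
            node = node.if_not_child
    return conclusion
```

Because the default node's condition is always satisfied, `evaluate` always returns some conclusion.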
157
+
158
+
159
+
160
+ Rule (1) - the rule at node (1) - is the exception
161
+ rule [3] of the default rule (0). As node (2) is the if-not child node of node (1), the associated rule
162
+ (2) is also an exception rule of the rule (0). Similarly, both rules (3) and (4) are exception rules of
163
+ the rule (1) whereas all rules (6), (7) and (8) are
164
+ exception rules of the rule (3), and so on. Thus,
165
+ the exception structure of the SCRDR tree extends
166
+ to 4 levels: rules (1) and (2) at layer 1, rules (3),
167
+ (4) and (5) at layer 2, rules (6), (7) and (8) at layer
168
+ 3, and rule (9) at layer 4.
169
+
170
+
171
+ **3** **The RDRPOSTagger toolkit**
172
+
173
+
174
+ The toolkit consists of four main components: Utility, Initial-tagger, SCRDR-learner and
175
+ SCRDR-tagger.
176
+
177
+
178
+ **3.1** **The Utility**
179
+
180
+
181
+ The major functions of this component are to evaluate tagging performances (displaying accuracy
182
+ results), and to create a lexicon of words and the
183
+ most frequent associated tags as well as to extract
184
+ _Raw corpus_ from an input golden training corpus.
185
+
186
+
187
+ **3.2** **The Initial-tagger**
188
+
189
+
190
+ The initial-tagger developed in the RDRPOSTagger toolkit is based on the lexicon generated by the Utility component to assign a tag to each word. To deal with unknown
191
+ words, the initial-tagger utilizes several regular expressions or heuristics for English and Vietnamese
192
+ whereas the most frequent tag in the training corpus is exploited to label unknown-words when
193
+ adapting to other languages.
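
A minimal sketch of this lexicon-based initial tagging (the helper names are ours; the language-independent fallback is the overall most frequent tag, as the text describes):

```python
from collections import Counter, defaultdict

def build_lexicon(tagged_corpus):
    """Map each word to its most frequent tag in a (word, tag) training corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def initial_tag(words, lexicon, default_tag):
    """Assign each word its lexicon tag, falling back to default_tag
    for unknown words (the language-independent case)."""
    return [lexicon.get(w, default_tag) for w in words]
```

For English and Vietnamese the fallback would instead be the regular-expression heuristics mentioned above.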
194
+
195
+
196
+ **3.3** **The SCRDR-learner**
197
+
198
+
199
+ The SCRDR-learner component uses a failure-driven method to automatically build a SCRDR
200
+ tree of transformation rules. Figure 3 describes the
201
+ learning process of the learner.
202
+
203
+
204
+ 3The default rule is the unique rule which is not an exception rule of any other rule. Every rule in layer _n_ is an exception rule of a rule in layer _n −_ 1.
205
+
206
+
207
+
208
+ 18
209
+
210
+
211
+ #12: _if_ next1stTag == **“object.next1stTag”** _then_ tag = **“correctTag”**
+ #14: _if_ prev1stTag == **“object.prev1stTag”** _then_ tag = **“correctTag”**
+ #18: _if_ word == **“object.word”** && next1stTag == **“object.next1stTag”** _then_ tag = **“correctTag”**
214
+
215
+
216
+ _Figure 2:_ Rule template examples.
217
+
218
+
219
+
220
+ _Figure 3:_ The diagram of the learning process of the learner.
221
+
222
+
223
+ The _Initialized corpus_ is returned by performing the Initial-tagger on the Raw corpus. By comparing the initialized one with the _Golden corpus_,
224
+ an _Object-driven dictionary_ of pairs ( _**O**_ _bject, cor-_
225
+ _rectTag_ ) is produced in which _Object_ captures the
226
+ 5-word window context covering the current word
227
+ and its tag in following format ( _previous_ 2 _[nd]_ _word_
228
+ _/ previous_ 2 _[nd]_ _tag, previous_ 1 _[st]_ _word / previous_
229
+ 1 _[st]_ _tag, word / currentTag, next_ 1 _[st]_ _word / next_ 1 _[st]_
230
+
231
+ _tag, next_ 2 _[nd]_ _word / next_ 2 _[nd]_ _tag_ ) from the initialized corpus, and the _correctTag_ is the corresponding tag of the current word in the golden corpus.
232
+ There are 27 _Rule templates_ applied for _Rule se-_
233
+ _lector_ which is to select the most suitable rules
234
+ to build the _SCRDR tree_ . Examples of the rule
235
+ templates are shown in figure 2 where elements
236
+ in bold will be replaced by concrete values from
237
+ _Object_ s in the object-driven dictionary to create
238
+ concrete rules. The SCRDR tree of rules is initialized by building the default rule and all exception
239
+ rules of the default one in form of _if currentTag =_
240
+ _“_ _**TAG**_ _” then tag = “_ _**TAG**_ _”_ at the layer-1 exception
241
+ structure, for example rules (1) and (2) in the figure 1, and the like. The learning approach to construct new exception rules to the tree is as follows:
242
+
243
+ _•_ At a node-F in the SCRDR tree, let _SO_ be
244
+ the set of Objects from the object-driven dictionary, which those Objects are fired at the node-F
245
+ but their initialized tags are incorrect (the _current-_
246
+ _Tag_ is not the _correctTag_ associated). It means that
247
+ node-F gives wrong conclusions to all Objects in
248
+ the _SO_ set.
249
+
250
+ _•_ In order to select a new exception rule of the
251
+ rule at node-F from all concrete rules which are
252
+
253
+
254
+
255
+ generated for all Objects in the _SO_ set, the selected rule have to satisfy constraints: (i) The rule
256
+ must be unsatisfied by cases for which node-F has
257
+ already given correct conclusions. This constraint
258
+ does not apply to node-F at layer-1 exception structure. (ii) The rule must associate to a highest score
259
+ value of subtracting B from A in comparison to
260
+ other ones, where A and B are the numbers of the
261
+ _SO_ ’s Objects which are correctly and incorrectly
262
+ concluded by the rule respectively. (iii) And the
263
+ highest value is not smaller than a given threshold.
264
+ The SCRDR-learner applies two threshold parameters: first threshold is to choose exception
265
+ rules at the layer-2 exception structure (e.g rules
266
+ (3), (4) and (5) in figure 1), and second threshold
267
+ is to select rules for higher exception layers.
268
+
269
+ _•_ The process to add new exception rules is repeated until there is no rule satisfying the constraints above. At each iteration, a new rule is
270
+ added to the current SCRDR tree to correct error
271
+ conclusions made by the tree.
272
+
273
+
274
+ **3.4** **The SCRDR-tagger**
275
+
276
+
277
+ The SCRDR-tagger component is to perform the
278
+ POS tagging on a raw text corpus where each line
279
+ is a sequence of words separated by white space
280
+ characters. The component labels the text corpus
281
+ by using the Initial-tagger. It slides due to a leftto-right direction on a 5-word window context to
282
+ generate a corresponding Object for each initially
283
+ tagged word. The Object is then classified by the
284
+ learned SCRDR tree model to produce final conclusion tag of the word as illustrated in the example in the section 2.
285
+
286
+
287
+ **4** **Evaluation**
288
+
289
+
290
+ The RDRPOSTagger has already been successfully applied to English and Vietnamese corpora.
291
+
292
+
293
+ **4.1** **Results for English**
294
+
295
+
296
+ Experiments for English employed the Penn WSJ
297
+ Treebank corpus to exploit the WSJ sections 0-18
298
+ (38219 sentences) for training, the WSJ sections
299
+ 19-21 (5527 sentences) for validation and the WSJ
300
+ sections 22-24 (5462 sentences) for test.
301
+ Using a lexicon created in the use of the train
302
+
303
+
304
+ 19
305
+
306
+
307
+ ing set, the Initial-tagger obtains an accuracy of
308
+ 93.51% on the test set. By varying the thresholds
309
+ on the validation set, we have found the most suitable values [4] of 3 and 2 to be used for evaluating
310
+ the RDRPOSTagger on the test set. Those thresholds return a SCRDR tree model of 2319 rules
311
+ in a 4-level exception structure. The training time
312
+ and tagging speed for those thresholds are mentioned in the introduction section. On the same test
313
+ set, the RDRPOSTagger achieves a performance at
314
+ 96.49% against 96.46% accounted for the state-ofthe-art POS tagger TnT (Brants, 2000).
315
+ For another experiment, only in training process: 1-time occurrence words in training set are
316
+ initially tagged as out-of-dictionary words. With
317
+ a learned tree model of 2418 rules, the tagger
318
+ reaches an accuracy of 96.51% on the test set.
319
+ Retraining the tagger utilizing another initial
320
+ tagger [5] developed in the Brill’s tagger (Brill,
321
+ 1995) instead of the lexicon-based initial one,
322
+ the RDRPOSTagger gains an accuracy result of
323
+ 96.57% which is slightly higher than the performance at 96.53% of the Brill’s.
324
+
325
+
326
+ **4.2** **Results for Vietnamese**
327
+
328
+
329
+ In the first Evaluation Campaign [6] on Vietnamese
330
+ Language Processing, the POS tagging track provided a golden training corpus of 28k sentences
331
+ (631k words) collected from two sources of the
332
+ national VLSP project and the Vietnam Lexicography Center, and a raw test corpus of 2100 sentences (66k words). The training process returned
333
+ a SCRDR tree of 2896 rules [7]. Obtaining the highest
334
+ performance on the test set, the RDRPOSTagger
335
+ surpassed all other participating systems.
336
+ We also carry out POS tagging experiments on
337
+ the golden corpus of 28k sentences and on the
338
+ Vietnamese Treebank of 10k sentences (Nguyen
339
+ et al., 2009) according to a 5-fold cross-validation
340
+ scheme [8]. The average accuracy results are presented in Table 1. Achieving an accuracy of
341
+ 92.59% on the Vietnamese Treebank, the RDR
342
+
343
+ [4] The thresholds 3 and 2 are reused for all other experiments in English and Vietnamese.
344
+ [5] The initial tagger gets a result of 93.58% on the test set.
345
+ [6] http://uet.vnu.edu.vn/rivf2013/campaign.html
346
+ [7] It took 100 minutes to construct the tree, leading to tagging speeds of 1100 words/second and 45k words/second for
347
+ the implementations in Python and Java, respectively, on a
348
+ computer with a Core 2 Duo 2.4GHz CPU & 3GB of memory.
349
+ [8] In each cross-validation run, one fold is selected as the test
350
+ set, and the 4 remaining folds are merged as the training set. The initial
351
+ tagger exploits a lexicon generated from the training set. In
352
+ training process, 1-time occurrence words are initially labeled
353
+ as out-of-lexicon words.
354
+
355
+
356
+
357
+ **Table 1:** Accuracy results for Vietnamese
358
+
359
+ | Corpus | Initial-tagger | RDRPOSTagger |
360
+ |--------|----------------|--------------|
361
+ | 28k | 91.18% | 93.42% |
362
+ | 10k | 90.59% | 92.59% |
363
+
364
+
365
+ POSTagger outperforms previous Maximum Entropy Model, Conditional Random Field and Support Vector Machine-based POS tagging systems
366
+ (Tran et al., 2009) on the same evaluation scheme.
367
+
368
+
369
+ **5** **Demonstration and Conclusion**
370
+
371
+
372
+ In addition to English and Vietnamese, in the
373
+ demonstration session, we will present promising
374
+ experimental results and run the RDRPOSTagger
375
+ for other languages including Bulgarian, Czech,
376
+ Danish, Dutch, French, German, Hindi, Italian,
377
+ Lao, Portuguese, Spanish, Swedish and Thai. We
378
+ will also let the audience contribute their own
379
+ data sets for retraining and testing the tagger.
380
+ In this paper, we describe the rule-based
381
+ POS tagging toolkit RDRPOSTagger to automatically construct transformation rules in the form
382
+ of the SCRDR exception structure. We believe that our robust, easy-to-use and language-independent toolkit RDRPOSTagger can be useful
383
+ for NLP/CL-related tasks.
384
+
385
+
386
+ **References**
387
+
388
+
389
+ Thorsten Brants. 2000. TnT: a statistical part-of-speech tagger. In _Proc. of 6th Applied Natural Language Processing Conference_, pages 224–231.
391
+ Eric Brill. 1995. Transformation-based error-driven
392
+ learning and natural language processing: a case
393
+ study in part-of-speech tagging. _Comput. Linguist._,
394
+ 21(4):543–565.
395
+ Mitchell P Marcus, Mary Ann Marcinkiewicz, and
396
+ Beatrice Santorini. 1993. Building a large annotated corpus of English: the Penn Treebank. _Comput._
397
+ _Linguist._, 19(2):313–330.
398
+ Phuong Thai Nguyen, Xuan Luong Vu, Thi
399
+ Minh Huyen Nguyen, Van Hiep Nguyen, and
400
+ Hong Phuong Le. 2009. Building a Large
401
+ Syntactically-Annotated Corpus of Vietnamese. In
402
+ _Proc. of LAW III workshop_, pages 182–185.
403
+ Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham,
404
+ and Dang Duc Pham. 2011. Ripple Down Rules for
405
+ Part-of-Speech Tagging. In _Proc. of 12th CICLing -_
406
+ _Volume Part I_, pages 190–201.
407
+ Debbie Richards. 2009. Two decades of ripple down
408
+ rules research. _Knowledge Engineering Review_,
409
+ 24(2):159–184.
410
+ Oanh Thi Tran, Cuong Anh Le, Thuy Quang Ha, and
411
+ Quynh Hoang Le. 2009. An experimental study
412
+ on Vietnamese POS tagging. In _Proc. of the 2009 Inter-_
413
+ _national Conference on Asian Language Processing_,
414
+ pages 23–27.
415
+
416
+
417
+
419
+
420
+
references/2014.eacl.nguyen/paper.tex ADDED
@@ -0,0 +1,419 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ \documentclass[11pt]{article}
2
+ \usepackage[utf8]{inputenc}
3
+ \usepackage{amsmath,amssymb}
4
+ \usepackage{booktabs}
5
+ \usepackage{hyperref}
6
+ \usepackage{graphicx}
7
+
8
+ \begin{document}
9
+
10
+ \section*{\textbf{RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger}}
11
+
12
+ \textbf{Dat Quoc Nguyen} [1] and \textbf{Dai Quoc Nguyen} [1] and \textbf{Dang Duc Pham} [2] and \textbf{Son Bao Pham} [1]
13
+
14
+ 1
15
+ Faculty of Information Technology
16
+ University of Engineering and Technology
17
+ Vietnam National University, Hanoi
18
+ {datnq, dainq, sonpb}@vnu.edu.vn
19
+ 2
20
+ L3S Research Center, Germany
21
+ pham@L3S.de
22
+
23
+
24
+
25
+ \textbf{Abstract}
26
+
27
+
28
+ This paper describes our robust, easy-to-use and language-independent toolkit
29
+ namely RDRPOSTagger which employs
30
+ an error-driven approach to automatically
31
+ construct a Single Classification Ripple
32
+ Down Rules tree of transformation rules
33
+ for the POS tagging task. During the demonstration session, we will run the tagger on
34
+ data sets in 15 different languages.
35
+
36
+
37
+ \textbf{1} \textbf{Introduction}
38
+
39
+
40
+ As one of the most important tasks in Natural
41
+ Language Processing, Part-of-speech (POS) tagging assigns to each word in a text a tag
42
+ representing its lexical category. Recently, POS
43
+ taggers employing machine learning techniques
44
+ are still mainstream toolkits obtaining state-of-the-art performances [1]. However, most of them are
45
+ time-consuming in the learning process and require a
46
+ powerful computer to train machine
47
+ learning models.
48
+ Turning to rule-based approaches, the most
49
+ well-known method is proposed by Brill (1995).
50
+ He proposed an approach to automatically learn
51
+ transformation rules for the POS tagging problem.
52
+ In Brill’s tagger, each new rule is learned
53
+ on a context that is generated by all previous rules,
54
+ and a later rule modifies the outputs of
55
+ all the preceding rules. Hence, this procedure makes it difficult to control the interactions among
56
+ a large number of rules.
57
+ Our RDRPOSTagger is presented to overcome
58
+ the problems mentioned above. The RDRPOSTagger exploits a failure-driven approach to automatically restructure transformation rules in the
59
+ form of a Single Classification Ripple Down Rules
60
+ (SCRDR) tree (Richards, 2009). It accepts interactions between rules, but a rule only changes the
61
+
62
+
63
+ [1] http://aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)
65
+
66
+
68
+
69
+
70
+
71
+ outputs of some previous rules in a controlled context. All rules are structured in a SCRDR tree
72
+ which allows a new exception rule to be added
73
+ when the tree returns an incorrect classification.
74
+ A specific description of our new RDRPOSTagger
75
+ approach is detailed in (Nguyen et al., 2011).
76
+ Packaged in a 0.6MB zip file, implementations
77
+ in Python and Java can be found at the tagger’s
78
+ website _http://rdrpostagger.sourceforge.net/_ . The
79
+ following items exhibit properties of the tagger:
80
+
81
+ _•_ The RDRPOSTagger is easy to configure and
82
+ train. There are only two threshold parameters utilized to learn the rule-based model. Besides, the
83
+ tagger is very simple to use with standard input
84
+ and output, having clear usage and instructions
85
+ available on its website.
86
+
87
+ _•_ The RDRPOSTagger is language independent.
88
+ This POS tagging toolkit has been successfully
89
+ applied to English and Vietnamese. To train the
90
+ toolkit for other languages, users just provide a
91
+ lexicon of words and the most frequent associated
92
+ tags. Moreover, it can be easily combined with existing POS taggers to reach an even better result.
93
+
94
+ _•_ The RDRPOSTagger obtains very competitive
95
+ accuracies. On Penn WSJ Treebank corpus (Marcus et al., 1993), taking WSJ sections 0-18 as the
96
+ training set, the tagger achieves a competitive performance compared to other state-of-the-art English POS taggers on the test set of WSJ sections
97
+ 22-24. For Vietnamese, it outperforms all previous machine learning-based POS tagging systems
98
+ to obtain an up-to-date highest result on the Vietnamese Treebank corpus (Nguyen et al., 2009).
99
+
100
+ _•_ The RDRPOSTagger is fast. For instance in
101
+ English, the time [2] taken to train the tagger on
102
+ the WSJ sections 0-18 is \textbf{40} minutes. The tagging
103
+ speed on the test set of the WSJ sections 22-24 is
104
+ \textbf{2800} words/second for the latest implementation in Python, whilst it is \textbf{92k} words/second
105
+
106
+
107
+ [2] Training and tagging times are computed on a Windows 7 computer with a Core 2 Duo 2.4GHz CPU & 3GB of memory.
108
+
109
+
110
+
111
+ _Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics_, pages 17–20,
112
+ Gothenburg, Sweden, April 26-30 2014. c _⃝_ 2014 Association for Computational Linguistics
113
+
114
+
115
+ _Figure 1:_ A part of our SCRDR tree for English POS tagging.
116
+
117
+
118
+
119
+ for the implementation in Java.
120
+
121
+
122
+ \textbf{2} \textbf{SCRDR methodology}
123
+
124
+
125
+ A SCRDR tree (Richards, 2009) is a binary tree
126
+ with two distinct types of edges. These edges are
127
+ typically called _except_ and _if-not_ edges. Associated with each node in a tree is a _rule_ . A rule has
128
+ the form: _if α then β_ where _α_ is called the _condi-_
129
+ _tion_ and _β_ is referred to as the _conclusion_ .
130
+ Cases in SCRDR are evaluated by passing a
131
+ case to the root of the tree. At any node in the
132
+ tree, if the condition of a node _N_ ’s rule is satisfied by the case, the case is passed on to the exception child of _N_ using the _except_ link if it exists.
133
+ Otherwise, the case is passed on to the _N_ ’s _if-not_
134
+ child. The conclusion given by this process is the
135
+ conclusion from the last node in the SCRDR tree
136
+ which _fired_ (satisfied by the case). To ensure that
137
+ a conclusion is always given, the root node typically contains a trivial condition which is always
138
+ satisfied. This node is called the _default_ node.
139
+ A new node containing a new rule (i.e. a new exception rule) is added to an SCRDR tree when the
140
+ evaluation process returns the _wrong_ conclusion.
141
+ The new node is attached to the last node in the
142
+ evaluation path of the given case with the _except_
143
+ link if the last node is the _fired_ one. Otherwise, it
144
+ is attached with the _if-not_ link.
145
+ For example with the SCRDR tree in the figure 1, given a case _“as/IN investors/NNS an-_
146
+ _ticipate/VB a/DT recovery/NN”_ where _“antici-_
147
+ _pate/VB”_ is the current word and tag pair, the case
148
+ satisfies the conditions of the rules at nodes (0),
149
+ (1) and (3), it then is passed to the node (6) (utilizing except links). As the case does not satisfy the
150
+ condition of the rule at node (6), it will be transferred to node (7) using if-not link. Since the case
151
+ does not fulfill the conditions of the rules at nodes
152
+ (7) and (8), we have the evaluation path (0)-(1)-(3)-(6)-(7)-(8) with fired node (3). Therefore, the
153
+ tag for _“anticipate”_ is concluded as “VBP”.
154
+
155
+
156
+
157
+ Rule (1) - the rule at node (1) - is the exception
158
+ rule [3] of the default rule (0). As node (2) is the if-not child node of node (1), the associated rule
159
+ (2) is also an exception rule of the rule (0). Similarly, both rules (3) and (4) are exception rules of
160
+ the rule (1) whereas all rules (6), (7) and (8) are
161
+ exception rules of the rule (3), and so on. Thus,
162
+ the exception structure of the SCRDR tree extends
163
+ to 4 levels: rules (1) and (2) at layer 1, rules (3),
164
+ (4) and (5) at layer 2, rules (6), (7) and (8) at layer
165
+ 3, and rule (9) at layer 4.
166
+
167
+
168
+ \textbf{3} \textbf{The RDRPOSTagger toolkit}
169
+
170
+
171
+ The toolkit consists of four main components: Utility, Initial-tagger, SCRDR-learner and
172
+ SCRDR-tagger.
173
+
174
+
175
+ \textbf{3.1} \textbf{The Utility}
176
+
177
+
178
+ The major functions of this component are to evaluate tagging performances (displaying accuracy
179
+ results), and to create a lexicon of words and the
180
+ most frequent associated tags as well as to extract
181
+ _Raw corpus_ from an input golden training corpus.
182
+
183
+
184
+ \textbf{3.2} \textbf{The Initial-tagger}
185
+
186
+
187
+ The initial-tagger developed in the RDRPOSTagger toolkit assigns a tag to each word based on the lexicon generated by the Utility component. To deal with unknown
188
+ words, the initial-tagger utilizes several regular expressions or heuristics for English and Vietnamese
189
+ whereas the most frequent tag in the training corpus is exploited to label unknown-words when
190
+ adapting to other languages.
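The lexicon-based initial tagging idea can be sketched as below: each known word receives its most frequent training-corpus tag, and unknown words fall back to the overall most frequent tag (the language-independent fallback mentioned above). A hypothetical sketch, not the toolkit's code:

```python
# Sketch of lexicon-based initial tagging with a most-frequent-tag fallback.
from collections import Counter, defaultdict

def build_lexicon(tagged_corpus):
    """tagged_corpus: list of (word, tag) pairs from the training corpus."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
        all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lexicon, all_tags.most_common(1)[0][0]

def initial_tag(words, lexicon, default_tag):
    # Unknown words get the most frequent tag of the whole training corpus.
    return [(w, lexicon.get(w, default_tag)) for w in words]

corpus = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("the", "DT"),
          ("run", "NN"), ("run", "VB"), ("run", "NN")]
lexicon, default = build_lexicon(corpus)
print(initial_tag(["the", "run", "zebra"], lexicon, default))
# -> [('the', 'DT'), ('run', 'NN'), ('zebra', 'NN')]
```

For English and Vietnamese the paper instead uses regular expressions and heuristics for unknown words; the fallback above corresponds to the other-language setting.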
191
+
192
+
193
+ \textbf{3.3} \textbf{The SCRDR-learner}
194
+
195
+
196
+ The SCRDR-learner component uses a failure-driven method to automatically build a SCRDR
197
+ tree of transformation rules. Figure 3 describes the
198
+ learning process of the learner.
199
+
200
+
201
+ [3] The default rule is the unique rule which is not an exception rule of any other rule. Every rule in layer _n_ is an exception rule of a rule in layer _n −_ 1.
202
+
203
+
204
+
206
+
207
+
208
+ #12: _if_ next1stTag == \textbf{“object.next1stTag”} _then_ tag = \textbf{“correctTag”}
209
+ #14: _if_ prev1stTag == \textbf{“object.prev1stTag”} _then_ tag = \textbf{“correctTag”}
210
+ #18: _if_ word == \textbf{“object.word”} && next1stTag == \textbf{“object.next1stTag”} _then_ tag = \textbf{“correctTag”}
211
+
212
+
213
+ _Figure 2:_ Rule template examples.
214
+
215
+
216
+
217
+ _Figure 3:_ The diagram of the learning process of the learner.
218
+
219
+
220
+ The _Initialized corpus_ is returned by performing the Initial-tagger on the Raw corpus. By comparing the initialized one with the _Golden corpus_,
221
+ an _Object-driven dictionary_ of pairs ( _\textbf{O}_ _bject, cor-_
222
+ _rectTag_ ) is produced in which _Object_ captures the
223
+ 5-word window context covering the current word
224
+ and its tag in following format ( _previous_ 2 _[nd]_ _word_
225
+ _/ previous_ 2 _[nd]_ _tag, previous_ 1 _[st]_ _word / previous_
226
+ 1 _[st]_ _tag, word / currentTag, next_ 1 _[st]_ _word / next_ 1 _[st]_
227
+
228
+ _tag, next_ 2 _[nd]_ _word / next_ 2 _[nd]_ _tag_ ) from the initialized corpus, and the _correctTag_ is the corresponding tag of the current word in the golden corpus.
229
+ There are 27 _Rule templates_ applied for _Rule se-_
230
+ _lector_ which is to select the most suitable rules
231
+ to build the _SCRDR tree_ . Examples of the rule
232
+ templates are shown in figure 2 where elements
233
+ in bold will be replaced by concrete values from
234
+ _Object_ s in the object-driven dictionary to create
235
+ concrete rules. The SCRDR tree of rules is initialized by building the default rule and all exception
236
+ rules of the default one in form of _if currentTag =_
237
+ _“_ _\textbf{TAG}_ _” then tag = “_ _\textbf{TAG}_ _”_ at the layer-1 exception
238
+ structure, for example rules (1) and (2) in the figure 1, and the like. The learning approach to construct new exception rules to the tree is as follows:
239
+
240
+ _•_ At a node-F in the SCRDR tree, let _SO_ be
241
+ the set of Objects from the object-driven dictionary that are fired at node-F
242
+ but their initialized tags are incorrect (the
243
+ _currentTag_ is not the associated _correctTag_). It means that
244
+ node-F gives wrong conclusions to all Objects in
245
+ the _SO_ set.
246
+
247
+ _•_ In order to select a new exception rule of the
248
+ rule at node-F from all concrete rules which are
249
+
250
+
251
+
252
+ generated for all Objects in the _SO_ set, the selected rule has to satisfy the following constraints: (i) The rule
253
+ must be unsatisfied by cases for which node-F has
254
+ already given correct conclusions. This constraint
255
+ does not apply to node-F at layer-1 exception structure. (ii) The rule must associate to a highest score
256
+ value of subtracting B from A in comparison to
257
+ other ones, where A and B are the numbers of the
258
+ _SO_ ’s Objects which are correctly and incorrectly
259
+ concluded by the rule respectively. (iii) And the
260
+ highest value is not smaller than a given threshold.
261
+ The SCRDR-learner applies two threshold parameters: the first threshold is to choose exception
262
+ rules at the layer-2 exception structure (e.g. rules
263
+ (3), (4) and (5) in Figure 1), and the second threshold
264
+ is to select rules for higher exception layers.
265
+
266
+ _•_ The process to add new exception rules is repeated until there is no rule satisfying the constraints above. At each iteration, a new rule is
267
+ added to the current SCRDR tree to correct error
268
+ conclusions made by the tree.
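The rule-selection step, scoring each candidate rule by A − B (its correctly minus incorrectly concluded Objects in the SO set) and keeping it only if the score reaches the threshold, can be sketched as follows. The toy data and rule conditions are our own assumptions, not rules from the paper:

```python
# Sketch of exception-rule selection at a node: pick the candidate with the
# highest score A - B on the wrongly-tagged Objects (the SO set), and keep
# it only if the score reaches the threshold.
def select_rule(candidates, so_objects, threshold):
    """candidates: list of (condition_fn, conclusion_tag);
    so_objects: list of (object_dict, correct_tag) wrongly tagged here."""
    best, best_score = None, None
    for cond, conclusion in candidates:
        a = sum(1 for obj, gold in so_objects if cond(obj) and conclusion == gold)
        b = sum(1 for obj, gold in so_objects if cond(obj) and conclusion != gold)
        score = a - b
        if best_score is None or score > best_score:
            best, best_score = (cond, conclusion), score
    return best if best_score is not None and best_score >= threshold else None

so = [({"next1stTag": "DT"}, "VBP"), ({"next1stTag": "DT"}, "VBP"),
      ({"next1stTag": "NN"}, "VBP")]
candidates = [(lambda o: o["next1stTag"] == "DT", "VBP"),
              (lambda o: o["next1stTag"] == "NN", "VB")]
rule = select_rule(candidates, so, threshold=2)
print(rule[1] if rule else None)  # -> VBP (score A - B = 2 - 0 = 2)
```

Constraint (i) from the paper (the rule must not break already-correct conclusions) is omitted here for brevity.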
269
+
270
+
271
+ \textbf{3.4} \textbf{The SCRDR-tagger}
272
+
273
+
274
+ The SCRDR-tagger component is to perform the
275
+ POS tagging on a raw text corpus where each line
276
+ is a sequence of words separated by white space
277
+ characters. The component labels the text corpus
278
+ by using the Initial-tagger. It slides in a left-to-right direction over a 5-word window context to
279
+ generate a corresponding Object for each initially
280
+ tagged word. The Object is then classified by the
281
+ learned SCRDR tree model to produce the final tag of the word, as illustrated by the example in Section 2.
282
+
283
+
284
+ \textbf{4} \textbf{Evaluation}
285
+
286
+
287
+ The RDRPOSTagger has already been successfully applied to English and Vietnamese corpora.
288
+
289
+
290
+ \textbf{4.1} \textbf{Results for English}
291
+
292
+
293
+ Experiments for English employed the Penn WSJ
294
+ Treebank corpus to exploit the WSJ sections 0-18
295
+ (38219 sentences) for training, the WSJ sections
296
+ 19-21 (5527 sentences) for validation and the WSJ
297
+ sections 22-24 (5462 sentences) for test.
298
+ Using a lexicon created from the training set, the Initial-tagger obtains an accuracy of
305
+ 93.51% on the test set. By varying the thresholds
306
+ on the validation set, we have found the most suitable values [4] of 3 and 2 to be used for evaluating
307
+ the RDRPOSTagger on the test set. Those thresholds return a SCRDR tree model of 2319 rules
308
+ in a 4-level exception structure. The training time
309
+ and tagging speed for those thresholds are mentioned in the introduction section. On the same test
310
+ set, the RDRPOSTagger achieves an accuracy of
311
+ 96.49%, compared to 96.46% for the state-of-the-art POS tagger TnT (Brants, 2000).
312
+ In another experiment, during the training process only, words occurring once in the training set are
313
+ initially tagged as out-of-dictionary words. With
314
+ a learned tree model of 2418 rules, the tagger
315
+ reaches an accuracy of 96.51% on the test set.
316
+ Retraining the tagger with another initial
317
+ tagger [5] developed in Brill’s tagger (Brill,
318
+ 1995) instead of the lexicon-based one,
319
+ the RDRPOSTagger reaches an accuracy of
320
+ 96.57%, slightly higher than the 96.53% obtained by Brill’s tagger.
321
+
322
+
323
+ \textbf{4.2} \textbf{Results for Vietnamese}
324
+
325
+
326
+ In the first Evaluation Campaign [6] on Vietnamese
327
+ Language Processing, the POS tagging track provided a golden training corpus of 28k sentences
328
+ (631k words) collected from two sources of the
329
+ national VLSP project and the Vietnam Lexicography Center, and a raw test corpus of 2100 sentences (66k words). The training process returned
330
+ a SCRDR tree of 2896 rules [7]. Obtaining the highest
331
+ performance on the test set, the RDRPOSTagger
332
+ surpassed all other participating systems.
333
+ We also carry out POS tagging experiments on
334
+ the golden corpus of 28k sentences and on the
335
+ Vietnamese Treebank of 10k sentences (Nguyen
336
+ et al., 2009) according to a 5-fold cross-validation
337
+ scheme [8]. The average accuracy results are presented in Table 1. Achieving an accuracy of
338
+ 92.59% on the Vietnamese Treebank, the RDR
339
+
340
+ [4] The thresholds 3 and 2 are reused for all other experiments in English and Vietnamese.
341
+ [5] The initial tagger gets a result of 93.58% on the test set.
342
+ [6] http://uet.vnu.edu.vn/rivf2013/campaign.html
343
+ [7] It took 100 minutes to construct the tree, leading to tagging speeds of 1100 words/second and 45k words/second for
344
+ the implementations in Python and Java, respectively, on a
345
+ computer with a Core 2 Duo 2.4GHz CPU & 3GB of memory.
346
+ [8] In each cross-validation run, one fold is selected as the test
347
+ set, and the 4 remaining folds are merged as the training set. The initial
348
+ tagger exploits a lexicon generated from the training set. In
349
+ training process, 1-time occurrence words are initially labeled
350
+ as out-of-lexicon words.
351
+
352
+
353
+
354
+ \textbf{Table 1:} Accuracy results for Vietnamese
355
+
356
+ | Corpus | Initial-tagger | RDRPOSTagger |
357
+ |--------|----------------|--------------|
358
+ | 28k | 91.18% | 93.42% |
359
+ | 10k | 90.59% | 92.59% |
360
+
361
+
362
+ POSTagger outperforms previous Maximum Entropy Model, Conditional Random Field and Support Vector Machine-based POS tagging systems
363
+ (Tran et al., 2009) on the same evaluation scheme.
364
+
365
+
366
+ \textbf{5} \textbf{Demonstration and Conclusion}
367
+
368
+
369
+ In addition to English and Vietnamese, in the
370
+ demonstration session, we will present promising
371
+ experimental results and run the RDRPOSTagger
372
+ for other languages including Bulgarian, Czech,
373
+ Danish, Dutch, French, German, Hindi, Italian,
374
+ Lao, Portuguese, Spanish, Swedish and Thai. We
375
+ will also let the audience contribute their own
376
+ data sets for retraining and testing the tagger.
377
+ In this paper, we describe the rule-based
378
+ POS tagging toolkit RDRPOSTagger to automatically construct transformation rules in the form
379
+ of the SCRDR exception structure. We believe that our robust, easy-to-use and language-independent toolkit RDRPOSTagger can be useful
380
+ for NLP/CL-related tasks.
381
+
382
+
383
+ \textbf{References}
384
+
385
+
386
+ Thorsten Brants. 2000. TnT: a statistical part-of-speech tagger. In _Proc. of 6th Applied Natural Language Processing Conference_, pages 224–231.
388
+ Eric Brill. 1995. Transformation-based error-driven
389
+ learning and natural language processing: a case
390
+ study in part-of-speech tagging. _Comput. Linguist._,
391
+ 21(4):543–565.
392
+ Mitchell P Marcus, Mary Ann Marcinkiewicz, and
393
+ Beatrice Santorini. 1993. Building a large annotated corpus of English: the Penn Treebank. _Comput._
394
+ _Linguist._, 19(2):313–330.
395
+ Phuong Thai Nguyen, Xuan Luong Vu, Thi
396
+ Minh Huyen Nguyen, Van Hiep Nguyen, and
397
+ Hong Phuong Le. 2009. Building a Large
398
+ Syntactically-Annotated Corpus of Vietnamese. In
399
+ _Proc. of LAW III workshop_, pages 182–185.
400
+ Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham,
401
+ and Dang Duc Pham. 2011. Ripple Down Rules for
402
+ Part-of-Speech Tagging. In _Proc. of 12th CICLing -_
403
+ _Volume Part I_, pages 190–201.
404
+ Debbie Richards. 2009. Two decades of ripple down
405
+ rules research. _Knowledge Engineering Review_,
406
+ 24(2):159–184.
407
+ Oanh Thi Tran, Cuong Anh Le, Thuy Quang Ha, and
408
+ Quynh Hoang Le. 2009. An experimental study
409
+ on Vietnamese POS tagging. In _Proc. of the 2009 Inter-_
410
+ _national Conference on Asian Language Processing_,
411
+ pages 23–27.
412
+
413
+
414
+
416
+
417
+
418
+
419
+ \end{document}
references/2018.naacl.vu/paper.md ADDED
@@ -0,0 +1,147 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: "VnCoreNLP: A Vietnamese Natural Language Processing Toolkit"
3
+ authors:
4
+ - "Thanh Vu"
5
+ - "Dat Quoc Nguyen"
6
+ - "Dai Quoc Nguyen"
7
+ - "Mark Dras"
8
+ - "Mark Johnson"
9
+ year: 2018
10
+ venue: "NAACL 2018 Demonstrations"
11
+ url: "https://aclanthology.org/N18-5012/"
12
+ ---
13
+
14
+ We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks.
15
+ We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP.
16
+ Our VnCoreNLP is open-source and available at: https://github.com/vncorenlp/VnCoreNLP.
17
+ # Introduction
18
+ Research on Vietnamese NLP has been actively explored in the last decade, boosted by the successes of the 4-year KC01.01/2006-2010 national project on Vietnamese language and speech processing (VLSP). Over the last 5 years, standard benchmark datasets for key Vietnamese NLP tasks are publicly available: datasets for word segmentation and POS tagging were released for the first VLSP evaluation campaign in 2013; a dependency treebank was published in 2014 [Nguyen2014NLDB]; and an NER dataset was released for the second VLSP campaign in 2016. So there is a need for building an NLP pipeline, such as the Stanford CoreNLP toolkit [manning-EtAl:2014:P14-5], for those key tasks to assist users and to support researchers and tool developers of downstream tasks.
19
+ Previous work built Vietnamese NLP pipelines by wrapping existing word segmenters and POS taggers including: JVnSegmenter [Y06-1028], vnTokenizer [Le2008], JVnTagger [NguyenPN2010] and vnTagger [lehong00526139]. However,
20
+ these word segmenters and POS taggers are no longer considered
21
+ SOTA models for Vietnamese [NguyenL2016,JCSCE]. The NNVLP toolkit for Vietnamese sequence labeling tasks was later built by applying a BiLSTM-CNN-CRF model [ma-hovy:2016:P16-1], but it was not compared against SOTA traditional feature-based models. In addition, NNVLP is slow, with a processing speed of about 300 words per second, which is not practical for real-world applications such as dealing with large-scale data.
22
+ *Figure 1: The overall system architecture of VnCoreNLP (image: VnCoreNLP_Architecture.pdf).*
26
+ In this paper, we present a Java NLP toolkit for Vietnamese, namely VnCoreNLP, which aims to facilitate Vietnamese NLP research by providing rich linguistic annotations through key NLP components of word segmentation, POS tagging, NER and dependency parsing. Figure [fig:diagram] describes the overall system architecture. The
27
+ following items highlight typical characteristics of VnCoreNLP:
31
+ - **Easy-to-use** -- All VnCoreNLP components are wrapped into a single .jar file, so users do not have to install external dependencies. Users can run processing pipelines from either the command-line or the Java API.
32
+ - **Fast** -- VnCoreNLP is fast, so it can be used for dealing with large-scale data. Also it benefits users suffering from limited computation resources (e.g. users from Vietnam).
33
+ - **Accurate** -- VnCoreNLP components obtain higher results than all previous published results on the same benchmark datasets.
34
+ # Basic usages
35
+ Our design goal is to make VnCoreNLP simple to setup and run from either the command-line or the Java API. Performing linguistic annotations for a given file can be done by using a simple command as in Figure [fig:command].
36
+     $ java -Xmx2g -jar VnCoreNLP.jar -fin input.txt -fout output.txt
38
+ Suppose that the file `input.txt` in Figure [fig:command] contains the sentence "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội." (Mr Nguyen Khac Chuc is working at Vietnam National University Hanoi). Table [tab:expoutput] shows the output for this sentence in plain text form.
39
+ | Index | Word form | POS | NER | Head | DepLabel |
+ |---|---|---|---|---|---|
+ | 1 | Ông | Nc | O | 4 | sub |
+ | 2 | Nguyễn\_Khắc\_Chúc | Np | B-PER | 1 | nmod |
+ | 3 | đang | R | O | 4 | adv |
+ | 4 | làm\_việc | V | O | 0 | root |
+ | 5 | tại | E | O | 4 | loc |
+ | 6 | Đại\_học | N | B-ORG | 5 | pob |
+ | 7 | Quốc\_gia | N | I-ORG | 6 | nmod |
+ | 8 | Hà\_Nội | Np | I-ORG | 6 | nmod |
+ | 9 | . | CH | O | 4 | punct |
52
+ *Table 1: Output for the sentence "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội." from file `input.txt` in Figure [fig:command]. The output is in a 6-column format representing word index, word form, POS tag, NER label, head index of the current word, and dependency relation type.*
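Downstream code can consume this 6-column output with a small reader-side helper. The following Python sketch is illustrative only (not part of the Java toolkit) and assumes tab-separated columns with blank lines between sentences:

```python
# Sketch of parsing VnCoreNLP's 6-column plain-text output: word index,
# word form, POS tag, NER label, head index, dependency relation.
def parse_output(text):
    sentences, current = [], []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:  # blank line separates sentences (our assumption)
            if current:
                sentences.append(current)
                current = []
            continue
        idx, form, pos, ner, head, dep = line.split("\t")
        current.append({"index": int(idx), "form": form, "pos": pos,
                        "ner": ner, "head": int(head), "dep": dep})
    if current:
        sentences.append(current)
    return sentences

sample = "1\tÔng\tNc\tO\t4\tsub\n2\tNguyễn_Khắc_Chúc\tNp\tB-PER\t1\tnmod\n"
tokens = parse_output(sample)[0]
print(tokens[1]["form"], tokens[1]["ner"])  # -> Nguyễn_Khắc_Chúc B-PER
```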
53
+ Similarly, we can also get the same output by using the API, as easily as in Listing [lst1].
54
+ Listing 1: Minimal code for an analysis pipeline.
+
+     VnCoreNLP pipeline = new VnCoreNLP();
+     Annotation annotation = new Annotation("Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội.");
+     pipeline.annotate(annotation);
+     String annotatedStr = annotation.toString();
59
+ In addition,
60
+ Listing [lst2] provides a more realistic and complete example code, presenting key components of the toolkit.
61
+ Here an annotation pipeline can be used for any text rather than just a single sentence, e.g. for a paragraph or entire news story.
62
+ # Components
63
+ This section briefly describes each component of VnCoreNLP. Note that our goal is not to develop a new approach or model for each component task. Here we focus on incorporating existing models into a single pipeline. In particular, except for a new model we develop for the language-dependent component of word segmentation, we apply traditional feature-based models which obtain SOTA results for English POS tagging, NER and dependency parsing to Vietnamese. The reason is based on a well-established belief in the literature that for a less-resourced language such as Vietnamese, we should consider using feature-based models to obtain fast and accurate performances, rather than using neural network-based models [King2015].
64
+ {1pt plus 1.0pt minus 1.0pt}
65
+ [float=tp,label=lst2,caption= {A simple and complete example code.}]
66
+ import vn.pipeline.*;
67
+ import java.io.*;
68
+ public class VnCoreNLPExample {
69
+ public static void main(String[] args) throws IOException {
70
+ // "wseg", "pos", "ner", and "parse" refer to as word segmentation, POS tagging, NER and dependency parsing, respectively.
71
+ String[] annotators = {"wseg", "pos", "ner", "parse"};
72
+ VnCoreNLP pipeline = new VnCoreNLP(annotators);
73
+ // Mr Nguyen Khac Chuc is working at Vietnam National University, Hanoi. Mrs Lan, Mr Chuc's wife, is also working at this university.
74
+ String str = "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội. Bà Lan, vợ ông Chúc, cũng làm việc tại đây.";
75
+ Annotation annotation = new Annotation(str);
76
+ pipeline.annotate(annotation);
77
+ PrintStream outputPrinter = new PrintStream("output.txt");
78
+ pipeline.printToFile(annotation, outputPrinter);
79
+ // Users can get a single sentence to analyze individually
80
+ Sentence firstSentence = annotation.getSentences().get(0);
81
+ }
82
+ }
83
86
+ - **wseg** -- Unlike English, where white space is a strong indicator of word boundaries, in written Vietnamese white space is also used to separate syllables that constitute words. So word segmentation is referred to as the key first step in Vietnamese NLP. We have proposed a transformation rule-based learning model for Vietnamese word segmentation, which obtains better segmentation accuracy and speed than all previous word segmenters. See details in [NguyenNVDJ2018].
87
+ - **pos** -- To label words with their POS tag, we apply MarMoT which is a generic
88
+ CRF framework and a SOTA POS and morphological
89
+ tagger [mueller-schmid-schutze:2013:EMNLP].
90
+ - **ner** -- To recognize named entities, we apply a dynamic feature induction model that automatically optimizes feature combinations [choi:2016:N16-1].
91
+ - **parse** -- To perform dependency parsing, we apply the greedy version of a transition-based parsing model with selectional branching [choi2015ACL].
92
+ # Evaluation
93
+ We detail experimental results of the word segmentation (**wseg**) and POS tagging (**pos**) components of VnCoreNLP in [NguyenNVDJ2018] and [NguyenVNDJ-ALTA-2017], respectively. In particular, our word segmentation component gets the highest results in terms of both segmentation F1 score at 97.90% and speed at 62K words per second. Our POS tagging component also obtains the highest accuracy to date at 95.88% with a fast tagging speed at 25K words per second, and outperforms BiLSTM-CRF-based models.
94
+ Following subsections present evaluations for the NER (**ner**) and dependency parsing (**parse**) components.
95
+ ## Named entity recognition
96
+ We make a comparison between SOTA feature-based and neural network-based models, which, to the best of our knowledge, has not been done in any prior work on Vietnamese NER.
97
+ #### Dataset: The NER shared task at the 2016 VLSP workshop provides a set of 16,861 manually annotated sentences for training and development, and a set of 2,831 manually annotated sentences for test, with four NER labels: PER, LOC, ORG and MISC. Note that in both datasets, words are also supplied with gold POS tags. In addition, each word representing a full personal name is separated into the syllables that constitute it.
98
+ So this annotation scheme results in an unrealistic scenario for a pipeline evaluation because: (**i**) gold POS tags are not available in a real-world application, and (**ii**) in the standard annotation (and benchmark datasets) for Vietnamese word segmentation and POS tagging [nguyen-EtAl:2009:LAW-III], each full name is referred to as a word token (i.e., all word segmenters have been trained to output a full name as a word and all POS taggers have been trained to assign a label to the entire full-name).
99
+ For a more realistic scenario, we merge those contiguous syllables constituting a full name to form a word. Then we replace the gold POS tags with automatic tags predicted by our POS tagging component. From the set of 16,861 sentences, we sample 2,000 sentences for development and use the remaining 14,861 sentences for training.
100
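The merging step described above can be sketched as follows; this is our own illustration of the preprocessing, not the authors' code. Contiguous syllables labeled B-PER/I-PER are joined with underscores into a single word token (which keeps the B-PER label).

```java
import java.util.ArrayList;
import java.util.List;

public class PerMerger {
    // Joins each run of a B-PER syllable followed by I-PER syllables
    // into one underscore-separated word; other tokens pass through.
    public static List<String> merge(String[] syllables, String[] labels) {
        List<String> words = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < syllables.length; i++) {
            if (labels[i].equals("I-PER") && current != null) {
                current.append("_").append(syllables[i]); // extend the name
            } else {
                if (current != null) words.add(current.toString());
                current = new StringBuilder(syllables[i]); // start a new token
            }
        }
        if (current != null) words.add(current.toString());
        return words;
    }

    public static void main(String[] args) {
        // "Ông Nguyễn/B-PER Khắc/I-PER Chúc/I-PER đang" -> "Ông Nguyễn_Khắc_Chúc đang"
        System.out.println(merge(
            new String[]{"Ông", "Nguyễn", "Khắc", "Chúc", "đang"},
            new String[]{"O", "B-PER", "I-PER", "I-PER", "O"}));
    }
}
```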
+ #### Models: We make an empirical comparison between the VnCoreNLP's NER component and the following neural network-based models:
101
104
+ - {BiLSTM-CRF} [HuangXY15] is a sequence labeling model which extends the BiLSTM model with a CRF layer.
105
+ - {BiLSTM-CRF + CNN-char}, i.e. {BiLSTM-CNN-CRF}, is an extension of {BiLSTM-CRF}, using CNN to derive character-based word representations [ma-hovy:2016:P16-1].
106
+ - {BiLSTM-CRF + LSTM-char} is an extension of {BiLSTM-CRF}, using BiLSTM to derive the character-based word representations [lample-EtAl:2016:N16-1].
107
+ - **BiLSTM-CRF+POS** is another extension of BiLSTM-CRF, incorporating embeddings of automatically predicted POS tags [reimers-gurevych:2017:EMNLP2017].
108
+ We use a well-known implementation of all BiLSTM-CRF-based models from [reimers-gurevych:2017:EMNLP2017], which is optimized for performance. We then follow [NguyenVNDJ-ALTA-2017, Section 3.4] to perform hyper-parameter tuning.
109
+ | **Model** | **F1** | **Speed** |
+ |---|---|---|
+ | VnCoreNLP | **88.55** | **18K** |
+ | BiLSTM-CRF | 86.48 | 2.8K |
+ | + CNN-char | 88.28 | 1.8K |
+ | + LSTM-char | 87.71 | 1.3K |
+ | BiLSTM-CRF+POS | 86.12 | \_ |
+ | + CNN-char | 88.06 | \_ |
+ | + LSTM-char | 87.43 | \_ |
+ Table [tab:ner]: F1 scores (in %) on the test set w.r.t. gold word-segmentation. "**Speed**" denotes the processing speed in number of words per second (for VnCoreNLP, we include the time POS tagging takes in the speed).
123
+ #### Main results: Table [tab:ner] presents the F1 score and speed of each model on the test set, where VnCoreNLP obtains the highest score at 88.55% with a fast speed at 18K words per second. In particular, VnCoreNLP is about 10 times faster than the second most accurate model, BiLSTM-CRF + CNN-char.
124
+ It is initially surprising that for such an isolated language as Vietnamese, where words are not inflected, using character-based representations helps produce 1+% improvements to the BiLSTM-CRF model. We find that the improvements to BiLSTM-CRF are mostly accounted for by the PER label. The reason turns out to be simple: about 50% of named entities are labeled with tag PER, so character-based representations are in fact able to capture common family, middle or given name syllables in "unknown" full-name words. Furthermore, we also find that BiLSTM-CRF-based models do not benefit from additional predicted POS tags. This is probably because BiLSTM can take word order into account, and without word inflection, all grammatical information in Vietnamese is conveyed through its fixed word order, so explicit predicted POS tags with noisy grammatical information are not helpful.
125
+ ## Dependency parsing
126
+ #### Experimental setup: We use the Vietnamese dependency treebank VnDT [Nguyen2014NLDB], consisting of 10,200 sentences, in our experiments. Following [NguyenALTA2016], we use the last 1,020 sentences of VnDT for test while the remaining sentences are used for training. Evaluation metrics are the labeled attachment score (LAS) and unlabeled attachment score (UAS).
127
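The two metrics can be stated precisely: UAS is the percentage of tokens whose predicted head matches the gold head, and LAS additionally requires the predicted dependency relation to match. A minimal sketch of the standard definitions (our own illustration, not the evaluation script used in the paper):

```java
public class AttachmentScores {
    // UAS: percentage of tokens attached to the correct head.
    public static double uas(int[] goldHeads, int[] predHeads) {
        int correct = 0;
        for (int i = 0; i < goldHeads.length; i++)
            if (goldHeads[i] == predHeads[i]) correct++;
        return 100.0 * correct / goldHeads.length;
    }

    // LAS: percentage of tokens with both the correct head
    // and the correct dependency relation label.
    public static double las(int[] goldHeads, String[] goldLabels,
                             int[] predHeads, String[] predLabels) {
        int correct = 0;
        for (int i = 0; i < goldHeads.length; i++)
            if (goldHeads[i] == predHeads[i]
                    && goldLabels[i].equals(predLabels[i])) correct++;
        return 100.0 * correct / goldHeads.length;
    }

    public static void main(String[] args) {
        // 3 of 4 heads correct -> UAS 75.0
        System.out.println(uas(new int[]{4, 1, 4, 0}, new int[]{4, 1, 3, 0}));
    }
}
```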
+ #### Main results: Table [tab:dep] compares the dependency parsing results of VnCoreNLP with results reported in prior work, using the same experimental setup. The first six rows present the scores with gold POS tags. The next two rows show scores of VnCoreNLP with automatic POS tags which are produced by our POS tagging component. The last row presents scores of the joint POS tagging and dependency parsing model jPTDP [NguyenCoNLL2017]. Table [tab:dep] shows that compared to previously published results, VnCoreNLP produces the highest LAS score. Note that previous results for other systems are reported without using additional information of automatically predicted NER labels. In this case, the LAS score for VnCoreNLP without automatic NER features (i.e. VnCoreNLP in Table [tab:dep]) is still higher than previous ones. Notably, we also obtain a fast parsing speed at 8K words per second.
128
+ | | **Model** | **LAS** | **UAS** | **Speed** |
+ |---|---|---|---|---|
+ | Gold POS | VnCoreNLP | **73.39** | 79.02 | \_ |
+ | | VnCoreNLP–NER | 73.21 | 78.91 | \_ |
+ | | BIST-bmstparser | 73.17 | **79.39** | \_ |
+ | | BIST-barchybrid | 72.53 | 79.33 | \_ |
+ | | MSTParser | 70.29 | 76.47 | \_ |
+ | | MaltParser | 69.10 | 74.91 | \_ |
+ | Auto POS | VnCoreNLP | **70.23** | 76.93 | 8K |
+ | | VnCoreNLP–NER | 70.10 | 76.85 | **9K** |
+ | | jPTDP | 69.49 | **77.68** | 700 |
+ Table [tab:dep]: LAS and UAS scores (in %) computed on all tokens (i.e. including punctuation) on the test set w.r.t. gold word-segmentation. "**Speed**" is defined as in Table [tab:ner]. The suffix "–NER" denotes the model without using automatically predicted NER labels as features. The results of MSTParser [McDonald2005OLT], MaltParser [Nivre2007], and the BiLSTM-based parsing models BIST-bmstparser and BIST-barchybrid [TACL885] are reported in [NguyenALTA2016]. The result of the jPTDP model for Vietnamese is mentioned in [NguyenVNDJ-ALTA-2017].
145
+ # Conclusion
146
+ In this paper, we have presented the VnCoreNLP toolkit, an easy-to-use, fast and accurate processing pipeline for Vietnamese NLP. VnCoreNLP provides core NLP steps including word segmentation, POS tagging, NER and dependency parsing. The current version of VnCoreNLP has been trained without any linguistic optimization, i.e. we only employ existing pre-defined features in the traditional feature-based models for POS tagging, NER and dependency parsing. So future work will focus on incorporating Vietnamese linguistic features into these feature-based models.
147
+ VnCoreNLP is released for research and educational purposes, and available at: https://github.com/vncorenlp/VnCoreNLP.
references/2018.naacl.vu/paper.tex ADDED
@@ -0,0 +1,302 @@
1
+ \documentclass[11pt,a4paper]{article}
2
+ \pdfoutput=1
3
+ \usepackage[hyperref]{naaclhlt2018}
4
+ \usepackage{times}
5
+ \usepackage{latexsym}
6
+
7
+ \usepackage{graphicx}
8
+ \usepackage{tabularx}
9
+ \usepackage{multirow}
10
+ %\usepackage{fixltx2e}
11
+ %\usepackage{enumitem}
12
+ \usepackage{marvosym}
13
+
14
+ \usepackage{vntex}
15
+ \usepackage[english]{babel}
16
+
17
+ %\usepackage[hidelinks]{hyperref}
18
+ \usepackage{url}
19
+
20
+
21
+ \usepackage{listings}
22
+
23
+ \lstset{
24
+ basicstyle=\footnotesize\ttfamily,
25
+ language=Java,
26
+ breaklines=true,
27
+ basicstyle=\footnotesize\ttfamily,
28
+ captionpos=b,
29
+ inputencoding=utf8,
30
+ escapeinside={\%*}{*)}
31
+ }
32
+
33
+ \aclfinalcopy % Uncomment this line for the final submission
34
+ %\def\aclpaperid{***} % Enter the acl Paper ID here
35
+
36
+ \setlength\titlebox{5.25cm}
37
+
38
+ \newcommand\BibTeX{B{\sc ib}\TeX}
39
+
40
+ \title{VnCoreNLP: A Vietnamese Natural Language Processing Toolkit}
41
+
42
+ \author{Thanh Vu$^1$, Dat Quoc Nguyen$^2$, Dai Quoc Nguyen$^3$, Mark Dras$^4$ \and Mark Johnson$^4$\\
43
+ $^1$Newcastle University, United Kingdom; $^2$The University of Melbourne, Australia; \\
44
+ $^3$Deakin University, Australia; $^4$Macquarie University, Australia \\
45
+ {\tt thanh.vu@newcastle.ac.uk}, {\tt dqnguyen@unimelb.edu.au},\\
46
+ {\tt dai.nguyen@deakin.edu.au}, {\tt\{mark.dras, mark.johnson\}@mq.edu.au}}
47
+
48
+ \date{}
49
+
50
+ \begin{document}
51
+ \maketitle
52
+ \begin{abstract}
53
+ We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks.
54
+ We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP.
55
+ Our VnCoreNLP is open-source and available at: \url{https://github.com/vncorenlp/VnCoreNLP}.
56
+ \end{abstract}
57
+
58
+
59
+
60
+ \section{Introduction}
61
+
62
+ Research on Vietnamese NLP has been actively explored in the last decade, boosted by the successes of the 4-year KC01.01/2006-2010 national project on Vietnamese language and speech processing (VLSP). Over the last 5 years, standard benchmark datasets for key Vietnamese NLP tasks are publicly available: datasets for word segmentation and POS tagging were released for the first VLSP evaluation campaign in 2013; a dependency treebank was published in 2014 \cite{Nguyen2014NLDB}; and an NER dataset was released for the second VLSP campaign in 2016. So there is a need for building an NLP pipeline, such as the Stanford CoreNLP toolkit \cite{manning-EtAl:2014:P14-5}, for those key tasks to assist users and to support researchers and tool developers of downstream tasks.
63
+
64
+ \newcite{NguyenPN2010} and \newcite{Le:2013:VOS} built Vietnamese NLP pipelines by wrapping existing word segmenters and POS taggers including: JVnSegmenter \cite{Y06-1028}, vnTokenizer \cite{Le2008}, JVnTagger \cite{NguyenPN2010} and vnTagger \cite{lehong00526139}. However,
65
+ these word segmenters and POS taggers are no longer considered
66
+ SOTA models for Vietnamese \cite{NguyenL2016,JCSCE}. \newcite{PhamPNP2017b} built the NNVLP toolkit for Vietnamese sequence labeling tasks by applying a BiLSTM-CNN-CRF model \cite{ma-hovy:2016:P16-1}. However, \newcite{PhamPNP2017b} did not make a comparison to SOTA traditional feature-based models. In addition, NNVLP is slow with a processing speed at about 300 words per second, which is not practical for real-world application such as dealing with large-scale data.
67
+
68
+
69
+ \setlength{\abovecaptionskip}{5pt plus 2pt minus 1pt}
70
+ \begin{figure}[!t]
71
+ \centering
72
+ \includegraphics[width=7.5cm]{VnCoreNLP_Architecture.pdf}
73
+ \caption{In pipeline architecture of VnCoreNLP, annotations are performed on an {\tt Annotation} object.}
74
+ \label{fig:diagram}
75
+ \end{figure}
76
+
77
+ In this paper, we present a Java NLP toolkit for Vietnamese, namely VnCoreNLP, which aims to facilitate Vietnamese NLP research by providing rich linguistic annotations through key NLP components of word segmentation, POS tagging, NER and dependency parsing. Figure \ref{fig:diagram} describes the overall system architecture. The
78
+ following items highlight typical characteristics of VnCoreNLP:
79
+
80
+ \begin{itemize}
81
+ \setlength{\itemsep}{5pt}
82
+ \setlength{\parskip}{0pt}
83
+ \setlength{\parsep}{0pt}
84
+
85
+ \item \textbf{Easy-to-use} -- All VnCoreNLP components are wrapped into a single .jar file, so users do not have to install external dependencies. Users can run processing pipelines from either the command-line or the Java API.
86
+
87
+ \item \textbf{Fast} -- VnCoreNLP is fast, so it can be used for dealing with large-scale data. Also it benefits users suffering from limited computation resources (e.g. users from Vietnam).
88
+
89
+
90
+ \item \textbf{Accurate} -- VnCoreNLP components obtain higher results than all previous published results on the same benchmark datasets.
91
+
92
+ \end{itemize}
93
+
94
+
95
+
96
+ \section{Basic usages}
97
+
98
+ Our design goal is to make VnCoreNLP simple to setup and run from either the command-line or the Java API. Performing linguistic annotations for a given file can be done by using a simple command as in Figure \ref{fig:command}.
99
+
100
+ \begin{figure}[ht]
101
+ {\footnotesize\ttfamily \$ java -Xmx2g -jar VnCoreNLP.jar -fin input.txt -fout output.txt}
102
+ \caption{Minimal command to run VnCoreNLP.}
103
+ \label{fig:command}
104
+ \end{figure}
105
+
106
+ Suppose that the file {\ttfamily input.txt} in Figure \ref{fig:command} contains a sentence ``Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội.'' (Mr\textsubscript{Ông} Nguyen Khac Chuc is\textsubscript{đang} working\textsubscript{làm\_việc} at\textsubscript{tại} Vietnam National\textsubscript{quốc\_gia} University\textsubscript{đại\_học} Hanoi\textsubscript{Hà\_Nội}). Table \ref{tab:expoutput} shows the output for this sentence in plain text form.
107
+
108
+ \begin{table}[ht]
109
+ \centering
110
+ \resizebox{8cm}{!}{
111
+ \begin{tabular}{l l l l l l}
112
+ 1 & Ông & Nc & O & 4 & sub \\
113
+ 2 & Nguyễn\_Khắc\_Chúc & Np & B-PER & 1 & nmod\\
114
+ 3 & đang & R & O & 4 & adv\\
115
+ 4 & làm\_việc & V & O & 0 & root\\
116
+ 5 & tại & E & O & 4 & loc\\
117
+ 6 & Đại\_học & N & B-ORG & 5 & pob\\
118
+ 7 & Quốc\_gia & N & I-ORG & 6 & nmod\\
119
+ 8 & Hà\_Nội & Np & I-ORG & 6 & nmod\\
120
+ 9 & . & CH & O & 4 & punct\\
121
+ \end{tabular}
122
+ }
123
+ \caption{The output in file {\ttfamily output.txt} for the sentence `Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội.'' from file {\ttfamily input.txt} in Figure \ref{fig:command}. The output is in a 6-column format representing word index, word form, POS tag, NER label, head index of the current word, and dependency relation type.}
124
+ \label{tab:expoutput}
125
+ \end{table}
126
+
127
+ Similarly, we can also get the same output by using the API as easy as in Listing \ref{lst1}.
128
+
129
+
130
+ \begin{lstlisting}[label=lst1,caption= {Minimal code for an analysis pipeline.}]
131
+ VnCoreNLP pipeline = new VnCoreNLP() ;
132
+ Annotation annotation = new Annotation("%*Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội.*)");
133
+ pipeline.annotate(annotation);
134
+ String annotatedStr = annotation.toString();
135
+ \end{lstlisting}
136
+
137
+ In addition,
138
+ Listing \ref{lst2} provides a more realistic and complete example code, presenting key components of the toolkit.
139
+ Here an annotation pipeline can be used for any text rather than just a single sentence, e.g. for a paragraph or entire news story.
140
+
141
+ \section{Components}
142
+
143
+ This section briefly describes each component of VnCoreNLP. Note that our goal is not to develop new approach or model for each component task. Here we focus on incorporating existing models into a single pipeline. In particular, except a new model we develop for the language-dependent component of word segmentation, we apply traditional feature-based models which obtain SOTA results for English POS tagging, NER and dependency parsing to Vietnamese. The reason is based on a well-established belief in the literature that for a less-resourced language such as Vietnamese, we should consider using feature-based models to obtain fast and accurate performances, rather than using neural network-based models \cite{King2015}.
144
+
145
+ \setlength{\textfloatsep}{1pt plus 1.0pt minus 1.0pt}
146
+ \begin{lstlisting}[float=tp,label=lst2,caption= {A simple and complete example code.}]
147
+ import vn.pipeline.*;
148
+ import java.io.*;
149
+ public class VnCoreNLPExample {
150
+ public static void main(String[] args) throws IOException {
151
+ // "wseg", "pos", "ner", and "parse" refer to as word segmentation, POS tagging, NER and dependency parsing, respectively.
152
+ String[] annotators = {"wseg", "pos", "ner", "parse"};
153
+ VnCoreNLP pipeline = new VnCoreNLP(annotators);
154
+ // Mr Nguyen Khac Chuc is working at Vietnam National University, Hanoi. Mrs Lan, Mr Chuc's wife, is also working at this university.
155
+ String str = %*"Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội. Bà Lan, vợ ông Chúc, cũng làm việc tại đây."*);
156
+ Annotation annotation = new Annotation(str);
157
+ pipeline.annotate(annotation);
158
+ PrintStream outputPrinter = new PrintStream("output.txt");
159
+ pipeline.printToFile(annotation, outputPrinter);
160
+ // Users can get a single sentence to analyze individually
161
+ Sentence firstSentence = annotation.getSentences().get(0);
162
+ }
163
+ }
164
+ \end{lstlisting}
165
+ %
166
+ \begin{itemize}
167
+ \setlength{\itemsep}{5pt}
168
+ \setlength{\parskip}{0pt}
169
+ \setlength{\parsep}{0pt}
170
+
171
+ \item \textbf{wseg} -- Unlike English where white space is a strong indicator of word boundaries, when written in Vietnamese white space is also used to separate syllables that constitute words. So word segmentation is referred to as the key first step in Vietnamese NLP{.}\ {W}e have proposed a transformation rule-based learning model for Vietnamese word segmentation, which obtains better segmentation accuracy and speed than all previous word segmenters. See details in \newcite{NguyenNVDJ2018}.
172
+
173
+
174
+ \item \textbf{pos} -- To label words with their POS tag, we apply MarMoT which is a generic
175
+ CRF framework and a SOTA POS and morphological
176
+ tagger \cite{mueller-schmid-schutze:2013:EMNLP}.\footnote{\url{http://cistern.cis.lmu.de/marmot/}}
177
+
178
+
179
+ \item \textbf{ner} -- To recognize named entities, we apply a dynamic feature induction model that automatically optimizes feature combinations \cite{choi:2016:N16-1}.\footnote{\url{https://emorynlp.github.io/nlp4j/components/named-entity-recognition.html}}
180
+
181
+
182
+ \item \textbf{parse} -- To perform dependency parsing, we apply the greedy version of a transition-based parsing model with selectional branching \cite{choi2015ACL}.\footnote{\url{https://emorynlp.github.io/nlp4j/components/dependency-parsing.html}}
183
+ \end{itemize}
184
+
185
+
186
+ \section{Evaluation}
187
+
188
+ We detail experimental results of the word segmentation (\textbf{wseg}) and POS tagging (\textbf{pos}) components of VnCoreNLP in \newcite{NguyenNVDJ2018} and \newcite{NguyenVNDJ-ALTA-2017}, respectively. In particular, our word segmentation component gets the highest results in terms of both segmentation F1 score at 97.90\% and speed at 62K words per second.\footnote{All speeds reported in this paper are computed on a personal computer of Intel Core i7 2.2 GHz.} Our POS tagging component also obtains the highest accuracy to date at 95.88\% with a fast tagging speed at 25K words per second, and outperforms BiLSTM-CRF-based models.
189
+ Following subsections present evaluations for the NER (\textbf{ner}) and dependency parsing (\textbf{parse}) components.
190
+
191
+ \subsection{Named entity recognition}\label{ssec:ner}
192
+
193
+ We make a comparison between SOTA feature-based and neural network-based models, which, to the best of our knowledge, has not been done in any prior work on Vietnamese NER.
194
+
195
+
196
+
197
+ \paragraph{Dataset:} The NER shared task at the 2016 VLSP workshop provides a set of 16,861 manually annotated sentences for training and development, and a set of 2,831 manually annotated sentences for test, with four NER labels PER, LOC, ORG and MISC. Note that in both datasets, words are also supplied with gold POS tags. In addition, each word representing a full personal name are separated into syllables that constitute the word.
198
+ So this annotation scheme results in an unrealistic scenario for a pipeline evaluation because: (\textbf{i}) gold POS tags are not available in a real-world application, and (\textbf{ii}) in the standard annotation (and benchmark datasets) for Vietnamese word segmentation and POS tagging \cite{nguyen-EtAl:2009:LAW-III}, each full name is referred to as a word token (i.e., all word segmenters have been trained to output a full name as a word and all POS taggers have been trained to assign a label to the entire full-name).
199
+
200
+ For a more realistic scenario, we merge those contiguous syllables constituting a full name to form a word.\footnote{Based on the gold label PER, contiguous syllables such as ``Nguyễn/B-PER'', ``Khắc/I-PER'' and ``Chúc/I-PER'' are merged to form a word as ``Nguyễn\_Khắc\_Chúc/B-PER.''} Then we replace the gold POS tags by automatic tags predicted by our POS tagging component. From the set of 16,861 sentences, we sample 2,000 sentences for development and using the remaining 14,861 sentences for training.
201
+
202
+
203
+
204
+ \paragraph{Models:} We make an empirical comparison between the VnCoreNLP's NER component and the following neural network-based models:
205
+ \begin{itemize}
206
+ \setlength{\itemsep}{5pt}
207
+ \setlength{\parskip}{0pt}
208
+ \setlength{\parsep}{0pt}
209
+
210
+ \item {BiLSTM-CRF} \cite{HuangXY15} is a sequence labeling model which extends the BiLSTM model with a CRF layer.
211
+
212
+ \item {BiLSTM-CRF + CNN-char}, i.e. {BiLSTM-CNN-CRF}, is an extension of {BiLSTM-CRF}, using CNN to derive character-based word representations \cite{ma-hovy:2016:P16-1}.%\footnote{\url{https://github.com/XuezheMax/LasagneNLP}}
213
+
214
+ \item {BiLSTM-CRF + LSTM-char} is an extension of {BiLSTM-CRF}, using BiLSTM to derive the character-based word representations \cite{lample-EtAl:2016:N16-1}.
215
+
216
+ \item BiLSTM-CRF\textsubscript{+POS} is another extension to BiLSTM-CRF, incorporating embeddings of automatically predicted POS tags \cite{reimers-gurevych:2017:EMNLP2017}.
217
+
218
+ \end{itemize}
219
+
220
+ We use a well-known implementation which is optimized for performance of all BiLSTM-CRF-based models from \newcite{reimers-gurevych:2017:EMNLP2017}.\footnote{\url{https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf}} We then follow \newcite[Section 3.4]{NguyenVNDJ-ALTA-2017} to perform hyper-parameter tuning.\footnote{We employ pre-trained Vietnamese word vectors from \url{https://github.com/sonvx/word2vecVN}.}
221
+
222
+ %\setlength{\abovecaptionskip}{3pt plus 2pt minus 1pt}
223
+ \setlength{\textfloatsep}{20.0pt plus 2.0pt minus 4.0pt}
224
+ \begin{table}[!t]
225
+ \centering
226
+ \begin{tabular}{l|c|l}
227
+ \hline
228
+ \textbf{Model} & \textbf{F1} & \textbf{Speed} \\
229
+ \hline
230
+ VnCoreNLP & \textbf{88.55} & \textbf{18K} \\
231
+ BiLSTM-CRF & 86.48 & 2.8K \\
232
+ \ \ \ \ \ + CNN-char & {88.28} & 1.8K \\
233
+ \ \ \ \ \ + LSTM-char & 87.71 & 1.3K \\
234
+ BiLSTM-CRF\textsubscript{+POS} & 86.12 & \_ \\
235
+ \ \ \ \ \ + CNN-char & 88.06 & \_ \\
236
+ \ \ \ \ \ + LSTM-char & 87.43 & \_ \\
237
+ \hline
238
+ \end{tabular}
239
+ \caption{F1 scores (in \%) on the test set w.r.t. gold word-segmentation. ``\textbf{Speed}'' denotes the processing speed of the number of words per second (for VnCoreNLP, we include the time POS tagging takes in the speed).}
240
+ \label{tab:ner}
241
+ \end{table}
242
+
243
+
244
+
245
+ \paragraph{Main results:} Table \ref{tab:ner} presents F1 score and speed of each model on the test set, where VnCoreNLP obtains the highest score at 88.55\% with a fast speed at 18K words per second. In particular, VnCoreNLP obtains 10 times faster speed than the second most accurate model BiLSTM-CRF + CNN-char.
246
+
247
+ It is initially surprising that for such an isolated language as Vietnamese where all words are not inflected, using character-based representations helps producing 1+\% improvements to the BiLSTM-CRF model. We find that the improvements to BiLSTM-CRF are mostly accounted for by the PER label. The reason turns out to be simple: about 50\% of named entities are labeled with tag PER, so character-based representations are in fact able to capture common family, middle or given name syllables in `unknown' full-name words. Furthermore, we also find that BiLSTM-CRF-based models do not benefit from additional predicted POS tags. It is probably because BiLSTM can take word order into account, while without word inflection, all grammatical information in Vietnamese is conveyed through its fixed word order, thus explicit predicted POS tags with noisy grammatical information are not helpful.
248
+
249
+
250
+ \subsection{Dependency parsing}\label{ssec:dep}
251
+
252
+ \paragraph{Experimental setup:} We use the Vietnamese dependency treebank VnDT \cite{Nguyen2014NLDB} consisting of 10,200 sentences in our experiments. Following \newcite{NguyenALTA2016}, we use the last 1020 sentences of VnDT for test while the remaining sentences are used for training. Evaluation metrics are the labeled attachment score (LAS) and unlabeled attachment score (UAS).
253
+
254
+
255
+
256
+ \paragraph{Main results:} Table \ref{tab:dep} compares the dependency parsing results of VnCoreNLP with results reported in prior work, using the same experimental setup. The first six rows present the scores with gold POS tags. The next two rows show scores of VnCoreNLP with automatic POS tags which are produced by our POS tagging component. The last row presents scores of the joint POS tagging and dependency parsing model jPTDP \protect\cite{NguyenCoNLL2017}. Table \ref{tab:dep} shows that compared to previously published results, VnCoreNLP produces the highest LAS score. Note that previous results for other systems are reported without using additional information of automatically predicted NER labels. In this case, the LAS score for VnCoreNLP without automatic NER features (i.e. VnCoreNLP\textsubscript{--NER} in Table \ref{tab:dep}) is still higher than previous ones. Notably, we also obtain a fast parsing speed at 8K words per second.
257
+
258
+
259
+ \begin{table}[!t]
260
+ \centering
261
+ \setlength{\tabcolsep}{0.5em}
262
+ \def\arraystretch{1.1}
263
+ \begin{tabular}{l|l|c|c|l }
264
+ \hline
265
+ \multicolumn{2}{c|}{\textbf{Model}} & \textbf{LAS} & \textbf{UAS} & \textbf{Speed} \\
266
+ \hline
267
+ \multirow{6}{*}{\rotatebox[origin=c]{90}{Gold POS}}
268
+ & VnCoreNLP & \textbf{73.39} & 79.02 & \_ \\
269
+ & VnCoreNLP\textsubscript{--NER} & 73.21 & 78.91 & \_ \\
270
+ & BIST-bmstparser & 73.17 & \textbf{79.39} & \_ \\
271
+ & BIST-barchybrid & 72.53 & 79.33 & \_ \\
272
+ & MSTParser & 70.29 & 76.47 & \_\\
273
+ & MaltParser & 69.10 & 74.91 & \_\\
274
+ \hline
275
+ \multirow{3}{*}{\rotatebox[origin=c]{90}{Auto POS}}
276
+ & VnCoreNLP & \textbf{70.23} & 76.93 & 8K \\
277
+ & VnCoreNLP\textsubscript{--NER} & 70.10 & 76.85 & \textbf{9K} \\
278
+ & jPTDP & 69.49 & \textbf{77.68} & 700 \\
279
+ \hline
280
+ \end{tabular}
281
+ \caption{LAS and UAS scores (in \%) computed on all tokens (i.e. including punctuation) on the test set w.r.t. gold word-segmentation. ``\textbf{Speed}'' is defined as in Table \ref{tab:ner}. The subscript ``--NER'' denotes the model without using automatically predicted NER labels as features. The results of the MSTParser \protect\cite{McDonald2005OLT}, MaltParser \protect\cite{Nivre2007}, and BiLSTM-based parsing models BIST-bmstparser and BIST-barchybrid \protect\cite{TACL885} are reported in \protect\newcite{NguyenALTA2016}. The result of the jPTDP model for Vietnamese is mentioned in \protect\newcite{NguyenVNDJ-ALTA-2017}.}% and detailed at \url{https://drive.google.com/drive/folders/0B5eBgc8jrKtpUmhhSmtFLWdrTzQ}.}
282
+ \label{tab:dep}
283
+ \end{table}
284
+
285
+
286
+
287
+
288
+
289
+ \section{Conclusion}
290
+
291
+
292
+ In this paper, we have presented the VnCoreNLP toolkit---an easy-to-use, fast and accurate processing pipeline for Vietnamese NLP. VnCoreNLP provides core NLP steps including word segmentation, POS tagging, NER and dependency parsing. Current version of VnCoreNLP has been trained without any linguistic optimization, i.e. we only employ existing pre-defined features in the traditional feature-based models for POS tagging, NER and dependency parsing. So future work will focus on incorporating Vietnamese linguistic features into these feature-based models.
293
+
294
+ VnCoreNLP is released for research and educational purposes, and available at: \url{https://github.com/vncorenlp/VnCoreNLP}.
295
+
296
+ % include your own bib file like this:
297
+ %\bibliographystyle{acl}
298
+ %\bibliography{acl2018}
299
+ \bibliography{Refs}
300
+ \bibliographystyle{naacl_natbib}
301
+
302
+ \end{document}
references/2018.naacl.vu/source/VnCoreNLP.bbl ADDED
@@ -0,0 +1,169 @@
1
+ \begin{thebibliography}{}
2
+ \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
3
+
4
+ \bibitem[{Choi(2016)}]{choi:2016:N16-1}
5
+ Jinho~D. Choi. 2016.
6
+ \newblock {Dynamic Feature Induction: The Last Gist to the State-of-the-Art}.
7
+ \newblock In {\em Proceedings of NAACL-HLT\/}. pages 271--281.
8
+
9
+ \bibitem[{Choi et~al.(2015)Choi, Tetreault, and Stent}]{choi2015ACL}
10
+ Jinho~D. Choi, Joel Tetreault, and Amanda Stent. 2015.
11
+ \newblock {It Depends: Dependency Parser Comparison Using A Web-based
12
+ Evaluation Tool}.
13
+ \newblock In {\em Proceedings of ACL-IJCNLP\/}. pages 387--396.
14
+
15
+ \bibitem[{Huang et~al.(2015)Huang, Xu, and Yu}]{HuangXY15}
16
+ Zhiheng Huang, Wei Xu, and Kai Yu. 2015.
17
+ \newblock Bidirectional {LSTM-CRF} models for sequence tagging.
18
+ \newblock {\em arXiv preprint\/} arXiv:1508.01991.
19
+
20
+ \bibitem[{King(2015)}]{King2015}
21
+ Benjamin~Philip King. 2015.
22
+ \newblock {\em Practical Natural Language Processing for Low-Resource
23
+ Languages\/}.
24
+ \newblock Ph.D. thesis, The University of Michigan.
25
+
26
+ \bibitem[{Kiperwasser and Goldberg(2016)}]{TACL885}
27
+ Eliyahu Kiperwasser and Yoav Goldberg. 2016.
28
+ \newblock {Simple and Accurate Dependency Parsing Using Bidirectional LSTM
29
+ Feature Representations}.
30
+ \newblock {\em Transactions of the Association for Computational Linguistics\/}
31
+ 4:313--327.
32
+
33
+ \bibitem[{Lample et~al.(2016)Lample, Ballesteros, Subramanian, Kawakami, and
34
+ Dyer}]{lample-EtAl:2016:N16-1}
35
+ Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and
36
+ Chris Dyer. 2016.
37
+ \newblock {Neural Architectures for Named Entity Recognition}.
38
+ \newblock In {\em Proceedings of NAACL-HLT\/}. pages 260--270.
39
+
40
+ \bibitem[{Le et~al.(2008)Le, Nguyen, Roussanaly, and Ho}]{Le2008}
41
+ Hong~Phuong Le, Thi Minh~Huyen Nguyen, Azim Roussanaly, and Tuong~Vinh Ho.
42
+ 2008.
43
+ \newblock {A hybrid approach to word segmentation of Vietnamese texts}.
44
+ \newblock In {\em Proceedings of LATA\/}. pages 240--249.
45
+
46
+ \bibitem[{Le et~al.(2013)Le, Do, Nguyen, and Nguyen}]{Le:2013:VOS}
47
+ Ngoc~Minh Le, Bich~Ngoc Do, Vi~Duong Nguyen, and Thi~Dam Nguyen. 2013.
48
+ \newblock {VNLP: An Open Source Framework for Vietnamese Natural Language
49
+ Processing}.
50
+ \newblock In {\em Proceedings of SoICT\/}. pages 88--93.
51
+
52
+ \bibitem[{Le-Hong et~al.(2010)Le-Hong, Roussanaly, Nguyen, and
53
+ Rossignol}]{lehong00526139}
54
+ Phuong Le-Hong, Azim Roussanaly, Thi Minh~Huyen Nguyen, and Mathias Rossignol.
55
+ 2010.
56
+ \newblock {An empirical study of maximum entropy approach for part-of-speech
57
+ tagging of Vietnamese texts}.
58
+ \newblock In {\em {Proceedings of TALN}\/}.
59
+
60
+ \bibitem[{Ma and Hovy(2016)}]{ma-hovy:2016:P16-1}
61
+ Xuezhe Ma and Eduard Hovy. 2016.
62
+ \newblock {End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF}.
63
+ \newblock In {\em Proceedings of ACL (Volume 1: Long Papers)\/}. pages
64
+ 1064--1074.
65
+
66
+ \bibitem[{Manning et~al.(2014)Manning, Surdeanu, Bauer, Finkel, Bethard, and
67
+ McClosky}]{manning-EtAl:2014:P14-5}
68
+ Christopher~D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven~J.
69
+ Bethard, and David McClosky. 2014.
70
+ \newblock The {Stanford} {CoreNLP} natural language processing toolkit.
71
+ \newblock In {\em Proceedings of ACL 2014 System Demonstrations\/}. pages
72
+ 55--60.
73
+
74
+ \bibitem[{McDonald et~al.(2005)McDonald, Crammer, and
75
+ Pereira}]{McDonald2005OLT}
76
+ Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005.
77
+ \newblock {Online Large-margin Training of Dependency Parsers}.
78
+ \newblock In {\em Proceedings of ACL\/}. pages 91--98.
79
+
80
+ \bibitem[{Mueller et~al.(2013)Mueller, Schmid, and
81
+ Sch\"{u}tze}]{mueller-schmid-schutze:2013:EMNLP}
82
+ Thomas Mueller, Helmut Schmid, and Hinrich Sch\"{u}tze. 2013.
83
+ \newblock {Efficient Higher-Order CRFs for Morphological Tagging}.
84
+ \newblock In {\em Proceedings of EMNLP\/}. pages 322--332.
85
+
86
+ \bibitem[{Nguyen et~al.(2006)Nguyen, Nguyen et~al.}]{Y06-1028}
87
+ Cam-Tu Nguyen, Trung-Kien Nguyen, et~al. 2006.
88
+ \newblock {Vietnamese Word Segmentation with CRFs and SVMs: An Investigation}.
89
+ \newblock In {\em Proceedings of PACLIC\/}. pages 215--222.
90
+
91
+ \bibitem[{Nguyen et~al.(2010)Nguyen, Phan, and Nguyen}]{NguyenPN2010}
92
+ Cam-Tu Nguyen, Xuan-Hieu Phan, and Thu-Trang Nguyen. 2010.
93
+ \newblock {JVnTextPro: A Java-based Vietnamese Text Processing Tool}.
94
+ \newblock \url{http://jvntextpro.sourceforge.net/}.
95
+
96
+ \bibitem[{Nguyen et~al.(2016{\natexlab{a}})Nguyen, Dras, and
97
+ Johnson}]{NguyenALTA2016}
98
+ Dat~Quoc Nguyen, Mark Dras, and Mark Johnson. 2016{\natexlab{a}}.
99
+ \newblock {An empirical study for Vietnamese dependency parsing}.
100
+ \newblock In {\em Proceedings of ALTA\/}. pages 143--149.
101
+
102
+ \bibitem[{Nguyen et~al.(2017{\natexlab{a}})Nguyen, Dras, and
103
+ Johnson}]{NguyenCoNLL2017}
104
+ Dat~Quoc Nguyen, Mark Dras, and Mark Johnson. 2017{\natexlab{a}}.
105
+ \newblock {A Novel Neural Network Model for Joint POS Tagging and Graph-based
106
+ Dependency Parsing}.
107
+ \newblock In {\em Proceedings of the CoNLL 2017 Shared Task\/}. pages 134--142.
108
+
109
+ \bibitem[{Nguyen et~al.(2014)Nguyen, Nguyen, Pham, Nguyen, and
110
+ Nguyen}]{Nguyen2014NLDB}
111
+ Dat~Quoc Nguyen, Dai~Quoc Nguyen, Son~Bao Pham, Phuong-Thai Nguyen, and Minh~Le
112
+ Nguyen. 2014.
113
+ \newblock {From Treebank Conversion to Automatic Dependency Parsing for
114
+ Vietnamese}.
115
+ \newblock In {\em {Proceedings of NLDB}\/}. pages 196--207.
116
+
117
+ \bibitem[{Nguyen et~al.(2018)Nguyen, Nguyen, Vu, Dras, and
118
+ Johnson}]{NguyenNVDJ2018}
119
+ Dat~Quoc Nguyen, Dai~Quoc Nguyen, Thanh Vu, Mark Dras, and Mark Johnson. 2018.
120
+ \newblock {A Fast and Accurate Vietnamese Word Segmenter}.
121
+ \newblock In {\em Proceedings of LREC\/}. page to appear.
122
+
123
+ \bibitem[{Nguyen et~al.(2017{\natexlab{b}})Nguyen, Vu, Nguyen, Dras, and
124
+ Johnson}]{NguyenVNDJ-ALTA-2017}
125
+ Dat~Quoc Nguyen, Thanh Vu, Dai~Quoc Nguyen, Mark Dras, and Mark Johnson.
126
+ 2017{\natexlab{b}}.
127
+ \newblock {Fro{m}\ {W}ord Segmentation to POS Tagging for Vietnamese}.
128
+ \newblock In {\em Proceedings of ALTA\/}. pages 108--113.
129
+
130
+ \bibitem[{Nguyen et~al.(2009)Nguyen, Vu et~al.}]{nguyen-EtAl:2009:LAW-III}
131
+ Phuong~Thai Nguyen, Xuan~Luong Vu, et~al. 2009.
132
+ \newblock {Building a Large Syntactically-Annotated Corpus of Vietnamese}.
133
+ \newblock In {\em Proceedings of LAW\/}. pages 182--185.
134
+
135
+ \bibitem[{Nguyen and Le(2016)}]{NguyenL2016}
136
+ Tuan-Phong Nguyen and Anh-Cuong Le. 2016.
137
+ \newblock {A Hybrid Approach to Vietnamese Word Segmentation}.
138
+ \newblock In {\em Proceedings of RIVF\/}. pages 114--119.
139
+
140
+ \bibitem[{Nguyen et~al.(2016{\natexlab{b}})Nguyen, Truong, Nguyen, and
141
+ Le}]{JCSCE}
142
+ Tuan~Phong Nguyen, Quoc~Tuan Truong, Xuan~Nam Nguyen, and Anh~Cuong Le.
143
+ 2016{\natexlab{b}}.
144
+ \newblock {An Experimental Investigation of Part-Of-Speech Taggers for
145
+ Vietnamese}.
146
+ \newblock {\em VNU Journal of Science: Computer Science and Communication
147
+ Engineering\/} 32(3):11--25.
148
+
149
+ \bibitem[{Nivre et~al.(2007)Nivre, Hall et~al.}]{Nivre2007}
150
+ Joakim Nivre, Johan Hall, et~al. 2007.
151
+ \newblock {MaltParser: A language-independent system for data-driven dependency
152
+ parsing}.
153
+ \newblock {\em Natural Language Engineering\/} 13(2):95--135.
154
+
155
+ \bibitem[{Pham et~al.(2017)Pham, Pham, Nguyen, and Le{-}Hong}]{PhamPNP2017b}
156
+ Thai{-}Hoang Pham, Xuan{-}Khoai Pham, Tuan{-}Anh Nguyen, and Phuong Le{-}Hong.
157
+ 2017.
158
+ \newblock {NNVLP: {A} Neural Network-Based Vietnamese Language Processing
159
+ Toolkit}.
160
+ \newblock In {\em Proceedings of the IJCNLP 2017 System Demonstrations\/}.
161
+ pages 37--40.
162
+
163
+ \bibitem[{Reimers and Gurevych(2017)}]{reimers-gurevych:2017:EMNLP2017}
164
+ Nils Reimers and Iryna Gurevych. 2017.
165
+ \newblock {Reporting Score Distributions Makes a Difference: Performance Study
166
+ of LSTM-networks for Sequence Tagging}.
167
+ \newblock In {\em Proceedings of EMNLP\/}. pages 338--348.
168
+
169
+ \end{thebibliography}
references/2018.naacl.vu/source/VnCoreNLP.tex ADDED
@@ -0,0 +1,302 @@
1
+ \documentclass[11pt,a4paper]{article}
2
+ \pdfoutput=1
3
+ \usepackage[hyperref]{naaclhlt2018}
4
+ \usepackage{times}
5
+ \usepackage{latexsym}
6
+
7
+ \usepackage{graphicx}
8
+ \usepackage{tabularx}
9
+ \usepackage{multirow}
10
+ %\usepackage{fixltx2e}
11
+ %\usepackage{enumitem}
12
+ \usepackage{marvosym}
13
+
14
+ \usepackage{vntex}
15
+ \usepackage[english]{babel}
16
+
17
+ %\usepackage[hidelinks]{hyperref}
18
+ \usepackage{url}
19
+
20
+
21
+ \usepackage{listings}
22
+
23
+ \lstset{
24
+ basicstyle=\footnotesize\ttfamily,
25
+ language=Java,
26
+ breaklines=true,
27
+ basicstyle=\footnotesize\ttfamily,
28
+ captionpos=b,
29
+ inputencoding=utf8,
30
+ escapeinside={\%*}{*)}
31
+ }
32
+
33
+ \aclfinalcopy % Uncomment this line for the final submission
34
+ %\def\aclpaperid{***} % Enter the acl Paper ID here
35
+
36
+ \setlength\titlebox{5.25cm}
37
+
38
+ \newcommand\BibTeX{B{\sc ib}\TeX}
39
+
40
+ \title{VnCoreNLP: A Vietnamese Natural Language Processing Toolkit}
41
+
42
+ \author{Thanh Vu$^1$, Dat Quoc Nguyen$^2$, Dai Quoc Nguyen$^3$, Mark Dras$^4$ \and Mark Johnson$^4$\\
43
+ $^1$Newcastle University, United Kingdom; $^2$The University of Melbourne, Australia; \\
44
+ $^3$Deakin University, Australia; $^4$Macquarie University, Australia \\
45
+ {\tt thanh.vu@newcastle.ac.uk}, {\tt dqnguyen@unimelb.edu.au},\\
46
+ {\tt dai.nguyen@deakin.edu.au}, {\tt\{mark.dras, mark.johnson\}@mq.edu.au}}
47
+
48
+ \date{}
49
+
50
+ \begin{document}
51
+ \maketitle
52
+ \begin{abstract}
53
+ We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks.
54
+ We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP.
55
+ Our VnCoreNLP is open-source and available at: \url{https://github.com/vncorenlp/VnCoreNLP}.
56
+ \end{abstract}
57
+
58
+
59
+
60
+ \section{Introduction}
61
+
62
+ Research on Vietnamese NLP has been actively explored in the last decade, boosted by the successes of the 4-year KC01.01/2006-2010 national project on Vietnamese language and speech processing (VLSP). Over the last 5 years, standard benchmark datasets for key Vietnamese NLP tasks have become publicly available: datasets for word segmentation and POS tagging were released for the first VLSP evaluation campaign in 2013; a dependency treebank was published in 2014 \cite{Nguyen2014NLDB}; and an NER dataset was released for the second VLSP campaign in 2016. So there is a need for building an NLP pipeline, such as the Stanford CoreNLP toolkit \cite{manning-EtAl:2014:P14-5}, for those key tasks to assist users and to support researchers and tool developers of downstream tasks.
63
+
64
+ \newcite{NguyenPN2010} and \newcite{Le:2013:VOS} built Vietnamese NLP pipelines by wrapping existing word segmenters and POS taggers including: JVnSegmenter \cite{Y06-1028}, vnTokenizer \cite{Le2008}, JVnTagger \cite{NguyenPN2010} and vnTagger \cite{lehong00526139}. However,
65
+ these word segmenters and POS taggers are no longer considered
66
+ SOTA models for Vietnamese \cite{NguyenL2016,JCSCE}. \newcite{PhamPNP2017b} built the NNVLP toolkit for Vietnamese sequence labeling tasks by applying a BiLSTM-CNN-CRF model \cite{ma-hovy:2016:P16-1}. However, \newcite{PhamPNP2017b} did not make a comparison to SOTA traditional feature-based models. In addition, NNVLP is slow with a processing speed at about 300 words per second, which is not practical for real-world application such as dealing with large-scale data.
67
+
68
+
69
+ \setlength{\abovecaptionskip}{5pt plus 2pt minus 1pt}
70
+ \begin{figure}[!t]
71
+ \centering
72
+ \includegraphics[width=7.5cm]{VnCoreNLP_Architecture.pdf}
73
+ \caption{In the pipeline architecture of VnCoreNLP, annotations are performed on an {\tt Annotation} object.}
74
+ \label{fig:diagram}
75
+ \end{figure}
76
+
77
+ In this paper, we present a Java NLP toolkit for Vietnamese, namely VnCoreNLP, which aims to facilitate Vietnamese NLP research by providing rich linguistic annotations through key NLP components of word segmentation, POS tagging, NER and dependency parsing. Figure \ref{fig:diagram} describes the overall system architecture. The
78
+ following items highlight typical characteristics of VnCoreNLP:
79
+
80
+ \begin{itemize}
81
+ \setlength{\itemsep}{5pt}
82
+ \setlength{\parskip}{0pt}
83
+ \setlength{\parsep}{0pt}
84
+
85
+ \item \textbf{Easy-to-use} -- All VnCoreNLP components are wrapped into a single .jar file, so users do not have to install external dependencies. Users can run processing pipelines from either the command-line or the Java API.
86
+
87
+ \item \textbf{Fast} -- VnCoreNLP is fast, so it can be used for dealing with large-scale data. It also benefits users with limited computational resources (e.g. users from Vietnam).
88
+
89
+
90
+ \item \textbf{Accurate} -- VnCoreNLP components outperform all previously published results on the same benchmark datasets.
91
+
92
+ \end{itemize}
93
+
94
+
95
+
96
+ \section{Basic usages}
97
+
98
+ Our design goal is to make VnCoreNLP simple to setup and run from either the command-line or the Java API. Performing linguistic annotations for a given file can be done by using a simple command as in Figure \ref{fig:command}.
99
+
100
+ \begin{figure}[ht]
101
+ {\footnotesize\ttfamily \$ java -Xmx2g -jar VnCoreNLP.jar -fin input.txt -fout output.txt}
102
+ \caption{Minimal command to run VnCoreNLP.}
103
+ \label{fig:command}
104
+ \end{figure}
105
+
106
+ Suppose that the file {\ttfamily input.txt} in Figure \ref{fig:command} contains a sentence ``Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội.'' (Mr\textsubscript{Ông} Nguyen Khac Chuc is\textsubscript{đang} working\textsubscript{làm\_việc} at\textsubscript{tại} Vietnam National\textsubscript{quốc\_gia} University\textsubscript{đại\_học} Hanoi\textsubscript{Hà\_Nội}). Table \ref{tab:expoutput} shows the output for this sentence in plain text form.
107
+
108
+ \begin{table}[ht]
109
+ \centering
110
+ \resizebox{8cm}{!}{
111
+ \begin{tabular}{l l l l l l}
112
+ 1 & Ông & Nc & O & 4 & sub \\
113
+ 2 & Nguyễn\_Khắc\_Chúc & Np & B-PER & 1 & nmod\\
114
+ 3 & đang & R & O & 4 & adv\\
115
+ 4 & làm\_việc & V & O & 0 & root\\
116
+ 5 & tại & E & O & 4 & loc\\
117
+ 6 & Đại\_học & N & B-ORG & 5 & pob\\
118
+ 7 & Quốc\_gia & N & I-ORG & 6 & nmod\\
119
+ 8 & Hà\_Nội & Np & I-ORG & 6 & nmod\\
120
+ 9 & . & CH & O & 4 & punct\\
121
+ \end{tabular}
122
+ }
123
+ \caption{The output in file {\ttfamily output.txt} for the sentence ``Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội.'' from file {\ttfamily input.txt} in Figure \ref{fig:command}. The output is in a 6-column format representing word index, word form, POS tag, NER label, head index of the current word, and dependency relation type.}
124
+ \label{tab:expoutput}
125
+ \end{table}
126
+
127
+ Similarly, we can also get the same output by using the API, as easily as in Listing \ref{lst1}.
128
+
129
+
130
+ \begin{lstlisting}[label=lst1,caption= {Minimal code for an analysis pipeline.}]
131
+ VnCoreNLP pipeline = new VnCoreNLP() ;
132
+ Annotation annotation = new Annotation("%*Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội.*)");
133
+ pipeline.annotate(annotation);
134
+ String annotatedStr = annotation.toString();
135
+ \end{lstlisting}
136
+
137
+ In addition,
138
+ Listing \ref{lst2} provides a more realistic and complete example code, presenting key components of the toolkit.
139
+ Here an annotation pipeline can be used for any text rather than just a single sentence, e.g. for a paragraph or entire news story.
140
+
141
+ \section{Components}
142
+
143
+ This section briefly describes each component of VnCoreNLP. Note that our goal is not to develop a new approach or model for each component task. Here we focus on incorporating existing models into a single pipeline. In particular, except for a new model we develop for the language-dependent component of word segmentation, we apply traditional feature-based models which obtain SOTA results for English POS tagging, NER and dependency parsing to Vietnamese. The reason is based on a well-established belief in the literature that for a less-resourced language such as Vietnamese, we should consider using feature-based models to obtain fast and accurate performance, rather than neural network-based models \cite{King2015}.
144
+
145
+ \setlength{\textfloatsep}{1pt plus 1.0pt minus 1.0pt}
146
+ \begin{lstlisting}[float=tp,label=lst2,caption= {A simple and complete example code.}]
147
+ import vn.pipeline.*;
148
+ import java.io.*;
149
+ public class VnCoreNLPExample {
150
+ public static void main(String[] args) throws IOException {
151
+ // "wseg", "pos", "ner", and "parse" refer to as word segmentation, POS tagging, NER and dependency parsing, respectively.
152
+ String[] annotators = {"wseg", "pos", "ner", "parse"};
153
+ VnCoreNLP pipeline = new VnCoreNLP(annotators);
154
+ // Mr Nguyen Khac Chuc is working at Vietnam National University, Hanoi. Mrs Lan, Mr Chuc's wife, is also working at this university.
155
+ String str = %*"Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội. Bà Lan, vợ ông Chúc, cũng làm việc tại đây."*);
156
+ Annotation annotation = new Annotation(str);
157
+ pipeline.annotate(annotation);
158
+ PrintStream outputPrinter = new PrintStream("output.txt");
159
+ pipeline.printToFile(annotation, outputPrinter);
160
+ // Users can get a single sentence to analyze individually
161
+ Sentence firstSentence = annotation.getSentences().get(0);
162
+ }
163
+ }
164
+ \end{lstlisting}
165
+ %
166
+ \begin{itemize}
167
+ \setlength{\itemsep}{5pt}
168
+ \setlength{\parskip}{0pt}
169
+ \setlength{\parsep}{0pt}
170
+
171
+ \item \textbf{wseg} -- Unlike English, where white space is a strong indicator of word boundaries, in written Vietnamese white space is also used to separate syllables that constitute words. So word segmentation is referred to as the key first step in Vietnamese NLP{.}\ {W}e have proposed a transformation rule-based learning model for Vietnamese word segmentation, which obtains better segmentation accuracy and speed than all previous word segmenters. See details in \newcite{NguyenNVDJ2018}.
172
+
173
+
174
+ \item \textbf{pos} -- To label words with their POS tag, we apply MarMoT which is a generic
175
+ CRF framework and a SOTA POS and morphological
176
+ tagger \cite{mueller-schmid-schutze:2013:EMNLP}.\footnote{\url{http://cistern.cis.lmu.de/marmot/}}
177
+
178
+
179
+ \item \textbf{ner} -- To recognize named entities, we apply a dynamic feature induction model that automatically optimizes feature combinations \cite{choi:2016:N16-1}.\footnote{\url{https://emorynlp.github.io/nlp4j/components/named-entity-recognition.html}}
180
+
181
+
182
+ \item \textbf{parse} -- To perform dependency parsing, we apply the greedy version of a transition-based parsing model with selectional branching \cite{choi2015ACL}.\footnote{\url{https://emorynlp.github.io/nlp4j/components/dependency-parsing.html}}
183
+ \end{itemize}
184
+
185
+
186
+ \section{Evaluation}
187
+
188
+ We detail experimental results of the word segmentation (\textbf{wseg}) and POS tagging (\textbf{pos}) components of VnCoreNLP in \newcite{NguyenNVDJ2018} and \newcite{NguyenVNDJ-ALTA-2017}, respectively. In particular, our word segmentation component gets the highest results in terms of both segmentation F1 score at 97.90\% and speed at 62K words per second.\footnote{All speeds reported in this paper are computed on a personal computer with an Intel Core i7 2.2 GHz CPU.} Our POS tagging component also obtains the highest accuracy to date at 95.88\% with a fast tagging speed at 25K words per second, and outperforms BiLSTM-CRF-based models.
189
+ Following subsections present evaluations for the NER (\textbf{ner}) and dependency parsing (\textbf{parse}) components.
190
+
191
+ \subsection{Named entity recognition}\label{ssec:ner}
192
+
193
+ We make a comparison between SOTA feature-based and neural network-based models, which, to the best of our knowledge, has not been done in any prior work on Vietnamese NER.
194
+
195
+
196
+
197
+ \paragraph{Dataset:} The NER shared task at the 2016 VLSP workshop provides a set of 16,861 manually annotated sentences for training and development, and a set of 2,831 manually annotated sentences for test, with four NER labels PER, LOC, ORG and MISC. Note that in both datasets, words are also supplied with gold POS tags. In addition, each word representing a full personal name is separated into syllables that constitute the word.
198
+ So this annotation scheme results in an unrealistic scenario for a pipeline evaluation because: (\textbf{i}) gold POS tags are not available in a real-world application, and (\textbf{ii}) in the standard annotation (and benchmark datasets) for Vietnamese word segmentation and POS tagging \cite{nguyen-EtAl:2009:LAW-III}, each full name is referred to as a word token (i.e., all word segmenters have been trained to output a full name as a word and all POS taggers have been trained to assign a label to the entire full-name).
199
+
200
+ For a more realistic scenario, we merge those contiguous syllables constituting a full name to form a word.\footnote{Based on the gold label PER, contiguous syllables such as ``Nguyễn/B-PER'', ``Khắc/I-PER'' and ``Chúc/I-PER'' are merged to form a word as ``Nguyễn\_Khắc\_Chúc/B-PER.''} Then we replace the gold POS tags with automatic tags predicted by our POS tagging component. From the set of 16,861 sentences, we sample 2,000 sentences for development and use the remaining 14,861 sentences for training.
201
+
202
+
203
+
204
+ \paragraph{Models:} We make an empirical comparison between the VnCoreNLP's NER component and the following neural network-based models:
205
+ \begin{itemize}
206
+ \setlength{\itemsep}{5pt}
207
+ \setlength{\parskip}{0pt}
208
+ \setlength{\parsep}{0pt}
209
+
210
+ \item {BiLSTM-CRF} \cite{HuangXY15} is a sequence labeling model which extends the BiLSTM model with a CRF layer.
211
+
212
+ \item {BiLSTM-CRF + CNN-char}, i.e. {BiLSTM-CNN-CRF}, is an extension of {BiLSTM-CRF}, using CNN to derive character-based word representations \cite{ma-hovy:2016:P16-1}.%\footnote{\url{https://github.com/XuezheMax/LasagneNLP}}
213
+
214
+ \item {BiLSTM-CRF + LSTM-char} is an extension of {BiLSTM-CRF}, using BiLSTM to derive the character-based word representations \cite{lample-EtAl:2016:N16-1}.
215
+
216
+ \item BiLSTM-CRF\textsubscript{+POS} is another extension to BiLSTM-CRF, incorporating embeddings of automatically predicted POS tags \cite{reimers-gurevych:2017:EMNLP2017}.
217
+
218
+ \end{itemize}
219
+
220
+ We use a well-known implementation which is optimized for performance of all BiLSTM-CRF-based models from \newcite{reimers-gurevych:2017:EMNLP2017}.\footnote{\url{https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf}} We then follow \newcite[Section 3.4]{NguyenVNDJ-ALTA-2017} to perform hyper-parameter tuning.\footnote{We employ pre-trained Vietnamese word vectors from \url{https://github.com/sonvx/word2vecVN}.}
221
+
222
+ %\setlength{\abovecaptionskip}{3pt plus 2pt minus 1pt}
223
+ \setlength{\textfloatsep}{20.0pt plus 2.0pt minus 4.0pt}
224
+ \begin{table}[!t]
225
+ \centering
226
+ \begin{tabular}{l|c|l}
227
+ \hline
228
+ \textbf{Model} & \textbf{F1} & \textbf{Speed} \\
229
+ \hline
230
+ VnCoreNLP & \textbf{88.55} & \textbf{18K} \\
231
+ BiLSTM-CRF & 86.48 & 2.8K \\
232
+ \ \ \ \ \ + CNN-char & {88.28} & 1.8K \\
233
+ \ \ \ \ \ + LSTM-char & 87.71 & 1.3K \\
234
+ BiLSTM-CRF\textsubscript{+POS} & 86.12 & \_ \\
235
+ \ \ \ \ \ + CNN-char & 88.06 & \_ \\
236
+ \ \ \ \ \ + LSTM-char & 87.43 & \_ \\
237
+ \hline
238
+ \end{tabular}
239
+ \caption{F1 scores (in \%) on the test set w.r.t. gold word-segmentation. ``\textbf{Speed}'' denotes the processing speed in words per second (for VnCoreNLP, the reported speed includes the time taken by POS tagging).}
240
+ \label{tab:ner}
241
+ \end{table}
242
+
243
+
244
+
245
+ \paragraph{Main results:} Table \ref{tab:ner} presents F1 score and speed of each model on the test set, where VnCoreNLP obtains the highest score at 88.55\% with a fast speed at 18K words per second. In particular, VnCoreNLP obtains 10 times faster speed than the second most accurate model BiLSTM-CRF + CNN-char.
246
+
247
+ It is initially surprising that for such an isolating language as Vietnamese, where words are not inflected, using character-based representations helps produce 1+\% improvements to the BiLSTM-CRF model. We find that the improvements to BiLSTM-CRF are mostly accounted for by the PER label. The reason turns out to be simple: about 50\% of named entities are labeled with tag PER, so character-based representations are in fact able to capture common family, middle or given name syllables in `unknown' full-name words. Furthermore, we also find that BiLSTM-CRF-based models do not benefit from additional predicted POS tags. This is probably because BiLSTM can take word order into account, while without word inflection, all grammatical information in Vietnamese is conveyed through its fixed word order, so explicit predicted POS tags with noisy grammatical information are not helpful.
248
+
249
+
250
+ \subsection{Dependency parsing}\label{ssec:dep}
251
+
252
+ \paragraph{Experimental setup:} We use the Vietnamese dependency treebank VnDT \cite{Nguyen2014NLDB} consisting of 10,200 sentences in our experiments. Following \newcite{NguyenALTA2016}, we use the last 1020 sentences of VnDT for test while the remaining sentences are used for training. Evaluation metrics are the labeled attachment score (LAS) and unlabeled attachment score (UAS).
253
+
254
+
255
+
256
+ \paragraph{Main results:} Table \ref{tab:dep} compares the dependency parsing results of VnCoreNLP with results reported in prior work, using the same experimental setup. The first six rows present the scores with gold POS tags. The next two rows show scores of VnCoreNLP with automatic POS tags which are produced by our POS tagging component. The last row presents scores of the joint POS tagging and dependency parsing model jPTDP \protect\cite{NguyenCoNLL2017}. Table \ref{tab:dep} shows that compared to previously published results, VnCoreNLP produces the highest LAS score. Note that previous results for other systems are reported without using additional information of automatically predicted NER labels. In this case, the LAS score for VnCoreNLP without automatic NER features (i.e. VnCoreNLP\textsubscript{--NER} in Table \ref{tab:dep}) is still higher than previous ones. Notably, we also obtain a fast parsing speed at 8K words per second.
257
+
258
+
259
+ \begin{table}[!t]
260
+ \centering
261
+ \setlength{\tabcolsep}{0.5em}
262
+ \def\arraystretch{1.1}
263
+ \begin{tabular}{l|l|c|c|l }
264
+ \hline
265
+ \multicolumn{2}{c|}{\textbf{Model}} & \textbf{LAS} & \textbf{UAS} & \textbf{Speed} \\
266
+ \hline
267
+ \multirow{6}{*}{\rotatebox[origin=c]{90}{Gold POS}}
268
+ & VnCoreNLP & \textbf{73.39} & 79.02 & \_ \\
269
+ & VnCoreNLP\textsubscript{--NER} & 73.21 & 78.91 & \_ \\
270
+ & BIST-bmstparser & 73.17 & \textbf{79.39} & \_ \\
271
+ & BIST-barchybrid & 72.53 & 79.33 & \_ \\
272
+ & MSTParser & 70.29 & 76.47 & \_\\
273
+ & MaltParser & 69.10 & 74.91 & \_\\
274
+ \hline
275
+ \multirow{3}{*}{\rotatebox[origin=c]{90}{Auto POS}}
276
+ & VnCoreNLP & \textbf{70.23} & 76.93 & 8K \\
277
+ & VnCoreNLP\textsubscript{--NER} & 70.10 & 76.85 & \textbf{9K} \\
278
+ & jPTDP & 69.49 & \textbf{77.68} & 700 \\
279
+ \hline
280
+ \end{tabular}
281
+ \caption{LAS and UAS scores (in \%) computed on all tokens (i.e. including punctuation) on the test set w.r.t. gold word-segmentation. ``\textbf{Speed}'' is defined as in Table \ref{tab:ner}. The subscript ``--NER'' denotes the model without using automatically predicted NER labels as features. The results of the MSTParser \protect\cite{McDonald2005OLT}, MaltParser \protect\cite{Nivre2007}, and BiLSTM-based parsing models BIST-bmstparser and BIST-barchybrid \protect\cite{TACL885} are reported in \protect\newcite{NguyenALTA2016}. The result of the jPTDP model for Vietnamese is mentioned in \protect\newcite{NguyenVNDJ-ALTA-2017}.}% and detailed at \url{https://drive.google.com/drive/folders/0B5eBgc8jrKtpUmhhSmtFLWdrTzQ}.}
282
+ \label{tab:dep}
283
+ \end{table}
284
+
285
+
286
+
287
+
288
+
289
+ \section{Conclusion}
290
+
291
+
292
+ In this paper, we have presented the VnCoreNLP toolkit---an easy-to-use, fast and accurate processing pipeline for Vietnamese NLP. VnCoreNLP provides core NLP steps including word segmentation, POS tagging, NER and dependency parsing. The current version of VnCoreNLP has been trained without any linguistic optimization, i.e. we only employ existing pre-defined features in the traditional feature-based models for POS tagging, NER and dependency parsing. Future work will focus on incorporating Vietnamese linguistic features into these feature-based models.
293
+
294
+ VnCoreNLP is released for research and educational purposes, and available at: \url{https://github.com/vncorenlp/VnCoreNLP}.
295
+
296
+ % include your own bib file like this:
297
+ %\bibliographystyle{acl}
298
+ %\bibliography{acl2018}
299
+ \bibliography{Refs}
300
+ \bibliographystyle{naacl_natbib}
301
+
302
+ \end{document}
references/2018.naacl.vu/source/naacl_natbib.bst ADDED
@@ -0,0 +1,1552 @@
+ %% Moved into NAACL 2018
+ %% M Mitchell 2017/Sep/30
+ %%
+ %% Output of docstrip was hand-edited to create doi links.
+ %% This file creates bib entries with \href commands in the title.
+ %% You must use the hyperref package, or define \href to do nothing.
+ %% Dan Gildea (gildea) 2016/10/30
+ %%
+ %% Modified further by Min-Yen Kan (knmnyn) on 2016/12/06 to add
+ %% visible DOIs that are also hyperref'ed, and use the newer
+ %% https://doi.org/ syntax.
+ %% Modified further by Min-Yen Kan (knmnyn) to use the URL field for
+ %% reference when no DOI is present.
+ %% Modified to use \url command for visible urls -- Dan Gildea 2017/4/12
+ %%
+ %% This is file `acl.bst',
+ %% generated with the docstrip utility.
+ %%
+ %% The original source files were:
+ %%
+ %% merlin.mbs (with options: `ay,nat,nm-revv1,jnrlst,keyxyr,dt-beg,yr-per,note-yr,num-xser,jnm-x,pre-pub,xedn')
+ %% ----------------------------------------
+ %% *** ACL bibliography stule for use with ACL proceedings or CL journal ***
+ %%
+ %% Copyright 1994-2002 Patrick W Daly
+ % ===============================================================
+ % IMPORTANT NOTICE:
+ % This bibliographic style (bst) file has been generated from one or
+ % more master bibliographic style (mbs) files, listed above.
+ %
+ % This generated file can be redistributed and/or modified under the terms
+ % of the LaTeX Project Public License Distributed from CTAN
+ % archives in directory macros/latex/base/lppl.txt; either
+ % version 1 of the License, or any later version.
+ % ===============================================================
+ % Name and version information of the main mbs file:
+ % \ProvidesFile{merlin.mbs}[2002/10/21 4.05 (PWD, AO, DPC)]
+ % For use with BibTeX version 0.99a or later
+ %-------------------------------------------------------------------
+ % This bibliography style file is intended for texts in ENGLISH
+ % This is an author-year citation style bibliography. As such, it is
+ % non-standard LaTeX, and requires a special package file to function properly.
+ % Such a package is natbib.sty by Patrick W. Daly
+ % The form of the \bibitem entries is
+ % \bibitem[Jones et al.(1990)]{key}...
+ % \bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}...
+ % The essential feature is that the label (the part in brackets) consists
+ % of the author names, as they should appear in the citation, with the year
+ % in parentheses following. There must be no space before the opening
+ % parenthesis!
+ % With natbib v5.3, a full list of authors may also follow the year.
+ % In natbib.sty, it is possible to define the type of enclosures that is
+ % really wanted (brackets or parentheses), but in either case, there must
+ % be parentheses in the label.
+ % The \cite command functions as follows:
+ % \citet{key} ==>> Jones et al. (1990)
+ % \citet*{key} ==>> Jones, Baker, and Smith (1990)
+ % \citep{key} ==>> (Jones et al., 1990)
+ % \citep*{key} ==>> (Jones, Baker, and Smith, 1990)
+ % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2)
+ % \citep[e.g.][]{key} ==>> (e.g. Jones et al., 1990)
+ % \citep[e.g.][p. 32]{key} ==>> (e.g. Jones et al., p. 32)
+ % \citeauthor{key} ==>> Jones et al.
+ % \citeauthor*{key} ==>> Jones, Baker, and Smith
+ % \citeyear{key} ==>> 1990
+ %---------------------------------------------------------------------
+
+ ENTRY
+ { address
+ author
+ booktitle
+ chapter
+ doi
+ edition
+ editor
+ howpublished
+ institution
+ journal
+ key
+ month
+ note
+ number
+ organization
+ pages
+ publisher
+ school
+ series
+ title
+ type
+ url
+ volume
+ year
+ }
+ {}
+ { label extra.label sort.label short.list }
+ INTEGERS { output.state before.all mid.sentence after.sentence after.block }
+ FUNCTION {init.state.consts}
+ { #0 'before.all :=
+ #1 'mid.sentence :=
+ #2 'after.sentence :=
+ #3 'after.block :=
+ }
+ STRINGS { s t}
+ FUNCTION {output.nonnull}
+ { 's :=
+ output.state mid.sentence =
+ { ", " * write$ }
+ { output.state after.block =
+ { add.period$ write$
+ newline$
+ "\newblock " write$
+ }
+ { output.state before.all =
+ 'write$
+ { add.period$ " " * write$ }
+ if$
+ }
+ if$
+ mid.sentence 'output.state :=
+ }
+ if$
+ s
+ }
+ FUNCTION {output}
+ { duplicate$ empty$
+ 'pop$
+ 'output.nonnull
+ if$
+ }
+ FUNCTION {output.check}
+ { 't :=
+ duplicate$ empty$
+ { pop$ "empty " t * " in " * cite$ * warning$ }
+ 'output.nonnull
+ if$
+ }
+ FUNCTION {fin.entry}
+ { add.period$
+ write$
+ newline$
+ }
+
+ FUNCTION {new.block}
+ { output.state before.all =
+ 'skip$
+ { after.block 'output.state := }
+ if$
+ }
+ FUNCTION {new.sentence}
+ { output.state after.block =
+ 'skip$
+ { output.state before.all =
+ 'skip$
+ { after.sentence 'output.state := }
+ if$
+ }
+ if$
+ }
+ FUNCTION {add.blank}
+ { " " * before.all 'output.state :=
+ }
+
+ FUNCTION {date.block}
+ {
+ new.block
+ }
+
+ FUNCTION {not}
+ { { #0 }
+ { #1 }
+ if$
+ }
+ FUNCTION {and}
+ { 'skip$
+ { pop$ #0 }
+ if$
+ }
+ FUNCTION {or}
+ { { pop$ #1 }
+ 'skip$
+ if$
+ }
+ FUNCTION {new.block.checkb}
+ { empty$
+ swap$ empty$
+ and
+ 'skip$
+ 'new.block
+ if$
+ }
+ FUNCTION {field.or.null}
+ { duplicate$ empty$
+ { pop$ "" }
+ 'skip$
+ if$
+ }
+ FUNCTION {emphasize}
+ { duplicate$ empty$
+ { pop$ "" }
+ { "{\em " swap$ * "\/}" * }
+ if$
+ }
+ FUNCTION {tie.or.space.prefix}
+ { duplicate$ text.length$ #3 <
+ { "~" }
+ { " " }
+ if$
+ swap$
+ }
+
+ FUNCTION {capitalize}
+ { "u" change.case$ "t" change.case$ }
+
+ FUNCTION {space.word}
+ { " " swap$ * " " * }
+ % Here are the language-specific definitions for explicit words.
+ % Each function has a name bbl.xxx where xxx is the English word.
+ % The language selected here is ENGLISH
+ FUNCTION {bbl.and}
+ { "and"}
+
+ FUNCTION {bbl.etal}
+ { "et~al." }
+
+ FUNCTION {bbl.editors}
+ { "editors" }
+
+ FUNCTION {bbl.editor}
+ { "editor" }
+
+ FUNCTION {bbl.edby}
+ { "edited by" }
+
+ FUNCTION {bbl.edition}
+ { "edition" }
+
+ FUNCTION {bbl.volume}
+ { "volume" }
+
+ FUNCTION {bbl.of}
+ { "of" }
+
+ FUNCTION {bbl.number}
+ { "number" }
+
+ FUNCTION {bbl.nr}
+ { "no." }
+
+ FUNCTION {bbl.in}
+ { "in" }
+
+ FUNCTION {bbl.pages}
+ { "pages" }
+
+ FUNCTION {bbl.page}
+ { "page" }
+
+ FUNCTION {bbl.chapter}
+ { "chapter" }
+
+ FUNCTION {bbl.techrep}
+ { "Technical Report" }
+
+ FUNCTION {bbl.mthesis}
+ { "Master's thesis" }
+
+ FUNCTION {bbl.phdthesis}
+ { "Ph.D. thesis" }
+
+ MACRO {jan} {"January"}
+
+ MACRO {feb} {"February"}
+
+ MACRO {mar} {"March"}
+
+ MACRO {apr} {"April"}
+
+ MACRO {may} {"May"}
+
+ MACRO {jun} {"June"}
+
+ MACRO {jul} {"July"}
+
+ MACRO {aug} {"August"}
+
+ MACRO {sep} {"September"}
+
+ MACRO {oct} {"October"}
+
+ MACRO {nov} {"November"}
+
+ MACRO {dec} {"December"}
+
+ MACRO {acmcs} {"ACM Computing Surveys"}
+
+ MACRO {acta} {"Acta Informatica"}
+
+ MACRO {cacm} {"Communications of the ACM"}
+
+ MACRO {ibmjrd} {"IBM Journal of Research and Development"}
+
+ MACRO {ibmsj} {"IBM Systems Journal"}
+
+ MACRO {ieeese} {"IEEE Transactions on Software Engineering"}
+
+ MACRO {ieeetc} {"IEEE Transactions on Computers"}
+
+ MACRO {ieeetcad}
+ {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"}
+
+ MACRO {ipl} {"Information Processing Letters"}
+
+ MACRO {jacm} {"Journal of the ACM"}
+
+ MACRO {jcss} {"Journal of Computer and System Sciences"}
+
+ MACRO {scp} {"Science of Computer Programming"}
+
+ MACRO {sicomp} {"SIAM Journal on Computing"}
+
+ MACRO {tocs} {"ACM Transactions on Computer Systems"}
+
+ MACRO {tods} {"ACM Transactions on Database Systems"}
+
+ MACRO {tog} {"ACM Transactions on Graphics"}
+
+ MACRO {toms} {"ACM Transactions on Mathematical Software"}
+
+ MACRO {toois} {"ACM Transactions on Office Information Systems"}
+
+ MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"}
+
+ MACRO {tcs} {"Theoretical Computer Science"}
+ FUNCTION {bibinfo.check}
+ { swap$
+ duplicate$ missing$
+ {
+ pop$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ pop$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {bibinfo.warn}
+ { swap$
+ duplicate$ missing$
+ {
+ swap$ "missing " swap$ * " in " * cite$ * warning$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ "empty " swap$ * " in " * cite$ * warning$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ STRINGS { bibinfo}
+ INTEGERS { nameptr namesleft numnames }
+
+ FUNCTION {format.names}
+ { 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ duplicate$ #1 >
+ { "{jj~}{ff~}{vv~}{ll}" }
+ { "{jj~}{ff~}{vv~}{ll}" }
+ if$
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.names.ed}
+ {
+ 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{ff~}{vv~}{ll}{, jj}"
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.key}
+ { empty$
+ { key field.or.null }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.authors}
+ { author "author" format.names
+ }
+ FUNCTION {get.bbl.editor}
+ { editor num.names$ #1 > 'bbl.editors 'bbl.editor if$ }
+
+ FUNCTION {format.editors}
+ { editor "editor" format.names duplicate$ empty$ 'skip$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ *
+ }
+ if$
+ }
+ FUNCTION {format.note}
+ {
+ note empty$
+ { "" }
+ { note #1 #1 substring$
+ duplicate$ "{" =
+ 'skip$
+ { output.state mid.sentence =
+ { "l" }
+ { "u" }
+ if$
+ change.case$
+ }
+ if$
+ note #2 global.max$ substring$ * "note" bibinfo.check
+ }
+ if$
+ }
+
+ FUNCTION {doilink}
+ { duplicate$ empty$
+ { pop$ "" }
+ { doi empty$
+ { url empty$
+ { skip$ }
+ { "\href{" url * "}{" * swap$ * "}" * }
+ if$
+ }
+ { "\href{https://doi.org/" doi * "}{" * swap$ * "}" * }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {format.doi}
+ { doi empty$
+ { "" }
+ { "\url{https://doi.org/" doi * "}" * }
+ if$
+ "doi" bibinfo.check
+ }
+
+ FUNCTION {format.url}
+ { doi empty$
+ {
+ url empty$
+ { "" }
+ { "\url{" url * "}" * }
+ if$
+ "url" bibinfo.check
+ }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.title}
+ { title
+ duplicate$ empty$ 'skip$
+ { "t" change.case$ doilink }
+ if$
+ "title" bibinfo.check
+ }
+ FUNCTION {format.full.names}
+ {'s :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{vv~}{ll}" format.name$
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ }
+
+ FUNCTION {author.editor.key.full}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {author.key.full}
+ { author empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {editor.key.full}
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+
+ FUNCTION {make.full.names}
+ { type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.key.full
+ { type$ "proceedings" =
+ 'editor.key.full
+ 'author.key.full
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {output.bibitem}
+ { newline$
+ "\bibitem[{" write$
+ label write$
+ ")" make.full.names duplicate$ short.list =
+ { pop$ }
+ { * }
+ if$
+ "}]{" * write$
+ cite$ write$
+ "}" write$
+ newline$
+ ""
+ before.all 'output.state :=
+ }
+
+ FUNCTION {n.dashify}
+ {
+ 't :=
+ ""
+ { t empty$ not }
+ { t #1 #1 substring$ "-" =
+ { t #1 #2 substring$ "--" = not
+ { "--" *
+ t #2 global.max$ substring$ 't :=
+ }
+ { { t #1 #1 substring$ "-" = }
+ { "-" *
+ t #2 global.max$ substring$ 't :=
+ }
+ while$
+ }
+ if$
+ }
+ { t #1 #1 substring$ *
+ t #2 global.max$ substring$ 't :=
+ }
+ if$
+ }
+ while$
+ }
+
+ FUNCTION {word.in}
+ { bbl.in capitalize
+ " " * }
+
+ FUNCTION {format.date}
+ { year "year" bibinfo.check duplicate$ empty$
+ {
+ "empty year in " cite$ * "; set to ????" * warning$
+ pop$ "????"
+ }
+ 'skip$
+ if$
+ extra.label *
+ before.all 'output.state :=
+ after.sentence 'output.state :=
+ }
+ FUNCTION {format.btitle}
+ { title "title" bibinfo.check
+ duplicate$ empty$ 'skip$
+ {
+ emphasize
+ }
+ if$
+ }
+ FUNCTION {either.or.check}
+ { empty$
+ 'pop$
+ { "can't use both " swap$ * " fields in " * cite$ * warning$ }
+ if$
+ }
+ FUNCTION {format.bvolume}
+ { volume empty$
+ { "" }
+ { bbl.volume volume tie.or.space.prefix
+ "volume" bibinfo.check * *
+ series "series" bibinfo.check
+ duplicate$ empty$ 'pop$
+ { swap$ bbl.of space.word * swap$
+ emphasize * }
+ if$
+ "volume and number" number either.or.check
+ }
+ if$
+ }
+ FUNCTION {format.number.series}
+ { volume empty$
+ { number empty$
+ { series field.or.null }
+ { series empty$
+ { number "number" bibinfo.check }
+ { output.state mid.sentence =
+ { bbl.number }
+ { bbl.number capitalize }
+ if$
+ number tie.or.space.prefix "number" bibinfo.check * *
+ bbl.in space.word *
+ series "series" bibinfo.check *
+ }
+ if$
+ }
+ if$
+ }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.edition}
+ { edition duplicate$ empty$ 'skip$
+ {
+ output.state mid.sentence =
+ { "l" }
+ { "t" }
+ if$ change.case$
+ "edition" bibinfo.check
+ " " * bbl.edition *
+ }
+ if$
+ }
+ INTEGERS { multiresult }
+ FUNCTION {multi.page.check}
+ { 't :=
+ #0 'multiresult :=
+ { multiresult not
+ t empty$ not
+ and
+ }
+ { t #1 #1 substring$
+ duplicate$ "-" =
+ swap$ duplicate$ "," =
+ swap$ "+" =
+ or or
+ { #1 'multiresult := }
+ { t #2 global.max$ substring$ 't := }
+ if$
+ }
+ while$
+ multiresult
+ }
+ FUNCTION {format.pages}
+ { pages duplicate$ empty$ 'skip$
+ { duplicate$ multi.page.check
+ {
+ bbl.pages swap$
+ n.dashify
+ }
+ {
+ bbl.page swap$
+ }
+ if$
+ tie.or.space.prefix
+ "pages" bibinfo.check
+ * *
+ }
+ if$
+ }
+ FUNCTION {format.journal.pages}
+ { pages duplicate$ empty$ 'pop$
+ { swap$ duplicate$ empty$
+ { pop$ pop$ format.pages }
+ {
+ ":" *
+ swap$
+ n.dashify
+ "pages" bibinfo.check
+ *
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.vol.num.pages}
+ { volume field.or.null
+ duplicate$ empty$ 'skip$
+ {
+ "volume" bibinfo.check
+ }
+ if$
+ number "number" bibinfo.check duplicate$ empty$ 'skip$
+ {
+ swap$ duplicate$ empty$
+ { "there's a number but no volume in " cite$ * warning$ }
+ 'skip$
+ if$
+ swap$
+ "(" swap$ * ")" *
+ }
+ if$ *
+ format.journal.pages
+ }
+
+ FUNCTION {format.chapter.pages}
+ { chapter empty$
+ 'format.pages
+ { type empty$
+ { bbl.chapter }
+ { type "l" change.case$
+ "type" bibinfo.check
+ }
+ if$
+ chapter tie.or.space.prefix
+ "chapter" bibinfo.check
+ * *
+ pages empty$
+ 'skip$
+ { ", " * format.pages * }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {format.booktitle}
+ {
+ booktitle "booktitle" bibinfo.check
+ emphasize
+ }
+ FUNCTION {format.in.ed.booktitle}
+ { format.booktitle duplicate$ empty$ 'skip$
+ {
+ editor "editor" format.names.ed duplicate$ empty$ 'pop$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ ", " *
+ * swap$
+ * }
+ if$
+ word.in swap$ *
+ }
+ if$
+ }
+ FUNCTION {format.thesis.type}
+ { type duplicate$ empty$
+ 'pop$
+ { swap$ pop$
+ "t" change.case$ "type" bibinfo.check
+ }
+ if$
+ }
+ FUNCTION {format.tr.number}
+ { number "number" bibinfo.check
+ type duplicate$ empty$
+ { pop$ bbl.techrep }
+ 'skip$
+ if$
+ "type" bibinfo.check
+ swap$ duplicate$ empty$
+ { pop$ "t" change.case$ }
+ { tie.or.space.prefix * * }
+ if$
+ }
+ FUNCTION {format.article.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.book.crossref}
+ { volume duplicate$ empty$
+ { "empty volume in " cite$ * "'s crossref of " * crossref * warning$
+ pop$ word.in
+ }
+ { bbl.volume
+ capitalize
+ swap$ tie.or.space.prefix "volume" bibinfo.check * * bbl.of space.word *
+ }
+ if$
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.incoll.inproc.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.org.or.pub}
+ { 't :=
+ ""
+ address empty$ t empty$ and
+ 'skip$
+ {
+ t empty$
+ { address "address" bibinfo.check *
+ }
+ { t *
+ address empty$
+ 'skip$
+ { ", " * address "address" bibinfo.check * }
+ if$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.publisher.address}
+ { publisher "publisher" bibinfo.warn format.org.or.pub
+ }
+
+ FUNCTION {format.organization.address}
+ { organization "organization" bibinfo.check format.org.or.pub
+ }
+
+ FUNCTION {article}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ {
+ journal
+ "journal" bibinfo.check
+ emphasize
+ "journal" output.check
+ add.blank
+ format.vol.num.pages output
+ }
+ { format.article.crossref output.nonnull
+ format.pages output
+ }
+ if$
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+ FUNCTION {book}
+ { output.bibitem
+ author empty$
+ { format.editors "author and editor" output.check
+ editor format.key output
+ }
+ { format.authors output.nonnull
+ crossref missing$
+ { "author and editor" editor either.or.check }
+ 'skip$
+ if$
+ }
+ if$
+ format.date "year" output.check
+ date.block
+ format.btitle "title" output.check
+ crossref missing$
+ { format.bvolume output
+ new.block
+ format.number.series output
+ new.sentence
+ format.publisher.address output
+ }
+ {
+ new.block
+ format.book.crossref output.nonnull
+ }
+ if$
+ format.edition output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+ FUNCTION {booklet}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.title "title" output.check
+ new.block
+ howpublished "howpublished" bibinfo.check output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {inbook}
+ { output.bibitem
+ author empty$
+ { format.editors "author and editor" output.check
+ editor format.key output
+ }
+ { format.authors output.nonnull
+ crossref missing$
+ { "author and editor" editor either.or.check }
+ 'skip$
+ if$
+ }
+ if$
+ format.date "year" output.check
+ date.block
+ format.btitle "title" output.check
+ crossref missing$
+ {
+ format.publisher.address output
+ format.bvolume output
+ format.chapter.pages "chapter and pages" output.check
+ new.block
+ format.number.series output
+ new.sentence
+ }
+ {
+ format.chapter.pages "chapter and pages" output.check
+ new.block
+ format.book.crossref output.nonnull
+ }
+ if$
+ format.edition output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {incollection}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ { format.in.ed.booktitle "booktitle" output.check
+ format.publisher.address output
+ format.bvolume output
+ format.number.series output
+ format.chapter.pages output
+ new.sentence
+ format.edition output
+ }
+ { format.incoll.inproc.crossref output.nonnull
+ format.chapter.pages output
+ }
+ if$
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+ FUNCTION {inproceedings}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ { format.in.ed.booktitle "booktitle" output.check
+ new.sentence
+ publisher empty$
+ { format.organization.address output }
+ { organization "organization" bibinfo.check output
+ format.publisher.address output
+ }
+ if$
+ format.bvolume output
+ format.number.series output
+ format.pages output
+ }
+ { format.incoll.inproc.crossref output.nonnull
+ format.pages output
+ }
+ if$
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+ FUNCTION {conference} { inproceedings }
+ FUNCTION {manual}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.btitle "title" output.check
+ organization address new.block.checkb
+ organization "organization" bibinfo.check output
+ address "address" bibinfo.check output
+ format.edition output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {mastersthesis}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.btitle
+ "title" output.check
+ new.block
+ bbl.mthesis format.thesis.type output.nonnull
+ school "school" bibinfo.warn output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {misc}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.title output
+ new.block
+ howpublished "howpublished" bibinfo.check output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+ FUNCTION {phdthesis}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.btitle
+ "title" output.check
+ new.block
+ bbl.phdthesis format.thesis.type output.nonnull
+ school "school" bibinfo.warn output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {proceedings}
+ { output.bibitem
+ format.editors output
+ editor format.key output
+ format.date "year" output.check
+ date.block
+ format.btitle "title" output.check
+ format.bvolume output
+ format.number.series output
+ new.sentence
+ publisher empty$
+ { format.organization.address output }
+ { organization "organization" bibinfo.check output
+ format.publisher.address output
+ }
+ if$
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {techreport}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.title
+ "title" output.check
+ new.block
+ format.tr.number output.nonnull
+ institution "institution" bibinfo.warn output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {unpublished}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ format.title "title" output.check
+ new.block
+ format.note "note" output.check
+ new.block
+ format.doi output
+ format.url output
+ fin.entry
+ }
+
+ FUNCTION {default.type} { misc }
+ READ
+ FUNCTION {sortify}
+ { purify$
+ "l" change.case$
+ }
+ INTEGERS { len }
+ FUNCTION {chop.word}
+ { 's :=
+ 'len :=
+ s #1 len substring$ =
+ { s len #1 + global.max$ substring$ }
+ 's
+ if$
+ }
+ FUNCTION {format.lab.names}
+ { 's :=
+ "" 't :=
+ s #1 "{vv~}{ll}" format.name$
+ s num.names$ duplicate$
+ #2 >
+ { pop$
+ " " * bbl.etal *
+ }
+ { #2 <
+ 'skip$
+ { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" =
+ {
+ " " * bbl.etal *
+ }
+ { bbl.and space.word * s #2 "{vv~}{ll}" format.name$
+ * }
+ if$
+ }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {author.key.label}
+ { author empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { author format.lab.names }
+ if$
+ }
+
+ FUNCTION {author.editor.key.label}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.lab.names }
+ if$
+ }
+ { author format.lab.names }
+ if$
+ }
+
+ FUNCTION {editor.key.label}
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.lab.names }
+ if$
+ }
+
+ FUNCTION {calc.short.authors}
+ { type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.key.label
+ { type$ "proceedings" =
+ 'editor.key.label
+ 'author.key.label
+ if$
+ }
+ if$
+ 'short.list :=
+ }
+
+ FUNCTION {calc.label}
+ { calc.short.authors
+ short.list
+ "("
+ *
+ year duplicate$ empty$
+ short.list key field.or.null = or
+ { pop$ "" }
+ 'skip$
+ if$
+ *
+ 'label :=
+ }
+
+ FUNCTION {sort.format.names}
+ { 's :=
+ #1 'nameptr :=
+ ""
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{vv{ } }{ll{ }}{ ff{ }}{ jj{ }}"
+ format.name$ 't :=
+ nameptr #1 >
+ {
+ " " *
+ namesleft #1 = t "others" = and
+ { "zzzzz" * }
+ { t sortify * }
+ if$
+ }
+ { t sortify * }
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ }
+
+ FUNCTION {sort.format.title}
+ { 't :=
+ "A " #2
+ "An " #3
+ "The " #4 t chop.word
+ chop.word
+ chop.word
+ sortify
+ #1 global.max$ substring$
+ }
+ FUNCTION {author.sort}
+ { author empty$
+ { key empty$
+ { "to sort, need author or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { author sort.format.names }
+ if$
+ }
+ FUNCTION {author.editor.sort}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { "to sort, need author, editor, or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { editor sort.format.names }
+ if$
+ }
+ { author sort.format.names }
+ if$
+ }
+ FUNCTION {editor.sort}
+ { editor empty$
+ { key empty$
+ { "to sort, need editor or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { editor sort.format.names }
+ if$
+ }
+ FUNCTION {presort}
+ { calc.label
+ label sortify
+ " "
+ *
+ type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.sort
+ { type$ "proceedings" =
+ 'editor.sort
+ 'author.sort
+ if$
+ }
+ if$
+ #1 entry.max$ substring$
+ 'sort.label :=
+ sort.label
+ *
+ " "
+ *
+ title field.or.null
+ sort.format.title
+ *
+ #1 entry.max$ substring$
+ 'sort.key$ :=
+ }
+
+ ITERATE {presort}
+ SORT
+ STRINGS { last.label next.extra }
+ INTEGERS { last.extra.num number.label }
+ FUNCTION {initialize.extra.label.stuff}
+ { #0 int.to.chr$ 'last.label :=
+ "" 'next.extra :=
+ #0 'last.extra.num :=
+ #0 'number.label :=
+ }
+ FUNCTION {forward.pass}
+ { last.label label =
+ { last.extra.num #1 + 'last.extra.num :=
1490
+ last.extra.num int.to.chr$ 'extra.label :=
1491
+ }
1492
+ { "a" chr.to.int$ 'last.extra.num :=
1493
+ "" 'extra.label :=
1494
+ label 'last.label :=
1495
+ }
1496
+ if$
1497
+ number.label #1 + 'number.label :=
1498
+ }
1499
+ FUNCTION {reverse.pass}
1500
+ { next.extra "b" =
1501
+ { "a" 'extra.label := }
1502
+ 'skip$
1503
+ if$
1504
+ extra.label 'next.extra :=
1505
+ extra.label
1506
+ duplicate$ empty$
1507
+ 'skip$
1508
+ { "{\natexlab{" swap$ * "}}" * }
1509
+ if$
1510
+ 'extra.label :=
1511
+ label extra.label * 'label :=
1512
+ }
1513
+ EXECUTE {initialize.extra.label.stuff}
1514
+ ITERATE {forward.pass}
1515
+ REVERSE {reverse.pass}
1516
+ FUNCTION {bib.sort.order}
1517
+ { sort.label
1518
+ " "
1519
+ *
1520
+ year field.or.null sortify
1521
+ *
1522
+ " "
1523
+ *
1524
+ title field.or.null
1525
+ sort.format.title
1526
+ *
1527
+ #1 entry.max$ substring$
1528
+ 'sort.key$ :=
1529
+ }
1530
+ ITERATE {bib.sort.order}
1531
+ SORT
1532
+ FUNCTION {begin.bib}
1533
+ { preamble$ empty$
1534
+ 'skip$
1535
+ { preamble$ write$ newline$ }
1536
+ if$
1537
+ "\begin{thebibliography}{}"
1538
+ write$ newline$
1539
+ "\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi"
1540
+ write$ newline$
1541
+ }
1542
+ EXECUTE {begin.bib}
1543
+ EXECUTE {init.state.consts}
1544
+ ITERATE {call.type$}
1545
+ FUNCTION {end.bib}
1546
+ { newline$
1547
+ "\end{thebibliography}" write$ newline$
1548
+ }
1549
+ EXECUTE {end.bib}
1550
+ %% End of customized bst file
1551
+ %%
1552
+ %% End of file `acl.bst'.
references/2018.naacl.vu/source/naaclhlt2018.sty ADDED
@@ -0,0 +1,543 @@
1
+ % This is the LaTex style file for NAACL 2018. Major modifications include
2
+ % changing the color of the line numbers to a light gray; changing font size of abstract to be 10pt; changing caption font size to be 10pt.
3
+ % -- Meg Mitchell and Stephanie Lukin
4
+
5
+ % 2017: modified to support DOI links in bibliography. Now uses
6
+ % natbib package rather than defining citation commands in this file.
7
+ % Use with acl_natbib.bst bib style. -- Dan Gildea
8
+
9
+ % This is the LaTeX style for ACL 2016. It contains Margaret Mitchell's
10
+ % line number adaptations (ported by Hai Zhao and Yannick Versley).
11
+
12
+ % It is nearly identical to the style files for ACL 2015,
13
+ % ACL 2014, EACL 2006, ACL2005, ACL 2002, ACL 2001, ACL 2000,
14
+ % EACL 95 and EACL 99.
15
+ %
16
+ % Changes made include: adapt layout to A4 and centimeters, widen abstract
17
+
18
+ % This is the LaTeX style file for ACL 2000. It is nearly identical to the
19
+ % style files for EACL 95 and EACL 99. Minor changes include editing the
20
+ % instructions to reflect use of \documentclass rather than \documentstyle
21
+ % and removing the white space before the title on the first page
22
+ % -- John Chen, June 29, 2000
23
+
24
+ % This is the LaTeX style file for EACL-95. It is identical to the
25
+ % style file for ANLP '94 except that the margins are adjusted for A4
26
+ % paper. -- abney 13 Dec 94
27
+
28
+ % The ANLP '94 style file is a slightly modified
29
+ % version of the style used for AAAI and IJCAI, using some changes
30
+ % prepared by Fernando Pereira and others and some minor changes
31
+ % by Paul Jacobs.
32
+
33
+ % Papers prepared using the aclsub.sty file and acl.bst bibtex style
34
+ % should be easily converted to final format using this style.
35
+ % (1) Submission information (\wordcount, \subject, and \makeidpage)
36
+ % should be removed.
37
+ % (2) \summary should be removed. The summary material should come
38
+ % after \maketitle and should be in the ``abstract'' environment
39
+ % (between \begin{abstract} and \end{abstract}).
40
+ % (3) Check all citations. This style should handle citations correctly
41
+ % and also allows multiple citations separated by semicolons.
42
+ % (4) Check figures and examples. Because the final format is double-
43
+ % column, some adjustments may have to be made to fit text in the column
44
+ % column, some adjustments may have to be made to fit text in the column
45
+
46
+ % Place this in a file called aclap.sty in the TeX search path.
47
+ % (Placing it in the same directory as the paper should also work.)
48
+
49
+ % Prepared by Peter F. Patel-Schneider, liberally using the ideas of
50
+ % other style hackers, including Barbara Beeton.
51
+ % This style is NOT guaranteed to work. It is provided in the hope
52
+ % that it will make the preparation of papers easier.
53
+ %
54
+ % There are undoubtedly bugs in this style. If you make bug fixes,
55
+ % improvements, etc. please let me know. My e-mail address is:
56
+ % pfps@research.att.com
57
+
58
+ % Papers are to be prepared using the ``acl_natbib'' bibliography style,
59
+ % as follows:
60
+ % \documentclass[11pt]{article}
61
+ % \usepackage{acl2000}
62
+ % \title{Title}
63
+ % \author{Author 1 \and Author 2 \\ Address line \\ Address line \And
64
+ % Author 3 \\ Address line \\ Address line}
65
+ % \begin{document}
66
+ % ...
67
+ % \bibliography{bibliography-file}
68
+ % \bibliographystyle{acl_natbib}
69
+ % \end{document}
70
+
71
+ % Author information can be set in various styles:
72
+ % For several authors from the same institution:
73
+ % \author{Author 1 \and ... \and Author n \\
74
+ % Address line \\ ... \\ Address line}
75
+ % if the names do not fit well on one line use
76
+ % Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
77
+ % For authors from different institutions:
78
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
79
+ % \And ... \And
80
+ % Author n \\ Address line \\ ... \\ Address line}
81
+ % To start a separate ``row'' of authors use \AND, as in
82
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
83
+ % \AND
84
+ % Author 2 \\ Address line \\ ... \\ Address line \And
85
+ % Author 3 \\ Address line \\ ... \\ Address line}
86
+
87
+ % If the title and author information does not fit in the area allocated,
88
+ % place \setlength\titlebox{<new height>} right after
89
+ % \usepackage{acl2015}
90
+ % where <new height> can be something larger than 5cm
91
+
92
+ % include hyperref, unless user specifies nohyperref option like this:
93
+ % \usepackage[nohyperref]{naaclhlt2018}
94
+ \newif\ifacl@hyperref
95
+ \DeclareOption{hyperref}{\acl@hyperreftrue}
96
+ \DeclareOption{nohyperref}{\acl@hyperreffalse}
97
+ \ExecuteOptions{hyperref} % default is to use hyperref
98
+ \ProcessOptions\relax
99
+ \ifacl@hyperref
100
+ \RequirePackage{hyperref}
101
+ \usepackage{xcolor} % make links dark blue
102
+ \definecolor{darkblue}{rgb}{0, 0, 0.5}
103
+ \hypersetup{colorlinks=true,citecolor=darkblue, linkcolor=darkblue, urlcolor=darkblue}
104
+ \else
105
+ % This definition is used if the hyperref package is not loaded.
106
+ % It provides a backup, no-op definition of \href.
107
+ % This is necessary because \href command is used in the acl_natbib.bst file.
108
+ \def\href#1#2{{#2}}
109
+ \fi
110
+
111
+ \typeout{Conference Style for NAACL-HLT 2018}
112
+
113
+ % NOTE: Some laser printers have a serious problem printing TeX output.
114
+ % These printing devices, commonly known as ``write-white'' laser
115
+ % printers, tend to make characters too light. To get around this
116
+ % problem, a darker set of fonts must be created for these devices.
117
+ %
118
+
119
+ \newcommand{\Thanks}[1]{\thanks{\ #1}}
120
+
121
+ % A4 modified by Eneko; again modified by Alexander for 5cm titlebox
122
+ \setlength{\paperwidth}{21cm} % A4
123
+ \setlength{\paperheight}{29.7cm}% A4
124
+ \setlength\topmargin{-0.5cm}
125
+ \setlength\oddsidemargin{0cm}
126
+ \setlength\textheight{24.7cm}
127
+ \setlength\textwidth{16.0cm}
128
+ \setlength\columnsep{0.6cm}
129
+ \newlength\titlebox
130
+ \setlength\titlebox{5cm}
131
+ \setlength\headheight{5pt}
132
+ \setlength\headsep{0pt}
133
+ \thispagestyle{empty}
134
+ \pagestyle{empty}
135
+
136
+
137
+ \flushbottom \twocolumn \sloppy
138
+
139
+ % We're never going to need a table of contents, so just flush it to
140
+ % save space --- suggested by drstrip@sandia-2
141
+ \def\addcontentsline#1#2#3{}
142
+
143
+ \newif\ifaclfinal
144
+ \aclfinalfalse
145
+ \def\aclfinalcopy{\global\aclfinaltrue}
146
+
147
+ %% ----- Set up hooks to repeat content on every page of the output doc,
148
+ %% necessary for the line numbers in the submitted version. --MM
149
+ %%
150
+ %% Copied from CVPR 2015's cvpr_eso.sty, which appears to be largely copied from everyshi.sty.
151
+ %%
152
+ %% Original cvpr_eso.sty available at: http://www.pamitc.org/cvpr15/author_guidelines.php
153
+ %% Original evershi.sty available at: https://www.ctan.org/pkg/everyshi
154
+ %%
155
+ %% Copyright (C) 2001 Martin Schr\"oder:
156
+ %%
157
+ %% Martin Schr"oder
158
+ %% Cr"usemannallee 3
159
+ %% D-28213 Bremen
160
+ %% Martin.Schroeder@ACM.org
161
+ %%
162
+ %% This program may be redistributed and/or modified under the terms
163
+ %% of the LaTeX Project Public License, either version 1.0 of this
164
+ %% license, or (at your option) any later version.
165
+ %% The latest version of this license is in
166
+ %% CTAN:macros/latex/base/lppl.txt.
167
+ %%
168
+ %% Happy users are requested to send [Martin] a postcard. :-)
169
+ %%
170
+ \newcommand{\@EveryShipoutACL@Hook}{}
171
+ \newcommand{\@EveryShipoutACL@AtNextHook}{}
172
+ \newcommand*{\EveryShipoutACL}[1]
173
+ {\g@addto@macro\@EveryShipoutACL@Hook{#1}}
174
+ \newcommand*{\AtNextShipoutACL@}[1]
175
+ {\g@addto@macro\@EveryShipoutACL@AtNextHook{#1}}
176
+ \newcommand{\@EveryShipoutACL@Shipout}{%
177
+ \afterassignment\@EveryShipoutACL@Test
178
+ \global\setbox\@cclv= %
179
+ }
180
+ \newcommand{\@EveryShipoutACL@Test}{%
181
+ \ifvoid\@cclv\relax
182
+ \aftergroup\@EveryShipoutACL@Output
183
+ \else
184
+ \@EveryShipoutACL@Output
185
+ \fi%
186
+ }
187
+ \newcommand{\@EveryShipoutACL@Output}{%
188
+ \@EveryShipoutACL@Hook%
189
+ \@EveryShipoutACL@AtNextHook%
190
+ \gdef\@EveryShipoutACL@AtNextHook{}%
191
+ \@EveryShipoutACL@Org@Shipout\box\@cclv%
192
+ }
193
+ \newcommand{\@EveryShipoutACL@Org@Shipout}{}
194
+ \newcommand*{\@EveryShipoutACL@Init}{%
195
+ \message{ABD: EveryShipout initializing macros}%
196
+ \let\@EveryShipoutACL@Org@Shipout\shipout
197
+ \let\shipout\@EveryShipoutACL@Shipout
198
+ }
199
+ \AtBeginDocument{\@EveryShipoutACL@Init}
200
+
201
+ %% ----- Set up for placing additional items into the submitted version --MM
202
+ %%
203
+ %% Based on eso-pic.sty
204
+ %%
205
+ %% Original available at: https://www.ctan.org/tex-archive/macros/latex/contrib/eso-pic
206
+ %% Copyright (C) 1998-2002 by Rolf Niepraschk <niepraschk@ptb.de>
207
+ %%
208
+ %% Which may be distributed and/or modified under the conditions of
209
+ %% the LaTeX Project Public License, either version 1.2 of this license
210
+ %% or (at your option) any later version. The latest version of this
211
+ %% license is in:
212
+ %%
213
+ %% http://www.latex-project.org/lppl.txt
214
+ %%
215
+ %% and version 1.2 or later is part of all distributions of LaTeX version
216
+ %% 1999/12/01 or later.
217
+ %%
218
+ %% In contrast to the original, we do not include the definitions for/using:
219
+ %% gridpicture, div[2], isMEMOIR[1], gridSetup[6][], subgridstyle{dotted}, labelfactor{}, gap{}, gridunitname{}, gridunit{}, gridlines{\thinlines}, subgridlines{\thinlines}, the {keyval} package, evenside margin, nor any definitions with 'color'.
220
+ %%
221
+ %% These are beyond what is needed for the NAACL style.
222
+ %%
223
+ \newcommand\LenToUnit[1]{#1\@gobble}
224
+ \newcommand\AtPageUpperLeft[1]{%
225
+ \begingroup
226
+ \@tempdima=0pt\relax\@tempdimb=\ESO@yoffsetI\relax
227
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
228
+ \endgroup
229
+ }
230
+ \newcommand\AtPageLowerLeft[1]{\AtPageUpperLeft{%
231
+ \put(0,\LenToUnit{-\paperheight}){#1}}}
232
+ \newcommand\AtPageCenter[1]{\AtPageUpperLeft{%
233
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.5\paperheight}){#1}}}
234
+ \newcommand\AtPageLowerCenter[1]{\AtPageUpperLeft{%
235
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-\paperheight}){#1}}}%
236
+ \newcommand\AtPageLowishCenter[1]{\AtPageUpperLeft{%
237
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.96\paperheight}){#1}}}
238
+ \newcommand\AtTextUpperLeft[1]{%
239
+ \begingroup
240
+ \setlength\@tempdima{1in}%
241
+ \advance\@tempdima\oddsidemargin%
242
+ \@tempdimb=\ESO@yoffsetI\relax\advance\@tempdimb-1in\relax%
243
+ \advance\@tempdimb-\topmargin%
244
+ \advance\@tempdimb-\headheight\advance\@tempdimb-\headsep%
245
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
246
+ \endgroup
247
+ }
248
+ \newcommand\AtTextLowerLeft[1]{\AtTextUpperLeft{%
249
+ \put(0,\LenToUnit{-\textheight}){#1}}}
250
+ \newcommand\AtTextCenter[1]{\AtTextUpperLeft{%
251
+ \put(\LenToUnit{.5\textwidth},\LenToUnit{-.5\textheight}){#1}}}
252
+ \newcommand{\ESO@HookI}{} \newcommand{\ESO@HookII}{}
253
+ \newcommand{\ESO@HookIII}{}
254
+ \newcommand{\AddToShipoutPicture}{%
255
+ \@ifstar{\g@addto@macro\ESO@HookII}{\g@addto@macro\ESO@HookI}}
256
+ \newcommand{\ClearShipoutPicture}{\global\let\ESO@HookI\@empty}
257
+ \newcommand{\@ShipoutPicture}{%
258
+ \bgroup
259
+ \@tempswafalse%
260
+ \ifx\ESO@HookI\@empty\else\@tempswatrue\fi%
261
+ \ifx\ESO@HookII\@empty\else\@tempswatrue\fi%
262
+ \ifx\ESO@HookIII\@empty\else\@tempswatrue\fi%
263
+ \if@tempswa%
264
+ \@tempdima=1in\@tempdimb=-\@tempdima%
265
+ \advance\@tempdimb\ESO@yoffsetI%
266
+ \unitlength=1pt%
267
+ \global\setbox\@cclv\vbox{%
268
+ \vbox{\let\protect\relax
269
+ \pictur@(0,0)(\strip@pt\@tempdima,\strip@pt\@tempdimb)%
270
+ \ESO@HookIII\ESO@HookI\ESO@HookII%
271
+ \global\let\ESO@HookII\@empty%
272
+ \endpicture}%
273
+ \nointerlineskip%
274
+ \box\@cclv}%
275
+ \fi
276
+ \egroup
277
+ }
278
+ \EveryShipoutACL{\@ShipoutPicture}
279
+ \newif\ifESO@dvips\ESO@dvipsfalse
280
+ \newif\ifESO@grid\ESO@gridfalse
281
+ \newif\ifESO@texcoord\ESO@texcoordfalse
282
+ \newcommand*\ESO@griddelta{}\newcommand*\ESO@griddeltaY{}
283
+ \newcommand*\ESO@gridDelta{}\newcommand*\ESO@gridDeltaY{}
284
+ \newcommand*\ESO@yoffsetI{}\newcommand*\ESO@yoffsetII{}
285
+ \ifESO@texcoord
286
+ \def\ESO@yoffsetI{0pt}\def\ESO@yoffsetII{-\paperheight}
287
+ \edef\ESO@griddeltaY{-\ESO@griddelta}\edef\ESO@gridDeltaY{-\ESO@gridDelta}
288
+ \else
289
+ \def\ESO@yoffsetI{\paperheight}\def\ESO@yoffsetII{0pt}
290
+ \edef\ESO@griddeltaY{\ESO@griddelta}\edef\ESO@gridDeltaY{\ESO@gridDelta}
291
+ \fi
292
+
293
+
294
+ %% ----- Submitted version markup: Page numbers, ruler, and confidentiality. Using ideas/code from cvpr.sty 2015. --MM
295
+
296
+ \font\naaclhv = phvb at 8pt
297
+
298
+ %% Define vruler %%
299
+
300
+ %\makeatletter
301
+ \newbox\aclrulerbox
302
+ \newcount\aclrulercount
303
+ \newdimen\aclruleroffset
304
+ \newdimen\cv@lineheight
305
+ \newdimen\cv@boxheight
306
+ \newbox\cv@tmpbox
307
+ \newcount\cv@refno
308
+ \newcount\cv@tot
309
+ % NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
310
+ \newcount\cv@tmpc@ \newcount\cv@tmpc
311
+ \def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
312
+ \cv@tmpc=1 %
313
+ \loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
314
+ \ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
315
+ \ifnum#2<0\advance\cv@tmpc1\relax-\fi
316
+ \loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
317
+ \cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
318
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
319
+ \def\makevruler[#1][#2][#3][#4][#5]{\begingroup\offinterlineskip
320
+ \textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
321
+ \global\setbox\aclrulerbox=\vbox to \textheight{%
322
+ {\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
323
+ \color{gray}
324
+ \cv@lineheight=#1\global\aclrulercount=#2%
325
+ \cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
326
+ \cv@refno1\vskip-\cv@lineheight\vskip1ex%
327
+ \loop\setbox\cv@tmpbox=\hbox to0cm{{\naaclhv\hfil\fillzeros[#4]\aclrulercount}}%
328
+ \ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
329
+ \advance\cv@refno1\global\advance\aclrulercount#3\relax
330
+ \ifnum\cv@refno<\cv@tot\repeat}}\endgroup}%
331
+ %\makeatother
332
+
333
+
334
+ \def\aclpaperid{***}
335
+ \def\confidential{NAACL-HLT 2018 Submission~\aclpaperid. Confidential Review Copy. DO NOT DISTRIBUTE.}
336
+
337
+ %% Page numbering, Vruler and Confidentiality %%
338
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
339
+ \def\aclruler#1{\makevruler[14.17pt][#1][1][3][\textheight]\usebox{\aclrulerbox}}
340
+ \def\leftoffset{-2.1cm} %original: -45pt
341
+ \def\rightoffset{17.5cm} %original: 500pt
342
+ \ifaclfinal\else\pagenumbering{arabic}
343
+ \AddToShipoutPicture{%
344
+ \ifaclfinal\else
345
+ \AtPageLowishCenter{\thepage}
346
+ \aclruleroffset=\textheight
347
+ \advance\aclruleroffset4pt
348
+ \AtTextUpperLeft{%
349
+ \put(\LenToUnit{\leftoffset},\LenToUnit{-\aclruleroffset}){%left ruler
350
+ \aclruler{\aclrulercount}}
351
+ \put(\LenToUnit{\rightoffset},\LenToUnit{-\aclruleroffset}){%right ruler
352
+ \aclruler{\aclrulercount}}
353
+ }
354
+ \AtTextUpperLeft{%confidential
355
+ \put(0,\LenToUnit{1cm}){\parbox{\textwidth}{\centering\naaclhv\confidential}}
356
+ }
357
+ \fi
358
+ }
359
+
360
+ %%%% ----- End settings for placing additional items into the submitted version --MM ----- %%%%
361
+
362
+ %%%% ----- Begin settings for both submitted and camera-ready version ----- %%%%
363
+
364
+ %% Title and Authors %%
365
+
366
+ \newcommand\outauthor{
367
+ \begin{tabular}[t]{c}
368
+ \ifaclfinal
369
+ \bf\@author
370
+ \else
371
+ % Avoiding common accidental de-anonymization issue. --MM
372
+ \bf Anonymous ACL submission
373
+ \fi
374
+ \end{tabular}}
375
+
376
+ % Changing the expanded titlebox for submissions to 2.5 in (rather than 6.5cm)
377
+ % and moving it to the style sheet, rather than within the example tex file. --MM
378
+ \ifaclfinal
379
+ \else
380
+ \addtolength\titlebox{.25in}
381
+ \fi
382
+ % Mostly taken from deproc.
383
+ \def\maketitle{\par
384
+ \begingroup
385
+ \def\thefootnote{\fnsymbol{footnote}}
386
+ \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}}
387
+ \twocolumn[\@maketitle] \@thanks
388
+ \endgroup
389
+ \setcounter{footnote}{0}
390
+ \let\maketitle\relax \let\@maketitle\relax
391
+ \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax}
392
+ \def\@maketitle{\vbox to \titlebox{\hsize\textwidth
393
+ \linewidth\hsize \vskip 0.125in minus 0.125in \centering
394
+ {\Large\bf \@title \par} \vskip 0.2in plus 1fil minus 0.1in
395
+ {\def\and{\unskip\enspace{\rm and}\enspace}%
396
+ \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil
397
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}%
398
+ \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup
399
+ \vskip 0.25in plus 1fil minus 0.125in
400
+ \hbox to \linewidth\bgroup\large \hfil\hfil
401
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}
402
+ \hbox to \linewidth\bgroup\large \hfil\hfil
403
+ \hbox to 0pt\bgroup\hss
404
+ \outauthor
405
+ \hss\egroup
406
+ \hfil\hfil\egroup}
407
+ \vskip 0.3in plus 2fil minus 0.1in
408
+ }}
409
+
410
+ % margins and font size for abstract
411
+ \renewenvironment{abstract}%
412
+ {\centerline{\large\bf Abstract}%
413
+ \begin{list}{}%
414
+ {\setlength{\rightmargin}{0.6cm}%
415
+ \setlength{\leftmargin}{0.6cm}}%
416
+ \item[]\ignorespaces%
417
+ \@setsize\normalsize{12pt}\xpt\@xpt
418
+ }%
419
+ {\unskip\end{list}}
420
+
421
+ %\renewenvironment{abstract}{\centerline{\large\bf
422
+ % Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex}
423
+
424
+ % Resizing figure and table captions
425
+ \newcommand{\figcapfont}{\rm}
426
+ \newcommand{\tabcapfont}{\rm}
427
+ \renewcommand{\fnum@figure}{\figcapfont Figure \thefigure}
428
+ \renewcommand{\fnum@table}{\tabcapfont Table \thetable}
429
+ \renewcommand{\figcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
430
+ \renewcommand{\tabcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
431
+
432
+ \RequirePackage{natbib}
433
+ % for citation commands in the .tex, authors can use:
434
+ % \citep, \citet, and \citeyearpar for compatibility with natbib, or
435
+ % \cite, \newcite, and \shortcite for compatibility with older ACL .sty files
436
+ \renewcommand\cite{\citep} % to get "(Author Year)" with natbib
437
+ \newcommand\shortcite{\citeyearpar}% to get "(Year)" with natbib
438
+ \newcommand\newcite{\citet} % to get "Author (Year)" with natbib
439
+
440
+
441
+ % bibliography
442
+
443
+ \def\@up#1{\raise.2ex\hbox{#1}}
444
+
445
+ % Don't put a label in the bibliography at all. Just use the unlabeled format
446
+ % instead.
447
+ \def\thebibliography#1{\vskip\parskip%
448
+ \vskip\baselineskip%
449
+ \def\baselinestretch{1}%
450
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
451
+ \vskip-\parskip%
452
+ \vskip-\baselineskip%
453
+ \section*{References\@mkboth
454
+ {References}{References}}\list
455
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
456
+ \setlength{\itemindent}{-\parindent}}
457
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
458
+ \sloppy\clubpenalty4000\widowpenalty4000
459
+ \sfcode`\.=1000\relax}
460
+ \let\endthebibliography=\endlist
461
+
462
+
463
+ % Allow for a bibliography of sources of attested examples
464
+ \def\thesourcebibliography#1{\vskip\parskip%
465
+ \vskip\baselineskip%
466
+ \def\baselinestretch{1}%
467
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
468
+ \vskip-\parskip%
469
+ \vskip-\baselineskip%
470
+ \section*{Sources of Attested Examples\@mkboth
471
+ {Sources of Attested Examples}{Sources of Attested Examples}}\list
472
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
473
+ \setlength{\itemindent}{-\parindent}}
474
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
475
+ \sloppy\clubpenalty4000\widowpenalty4000
476
+ \sfcode`\.=1000\relax}
477
+ \let\endthesourcebibliography=\endlist
478
+
479
+ % sections with less space
480
+ \def\section{\@startsection {section}{1}{\z@}{-2.0ex plus
481
+ -0.5ex minus -.2ex}{1.5ex plus 0.3ex minus .2ex}{\large\bf\raggedright}}
482
+ \def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus
483
+ -0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\bf\raggedright}}
484
+ %% changed by KO to - values to get the initial parindent right
485
+ \def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex plus
486
+ -0.5ex minus -.2ex}{0.5ex plus .2ex}{\normalsize\bf\raggedright}}
487
+ \def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
488
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
489
+ \def\subparagraph{\@startsection{subparagraph}{5}{\parindent}{1.5ex plus
490
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
491
+
492
+ % Footnotes
493
+ \footnotesep 6.65pt %
494
+ \skip\footins 9pt plus 4pt minus 2pt
495
+ \def\footnoterule{\kern-3pt \hrule width 5pc \kern 2.6pt }
496
+ \setcounter{footnote}{0}
497
+
498
+ % Lists and paragraphs
499
+ \parindent 1em
500
+ \topsep 4pt plus 1pt minus 2pt
501
+ \partopsep 1pt plus 0.5pt minus 0.5pt
502
+ \itemsep 2pt plus 1pt minus 0.5pt
503
+ \parsep 2pt plus 1pt minus 0.5pt
504
+
505
+ \leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
506
+ \leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em \leftmarginvi .5em
507
+ \labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
508
+
509
+ \def\@listi{\leftmargin\leftmargini}
510
+ \def\@listii{\leftmargin\leftmarginii
511
+ \labelwidth\leftmarginii\advance\labelwidth-\labelsep
512
+ \topsep 2pt plus 1pt minus 0.5pt
513
+ \parsep 1pt plus 0.5pt minus 0.5pt
514
+ \itemsep \parsep}
515
+ \def\@listiii{\leftmargin\leftmarginiii
516
+ \labelwidth\leftmarginiii\advance\labelwidth-\labelsep
517
+ \topsep 1pt plus 0.5pt minus 0.5pt
518
+ \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
519
+ \itemsep \topsep}
520
+ \def\@listiv{\leftmargin\leftmarginiv
521
+ \labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
522
+ \def\@listv{\leftmargin\leftmarginv
523
+ \labelwidth\leftmarginv\advance\labelwidth-\labelsep}
524
+ \def\@listvi{\leftmargin\leftmarginvi
525
+ \labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
526
+
527
+ \abovedisplayskip 7pt plus2pt minus5pt%
528
+ \belowdisplayskip \abovedisplayskip
529
+ \abovedisplayshortskip 0pt plus3pt%
530
+ \belowdisplayshortskip 4pt plus3pt minus3pt%
531
+
532
+ % Less leading in most fonts (due to the narrow columns)
533
+ % The choices were between 1-pt and 1.5-pt leading
534
+ \def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
535
+ \def\small{\@setsize\small{10pt}\ixpt\@ixpt}
536
+ \def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
537
+ \def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
538
+ \def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
539
+ \def\large{\@setsize\large{14pt}\xiipt\@xiipt}
540
+ \def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
541
+ \def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
542
+ \def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
543
+ \def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
references/2020.emnlp.nguyen/paper.md ADDED
@@ -0,0 +1,123 @@
1
+ ---
2
+ title: "PhoBERT: Pre-trained language models for Vietnamese"
3
+ authors:
4
+ - "Dat Quoc Nguyen"
5
+ - "Anh Tuan Nguyen"
6
+ year: 2020
7
+ venue: "EMNLP Findings 2020"
8
+ url: "https://aclanthology.org/2020.findings-emnlp.92/"
9
+ ---
10
+
11
+ We present **PhoBERT** with two versions, PhoBERT-base and PhoBERT-large, the *first* public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R [conneau2019unsupervised] and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: https://github.com/VinAIResearch/PhoBERT.
12
+ # Introduction
13
+ Pre-trained language models, especially BERT [devlin-etal-2019-bert]---the Bidirectional Encoder Representations from Transformers [NIPS2017_7181], have recently become extremely popular and helped to produce significant improvement gains for various NLP tasks. The success of pre-trained BERT and its variants has largely been limited to the English language. For other languages, one could retrain a language-specific model using the BERT architecture [abs-1906-08101,vries2019bertje,vu-xuan-etal-2019-etnlp,2019arXiv191103894M] or employ existing pre-trained multilingual BERT-based models [devlin-etal-2019-bert,NIPS2019_8928,conneau2019unsupervised].
14
+ In terms of Vietnamese language modeling, to the best of our knowledge, there are two main concerns as follows:
17
+ - The Vietnamese Wikipedia corpus is the only data used to train monolingual language models [vu-xuan-etal-2019-etnlp], and it is also the only Vietnamese dataset included in the pre-training data used by all multilingual language models except XLM-R. It is worth noting that Wikipedia data is not representative of general language use, and the Vietnamese Wikipedia data is relatively small (1GB uncompressed), while pre-trained language models can be significantly improved by using more pre-training data [RoBERTa].
18
+ - All publicly released monolingual and multilingual BERT-based language models are not aware of the difference between Vietnamese syllables and word tokens. This ambiguity comes from the fact that white space is also used to separate syllables that constitute words when written in Vietnamese: prior work shows that about 85% of Vietnamese word types are composed of at least two syllables.
19
+ For example, the 6-syllable written text "Tôi là một nghiên cứu viên" (I am a researcher) forms 4 words "Tôi là một nghiên\_cứu\_viên".
20
+ Without doing a pre-processing step of Vietnamese word segmentation, those models directly apply Byte-Pair encoding (BPE) methods [sennrich-etal-2016-neural,kudo-richardson-2018-sentencepiece] to the syllable-level Vietnamese pre-training data. (Note that [vu-xuan-etal-2019-etnlp] do not publicly release any pre-trained BERT-based language model (https://github.com/vietnlp/etnlp); in particular, they release a set of 15K BERT-based word embeddings specialized only for the Vietnamese NER task.)
21
+ Intuitively, for word-level Vietnamese NLP tasks, those models pre-trained on syllable-level data might not perform as well as language models pre-trained on word-level data.
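To make the syllable-vs-word distinction concrete, here is a minimal sketch. It is illustrative only: the underscore-joined form simply mirrors the paper's example, whereas a real pipeline needs a trained Vietnamese word segmenter to produce it.

```python
# Syllable-level text: white space separates syllables, not words.
syllable_text = "Tôi là một nghiên cứu viên"

# Word-level text: the syllables of a multi-syllable word are joined
# with underscores, mirroring the paper's example ("I am a researcher").
word_text = "Tôi là một nghiên_cứu_viên"

syllables = syllable_text.split()
words = word_text.split()
print(len(syllables), len(words))  # 6 syllables, but only 4 word tokens
```

A syllable-level tokenizer therefore sees six whitespace-separated units where a word-level one sees four, which is exactly why BPE applied to unsegmented text cannot recover Vietnamese word boundaries.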
22
+ To handle the two concerns above, we train the *first* large-scale monolingual BERT-based "base" and "large" models using a 20GB *word-level* Vietnamese corpus.
23
+ We evaluate our models on four downstream Vietnamese NLP tasks: the common word-level ones of Part-of-speech (POS) tagging, Dependency parsing and Named-entity recognition (NER), and a language understanding task of Natural language inference (NLI) which can be formulated as either a syllable- or word-level task. Experimental results show that our models obtain state-of-the-art (SOTA) results on all these tasks.
24
+ Our contributions are summarized as follows:
25
27
+ - We present the *first* large-scale monolingual language models pre-trained for Vietnamese.
28
+ - Our models help produce SOTA performances on four downstream tasks of POS tagging, Dependency parsing, NER and NLI, thus showing the effectiveness of large-scale BERT-based monolingual language models for Vietnamese.
29
+ - To the best of our knowledge, we also perform the *first* set of experiments to compare monolingual language models with the recent best multilingual model XLM-R in multiple (i.e. four) different language-specific tasks. The experiments show that our models outperform XLM-R on all these tasks, thus convincingly confirming that dedicated language-specific models still outperform multilingual ones.
30
+ - We publicly release our models under the name PhoBERT which can be used with `fairseq` [ott2019fairseq] and `transformers` [Wolf2019HuggingFacesTS]. We hope that PhoBERT can serve as a strong baseline for future Vietnamese NLP research and applications.
31
+ # PhoBERT
32
+ This section outlines the architecture and describes the pre-training data and optimization setup that we use for PhoBERT.
33
+ **Architecture:**\ Our PhoBERT has two versions, PhoBERT$_{base}$ and PhoBERT$_{large}$, using the same architectures as BERT$_{base}$ and BERT$_{large}$, respectively. The PhoBERT pre-training approach is based on RoBERTa [RoBERTa], which optimizes the BERT pre-training procedure for more robust performance.
34
+ **Pre-training data:**\ To handle the first concern mentioned in Section [sec:intro], we use a 20GB pre-training dataset of uncompressed texts. This dataset is a concatenation of two corpora: (i) the first one is the Vietnamese Wikipedia corpus ($\sim$1GB), and (ii) the second corpus ($\sim$19GB) is generated by removing similar articles and duplication from a 50GB Vietnamese news corpus (https://github.com/binhvq/news-corpus, crawled from a wide range of news websites and topics). To solve the second concern,
35
+ we employ RDRSegmenter [nguyen-etal-2018-fast] from VnCoreNLP [vu-etal-2018-vncorenlp] to perform word and sentence segmentation on the pre-training dataset, resulting in $\sim$145M word-segmented sentences ($\sim$3B word tokens). Different from RoBERTa, we then apply `fastBPE` [sennrich-etal-2016-neural] to segment these sentences with subword units, using a vocabulary of 64K subword types. On average there are 24.4 subword tokens per sentence.
36
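The deduplication step mentioned above can be sketched as a normalized-hash filter that keeps the first occurrence of each article. The paper does not specify its exact similarity criterion, so treat this purely as an illustration of the filtering stage, not the authors' method.

```python
# Minimal sketch of corpus deduplication, assuming a simple approach:
# hash each article after whitespace/case normalization and keep the first
# occurrence. (Assumption: the real pipeline may use a fuzzier similarity
# test to catch "similar articles" as well as exact duplicates.)
import hashlib

def dedup(articles):
    seen, kept = set(), []
    for text in articles:
        key = hashlib.sha1(" ".join(text.lower().split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept

corpus = ["Tin tức hôm nay ...", "Tin  tức hôm nay ...", "Một bài báo khác"]
print(len(dedup(corpus)))  # the near-identical first two collapse into one
```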
+ **Optimization:**\ We employ the RoBERTa implementation in `fairseq` [ott2019fairseq]. We set a maximum length at 256 subword tokens, thus generating 145M $\times$ 24.4 / 256 $\approx$ 13.8M sentence blocks. Following [RoBERTa], we optimize the models using Adam [KingmaB14]. We use a batch size of 1024 across 4 V100 GPUs (16GB each) and a peak learning rate of 0.0004 for PhoBERT$_{base}$, and a batch size of 512 and a peak learning rate of 0.0002 for PhoBERT$_{large}$. We run for 40 epochs (here, the learning rate is warmed up for 2 epochs), thus resulting in 13.8M $\times$ 40 / 1024 $\approx$ 540K training steps for PhoBERT$_{base}$ and 1.08M training steps for PhoBERT$_{large}$. We pre-train PhoBERT$_{base}$ during 3 weeks, and then PhoBERT$_{large}$ during 5 weeks.
37
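The sentence-block and training-step counts above are simple arithmetic from the stated corpus and batch sizes; a quick sanity check:

```python
# Reproduce the back-of-envelope counts from the optimization setup above.
sentences = 145e6            # ~145M word-segmented sentences
subwords_per_sentence = 24.4
max_len = 256                # subword tokens per sentence block

blocks = sentences * subwords_per_sentence / max_len
print(f"{blocks / 1e6:.1f}M sentence blocks")            # ~13.8M

epochs = 40
steps_base = blocks * epochs / 1024   # PhoBERT_base, batch size 1024
steps_large = blocks * epochs / 512   # PhoBERT_large, batch size 512
print(f"{steps_base / 1e3:.0f}K steps (base)")           # ~540K
print(f"{steps_large / 1e6:.2f}M steps (large)")         # ~1.08M
```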
+ | **Task** | **\#training** | **\#valid** | **\#test** |
+ |---|---|---|---|
+ | POS tagging$^\dagger$ | 27,000 | 870 | 2,120 |
+ | Dep. parsing$^\dagger$ | 8,977 | 200 | 1,020 |
+ | NER$^\dagger$ | 14,861 | 2,000 | 2,831 |
+ | NLI$^\ddagger$ | 392,702 | 2,490 | 5,010 |
+ 
+ Table: Statistics of the downstream task datasets. ``\#training'', ``\#valid'' and ``\#test'' denote the size of the training, validation and test sets, respectively. $\dagger$ and $\ddagger$ refer to the dataset size as the numbers of sentences and sentence pairs, respectively.
47
+ | **POS tagging** (word-level) | Acc. | **Dependency parsing** (word-level) | LAS / UAS |
+ |---|---|---|---|
+ | RDRPOSTagger [nguyen-etal-2014-rdrpostagger] [$\clubsuit$] | 95.1 | \_ | \_ |
+ | BiLSTM-CNN-CRF [ma-hovy-2016-end] [$\clubsuit$] | 95.4 | VnCoreNLP-DEP [vu-etal-2018-vncorenlp] [$\bigstar$] | 71.38 / 77.35 |
+ | VnCoreNLP-POS [nguyen-etal-2017-word] [$\clubsuit$] | 95.9 | jPTDP-v2 [$\bigstar$] | 73.12 / 79.63 |
+ | jPTDP-v2 [nguyen-verspoor-2018-improved] [$\bigstar$] | 95.7 | jointWPD [$\bigstar$] | 73.90 / 80.12 |
+ | jointWPD [nguyen-2019-neural] [$\bigstar$] | 96.0 | Biaffine [DozatM17] [$\bigstar$] | 74.99 / 81.19 |
+ | XLM-R$_{base}$ (our result) | 96.2 | Biaffine w/ XLM-R$_{base}$ (our result) | 76.46 / 83.10 |
+ | XLM-R$_{large}$ (our result) | 96.3 | Biaffine w/ XLM-R$_{large}$ (our result) | 75.87 / 82.70 |
+ | PhoBERT$_{base}$ | 96.7 | Biaffine w/ PhoBERT$_{base}$ | **78.77** / **85.22** |
+ | PhoBERT$_{large}$ | **96.8** | Biaffine w/ PhoBERT$_{large}$ | 77.85 / 84.32 |
+ 
+ Table: Performance scores (in \%) on the POS tagging and Dependency parsing test sets. ``Acc.'', ``LAS'' and ``UAS'' abbreviate the Accuracy, the Labeled Attachment Score and the Unlabeled Attachment Score, respectively (here, all these evaluation metrics are computed on all word tokens, including punctuation). [$\clubsuit$] and [$\bigstar$] denote results reported by [nguyen-etal-2017-word] and [nguyen-2019-neural], respectively.
68
+ # Experimental setup
69
+ We evaluate the performance of PhoBERT on four downstream Vietnamese NLP tasks: POS tagging, Dependency parsing, NER and NLI.
70
+ ### Downstream task datasets
71
+ Table [tab:data] presents the statistics of the experimental datasets that we employ for downstream task evaluation.
72
+ For POS tagging, Dependency parsing and NER, we follow the VnCoreNLP setup [vu-etal-2018-vncorenlp], using standard benchmarks of the VLSP 2013 POS tagging dataset (https://vlsp.org.vn/vlsp2013/eval), the VnDT dependency treebank v1.1 [Nguyen2014NLDB] with POS tags predicted by VnCoreNLP, and the VLSP 2016 NER dataset [JCC13161].
73
+ For NLI, we use the manually-constructed Vietnamese validation and test sets from the cross-lingual NLI (XNLI) corpus v1.0 [conneau-etal-2018-xnli] where the Vietnamese training set is released as a machine-translated version of the corresponding English training set [N18-1101].
74
+ Unlike the POS tagging, Dependency parsing and NER datasets which provide the gold word segmentation, for NLI, we employ RDRSegmenter to segment the text into words before applying BPE to produce subwords from word tokens.
75
+ ### Fine-tuning
76
+ Following [devlin-etal-2019-bert], for POS tagging and NER, we append a linear prediction layer on top of the PhoBERT architecture (i.e. to the last Transformer layer of PhoBERT) w.r.t. the first subword of each word token. In our preliminary experiments, using the average of the contextualized embeddings of a word's subword tokens to represent the word produces slightly lower performance than using the contextualized embedding of the first subword.
77
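The first-subword alignment described above can be sketched as follows. The tokenizer here is a stand-in that splits on a fixed table, and the subword pieces and POS tags are illustrative; a real run would use PhoBERT's BPE tokenizer.

```python
# Sketch of aligning word-level tags to the first subword of each word,
# as in the tagging setup above. `subwords_of` is a hypothetical stand-in
# for a real BPE tokenizer; the pieces shown are not PhoBERT's actual output.
def subwords_of(word):
    table = {"nghiên_cứu_viên": ["nghiên_@@", "cứu_@@", "viên"]}
    return table.get(word, [word])

def first_subword_positions(words):
    positions, subwords = [], []
    for word in words:
        positions.append(len(subwords))  # index of this word's first subword
        subwords.extend(subwords_of(word))
    return subwords, positions

words = ["Tôi", "là", "một", "nghiên_cứu_viên"]
tags = ["P", "V", "M", "N"]  # illustrative word-level POS tags
subwords, positions = first_subword_positions(words)

# The linear prediction layer is applied only at these subword positions,
# so the word-level tag sequence is read off from them.
print(list(zip(positions, tags)))  # [(0, 'P'), (1, 'V'), (2, 'M'), (3, 'N')]
```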
+ For dependency parsing, following [nguyen-2019-neural], we employ a reimplementation of the state-of-the-art Biaffine dependency parser [DozatM17] from [ma-etal-2018-stack] with default optimal hyper-parameters.
78
+ We then extend this parser by replacing the pre-trained word embedding of each word in an input sentence by the corresponding contextualized embedding (from the last layer) computed for the first subword token of the word.
79
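The biaffine arc scoring at the heart of that parser can be sketched in a few lines of numpy. Dimensions and the random weights below are illustrative only; in the setup above, the input vectors would be PhoBERT's last-layer embeddings of each word's first subword, passed through the parser's own projection layers.

```python
# Minimal numpy sketch of biaffine arc scoring (Dozat & Manning, 2017):
# score(dep i, head j) = h_dep[i]^T U h_head[j] + u^T h_head[j].
# Toy sizes and random weights; not the parser's trained parameters.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                            # 5 words, toy hidden size 8
H_dep = rng.standard_normal((n, d))    # dependent representations
H_head = rng.standard_normal((n, d))   # head representations
U = rng.standard_normal((d, d))        # bilinear weight matrix
u = rng.standard_normal(d)             # head-only bias term

# arc_scores[i, j] = score of word j being the head of word i
arc_scores = H_dep @ U @ H_head.T + H_head @ u
pred_heads = arc_scores.argmax(axis=1)  # greedy head choice per word
print(arc_scores.shape, pred_heads)
```

A full parser would add label scoring and a tree-constrained decoder on top of these raw arc scores.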
+ For POS tagging, NER and NLI, we employ `transformers` [Wolf2019HuggingFacesTS] to fine-tune PhoBERT for each task and each dataset independently. We use AdamW [loshchilov2018decoupled] with a fixed learning rate of 1e-5 and a batch size of 32 [RoBERTa]. We fine-tune for 30 training epochs, evaluate the task performance after each epoch on the validation set (here, early stopping is applied when there is no improvement after 5 continuous epochs), and then select the best model checkpoint to report the final result on the test set (note that each of our scores is an average over 5 runs with different random seeds).
80
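The checkpoint-selection loop described above (30 epochs, patience 5, keep the best validation score) can be sketched as:

```python
# Sketch of the fine-tuning model selection described above: train up to 30
# epochs, track the validation score after each epoch, stop after 5 epochs
# with no improvement, and keep the best checkpoint. The scores below are
# pre-baked stand-ins for real validation runs.
def select_checkpoint(val_scores, max_epochs=30, patience=5):
    best_epoch, best_score, no_improve = None, float("-inf"), 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best_score:
            best_epoch, best_score, no_improve = epoch, score, 0
        else:
            no_improve += 1
            if no_improve >= patience:  # early stopping
                break
    return best_epoch, best_score

# Validation accuracies peaking at epoch 4, then plateauing.
scores = [70.1, 74.3, 77.8, 78.5, 78.2, 78.4, 78.1, 78.0, 77.9, 78.3]
print(select_checkpoint(scores))  # (4, 78.5); training stops at epoch 9
```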
+ | **NER** (word-level) | F$_1$ | **NLI** (syllable- or word-level) | Acc. |
+ |---|---|---|---|
+ | BiLSTM-CNN-CRF [$\blacklozenge$] | 88.3 | \_ | \_ |
+ | VnCoreNLP-NER [vu-etal-2018-vncorenlp] [$\blacklozenge$] | 88.6 | BiLSTM-max [conneau-etal-2018-xnli] | 66.4 |
+ | VNER [8713740] | 89.6 | mBiLSTM [ArtetxeS19] | 72.0 |
+ | BiLSTM-CNN-CRF + ETNLP [$\spadesuit$] | 91.1 | multilingual BERT [devlin-etal-2019-bert] [$\blacksquare$] | 69.5 |
+ | VnCoreNLP-NER + ETNLP [$\spadesuit$] | 91.3 | XLM$_{MLM+TLM}$ [NIPS2019_8928] | 76.6 |
+ | XLM-R$_{base}$ (our result) | 92.0 | XLM-R$_{base}$ [conneau2019unsupervised] | 75.4 |
+ | XLM-R$_{large}$ (our result) | 92.8 | XLM-R$_{large}$ [conneau2019unsupervised] | 79.7 |
+ | PhoBERT$_{base}$ | 93.6 | PhoBERT$_{base}$ | 78.5 |
+ | PhoBERT$_{large}$ | **94.7** | PhoBERT$_{large}$ | **80.0** |
+ 
+ Table: Performance scores (in \%) on the NER and NLI test sets. [$\blacklozenge$], [$\spadesuit$] and [$\blacksquare$] denote results reported by [vu-etal-2018-vncorenlp], [vu-xuan-etal-2019-etnlp] and [wu-dredze-2019-beto], respectively. Note that there are higher Vietnamese NLI results reported for XLM-R when fine-tuning on the concatenation of all 15 training datasets from the XNLI corpus (i.e. TRANSLATE-TRAIN-ALL: 79.5\% for XLM-R$_{base}$ and 83.4\% for XLM-R$_{large}$). However, those results might not be comparable as we only use the monolingual Vietnamese training data for fine-tuning.
102
+ # Experimental results
103
+ ### Main results
104
+ Tables [tab:posdep] and [tab:nernli] compare PhoBERT scores with the previous highest reported results, using the same experimental setup. It is clear that our PhoBERT helps produce new SOTA performance results for all four downstream tasks.
105
+ For POS tagging, the neural model jointWPD for joint POS tagging and dependency parsing [nguyen-2019-neural] and the feature-based model VnCoreNLP-POS [nguyen-etal-2017-word] are the two previous SOTA models, obtaining accuracies at about 96.0\%. PhoBERT obtains 0.8\% absolute higher accuracy than these two models.
106
+ For Dependency parsing, the previous highest parsing scores LAS and UAS are obtained by the Biaffine parser at 75.0\% and 81.2\%, respectively. PhoBERT helps boost the Biaffine parser with about 4\% absolute improvement, achieving a LAS at 78.8\% and a UAS at 85.2\%.
107
+ For NER, PhoBERT$_{large}$ produces 1.1 points higher F$_1$ than PhoBERT$_{base}$. In addition, PhoBERT$_{base}$ obtains 2+ points higher than the previous SOTA feature- and neural network-based models VnCoreNLP-NER [vu-etal-2018-vncorenlp] and BiLSTM-CNN-CRF [ma-hovy-2016-end], which are trained with the set of 15K BERT-based ETNLP word embeddings [vu-xuan-etal-2019-etnlp].
108
+ For NLI,
109
+ PhoBERT outperforms the multilingual BERT [devlin-etal-2019-bert] and the BERT-based cross-lingual model with a new translation language modeling objective XLM$_{MLM+TLM}$ [NIPS2019_8928] by large margins. PhoBERT also performs better than the recent best pre-trained multilingual model XLM-R, while using far fewer parameters than XLM-R: 135M (PhoBERT$_{base}$) vs. 250M (XLM-R$_{base}$); 370M (PhoBERT$_{large}$) vs. 560M (XLM-R$_{large}$).
110
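A back-of-envelope calculation shows where most of this parameter gap comes from: the embedding matrix scales with vocabulary size. The hidden size follows the standard "base" configuration, and the 64K and roughly 250K vocabulary sizes come from this paper and the XLM-R paper, respectively; the exact published totals also include the Transformer body and output projections.

```python
# Rough look at why PhoBERT is smaller than XLM-R at the same architecture
# size: embedding parameters grow linearly with the vocabulary. Figures are
# approximations, not the models' exact published parameter counts.
def embedding_params(vocab_size, hidden):
    return vocab_size * hidden

phobert_base_emb = embedding_params(64_000, 768)    # 64K subword vocabulary
xlmr_base_emb = embedding_params(250_000, 768)      # ~250K shared vocabulary

print(f"PhoBERT_base embeddings: {phobert_base_emb / 1e6:.1f}M")  # ~49M
print(f"XLM-R_base embeddings:   {xlmr_base_emb / 1e6:.1f}M")     # ~192M
# The Transformer body is the same size in both, so most of the
# 250M - 135M gap sits in the (tied) embedding matrices.
```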
+ ### Discussion
111
+ We find that PhoBERT$_{large}$ achieves 0.9\% lower dependency parsing scores than PhoBERT$_{base}$. One possible reason is that the last Transformer layer in the BERT architecture might not be the optimal one for encoding the richest information of syntactic structures [hewitt-manning-2019-structural,jawahar-etal-2019-bert]. Future work will study which of PhoBERT's Transformer layers contains richer syntactic information by evaluating the Vietnamese parsing performance from each layer.
112
+ Using more pre-training data can significantly improve the quality of the pre-trained language models [RoBERTa]. Thus it is not surprising that PhoBERT helps produce better performance than ETNLP on NER, and than the multilingual BERT and XLM$_{MLM+TLM}$ on NLI (here, PhoBERT uses 20GB of Vietnamese texts while those models employ the 1GB Vietnamese Wikipedia corpus).
113
+ Following the fine-tuning approach that we use for PhoBERT, we carefully fine-tune XLM-R for the remaining Vietnamese POS tagging, Dependency parsing and NER tasks (here, it is applied to the first sub-syllable token of the first syllable of each word). For fine-tuning XLM-R, we use a grid search on the validation set to select the AdamW learning rate from \{5e-6, 1e-5, 2e-5, 4e-5\} and the batch size from \{16, 32\}.
114
+ Tables [tab:posdep] and [tab:nernli] show that our PhoBERT also does better than XLM-R on these three word-level tasks.
115
+ It is worth noting that XLM-R uses a 2.5TB pre-training corpus which contains 137GB of Vietnamese texts (i.e. about 137 / 20 $\approx$ 7 times bigger than our pre-training corpus).
116
+ Recall that PhoBERT performs Vietnamese word segmentation to segment syllable-level sentences into word tokens before applying BPE to segment the word-segmented sentences into subword units, while XLM-R directly applies BPE to the syllable-level Vietnamese pre-training sentences.
117
+ This reconfirms that dedicated language-specific models still outperform multilingual ones [2019arXiv191103894M]. Note that [2019arXiv191103894M] only compare their model CamemBERT with XLM-R on the French NLI task.
118
+ # Conclusion
119
+ In this paper, we have presented the first large-scale monolingual PhoBERT language models pre-trained for Vietnamese. We demonstrate the usefulness of PhoBERT by showing that PhoBERT performs better than the recent best multilingual model XLM-R and helps produce the SOTA performances for four downstream Vietnamese NLP tasks of POS tagging, Dependency parsing, NER and NLI.
120
+ By publicly releasing PhoBERT models,
121
+ we hope that they can foster future research and applications in Vietnamese NLP.
122
references/2020.emnlp.nguyen/paper.tex ADDED
@@ -0,0 +1,301 @@
1
+ \documentclass[11pt,a4paper]{article}
2
+ \usepackage[hyperref]{emnlp2020}
3
+ \pdfoutput=1
4
+ \usepackage{times}
5
+ \usepackage{latexsym}
6
+ %\renewcommand{\UrlFont}{\ttfamily\small}
7
+
8
+ \usepackage{times}
9
+ \usepackage{latexsym}
10
+ \usepackage{amsmath}
11
+ \usepackage{url}
12
+ \usepackage{amssymb}
13
+ \usepackage{amsfonts}
14
+ \usepackage{graphicx}
15
+ \usepackage{tabularx}
16
+ \usepackage{multirow}
17
+ \usepackage{arydshln}
18
+ \usepackage{mathtools,nccmath}
19
+
20
+ \usepackage[utf8]{inputenc}
21
+ \usepackage[utf8]{vietnam}
22
+ \usepackage{enumitem}
23
+ % This is not strictly necessary, and may be commented out,
24
+ % but it will improve the layout of the manuscript,
25
+ % and will typically save some space.
26
+ %\usepackage{microtype}
27
+
28
+
29
+ \setlength{\textfloatsep}{15pt plus 5.0pt minus 5.0pt}
30
+ \setlength{\floatsep}{15pt plus 5.0pt minus 5.0pt}
31
+ %\setlength{\dbltextfloatsep }{15pt plus 2.0pt minus 3.0pt}
32
+ %\setlength{\dblfloatsep}{15pt plus 2.0pt minus 3.0pt}
33
+ %\setlength{\intextsep}{15pt plus 2.0pt minus 3.0pt}
34
+ \setlength{\abovecaptionskip}{3pt plus 1pt minus 1pt}
35
+
36
+ \aclfinalcopy % Uncomment this line for the final submission
37
+ %\def\aclpaperid{***} % Enter the acl Paper ID here
38
+
39
+ \setlength\titlebox{5cm}
40
+ % You can expand the titlebox if you need extra space
41
+ % to show all the authors. Please do not make the titlebox
42
+ % smaller than 5cm (the original size); we will check this
43
+ % in the camera-ready version and ask you to change it back.
44
+
45
+ \newcommand\BibTeX{B\textsc{ib}\TeX}
46
+
47
+
48
+ \title{PhoBERT: Pre-trained language models for Vietnamese}
49
+
50
+ \author{Dat Quoc Nguyen$^1$ \and Anh Tuan Nguyen$^{2,}$\thanks{\ \ Work done during internship at VinAI Research.} \\
51
+ $^1$VinAI Research, Vietnam; $^2$NVIDIA, USA\\
52
+ \tt{\normalsize v.datnq9@vinai.io, tuananhn@nvidia.com}}
53
+
54
+ \date{}
55
+
56
+ \begin{document}
57
+ \maketitle
58
+ \begin{abstract}
59
+ We present \textbf{PhoBERT} with two versions---PhoBERT\textsubscript{base} and PhoBERT\textsubscript{large}---the \emph{first} public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R \citep{conneau2019unsupervised} and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: \url{https://github.com/VinAIResearch/PhoBERT}.
60
+ \end{abstract}
61
+
62
+ \section{Introduction}\label{sec:intro}
63
+
64
+
65
+ Pre-trained language models, especially BERT \citep{devlin-etal-2019-bert}---the Bidirectional Encoder Representations from Transformers \citep{NIPS2017_7181}, have recently become extremely popular and helped to produce significant improvement gains for various NLP tasks. The success of pre-trained BERT and its variants has largely been limited to the English language. For other languages, one could retrain a language-specific model using the BERT architecture \citep{abs-1906-08101,vries2019bertje,vu-xuan-etal-2019-etnlp,2019arXiv191103894M} or employ existing pre-trained multilingual BERT-based models \citep{devlin-etal-2019-bert,NIPS2019_8928,conneau2019unsupervised}.
66
+
67
+ In terms of Vietnamese language modeling, to the best of our knowledge, there are two main concerns as follows:
68
+
69
+ \begin{itemize}[leftmargin=*]
70
+ \setlength\itemsep{-1pt}
71
+ \item The Vietnamese Wikipedia corpus is the only data used to train monolingual language models \citep{vu-xuan-etal-2019-etnlp}, and it also is the only Vietnamese dataset which is included in the pre-training data used by all multilingual language models except XLM-R. It is worth noting that Wikipedia data is not representative of a general language use, and the Vietnamese Wikipedia data is relatively small (1GB in size uncompressed), while pre-trained language models can be significantly improved by using more pre-training data \cite{RoBERTa}.
72
+
73
+ \item All publicly released monolingual and multilingual BERT-based language models are not aware of the difference between Vietnamese syllables and word tokens. This ambiguity comes from the fact that the white space is also used to separate syllables that constitute words when written in Vietnamese.\footnote{\newcite{DinhQuangThang2008} show that 85\% of Vietnamese word types are composed of at least two syllables.}
74
+ For example, a 6-syllable written text ``Tôi là một nghiên cứu viên'' (I am a researcher) forms 4 words ``Tôi\textsubscript{I} là\textsubscript{am} một\textsubscript{a} nghiên\_cứu\_viên\textsubscript{researcher}''. \\
75
+ Without doing a pre-process step of Vietnamese word segmentation, those models directly apply Byte-Pair encoding (BPE) methods \citep{sennrich-etal-2016-neural,kudo-richardson-2018-sentencepiece} to the syllable-level Vietnamese pre-training data.\footnote{Although performing word segmentation before applying BPE on the Vietnamese Wikipedia corpus, ETNLP \citep{vu-xuan-etal-2019-etnlp} in fact {does not publicly release} any pre-trained BERT-based language model (\url{https://github.com/vietnlp/etnlp}). In particular, \newcite{vu-xuan-etal-2019-etnlp} release a set of 15K BERT-based word embeddings specialized only for the Vietnamese NER task.}
76
+ Intuitively, for word-level Vietnamese NLP tasks, those models pre-trained on syllable-level data might not perform as good as language models pre-trained on word-level data.
77
+
78
+ \end{itemize}
79
+
80
+
81
+ To handle the two concerns above, we train the {first} large-scale monolingual BERT-based ``base'' and ``large'' models using a 20GB \textit{word-level} Vietnamese corpus.
82
+ We evaluate our models on four downstream Vietnamese NLP tasks: the common word-level ones of Part-of-speech (POS) tagging, Dependency parsing and Named-entity recognition (NER), and a language understanding task of Natural language inference (NLI) which can be formulated as either a syllable- or word-level task. Experimental results show that our models obtain state-of-the-art (SOTA) results on all these tasks.
83
+ Our contributions are summarized as follows:
84
+
85
+ \begin{itemize}[leftmargin=*]
86
+ \setlength\itemsep{-1pt}
87
+ \item We present the \textit{first} large-scale monolingual language models pre-trained for Vietnamese.
88
+
89
+ \item Our models help produce SOTA performances on four downstream tasks of POS tagging, Dependency parsing, NER and NLI, thus showing the effectiveness of large-scale BERT-based monolingual language models for Vietnamese.
90
+
91
+ \item To the best of our knowledge, we also perform the \textit{first} set of experiments to compare monolingual language models with the recent best multilingual model XLM-R in multiple (i.e. four) different language-specific tasks. The experiments show that our models outperform XLM-R on all these tasks, thus convincingly confirming that dedicated language-specific models still outperform multilingual ones.
92
+
93
+ \item We publicly release our models under the name PhoBERT which can be used with \texttt{fairseq} \citep{ott2019fairseq} and \texttt{transformers} \cite{Wolf2019HuggingFacesTS}. We hope that PhoBERT can serve as a strong baseline for future Vietnamese NLP research and applications.
94
+ \end{itemize}
95
+
96
+
97
+
98
+
99
+
100
+
101
+
102
+
103
+ \section{PhoBERT}
104
+
105
+ This section outlines the architecture and describes the pre-training data and optimization setup that we use for PhoBERT.
106
+
107
+ \vspace{3pt}
108
+
109
+ \noindent\textbf{Architecture:}\ Our PhoBERT has two versions, PhoBERT\textsubscript{base} and PhoBERT\textsubscript{large}, using the same architectures of BERT\textsubscript{base} and BERT\textsubscript{large}, respectively. PhoBERT pre-training approach is based on RoBERTa \citep{RoBERTa} which optimizes the BERT pre-training procedure for more robust performance.
110
+
111
+ \vspace{3pt}
112
+
113
+ \noindent\textbf{Pre-training data:}\ To handle the first concern mentioned in Section \ref{sec:intro}, we use a 20GB pre-training dataset of uncompressed texts. This dataset is a concatenation of two corpora: (i) the first one is the Vietnamese Wikipedia corpus ($\sim$1GB), and (ii) the second corpus ($\sim$19GB) is generated by removing similar articles and duplication from a 50GB Vietnamese news corpus.\footnote{\url{https://github.com/binhvq/news-corpus}, crawled from a wide range of news websites and topics.} To solve the second concern,
114
+ we employ RDRSegmenter \citep{nguyen-etal-2018-fast} from VnCoreNLP \citep{vu-etal-2018-vncorenlp} to perform word and sentence segmentation on the pre-training dataset, resulting in $\sim$145M word-segmented sentences ($\sim$3B word tokens). Different from RoBERTa, we then apply \texttt{fastBPE} \citep{sennrich-etal-2016-neural} to segment these sentences with subword units, using a vocabulary of 64K subword types. On average there are 24.4 subword tokens per sentence.
115
+
116
+ \vspace{3pt}
117
+
118
+ \noindent\textbf{Optimization:}\ We employ the RoBERTa implementation in \texttt{fairseq} \citep{ott2019fairseq}. We set a maximum length at 256 subword tokens, thus generating 145M $\times$ 24.4 / 256 $\approx$ 13.8M sentence blocks. Following \newcite{RoBERTa}, we optimize the models using Adam \citep{KingmaB14}. We use a batch size of 1024 across 4 V100 GPUs (16GB each) and a peak learning rate of 0.0004 for PhoBERT\textsubscript{base}, and a batch size of 512 and a peak learning rate of 0.0002 for PhoBERT\textsubscript{large}. We run for 40 epochs (here, the learning rate is warmed up for 2 epochs), thus resulting in 13.8M $\times$ 40 / 1024 $\approx$ 540K training steps for PhoBERT\textsubscript{base} and 1.08M training steps for PhoBERT\textsubscript{large}. We pre-train PhoBERT\textsubscript{base} during 3 weeks, and then PhoBERT\textsubscript{large} during 5 weeks.
119
+
120
+
121
+ \begin{table}[!t]
122
+ \centering
123
+ \begin{tabular}{l|l|l|l}
124
+ \hline
125
+ \textbf{Task} & \textbf{\#training} & \textbf{\#valid} & \textbf{\#test} \\
126
+ \hline
127
+
128
+ POS tagging$^\dagger$ & 27,000 & 870 & 2,120 \\
129
+ Dep. parsing$^\dagger$ & 8,977 & 200 & 1,020 \\
130
+ NER$^\dagger$ & 14,861 & 2,000 & 2,831\\
131
+ NLI$^\ddagger$ & 392,702 & 2,490 & 5,010\\
132
+ \hline
133
+ \end{tabular}
134
+ \caption{Statistics of the downstream task datasets. ``\#training'', ``\#valid'' and ``\#test'' denote the size of the training, validation and test sets, respectively. $\dagger$ and $\ddagger$ refer to the dataset size as the numbers of sentences and sentence pairs, respectively.}
135
+ \label{tab:data}
136
+ \end{table}
137
+
138
+
139
+ \begin{table*}[!ht]
140
+ \centering
141
+ \resizebox{15.5cm}{!}{
142
+ %\setlength{\tabcolsep}{0.3em}
143
+ \begin{tabular}{l|l|l|l}
144
+ \hline
145
+ \multicolumn{2}{c|}{\textbf{POS tagging} (word-level)} & \multicolumn{2}{c}{\textbf{Dependency parsing} (word-level)}\\
146
+ \hline
147
+ Model & Acc. & Model & LAS / UAS \\
148
+ \hline
149
+ RDRPOSTagger \citep{nguyen-etal-2014-rdrpostagger} [$\clubsuit$] & 95.1 & \_ & \_ \\
150
+
151
+ BiLSTM-CNN-CRF \citep{ma-hovy-2016-end} [$\clubsuit$] & 95.4 & VnCoreNLP-DEP \citep{vu-etal-2018-vncorenlp} [$\bigstar$] & 71.38 / 77.35 \\
152
+
153
+
154
+ VnCoreNLP-POS \citep{nguyen-etal-2017-word} [$\clubsuit$] & 95.9 &jPTDP-v2 [$\bigstar$] & 73.12 / 79.63 \\
155
+
156
+ jPTDP-v2 \citep{nguyen-verspoor-2018-improved} [$\bigstar$] & 95.7 &jointWPD [$\bigstar$] & 73.90 / 80.12 \\
157
+
158
+ jointWPD \citep{nguyen-2019-neural} [$\bigstar$] & 96.0 & Biaffine \citep{DozatM17} [$\bigstar$] & 74.99 / 81.19 \\
159
+
160
+ XLM-R\textsubscript{base} (our result) & 96.2 & Biaffine w/ XLM-R\textsubscript{base} (our result) & 76.46 / 83.10 \\
161
+
162
+ XLM-R\textsubscript{large} (our result) & 96.3 & Biaffine w/ XLM-R\textsubscript{large} (our result) & 75.87 / 82.70 \\
163
+
164
+ \hline
165
+ PhoBERT\textsubscript{base} & \underline{96.7} & Biaffine w/ PhoBERT\textsubscript{base} & \textbf{78.77} / \textbf{85.22} \\
166
+
167
+ PhoBERT\textsubscript{large} & \textbf{96.8} & Biaffine w/ PhoBERT\textsubscript{large} & \underline{77.85} / \underline{84.32} \\
168
+ \hline
169
+ \end{tabular}
170
+ }
171
+ \caption{Performance scores (in \%) on the POS tagging and Dependency parsing test sets. ``Acc.'', ``LAS'' and ``UAS'' abbreviate the Accuracy, the Labeled Attachment Score and the Unlabeled Attachment Score, respectively (here, all these evaluation metrics are computed on all word tokens, including punctuation).
172
+ [$\clubsuit$] and [$\bigstar$] denote
173
+ results reported by \newcite{nguyen-etal-2017-word} and \newcite{nguyen-2019-neural}, respectively.}
174
+ \label{tab:posdep}
175
+ \end{table*}
176
+
177
+
178
+
179
+ \section{Experimental setup}
180
+
181
+ We evaluate the performance of PhoBERT on four downstream Vietnamese NLP tasks: POS tagging, Dependency parsing, NER and NLI.
182
+
183
+
184
+ \subsubsection*{Downstream task datasets}
185
+
186
+ Table \ref{tab:data} presents the statistics of the experimental datasets that we employ for downstream task evaluation.
187
+ For POS tagging, Dependency parsing and NER, we follow the VnCoreNLP setup \citep{vu-etal-2018-vncorenlp}, using standard benchmarks of the VLSP 2013 POS tagging dataset,\footnote{\url{https://vlsp.org.vn/vlsp2013/eval}} the VnDT dependency treebank v1.1 \cite{Nguyen2014NLDB} with POS tags predicted by VnCoreNLP and the VLSP 2016 NER dataset \citep{JCC13161}.
188
+
189
+ For NLI, we use the manually-constructed Vietnamese validation and test sets from the cross-lingual NLI (XNLI) corpus v1.0 \citep{conneau-etal-2018-xnli} where the Vietnamese training set is released as a machine-translated version of the corresponding English training set \citep{N18-1101}.
190
+ Unlike the POS tagging, Dependency parsing and NER datasets which provide the gold word segmentation, for NLI, we employ RDRSegmenter to segment the text into words before applying BPE to produce subwords from word tokens.
191
+
192
+ \subsubsection*{Fine-tuning}
193
+
194
+ Following \newcite{devlin-etal-2019-bert}, for POS tagging and NER, we append a linear prediction layer on top of the PhoBERT architecture (i.e. to the last Transformer layer of PhoBERT) w.r.t. the first subword of each word token.\footnote{In our preliminary experiments, using the average of contextualized embeddings of subword tokens of each word to represent the word produces slightly lower performance than using the contextualized embedding of the first subword.}
195
+ For dependency parsing, following \newcite{nguyen-2019-neural}, we employ a reimplementation of the state-of-the-art Biaffine dependency parser \citep{DozatM17} from \newcite{ma-etal-2018-stack} with default optimal hyper-parameters. %\footnote{\url{https://github.com/XuezheMax/NeuroNLP2}}
196
+ We then extend this parser by replacing the pre-trained word embedding of each word in an input sentence by the corresponding contextualized embedding (from the last layer) computed for the first subword token of the word.
197
+
198
+ For POS tagging, NER and NLI, we employ \texttt{transformers} \cite{Wolf2019HuggingFacesTS} to fine-tune PhoBERT for each task and each dataset independently. We use AdamW \citep{loshchilov2018decoupled} with a fixed learning rate of 1.e-5 and a batch size of 32 \citep{RoBERTa}. We fine-tune in 30 training epochs, evaluate the task performance after each epoch on the validation set (here, early stopping is applied when there is no improvement after 5 continuous epochs), and then select the best model checkpoint to report the final result on the test set (note that each of our scores is an average over 5 runs with different random seeds). %Section \ref{sec:results} shows that using this relatively straightforward fine-tuning manner can lead to SOTA results. %Note that we might boost our downstream task performances even further by doing a more careful hyper-parameter tuning.
199
+
200
+
201
+
202
+
203
+ \begin{table*}[!ht]
204
+ \centering
205
+ \resizebox{15.5cm}{!}{
206
+ %\setlength{\tabcolsep}{0.3em}
207
+ \begin{tabular}{l|l|l|l}
208
+ \hline
209
+ \multicolumn{2}{c|}{\textbf{NER} (word-level)} & \multicolumn{2}{c}{\textbf{NLI} (syllable- or word-level)} \\
210
+
211
+ \hline
212
+ Model & F\textsubscript{1} & Model & Acc. \\
213
+ \hline
214
+ BiLSTM-CNN-CRF [$\blacklozenge$] & 88.3 & \_ & \_\\
215
+
216
+ VnCoreNLP-NER \citep{vu-etal-2018-vncorenlp} [$\blacklozenge$] & 88.6 & BiLSTM-max \citep{conneau-etal-2018-xnli} & 66.4 \\
217
+
218
+
219
+ VNER \citep{8713740} & 89.6 & mBiLSTM \citep{ArtetxeS19} & 72.0 \\
220
+
221
+ BiLSTM-CNN-CRF + ETNLP [$\spadesuit$] & 91.1 & multilingual BERT \citep{devlin-etal-2019-bert} [$\blacksquare$] & 69.5 \\
222
+
223
+ VnCoreNLP-NER + ETNLP [$\spadesuit$] & 91.3 & XLM\textsubscript{MLM+TLM} \citep{NIPS2019_8928} & 76.6 \\
224
+
225
+ XLM-R\textsubscript{base} (our result) & 92.0 & XLM-R\textsubscript{base} \citep{conneau2019unsupervised} & {75.4} \\
226
+
227
+ XLM-R\textsubscript{large} (our result) & 92.8 & XLM-R\textsubscript{large} \citep{conneau2019unsupervised} & \underline{79.7} \\
228
+
229
+ \hline
230
+ PhoBERT\textsubscript{base}& \underline{93.6} & PhoBERT\textsubscript{base}& {78.5} \\
231
+
232
+ PhoBERT\textsubscript{large}& \textbf{94.7} & PhoBERT\textsubscript{large}& \textbf{80.0} \\
233
+ \hline
234
+
235
+
236
+ \end{tabular}
237
+ }
238
+ \caption{Performance scores (in \%) on the NER and NLI test sets.
239
+ [$\blacklozenge$], [$\spadesuit$] and [$\blacksquare$] denote
240
+ results reported by \newcite{vu-etal-2018-vncorenlp}, \newcite{vu-xuan-etal-2019-etnlp} and \newcite{wu-dredze-2019-beto}, respectively.
241
+ %``mBiLSTM'' denotes a BiLSTM-based multilingual embedding model.
242
+ Note that there are higher Vietnamese NLI results reported for XLM-R when fine-tuning on the concatenation of all 15 training datasets from the XNLI corpus (i.e. TRANSLATE-TRAIN-ALL: 79.5\% for XLM-R\textsubscript{base} and 83.4\% XLM-R\textsubscript{large}). However, those results might not be comparable as we only use the monolingual Vietnamese training data for fine-tuning. }
243
+ \label{tab:nernli}
244
+ \end{table*}
245
+
246
+ \section{Experimental results}\label{sec:results}
247
+
248
+ \subsubsection*{Main results}
249
+
250
+ Tables \ref{tab:posdep} and \ref{tab:nernli} compare PhoBERT scores with the previous highest reported results, using the same experimental setup. It is clear that our PhoBERT helps produce new SOTA performance results for all four downstream tasks.
251
+
252
+ For \underline{POS tagging}, the neural model jointWPD for joint POS tagging and dependency parsing \citep{nguyen-2019-neural} and the feature-based model VnCoreNLP-POS \citep{nguyen-etal-2017-word} are the two previous SOTA models, obtaining accuracies at about 96.0\%. PhoBERT obtains 0.8\% absolute higher accuracy than these two models.
253
+
254
+ For \underline{Dependency parsing}, the previous highest parsing scores LAS and UAS are obtained by the Biaffine parser at 75.0\% and 81.2\%, respectively. PhoBERT helps boost the Biaffine parser with about 4\% absolute improvement, achieving a LAS at 78.8\% and a UAS at 85.2\%.
255
+
256
+
257
+ For \underline{NER}, PhoBERT\textsubscript{large} produces 1.1 points higher F\textsubscript{1} than PhoBERT\textsubscript{base}. In addition, PhoBERT\textsubscript{base} obtains 2+ points higher than the previous SOTA feature- and neural network-based models VnCoreNLP-NER \citep{vu-etal-2018-vncorenlp} and BiLSTM-CNN-CRF \citep{ma-hovy-2016-end} which are trained with the set of 15K BERT-based ETNLP word embeddings \citep{vu-xuan-etal-2019-etnlp}.
258
+
259
+ For \underline{NLI},
260
+ PhoBERT outperforms the multilingual BERT \citep{devlin-etal-2019-bert} and the BERT-based cross-lingual model with a new translation language modeling objective XLM\textsubscript{MLM+TLM} \citep{NIPS2019_8928} by large margins. PhoBERT also performs better than the recent best pre-trained multilingual model XLM-R but using far fewer parameters than XLM-R: 135M (PhoBERT\textsubscript{base}) vs. 250M (XLM-R\textsubscript{base}); 370M (PhoBERT\textsubscript{large}) vs. 560M (XLM-R\textsubscript{large}).
261
+
262
+
263
+
264
+
265
+
266
+
267
+
268
+ \subsubsection*{Discussion}
269
+
270
+ We find that PhoBERT\textsubscript{large} achieves 0.9\% lower dependency parsing scores than PhoBERT\textsubscript{base}. One possible reason is that the last Transformer layer in the BERT architecture might not be the optimal one which encodes the richest information of syntactic structures \cite{hewitt-manning-2019-structural,jawahar-etal-2019-bert}. Future work will study which PhoBERT's Transformer layer contains richer syntactic information by evaluating the Vietnamese parsing performance from each layer.
271
+
272
+ Using more pre-training data can significantly improve the quality of the pre-trained language models \cite{RoBERTa}. Thus it is not surprising that PhoBERT helps produce better performance than ETNLP on NER, and the multilingual BERT and XLM\textsubscript{MLM+TLM} on NLI (here, PhoBERT uses 20GB of Vietnamese texts while those models employ the 1GB Vietnamese Wikipedia corpus).
273
+
274
+ Following the fine-tuning approach that we use for PhoBERT, we carefully fine-tune XLM-R for the remaining Vietnamese POS tagging, Dependency parsing and NER tasks (here, it is applied to the first sub-syllable token of the first syllable of each word).\footnote{For fine-tuning XLM-R, we use a grid search on the validation set to select the AdamW learning rate from \{5e-6, 1e-5, 2e-5, 4e-5\} and the batch size from \{16, 32\}.}
275
+ Tables \ref{tab:posdep} and \ref{tab:nernli} show that our PhoBERT also does better than XLM-R on these three word-level tasks.
276
+ It is worth noting that XLM-R uses a 2.5TB pre-training corpus which contains 137GB of Vietnamese texts (i.e. about 137\ /\ 20 $\approx$ 7 times bigger than our pre-training corpus).
277
+ Recall that PhoBERT performs Vietnamese word segmentation to segment syllable-level sentences into word tokens before applying BPE to segment the word-segmented sentences into subword units, while XLM-R directly applies BPE to the syllable-level Vietnamese pre-training sentences.
278
+ This reconfirms that the dedicated language-specific models still outperform the multilingual ones \citep{2019arXiv191103894M}.\footnote{Note that \newcite{2019arXiv191103894M} only compare their model CamemBERT with XLM-R on the French NLI task.}
279
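The footnote above fixes the XLM-R fine-tuning hyper-parameters by a grid search on the validation set over AdamW learning rates \{5e-6, 1e-5, 2e-5, 4e-5\} and batch sizes \{16, 32\}. As a minimal sketch (not the authors' code), the selection loop just tries every learning-rate/batch-size pair and keeps the best-scoring configuration; `fake_eval` below is a hypothetical stand-in for an actual fine-tune-and-evaluate run:

```python
from itertools import product

# Hyper-parameter grid from the footnote above.
LEARNING_RATES = [5e-6, 1e-5, 2e-5, 4e-5]
BATCH_SIZES = [16, 32]

def select_best(evaluate):
    """Try every (lr, batch_size) pair with the supplied `evaluate`
    function (which returns a validation score) and keep the best."""
    best_score, best_cfg = float("-inf"), None
    for lr, bs in product(LEARNING_RATES, BATCH_SIZES):
        score = evaluate(lr, bs)
        if score > best_score:
            best_score, best_cfg = score, (lr, bs)
    return best_cfg, best_score

# Stand-in evaluation: a real run would fine-tune the model for each
# configuration and score the development set instead.
def fake_eval(lr, bs):
    return -abs(lr - 2e-5) * 1e4 - abs(bs - 32) / 100

best_cfg, best_score = select_best(fake_eval)
```

The loop is exhaustive (8 configurations here), which is practical only because the grid is small.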
+
280
+
281
+
282
+
283
+
284
+
285
+
286
+ \section{Conclusion}
287
+
288
+ In this paper, we have presented the first large-scale monolingual PhoBERT language models pre-trained for Vietnamese. We demonstrate the usefulness of PhoBERT by showing that PhoBERT performs better than the recent best multilingual model XLM-R and helps produce the SOTA performances for four downstream Vietnamese NLP tasks of POS tagging, Dependency parsing, NER and NLI.
289
+ By publicly releasing PhoBERT models, %\footnote{\url{https://github.com/VinAIResearch/PhoBERT}}
290
+ we hope that they can foster future research and applications in Vietnamese NLP. %Our PhoBERT and its usage are available at: \url{https://github.com/VinAIResearch/PhoBERT}.
291
+
292
+ {%\footnotesize
293
+ \bibliographystyle{acl_natbib}
294
+ \bibliography{REFs}
295
+ }
296
+
297
+
298
+
299
+
300
+
301
+ \end{document}
references/2020.emnlp.nguyen/source/acl_natbib.bst ADDED
@@ -0,0 +1,1975 @@
1
+ %%% acl_natbib.bst
2
+ %%% Modification of BibTeX style file acl_natbib_nourl.bst
3
+ %%% ... by urlbst, version 0.7 (marked with "% urlbst")
4
+ %%% See <http://purl.org/nxg/dist/urlbst>
5
+ %%% Added webpage entry type, and url and lastchecked fields.
6
+ %%% Added eprint support.
7
+ %%% Added DOI support.
8
+ %%% Added PUBMED support.
9
+ %%% Added hyperref support.
10
+ %%% Original headers follow...
11
+
12
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
13
+ %
14
+ % BibTeX style file acl_natbib_nourl.bst
15
+ %
16
+ % intended as input to urlbst script
17
+ % $ ./urlbst --hyperref --inlinelinks acl_natbib_nourl.bst > acl_natbib.bst
18
+ %
19
+ % adapted from compling.bst
20
+ % in order to mimic the style files for ACL conferences prior to 2017
21
+ % by making the following three changes:
22
+ % - for @incollection, page numbers now follow volume title.
23
+ % - for @inproceedings, address now follows conference name.
24
+ % (address is intended as location of conference,
25
+ % not address of publisher.)
26
+ % - for papers with three authors, use et al. in citation
27
+ % Dan Gildea 2017/06/08
28
+ % - fixed a bug with format.chapter - error given if chapter is empty
29
+ % with inbook.
30
+ % Shay Cohen 2018/02/16
31
+
32
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
33
+ %
34
+ % BibTeX style file compling.bst
35
+ %
36
+ % Intended for the journal Computational Linguistics (ACL/MIT Press)
37
+ % Created by Ron Artstein on 2005/08/22
38
+ % For use with <natbib.sty> for author-year citations.
39
+ %
40
+ % I created this file in order to allow submissions to the journal
41
+ % Computational Linguistics using the <natbib> package for author-year
42
+ % citations, which offers a lot more flexibility than <fullname>, CL's
43
+ % official citation package. This file adheres strictly to the official
44
+ % style guide available from the MIT Press:
45
+ %
46
+ % http://mitpress.mit.edu/journals/coli/compling_style.pdf
47
+ %
48
+ % This includes all the various quirks of the style guide, for example:
49
+ % - a chapter from a monograph (@inbook) has no page numbers.
50
+ % - an article from an edited volume (@incollection) has page numbers
51
+ % after the publisher and address.
52
+ % - an article from a proceedings volume (@inproceedings) has page
53
+ % numbers before the publisher and address.
54
+ %
55
+ % Where the style guide was inconsistent or not specific enough I
56
+ % looked at actual published articles and exercised my own judgment.
57
+ % I noticed two inconsistencies in the style guide:
58
+ %
59
+ % - The style guide gives one example of an article from an edited
60
+ % volume with the editor's name spelled out in full, and another
61
+ % with the editors' names abbreviated. I chose to accept the first
62
+ % one as correct, since the style guide generally shuns abbreviations,
63
+ % and editors' names are also spelled out in some recently published
64
+ % articles.
65
+ %
66
+ % - The style guide gives one example of a reference where the word
67
+ % "and" between two authors is preceded by a comma. This is most
68
+ % likely a typo, since in all other cases with just two authors or
69
+ % editors there is no comma before the word "and".
70
+ %
71
+ % One case where the style guide is not being specific is the placement
72
+ % of the edition number, for which no example is given. I chose to put
73
+ % it immediately after the title, which I (subjectively) find natural,
74
+ % and is also the place of the edition in a few recently published
75
+ % articles.
76
+ %
77
+ % This file correctly reproduces all of the examples in the official
78
+ % style guide, except for the two inconsistencies noted above. I even
79
+ % managed to get it to correctly format the proceedings example which
80
+ % has an organization, a publisher, and two addresses (the conference
81
+ % location and the publisher's address), though I cheated a bit by
82
+ % putting the conference location and month as part of the title field;
83
+ % I feel that in this case the conference location and month can be
84
+ % considered as part of the title, and that adding a location field
85
+ % is not justified. Note also that a location field is not standard,
86
+ % so entries made with this field would not port nicely to other styles.
87
+ % However, if authors feel that there's a need for a location field
88
+ % then tell me and I'll see what I can do.
89
+ %
90
+ % The file also produces to my satisfaction all the bibliographical
91
+ % entries in my recent (joint) submission to CL (this was the original
92
+ % motivation for creating the file). I also tested it by running it
93
+ % on a larger set of entries and eyeballing the results. There may of
94
+ % course still be errors, especially with combinations of fields that
95
+ % are not that common, or with cross-references (which I seldom use).
96
+ % If you find such errors please write to me.
97
+ %
98
+ % I hope people find this file useful. Please email me with comments
99
+ % and suggestions.
100
+ %
101
+ % Ron Artstein
102
+ % artstein [at] essex.ac.uk
103
+ % August 22, 2005.
104
+ %
105
+ % Some technical notes.
106
+ %
107
+ % This file is based on a file generated with the package <custom-bib>
108
+ % by Patrick W. Daly (see selected options below), which was then
109
+ % manually customized to conform with certain CL requirements which
110
+ % cannot be met by <custom-bib>. Departures from the generated file
111
+ % include:
112
+ %
113
+ % Function inbook: moved publisher and address to the end; moved
114
+ % edition after title; replaced function format.chapter.pages by
115
+ % new function format.chapter to output chapter without pages.
116
+ %
117
+ % Function inproceedings: moved publisher and address to the end;
118
+ % replaced function format.in.ed.booktitle by new function
119
+ % format.in.booktitle to output the proceedings title without
120
+ % the editor.
121
+ %
122
+ % Functions book, incollection, manual: moved edition after title.
123
+ %
124
+ % Function mastersthesis: formatted title as for articles (unlike
125
+ % phdthesis which is formatted as book) and added month.
126
+ %
127
+ % Function proceedings: added new.sentence between organization and
128
+ % publisher when both are present.
129
+ %
130
+ % Function format.lab.names: modified so that it gives all the
131
+ % authors' surnames for in-text citations for one, two and three
132
+ % authors and only uses "et. al" for works with four authors or more
133
+ % (thanks to Ken Shan for convincing me to go through the trouble of
134
+ % modifying this function rather than using unreliable hacks).
135
+ %
136
+ % Changes:
137
+ %
138
+ % 2006-10-27: Changed function reverse.pass so that the extra label is
139
+ % enclosed in parentheses when the year field ends in an uppercase or
140
+ % lowercase letter (change modeled after Uli Sauerland's modification
141
+ % of nals.bst). RA.
142
+ %
143
+ %
144
+ % The preamble of the generated file begins below:
145
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
146
+ %%
147
+ %% This is file `compling.bst',
148
+ %% generated with the docstrip utility.
149
+ %%
150
+ %% The original source files were:
151
+ %%
152
+ %% merlin.mbs (with options: `ay,nat,vonx,nm-revv1,jnrlst,keyxyr,blkyear,dt-beg,yr-per,note-yr,num-xser,pre-pub,xedn,nfss')
153
+ %% ----------------------------------------
154
+ %% *** Intended for the journal Computational Linguistics ***
155
+ %%
156
+ %% Copyright 1994-2002 Patrick W Daly
157
+ % ===============================================================
158
+ % IMPORTANT NOTICE:
159
+ % This bibliographic style (bst) file has been generated from one or
160
+ % more master bibliographic style (mbs) files, listed above.
161
+ %
162
+ % This generated file can be redistributed and/or modified under the terms
163
+ % of the LaTeX Project Public License Distributed from CTAN
164
+ % archives in directory macros/latex/base/lppl.txt; either
165
+ % version 1 of the License, or any later version.
166
+ % ===============================================================
167
+ % Name and version information of the main mbs file:
168
+ % \ProvidesFile{merlin.mbs}[2002/10/21 4.05 (PWD, AO, DPC)]
169
+ % For use with BibTeX version 0.99a or later
170
+ %-------------------------------------------------------------------
171
+ % This bibliography style file is intended for texts in ENGLISH
172
+ % This is an author-year citation style bibliography. As such, it is
173
+ % non-standard LaTeX, and requires a special package file to function properly.
174
+ % Such a package is natbib.sty by Patrick W. Daly
175
+ % The form of the \bibitem entries is
176
+ % \bibitem[Jones et al.(1990)]{key}...
177
+ % \bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}...
178
+ % The essential feature is that the label (the part in brackets) consists
179
+ % of the author names, as they should appear in the citation, with the year
180
+ % in parentheses following. There must be no space before the opening
181
+ % parenthesis!
182
+ % With natbib v5.3, a full list of authors may also follow the year.
183
+ % In natbib.sty, it is possible to define the type of enclosures that is
184
+ % really wanted (brackets or parentheses), but in either case, there must
185
+ % be parentheses in the label.
186
+ % The \cite command functions as follows:
187
+ % \citet{key} ==>> Jones et al. (1990)
188
+ % \citet*{key} ==>> Jones, Baker, and Smith (1990)
189
+ % \citep{key} ==>> (Jones et al., 1990)
190
+ % \citep*{key} ==>> (Jones, Baker, and Smith, 1990)
191
+ % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2)
192
+ % \citep[e.g.][]{key} ==>> (e.g. Jones et al., 1990)
193
+ % \citep[e.g.][p. 32]{key} ==>> (e.g. Jones et al., p. 32)
194
+ % \citeauthor{key} ==>> Jones et al.
195
+ % \citeauthor*{key} ==>> Jones, Baker, and Smith
196
+ % \citeyear{key} ==>> 1990
197
+ %---------------------------------------------------------------------
198
+
199
+ ENTRY
200
+ { address
201
+ author
202
+ booktitle
203
+ chapter
204
+ edition
205
+ editor
206
+ howpublished
207
+ institution
208
+ journal
209
+ key
210
+ month
211
+ note
212
+ number
213
+ organization
214
+ pages
215
+ publisher
216
+ school
217
+ series
218
+ title
219
+ type
220
+ volume
221
+ year
222
+ eprint % urlbst
223
+ doi % urlbst
224
+ pubmed % urlbst
225
+ url % urlbst
226
+ lastchecked % urlbst
227
+ }
228
+ {}
229
+ { label extra.label sort.label short.list }
230
+ INTEGERS { output.state before.all mid.sentence after.sentence after.block }
231
+ % urlbst...
232
+ % urlbst constants and state variables
233
+ STRINGS { urlintro
234
+ eprinturl eprintprefix doiprefix doiurl pubmedprefix pubmedurl
235
+ citedstring onlinestring linktextstring
236
+ openinlinelink closeinlinelink }
237
+ INTEGERS { hrefform inlinelinks makeinlinelink
238
+ addeprints adddoiresolver addpubmedresolver }
239
+ FUNCTION {init.urlbst.variables}
240
+ {
241
+ % The following constants may be adjusted by hand, if desired
242
+
243
+ % The first set allow you to enable or disable certain functionality.
244
+ #1 'addeprints := % 0=no eprints; 1=include eprints
245
+ #1 'adddoiresolver := % 0=no DOI resolver; 1=include it
246
+ #1 'addpubmedresolver := % 0=no PUBMED resolver; 1=include it
247
+ #2 'hrefform := % 0=no crossrefs; 1=hypertex xrefs; 2=hyperref refs
248
+ #1 'inlinelinks := % 0=URLs explicit; 1=URLs attached to titles
249
+
250
+ % String constants, which you _might_ want to tweak.
251
+ "URL: " 'urlintro := % prefix before URL; typically "Available from:" or "URL":
252
+ "online" 'onlinestring := % indication that resource is online; typically "online"
253
+ "cited " 'citedstring := % indicator of citation date; typically "cited "
254
+ "[link]" 'linktextstring := % dummy link text; typically "[link]"
255
+ "http://arxiv.org/abs/" 'eprinturl := % prefix to make URL from eprint ref
256
+ "arXiv:" 'eprintprefix := % text prefix printed before eprint ref; typically "arXiv:"
257
+ "https://doi.org/" 'doiurl := % prefix to make URL from DOI
258
+ "doi:" 'doiprefix := % text prefix printed before DOI ref; typically "doi:"
259
+ "http://www.ncbi.nlm.nih.gov/pubmed/" 'pubmedurl := % prefix to make URL from PUBMED
260
+ "PMID:" 'pubmedprefix := % text prefix printed before PUBMED ref; typically "PMID:"
261
+
262
+ % The following are internal state variables, not configuration constants,
263
+ % so they shouldn't be fiddled with.
264
+ #0 'makeinlinelink := % state variable managed by possibly.setup.inlinelink
265
+ "" 'openinlinelink := % ditto
266
+ "" 'closeinlinelink := % ditto
267
+ }
268
+ INTEGERS {
269
+ bracket.state
270
+ outside.brackets
271
+ open.brackets
272
+ within.brackets
273
+ close.brackets
274
+ }
275
+ % ...urlbst to here
276
+ FUNCTION {init.state.consts}
277
+ { #0 'outside.brackets := % urlbst...
278
+ #1 'open.brackets :=
279
+ #2 'within.brackets :=
280
+ #3 'close.brackets := % ...urlbst to here
281
+
282
+ #0 'before.all :=
283
+ #1 'mid.sentence :=
284
+ #2 'after.sentence :=
285
+ #3 'after.block :=
286
+ }
287
+ STRINGS { s t}
288
+ % urlbst
289
+ FUNCTION {output.nonnull.original}
290
+ { 's :=
291
+ output.state mid.sentence =
292
+ { ", " * write$ }
293
+ { output.state after.block =
294
+ { add.period$ write$
295
+ newline$
296
+ "\newblock " write$
297
+ }
298
+ { output.state before.all =
299
+ 'write$
300
+ { add.period$ " " * write$ }
301
+ if$
302
+ }
303
+ if$
304
+ mid.sentence 'output.state :=
305
+ }
306
+ if$
307
+ s
308
+ }
309
+
310
+ % urlbst...
311
+ % The following three functions are for handling inlinelink. They wrap
312
+ % a block of text which is potentially output with write$ by multiple
313
+ % other functions, so we don't know the content a priori.
314
+ % They communicate between each other using the variables makeinlinelink
315
+ % (which is true if a link should be made), and closeinlinelink (which holds
316
+ % the string which should close any current link. They can be called
317
+ % at any time, but start.inlinelink will be a no-op unless something has
318
+ % previously set makeinlinelink true, and the two ...end.inlinelink functions
319
+ % will only do their stuff if start.inlinelink has previously set
320
+ % closeinlinelink to be non-empty.
321
+ % (thanks to 'ijvm' for suggested code here)
322
+ FUNCTION {uand}
323
+ { 'skip$ { pop$ #0 } if$ } % 'and' (which isn't defined at this point in the file)
324
+ FUNCTION {possibly.setup.inlinelink}
325
+ { makeinlinelink hrefform #0 > uand
326
+ { doi empty$ adddoiresolver uand
327
+ { pubmed empty$ addpubmedresolver uand
328
+ { eprint empty$ addeprints uand
329
+ { url empty$
330
+ { "" }
331
+ { url }
332
+ if$ }
333
+ { eprinturl eprint * }
334
+ if$ }
335
+ { pubmedurl pubmed * }
336
+ if$ }
337
+ { doiurl doi * }
338
+ if$
339
+ % an appropriately-formatted URL is now on the stack
340
+ hrefform #1 = % hypertex
341
+ { "\special {html:<a href=" quote$ * swap$ * quote$ * "> }{" * 'openinlinelink :=
342
+ "\special {html:</a>}" 'closeinlinelink := }
343
+ { "\href {" swap$ * "} {" * 'openinlinelink := % hrefform=#2 -- hyperref
344
+ % the space between "} {" matters: a URL of just the right length can cause "\% newline em"
345
+ "}" 'closeinlinelink := }
346
+ if$
347
+ #0 'makeinlinelink :=
348
+ }
349
+ 'skip$
350
+ if$ % makeinlinelink
351
+ }
352
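In plain terms, `possibly.setup.inlinelink` above picks a single link target per entry, preferring a DOI, then a PUBMED id, then an eprint, then an explicit `url` field, using the URL prefixes set in `init.urlbst.variables`. A hypothetical Python paraphrase of that priority rule (illustration only, not part of the .bst itself):

```python
# Prefix constants mirroring init.urlbst.variables.
DOI_URL = "https://doi.org/"
PUBMED_URL = "http://www.ncbi.nlm.nih.gov/pubmed/"
EPRINT_URL = "http://arxiv.org/abs/"

def link_for(doi=None, pubmed=None, eprint=None, url=None):
    """Return the one URL an entry should link to:
    DOI beats PUBMED beats eprint beats an explicit url field."""
    if doi:
        return DOI_URL + doi
    if pubmed:
        return PUBMED_URL + pubmed
    if eprint:
        return EPRINT_URL + eprint
    return url or ""
```

The .bst additionally wraps the chosen URL in `\href{...}{...}` (since `hrefform` is set to 2) rather than printing it.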
+ FUNCTION {add.inlinelink}
353
+ { openinlinelink empty$
354
+ 'skip$
355
+ { openinlinelink swap$ * closeinlinelink *
356
+ "" 'openinlinelink :=
357
+ }
358
+ if$
359
+ }
360
+ FUNCTION {output.nonnull}
361
+ { % Save the thing we've been asked to output
362
+ 's :=
363
+ % If the bracket-state is close.brackets, then add a close-bracket to
364
+ % what is currently at the top of the stack, and set bracket.state
365
+ % to outside.brackets
366
+ bracket.state close.brackets =
367
+ { "]" *
368
+ outside.brackets 'bracket.state :=
369
+ }
370
+ 'skip$
371
+ if$
372
+ bracket.state outside.brackets =
373
+ { % We're outside all brackets -- this is the normal situation.
374
+ % Write out what's currently at the top of the stack, using the
375
+ % original output.nonnull function.
376
+ s
377
+ add.inlinelink
378
+ output.nonnull.original % invoke the original output.nonnull
379
+ }
380
+ { % Still in brackets. Add open-bracket or (continuation) comma, add the
381
+ % new text (in s) to the top of the stack, and move to the close-brackets
382
+ % state, ready for next time (unless inbrackets resets it). If we come
383
+ % into this branch, then output.state is carefully undisturbed.
384
+ bracket.state open.brackets =
385
+ { " [" * }
386
+ { ", " * } % bracket.state will be within.brackets
387
+ if$
388
+ s *
389
+ close.brackets 'bracket.state :=
390
+ }
391
+ if$
392
+ }
393
+
394
+ % Call this function just before adding something which should be presented in
395
+ % brackets. bracket.state is handled specially within output.nonnull.
396
+ FUNCTION {inbrackets}
397
+ { bracket.state close.brackets =
398
+ { within.brackets 'bracket.state := } % reset the state: not open nor closed
399
+ { open.brackets 'bracket.state := }
400
+ if$
401
+ }
402
+
403
+ FUNCTION {format.lastchecked}
404
+ { lastchecked empty$
405
+ { "" }
406
+ { inbrackets citedstring lastchecked * }
407
+ if$
408
+ }
409
+ % ...urlbst to here
410
+ FUNCTION {output}
411
+ { duplicate$ empty$
412
+ 'pop$
413
+ 'output.nonnull
414
+ if$
415
+ }
416
+ FUNCTION {output.check}
417
+ { 't :=
418
+ duplicate$ empty$
419
+ { pop$ "empty " t * " in " * cite$ * warning$ }
420
+ 'output.nonnull
421
+ if$
422
+ }
423
+ FUNCTION {fin.entry.original} % urlbst (renamed from fin.entry, so it can be wrapped below)
424
+ { add.period$
425
+ write$
426
+ newline$
427
+ }
428
+
429
+ FUNCTION {new.block}
430
+ { output.state before.all =
431
+ 'skip$
432
+ { after.block 'output.state := }
433
+ if$
434
+ }
435
+ FUNCTION {new.sentence}
436
+ { output.state after.block =
437
+ 'skip$
438
+ { output.state before.all =
439
+ 'skip$
440
+ { after.sentence 'output.state := }
441
+ if$
442
+ }
443
+ if$
444
+ }
445
+ FUNCTION {add.blank}
446
+ { " " * before.all 'output.state :=
447
+ }
448
+
449
+ FUNCTION {date.block}
450
+ {
451
+ new.block
452
+ }
453
+
454
+ FUNCTION {not}
455
+ { { #0 }
456
+ { #1 }
457
+ if$
458
+ }
459
+ FUNCTION {and}
460
+ { 'skip$
461
+ { pop$ #0 }
462
+ if$
463
+ }
464
+ FUNCTION {or}
465
+ { { pop$ #1 }
466
+ 'skip$
467
+ if$
468
+ }
469
+ FUNCTION {new.block.checkb}
470
+ { empty$
471
+ swap$ empty$
472
+ and
473
+ 'skip$
474
+ 'new.block
475
+ if$
476
+ }
477
+ FUNCTION {field.or.null}
478
+ { duplicate$ empty$
479
+ { pop$ "" }
480
+ 'skip$
481
+ if$
482
+ }
483
+ FUNCTION {emphasize}
484
+ { duplicate$ empty$
485
+ { pop$ "" }
486
+ { "\emph{" swap$ * "}" * }
487
+ if$
488
+ }
489
+ FUNCTION {tie.or.space.prefix}
490
+ { duplicate$ text.length$ #3 <
491
+ { "~" }
492
+ { " " }
493
+ if$
494
+ swap$
495
+ }
496
+
497
+ FUNCTION {capitalize}
498
+ { "u" change.case$ "t" change.case$ }
499
+
500
+ FUNCTION {space.word}
501
+ { " " swap$ * " " * }
502
+ % Here are the language-specific definitions for explicit words.
503
+ % Each function has a name bbl.xxx where xxx is the English word.
504
+ % The language selected here is ENGLISH
505
+ FUNCTION {bbl.and}
506
+ { "and"}
507
+
508
+ FUNCTION {bbl.etal}
509
+ { "et~al." }
510
+
511
+ FUNCTION {bbl.editors}
512
+ { "editors" }
513
+
514
+ FUNCTION {bbl.editor}
515
+ { "editor" }
516
+
517
+ FUNCTION {bbl.edby}
518
+ { "edited by" }
519
+
520
+ FUNCTION {bbl.edition}
521
+ { "edition" }
522
+
523
+ FUNCTION {bbl.volume}
524
+ { "volume" }
525
+
526
+ FUNCTION {bbl.of}
527
+ { "of" }
528
+
529
+ FUNCTION {bbl.number}
530
+ { "number" }
531
+
532
+ FUNCTION {bbl.nr}
533
+ { "no." }
534
+
535
+ FUNCTION {bbl.in}
536
+ { "in" }
537
+
538
+ FUNCTION {bbl.pages}
539
+ { "pages" }
540
+
541
+ FUNCTION {bbl.page}
542
+ { "page" }
543
+
544
+ FUNCTION {bbl.chapter}
545
+ { "chapter" }
546
+
547
+ FUNCTION {bbl.techrep}
548
+ { "Technical Report" }
549
+
550
+ FUNCTION {bbl.mthesis}
551
+ { "Master's thesis" }
552
+
553
+ FUNCTION {bbl.phdthesis}
554
+ { "Ph.D. thesis" }
555
+
556
+ MACRO {jan} {"January"}
557
+
558
+ MACRO {feb} {"February"}
559
+
560
+ MACRO {mar} {"March"}
561
+
562
+ MACRO {apr} {"April"}
563
+
564
+ MACRO {may} {"May"}
565
+
566
+ MACRO {jun} {"June"}
567
+
568
+ MACRO {jul} {"July"}
569
+
570
+ MACRO {aug} {"August"}
571
+
572
+ MACRO {sep} {"September"}
573
+
574
+ MACRO {oct} {"October"}
575
+
576
+ MACRO {nov} {"November"}
577
+
578
+ MACRO {dec} {"December"}
579
+
580
+ MACRO {acmcs} {"ACM Computing Surveys"}
581
+
582
+ MACRO {acta} {"Acta Informatica"}
583
+
584
+ MACRO {cacm} {"Communications of the ACM"}
585
+
586
+ MACRO {ibmjrd} {"IBM Journal of Research and Development"}
587
+
588
+ MACRO {ibmsj} {"IBM Systems Journal"}
589
+
590
+ MACRO {ieeese} {"IEEE Transactions on Software Engineering"}
591
+
592
+ MACRO {ieeetc} {"IEEE Transactions on Computers"}
593
+
594
+ MACRO {ieeetcad}
595
+ {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"}
596
+
597
+ MACRO {ipl} {"Information Processing Letters"}
598
+
599
+ MACRO {jacm} {"Journal of the ACM"}
600
+
601
+ MACRO {jcss} {"Journal of Computer and System Sciences"}
602
+
603
+ MACRO {scp} {"Science of Computer Programming"}
604
+
605
+ MACRO {sicomp} {"SIAM Journal on Computing"}
606
+
607
+ MACRO {tocs} {"ACM Transactions on Computer Systems"}
608
+
609
+ MACRO {tods} {"ACM Transactions on Database Systems"}
610
+
611
+ MACRO {tog} {"ACM Transactions on Graphics"}
612
+
613
+ MACRO {toms} {"ACM Transactions on Mathematical Software"}
614
+
615
+ MACRO {toois} {"ACM Transactions on Office Information Systems"}
616
+
617
+ MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"}
618
+
619
+ MACRO {tcs} {"Theoretical Computer Science"}
620
+ FUNCTION {bibinfo.check}
621
+ { swap$
622
+ duplicate$ missing$
623
+ {
624
+ pop$ pop$
625
+ ""
626
+ }
627
+ { duplicate$ empty$
628
+ {
629
+ swap$ pop$
630
+ }
631
+ { swap$
632
+ pop$
633
+ }
634
+ if$
635
+ }
636
+ if$
637
+ }
638
+ FUNCTION {bibinfo.warn}
639
+ { swap$
640
+ duplicate$ missing$
641
+ {
642
+ swap$ "missing " swap$ * " in " * cite$ * warning$ pop$
643
+ ""
644
+ }
645
+ { duplicate$ empty$
646
+ {
647
+ swap$ "empty " swap$ * " in " * cite$ * warning$
648
+ }
649
+ { swap$
650
+ pop$
651
+ }
652
+ if$
653
+ }
654
+ if$
655
+ }
656
+ STRINGS { bibinfo}
657
+ INTEGERS { nameptr namesleft numnames }
658
+
659
+ FUNCTION {format.names}
660
+ { 'bibinfo :=
661
+ duplicate$ empty$ 'skip$ {
662
+ 's :=
663
+ "" 't :=
664
+ #1 'nameptr :=
665
+ s num.names$ 'numnames :=
666
+ numnames 'namesleft :=
667
+ { namesleft #0 > }
668
+ { s nameptr
669
+ duplicate$ #1 >
670
+ { "{ff~}{vv~}{ll}{, jj}" }
671
+ { "{ff~}{vv~}{ll}{, jj}" } % first name first for first author
672
+ % { "{vv~}{ll}{, ff}{, jj}" } % last name first for first author
673
+ if$
674
+ format.name$
675
+ bibinfo bibinfo.check
676
+ 't :=
677
+ nameptr #1 >
678
+ {
679
+ namesleft #1 >
680
+ { ", " * t * }
681
+ {
682
+ numnames #2 >
683
+ { "," * }
684
+ 'skip$
685
+ if$
686
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
687
+ { 't := }
688
+ { pop$ }
689
+ if$
690
+ t "others" =
691
+ {
692
+ " " * bbl.etal *
693
+ }
694
+ {
695
+ bbl.and
696
+ space.word * t *
697
+ }
698
+ if$
699
+ }
700
+ if$
701
+ }
702
+ 't
703
+ if$
704
+ nameptr #1 + 'nameptr :=
705
+ namesleft #1 - 'namesleft :=
706
+ }
707
+ while$
708
+ } if$
709
+ }
710
+ FUNCTION {format.names.ed}
711
+ {
712
+ 'bibinfo :=
713
+ duplicate$ empty$ 'skip$ {
714
+ 's :=
715
+ "" 't :=
716
+ #1 'nameptr :=
717
+ s num.names$ 'numnames :=
718
+ numnames 'namesleft :=
719
+ { namesleft #0 > }
720
+ { s nameptr
721
+ "{ff~}{vv~}{ll}{, jj}"
722
+ format.name$
723
+ bibinfo bibinfo.check
724
+ 't :=
725
+ nameptr #1 >
726
+ {
727
+ namesleft #1 >
728
+ { ", " * t * }
729
+ {
730
+ numnames #2 >
731
+ { "," * }
732
+ 'skip$
733
+ if$
734
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
735
+ { 't := }
736
+ { pop$ }
737
+ if$
738
+ t "others" =
739
+ {
740
+
741
+ " " * bbl.etal *
742
+ }
743
+ {
744
+ bbl.and
745
+ space.word * t *
746
+ }
747
+ if$
748
+ }
749
+ if$
750
+ }
751
+ 't
752
+ if$
753
+ nameptr #1 + 'nameptr :=
754
+ namesleft #1 - 'namesleft :=
755
+ }
756
+ while$
757
+ } if$
758
+ }
759
+ FUNCTION {format.key}
760
+ { empty$
761
+ { key field.or.null }
762
+ { "" }
763
+ if$
764
+ }
765
+
766
+ FUNCTION {format.authors}
767
+ { author "author" format.names
768
+ }
769
+ FUNCTION {get.bbl.editor}
770
+ { editor num.names$ #1 > 'bbl.editors 'bbl.editor if$ }
771
+
772
+ FUNCTION {format.editors}
773
+ { editor "editor" format.names duplicate$ empty$ 'skip$
774
+ {
775
+ "," *
776
+ " " *
777
+ get.bbl.editor
778
+ *
779
+ }
780
+ if$
781
+ }
782
+ FUNCTION {format.note}
783
+ {
784
+ note empty$
785
+ { "" }
786
+ { note #1 #1 substring$
787
+ duplicate$ "{" =
788
+ 'skip$
789
+ { output.state mid.sentence =
790
+ { "l" }
791
+ { "u" }
792
+ if$
793
+ change.case$
794
+ }
795
+ if$
796
+ note #2 global.max$ substring$ * "note" bibinfo.check
797
+ }
798
+ if$
799
+ }
800
+
801
+ FUNCTION {format.title}
802
+ { title
803
+ duplicate$ empty$ 'skip$
804
+ { "t" change.case$ }
805
+ if$
806
+ "title" bibinfo.check
807
+ }
808
+ FUNCTION {format.full.names}
809
+ {'s :=
810
+ "" 't :=
811
+ #1 'nameptr :=
812
+ s num.names$ 'numnames :=
813
+ numnames 'namesleft :=
814
+ { namesleft #0 > }
815
+ { s nameptr
816
+ "{vv~}{ll}" format.name$
817
+ 't :=
818
+ nameptr #1 >
819
+ {
820
+ namesleft #1 >
821
+ { ", " * t * }
822
+ {
823
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
824
+ { 't := }
825
+ { pop$ }
826
+ if$
827
+ t "others" =
828
+ {
829
+ " " * bbl.etal *
830
+ }
831
+ {
832
+ numnames #2 >
833
+ { "," * }
834
+ 'skip$
835
+ if$
836
+ bbl.and
837
+ space.word * t *
838
+ }
839
+ if$
840
+ }
841
+ if$
842
+ }
843
+ 't
844
+ if$
845
+ nameptr #1 + 'nameptr :=
846
+ namesleft #1 - 'namesleft :=
847
+ }
848
+ while$
849
+ }
850
+
851
+ FUNCTION {author.editor.key.full}
852
+ { author empty$
853
+ { editor empty$
854
+ { key empty$
855
+ { cite$ #1 #3 substring$ }
856
+ 'key
857
+ if$
858
+ }
859
+ { editor format.full.names }
860
+ if$
861
+ }
862
+ { author format.full.names }
863
+ if$
864
+ }
865
+
866
+ FUNCTION {author.key.full}
867
+ { author empty$
868
+ { key empty$
869
+ { cite$ #1 #3 substring$ }
870
+ 'key
871
+ if$
872
+ }
873
+ { author format.full.names }
874
+ if$
875
+ }
876
+
877
+ FUNCTION {editor.key.full}
878
+ { editor empty$
879
+ { key empty$
880
+ { cite$ #1 #3 substring$ }
881
+ 'key
882
+ if$
883
+ }
884
+ { editor format.full.names }
885
+ if$
886
+ }
887
+
888
+ FUNCTION {make.full.names}
889
+ { type$ "book" =
890
+ type$ "inbook" =
891
+ or
892
+ 'author.editor.key.full
893
+ { type$ "proceedings" =
894
+ 'editor.key.full
895
+ 'author.key.full
896
+ if$
897
+ }
898
+ if$
899
+ }
900
+
901
+ FUNCTION {output.bibitem.original} % urlbst (renamed from output.bibitem, so it can be wrapped below)
902
+ { newline$
903
+ "\bibitem[{" write$
904
+ label write$
905
+ ")" make.full.names duplicate$ short.list =
906
+ { pop$ }
907
+ { * }
908
+ if$
909
+ "}]{" * write$
910
+ cite$ write$
911
+ "}" write$
912
+ newline$
913
+ ""
914
+ before.all 'output.state :=
915
+ }
916
+
917
+ FUNCTION {n.dashify}
918
+ {
919
+ 't :=
920
+ ""
921
+ { t empty$ not }
922
+ { t #1 #1 substring$ "-" =
923
+ { t #1 #2 substring$ "--" = not
924
+ { "--" *
925
+ t #2 global.max$ substring$ 't :=
926
+ }
927
+ { { t #1 #1 substring$ "-" = }
928
+ { "-" *
929
+ t #2 global.max$ substring$ 't :=
930
+ }
931
+ while$
932
+ }
933
+ if$
934
+ }
935
+ { t #1 #1 substring$ *
936
+ t #2 global.max$ substring$ 't :=
937
+ }
938
+ if$
939
+ }
940
+ while$
941
+ }
942
+
943
+ FUNCTION {word.in}
944
+ { bbl.in capitalize
945
+ " " * }
946
+
947
+ FUNCTION {format.date}
948
+ { year "year" bibinfo.check duplicate$ empty$
949
+ {
950
+ }
951
+ 'skip$
952
+ if$
953
+ extra.label *
954
+ before.all 'output.state :=
955
+ after.sentence 'output.state :=
956
+ }
957
+ FUNCTION {format.btitle}
958
+ { title "title" bibinfo.check
959
+ duplicate$ empty$ 'skip$
960
+ {
961
+ emphasize
962
+ }
963
+ if$
964
+ }
965
+ FUNCTION {either.or.check}
966
+ { empty$
967
+ 'pop$
968
+ { "can't use both " swap$ * " fields in " * cite$ * warning$ }
969
+ if$
970
+ }
971
+ FUNCTION {format.bvolume}
972
+ { volume empty$
973
+ { "" }
974
+ { bbl.volume volume tie.or.space.prefix
975
+ "volume" bibinfo.check * *
976
+ series "series" bibinfo.check
977
+ duplicate$ empty$ 'pop$
978
+ { swap$ bbl.of space.word * swap$
979
+ emphasize * }
980
+ if$
981
+ "volume and number" number either.or.check
982
+ }
983
+ if$
984
+ }
985
+ FUNCTION {format.number.series}
986
+ { volume empty$
987
+ { number empty$
988
+ { series field.or.null }
989
+ { series empty$
990
+ { number "number" bibinfo.check }
991
+ { output.state mid.sentence =
992
+ { bbl.number }
993
+ { bbl.number capitalize }
994
+ if$
995
+ number tie.or.space.prefix "number" bibinfo.check * *
996
+ bbl.in space.word *
997
+ series "series" bibinfo.check *
998
+ }
999
+ if$
1000
+ }
1001
+ if$
1002
+ }
1003
+ { "" }
1004
+ if$
1005
+ }
1006
+
1007
+ FUNCTION {format.edition}
1008
+ { edition duplicate$ empty$ 'skip$
1009
+ {
1010
+ output.state mid.sentence =
1011
+ { "l" }
1012
+ { "t" }
1013
+ if$ change.case$
1014
+ "edition" bibinfo.check
1015
+ " " * bbl.edition *
1016
+ }
1017
+ if$
1018
+ }
1019
+ INTEGERS { multiresult }
1020
+ FUNCTION {multi.page.check}
1021
+ { 't :=
1022
+ #0 'multiresult :=
1023
+ { multiresult not
1024
+ t empty$ not
1025
+ and
1026
+ }
1027
+ { t #1 #1 substring$
1028
+ duplicate$ "-" =
1029
+ swap$ duplicate$ "," =
1030
+ swap$ "+" =
1031
+ or or
1032
+ { #1 'multiresult := }
1033
+ { t #2 global.max$ substring$ 't := }
1034
+ if$
1035
+ }
1036
+ while$
1037
+ multiresult
1038
+ }
1039
+ FUNCTION {format.pages}
1040
+ { pages duplicate$ empty$ 'skip$
1041
+ { duplicate$ multi.page.check
1042
+ {
1043
+ bbl.pages swap$
1044
+ n.dashify
1045
+ }
1046
+ {
1047
+ bbl.page swap$
1048
+ }
1049
+ if$
1050
+ tie.or.space.prefix
1051
+ "pages" bibinfo.check
1052
+ * *
1053
+ }
1054
+ if$
1055
+ }
1056
+ FUNCTION {format.journal.pages}
1057
+ { pages duplicate$ empty$ 'pop$
1058
+ { swap$ duplicate$ empty$
1059
+ { pop$ pop$ format.pages }
1060
+ {
1061
+ ":" *
1062
+ swap$
1063
+ n.dashify
1064
+ "pages" bibinfo.check
1065
+ *
1066
+ }
1067
+ if$
1068
+ }
1069
+ if$
1070
+ }
1071
+ FUNCTION {format.vol.num.pages}
1072
+ { volume field.or.null
1073
+ duplicate$ empty$ 'skip$
1074
+ {
1075
+ "volume" bibinfo.check
1076
+ }
1077
+ if$
1078
+ number "number" bibinfo.check duplicate$ empty$ 'skip$
1079
+ {
1080
+ swap$ duplicate$ empty$
1081
+ { "there's a number but no volume in " cite$ * warning$ }
1082
+ 'skip$
1083
+ if$
1084
+ swap$
1085
+ "(" swap$ * ")" *
1086
+ }
1087
+ if$ *
1088
+ format.journal.pages
1089
+ }
1090
+
1091
+ FUNCTION {format.chapter}
1092
+ { chapter empty$
1093
+ 'format.pages
1094
+ { type empty$
1095
+ { bbl.chapter }
1096
+ { type "l" change.case$
1097
+ "type" bibinfo.check
1098
+ }
1099
+ if$
1100
+ chapter tie.or.space.prefix
1101
+ "chapter" bibinfo.check
1102
+ * *
1103
+ }
1104
+ if$
1105
+ }
1106
+
1107
+ FUNCTION {format.chapter.pages}
1108
+ { chapter empty$
1109
+ 'format.pages
1110
+ { type empty$
1111
+ { bbl.chapter }
1112
+ { type "l" change.case$
1113
+ "type" bibinfo.check
1114
+ }
1115
+ if$
1116
+ chapter tie.or.space.prefix
1117
+ "chapter" bibinfo.check
1118
+ * *
1119
+ pages empty$
1120
+ 'skip$
1121
+ { ", " * format.pages * }
1122
+ if$
1123
+ }
1124
+ if$
1125
+ }
1126
+
1127
+ FUNCTION {format.booktitle}
1128
+ {
1129
+ booktitle "booktitle" bibinfo.check
1130
+ emphasize
1131
+ }
1132
+ FUNCTION {format.in.booktitle}
1133
+ { format.booktitle duplicate$ empty$ 'skip$
1134
+ {
1135
+ word.in swap$ *
1136
+ }
1137
+ if$
1138
+ }
1139
+ FUNCTION {format.in.ed.booktitle}
1140
+ { format.booktitle duplicate$ empty$ 'skip$
1141
+ {
1142
+ editor "editor" format.names.ed duplicate$ empty$ 'pop$
1143
+ {
1144
+ "," *
1145
+ " " *
1146
+ get.bbl.editor
1147
+ ", " *
1148
+ * swap$
1149
+ * }
1150
+ if$
1151
+ word.in swap$ *
1152
+ }
1153
+ if$
1154
+ }
1155
+ FUNCTION {format.thesis.type}
1156
+ { type duplicate$ empty$
1157
+ 'pop$
1158
+ { swap$ pop$
1159
+ "t" change.case$ "type" bibinfo.check
1160
+ }
1161
+ if$
1162
+ }
1163
+ FUNCTION {format.tr.number}
1164
+ { number "number" bibinfo.check
1165
+ type duplicate$ empty$
1166
+ { pop$ bbl.techrep }
1167
+ 'skip$
1168
+ if$
1169
+ "type" bibinfo.check
1170
+ swap$ duplicate$ empty$
1171
+ { pop$ "t" change.case$ }
1172
+ { tie.or.space.prefix * * }
1173
+ if$
1174
+ }
1175
+ FUNCTION {format.article.crossref}
1176
+ {
1177
+ word.in
1178
+ " \cite{" * crossref * "}" *
1179
+ }
1180
+ FUNCTION {format.book.crossref}
1181
+ { volume duplicate$ empty$
1182
+ { "empty volume in " cite$ * "'s crossref of " * crossref * warning$
1183
+ pop$ word.in
1184
+ }
1185
+ { bbl.volume
1186
+ capitalize
1187
+ swap$ tie.or.space.prefix "volume" bibinfo.check * * bbl.of space.word *
1188
+ }
1189
+ if$
1190
+ " \cite{" * crossref * "}" *
1191
+ }
1192
+ FUNCTION {format.incoll.inproc.crossref}
1193
+ {
1194
+ word.in
1195
+ " \cite{" * crossref * "}" *
1196
+ }
1197
+ FUNCTION {format.org.or.pub}
1198
+ { 't :=
1199
+ ""
1200
+ address empty$ t empty$ and
1201
+ 'skip$
1202
+ {
1203
+ t empty$
1204
+ { address "address" bibinfo.check *
1205
+ }
1206
+ { t *
1207
+ address empty$
1208
+ 'skip$
1209
+ { ", " * address "address" bibinfo.check * }
1210
+ if$
1211
+ }
1212
+ if$
1213
+ }
1214
+ if$
1215
+ }
1216
+ FUNCTION {format.publisher.address}
1217
+ { publisher "publisher" bibinfo.warn format.org.or.pub
1218
+ }
1219
+
1220
+ FUNCTION {format.organization.address}
1221
+ { organization "organization" bibinfo.check format.org.or.pub
1222
+ }
1223
+
1224
+ % urlbst...
1225
+ % Functions for making hypertext links.
1226
+ % In all cases, the stack has (link-text href-url)
1227
+ %
1228
+ % make 'null' specials
1229
+ FUNCTION {make.href.null}
1230
+ {
1231
+ pop$
1232
+ }
1233
+ % make hypertex specials
1234
+ FUNCTION {make.href.hypertex}
1235
+ {
1236
+ "\special {html:<a href=" quote$ *
1237
+ swap$ * quote$ * "> }" * swap$ *
1238
+ "\special {html:</a>}" *
1239
+ }
1240
+ % make hyperref specials
1241
+ FUNCTION {make.href.hyperref}
1242
+ {
1243
+ "\href {" swap$ * "} {\path{" * swap$ * "}}" *
1244
+ }
1245
+ FUNCTION {make.href}
1246
+ { hrefform #2 =
1247
+ 'make.href.hyperref % hrefform = 2
1248
+ { hrefform #1 =
1249
+ 'make.href.hypertex % hrefform = 1
1250
+ 'make.href.null % hrefform = 0 (or anything else)
1251
+ if$
1252
+ }
1253
+ if$
1254
+ }
1255
+
1256
+ % If inlinelinks is true, then format.url should be a no-op, since it's
1257
+ % (a) redundant, and (b) could end up as a link-within-a-link.
1258
+ FUNCTION {format.url}
1259
+ { inlinelinks #1 = url empty$ or
1260
+ { "" }
1261
+ { hrefform #1 =
1262
+ { % special case -- add HyperTeX specials
1263
+ urlintro "\url{" url * "}" * url make.href.hypertex * }
1264
+ { urlintro "\url{" * url * "}" * }
1265
+ if$
1266
+ }
1267
+ if$
1268
+ }
1269
+
1270
+ FUNCTION {format.eprint}
1271
+ { eprint empty$
1272
+ { "" }
1273
+ { eprintprefix eprint * eprinturl eprint * make.href }
1274
+ if$
1275
+ }
1276
+
1277
+ FUNCTION {format.doi}
1278
+ { doi empty$
1279
+ { "" }
1280
+ { doiprefix doi * doiurl doi * make.href }
1281
+ if$
1282
+ }
1283
+
1284
+ FUNCTION {format.pubmed}
1285
+ { pubmed empty$
1286
+ { "" }
1287
+ { pubmedprefix pubmed * pubmedurl pubmed * make.href }
1288
+ if$
1289
+ }
1290
+
1291
+ % Output a URL. We can't use the more normal idiom (something like
1292
+ % `format.url output'), because the `inbrackets' within
1293
+ % format.lastchecked applies to everything between calls to `output',
1294
+ % so that `format.url format.lastchecked * output' ends up with both
1295
+ % the URL and the lastchecked in brackets.
1296
+ FUNCTION {output.url}
1297
+ { url empty$
1298
+ 'skip$
1299
+ { new.block
1300
+ format.url output
1301
+ format.lastchecked output
1302
+ }
1303
+ if$
1304
+ }
1305
+
1306
+ FUNCTION {output.web.refs}
1307
+ {
1308
+ new.block
1309
+ inlinelinks
1310
+ 'skip$ % links were inline -- don't repeat them
1311
+ {
1312
+ output.url
1313
+ addeprints eprint empty$ not and
1314
+ { format.eprint output.nonnull }
1315
+ 'skip$
1316
+ if$
1317
+ adddoiresolver doi empty$ not and
1318
+ { format.doi output.nonnull }
1319
+ 'skip$
1320
+ if$
1321
+ addpubmedresolver pubmed empty$ not and
1322
+ { format.pubmed output.nonnull }
1323
+ 'skip$
1324
+ if$
1325
+ }
1326
+ if$
1327
+ }
1328
+
1329
+ % Wrapper for output.bibitem.original.
1330
+ % If the URL field is not empty, set makeinlinelink to be true,
1331
+ % so that an inline link will be started at the next opportunity
1332
+ FUNCTION {output.bibitem}
1333
+ { outside.brackets 'bracket.state :=
1334
+ output.bibitem.original
1335
+ inlinelinks url empty$ not doi empty$ not or pubmed empty$ not or eprint empty$ not or and
1336
+ { #1 'makeinlinelink := }
1337
+ { #0 'makeinlinelink := }
1338
+ if$
1339
+ }
1340
+
1341
+ % Wrapper for fin.entry.original
1342
+ FUNCTION {fin.entry}
1343
+ { output.web.refs % urlbst
1344
+ makeinlinelink % ooops, it appears we didn't have a title for inlinelink
1345
+ { possibly.setup.inlinelink % add some artificial link text here, as a fallback
1346
+ linktextstring output.nonnull }
1347
+ 'skip$
1348
+ if$
1349
+ bracket.state close.brackets = % urlbst
1350
+ { "]" * }
1351
+ 'skip$
1352
+ if$
1353
+ fin.entry.original
1354
+ }
1355
+
1356
+ % Webpage entry type.
1357
+ % Title and url fields required;
1358
+ % author, note, year, month, and lastchecked fields optional
1359
+ % See references
1360
+ % ISO 690-2 http://www.nlc-bnc.ca/iso/tc46sc9/standard/690-2e.htm
1361
+ % http://www.classroom.net/classroom/CitingNetResources.html
1362
+ % http://neal.ctstateu.edu/history/cite.html
1363
+ % http://www.cas.usf.edu/english/walker/mla.html
1364
+ % for citation formats for web pages.
1365
+ FUNCTION {webpage}
1366
+ { output.bibitem
1367
+ author empty$
1368
+ { editor empty$
1369
+ 'skip$ % author and editor both optional
1370
+ { format.editors output.nonnull }
1371
+ if$
1372
+ }
1373
+ { editor empty$
1374
+ { format.authors output.nonnull }
1375
+ { "can't use both author and editor fields in " cite$ * warning$ }
1376
+ if$
1377
+ }
1378
+ if$
1379
+ new.block
1380
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$
1381
+ format.title "title" output.check
1382
+ inbrackets onlinestring output
1383
+ new.block
1384
+ year empty$
1385
+ 'skip$
1386
+ { format.date "year" output.check }
1387
+ if$
1388
+ % We don't need to output the URL details ('lastchecked' and 'url'),
1389
+ % because fin.entry does that for us, using output.web.refs. The only
1390
+ % reason we would want to put them here is if we were to decide that
1391
+ % they should go in front of the rather miscellaneous information in 'note'.
1392
+ new.block
1393
+ note output
1394
+ fin.entry
1395
+ }
1396
+ % ...urlbst to here
1397
+
1398
+
1399
+ FUNCTION {article}
1400
+ { output.bibitem
1401
+ format.authors "author" output.check
1402
+ author format.key output
1403
+ format.date "year" output.check
1404
+ date.block
1405
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1406
+ format.title "title" output.check
1407
+ new.block
1408
+ crossref missing$
1409
+ {
1410
+ journal
1411
+ "journal" bibinfo.check
1412
+ emphasize
1413
+ "journal" output.check
1414
+ possibly.setup.inlinelink format.vol.num.pages output% urlbst
1415
+ }
1416
+ { format.article.crossref output.nonnull
1417
+ format.pages output
1418
+ }
1419
+ if$
1420
+ new.block
1421
+ format.note output
1422
+ fin.entry
1423
+ }
1424
+ FUNCTION {book}
1425
+ { output.bibitem
1426
+ author empty$
1427
+ { format.editors "author and editor" output.check
1428
+ editor format.key output
1429
+ }
1430
+ { format.authors output.nonnull
1431
+ crossref missing$
1432
+ { "author and editor" editor either.or.check }
1433
+ 'skip$
1434
+ if$
1435
+ }
1436
+ if$
1437
+ format.date "year" output.check
1438
+ date.block
1439
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1440
+ format.btitle "title" output.check
1441
+ format.edition output
1442
+ crossref missing$
1443
+ { format.bvolume output
1444
+ new.block
1445
+ format.number.series output
1446
+ new.sentence
1447
+ format.publisher.address output
1448
+ }
1449
+ {
1450
+ new.block
1451
+ format.book.crossref output.nonnull
1452
+ }
1453
+ if$
1454
+ new.block
1455
+ format.note output
1456
+ fin.entry
1457
+ }
1458
+ FUNCTION {booklet}
1459
+ { output.bibitem
1460
+ format.authors output
1461
+ author format.key output
1462
+ format.date "year" output.check
1463
+ date.block
1464
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1465
+ format.title "title" output.check
1466
+ new.block
1467
+ howpublished "howpublished" bibinfo.check output
1468
+ address "address" bibinfo.check output
1469
+ new.block
1470
+ format.note output
1471
+ fin.entry
1472
+ }
1473
+
1474
+ FUNCTION {inbook}
1475
+ { output.bibitem
1476
+ author empty$
1477
+ { format.editors "author and editor" output.check
1478
+ editor format.key output
1479
+ }
1480
+ { format.authors output.nonnull
1481
+ crossref missing$
1482
+ { "author and editor" editor either.or.check }
1483
+ 'skip$
1484
+ if$
1485
+ }
1486
+ if$
1487
+ format.date "year" output.check
1488
+ date.block
1489
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1490
+ format.btitle "title" output.check
1491
+ format.edition output
1492
+ crossref missing$
1493
+ {
1494
+ format.bvolume output
1495
+ format.number.series output
1496
+ format.chapter "chapter" output.check
1497
+ new.sentence
1498
+ format.publisher.address output
1499
+ new.block
1500
+ }
1501
+ {
1502
+ format.chapter "chapter" output.check
1503
+ new.block
1504
+ format.book.crossref output.nonnull
1505
+ }
1506
+ if$
1507
+ new.block
1508
+ format.note output
1509
+ fin.entry
1510
+ }
1511
+
1512
+ FUNCTION {incollection}
1513
+ { output.bibitem
1514
+ format.authors "author" output.check
1515
+ author format.key output
1516
+ format.date "year" output.check
1517
+ date.block
1518
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1519
+ format.title "title" output.check
1520
+ new.block
1521
+ crossref missing$
1522
+ { format.in.ed.booktitle "booktitle" output.check
1523
+ format.edition output
1524
+ format.bvolume output
1525
+ format.number.series output
1526
+ format.chapter.pages output
1527
+ new.sentence
1528
+ format.publisher.address output
1529
+ }
1530
+ { format.incoll.inproc.crossref output.nonnull
1531
+ format.chapter.pages output
1532
+ }
1533
+ if$
1534
+ new.block
1535
+ format.note output
1536
+ fin.entry
1537
+ }
1538
+ FUNCTION {inproceedings}
1539
+ { output.bibitem
1540
+ format.authors "author" output.check
1541
+ author format.key output
1542
+ format.date "year" output.check
1543
+ date.block
1544
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1545
+ format.title "title" output.check
1546
+ new.block
1547
+ crossref missing$
1548
+ { format.in.booktitle "booktitle" output.check
1549
+ format.bvolume output
1550
+ format.number.series output
1551
+ format.pages output
1552
+ address "address" bibinfo.check output
1553
+ new.sentence
1554
+ organization "organization" bibinfo.check output
1555
+ publisher "publisher" bibinfo.check output
1556
+ }
1557
+ { format.incoll.inproc.crossref output.nonnull
1558
+ format.pages output
1559
+ }
1560
+ if$
1561
+ new.block
1562
+ format.note output
1563
+ fin.entry
1564
+ }
1565
+ FUNCTION {conference} { inproceedings }
1566
+ FUNCTION {manual}
1567
+ { output.bibitem
1568
+ format.authors output
1569
+ author format.key output
1570
+ format.date "year" output.check
1571
+ date.block
1572
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1573
+ format.btitle "title" output.check
1574
+ format.edition output
1575
+ organization address new.block.checkb
1576
+ organization "organization" bibinfo.check output
1577
+ address "address" bibinfo.check output
1578
+ new.block
1579
+ format.note output
1580
+ fin.entry
1581
+ }
1582
+
1583
+ FUNCTION {mastersthesis}
1584
+ { output.bibitem
1585
+ format.authors "author" output.check
1586
+ author format.key output
1587
+ format.date "year" output.check
1588
+ date.block
1589
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1590
+ format.title
1591
+ "title" output.check
1592
+ new.block
1593
+ bbl.mthesis format.thesis.type output.nonnull
1594
+ school "school" bibinfo.warn output
1595
+ address "address" bibinfo.check output
1596
+ month "month" bibinfo.check output
1597
+ new.block
1598
+ format.note output
1599
+ fin.entry
1600
+ }
1601
+
1602
+ FUNCTION {misc}
1603
+ { output.bibitem
1604
+ format.authors output
1605
+ author format.key output
1606
+ format.date "year" output.check
1607
+ date.block
1608
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1609
+ format.title output
1610
+ new.block
1611
+ howpublished "howpublished" bibinfo.check output
1612
+ new.block
1613
+ format.note output
1614
+ fin.entry
1615
+ }
1616
+ FUNCTION {phdthesis}
1617
+ { output.bibitem
1618
+ format.authors "author" output.check
1619
+ author format.key output
1620
+ format.date "year" output.check
1621
+ date.block
1622
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1623
+ format.btitle
1624
+ "title" output.check
1625
+ new.block
1626
+ bbl.phdthesis format.thesis.type output.nonnull
1627
+ school "school" bibinfo.warn output
1628
+ address "address" bibinfo.check output
1629
+ new.block
1630
+ format.note output
1631
+ fin.entry
1632
+ }
1633
+
1634
+ FUNCTION {proceedings}
1635
+ { output.bibitem
1636
+ format.editors output
1637
+ editor format.key output
1638
+ format.date "year" output.check
1639
+ date.block
1640
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1641
+ format.btitle "title" output.check
1642
+ format.bvolume output
1643
+ format.number.series output
1644
+ new.sentence
1645
+ publisher empty$
1646
+ { format.organization.address output }
1647
+ { organization "organization" bibinfo.check output
1648
+ new.sentence
1649
+ format.publisher.address output
1650
+ }
1651
+ if$
1652
+ new.block
1653
+ format.note output
1654
+ fin.entry
1655
+ }
1656
+
1657
+ FUNCTION {techreport}
1658
+ { output.bibitem
1659
+ format.authors "author" output.check
1660
+ author format.key output
1661
+ format.date "year" output.check
1662
+ date.block
1663
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1664
+ format.title
1665
+ "title" output.check
1666
+ new.block
1667
+ format.tr.number output.nonnull
1668
+ institution "institution" bibinfo.warn output
1669
+ address "address" bibinfo.check output
1670
+ new.block
1671
+ format.note output
1672
+ fin.entry
1673
+ }
1674
+
1675
+ FUNCTION {unpublished}
1676
+ { output.bibitem
1677
+ format.authors "author" output.check
1678
+ author format.key output
1679
+ format.date "year" output.check
1680
+ date.block
1681
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
1682
+ format.title "title" output.check
1683
+ new.block
1684
+ format.note "note" output.check
1685
+ fin.entry
1686
+ }
1687
+
1688
+ FUNCTION {default.type} { misc }
1689
+ READ
1690
+ FUNCTION {sortify}
1691
+ { purify$
1692
+ "l" change.case$
1693
+ }
1694
+ INTEGERS { len }
1695
+ FUNCTION {chop.word}
1696
+ { 's :=
1697
+ 'len :=
1698
+ s #1 len substring$ =
1699
+ { s len #1 + global.max$ substring$ }
1700
+ 's
1701
+ if$
1702
+ }
1703
+ FUNCTION {format.lab.names}
1704
+ { 's :=
1705
+ "" 't :=
1706
+ s #1 "{vv~}{ll}" format.name$
1707
+ s num.names$ duplicate$
1708
+ #2 >
1709
+ { pop$
1710
+ " " * bbl.etal *
1711
+ }
1712
+ { #2 <
1713
+ 'skip$
1714
+ { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" =
1715
+ {
1716
+ " " * bbl.etal *
1717
+ }
1718
+ { bbl.and space.word * s #2 "{vv~}{ll}" format.name$
1719
+ * }
1720
+ if$
1721
+ }
1722
+ if$
1723
+ }
1724
+ if$
1725
+ }
1726
+
1727
+ FUNCTION {author.key.label}
1728
+ { author empty$
1729
+ { key empty$
1730
+ { cite$ #1 #3 substring$ }
1731
+ 'key
1732
+ if$
1733
+ }
1734
+ { author format.lab.names }
1735
+ if$
1736
+ }
1737
+
1738
+ FUNCTION {author.editor.key.label}
1739
+ { author empty$
1740
+ { editor empty$
1741
+ { key empty$
1742
+ { cite$ #1 #3 substring$ }
1743
+ 'key
1744
+ if$
1745
+ }
1746
+ { editor format.lab.names }
1747
+ if$
1748
+ }
1749
+ { author format.lab.names }
1750
+ if$
1751
+ }
1752
+
1753
+ FUNCTION {editor.key.label}
1754
+ { editor empty$
1755
+ { key empty$
1756
+ { cite$ #1 #3 substring$ }
1757
+ 'key
1758
+ if$
1759
+ }
1760
+ { editor format.lab.names }
1761
+ if$
1762
+ }
1763
+
1764
+ FUNCTION {calc.short.authors}
1765
+ { type$ "book" =
1766
+ type$ "inbook" =
1767
+ or
1768
+ 'author.editor.key.label
1769
+ { type$ "proceedings" =
1770
+ 'editor.key.label
1771
+ 'author.key.label
1772
+ if$
1773
+ }
1774
+ if$
1775
+ 'short.list :=
1776
+ }
1777
+
1778
+ FUNCTION {calc.label}
1779
+ { calc.short.authors
1780
+ short.list
1781
+ "("
1782
+ *
1783
+ year duplicate$ empty$
1784
+ short.list key field.or.null = or
1785
+ { pop$ "" }
1786
+ 'skip$
1787
+ if$
1788
+ *
1789
+ 'label :=
1790
+ }
1791
+
1792
+ FUNCTION {sort.format.names}
1793
+ { 's :=
1794
+ #1 'nameptr :=
1795
+ ""
1796
+ s num.names$ 'numnames :=
1797
+ numnames 'namesleft :=
1798
+ { namesleft #0 > }
1799
+ { s nameptr
1800
+ "{ll{ }}{ ff{ }}{ jj{ }}"
1801
+ format.name$ 't :=
1802
+ nameptr #1 >
1803
+ {
1804
+ " " *
1805
+ namesleft #1 = t "others" = and
1806
+ { "zzzzz" * }
1807
+ { t sortify * }
1808
+ if$
1809
+ }
1810
+ { t sortify * }
1811
+ if$
1812
+ nameptr #1 + 'nameptr :=
1813
+ namesleft #1 - 'namesleft :=
1814
+ }
1815
+ while$
1816
+ }
1817
+
1818
+ FUNCTION {sort.format.title}
1819
+ { 't :=
1820
+ "A " #2
1821
+ "An " #3
1822
+ "The " #4 t chop.word
1823
+ chop.word
1824
+ chop.word
1825
+ sortify
1826
+ #1 global.max$ substring$
1827
+ }
1828
+ FUNCTION {author.sort}
1829
+ { author empty$
1830
+ { key empty$
1831
+ { "to sort, need author or key in " cite$ * warning$
1832
+ ""
1833
+ }
1834
+ { key sortify }
1835
+ if$
1836
+ }
1837
+ { author sort.format.names }
1838
+ if$
1839
+ }
1840
+ FUNCTION {author.editor.sort}
1841
+ { author empty$
1842
+ { editor empty$
1843
+ { key empty$
1844
+ { "to sort, need author, editor, or key in " cite$ * warning$
1845
+ ""
1846
+ }
1847
+ { key sortify }
1848
+ if$
1849
+ }
1850
+ { editor sort.format.names }
1851
+ if$
1852
+ }
1853
+ { author sort.format.names }
1854
+ if$
1855
+ }
1856
+ FUNCTION {editor.sort}
1857
+ { editor empty$
1858
+ { key empty$
1859
+ { "to sort, need editor or key in " cite$ * warning$
1860
+ ""
1861
+ }
1862
+ { key sortify }
1863
+ if$
1864
+ }
1865
+ { editor sort.format.names }
1866
+ if$
1867
+ }
1868
+ FUNCTION {presort}
1869
+ { calc.label
1870
+ label sortify
1871
+ " "
1872
+ *
1873
+ type$ "book" =
1874
+ type$ "inbook" =
1875
+ or
1876
+ 'author.editor.sort
1877
+ { type$ "proceedings" =
1878
+ 'editor.sort
1879
+ 'author.sort
1880
+ if$
1881
+ }
1882
+ if$
1883
+ #1 entry.max$ substring$
1884
+ 'sort.label :=
1885
+ sort.label
1886
+ *
1887
+ " "
1888
+ *
1889
+ title field.or.null
1890
+ sort.format.title
1891
+ *
1892
+ #1 entry.max$ substring$
1893
+ 'sort.key$ :=
1894
+ }
1895
+
1896
+ ITERATE {presort}
1897
+ SORT
1898
+ STRINGS { last.label next.extra }
1899
+ INTEGERS { last.extra.num number.label }
1900
+ FUNCTION {initialize.extra.label.stuff}
1901
+ { #0 int.to.chr$ 'last.label :=
1902
+ "" 'next.extra :=
1903
+ #0 'last.extra.num :=
1904
+ #0 'number.label :=
1905
+ }
1906
+ FUNCTION {forward.pass}
1907
+ { last.label label =
1908
+ { last.extra.num #1 + 'last.extra.num :=
1909
+ last.extra.num int.to.chr$ 'extra.label :=
1910
+ }
1911
+ { "a" chr.to.int$ 'last.extra.num :=
1912
+ "" 'extra.label :=
1913
+ label 'last.label :=
1914
+ }
1915
+ if$
1916
+ number.label #1 + 'number.label :=
1917
+ }
1918
+ FUNCTION {reverse.pass}
1919
+ { next.extra "b" =
1920
+ { "a" 'extra.label := }
1921
+ 'skip$
1922
+ if$
1923
+ extra.label 'next.extra :=
1924
+ extra.label
1925
+ duplicate$ empty$
1926
+ 'skip$
1927
+ { year field.or.null #-1 #1 substring$ chr.to.int$ #65 <
1928
+ { "{\natexlab{" swap$ * "}}" * }
1929
+ { "{(\natexlab{" swap$ * "})}" * }
1930
+ if$ }
1931
+ if$
1932
+ 'extra.label :=
1933
+ label extra.label * 'label :=
1934
+ }
1935
+ EXECUTE {initialize.extra.label.stuff}
1936
+ ITERATE {forward.pass}
1937
+ REVERSE {reverse.pass}
1938
+ FUNCTION {bib.sort.order}
1939
+ { sort.label
1940
+ " "
1941
+ *
1942
+ year field.or.null sortify
1943
+ *
1944
+ " "
1945
+ *
1946
+ title field.or.null
1947
+ sort.format.title
1948
+ *
1949
+ #1 entry.max$ substring$
1950
+ 'sort.key$ :=
1951
+ }
1952
+ ITERATE {bib.sort.order}
1953
+ SORT
1954
+ FUNCTION {begin.bib}
1955
+ { preamble$ empty$
1956
+ 'skip$
1957
+ { preamble$ write$ newline$ }
1958
+ if$
1959
+ "\begin{thebibliography}{" number.label int.to.str$ * "}" *
1960
+ write$ newline$
1961
+ "\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi"
1962
+ write$ newline$
1963
+ }
1964
+ EXECUTE {begin.bib}
1965
+ EXECUTE {init.urlbst.variables} % urlbst
1966
+ EXECUTE {init.state.consts}
1967
+ ITERATE {call.type$}
1968
+ FUNCTION {end.bib}
1969
+ { newline$
1970
+ "\end{thebibliography}" write$ newline$
1971
+ }
1972
+ EXECUTE {end.bib}
1973
+ %% End of customized bst file
1974
+ %%
1975
+ %% End of file `compling.bst'.
references/2020.emnlp.nguyen/source/emnlp2020.sty ADDED
@@ -0,0 +1,560 @@
1
+ % This is the LaTeX style file for EMNLP 2020, based off of ACL 2020.
2
+
3
+ % Addressing bibtex issues mentioned in https://github.com/acl-org/acl-pub/issues/2
4
+ % Other major modifications include
5
+ % changing the color of the line numbers to a light gray; changing font size of abstract to be 10pt; changing caption font size to be 10pt.
6
+ % -- M Mitchell and Stephanie Lukin
7
+
8
+ % 2017: modified to support DOI links in bibliography. Now uses
9
+ % natbib package rather than defining citation commands in this file.
10
+ % Use with acl_natbib.bst bib style. -- Dan Gildea
11
+
12
+ % This is the LaTeX style for ACL 2016. It contains Margaret Mitchell's
13
+ % line number adaptations (ported by Hai Zhao and Yannick Versley).
14
+
15
+ % It is nearly identical to the style files for ACL 2015,
16
+ % ACL 2014, EACL 2006, ACL2005, ACL 2002, ACL 2001, ACL 2000,
17
+ % EACL 95 and EACL 99.
18
+ %
19
+ % Changes made include: adapt layout to A4 and centimeters, widen abstract
20
+
21
+ % This is the LaTeX style file for ACL 2000. It is nearly identical to the
22
+ % style files for EACL 95 and EACL 99. Minor changes include editing the
23
+ % instructions to reflect use of \documentclass rather than \documentstyle
24
+ % and removing the white space before the title on the first page
25
+ % -- John Chen, June 29, 2000
26
+
27
+ % This is the LaTeX style file for EACL-95. It is identical to the
28
+ % style file for ANLP '94 except that the margins are adjusted for A4
29
+ % paper. -- abney 13 Dec 94
30
+
31
+ % The ANLP '94 style file is a slightly modified
32
+ % version of the style used for AAAI and IJCAI, using some changes
33
+ % prepared by Fernando Pereira and others and some minor changes
34
+ % by Paul Jacobs.
35
+
36
+ % Papers prepared using the aclsub.sty file and acl.bst bibtex style
37
+ % should be easily converted to final format using this style.
38
+ % (1) Submission information (\wordcount, \subject, and \makeidpage)
39
+ % should be removed.
40
+ % (2) \summary should be removed. The summary material should come
41
+ % after \maketitle and should be in the ``abstract'' environment
42
+ % (between \begin{abstract} and \end{abstract}).
43
+ % (3) Check all citations. This style should handle citations correctly
44
+ % and also allows multiple citations separated by semicolons.
45
+ % (4) Check figures and examples. Because the final format is double-
46
+ % column, some adjustments may have to be made to fit text in the column
47
+ % or to choose full-width (\figure*} figures.
48
+
49
+ % Place this in a file called aclap.sty in the TeX search path.
50
+ % (Placing it in the same directory as the paper should also work.)
51
+
52
+ % Prepared by Peter F. Patel-Schneider, liberally using the ideas of
53
+ % other style hackers, including Barbara Beeton.
54
+ % This style is NOT guaranteed to work. It is provided in the hope
55
+ % that it will make the preparation of papers easier.
56
+ %
57
+ % There are undoubtably bugs in this style. If you make bug fixes,
58
+ % improvements, etc. please let me know. My e-mail address is:
59
+ % pfps@research.att.com
60
+
61
+ % Papers are to be prepared using the ``acl_natbib'' bibliography style,
62
+ % as follows:
63
+ % \documentclass[11pt]{article}
64
+ % \usepackage{acl2000}
65
+ % \title{Title}
66
+ % \author{Author 1 \and Author 2 \\ Address line \\ Address line \And
67
+ % Author 3 \\ Address line \\ Address line}
68
+ % \begin{document}
69
+ % ...
70
+ % \bibliography{bibliography-file}
71
+ % \bibliographystyle{acl_natbib}
72
+ % \end{document}
73
+
74
+ % Author information can be set in various styles:
75
+ % For several authors from the same institution:
76
+ % \author{Author 1 \and ... \and Author n \\
77
+ % Address line \\ ... \\ Address line}
78
+ % if the names do not fit well on one line use
79
+ % Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
80
+ % For authors from different institutions:
81
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
82
+ % \And ... \And
83
+ % Author n \\ Address line \\ ... \\ Address line}
84
+ % To start a seperate ``row'' of authors use \AND, as in
85
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
86
+ % \AND
87
+ % Author 2 \\ Address line \\ ... \\ Address line \And
88
+ % Author 3 \\ Address line \\ ... \\ Address line}
89
+
90
+ % If the title and author information does not fit in the area allocated,
91
+ % place \setlength\titlebox{<new height>} right after
92
+ % \usepackage{acl2015}
93
+ % where <new height> can be something larger than 5cm
94
+
95
+ % include hyperref, unless user specifies nohyperref option like this:
96
+ % \usepackage[nohyperref]{naaclhlt2018}
97
+ \newif\ifacl@hyperref
98
+ \DeclareOption{hyperref}{\acl@hyperreftrue}
99
+ \DeclareOption{nohyperref}{\acl@hyperreffalse}
100
+ \ExecuteOptions{hyperref} % default is to use hyperref
101
+ \ProcessOptions\relax
102
+ \ifacl@hyperref
103
+ \RequirePackage{hyperref}
104
+ \usepackage{xcolor} % make links dark blue
105
+ \definecolor{darkblue}{rgb}{0, 0, 0.5}
106
+ \hypersetup{colorlinks=true,citecolor=darkblue, linkcolor=darkblue, urlcolor=darkblue}
107
+ \else
108
+ % This definition is used if the hyperref package is not loaded.
109
+ % It provides a backup, no-op definiton of \href.
110
+ % This is necessary because \href command is used in the acl_natbib.bst file.
111
+ \def\href#1#2{{#2}}
112
+ % We still need to load xcolor in this case because the lighter line numbers require it. (SC/KG/WL)
113
+ \usepackage{xcolor}
114
+ \fi
115
+
116
+ \typeout{Conference Style for EMNLP 2020}
117
+
118
+ % NOTE: Some laser printers have a serious problem printing TeX output.
119
+ % These printing devices, commonly known as ``write-white'' laser
120
+ % printers, tend to make characters too light. To get around this
121
+ % problem, a darker set of fonts must be created for these devices.
122
+ %
123
+
124
+ \newcommand{\Thanks}[1]{\thanks{\ #1}}
125
+
126
+ % A4 modified by Eneko; again modified by Alexander for 5cm titlebox
127
+ \setlength{\paperwidth}{21cm} % A4
128
+ \setlength{\paperheight}{29.7cm}% A4
129
+ \setlength\topmargin{-0.5cm}
130
+ \setlength\oddsidemargin{0cm}
131
+ \setlength\textheight{24.7cm}
132
+ \setlength\textwidth{16.0cm}
133
+ \setlength\columnsep{0.6cm}
134
+ \newlength\titlebox
135
+ \setlength\titlebox{5cm}
136
+ \setlength\headheight{5pt}
137
+ \setlength\headsep{0pt}
138
+ \thispagestyle{empty}
139
+ \pagestyle{empty}
140
+
141
+
142
+ \flushbottom \twocolumn \sloppy
143
+
144
+ % We're never going to need a table of contents, so just flush it to
145
+ % save space --- suggested by drstrip@sandia-2
146
+ \def\addcontentsline#1#2#3{}
147
+
148
+ \newif\ifaclfinal
149
+ \aclfinalfalse
150
+ \def\aclfinalcopy{\global\aclfinaltrue}
151
+
152
+ %% ----- Set up hooks to repeat content on every page of the output doc,
153
+ %% necessary for the line numbers in the submitted version. --MM
154
+ %%
155
+ %% Copied from CVPR 2015's cvpr_eso.sty, which appears to be largely copied from everyshi.sty.
156
+ %%
157
+ %% Original cvpr_eso.sty available at: http://www.pamitc.org/cvpr15/author_guidelines.php
158
+ %% Original evershi.sty available at: https://www.ctan.org/pkg/everyshi
159
+ %%
160
+ %% Copyright (C) 2001 Martin Schr\"oder:
161
+ %%
162
+ %% Martin Schr"oder
163
+ %% Cr"usemannallee 3
164
+ %% D-28213 Bremen
165
+ %% Martin.Schroeder@ACM.org
166
+ %%
167
+ %% This program may be redistributed and/or modified under the terms
168
+ %% of the LaTeX Project Public License, either version 1.0 of this
169
+ %% license, or (at your option) any later version.
170
+ %% The latest version of this license is in
171
+ %% CTAN:macros/latex/base/lppl.txt.
172
+ %%
173
+ %% Happy users are requested to send [Martin] a postcard. :-)
174
+ %%
175
+ \newcommand{\@EveryShipoutACL@Hook}{}
176
+ \newcommand{\@EveryShipoutACL@AtNextHook}{}
177
+ \newcommand*{\EveryShipoutACL}[1]
178
+ {\g@addto@macro\@EveryShipoutACL@Hook{#1}}
179
+ \newcommand*{\AtNextShipoutACL@}[1]
180
+ {\g@addto@macro\@EveryShipoutACL@AtNextHook{#1}}
181
+ \newcommand{\@EveryShipoutACL@Shipout}{%
182
+ \afterassignment\@EveryShipoutACL@Test
183
+ \global\setbox\@cclv= %
184
+ }
185
+ \newcommand{\@EveryShipoutACL@Test}{%
186
+ \ifvoid\@cclv\relax
187
+ \aftergroup\@EveryShipoutACL@Output
188
+ \else
189
+ \@EveryShipoutACL@Output
190
+ \fi%
191
+ }
192
+ \newcommand{\@EveryShipoutACL@Output}{%
193
+ \@EveryShipoutACL@Hook%
194
+ \@EveryShipoutACL@AtNextHook%
195
+ \gdef\@EveryShipoutACL@AtNextHook{}%
196
+ \@EveryShipoutACL@Org@Shipout\box\@cclv%
197
+ }
198
+ \newcommand{\@EveryShipoutACL@Org@Shipout}{}
199
+ \newcommand*{\@EveryShipoutACL@Init}{%
200
+ \message{ABD: EveryShipout initializing macros}%
201
+ \let\@EveryShipoutACL@Org@Shipout\shipout
202
+ \let\shipout\@EveryShipoutACL@Shipout
203
+ }
204
+ \AtBeginDocument{\@EveryShipoutACL@Init}
205
+
206
+ %% ----- Set up for placing additional items into the submitted version --MM
207
+ %%
208
+ %% Based on eso-pic.sty
209
+ %%
210
+ %% Original available at: https://www.ctan.org/tex-archive/macros/latex/contrib/eso-pic
211
+ %% Copyright (C) 1998-2002 by Rolf Niepraschk <niepraschk@ptb.de>
212
+ %%
213
+ %% Which may be distributed and/or modified under the conditions of
214
+ %% the LaTeX Project Public License, either version 1.2 of this license
215
+ %% or (at your option) any later version. The latest version of this
216
+ %% license is in:
217
+ %%
218
+ %% http://www.latex-project.org/lppl.txt
219
+ %%
220
+ %% and version 1.2 or later is part of all distributions of LaTeX version
221
+ %% 1999/12/01 or later.
222
+ %%
223
+ %% In contrast to the original, we do not include the definitions for/using:
224
+ %% gridpicture, div[2], isMEMOIR[1], gridSetup[6][], subgridstyle{dotted}, labelfactor{}, gap{}, gridunitname{}, gridunit{}, gridlines{\thinlines}, subgridlines{\thinlines}, the {keyval} package, evenside margin, nor any definitions with 'color'.
225
+ %%
226
+ %% These are beyond what is needed for the NAACL/ACL style.
227
+ %%
228
+ \newcommand\LenToUnit[1]{#1\@gobble}
229
+ \newcommand\AtPageUpperLeft[1]{%
230
+ \begingroup
231
+ \@tempdima=0pt\relax\@tempdimb=\ESO@yoffsetI\relax
232
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
233
+ \endgroup
234
+ }
235
+ \newcommand\AtPageLowerLeft[1]{\AtPageUpperLeft{%
236
+ \put(0,\LenToUnit{-\paperheight}){#1}}}
237
+ \newcommand\AtPageCenter[1]{\AtPageUpperLeft{%
238
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.5\paperheight}){#1}}}
239
+ \newcommand\AtPageLowerCenter[1]{\AtPageUpperLeft{%
240
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-\paperheight}){#1}}}%
241
+ \newcommand\AtPageLowishCenter[1]{\AtPageUpperLeft{%
242
+ \put(\LenToUnit{.5\paperwidth},\LenToUnit{-.96\paperheight}){#1}}}
243
+ \newcommand\AtTextUpperLeft[1]{%
244
+ \begingroup
245
+ \setlength\@tempdima{1in}%
246
+ \advance\@tempdima\oddsidemargin%
247
+ \@tempdimb=\ESO@yoffsetI\relax\advance\@tempdimb-1in\relax%
248
+ \advance\@tempdimb-\topmargin%
249
+ \advance\@tempdimb-\headheight\advance\@tempdimb-\headsep%
250
+ \put(\LenToUnit{\@tempdima},\LenToUnit{\@tempdimb}){#1}%
251
+ \endgroup
252
+ }
253
+ \newcommand\AtTextLowerLeft[1]{\AtTextUpperLeft{%
254
+ \put(0,\LenToUnit{-\textheight}){#1}}}
255
+ \newcommand\AtTextCenter[1]{\AtTextUpperLeft{%
256
+ \put(\LenToUnit{.5\textwidth},\LenToUnit{-.5\textheight}){#1}}}
257
+ \newcommand{\ESO@HookI}{} \newcommand{\ESO@HookII}{}
258
+ \newcommand{\ESO@HookIII}{}
259
+ \newcommand{\AddToShipoutPicture}{%
260
+ \@ifstar{\g@addto@macro\ESO@HookII}{\g@addto@macro\ESO@HookI}}
261
+ \newcommand{\ClearShipoutPicture}{\global\let\ESO@HookI\@empty}
262
+ \newcommand{\@ShipoutPicture}{%
263
+ \bgroup
264
+ \@tempswafalse%
265
+ \ifx\ESO@HookI\@empty\else\@tempswatrue\fi%
266
+ \ifx\ESO@HookII\@empty\else\@tempswatrue\fi%
267
+ \ifx\ESO@HookIII\@empty\else\@tempswatrue\fi%
268
+ \if@tempswa%
269
+ \@tempdima=1in\@tempdimb=-\@tempdima%
270
+ \advance\@tempdimb\ESO@yoffsetI%
271
+ \unitlength=1pt%
272
+ \global\setbox\@cclv\vbox{%
273
+ \vbox{\let\protect\relax
274
+ \pictur@(0,0)(\strip@pt\@tempdima,\strip@pt\@tempdimb)%
275
+ \ESO@HookIII\ESO@HookI\ESO@HookII%
276
+ \global\let\ESO@HookII\@empty%
277
+ \endpicture}%
278
+ \nointerlineskip%
279
+ \box\@cclv}%
280
+ \fi
281
+ \egroup
282
+ }
283
+ \EveryShipoutACL{\@ShipoutPicture}
284
+ \newif\ifESO@dvips\ESO@dvipsfalse
285
+ \newif\ifESO@grid\ESO@gridfalse
286
+ \newif\ifESO@texcoord\ESO@texcoordfalse
287
+ \newcommand*\ESO@griddelta{}\newcommand*\ESO@griddeltaY{}
288
+ \newcommand*\ESO@gridDelta{}\newcommand*\ESO@gridDeltaY{}
289
+ \newcommand*\ESO@yoffsetI{}\newcommand*\ESO@yoffsetII{}
290
+ \ifESO@texcoord
291
+ \def\ESO@yoffsetI{0pt}\def\ESO@yoffsetII{-\paperheight}
292
+ \edef\ESO@griddeltaY{-\ESO@griddelta}\edef\ESO@gridDeltaY{-\ESO@gridDelta}
293
+ \else
294
+ \def\ESO@yoffsetI{\paperheight}\def\ESO@yoffsetII{0pt}
295
+ \edef\ESO@griddeltaY{\ESO@griddelta}\edef\ESO@gridDeltaY{\ESO@gridDelta}
296
+ \fi
297
+
298
+
299
+ %% ----- Submitted version markup: Page numbers, ruler, and confidentiality. Using ideas/code from cvpr.sty 2015. --MM
300
+
301
+ \font\aclhv = phvb at 8pt
302
+
303
+ %% Define vruler %%
304
+
305
+ %\makeatletter
306
+ \newbox\aclrulerbox
307
+ \newcount\aclrulercount
308
+ \newdimen\aclruleroffset
309
+ \newdimen\cv@lineheight
310
+ \newdimen\cv@boxheight
311
+ \newbox\cv@tmpbox
312
+ \newcount\cv@refno
313
+ \newcount\cv@tot
314
+ % NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
315
+ \newcount\cv@tmpc@ \newcount\cv@tmpc
316
+ \def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
317
+ \cv@tmpc=1 %
318
+ \loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
319
+ \ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
320
+ \ifnum#2<0\advance\cv@tmpc1\relax-\fi
321
+ \loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
322
+ \cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
323
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
324
+ \def\makevruler[#1][#2][#3][#4][#5]{\begingroup\offinterlineskip
325
+ \textheight=#5\vbadness=10000\vfuzz=120ex\overfullrule=0pt%
326
+ \global\setbox\aclrulerbox=\vbox to \textheight{%
327
+ {\parskip=0pt\hfuzz=150em\cv@boxheight=\textheight
328
+ \color{gray}
329
+ \cv@lineheight=#1\global\aclrulercount=#2%
330
+ \cv@tot\cv@boxheight\divide\cv@tot\cv@lineheight\advance\cv@tot2%
331
+ \cv@refno1\vskip-\cv@lineheight\vskip1ex%
332
+ \loop\setbox\cv@tmpbox=\hbox to0cm{{\aclhv\hfil\fillzeros[#4]\aclrulercount}}%
333
+ \ht\cv@tmpbox\cv@lineheight\dp\cv@tmpbox0pt\box\cv@tmpbox\break
334
+ \advance\cv@refno1\global\advance\aclrulercount#3\relax
335
+ \ifnum\cv@refno<\cv@tot\repeat}}\endgroup}%
336
+ %\makeatother
337
+
338
+
339
+ \def\aclpaperid{***}
340
+ \def\confidential{\textcolor{black}{EMNLP 2020 Submission~\aclpaperid. Confidential Review Copy. DO NOT DISTRIBUTE.}}
341
+
342
+ %% Page numbering, Vruler and Confidentiality %%
343
+ % \makevruler[<SCALE>][<INITIAL_COUNT>][<STEP>][<DIGITS>][<HEIGHT>]
344
+
345
+ % SC/KG/WL - changed line numbering to gainsboro
346
+ \definecolor{gainsboro}{rgb}{0.8, 0.8, 0.8}
347
+ %\def\aclruler#1{\makevruler[14.17pt][#1][1][3][\textheight]\usebox{\aclrulerbox}} %% old line
348
+ \def\aclruler#1{\textcolor{gainsboro}{\makevruler[14.17pt][#1][1][3][\textheight]\usebox{\aclrulerbox}}}
349
+
350
+ \def\leftoffset{-2.1cm} %original: -45pt
351
+ \def\rightoffset{17.5cm} %original: 500pt
352
+ \ifaclfinal\else\pagenumbering{arabic}
353
+ \AddToShipoutPicture{%
354
+ \ifaclfinal\else
355
+ \AtPageLowishCenter{\textcolor{black}{\thepage}}
356
+ \aclruleroffset=\textheight
357
+ \advance\aclruleroffset4pt
358
+ \AtTextUpperLeft{%
359
+ \put(\LenToUnit{\leftoffset},\LenToUnit{-\aclruleroffset}){%left ruler
360
+ \aclruler{\aclrulercount}}
361
+ \put(\LenToUnit{\rightoffset},\LenToUnit{-\aclruleroffset}){%right ruler
362
+ \aclruler{\aclrulercount}}
363
+ }
364
+ \AtTextUpperLeft{%confidential
365
+ \put(0,\LenToUnit{1cm}){\parbox{\textwidth}{\centering\aclhv\confidential}}
366
+ }
367
+ \fi
368
+ }
369
+
370
+ %%%% ----- End settings for placing additional items into the submitted version --MM ----- %%%%
371
+
372
+ %%%% ----- Begin settings for both submitted and camera-ready version ----- %%%%
373
+
374
+ %% Title and Authors %%
375
+
376
+ \newcommand\outauthor{
377
+ \begin{tabular}[t]{c}
378
+ \ifaclfinal
379
+ \bf\@author
380
+ \else
381
+ % Avoiding common accidental de-anonymization issue. --MM
382
+ \bf Anonymous EMNLP submission
383
+ \fi
384
+ \end{tabular}}
385
+
386
+ % Changing the expanded titlebox for submissions to 2.5 in (rather than 6.5cm)
387
+ % and moving it to the style sheet, rather than within the example tex file. --MM
388
+ \ifaclfinal
389
+ \else
390
+ \addtolength\titlebox{.25in}
391
+ \fi
392
+ % Mostly taken from deproc.
393
+ \def\maketitle{\par
394
+ \begingroup
395
+ \def\thefootnote{\fnsymbol{footnote}}
396
+ \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}}
397
+ \twocolumn[\@maketitle] \@thanks
398
+ \endgroup
399
+ \setcounter{footnote}{0}
400
+ \let\maketitle\relax \let\@maketitle\relax
401
+ \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax}
402
+ \def\@maketitle{\vbox to \titlebox{\hsize\textwidth
403
+ \linewidth\hsize \vskip 0.125in minus 0.125in \centering
404
+ {\Large\bf \@title \par} \vskip 0.2in plus 1fil minus 0.1in
405
+ {\def\and{\unskip\enspace{\rm and}\enspace}%
406
+ \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil
407
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}%
408
+ \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup
409
+ \vskip 0.25in plus 1fil minus 0.125in
410
+ \hbox to \linewidth\bgroup\large \hfil\hfil
411
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}
412
+ \hbox to \linewidth\bgroup\large \hfil\hfil
413
+ \hbox to 0pt\bgroup\hss
414
+ \outauthor
415
+ \hss\egroup
416
+ \hfil\hfil\egroup}
417
+ \vskip 0.3in plus 2fil minus 0.1in
418
+ }}
419
+
420
+ % margins and font size for abstract
421
+ \renewenvironment{abstract}%
422
+ {\centerline{\large\bf Abstract}%
423
+ \begin{list}{}%
424
+ {\setlength{\rightmargin}{0.6cm}%
425
+ \setlength{\leftmargin}{0.6cm}}%
426
+ \item[]\ignorespaces%
427
+ \@setsize\normalsize{12pt}\xpt\@xpt
428
+ }%
429
+ {\unskip\end{list}}
430
+
431
+ %\renewenvironment{abstract}{\centerline{\large\bf
432
+ % Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex}
433
+
434
+ % Resizing figure and table captions - SL
435
+ \newcommand{\figcapfont}{\rm}
436
+ \newcommand{\tabcapfont}{\rm}
437
+ \renewcommand{\fnum@figure}{\figcapfont Figure \thefigure}
438
+ \renewcommand{\fnum@table}{\tabcapfont Table \thetable}
439
+ \renewcommand{\figcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
440
+ \renewcommand{\tabcapfont}{\@setsize\normalsize{12pt}\xpt\@xpt}
441
+ % Support for interacting with the caption, subfigure, and subcaption packages - SL
442
+ \usepackage{caption}
443
+ \DeclareCaptionFont{10pt}{\fontsize{10pt}{12pt}\selectfont}
444
+ \captionsetup{font=10pt}
445
+
446
+ \RequirePackage{natbib}
447
+ % for citation commands in the .tex, authors can use:
448
+ % \citep, \citet, and \citeyearpar for compatibility with natbib, or
449
+ % \cite, \newcite, and \shortcite for compatibility with older ACL .sty files
450
+ \renewcommand\cite{\citep} % to get "(Author Year)" with natbib
451
+ \newcommand\shortcite{\citeyearpar}% to get "(Year)" with natbib
452
+ \newcommand\newcite{\citet} % to get "Author (Year)" with natbib
453
+
454
+ % DK/IV: Workaround for annoying hyperref pagewrap bug
455
+ %\RequirePackage{etoolbox}
456
+ %\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{\errmessage{\noexpand patch failed}}
457
+
458
+ % bibliography
459
+
460
+ \def\@up#1{\raise.2ex\hbox{#1}}
461
+
462
+ % Don't put a label in the bibliography at all. Just use the unlabeled format
463
+ % instead.
464
+ \def\thebibliography#1{\vskip\parskip%
465
+ \vskip\baselineskip%
466
+ \def\baselinestretch{1}%
467
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
468
+ \vskip-\parskip%
469
+ \vskip-\baselineskip%
470
+ \section*{References\@mkboth
471
+ {References}{References}}\list
472
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
473
+ \setlength{\itemindent}{-\parindent}}
474
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
475
+ \sloppy\clubpenalty4000\widowpenalty4000
476
+ \sfcode`\.=1000\relax}
477
+ \let\endthebibliography=\endlist
478
+
479
+
480
+ % Allow for a bibliography of sources of attested examples
481
+ \def\thesourcebibliography#1{\vskip\parskip%
482
+ \vskip\baselineskip%
483
+ \def\baselinestretch{1}%
484
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
485
+ \vskip-\parskip%
486
+ \vskip-\baselineskip%
487
+ \section*{Sources of Attested Examples\@mkboth
488
+ {Sources of Attested Examples}{Sources of Attested Examples}}\list
489
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
490
+ \setlength{\itemindent}{-\parindent}}
491
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
492
+ \sloppy\clubpenalty4000\widowpenalty4000
493
+ \sfcode`\.=1000\relax}
494
+ \let\endthesourcebibliography=\endlist
495
+
496
+ % sections with less space
497
+ \def\section{\@startsection {section}{1}{\z@}{-2.0ex plus
498
+ -0.5ex minus -.2ex}{1.5ex plus 0.3ex minus .2ex}{\large\bf\raggedright}}
499
+ \def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus
500
+ -0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\bf\raggedright}}
501
+ %% changed by KO to - values to get teh initial parindent right
502
+ \def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex plus
503
+ -0.5ex minus -.2ex}{0.5ex plus .2ex}{\normalsize\bf\raggedright}}
504
+ \def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
505
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
506
+ \def\subparagraph{\@startsection{subparagraph}{5}{\parindent}{1.5ex plus
507
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
508
+
509
+ % Footnotes
510
+ \footnotesep 6.65pt %
511
+ \skip\footins 9pt plus 4pt minus 2pt
512
+ \def\footnoterule{\kern-3pt \hrule width 5pc \kern 2.6pt }
513
+ \setcounter{footnote}{0}
514
+
515
+ % Lists and paragraphs
516
+ \parindent 1em
517
+ \topsep 4pt plus 1pt minus 2pt
518
+ \partopsep 1pt plus 0.5pt minus 0.5pt
519
+ \itemsep 2pt plus 1pt minus 0.5pt
520
+ \parsep 2pt plus 1pt minus 0.5pt
521
+
522
+ \leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
523
+ \leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em \leftmarginvi .5em
524
+ \labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
525
+
526
+ \def\@listi{\leftmargin\leftmargini}
527
+ \def\@listii{\leftmargin\leftmarginii
528
+ \labelwidth\leftmarginii\advance\labelwidth-\labelsep
529
+ \topsep 2pt plus 1pt minus 0.5pt
530
+ \parsep 1pt plus 0.5pt minus 0.5pt
531
+ \itemsep \parsep}
532
+ \def\@listiii{\leftmargin\leftmarginiii
533
+ \labelwidth\leftmarginiii\advance\labelwidth-\labelsep
534
+ \topsep 1pt plus 0.5pt minus 0.5pt
535
+ \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
536
+ \itemsep \topsep}
537
+ \def\@listiv{\leftmargin\leftmarginiv
538
+ \labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
539
+ \def\@listv{\leftmargin\leftmarginv
540
+ \labelwidth\leftmarginv\advance\labelwidth-\labelsep}
541
+ \def\@listvi{\leftmargin\leftmarginvi
542
+ \labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
543
+
544
+ \abovedisplayskip 7pt plus2pt minus5pt%
545
+ \belowdisplayskip \abovedisplayskip
546
+ \abovedisplayshortskip 0pt plus3pt%
547
+ \belowdisplayshortskip 4pt plus3pt minus3pt%
548
+
549
+ % Less leading in most fonts (due to the narrow columns)
550
+ % The choices were between 1-pt and 1.5-pt leading
551
+ \def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
552
+ \def\small{\@setsize\small{10pt}\ixpt\@ixpt}
553
+ \def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
554
+ \def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
555
+ \def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
556
+ \def\large{\@setsize\large{14pt}\xiipt\@xiipt}
557
+ \def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
558
+ \def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
559
+ \def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
560
+ \def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
references/2020.emnlp.nguyen/source/emnlp2020_PhoBERT.bbl ADDED
@@ -0,0 +1,227 @@
1
+ \begin{thebibliography}{34}
2
+ \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
3
+
4
+ \bibitem[{Artetxe and Schwenk(2019)}]{ArtetxeS19}
5
+ Mikel Artetxe and Holger Schwenk. 2019.
6
+ \newblock {Massively Multilingual Sentence Embeddings for Zero-Shot
7
+ Cross-Lingual Transfer and Beyond}.
8
+ \newblock \emph{{TACL}}, 7:597--610.
9
+
10
+ \bibitem[{Conneau et~al.(2020)Conneau, Khandelwal, Goyal, Chaudhary, Wenzek,
11
+ Guzm{\'a}n, Grave, Ott, Zettlemoyer, and Stoyanov}]{conneau2019unsupervised}
12
+ Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume
13
+ Wenzek, Francisco Guzm{\'a}n, Edouard Grave, Myle Ott, Luke Zettlemoyer, and
14
+ Veselin Stoyanov. 2020.
15
+ \newblock \href {https://arxiv.org/pdf/1911.02116v1.pdf} {{Unsupervised
16
+ Cross-lingual Representation Learning at Scale}}.
17
+ \newblock In \emph{Proceedings of ACL}, pages 8440--8451.
18
+
19
+ \bibitem[{Conneau and Lample(2019)}]{NIPS2019_8928}
20
+ Alexis Conneau and Guillaume Lample. 2019.
21
+ \newblock {Cross-lingual Language Model Pretraining}.
22
+ \newblock In \emph{Proceedings of NeurIPS}, pages 7059--7069.
23
+
24
+ \bibitem[{Conneau et~al.(2018)Conneau, Rinott, Lample, Schwenk, Stoyanov,
25
+ Williams, and Bowman}]{conneau-etal-2018-xnli}
26
+ Alexis Conneau, Ruty Rinott, Guillaume Lample, Holger Schwenk, Ves Stoyanov,
27
+ Adina Williams, and Samuel~R. Bowman. 2018.
28
+ \newblock {XNLI}: Evaluating cross-lingual sentence representations.
29
+ \newblock In \emph{Proceedings of EMNLP}, pages 2475--2485.
30
+
31
+ \bibitem[{Cui et~al.(2019)Cui, Che, Liu, Qin, Yang, Wang, and
32
+ Hu}]{abs-1906-08101}
33
+ Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and
34
+ Guoping Hu. 2019.
35
+ \newblock {Pre-Training with Whole Word Masking for Chinese BERT}.
36
+ \newblock \emph{arXiv preprint}, arXiv:1906.08101.
37
+
38
+ \bibitem[{Devlin et~al.(2019)Devlin, Chang, Lee, and
39
+ Toutanova}]{devlin-etal-2019-bert}
40
+ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.
41
+ \newblock {BERT}: Pre-training of deep bidirectional transformers for language
42
+ understanding.
43
+ \newblock In \emph{Proceedings of NAACL}, pages 4171--4186.
44
+
45
+ \bibitem[{Dozat and Manning(2017)}]{DozatM17}
46
+ Timothy Dozat and Christopher~D. Manning. 2017.
47
+ \newblock {Deep Biaffine Attention for Neural Dependency Parsing}.
48
+ \newblock In \emph{Proceedings of ICLR}.
49
+
50
+ \bibitem[{Hewitt and Manning(2019)}]{hewitt-manning-2019-structural}
51
+ John Hewitt and Christopher~D. Manning. 2019.
52
+ \newblock {A} structural probe for finding syntax in word representations.
53
+ \newblock In \emph{Proceedings of NAACL}, pages 4129--4138.
54
+
55
+ \bibitem[{Jawahar et~al.(2019)Jawahar, Sagot, and
56
+ Seddah}]{jawahar-etal-2019-bert}
57
+ Ganesh Jawahar, Beno{\^\i}t Sagot, and Djam{\'e} Seddah. 2019.
58
+ \newblock What does {BERT} learn about the structure of language?
59
+ \newblock In \emph{Proceedings of ACL}, pages 3651--3657.
60
+
61
+ \bibitem[{Kingma and Ba(2014)}]{KingmaB14}
62
+ Diederik~P. Kingma and Jimmy Ba. 2014.
63
+ \newblock {Adam: {A} Method for Stochastic Optimization}.
64
+ \newblock \emph{arXiv preprint}, arXiv:1412.6980.
65
+
66
+ \bibitem[{Kudo and Richardson(2018)}]{kudo-richardson-2018-sentencepiece}
67
+ Taku Kudo and John Richardson. 2018.
68
+ \newblock {{S}entence{P}iece: A simple and language independent subword
69
+ tokenizer and detokenizer for Neural Text Processing}.
70
+ \newblock In \emph{Proceedings of EMNLP: System Demonstrations}, pages 66--71.
71
+
72
+ \bibitem[{Liu et~al.(2019)Liu, Ott, Goyal, Du, Joshi, Chen, Levy, Lewis,
73
+ Zettlemoyer, and Stoyanov}]{RoBERTa}
74
+ Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
75
+ Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019.
76
+ \newblock {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach}.
77
+ \newblock \emph{arXiv preprint}, arXiv:1907.11692.
78
+
79
+ \bibitem[{Loshchilov and Hutter(2019)}]{loshchilov2018decoupled}
80
+ Ilya Loshchilov and Frank Hutter. 2019.
81
+ \newblock {Decoupled Weight Decay Regularization}.
82
+ \newblock In \emph{Proceedings of ICLR}.
83
+
84
+ \bibitem[{Ma and Hovy(2016)}]{ma-hovy-2016-end}
85
+ Xuezhe Ma and Eduard Hovy. 2016.
86
+ \newblock End-to-end sequence labeling via bi-directional {LSTM}-{CNN}s-{CRF}.
87
+ \newblock In \emph{Proceedings of ACL}, pages 1064--1074.
88
+
89
+ \bibitem[{Ma et~al.(2018)Ma, Hu, Liu, Peng, Neubig, and
90
+ Hovy}]{ma-etal-2018-stack}
91
+ Xuezhe Ma, Zecong Hu, Jingzhou Liu, Nanyun Peng, Graham Neubig, and Eduard
92
+ Hovy. 2018.
93
+ \newblock {Stack-Pointer Networks for Dependency Parsing}.
94
+ \newblock In \emph{Proceedings of ACL}, pages 1403--1414.
95
+
96
+ \bibitem[{{Martin} et~al.(2020){Martin}, {Muller}, {Ortiz Su{\'a}rez},
97
+ {Dupont}, {Romary}, {Villemonte de la Clergerie}, {Seddah}, and
98
+ {Sagot}}]{2019arXiv191103894M}
99
+ Louis {Martin}, Benjamin {Muller}, Pedro~Javier {Ortiz Su{\'a}rez}, Yoann
100
+ {Dupont}, Laurent {Romary}, {\'E}ric {Villemonte de la Clergerie}, Djam{\'e}
101
+ {Seddah}, and Beno{\^\i}t {Sagot}. 2020.
102
+ \newblock {CamemBERT: a Tasty French Language Model}.
103
+ \newblock In \emph{Proceedings of ACL}, pages 7203--7219.
104
+
105
+ \bibitem[{Nguyen(2019)}]{nguyen-2019-neural}
106
+ Dat~Quoc Nguyen. 2019.
107
+ \newblock A neural joint model for {V}ietnamese word segmentation, {POS}
108
+ tagging and dependency parsing.
109
+ \newblock In \emph{Proceedings of ALTA}, pages 28--34.
110
+
111
+ \bibitem[{Nguyen et~al.(2014{\natexlab{a}})Nguyen, Nguyen, Pham, and
112
+ Pham}]{nguyen-etal-2014-rdrpostagger}
113
+ Dat~Quoc Nguyen, Dai~Quoc Nguyen, Dang~Duc Pham, and Son~Bao Pham.
114
+ 2014{\natexlab{a}}.
115
+ \newblock {RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger}.
116
+ \newblock In \emph{Proceedings of the Demonstrations at EACL}, pages 17--20.
117
+
118
+ \bibitem[{Nguyen et~al.(2014{\natexlab{b}})Nguyen, Nguyen, Pham, Nguyen, and
119
+ Nguyen}]{Nguyen2014NLDB}
120
+ Dat~Quoc Nguyen, Dai~Quoc Nguyen, Son~Bao Pham, Phuong-Thai Nguyen, and Minh~Le
121
+ Nguyen. 2014{\natexlab{b}}.
122
+ \newblock {From Treebank Conversion to Automatic Dependency Parsing for
123
+ Vietnamese}.
124
+ \newblock In \emph{{Proceedings of NLDB}}, pages 196--207.
125
+
126
+ \bibitem[{Nguyen et~al.(2018)Nguyen, Nguyen, Vu, Dras, and
+ Johnson}]{nguyen-etal-2018-fast}
+ Dat~Quoc Nguyen, Dai~Quoc Nguyen, Thanh Vu, Mark Dras, and Mark Johnson. 2018.
+ \newblock {A Fast and Accurate Vietnamese Word Segmenter}.
+ \newblock In \emph{Proceedings of LREC}, pages 2582--2587.
+
+ \bibitem[{Nguyen and Verspoor(2018)}]{nguyen-verspoor-2018-improved}
+ Dat~Quoc Nguyen and Karin Verspoor. 2018.
+ \newblock An improved neural network model for joint {POS} tagging and
+ dependency parsing.
+ \newblock In \emph{Proceedings of the {C}o{NLL} 2018 Shared Task}, pages
+ 81--91.
+
+ \bibitem[{Nguyen et~al.(2017)Nguyen, Vu, Nguyen, Dras, and
+ Johnson}]{nguyen-etal-2017-word}
+ Dat~Quoc Nguyen, Thanh Vu, Dai~Quoc Nguyen, Mark Dras, and Mark Johnson. 2017.
+ \newblock From word segmentation to {POS} tagging for {V}ietnamese.
+ \newblock In \emph{Proceedings of ALTA}, pages 108--113.
+
+ \bibitem[{Nguyen et~al.(2019{\natexlab{a}})Nguyen, Ngo, Vu, Tran, and
+ Nguyen}]{JCC13161}
+ Huyen Nguyen, Quyen Ngo, Luong Vu, Vu~Tran, and Hien Nguyen.
+ 2019{\natexlab{a}}.
+ \newblock {VLSP Shared Task: Named Entity Recognition}.
+ \newblock \emph{Journal of Computer Science and Cybernetics}, 34(4):283--294.
+
+ \bibitem[{Nguyen et~al.(2019{\natexlab{b}})Nguyen, Dong, and Nguyen}]{8713740}
+ Kim~Anh Nguyen, Ngan Dong, and Cam-Tu Nguyen. 2019{\natexlab{b}}.
+ \newblock {Attentive Neural Network for Named Entity Recognition in
+ Vietnamese}.
+ \newblock In \emph{Proceedings of RIVF}.
+
+ \bibitem[{Ott et~al.(2019)Ott, Edunov, Baevski, Fan, Gross, Ng, Grangier, and
+ Auli}]{ott2019fairseq}
+ Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng,
+ David Grangier, and Michael Auli. 2019.
+ \newblock {fairseq: A Fast, Extensible Toolkit for Sequence Modeling}.
+ \newblock In \emph{Proceedings of NAACL-HLT 2019: Demonstrations}, pages
+ 48--53.
+
+ \bibitem[{Sennrich et~al.(2016)Sennrich, Haddow, and
+ Birch}]{sennrich-etal-2016-neural}
+ Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016.
+ \newblock {Neural Machine Translation of Rare Words with Subword Units}.
+ \newblock In \emph{Proceedings of ACL}, pages 1715--1725.
+
+ \bibitem[{Thang et~al.(2008)Thang, Phuong, Huyen, Tu, Rossignol, and
+ Luong}]{DinhQuangThang2008}
+ Dinh~Quang Thang, Le~Hong Phuong, Nguyen Thi~Minh Huyen, Nguyen~Cam Tu, Mathias
+ Rossignol, and Vu~Xuan Luong. 2008.
+ \newblock {Word segmentation of Vietnamese texts: a comparison of approaches}.
+ \newblock In \emph{Proceedings of LREC}, pages 1933--1936.
+
+ \bibitem[{Vaswani et~al.(2017)Vaswani, Shazeer, Parmar, Uszkoreit, Jones,
+ Gomez, Kaiser, and Polosukhin}]{NIPS2017_7181}
+ Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
+ Aidan~N Gomez, {\L}ukasz Kaiser, and Illia Polosukhin. 2017.
+ \newblock {Attention is All you Need}.
+ \newblock In \emph{Advances in Neural Information Processing Systems 30}, pages
+ 5998--6008.
+
+ \bibitem[{de~Vries et~al.(2019)de~Vries, van Cranenburgh, Bisazza, Caselli, van
+ Noord, and Nissim}]{vries2019bertje}
+ Wietse de~Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli,
+ Gertjan van Noord, and Malvina Nissim. 2019.
+ \newblock {BERTje: A Dutch BERT Model}.
+ \newblock \emph{arXiv preprint}, arXiv:1912.09582.
+
+ \bibitem[{Vu et~al.(2018)Vu, Nguyen, Nguyen, Dras, and
+ Johnson}]{vu-etal-2018-vncorenlp}
+ Thanh Vu, Dat~Quoc Nguyen, Dai~Quoc Nguyen, Mark Dras, and Mark Johnson. 2018.
+ \newblock {VnCoreNLP: A Vietnamese Natural Language Processing Toolkit}.
+ \newblock In \emph{Proceedings of NAACL: Demonstrations}, pages 56--60.
+
+ \bibitem[{Vu et~al.(2019)Vu, Vu, Tran, and Jiang}]{vu-xuan-etal-2019-etnlp}
+ Xuan-Son Vu, Thanh Vu, Son Tran, and Lili Jiang. 2019.
+ \newblock {ETNLP}: A visual-aided systematic approach to select pre-trained
+ embeddings for a downstream task.
+ \newblock In \emph{Proceedings of RANLP}, pages 1285--1294.
+
+ \bibitem[{Williams et~al.(2018)Williams, Nangia, and Bowman}]{N18-1101}
+ Adina Williams, Nikita Nangia, and Samuel Bowman. 2018.
+ \newblock {A Broad-Coverage Challenge Corpus for Sentence Understanding through
+ Inference}.
+ \newblock In \emph{Proceedings of NAACL}, pages 1112--1122.
+
+ \bibitem[{Wolf et~al.(2019)Wolf, Debut, Sanh, Chaumond, Delangue, Moi, Cistac,
+ Rault, Louf, Funtowicz, and Brew}]{Wolf2019HuggingFacesTS}
+ Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue,
+ Anthony Moi, Pierric Cistac, Tim Rault, R{\'e}mi Louf, Morgan Funtowicz, and
+ Jamie Brew. 2019.
+ \newblock {HuggingFace's Transformers: State-of-the-art Natural Language
+ Processing}.
+ \newblock \emph{arXiv preprint}, arXiv:1910.03771.
+
+ \bibitem[{Wu and Dredze(2019)}]{wu-dredze-2019-beto}
+ Shijie Wu and Mark Dredze. 2019.
+ \newblock Beto, bentz, becas: The surprising cross-lingual effectiveness of
+ {BERT}.
+ \newblock In \emph{Proceedings of EMNLP-IJCNLP}, pages 833--844.
+
+ \end{thebibliography}
references/2020.emnlp.nguyen/source/emnlp2020_PhoBERT.tex ADDED
@@ -0,0 +1,301 @@
+ \documentclass[11pt,a4paper]{article}
+ \usepackage[hyperref]{emnlp2020}
+ \pdfoutput=1
+ \usepackage{times}
+ \usepackage{latexsym}
+ %\renewcommand{\UrlFont}{\ttfamily\small}
+
+ \usepackage{amsmath}
+ \usepackage{url}
+ \usepackage{amssymb}
+ \usepackage{amsfonts}
+ \usepackage{graphicx}
+ \usepackage{tabularx}
+ \usepackage{multirow}
+ \usepackage{arydshln}
+ \usepackage{mathtools,nccmath}
+
+ \usepackage[utf8]{inputenc}
+ \usepackage[utf8]{vietnam}
+ \usepackage{enumitem}
+ % This is not strictly necessary, and may be commented out,
+ % but it will improve the layout of the manuscript,
+ % and will typically save some space.
+ %\usepackage{microtype}
+
+ \setlength{\textfloatsep}{15pt plus 5.0pt minus 5.0pt}
+ \setlength{\floatsep}{15pt plus 5.0pt minus 5.0pt}
+ %\setlength{\dbltextfloatsep }{15pt plus 2.0pt minus 3.0pt}
+ %\setlength{\dblfloatsep}{15pt plus 2.0pt minus 3.0pt}
+ %\setlength{\intextsep}{15pt plus 2.0pt minus 3.0pt}
+ \setlength{\abovecaptionskip}{3pt plus 1pt minus 1pt}
+
+ \aclfinalcopy % Uncomment this line for the final submission
+ %\def\aclpaperid{***} % Enter the acl Paper ID here
+
+ \setlength\titlebox{5cm}
+ % You can expand the titlebox if you need extra space
+ % to show all the authors. Please do not make the titlebox
+ % smaller than 5cm (the original size); we will check this
+ % in the camera-ready version and ask you to change it back.
+
+ \newcommand\BibTeX{B\textsc{ib}\TeX}
+
+ \title{PhoBERT: Pre-trained language models for Vietnamese}
+
+ \author{Dat Quoc Nguyen$^1$ \and Anh Tuan Nguyen$^{2,}$\thanks{\ \ Work done during internship at VinAI Research.} \\
+ $^1$VinAI Research, Vietnam; $^2$NVIDIA, USA\\
+ \tt{\normalsize v.datnq9@vinai.io, tuananhn@nvidia.com}}
+
+ \date{}
+
+ \begin{document}
+ \maketitle
+ \begin{abstract}
+ We present \textbf{PhoBERT} with two versions---PhoBERT\textsubscript{base} and PhoBERT\textsubscript{large}---the \emph{first} public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R \citep{conneau2019unsupervised} and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at: \url{https://github.com/VinAIResearch/PhoBERT}.
+ \end{abstract}
+
+ \section{Introduction}\label{sec:intro}
+
+ Pre-trained language models, especially BERT \citep{devlin-etal-2019-bert}---the Bidirectional Encoder Representations from Transformers \citep{NIPS2017_7181}---have recently become extremely popular and helped to produce significant improvement gains for various NLP tasks. The success of pre-trained BERT and its variants has largely been limited to the English language. For other languages, one could retrain a language-specific model using the BERT architecture \citep{abs-1906-08101,vries2019bertje,vu-xuan-etal-2019-etnlp,2019arXiv191103894M} or employ existing pre-trained multilingual BERT-based models \citep{devlin-etal-2019-bert,NIPS2019_8928,conneau2019unsupervised}.
+
+ In terms of Vietnamese language modeling, to the best of our knowledge, there are two main concerns as follows:
+
+ \begin{itemize}[leftmargin=*]
+ \setlength\itemsep{-1pt}
+ \item The Vietnamese Wikipedia corpus is the only data used to train monolingual language models \citep{vu-xuan-etal-2019-etnlp}, and it is also the only Vietnamese dataset included in the pre-training data used by all multilingual language models except XLM-R. It is worth noting that Wikipedia data is not representative of general language use, and the Vietnamese Wikipedia data is relatively small (1GB uncompressed), while pre-trained language models can be significantly improved by using more pre-training data \cite{RoBERTa}.
+
+ \item All publicly released monolingual and multilingual BERT-based language models are not aware of the difference between Vietnamese syllables and word tokens. This ambiguity comes from the fact that white space is also used to separate the syllables that constitute words in written Vietnamese.\footnote{\newcite{DinhQuangThang2008} show that 85\% of Vietnamese word types are composed of at least two syllables.}
+ For example, the 6-syllable written text ``Tôi là một nghiên cứu viên'' (I am a researcher) forms 4 words ``Tôi\textsubscript{I} là\textsubscript{am} một\textsubscript{a} nghiên\_cứu\_viên\textsubscript{researcher}''. \\
+ Without performing a preprocessing step of Vietnamese word segmentation, those models directly apply Byte-Pair encoding (BPE) methods \citep{sennrich-etal-2016-neural,kudo-richardson-2018-sentencepiece} to the syllable-level Vietnamese pre-training data.\footnote{Although performing word segmentation before applying BPE on the Vietnamese Wikipedia corpus, ETNLP \citep{vu-xuan-etal-2019-etnlp} in fact {does not publicly release} any pre-trained BERT-based language model (\url{https://github.com/vietnlp/etnlp}). In particular, \newcite{vu-xuan-etal-2019-etnlp} release a set of 15K BERT-based word embeddings specialized only for the Vietnamese NER task.}
+ Intuitively, for word-level Vietnamese NLP tasks, those models pre-trained on syllable-level data might not perform as well as language models pre-trained on word-level data.
+
+ \end{itemize}
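The syllable-vs-word distinction above can be illustrated with a toy greedy longest-match word segmenter. This is a sketch only: real segmenters such as RDRSegmenter are far more sophisticated, and the two-entry lexicon below is invented for the paper's own example sentence.

```python
# Toy illustration of Vietnamese word segmentation (NOT RDRSegmenter):
# greedy longest-match over a small, hypothetical multi-syllable lexicon.
# Syllables forming one word are joined with "_", as in the paper's example
# "Tôi là một nghiên cứu viên" -> "Tôi là một nghiên_cứu_viên".

LEXICON = {("nghiên", "cứu", "viên"), ("nghiên", "cứu")}  # hypothetical
MAX_WORD_SYLLABLES = 3

def segment(syllables):
    words, i = [], 0
    while i < len(syllables):
        # Try the longest multi-syllable match first.
        for n in range(min(MAX_WORD_SYLLABLES, len(syllables) - i), 1, -1):
            if tuple(syllables[i:i + n]) in LEXICON:
                words.append("_".join(syllables[i:i + n]))
                i += n
                break
        else:  # no multi-syllable match: the syllable itself is a word
            words.append(syllables[i])
            i += 1
    return words

print(segment("Tôi là một nghiên cứu viên".split()))
```

With the toy lexicon, the 6 syllables are grouped into the 4 words used in the example above.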
+
+ To handle the two concerns above, we train the {first} large-scale monolingual BERT-based ``base'' and ``large'' models using a 20GB \textit{word-level} Vietnamese corpus.
+ We evaluate our models on four downstream Vietnamese NLP tasks: the common word-level ones of Part-of-speech (POS) tagging, Dependency parsing and Named-entity recognition (NER), and a language understanding task of Natural language inference (NLI) which can be formulated as either a syllable- or word-level task. Experimental results show that our models obtain state-of-the-art (SOTA) results on all these tasks.
+ Our contributions are summarized as follows:
+
+ \begin{itemize}[leftmargin=*]
+ \setlength\itemsep{-1pt}
+ \item We present the \textit{first} large-scale monolingual language models pre-trained for Vietnamese.
+
+ \item Our models help produce SOTA performances on four downstream tasks of POS tagging, Dependency parsing, NER and NLI, thus showing the effectiveness of large-scale BERT-based monolingual language models for Vietnamese.
+
+ \item To the best of our knowledge, we also perform the \textit{first} set of experiments to compare monolingual language models with the recent best multilingual model XLM-R in multiple (i.e. four) different language-specific tasks. The experiments show that our models outperform XLM-R on all these tasks, thus convincingly confirming that dedicated language-specific models still outperform multilingual ones.
+
+ \item We publicly release our models under the name PhoBERT which can be used with \texttt{fairseq} \citep{ott2019fairseq} and \texttt{transformers} \cite{Wolf2019HuggingFacesTS}. We hope that PhoBERT can serve as a strong baseline for future Vietnamese NLP research and applications.
+ \end{itemize}
+
+ \section{PhoBERT}
+
+ This section outlines the architecture and describes the pre-training data and optimization setup that we use for PhoBERT.
+
+ \vspace{3pt}
+
+ \noindent\textbf{Architecture:}\ Our PhoBERT has two versions, PhoBERT\textsubscript{base} and PhoBERT\textsubscript{large}, using the same architectures as BERT\textsubscript{base} and BERT\textsubscript{large}, respectively. The PhoBERT pre-training approach is based on RoBERTa \citep{RoBERTa}, which optimizes the BERT pre-training procedure for more robust performance.
+
+ \vspace{3pt}
+
+ \noindent\textbf{Pre-training data:}\ To handle the first concern mentioned in Section \ref{sec:intro}, we use a 20GB pre-training dataset of uncompressed texts. This dataset is a concatenation of two corpora: (i) the Vietnamese Wikipedia corpus ($\sim$1GB), and (ii) a second corpus ($\sim$19GB) generated by removing similar articles and duplicates from a 50GB Vietnamese news corpus.\footnote{\url{https://github.com/binhvq/news-corpus}, crawled from a wide range of news websites and topics.} To solve the second concern,
+ we employ RDRSegmenter \citep{nguyen-etal-2018-fast} from VnCoreNLP \citep{vu-etal-2018-vncorenlp} to perform word and sentence segmentation on the pre-training dataset, resulting in $\sim$145M word-segmented sentences ($\sim$3B word tokens). Different from RoBERTa, we then apply \texttt{fastBPE} \citep{sennrich-etal-2016-neural} to segment these sentences into subword units, using a vocabulary of 64K subword types. On average there are 24.4 subword tokens per sentence.
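The segment-then-BPE preprocessing described above can be sketched as follows. This is a toy illustration only: the paper uses RDRSegmenter and fastBPE, while the greedy matcher and the tiny subword vocabulary below are invented for the example, with "@@" marking non-final pieces in the fastBPE style.

```python
# Toy sketch of PhoBERT's preprocessing: a word-segmented sentence is split
# into subword units by greedy longest-match against a tiny, hypothetical
# subword vocabulary (stand-in for the real 64K-type BPE vocabulary).

TOY_SUBWORD_VOCAB = {"Tôi", "là", "một", "nghiên_cứu", "_viên", "nghiên"}

def to_subwords(word, vocab):
    """Greedy longest-match segmentation of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            # Fall back to a single character so segmentation always succeeds.
            if piece in vocab or end - start == 1:
                pieces.append(piece)
                start = end
                break
    # fastBPE-style continuation marker on every non-final piece.
    return [p + "@@" for p in pieces[:-1]] + [pieces[-1]]

# Word-segmented sentence from the paper: "Tôi là một nghiên_cứu_viên".
sentence = ["Tôi", "là", "một", "nghiên_cứu_viên"]
subwords = [p for w in sentence for p in to_subwords(w, TOY_SUBWORD_VOCAB)]
print(subwords)
```

Only the multi-syllable word is split further here; single-syllable words that are in the vocabulary stay whole.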
+
+ \vspace{3pt}
+
+ \noindent\textbf{Optimization:}\ We employ the RoBERTa implementation in \texttt{fairseq} \citep{ott2019fairseq}. We set a maximum length of 256 subword tokens, thus generating 145M $\times$ 24.4 / 256 $\approx$ 13.8M sentence blocks. Following \newcite{RoBERTa}, we optimize the models using Adam \citep{KingmaB14}. We use a batch size of 1024 across 4 V100 GPUs (16GB each) and a peak learning rate of 0.0004 for PhoBERT\textsubscript{base}, and a batch size of 512 and a peak learning rate of 0.0002 for PhoBERT\textsubscript{large}. We run for 40 epochs (here, the learning rate is warmed up for 2 epochs), thus resulting in 13.8M $\times$ 40 / 1024 $\approx$ 540K training steps for PhoBERT\textsubscript{base} and 1.08M training steps for PhoBERT\textsubscript{large}. We pre-train PhoBERT\textsubscript{base} for 3 weeks, and subsequently PhoBERT\textsubscript{large} for 5 weeks.
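The block and step counts quoted in the optimization paragraph follow directly from the corpus statistics and batch sizes, and can be checked with a few lines of arithmetic:

```python
# Sanity-check the pre-training arithmetic quoted in the paper.
sentences = 145_000_000   # ~145M word-segmented sentences
avg_subwords = 24.4       # average subword tokens per sentence
max_len = 256             # maximum sequence length in subword tokens

blocks = sentences * avg_subwords / max_len
print(f"{blocks / 1e6:.1f}M sentence blocks")  # ~13.8M

epochs = 40
steps_base = blocks * epochs / 1024    # PhoBERT-base: batch size 1024
steps_large = blocks * epochs / 512    # PhoBERT-large: batch size 512
print(f"{steps_base / 1e3:.0f}K steps (base), "
      f"{steps_large / 1e6:.2f}M steps (large)")
```

This reproduces the quoted ~13.8M blocks, ~540K steps for the base model and ~1.08M steps for the large model.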
+
+ \begin{table}[!t]
+ \centering
+ \begin{tabular}{l|l|l|l}
+ \hline
+ \textbf{Task} & \textbf{\#training} & \textbf{\#valid} & \textbf{\#test} \\
+ \hline
+ POS tagging$^\dagger$ & 27,000 & 870 & 2,120 \\
+ Dep. parsing$^\dagger$ & 8,977 & 200 & 1,020 \\
+ NER$^\dagger$ & 14,861 & 2,000 & 2,831\\
+ NLI$^\ddagger$ & 392,702 & 2,490 & 5,010\\
+ \hline
+ \end{tabular}
+ \caption{Statistics of the downstream task datasets. ``\#training'', ``\#valid'' and ``\#test'' denote the size of the training, validation and test sets, respectively. $\dagger$ and $\ddagger$ refer to the dataset size as the numbers of sentences and sentence pairs, respectively.}
+ \label{tab:data}
+ \end{table}
+
+ \begin{table*}[!ht]
+ \centering
+ \resizebox{15.5cm}{!}{
+ %\setlength{\tabcolsep}{0.3em}
+ \begin{tabular}{l|l|l|l}
+ \hline
+ \multicolumn{2}{c|}{\textbf{POS tagging} (word-level)} & \multicolumn{2}{c}{\textbf{Dependency parsing} (word-level)}\\
+ \hline
+ Model & Acc. & Model & LAS / UAS \\
+ \hline
+ RDRPOSTagger \citep{nguyen-etal-2014-rdrpostagger} [$\clubsuit$] & 95.1 & \_ & \_ \\
+ BiLSTM-CNN-CRF \citep{ma-hovy-2016-end} [$\clubsuit$] & 95.4 & VnCoreNLP-DEP \citep{vu-etal-2018-vncorenlp} [$\bigstar$] & 71.38 / 77.35 \\
+ VnCoreNLP-POS \citep{nguyen-etal-2017-word} [$\clubsuit$] & 95.9 & jPTDP-v2 [$\bigstar$] & 73.12 / 79.63 \\
+ jPTDP-v2 \citep{nguyen-verspoor-2018-improved} [$\bigstar$] & 95.7 & jointWPD [$\bigstar$] & 73.90 / 80.12 \\
+ jointWPD \citep{nguyen-2019-neural} [$\bigstar$] & 96.0 & Biaffine \citep{DozatM17} [$\bigstar$] & 74.99 / 81.19 \\
+ XLM-R\textsubscript{base} (our result) & 96.2 & Biaffine w/ XLM-R\textsubscript{base} (our result) & 76.46 / 83.10 \\
+ XLM-R\textsubscript{large} (our result) & 96.3 & Biaffine w/ XLM-R\textsubscript{large} (our result) & 75.87 / 82.70 \\
+ \hline
+ PhoBERT\textsubscript{base} & \underline{96.7} & Biaffine w/ PhoBERT\textsubscript{base} & \textbf{78.77} / \textbf{85.22} \\
+ PhoBERT\textsubscript{large} & \textbf{96.8} & Biaffine w/ PhoBERT\textsubscript{large} & \underline{77.85} / \underline{84.32} \\
+ \hline
+ \end{tabular}
+ }
+ \caption{Performance scores (in \%) on the POS tagging and Dependency parsing test sets. ``Acc.'', ``LAS'' and ``UAS'' abbreviate the Accuracy, the Labeled Attachment Score and the Unlabeled Attachment Score, respectively (here, all these evaluation metrics are computed on all word tokens, including punctuation).
+ [$\clubsuit$] and [$\bigstar$] denote
+ results reported by \newcite{nguyen-etal-2017-word} and \newcite{nguyen-2019-neural}, respectively.}
+ \label{tab:posdep}
+ \end{table*}
+
+ \section{Experimental setup}
+
+ We evaluate the performance of PhoBERT on four downstream Vietnamese NLP tasks: POS tagging, Dependency parsing, NER and NLI.
+
+ \subsubsection*{Downstream task datasets}
+
+ Table \ref{tab:data} presents the statistics of the experimental datasets that we employ for downstream task evaluation.
+ For POS tagging, Dependency parsing and NER, we follow the VnCoreNLP setup \citep{vu-etal-2018-vncorenlp}, using standard benchmarks of the VLSP 2013 POS tagging dataset,\footnote{\url{https://vlsp.org.vn/vlsp2013/eval}} the VnDT dependency treebank v1.1 \cite{Nguyen2014NLDB} with POS tags predicted by VnCoreNLP, and the VLSP 2016 NER dataset \citep{JCC13161}.
+
+ For NLI, we use the manually-constructed Vietnamese validation and test sets from the cross-lingual NLI (XNLI) corpus v1.0 \citep{conneau-etal-2018-xnli}, where the Vietnamese training set is released as a machine-translated version of the corresponding English training set \citep{N18-1101}.
+ Unlike the POS tagging, Dependency parsing and NER datasets, which provide gold word segmentation, for NLI we employ RDRSegmenter to segment the text into words before applying BPE to produce subwords from word tokens.
+
+ \subsubsection*{Fine-tuning}
+
+ Following \newcite{devlin-etal-2019-bert}, for POS tagging and NER, we append a linear prediction layer on top of the PhoBERT architecture (i.e. to the last Transformer layer of PhoBERT) w.r.t. the first subword of each word token.\footnote{In our preliminary experiments, using the average of the contextualized embeddings of a word's subword tokens to represent the word produces slightly lower performance than using the contextualized embedding of the first subword.}
+ For dependency parsing, following \newcite{nguyen-2019-neural}, we employ a reimplementation of the state-of-the-art Biaffine dependency parser \citep{DozatM17} from \newcite{ma-etal-2018-stack} with default optimal hyper-parameters. %\footnote{\url{https://github.com/XuezheMax/NeuroNLP2}}
+ We then extend this parser by replacing the pre-trained word embedding of each word in an input sentence with the corresponding contextualized embedding (from the last layer) computed for the first subword token of the word.
+
+ For POS tagging, NER and NLI, we employ \texttt{transformers} \cite{Wolf2019HuggingFacesTS} to fine-tune PhoBERT for each task and each dataset independently. We use AdamW \citep{loshchilov2018decoupled} with a fixed learning rate of 1e-5 and a batch size of 32 \citep{RoBERTa}. We fine-tune for 30 training epochs, evaluate the task performance after each epoch on the validation set (early stopping is applied when there is no improvement after 5 consecutive epochs), and then select the best model checkpoint to report the final result on the test set (note that each of our scores is an average over 5 runs with different random seeds). %Section \ref{sec:results} shows that using this relatively straightforward fine-tuning manner can lead to SOTA results. %Note that we might boost our downstream task performances even further by doing a more careful hyper-parameter tuning.
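The "first subword of each word token" convention used for fine-tuning can be sketched as an index computation: given a word sequence and a subword tokenizer, find, for each word, the position of its first subword in the flattened subword sequence, so that the classifier reads only those hidden states. The toy tokenizer below (splitting on "_") is hypothetical and merely stands in for PhoBERT's BPE.

```python
# Minimal sketch (not the actual transformers fine-tuning code) of selecting
# the first subword per word for token-level prediction heads.

def first_subword_indices(words, tokenize):
    """For each word, return the index of its first subword piece in the
    flattened subword sequence produced by `tokenize`."""
    indices, offset = [], 0
    for w in words:
        pieces = tokenize(w)
        indices.append(offset)   # the word is represented by this position
        offset += len(pieces)
    return indices

# Hypothetical subword tokenizer: splits on "_" just for illustration.
toy_tokenize = lambda w: w.split("_")

words = ["Tôi", "là", "một", "nghiên_cứu_viên"]
idx = first_subword_indices(words, toy_tokenize)
print(idx)  # word i would be classified from hidden_states[idx[i]]
```

In an actual fine-tuning loop, `hidden_states` gathered at these indices would be fed to the linear prediction layer, one logit vector per word rather than per subword.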
+
+ \begin{table*}[!ht]
+ \centering
+ \resizebox{15.5cm}{!}{
+ %\setlength{\tabcolsep}{0.3em}
+ \begin{tabular}{l|l|l|l}
+ \hline
+ \multicolumn{2}{c|}{\textbf{NER} (word-level)} & \multicolumn{2}{c}{\textbf{NLI} (syllable- or word-level)} \\
+ \hline
+ Model & F\textsubscript{1} & Model & Acc. \\
+ \hline
+ BiLSTM-CNN-CRF [$\blacklozenge$] & 88.3 & \_ & \_\\
+ VnCoreNLP-NER \citep{vu-etal-2018-vncorenlp} [$\blacklozenge$] & 88.6 & BiLSTM-max \citep{conneau-etal-2018-xnli} & 66.4 \\
+ VNER \citep{8713740} & 89.6 & mBiLSTM \citep{ArtetxeS19} & 72.0 \\
+ BiLSTM-CNN-CRF + ETNLP [$\spadesuit$] & 91.1 & multilingual BERT \citep{devlin-etal-2019-bert} [$\blacksquare$] & 69.5 \\
+ VnCoreNLP-NER + ETNLP [$\spadesuit$] & 91.3 & XLM\textsubscript{MLM+TLM} \citep{NIPS2019_8928} & 76.6 \\
+ XLM-R\textsubscript{base} (our result) & 92.0 & XLM-R\textsubscript{base} \citep{conneau2019unsupervised} & {75.4} \\
+ XLM-R\textsubscript{large} (our result) & 92.8 & XLM-R\textsubscript{large} \citep{conneau2019unsupervised} & \underline{79.7} \\
+ \hline
+ PhoBERT\textsubscript{base}& \underline{93.6} & PhoBERT\textsubscript{base}& {78.5} \\
+ PhoBERT\textsubscript{large}& \textbf{94.7} & PhoBERT\textsubscript{large}& \textbf{80.0} \\
+ \hline
+ \end{tabular}
+ }
+ \caption{Performance scores (in \%) on the NER and NLI test sets.
+ [$\blacklozenge$], [$\spadesuit$] and [$\blacksquare$] denote
+ results reported by \newcite{vu-etal-2018-vncorenlp}, \newcite{vu-xuan-etal-2019-etnlp} and \newcite{wu-dredze-2019-beto}, respectively.
+ %``mBiLSTM'' denotes a BiLSTM-based multilingual embedding model.
+ Note that there are higher Vietnamese NLI results reported for XLM-R when fine-tuning on the concatenation of all 15 training datasets from the XNLI corpus (i.e. TRANSLATE-TRAIN-ALL: 79.5\% for XLM-R\textsubscript{base} and 83.4\% for XLM-R\textsubscript{large}). However, those results might not be comparable as we only use the monolingual Vietnamese training data for fine-tuning.}
+ \label{tab:nernli}
+ \end{table*}
+
+ \section{Experimental results}\label{sec:results}
+
+ \subsubsection*{Main results}
+
+ Tables \ref{tab:posdep} and \ref{tab:nernli} compare PhoBERT scores with the previous highest reported results, using the same experimental setup. It is clear that our PhoBERT helps produce new SOTA performance results for all four downstream tasks.
+
+ For \underline{POS tagging}, the neural model jointWPD for joint POS tagging and dependency parsing \citep{nguyen-2019-neural} and the feature-based model VnCoreNLP-POS \citep{nguyen-etal-2017-word} are the two previous SOTA models, obtaining accuracies of about 96.0\%. PhoBERT obtains 0.8\% absolute higher accuracy than these two models.
+
+ For \underline{Dependency parsing}, the previous highest parsing scores LAS and UAS are obtained by the Biaffine parser at 75.0\% and 81.2\%, respectively. PhoBERT helps boost the Biaffine parser by about 4\% absolute, achieving an LAS of 78.8\% and a UAS of 85.2\%.
+
+ For \underline{NER}, PhoBERT\textsubscript{large} produces 1.1 points higher F\textsubscript{1} than PhoBERT\textsubscript{base}. In addition, PhoBERT\textsubscript{base} obtains 2+ points higher than the previous SOTA feature- and neural network-based models VnCoreNLP-NER \citep{vu-etal-2018-vncorenlp} and BiLSTM-CNN-CRF \citep{ma-hovy-2016-end}, which are trained with the set of 15K BERT-based ETNLP word embeddings \citep{vu-xuan-etal-2019-etnlp}.
+
+ For \underline{NLI},
+ PhoBERT outperforms the multilingual BERT \citep{devlin-etal-2019-bert} and the BERT-based cross-lingual model with a new translation language modeling objective XLM\textsubscript{MLM+TLM} \citep{NIPS2019_8928} by large margins. PhoBERT also performs better than the recent best pre-trained multilingual model XLM-R while using far fewer parameters: 135M (PhoBERT\textsubscript{base}) vs. 250M (XLM-R\textsubscript{base}); 370M (PhoBERT\textsubscript{large}) vs. 560M (XLM-R\textsubscript{large}).
+
+ \subsubsection*{Discussion}
+
+ We find that PhoBERT\textsubscript{large} achieves 0.9\% lower dependency parsing scores than PhoBERT\textsubscript{base}. One possible reason is that the last Transformer layer in the BERT architecture might not be the optimal one for encoding the richest information about syntactic structures \cite{hewitt-manning-2019-structural,jawahar-etal-2019-bert}. Future work will study which of PhoBERT's Transformer layers contains richer syntactic information by evaluating the Vietnamese parsing performance from each layer.
+
+ Using more pre-training data can significantly improve the quality of pre-trained language models \cite{RoBERTa}. Thus it is not surprising that PhoBERT helps produce better performance than ETNLP on NER, and than the multilingual BERT and XLM\textsubscript{MLM+TLM} on NLI (here, PhoBERT uses 20GB of Vietnamese texts while those models employ the 1GB Vietnamese Wikipedia corpus).
+
+ Following the fine-tuning approach that we use for PhoBERT, we carefully fine-tune XLM-R for the remaining Vietnamese POS tagging, Dependency parsing and NER tasks (here, it is applied to the first sub-syllable token of the first syllable of each word).\footnote{For fine-tuning XLM-R, we use a grid search on the validation set to select the AdamW learning rate from \{5e-6, 1e-5, 2e-5, 4e-5\} and the batch size from \{16, 32\}.}
+ Tables \ref{tab:posdep} and \ref{tab:nernli} show that our PhoBERT also performs better than XLM-R on these three word-level tasks.
+ It is worth noting that XLM-R uses a 2.5TB pre-training corpus which contains 137GB of Vietnamese texts (i.e. about 137\ /\ 20 $\approx$ 7 times bigger than our pre-training corpus).
+ Recall that PhoBERT performs Vietnamese word segmentation to segment syllable-level sentences into word tokens before applying BPE to segment the word-segmented sentences into subword units, while XLM-R directly applies BPE to the syllable-level Vietnamese pre-training sentences.
+ This reconfirms that dedicated language-specific models still outperform multilingual ones \citep{2019arXiv191103894M}.\footnote{Note that \newcite{2019arXiv191103894M} only compare their model CamemBERT with XLM-R on the French NLI task.}
+
+ \section{Conclusion}
+
+ In this paper, we have presented the first large-scale monolingual PhoBERT language models pre-trained for Vietnamese. We demonstrate the usefulness of PhoBERT by showing that PhoBERT performs better than the recent best multilingual model XLM-R and helps produce SOTA performances for four downstream Vietnamese NLP tasks of POS tagging, Dependency parsing, NER and NLI.
+ By publicly releasing PhoBERT models, %\footnote{\url{https://github.com/VinAIResearch/PhoBERT}}
+ we hope that they can foster future research and applications in Vietnamese NLP. %Our PhoBERT and its usage are available at: \url{https://github.com/VinAIResearch/PhoBERT}.
+
+ {%\footnotesize
+ \bibliographystyle{acl_natbib}
+ \bibliography{REFs}
+ }
+
+ \end{document}
references/2021.naacl.nguyen/paper.md ADDED
@@ -0,0 +1,167 @@
+ ---
+ title: "PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing"
+ authors:
+ - "Linh The Nguyen"
+ - "Dat Quoc Nguyen"
+ year: 2021
+ venue: "NAACL 2021 Demonstrations"
+ url: "https://aclanthology.org/2021.naacl-demos.1/"
+ ---
+
+ We present the first multi-task learning model---named PhoNLP---for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT [phobert] for each task independently. We publicly release PhoNLP as an open-source toolkit under the Apache License 2.0.
+ Although we specify PhoNLP for Vietnamese, our PhoNLP training and evaluation command scripts can in fact directly work for other languages that have a pre-trained BERT-based language model and gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing.
+ We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and applications not only to Vietnamese but also to other languages. Our PhoNLP is available at https://github.com/VinAIResearch/PhoNLP.
+ [Figure: the PhoNLP architecture (JointModel.pdf), shown together with the example sentence below, annotated for POS, NER and dependency parsing:]
+
+ | **ID** | **Form** | **POS** | **NER** | **Head** | **DepRel** |
+ |---|---|---|---|---|---|
+ | 1 | Đây | PRON | O | 2 | sub |
+ | 2 | là | VERB | O | 0 | root |
+ | 3 | Hà\_Nội | NOUN | B-LOC | 2 | vmod |
+ # Introduction
+ Vietnamese NLP research has developed significantly in recent years, boosted by the success of the national project on Vietnamese language and speech processing (VLSP) KC01.01/2006-2010 and the VLSP workshops that have run shared tasks since 2013. The fundamental tasks of POS tagging, NER and dependency parsing thus play important roles, providing useful features for many downstream application tasks such as machine translation [7800281], sentiment analysis [BANG20182016IIP0038], relation extraction [9287471], semantic parsing [vitext2sql], open information extraction [3155133.3155171] and question answering [NguyenNP_SWJ,3184558.3191535].
+ Thus, there is a need to develop NLP toolkits for linguistic annotations w.r.t. Vietnamese POS tagging, NER and dependency parsing.
+ VnCoreNLP [vu-etal-2018-vncorenlp] is the previous public toolkit employing traditional feature-based machine learning models to handle these Vietnamese NLP tasks. However, VnCoreNLP is no longer considered state-of-the-art because its results are significantly outperformed by those obtained when fine-tuning PhoBERT---the current state-of-the-art monolingual pre-trained language model for Vietnamese [phobert]. Note that there are no publicly available fine-tuned BERT-based models for the three Vietnamese tasks. Even if there were, a potential drawback is that an NLP package wrapping such fine-tuned BERT-based models would take a large storage space, i.e. three times larger than the storage space used by a single BERT model [devlin-etal-2019-bert], so it would not be suitable for practical applications that require a smaller storage space. Joint multi-task learning is a promising solution as it might help reduce the storage space. In addition, POS tagging, NER and dependency parsing are related tasks: POS tags are essential input features for dependency parsing and are also used as additional features for NER. Joint multi-task learning thus might also help improve performance over single-task learning [Ruder2019Neural].
+ In this paper, we present a new multi-task learning model---named PhoNLP---for joint POS tagging, NER and dependency parsing. In particular, given an input sentence of words, an encoding layer generates contextualized word embeddings that represent the input words. These contextualized word embeddings are fed into a POS tagging layer, which is in fact a linear prediction layer [devlin-etal-2019-bert], to predict POS tags for the corresponding input words. Each predicted POS tag is then represented by two "soft" embeddings that are later fed into the NER and dependency parsing layers separately.
+ More specifically, based on both the contextualized word embeddings and the "soft" POS tag embeddings, the NER layer uses a linear-chain CRF predictor [Lafferty:2001] to predict NER labels for the input words, while the dependency parsing layer uses a Biaffine classifier [DozatM17] to predict dependency arcs between the words and another Biaffine classifier to label the predicted arcs.
+ Our contributions are summarized as follows:
34
+ - To the best of our knowledge, PhoNLP is the first proposed model to jointly learn POS tagging, NER and dependency parsing for Vietnamese.
37
+ - We discuss a data leakage issue in the Vietnamese benchmark datasets that has not been pointed out before. Experiments show that PhoNLP obtains state-of-the-art performance results, outperforming PhoBERT-based single-task learning.
38
+ - We publicly release PhoNLP as an open-source toolkit that is simple to setup and efficiently run from both the command-line and Python API. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and downstream applications.
39
+ # Model description
40
+ Figure [fig:architecture] illustrates our PhoNLP architecture that can be viewed as a mixture of a BERT-based encoding layer and three decoding layers of POS tagging, NER and dependency parsing.
41
+ ## Encoder & Contextualized embeddings
42
+ Given an input sentence consisting of $n$ word tokens $w_1, w_2, ..., w_n$, the encoding layer employs PhoBERT to generate contextualized latent feature embeddings $\mathbf{e}_{i}$, each representing the $i^{th}$ word $w_i$:
+ $$
+ \mathbf{e}_{i} = \mathrm{PhoBERT}_{\mathrm{base}}(w_{1:n}, i)
+ $$
46
+ In particular, the encoding layer employs the PhoBERT$_{\mathrm{base}}$ version. Because PhoBERT uses BPE [sennrich-etal-2016-neural] to segment the input sentence into subword units, the encoding layer in fact represents the $i^{th}$ word $w_i$ by using the contextualized embedding of its first subword.
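As an illustration of this first-subword convention, here is a minimal pure-Python sketch of the index bookkeeping; `toy_bpe` below is a made-up stand-in for PhoBERT's actual BPE segmenter, used only to make the mapping concrete.

```python
# Sketch: represent each word by the embedding of its FIRST subword.
# The BPE segmentation below is illustrative, not PhoBERT's actual output.

def first_subword_indices(words, segment):
    """Map each word to the index of its first subword in the flattened
    subword sequence produced by `segment` (a word -> subwords function)."""
    indices, offset = [], 0
    for w in words:
        pieces = segment(w)
        indices.append(offset)      # first subword of this word
        offset += len(pieces)
    return indices

# Toy segmenter: split words longer than 4 characters into two pieces.
toy_bpe = lambda w: [w[:4], "@@" + w[4:]] if len(w) > 4 else [w]

words = ["Tôi", "đang", "làm_việc", "tại", "VinAI", "."]
idx = first_subword_indices(words, toy_bpe)   # -> [0, 1, 2, 4, 5, 7]
# Word w_i is then represented by subword_embeddings[idx[i]].
```

The same bookkeeping applies regardless of which subword vocabulary is used, since only the first piece of each word is selected.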
47
+ ## POS tagging
48
+ Following a common manner when fine-tuning a pre-trained language model for a sequence labeling task [devlin-etal-2019-bert], the POS tagging layer is a linear prediction layer appended on top of the encoder. In particular, the POS tagging layer feeds the contextualized word embeddings $\mathbf{e}_{i}$ into a feed-forward network (FFNN$_{\mathrm{POS}}$) followed by a $\mathrm{softmax}$ predictor for POS tag prediction:
+ $$
+ \mathbf{p}_{i} = \mathrm{softmax}(\mathrm{FFNN}_{\mathrm{POS}}(\mathbf{e}_{i}))
+ $$
+ where the output layer size of FFNN$_{\mathrm{POS}}$ is the number of POS tags. Based on the probability vectors $\mathbf{p}_{i}$, a cross-entropy objective loss $\mathcal{L}_{\mathrm{POS}}$ is calculated for POS tagging during training.
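The softmax-plus-cross-entropy step can be sketched in plain Python; the logits below are toy values, not actual FFNN outputs.

```python
import math

def softmax(z):
    m = max(z)                                  # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p, gold_index):
    """Negative log-probability of the gold tag."""
    return -math.log(p[gold_index])

# Toy: 3 POS tags, one word's prediction-layer output (logits).
logits = [2.0, 0.5, -1.0]
p = softmax(logits)          # probability vector over the tag set
loss = cross_entropy(p, 0)   # gold tag is tag 0
```

Summing this per-word loss over a sentence (and batch) gives the POS tagging training objective.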
53
+ ## NER
54
+ The NER layer creates a sequence of vectors $\mathbf{v}_{1:n}$ in which each $\mathbf{v}_{i}$ is obtained by concatenating the contextualized word embedding $\mathbf{e}_{i}$ and a ``soft'' POS tag embedding $\mathbf{t}_{i}^{(1)}$:
+ $$
+ \mathbf{v}_{i} = \mathbf{e}_{i} \circ \mathbf{t}_{i}^{(1)}
+ $$
+ where, following prior work on soft label embeddings, the ``soft'' POS tag embedding $\mathbf{t}_{i}^{(1)}$ is computed by multiplying a label weight matrix $\mathbf{W}^{(1)}$ with the corresponding probability vector $\mathbf{p}_{i}$:
+ $$
+ \mathbf{t}_{i}^{(1)} = \mathbf{W}^{(1)}\mathbf{p}_{i}
+ $$
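The label-weight-matrix-times-probability-vector product amounts to a probability-weighted mix of per-tag embedding columns. A minimal sketch with toy 2-dimensional embeddings for 3 hypothetical tags:

```python
def soft_tag_embedding(W, p):
    """t = W p: rows of W are embedding dimensions, columns are POS tags,
    so t is the probability-weighted average of the per-tag columns."""
    return [sum(W[d][k] * p[k] for k in range(len(p)))
            for d in range(len(W))]

# Toy: 2-dim embeddings for 3 tags (one column per tag).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 2.0]]
p = [0.5, 0.25, 0.25]          # predicted tag distribution for one word
t = soft_tag_embedding(W, p)   # -> [1.0, 0.75]
# The task-specific input vector is then the concatenation of the
# contextualized word embedding and t.
```

Because `p` is a full distribution rather than a one-hot vector, uncertainty in the POS prediction is passed on to the downstream layers.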
62
+ The NER layer then passes each vector $\mathbf{v}_{i}$ into a feed-forward network (FFNN$_{\mathrm{NER}}$):
+ $$
+ \mathbf{h}_{i} = \mathrm{FFNN}_{\mathrm{NER}}(\mathbf{v}_{i})
+ $$
+ where the output layer size of FFNN$_{\mathrm{NER}}$ is the number of BIO-based NER labels.
67
+ The NER layer feeds the output vectors $\mathbf{h}_{i}$ into a linear-chain CRF predictor for NER label prediction [Lafferty:2001]. A cross-entropy loss $\mathcal{L}_{\mathrm{NER}}$ is calculated for NER during training, while the Viterbi algorithm is used for inference.
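Viterbi decoding for a linear-chain CRF can be sketched as follows; the BIO label set, emission scores, and transition scores below are toy values, not learned parameters.

```python
def viterbi(emissions, transitions):
    """Best label sequence under a linear-chain score: per-position emission
    scores plus pairwise transition scores, maximized by dynamic programming."""
    n, L = len(emissions), len(emissions[0])
    score = list(emissions[0])
    back = []
    for i in range(1, n):
        prev, score, ptr = score, [], []
        for y in range(L):
            bp = max(range(L), key=lambda yp: prev[yp] + transitions[yp][y])
            score.append(prev[bp] + transitions[bp][y] + emissions[i][y])
            ptr.append(bp)
        back.append(ptr)
    y = max(range(L), key=lambda k: score[k])
    path = [y]
    for ptr in reversed(back):   # follow backpointers
        y = ptr[y]
        path.append(y)
    return path[::-1]

NEG_INF = -1e9
# Labels: 0 = O, 1 = B, 2 = I; forbid the invalid transition O -> I.
transitions = [[0, 0, NEG_INF],
               [0, 0, 0],
               [0, 0, 0]]
emissions = [[0, 2, 0],    # word 1: B scores highest
             [0, 0, 2],    # word 2: I scores highest
             [2, 0, 0]]    # word 3: O scores highest
best = viterbi(emissions, transitions)   # -> [1, 2, 0], i.e. B I O
```

The transition matrix is what lets the CRF rule out invalid BIO sequences (such as an I directly after an O) that independent per-word softmax predictions could produce.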
70
+ ## Dependency parsing
71
+ The dependency parsing layer creates vectors $\mathbf{u}_{1:n}$ in which each $\mathbf{u}_{i}$ is obtained by concatenating $\mathbf{e}_{i}$ and another ``soft'' POS tag embedding $\mathbf{t}_{i}^{(2)}$:
+ $$
+ \mathbf{u}_{i} = \mathbf{e}_{i} \circ \mathbf{t}_{i}^{(2)}, \qquad \mathbf{t}_{i}^{(2)} = \mathbf{W}^{(2)}\mathbf{p}_{i}
+ $$
+ Following the deep Biaffine parser [DozatM17], the dependency parsing layer uses four FFNNs to split $\mathbf{u}_{i}$ into *head* and *dependent* representations:
+ $$
+ \mathbf{h}_{i}^{(arc\text{-}head)} = \mathrm{FFNN}_{arc\text{-}head}(\mathbf{u}_{i}), \qquad
+ \mathbf{h}_{i}^{(arc\text{-}dep)} = \mathrm{FFNN}_{arc\text{-}dep}(\mathbf{u}_{i})
+ $$
+ $$
+ \mathbf{h}_{i}^{(label\text{-}head)} = \mathrm{FFNN}_{label\text{-}head}(\mathbf{u}_{i}), \qquad
+ \mathbf{h}_{i}^{(label\text{-}dep)} = \mathrm{FFNN}_{label\text{-}dep}(\mathbf{u}_{i})
+ $$
79
+ To predict potential dependency arcs, based on the input vectors $\mathbf{h}_{i}^{(arc\text{-}dep)}$ and $\mathbf{h}_{j}^{(arc\text{-}head)}$, the parsing layer uses a Biaffine classifier's variant [qi-etal-2018-universal] that additionally takes into account the distance and relative ordering between two words to produce a probability distribution of arc heads for each word.
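The core biaffine idea, a bilinear form between a dependent representation and each candidate head representation, can be sketched as follows; the matrix `U` and the vectors are toy values (and the distance/ordering terms of the variant are omitted).

```python
def biaffine_score(h_dep, h_head, U):
    """s(j -> i) = h_dep_i^T U h_head_j : bilinear score for head j of word i."""
    return sum(h_dep[a] * sum(U[a][b] * h_head[b] for b in range(len(h_head)))
               for a in range(len(h_dep)))

# Toy 2-dimensional representations; U = identity reduces to a dot product.
U = [[1, 0], [0, 1]]
root = [2, 0]                            # ROOT's head representation
cand_heads = [root, [0, 1], [1, 1]]
dep = [1, 0]                             # one word's dependent representation
scores = [biaffine_score(dep, h, U) for h in cand_heads]   # -> [2, 0, 1]
# A softmax over `scores` gives this word's distribution over arc heads.
```

In the full model `U` is learned, and an extra bias term per head is typically included; this sketch keeps only the bilinear term.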
81
+ For inference, the Chu–Liu/Edmonds' algorithm is used to find a maximum spanning tree [chuliu,Edmonds].
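Chu–Liu/Edmonds itself is more involved; for tiny sentences the same maximum-spanning-tree objective can be brute-forced over all head assignments, which makes the objective concrete. The scores below are toy values, and `best_tree` is only a stand-in for the real algorithm.

```python
from itertools import product

def is_tree(heads):
    """heads[i] is the head of word i+1; 0 denotes ROOT. True iff every word
    reaches ROOT without a cycle."""
    for i in range(1, len(heads) + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False          # cycle
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(score):
    """Exhaustive maximum spanning tree (tiny n only): pick a head per word,
    0 = ROOT, maximizing the total arc score over valid trees."""
    n = len(score)
    best, best_heads = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(score[i][heads[i]] for i in range(n))
        if total > best:
            best, best_heads = total, heads
    return list(best_heads)

# Toy 3-word sentence: score[i][h] = score of head h (0 = ROOT) for word i+1.
score = [[1, 0, 9, 0],   # word 1 prefers head 2
         [8, 0, 0, 0],   # word 2 prefers ROOT
         [0, 0, 7, 0]]   # word 3 prefers head 2
tree = best_tree(score)  # -> [2, 0, 2]
```

Brute force is exponential in sentence length; Chu–Liu/Edmonds finds the same maximum spanning tree in polynomial time, which is why it is used in practice.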
82
+ The parsing layer also uses another Biaffine classifier to label the predicted arcs, based on the input vectors $\mathbf{h}_{i}^{(label\text{-}dep)}$ and $\mathbf{h}_{j}^{(label\text{-}head)}$. An objective loss $\mathcal{L}_{\mathrm{DEP}}$ is computed during training by summing a cross-entropy loss for unlabeled dependency parsing and another cross-entropy loss for dependency label prediction, based on gold arcs and arc labels.
83
+ ## Joint multi-task learning
84
+ The final training objective loss $\mathcal{L}$ of our model PhoNLP is the weighted sum of the POS tagging loss $\mathcal{L}_{\mathrm{POS}}$, the NER loss $\mathcal{L}_{\mathrm{NER}}$ and the dependency parsing loss $\mathcal{L}_{\mathrm{DEP}}$:
+ $$
+ \mathcal{L} = \lambda_1\mathcal{L}_{\mathrm{POS}} + \lambda_2\mathcal{L}_{\mathrm{NER}} + (1 - \lambda_1 - \lambda_2)\mathcal{L}_{\mathrm{DEP}}
+ $$
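With the optimal weights reported later in the paper (λ1 = 0.4, λ2 = 0.2), the weighted sum is simply:

```python
def joint_loss(l_pos, l_ner, l_dep, lam1=0.4, lam2=0.2):
    """Weighted sum of the three task losses; the weights sum to 1."""
    return lam1 * l_pos + lam2 * l_ner + (1 - lam1 - lam2) * l_dep

# Toy loss values for one batch (not real training losses).
loss = joint_loss(0.5, 0.3, 0.2)   # -> 0.4*0.5 + 0.2*0.3 + 0.4*0.2 = 0.34
```

Tying the third weight to 1 − λ1 − λ2 keeps the overall loss scale fixed while the grid search varies only two hyper-parameters.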
88
+ #### Discussion: Our PhoNLP can be viewed as an extension of previous joint POS tagging and dependency parsing models [hashimoto-etal-2017-joint,li-etal-2018-joint-learning,nguyen-verspoor-2018-improved,NguyenALTA2019,kondratyuk-straka-2019-75], where we additionally incorporate a CRF-based prediction layer for NER. Unlike the earlier models that use BiLSTM-based encoders to extract contextualized feature embeddings, we use a BERT-based encoder. The model of [kondratyuk-straka-2019-75] also employs a BERT-based encoder; however, different from PhoNLP, where we construct a hierarchical architecture over the POS tagging and dependency parsing layers, it does not make use of POS tag embeddings for dependency parsing.
89
+ # Experiments
90
+ ## Setup
91
+ ### Datasets
92
+ To conduct experiments, we use the benchmark datasets of the VLSP 2013 POS tagging dataset, the VLSP 2016 NER dataset [JCC13161] and the VnDT dependency treebank v1.1, following the setup used by the VnCoreNLP toolkit [vu-etal-2018-vncorenlp]. Here, VnDT is converted from the Vietnamese constituent treebank [nguyen-etal-2009-building].
93
+ #### Data leakage issue: We further discover a data leakage issue that has not been pointed out before: all sentences from the VLSP 2016 NER dataset and the VnDT treebank are included in the VLSP 2013 POS tagging dataset. In particular, 90+% of the sentences in the NER and dependency parsing validation and test sets also appear in the POS tagging training set.
94
+ To handle this issue, we re-split the VLSP 2013 POS tagging dataset: the POS tagging validation/test set now only contains sentences that appear in the union of the NER and dependency parsing validation/test sets (i.e. the validation/test sentences for NER and dependency parsing only appear in the POS tagging validation/test set).
95
+ In addition, there are 594 duplicated sentences in the VLSP 2013 POS tagging dataset (here, sentence duplication is not found in the union of the NER and dependency parsing sentences). Thus we have to perform duplication removal on the POS tagging dataset.
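The duplication removal and leakage-aware re-split can be sketched with plain set operations; the sentence IDs below are toy values, and these helpers are illustrative, not the actual VLSP data-processing code.

```python
def deduplicate(sentences):
    """Drop duplicate sentences, keeping first occurrences in order."""
    seen, kept = set(), []
    for s in sentences:
        if s not in seen:
            seen.add(s)
            kept.append(s)
    return kept

def resplit(pos_sentences, ner_dep_eval_sentences):
    """Assign a POS sentence to validation/test only if it also appears in the
    union of the NER and dependency parsing validation/test sets; everything
    else stays in the POS training set."""
    eval_set = set(ner_dep_eval_sentences)
    train = [s for s in pos_sentences if s not in eval_set]
    evaluation = [s for s in pos_sentences if s in eval_set]
    return train, evaluation

corpus = deduplicate(["s1", "s2", "s1", "s3"])   # -> ["s1", "s2", "s3"]
train, evaluation = resplit(corpus, ["s2"])      # -> (["s1", "s3"], ["s2"])
```

This guarantees that no POS training sentence leaks into any of the three tasks' evaluation sets.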
96
+ Table [tab:Datasets] details the statistics of the experimental datasets.
97
+ | **Task** | **#train** | **#valid** | **#test** |
+ |---|---|---|---|
+ | POS tagging (leakage) | 27000 | 870 | 2120 |
+ | POS tagging (re-split) | 23906 | 2009 | 3481 |
+ | NER | 14861 | 2000 | 2831 |
+ | Dependency parsing | 8977 | 200 | 1020 |
+ 
+ ``POS tagging (leakage)'' and ``POS tagging (re-split)'' refer to the statistics for POS tagging before and after re-splitting & sentence duplication removal, respectively.
113
+ ### Implementation
114
+ PhoNLP is implemented based on PyTorch [NEURIPS2019_9015], employing the PhoBERT encoder implementation available from the transformers library [wolf-etal-2020-transformers] and the Biaffine classifier implementation from [qi-etal-2018-universal]. We set both the label weight matrices $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$ to have 100 rows, resulting in 100-dimensional soft POS tag embeddings. In addition, following [DozatM17], the FFNNs in equations [equa:fc6]--[equa:fc9] use 400-dimensional output layers.
115
+ We use the AdamW optimizer [loshchilov2018decoupled] with a fixed batch size of 32, and train for 40 epochs. The training sets differ in size, with the POS tagging training set being the largest at 23906 sentences. Thus, for each training epoch, we repeatedly sample from the NER and dependency parsing training sets to fill the gaps between the training set sizes. We perform a grid search to select the initial AdamW learning rate, $\lambda_1$ and $\lambda_2$, finding optimal values of 1e-5, 0.4 and 0.2, respectively. Here, we compute the average of the POS tagging accuracy, NER F-score and dependency parsing LAS score after each training epoch on the validation sets. We select the model checkpoint that produces the highest average score over the validation sets to apply to the test sets. Each reported score is an average over 5 runs with different random seeds.
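The per-epoch repeat-sampling can be sketched with the standard library; `fill_to_size` is a hypothetical helper (not PhoNLP's actual data loader), using the training-set sizes reported above.

```python
import random

def fill_to_size(dataset, target_size, rng):
    """Repeat-sample a smaller training set so that each epoch iterates over
    the same number of examples for every task."""
    if len(dataset) >= target_size:
        return list(dataset)
    extra = rng.choices(dataset, k=target_size - len(dataset))  # with replacement
    return list(dataset) + extra

rng = random.Random(0)
# POS tagging is the largest training set (23906 sentences); pad the others.
ner_epoch = fill_to_size(list(range(14861)), 23906, rng)
dep_epoch = fill_to_size(list(range(8977)), 23906, rng)
```

Re-sampling each epoch means the padding examples differ across epochs, so no fixed subset of the smaller sets is over-weighted throughout training.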
117
+ ## Results
118
+ Table [tab:results] presents results obtained for our PhoNLP and compares them with those of a baseline approach of single-task training. For the single-task training approach: (i) We follow a common approach to fine-tune a pre-trained language model for POS tagging, appending a linear prediction layer on top of PhoBERT, as briefly described in Section [ssec:pos]. (ii) For NER, instead of a linear prediction layer, we append a CRF prediction layer on top of PhoBERT. (iii) For dependency parsing, predicted POS tags are produced by the learned single-task POS tagging model; these POS tags are then represented by embeddings that are concatenated with the corresponding PhoBERT-based contextualized word embeddings, resulting in a sequence of input vectors for the Biaffine-based classifiers for dependency parsing [qi-etal-2018-universal]. Here, the single-task training approach is based on the PhoBERT$_{\mathrm{base}}$ version, employing the same hyper-parameter tuning and model selection strategy that we use for PhoNLP.
119
+ | **Setup** | **Model** | **POS** | **NER** | **LAS** | **UAS** |
+ |---|---|---|---|---|---|
+ | Leak. | Single-task | 96.7 | 93.69 | 78.77 | 85.22 |
+ | Leak. | PhoNLP | **96.76** | **94.41** | **79.11** | **85.47** |
+ | Re-spl | Single-task | 93.68 | 93.69 | 77.89 | 84.78 |
+ | Re-spl | PhoNLP | **93.88** | **94.51** | **78.17** | **84.95** |
131
+ Note that PhoBERT helps produce state-of-the-art results for multiple Vietnamese NLP tasks (including but not limited to POS tagging, NER and dependency parsing in a single-task training strategy), and obtains higher performance results than VnCoreNLP.
132
+ However, in both the PhoBERT and VnCoreNLP papers [phobert,vu-etal-2018-vncorenlp], results for POS tagging and dependency parsing are reported w.r.t. the data leakage issue. Our ``Single-task'' results in Table [tab:results] regarding ``Re-spl'' (i.e. the data re-split and duplication removal for POS tagging to avoid the data leakage issue) can be viewed as new PhoBERT results for a proper experimental setup. Table [tab:results] shows that in both setups ``Leak.'' and ``Re-spl'', our joint multi-task training approach PhoNLP performs better than the PhoBERT-based single-task training approach, thus resulting in state-of-the-art performances for the three tasks of Vietnamese POS tagging, NER and dependency parsing.
133
+ # PhoNLP toolkit
134
+ We present in this section a basic usage of our PhoNLP toolkit.
135
+ We make PhoNLP simple to setup, i.e. users can install PhoNLP from either source or pip (e.g. `pip3 install phonlp`). We also aim to make PhoNLP simple to run from both the command-line and the Python API. For example, annotating a corpus with POS tagging, NER and dependency parsing can be performed by using a simple command as in Figure [fig:command].
136
+ Assume that the input file ``input.txt'' in Figure [fig:command] contains a sentence ``Tôi đang làm\_việc tại VinAI .'' (I am working at VinAI). Table [tab:format] shows the annotated output in plain text form for this sentence. Similarly, we also get the same output by using the Python API, as simply as in Figure [fig:code].
138
+ Furthermore, commands to (re-)train and evaluate PhoNLP using gold annotated corpora are detailed in the PhoNLP GitHub repository. Note that it is absolutely possible to directly employ our PhoNLP (re-)training and evaluation command scripts for other languages that have gold annotated corpora available for the three tasks and a pre-trained BERT-based language model available from the transformers library.
139
+ `python3 run_phonlp.py --save_dir ./pretrained_phonlp --mode annotate --input_file input.txt --output_file output.txt`
142
+ | Index | Word form | POS | NER | Head | DepRel |
+ |---|---|---|---|---|---|
+ | 1 | Tôi | P | O | 3 | sub |
+ | 2 | đang | R | O | 3 | adv |
+ | 3 | làm\_việc | V | O | 0 | root |
+ | 4 | tại | E | O | 3 | loc |
+ | 5 | VinAI | Np | B-ORG | 4 | pob |
+ | 6 | . | CH | O | 3 | punct |
+ 
+ Annotated output in plain text form for the sentence ``Tôi đang làm\_việc tại VinAI .'' from the input file ``input.txt'' in Figure [fig:command]. The output is formatted with 6 columns representing word index, word form, POS tag, NER label, head index of the current word and its dependency relation type.
151
+ #### Speed test: We perform a sole CPU-based speed test using a personal computer with an Intel Core i5 8265U 1.6GHz CPU & 8GB of memory. For a GPU-based speed test, we employ a machine with a single NVIDIA RTX 2080Ti GPU. For performing the three NLP tasks jointly, PhoNLP obtains a speed of 15 sentences per second for the CPU-based test and 129 sentences per second for the GPU-based test, with an average of 23 word tokens per sentence and a batch size of 8.
152
+ ```python
+ import phonlp
+ # Automatically download the pretrained PhoNLP model
+ # and save it in a local machine folder
+ phonlp.download(save_dir='./pretrained_phonlp')
+ # Load the pretrained PhoNLP model
+ model = phonlp.load(save_dir='./pretrained_phonlp')
+ # Annotate a corpus
+ model.annotate(input_file='input.txt', output_file='output.txt')
+ # Annotate a sentence
+ model.print_out(model.annotate(text='Tôi đang làm_việc tại VinAI .'))
+ ```
165
+ # Conclusion and future work
166
+ We have presented the first multi-task learning model PhoNLP for joint POS tagging, NER and dependency parsing in Vietnamese. Experiments on Vietnamese benchmark datasets show that PhoNLP outperforms its strong fine-tuned PhoBERT-based single-task training baseline, producing state-of-the-art performance results. We publicly release PhoNLP as an easy-to-use open-source toolkit and hope that PhoNLP can facilitate future NLP research and applications.
167
+ In future work, we will also apply PhoNLP to other languages.
references/2021.naacl.nguyen/paper.tex ADDED
@@ -0,0 +1,641 @@
1
+ % This must be in the first 5 lines to tell arXiv to use pdfLaTeX, which is strongly recommended.
2
+ \pdfoutput=1
3
+ % In particular, the hyperref package requires pdfLaTeX in order to break URLs across lines.
4
+
5
+ \documentclass[11pt]{article}
6
+
7
+ % Remove the "review" option to generate the final version.
8
+ \usepackage{naacl2021}
9
+
10
+ % Standard package includes
11
+ \usepackage{times}
12
+ \usepackage{latexsym}
13
+
14
+ %\renewcommand{\UrlFont}{\ttfamily\small}
15
+
16
+
17
+ %\renewcommand{\UrlFont}{\ttfamily\small}
18
+
19
+ \usepackage{amsmath}
20
+ \usepackage{url}
21
+ \usepackage{amssymb}
22
+ \usepackage{amsfonts}
23
+ \usepackage{graphicx}
24
+ \usepackage{tabularx}
25
+ \usepackage{multirow}
26
+ \usepackage{arydshln}
27
+ \usepackage{mathtools,nccmath}
28
+ \usepackage{listings}
29
+
30
+ \usepackage[T5]{fontenc}
31
+ %\usepackage[utf8]{vietnam}
32
+ \usepackage{enumitem}
33
+ %\usepackage{ulem}
34
+ \usepackage{todonotes}
35
+ % \usepackage[usenames,dvipsnames]{color}
36
+ \usepackage{cancel}
37
+ \usepackage[draft]{minted}
38
+
39
+ % This is not strictly necessary, and may be commented out,
40
+ % but it will improve the layout of the manuscript,
41
+ % and will typically save some space.
42
+ \usepackage{microtype}
43
+
44
+
45
+
46
+ \makeatletter
47
+ \def\PYGdefault@reset{\let\PYGdefault@it=\relax \let\PYGdefault@bf=\relax%
48
+ \let\PYGdefault@ul=\relax \let\PYGdefault@tc=\relax%
49
+ \let\PYGdefault@bc=\relax \let\PYGdefault@ff=\relax}
50
+ \def\PYGdefault@tok#1{\csname PYGdefault@tok@#1\endcsname}
51
+ \def\PYGdefault@toks#1+{\ifx\relax#1\empty\else%
52
+ \PYGdefault@tok{#1}\expandafter\PYGdefault@toks\fi}
53
+ \def\PYGdefault@do#1{\PYGdefault@bc{\PYGdefault@tc{\PYGdefault@ul{%
54
+ \PYGdefault@it{\PYGdefault@bf{\PYGdefault@ff{#1}}}}}}}
55
+ \def\PYGdefault#1#2{\PYGdefault@reset\PYGdefault@toks#1+\relax+\PYGdefault@do{#2}}
56
+
57
+ \expandafter\def\csname PYGdefault@tok@w\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}}
58
+ \expandafter\def\csname PYGdefault@tok@c\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
59
+ \expandafter\def\csname PYGdefault@tok@cp\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.74,0.48,0.00}{##1}}}
60
+ \expandafter\def\csname PYGdefault@tok@k\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
61
+ \expandafter\def\csname PYGdefault@tok@kp\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
62
+ \expandafter\def\csname PYGdefault@tok@kt\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.69,0.00,0.25}{##1}}}
63
+ \expandafter\def\csname PYGdefault@tok@o\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
64
+ \expandafter\def\csname PYGdefault@tok@ow\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
65
+ \expandafter\def\csname PYGdefault@tok@nb\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
66
+ \expandafter\def\csname PYGdefault@tok@nf\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
67
+ \expandafter\def\csname PYGdefault@tok@nc\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
68
+ \expandafter\def\csname PYGdefault@tok@nn\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
69
+ \expandafter\def\csname PYGdefault@tok@ne\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.82,0.25,0.23}{##1}}}
70
+ \expandafter\def\csname PYGdefault@tok@nv\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
71
+ \expandafter\def\csname PYGdefault@tok@no\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}}
72
+ \expandafter\def\csname PYGdefault@tok@nl\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.63,0.63,0.00}{##1}}}
73
+ \expandafter\def\csname PYGdefault@tok@ni\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.60,0.60,0.60}{##1}}}
74
+ \expandafter\def\csname PYGdefault@tok@na\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.49,0.56,0.16}{##1}}}
75
+ \expandafter\def\csname PYGdefault@tok@nt\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
76
+ \expandafter\def\csname PYGdefault@tok@nd\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
77
+ \expandafter\def\csname PYGdefault@tok@s\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
78
+ \expandafter\def\csname PYGdefault@tok@sd\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
79
+ \expandafter\def\csname PYGdefault@tok@si\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
80
+ \expandafter\def\csname PYGdefault@tok@se\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.40,0.13}{##1}}}
81
+ \expandafter\def\csname PYGdefault@tok@sr\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
82
+ \expandafter\def\csname PYGdefault@tok@ss\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
83
+ \expandafter\def\csname PYGdefault@tok@sx\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
84
+ \expandafter\def\csname PYGdefault@tok@m\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
85
+ \expandafter\def\csname PYGdefault@tok@gh\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
86
+ \expandafter\def\csname PYGdefault@tok@gu\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}}
87
+ \expandafter\def\csname PYGdefault@tok@gd\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}}
88
+ \expandafter\def\csname PYGdefault@tok@gi\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}}
89
+ \expandafter\def\csname PYGdefault@tok@gr\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}}
90
+ \expandafter\def\csname PYGdefault@tok@ge\endcsname{\let\PYGdefault@it=\textit}
91
+ \expandafter\def\csname PYGdefault@tok@gs\endcsname{\let\PYGdefault@bf=\textbf}
92
+ \expandafter\def\csname PYGdefault@tok@gp\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
93
+ \expandafter\def\csname PYGdefault@tok@go\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}}
94
+ \expandafter\def\csname PYGdefault@tok@gt\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}}
95
+ \expandafter\def\csname PYGdefault@tok@err\endcsname{\def\PYGdefault@bc##1{\setlength{\fboxsep}{0pt}\fcolorbox[rgb]{1.00,0.00,0.00}{1,1,1}{\strut ##1}}}
96
+ \expandafter\def\csname PYGdefault@tok@kc\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
97
+ \expandafter\def\csname PYGdefault@tok@kd\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
98
+ \expandafter\def\csname PYGdefault@tok@kn\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
99
+ \expandafter\def\csname PYGdefault@tok@kr\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
100
+ \expandafter\def\csname PYGdefault@tok@bp\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
101
+ \expandafter\def\csname PYGdefault@tok@fm\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
102
+ \expandafter\def\csname PYGdefault@tok@vc\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
103
+ \expandafter\def\csname PYGdefault@tok@vg\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
104
+ \expandafter\def\csname PYGdefault@tok@vi\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
105
+ \expandafter\def\csname PYGdefault@tok@vm\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
106
+ \expandafter\def\csname PYGdefault@tok@sa\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
107
+ \expandafter\def\csname PYGdefault@tok@sb\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
108
+ \expandafter\def\csname PYGdefault@tok@sc\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
109
+ \expandafter\def\csname PYGdefault@tok@dl\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
110
+ \expandafter\def\csname PYGdefault@tok@s2\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
111
+ \expandafter\def\csname PYGdefault@tok@sh\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
112
+ \expandafter\def\csname PYGdefault@tok@s1\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
113
+ \expandafter\def\csname PYGdefault@tok@mb\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
114
+ \expandafter\def\csname PYGdefault@tok@mf\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
115
+ \expandafter\def\csname PYGdefault@tok@mh\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
116
+ \expandafter\def\csname PYGdefault@tok@mi\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
117
+ \expandafter\def\csname PYGdefault@tok@il\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
118
+ \expandafter\def\csname PYGdefault@tok@mo\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
119
+ \expandafter\def\csname PYGdefault@tok@ch\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
120
+ \expandafter\def\csname PYGdefault@tok@cm\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
121
+ \expandafter\def\csname PYGdefault@tok@cpf\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
122
+ \expandafter\def\csname PYGdefault@tok@c1\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
123
+ \expandafter\def\csname PYGdefault@tok@cs\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
124
+
125
+ \def\PYGdefaultZbs{\char`\\}
126
+ \def\PYGdefaultZus{\char`\_}
127
+ \def\PYGdefaultZob{\char`\{}
128
+ \def\PYGdefaultZcb{\char`\}}
129
+ \def\PYGdefaultZca{\char`\^}
130
+ \def\PYGdefaultZam{\char`\&}
131
+ \def\PYGdefaultZlt{\char`\<}
132
+ \def\PYGdefaultZgt{\char`\>}
133
+ \def\PYGdefaultZsh{\char`\#}
134
+ \def\PYGdefaultZpc{\char`\%}
135
+ \def\PYGdefaultZdl{\char`\$}
136
+ \def\PYGdefaultZhy{\char`\-}
137
+ \def\PYGdefaultZsq{\char`\'}
138
+ \def\PYGdefaultZdq{\char`\"}
139
+ \def\PYGdefaultZti{\char`\~}
140
+ % for compatibility with earlier versions
141
+ \def\PYGdefaultZat{@}
142
+ \def\PYGdefaultZlb{[}
143
+ \def\PYGdefaultZrb{]}
144
+ \makeatother
145
+
146
+
147
+
148
+ \makeatletter
149
+ \def\PYG@reset{\let\PYG@it=\relax \let\PYG@bf=\relax%
150
+ \let\PYG@ul=\relax \let\PYG@tc=\relax%
151
+ \let\PYG@bc=\relax \let\PYG@ff=\relax}
152
+ \def\PYG@tok#1{\csname PYG@tok@#1\endcsname}
153
+ \def\PYG@toks#1+{\ifx\relax#1\empty\else%
154
+ \PYG@tok{#1}\expandafter\PYG@toks\fi}
155
+ \def\PYG@do#1{\PYG@bc{\PYG@tc{\PYG@ul{%
156
+ \PYG@it{\PYG@bf{\PYG@ff{#1}}}}}}}
157
+ \def\PYG#1#2{\PYG@reset\PYG@toks#1+\relax+\PYG@do{#2}}
158
+
159
+ \expandafter\def\csname PYG@tok@w\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}}
160
+ \expandafter\def\csname PYG@tok@c\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
161
+ \expandafter\def\csname PYG@tok@cp\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.74,0.48,0.00}{##1}}}
162
+ \expandafter\def\csname PYG@tok@k\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
163
+ \expandafter\def\csname PYG@tok@kp\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
164
+ \expandafter\def\csname PYG@tok@kt\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.69,0.00,0.25}{##1}}}
165
+ \expandafter\def\csname PYG@tok@o\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
166
+ \expandafter\def\csname PYG@tok@ow\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
167
+ \expandafter\def\csname PYG@tok@nb\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
168
+ \expandafter\def\csname PYG@tok@nf\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
169
+ \expandafter\def\csname PYG@tok@nc\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
170
+ \expandafter\def\csname PYG@tok@nn\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
171
+ \expandafter\def\csname PYG@tok@ne\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.82,0.25,0.23}{##1}}}
172
+ \expandafter\def\csname PYG@tok@nv\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
173
+ \expandafter\def\csname PYG@tok@no\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}}
174
+ \expandafter\def\csname PYG@tok@nl\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.63,0.63,0.00}{##1}}}
175
+ \expandafter\def\csname PYG@tok@ni\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.60,0.60,0.60}{##1}}}
176
+ \expandafter\def\csname PYG@tok@na\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.49,0.56,0.16}{##1}}}
+ \expandafter\def\csname PYG@tok@nt\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@nd\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
+ \expandafter\def\csname PYG@tok@s\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@sd\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@si\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
+ \expandafter\def\csname PYG@tok@se\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.73,0.40,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@sr\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
+ \expandafter\def\csname PYG@tok@ss\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
+ \expandafter\def\csname PYG@tok@sx\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@m\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
+ \expandafter\def\csname PYG@tok@gh\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
+ \expandafter\def\csname PYG@tok@gu\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}}
+ \expandafter\def\csname PYG@tok@gd\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@gi\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@gr\endcsname{\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@ge\endcsname{\let\PYG@it=\textit}
+ \expandafter\def\csname PYG@tok@gs\endcsname{\let\PYG@bf=\textbf}
+ \expandafter\def\csname PYG@tok@gp\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
+ \expandafter\def\csname PYG@tok@go\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}}
+ \expandafter\def\csname PYG@tok@gt\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}}
+ \expandafter\def\csname PYG@tok@err\endcsname{\def\PYG@bc##1{\setlength{\fboxsep}{0pt}\fcolorbox[rgb]{1.00,0.00,0.00}{1,1,1}{\strut ##1}}}
+ \expandafter\def\csname PYG@tok@kc\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@kd\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@kn\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@kr\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@bp\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
+ \expandafter\def\csname PYG@tok@fm\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
+ \expandafter\def\csname PYG@tok@vc\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
+ \expandafter\def\csname PYG@tok@vg\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
+ \expandafter\def\csname PYG@tok@vi\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
+ \expandafter\def\csname PYG@tok@vm\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
+ \expandafter\def\csname PYG@tok@sa\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@sb\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@sc\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@dl\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@s2\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@sh\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@s1\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
+ \expandafter\def\csname PYG@tok@mb\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
+ \expandafter\def\csname PYG@tok@mf\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
+ \expandafter\def\csname PYG@tok@mh\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
+ \expandafter\def\csname PYG@tok@mi\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
+ \expandafter\def\csname PYG@tok@il\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
+ \expandafter\def\csname PYG@tok@mo\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
+ \expandafter\def\csname PYG@tok@ch\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
+ \expandafter\def\csname PYG@tok@cm\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
+ \expandafter\def\csname PYG@tok@cpf\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
+ \expandafter\def\csname PYG@tok@c1\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
+ \expandafter\def\csname PYG@tok@cs\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
+
+ \def\PYGZbs{\char`\\}
+ \def\PYGZus{\char`\_}
+ \def\PYGZob{\char`\{}
+ \def\PYGZcb{\char`\}}
+ \def\PYGZca{\char`\^}
+ \def\PYGZam{\char`\&}
+ \def\PYGZlt{\char`\<}
+ \def\PYGZgt{\char`\>}
+ \def\PYGZsh{\char`\#}
+ \def\PYGZpc{\char`\%}
+ \def\PYGZdl{\char`\$}
+ \def\PYGZhy{\char`\-}
+ \def\PYGZsq{\char`\'}
+ \def\PYGZdq{\char`\"}
+ \def\PYGZti{\char`\~}
+ % for compatibility with earlier versions
+ \def\PYGZat{@}
+ \def\PYGZlb{[}
+ \def\PYGZrb{]}
+ \makeatother
+
+
+
+
+
+ \setlength{\textfloatsep}{15pt plus 5.0pt minus 3.0pt}
+ \setlength{\floatsep}{15pt plus 5.0pt minus 3.0pt}
+ %\setlength{\dbltextfloatsep }{15pt plus 2.0pt minus 3.0pt}
+ %\setlength{\dblfloatsep}{15pt plus 2.0pt minus 3.0pt}
+ %\setlength{\intextsep}{15pt plus 2.0pt minus 3.0pt}
+ \setlength{\abovecaptionskip}{5pt plus 1pt minus 1pt}
+
+ % If the title and author information does not fit in the area allocated, uncomment the following
+ %
+ %\setlength\titlebox{<dim>}
+ %
+ % and set <dim> to something 5cm or larger.
+
+ %\setlength\titlebox{5cm}
+
+ %\setlength{\textfloatsep}{15pt plus 5.0pt minus 5.0pt}
+ %\setlength{\floatsep}{15pt plus 5.0pt minus 5.0pt}
+ %\setlength{\dbltextfloatsep }{15pt plus 2.0pt minus 3.0pt}
+ %\setlength{\dblfloatsep}{15pt plus 2.0pt minus 3.0pt}
+ %\setlength{\intextsep}{15pt plus 2.0pt minus 3.0pt}
+ %\setlength{\abovecaptionskip}{5pt plus 1pt minus 1pt}
+
+ % If the title and author information does not fit in the area allocated, uncomment the following
+ %
+ \setlength\titlebox{5cm}
+ %
+ % and set <dim> to something 5cm or larger.
+
+ \title{PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing}
+
+ % Author information can be set in various styles:
+ % For several authors from the same institution:
+ \author{Linh The Nguyen \and Dat Quoc Nguyen\\
+ VinAI Research, Hanoi, Vietnam \\
+ \tt{\normalsize \{v.linhnt140, v.datnq9\}@vinai.io}}
+ % if the names do not fit well on one line use
+ % Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
+ % For authors from different institutions:
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
+ % \And ... \And
+ % Author n \\ Address line \\ ... \\ Address line}
+ % To start a separate ``row'' of authors use \AND, as in
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
+ % \AND
+ % Author 2 \\ Address line \\ ... \\ Address line \And
+ % Author 3 \\ Address line \\ ... \\ Address line}
+
+ %\author{ }
+
+ \begin{document}
+ \maketitle
+
+ \begin{abstract}
+ We present the first multi-task learning model---named PhoNLP---for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT \cite{phobert} for each task independently. We publicly release PhoNLP as an open-source toolkit under the Apache License 2.0.
+ Although we present PhoNLP for Vietnamese, our PhoNLP training and evaluation command scripts can in fact work directly for any other language that has a pre-trained BERT-based language model and gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing.
+ We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and applications, not only for Vietnamese but also for other languages. Our PhoNLP is available at \url{https://github.com/VinAIResearch/PhoNLP}.
+ \end{abstract}
+
+ \vspace{-5pt}
+
+ \begin{figure*}[!t]
+ \centering
+ \includegraphics[width=12.5cm]{JointModel.pdf}
+ {\small
+ \begin{tabular}{crllll}
+ \hline
+ \textbf{ID} & \textbf{Form} & \textbf{POS} & \textbf{NER} & \textbf{Head} & \textbf{DepRel} \\
+ \hline
+ 1 & Đây\textsubscript{This} & PRON & O & 2 & sub \\
+ 2 & là\textsubscript{is} & VERB & O & 0 & root \\
+ 3 & Hà\_Nội\textsubscript{Ha\_Noi} & NOUN & B-LOC & 2 & vmod \\
+ \hline
+ \end{tabular}
+ }
+ \caption{Illustration of our PhoNLP model.}
+ \label{fig:architecture}
+ \end{figure*}
+
+
+
+ \section{Introduction}
+
+ Vietnamese NLP research has grown significantly in recent years, boosted by the success of the national project on Vietnamese language and speech processing (VLSP) KC01.01/2006-2010 and the VLSP workshops that have run shared tasks since 2013.\footnote{\url{https://vlsp.org.vn/}} The fundamental tasks of POS tagging, NER and dependency parsing play important roles, providing useful features for many downstream application tasks such as machine translation \cite{7800281}, sentiment analysis \cite{BANG20182016IIP0038}, relation extraction \cite{9287471}, semantic parsing \cite{vitext2sql}, open information extraction \cite{3155133.3155171} and question answering \cite{NguyenNP_SWJ,3184558.3191535}. % \cite{7800281,BANG20182016IIP0038,3155133.3155171,9287471}.
+ Thus, there is a need to develop NLP toolkits that provide linguistic annotations for Vietnamese POS tagging, NER and dependency parsing.
+
+ VnCoreNLP \cite{vu-etal-2018-vncorenlp} is the previous public toolkit, employing traditional feature-based machine learning models to handle these Vietnamese NLP tasks. However, VnCoreNLP is no longer state-of-the-art: its results are significantly outperformed by those obtained by fine-tuning PhoBERT---the current state-of-the-art monolingual pre-trained language model for Vietnamese \cite{phobert}. Note that there are no publicly available fine-tuned BERT-based models for the three Vietnamese tasks. Even if there were, a potential drawback is that an NLP package wrapping such fine-tuned BERT-based models would require a large amount of storage, i.e. three times the storage used by a single BERT model \cite{devlin-etal-2019-bert}, and thus would not be suitable for practical applications that require a small storage footprint. Joint multi-task learning is a promising solution as it might help reduce the storage space. In addition, POS tagging, NER and dependency parsing are related tasks: POS tags are essential input features for dependency parsing and are also used as additional features for NER. Joint multi-task learning thus might also improve performance over single-task learning \cite{Ruder2019Neural}.
+
+
+ In this paper, we present a new multi-task learning model---named PhoNLP---for joint POS tagging, NER and dependency parsing. In particular, given an input sentence of words, an encoding layer generates contextualized word embeddings that represent the input words. These contextualized word embeddings are fed into a POS tagging layer, which is a linear prediction layer \cite{devlin-etal-2019-bert}, to predict POS tags for the corresponding input words. Each predicted POS tag is then represented by two ``soft'' embeddings that are later fed into the NER and dependency parsing layers separately.
+ More specifically, based on both the contextualized word embeddings and the ``soft'' POS tag embeddings, the NER layer uses a linear-chain CRF predictor \cite{Lafferty:2001} to predict NER labels for the input words, while the dependency parsing layer uses a Biaffine classifier \cite{DozatM17} to predict dependency arcs between the words and another Biaffine classifier to label the predicted arcs.
+ %To the best of our knowledge, our PhoNLP is the first proposed model to jointly learn POS tagging, NER and dependency parsing for Vietnamese. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results.
+ Our contributions are summarized as follows:
+
+ %\vspace{-2pt}
+
+ \begin{itemize}[leftmargin=*]
+ \setlength\itemsep{-1pt}
+ \item To the best of our knowledge, PhoNLP is the first proposed model to jointly learn POS tagging, NER and dependency parsing for Vietnamese.
+ \item We discuss a data leakage issue in the Vietnamese benchmark datasets that has not been pointed out before. Experiments show that PhoNLP obtains state-of-the-art performance results, outperforming PhoBERT-based single-task learning.
+ \item We publicly release PhoNLP as an open-source toolkit that is simple to set up and efficient to run from both the command line and the Python API. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and downstream applications.
+ \end{itemize}
+
+
+
+ \section{Model description}
+
+ Figure \ref{fig:architecture} illustrates our PhoNLP architecture, which can be viewed as a combination of a BERT-based encoding layer and three decoding layers for POS tagging, NER and dependency parsing.
+
+ \subsection{Encoder \& Contextualized embeddings}
+
+ Given an input sentence consisting of $n$ word tokens $w_1, w_2, ..., w_n$, the encoding layer employs PhoBERT to generate contextualized latent feature embeddings $\mathbf{e}_{i}$, each representing the $i^{th}$ word $w_i$:
+
+ \begin{equation}
+ \mathbf{e}_{i} = \mathrm{PhoBERT\textsubscript{base}}\big({w}_{1:n}, i\big)
+ \end{equation}
+
+ In particular, the encoding layer employs the \textbf{PhoBERT\textsubscript{base}} version. Because PhoBERT uses BPE \cite{sennrich-etal-2016-neural} to segment the input sentence into subword units, the encoding layer in fact represents the $i^{th}$ word $w_i$ by the contextualized embedding of its first subword.
+
+ \subsection{POS tagging}\label{ssec:pos}
+
+ Following common practice for fine-tuning a pre-trained language model on a sequence labeling task \cite{devlin-etal-2019-bert}, the POS tagging layer is a linear prediction layer appended on top of the encoder. In particular, the POS tagging layer feeds the contextualized word embeddings $\mathbf{e}_{i}$ into a feed-forward network (FFNN\textsubscript{POS}) followed by a $\mathsf{softmax}$ predictor for POS tag prediction:
+
+ \begin{equation}
+ \mathbf{p}_{i} = \mathsf{softmax}\big(\mathrm{FFNN\textsubscript{POS}}\big(\mathbf{e}_{i}\big)\big) \label{eq2}
+ \end{equation}
+
+ \noindent where the output layer size of FFNN\textsubscript{POS} is the number of POS tags. Based on the probability vectors $\mathbf{p}_{i}$, a cross-entropy objective loss \textbf{$\mathcal{L}_{\text{POS}}$} is calculated for POS tagging during training.
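To make the linear-prediction-plus-softmax step above concrete, here is a minimal, dependency-free Python sketch. It is illustrative only (PhoNLP itself is implemented in PyTorch), and the function names and toy dimensions are hypothetical:

```python
import math

def softmax(logits):
    # Numerically stable softmax turning logits into a probability vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pos_tag_probs(e_i, W, b):
    # Linear prediction layer: p_i = softmax(W e_i + b),
    # where W has one row per POS tag in the tag set.
    logits = [sum(w * x for w, x in zip(row, e_i)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)

# Toy example: a 4-dimensional "contextualized embedding" scored over 3 tags.
p_i = pos_tag_probs(
    [0.1, -0.2, 0.3, 0.0],
    [[0.5, 0.1, 0.0, 0.2], [0.0, 0.3, -0.1, 0.4], [0.2, 0.2, 0.2, 0.2]],
    [0.0, 0.0, 0.0],
)
```

During training, the cross-entropy loss is computed against the gold tag's probability in `p_i`.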
+
+
+ \subsection{NER}\label{ssec:ner}
+
+ The NER layer creates a sequence of vectors $\mathbf{v}_{1:n}$ in which each $\mathbf{v}_{i}$ is obtained by concatenating the contextualized word embedding $\mathbf{e}_{i}$ and a ``soft'' POS tag embedding $\mathbf{t}_{i}^{(1)}$:
+
+ \begin{equation}
+ \mathbf{v}_{i} = \mathbf{e}_{i} \circ \mathbf{t}_{i}^{(1)}
+ \label{equa:ner}
+ \end{equation}
+
+ \noindent where following \newcite{hashimoto-etal-2017-joint}, the ``soft'' POS tag embedding $\mathbf{t}_{i}^{(1)}$ is computed by multiplying a label weight matrix $\mathbf{W}^{(1)}$ with the corresponding probability vector $\mathbf{p}_{i}$:
+
+ \begin{equation*}
+ \mathbf{t}_{i}^{(1)} = \mathbf{W}^{(1)}\mathbf{p}_{i}
+ \end{equation*}
+
+ The NER layer then passes each vector $\mathbf{v}_{i}$ into a FFNN (FFNN\textsubscript{NER}):
+
+ \begin{equation}
+ \mathbf{h}_{i} = \mathrm{FFNN\textsubscript{NER}}\big(\mathbf{v}_{i}\big) \label{eq4}
+ \end{equation}
+
+ \noindent where the output layer size of FFNN\textsubscript{NER} is the number of BIO-based NER labels.
+
+
+ The NER layer feeds the output vectors $\mathbf{h}_{i}$ into a linear-chain CRF predictor for NER label prediction \cite{Lafferty:2001}. A cross-entropy loss \textbf{$\mathcal{L}_{\text{NER}}$} is calculated for NER during training, while the Viterbi algorithm is used for inference.
+
+ \subsection{Dependency parsing}
+
+ The dependency parsing layer creates vectors $\mathbf{z}_{1:n}$ in which each $\mathbf{z}_{i}$ is obtained by concatenating $\mathbf{e}_{i}$ and another ``soft'' POS tag embedding $\mathbf{t}_{i}^{(2)}$:
+
+ \begin{eqnarray}
+ \mathbf{z}_{i} &=& \mathbf{e}_{i} \circ \mathbf{t}_{i}^{(2)} \label{equa:posdep} \\
+ \mathbf{t}_{i}^{(2)} &=& \mathbf{W}^{(2)}\mathbf{p}_{i} \nonumber
+ \end{eqnarray}
+
+ Following \newcite{DozatM17}, the dependency parsing layer uses FFNNs to split $\mathbf{z}_{i}$ into \emph{head} and \emph{dependent} representations:
+
+ \begin{eqnarray}
+ \mathbf{h}_{i}^{(\textsc{a-h})} &=& \mathrm{FFNN}_{\text{Arc-Head}}\big(\mathbf{z}_{i}\big) \label{equa:fc6} \\
+ \mathbf{h}_{i}^{(\textsc{a-d})} &=& \mathrm{FFNN}_{\text{Arc-Dep}}\big(\mathbf{z}_{i}\big) \\
+ \mathbf{h}_{i}^{(\textsc{l-h})} &=& \mathrm{FFNN}_{\text{Label-Head}}\big(\mathbf{z}_{i}\big) \\
+ \mathbf{h}_{i}^{(\textsc{l-d})} &=& \mathrm{FFNN}_{\text{Label-Dep}}\big(\mathbf{z}_{i}\big) \label{equa:fc9}
+ \end{eqnarray}
+
+
+ To predict potential dependency arcs, based on the input vectors $\mathbf{h}_{i}^{(\textsc{a-h})}$ and $\mathbf{h}_{j}^{(\textsc{a-d})}$, the parsing layer uses a variant of the Biaffine classifier \cite{qi-etal-2018-universal} that additionally takes into account the distance and relative ordering between two words to produce a probability distribution of arc heads for each word. %\footnote{We utilize an implementation of the Biaffine classifier's variant \cite{qi-etal-2018-universal} from \newcite{qi-etal-2020-stanza}.}
+ For inference, the Chu–Liu/Edmonds' algorithm is used to find a maximum spanning tree \cite{chuliu,Edmonds}.
+ The parsing layer also uses another Biaffine classifier to label the predicted arcs, based on the input vectors $\mathbf{h}_{i}^{(\textsc{l-h})}$ and $\mathbf{h}_{j}^{(\textsc{l-d})}$. During training, an objective loss \textbf{$\mathcal{L}_{\text{DEP}}$} is computed by summing a cross-entropy loss for unlabeled dependency parsing and another cross-entropy loss for dependency label prediction, based on gold arcs and arc labels.
+
+
+ %\begin{eqnarray}
+ % {s}_{i,j} &=&\mathrm{Biaff\textsuperscript{(A)}}\Big(\mathbf{h}_{i}^{(\textsc{a-h})}, \mathbf{h}_{j}^{(\textsc{a-d})}\Big) \\
+ %\mathrm{Biaff\textsuperscript{(A)}}\big(\mathbf{x}, \mathbf{y}\big) &=& \mathbf{x}^{\mathsf{T}} \mathbf{U}_1 \mathbf{y} + \mathbf{w}_1^{\mathsf{T}}(\mathbf{x} \circ \mathbf{y}) + {b}_1 \nonumber
+ %\end{eqnarray}
+ %
+ %\noindent where $\mathbf{U}_1$, $\mathbf{w}_1$ and $b_1$ are a $k\times 1 \times k$ tensor, a $2k$-dimensional vector and a bias scalar, respectively
+ %(here, $k$ is the size of the {head} and {dependent} representations).
+ %
+ %\begin{eqnarray}
+ % \mathbf{s}_{i,j} &=&\mathrm{Biaff\textsuperscript{(L)}}\Big(\mathbf{h}_{i}^{(\textsc{l-h})}, \mathbf{h}_{j}^{(\textsc{l-d})}\Big) \\
+ %\mathrm{Biaff\textsuperscript{(L)}}\big(\mathbf{x}, \mathbf{y}\big) &=& \mathbf{x}^{\mathsf{T}} \mathbf{U}_2 \mathbf{y} + \mathbf{W}_2(\mathbf{x} \circ \mathbf{y}) + \mathbf{b}_2 \nonumber
+ %\end{eqnarray}
+ %
+ %\noindent where $\mathbf{U}_2$, $\mathbf{W}_2$ and $\mathbf{b}_2$ are a $k\times l \times k$ tensor, a $l \times 2k$ matrix and a bias vector, respectively (here, $l$ is the number of dependency labels).
+
+
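For reference, a plain biaffine arc scorer of the form $s = \mathbf{x}^{\mathsf{T}}\mathbf{U}\mathbf{y} + \mathbf{w}^{\mathsf{T}}(\mathbf{x} \circ \mathbf{y}) + b$ (the basic form underlying the classifier variant above, without the distance and ordering features) can be sketched in dependency-free Python; the names and toy sizes below are hypothetical:

```python
def biaffine_score(x, y, U, w, b):
    # s = x^T U y + w^T (x concatenated with y) + b, where x is the head
    # representation and y is the dependent representation.
    Uy = [sum(u * yj for u, yj in zip(row, y)) for row in U]
    bilinear = sum(xi * v for xi, v in zip(x, Uy))
    linear = sum(wi * zi for wi, zi in zip(w, x + y))
    return bilinear + linear + b

# Toy example with 2-dimensional head/dependent representations.
s = biaffine_score([1.0, 0.0], [0.0, 1.0],
                   U=[[0.0, 2.0], [1.0, 0.0]],
                   w=[0.1, 0.1, 0.1, 0.1], b=0.5)
```

Scoring every head/dependent pair this way yields an $n \times n$ arc score matrix, over which the maximum spanning tree is decoded at inference time.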
+ \subsection{Joint multi-task learning}
+
+ The final training objective loss \textbf{$\mathcal{L}$} of our model PhoNLP is the weighted sum of the POS tagging loss {$\mathcal{L}_{\text{POS}}$}, the NER loss {$\mathcal{L}_{\text{NER}}$} and the dependency parsing loss {$\mathcal{L}_{\text{DEP}}$}:
+
+ \begin{equation}
+ \textbf{$\mathcal{L}$} = \lambda_1\mathcal{L}_{\text{POS}} + \lambda_2\mathcal{L}_{\text{NER}} + (1 - \lambda_1 - \lambda_2)\mathcal{L}_{\text{DEP}}
+ \end{equation}
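The weighted sum can be written directly; in the sketch below, the default weights 0.4 and 0.2 are the values that the paper later reports as optimal on the validation sets, and the function name is hypothetical:

```python
def joint_loss(l_pos, l_ner, l_dep, lam1=0.4, lam2=0.2):
    # Weighted sum of the three task losses; the three weights sum to 1,
    # so the dependency parsing weight is determined by lam1 and lam2.
    return lam1 * l_pos + lam2 * l_ner + (1.0 - lam1 - lam2) * l_dep
```

Constraining the weights to sum to 1 means only two hyper-parameters need tuning, which keeps the grid search over $\lambda_1$ and $\lambda_2$ small.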
+
+ \paragraph{Discussion:} Our PhoNLP can be viewed as an extension of previous joint POS tagging and dependency parsing models \cite{hashimoto-etal-2017-joint,li-etal-2018-joint-learning,nguyen-verspoor-2018-improved,NguyenALTA2019,kondratyuk-straka-2019-75}, where we additionally incorporate a CRF-based prediction layer for NER. Unlike \newcite{hashimoto-etal-2017-joint}, \newcite{nguyen-verspoor-2018-improved}, \newcite{li-etal-2018-joint-learning} and \newcite{NguyenALTA2019}, which use BiLSTM-based encoders to extract contextualized feature embeddings, we use a BERT-based encoder. \newcite{kondratyuk-straka-2019-75} also employ a BERT-based encoder. However, unlike PhoNLP, where we construct a hierarchical architecture over the POS tagging and dependency parsing layers, \newcite{kondratyuk-straka-2019-75} do not make use of POS tag embeddings for dependency parsing.\footnote{In our preliminary experiments, not feeding the POS tag embeddings into the dependency parsing layer decreases the performance.}
+
+
+
+
+ \section{Experiments}
+
+ \subsection{Setup}
+
+ \subsubsection{Datasets}
+
+ To conduct experiments, we use the benchmark VLSP 2013 POS tagging dataset,\footnote{\url{https://vlsp.org.vn/vlsp2013/eval}} the VLSP 2016 NER dataset \cite{JCC13161} and the VnDT dependency treebank v1.1 \cite{Nguyen2014NLDB}, following the setup used by the VnCoreNLP toolkit \cite{vu-etal-2018-vncorenlp}. Here, VnDT is converted from the Vietnamese constituent treebank \cite{nguyen-etal-2009-building}.
+
+
+ \paragraph{Data leakage issue:} We further discover a data leakage issue that has not been pointed out before: all sentences from the VLSP 2016 NER dataset and the VnDT treebank are included in the VLSP 2013 POS tagging dataset. In particular, over 90\% of the sentences from both the validation and test sets for NER and dependency parsing are included in the POS tagging training set, resulting in an unrealistic evaluation scenario in which POS tags are used as input features for NER and dependency parsing.
+
+ To handle this issue, we re-split the VLSP 2013 POS tagging dataset: the POS tagging validation/test set now only contains sentences that appear in the union of the NER and dependency parsing validation/test sets (i.e. the validation/test sentences for NER and dependency parsing only appear in the POS tagging validation/test set).
+ In addition, there are 594 duplicated sentences in the VLSP 2013 POS tagging dataset (here, no sentence duplication is found in the union of the NER and dependency parsing sentences). Thus we also remove these duplicates from the POS tagging dataset.
+ Table \ref{tab:Datasets} details the statistics of the experimental datasets.
+
+ \begin{table}[!t]
+ \centering
+ \resizebox{7.5cm}{!}{
+ \begin{tabular}{l|l|l|l}
+ \hline
+ \textbf{Task} & \textbf{\#train} & \textbf{\#valid} & \textbf{\#test} \\
+ \hline
+ {POS tagging (leakage)} & {27000} & {870} & {2120} \\
+ \hdashline
+ POS tagging (re-split) & 23906 & 2009 & 3481\\
+ \hline
+ NER & 14861 & 2000 & 2831 \\
+ \hline
+ Dependency parsing & 8977 & 200 & 1020 \\
+ \hline
+ \end{tabular}
+ }
+ \caption{Dataset statistics. \textbf{\#train}, \textbf{\#valid} and \textbf{\#test} denote the numbers of training, validation and test sentences, respectively. Here, ``{POS tagging (leakage)}'' and ``POS tagging (re-split)'' refer to the statistics for POS tagging before and after re-splitting \& sentence duplication removal, respectively.}
+ \label{tab:Datasets}
+ \end{table}
+
+ \subsubsection{Implementation}
+
+ PhoNLP is implemented based on PyTorch \cite{NEURIPS2019_9015}, employing the PhoBERT encoder implementation available from the $\mathrm{transformers}$ library \cite{wolf-etal-2020-transformers} and the Biaffine classifier implementation from \newcite{qi-etal-2020-stanza}. We set both label weight matrices $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$ to have 100 rows, resulting in 100-dimensional soft POS tag embeddings. In addition, following \newcite{qi-etal-2018-universal,qi-etal-2020-stanza}, the FFNNs in equations \ref{equa:fc6}--\ref{equa:fc9} use 400-dimensional output layers.
+
+ We use the AdamW optimizer \cite{loshchilov2018decoupled} with a fixed batch size of 32, and train for 40 epochs. The training sets differ in size, with the POS tagging training set being the largest at 23906 sentences. Thus, for each training epoch, we repeatedly sample from the NER and dependency parsing training sets to fill the gaps between the training set sizes. We perform a grid search to select the initial AdamW learning rate, $\lambda_1$ and $\lambda_2$, finding optimal values of 1e-5, 0.4 and 0.2, respectively. Here, we compute the average of the POS tagging accuracy, NER F\textsubscript{1}-score and dependency parsing LAS score after each training epoch on the validation sets. We select the model checkpoint that produces the highest average validation score to apply to the test sets. Each of our reported scores is an average over 5 runs with different random seeds.
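The per-epoch oversampling described above (repeatedly sampling from the smaller NER and dependency parsing training sets to match the POS tagging training set size) can be sketched as follows. This is an illustrative sketch with hypothetical names, not the actual PhoNLP data loader:

```python
import random

def fill_to_size(dataset, target_size, rng=None):
    # Repeatedly sample (with replacement) from a smaller training set so
    # that every task contributes the same number of sentences per epoch.
    rng = rng or random.Random(0)
    if len(dataset) >= target_size:
        return list(dataset)
    extra = [rng.choice(dataset) for _ in range(target_size - len(dataset))]
    return list(dataset) + extra

# Toy example: pad a 3-sentence "NER set" up to a 7-sentence "POS set" size.
epoch_ner = fill_to_size(["s1", "s2", "s3"], 7)
```

Each epoch can reseed the sampler so the oversampled sentences vary across epochs while the set sizes stay aligned.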
+
+ %\subsubsection{Baseline single-task training}
+
+ %We also conduct experiments for a single-task training strategy. We follow a common approach to fine-tune a pre-trained language model for POS tagging, appending a linear prediction layer on top of PhoBERT, as briefly described in Section \ref{ssec:pos}. For NER, instead of a linear prediction layer, we append a CRF prediction layer on top of PhoBERT. For dependency parsing, predicted POS tags are produced by the learned single-task POS tagging model; then POS tags are represented by embeddings that are concatenated with the corresponding contextualized word embeddings, resulting in a sequence of input vectors for the Biaffine-based classifiers \cite{qi-etal-2018-universal}.
+
+
+
+
+
+
+
+
+
+
+
+ \subsection{Results}
+
+ %\subsubsection*{Main results}
+
+
+
+ Table \ref{tab:results} presents the results obtained by our PhoNLP and compares them with those of a baseline single-task training approach. For the single-task training approach: (i) for POS tagging, we follow a common approach to fine-tune a pre-trained language model, appending a linear prediction layer on top of PhoBERT, as briefly described in Section \ref{ssec:pos}; (ii) for NER, instead of a linear prediction layer, we append a CRF prediction layer on top of PhoBERT; (iii) for dependency parsing, predicted POS tags are produced by the learned single-task POS tagging model, and these POS tags are then represented by embeddings that are concatenated with the corresponding PhoBERT-based contextualized word embeddings, resulting in a sequence of input vectors for the Biaffine-based classifiers \cite{qi-etal-2018-universal}. Here, the single-task training approach is based on the PhoBERT\textsubscript{base} version, employing the same hyper-parameter tuning and model selection strategy that we use for PhoNLP.
+
+ \begin{table}[!t]
+ \centering
+ \def\arraystretch{1.2}
+ \resizebox{7.5cm}{!}{
+ \begin{tabular}{ll|l|l|l|l}
+ \hline
+ & \textbf{Model} & \textbf{POS} & \textbf{NER} & \textbf{LAS} & \textbf{UAS} \\
+ \hline
+ \multirow{2}{*}{\rotatebox[origin=c]{90}{{Leak.}}}& Single-task & 96.7$^\dagger$ & 93.69 & 78.77$^\dagger$ & 85.22$^\dagger$ \\
+ \cdashline{2-6}
+ & PhoNLP & \textbf{96.76} & \textbf{94.41} & \textbf{79.11} & \textbf{85.47}\\
+ \hline
+ \hline
+ \multirow{2}{*}{\rotatebox[origin=c]{90}{{Re-spl}}}& Single-task & 93.68 & 93.69 & 77.89 & 84.78 \\
+ \cdashline{2-6}
+ & PhoNLP & \textbf{93.88} & \textbf{94.51} & \textbf{78.17} & \textbf{84.95} \\
+ %& PhoNLP & \textbf{93.88}$^{*}$ & \textbf{94.51}$^{**}$ & \textbf{78.17}$^{*}$ & \textbf{84.95} \\
+ \hline
+ \end{tabular}
+ }
+ \caption{Performance results (in \%) on the test sets for POS tagging (i.e. accuracy), NER (i.e. F\textsubscript{1}-score) and dependency parsing (i.e. LAS and UAS scores). ``Leak.'' abbreviates ``leakage'', denoting the results obtained w.r.t. the data leakage issue. ``Re-spl'' denotes the results obtained w.r.t. the data re-split and duplication removal for POS tagging to avoid the data leakage issue. ``Single-task'' refers to the single-task training approach.
+ $\dagger$ denotes scores taken from the PhoBERT paper \cite{phobert}. Note that ``Single-task'' NER is not affected by the data leakage issue.
+ %Here, $^{*}$ and $^{**}$ denote the statistically significant differences between ``Single-task'' and PhoNLP at p $\leq$ 0.05 and p $\leq$ 0.01, respectively.
+ }
+ \label{tab:results}
+ \end{table}
+
+ Note that PhoBERT helps produce state-of-the-art results for multiple Vietnamese NLP tasks (including but not limited to POS tagging, NER and dependency parsing in a single-task training strategy), obtaining higher performance than VnCoreNLP. However, in both the PhoBERT and VnCoreNLP papers \cite{phobert,vu-etal-2018-vncorenlp}, the results for POS tagging and dependency parsing are reported w.r.t. the data leakage issue. Our ``Single-task'' results in Table \ref{tab:results} under ``Re-spl'' (i.e. the data re-split and duplication removal for POS tagging to avoid the data leakage issue) can thus be viewed as new PhoBERT results for a proper experimental setup. Table \ref{tab:results} shows that in both the ``Leak.'' and ``Re-spl'' setups, our joint multi-task training approach PhoNLP performs better than the PhoBERT-based single-task training approach, thus producing state-of-the-art performance for the three tasks of Vietnamese POS tagging, NER and dependency parsing.
+
560
+
561
+
562
+
563
+ \section{PhoNLP toolkit}
564
+
565
+ We present in this section a basic usage of our PhoNLP toolkit.
566
+ We make PhoNLP simple to setup, i.e. users can install PhoNLP from either source or $\mathsf{pip}$ (e.g. $\mathsf{pip3\ install\ phonlp}$). We also aim to make PhoNLP simple to run from both the command-line and the Python API. For example, annotating a corpus with POS tagging, NER and dependency parsing can be performed by using a simple command as in Figure \ref{fig:command}.
567
+
568
+ Assume that the input file ``{\ttfamily input.txt}'' in Figure \ref{fig:command} contains a sentence ``Tôi đang làm\_việc tại VinAI .'' (I\textsubscript{Tôi} am\textsubscript{đang} working\textsubscript{làm\_việc} at\textsubscript{tại} VinAI). Table \ref{tab:format} shows the annotated output in plain text form for this
569
+ sentence. Similarly, we also get the same output by using the Python API as simple as in Figure \ref{fig:code}.
570
+ Furthermore, commands to (re-)train and evaluate PhoNLP using gold annotated corpora are detailed in the PhoNLP GitHub repository. Note that it is absolutely possible to directly employ our PhoNLP (re-)training and evaluation command scripts for other languages that have gold annotated corpora available for the three tasks and a pre-trained BERT-based language model available from the $\mathrm{transformers}$ library.
571
+
572
+ %\setcounter{figure}{1}
573
+ \begin{figure}[!t]
574
+ %{\footnotesize\ttfamily python3 phonlp.py {-}{-}save\_dir model\_folder\_path {-}{-}mode annotate {-}{-}input\_file path\_to\_input\_file {-}{-}output\_file path\_to\_output\_file}
575
+ {\ttfamily python3 run\_phonlp.py {-}{-}save\_dir ./pretrained\_phonlp {-}{-}mode \\ annotate {-}{-}input\_file input.txt {-}{-}output\_file output.txt}
576
+ \caption{Minimal command to run PhoNLP. Here ``save\_dir'' denote the path to the local machine folder that stores the pre-trained PhoNLP model.}
577
+ \label{fig:command}
578
+ \end{figure}
579
+
580
+ \begin{table}[!t]
581
+ \centering
582
+ %\resizebox{7.5cm}{!}{
583
+ \begin{tabular}{llllll}
584
+ 1 & Tôi & P & O & 3 & sub \\
585
+ 2 & đang & R & O & 3 & adv \\
586
+ 3 & làm\_việc & V & O & 0 & root \\
587
+ 4 & tại & E & O & 3 & loc \\
588
+ 5 & VinAI & Np & B-ORG & 4 & pob \\
589
+ 6 & . & CH & O & 3 & punct \\
590
+ \end{tabular}
591
+ %}
592
+ \caption{The output in the output file ``{\ttfamily output.txt}'' for the sentence ``Tôi đang làm\_việc tại VinAI .'' from the input file ``{\ttfamily input.txt}'' in Figure \ref{fig:command}. The output is formatted with 6 columns representing word index, word form, POS tag, NER label, head index of the current word and its dependency relation type.}
593
+ \label{tab:format}
594
+ \end{table}
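The 6-column plain-text output shown above is easy to post-process. Below is a minimal parsing sketch; the function and field names are ours for illustration and are not part of the PhoNLP toolkit:

```python
# Minimal sketch (not part of PhoNLP): parse the toolkit's 6-column
# plain-text output -- word index, word form, POS tag, NER label,
# head index, and dependency relation -- into a list of dicts.

def parse_phonlp_output(text):
    tokens = []
    for line in text.strip().splitlines():
        idx, form, pos, ner, head, deprel = line.split()
        tokens.append({
            "index": int(idx),
            "form": form,
            "pos": pos,
            "ner": ner,
            "head": int(head),   # 0 marks the dependency root
            "deprel": deprel,
        })
    return tokens

example = """\
1 Tôi P O 3 sub
2 đang R O 3 adv
3 làm_việc V O 0 root
4 tại E O 3 loc
5 VinAI Np B-ORG 4 pob
6 . CH O 3 punct"""

tokens = parse_phonlp_output(example)
root = next(t for t in tokens if t["head"] == 0)
print(root["form"])  # prints "làm_việc"
```

Downstream code can then recover, e.g., the dependency root or the named-entity spans directly from the parsed records.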
595
+
596
+ \paragraph{Speed test:} We perform a CPU-only speed test using a personal computer with an Intel Core i5 8265U 1.6GHz CPU \& 8GB of memory. For a GPU-based speed test, we employ a machine with a single NVIDIA RTX 2080Ti GPU. When performing the three NLP tasks jointly, PhoNLP achieves a speed of {15 sentences per second} in the CPU-based test and {129 sentences per second} in the GPU-based test, with an average of 23 word tokens per sentence and a batch size of 8.
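As a back-of-the-envelope aid to interpreting these numbers, the reported sentence-level speeds can be converted into approximate token throughput; the arithmetic below simply combines the figures stated in this paragraph:

```python
# Back-of-the-envelope conversion of the reported PhoNLP speeds
# (sentences per second) into approximate token throughput, using
# the stated average of 23 word tokens per sentence (batch size 8).
avg_tokens_per_sent = 23
cpu_sents_per_sec = 15    # Intel Core i5 8265U CPU
gpu_sents_per_sec = 129   # NVIDIA RTX 2080Ti GPU

cpu_tokens_per_sec = cpu_sents_per_sec * avg_tokens_per_sent  # 345
gpu_tokens_per_sec = gpu_sents_per_sec * avg_tokens_per_sent  # 2967
speedup = gpu_sents_per_sec / cpu_sents_per_sec               # ~8.6x
print(cpu_tokens_per_sec, gpu_tokens_per_sec, round(speedup, 1))
```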
597
+
598
+ %\setcounter{figure}{2}
599
+ \begin{figure*}[!t]
600
+ %\begin{minted}{python}
601
+ %import phonlp
602
+ %# Automatically download the pre-trained PhoNLP model
603
+ %# and save it in a local machine folder
604
+ %phonlp.download(save_dir='./pretrained_phonlp')
605
+ %# Load the pre-trained PhoNLP model
606
+ %model = phonlp.load(save_dir='./pretrained_phonlp')
607
+ %# Annotate a corpus
608
+ %model.annotate(input_file='input.txt', output_file='output.txt')
609
+ %# Annotate a sentence
610
+ %model.print_out(model.annotate(text="Tôi đang làm_việc tại VinAI ."))
611
+ %\end{minted}
612
+ \begin{Verbatim}[commandchars=\\\{\}]
613
+ \PYG{k+kn}{import} \PYG{n+nn}{phonlp}
614
+ \PYG{c+c1}{\PYGZsh{} Automatically download the pre\PYGZhy{}trained PhoNLP model}
615
+ \PYG{c+c1}{\PYGZsh{} and save it in a local machine folder}
616
+ \PYG{n}{phonlp}\PYG{o}{.}\PYG{n}{download}\PYG{p}{(}\PYG{n}{save\PYGZus{}dir}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}./pretrained\PYGZus{}phonlp\PYGZsq{}}\PYG{p}{)}
617
+ \PYG{c+c1}{\PYGZsh{} Load the pre\PYGZhy{}trained PhoNLP model}
618
+ \PYG{n}{model} \PYG{o}{=} \PYG{n}{phonlp}\PYG{o}{.}\PYG{n}{load}\PYG{p}{(}\PYG{n}{save\PYGZus{}dir}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}./pretrained\PYGZus{}phonlp\PYGZsq{}}\PYG{p}{)}
619
+ \PYG{c+c1}{\PYGZsh{} Annotate a corpus}
620
+ \PYG{n}{model}\PYG{o}{.}\PYG{n}{annotate}\PYG{p}{(}\PYG{n}{input\PYGZus{}file}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}input.txt\PYGZsq{}}\PYG{p}{,} \PYG{n}{output\PYGZus{}file}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}output.txt\PYGZsq{}}\PYG{p}{)}
621
+ \PYG{c+c1}{\PYGZsh{} Annotate a sentence}
622
+ \PYG{n}{model}\PYG{o}{.}\PYG{n}{print\PYGZus{}out}\PYG{p}{(}\PYG{n}{model}\PYG{o}{.}\PYG{n}{annotate}\PYG{p}{(}\PYG{n}{text}\PYG{o}{=}\PYG{l+s+s2}{\PYGZdq{}Tôi đang làm\PYGZus{}việc tại VinAI .\PYGZdq{}}\PYG{p}{))}
623
+ \end{Verbatim}
624
+ %\vspace{-5pt}
625
+ \caption{A simple and complete code example for using PhoNLP in Python.}
626
+ \label{fig:code}
627
+ \end{figure*}
628
+
629
+ \section{Conclusion and future work}
630
+
631
+
632
+ We have presented the first multi-task learning model PhoNLP for joint POS tagging, NER and dependency parsing in Vietnamese. Experiments on Vietnamese benchmark datasets show that PhoNLP outperforms its strong fine-tuned PhoBERT-based single-task training baseline, producing state-of-the-art performance results. We publicly release PhoNLP as an easy-to-use open-source toolkit and hope that PhoNLP can facilitate future NLP research and applications. % such as in question answering and dialogue systems.
633
+ In future work, we will also apply PhoNLP to other languages.
634
+
635
+ %Although we specify PhoNLP for Vietnamese, the PhoNLP (re-)training and evaluation command scripts in fact can directly work for other languages that have gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing, and a pre-trained BERT-based language model available from the $\mathrm{transformers}$ library. In future work, we will apply our PhoNLP toolkit to those languages.
636
+
637
+
638
+ \bibliography{refs}
639
+ \bibliographystyle{acl_natbib}
640
+
641
+ \end{document}
references/2021.naacl.nguyen/source/acl_natbib.bst ADDED
@@ -0,0 +1,1979 @@
1
+ %%% Modification of BibTeX style file acl_natbib_nourl.bst
2
+ %%% ... by urlbst, version 0.7 (marked with "% urlbst")
3
+ %%% See <http://purl.org/nxg/dist/urlbst>
4
+ %%% Added webpage entry type, and url and lastchecked fields.
5
+ %%% Added eprint support.
6
+ %%% Added DOI support.
7
+ %%% Added PUBMED support.
8
+ %%% Added hyperref support.
9
+ %%% Original headers follow...
10
+
11
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
12
+ %
13
+ % BibTeX style file acl_natbib_nourl.bst
14
+ %
15
+ % intended as input to urlbst script
16
+ % $ ./urlbst --hyperref --inlinelinks acl_natbib_nourl.bst > acl_natbib.bst
17
+ %
18
+ % adapted from compling.bst
19
+ % in order to mimic the style files for ACL conferences prior to 2017
20
+ % by making the following three changes:
21
+ % - for @incollection, page numbers now follow volume title.
22
+ % - for @inproceedings, address now follows conference name.
23
+ % (address is intended as location of conference,
24
+ % not address of publisher.)
25
+ % - for papers with three authors, use et al. in citation
26
+ % Dan Gildea 2017/06/08
27
+ %
28
+ % - fixed a bug with format.chapter - error given if chapter is empty
29
+ % with inbook.
30
+ % Shay Cohen 2018/02/16
31
+ %
32
+ % - sort "van Noord" under "v" not "N"
33
+ % this is what previous ACL style files did and is pretty standard
34
+ % Dan Gildea 2019/04/12
35
+
36
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
37
+ %
38
+ % BibTeX style file compling.bst
39
+ %
40
+ % Intended for the journal Computational Linguistics (ACL/MIT Press)
41
+ % Created by Ron Artstein on 2005/08/22
42
+ % For use with <natbib.sty> for author-year citations.
43
+ %
44
+ % I created this file in order to allow submissions to the journal
45
+ % Computational Linguistics using the <natbib> package for author-year
46
+ % citations, which offers a lot more flexibility than <fullname>, CL's
47
+ % official citation package. This file adheres strictly to the official
48
+ % style guide available from the MIT Press:
49
+ %
50
+ % http://mitpress.mit.edu/journals/coli/compling_style.pdf
51
+ %
52
+ % This includes all the various quirks of the style guide, for example:
53
+ % - a chapter from a monograph (@inbook) has no page numbers.
54
+ % - an article from an edited volume (@incollection) has page numbers
55
+ % after the publisher and address.
56
+ % - an article from a proceedings volume (@inproceedings) has page
57
+ % numbers before the publisher and address.
58
+ %
59
+ % Where the style guide was inconsistent or not specific enough I
60
+ % looked at actual published articles and exercised my own judgment.
61
+ % I noticed two inconsistencies in the style guide:
62
+ %
63
+ % - The style guide gives one example of an article from an edited
64
+ % volume with the editor's name spelled out in full, and another
65
+ % with the editors' names abbreviated. I chose to accept the first
66
+ % one as correct, since the style guide generally shuns abbreviations,
67
+ % and editors' names are also spelled out in some recently published
68
+ % articles.
69
+ %
70
+ % - The style guide gives one example of a reference where the word
71
+ % "and" between two authors is preceded by a comma. This is most
72
+ % likely a typo, since in all other cases with just two authors or
73
+ % editors there is no comma before the word "and".
74
+ %
75
+ % One case where the style guide is not being specific is the placement
76
+ % of the edition number, for which no example is given. I chose to put
77
+ % it immediately after the title, which I (subjectively) find natural,
78
+ % and is also the place of the edition in a few recently published
79
+ % articles.
80
+ %
81
+ % This file correctly reproduces all of the examples in the official
82
+ % style guide, except for the two inconsistencies noted above. I even
83
+ % managed to get it to correctly format the proceedings example which
84
+ % has an organization, a publisher, and two addresses (the conference
85
+ % location and the publisher's address), though I cheated a bit by
86
+ % putting the conference location and month as part of the title field;
87
+ % I feel that in this case the conference location and month can be
88
+ % considered as part of the title, and that adding a location field
89
+ % is not justified. Note also that a location field is not standard,
90
+ % so entries made with this field would not port nicely to other styles.
91
+ % However, if authors feel that there's a need for a location field
92
+ % then tell me and I'll see what I can do.
93
+ %
94
+ % The file also produces to my satisfaction all the bibliographical
95
+ % entries in my recent (joint) submission to CL (this was the original
96
+ % motivation for creating the file). I also tested it by running it
97
+ % on a larger set of entries and eyeballing the results. There may of
98
+ % course still be errors, especially with combinations of fields that
99
+ % are not that common, or with cross-references (which I seldom use).
100
+ % If you find such errors please write to me.
101
+ %
102
+ % I hope people find this file useful. Please email me with comments
103
+ % and suggestions.
104
+ %
105
+ % Ron Artstein
106
+ % artstein [at] essex.ac.uk
107
+ % August 22, 2005.
108
+ %
109
+ % Some technical notes.
110
+ %
111
+ % This file is based on a file generated with the package <custom-bib>
112
+ % by Patrick W. Daly (see selected options below), which was then
113
+ % manually customized to conform with certain CL requirements which
114
+ % cannot be met by <custom-bib>. Departures from the generated file
115
+ % include:
116
+ %
117
+ % Function inbook: moved publisher and address to the end; moved
118
+ % edition after title; replaced function format.chapter.pages by
119
+ % new function format.chapter to output chapter without pages.
120
+ %
121
+ % Function inproceedings: moved publisher and address to the end;
122
+ % replaced function format.in.ed.booktitle by new function
123
+ % format.in.booktitle to output the proceedings title without
124
+ % the editor.
125
+ %
126
+ % Functions book, incollection, manual: moved edition after title.
127
+ %
128
+ % Function mastersthesis: formatted title as for articles (unlike
129
+ % phdthesis which is formatted as book) and added month.
130
+ %
131
+ % Function proceedings: added new.sentence between organization and
132
+ % publisher when both are present.
133
+ %
134
+ % Function format.lab.names: modified so that it gives all the
135
+ % authors' surnames for in-text citations for one, two and three
136
+ % authors and only uses "et. al" for works with four authors or more
137
+ % (thanks to Ken Shan for convincing me to go through the trouble of
138
+ % modifying this function rather than using unreliable hacks).
139
+ %
140
+ % Changes:
141
+ %
142
+ % 2006-10-27: Changed function reverse.pass so that the extra label is
143
+ % enclosed in parentheses when the year field ends in an uppercase or
144
+ % lowercase letter (change modeled after Uli Sauerland's modification
145
+ % of nals.bst). RA.
146
+ %
147
+ %
148
+ % The preamble of the generated file begins below:
149
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
150
+ %%
151
+ %% This is file `compling.bst',
152
+ %% generated with the docstrip utility.
153
+ %%
154
+ %% The original source files were:
155
+ %%
156
+ %% merlin.mbs (with options: `ay,nat,vonx,nm-revv1,jnrlst,keyxyr,blkyear,dt-beg,yr-per,note-yr,num-xser,pre-pub,xedn,nfss')
157
+ %% ----------------------------------------
158
+ %% *** Intended for the journal Computational Linguistics ***
159
+ %%
160
+ %% Copyright 1994-2002 Patrick W Daly
161
+ % ===============================================================
162
+ % IMPORTANT NOTICE:
163
+ % This bibliographic style (bst) file has been generated from one or
164
+ % more master bibliographic style (mbs) files, listed above.
165
+ %
166
+ % This generated file can be redistributed and/or modified under the terms
167
+ % of the LaTeX Project Public License Distributed from CTAN
168
+ % archives in directory macros/latex/base/lppl.txt; either
169
+ % version 1 of the License, or any later version.
170
+ % ===============================================================
171
+ % Name and version information of the main mbs file:
172
+ % \ProvidesFile{merlin.mbs}[2002/10/21 4.05 (PWD, AO, DPC)]
173
+ % For use with BibTeX version 0.99a or later
174
+ %-------------------------------------------------------------------
175
+ % This bibliography style file is intended for texts in ENGLISH
176
+ % This is an author-year citation style bibliography. As such, it is
177
+ % non-standard LaTeX, and requires a special package file to function properly.
178
+ % Such a package is natbib.sty by Patrick W. Daly
179
+ % The form of the \bibitem entries is
180
+ % \bibitem[Jones et al.(1990)]{key}...
181
+ % \bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}...
182
+ % The essential feature is that the label (the part in brackets) consists
183
+ % of the author names, as they should appear in the citation, with the year
184
+ % in parentheses following. There must be no space before the opening
185
+ % parenthesis!
186
+ % With natbib v5.3, a full list of authors may also follow the year.
187
+ % In natbib.sty, it is possible to define the type of enclosures that is
188
+ % really wanted (brackets or parentheses), but in either case, there must
189
+ % be parentheses in the label.
190
+ % The \cite command functions as follows:
191
+ % \citet{key} ==>> Jones et al. (1990)
192
+ % \citet*{key} ==>> Jones, Baker, and Smith (1990)
193
+ % \citep{key} ==>> (Jones et al., 1990)
194
+ % \citep*{key} ==>> (Jones, Baker, and Smith, 1990)
195
+ % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2)
196
+ % \citep[e.g.][]{key} ==>> (e.g. Jones et al., 1990)
197
+ % \citep[e.g.][p. 32]{key} ==>> (e.g. Jones et al., p. 32)
198
+ % \citeauthor{key} ==>> Jones et al.
199
+ % \citeauthor*{key} ==>> Jones, Baker, and Smith
200
+ % \citeyear{key} ==>> 1990
201
+ %---------------------------------------------------------------------
202
+
203
+ ENTRY
204
+ { address
205
+ author
206
+ booktitle
207
+ chapter
208
+ edition
209
+ editor
210
+ howpublished
211
+ institution
212
+ journal
213
+ key
214
+ month
215
+ note
216
+ number
217
+ organization
218
+ pages
219
+ publisher
220
+ school
221
+ series
222
+ title
223
+ type
224
+ volume
225
+ year
226
+ eprint % urlbst
227
+ doi % urlbst
228
+ pubmed % urlbst
229
+ url % urlbst
230
+ lastchecked % urlbst
231
+ }
232
+ {}
233
+ { label extra.label sort.label short.list }
234
+ INTEGERS { output.state before.all mid.sentence after.sentence after.block }
235
+ % urlbst...
236
+ % urlbst constants and state variables
237
+ STRINGS { urlintro
238
+ eprinturl eprintprefix doiprefix doiurl pubmedprefix pubmedurl
239
+ citedstring onlinestring linktextstring
240
+ openinlinelink closeinlinelink }
241
+ INTEGERS { hrefform inlinelinks makeinlinelink
242
+ addeprints adddoiresolver addpubmedresolver }
243
+ FUNCTION {init.urlbst.variables}
244
+ {
245
+ % The following constants may be adjusted by hand, if desired
246
+
247
+ % The first set allow you to enable or disable certain functionality.
248
+ #1 'addeprints := % 0=no eprints; 1=include eprints
249
+ #1 'adddoiresolver := % 0=no DOI resolver; 1=include it
250
+ #1 'addpubmedresolver := % 0=no PUBMED resolver; 1=include it
251
+ #2 'hrefform := % 0=no crossrefs; 1=hypertex xrefs; 2=hyperref refs
252
+ #1 'inlinelinks := % 0=URLs explicit; 1=URLs attached to titles
253
+
254
+ % String constants, which you _might_ want to tweak.
255
+ "URL: " 'urlintro := % prefix before URL; typically "Available from:" or "URL":
256
+ "online" 'onlinestring := % indication that resource is online; typically "online"
257
+ "cited " 'citedstring := % indicator of citation date; typically "cited "
258
+ "[link]" 'linktextstring := % dummy link text; typically "[link]"
259
+ "http://arxiv.org/abs/" 'eprinturl := % prefix to make URL from eprint ref
260
+ "arXiv:" 'eprintprefix := % text prefix printed before eprint ref; typically "arXiv:"
261
+ "https://doi.org/" 'doiurl := % prefix to make URL from DOI
262
+ "doi:" 'doiprefix := % text prefix printed before DOI ref; typically "doi:"
263
+ "http://www.ncbi.nlm.nih.gov/pubmed/" 'pubmedurl := % prefix to make URL from PUBMED
264
+ "PMID:" 'pubmedprefix := % text prefix printed before PUBMED ref; typically "PMID:"
265
+
266
+ % The following are internal state variables, not configuration constants,
267
+ % so they shouldn't be fiddled with.
268
+ #0 'makeinlinelink := % state variable managed by possibly.setup.inlinelink
269
+ "" 'openinlinelink := % ditto
270
+ "" 'closeinlinelink := % ditto
271
+ }
272
+ INTEGERS {
273
+ bracket.state
274
+ outside.brackets
275
+ open.brackets
276
+ within.brackets
277
+ close.brackets
278
+ }
279
+ % ...urlbst to here
280
+ FUNCTION {init.state.consts}
281
+ { #0 'outside.brackets := % urlbst...
282
+ #1 'open.brackets :=
283
+ #2 'within.brackets :=
284
+ #3 'close.brackets := % ...urlbst to here
285
+
286
+ #0 'before.all :=
287
+ #1 'mid.sentence :=
288
+ #2 'after.sentence :=
289
+ #3 'after.block :=
290
+ }
291
+ STRINGS { s t}
292
+ % urlbst
293
+ FUNCTION {output.nonnull.original}
294
+ { 's :=
295
+ output.state mid.sentence =
296
+ { ", " * write$ }
297
+ { output.state after.block =
298
+ { add.period$ write$
299
+ newline$
300
+ "\newblock " write$
301
+ }
302
+ { output.state before.all =
303
+ 'write$
304
+ { add.period$ " " * write$ }
305
+ if$
306
+ }
307
+ if$
308
+ mid.sentence 'output.state :=
309
+ }
310
+ if$
311
+ s
312
+ }
313
+
314
+ % urlbst...
315
+ % The following three functions are for handling inlinelink. They wrap
316
+ % a block of text which is potentially output with write$ by multiple
317
+ % other functions, so we don't know the content a priori.
318
+ % They communicate between each other using the variables makeinlinelink
319
+ % (which is true if a link should be made), and closeinlinelink (which holds
320
+ % the string which should close any current link. They can be called
321
+ % at any time, but start.inlinelink will be a no-op unless something has
322
+ % previously set makeinlinelink true, and the two ...end.inlinelink functions
323
+ % will only do their stuff if start.inlinelink has previously set
324
+ % closeinlinelink to be non-empty.
325
+ % (thanks to 'ijvm' for suggested code here)
326
+ FUNCTION {uand}
327
+ { 'skip$ { pop$ #0 } if$ } % 'and' (which isn't defined at this point in the file)
328
+ FUNCTION {possibly.setup.inlinelink}
329
+ { makeinlinelink hrefform #0 > uand
330
+ { doi empty$ adddoiresolver uand
331
+ { pubmed empty$ addpubmedresolver uand
332
+ { eprint empty$ addeprints uand
333
+ { url empty$
334
+ { "" }
335
+ { url }
336
+ if$ }
337
+ { eprinturl eprint * }
338
+ if$ }
339
+ { pubmedurl pubmed * }
340
+ if$ }
341
+ { doiurl doi * }
342
+ if$
343
+ % an appropriately-formatted URL is now on the stack
344
+ hrefform #1 = % hypertex
345
+ { "\special {html:<a href=" quote$ * swap$ * quote$ * "> }{" * 'openinlinelink :=
346
+ "\special {html:</a>}" 'closeinlinelink := }
347
+ { "\href {" swap$ * "} {" * 'openinlinelink := % hrefform=#2 -- hyperref
348
+ % the space between "} {" matters: a URL of just the right length can cause "\% newline em"
349
+ "}" 'closeinlinelink := }
350
+ if$
351
+ #0 'makeinlinelink :=
352
+ }
353
+ 'skip$
354
+ if$ % makeinlinelink
355
+ }
356
+ FUNCTION {add.inlinelink}
357
+ { openinlinelink empty$
358
+ 'skip$
359
+ { openinlinelink swap$ * closeinlinelink *
360
+ "" 'openinlinelink :=
361
+ }
362
+ if$
363
+ }
364
+ FUNCTION {output.nonnull}
365
+ { % Save the thing we've been asked to output
366
+ 's :=
367
+ % If the bracket-state is close.brackets, then add a close-bracket to
368
+ % what is currently at the top of the stack, and set bracket.state
369
+ % to outside.brackets
370
+ bracket.state close.brackets =
371
+ { "]" *
372
+ outside.brackets 'bracket.state :=
373
+ }
374
+ 'skip$
375
+ if$
376
+ bracket.state outside.brackets =
377
+ { % We're outside all brackets -- this is the normal situation.
378
+ % Write out what's currently at the top of the stack, using the
379
+ % original output.nonnull function.
380
+ s
381
+ add.inlinelink
382
+ output.nonnull.original % invoke the original output.nonnull
383
+ }
384
+ { % Still in brackets. Add open-bracket or (continuation) comma, add the
385
+ % new text (in s) to the top of the stack, and move to the close-brackets
386
+ % state, ready for next time (unless inbrackets resets it). If we come
387
+ % into this branch, then output.state is carefully undisturbed.
388
+ bracket.state open.brackets =
389
+ { " [" * }
390
+ { ", " * } % bracket.state will be within.brackets
391
+ if$
392
+ s *
393
+ close.brackets 'bracket.state :=
394
+ }
395
+ if$
396
+ }
397
+
398
+ % Call this function just before adding something which should be presented in
399
+ % brackets. bracket.state is handled specially within output.nonnull.
400
+ FUNCTION {inbrackets}
401
+ { bracket.state close.brackets =
402
+ { within.brackets 'bracket.state := } % reset the state: not open nor closed
403
+ { open.brackets 'bracket.state := }
404
+ if$
405
+ }
406
+
407
+ FUNCTION {format.lastchecked}
408
+ { lastchecked empty$
409
+ { "" }
410
+ { inbrackets citedstring lastchecked * }
411
+ if$
412
+ }
413
+ % ...urlbst to here
414
+ FUNCTION {output}
415
+ { duplicate$ empty$
416
+ 'pop$
417
+ 'output.nonnull
418
+ if$
419
+ }
420
+ FUNCTION {output.check}
421
+ { 't :=
422
+ duplicate$ empty$
423
+ { pop$ "empty " t * " in " * cite$ * warning$ }
424
+ 'output.nonnull
425
+ if$
426
+ }
427
+ FUNCTION {fin.entry.original} % urlbst (renamed from fin.entry, so it can be wrapped below)
428
+ { add.period$
429
+ write$
430
+ newline$
431
+ }
432
+
433
+ FUNCTION {new.block}
434
+ { output.state before.all =
435
+ 'skip$
436
+ { after.block 'output.state := }
437
+ if$
438
+ }
439
+ FUNCTION {new.sentence}
440
+ { output.state after.block =
441
+ 'skip$
442
+ { output.state before.all =
443
+ 'skip$
444
+ { after.sentence 'output.state := }
445
+ if$
446
+ }
447
+ if$
448
+ }
449
+ FUNCTION {add.blank}
450
+ { " " * before.all 'output.state :=
451
+ }
452
+
453
+ FUNCTION {date.block}
454
+ {
455
+ new.block
456
+ }
457
+
458
+ FUNCTION {not}
459
+ { { #0 }
460
+ { #1 }
461
+ if$
462
+ }
463
+ FUNCTION {and}
464
+ { 'skip$
465
+ { pop$ #0 }
466
+ if$
467
+ }
468
+ FUNCTION {or}
469
+ { { pop$ #1 }
470
+ 'skip$
471
+ if$
472
+ }
473
+ FUNCTION {new.block.checkb}
474
+ { empty$
475
+ swap$ empty$
476
+ and
477
+ 'skip$
478
+ 'new.block
479
+ if$
480
+ }
481
+ FUNCTION {field.or.null}
482
+ { duplicate$ empty$
483
+ { pop$ "" }
484
+ 'skip$
485
+ if$
486
+ }
487
+ FUNCTION {emphasize}
488
+ { duplicate$ empty$
489
+ { pop$ "" }
490
+ { "\emph{" swap$ * "}" * }
491
+ if$
492
+ }
493
+ FUNCTION {tie.or.space.prefix}
494
+ { duplicate$ text.length$ #3 <
495
+ { "~" }
496
+ { " " }
497
+ if$
498
+ swap$
499
+ }
500
+
501
+ FUNCTION {capitalize}
502
+ { "u" change.case$ "t" change.case$ }
503
+
504
+ FUNCTION {space.word}
505
+ { " " swap$ * " " * }
506
+ % Here are the language-specific definitions for explicit words.
507
+ % Each function has a name bbl.xxx where xxx is the English word.
508
+ % The language selected here is ENGLISH
509
+ FUNCTION {bbl.and}
510
+ { "and"}
511
+
512
+ FUNCTION {bbl.etal}
513
+ { "et~al." }
514
+
515
+ FUNCTION {bbl.editors}
516
+ { "editors" }
517
+
518
+ FUNCTION {bbl.editor}
519
+ { "editor" }
520
+
521
+ FUNCTION {bbl.edby}
522
+ { "edited by" }
523
+
524
+ FUNCTION {bbl.edition}
525
+ { "edition" }
526
+
527
+ FUNCTION {bbl.volume}
528
+ { "volume" }
529
+
530
+ FUNCTION {bbl.of}
531
+ { "of" }
532
+
533
+ FUNCTION {bbl.number}
534
+ { "number" }
535
+
536
+ FUNCTION {bbl.nr}
537
+ { "no." }
538
+
539
+ FUNCTION {bbl.in}
540
+ { "in" }
541
+
542
+ FUNCTION {bbl.pages}
543
+ { "pages" }
544
+
545
+ FUNCTION {bbl.page}
546
+ { "page" }
547
+
548
+ FUNCTION {bbl.chapter}
549
+ { "chapter" }
550
+
551
+ FUNCTION {bbl.techrep}
552
+ { "Technical Report" }
553
+
554
+ FUNCTION {bbl.mthesis}
555
+ { "Master's thesis" }
556
+
557
+ FUNCTION {bbl.phdthesis}
558
+ { "Ph.D. thesis" }
559
+
560
+ MACRO {jan} {"January"}
561
+
562
+ MACRO {feb} {"February"}
563
+
564
+ MACRO {mar} {"March"}
565
+
566
+ MACRO {apr} {"April"}
567
+
568
+ MACRO {may} {"May"}
569
+
570
+ MACRO {jun} {"June"}
571
+
572
+ MACRO {jul} {"July"}
573
+
574
+ MACRO {aug} {"August"}
575
+
576
+ MACRO {sep} {"September"}
577
+
578
+ MACRO {oct} {"October"}
579
+
580
+ MACRO {nov} {"November"}
581
+
582
+ MACRO {dec} {"December"}
583
+
584
+ MACRO {acmcs} {"ACM Computing Surveys"}
585
+
586
+ MACRO {acta} {"Acta Informatica"}
587
+
588
+ MACRO {cacm} {"Communications of the ACM"}
589
+
590
+ MACRO {ibmjrd} {"IBM Journal of Research and Development"}
591
+
592
+ MACRO {ibmsj} {"IBM Systems Journal"}
593
+
594
+ MACRO {ieeese} {"IEEE Transactions on Software Engineering"}
595
+
596
+ MACRO {ieeetc} {"IEEE Transactions on Computers"}
597
+
598
+ MACRO {ieeetcad}
599
+ {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"}
600
+
601
+ MACRO {ipl} {"Information Processing Letters"}
602
+
603
+ MACRO {jacm} {"Journal of the ACM"}
604
+
605
+ MACRO {jcss} {"Journal of Computer and System Sciences"}
606
+
607
+ MACRO {scp} {"Science of Computer Programming"}
608
+
609
+ MACRO {sicomp} {"SIAM Journal on Computing"}
+
+ MACRO {tocs} {"ACM Transactions on Computer Systems"}
+
+ MACRO {tods} {"ACM Transactions on Database Systems"}
+
+ MACRO {tog} {"ACM Transactions on Graphics"}
+
+ MACRO {toms} {"ACM Transactions on Mathematical Software"}
+
+ MACRO {toois} {"ACM Transactions on Office Information Systems"}
+
+ MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"}
+
+ MACRO {tcs} {"Theoretical Computer Science"}
+ FUNCTION {bibinfo.check}
+ { swap$
+ duplicate$ missing$
+ {
+ pop$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ pop$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {bibinfo.warn}
+ { swap$
+ duplicate$ missing$
+ {
+ swap$ "missing " swap$ * " in " * cite$ * warning$ pop$
+ ""
+ }
+ { duplicate$ empty$
+ {
+ swap$ "empty " swap$ * " in " * cite$ * warning$
+ }
+ { swap$
+ pop$
+ }
+ if$
+ }
+ if$
+ }
+ STRINGS { bibinfo}
+ INTEGERS { nameptr namesleft numnames }
+
+ FUNCTION {format.names}
+ { 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ duplicate$ #1 >
+ { "{ff~}{vv~}{ll}{, jj}" }
+ { "{ff~}{vv~}{ll}{, jj}" } % first name first for first author
+ % { "{vv~}{ll}{, ff}{, jj}" } % last name first for first author
+ if$
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.names.ed}
+ {
+ 'bibinfo :=
+ duplicate$ empty$ 'skip$ {
+ 's :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{ff~}{vv~}{ll}{, jj}"
+ format.name$
+ bibinfo bibinfo.check
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+
+ " " * bbl.etal *
+ }
+ {
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ } if$
+ }
+ FUNCTION {format.key}
+ { empty$
+ { key field.or.null }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.authors}
+ { author "author" format.names
+ }
+ FUNCTION {get.bbl.editor}
+ { editor num.names$ #1 > 'bbl.editors 'bbl.editor if$ }
+
+ FUNCTION {format.editors}
+ { editor "editor" format.names duplicate$ empty$ 'skip$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ *
+ }
+ if$
+ }
+ FUNCTION {format.note}
+ {
+ note empty$
+ { "" }
+ { note #1 #1 substring$
+ duplicate$ "{" =
+ 'skip$
+ { output.state mid.sentence =
+ { "l" }
+ { "u" }
+ if$
+ change.case$
+ }
+ if$
+ note #2 global.max$ substring$ * "note" bibinfo.check
+ }
+ if$
+ }
+
+ FUNCTION {format.title}
+ { title
+ duplicate$ empty$ 'skip$
+ { "t" change.case$ }
+ if$
+ "title" bibinfo.check
+ }
+ FUNCTION {format.full.names}
+ {'s :=
+ "" 't :=
+ #1 'nameptr :=
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{vv~}{ll}" format.name$
+ 't :=
+ nameptr #1 >
+ {
+ namesleft #1 >
+ { ", " * t * }
+ {
+ s nameptr "{ll}" format.name$ duplicate$ "others" =
+ { 't := }
+ { pop$ }
+ if$
+ t "others" =
+ {
+ " " * bbl.etal *
+ }
+ {
+ numnames #2 >
+ { "," * }
+ 'skip$
+ if$
+ bbl.and
+ space.word * t *
+ }
+ if$
+ }
+ if$
+ }
+ 't
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ }
+
+ FUNCTION {author.editor.key.full}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {author.key.full}
+ { author empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { author format.full.names }
+ if$
+ }
+
+ FUNCTION {editor.key.full}
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.full.names }
+ if$
+ }
+
+ FUNCTION {make.full.names}
+ { type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.key.full
+ { type$ "proceedings" =
+ 'editor.key.full
+ 'author.key.full
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {output.bibitem.original} % urlbst (renamed from output.bibitem, so it can be wrapped below)
+ { newline$
+ "\bibitem[{" write$
+ label write$
+ ")" make.full.names duplicate$ short.list =
+ { pop$ }
+ { * }
+ if$
+ "}]{" * write$
+ cite$ write$
+ "}" write$
+ newline$
+ ""
+ before.all 'output.state :=
+ }
+
+ FUNCTION {n.dashify}
+ {
+ 't :=
+ ""
+ { t empty$ not }
+ { t #1 #1 substring$ "-" =
+ { t #1 #2 substring$ "--" = not
+ { "--" *
+ t #2 global.max$ substring$ 't :=
+ }
+ { { t #1 #1 substring$ "-" = }
+ { "-" *
+ t #2 global.max$ substring$ 't :=
+ }
+ while$
+ }
+ if$
+ }
+ { t #1 #1 substring$ *
+ t #2 global.max$ substring$ 't :=
+ }
+ if$
+ }
+ while$
+ }
+
+ FUNCTION {word.in}
+ { bbl.in capitalize
+ " " * }
+
+ FUNCTION {format.date}
+ { year "year" bibinfo.check duplicate$ empty$
+ {
+ }
+ 'skip$
+ if$
+ extra.label *
+ before.all 'output.state :=
+ after.sentence 'output.state :=
+ }
+ FUNCTION {format.btitle}
+ { title "title" bibinfo.check
+ duplicate$ empty$ 'skip$
+ {
+ emphasize
+ }
+ if$
+ }
+ FUNCTION {either.or.check}
+ { empty$
+ 'pop$
+ { "can't use both " swap$ * " fields in " * cite$ * warning$ }
+ if$
+ }
+ FUNCTION {format.bvolume}
+ { volume empty$
+ { "" }
+ { bbl.volume volume tie.or.space.prefix
+ "volume" bibinfo.check * *
+ series "series" bibinfo.check
+ duplicate$ empty$ 'pop$
+ { swap$ bbl.of space.word * swap$
+ emphasize * }
+ if$
+ "volume and number" number either.or.check
+ }
+ if$
+ }
+ FUNCTION {format.number.series}
+ { volume empty$
+ { number empty$
+ { series field.or.null }
+ { series empty$
+ { number "number" bibinfo.check }
+ { output.state mid.sentence =
+ { bbl.number }
+ { bbl.number capitalize }
+ if$
+ number tie.or.space.prefix "number" bibinfo.check * *
+ bbl.in space.word *
+ series "series" bibinfo.check *
+ }
+ if$
+ }
+ if$
+ }
+ { "" }
+ if$
+ }
+
+ FUNCTION {format.edition}
+ { edition duplicate$ empty$ 'skip$
+ {
+ output.state mid.sentence =
+ { "l" }
+ { "t" }
+ if$ change.case$
+ "edition" bibinfo.check
+ " " * bbl.edition *
+ }
+ if$
+ }
+ INTEGERS { multiresult }
+ FUNCTION {multi.page.check}
+ { 't :=
+ #0 'multiresult :=
+ { multiresult not
+ t empty$ not
+ and
+ }
+ { t #1 #1 substring$
+ duplicate$ "-" =
+ swap$ duplicate$ "," =
+ swap$ "+" =
+ or or
+ { #1 'multiresult := }
+ { t #2 global.max$ substring$ 't := }
+ if$
+ }
+ while$
+ multiresult
+ }
+ FUNCTION {format.pages}
+ { pages duplicate$ empty$ 'skip$
+ { duplicate$ multi.page.check
+ {
+ bbl.pages swap$
+ n.dashify
+ }
+ {
+ bbl.page swap$
+ }
+ if$
+ tie.or.space.prefix
+ "pages" bibinfo.check
+ * *
+ }
+ if$
+ }
+ FUNCTION {format.journal.pages}
+ { pages duplicate$ empty$ 'pop$
+ { swap$ duplicate$ empty$
+ { pop$ pop$ format.pages }
+ {
+ ":" *
+ swap$
+ n.dashify
+ "pages" bibinfo.check
+ *
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.vol.num.pages}
+ { volume field.or.null
+ duplicate$ empty$ 'skip$
+ {
+ "volume" bibinfo.check
+ }
+ if$
+ number "number" bibinfo.check duplicate$ empty$ 'skip$
+ {
+ swap$ duplicate$ empty$
+ { "there's a number but no volume in " cite$ * warning$ }
+ 'skip$
+ if$
+ swap$
+ "(" swap$ * ")" *
+ }
+ if$ *
+ format.journal.pages
+ }
+
+ FUNCTION {format.chapter}
+ { chapter empty$
+ 'format.pages
+ { type empty$
+ { bbl.chapter }
+ { type "l" change.case$
+ "type" bibinfo.check
+ }
+ if$
+ chapter tie.or.space.prefix
+ "chapter" bibinfo.check
+ * *
+ }
+ if$
+ }
+
+ FUNCTION {format.chapter.pages}
+ { chapter empty$
+ 'format.pages
+ { type empty$
+ { bbl.chapter }
+ { type "l" change.case$
+ "type" bibinfo.check
+ }
+ if$
+ chapter tie.or.space.prefix
+ "chapter" bibinfo.check
+ * *
+ pages empty$
+ 'skip$
+ { ", " * format.pages * }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {format.booktitle}
+ {
+ booktitle "booktitle" bibinfo.check
+ emphasize
+ }
+ FUNCTION {format.in.booktitle}
+ { format.booktitle duplicate$ empty$ 'skip$
+ {
+ word.in swap$ *
+ }
+ if$
+ }
+ FUNCTION {format.in.ed.booktitle}
+ { format.booktitle duplicate$ empty$ 'skip$
+ {
+ editor "editor" format.names.ed duplicate$ empty$ 'pop$
+ {
+ "," *
+ " " *
+ get.bbl.editor
+ ", " *
+ * swap$
+ * }
+ if$
+ word.in swap$ *
+ }
+ if$
+ }
+ FUNCTION {format.thesis.type}
+ { type duplicate$ empty$
+ 'pop$
+ { swap$ pop$
+ "t" change.case$ "type" bibinfo.check
+ }
+ if$
+ }
+ FUNCTION {format.tr.number}
+ { number "number" bibinfo.check
+ type duplicate$ empty$
+ { pop$ bbl.techrep }
+ 'skip$
+ if$
+ "type" bibinfo.check
+ swap$ duplicate$ empty$
+ { pop$ "t" change.case$ }
+ { tie.or.space.prefix * * }
+ if$
+ }
+ FUNCTION {format.article.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.book.crossref}
+ { volume duplicate$ empty$
+ { "empty volume in " cite$ * "'s crossref of " * crossref * warning$
+ pop$ word.in
+ }
+ { bbl.volume
+ capitalize
+ swap$ tie.or.space.prefix "volume" bibinfo.check * * bbl.of space.word *
+ }
+ if$
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.incoll.inproc.crossref}
+ {
+ word.in
+ " \cite{" * crossref * "}" *
+ }
+ FUNCTION {format.org.or.pub}
+ { 't :=
+ ""
+ address empty$ t empty$ and
+ 'skip$
+ {
+ t empty$
+ { address "address" bibinfo.check *
+ }
+ { t *
+ address empty$
+ 'skip$
+ { ", " * address "address" bibinfo.check * }
+ if$
+ }
+ if$
+ }
+ if$
+ }
+ FUNCTION {format.publisher.address}
+ { publisher "publisher" bibinfo.warn format.org.or.pub
+ }
+
+ FUNCTION {format.organization.address}
+ { organization "organization" bibinfo.check format.org.or.pub
+ }
+
+ % urlbst...
+ % Functions for making hypertext links.
+ % In all cases, the stack has (link-text href-url)
+ %
+ % make 'null' specials
+ FUNCTION {make.href.null}
+ {
+ pop$
+ }
+ % make hypertex specials
+ FUNCTION {make.href.hypertex}
+ {
+ "\special {html:<a href=" quote$ *
+ swap$ * quote$ * "> }" * swap$ *
+ "\special {html:</a>}" *
+ }
+ % make hyperref specials
+ FUNCTION {make.href.hyperref}
+ {
+ "\href {" swap$ * "} {\path{" * swap$ * "}}" *
+ }
+ FUNCTION {make.href}
+ { hrefform #2 =
+ 'make.href.hyperref % hrefform = 2
+ { hrefform #1 =
+ 'make.href.hypertex % hrefform = 1
+ 'make.href.null % hrefform = 0 (or anything else)
+ if$
+ }
+ if$
+ }
+
+ % If inlinelinks is true, then format.url should be a no-op, since it's
+ % (a) redundant, and (b) could end up as a link-within-a-link.
+ FUNCTION {format.url}
+ { inlinelinks #1 = url empty$ or
+ { "" }
+ { hrefform #1 =
+ { % special case -- add HyperTeX specials
+ urlintro "\url{" url * "}" * url make.href.hypertex * }
+ { urlintro "\url{" * url * "}" * }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {format.eprint}
+ { eprint empty$
+ { "" }
+ { eprintprefix eprint * eprinturl eprint * make.href }
+ if$
+ }
+
+ FUNCTION {format.doi}
+ { doi empty$
+ { "" }
+ { doiprefix doi * doiurl doi * make.href }
+ if$
+ }
+
+ FUNCTION {format.pubmed}
+ { pubmed empty$
+ { "" }
+ { pubmedprefix pubmed * pubmedurl pubmed * make.href }
+ if$
+ }
+
+ % Output a URL. We can't use the more normal idiom (something like
+ % `format.url output'), because the `inbrackets' within
+ % format.lastchecked applies to everything between calls to `output',
+ % so that `format.url format.lastchecked * output' ends up with both
+ % the URL and the lastchecked in brackets.
+ FUNCTION {output.url}
+ { url empty$
+ 'skip$
+ { new.block
+ format.url output
+ format.lastchecked output
+ }
+ if$
+ }
+
+ FUNCTION {output.web.refs}
+ {
+ new.block
+ inlinelinks
+ 'skip$ % links were inline -- don't repeat them
+ {
+ output.url
+ addeprints eprint empty$ not and
+ { format.eprint output.nonnull }
+ 'skip$
+ if$
+ adddoiresolver doi empty$ not and
+ { format.doi output.nonnull }
+ 'skip$
+ if$
+ addpubmedresolver pubmed empty$ not and
+ { format.pubmed output.nonnull }
+ 'skip$
+ if$
+ }
+ if$
+ }
+
+ % Wrapper for output.bibitem.original.
+ % If the URL field is not empty, set makeinlinelink to be true,
+ % so that an inline link will be started at the next opportunity
+ FUNCTION {output.bibitem}
+ { outside.brackets 'bracket.state :=
+ output.bibitem.original
+ inlinelinks url empty$ not doi empty$ not or pubmed empty$ not or eprint empty$ not or and
+ { #1 'makeinlinelink := }
+ { #0 'makeinlinelink := }
+ if$
+ }
+
+ % Wrapper for fin.entry.original
+ FUNCTION {fin.entry}
+ { output.web.refs % urlbst
+ makeinlinelink % ooops, it appears we didn't have a title for inlinelink
+ { possibly.setup.inlinelink % add some artificial link text here, as a fallback
+ linktextstring output.nonnull }
+ 'skip$
+ if$
+ bracket.state close.brackets = % urlbst
+ { "]" * }
+ 'skip$
+ if$
+ fin.entry.original
+ }
+
+ % Webpage entry type.
+ % Title and url fields required;
+ % author, note, year, month, and lastchecked fields optional
+ % See references
+ % ISO 690-2 http://www.nlc-bnc.ca/iso/tc46sc9/standard/690-2e.htm
+ % http://www.classroom.net/classroom/CitingNetResources.html
+ % http://neal.ctstateu.edu/history/cite.html
+ % http://www.cas.usf.edu/english/walker/mla.html
+ % for citation formats for web pages.
+ FUNCTION {webpage}
+ { output.bibitem
+ author empty$
+ { editor empty$
+ 'skip$ % author and editor both optional
+ { format.editors output.nonnull }
+ if$
+ }
+ { editor empty$
+ { format.authors output.nonnull }
+ { "can't use both author and editor fields in " cite$ * warning$ }
+ if$
+ }
+ if$
+ new.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$
+ format.title "title" output.check
+ inbrackets onlinestring output
+ new.block
+ year empty$
+ 'skip$
+ { format.date "year" output.check }
+ if$
+ % We don't need to output the URL details ('lastchecked' and 'url'),
+ % because fin.entry does that for us, using output.web.refs. The only
+ % reason we would want to put them here is if we were to decide that
+ % they should go in front of the rather miscellaneous information in 'note'.
+ new.block
+ note output
+ fin.entry
+ }
+ % ...urlbst to here
+
+
+ FUNCTION {article}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ {
+ journal
+ "journal" bibinfo.check
+ emphasize
+ "journal" output.check
+ possibly.setup.inlinelink format.vol.num.pages output% urlbst
+ }
+ { format.article.crossref output.nonnull
+ format.pages output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {book}
+ { output.bibitem
+ author empty$
+ { format.editors "author and editor" output.check
+ editor format.key output
+ }
+ { format.authors output.nonnull
+ crossref missing$
+ { "author and editor" editor either.or.check }
+ 'skip$
+ if$
+ }
+ if$
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.edition output
+ crossref missing$
+ { format.bvolume output
+ new.block
+ format.number.series output
+ new.sentence
+ format.publisher.address output
+ }
+ {
+ new.block
+ format.book.crossref output.nonnull
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {booklet}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ howpublished "howpublished" bibinfo.check output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {inbook}
+ { output.bibitem
+ author empty$
+ { format.editors "author and editor" output.check
+ editor format.key output
+ }
+ { format.authors output.nonnull
+ crossref missing$
+ { "author and editor" editor either.or.check }
+ 'skip$
+ if$
+ }
+ if$
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.edition output
+ crossref missing$
+ {
+ format.bvolume output
+ format.number.series output
+ format.chapter "chapter" output.check
+ new.sentence
+ format.publisher.address output
+ new.block
+ }
+ {
+ format.chapter "chapter" output.check
+ new.block
+ format.book.crossref output.nonnull
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {incollection}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ { format.in.ed.booktitle "booktitle" output.check
+ format.edition output
+ format.bvolume output
+ format.number.series output
+ format.chapter.pages output
+ new.sentence
+ format.publisher.address output
+ }
+ { format.incoll.inproc.crossref output.nonnull
+ format.chapter.pages output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {inproceedings}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ crossref missing$
+ { format.in.booktitle "booktitle" output.check
+ format.bvolume output
+ format.number.series output
+ format.pages output
+ address "address" bibinfo.check output
+ new.sentence
+ organization "organization" bibinfo.check output
+ publisher "publisher" bibinfo.check output
+ }
+ { format.incoll.inproc.crossref output.nonnull
+ format.pages output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {conference} { inproceedings }
+ FUNCTION {manual}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.edition output
+ organization address new.block.checkb
+ organization "organization" bibinfo.check output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {mastersthesis}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title
+ "title" output.check
+ new.block
+ bbl.mthesis format.thesis.type output.nonnull
+ school "school" bibinfo.warn output
+ address "address" bibinfo.check output
+ month "month" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {misc}
+ { output.bibitem
+ format.authors output
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title output
+ new.block
+ howpublished "howpublished" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+ FUNCTION {phdthesis}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle
+ "title" output.check
+ new.block
+ bbl.phdthesis format.thesis.type output.nonnull
+ school "school" bibinfo.warn output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {proceedings}
+ { output.bibitem
+ format.editors output
+ editor format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.btitle "title" output.check
+ format.bvolume output
+ format.number.series output
+ new.sentence
+ publisher empty$
+ { format.organization.address output }
+ { organization "organization" bibinfo.check output
+ new.sentence
+ format.publisher.address output
+ }
+ if$
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {techreport}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title
+ "title" output.check
+ new.block
+ format.tr.number output.nonnull
+ institution "institution" bibinfo.warn output
+ address "address" bibinfo.check output
+ new.block
+ format.note output
+ fin.entry
+ }
+
+ FUNCTION {unpublished}
+ { output.bibitem
+ format.authors "author" output.check
+ author format.key output
+ format.date "year" output.check
+ date.block
+ title empty$ 'skip$ 'possibly.setup.inlinelink if$ % urlbst
+ format.title "title" output.check
+ new.block
+ format.note "note" output.check
+ fin.entry
+ }
+
+ FUNCTION {default.type} { misc }
+ READ
+ FUNCTION {sortify}
+ { purify$
+ "l" change.case$
+ }
+ INTEGERS { len }
+ FUNCTION {chop.word}
+ { 's :=
+ 'len :=
+ s #1 len substring$ =
+ { s len #1 + global.max$ substring$ }
+ 's
+ if$
+ }
+ FUNCTION {format.lab.names}
+ { 's :=
+ "" 't :=
+ s #1 "{vv~}{ll}" format.name$
+ s num.names$ duplicate$
+ #2 >
+ { pop$
+ " " * bbl.etal *
+ }
+ { #2 <
+ 'skip$
+ { s #2 "{ff }{vv }{ll}{ jj}" format.name$ "others" =
+ {
+ " " * bbl.etal *
+ }
+ { bbl.and space.word * s #2 "{vv~}{ll}" format.name$
+ * }
+ if$
+ }
+ if$
+ }
+ if$
+ }
+
+ FUNCTION {author.key.label}
+ { author empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { author format.lab.names }
+ if$
+ }
+
+ FUNCTION {author.editor.key.label}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.lab.names }
+ if$
+ }
+ { author format.lab.names }
+ if$
+ }
+
+ FUNCTION {editor.key.label}
+ { editor empty$
+ { key empty$
+ { cite$ #1 #3 substring$ }
+ 'key
+ if$
+ }
+ { editor format.lab.names }
+ if$
+ }
+
+ FUNCTION {calc.short.authors}
+ { type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.key.label
+ { type$ "proceedings" =
+ 'editor.key.label
+ 'author.key.label
+ if$
+ }
+ if$
+ 'short.list :=
+ }
+
+ FUNCTION {calc.label}
+ { calc.short.authors
+ short.list
+ "("
+ *
+ year duplicate$ empty$
+ short.list key field.or.null = or
+ { pop$ "" }
+ 'skip$
+ if$
+ *
+ 'label :=
+ }
+
+ FUNCTION {sort.format.names}
+ { 's :=
+ #1 'nameptr :=
+ ""
+ s num.names$ 'numnames :=
+ numnames 'namesleft :=
+ { namesleft #0 > }
+ { s nameptr
+ "{vv{ } }{ll{ }}{ ff{ }}{ jj{ }}"
+ format.name$ 't :=
+ nameptr #1 >
+ {
+ " " *
+ namesleft #1 = t "others" = and
+ { "zzzzz" * }
+ { t sortify * }
+ if$
+ }
+ { t sortify * }
+ if$
+ nameptr #1 + 'nameptr :=
+ namesleft #1 - 'namesleft :=
+ }
+ while$
+ }
+
+ FUNCTION {sort.format.title}
+ { 't :=
+ "A " #2
+ "An " #3
+ "The " #4 t chop.word
+ chop.word
+ chop.word
+ sortify
+ #1 global.max$ substring$
+ }
+ FUNCTION {author.sort}
+ { author empty$
+ { key empty$
+ { "to sort, need author or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { author sort.format.names }
+ if$
+ }
+ FUNCTION {author.editor.sort}
+ { author empty$
+ { editor empty$
+ { key empty$
+ { "to sort, need author, editor, or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { editor sort.format.names }
+ if$
+ }
+ { author sort.format.names }
+ if$
+ }
+ FUNCTION {editor.sort}
+ { editor empty$
+ { key empty$
+ { "to sort, need editor or key in " cite$ * warning$
+ ""
+ }
+ { key sortify }
+ if$
+ }
+ { editor sort.format.names }
+ if$
+ }
+ FUNCTION {presort}
+ { calc.label
+ label sortify
+ " "
+ *
+ type$ "book" =
+ type$ "inbook" =
+ or
+ 'author.editor.sort
+ { type$ "proceedings" =
+ 'editor.sort
+ 'author.sort
+ if$
+ }
+ if$
+ #1 entry.max$ substring$
+ 'sort.label :=
+ sort.label
+ *
+ " "
+ *
+ title field.or.null
+ sort.format.title
+ *
+ #1 entry.max$ substring$
+ 'sort.key$ :=
+ }
+
+ ITERATE {presort}
+ SORT
+ STRINGS { last.label next.extra }
+ INTEGERS { last.extra.num number.label }
+ FUNCTION {initialize.extra.label.stuff}
+ { #0 int.to.chr$ 'last.label :=
+ "" 'next.extra :=
+ #0 'last.extra.num :=
+ #0 'number.label :=
+ }
+ FUNCTION {forward.pass}
+ { last.label label =
+ { last.extra.num #1 + 'last.extra.num :=
+ last.extra.num int.to.chr$ 'extra.label :=
+ }
+ { "a" chr.to.int$ 'last.extra.num :=
+ "" 'extra.label :=
+ label 'last.label :=
+ }
+ if$
+ number.label #1 + 'number.label :=
+ }
+ FUNCTION {reverse.pass}
+ { next.extra "b" =
+ { "a" 'extra.label := }
+ 'skip$
+ if$
+ extra.label 'next.extra :=
+ extra.label
+ duplicate$ empty$
+ 'skip$
+ { year field.or.null #-1 #1 substring$ chr.to.int$ #65 <
+ { "{\natexlab{" swap$ * "}}" * }
+ { "{(\natexlab{" swap$ * "})}" * }
+ if$ }
+ if$
+ 'extra.label :=
+ label extra.label * 'label :=
+ }
+ EXECUTE {initialize.extra.label.stuff}
+ ITERATE {forward.pass}
+ REVERSE {reverse.pass}
+ FUNCTION {bib.sort.order}
+ { sort.label
+ " "
+ *
+ year field.or.null sortify
+ *
+ " "
+ *
+ title field.or.null
+ sort.format.title
+ *
+ #1 entry.max$ substring$
+ 'sort.key$ :=
+ }
+ ITERATE {bib.sort.order}
+ SORT
+ FUNCTION {begin.bib}
+ { preamble$ empty$
+ 'skip$
+ { preamble$ write$ newline$ }
+ if$
+ "\begin{thebibliography}{" number.label int.to.str$ * "}" *
+ write$ newline$
+ "\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi"
+ write$ newline$
+ }
+ EXECUTE {begin.bib}
+ EXECUTE {init.urlbst.variables} % urlbst
+ EXECUTE {init.state.consts}
+ ITERATE {call.type$}
+ FUNCTION {end.bib}
+ { newline$
+ "\end{thebibliography}" write$ newline$
+ }
+ EXECUTE {end.bib}
+ %% End of customized bst file
+ %%
+ %% End of file `compling.bst'.
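As a reading aid (not part of the committed file): the urlbst-extended style above adds url/doi/eprint handling and a custom `webpage` entry type, which per its own comments requires `title` and `url` and optionally takes `author`, `note`, `year`, `month`, and `lastchecked`. A minimal sketch of a `.bib` entry that would exercise it — the entry key and field values below are hypothetical:

```bibtex
% Hypothetical entry; with this .bst, the URL is emitted via \url{...}
% or as an inline \href link, depending on hrefform/inlinelinks.
@webpage{example-toolkit-web,
  title       = {Example NLP Toolkit Documentation},
  url         = {https://example.org/docs},
  year        = {2021},
  lastchecked = {2021-06-01},
}
```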
references/2021.naacl.nguyen/source/minted.sty ADDED
@@ -0,0 +1,1212 @@
+ %%
+ %% This is file `minted.sty',
+ %% generated with the docstrip utility.
+ %%
+ %% The original source files were:
+ %%
+ %% minted.dtx (with options: `package')
+ %% Copyright 2013--2017 Geoffrey M. Poore
+ %% Copyright 2010--2011 Konrad Rudolph
+ %%
+ %% This work may be distributed and/or modified under the
+ %% conditions of the LaTeX Project Public License, either version 1.3
+ %% of this license or (at your option) any later version.
+ %% The latest version of this license is in
+ %% http://www.latex-project.org/lppl.txt
+ %% and version 1.3 or later is part of all distributions of LaTeX
+ %% version 2005/12/01 or later.
+ %%
+ %% Additionally, the project may be distributed under the terms of the new BSD
+ %% license.
+ %%
+ %% This work has the LPPL maintenance status `maintained'.
+ %%
+ %% The Current Maintainer of this work is Geoffrey Poore.
+ %%
+ %% This work consists of the files minted.dtx and minted.ins
+ %% and the derived file minted.sty.
+ \NeedsTeXFormat{LaTeX2e}
+ \ProvidesPackage{minted}
+ [2017/09/03 v2.5.1dev Yet another Pygments shim for LaTeX]
+ \RequirePackage{keyval}
+ \RequirePackage{kvoptions}
+ \RequirePackage{fvextra}
+ \RequirePackage{ifthen}
+ \RequirePackage{calc}
+ \IfFileExists{shellesc.sty}
+ {\RequirePackage{shellesc}
+ \@ifpackagelater{shellesc}{2016/04/29}
+ {}
+ {\protected\def\ShellEscape{\immediate\write18 }}}
+ {\protected\def\ShellEscape{\immediate\write18 }}
+ \RequirePackage{ifplatform}
+ \RequirePackage{pdftexcmds}
+ \RequirePackage{etoolbox}
+ \RequirePackage{xstring}
+ \RequirePackage{lineno}
+ \RequirePackage{framed}
+ \AtEndPreamble{%
+ \@ifpackageloaded{color}{}{%
+ \@ifpackageloaded{xcolor}{}{\RequirePackage{xcolor}}}%
+ }
+ \DeclareVoidOption{chapter}{\def\minted@float@within{chapter}}
+ \DeclareVoidOption{section}{\def\minted@float@within{section}}
+ \DeclareBoolOption{newfloat}
+ \DeclareBoolOption[true]{cache}
+ \StrSubstitute{\jobname}{ }{_}[\minted@jobname]
+ \StrSubstitute{\minted@jobname}{*}{_}[\minted@jobname]
+ \StrSubstitute{\minted@jobname}{"}{}[\minted@jobname]
+ \StrSubstitute{\minted@jobname}{'}{_}[\minted@jobname]
+ \newcommand{\minted@cachedir}{\detokenize{_}minted-\minted@jobname}
+ \let\minted@cachedir@windows\minted@cachedir
+ \define@key{minted}{cachedir}{%
+ \@namedef{minted@cachedir}{#1}%
+ \StrSubstitute{\minted@cachedir}{/}{\@backslashchar}[\minted@cachedir@windows]}
+ \DeclareBoolOption{finalizecache}
+ \DeclareBoolOption{frozencache}
+ \let\minted@outputdir\@empty
+ \let\minted@outputdir@windows\@empty
+ \define@key{minted}{outputdir}{%
+ \@namedef{minted@outputdir}{#1/}%
+ \StrSubstitute{\minted@outputdir}{/}%
+ {\@backslashchar}[\minted@outputdir@windows]}
+ \DeclareBoolOption{kpsewhich}
+ \DeclareBoolOption{langlinenos}
+ \DeclareBoolOption{draft}
+ \DeclareComplementaryOption{final}{draft}
+ \ProcessKeyvalOptions*
+ \ifthenelse{\boolean{minted@newfloat}}{\RequirePackage{newfloat}}{\RequirePackage{float}}
+ \ifcsname tikzifexternalizing\endcsname
+ \tikzifexternalizing{\minted@drafttrue\minted@cachefalse}{}
+ \else
+ \ifcsname tikzexternalrealjob\endcsname
+ \minted@drafttrue
+ \minted@cachefalse
+ \else
+ \fi
+ \fi
+ \ifthenelse{\boolean{minted@finalizecache}}%
+ {\ifthenelse{\boolean{minted@frozencache}}%
+ {\PackageError{minted}%
+ {Options "finalizecache" and "frozencache" are not compatible}%
+ {Options "finalizecache" and "frozencache" are not compatible}}%
+ {}}%
+ {}
+ \ifthenelse{\boolean{minted@cache}}%
+ {\ifthenelse{\boolean{minted@frozencache}}%
+ {}%
+ {\AtEndOfPackage{\ProvideDirectory{\minted@outputdir\minted@cachedir}}}}%
+ {}
+ \newcommand{\minted@input}[1]{%
+ \IfFileExists{#1}%
+ {\input{#1}}%
+ {\PackageError{minted}{Missing Pygments output; \string\inputminted\space
+ was^^Jprobably given a file that does not exist--otherwise, you may need
+ ^^Jthe outputdir package option, or may be using an incompatible build
+ tool,^^Jor may be using frozencache with a missing file}%
+ {This could be caused by using -output-directory or -aux-directory
+ ^^Jwithout setting minted's outputdir, or by using a build tool that
+ ^^Jchanges paths in ways minted cannot detect,
+ ^^Jor using frozencache with a missing file.}}%
+ }
+ \newcommand{\minted@infile}{\minted@jobname.out.pyg}
+ \newcommand{\minted@cachelist}{}
+ \newcommand{\minted@addcachefile}[1]{%
+ \expandafter\long\expandafter\gdef\expandafter\minted@cachelist\expandafter{%
+ \minted@cachelist,^^J%
+ \space\space#1}%
+ \expandafter\gdef\csname minted@cached@#1\endcsname{}%
+ }
+ \newcommand{\minted@savecachelist}{%
+ \ifdefempty{\minted@cachelist}{}{%
+ \immediate\write\@mainaux{%
+ \string\gdef\string\minted@oldcachelist\string{%
+ \minted@cachelist\string}}%
+ }%
+ }
+ \newcommand{\minted@cleancache}{%
+ \ifcsname minted@oldcachelist\endcsname
+ \def\do##1{%
+ \ifthenelse{\equal{##1}{}}{}{%
+ \ifcsname minted@cached@##1\endcsname\else
+ \DeleteFile[\minted@outputdir\minted@cachedir]{##1}%
+ \fi
+ }%
+ }%
+ \expandafter\docsvlist\expandafter{\minted@oldcachelist}%
+ \else
+ \fi
+ }
+ \ifthenelse{\boolean{minted@draft}}%
+ {\AtEndDocument{%
+ \ifcsname minted@oldcachelist\endcsname
+ \StrSubstitute{\minted@oldcachelist}{,}{,^^J }[\minted@cachelist]
+ \minted@savecachelist
+ \fi}}%
+ {\ifthenelse{\boolean{minted@frozencache}}%
+ {\AtEndDocument{%
+ \ifcsname minted@oldcachelist\endcsname
+ \StrSubstitute{\minted@oldcachelist}{,}{,^^J }[\minted@cachelist]
+ \minted@savecachelist
+ \fi}}%
+ {\AtEndDocument{%
+ \minted@savecachelist
+ \minted@cleancache}}}%
+ \ifwindows
+ \providecommand{\DeleteFile}[2][]{%
+ \ifthenelse{\equal{#1}{}}%
+ {\IfFileExists{#2}{\ShellEscape{del #2}}{}}%
+ {\IfFileExists{#1/#2}{%
+ \StrSubstitute{#1}{/}{\@backslashchar}[\minted@windir]
+ \ShellEscape{del \minted@windir\@backslashchar #2}}{}}}
+ \else
+ \providecommand{\DeleteFile}[2][]{%
+ \ifthenelse{\equal{#1}{}}%
+ {\IfFileExists{#2}{\ShellEscape{rm #2}}{}}%
+ {\IfFileExists{#1/#2}{\ShellEscape{rm #1/#2}}{}}}
+ \fi
+ \ifwindows
+ \newcommand{\ProvideDirectory}[1]{%
+ \StrSubstitute{#1}{/}{\@backslashchar}[\minted@windir]
+ \ShellEscape{if not exist \minted@windir\space mkdir \minted@windir}}
+ \else
+ \newcommand{\ProvideDirectory}[1]{%
+ \ShellEscape{mkdir -p #1}}
+ \fi
+ \newboolean{AppExists}
+ \newread\minted@appexistsfile
+ \newcommand{\TestAppExists}[1]{
+ \ifwindows
+ \DeleteFile{\minted@jobname.aex}
+ \ShellEscape{for \string^\@percentchar i in (#1.exe #1.bat #1.cmd)
+ do set > \minted@jobname.aex <nul: /p
+ x=\string^\@percentchar \string~$PATH:i>> \minted@jobname.aex}
+ %$ <- balance syntax highlighting
+ \immediate\openin\minted@appexistsfile\minted@jobname.aex
+ \expandafter\def\expandafter\@tmp@cr\expandafter{\the\endlinechar}
+ \endlinechar=-1\relax
+ \readline\minted@appexistsfile to \minted@apppathifexists
+ \endlinechar=\@tmp@cr
+ \ifthenelse{\equal{\minted@apppathifexists}{}}
+ {\AppExistsfalse}
+ {\AppExiststrue}
+ \immediate\closein\minted@appexistsfile
+ \DeleteFile{\minted@jobname.aex}
+ \else
+ \ShellEscape{which #1 && touch \minted@jobname.aex}
+ \IfFileExists{\minted@jobname.aex}
+ {\AppExiststrue
+ \DeleteFile{\minted@jobname.aex}}
+ {\AppExistsfalse}
+ \fi
+ }
+ \newcommand{\minted@optlistcl@g}{}
+ \newcommand{\minted@optlistcl@g@i}{}
+ \let\minted@lang\@empty
+ \newcommand{\minted@optlistcl@lang}{}
+ \newcommand{\minted@optlistcl@lang@i}{}
+ \newcommand{\minted@optlistcl@cmd}{}
+ \newcommand{\minted@optlistfv@g}{}
+ \newcommand{\minted@optlistfv@g@i}{}
+ \newcommand{\minted@optlistfv@lang}{}
+ \newcommand{\minted@optlistfv@lang@i}{}
+ \newcommand{\minted@optlistfv@cmd}{}
+ \newcommand{\minted@configlang}[1]{%
+ \def\minted@lang{#1}%
+ \ifcsname minted@optlistcl@lang\minted@lang\endcsname\else
+ \expandafter\gdef\csname minted@optlistcl@lang\minted@lang\endcsname{}%
+ \fi
+ \ifcsname minted@optlistcl@lang\minted@lang @i\endcsname\else
+ \expandafter\gdef\csname minted@optlistcl@lang\minted@lang @i\endcsname{}%
+ \fi
+ \ifcsname minted@optlistfv@lang\minted@lang\endcsname\else
+ \expandafter\gdef\csname minted@optlistfv@lang\minted@lang\endcsname{}%
+ \fi
+ \ifcsname minted@optlistfv@lang\minted@lang @i\endcsname\else
+ \expandafter\gdef\csname minted@optlistfv@lang\minted@lang @i\endcsname{}%
+ \fi
+ }
+ \newcommand{\minted@addto@optlistcl}[2]{%
+ \expandafter\def\expandafter#1\expandafter{#1%
+ \detokenize{#2}\space}}
+ \newcommand{\minted@addto@optlistcl@lang}[2]{%
+ \expandafter\let\expandafter\minted@tmp\csname #1\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{\minted@tmp%
+ \detokenize{#2}\space}%
+ \expandafter\let\csname #1\endcsname\minted@tmp}
+ \newcommand{\minted@def@optcl}[4][]{%
+ \ifthenelse{\equal{#1}{}}%
+ {\define@key{minted@opt@g}{#2}{%
+ \minted@addto@optlistcl{\minted@optlistcl@g}{#3=#4}%
+ \@namedef{minted@opt@g:#2}{#4}}%
+ \define@key{minted@opt@g@i}{#2}{%
+ \minted@addto@optlistcl{\minted@optlistcl@g@i}{#3=#4}%
+ \@namedef{minted@opt@g@i:#2}{#4}}%
+ \define@key{minted@opt@lang}{#2}{%
+ \minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang:#2}{#4}}%
+ \define@key{minted@opt@lang@i}{#2}{%
+ \minted@addto@optlistcl@lang{%
+ minted@optlistcl@lang\minted@lang @i}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang @i:#2}{#4}}%
+ \define@key{minted@opt@cmd}{#2}{%
+ \minted@addto@optlistcl{\minted@optlistcl@cmd}{#3=#4}%
+ \@namedef{minted@opt@cmd:#2}{#4}}}%
+ {\define@key{minted@opt@g}{#2}[#1]{%
+ \minted@addto@optlistcl{\minted@optlistcl@g}{#3=#4}%
+ \@namedef{minted@opt@g:#2}{#4}}%
+ \define@key{minted@opt@g@i}{#2}[#1]{%
+ \minted@addto@optlistcl{\minted@optlistcl@g@i}{#3=#4}%
+ \@namedef{minted@opt@g@i:#2}{#4}}%
+ \define@key{minted@opt@lang}{#2}[#1]{%
+ \minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang:#2}{#4}}%
+ \define@key{minted@opt@lang@i}{#2}[#1]{%
+ \minted@addto@optlistcl@lang{%
+ minted@optlistcl@lang\minted@lang @i}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang @i:#2}{#4}}%
+ \define@key{minted@opt@cmd}{#2}[#1]{%
+ \minted@addto@optlistcl{\minted@optlistcl@cmd}{#3=#4}%
+ \@namedef{minted@opt@cmd:#2}{#4}}}%
+ }
+ \edef\minted@hashchar{\string#}
+ \edef\minted@dollarchar{\string$}
+ \edef\minted@ampchar{\string&}
+ \edef\minted@underscorechar{\string_}
+ \edef\minted@tildechar{\string~}
+ \edef\minted@leftsquarebracket{\string[}
+ \edef\minted@rightsquarebracket{\string]}
+ \newcommand{\minted@escchars}{%
+ \let\#\minted@hashchar
+ \let\%\@percentchar
+ \let\{\@charlb
+ \let\}\@charrb
+ \let\$\minted@dollarchar
+ \let\&\minted@ampchar
+ \let\_\minted@underscorechar
+ \let\\\@backslashchar
+ \let~\minted@tildechar
+ \let\~\minted@tildechar
+ \let\[\minted@leftsquarebracket
+ \let\]\minted@rightsquarebracket
+ } %$ <- highlighting
+ \newcommand{\minted@addto@optlistcl@e}[2]{%
+ \begingroup
+ \minted@escchars
+ \xdef\minted@xtmp{#2}%
+ \endgroup
+ \expandafter\minted@addto@optlistcl@e@i\expandafter{\minted@xtmp}{#1}}
+ \def\minted@addto@optlistcl@e@i#1#2{%
+ \expandafter\def\expandafter#2\expandafter{#2#1\space}}
+ \newcommand{\minted@addto@optlistcl@lang@e}[2]{%
+ \begingroup
+ \minted@escchars
+ \xdef\minted@xtmp{#2}%
+ \endgroup
+ \expandafter\minted@addto@optlistcl@lang@e@i\expandafter{\minted@xtmp}{#1}}
+ \def\minted@addto@optlistcl@lang@e@i#1#2{%
+ \expandafter\let\expandafter\minted@tmp\csname #2\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{\minted@tmp#1\space}%
+ \expandafter\let\csname #2\endcsname\minted@tmp}
+ \newcommand{\minted@def@optcl@e}[4][]{%
+ \ifthenelse{\equal{#1}{}}%
+ {\define@key{minted@opt@g}{#2}{%
+ \minted@addto@optlistcl@e{\minted@optlistcl@g}{#3=#4}%
+ \@namedef{minted@opt@g:#2}{#4}}%
+ \define@key{minted@opt@g@i}{#2}{%
+ \minted@addto@optlistcl@e{\minted@optlistcl@g@i}{#3=#4}%
+ \@namedef{minted@opt@g@i:#2}{#4}}%
+ \define@key{minted@opt@lang}{#2}{%
+ \minted@addto@optlistcl@lang@e{minted@optlistcl@lang\minted@lang}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang:#2}{#4}}%
+ \define@key{minted@opt@lang@i}{#2}{%
+ \minted@addto@optlistcl@lang@e{%
+ minted@optlistcl@lang\minted@lang @i}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang @i:#2}{#4}}%
+ \define@key{minted@opt@cmd}{#2}{%
+ \minted@addto@optlistcl@e{\minted@optlistcl@cmd}{#3=#4}%
+ \@namedef{minted@opt@cmd:#2}{#4}}}%
+ {\define@key{minted@opt@g}{#2}[#1]{%
+ \minted@addto@optlistcl@e{\minted@optlistcl@g}{#3=#4}%
+ \@namedef{minted@opt@g:#2}{#4}}%
+ \define@key{minted@opt@g@i}{#2}[#1]{%
+ \minted@addto@optlistcl@e{\minted@optlistcl@g@i}{#3=#4}%
+ \@namedef{minted@opt@g@i:#2}{#4}}%
+ \define@key{minted@opt@lang}{#2}[#1]{%
+ \minted@addto@optlistcl@lang@e{minted@optlistcl@lang\minted@lang}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang:#2}{#4}}%
+ \define@key{minted@opt@lang@i}{#2}[#1]{%
+ \minted@addto@optlistcl@lang@e{%
+ minted@optlistcl@lang\minted@lang @i}{#3=#4}%
+ \@namedef{minted@opt@lang\minted@lang @i:#2}{#4}}%
+ \define@key{minted@opt@cmd}{#2}[#1]{%
+ \minted@addto@optlistcl@e{\minted@optlistcl@cmd}{#3=#4}%
+ \@namedef{minted@opt@cmd:#2}{#4}}}%
+ }
+ \newcommand{\minted@def@optcl@switch}[2]{%
+ \define@booleankey{minted@opt@g}{#1}%
+ {\minted@addto@optlistcl{\minted@optlistcl@g}{#2=True}%
+ \@namedef{minted@opt@g:#1}{true}}
+ {\minted@addto@optlistcl{\minted@optlistcl@g}{#2=False}%
+ \@namedef{minted@opt@g:#1}{false}}
+ \define@booleankey{minted@opt@g@i}{#1}%
+ {\minted@addto@optlistcl{\minted@optlistcl@g@i}{#2=True}%
+ \@namedef{minted@opt@g@i:#1}{true}}
+ {\minted@addto@optlistcl{\minted@optlistcl@g@i}{#2=False}%
+ \@namedef{minted@opt@g@i:#1}{false}}
+ \define@booleankey{minted@opt@lang}{#1}%
+ {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#2=True}%
+ \@namedef{minted@opt@lang\minted@lang:#1}{true}}
+ {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang}{#2=False}%
+ \@namedef{minted@opt@lang\minted@lang:#1}{false}}
+ \define@booleankey{minted@opt@lang@i}{#1}%
+ {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang @i}{#2=True}%
+ \@namedef{minted@opt@lang\minted@lang @i:#1}{true}}
+ {\minted@addto@optlistcl@lang{minted@optlistcl@lang\minted@lang @i}{#2=False}%
+ \@namedef{minted@opt@lang\minted@lang @i:#1}{false}}
+ \define@booleankey{minted@opt@cmd}{#1}%
+ {\minted@addto@optlistcl{\minted@optlistcl@cmd}{#2=True}%
+ \@namedef{minted@opt@cmd:#1}{true}}
+ {\minted@addto@optlistcl{\minted@optlistcl@cmd}{#2=False}%
+ \@namedef{minted@opt@cmd:#1}{false}}
+ }
+ \newcommand{\minted@def@optfv}[1]{%
+ \define@key{minted@opt@g}{#1}{%
+ \expandafter\def\expandafter\minted@optlistfv@g\expandafter{%
+ \minted@optlistfv@g#1={##1},}%
+ \@namedef{minted@opt@g:#1}{##1}}
+ \define@key{minted@opt@g@i}{#1}{%
+ \expandafter\def\expandafter\minted@optlistfv@g@i\expandafter{%
+ \minted@optlistfv@g@i#1={##1},}%
+ \@namedef{minted@opt@g@i:#1}{##1}}
+ \define@key{minted@opt@lang}{#1}{%
+ \expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{%
+ \minted@tmp#1={##1},}%
+ \expandafter\let\csname minted@optlistfv@lang\minted@lang\endcsname%
+ \minted@tmp
+ \@namedef{minted@opt@lang\minted@lang:#1}{##1}}
+ \define@key{minted@opt@lang@i}{#1}{%
+ \expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang @i\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{%
+ \minted@tmp#1={##1},}%
+ \expandafter\let\csname minted@optlistfv@lang\minted@lang @i\endcsname%
+ \minted@tmp
+ \@namedef{minted@opt@lang\minted@lang @i:#1}{##1}}
+ \define@key{minted@opt@cmd}{#1}{%
+ \expandafter\def\expandafter\minted@optlistfv@cmd\expandafter{%
+ \minted@optlistfv@cmd#1={##1},}%
+ \@namedef{minted@opt@cmd:#1}{##1}}
+ }
+ \newcommand{\minted@def@optfv@switch}[1]{%
+ \define@booleankey{minted@opt@g}{#1}%
+ {\expandafter\def\expandafter\minted@optlistfv@g\expandafter{%
+ \minted@optlistfv@g#1=true,}%
+ \@namedef{minted@opt@g:#1}{true}}%
+ {\expandafter\def\expandafter\minted@optlistfv@g\expandafter{%
+ \minted@optlistfv@g#1=false,}%
+ \@namedef{minted@opt@g:#1}{false}}%
+ \define@booleankey{minted@opt@g@i}{#1}%
+ {\expandafter\def\expandafter\minted@optlistfv@g@i\expandafter{%
+ \minted@optlistfv@g@i#1=true,}%
+ \@namedef{minted@opt@g@i:#1}{true}}%
+ {\expandafter\def\expandafter\minted@optlistfv@g@i\expandafter{%
+ \minted@optlistfv@g@i#1=false,}%
+ \@namedef{minted@opt@g@i:#1}{false}}%
+ \define@booleankey{minted@opt@lang}{#1}%
+ {\expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{%
+ \minted@tmp#1=true,}%
+ \expandafter\let\csname minted@optlistfv@lang\minted@lang\endcsname%
+ \minted@tmp
+ \@namedef{minted@opt@lang\minted@lang:#1}{true}}%
+ {\expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{%
+ \minted@tmp#1=false,}%
+ \expandafter\let\csname minted@optlistfv@lang\minted@lang\endcsname%
+ \minted@tmp
+ \@namedef{minted@opt@lang\minted@lang:#1}{false}}%
+ \define@booleankey{minted@opt@lang@i}{#1}%
+ {\expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang @i\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{%
+ \minted@tmp#1=true,}%
+ \expandafter\let\csname minted@optlistfv@lang\minted@lang @i\endcsname%
+ \minted@tmp
+ \@namedef{minted@opt@lang\minted@lang @i:#1}{true}}%
+ {\expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang @i\endcsname
+ \expandafter\def\expandafter\minted@tmp\expandafter{%
+ \minted@tmp#1=false,}%
+ \expandafter\let\csname minted@optlistfv@lang\minted@lang @i\endcsname%
+ \minted@tmp
+ \@namedef{minted@opt@lang\minted@lang @i:#1}{false}}%
+ \define@booleankey{minted@opt@cmd}{#1}%
+ {\expandafter\def\expandafter\minted@optlistfv@cmd\expandafter{%
+ \minted@optlistfv@cmd#1=true,}%
+ \@namedef{minted@opt@cmd:#1}{true}}%
+ {\expandafter\def\expandafter\minted@optlistfv@cmd\expandafter{%
+ \minted@optlistfv@cmd#1=false,}%
+ \@namedef{minted@opt@cmd:#1}{false}}%
+ }
+ \newboolean{minted@isinline}
+ \newcommand{\minted@fvset}{%
+ \expandafter\fvset\expandafter{\minted@optlistfv@g}%
+ \expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang\endcsname
+ \expandafter\fvset\expandafter{\minted@tmp}%
+ \ifthenelse{\boolean{minted@isinline}}%
+ {\expandafter\fvset\expandafter{\minted@optlistfv@g@i}%
+ \expandafter\let\expandafter\minted@tmp%
+ \csname minted@optlistfv@lang\minted@lang @i\endcsname
+ \expandafter\fvset\expandafter{\minted@tmp}}%
+ {}%
+ \expandafter\fvset\expandafter{\minted@optlistfv@cmd}%
+ }
+ \newcommand{\minted@def@opt}[2][]{%
+ \define@key{minted@opt@g}{#2}{%
+ \@namedef{minted@opt@g:#2}{##1}}
+ \define@key{minted@opt@g@i}{#2}{%
+ \@namedef{minted@opt@g@i:#2}{##1}}
+ \define@key{minted@opt@lang}{#2}{%
+ \@namedef{minted@opt@lang\minted@lang:#2}{##1}}
+ \define@key{minted@opt@lang@i}{#2}{%
+ \@namedef{minted@opt@lang\minted@lang @i:#2}{##1}}
+ \define@key{minted@opt@cmd}{#2}{%
+ \@namedef{minted@opt@cmd:#2}{##1}}
+ \ifstrempty{#1}{}{\@namedef{minted@opt@g:#2}{#1}}%
+ }
+ \newcommand{\minted@checkstyle}[1]{%
+ \ifcsname minted@styleloaded@\ifstrempty{#1}{default-pyg-prefix}{#1}\endcsname\else
+ \ifstrempty{#1}{}{\ifcsname PYG\endcsname\else\minted@checkstyle{}\fi}%
+ \expandafter\gdef%
+ \csname minted@styleloaded@\ifstrempty{#1}{default-pyg-prefix}{#1}\endcsname{}%
+ \ifthenelse{\boolean{minted@cache}}%
+ {\IfFileExists
+ {\minted@outputdir\minted@cachedir/\ifstrempty{#1}{default-pyg-prefix}{#1}.pygstyle}%
+ {}%
+ {%
+ \ifthenelse{\boolean{minted@frozencache}}%
+ {\PackageError{minted}%
+ {Missing style definition for #1 with frozencache}%
+ {Missing style definition for #1 with frozencache}}%
+ {\ifwindows
+ \ShellEscape{%
+ \MintedPygmentize\space -S \ifstrempty{#1}{default}{#1} -f latex
+ -P commandprefix=PYG#1
+ > \minted@outputdir@windows\minted@cachedir@windows\@backslashchar%
+ \ifstrempty{#1}{default-pyg-prefix}{#1}.pygstyle}%
+ \else
+ \ShellEscape{%
+ \MintedPygmentize\space -S \ifstrempty{#1}{default}{#1} -f latex
+ -P commandprefix=PYG#1
+ > \minted@outputdir\minted@cachedir/%
+ \ifstrempty{#1}{default-pyg-prefix}{#1}.pygstyle}%
+ \fi}%
+ }%
+ \begingroup
+ \let\def\gdef
+ \catcode\string``=12
+ \catcode`\_=11
+ \catcode`\-=11
+ \catcode`\%=14
+ \endlinechar=-1\relax
+ \minted@input{%
+ \minted@outputdir\minted@cachedir/\ifstrempty{#1}{default-pyg-prefix}{#1}.pygstyle}%
+ \endgroup
+ \minted@addcachefile{\ifstrempty{#1}{default-pyg-prefix}{#1}.pygstyle}}%
+ {%
+ \ifwindows
+ \ShellEscape{%
+ \MintedPygmentize\space -S \ifstrempty{#1}{default}{#1} -f latex
+ -P commandprefix=PYG#1 > \minted@outputdir@windows\minted@jobname.out.pyg}%
+ \else
+ \ShellEscape{%
+ \MintedPygmentize\space -S \ifstrempty{#1}{default}{#1} -f latex
+ -P commandprefix=PYG#1 > \minted@outputdir\minted@jobname.out.pyg}%
+ \fi
+ \begingroup
+ \let\def\gdef
+ \catcode\string``=12
+ \catcode`\_=11
+ \catcode`\-=11
+ \catcode`\%=14
+ \endlinechar=-1\relax
+ \minted@input{\minted@outputdir\minted@jobname.out.pyg}%
+ \endgroup}%
+ \ifstrempty{#1}{\minted@patch@PYGZsq}{}%
+ \fi
+ }
+ \ifthenelse{\boolean{minted@draft}}{\renewcommand{\minted@checkstyle}[1]{}}{}
+ \newcommand{\minted@patch@PYGZsq}{%
+ \ifcsname PYGZsq\endcsname
+ \expandafter\ifdefstring\expandafter{\csname PYGZsq\endcsname}{\char`\'}%
+ {\minted@patch@PYGZsq@i}%
+ {}%
+ \fi
+ }
+ \begingroup
+ \catcode`\'=\active
+ \gdef\minted@patch@PYGZsq@i{\gdef\PYGZsq{'}}
+ \endgroup
+ \ifthenelse{\boolean{minted@draft}}{}{\AtBeginDocument{\minted@patch@PYGZsq}}
+ \newcommand{\minted@def@opt@switch}[2][false]{%
+ \define@booleankey{minted@opt@g}{#2}%
+ {\@namedef{minted@opt@g:#2}{true}}%
+ {\@namedef{minted@opt@g:#2}{false}}
+ \define@booleankey{minted@opt@g@i}{#2}%
+ {\@namedef{minted@opt@g@i:#2}{true}}%
+ {\@namedef{minted@opt@g@i:#2}{false}}
+ \define@booleankey{minted@opt@lang}{#2}%
+ {\@namedef{minted@opt@lang\minted@lang:#2}{true}}%
+ {\@namedef{minted@opt@lang\minted@lang:#2}{false}}
+ \define@booleankey{minted@opt@lang@i}{#2}%
+ {\@namedef{minted@opt@lang\minted@lang @i:#2}{true}}%
+ {\@namedef{minted@opt@lang\minted@lang @i:#2}{false}}
+ \define@booleankey{minted@opt@cmd}{#2}%
+ {\@namedef{minted@opt@cmd:#2}{true}}%
+ {\@namedef{minted@opt@cmd:#2}{false}}%
+ \@namedef{minted@opt@g:#2}{#1}%
+ }
+ \def\minted@get@opt#1#2{%
+ \ifcsname minted@opt@cmd:#1\endcsname
+ \csname minted@opt@cmd:#1\endcsname
+ \else
+ \ifminted@isinline
+ \ifcsname minted@opt@lang\minted@lang @i:#1\endcsname
+ \csname minted@opt@lang\minted@lang @i:#1\endcsname
+ \else
+ \ifcsname minted@opt@g@i:#1\endcsname
+ \csname minted@opt@g@i:#1\endcsname
+ \else
+ \ifcsname minted@opt@lang\minted@lang:#1\endcsname
+ \csname minted@opt@lang\minted@lang:#1\endcsname
+ \else
+ \ifcsname minted@opt@g:#1\endcsname
+ \csname minted@opt@g:#1\endcsname
+ \else
+ #2%
+ \fi
+ \fi
+ \fi
+ \fi
+ \else
+ \ifcsname minted@opt@lang\minted@lang:#1\endcsname
+ \csname minted@opt@lang\minted@lang:#1\endcsname
+ \else
+ \ifcsname minted@opt@g:#1\endcsname
+ \csname minted@opt@g:#1\endcsname
+ \else
+ #2%
+ \fi
+ \fi
+ \fi
+ \fi
+ }%
+ \minted@def@optcl{encoding}{-P encoding}{#1}
+ \minted@def@optcl{outencoding}{-P outencoding}{#1}
+ \minted@def@optcl@e{escapeinside}{-P "escapeinside}{#1"}
+ \minted@def@optcl@switch{stripnl}{-P stripnl}
+ \minted@def@optcl@switch{stripall}{-P stripall}
+ \minted@def@optcl@switch{python3}{-P python3}
+ \minted@def@optcl@switch{funcnamehighlighting}{-P funcnamehighlighting}
+ \minted@def@optcl@switch{startinline}{-P startinline}
+ \ifthenelse{\boolean{minted@draft}}%
+ {\minted@def@optfv{gobble}}%
+ {\minted@def@optcl{gobble}{-F gobble:n}{#1}}
+ \minted@def@optcl{codetagify}{-F codetagify:codetags}{#1}
+ \minted@def@optcl{keywordcase}{-F keywordcase:case}{#1}
+ \minted@def@optcl@switch{texcl}{-P texcomments}
+ \minted@def@optcl@switch{texcomments}{-P texcomments}
+ \minted@def@optcl@switch{mathescape}{-P mathescape}
+ \minted@def@optfv@switch{linenos}
+ \minted@def@opt{style}
+ \minted@def@optfv{frame}
+ \minted@def@optfv{framesep}
+ \minted@def@optfv{framerule}
+ \minted@def@optfv{rulecolor}
+ \minted@def@optfv{numbersep}
+ \minted@def@optfv{numbers}
+ \minted@def@optfv{firstnumber}
+ \minted@def@optfv{stepnumber}
+ \minted@def@optfv{firstline}
+ \minted@def@optfv{lastline}
+ \minted@def@optfv{baselinestretch}
+ \minted@def@optfv{xleftmargin}
+ \minted@def@optfv{xrightmargin}
+ \minted@def@optfv{fillcolor}
+ \minted@def@optfv{tabsize}
+ \minted@def@optfv{fontfamily}
+ \minted@def@optfv{fontsize}
+ \minted@def@optfv{fontshape}
+ \minted@def@optfv{fontseries}
+ \minted@def@optfv{formatcom}
+ \minted@def@optfv{label}
+ \minted@def@optfv{labelposition}
+ \minted@def@optfv{highlightlines}
+ \minted@def@optfv{highlightcolor}
+ \minted@def@optfv{space}
+ \minted@def@optfv{spacecolor}
+ \minted@def@optfv{tab}
+ \minted@def@optfv{tabcolor}
+ \minted@def@optfv{highlightcolor}
+ \minted@def@optfv@switch{beameroverlays}
+ \minted@def@optfv@switch{curlyquotes}
+ \minted@def@optfv@switch{numberfirstline}
+ \minted@def@optfv@switch{numberblanklines}
+ \minted@def@optfv@switch{stepnumberfromfirst}
+ \minted@def@optfv@switch{stepnumberoffsetvalues}
+ \minted@def@optfv@switch{showspaces}
+ \minted@def@optfv@switch{resetmargins}
+ \minted@def@optfv@switch{samepage}
+ \minted@def@optfv@switch{showtabs}
+ \minted@def@optfv@switch{obeytabs}
+ \minted@def@optfv@switch{breaklines}
+ \minted@def@optfv@switch{breakbytoken}
+ \minted@def@optfv@switch{breakbytokenanywhere}
+ \minted@def@optfv{breakindent}
+ \minted@def@optfv{breakindentnchars}
+ \minted@def@optfv@switch{breakautoindent}
+ \minted@def@optfv{breaksymbol}
+ \minted@def@optfv{breaksymbolsep}
+ \minted@def@optfv{breaksymbolsepnchars}
+ \minted@def@optfv{breaksymbolindent}
+ \minted@def@optfv{breaksymbolindentnchars}
+ \minted@def@optfv{breaksymbolleft}
+ \minted@def@optfv{breaksymbolsepleft}
+ \minted@def@optfv{breaksymbolsepleftnchars}
+ \minted@def@optfv{breaksymbolindentleft}
+ \minted@def@optfv{breaksymbolindentleftnchars}
+ \minted@def@optfv{breaksymbolright}
+ \minted@def@optfv{breaksymbolsepright}
+ \minted@def@optfv{breaksymbolseprightnchars}
+ \minted@def@optfv{breaksymbolindentright}
+ \minted@def@optfv{breaksymbolindentrightnchars}
+ \minted@def@optfv{breakbefore}
+ \minted@def@optfv{breakbeforesymbolpre}
+ \minted@def@optfv{breakbeforesymbolpost}
+ \minted@def@optfv@switch{breakbeforegroup}
+ \minted@def@optfv{breakafter}
+ \minted@def@optfv@switch{breakaftergroup}
+ \minted@def@optfv{breakaftersymbolpre}
+ \minted@def@optfv{breakaftersymbolpost}
+ \minted@def@optfv@switch{breakanywhere}
+ \minted@def@optfv{breakanywheresymbolpre}
+ \minted@def@optfv{breakanywheresymbolpost}
+ \minted@def@opt{bgcolor}
+ \minted@def@opt@switch{autogobble}
+ \newcommand{\minted@encoding}{\minted@get@opt{encoding}{UTF8}}
703
+ \newenvironment{minted@snugshade*}[1]{%
704
+ \def\FrameCommand##1{\hskip\@totalleftmargin
705
+ \colorbox{#1}{##1}%
706
+ \hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
707
+ \MakeFramed{\advance\hsize-\width
708
+ \@totalleftmargin\z@ \linewidth\hsize
709
+ \advance\labelsep\fboxsep
710
+ \@setminipage}%
711
+ }{\par\unskip\@minipagefalse\endMakeFramed}
712
+ \newsavebox{\minted@bgbox}
713
+ \newenvironment{minted@colorbg}[1]{%
714
+ \setlength{\OuterFrameSep}{0pt}%
715
+ \let\minted@tmp\FV@NumberSep
716
+ \edef\FV@NumberSep{%
717
+ \the\numexpr\dimexpr\minted@tmp+\number\fboxsep\relax sp\relax}%
718
+ \medskip
719
+ \begin{minted@snugshade*}{#1}}
720
+ {\end{minted@snugshade*}%
721
+ \medskip\noindent}
722
+ \newwrite\minted@code
723
+ \newcommand{\minted@savecode}[1]{
724
+ \immediate\openout\minted@code\minted@jobname.pyg\relax
725
+ \immediate\write\minted@code{\expandafter\detokenize\expandafter{#1}}%
726
+ \immediate\closeout\minted@code}
727
+ \newcounter{minted@FancyVerbLineTemp}
728
+ \newcommand{\minted@write@detok}[1]{%
729
+ \immediate\write\FV@OutFile{\detokenize{#1}}}
730
+ \newcommand{\minted@FVB@VerbatimOut}[1]{%
731
+ \setcounter{minted@FancyVerbLineTemp}{\value{FancyVerbLine}}%
732
+ \@bsphack
733
+ \begingroup
734
+ \FV@UseKeyValues
735
+ \FV@DefineWhiteSpace
736
+ \def\FV@Space{\space}%
737
+ \FV@DefineTabOut
738
+ \let\FV@ProcessLine\minted@write@detok
739
+ \immediate\openout\FV@OutFile #1\relax
740
+ \let\FV@FontScanPrep\relax
741
+ \let\@noligs\relax
742
+ \FV@Scan}
743
+ \newcommand{\minted@FVE@VerbatimOut}{%
744
+ \immediate\closeout\FV@OutFile\endgroup\@esphack
745
+ \setcounter{FancyVerbLine}{\value{minted@FancyVerbLineTemp}}}%
746
+ \ifcsname MintedPygmentize\endcsname\else
747
+ \newcommand{\MintedPygmentize}{pygmentize}
748
+ \fi
749
+ \newcounter{minted@pygmentizecounter}
750
+ \newcommand{\minted@pygmentize}[2][\minted@outputdir\minted@jobname.pyg]{%
751
+ \minted@checkstyle{\minted@get@opt{style}{default}}%
752
+ \stepcounter{minted@pygmentizecounter}%
753
+ \ifthenelse{\equal{\minted@get@opt{autogobble}{false}}{true}}%
754
+ {\def\minted@codefile{\minted@outputdir\minted@jobname.pyg}}%
755
+ {\def\minted@codefile{#1}}%
756
+ \ifthenelse{\boolean{minted@isinline}}%
757
+ {\def\minted@optlistcl@inlines{%
758
+ \minted@optlistcl@g@i
759
+ \csname minted@optlistcl@lang\minted@lang @i\endcsname}}%
760
+ {\let\minted@optlistcl@inlines\@empty}%
761
+ \def\minted@cmd{%
762
+ \ifminted@kpsewhich
763
+ \ifwindows
764
+ \detokenize{for /f "usebackq tokens=*"}\space\@percentchar\detokenize{a in (`kpsewhich}\space\minted@codefile\detokenize{`) do}\space
765
+ \fi
766
+ \fi
767
+ \MintedPygmentize\space -l #2
768
+ -f latex -P commandprefix=PYG -F tokenmerge
769
+ \minted@optlistcl@g \csname minted@optlistcl@lang\minted@lang\endcsname
770
+ \minted@optlistcl@inlines
771
+ \minted@optlistcl@cmd -o \minted@outputdir\minted@infile\space
772
+ \ifminted@kpsewhich
773
+ \ifwindows
774
+ \@percentchar\detokenize{a}%
775
+ \else
776
+ \detokenize{`}kpsewhich \minted@codefile\space
777
+ \detokenize{||} \minted@codefile\detokenize{`}%
778
+ \fi
779
+ \else
780
+ \minted@codefile
781
+ \fi}%
782
+ % For debugging, uncomment: %%%%
783
+ % \immediate\typeout{\minted@cmd}%
784
+ % %%%%
785
+ \ifthenelse{\boolean{minted@cache}}%
786
+ {%
787
+ \ifminted@frozencache
788
+ \else
789
+ \ifx\XeTeXinterchartoks\minted@undefined
790
+ \ifthenelse{\equal{\minted@get@opt{autogobble}{false}}{true}}%
791
+ {\edef\minted@hash{\pdf@filemdfivesum{#1}%
792
+ \pdf@mdfivesum{\minted@cmd autogobble(\ifx\FancyVerbStartNum\z@ 0\else\FancyVerbStartNum\fi-\ifx\FancyVerbStopNum\z@ 0\else\FancyVerbStopNum\fi)}}}%
793
+ {\edef\minted@hash{\pdf@filemdfivesum{#1}%
794
+ \pdf@mdfivesum{\minted@cmd}}}%
795
+ \else
796
+ \ifx\mdfivesum\minted@undefined
797
+ \immediate\openout\minted@code\minted@jobname.mintedcmd\relax
798
+ \immediate\write\minted@code{\minted@cmd}%
799
+ \ifthenelse{\equal{\minted@get@opt{autogobble}{false}}{true}}%
800
+ {\immediate\write\minted@code{autogobble(\ifx\FancyVerbStartNum\z@ 0\else\FancyVerbStartNum\fi-\ifx\FancyVerbStopNum\z@ 0\else\FancyVerbStopNum\fi)}}{}%
801
+ \immediate\closeout\minted@code
802
+ \edef\minted@argone@esc{#1}%
803
+ \StrSubstitute{\minted@argone@esc}{\@backslashchar}{\@backslashchar\@backslashchar}[\minted@argone@esc]%
804
+ \StrSubstitute{\minted@argone@esc}{"}{\@backslashchar"}[\minted@argone@esc]%
805
+ \edef\minted@tmpfname@esc{\minted@outputdir\minted@jobname}%
806
+ \StrSubstitute{\minted@tmpfname@esc}{\@backslashchar}{\@backslashchar\@backslashchar}[\minted@tmpfname@esc]%
807
+ \StrSubstitute{\minted@tmpfname@esc}{"}{\@backslashchar"}[\minted@tmpfname@esc]%
808
+ %Cheating a little here by using ASCII codes to write `{` and `}`
809
+ %in the Python code
810
+ \def\minted@hashcmd{%
811
+ \detokenize{python -c "import hashlib; import os;
812
+ hasher = hashlib.sha1();
813
+ f = open(os.path.expanduser(os.path.expandvars(\"}\minted@tmpfname@esc.mintedcmd\detokenize{\")), \"rb\");
814
+ hasher.update(f.read());
815
+ f.close();
816
+ f = open(os.path.expanduser(os.path.expandvars(\"}\minted@argone@esc\detokenize{\")), \"rb\");
817
+ hasher.update(f.read());
818
+ f.close();
819
+ f = open(os.path.expanduser(os.path.expandvars(\"}\minted@tmpfname@esc.mintedmd5\detokenize{\")), \"w\");
820
+ macro = \"\\edef\\minted@hash\" + chr(123) + hasher.hexdigest() + chr(125) + \"\";
821
+ f.write(\"\\makeatletter\" + macro + \"\\makeatother\\endinput\n\");
822
+ f.close();"}}%
823
+ \ShellEscape{\minted@hashcmd}%
824
+ \minted@input{\minted@outputdir\minted@jobname.mintedmd5}%
825
+ \else
826
+ \ifthenelse{\equal{\minted@get@opt{autogobble}{false}}{true}}%
827
+ {\edef\minted@hash{\mdfivesum file {#1}%
828
+ \mdfivesum{\minted@cmd autogobble(\ifx\FancyVerbStartNum\z@ 0\else\FancyVerbStartNum\fi-\ifx\FancyVerbStopNum\z@ 0\else\FancyVerbStopNum\fi)}}}%
829
+ {\edef\minted@hash{\mdfivesum file {#1}%
830
+ \mdfivesum{\minted@cmd}}}%
831
+ \fi
832
+ \fi
833
+ \edef\minted@infile{\minted@cachedir/\minted@hash.pygtex}%
834
+ \IfFileExists{\minted@infile}{}{%
835
+ \ifthenelse{\equal{\minted@get@opt{autogobble}{false}}{true}}{%
836
+ \minted@autogobble{#1}}{}%
837
+ \ShellEscape{\minted@cmd}}%
838
+ \fi
839
+ \ifthenelse{\boolean{minted@finalizecache}}%
840
+ {%
841
+ \edef\minted@cachefilename{listing\arabic{minted@pygmentizecounter}.pygtex}%
842
+ \edef\minted@actualinfile{\minted@cachedir/\minted@cachefilename}%
843
+ \ifwindows
844
+ \StrSubstitute{\minted@infile}{/}{\@backslashchar}[\minted@infile@windows]
845
+ \StrSubstitute{\minted@actualinfile}{/}{\@backslashchar}[\minted@actualinfile@windows]
846
+ \ShellEscape{move /y \minted@outputdir\minted@infile@windows\space\minted@outputdir\minted@actualinfile@windows}%
847
+ \else
848
+ \ShellEscape{mv -f \minted@outputdir\minted@infile\space\minted@outputdir\minted@actualinfile}%
849
+ \fi
850
+ \let\minted@infile\minted@actualinfile
851
+ \expandafter\minted@addcachefile\expandafter{\minted@cachefilename}%
852
+ }%
853
+ {\ifthenelse{\boolean{minted@frozencache}}%
854
+ {%
855
+ \edef\minted@cachefilename{listing\arabic{minted@pygmentizecounter}.pygtex}%
856
+ \edef\minted@infile{\minted@cachedir/\minted@cachefilename}%
857
+ \expandafter\minted@addcachefile\expandafter{\minted@cachefilename}}%
858
+ {\expandafter\minted@addcachefile\expandafter{\minted@hash.pygtex}}%
859
+ }%
860
+ \minted@inputpyg}%
861
+ {%
862
+ \ifthenelse{\equal{\minted@get@opt{autogobble}{false}}{true}}{%
863
+ \minted@autogobble{#1}}{}%
864
+ \ShellEscape{\minted@cmd}%
865
+ \minted@inputpyg}%
866
+ }
867
+ \def\minted@autogobble#1{%
868
+ \edef\minted@argone@esc{#1}%
869
+ \StrSubstitute{\minted@argone@esc}{\@backslashchar}{\@backslashchar\@backslashchar}[\minted@argone@esc]%
870
+ \StrSubstitute{\minted@argone@esc}{"}{\@backslashchar"}[\minted@argone@esc]%
871
+ \edef\minted@tmpfname@esc{\minted@outputdir\minted@jobname}%
872
+ \StrSubstitute{\minted@tmpfname@esc}{\@backslashchar}{\@backslashchar\@backslashchar}[\minted@tmpfname@esc]%
873
+ \StrSubstitute{\minted@tmpfname@esc}{"}{\@backslashchar"}[\minted@tmpfname@esc]%
874
+ %Need a version of open() that supports encoding under Python 2
875
+ \edef\minted@autogobblecmd{%
876
+ \ifminted@kpsewhich
877
+ \ifwindows
878
+ \detokenize{for /f "usebackq tokens=*" }\@percentchar\detokenize{a in (`kpsewhich} #1\detokenize{`) do}\space
879
+ \fi
880
+ \fi
881
+ \detokenize{python -c "import sys; import os;
882
+ import textwrap;
883
+ from io import open;
884
+ fname = }%
885
+ \ifminted@kpsewhich
886
+ \detokenize{sys.argv[1];}\space%
887
+ \else
888
+ \detokenize{os.path.expanduser(os.path.expandvars(\"}\minted@argone@esc\detokenize{\"));}\space%
889
+ \fi
890
+ \detokenize{f = open(fname, \"r\", encoding=\"}\minted@encoding\detokenize{\") if os.path.isfile(fname) else None;
891
+ t = f.readlines() if f is not None else None;
892
+ t_opt = t if t is not None else [];
893
+ f.close() if f is not None else None;
894
+ tmpfname = os.path.expanduser(os.path.expandvars(\"}\minted@tmpfname@esc.pyg\detokenize{\"));
895
+ f = open(tmpfname, \"w\", encoding=\"}\minted@encoding\detokenize{\") if t is not None else None;
896
+ fvstartnum = }\ifx\FancyVerbStartNum\z@ 0\else\FancyVerbStartNum\fi\detokenize{;
897
+ fvstopnum = }\ifx\FancyVerbStopNum\z@ 0\else\FancyVerbStopNum\fi\detokenize{;
898
+ s = fvstartnum-1 if fvstartnum != 0 else 0;
899
+ e = fvstopnum if fvstopnum != 0 else len(t_opt);
900
+ [f.write(textwrap.dedent(\"\".join(x))) for x in (t_opt[0:s], t_opt[s:e], t_opt[e:]) if x and t is not None];
901
+ f.close() if t is not None else os.remove(tmpfname);"}%
902
+ \ifminted@kpsewhich
903
+ \ifwindows
904
+ \space\@percentchar\detokenize{a}%
905
+ \else
906
+ \space\detokenize{`}kpsewhich #1\space\detokenize{||} #1\detokenize{`}%
907
+ \fi
908
+ \fi
909
+ }%
910
+ \ShellEscape{\minted@autogobblecmd}%
911
+ }
912
+ \newcommand{\minted@inputpyg}{%
913
+ \expandafter\let\expandafter\minted@PYGstyle%
914
+ \csname PYG\minted@get@opt{style}{default}\endcsname
915
+ \VerbatimPygments{\PYG}{\minted@PYGstyle}%
916
+ \ifthenelse{\boolean{minted@isinline}}%
917
+ {\ifthenelse{\equal{\minted@get@opt{breaklines}{false}}{true}}%
918
+ {\let\FV@BeginVBox\relax
919
+ \let\FV@EndVBox\relax
920
+ \def\FV@BProcessLine##1{\FancyVerbFormatLine{##1}}%
921
+ \minted@inputpyg@inline}%
922
+ {\minted@inputpyg@inline}}%
923
+ {\minted@inputpyg@block}%
924
+ }
925
+ \def\minted@inputpyg@inline{%
926
+ \ifthenelse{\equal{\minted@get@opt{bgcolor}{}}{}}%
927
+ {\minted@input{\minted@outputdir\minted@infile}}%
928
+ {\colorbox{\minted@get@opt{bgcolor}{}}{%
929
+ \minted@input{\minted@outputdir\minted@infile}}}%
930
+ }
931
+ \def\minted@inputpyg@block{%
932
+ \ifthenelse{\equal{\minted@get@opt{bgcolor}{}}{}}%
933
+ {\minted@input{\minted@outputdir\minted@infile}}%
934
+ {\begin{minted@colorbg}{\minted@get@opt{bgcolor}{}}%
935
+ \minted@input{\minted@outputdir\minted@infile}%
936
+ \end{minted@colorbg}}}
937
+ \newcommand{\minted@langlinenoson}{%
938
+ \ifcsname c@minted@lang\minted@lang\endcsname\else
939
+ \newcounter{minted@lang\minted@lang}%
940
+ \fi
941
+ \setcounter{minted@FancyVerbLineTemp}{\value{FancyVerbLine}}%
942
+ \setcounter{FancyVerbLine}{\value{minted@lang\minted@lang}}%
943
+ }
944
+ \newcommand{\minted@langlinenosoff}{%
945
+ \setcounter{minted@lang\minted@lang}{\value{FancyVerbLine}}%
946
+ \setcounter{FancyVerbLine}{\value{minted@FancyVerbLineTemp}}%
947
+ }
948
+ \ifthenelse{\boolean{minted@langlinenos}}{}{%
949
+ \let\minted@langlinenoson\relax
950
+ \let\minted@langlinenosoff\relax
951
+ }
952
+ \newcommand{\setminted}[2][]{%
953
+ \ifthenelse{\equal{#1}{}}%
954
+ {\setkeys{minted@opt@g}{#2}}%
955
+ {\minted@configlang{#1}%
956
+ \setkeys{minted@opt@lang}{#2}}}
957
+ \newcommand{\setmintedinline}[2][]{%
958
+ \ifthenelse{\equal{#1}{}}%
959
+ {\setkeys{minted@opt@g@i}{#2}}%
960
+ {\minted@configlang{#1}%
961
+ \setkeys{minted@opt@lang@i}{#2}}}
962
+ \setmintedinline[php]{startinline=true}
963
+ \setminted{tabcolor=black}
964
+ \newcommand{\usemintedstyle}[2][]{\setminted[#1]{style=#2}}
965
+ \begingroup
966
+ \catcode`\ =\active
967
+ \catcode`\^^I=\active
968
+ \gdef\minted@defwhitespace@retok{\def {\noexpand\FV@Space}\def^^I{\noexpand\FV@Tab}}%
969
+ \endgroup
970
+ \newcommand{\minted@writecmdcode}[1]{%
971
+ \immediate\openout\minted@code\minted@jobname.pyg\relax
972
+ \immediate\write\minted@code{\detokenize{#1}}%
973
+ \immediate\closeout\minted@code}
974
+ \newrobustcmd{\mintinline}[2][]{%
975
+ \begingroup
976
+ \setboolean{minted@isinline}{true}%
977
+ \minted@configlang{#2}%
978
+ \setkeys{minted@opt@cmd}{#1}%
979
+ \minted@fvset
980
+ \begingroup
981
+ \let\do\@makeother\dospecials
982
+ \catcode`\{=1
983
+ \catcode`\}=2
984
+ \catcode`\^^I=\active
985
+ \@ifnextchar\bgroup
986
+ {\minted@inline@iii}%
987
+ {\catcode`\{=12\catcode`\}=12
988
+ \minted@inline@i}}
989
+ \def\minted@inline@i#1{%
990
+ \endgroup
991
+ \def\minted@inline@ii##1#1{%
992
+ \minted@inline@iii{##1}}%
993
+ \begingroup
994
+ \let\do\@makeother\dospecials
995
+ \catcode`\^^I=\active
996
+ \minted@inline@ii}
997
+ \ifthenelse{\boolean{minted@draft}}%
998
+ {\newcommand{\minted@inline@iii}[1]{%
999
+ \endgroup
1000
+ \begingroup
1001
+ \minted@defwhitespace@retok
1002
+ \everyeof{\noexpand}%
1003
+ \endlinechar-1\relax
1004
+ \let\do\@makeother\dospecials
1005
+ \catcode`\ =\active
1006
+ \catcode`\^^I=\active
1007
+ \xdef\minted@tmp{\scantokens{#1}}%
1008
+ \endgroup
1009
+ \let\FV@Line\minted@tmp
1010
+ \def\FV@SV@minted@tmp{%
1011
+ \FV@Gobble
1012
+ \expandafter\FV@ProcessLine\expandafter{\FV@Line}}%
1013
+ \ifthenelse{\equal{\minted@get@opt{breaklines}{false}}{true}}%
1014
+ {\let\FV@BeginVBox\relax
1015
+ \let\FV@EndVBox\relax
1016
+ \def\FV@BProcessLine##1{\FancyVerbFormatLine{##1}}%
1017
+ \BUseVerbatim{minted@tmp}}%
1018
+ {\BUseVerbatim{minted@tmp}}%
1019
+ \endgroup}}%
1020
+ {\newcommand{\minted@inline@iii}[1]{%
1021
+ \endgroup
1022
+ \minted@writecmdcode{#1}%
1023
+ \RecustomVerbatimEnvironment{Verbatim}{BVerbatim}{}%
1024
+ \setcounter{minted@FancyVerbLineTemp}{\value{FancyVerbLine}}%
1025
+ \minted@pygmentize{\minted@lang}%
1026
+ \setcounter{FancyVerbLine}{\value{minted@FancyVerbLineTemp}}%
1027
+ \endgroup}}
1028
+ \newrobustcmd{\mint}[2][]{%
1029
+ \begingroup
1030
+ \minted@configlang{#2}%
1031
+ \setkeys{minted@opt@cmd}{#1}%
1032
+ \minted@fvset
1033
+ \begingroup
1034
+ \let\do\@makeother\dospecials
1035
+ \catcode`\{=1
1036
+ \catcode`\}=2
1037
+ \catcode`\^^I=\active
1038
+ \@ifnextchar\bgroup
1039
+ {\mint@iii}%
1040
+ {\catcode`\{=12\catcode`\}=12
1041
+ \mint@i}}
1042
+ \def\mint@i#1{%
1043
+ \endgroup
1044
+ \def\mint@ii##1#1{%
1045
+ \mint@iii{##1}}%
1046
+ \begingroup
1047
+ \let\do\@makeother\dospecials
1048
+ \catcode`\^^I=\active
1049
+ \mint@ii}
1050
+ \ifthenelse{\boolean{minted@draft}}%
1051
+ {\newcommand{\mint@iii}[1]{%
1052
+ \endgroup
1053
+ \begingroup
1054
+ \minted@defwhitespace@retok
1055
+ \everyeof{\noexpand}%
1056
+ \endlinechar-1\relax
1057
+ \let\do\@makeother\dospecials
1058
+ \catcode`\ =\active
1059
+ \catcode`\^^I=\active
1060
+ \xdef\minted@tmp{\scantokens{#1}}%
1061
+ \endgroup
1062
+ \let\FV@Line\minted@tmp
1063
+ \def\FV@SV@minted@tmp{%
1064
+ \FV@CodeLineNo=1\FV@StepLineNo
1065
+ \FV@Gobble
1066
+ \expandafter\FV@ProcessLine\expandafter{\FV@Line}}%
1067
+ \minted@langlinenoson
1068
+ \UseVerbatim{minted@tmp}%
1069
+ \minted@langlinenosoff
1070
+ \endgroup}}%
1071
+ {\newcommand{\mint@iii}[1]{%
1072
+ \endgroup
1073
+ \minted@writecmdcode{#1}%
1074
+ \minted@langlinenoson
1075
+ \minted@pygmentize{\minted@lang}%
1076
+ \minted@langlinenosoff
1077
+ \endgroup}}
1078
+ \ifthenelse{\boolean{minted@draft}}%
1079
+ {\newenvironment{minted}[2][]
1080
+ {\VerbatimEnvironment
1081
+ \minted@configlang{#2}%
1082
+ \setkeys{minted@opt@cmd}{#1}%
1083
+ \minted@fvset
1084
+ \minted@langlinenoson
1085
+ \begin{Verbatim}}%
1086
+ {\end{Verbatim}%
1087
+ \minted@langlinenosoff}}%
1088
+ {\newenvironment{minted}[2][]
1089
+ {\VerbatimEnvironment
1090
+ \let\FVB@VerbatimOut\minted@FVB@VerbatimOut
1091
+ \let\FVE@VerbatimOut\minted@FVE@VerbatimOut
1092
+ \minted@configlang{#2}%
1093
+ \setkeys{minted@opt@cmd}{#1}%
1094
+ \minted@fvset
1095
+ \begin{VerbatimOut}[codes={\catcode`\^^I=12},firstline,lastline]{\minted@jobname.pyg}}%
1096
+ {\end{VerbatimOut}%
1097
+ \minted@langlinenoson
1098
+ \minted@pygmentize{\minted@lang}%
1099
+ \minted@langlinenosoff}}
1100
+ \ifthenelse{\boolean{minted@draft}}%
1101
+ {\newcommand{\inputminted}[3][]{%
1102
+ \begingroup
1103
+ \minted@configlang{#2}%
1104
+ \setkeys{minted@opt@cmd}{#1}%
1105
+ \minted@fvset
1106
+ \VerbatimInput{#3}%
1107
+ \endgroup}}%
1108
+ {\newcommand{\inputminted}[3][]{%
1109
+ \begingroup
1110
+ \minted@configlang{#2}%
1111
+ \setkeys{minted@opt@cmd}{#1}%
1112
+ \minted@fvset
1113
+ \minted@pygmentize[#3]{#2}%
1114
+ \endgroup}}
1115
+ \newcommand{\newminted}[3][]{
1116
+ \ifthenelse{\equal{#1}{}}
1117
+ {\def\minted@envname{#2code}}
1118
+ {\def\minted@envname{#1}}
1119
+ \newenvironment{\minted@envname}
1120
+ {\VerbatimEnvironment
1121
+ \begin{minted}[#3]{#2}}
1122
+ {\end{minted}}
1123
+ \newenvironment{\minted@envname *}[1]
1124
+ {\VerbatimEnvironment\begin{minted}[#3,##1]{#2}}
1125
+ {\end{minted}}}
1126
+ \newcommand{\newmint}[3][]{
1127
+ \ifthenelse{\equal{#1}{}}
1128
+ {\def\minted@shortname{#2}}
1129
+ {\def\minted@shortname{#1}}
1130
+ \expandafter\newcommand\csname\minted@shortname\endcsname[2][]{
1131
+ \mint[#3,##1]{#2}##2}}
1132
+ \newcommand{\newmintedfile}[3][]{
1133
+ \ifthenelse{\equal{#1}{}}
1134
+ {\def\minted@shortname{#2file}}
1135
+ {\def\minted@shortname{#1}}
1136
+ \expandafter\newcommand\csname\minted@shortname\endcsname[2][]{
1137
+ \inputminted[#3,##1]{#2}{##2}}}
1138
+ \newcommand{\newmintinline}[3][]{%
1139
+ \ifthenelse{\equal{#1}{}}%
1140
+ {\def\minted@shortname{#2inline}}%
1141
+ {\def\minted@shortname{#1}}%
1142
+ \expandafter\newrobustcmd\csname\minted@shortname\endcsname{%
1143
+ \begingroup
1144
+ \let\do\@makeother\dospecials
1145
+ \catcode`\{=1
1146
+ \catcode`\}=2
1147
+ \@ifnextchar[{\endgroup\minted@inliner[#3][#2]}%
1148
+ {\endgroup\minted@inliner[#3][#2][]}}%
1149
+ \def\minted@inliner[##1][##2][##3]{\mintinline[##1,##3]{##2}}%
1150
+ }
1151
+ \ifthenelse{\boolean{minted@newfloat}}%
1152
+ {\@ifundefined{minted@float@within}%
1153
+ {\DeclareFloatingEnvironment[fileext=lol,placement=tbp]{listing}}%
1154
+ {\def\minted@tmp#1{%
1155
+ \DeclareFloatingEnvironment[fileext=lol,placement=tbp, within=#1]{listing}}%
1156
+ \expandafter\minted@tmp\expandafter{\minted@float@within}}}%
1157
+ {\@ifundefined{minted@float@within}%
1158
+ {\newfloat{listing}{tbp}{lol}}%
1159
+ {\newfloat{listing}{tbp}{lol}[\minted@float@within]}}
1160
+ \ifminted@newfloat\else
1161
+ \newcommand{\listingscaption}{Listing}
1162
+ \floatname{listing}{\listingscaption}
1163
+ \newcommand{\listoflistingscaption}{List of Listings}
1164
+ \providecommand{\listoflistings}{\listof{listing}{\listoflistingscaption}}
1165
+ \fi
1166
+ \AtEndOfPackage{%
1167
+ \ifthenelse{\boolean{minted@draft}}%
1168
+ {}%
1169
+ {%
1170
+ \ifthenelse{\boolean{minted@frozencache}}{}{%
1171
+ \ifnum\pdf@shellescape=1\relax\else
1172
+ \PackageError{minted}%
1173
+ {You must invoke LaTeX with the
1174
+ -shell-escape flag}%
1175
+ {Pass the -shell-escape flag to LaTeX. Refer to the minted.sty
1176
+ documentation for more information.}%
1177
+ \fi}%
1178
+ }%
1179
+ }
1180
+ \AtEndPreamble{%
1181
+ \ifthenelse{\boolean{minted@draft}}%
1182
+ {}%
1183
+ {%
1184
+ \ifthenelse{\boolean{minted@frozencache}}{}{%
1185
+ \TestAppExists{\MintedPygmentize}%
1186
+ \ifAppExists\else
1187
+ \PackageError{minted}%
1188
+ {You must have `pygmentize' installed
1189
+ to use this package}%
1190
+ {Refer to the installation instructions in the minted
1191
+ documentation for more information.}%
1192
+ \fi}%
1193
+ }%
1194
+ }
1195
+ \AfterEndDocument{%
1196
+ \ifthenelse{\boolean{minted@draft}}%
1197
+ {}%
1198
+ {\ifthenelse{\boolean{minted@frozencache}}%
1199
+ {}
1200
+ {\ifx\XeTeXinterchartoks\minted@undefined
1201
+ \else
1202
+ \DeleteFile[\minted@outputdir]{\minted@jobname.mintedcmd}%
1203
+ \DeleteFile[\minted@outputdir]{\minted@jobname.mintedmd5}%
1204
+ \fi
1205
+ \DeleteFile[\minted@outputdir]{\minted@jobname.pyg}%
1206
+ \DeleteFile[\minted@outputdir]{\minted@jobname.out.pyg}%
1207
+ }%
1208
+ }%
1209
+ }
1210
+ \endinput
1211
+ %%
1212
+ %% End of file `minted.sty'.
references/2021.naacl.nguyen/source/naacl2021.bbl ADDED
@@ -0,0 +1,180 @@
+ \begin{thebibliography}{29}
+ \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
+
+ \bibitem[{Bang and Sornlertlamvanich(2018)}]{BANG20182016IIP0038}
+ Tran~Sy Bang and Virach Sornlertlamvanich. 2018.
+ \newblock Sentiment classification for hotel booking review based on sentence
+ dependency structure and sub-opinion analysis.
+ \newblock \emph{IEICE Transactions on Information and Systems},
+ E101.D(4):909--916.
+
+ \bibitem[{Chu and Liu(1965)}]{chuliu}
+ Yoeng-Jin Chu and Tseng-Hong Liu. 1965.
+ \newblock {On the Shortest Arborescence of a Directed Graph}.
+ \newblock \emph{Science Sinica}, 14:1396--1400.
+
+ \bibitem[{Devlin et~al.(2019)Devlin, Chang, Lee, and
+ Toutanova}]{devlin-etal-2019-bert}
+ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.
+ \newblock {BERT}: Pre-training of deep bidirectional transformers for language
+ understanding.
+ \newblock In \emph{Proceedings of NAACL}, pages 4171--4186.
+
+ \bibitem[{Dozat and Manning(2017)}]{DozatM17}
+ Timothy Dozat and Christopher~D. Manning. 2017.
+ \newblock {Deep Biaffine Attention for Neural Dependency Parsing}.
+ \newblock In \emph{Proceedings of ICLR}.
+
+ \bibitem[{Edmonds(1967)}]{Edmonds}
+ Jack Edmonds. 1967.
+ \newblock {Optimum Branchings}.
+ \newblock \emph{Journal of Research of the National Bureau of Standards},
+ 71:233--240.
+
+ \bibitem[{Hashimoto et~al.(2017)Hashimoto, Xiong, Tsuruoka, and
+ Socher}]{hashimoto-etal-2017-joint}
+ Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. 2017.
+ \newblock {A Joint Many-Task Model: Growing a Neural Network for Multiple {NLP}
+ Tasks}.
+ \newblock In \emph{Proceedings of EMNLP}, pages 1923--1933.
+
+ \bibitem[{Kondratyuk and Straka(2019)}]{kondratyuk-straka-2019-75}
+ Dan Kondratyuk and Milan Straka. 2019.
+ \newblock {75 Languages, 1 Model: Parsing {U}niversal {D}ependencies
+ Universally}.
+ \newblock In \emph{Proceedings of EMNLP-IJCNLP}, pages 2779--2795.
+
+ \bibitem[{Lafferty et~al.(2001)Lafferty, McCallum, and Pereira}]{Lafferty:2001}
+ John~D. Lafferty, Andrew McCallum, and Fernando C.~N. Pereira. 2001.
+ \newblock {Conditional Random Fields: Probabilistic Models for Segmenting and
+ Labeling Sequence Data}.
+ \newblock In \emph{Proceedings of ICML}, pages 282--289.
+
+ \bibitem[{Le-Hong and Bui(2018)}]{3184558.3191535}
+ Phuong Le-Hong and Duc-Thien Bui. 2018.
+ \newblock {A Factoid Question Answering System for Vietnamese}.
+ \newblock In \emph{Companion Proceedings of the The Web Conference 2018}, page
+ 1049–1055.
+
+ \bibitem[{Li et~al.(2018)Li, He, Zhang, and Zhao}]{li-etal-2018-joint-learning}
+ Zuchao Li, Shexia He, Zhuosheng Zhang, and Hai Zhao. 2018.
+ \newblock {Joint Learning of {POS} and Dependencies for Multilingual
+ {U}niversal {D}ependency Parsing}.
+ \newblock In \emph{Proceedings of the {C}o{NLL} 2018 Shared Task}, pages
+ 65--73.
+
+ \bibitem[{Loshchilov and Hutter(2019)}]{loshchilov2018decoupled}
+ Ilya Loshchilov and Frank Hutter. 2019.
+ \newblock {Decoupled Weight Decay Regularization}.
+ \newblock In \emph{Proceedings of ICLR}.
+
+ \bibitem[{Nguyen et~al.(2020)Nguyen, Dao, and Nguyen}]{vitext2sql}
+ Anh~Tuan Nguyen, Mai~Hoang Dao, and Dat~Quoc Nguyen. 2020.
+ \newblock {A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese}.
+ \newblock In \emph{Findings of EMNLP 2020}, pages 4079--4085.
+
+ \bibitem[{Nguyen(2019)}]{NguyenALTA2019}
+ Dat~Quoc Nguyen. 2019.
+ \newblock {A neural joint model for Vietnamese word segmentation, POS tagging
+ and dependency parsing}.
+ \newblock In \emph{Proceedings of ALTA}, pages 28--34.
+
+ \bibitem[{Nguyen and Nguyen(2020)}]{phobert}
+ Dat~Quoc Nguyen and Anh~Tuan Nguyen. 2020.
+ \newblock {PhoBERT: Pre-trained language models for Vietnamese}.
+ \newblock In \emph{Findings of EMNLP 2020}, pages 1037--1042.
+
+ \bibitem[{Nguyen et~al.(2017)Nguyen, Nguyen, and Pham}]{NguyenNP_SWJ}
+ Dat~Quoc Nguyen, Dai~Quoc Nguyen, and Son~Bao Pham. 2017.
+ \newblock {Ripple Down Rules for Question Answering}.
+ \newblock \emph{Semantic Web}, 8(4):511--532.
+
+ \bibitem[{Nguyen et~al.(2014)Nguyen, Nguyen, Pham, Nguyen, and
+ Nguyen}]{Nguyen2014NLDB}
+ Dat~Quoc Nguyen, Dai~Quoc Nguyen, Son~Bao Pham, Phuong-Thai Nguyen, and Minh~Le
+ Nguyen. 2014.
+ \newblock {From Treebank Conversion to Automatic Dependency Parsing for
+ Vietnamese}.
+ \newblock In \emph{{Proceedings of NLDB}}, pages 196--207.
+
+ \bibitem[{Nguyen and Verspoor(2018)}]{nguyen-verspoor-2018-improved}
+ Dat~Quoc Nguyen and Karin Verspoor. 2018.
+ \newblock An improved neural network model for joint {POS} tagging and
+ dependency parsing.
+ \newblock In \emph{Proceedings of the {C}o{NLL} 2018 Shared Task}, pages
+ 81--91.
+
+ \bibitem[{Nguyen et~al.(2019)Nguyen, Ngo, Vu, Tran, and Nguyen}]{JCC13161}
+ Huyen Nguyen, Quyen Ngo, Luong Vu, Vu~Tran, and Hien Nguyen. 2019.
+ \newblock {VLSP Shared Task: Named Entity Recognition}.
+ \newblock \emph{Journal of Computer Science and Cybernetics}, 34(4):283--294.
+
+ \bibitem[{Nguyen et~al.(2009)Nguyen, Vu, Nguyen, Nguyen, and
+ Le}]{nguyen-etal-2009-building}
+ Phuong-Thai Nguyen, Xuan-Luong Vu, Thi-Minh-Huyen Nguyen, Van-Hiep Nguyen, and
+ Hong-Phuong Le. 2009.
+ \newblock {Building a Large Syntactically-Annotated Corpus of {V}ietnamese}.
+ \newblock In \emph{Proceedings of {LAW}}, pages 182--185.
+
+ \bibitem[{Paszke et~al.(2019)Paszke, Gross et~al.}]{NEURIPS2019_9015}
+ Adam Paszke, Sam Gross, et~al. 2019.
+ \newblock {PyTorch: An Imperative Style, High-Performance Deep Learning
+ Library}.
+ \newblock In \emph{Proceedings of NeurIPS 2019}, pages 8024--8035.
+
+ \bibitem[{Qi et~al.(2018)Qi, Dozat, Zhang, and
+ Manning}]{qi-etal-2018-universal}
+ Peng Qi, Timothy Dozat, Yuhao Zhang, and Christopher~D. Manning. 2018.
+ \newblock {U}niversal {D}ependency parsing from scratch.
+ \newblock In \emph{Proceedings of the {C}o{NLL} 2018 Shared Task}, pages
+ 160--170.
+
+ \bibitem[{Qi et~al.(2020)Qi, Zhang, Zhang, Bolton, and
+ Manning}]{qi-etal-2020-stanza}
+ Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher~D. Manning.
+ 2020.
+ \newblock {S}tanza: A python natural language processing toolkit for many human
+ languages.
+ \newblock In \emph{Proceedings of ACL: System Demonstrations}, pages 101--108.
+
+ \bibitem[{Ruder(2019)}]{Ruder2019Neural}
+ Sebastian Ruder. 2019.
+ \newblock \emph{Neural Transfer Learning for Natural Language Processing}.
+ \newblock Ph.D. thesis, National University of Ireland, Galway.
+
+ \bibitem[{Sennrich et~al.(2016)Sennrich, Haddow, and
+ Birch}]{sennrich-etal-2016-neural}
+ Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016.
+ \newblock {Neural Machine Translation of Rare Words with Subword Units}.
+ \newblock In \emph{Proceedings of ACL}, pages 1715--1725.
+
+ \bibitem[{To and Do(2020)}]{9287471}
+ Huong~Duong To and Phuc Do. 2020.
+ \newblock Extracting triples from vietnamese text to create knowledge graph.
+ \newblock In \emph{Proceedings of KSE}, pages 219--223.
+
+ \bibitem[{Tran et~al.(2016)Tran, Vu, Pham, Nguyen, and Nguyen}]{7800281}
+ Viet~Hong Tran, Huyen~Thuong Vu, Thu~Hoai Pham, Vinh~Van Nguyen, and Minh~Le
+ Nguyen. 2016.
+ \newblock {A reordering model for Vietnamese-English statistical machine
+ translation using dependency information}.
+ \newblock In \emph{Proceedings of RIVF}, pages 125--130.
+
+ \bibitem[{Truong et~al.(2017)Truong, Vo, and Nguyen}]{3155133.3155171}
+ Diem Truong, Duc-Thuan Vo, and Uyen~Trang Nguyen. 2017.
+ \newblock Vietnamese open information extraction.
+ \newblock In \emph{Proceedings of SoICT}, page 135–142.
+
+ \bibitem[{Vu et~al.(2018)Vu, Nguyen, Nguyen, Dras, and
+ Johnson}]{vu-etal-2018-vncorenlp}
+ Thanh Vu, Dat~Quoc Nguyen, Dai~Quoc Nguyen, Mark Dras, and Mark Johnson. 2018.
+ \newblock {VnCoreNLP: A Vietnamese Natural Language Processing Toolkit}.
+ \newblock In \emph{Proceedings of NAACL: Demonstrations}, pages 56--60.
+
+ \bibitem[{Wolf et~al.(2020)Wolf, Debut et~al.}]{wolf-etal-2020-transformers}
+ Thomas Wolf, Lysandre Debut, et~al. 2020.
+ \newblock {Transformers: State-of-the-Art Natural Language Processing}.
+ \newblock In \emph{Proceedings of EMNLP 2020: System Demonstrations}, pages
+ 38--45.
+
+ \end{thebibliography}
references/2021.naacl.nguyen/source/naacl2021.sty ADDED
@@ -0,0 +1,310 @@
+ % This is the LaTex style file for *ACL.
+ % The official sources can be found at
+ %
+ % https://github.com/acl-org/ACLPUB/
+ %
+ % This package is activated by adding
+ %
+ % \usepackage{acl}
+ %
+ % to your LaTeX file. When submitting your paper for review, add the "review" option:
+ %
+ % \usepackage[review]{acl}
+
+ \newif\ifacl@finalcopy
+ \DeclareOption{final}{\acl@finalcopytrue}
+ \DeclareOption{review}{\acl@finalcopyfalse}
+ \ExecuteOptions{final} % final copy is the default
+
+ % include hyperref, unless user specifies nohyperref option like this:
+ % \usepackage[nohyperref]{acl}
+ \newif\ifacl@hyperref
+ \DeclareOption{hyperref}{\acl@hyperreftrue}
+ \DeclareOption{nohyperref}{\acl@hyperreffalse}
+ \ExecuteOptions{hyperref} % default is to use hyperref
+ \ProcessOptions\relax
+
+ \typeout{Conference Style for ACL 2020}
+
+ \usepackage{xcolor}
+
+ \ifacl@hyperref
+ \PassOptionsToPackage{breaklinks}{hyperref}
+ \RequirePackage{hyperref}
+ % make links dark blue
+ \definecolor{darkblue}{rgb}{0, 0, 0.5}
+ \hypersetup{colorlinks=true, citecolor=darkblue, linkcolor=darkblue, urlcolor=darkblue}
+ \else
+ % This definition is used if the hyperref package is not loaded.
+ % It provides a backup, no-op definiton of \href.
+ % This is necessary because \href command is used in the acl_natbib.bst file.
+ \def\href#1#2{{#2}}
+ \usepackage{url}
+ \fi
+
+ \ifacl@finalcopy
+ % Hack to ignore these commands, which review mode puts into the .aux file.
+ \newcommand{\@LN@col}[1]{}
+ \newcommand{\@LN}[2]{}
+ \else
+ % Add draft line numbering via the lineno package
+ % https://texblog.org/2012/02/08/adding-line-numbers-to-documents/
+ \usepackage[switch,mathlines]{lineno}
+
+ % Line numbers in gray Helvetica 8pt
+ \font\aclhv = phvb at 8pt
+ \renewcommand\linenumberfont{\aclhv\color{lightgray}}
+
+ % Zero-fill line numbers
+ % NUMBER with left flushed zeros \fillzeros[<WIDTH>]<NUMBER>
+ \newcount\cv@tmpc@ \newcount\cv@tmpc
+ \def\fillzeros[#1]#2{\cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi
+ \cv@tmpc=1 %
+ \loop\ifnum\cv@tmpc@<10 \else \divide\cv@tmpc@ by 10 \advance\cv@tmpc by 1 \fi
+ \ifnum\cv@tmpc@=10\relax\cv@tmpc@=11\relax\fi \ifnum\cv@tmpc@>10 \repeat
+ \ifnum#2<0\advance\cv@tmpc1\relax-\fi
+ \loop\ifnum\cv@tmpc<#1\relax0\advance\cv@tmpc1\relax\fi \ifnum\cv@tmpc<#1 \repeat
+ \cv@tmpc@=#2\relax\ifnum\cv@tmpc@<0\cv@tmpc@=-\cv@tmpc@\fi \relax\the\cv@tmpc@}%
+ \renewcommand\thelinenumber{\fillzeros[3]{\arabic{linenumber}}}
+ \linenumbers
+
+ \setlength{\linenumbersep}{1.6cm}
+
+ % Bug: An equation with $$ ... $$ isn't numbered, nor is the previous line.
+
+ % Patch amsmath commands so that the previous line and the equation itself
+ % are numbered. Bug: multline has an extra line number.
+ % https://tex.stackexchange.com/questions/461186/how-to-use-lineno-with-amsmath-align
+ \usepackage{etoolbox} %% <- for \pretocmd, \apptocmd and \patchcmd
+
+ \newcommand*\linenomathpatch[1]{%
+ \expandafter\pretocmd\csname #1\endcsname {\linenomath}{}{}%
+ \expandafter\pretocmd\csname #1*\endcsname {\linenomath}{}{}%
+ \expandafter\apptocmd\csname end#1\endcsname {\endlinenomath}{}{}%
+ \expandafter\apptocmd\csname end#1*\endcsname {\endlinenomath}{}{}%
+ }
+ \newcommand*\linenomathpatchAMS[1]{%
+ \expandafter\pretocmd\csname #1\endcsname {\linenomathAMS}{}{}%
+ \expandafter\pretocmd\csname #1*\endcsname {\linenomathAMS}{}{}%
+ \expandafter\apptocmd\csname end#1\endcsname {\endlinenomath}{}{}%
+ \expandafter\apptocmd\csname end#1*\endcsname {\endlinenomath}{}{}%
+ }
+
+ %% Definition of \linenomathAMS depends on whether the mathlines option is provided
+ \expandafter\ifx\linenomath\linenomathWithnumbers
+ \let\linenomathAMS\linenomathWithnumbers
+ %% The following line gets rid of an extra line numbers at the bottom:
+ \patchcmd\linenomathAMS{\advance\postdisplaypenalty\linenopenalty}{}{}{}
+ \else
+ \let\linenomathAMS\linenomathNonumbers
+ \fi
+
+ \AtBeginDocument{%
+ \linenomathpatch{equation}%
+ \linenomathpatchAMS{gather}%
+ \linenomathpatchAMS{multline}%
+ \linenomathpatchAMS{align}%
+ \linenomathpatchAMS{alignat}%
+ \linenomathpatchAMS{flalign}%
+ }
+ \fi
+
+ \iffalse
+ \PassOptionsToPackage{
+ a4paper,
+ top=2.21573cm,left=2.54cm,
+ textheight=24.7cm,textwidth=16.0cm,
+ headheight=0.17573cm,headsep=0cm
+ }{geometry}
+ \fi
+ \PassOptionsToPackage{a4paper,margin=2.5cm}{geometry}
+ \RequirePackage{geometry}
+
+ \setlength\columnsep{0.6cm}
+ \newlength\titlebox
+ \setlength\titlebox{5cm}
+
+ \flushbottom \twocolumn \sloppy
+
+ % We're never going to need a table of contents, so just flush it to
+ % save space --- suggested by drstrip@sandia-2
+ \def\addcontentsline#1#2#3{}
+
+ \ifacl@finalcopy
+ \thispagestyle{empty}
+ \pagestyle{empty}
+ \else
+ \pagenumbering{arabic}
+ \fi
+
+ %% Title and Authors %%
141
+
142
+ \newcommand{\Thanks}[1]{\thanks{\ #1}}
143
+
144
+ \newcommand\outauthor{
145
+ \begin{tabular}[t]{c}
146
+ \ifacl@finalcopy
147
+ \bf\@author
148
+ \else
149
+ % Avoiding common accidental de-anonymization issue. --MM
150
+ \bf Anonymous NAACL-HLT 2021 submission
151
+ \fi
152
+ \end{tabular}}
153
+
154
+ % Mostly taken from deproc.
155
+ \def\maketitle{\par
156
+ \begingroup
157
+ \def\thefootnote{\fnsymbol{footnote}}
158
+ \def\@makefnmark{\hbox to 0pt{$^{\@thefnmark}$\hss}}
159
+ \twocolumn[\@maketitle] \@thanks
160
+ \endgroup
161
+ \setcounter{footnote}{0}
162
+ \let\maketitle\relax \let\@maketitle\relax
163
+ \gdef\@thanks{}\gdef\@author{}\gdef\@title{}\let\thanks\relax}
164
+ \def\@maketitle{\vbox to \titlebox{\hsize\textwidth
165
+ \linewidth\hsize \vskip 0.125in minus 0.125in \centering
166
+ {\Large\bf \@title \par} \vskip 0.2in plus 1fil minus 0.1in
167
+ {\def\and{\unskip\enspace{\rm and}\enspace}%
168
+ \def\And{\end{tabular}\hss \egroup \hskip 1in plus 2fil
169
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}%
170
+ \def\AND{\end{tabular}\hss\egroup \hfil\hfil\egroup
171
+ \vskip 0.25in plus 1fil minus 0.125in
172
+ \hbox to \linewidth\bgroup\large \hfil\hfil
173
+ \hbox to 0pt\bgroup\hss \begin{tabular}[t]{c}\bf}
174
+ \hbox to \linewidth\bgroup\large \hfil\hfil
175
+ \hbox to 0pt\bgroup\hss
176
+ \outauthor
177
+ \hss\egroup
178
+ \hfil\hfil\egroup}
179
+ \vskip 0.3in plus 2fil minus 0.1in
180
+ }}
181
+
182
+ % margins and font size for abstract
183
+ \renewenvironment{abstract}%
184
+ {\centerline{\large\bf Abstract}%
185
+ \begin{list}{}%
186
+ {\setlength{\rightmargin}{0.6cm}%
187
+ \setlength{\leftmargin}{0.6cm}}%
188
+ \item[]\ignorespaces%
189
+ \@setsize\normalsize{12pt}\xpt\@xpt
190
+ }%
191
+ {\unskip\end{list}}
192
+
193
+ %\renewenvironment{abstract}{\centerline{\large\bf
194
+ % Abstract}\vspace{0.5ex}\begin{quote}}{\par\end{quote}\vskip 1ex}
195
+
196
+ % Resizing figure and table captions - SL
197
+ % Support for interacting with the caption, subfigure, and subcaption packages - SL
198
+ \RequirePackage{caption}
199
+ \DeclareCaptionFont{10pt}{\fontsize{10pt}{12pt}\selectfont}
200
+ \captionsetup{font=10pt}
201
+
202
+ \RequirePackage{natbib}
203
+ % for citation commands in the .tex, authors can use:
204
+ % \citep, \citet, and \citeyearpar for compatibility with natbib, or
205
+ % \cite, \newcite, and \shortcite for compatibility with older ACL .sty files
206
+ \renewcommand\cite{\citep} % to get "(Author Year)" with natbib
207
+ \newcommand\shortcite{\citeyearpar}% to get "(Year)" with natbib
208
+ \newcommand\newcite{\citet} % to get "Author (Year)" with natbib
209
+
210
+ % Bibliography
211
+
212
+ % Don't put a label in the bibliography at all. Just use the unlabeled format
213
+ % instead.
214
+ \def\thebibliography#1{\vskip\parskip%
215
+ \vskip\baselineskip%
216
+ \def\baselinestretch{1}%
217
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
218
+ \vskip-\parskip%
219
+ \vskip-\baselineskip%
220
+ \section*{References\@mkboth
221
+ {References}{References}}\list
222
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
223
+ \setlength{\itemindent}{-\parindent}}
224
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
225
+ \sloppy\clubpenalty4000\widowpenalty4000
226
+ \sfcode`\.=1000\relax}
227
+ \let\endthebibliography=\endlist
228
+
229
+
230
+ % Allow for a bibliography of sources of attested examples
231
+ \def\thesourcebibliography#1{\vskip\parskip%
232
+ \vskip\baselineskip%
233
+ \def\baselinestretch{1}%
234
+ \ifx\@currsize\normalsize\@normalsize\else\@currsize\fi%
235
+ \vskip-\parskip%
236
+ \vskip-\baselineskip%
237
+ \section*{Sources of Attested Examples\@mkboth
238
+ {Sources of Attested Examples}{Sources of Attested Examples}}\list
239
+ {}{\setlength{\labelwidth}{0pt}\setlength{\leftmargin}{\parindent}
240
+ \setlength{\itemindent}{-\parindent}}
241
+ \def\newblock{\hskip .11em plus .33em minus -.07em}
242
+ \sloppy\clubpenalty4000\widowpenalty4000
243
+ \sfcode`\.=1000\relax}
244
+ \let\endthesourcebibliography=\endlist
245
+
246
+ % sections with less space
247
+ \def\section{\@startsection {section}{1}{\z@}{-2.0ex plus
248
+ -0.5ex minus -.2ex}{1.5ex plus 0.3ex minus .2ex}{\large\bf\raggedright}}
249
+ \def\subsection{\@startsection{subsection}{2}{\z@}{-1.8ex plus
250
+ -0.5ex minus -.2ex}{0.8ex plus .2ex}{\normalsize\bf\raggedright}}
251
+ %% changed by KO to - values to get the initial parindent right
252
+ \def\subsubsection{\@startsection{subsubsection}{3}{\z@}{-1.5ex plus
253
+ -0.5ex minus -.2ex}{0.5ex plus .2ex}{\normalsize\bf\raggedright}}
254
+ \def\paragraph{\@startsection{paragraph}{4}{\z@}{1.5ex plus
255
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
256
+ \def\subparagraph{\@startsection{subparagraph}{5}{\parindent}{1.5ex plus
257
+ 0.5ex minus .2ex}{-1em}{\normalsize\bf}}
258
+
259
+ % Footnotes
260
+ \footnotesep 6.65pt %
261
+ \skip\footins 9pt plus 4pt minus 2pt
262
+ \def\footnoterule{\kern-3pt \hrule width 5pc \kern 2.6pt }
263
+ \setcounter{footnote}{0}
264
+
265
+ % Lists and paragraphs
266
+ \parindent 1em
267
+ \topsep 4pt plus 1pt minus 2pt
268
+ \partopsep 1pt plus 0.5pt minus 0.5pt
269
+ \itemsep 2pt plus 1pt minus 0.5pt
270
+ \parsep 2pt plus 1pt minus 0.5pt
271
+
272
+ \leftmargin 2em \leftmargini\leftmargin \leftmarginii 2em
273
+ \leftmarginiii 1.5em \leftmarginiv 1.0em \leftmarginv .5em \leftmarginvi .5em
274
+ \labelwidth\leftmargini\advance\labelwidth-\labelsep \labelsep 5pt
275
+
276
+ \def\@listi{\leftmargin\leftmargini}
277
+ \def\@listii{\leftmargin\leftmarginii
278
+ \labelwidth\leftmarginii\advance\labelwidth-\labelsep
279
+ \topsep 2pt plus 1pt minus 0.5pt
280
+ \parsep 1pt plus 0.5pt minus 0.5pt
281
+ \itemsep \parsep}
282
+ \def\@listiii{\leftmargin\leftmarginiii
283
+ \labelwidth\leftmarginiii\advance\labelwidth-\labelsep
284
+ \topsep 1pt plus 0.5pt minus 0.5pt
285
+ \parsep \z@ \partopsep 0.5pt plus 0pt minus 0.5pt
286
+ \itemsep \topsep}
287
+ \def\@listiv{\leftmargin\leftmarginiv
288
+ \labelwidth\leftmarginiv\advance\labelwidth-\labelsep}
289
+ \def\@listv{\leftmargin\leftmarginv
290
+ \labelwidth\leftmarginv\advance\labelwidth-\labelsep}
291
+ \def\@listvi{\leftmargin\leftmarginvi
292
+ \labelwidth\leftmarginvi\advance\labelwidth-\labelsep}
293
+
294
+ \abovedisplayskip 7pt plus2pt minus5pt%
295
+ \belowdisplayskip \abovedisplayskip
296
+ \abovedisplayshortskip 0pt plus3pt%
297
+ \belowdisplayshortskip 4pt plus3pt minus3pt%
298
+
299
+ % Less leading in most fonts (due to the narrow columns)
300
+ % The choices were between 1-pt and 1.5-pt leading
301
+ \def\@normalsize{\@setsize\normalsize{11pt}\xpt\@xpt}
302
+ \def\small{\@setsize\small{10pt}\ixpt\@ixpt}
303
+ \def\footnotesize{\@setsize\footnotesize{10pt}\ixpt\@ixpt}
304
+ \def\scriptsize{\@setsize\scriptsize{8pt}\viipt\@viipt}
305
+ \def\tiny{\@setsize\tiny{7pt}\vipt\@vipt}
306
+ \def\large{\@setsize\large{14pt}\xiipt\@xiipt}
307
+ \def\Large{\@setsize\Large{16pt}\xivpt\@xivpt}
308
+ \def\LARGE{\@setsize\LARGE{20pt}\xviipt\@xviipt}
309
+ \def\huge{\@setsize\huge{23pt}\xxpt\@xxpt}
310
+ \def\Huge{\@setsize\Huge{28pt}\xxvpt\@xxvpt}
references/2021.naacl.nguyen/source/naacl2021.tex ADDED
@@ -0,0 +1,641 @@
1
+ % This must be in the first 5 lines to tell arXiv to use pdfLaTeX, which is strongly recommended.
2
+ \pdfoutput=1
3
+ % In particular, the hyperref package requires pdfLaTeX in order to break URLs across lines.
4
+
5
+ \documentclass[11pt]{article}
6
+
7
+ % Remove the "review" option to generate the final version.
8
+ \usepackage{naacl2021}
9
+
10
+ % Standard package includes
11
+ \usepackage{times}
12
+ \usepackage{latexsym}
13
+
14
+ %\renewcommand{\UrlFont}{\ttfamily\small}
15
+
16
+
17
+ %\renewcommand{\UrlFont}{\ttfamily\small}
18
+
19
+ \usepackage{amsmath}
20
+ \usepackage{url}
21
+ \usepackage{amssymb}
22
+ \usepackage{amsfonts}
23
+ \usepackage{graphicx}
24
+ \usepackage{tabularx}
25
+ \usepackage{multirow}
26
+ \usepackage{arydshln}
27
+ \usepackage{mathtools,nccmath}
28
+ \usepackage{listings}
29
+
30
+ \usepackage[T5]{fontenc}
31
+ %\usepackage[utf8]{vietnam}
32
+ \usepackage{enumitem}
33
+ %\usepackage{ulem}
34
+ \usepackage{todonotes}
35
+ % \usepackage[usenames,dvipsnames]{color}
36
+ \usepackage{cancel}
37
+ \usepackage[draft]{minted}
38
+
39
+ % This is not strictly necessary, and may be commented out,
40
+ % but it will improve the layout of the manuscript,
41
+ % and will typically save some space.
42
+ \usepackage{microtype}
43
+
44
+
45
+
46
+ \makeatletter
47
+ \def\PYGdefault@reset{\let\PYGdefault@it=\relax \let\PYGdefault@bf=\relax%
48
+ \let\PYGdefault@ul=\relax \let\PYGdefault@tc=\relax%
49
+ \let\PYGdefault@bc=\relax \let\PYGdefault@ff=\relax}
50
+ \def\PYGdefault@tok#1{\csname PYGdefault@tok@#1\endcsname}
51
+ \def\PYGdefault@toks#1+{\ifx\relax#1\empty\else%
52
+ \PYGdefault@tok{#1}\expandafter\PYGdefault@toks\fi}
53
+ \def\PYGdefault@do#1{\PYGdefault@bc{\PYGdefault@tc{\PYGdefault@ul{%
54
+ \PYGdefault@it{\PYGdefault@bf{\PYGdefault@ff{#1}}}}}}}
55
+ \def\PYGdefault#1#2{\PYGdefault@reset\PYGdefault@toks#1+\relax+\PYGdefault@do{#2}}
56
+
57
+ \expandafter\def\csname PYGdefault@tok@w\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}}
58
+ \expandafter\def\csname PYGdefault@tok@c\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
59
+ \expandafter\def\csname PYGdefault@tok@cp\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.74,0.48,0.00}{##1}}}
60
+ \expandafter\def\csname PYGdefault@tok@k\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
61
+ \expandafter\def\csname PYGdefault@tok@kp\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
62
+ \expandafter\def\csname PYGdefault@tok@kt\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.69,0.00,0.25}{##1}}}
63
+ \expandafter\def\csname PYGdefault@tok@o\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
64
+ \expandafter\def\csname PYGdefault@tok@ow\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
65
+ \expandafter\def\csname PYGdefault@tok@nb\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
66
+ \expandafter\def\csname PYGdefault@tok@nf\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
67
+ \expandafter\def\csname PYGdefault@tok@nc\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
68
+ \expandafter\def\csname PYGdefault@tok@nn\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
69
+ \expandafter\def\csname PYGdefault@tok@ne\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.82,0.25,0.23}{##1}}}
70
+ \expandafter\def\csname PYGdefault@tok@nv\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
71
+ \expandafter\def\csname PYGdefault@tok@no\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}}
72
+ \expandafter\def\csname PYGdefault@tok@nl\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.63,0.63,0.00}{##1}}}
73
+ \expandafter\def\csname PYGdefault@tok@ni\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.60,0.60,0.60}{##1}}}
74
+ \expandafter\def\csname PYGdefault@tok@na\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.49,0.56,0.16}{##1}}}
75
+ \expandafter\def\csname PYGdefault@tok@nt\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
76
+ \expandafter\def\csname PYGdefault@tok@nd\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
77
+ \expandafter\def\csname PYGdefault@tok@s\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
78
+ \expandafter\def\csname PYGdefault@tok@sd\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
79
+ \expandafter\def\csname PYGdefault@tok@si\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
80
+ \expandafter\def\csname PYGdefault@tok@se\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.40,0.13}{##1}}}
81
+ \expandafter\def\csname PYGdefault@tok@sr\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
82
+ \expandafter\def\csname PYGdefault@tok@ss\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
83
+ \expandafter\def\csname PYGdefault@tok@sx\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
84
+ \expandafter\def\csname PYGdefault@tok@m\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
85
+ \expandafter\def\csname PYGdefault@tok@gh\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
86
+ \expandafter\def\csname PYGdefault@tok@gu\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}}
87
+ \expandafter\def\csname PYGdefault@tok@gd\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}}
88
+ \expandafter\def\csname PYGdefault@tok@gi\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}}
89
+ \expandafter\def\csname PYGdefault@tok@gr\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}}
90
+ \expandafter\def\csname PYGdefault@tok@ge\endcsname{\let\PYGdefault@it=\textit}
91
+ \expandafter\def\csname PYGdefault@tok@gs\endcsname{\let\PYGdefault@bf=\textbf}
92
+ \expandafter\def\csname PYGdefault@tok@gp\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
93
+ \expandafter\def\csname PYGdefault@tok@go\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}}
94
+ \expandafter\def\csname PYGdefault@tok@gt\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}}
95
+ \expandafter\def\csname PYGdefault@tok@err\endcsname{\def\PYGdefault@bc##1{\setlength{\fboxsep}{0pt}\fcolorbox[rgb]{1.00,0.00,0.00}{1,1,1}{\strut ##1}}}
96
+ \expandafter\def\csname PYGdefault@tok@kc\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
97
+ \expandafter\def\csname PYGdefault@tok@kd\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
98
+ \expandafter\def\csname PYGdefault@tok@kn\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
99
+ \expandafter\def\csname PYGdefault@tok@kr\endcsname{\let\PYGdefault@bf=\textbf\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
100
+ \expandafter\def\csname PYGdefault@tok@bp\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
101
+ \expandafter\def\csname PYGdefault@tok@fm\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
102
+ \expandafter\def\csname PYGdefault@tok@vc\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
103
+ \expandafter\def\csname PYGdefault@tok@vg\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
104
+ \expandafter\def\csname PYGdefault@tok@vi\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
105
+ \expandafter\def\csname PYGdefault@tok@vm\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
106
+ \expandafter\def\csname PYGdefault@tok@sa\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
107
+ \expandafter\def\csname PYGdefault@tok@sb\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
108
+ \expandafter\def\csname PYGdefault@tok@sc\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
109
+ \expandafter\def\csname PYGdefault@tok@dl\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
110
+ \expandafter\def\csname PYGdefault@tok@s2\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
111
+ \expandafter\def\csname PYGdefault@tok@sh\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
112
+ \expandafter\def\csname PYGdefault@tok@s1\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
113
+ \expandafter\def\csname PYGdefault@tok@mb\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
114
+ \expandafter\def\csname PYGdefault@tok@mf\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
115
+ \expandafter\def\csname PYGdefault@tok@mh\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
116
+ \expandafter\def\csname PYGdefault@tok@mi\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
117
+ \expandafter\def\csname PYGdefault@tok@il\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
118
+ \expandafter\def\csname PYGdefault@tok@mo\endcsname{\def\PYGdefault@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
119
+ \expandafter\def\csname PYGdefault@tok@ch\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
120
+ \expandafter\def\csname PYGdefault@tok@cm\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
121
+ \expandafter\def\csname PYGdefault@tok@cpf\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
122
+ \expandafter\def\csname PYGdefault@tok@c1\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
123
+ \expandafter\def\csname PYGdefault@tok@cs\endcsname{\let\PYGdefault@it=\textit\def\PYGdefault@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
124
+
125
+ \def\PYGdefaultZbs{\char`\\}
126
+ \def\PYGdefaultZus{\char`\_}
127
+ \def\PYGdefaultZob{\char`\{}
128
+ \def\PYGdefaultZcb{\char`\}}
129
+ \def\PYGdefaultZca{\char`\^}
130
+ \def\PYGdefaultZam{\char`\&}
131
+ \def\PYGdefaultZlt{\char`\<}
132
+ \def\PYGdefaultZgt{\char`\>}
133
+ \def\PYGdefaultZsh{\char`\#}
134
+ \def\PYGdefaultZpc{\char`\%}
135
+ \def\PYGdefaultZdl{\char`\$}
136
+ \def\PYGdefaultZhy{\char`\-}
137
+ \def\PYGdefaultZsq{\char`\'}
138
+ \def\PYGdefaultZdq{\char`\"}
139
+ \def\PYGdefaultZti{\char`\~}
140
+ % for compatibility with earlier versions
141
+ \def\PYGdefaultZat{@}
142
+ \def\PYGdefaultZlb{[}
143
+ \def\PYGdefaultZrb{]}
144
+ \makeatother
145
+
146
+
147
+
148
+ \makeatletter
149
+ \def\PYG@reset{\let\PYG@it=\relax \let\PYG@bf=\relax%
150
+ \let\PYG@ul=\relax \let\PYG@tc=\relax%
151
+ \let\PYG@bc=\relax \let\PYG@ff=\relax}
152
+ \def\PYG@tok#1{\csname PYG@tok@#1\endcsname}
153
+ \def\PYG@toks#1+{\ifx\relax#1\empty\else%
154
+ \PYG@tok{#1}\expandafter\PYG@toks\fi}
155
+ \def\PYG@do#1{\PYG@bc{\PYG@tc{\PYG@ul{%
156
+ \PYG@it{\PYG@bf{\PYG@ff{#1}}}}}}}
157
+ \def\PYG#1#2{\PYG@reset\PYG@toks#1+\relax+\PYG@do{#2}}
158
+
159
+ \expandafter\def\csname PYG@tok@w\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}}
160
+ \expandafter\def\csname PYG@tok@c\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
161
+ \expandafter\def\csname PYG@tok@cp\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.74,0.48,0.00}{##1}}}
162
+ \expandafter\def\csname PYG@tok@k\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
163
+ \expandafter\def\csname PYG@tok@kp\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
164
+ \expandafter\def\csname PYG@tok@kt\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.69,0.00,0.25}{##1}}}
165
+ \expandafter\def\csname PYG@tok@o\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
166
+ \expandafter\def\csname PYG@tok@ow\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
167
+ \expandafter\def\csname PYG@tok@nb\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
168
+ \expandafter\def\csname PYG@tok@nf\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
169
+ \expandafter\def\csname PYG@tok@nc\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
170
+ \expandafter\def\csname PYG@tok@nn\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
171
+ \expandafter\def\csname PYG@tok@ne\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.82,0.25,0.23}{##1}}}
172
+ \expandafter\def\csname PYG@tok@nv\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
173
+ \expandafter\def\csname PYG@tok@no\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}}
174
+ \expandafter\def\csname PYG@tok@nl\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.63,0.63,0.00}{##1}}}
175
+ \expandafter\def\csname PYG@tok@ni\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.60,0.60,0.60}{##1}}}
176
+ \expandafter\def\csname PYG@tok@na\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.49,0.56,0.16}{##1}}}
177
+ \expandafter\def\csname PYG@tok@nt\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
178
+ \expandafter\def\csname PYG@tok@nd\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
179
+ \expandafter\def\csname PYG@tok@s\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
180
+ \expandafter\def\csname PYG@tok@sd\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
181
+ \expandafter\def\csname PYG@tok@si\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
182
+ \expandafter\def\csname PYG@tok@se\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.73,0.40,0.13}{##1}}}
183
+ \expandafter\def\csname PYG@tok@sr\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
184
+ \expandafter\def\csname PYG@tok@ss\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
185
+ \expandafter\def\csname PYG@tok@sx\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
186
+ \expandafter\def\csname PYG@tok@m\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
187
+ \expandafter\def\csname PYG@tok@gh\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
188
+ \expandafter\def\csname PYG@tok@gu\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}}
189
+ \expandafter\def\csname PYG@tok@gd\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}}
190
+ \expandafter\def\csname PYG@tok@gi\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}}
191
+ \expandafter\def\csname PYG@tok@gr\endcsname{\def\PYG@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}}
192
+ \expandafter\def\csname PYG@tok@ge\endcsname{\let\PYG@it=\textit}
193
+ \expandafter\def\csname PYG@tok@gs\endcsname{\let\PYG@bf=\textbf}
194
+ \expandafter\def\csname PYG@tok@gp\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
195
+ \expandafter\def\csname PYG@tok@go\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}}
196
+ \expandafter\def\csname PYG@tok@gt\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}}
197
+ \expandafter\def\csname PYG@tok@err\endcsname{\def\PYG@bc##1{\setlength{\fboxsep}{0pt}\fcolorbox[rgb]{1.00,0.00,0.00}{1,1,1}{\strut ##1}}}
198
+ \expandafter\def\csname PYG@tok@kc\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
199
+ \expandafter\def\csname PYG@tok@kd\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
200
+ \expandafter\def\csname PYG@tok@kn\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
201
+ \expandafter\def\csname PYG@tok@kr\endcsname{\let\PYG@bf=\textbf\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
202
+ \expandafter\def\csname PYG@tok@bp\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
203
+ \expandafter\def\csname PYG@tok@fm\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
204
+ \expandafter\def\csname PYG@tok@vc\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
205
+ \expandafter\def\csname PYG@tok@vg\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
206
+ \expandafter\def\csname PYG@tok@vi\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
207
+ \expandafter\def\csname PYG@tok@vm\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
208
+ \expandafter\def\csname PYG@tok@sa\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
209
+ \expandafter\def\csname PYG@tok@sb\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
210
+ \expandafter\def\csname PYG@tok@sc\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
211
+ \expandafter\def\csname PYG@tok@dl\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
212
+ \expandafter\def\csname PYG@tok@s2\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
213
+ \expandafter\def\csname PYG@tok@sh\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
214
+ \expandafter\def\csname PYG@tok@s1\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
215
+ \expandafter\def\csname PYG@tok@mb\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
216
+ \expandafter\def\csname PYG@tok@mf\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
217
+ \expandafter\def\csname PYG@tok@mh\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
218
+ \expandafter\def\csname PYG@tok@mi\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
219
+ \expandafter\def\csname PYG@tok@il\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
220
+ \expandafter\def\csname PYG@tok@mo\endcsname{\def\PYG@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
221
+ \expandafter\def\csname PYG@tok@ch\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
222
+ \expandafter\def\csname PYG@tok@cm\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
223
+ \expandafter\def\csname PYG@tok@cpf\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
224
+ \expandafter\def\csname PYG@tok@c1\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
225
+ \expandafter\def\csname PYG@tok@cs\endcsname{\let\PYG@it=\textit\def\PYG@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
226
+
227
+ \def\PYGZbs{\char`\\}
228
+ \def\PYGZus{\char`\_}
229
+ \def\PYGZob{\char`\{}
230
+ \def\PYGZcb{\char`\}}
231
+ \def\PYGZca{\char`\^}
232
+ \def\PYGZam{\char`\&}
233
+ \def\PYGZlt{\char`\<}
234
+ \def\PYGZgt{\char`\>}
235
+ \def\PYGZsh{\char`\#}
236
+ \def\PYGZpc{\char`\%}
237
+ \def\PYGZdl{\char`\$}
238
+ \def\PYGZhy{\char`\-}
239
+ \def\PYGZsq{\char`\'}
240
+ \def\PYGZdq{\char`\"}
241
+ \def\PYGZti{\char`\~}
242
+ % for compatibility with earlier versions
243
+ \def\PYGZat{@}
244
+ \def\PYGZlb{[}
245
+ \def\PYGZrb{]}
246
+ \makeatother
247
+
248
+
249
+
250
+
251
+
252
+ \setlength{\textfloatsep}{15pt plus 5.0pt minus 3.0pt}
253
+ \setlength{\floatsep}{15pt plus 5.0pt minus 3.0pt}
254
+ %\setlength{\dbltextfloatsep }{15pt plus 2.0pt minus 3.0pt}
255
+ %\setlength{\dblfloatsep}{15pt plus 2.0pt minus 3.0pt}
256
+ %\setlength{\intextsep}{15pt plus 2.0pt minus 3.0pt}
257
+ \setlength{\abovecaptionskip}{5pt plus 1pt minus 1pt}
258
+
259
+ % If the title and author information does not fit in the area allocated, uncomment the following
260
+ %
261
+ %\setlength\titlebox{<dim>}
262
+ %
263
+ % and set <dim> to something 5cm or larger.
264
+
265
+ %\setlength\titlebox{5cm}
266
+
267
+ %\setlength{\textfloatsep}{15pt plus 5.0pt minus 5.0pt}
268
+ %\setlength{\floatsep}{15pt plus 5.0pt minus 5.0pt}
269
+ %\setlength{\dbltextfloatsep }{15pt plus 2.0pt minus 3.0pt}
270
+ %\setlength{\dblfloatsep}{15pt plus 2.0pt minus 3.0pt}
271
+ %\setlength{\intextsep}{15pt plus 2.0pt minus 3.0pt}
272
+ %\setlength{\abovecaptionskip}{5pt plus 1pt minus 1pt}
273
+
274
+ % If the title and author information does not fit in the area allocated, uncomment the following
275
+ %
276
+ \setlength\titlebox{5cm}
277
+ %
278
+ % and set <dim> to something 5cm or larger.
279
+
280
+ \title{PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing}
281
+
282
+ % Author information can be set in various styles:
283
+ % For several authors from the same institution:
284
+ \author{Linh The Nguyen \and Dat Quoc Nguyen\\
285
+ VinAI Research, Hanoi, Vietnam \\
286
+ \tt{\normalsize \{v.linhnt140, v.datnq9\}@vinai.io}}
287
+ % if the names do not fit well on one line use
288
+ % Author 1 \\ {\bf Author 2} \\ ... \\ {\bf Author n} \\
289
+ % For authors from different institutions:
290
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
291
+ % \And ... \And
292
+ % Author n \\ Address line \\ ... \\ Address line}
293
+ % To start a seperate ``row'' of authors use \AND, as in
294
+ % \author{Author 1 \\ Address line \\ ... \\ Address line
295
+ % \AND
296
+ % Author 2 \\ Address line \\ ... \\ Address line \And
297
+ % Author 3 \\ Address line \\ ... \\ Address line}
298
+
299
+ %\author{ }
300
+
301
+ \begin{document}
302
+ \maketitle
303
+
304
+ \begin{abstract}
305
+ We present the first multi-task learning model---named PhoNLP---for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency
306
+ parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT \cite{phobert} for each task independently. We publicly release PhoNLP as an open-source toolkit under the Apache License 2.0.
307
+ Although we design PhoNLP for Vietnamese, our training and evaluation command scripts can in fact directly work for other languages that have a pre-trained BERT-based language model and gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing.
308
+ We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and applications, not only for Vietnamese but also for other languages. Our PhoNLP is available at \url{https://github.com/VinAIResearch/PhoNLP}.
309
+ \end{abstract}
310
+
311
+ \vspace{-5pt}
312
+
313
+ \begin{figure*}[!t]
314
+ \centering
315
+ \includegraphics[width=12.5cm]{JointModel.pdf}
316
+ {\small
317
+ \begin{tabular}{crllll}
318
+ \hline
319
+ \textbf{ID} & \textbf{Form} & \textbf{POS} & \textbf{NER} & \textbf{Head} & \textbf{DepRel} \\
320
+ \hline
321
+ 1 & Đây\textsubscript{This} & PRON & O & 2 & sub \\
322
+ 2 & là\textsubscript{is} & VERB& O & 0 & root \\
323
+ 3 & Hà\_Nội\textsubscript{Ha\_Noi} & NOUN & B-LOC & 2 & vmod \\
324
+ \hline
325
+ \end{tabular}
326
+ }
327
+ \caption{Illustration of our PhoNLP model.}
328
+ \label{fig:architecture}
329
+ \end{figure*}
330
+
331
+
332
+
333
+ \section{Introduction}
334
+
335
+ Vietnamese NLP research has advanced significantly in recent years, boosted by the success of the national project on Vietnamese language and speech processing (VLSP) KC01.01/2006-2010 and by the VLSP workshops, which have run shared tasks since 2013.\footnote{\url{https://vlsp.org.vn/}} The fundamental tasks of POS tagging, NER and dependency parsing play important roles here, providing useful features for many downstream applications such as machine translation \cite{7800281}, sentiment analysis \cite{BANG20182016IIP0038}, relation extraction \cite{9287471}, semantic parsing \cite{vitext2sql}, open information extraction \cite{3155133.3155171} and question answering \cite{NguyenNP_SWJ,3184558.3191535}. % \cite{7800281,BANG20182016IIP0038,3155133.3155171,9287471}.
336
+ Thus, there is a need to develop NLP toolkits for linguistic annotations w.r.t. Vietnamese POS tagging, NER and dependency parsing.
337
+
338
+ VnCoreNLP \cite{vu-etal-2018-vncorenlp} is the previous public toolkit employing traditional feature-based machine learning models to handle these Vietnamese NLP tasks. However, VnCoreNLP is no longer state-of-the-art: its results are significantly outperformed by those obtained by fine-tuning PhoBERT---the current state-of-the-art monolingual pre-trained language model for Vietnamese \cite{phobert}. Note that there are no publicly available fine-tuned BERT-based models for the three Vietnamese tasks. Even if there were, a potential drawback is that an NLP package wrapping such fine-tuned BERT-based models would require a large storage space, i.e. three times that of a single BERT model \cite{devlin-etal-2019-bert}, making it unsuitable for practical applications with limited storage. Joint multi-task learning is a promising solution as it helps reduce the storage space. In addition, POS tagging, NER and dependency parsing are related tasks: POS tags are essential input features for dependency parsing and are also used as additional features for NER. Joint multi-task learning thus might also improve performance over single-task learning \cite{Ruder2019Neural}.
339
+
340
+
341
+ In this paper, we present a new multi-task learning model---named PhoNLP---for joint POS tagging, NER and dependency parsing. In particular, given an input sentence of words to PhoNLP, an encoding layer generates contextualized word embeddings that represent the input words. These contextualized word embeddings are fed into a POS tagging layer that is in fact a linear prediction layer \cite{devlin-etal-2019-bert} to predict POS tags for the corresponding input words. Each predicted POS tag is then represented by two ``soft'' embeddings that are later fed into NER and dependency parsing layers separately.
342
+ More specifically, based on both the contextualized word embeddings and the ``soft'' POS tag embeddings, the NER layer uses a linear-chain CRF predictor \cite{Lafferty:2001} to predict NER labels for the input words, while the dependency parsing layer uses a Biaffine classifier \cite{DozatM17} to predict dependency arcs between the words and another Biaffine classifier to label the predicted arcs.
343
+ %To the best of our knowledge, our PhoNLP is the first proposed model to jointly learn POS tagging, NER and dependency parsing for Vietnamese. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results.
344
+ Our contributions are summarized as follows:
345
+
346
+ %\vspace{-2pt}
347
+
348
+ \begin{itemize}[leftmargin=*]
349
+ \setlength\itemsep{-1pt}
350
+ \item To the best of our knowledge, PhoNLP is the first proposed model to jointly learn POS tagging, NER and dependency parsing for Vietnamese.
351
+ \item We discuss a data leakage issue in the Vietnamese benchmark datasets that has not been pointed out before. Experiments show that PhoNLP obtains state-of-the-art performance results, outperforming PhoBERT-based single-task learning.
352
+ \item We publicly release PhoNLP as an open-source toolkit that is simple to setup and efficiently run from both the command-line and Python API. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and downstream applications.
353
+ \end{itemize}
354
+
355
+
356
+
357
+ \section{Model description}
358
+
359
+ Figure \ref{fig:architecture} illustrates our PhoNLP architecture that can be viewed as a mixture of a BERT-based encoding layer and three decoding layers of POS tagging, NER and dependency parsing.
360
+
361
+ \subsection{Encoder \& Contextualized embeddings}
362
+
363
+ Given an input sentence consisting of $n$ word tokens $w_1, w_2, ..., w_n$, the encoding layer employs PhoBERT to generate contextualized latent feature embeddings $\mathbf{e}_{i}$ each representing the $i^{th}$ word $w_i$:
364
+
365
+ \begin{equation}
366
+ \mathbf{e}_{i} = \mathrm{PhoBERT\textsubscript{base}}\big({w}_{1:n}, i\big)
367
+ \end{equation}
368
+
369
+ In particular, the encoding layer employs the \textbf{PhoBERT\textsubscript{base}} version. Because PhoBERT uses BPE \cite{sennrich-etal-2016-neural} to segment the input sentence into subword units, the encoding layer in fact represents the $i^{th}$ word $w_i$ by using the contextualized embedding of its first subword.
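The first-subword convention above can be sketched as follows. This is a toy illustration, not PhoBERT's actual tokenizer output: `first_subword_vectors` is a hypothetical helper and the embedding values are made up.

```python
# Hypothetical sketch: represent each word by the contextualized vector
# of its FIRST subword after BPE segmentation.
def first_subword_vectors(subword_vecs, word_to_subwords):
    """subword_vecs: one vector per BPE subword token.
    word_to_subwords: for each word, the indices of its subwords."""
    return [subword_vecs[idxs[0]] for idxs in word_to_subwords]

# Toy example: the second word splits into two BPE pieces (indices 1, 2),
# so only the vector at index 1 represents it.
vecs = [[0.1], [0.2], [0.3]]
mapping = [[0], [1, 2]]
print(first_subword_vectors(vecs, mapping))  # -> [[0.1], [0.2]]
```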
370
+
371
+ \subsection{POS tagging}\label{ssec:pos}
372
+
373
+ Following a common manner when fine-tuning a pre-trained language model for a sequence labeling task \cite{devlin-etal-2019-bert}, the POS tagging layer is a linear prediction layer that is appended on top of the encoder. In particular, the POS tagging layer feeds the contextualized word embeddings $\mathbf{e}_{i}$ into a feed-forward network (FFNN\textsubscript{POS}) followed by a $\mathsf{softmax}$ predictor for POS tag prediction:
374
+
375
+ \begin{equation}
376
+ \mathbf{p}_{i} = \mathsf{softmax}\big(\mathrm{FFNN\textsubscript{POS}}\big(\mathbf{e}_{i}\big)\big) \label{eq2}
377
+ \end{equation}
378
+
379
+ \noindent where the output layer size of FFNN\textsubscript{POS} is the number of POS tags. Based on probability vectors $\mathbf{p}_{i}$, a cross-entropy objective loss \textbf{$\mathcal{L}_{\text{POS}}$} is calculated for POS tagging during training.
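As a concrete illustration of this linear prediction head, here is a minimal pure-Python sketch with toy weights (not PhoNLP's learned parameters): a single linear transform over the contextualized embedding followed by softmax.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pos_probs(e_i, W, b):
    # One output unit per POS tag: logits = W e_i + b, then softmax.
    logits = [sum(w * x for w, x in zip(row, e_i)) + b_j
              for row, b_j in zip(W, b)]
    return softmax(logits)

# Toy 2-dimensional embedding and a 2-tag tagset.
p = pos_probs([1.0, -1.0], W=[[0.5, 0.0], [0.0, 0.5]], b=[0.0, 0.0])
assert abs(sum(p) - 1.0) < 1e-9          # a valid probability distribution
```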
380
+
381
+
382
+ \subsection{NER}\label{ssec:ner}
383
+
384
+ The NER layer creates a sequence of vectors $\mathbf{v}_{1:n}$ in which each $\mathbf{v}_{i}$ results from concatenating the contextualized word embedding $\mathbf{e}_{i}$ and a ``soft'' POS tag embedding $\mathbf{t}_{i}^{(1)}$:
385
+
386
+ \begin{equation}
387
+ \mathbf{v}_{i} = \mathbf{e}_{i} \circ \mathbf{t}_{i}^{(1)}
388
+ \label{equa:ner}
389
+ \end{equation}
390
+
391
+ \noindent where following \newcite{hashimoto-etal-2017-joint}, the ``soft'' POS tag embedding $\mathbf{t}_{i}^{(1)}$ is computed by multiplying a label weight matrix $\mathbf{W}^{(1)}$ with the corresponding probability vector $\mathbf{p}_{i}$:
392
+
393
+ \begin{equation*}
394
+ \mathbf{t}_{i}^{(1)} = \mathbf{W}^{(1)}\mathbf{p}_{i}
395
+ \end{equation*}
396
+
397
+ The NER layer then passes each vector $\mathbf{v}_{i}$ into a FFNN (FFNN\textsubscript{NER}):
398
+
399
+ \begin{equation}
400
+ \mathbf{h}_{i} = \mathrm{FFNN\textsubscript{NER}}\big(\mathbf{v}_{i}\big) \label{eq4}
401
+ \end{equation}
402
+
403
+ \noindent where the output layer size of FFNN\textsubscript{NER} is the number of BIO-based NER labels.
404
+
405
+
406
+ The NER layer feeds the output vectors $\mathbf{h}_{i}$ into a linear-chain
407
+ CRF predictor for NER label prediction \cite{Lafferty:2001}. A cross-entropy loss \textbf{$\mathcal{L}_{\text{NER}}$} is calculated for NER during
408
+ training while the Viterbi algorithm is used for inference.
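The soft POS tag embedding and concatenation above amount to a matrix-vector product followed by vector concatenation. The sketch below uses toy dimensions and a 2x2 identity weight matrix, not PhoNLP's learned label weight matrix.

```python
# Hypothetical toy sketch: t_i = W p_i, then v_i = e_i concatenated with t_i.
def soft_tag_embedding(W, p):
    # W: (tag-embedding dim) x (number of POS tags); p: tag probabilities.
    return [sum(w * p_j for w, p_j in zip(row, p)) for row in W]

def ner_input(e_i, W, p_i):
    # Concatenate the word embedding with its soft POS tag embedding.
    return e_i + soft_tag_embedding(W, p_i)

v = ner_input([0.2, 0.4], W=[[1.0, 0.0], [0.0, 1.0]], p_i=[0.9, 0.1])
print(v)  # -> [0.2, 0.4, 0.9, 0.1]
```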
409
+
410
+ \subsection{Dependency parsing}
411
+
412
+ The dependency parsing layer creates vectors $\mathbf{z}_{1:n}$ in which each $\mathbf{z}_{i}$ results from concatenating $\mathbf{e}_{i}$ and another ``soft'' POS tag embedding $\mathbf{t}_{i}^{(2)}$:
413
+
414
+ \begin{eqnarray}
415
+ \mathbf{z}_{i} &=& \mathbf{e}_{i} \circ \mathbf{t}_{i}^{(2)} \label{equa:posdep} \\
416
+ \mathbf{t}_{i}^{(2)} &=& \mathbf{W}^{(2)}\mathbf{p}_{i} \nonumber
417
+ \end{eqnarray}
418
+
419
+ Following \newcite{DozatM17}, the dependency parsing layer uses FFNNs to split $\mathbf{z}_{i}$ into \emph{head} and \emph{dependent} representations:
420
+
421
+ \begin{eqnarray}
422
+ \mathbf{h}_{i}^{(\textsc{a-h})} &=& \mathrm{FFNN}_{\text{Arc-Head}}\big(\mathbf{z}_{i}\big) \label{equa:fc6} \\
423
+ \mathbf{h}_{i}^{(\textsc{a-d})} &=& \mathrm{FFNN}_{\text{Arc-Dep}}\big(\mathbf{z}_{i}\big) \\
424
+ \mathbf{h}_{i}^{(\textsc{l-h})} &=& \mathrm{FFNN}_{\text{Label-Head}}\big(\mathbf{z}_{i}\big) \\
425
+ \mathbf{h}_{i}^{(\textsc{l-d})} &=& \mathrm{FFNN}_{\text{Label-Dep}}\big(\mathbf{z}_{i}\big) \label{equa:fc9}
426
+ \end{eqnarray}
427
+
428
+
429
+ To predict potential dependency arcs, based on input vectors $\mathbf{h}_{i}^{(\textsc{a-h})}$ and $\mathbf{h}_{j}^{(\textsc{a-d})}$, the parsing layer uses a Biaffine classifier's variant \cite{qi-etal-2018-universal} that additionally takes into account the distance and relative ordering between two words to produce a probability distribution of
430
+ arc heads for each word. %\footnote{We utilize an implementation of the Biaffine classifier's variant \cite{qi-etal-2018-universal} from \newcite{qi-etal-2020-stanza}.}
431
+ For inference, the Chu–Liu/Edmonds' algorithm is used to find a maximum spanning tree \cite{chuliu,Edmonds}.
432
+ The parsing layer also uses another Biaffine classifier to label the predicted arcs, based on input vectors $\mathbf{h}_{i}^{(\textsc{l-h})}$ and $\mathbf{h}_{j}^{(\textsc{l-d})}$. During training, an objective loss \textbf{$\mathcal{L}_{\text{DEP}}$} is computed by summing a cross-entropy loss for unlabeled dependency parsing and another cross-entropy loss for dependency label prediction, based on gold arcs and arc labels.
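The core of the arc predictor is a biaffine score between a head representation and a dependent representation. The sketch below shows only the basic bilinear-plus-linear form with toy parameters; the actual layer is Stanza's variant, which additionally incorporates distance and relative-ordering terms.

```python
# Hedged sketch of a biaffine scorer (toy parameters, not PhoNLP's).
def biaffine_score(h_head, h_dep, U, w, b):
    # score = h_head^T U h_dep + w . (h_head ++ h_dep) + b
    Uh = [sum(u * d for u, d in zip(row, h_dep)) for row in U]
    bilinear = sum(h * x for h, x in zip(h_head, Uh))
    linear = sum(w_j * x for w_j, x in zip(w, h_head + h_dep))
    return bilinear + linear + b

# Toy parameters: U couples head dimension 0 with dependent dimension 1.
s = biaffine_score([1.0, 0.0], [0.0, 1.0],
                   U=[[0.0, 2.0], [0.0, 0.0]], w=[0.0] * 4, b=0.5)
print(s)  # -> 2.5
```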
433
+
434
+
435
+ %\begin{eqnarray}
436
+ % {s}_{i,j} &=&\mathrm{Biaff\textsuperscript{(A)}}\Big(\mathbf{h}_{i}^{(\textsc{a-h})}, \mathbf{h}_{j}^{(\textsc{a-d})}\Big) \\
437
+ %\mathrm{Biaff\textsuperscript{(A)}}\big(\mathbf{x}, \mathbf{y}\big) &=& \mathbf{x}^{\mathsf{T}} \mathbf{U}_1 \mathbf{y} + \mathbf{w}_1^{\mathsf{T}}(\mathbf{x} \circ \mathbf{y}) + {b}_1 \nonumber
438
+ %\end{eqnarray}
439
+ %
440
+ %\noindent where $\mathbf{U}_1$, $\mathbf{w}_1$ and $b_1$ are a $k\times 1 \times k$ tensor, a $2k$-dimensional vector and a bias scalar, respectively
441
+ %(here, $k$ is the size of the {head} and {dependent} representations).
442
+ %
443
+ %\begin{eqnarray}
444
+ % \mathbf{s}_{i,j} &=&\mathrm{Biaff\textsuperscript{(L)}}\Big(\mathbf{h}_{i}^{(\textsc{l-h})}, \mathbf{h}_{j}^{(\textsc{l-d})}\Big) \\
445
+ %\mathrm{Biaff\textsuperscript{(L)}}\big(\mathbf{x}, \mathbf{y}\big) &=& \mathbf{x}^{\mathsf{T}} \mathbf{U}_2 \mathbf{y} + \mathbf{W}_2(\mathbf{x} \circ \mathbf{y}) + \mathbf{b}_2 \nonumber
446
+ %\end{eqnarray}
447
+ %
448
+ %\noindent where $\mathbf{U}_2$, $\mathbf{W}_2$ and $\mathbf{b}_2$ are a $k\times l \times k$ tensor, a $l \times 2k$ matrix and a bias vector, respectively (here, $l$ is the number of dependency labels).
449
+
450
+
451
+ \subsection{Joint multi-task learning}
452
+
453
+ The final training objective loss \textbf{$\mathcal{L}$} of our model PhoNLP is the weighted sum of the POS tagging loss {$\mathcal{L}_{\text{POS}}$}, the NER loss {$\mathcal{L}_{\text{NER}}$} and the dependency parsing loss {$\mathcal{L}_{\text{DEP}}$}:
454
+
455
+ \begin{equation}
456
+ \textbf{$\mathcal{L}$} = \lambda_1\mathcal{L}_{\text{POS}} + \lambda_2\mathcal{L}_{\text{NER}} + (1 - \lambda_1 - \lambda_2)\mathcal{L}_{\text{DEP}}
457
+ \end{equation}
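This weighted sum is straightforward to compute; a minimal sketch follows, with the default weights set to the values the experiments later find optimal (0.4 and 0.2).

```python
# Minimal sketch of the joint objective: a weighted sum of the three
# task losses, with the three weights summing to one.
def joint_loss(l_pos, l_ner, l_dep, lam1=0.4, lam2=0.2):
    return lam1 * l_pos + lam2 * l_ner + (1.0 - lam1 - lam2) * l_dep

assert abs(joint_loss(1.0, 1.0, 1.0) - 1.0) < 1e-9  # weights sum to 1
```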
458
+
459
+ \paragraph{Discussion:} Our PhoNLP can be viewed as an extension of previous joint POS tagging and dependency parsing models \cite{hashimoto-etal-2017-joint,li-etal-2018-joint-learning,nguyen-verspoor-2018-improved,NguyenALTA2019,kondratyuk-straka-2019-75}, where we additionally incorporate a CRF-based prediction layer for NER. Unlike \newcite{hashimoto-etal-2017-joint}, \newcite{nguyen-verspoor-2018-improved}, \newcite{li-etal-2018-joint-learning} and \newcite{NguyenALTA2019} that use BiLSTM-based encoders to extract contextualized feature embeddings, we use a BERT-based encoder. \newcite{kondratyuk-straka-2019-75} also employ a BERT-based encoder. However, different from PhoNLP where we construct a hierarchical architecture over the POS tagging and dependency parsing layers, \newcite{kondratyuk-straka-2019-75} do not make use of POS tag embeddings for dependency parsing.\footnote{In our preliminary experiments, not feeding the POS tag embeddings into the dependency parsing layer decreases the performance.}
460
+
461
+
462
+
463
+
464
+ \section{Experiments}
465
+
466
+ \subsection{Setup}
467
+
468
+ \subsubsection{Datasets}
469
+
470
+ To conduct experiments, we use the benchmark datasets of the VLSP 2013 POS tagging dataset,\footnote{\url{https://vlsp.org.vn/vlsp2013/eval}} the VLSP 2016 NER dataset \cite{JCC13161} and the VnDT dependency treebank v1.1 \cite{Nguyen2014NLDB}, following the setup used by the VnCoreNLP toolkit \cite{vu-etal-2018-vncorenlp}. Here, VnDT is converted from the Vietnamese constituent treebank \cite{nguyen-etal-2009-building}.
471
+
472
+
473
+ \paragraph{Data leakage issue:} We further discover a data leakage issue that has not been pointed out before: all sentences from the VLSP 2016 NER dataset and the VnDT treebank are included in the VLSP 2013 POS tagging dataset. In particular, 90+\% of the sentences from both the validation and test sets for NER and dependency parsing are included in the POS tagging training set, resulting in an unrealistic evaluation scenario in which the POS tags used as input features for NER and dependency parsing have effectively been seen during POS tagging training.
474
+
475
+ To handle this issue, we re-split the VLSP 2013 POS tagging dataset: the POS tagging validation/test set now only contains sentences that appear in the union of the NER and dependency parsing validation/test sets (i.e. the validation/test sentences for NER and dependency parsing only appear in the POS tagging validation/test set).
476
+ In addition, there are 594 duplicated sentences in the VLSP 2013 POS tagging dataset (no sentence duplication is found in the union of the NER and dependency parsing sentences). We therefore also remove duplicates from the POS tagging dataset.
477
+ Table \ref{tab:Datasets} details the statistics of the experimental datasets.
478
+
479
+ \begin{table}[!t]
480
+ \centering
481
+ \resizebox{7.5cm}{!}{
482
+ \begin{tabular}{l|l|l|l}
483
+ \hline
484
+ \textbf{Task} & \textbf{\#train} & \textbf{\#valid} & \textbf{\#test} \\
485
+ \hline
486
+ {POS tagging (leakage)} & {27000} & {870} & {2120} \\
487
+ \hdashline
488
+ POS tagging (re-split) & 23906 & 2009 & 3481\\
489
+ \hline
490
+ NER & 14861 & 2000 & 2831 \\
491
+ \hline
492
+ Dependency parsing & 8977 & 200 & 1020 \\
493
+ \hline
494
+ \end{tabular}
495
+ }
496
+ \caption{Dataset statistics. \textbf{\#train}, \textbf{\#valid} and \textbf{\#test} denote the numbers of training, validation and test sentences, respectively. Here,
497
+ ``{POS tagging (leakage)}'' and ``POS tagging (re-split)'' refer to the statistics for POS tagging before and after re-splitting \& sentence duplication removal, respectively.}
498
+ \label{tab:Datasets}
499
+ \end{table}
500
+
501
+ \subsubsection{Implementation}
502
+
503
+ PhoNLP is implemented based on PyTorch \cite{NEURIPS2019_9015}, employing the PhoBERT encoder implementation available from the $\mathrm{transformers}$ library \cite{wolf-etal-2020-transformers} and the Biaffine classifier implementation from \newcite{qi-etal-2020-stanza}. We set both the label weight matrices $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$ to have 100 rows, resulting in 100-dimensional soft POS tag embeddings. In addition, following \newcite{qi-etal-2018-universal,qi-etal-2020-stanza}, FFNNs in equations \ref{equa:fc6}--\ref{equa:fc9} use 400-dimensional output layers.
504
+
505
+ We use the AdamW optimizer \cite{loshchilov2018decoupled} with a fixed batch size of 32, and train for 40 epochs. The training sets differ in size, with the POS tagging
506
+ training set being the largest at 23906 sentences. Thus, for each training epoch, we repeatedly sample from the NER and dependency parsing training sets to fill the gaps between the training set sizes. We perform a grid search to select the initial AdamW learning rate, $\lambda_1$ and $\lambda_2$, and find their optimal values to be 1e-5, 0.4 and 0.2, respectively. After each training epoch, we compute the average of the POS tagging accuracy, the NER F\textsubscript{1}-score and the dependency parsing LAS score on the validation sets. We select the model checkpoint that produces the highest average score over the validation sets and apply it to the test sets. Each of our reported scores is an average over 5 runs with different random seeds.
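The per-epoch balancing described above can be sketched as sampling with replacement from the smaller training sets until they match the size of the largest one. `fill_to_size` is a hypothetical helper for illustration, not PhoNLP's actual data loader.

```python
import random

def fill_to_size(dataset, target_size, rng):
    # Keep every original example, then sample extras with replacement
    # until the epoch-level dataset reaches the target size.
    extra = [rng.choice(dataset) for _ in range(target_size - len(dataset))]
    return dataset + extra

rng = random.Random(0)
# Grow the NER training set (14861 sentences) to the POS tagging
# training set size (23906 sentences) for one epoch.
ner_epoch = fill_to_size(list(range(14861)), 23906, rng)
print(len(ner_epoch))  # -> 23906
```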
507
+
508
+ %\subsubsection{Baseline single-task training}
509
+
510
+ %We also conduct experiments for a single-task training strategy. We follow a common approach to fine-tune a pre-trained language model for POS tagging, appending a linear prediction layer on top of PhoBERT, as briefly described in Section \ref{ssec:pos}. For NER, instead of a linear prediction layer, we append a CRF prediction layer on top of PhoBERT. For dependency parsing, predicted POS tags are produced by the learned single-task POS tagging model; then POS tags are represented by embeddings that are concatenated with the corresponding contextualized word embeddings, resulting in a sequence of input vectors for the Biaffine-based classifiers \cite{qi-etal-2018-universal}.
511
+
512
+
513
+
514
+
515
+
516
+
517
+
518
+
519
+
520
+
521
+
522
+ \subsection{Results}
523
+
524
+ %\subsubsection*{Main results}
525
+
526
+
527
+
528
+ Table \ref{tab:results} presents results obtained for our PhoNLP and compares them with those of a baseline approach of single-task training. For the single-task training approach: (i) We follow a common approach to fine-tune a pre-trained language model for POS tagging, appending a linear prediction layer on top of PhoBERT, as briefly described in Section \ref{ssec:pos}. (ii) For NER, instead of a linear prediction layer, we append a CRF prediction layer on top of PhoBERT. (iii) For dependency parsing, predicted POS tags are produced by the learned single-task POS tagging model; then POS tags are represented by embeddings that are concatenated with the corresponding PhoBERT-based contextualized word embeddings, resulting in a sequence of input vectors for the Biaffine-based classifiers for dependency parsing \cite{qi-etal-2018-universal}. Here, the single-task training approach is based on the PhoBERT\textsubscript{base} version, employing the same hyper-parameter tuning and model selection strategy that we use for PhoNLP.
529
+
530
+ \begin{table}[!t]
531
+ \centering
532
+ \def\arraystretch{1.2}
533
+ \resizebox{7.5cm}{!}{
534
+ \begin{tabular}{ll|l|l|l|l}
535
+ \hline
536
+ & \textbf{Model} & \textbf{POS} & \textbf{NER} & \textbf{LAS} & \textbf{UAS} \\
537
+ \hline
538
+ \multirow{2}{*}{\rotatebox[origin=c]{90}{{Leak.}}}& Single-task & 96.7$^\dagger$ & 93.69 & 78.77$^\dagger$ & 85.22$^\dagger$ \\
539
+ \cdashline{2-6}
540
+ & PhoNLP & \textbf{96.76} & \textbf{94.41} & \textbf{79.11} & \textbf{85.47}\\
541
+ \hline
542
+ \hline
543
+ \multirow{2}{*}{\rotatebox[origin=c]{90}{{Re-spl}}}& Single-task & 93.68 & 93.69 & 77.89 & 84.78 \\
544
+ \cdashline{2-6}
545
+ & PhoNLP & \textbf{93.88} & \textbf{94.51} & \textbf{78.17} & \textbf{84.95} \\
546
+ %& PhoNLP & \textbf{93.88}$^{*}$ & \textbf{94.51}$^{**}$ & \textbf{78.17}$^{*}$ & \textbf{84.95} \\
547
+ \hline
548
+ \end{tabular}
549
+ }
550
+ \caption{Performance results (in \%) on the test sets for POS tagging (i.e. accuracy), NER (i.e. F\textsubscript{1}-score) and dependency parsing (i.e. LAS and UAS scores). ``Leak.'' abbreviates ``leakage'', denoting the results obtained w.r.t. the data leakage issue. ``Re-spl'' denotes the results obtained w.r.t. the data re-split and duplication removal for POS tagging to avoid the data leakage issue. ``Single-task'' refers to the single-task training approach.
551
+ $\dagger$ denotes scores taken from the PhoBERT paper \cite{phobert}. Note that ``Single-task'' NER is not affected by the data leakage issue.
552
+ %Here, $^{*}$ and $^{**}$ denote the statistically significant differences between ``Single-task'' and PhoNLP at p $\leq$ 0.05 and p $\leq$ 0.01, respectively.
553
+ }
554
+ \label{tab:results}
555
+ \end{table}
556
+
557
+ Note that PhoBERT helps produce state-of-the-art results for multiple Vietnamese NLP tasks (including but not limited to POS tagging, NER and dependency parsing in a single-task training strategy), and obtains higher performance results than VnCoreNLP.
558
+ However, in both the PhoBERT and VnCoreNLP papers \cite{phobert,vu-etal-2018-vncorenlp}, results for POS tagging and dependency parsing are reported under the data leakage issue. Our ``Single-task'' results in Table \ref{tab:results} for ``Re-spl'' (i.e. the data re-split and duplication removal for POS tagging that avoids the data leakage issue) can thus be viewed as new PhoBERT results under a proper experimental setup. Table \ref{tab:results} shows that in both the ``Leak.'' and ``Re-spl'' setups, our joint multi-task training approach PhoNLP performs better than the PhoBERT-based single-task training approach, resulting in state-of-the-art performance for the three tasks of Vietnamese POS tagging, NER and dependency parsing.
559
+
560
+
561
+
562
+
563
+ \section{PhoNLP toolkit}
564
+
565
+ We present in this section a basic usage of our PhoNLP toolkit.
566
+ We make PhoNLP simple to set up, i.e. users can install PhoNLP from either source or $\mathsf{pip}$ (e.g. $\mathsf{pip3\ install\ phonlp}$). We also aim to make PhoNLP simple to run from both the command line and the Python API. For example, annotating a corpus with POS tagging, NER and dependency parsing can be performed with a simple command as in Figure \ref{fig:command}.
567
+
568
+ Assume that the input file ``{\ttfamily input.txt}'' in Figure \ref{fig:command} contains a sentence ``Tôi đang làm\_việc tại VinAI .'' (I\textsubscript{Tôi} am\textsubscript{đang} working\textsubscript{làm\_việc} at\textsubscript{tại} VinAI). Table \ref{tab:format} shows the annotated output in plain text form for this
569
+ sentence. Similarly, we obtain the same output by using the Python API, as shown in Figure \ref{fig:code}.
570
+ Furthermore, commands to (re-)train and evaluate PhoNLP using gold annotated corpora are detailed in the PhoNLP GitHub repository. Note that our PhoNLP (re-)training and evaluation command scripts can be directly employed for other languages that have gold annotated corpora available for the three tasks and a pre-trained BERT-based language model available from the $\mathrm{transformers}$ library.
571
+
572
+ %\setcounter{figure}{1}
573
+ \begin{figure}[!t]
574
+ %{\footnotesize\ttfamily python3 phonlp.py {-}{-}save\_dir model\_folder\_path {-}{-}mode annotate {-}{-}input\_file path\_to\_input\_file {-}{-}output\_file path\_to\_output\_file}
575
+ {\ttfamily python3 run\_phonlp.py {-}{-}save\_dir ./pretrained\_phonlp {-}{-}mode \\ annotate {-}{-}input\_file input.txt {-}{-}output\_file output.txt}
576
+ \caption{Minimal command to run PhoNLP. Here ``save\_dir'' denotes the path to the local machine folder that stores the pre-trained PhoNLP model.}
577
+ \label{fig:command}
578
+ \end{figure}
579
+
580
+ \begin{table}[!t]
581
+ \centering
582
+ %\resizebox{7.5cm}{!}{
583
+ \begin{tabular}{llllll}
584
+ 1 & Tôi & P & O & 3 & sub \\
585
+ 2 & đang & R & O & 3 & adv \\
586
+ 3 & làm\_việc & V & O & 0 & root \\
587
+ 4 & tại & E & O & 3 & loc \\
588
+ 5 & VinAI & Np & B-ORG & 4 & pob \\
589
+ 6 & . & CH & O & 3 & punct \\
590
+ \end{tabular}
591
+ %}
592
+ \caption{The output in the output file ``{\ttfamily output.txt}'' for the sentence ``Tôi đang làm\_việc tại VinAI .'' from the input file ``{\ttfamily input.txt}'' in Figure \ref{fig:command}. The output is formatted with 6 columns representing word index, word form, POS tag, NER label, head index of the current word and its dependency relation type.}
593
+ \label{tab:format}
594
+ \end{table}
595
+
596
+ \paragraph{Speed test:} We perform a sole CPU-based speed test using a personal computer with an Intel Core i5 8265U 1.6GHz CPU \& 8GB of memory. For a GPU-based speed test, we employ a machine with a single NVIDIA RTX 2080Ti GPU. Performing the three NLP tasks jointly, PhoNLP obtains speeds of {15 sentences per second} in the CPU-based test and {129 sentences per second} in the GPU-based test, with an average of 23 word tokens per sentence and a batch size of 8.
597
+
598
+ %\setcounter{figure}{2}
599
+ \begin{figure*}[!t]
600
+ %\begin{minted}{python}
601
+ %import phonlp
602
+ %# Automatically download the pre-trained PhoNLP model
603
+ %# and save it in a local machine folder
604
+ %phonlp.download(save_dir='./pretrained_phonlp')
605
+ %# Load the pre-trained PhoNLP model
606
+ %model = phonlp.load(save_dir='./pretrained_phonlp')
607
+ %# Annotate a corpus
608
+ %model.annotate(input_file='input.txt', output_file='output.txt')
609
+ %# Annotate a sentence
610
+ %model.print_out(model.annotate(text="Tôi đang làm_việc tại VinAI ."))
611
+ %\end{minted}
612
+ \begin{Verbatim}[commandchars=\\\{\}]
613
+ \PYG{k+kn}{import} \PYG{n+nn}{phonlp}
614
+ \PYG{c+c1}{\PYGZsh{} Automatically download the pre\PYGZhy{}trained PhoNLP model}
615
+ \PYG{c+c1}{\PYGZsh{} and save it in a local machine folder}
616
+ \PYG{n}{phonlp}\PYG{o}{.}\PYG{n}{download}\PYG{p}{(}\PYG{n}{save\PYGZus{}dir}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}./pretrained\PYGZus{}phonlp\PYGZsq{}}\PYG{p}{)}
617
+ \PYG{c+c1}{\PYGZsh{} Load the pre\PYGZhy{}trained PhoNLP model}
618
+ \PYG{n}{model} \PYG{o}{=} \PYG{n}{phonlp}\PYG{o}{.}\PYG{n}{load}\PYG{p}{(}\PYG{n}{save\PYGZus{}dir}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}./pretrained\PYGZus{}phonlp\PYGZsq{}}\PYG{p}{)}
619
+ \PYG{c+c1}{\PYGZsh{} Annotate a corpus}
620
+ \PYG{n}{model}\PYG{o}{.}\PYG{n}{annotate}\PYG{p}{(}\PYG{n}{input\PYGZus{}file}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}input.txt\PYGZsq{}}\PYG{p}{,} \PYG{n}{output\PYGZus{}file}\PYG{o}{=}\PYG{l+s+s1}{\PYGZsq{}output.txt\PYGZsq{}}\PYG{p}{)}
621
+ \PYG{c+c1}{\PYGZsh{} Annotate a sentence}
622
+ \PYG{n}{model}\PYG{o}{.}\PYG{n}{print\PYGZus{}out}\PYG{p}{(}\PYG{n}{model}\PYG{o}{.}\PYG{n}{annotate}\PYG{p}{(}\PYG{n}{text}\PYG{o}{=}\PYG{l+s+s2}{\PYGZdq{}Tôi đang làm\PYGZus{}việc tại VinAI .\PYGZdq{}}\PYG{p}{))}
623
+ \end{Verbatim}
624
+ %\vspace{-5pt}
625
+ \caption{A simple and complete example code for using PhoNLP in Python.}
626
+ \label{fig:code}
627
+ \end{figure*}
628
+
629
+ \section{Conclusion and future work}
630
+
631
+
632
+ We have presented the first multi-task learning model PhoNLP for joint POS tagging, NER and dependency parsing in Vietnamese. Experiments on Vietnamese benchmark datasets show that PhoNLP outperforms its strong fine-tuned PhoBERT-based single-task training baseline, producing state-of-the-art performance results. We publicly release PhoNLP as an easy-to-use open-source toolkit and hope that PhoNLP can facilitate future NLP research and applications. % such as in question answering and dialogue systems.
633
+ In future work, we will also apply PhoNLP to other languages.
634
+
635
+ %Although we specify PhoNLP for Vietnamese, the PhoNLP (re-)training and evaluation command scripts in fact can directly work for other languages that have gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing, and a pre-trained BERT-based language model available from the $\mathrm{transformers}$ library. In future work, we will apply our PhoNLP toolkit to those languages.
636
+
637
+
638
+ \bibliography{refs}
639
+ \bibliographystyle{acl_natbib}
640
+
641
+ \end{document}
references/2021.naacl.nguyen/source/refs.bib ADDED
@@ -0,0 +1,625 @@
1
+ @inproceedings{3184558.3191535,
2
+ author = {Le-Hong, Phuong and Bui, Duc-Thien},
3
+ title = {{A Factoid Question Answering System for Vietnamese}},
4
+ year = {2018},
5
+ booktitle = {Companion Proceedings of the The Web Conference 2018},
6
+ pages = {1049--1055},
7
+ }
8
+
9
+ @InProceedings{NguyenNMP11,
10
+ title = {{Automatic Ontology Construction from Vietnamese text}},
11
+ author = {Dai Quoc Nguyen and Dat Quoc Nguyen and Khoi Trong Ma and Son Bao Pham},
12
+ booktitle = {Proceedings of NLPKE},
13
+ year = {2011},
14
+ pages = {485--488}
15
+ }
16
+
17
+ @inproceedings{vitext2sql,
18
+ title = {{A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese}},
19
+ author = {Anh Tuan Nguyen and Mai Hoang Dao and Dat Quoc Nguyen},
20
+ booktitle = {Findings of EMNLP 2020},
21
+ year = {2020},
22
+ pages = {4079--4085}
23
+ }
24
+
25
+ @article{NguyenNP_SWJ,
26
+ title = {{Ripple Down Rules for Question Answering}},
27
+ author = {Nguyen, Dat Quoc and Nguyen, Dai Quoc and Pham, Son Bao},
28
+ journal = {Semantic Web},
29
+ volume = {8},
30
+ number = {4},
31
+ pages = {511--532},
32
+ year = {2017}
33
+ }
34
+
35
+ @inproceedings{3155133.3155171,
36
+ author = {Truong, Diem and Vo, Duc-Thuan and Nguyen, Uyen Trang},
37
+ title = {Vietnamese Open Information Extraction},
38
+ year = {2017},
39
+ booktitle = {Proceedings of SoICT},
40
+ pages = {135--142}
41
+ }
42
+
43
+ @PhdThesis{Ruder2019Neural,
44
+ title={Neural Transfer Learning for Natural Language Processing},
45
+ author={Ruder, Sebastian},
46
+ year={2019},
47
+ school={National University of Ireland, Galway}
48
+ }
49
+
50
+ @INPROCEEDINGS{7800281,
51
+ author={Viet Hong Tran and Huyen Thuong Vu and Thu Hoai Pham and Vinh Van Nguyen and Minh Le Nguyen},
52
+ booktitle={Proceedings of RIVF},
53
+ title={{A reordering model for Vietnamese-English statistical machine translation using dependency information}},
54
+ year={2016},
55
+ pages = {125--130}
56
+ }
57
+
58
+
59
+ @article{BANG20182016IIP0038,
60
+ title={Sentiment Classification for Hotel Booking Review Based on Sentence Dependency Structure and Sub-Opinion Analysis},
61
+ author={Tran Sy Bang and Virach Sornlertlamvanich},
62
+ journal={IEICE Transactions on Information and Systems},
63
+ volume={E101.D},
64
+ number={4},
65
+ pages = {909--916},
66
+ year={2018}
67
+ }
68
+
69
+ @INPROCEEDINGS{9287471,
70
+ author={Huong Duong To and Phuc Do},
71
+ booktitle={Proceedings of KSE},
72
+ title={Extracting triples from Vietnamese text to create knowledge graph},
73
+ year={2020},
74
+ pages = {219--223}}
75
+
76
+
77
+ @article{chuliu,
78
+ title={{On the Shortest Arborescence of a Directed Graph}},
79
+ author={Yoeng-Jin Chu and Tseng-Hong Liu},
80
+ journal={Science Sinica},
81
+ volume={14},
82
+ year = {1965},
83
+ pages={1396--1400}
84
+ }
85
+
86
+ @inproceedings{vielectra,
87
+ title = {{Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models}},
88
+ author = "Viet Bui The and Oanh Tran Thi and Phuong Le-Hong",
89
+ booktitle = "Proceedings of PACLIC 2020",
90
+ year = "2020"
91
+ }
92
+
93
+
94
+
95
+ @inproceedings{wolf-etal-2020-transformers,
96
+ title = {{Transformers: State-of-the-Art Natural Language Processing}},
97
+ author = "Thomas Wolf and Lysandre Debut and others",
98
+ booktitle = "Proceedings of EMNLP 2020: System Demonstrations",
99
+ year = "2020",
100
+ pages = "38--45"
101
+ }
102
+
103
+ @incollection{NEURIPS2019_9015,
104
+ title = {{PyTorch: An Imperative Style, High-Performance Deep Learning Library}},
105
+ author = {Paszke, Adam and Gross, Sam and
106
+ others},
107
+ booktitle = {Proceedings of NeurIPS 2019},
108
+ pages = {8024--8035},
109
+ year = {2019}
110
+ }
111
+
112
+ @inproceedings{nguyen-etal-2009-building,
113
+ title = {{Building a Large Syntactically-Annotated Corpus of {V}ietnamese}},
114
+ author = "Nguyen, Phuong-Thai and
115
+ Vu, Xuan-Luong and
116
+ Nguyen, Thi-Minh-Huyen and
117
+ Nguyen, Van-Hiep and
118
+ Le, Hong-Phuong",
119
+ booktitle = "Proceedings of {LAW}",
120
+ year = "2009",
121
+ pages = "182--185",
122
+ }
123
+
124
+ @inproceedings{li-etal-2018-joint-learning,
125
+ title = {{Joint Learning of {POS} and Dependencies for Multilingual {U}niversal {D}ependency Parsing}},
126
+ author = "Li, Zuchao and
127
+ He, Shexia and
128
+ Zhang, Zhuosheng and
129
+ Zhao, Hai",
130
+ booktitle = "Proceedings of the {C}o{NLL} 2018 Shared Task",
131
+ year = "2018",
132
+ pages = "65--73"
133
+ }
134
+
135
+ @article{Edmonds,
136
+ title={{Optimum Branchings}},
137
+ author={Jack Edmonds},
138
+ journal={Journal of Research of the National Bureau of Standards},
139
+ volume={71},
140
+ year = {1967},
141
+ pages={233--240}
142
+ }
143
+
144
+ @inproceedings{phobert,
145
+ title = {{PhoBERT: Pre-trained language models for Vietnamese}},
146
+ author = {Dat Quoc Nguyen and Anh Tuan Nguyen},
147
+ booktitle = "Findings of EMNLP 2020",
148
+ year = {2020},
149
+ pages = {1037--1042}
150
+ }
151
+
152
+ @inproceedings{kondratyuk-straka-2019-75,
153
+ title = {{75 Languages, 1 Model: Parsing {U}niversal {D}ependencies Universally}},
154
+ author = "Kondratyuk, Dan and
155
+ Straka, Milan",
156
+ booktitle = "Proceedings of EMNLP-IJCNLP",
157
+ year = "2019",
158
+ pages = "2779--2795"
159
+ }
160
+
161
+ @inproceedings{zhang-weiss-2016-stack,
162
+ title = "Stack-propagation: Improved Representation Learning for Syntax",
163
+ author = "Zhang, Yuan and
164
+ Weiss, David",
165
+ booktitle = "Proceedings of ACL",
166
+ year = "2016",
167
+ pages = "1557--1566",
168
+ }
169
+
170
+ @InProceedings{nguyenverspoorK18,
171
+ author = {Nguyen, Dat Quoc and Verspoor, Karin},
172
+ title = {{An Improved Neural Network Model for Joint {POS} Tagging and Dependency Parsing}},
173
+ booktitle = {Proceedings of the {CoNLL} 2018 Shared Task},
174
+ year = {2018},
175
+ pages = {81--91}
176
+ }
177
+
178
+ @InProceedings{NguyenALTA2019,
179
+ title = {{A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing}},
180
+ author = {Dat Quoc Nguyen},
181
+ booktitle = {Proceedings of ALTA},
182
+ year = {2019},
183
+ pages = {28--34}
184
+ }
185
+
186
+ @inproceedings{hashimoto-etal-2017-joint,
187
+ title = {{A Joint Many-Task Model: Growing a Neural Network for Multiple {NLP} Tasks}},
188
+ author = "Hashimoto, Kazuma and
189
+ Xiong, Caiming and
190
+ Tsuruoka, Yoshimasa and
191
+ Socher, Richard",
192
+ booktitle = "Proceedings of EMNLP",
193
+ year = "2017",
194
+ pages = "1923--1933"
195
+ }
196
+
197
+ @inproceedings{qi-etal-2020-stanza,
198
+ title = {{S}tanza: A Python Natural Language Processing Toolkit for Many Human Languages},
199
+ author = "Qi, Peng and
200
+ Zhang, Yuhao and
201
+ Zhang, Yuhui and
202
+ Bolton, Jason and
203
+ Manning, Christopher D.",
204
+ booktitle = "Proceedings of ACL: System Demonstrations",
205
+ year = "2020",
206
+ pages = "101--108"
207
+ }
208
+
209
+ @inproceedings{qi-etal-2018-universal,
210
+ title = "{U}niversal {D}ependency Parsing from Scratch",
211
+ author = "Qi, Peng and
212
+ Dozat, Timothy and
213
+ Zhang, Yuhao and
214
+ Manning, Christopher D.",
215
+ booktitle = "Proceedings of the {C}o{NLL} 2018 Shared Task",
216
+ year = "2018",
217
+ pages = "160--170"
218
+ }
219
+
220
+ @inproceedings{Lafferty:2001,
221
+ author = {Lafferty, John D. and McCallum, Andrew and Pereira, Fernando C. N.},
222
+ title = {{Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data}},
223
+ booktitle = {Proceedings of ICML},
224
+ year = {2001},
225
+ pages = {282--289}
226
+ }
227
+
228
+ @article{Wolf2019HuggingFacesTS,
229
+ title={{HuggingFace's Transformers: State-of-the-art Natural Language Processing}},
230
+ author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R{\'e}mi Louf and Morgan Funtowicz and Jamie Brew},
231
+ journal={arXiv preprint},
232
+ year={2019},
233
+ volume={arXiv:1910.03771}
234
+ }
235
+
236
+ @inproceedings{kudo-richardson-2018-sentencepiece,
237
+ title = {{{S}entence{P}iece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing}},
238
+ author = "Kudo, Taku and
239
+ Richardson, John",
240
+ booktitle = "Proceedings of EMNLP: System Demonstrations",
241
+ year = "2018",
242
+ pages = "66--71"
243
+ }
244
+
245
+ @inproceedings{wang-etal-2019-tree,
246
+ title = {{Tree Transformer: Integrating Tree Structures into Self-Attention}},
247
+ author = "Wang, Yaushian and
248
+ Lee, Hung-Yi and
249
+ Chen, Yun-Nung",
250
+ booktitle = "Proceedings of EMNLP-IJCNLP",
251
+ year = "2019",
252
+ pages = "1061--1070",
253
+ }
254
+
255
+ @incollection{NIPS2017_7181,
256
+ title = {{Attention is All you Need}},
257
+ author = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
258
+ booktitle = {Advances in Neural Information Processing Systems 30},
259
+ pages = {5998--6008},
260
+ year = {2017},
261
+ }
262
+
263
+
264
+ @inproceedings{jawahar-etal-2019-bert,
265
+ title = "What Does {BERT} Learn about the Structure of Language?",
266
+ author = "Jawahar, Ganesh and
267
+ Sagot, Beno{\^\i}t and
268
+ Seddah, Djam{\'e}",
269
+ booktitle = "Proceedings of ACL",
270
+ year = "2019",
271
+ pages = "3651--3657"
272
+ }
273
+
274
+ @inproceedings{hewitt-manning-2019-structural,
275
+ title = "{A} Structural Probe for Finding Syntax in Word Representations",
276
+ author = "Hewitt, John and
277
+ Manning, Christopher D.",
278
+ booktitle = "Proceedings of NAACL",
279
+ year = "2019",
280
+ pages = "4129--4138"
281
+ }
282
+
283
+ @inproceedings{ma-etal-2018-stack,
284
+ title = {{Stack-Pointer Networks for Dependency Parsing}},
285
+ author = "Ma, Xuezhe and
286
+ Hu, Zecong and
287
+ Liu, Jingzhou and
288
+ Peng, Nanyun and
289
+ Neubig, Graham and
290
+ Hovy, Eduard",
291
+ booktitle = "Proceedings of ACL",
292
+ year = "2018",
293
+ pages = "1403--1414"
294
+ }
295
+
296
+ @article{lewis2019mlqa,
297
+ title={{MLQA: Evaluating Cross-lingual Extractive Question Answering}},
298
+ author={Lewis, Patrick and O{\u{g}}uz, Barlas and Rinott, Ruty and Riedel, Sebastian and Schwenk, Holger},
299
+ journal={arXiv preprint},
300
+ volume={arXiv:1910.07475},
301
+ year={2019}
302
+ }
303
+
304
+ @InProceedings{Nguyen2014NLDB,
305
+ author = {Nguyen, Dat Quoc and Nguyen, Dai Quoc and Pham, Son Bao and Nguyen, Phuong-Thai and Nguyen, Minh Le},
306
+ title = {{From Treebank Conversion to Automatic Dependency Parsing for Vietnamese}},
307
+ booktitle = {{Proceedings of NLDB}},
308
+ year = {2014},
309
+ pages = {196--207}
310
+ }
311
+
312
+
313
+ @InProceedings{N18-1101,
314
+ author = "Williams, Adina
315
+ and Nangia, Nikita
316
+ and Bowman, Samuel",
317
+ title = {{A Broad-Coverage Challenge Corpus for
318
+ Sentence Understanding through Inference}},
319
+ booktitle = "Proceedings of NAACL",
320
+ year = "2018",
321
+ pages = "1112--1122",
322
+ }
323
+
324
+ @inproceedings{rajpurkar-etal-2016-squad,
325
+ title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
326
+ author = "Rajpurkar, Pranav and
327
+ Zhang, Jian and
328
+ Lopyrev, Konstantin and
329
+ Liang, Percy",
330
+ booktitle = "Proceedings of EMNLP",
331
+ year = "2016",
332
+ pages = "2383--2392",
333
+ }
334
+
335
+ @inproceedings{DozatM17,
336
+ author = {Timothy Dozat and
337
+ Christopher D. Manning},
338
+ title = {{Deep Biaffine Attention for Neural Dependency Parsing}},
339
+ booktitle = {Proceedings of ICLR},
340
+ year = {2017}
341
+ }
342
+
343
+ @inproceedings{DinhQuangThang2008,
344
+ author = {Dinh Quang Thang and Phuong, Le Hong and Nguyen Thi Minh Huyen and Tu, Nguyen Cam and Rossignol, Mathias and Luong, Vu Xuan},
345
+ booktitle = {Proceedings of LREC},
346
+ pages = {1933--1936},
347
+ title = {{Word segmentation of Vietnamese texts: a comparison of approaches}},
348
+ year = {2008}
349
+ }
350
+
351
+
352
+ @inproceedings{peters-etal-2018-deep,
353
+ title = {{Deep Contextualized Word Representations}},
354
+ author = "Peters, Matthew and
355
+ Neumann, Mark and
356
+ Iyyer, Mohit and
357
+ Gardner, Matt and
358
+ Clark, Christopher and
359
+ Lee, Kenton and
360
+ Zettlemoyer, Luke",
361
+ booktitle = "Proceedings of NAACL",
362
+ year = "2018",
363
+ pages = "2227--2237"
364
+ }
365
+
366
+ @article{abs-1906-08101,
367
+ author = {Yiming Cui and
368
+ Wanxiang Che and
369
+ Ting Liu and
370
+ Bing Qin and
371
+ Ziqing Yang and
372
+ Shijin Wang and
373
+ Guoping Hu},
374
+ title = {{Pre-Training with Whole Word Masking for Chinese BERT}},
375
+ journal={arXiv preprint},
376
+ volume = {arXiv:1906.08101},
377
+ year = {2019}
378
+ }
379
+
380
+ @inproceedings{le2019flaubert,
381
+ title={{FlauBERT: Unsupervised Language Model Pre-training for French}},
382
+ author={Hang Le and Lo\"ic Vial and others},
383
+ booktitle = {Proceedings of LREC},
384
+ year = "2020",
385
+ pages = {2479--2490}
386
+ }
387
+
388
+ @inproceedings{conneau2019unsupervised,
389
+ title={{Unsupervised Cross-lingual Representation Learning at Scale}},
390
+ author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
391
+ booktitle = {Proceedings of ACL},
392
+ year = "2020",
393
+ pages = {8440--8451},
394
+ url={https://arxiv.org/pdf/1911.02116v1.pdf}
395
+ }
396
+
397
+ @inproceedings{vu-xuan-etal-2019-etnlp,
398
+ title = "{ETNLP}: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Downstream Task",
399
+ author = "Vu, Xuan-Son and
400
+ Vu, Thanh and
401
+ Tran, Son and
402
+ Jiang, Lili",
403
+ booktitle = "Proceedings of RANLP",
404
+ year = "2019",
405
+ pages = "1285--1294"
406
+ }
407
+
408
+ @inproceedings{nguyen-2019-neural,
409
+ title = "A neural joint model for {V}ietnamese word segmentation, {POS} tagging and dependency parsing",
410
+ author = "Nguyen, Dat Quoc",
411
+ booktitle = "Proceedings of ALTA",
412
+ year = "2019",
413
+ pages = "28--34",
414
+ }
415
+
416
+
417
+ @inproceedings{nguyen-etal-2017-word,
418
+ title = "From Word Segmentation to {POS} Tagging for {V}ietnamese",
419
+ author = "Nguyen, Dat Quoc and
420
+ Vu, Thanh and
421
+ Nguyen, Dai Quoc and
422
+ Dras, Mark and
423
+ Johnson, Mark",
424
+ booktitle = "Proceedings of ALTA",
425
+ year = "2017",
426
+ pages = "108--113",
427
+ }
428
+
429
+ @inproceedings{nguyen-verspoor-2018-improved,
430
+ title = "An Improved Neural Network Model for Joint {POS} Tagging and Dependency Parsing",
431
+ author = "Nguyen, Dat Quoc and
432
+ Verspoor, Karin",
433
+ booktitle = "Proceedings of the {C}o{NLL} 2018 Shared Task",
434
+ year = "2018",
435
+ pages = "81--91"
436
+ }
437
+
438
+ @inproceedings{ma-hovy-2016-end,
439
+ title = "End-to-end Sequence Labeling via Bi-directional {LSTM}-{CNN}s-{CRF}",
440
+ author = "Ma, Xuezhe and
441
+ Hovy, Eduard",
442
+ booktitle = "Proceedings of ACL",
443
+ year = "2016",
444
+ pages = "1064--1074",
445
+ }
446
+
447
+ @inproceedings{nguyen-etal-2014-rdrpostagger,
448
+ title = {{RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger}},
449
+ author = "Nguyen, Dat Quoc and
450
+ Nguyen, Dai Quoc and
451
+ Pham, Dang Duc and
452
+ Pham, Son Bao",
453
+ booktitle = "Proceedings of the Demonstrations at EACL",
454
+ year = "2014",
455
+ pages = "17--20"
456
+ }
457
+
458
+ @inproceedings{sennrich-etal-2016-neural,
459
+ title = {{Neural Machine Translation of Rare Words with Subword Units}},
460
+ author = "Sennrich, Rico and
461
+ Haddow, Barry and
462
+ Birch, Alexandra",
463
+ booktitle = "Proceedings of ACL",
464
+ year = "2016",
465
+ pages = "1715--1725",
466
+ }
467
+
468
+ @inproceedings{conneau-etal-2018-xnli,
469
+ title = "{XNLI}: Evaluating Cross-lingual Sentence Representations",
470
+ author = "Alexis Conneau and Ruty Rinott and Guillaume Lample and Holger Schwenk and Ves Stoyanov and Adina Williams and Samuel R. Bowman",
471
+ booktitle = "Proceedings of EMNLP",
472
+ year = "2018",
473
+ pages = "2475--2485"
474
+ }
475
+
476
+ @article{ArtetxeS19,
477
+ author = {Mikel Artetxe and
478
+ Holger Schwenk},
479
+ title = {{Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual
480
+ Transfer and Beyond}},
481
+ journal = {{TACL}},
482
+ volume = {7},
483
+ pages = {597--610},
484
+ year = {2019}
485
+ }
486
+
487
+ @inproceedings{NIPS2019_8928,
488
+ title = {{Cross-lingual Language Model Pretraining}},
489
+ author = {Conneau, Alexis and Lample, Guillaume},
490
+ booktitle = {Proceedings of NeurIPS},
491
+ pages = {7059--7069},
492
+ year = {2019},
493
+ }
494
+
495
+
496
+ @inproceedings{wu-dredze-2019-beto,
497
+ title = "Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of {BERT}",
498
+ author = "Wu, Shijie and
499
+ Dredze, Mark",
500
+ booktitle = "Proceedings of EMNLP-IJCNLP",
501
+ year = "2019",
502
+ pages = "833--844"
503
+ }
504
+
505
+ @inproceedings{2019arXiv191103894M,
506
+ author = {{Martin}, Louis and {Muller}, Benjamin and
507
+ {Ortiz Su{\'a}rez}, Pedro Javier and {Dupont}, Yoann and
508
+ {Romary}, Laurent and {Villemonte de la Clergerie}, {\'E}ric and
509
+ {Seddah}, Djam{\'e} and {Sagot}, Beno{\^\i}t},
510
+ title = "{CamemBERT: a Tasty French Language Model}",
511
+ booktitle = {Proceedings of ACL},
512
+ year = "2020",
513
+ pages = {7203--7219}
514
+ }
515
+
516
+ @ARTICLE{vries2019bertje,
517
+ title={{BERTje: A Dutch BERT Model}},
518
+ author={Wietse de Vries and Andreas van Cranenburgh and Arianna Bisazza and Tommaso Caselli and Gertjan van Noord and Malvina Nissim},
519
+ year={2019},
520
+ volume={arXiv:1912.09582},
521
+ journal={arXiv preprint}
522
+ }
523
+
524
+ @INPROCEEDINGS{loshchilov2018decoupled,
525
+ title={{Decoupled Weight Decay Regularization}},
526
+ author={Ilya Loshchilov and Frank Hutter},
527
+ booktitle={Proceedings of ICLR},
528
+ year={2019},
529
+ }
530
+
531
+ @INPROCEEDINGS{8713740,
532
+ author={Kim Anh Nguyen and Ngan Dong and Cam-Tu Nguyen},
533
+ booktitle={Proceedings of RIVF},
534
+ title={{Attentive Neural Network for Named Entity Recognition in Vietnamese}},
535
+ year={2019}
536
+ }
537
+
538
+ @article{JCC13161,
539
+ author = {Huyen Nguyen and Quyen Ngo and Luong Vu and Vu Tran and Hien Nguyen},
540
+ title = {{VLSP Shared Task: Named Entity Recognition}},
541
+ journal = {Journal of Computer Science and Cybernetics},
542
+ volume = {34},
543
+ number = {4},
544
+ year = {2019},
545
+ pages = {283--294}
546
+ }
547
+
548
+ @article{JCC13163,
549
+ author = {Minh Quang Nhat Pham},
550
+ title = {{A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018 NER Evaluation Campaign}},
551
+ journal = {Journal of Computer Science and Cybernetics},
552
+ volume = {34},
553
+ number = {4},
554
+ year = {2019},
555
+ pages = {311--321},
556
+ }
557
+
558
+ @article{KingmaB14,
559
+ author = {Diederik P. Kingma and
560
+ Jimmy Ba},
561
+ title = {{Adam: {A} Method for Stochastic Optimization}},
562
+ journal = {arXiv preprint},
563
+ volume = {arXiv:1412.6980},
564
+ year = {2014}
565
+ }
566
+
567
+ @inproceedings{ott2019fairseq,
568
+ title = {{fairseq: A Fast, Extensible Toolkit for Sequence Modeling}},
569
+ author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
570
+ booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
571
+ year = {2019},
572
+ pages={48--53}
573
+ }
574
+
575
+ @inproceedings{devlin-etal-2019-bert,
576
+ title = {{BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding},
577
+ author = "Devlin, Jacob and
578
+ Chang, Ming-Wei and
579
+ Lee, Kenton and
580
+ Toutanova, Kristina",
581
+ booktitle = "Proceedings of NAACL",
582
+ year = "2019",
583
+ pages = "4171--4186",
584
+ }
585
+
586
+ @article{RoBERTa,
587
+ author = {Yinhan Liu and
588
+ Myle Ott and
589
+ Naman Goyal and
590
+ Jingfei Du and
591
+ Mandar Joshi and
592
+ Danqi Chen and
593
+ Omer Levy and
594
+ Mike Lewis and
595
+ Luke Zettlemoyer and
596
+ Veselin Stoyanov},
597
+ title = {{RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach}},
598
+ journal = {arXiv preprint},
599
+ volume = {arXiv:1907.11692},
600
+ year = {2019}
601
+ }
602
+
603
+ @inproceedings{nguyen-etal-2018-fast,
604
+ title = {{A Fast and Accurate Vietnamese Word Segmenter}},
605
+ author = "Nguyen, Dat Quoc and
606
+ Nguyen, Dai Quoc and
607
+ Vu, Thanh and
608
+ Dras, Mark and
609
+ Johnson, Mark",
610
+ booktitle = "Proceedings of LREC",
611
+ year = "2018",
612
+ pages = "2582--2587"
613
+ }
614
+
615
+ @inproceedings{vu-etal-2018-vncorenlp,
616
+ title = {{VnCoreNLP: A Vietnamese Natural Language Processing Toolkit}},
617
+ author = "Vu, Thanh and
618
+ Nguyen, Dat Quoc and
619
+ Nguyen, Dai Quoc and
620
+ Dras, Mark and
621
+ Johnson, Mark",
622
+ booktitle = "Proceedings of NAACL: Demonstrations",
623
+ year = "2018",
624
+ pages = "56--60"
625
+ }
references/README.md ADDED
@@ -0,0 +1,43 @@
1
+ # References
2
+
3
+ Reference materials for Vietnamese POS Tagger (TRE-1).
4
+
5
+ ## Papers
6
+
7
+ | Folder | Title | Authors | Year |
8
+ |--------|-------|---------|------|
9
+ | [2001.icml.lafferty](2001.icml.lafferty/) | Conditional Random Fields | Lafferty, McCallum, Pereira | 2001 |
10
+ | [2014.eacl.nguyen](2014.eacl.nguyen/) | RDRPOSTagger | Nguyen et al. | 2014 |
11
+ | [2018.naacl.vu](2018.naacl.vu/) | VnCoreNLP | Vu, Nguyen et al. | 2018 |
12
+ | [2020.emnlp.nguyen](2020.emnlp.nguyen/) | PhoBERT | Nguyen & Nguyen | 2020 |
13
+ | [2021.naacl.nguyen](2021.naacl.nguyen/) | PhoNLP | Nguyen & Nguyen | 2021 |
14
+
15
+ Each paper folder contains:
16
+ - `paper.md` - Markdown with YAML front matter (for LLM/RAG)
17
+ - `paper.tex` - LaTeX source (original from arXiv or generated)
18
+ - `paper.pdf` - PDF file
19
+ - `source/` - Full arXiv source (if available)
20
+
21
+ ## Resources
22
+
23
+ | File | Title | Type |
24
+ |------|-------|------|
25
+ | [universal_dependencies.md](universal_dependencies.md) | Universal Dependencies | Annotation Framework |
26
+ | [underthesea.md](underthesea.md) | Underthesea | Vietnamese NLP Toolkit |
27
+ | [python_crfsuite.md](python_crfsuite.md) | python-crfsuite | CRF Library |
28
+
29
+ ## Research Notes
30
+
31
+ | Folder | Description |
32
+ |--------|-------------|
33
+ | [research_vietnamese_pos](research_vietnamese_pos/) | Literature review on Vietnamese POS tagging |
34
+
35
+ ## Vietnamese POS Tagging Benchmarks
36
+
37
+ | Model | Dataset | Accuracy | Year |
38
+ |-------|---------|----------|------|
39
+ | PhoNLP | VLSP 2013 | 96.91% | 2021 |
40
+ | PhoBERT-large | VLSP 2013 | 96.8% | 2020 |
41
+ | VnMarMoT | VLSP 2013 | 95.88% | 2018 |
42
+ | **TRE-1** | **UDD-1** | **95.89%** | **2026** |
43
+ | RDRPOSTagger | VLSP 2013 | 95.11% | 2014 |
references/python_crfsuite.md ADDED
@@ -0,0 +1,131 @@
1
+ ---
2
+ title: "python-crfsuite"
3
+ type: "resource"
4
+ url: "https://github.com/scrapinghub/python-crfsuite"
5
+ ---
6
+
7
+ ## Overview
8
+
9
+ python-crfsuite provides Python bindings for the CRFsuite conditional random field toolkit, enabling efficient sequence labeling in Python.
10
+
11
+ ## Key Information
12
+
13
+ | Field | Value |
14
+ |-------|-------|
15
+ | **GitHub** | https://github.com/scrapinghub/python-crfsuite |
16
+ | **PyPI** | https://pypi.org/project/python-crfsuite/ |
17
+ | **Documentation** | https://python-crfsuite.readthedocs.io/ |
18
+ | **License** | MIT (python-crfsuite), BSD (CRFsuite) |
19
+ | **Latest Version** | 0.9.12 (December 2025) |
20
+ | **Stars** | 771 |
21
+
22
+ ## Features
23
+
24
+ - **Fast Performance**: Faster than official SWIG wrapper
25
+ - **No External Dependencies**: CRFsuite bundled; NumPy/SciPy not required
26
+ - **Python 2 & 3 Support**: Works with both Python versions
27
+ - **Cython-based**: High-performance C++ bindings
28
+
29
+ ## Installation
30
+
31
+ ```bash
32
+ # Using pip
33
+ pip install python-crfsuite
34
+
35
+ # Using conda
36
+ conda install -c conda-forge python-crfsuite
37
+ ```
38
+
39
+ ## Usage
40
+
41
+ ### Training
42
+
43
+ ```python
44
+ import pycrfsuite
45
+
46
+ # Create trainer
47
+ trainer = pycrfsuite.Trainer(verbose=True)
48
+
49
+ # Add training data
50
+ for xseq, yseq in zip(X_train, y_train):
51
+     trainer.append(xseq, yseq)
52
+
53
+ # Set parameters
54
+ trainer.set_params({
55
+     'c1': 1.0,    # L1 regularization
56
+     'c2': 0.001,  # L2 regularization
57
+     'max_iterations': 100,
58
+     'feature.possible_transitions': True
59
+ })
60
+
61
+ # Train model
62
+ trainer.train('model.crfsuite')
63
+ ```
64
+
65
+ ### Inference
66
+
67
+ ```python
68
+ import pycrfsuite
69
+
70
+ # Load model
71
+ tagger = pycrfsuite.Tagger()
72
+ tagger.open('model.crfsuite')
73
+
74
+ # Predict
75
+ y_pred = tagger.tag(x_seq)
76
+ ```
77
+
78
+ ### Feature Format
79
+
80
+ Features are lists of strings in `name=value` format:
81
+
82
+ ```python
83
+ features = [
84
+     ['word=hello', 'pos=NN', 'is_capitalized=True'],
85
+     ['word=world', 'pos=NN', 'is_capitalized=False'],
86
+ ]
87
+ ```
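In practice, each token's feature list is produced by a small extraction function over a context window before being handed to the trainer or tagger. A minimal sketch (the feature names and the ±1-token window here are illustrative choices, not a prescribed template set):

```python
def word2features(sent, i):
    """Build the `name=value` feature list for token i of a token list."""
    word = sent[i]
    feats = [
        'bias',
        'word=' + word.lower(),
        'word.istitle=%s' % word.istitle(),
        'suffix3=' + word[-3:],
    ]
    # Context window of one token on each side, with boundary markers.
    feats.append('-1:word=' + sent[i - 1].lower() if i > 0 else 'BOS')
    feats.append('+1:word=' + sent[i + 1].lower() if i < len(sent) - 1 else 'EOS')
    return feats

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]
```

The output of `sent2features(sentence)` is exactly the shape `trainer.append(...)` expects at training time and `tagger.tag(...)` expects at inference time.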
88
+
89
+ ## Training Algorithms
90
+
91
+ | Algorithm | Description |
92
+ |-----------|-------------|
93
+ | `lbfgs` | Limited-memory BFGS (default) |
94
+ | `l2sgd` | SGD with L2 regularization |
95
+ | `ap` | Averaged Perceptron |
96
+ | `pa` | Passive Aggressive |
97
+ | `arow` | Adaptive Regularization of Weights |
98
+
99
+ ## Parameters (L-BFGS)
100
+
101
+ | Parameter | Default | Description |
102
+ |-----------|---------|-------------|
103
+ | `c1` | 0 | L1 regularization coefficient |
104
+ | `c2` | 1.0 | L2 regularization coefficient |
105
+ | `max_iterations` | unlimited | Maximum iterations |
106
+ | `num_memories` | 6 | Number of memories for L-BFGS |
107
+ | `epsilon` | 1e-5 | Convergence threshold |
108
+
109
+ ## Related Projects
110
+
111
+ - **sklearn-crfsuite**: Scikit-learn compatible wrapper
112
+ - **CRFsuite**: Original C++ implementation
113
+
114
+ ## Citation
115
+
116
+ ```bibtex
117
+ @misc{python-crfsuite,
118
+ author = {Scrapinghub},
119
+ title = {python-crfsuite: Python binding to CRFsuite},
120
+ year = {2014},
121
+ publisher = {GitHub},
122
+ url = {https://github.com/scrapinghub/python-crfsuite}
123
+ }
124
+
125
+ @misc{crfsuite,
126
+ author = {Okazaki, Naoaki},
127
+ title = {CRFsuite: A fast implementation of Conditional Random Fields},
128
+ year = {2007},
129
+ url = {http://www.chokkan.org/software/crfsuite/}
130
+ }
131
+ ```
references/research_vietnamese_pos/README.md ADDED
@@ -0,0 +1,145 @@
1
+ # Literature Review: Vietnamese POS Tagging
2
+
3
+ **Date**: 2026-01-31
4
+ **Project**: TRE-1 Vietnamese POS Tagger
5
+
6
+ ## Executive Summary
7
+
8
+ This literature review surveys the state of the art in Vietnamese part-of-speech (POS) tagging. The field has evolved from rule-based systems (RDRPOSTagger, 2014) through feature-based CRF models (VnCoreNLP/VnMarMoT, 2018) to transformer-based approaches (PhoBERT, 2020; PhoNLP, 2021). The current best result on the VLSP 2013 benchmark is 96.91% accuracy, achieved by the PhoBERT-based PhoNLP model.
9
+
10
+ ## Research Questions
11
+
12
+ - **RQ1**: What is the current state-of-the-art for Vietnamese POS tagging?
13
+ - **RQ2**: How do CRF-based methods compare to neural/transformer approaches?
14
+ - **RQ3**: What datasets and benchmarks exist for evaluation?
15
+ - **RQ4**: What are the remaining challenges and research gaps?
16
+
17
+ ## Methodology
18
+
19
+ - **Search sources**: ACL Anthology, Semantic Scholar, arXiv, VLSP
20
+ - **Search terms**: "Vietnamese POS tagging", "PhoBERT", "VnCoreNLP", "VLSP 2013"
21
+ - **Timeframe**: 2014-2026
22
+ - **Inclusion criteria**: Peer-reviewed papers on Vietnamese POS tagging
23
+
24
+ ## PRISMA Flow
25
+
26
+ - Records identified: 30+
27
+ - Duplicates removed: 5
28
+ - Records screened: 25
29
+ - Studies included: 8 key papers (5 fetched, 3 referenced)
30
+
31
+ ---
32
+
33
+ ## Findings
34
+
35
+ ### RQ1: State-of-the-Art Results
36
+
37
+ | Model | Year | Method | VLSP 2013 Acc | Notes |
38
+ |-------|------|--------|---------------|-------|
39
+ | **PhoNLP** | 2021 | PhoBERT + MTL | **96.91%** | Multi-task with NER, DP |
40
+ | PhoBERT-large | 2020 | Transformer | 96.8% | Pre-trained LM |
41
+ | vELECTRA | 2020 | ELECTRA | 96.77% | FPT.AI |
42
+ | PhoBERT-base | 2020 | Transformer | 96.7% | VinAI Research |
43
+ | VnMarMoT | 2018 | CRF | 95.88% | Part of VnCoreNLP |
44
+ | **TRE-1** | 2026 | CRF | 95.89%* | This work (UDD-1) |
45
+ | RDRPOSTagger | 2014 | Rules (RDR) | 95.11% | Ripple Down Rules |
46
+ | Neural BiLSTM-CRF | 2018 | BiLSTM-CRF | 93.52% | Without PLM |
47
+
48
+ *Note: TRE-1 evaluated on UDD-1, not VLSP 2013. Direct comparison limited.
49
+
50
+ ### RQ2: CRF vs Neural Methods
51
+
52
+ #### Traditional/CRF Approaches
53
+
54
+ **Strengths:**
55
+ - Fast inference (90K words/sec in Java)
56
+ - Interpretable feature templates
57
+ - Low resource requirements
58
+ - No GPU needed
59
+
60
+ **Limitations:**
61
+ - Manual feature engineering required
62
+ - Limited context window (typically ±2 tokens)
63
+ - Cannot leverage pre-trained embeddings
64
+
65
+ #### Transformer-Based Approaches
66
+
67
+ **Strengths:**
68
+ - Leverage large-scale pre-training (PhoBERT: 20GB Vietnamese text)
69
+ - Capture long-range dependencies
70
+ - State-of-the-art accuracy (+1-2% over CRF)
71
+ - Transfer learning benefits
72
+
73
+ **Limitations:**
74
+ - Require GPU for training/inference
75
+ - Slower inference
76
+ - Larger model size (135M-370M parameters)
77
+ - Need more training data to fine-tune effectively
78
+
79
+ ### RQ3: Datasets and Benchmarks
80
+
81
+ | Dataset | Sentences | Domain | Annotation | Access |
82
+ |---------|-----------|--------|------------|--------|
83
+ | **VLSP 2013** | 27,870 | News | Manual | Request |
84
+ | VietTreeBank | 10,000+ | Mixed | Manual | Research |
85
+ | **UDD-1** | 20,000 | Legal+News | Machine | HuggingFace |
86
+ | VnDT v1.1 | 3,000 | News | Manual | Research |
87
+
88
+ **Note on Data Leakage**: VLSP 2016 NER and VnDT sentences are included in VLSP 2013, causing potential leakage issues.
89
+
90
+ ### RQ4: Challenges and Research Gaps
91
+
92
+ 1. **Lexical Ambiguity**: Many Vietnamese words can be multiple POS (e.g., "năm" = NUM "five" or NOUN "year")
93
+
94
+ 2. **Word Segmentation Dependency**: POS tagging requires pre-segmented input; errors propagate
95
+
96
+ 3. **Domain Adaptation**: Models trained on news/legal may underperform on social media, conversational text
97
+
98
+ 4. **Rare Tags**: PART, X tags have limited samples, causing lower performance
99
+
100
+ 5. **Benchmark Standardization**: Multiple datasets with different tagsets make comparison difficult
101
+
102
+ ---
103
+
104
+ ## Related Work Synthesis
105
+
106
+ ### Rule-Based Methods
107
+
108
+ **RDRPOSTagger** (Nguyen et al., 2014) uses Ripple Down Rules to automatically learn transformation rules from training data. Achieves 95.11% on VLSP 2013. Fast and interpretable but requires manual feature selection.
109
+
110
+ ### Statistical/CRF Methods
111
+
112
+ **VnMarMoT** (part of VnCoreNLP, 2018) is a CRF-based tagger achieving 95.88% on VLSP 2013. Uses MarMoT architecture with hand-crafted features including word forms, prefixes/suffixes, and context windows.
113
+
114
+ **TRE-1** (this work) follows a similar CRF approach with 27 feature templates inspired by the underthesea library. It achieves 95.89% on the UDD-1 dataset.
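The window-based feature templates mentioned above can be sketched as follows (a hypothetical, simplified subset for illustration; this is not TRE-1's actual 27-template set):

```python
# Hypothetical, simplified window templates: each template is a tuple of
# relative token offsets whose surface forms are joined into one feature.
TEMPLATES = [(-2,), (-1,), (0,), (1,), (2,), (-1, 0), (0, 1)]

def apply_templates(tokens, i):
    """Expand every template at position i into a name=value feature string."""
    feats = []
    for offsets in TEMPLATES:
        parts = [tokens[i + o] if 0 <= i + o < len(tokens) else '<PAD>'
                 for o in offsets]
        name = 'T[' + ','.join(str(o) for o in offsets) + ']'
        feats.append(name + '=' + '|'.join(parts))
    return feats
```

For the segmented sentence `['Tôi', 'là', 'sinh_viên']`, position 1 yields features such as `T[0]=là` and `T[-1,0]=Tôi|là`, with `<PAD>` filling positions that fall outside the sentence.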
### Neural/Transformer Methods

**PhoBERT** (Nguyen & Nguyen, 2020) is a Vietnamese pre-trained language model based on the RoBERTa architecture, trained on 20GB of Vietnamese text. Fine-tuned for POS tagging, it achieves 96.8% on VLSP 2013.

**PhoNLP** (Nguyen & Nguyen, 2021) extends PhoBERT with multi-task learning for joint POS tagging, NER, and dependency parsing. It achieves a state-of-the-art 96.91% on VLSP 2013 by sharing representations across tasks.

---

## Recommendations for TRE-1

1. **Evaluate on VLSP 2013**: Enable direct comparison with prior work
2. **Consider PhoBERT Fine-tuning**: Could improve accuracy by ~1%
3. **Multi-domain Training**: Include social media data for robustness
4. **Error Analysis on Rare Tags**: Focus on improving PART, X, and DET

---

## References

1. Nguyen, D. Q., Nguyen, D. Q., Pham, D. D., & Pham, S. B. (2014). RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger. EACL. https://aclanthology.org/E14-2005/

2. Vu, T., Nguyen, D. Q., Nguyen, D. Q., Dras, M., & Johnson, M. (2018). VnCoreNLP: A Vietnamese Natural Language Processing Toolkit. NAACL. https://aclanthology.org/N18-5012/

3. Nguyen, D. Q., & Nguyen, A. T. (2020). PhoBERT: Pre-trained language models for Vietnamese. EMNLP Findings. https://aclanthology.org/2020.findings-emnlp.92/

4. Nguyen, L. T., & Nguyen, D. Q. (2021). PhoNLP: A joint multi-task learning model for Vietnamese POS tagging, NER and dependency parsing. NAACL. https://aclanthology.org/2021.naacl-demos.1/

5. VLSP 2013 POS Tagging Shared Task. https://vlsp.org.vn/vlsp2013/eval/ws-pos

6. NLP-progress Vietnamese. https://nlpprogress.com/vietnamese/vietnamese.html
references/research_vietnamese_pos/bibliography.bib ADDED
@@ -0,0 +1,171 @@
@inproceedings{nguyen-nguyen-2020-phobert,
  title = "{P}ho{BERT}: Pre-trained language models for {V}ietnamese",
  author = "Nguyen, Dat Quoc and Nguyen, Anh Tuan",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2020.findings-emnlp.92",
  doi = "10.18653/v1/2020.findings-emnlp.92",
  pages = "1037--1042",
}

@inproceedings{vu-etal-2018-vncorenlp,
  title = "{V}n{C}ore{NLP}: A {V}ietnamese Natural Language Processing Toolkit",
  author = "Vu, Thanh and Nguyen, Dat Quoc and Nguyen, Dai Quoc and Dras, Mark and Johnson, Mark",
  booktitle = "Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations",
  month = jun,
  year = "2018",
  address = "New Orleans, Louisiana",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/N18-5012",
  doi = "10.18653/v1/N18-5012",
  pages = "56--60",
}

@inproceedings{nguyen-etal-2014-rdrpostagger,
  title = "{RDRPOS}Tagger: A Ripple Down Rules-based Part-Of-Speech Tagger",
  author = "Nguyen, Dat Quoc and Nguyen, Dai Quoc and Pham, Dang Duc and Pham, Son Bao",
  booktitle = "Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics",
  month = apr,
  year = "2014",
  address = "Gothenburg, Sweden",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/E14-2005",
  doi = "10.3115/v1/E14-2005",
  pages = "17--20",
}

@inproceedings{nguyen-nguyen-2021-phonlp,
  title = "{P}ho{NLP}: A joint multi-task learning model for {V}ietnamese part-of-speech tagging, named entity recognition and dependency parsing",
  author = "Nguyen, Linh The and Nguyen, Dat Quoc",
  booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations",
  month = jun,
  year = "2021",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2021.naacl-demos.1",
  doi = "10.18653/v1/2021.naacl-demos.1",
  pages = "1--7",
}

@misc{vlsp2013,
  title = "{VLSP} 2013 {POS} Tagging Shared Task",
  author = "{VLSP}",
  year = "2013",
  url = "https://vlsp.org.vn/vlsp2013/eval/ws-pos",
}

@inproceedings{lafferty2001crf,
  title = "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data",
  author = "Lafferty, John D. and McCallum, Andrew and Pereira, Fernando C. N.",
  booktitle = "Proceedings of the Eighteenth International Conference on Machine Learning (ICML)",
  pages = "282--289",
  year = "2001",
  url = "https://dl.acm.org/doi/10.5555/645530.655813",
}

@inproceedings{bui2020velecra,
  title = "Improving Sequence Tagging for Vietnamese Text using Transformer-based Neural Models",
  author = "Bui, Tuan Viet and Tran, Oanh Thi Kim and Le-Hong, Phuong",
  booktitle = "Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation (PACLIC)",
  year = "2020",
  url = "https://github.com/fpt-corp/vELECTRA",
}

@inproceedings{ma-hovy-2016-end,
  title = "End-to-end Sequence Labeling via Bi-directional {LSTM}-{CNN}s-{CRF}",
  author = "Ma, Xuezhe and Hovy, Eduard",
  booktitle = "Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics",
  month = aug,
  year = "2016",
  url = "https://aclanthology.org/P16-1101",
  doi = "10.18653/v1/P16-1101",
  pages = "1064--1074",
}

@inproceedings{nguyen2018neural,
  title = "Neural Sequence Labeling for Vietnamese POS Tagging and NER",
  author = "Nguyen, Hoang Anh DU and Nguyen, Kiem Hieu and Van, Victor",
  booktitle = "2019 IEEE-RIVF International Conference on Computing and Communication Technologies",
  year = "2019",
  url = "https://arxiv.org/abs/1811.03754",
  doi = "10.1109/RIVF.2019.8713710",
}

@misc{universaldependencies,
  title = "Universal Dependencies",
  author = "{Universal Dependencies Contributors}",
  year = "2024",
  url = "https://universaldependencies.org/",
}

@misc{underthesea,
  title = "Underthesea: Vietnamese NLP Toolkit",
  author = "{Underthesea Team}",
  year = "2024",
  url = "https://github.com/undertheseanlp/underthesea",
}

@article{tran2021bartpho,
  title = "BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese",
  author = "Tran, Nguyen Luong and Le, Duong Minh and Nguyen, Dat Quoc",
  journal = "arXiv preprint arXiv:2109.09701",
  year = "2021",
  url = "https://arxiv.org/abs/2109.09701",
}

@inproceedings{phan2022vit5,
  title = "{V}i{T}5: Pretrained Text-to-Text Transformer for {V}ietnamese Language Generation",
  author = "Phan, Long and Tran, Hieu and Nguyen, Hieu and Trinh, Trieu H.",
  booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop",
  month = jul,
  year = "2022",
  address = "Seattle, Washington",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.naacl-srw.18",
  pages = "136--142",
}

@inproceedings{tran2023videberta,
  title = "{V}i{D}e{BERT}a: A powerful pre-trained language model for {V}ietnamese",
  author = "Tran, Cong Dao and Pham, Nhut Huy and Nguyen, Anh and Hy, Truong-Son",
  booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.findings-eacl.79",
  pages = "1071--1078",
}

@inproceedings{nguyen2023visobert,
  title = "{V}i{S}o{BERT}: A Pre-Trained Language Model for {V}ietnamese Social Media Text Processing",
  author = "Nguyen, Nam and Phan, Thang and Nguyen, Duc-Vu and Nguyen, Kiet",
  booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
  month = dec,
  year = "2023",
  address = "Singapore",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.emnlp-main.315",
  pages = "5191--5207",
}

@article{nguyen2023phogpt,
  title = "PhoGPT: Generative Pre-training for Vietnamese",
  author = "Nguyen, Dat Quoc and Nguyen, Linh The and Tran, Chi and Nguyen, Dung Ngoc and Phung, Dinh and Bui, Hung",
  journal = "arXiv preprint arXiv:2311.02945",
  year = "2023",
  url = "https://arxiv.org/abs/2311.02945",
}

@article{zheng2022vietnamese,
  title = "Deep Neural Networks Algorithm for Vietnamese Word Segmentation",
  author = "Zheng, Yi and others",
  journal = "Scientific Programming",
  volume = "2022",
  year = "2022",
  doi = "10.1155/2022/8187680",
  url = "https://doi.org/10.1155/2022/8187680",
}
references/research_vietnamese_pos/papers.md ADDED
@@ -0,0 +1,378 @@
# Paper Database: Vietnamese POS Tagging

## Paper 1: PhoBERT

- **Title**: PhoBERT: Pre-trained language models for Vietnamese
- **Authors**: Dat Quoc Nguyen, Anh Tuan Nguyen
- **Venue**: EMNLP Findings 2020
- **URL**: https://aclanthology.org/2020.findings-emnlp.92/
- **Citations**: 411+
- **Local**: [paper.pdf](../2020.emnlp.nguyen/paper.pdf)

### Summary
The first public large-scale monolingual language models pre-trained for Vietnamese. PhoBERT-base (135M params) and PhoBERT-large (370M params) are trained on 20GB of Vietnamese Wikipedia and news text using the RoBERTa architecture.

### Key Contributions
1. Release of pre-trained Vietnamese language models (PhoBERT-base, PhoBERT-large)
2. State-of-the-art results on 4 Vietnamese NLP tasks including POS tagging
3. Shows monolingual models outperform multilingual BERT for Vietnamese

### Methodology
- **Approach**: RoBERTa-style pre-training on Vietnamese text
- **Dataset**: 20GB Vietnamese text (Wikipedia + news)
- **Pre-training**: Masked language modeling, 40 epochs

### Results
| Task | Dataset | PhoBERT-base | PhoBERT-large |
|------|---------|--------------|---------------|
| POS Tagging | VLSP 2013 | 96.7% | 96.8% |
| NER | VLSP 2016 | 94.0% F1 | 94.5% F1 |

### Relevance to TRE-1
PhoBERT represents the transformer-based SOTA that TRE-1's CRF approach competes against. TRE-1 achieves 95.89% vs PhoBERT's 96.8%, a gap of ~1%.

---

## Paper 2: VnCoreNLP

- **Title**: VnCoreNLP: A Vietnamese Natural Language Processing Toolkit
- **Authors**: Thanh Vu, Dat Quoc Nguyen, Dai Quoc Nguyen, Mark Dras, Mark Johnson
- **Venue**: NAACL 2018 Demonstrations
- **URL**: https://aclanthology.org/N18-5012/
- **Citations**: 300+
- **Local**: [paper.pdf](../2018.naacl.vu/paper.pdf)

### Summary
Fast and accurate NLP pipeline for Vietnamese covering word segmentation, POS tagging, NER, and dependency parsing. The POS component (VnMarMoT) uses a CRF with the MarMoT architecture.

### Key Contributions
1. Integrated toolkit for the Vietnamese NLP pipeline
2. State-of-the-art results on standard benchmarks
3. Fast processing (8K-90K words/sec)

### Methodology
- **Approach**: CRF-based (MarMoT) for POS tagging
- **Features**: Word forms, prefixes, suffixes, context windows
- **Dataset**: VLSP 2013

### Results
| Task | Dataset | Accuracy/F1 |
|------|---------|-------------|
| Word Segmentation | - | 97.90% |
| POS Tagging | VLSP 2013 | 95.88% |
| NER | VLSP 2016 | 88.55% F1 |

### Relevance to TRE-1
VnMarMoT is the closest comparable CRF-based system. TRE-1 (95.89%) matches VnMarMoT's (95.88%) performance level using a similar feature-engineering approach.

---

## Paper 3: RDRPOSTagger

- **Title**: RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger
- **Authors**: Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham, Son Bao Pham
- **Venue**: EACL 2014 Demonstrations
- **URL**: https://aclanthology.org/E14-2005/
- **Citations**: 150+
- **Local**: [paper.pdf](../2014.eacl.nguyen/paper.pdf)

### Summary
Error-driven approach using Ripple Down Rules to automatically construct transformation rules for POS tagging. Language-independent, supporting 80+ languages including Vietnamese.

### Key Contributions
1. Novel error-driven rule learning approach
2. Fast training and inference
3. Multi-language support

### Methodology
- **Approach**: Single Classification Ripple Down Rules (SCRDR)
- **Learning**: Error-driven transformation rules
- **Features**: Template-based rules

### Results
| Language | Dataset | Accuracy |
|----------|---------|----------|
| English | Penn WSJ | 97.10% |
| Vietnamese | VietTreeBank | 95.11% |

### Relevance to TRE-1
RDRPOSTagger represents the earlier rule-based approach. TRE-1's CRF approach (95.89%) outperforms RDRPOSTagger (95.11%) by 0.78%.

---

## Paper 4: PhoNLP

- **Title**: PhoNLP: A joint multi-task learning model for Vietnamese POS tagging, NER and dependency parsing
- **Authors**: Linh The Nguyen, Dat Quoc Nguyen
- **Venue**: NAACL 2021 Demonstrations
- **URL**: https://aclanthology.org/2021.naacl-demos.1/
- **Citations**: 50+
- **Local**: [paper.pdf](../2021.naacl.nguyen/paper.pdf)

### Summary
First multi-task learning model for joint Vietnamese POS tagging, NER, and dependency parsing. Uses PhoBERT as the encoder with task-specific prediction layers.

### Key Contributions
1. Joint multi-task learning for Vietnamese NLP
2. Soft POS embeddings shared across tasks
3. State-of-the-art on all three tasks

### Methodology
- **Approach**: PhoBERT encoder + multi-task prediction heads
- **Architecture**: Shared encoder, task-specific CRF/Biaffine classifiers
- **Training**: Joint optimization on all tasks

### Results
| Task | Dataset | Score |
|------|---------|-------|
| POS Tagging | VLSP 2013 | 96.91% |
| NER | VLSP 2016 | 94.51% F1 |
| Dep Parsing | VnDT | 77.85% LAS |

### Relevance to TRE-1
PhoNLP represents the current SOTA (96.91%). TRE-1's CRF approach (95.89%) is 1.02% behind but requires no GPU and is significantly faster.

---

## Paper 5: CRF (Lafferty et al. 2001)

- **Title**: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
- **Authors**: John D. Lafferty, Andrew McCallum, Fernando C. N. Pereira
- **Venue**: ICML 2001
- **URL**: https://dl.acm.org/doi/10.5555/645530.655813
- **Citations**: 20,000+
- **Local**: [paper.pdf](../2001.icml.lafferty/paper.pdf)

### Summary
Foundational paper introducing Conditional Random Fields (CRFs) for sequence labeling. Addresses the label bias problem inherent in MEMMs and provides the theoretical foundation for discriminative sequence models.

### Key Contributions
1. CRF framework as an undirected graphical model for sequences
2. Solution to the label bias problem in directed models
3. Iterative parameter estimation algorithms

### Methodology
- **Approach**: Undirected graphical model with global normalization
- **Training**: Maximum conditional likelihood with L-BFGS
- **Inference**: Viterbi algorithm for the most likely sequence

### Relevance to TRE-1
This is the foundational algorithm used in TRE-1. Our implementation uses python-crfsuite with L-BFGS optimization, c1=1.0 (L1), c2=0.001 (L2).
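The Viterbi inference step can be made concrete with a toy decoder. The tag set and the transition/emission scores below are invented for illustration; in a real CRF they come from the learned feature weights:

```python
def viterbi(obs_scores, trans, tags):
    """Viterbi decoding: find the highest-scoring tag sequence given
    per-token emission scores and pairwise transition scores."""
    # best[t] = score of the best path ending in tag t at the current token
    best = {t: obs_scores[0][t] for t in tags}
    back = []
    for scores in obs_scores[1:]:
        prev_best, best, choices = best, {}, {}
        for t in tags:
            # Pick the best previous tag to transition from
            p = max(tags, key=lambda s: prev_best[s] + trans[(s, t)])
            best[t] = prev_best[p] + trans[(p, t)] + scores[t]
            choices[t] = p
        back.append(choices)
    # Recover the best path by walking the back-pointers
    last = max(tags, key=lambda t: best[t])
    path = [last]
    for choices in reversed(back):
        path.append(choices[path[-1]])
    return list(reversed(path))

tags = ["N", "V"]
trans = {("N", "N"): 0.0, ("N", "V"): 1.0, ("V", "N"): 1.0, ("V", "V"): -1.0}
# Emission scores for a 3-token sentence (toy values)
obs = [{"N": 2.0, "V": 0.0}, {"N": 0.0, "V": 2.0}, {"N": 2.0, "V": 0.0}]
print(viterbi(obs, trans, tags))  # ['N', 'V', 'N']
```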
---

## Paper 6: vELECTRA (Bui et al. 2020)

- **Title**: Improving Sequence Tagging for Vietnamese Text using Transformer-based Neural Models
- **Authors**: T. V. Bui, O. T. Tran, P. Le-Hong
- **Venue**: PACLIC 2020
- **URL**: https://github.com/fpt-corp/vELECTRA
- **Citations**: 50+

### Summary
Vietnamese ELECTRA model pre-trained on 60GB of text using the replaced token detection task. An alternative to PhoBERT with a different pre-training objective.

### Key Contributions
1. Vietnamese ELECTRA architecture
2. Pre-training on 60GB of Vietnamese text
3. Competitive results with PhoBERT

### Results
| Task | Dataset | Accuracy |
|------|---------|----------|
| POS Tagging | VLSP 2013 | 96.77% |

### Relevance to TRE-1
Another transformer baseline showing a ~1% gap over CRF approaches.

---

## Paper 7: Neural Sequence Labeling (Nguyen et al. 2018)

- **Title**: Neural Sequence Labeling for Vietnamese POS Tagging and NER
- **Authors**: DU Nguyen Hoang Anh, Hieu Nguyen Kiem, Victor Van
- **Venue**: RIVF 2019
- **arXiv**: https://arxiv.org/abs/1811.03754
- **Citations**: 13

### Summary
BiLSTM-CRF model with character and word embeddings for Vietnamese sequence labeling. A pre-PhoBERT-era neural approach.

### Results
| Task | Score |
|------|-------|
| POS Tagging | 93.52% |
| NER | 94.88% F1 |

### Relevance to TRE-1
Shows that without pre-trained LMs, neural approaches (93.52%) underperform a CRF with good features (95.89%).

---

## Paper 8: BiLSTM-CNN-CRF (Ma & Hovy 2016)

- **Title**: End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
- **Authors**: Xuezhe Ma, Eduard Hovy
- **Venue**: ACL 2016
- **arXiv**: https://arxiv.org/abs/1603.01354
- **Citations**: 5,000+

### Summary
Seminal neural sequence labeling architecture combining BiLSTM, CNN character embeddings, and a CRF layer. Established neural+CRF as the dominant approach before transformers.

### Results
| Task | Dataset | Score |
|------|---------|-------|
| POS Tagging | Penn Treebank | 97.55% |
| NER | CoNLL 2003 | 91.21% F1 |

### Relevance to TRE-1
The neural+CRF architecture this paper introduced was later adapted for Vietnamese (Paper 7). Shows a potential path for TRE-1 improvement.

---

## Paper 9: BARTpho (Tran et al. 2021)

- **Title**: BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese
- **Authors**: Nguyen Luong Tran, Duong Minh Le, Dat Quoc Nguyen
- **Venue**: arXiv 2021
- **arXiv**: https://arxiv.org/abs/2109.09701
- **Citations**: 100+

### Summary
First large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. Two versions: BARTpho-syllable and BARTpho-word. Uses the BART "large" architecture with denoising pre-training.

### Key Contributions
1. First Vietnamese seq2seq pre-trained models
2. Two tokenization strategies (syllable vs word)
3. SOTA on Vietnamese summarization and translation

### Relevance to TRE-1
A seq2seq architecture is not directly applicable to sequence labeling, but it demonstrates advances in Vietnamese pre-training.

---

## Paper 10: ViT5 (Phan et al. 2022)

- **Title**: ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation
- **Authors**: Long Phan, Hieu Tran, Hieu Nguyen, Trieu H. Trinh
- **Venue**: NAACL 2022 (Student Research Workshop)
- **URL**: https://aclanthology.org/2022.naacl-srw.18/
- **arXiv**: https://arxiv.org/abs/2205.06457
- **Citations**: 50+

### Summary
T5-style encoder-decoder model for Vietnamese. Base (310M) and large (866M) versions trained on the CC100 Vietnamese corpus.

### Results
| Task | Dataset | Score |
|------|---------|-------|
| Summarization | WikiLingua | SOTA |
| NER | - | Competitive |

### Relevance to TRE-1
A text-to-text formulation could be applied to sequence labeling, but this is not the typical approach.

---

## Paper 11: ViDeBERTa (Tran et al. 2023)

- **Title**: ViDeBERTa: A powerful pre-trained language model for Vietnamese
- **Authors**: Cong Dao Tran, Nhut Huy Pham, Anh Nguyen, Truong-Son Hy
- **Venue**: EACL 2023 Findings
- **URL**: https://aclanthology.org/2023.findings-eacl.79/
- **arXiv**: https://arxiv.org/abs/2301.10439
- **Citations**: 30+

### Summary
DeBERTa-based Vietnamese model with xsmall, base, and large versions, trained on CC100 Vietnamese. More parameter-efficient than PhoBERT.

### Key Results
| Model | Params | POS (VLSP 2013) |
|-------|--------|-----------------|
| ViDeBERTa_base | 86M | ~96.8%* |
| PhoBERT_large | 370M | 96.8% |

*ViDeBERTa_base achieves similar results with only 23% of PhoBERT_large's parameters.

### Relevance to TRE-1
Demonstrates efficiency gains in Vietnamese PLMs. Could be an alternative to PhoBERT for neural approaches.

---

## Paper 12: ViSoBERT (Nguyen et al. 2023)

- **Title**: ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing
- **Authors**: Nam Nguyen, Thang Phan, Duc-Vu Nguyen, Kiet Nguyen
- **Venue**: EMNLP 2023
- **URL**: https://aclanthology.org/2023.emnlp-main.315/
- **arXiv**: https://arxiv.org/abs/2310.11166
- **Citations**: 20+

### Summary
First pre-trained model specifically for Vietnamese social media. Uses the XLM-R architecture trained on a social media corpus.

### Key Contributions
1. Domain-specific pre-training for social media
2. SOTA on emotion recognition, hate speech, sentiment analysis
3. Addresses the domain gap in existing Vietnamese PLMs

### Relevance to TRE-1
Highlights the domain adaptation challenge. TRE-1 (legal/news) may underperform on social media text.

---

## Paper 13: PhoGPT (Nguyen et al. 2023)

- **Title**: PhoGPT: Generative Pre-training for Vietnamese
- **Authors**: Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui
- **Venue**: arXiv 2023
- **arXiv**: https://arxiv.org/abs/2311.02945
- **Citations**: 20+

### Summary
Open-source 4B-parameter generative model for Vietnamese: the PhoGPT-4B base model and the PhoGPT-4B-Chat fine-tuned version, trained on 102B Vietnamese tokens.

### Key Contributions
1. First large-scale Vietnamese GPT model
2. 8192 context length
3. Instruction-following capability (Chat variant)

### Relevance to TRE-1
Generative models are not directly applicable to sequence labeling, but represent the frontier of Vietnamese LLMs.

---

## Paper 14: Deep Neural Networks for Vietnamese Word Segmentation (Zheng et al. 2022)

- **Title**: Deep Neural Networks Algorithm for Vietnamese Word Segmentation
- **Authors**: Zheng et al.
- **Venue**: Scientific Programming 2022
- **DOI**: https://doi.org/10.1155/2022/8187680

### Summary
LSTM-based approach to Vietnamese word segmentation addressing combination and cross-ambiguity challenges.

### Relevance to TRE-1
Word segmentation is a prerequisite for POS tagging; errors in segmentation propagate to TRE-1.

---

## Comparison Summary

| Paper | Year | Method | VLSP 2013 Acc | Params | GPU Required |
|-------|------|--------|---------------|--------|--------------|
| RDRPOSTagger | 2014 | Rules | 95.11% | - | No |
| Neural Seq | 2018 | BiLSTM-CRF | 93.52% | ~10M | Yes |
| VnCoreNLP | 2018 | CRF | 95.88% | - | No |
| **TRE-1** | 2026 | CRF | 95.89%* | - | No |
| vELECTRA | 2020 | ELECTRA | 96.77% | 110M | Yes |
| PhoBERT-large | 2020 | RoBERTa | 96.8% | 370M | Yes |
| PhoNLP | 2021 | PhoBERT+MTL | **96.91%** | 135M+ | Yes |
| ViDeBERTa-base | 2023 | DeBERTa | ~96.8% | 86M | Yes |
| ViSoBERT | 2023 | XLM-R | - | 278M | Yes |
| PhoGPT | 2023 | GPT | - | 3.7B | Yes |

*TRE-1 evaluated on UDD-1, not VLSP 2013.
references/research_vietnamese_pos/sota.md ADDED
@@ -0,0 +1,95 @@
# State-of-the-Art: Vietnamese POS Tagging

**Last Updated**: 2026-01-31

## Current Best Results (VLSP 2013 Benchmark)

| Rank | Model | Year | Accuracy | Method | Params | Paper |
|------|-------|------|----------|--------|--------|-------|
| 1 | **PhoNLP** | 2021 | **96.91%** | PhoBERT + MTL | 135M+ | [Nguyen & Nguyen, 2021](https://aclanthology.org/2021.naacl-demos.1/) |
| 2 | ViDeBERTa-base | 2023 | ~96.8% | DeBERTa | 86M | [Tran et al., 2023](https://aclanthology.org/2023.findings-eacl.79/) |
| 3 | PhoBERT-large | 2020 | 96.8% | RoBERTa | 370M | [Nguyen & Nguyen, 2020](https://aclanthology.org/2020.findings-emnlp.92/) |
| 4 | vELECTRA | 2020 | 96.77% | ELECTRA | 110M | [Bui et al., 2020](https://github.com/fpt-corp/vELECTRA) |
| 5 | PhoBERT-base | 2020 | 96.7% | RoBERTa | 135M | [Nguyen & Nguyen, 2020](https://aclanthology.org/2020.findings-emnlp.92/) |
| 6 | VnMarMoT | 2018 | 95.88% | CRF | - | [Vu et al., 2018](https://aclanthology.org/N18-5012/) |
| 7 | RDRPOSTagger | 2014 | 95.11% | Rules (RDR) | - | [Nguyen et al., 2014](https://aclanthology.org/E14-2005/) |

## TRE-1 Position

| Model | Dataset | Accuracy | F1 (macro) |
|-------|---------|----------|------------|
| **TRE-1 v1.1** | UDD-1 | 95.89% | 92.71% |

**Note**: TRE-1 is evaluated on UDD-1 (20K sentences), not VLSP 2013. Direct comparison requires evaluation on the same benchmark.
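The gap between accuracy (95.89%) and macro F1 (92.71%) comes from rare tags: accuracy is dominated by frequent tags, while macro F1 averages per-tag F1 equally, so a few mis-tagged PART or X tokens pull it down. A toy computation (made-up tag sequences) shows the effect:

```python
def accuracy_and_macro_f1(gold, pred):
    """Token accuracy plus macro-averaged F1 over the gold tag set."""
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1s = []
    for tag in sorted(set(gold)):
        tp = sum(g == p == tag for g, p in zip(gold, pred))
        fp = sum(p == tag and g != tag for g, p in zip(gold, pred))
        fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return acc, sum(f1s) / len(f1s)

# The single PART token is mis-tagged: macro F1 drops far more than accuracy.
gold = ["N", "N", "N", "N", "V", "V", "V", "PART"]
pred = ["N", "N", "N", "N", "V", "V", "V", "N"]
acc, macro = accuracy_and_macro_f1(gold, pred)
print(round(acc, 3), round(macro, 3))  # 0.875 0.63
```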
## Trends

### 1. Shift to Pre-trained Models (2020+)
- PhoBERT established the transformer-based SOTA
- ~1% accuracy gain over CRF methods
- Trend toward multi-task learning (PhoNLP)

### 2. Efficiency Focus (2023+)
- ViDeBERTa achieves PhoBERT-level accuracy with 4x fewer parameters
- The trade-off between accuracy and computational cost is becoming important
- Edge deployment considerations

### 3. Domain Adaptation
- ViSoBERT (EMNLP 2023) targets the social media domain
- General models underperform on domain-specific text
- Need for domain-specific pre-training

### 4. Generative Models (2023+)
- PhoGPT (4B params) represents the Vietnamese LLM frontier
- Potential for zero-shot/few-shot sequence labeling
- Not yet competitive with fine-tuned discriminative models

### 5. CRF Still Competitive
- VnMarMoT (95.88%) is very close to TRE-1 (95.89%)
- Advantages: fast, no GPU, interpretable
- Gap to SOTA: ~1%

## Performance by Method Type

| Method Type | Best Model | VLSP 2013 | Params | Speed | Resources |
|-------------|------------|-----------|--------|-------|-----------|
| **Multi-task** | PhoNLP | 96.91% | 135M+ | Slow | GPU |
| **Efficient** | ViDeBERTa | ~96.8% | 86M | Medium | GPU |
| **Transformer** | PhoBERT-large | 96.8% | 370M | Slow | GPU |
| **CRF** | VnMarMoT, TRE-1 | 95.88% | - | Fast | CPU |
| **Rules** | RDRPOSTagger | 95.11% | - | Fast | CPU |

## Open Challenges

1. **Rare Tags**: PART and X tags still underperform
2. **Domain Transfer**: News-trained models struggle on social media
3. **Word Segmentation**: Errors propagate to POS tagging
4. **Benchmark Fragmentation**: Multiple datasets complicate comparison

## Recommended Baselines for New Work

1. **SOTA comparison**: PhoNLP (96.91%)
2. **Efficient neural**: ViDeBERTa-base (~96.8%, 86M params)
3. **CRF baseline**: VnCoreNLP/VnMarMoT (95.88%)
4. **Rule baseline**: RDRPOSTagger (95.11%)
5. **Domain-specific**: ViSoBERT (social media)

## Vietnamese Pre-trained Models (2020-2024)

| Model | Year | Architecture | Params | Focus |
|-------|------|--------------|--------|-------|
| PhoBERT | 2020 | RoBERTa | 135M/370M | General |
| vELECTRA | 2020 | ELECTRA | 110M | General |
| BARTpho | 2021 | BART | Large | Seq2Seq |
| ViT5 | 2022 | T5 | 310M/866M | Text-to-Text |
| ViDeBERTa | 2023 | DeBERTa | 86M/304M | Efficient |
| ViSoBERT | 2023 | XLM-R | 278M | Social Media |
| PhoGPT | 2023 | GPT | 3.7B | Generative |

## Resources

- **NLP-progress Vietnamese**: https://nlpprogress.com/vietnamese/vietnamese.html
- **Papers With Code**: https://paperswithcode.com/task/part-of-speech-tagging
- **VLSP Resources**: https://vlsp.org.vn/resources
- **VinAI Research**: https://research.vinai.io/
- **Hugging Face Vietnamese**: https://huggingface.co/models?language=vi
references/underthesea.md ADDED
@@ -0,0 +1,137 @@
---
title: "Underthesea: Vietnamese NLP Toolkit"
type: "resource"
url: "https://github.com/undertheseanlp/underthesea"
---

## Overview

Underthesea is an open-source Python library providing a suite of tools for Vietnamese natural language processing.

## Key Information

| Field | Value |
|-------|-------|
| **Website** | https://undertheseanlp.com/ |
| **GitHub** | https://github.com/undertheseanlp/underthesea |
| **Documentation** | https://undertheseanlp.github.io/underthesea/ |
| **PyPI** | https://pypi.org/project/underthesea/ |
| **License** | GPL-3.0 |
| **Language** | Python 3.6+ |

## Features

### Core NLP Tasks

| Task | Description |
|------|-------------|
| Sentence Segmentation | Split text into sentences |
| Text Normalization | Normalize Vietnamese text |
| Word Tokenization | Segment words (with compound word support) |
| POS Tagging | Part-of-speech tagging |
| Chunking | Phrase grouping |
| NER | Named Entity Recognition |
| Dependency Parsing | Syntactic dependency analysis |
| Text Classification | Document classification |
| Sentiment Analysis | Sentiment detection |

### Advanced Features

- Machine Translation (Vietnamese ↔ English)
- Language Detection
- Text-to-Speech
- Conversational AI Agent

## Installation

```bash
# Basic installation
pip install underthesea

# With deep learning support
pip install "underthesea[deep]"

# With text-to-speech
pip install "underthesea[voice]"

# With AI agent
pip install "underthesea[agent]"
```

## Usage Examples

### Word Segmentation

```python
from underthesea import word_tokenize

text = "Chàng trai 9X Quảng Trị khởi nghiệp từ nấm sò"
tokens = word_tokenize(text)
# ['Chàng trai', '9X', 'Quảng Trị', 'khởi nghiệp', 'từ', 'nấm sò']
```

### POS Tagging

```python
from underthesea import pos_tag

text = "Tôi yêu Việt Nam"
tagged = pos_tag(text)
# [('Tôi', 'P'), ('yêu', 'V'), ('Việt Nam', 'Np')]
```

### Named Entity Recognition

```python
from underthesea import ner

text = "Bộ Công Thương xóa một tổng cục"
entities = ner(text)
# [('Bộ Công Thương', 'B-ORG'), ...]
```

### Text Classification

```python
from underthesea import classify

text = "HLV đầu tiên ở Premier League bị sa thải"
category = classify(text)
100
+ # 'Thể thao'
101
+ ```
102
+
103
+ ## POS Tag Set
104
+
105
+ Underthesea uses Vietnamese-specific POS tags:
106
+
107
+ | Tag | Description |
108
+ |-----|-------------|
109
+ | N | Noun |
110
+ | V | Verb |
111
+ | A | Adjective |
112
+ | P | Pronoun |
113
+ | Np | Proper noun |
114
+ | E | Preposition |
115
+ | C | Conjunction |
116
+ | R | Adverb |
117
+ | M | Numeral |
118
+ | L | Determiner |
119
+ | T | Particle |
120
+ | X | Unknown |
121
+
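As a minimal illustration of working with this tag set, the `(word, tag)` pairs that `pos_tag` returns (shown in the example above) can be filtered by tag with plain Python. The grouping of "content" tags below is an assumption made for this sketch, not part of underthesea's API:

```python
# (word, tag) pairs in the shape returned by pos_tag,
# taken from the example output above.
tagged = [("Tôi", "P"), ("yêu", "V"), ("Việt Nam", "Np")]

# Open-class tags from the table above (grouping chosen for this example).
CONTENT_TAGS = {"N", "Np", "V", "A"}

content_words = [word for word, tag in tagged if tag in CONTENT_TAGS]
print(content_words)  # ['yêu', 'Việt Nam']
```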
+ ## Related Projects
+
+ - [underthesea-core](https://github.com/undertheseanlp/underthesea-core) - Core algorithms
+ - [NLP-Vietnamese-progress](https://github.com/undertheseanlp/NLP-Vietnamese-progress) - Vietnamese NLP benchmarks
+
+ ## Citation
+
+ ```bibtex
+ @misc{underthesea,
+   author = {Underthesea Team},
+   title = {Underthesea: Vietnamese NLP Toolkit},
+   year = {2018},
+   publisher = {GitHub},
+   url = {https://github.com/undertheseanlp/underthesea}
+ }
+ ```
references/universal_dependencies.md ADDED
@@ -0,0 +1,100 @@
+ ---
+ title: "Universal Dependencies"
+ type: "resource"
+ url: "https://universaldependencies.org/"
+ ---
+
+ ## Overview
+
+ Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages.
+
+ ## Key Information
+
+ | Field | Value |
+ |-------|-------|
+ | **Website** | https://universaldependencies.org/ |
+ | **Languages** | 150+ languages |
+ | **Treebanks** | 200+ treebanks |
+ | **Contributors** | 600+ contributors |
+ | **Latest Version** | 2.17 (November 2025) |
+ | **License** | Various (mostly CC-BY) |
+
+ ## Purpose
+
+ UD aims to facilitate:
+ - Cross-linguistic research in NLP
+ - Multilingual parser development
+ - Typological studies
+ - Language learning applications
+
+ ## Annotation Layers
+
+ ### 1. Universal POS Tags (UPOS)
+
+ | Tag | Description | Example |
+ |-----|-------------|---------|
+ | ADJ | Adjective | big, old |
+ | ADP | Adposition | in, to |
+ | ADV | Adverb | very, well |
+ | AUX | Auxiliary | is, has |
+ | CCONJ | Coordinating conjunction | and, or |
+ | DET | Determiner | the, a |
+ | INTJ | Interjection | oh, wow |
+ | NOUN | Noun | house, cat |
+ | NUM | Numeral | one, 2 |
+ | PART | Particle | not, 's |
+ | PRON | Pronoun | I, she |
+ | PROPN | Proper noun | John, Paris |
+ | PUNCT | Punctuation | . , ? |
+ | SCONJ | Subordinating conjunction | if, that |
+ | SYM | Symbol | $, % |
+ | VERB | Verb | run, eat |
+ | X | Other | - |
+
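Because UPOS is a closed inventory of exactly 17 tags, checking whether a tagger's output uses valid universal tags reduces to set membership. A small sketch in plain Python (names are illustrative):

```python
# The 17 universal POS tags from the table above.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

def is_valid_upos(tag: str) -> bool:
    """Return True if tag belongs to the closed UPOS inventory."""
    return tag in UPOS_TAGS

print(is_valid_upos("NOUN"))  # True
print(is_valid_upos("NN"))    # False: 'NN' is a language-specific XPOS tag
```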
+ ### 2. Morphological Features
+
+ - Case, Gender, Number, Person, Tense, Aspect, Mood, etc.
+
+ ### 3. Dependency Relations
+
+ - 37 universal syntactic relations (nsubj, obj, iobj, csubj, ccomp, xcomp, etc.)
+
+ ## CoNLL-U Format
+
+ Standard format for UD treebanks:
+
+ ```
+ # sent_id = 1
+ # text = They buy and sell books.
+ 1 They they PRON PRP _ 2 nsubj _ _
+ 2 buy buy VERB VBP _ 0 root _ _
+ 3 and and CCONJ CC _ 4 cc _ _
+ 4 sell sell VERB VBP _ 2 conj _ _
+ 5 books book NOUN NNS _ 2 obj _ _
+ 6 . . PUNCT . _ 2 punct _ _
+ ```
+
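The ten tab-separated columns above are easy to read programmatically. A minimal parsing sketch (illustrative only: it skips comment lines and does not handle multiword tokens, empty nodes, or multiple sentences):

```python
# The sentence above, with real tab separators between the 10 columns.
sample = """\
# sent_id = 1
# text = They buy and sell books.
1\tThey\tthey\tPRON\tPRP\t_\t2\tnsubj\t_\t_
2\tbuy\tbuy\tVERB\tVBP\t_\t0\troot\t_\t_
3\tand\tand\tCCONJ\tCC\t_\t4\tcc\t_\t_
4\tsell\tsell\tVERB\tVBP\t_\t2\tconj\t_\t_
5\tbooks\tbook\tNOUN\tNNS\t_\t2\tobj\t_\t_
6\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_
"""

def parse_conllu(text):
    """Parse one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens

tokens = parse_conllu(sample)
root = next(t for t in tokens if t["head"] == 0)
print(root["form"], root["upos"])  # buy VERB
```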
+ ## Vietnamese Treebanks
+
+ | Treebank | Sentences | Tokens |
+ |----------|-----------|--------|
+ | UD_Vietnamese-VTB | 3,000 | 43,754 |
+
+ ## Links
+
+ - **Website**: https://universaldependencies.org/
+ - **GitHub**: https://github.com/UniversalDependencies
+ - **Documentation**: https://universaldependencies.org/guidelines.html
+ - **Vietnamese UD**: https://universaldependencies.org/treebanks/vi_vtb/
+
+ ## BibTeX
+
+ ```bibtex
+ @inproceedings{nivre-etal-2020-universal,
+   title = "Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection",
+   author = "Nivre, Joakim and de Marneffe, Marie-Catherine and Ginter, Filip and others",
+   booktitle = "Proceedings of LREC 2020",
+   year = "2020",
+   pages = "4034--4043",
+ }
+ ```