arxiv:2501.07723

ESURF: Simple and Effective EDU Segmentation

Published on Jan 13, 2025

Authors:

Abstract

A simple random forest classification method using lexical and character n-gram features outperforms existing approaches for segmenting text into elemental discourse units, demonstrating the effectiveness of these features for discourse analysis.

AI-generated summary

Segmenting text into Elemental Discourse Units (EDUs) is a fundamental task in discourse parsing. We present a new simple method for identifying EDU boundaries, and hence segmenting them, based on lexical and character n-gram features, using random forest classification. We show that the method, despite its simplicity, outperforms other methods both for segmentation and within a state of the art discourse parser. This indicates the importance of such features for identifying basic discourse elements, pointing towards potentially more training-efficient methods for discourse analysis.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2501.07723

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2501.07723 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2501.07723 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2501.07723 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.