Unsupervised acquisition of a Markov model for word correction using Wikipedia

Rubén Dorado

Abstract


ONTARE. REVISTA DE INVESTIGACIÓN DE LA FACULTAD DE INGENIERÍA

This paper presents work in progress in the area of automatic acquisition of corpora for spelling correction. Wikipedia contains a large amount of information, including relationships between concepts and named annotations. It also contains linguistic information, such as misspellings introduced by its many contributors. We propose an efficient method that analyzes the link structure of Web-based dictionaries to build a list of misspelled words and their corrections. The method is still under development and is currently being applied to Wikipedia as a corpus.
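As a rough illustration of the idea of mining links for misspelling and correction pairs, the following Python sketch filters a list of (source title, target title) links and keeps those whose titles are single words within a small edit distance of each other. The abstract does not specify which links are used or how candidates are filtered, so the input format, the function names `edit_distance` and `candidate_corrections`, the `max_distance` threshold, and the sample data are all assumptions made for illustration, not the author's method.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidate_corrections(
    links: Iterable[Tuple[str, str]], max_distance: int = 2
) -> List[Tuple[str, str]]:
    """Keep (source, target) links that look like (misspelling, correction).

    Assumption: a link between two single-word titles that differ by a few
    character edits is a plausible misspelling pair. This heuristic is
    hypothetical and is not taken from the paper.
    """
    pairs = []
    for source, target in links:
        s, t = source.strip().lower(), target.strip().lower()
        if " " in s or " " in t:
            continue  # skip multi-word titles such as abbreviation expansions
        if 0 < edit_distance(s, t) <= max_distance:
            pairs.append((s, t))
    return pairs


# Hypothetical sample links, invented for this example.
example_links = [
    ("recieve", "receive"),
    ("goverment", "government"),
    ("NYC", "New York City"),
    ("automobile", "car"),
]
pairs = candidate_corrections(example_links)
```

A fixed edit-distance threshold is the simplest possible filter. It will miss misspellings that differ by many characters and will accept some valid word pairs that happen to be close, so any real pipeline would need further checks.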

 


Keywords


Wikipedia -- Spelling mistakes



DOI: https://doi.org/10.21158/23823399.v2.n2.2014.1246




Copyright (c) 2016 Revista Ontare

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

