Research Article

Development of Hindi-Punjabi Parallel Corpus Using Existing Hindi-Punjabi Machine Translation System and Using Sentence Alignments

by Vishal Goyal, Pardeep Kumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 5 - Number 9
Year of Publication: 2010
Authors: Vishal Goyal, Pardeep Kumar
10.5120/941-1319

Vishal Goyal, Pardeep Kumar. Development of Hindi-Punjabi Parallel Corpus Using Existing Hindi-Punjabi Machine Translation System and Using Sentence Alignments. International Journal of Computer Applications. 5, 9 (August 2010), 15-19. DOI=10.5120/941-1319

@article{ 10.5120/941-1319,
author = { Vishal Goyal and Pardeep Kumar },
title = { Development of Hindi-Punjabi Parallel Corpus Using Existing Hindi-Punjabi Machine Translation System and Using Sentence Alignments },
journal = { International Journal of Computer Applications },
issue_date = { August 2010 },
volume = { 5 },
number = { 9 },
month = { August },
year = { 2010 },
issn = { 0975-8887 },
pages = { 15-19 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume5/number9/941-1319/ },
doi = { 10.5120/941-1319 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:53:48.411603+05:30
%A Vishal Goyal
%A Pardeep Kumar
%T Development of Hindi-Punjabi Parallel Corpus Using Existing Hindi-Punjabi Machine Translation System and Using Sentence Alignments
%J International Journal of Computer Applications
%@ 0975-8887
%V 5
%N 9
%P 15-19
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this survey paper, we address the problem of "developing a Hindi-Punjabi parallel corpus using an existing Hindi-to-Punjabi machine translation system and sentence alignment". The alignment is based on length-based, location-based, and lexical techniques. We use the Hindi-Punjabi machine translation system available at h2p.learnpunjabi.org. These tasks are needed to build the Hindi-Punjabi parallel corpus. Sentence alignment is useful for developing both a Hindi-Punjabi parallel corpus and a Hindi-Punjabi dictionary. Accuracy depends mainly on the complexity of the corpus: the more complex the corpus, the lower the accuracy. Here, complexity refers to how sentences are distributed in the target file, that is, whether alignments of several categories (1:1, 1:2, 2:1, 1:3, 3:1) occur together within a paragraph. Our objective in this paper is to develop a Hindi-Punjabi parallel corpus using the latest existing techniques and methods, with high accuracy and time efficiency.
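The abstract names length-based alignment and the 1:1, 1:2, 2:1, 1:3, and 3:1 categories without showing how they fit together. Below is a minimal Python sketch of a length-based aligner in the style of Gale and Church (reference 4), restricted to those five categories. It is not the authors' implementation. The character-length model, the constants `C` and `S2`, the priors in `PRIORS`, and the names `length_cost` and `align` are all illustrative assumptions, not values estimated for Hindi-Punjabi text.

```python
import math

# Alignment categories from the abstract, as (source, target) sentence counts.
# ASSUMPTION: these prior probabilities are illustrative, not estimated for
# Hindi-Punjabi; they are only relative weights, so they need not sum to 1.
PRIORS = {(1, 1): 0.89, (1, 2): 0.045, (2, 1): 0.045, (1, 3): 0.005, (3, 1): 0.005}

C = 1.0   # ASSUMPTION: mean ratio of target characters to source characters
S2 = 6.8  # ASSUMPTION: variance per character (Gale-Church value for European pairs)


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(l1: int, l2: int) -> float:
    """-log P(delta | match), using the Gale-Church delta statistic."""
    delta = (l2 - l1 * C) / math.sqrt(max(l1, 1) * S2)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))
    return -math.log(max(p, 1e-300))  # floor p to avoid log(0)


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Return the cheapest sequence of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    # Prefix sums of character lengths, so a group's length is one subtraction.
    ps = [0]
    for s in src:
        ps.append(ps[-1] + len(s))
    pt = [0]
    for t in tgt:
        pt.append(pt[-1] + len(t))

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if i < di or j < dj or cost[i - di][j - dj] == inf:
                    continue
                c = (cost[i - di][j - dj] - math.log(prior)
                     + length_cost(ps[i] - ps[i - di], pt[j] - pt[j - dj]))
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    if cost[n][m] == inf:
        raise ValueError("no alignment uses only the allowed categories")

    # Follow back-pointers from (n, m) to (0, 0), then restore document order.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]
```

For example, with placeholder sentences of 40 and 30 characters on the source side and 40, 14, and 15 characters on the target side, `align(["x" * 40, "x" * 30], ["y" * 40, "y" * 14, "y" * 15])` returns `[([0], [0]), ([1], [1, 2])]`: one 1:1 bead followed by one 1:2 bead. Because the sketch allows only the five categories listed in the abstract, it has no 1:0 or 0:1 beads for omitted sentences, and it raises `ValueError` when the sentence counts cannot be covered by those categories. Practical aligners usually add deletion and insertion categories, and they may combine the length score with location or lexical evidence, for example from machine-translated output.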

References
  1. Bridget Thomson McInnes and Ted Pedersen, "The Duluth Word Alignment System," presented at the HLT-NAACL Workshop on Parallel Text, 2003.
  2. P. Brown, J. Lai, and R. Mercer, "Aligning Sentences in Parallel Corpora," 1991.
  3. D. Wu, "Aligning a Parallel English-Chinese Corpus Statistically with Lexical Criteria," in Proc. of the 32nd Annual Conference of the ACL, Las Cruces, NM, pp. 80-87, 1994. http://acl.ldc.upenn.edu/P/P94/P94-1012.pdf
  4. William A. Gale and Kenneth W. Church, "A Program for Aligning Sentences in Bilingual Corpora," AT&T Bell Laboratories, 1993.
  5. M. Kay and M. Röscheisen, "Text-Translation Alignment," Computational Linguistics, 19(1), pp. 121-142, 1993.
  6. John C. Henderson, "Sentence Alignment Baselines," HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven MT and Beyond, Edmonton, 2003.
  7. Weigang Li, Ting Liu, Zhen Wang, and Sheng Li, "Aligning Bilingual Corpora Using Sentences Location Information," in Proc. of the 3rd ACL SIGHAN Workshop, pp. 141-147, 2004.
  8. Zhonghua Xiao, Tony McEnery, Paul Baker, and Andrew Hardie, "Developing Asian Language Corpora: Standards and Practice," Department of Linguistics, Lancaster University, Lancaster.
Index Terms

Computer Science
Information Sciences

Keywords

Parallel Corpus, Hindi-Punjabi, Sentence Alignment, Length-based, Location-based