Under the Indian Languages Corpora Initiative (ILCI) project initiated by the MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi had collected corpus in Hindi as source language and translated it in Marathi as the target language. There are 70,000 sentences, including Health, Tourism, Agriculture and Entertainment domain in this corpus. This corpus has a unique sentence ID for each sentence, UTF-8 encoding, and text file format. The translated sentences have been POS tagged and Chunked properly. The chunking guideline used in this corpus creation, is provided in supporting document.
Added on December 27, 2019