Indian Language Technology Proliferation & Deployment Centre
भारतीय भाषा प्रौद्योगिकी प्रसरण एवं विस्तारण केंद्र
Refine search by:
Linguistic Resources
Handwritten Data
Document Image Corpora
Text Corpora
Named Entity Resources
Dictionary
Lexicon
Speech Corpora
The BIS standard "IS 16333 (Part 3)" defines the requirements for mobile handsets: text input in English, Hindi, and at least one additional official Indian language, along with message readability for all 22 official Indian languages. To help mobile manufacturers with internal verification and to check the effectiveness of language support, TDIL, together with CDAC-GIST, has prepared robust test data covering, for each relevant language, Consonants (C), Vowels (V), Numerals (N), Matras (M), Halant (H), Diacritics (D), and combinations of C, V, N, M, H, and D, along with word lists and sentences. The test data can be used to test text input and display on mobile handsets.
For best viewing, download the SakalBharati font.
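To illustrate the character categories named in the test data description above (C, V, N, M, H, D), here is a minimal Python sketch that labels each character of a Devanagari string with its category. The code point ranges, the `CATEGORY_RANGES` table, and the `classify` and `pattern` helpers are assumptions made for illustration, based on the Unicode Devanagari block; they are not taken from the TDIL test data or from IS 16333, and they cover only the core characters of one script.

```python
# Hypothetical category table for the core Devanagari block (U+0900-U+097F).
# The grouping below is an illustrative assumption, not the official
# definition used by the TDIL / CDAC-GIST test data.
CATEGORY_RANGES = [
    ("D", 0x0900, 0x0903),  # candrabindu, anusvara, visarga
    ("V", 0x0904, 0x0914),  # independent vowels
    ("C", 0x0915, 0x0939),  # consonants
    ("D", 0x093C, 0x093C),  # nukta
    ("M", 0x093E, 0x094C),  # dependent vowel signs (matras)
    ("H", 0x094D, 0x094D),  # virama (halant)
    ("C", 0x0958, 0x095F),  # additional consonants
    ("V", 0x0960, 0x0961),  # vocalic RR and LL
    ("M", 0x0962, 0x0963),  # vowel signs vocalic L and LL
    ("N", 0x0966, 0x096F),  # Devanagari digits
]


def classify(ch: str) -> str:
    """Return the category label for one character, or '?' if unlisted."""
    cp = ord(ch)
    for label, start, end in CATEGORY_RANGES:
        if start <= cp <= end:
            return label
    return "?"


def pattern(text: str) -> str:
    """Return the category label of every character in text, in order."""
    return "".join(classify(ch) for ch in text)


if __name__ == "__main__":
    # KA + VIRAMA + SSA, the conjunct "ksha"
    print(pattern("\u0915\u094d\u0937"))
```

A pattern string such as `CHC` (consonant, halant, consonant) makes it easy to check whether a word list actually exercises the combinations a handset is expected to input and display. Other scripts would need their own range tables.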
Under the Indian Languages Corpora Initiative Phase II (ILCI Phase-II) project, initiated by MeitY, Govt. of India, Jawaharlal Nehru University, New Delhi, collected a monolingual corpus in Telugu. This is the final outcome of the project and contains approx. 32,000 general-domain sentences. The translated sentences have been POS tagged according to the BIS (Bureau of Indian Standards) tagset. The corpus has the following features: unique ID, UTF-8 encoding, and text file format.
The Hindi Named Entity and Multi Word Expression List was developed in Unicode under the Sandhan (CLIA) Consortium. Sandhan is a monolingual search system for the tourism domain, and the provided resources were used in the work for translating queries.
The Hindi Translation and Transliteration Word List was developed in Unicode under the Sandhan (CLIA) Consortium. Sandhan is a monolingual search system for the tourism domain, and the provided resources were used in the work for translating queries.
The Gujarati Translation and Transliteration Word List was developed in Unicode under the Sandhan (CLIA) Consortium. Sandhan is a monolingual search system for the tourism domain, and the provided resources were used in the work for translating queries from nine Indian languages to English and Hindi.