EDU-AI: Enhancing Data Retrieval and Dialogue with Retrieval-Augmented Generation and NLP Techniques

Additional Authors

Puskar Joshi, Department of Computing, College of Business and Technology, East Tennessee State University, Johnson City, TN
Gabriel Vieira Ramos, Department of Computing, College of Business and Technology, East Tennessee State University, Johnson City, TN
Dr. Shehenaz Shaik, Department of Computing, College of Business and Technology, East Tennessee State University, Johnson City, TN

Abstract

This study introduces EDU-AI (Educational Data Utility with AI), a platform designed to provide accurate, context-aware, and conversational responses to academic queries within institutions. It answers questions such as “What is the minimum GPA for graduate admission to ETSU’s CS program?” by combining real-time information retrieval with language generation, searching a large knowledge base and returning clear, concise responses in seconds. EDU-AI integrates a Retrieval-Augmented Generation (RAG) architecture with large language models (LLMs) and Natural Language Processing (NLP) techniques, including topic modeling, coreference resolution, similarity search, document ranking, query rewriting, and prompt engineering. To evaluate its effectiveness, EDU-AI was implemented as a prototype over publicly available web data from East Tennessee State University (ETSU), yielding an intelligent question-answering system for ETSU scholars. The system achieved 83% retrieval accuracy in document extraction from a pool of 5,000 documents, with an average response time of about 3 seconds, indicating reliable and timely responses. A perplexity score of 26, measured with GPT2LMHeadModel, indicates sensible, confident responses, while 88% coherence with ground truth, assessed using all-mpnet-base-v2, reflects human-like relevance. Qualitative feedback from ETSU focus groups praised the system’s accuracy, context-awareness, and inclusivity (e.g., voice command support), but noted areas for improvement: reducing context carryover, expanding domain coverage, and enhancing accessibility with multi-language support and mobile compatibility.
The study’s key contribution is improved information accessibility within academic environments through the development of a scalable, AI-driven platform for educational data retrieval and presentation, with inclusivity features. Future enhancements will focus on refining conversation management, broadening knowledge coverage, and expanding accessibility features.
Keywords: Artificial Intelligence, Retrieval-Augmented Generation (RAG), Education, Natural Language Processing (NLP), Large Language Model (LLM)
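
Illustrative sketch (not the authors' implementation): the abstract reports retrieval accuracy over a document pool but does not state how retrieval or the metric was implemented. The Python below shows one common reading, assuming similarity search by cosine similarity and accuracy defined as the fraction of queries whose expected document appears in the top-k results. The bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, and the function names, documents, and queries are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc_id: cosine(q, embed(documents[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def retrieval_accuracy(labelled_queries, documents, k=1):
    """Fraction of queries whose expected document appears in the top-k results."""
    hits = sum(
        expected in retrieve(query, documents, k)
        for query, expected in labelled_queries
    )
    return hits / len(labelled_queries)


# Hypothetical mini knowledge base and labelled queries (illustrative only).
DOCS = {
    "admissions": "minimum gpa required for graduate admission to the computing program",
    "housing": "on campus housing application deadlines and room rates",
    "library": "library opening hours and study room booking",
}
QUERIES = [
    ("what is the minimum gpa for graduate admission", "admissions"),
    ("when is the housing application deadline", "housing"),
    ("library study room hours", "library"),
]

if __name__ == "__main__":
    print(retrieval_accuracy(QUERIES, DOCS, k=1))
```

In a fuller pipeline, the retrieved passages would be placed into the LLM prompt; that generation step is omitted here because the abstract gives no detail about it.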

Start Time

16-4-2025 10:00 AM

End Time

16-4-2025 11:00 AM

Room Number

303

Presentation Type

Oral Presentation

Presentation Subtype

Grad/Comp Orals

Presentation Category

Science, Technology and Engineering

Faculty Mentor

Ahmad Al Doulat
