Environmental monitoring generates vast quantities of sensor data distributed across heterogeneous sources: relational databases, time-series stores, and web APIs. Stakeholders ranging from policy makers to hydrologists need access to this data, yet traditional approaches require either technical expertise in SQL and programming or the mediation of data specialists. This project addresses the challenge of making complex water resource data accessible to diverse users through natural language, while investigating which LLM architectural patterns are best suited to different types of environmental queries.
Conducted as a fellowship at the University of Illinois Urbana-Champaign in collaboration with the Prairie Research Institute, the project developed Talking Aquifer, a natural language interface that lets stakeholders (from policy makers and water managers to farmers and researchers) query heterogeneous groundwater and climate data through conversational interaction. The platform integrates multiple data sources, including aquifer monitoring databases, climate observation networks, and real-time hydrological APIs, through a unified tool registry that abstracts differences in query languages and data formats.
A central contribution is the systematic evaluation of LLM pipeline architectures for environmental data access. The project developed and compared multiple architectural patterns, including single-agent function calling, code generation, multi-agent collaboration, and planning-based approaches, to understand which designs best serve different query types. The evaluation covered questions spanning multiple complexity levels, from simple lookups to multi-source analytical queries requiring cross-database joins and statistical computation.
Key findings reveal that simpler single-agent architectures often suffice for routine queries, while multi-agent and planning-based approaches provide measurable benefits only for complex multi-source analytical requests. The work provides a decision framework enabling practitioners to select appropriate architectures based on their data heterogeneity, stakeholder diversity, and accuracy-cost requirements.
The broader vision of the project is to develop a comprehensive water observatory that integrates diverse environmental data sources, from subsurface imaging and groundwater monitoring to satellite imagery and environmental sensors, into a cohesive knowledge platform. By combining heterogeneous data integration with LLM-powered natural language access, the project aims to democratise water resource data for interdisciplinary collaboration and evidence-based policy decisions that support sustainable water management, agricultural productivity, and environmental conservation.
Talking Aquifer is a natural language interface enabling diverse stakeholders to query heterogeneous environmental sensor data through conversational interaction. The system comprises a natural language frontend, a pipeline engine implementing multiple architectural patterns, a tool registry organised by data source and function type, and a data access layer providing unified interfaces to heterogeneous data sources.
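To make the idea of a tool registry organised by data source and function type more concrete, the following is a minimal Python sketch. It is not the project's implementation: the class names, the source labels ("aquifer_db", "climate_api"), the function-type labels ("lookup", "aggregate"), and the two stand-in tools with their placeholder return values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Tool:
    """A callable data-access function exposed to the LLM pipeline."""
    name: str
    source: str         # hypothetical source label, e.g. "aquifer_db"
    function_type: str  # hypothetical category, e.g. "lookup" or "aggregate"
    description: str
    run: Callable[..., object]


class ToolRegistry:
    """Indexes tools by name and filters them by source or function type."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self._tools[tool.name] = tool

    def find(self, source: Optional[str] = None,
             function_type: Optional[str] = None) -> List[Tool]:
        # A filter left as None matches every tool.
        return [
            t for t in self._tools.values()
            if (source is None or t.source == source)
            and (function_type is None or t.function_type == function_type)
        ]

    def call(self, name: str, **kwargs) -> object:
        return self._tools[name].run(**kwargs)


# Stand-in backends with placeholder values; a real tool would query
# a database or a web API here.
def latest_water_level(well_id: str) -> dict:
    return {"well_id": well_id, "depth_m": 12.4}


def mean_precipitation(station_id: str, days: int) -> dict:
    return {"station_id": station_id, "days": days, "mean_mm": 3.1}


registry = ToolRegistry()
registry.register(Tool("latest_water_level", "aquifer_db", "lookup",
                       "Most recent water level for a monitoring well.",
                       latest_water_level))
registry.register(Tool("mean_precipitation", "climate_api", "aggregate",
                       "Mean daily precipitation over a trailing window.",
                       mean_precipitation))
```

With such a registry, a pipeline could offer the LLM only the tools relevant to a question, for example `registry.find(source="aquifer_db")`, while every backend is invoked through the same `call` interface regardless of its query language or data format.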
Stakeholder Diversity. Environmental sensor data serves stakeholders with widely varying expertise and information needs. The platform is designed for a broad range of users: policy makers asking about long-term water level trends, research scientists requesting cross-source statistical correlations, farmers comparing soil moisture readings, and emergency managers assessing drought conditions. Each stakeholder type brings different information needs, query complexity levels, and technical expertise, so the interface must adapt its responses accordingly.
Architecture Evaluation. The project systematically evaluates multiple LLM pipeline variants spanning fundamental architectural patterns. Each variant processes queries through its specific pattern (function calling, code generation, multi-agent collaboration, or planning-based orchestration) and returns a unified response structure. The evaluation framework captures stakeholder diversity through ground-truth questions spanning multiple complexity levels, from simple spatial lookups to multi-source analytical queries requiring joins across databases and statistical computation. Results reveal that simpler single-agent architectures often suffice for routine queries, while multi-agent and planning-based approaches provide measurable benefits only for complex multi-source analytical requests. Query characteristics reliably predict the optimal architecture choice.
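One way to picture pipeline variants that share a unified response structure is sketched below in Python. The field names of `PipelineResponse`, the two stub variants, and the tool-call labels are hypothetical; the stubs return placeholder answers rather than calling an LLM, and nothing here reflects the project's actual code or results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineResponse:
    """Common result shape returned by every variant (fields are assumed)."""
    question: str
    architecture: str
    answer: str
    tool_calls: List[str] = field(default_factory=list)


# Each variant is a function from question text to a PipelineResponse.
Pipeline = Callable[[str], PipelineResponse]


def function_calling_pipeline(question: str) -> PipelineResponse:
    # Placeholder: a real variant would let the LLM pick and invoke one tool.
    return PipelineResponse(question, "function_calling",
                            answer="<stub answer>", tool_calls=["lookup_tool"])


def planning_pipeline(question: str) -> PipelineResponse:
    # Placeholder: a real variant would draft a plan, then execute each step.
    steps = ["plan", "query_source_a", "query_source_b", "combine"]
    return PipelineResponse(question, "planning",
                            answer="<stub answer>", tool_calls=steps)


PIPELINES: Dict[str, Pipeline] = {
    "function_calling": function_calling_pipeline,
    "planning": planning_pipeline,
}


def run_all(question: str) -> List[PipelineResponse]:
    """Run one question through every registered variant for comparison."""
    return [pipeline(question) for pipeline in PIPELINES.values()]


responses = run_all("What is the current water level at well W-001?")
```

Because every variant returns the same structure, an evaluation harness can score answers against ground truth and compare cost indicators such as the number of tool calls without knowing which architecture produced them.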
Design Guidelines. The project derives actionable, evidence-based design guidelines enabling practitioners building similar multi-stakeholder environmental data interfaces to select appropriate architectures based on their data heterogeneity, stakeholder diversity, and accuracy-cost requirements. These guidelines address a critical gap: while LLM agent architectures offer promising patterns for data access, practitioners have lacked empirical evidence to guide their architectural decisions.
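As a rough illustration of how such guidelines might be turned into a selection rule, the Python sketch below maps a few query characteristics to an architecture family. The `QueryProfile` fields and the rule itself are assumptions that only mirror the qualitative finding reported here (single-agent for routine queries, multi-agent or planning for complex multi-source analysis); they are not the project's published decision framework.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    """Hypothetical query characteristics, not the project's feature set."""
    num_sources: int        # distinct data sources the query touches
    needs_statistics: bool  # requires statistical computation
    needs_join: bool        # requires combining records across sources


def choose_architecture(profile: QueryProfile) -> str:
    """Illustrative rule of thumb, not an empirically derived threshold."""
    multi_source = profile.num_sources > 1
    analytical = profile.needs_statistics or profile.needs_join
    if multi_source and analytical:
        return "multi_agent_or_planning"
    return "single_agent"


# A simple lookup against one source stays on the cheaper single-agent path.
simple = choose_architecture(QueryProfile(1, False, False))
# A cross-source statistical question is routed to the heavier architectures.
complex_ = choose_architecture(QueryProfile(3, True, True))
```

In practice the cut-offs would need to be calibrated against measured accuracy and cost for the data sources and stakeholders at hand.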
Water Observatory Vision. Beyond the Talking Aquifer platform, the project’s broader vision encompasses developing a comprehensive water observatory that integrates diverse data sources — subsurface imaging, groundwater monitoring, surface water observations, soil moisture sensors, and climate data — into a unified knowledge platform. This integrated approach, combining linked data techniques with LLM-powered natural language access, aims to support sustainable water resource management by enabling both technical and non-technical stakeholders to explore, query, and draw insights from complex environmental data.
Fellowship Acknowledgement
This research was conducted via a fellowship under the OECD Co-operative Research Programme: Sustainable Agriculture and Food Systems. We gratefully acknowledge the OECD CRP for funding this fellowship and the Prairie Research Institute at the University of Illinois Urbana-Champaign for hosting the research visit. The CRP fellowship programme supports international research exchanges that advance scientific knowledge in sustainable agriculture, food systems, and natural resource management.