Exploring a Metadata-Driven Reporting Approach for OpenELIS (Prototype + Feedback Request)


@Moses_Mutesasira

Hi everyone,

I wanted to share a small prototype I’ve been working on for the OpenELIS reporting framework and get your feedback on whether this direction makes sense.

Over the past few weeks, I’ve been trying to understand how the current reporting system is structured and how reports are generated. Based on that, I explored a metadata-driven approach, where reports are defined as database records instead of Java classes.


Prototype Scope

  • Dynamic query construction using JPA Criteria API
  • Metadata-driven report definition:
    • datasets
    • fields
    • filters
    • columns
  • Basic separation between:
    • query definition
    • query execution
    • output rendering (CSV)

Supported Features

  • FIXED filters (applied silently)
  • PROMPT_USER filters (validated at runtime)
  • Basic whitelist/validation to keep query construction safe

The goal here was mainly to validate the core query-building approach and safety model, not to fully generalize everything yet.
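To make the safety model concrete, here is a minimal sketch of how a PROMPT_USER filter might be validated against a whitelist before it ever reaches query construction. The field and operator names are hypothetical placeholders, not the actual prototype code or OpenELIS schema:

```java
import java.util.Map;
import java.util.Set;

// Minimal sketch of whitelist validation for user-supplied filters.
// Field and operator names are illustrative, not real OpenELIS entities.
public class FilterValidator {

    // Only these fields may be filtered on, each with its allowed operators.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "collectionDate", Set.of("EQ", "GTE", "LTE"),
        "status",         Set.of("EQ", "IN")
    );

    /** Rejects any filter targeting a non-whitelisted field or operator. */
    public static void validate(String field, String operator) {
        Set<String> ops = ALLOWED.get(field);
        if (ops == null) {
            throw new IllegalArgumentException("Field not allowed: " + field);
        }
        if (!ops.contains(operator)) {
            throw new IllegalArgumentException(
                "Operator " + operator + " not allowed for field " + field);
        }
    }
}
```

Because validation happens before query building, user input can only select from pre-approved query shapes rather than inject arbitrary criteria.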


Feedback I’m Looking For

  • Does this approach align with the future direction of the OpenELIS reporting framework?
  • Are there any suggestions on how I could better adapt this to fit upcoming plans or architectural goals?

I’d really appreciate any guidance or advice. I’m happy to learn from feedback and iterate on this further.


Context from Yesterday’s Meeting (16th April)

I found the suggestions around:

  • FHIR analytics / data pipelines
  • CQRS

really useful, and I’ve started exploring how this design could align with those ideas.

At a high level, I’m thinking of this as:

  • a potential read-model layer (in line with CQRS concepts)
  • something that could later integrate with pipeline-based analytics systems

But I’m still exploring this, so any guidance here would be really helpful.


Flow Explanation (Simple)

At a high level, the prototype follows this flow:

  1. A report is defined using metadata (a database record):

    • datasets
    • fields
    • filters
    • columns
  2. The system dynamically builds a query using the JPA Criteria API

    • FIXED filters are enforced automatically
    • PROMPT_USER filters are taken as input and validated
  3. The query is executed against the database

  4. Results are formatted into CSV output

The idea is to keep report logic outside of Java code and instead drive everything through structured metadata, making it easier to extend and modify reports without code changes.
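As a rough illustration of steps 1 and 4 (the metadata structures and CSV rendering; the Criteria API step in between is elided, and all class and field names here are hypothetical rather than the prototype's actual code):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative metadata records for a report definition.
record ColumnDef(String field, String label) {}
record FilterDef(String field, String operator, String value, boolean fixed) {}
record ReportDef(String dataset, List<ColumnDef> columns, List<FilterDef> filters) {}

class CsvRenderer {
    /** Renders already-fetched rows (field -> value maps) as CSV text. */
    static String render(ReportDef def, List<Map<String, Object>> rows) {
        // Header comes from the column metadata, not from Java code.
        String header = def.columns().stream()
            .map(ColumnDef::label)
            .collect(Collectors.joining(","));
        // Each row is projected through the same column list, so column
        // order and selection are fully metadata-driven.
        String body = rows.stream()
            .map(row -> def.columns().stream()
                .map(c -> String.valueOf(row.getOrDefault(c.field(), "")))
                .collect(Collectors.joining(",")))
            .collect(Collectors.joining("\n"));
        return body.isEmpty() ? header : header + "\n" + body;
    }
}
```

Adding or reordering report columns then means editing the metadata record, with no change to the rendering code.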


Links


Note

This is a minimal prototype intended to demonstrate the core idea. Some parts (such as field path resolution and join handling) are intentionally hardcoded or simplified to validate the approach against the OpenELIS entity model.

Thanks for this approach — it really gives direction, @junaid. I have gone through it and it's genuinely encouraging; however, I have a few concerns.
Querying the live transactional database for heavy ad-hoc reports is generally discouraged. Specifically, using a correlated subquery for SampleHuman could result in N+1 execution bottlenecks when generating reports across tens of thousands of samples. How do you plan to mitigate the load this places on the live OLTP database?

It’s great to see how the engine handles standard column filtering. I noticed that aggregations (like COUNT/SUM) aren’t supported and that complex crosstabs are routed back to LEGACY_PIVOT. Since those are often the most time-consuming reports to maintain, did you run into limitations with the Criteria API for handling dynamic GROUP BY and pivoting? I’m wondering if piping this data into an external analytics or BI tool might be an easier way to handle those complex matrix reports without having to write custom Java builder logic.

I also really liked the FHIR discussion and the data pipelines topic from our last call. The long-term goal for OpenELIS has been to adopt FHIR as the standard data pipeline, which we are already implementing — how do you see this custom JPA query engine fitting in? If we want dynamic reporting and analytics, doesn’t it make more architectural sense to extract those FHIR bundles into a dedicated FHIR repository or analytics pipeline?

Hi Derrick,

Thank you for the thoughtful feedback — I’m really grateful that you took the time to stress-test the design. Your points around OLTP load, aggregation complexity, and the FHIR roadmap helped me re-evaluate the system from a more production-oriented perspective.

1. Managing OLTP Performance & Stability

You’re absolutely right about the risks of querying the live transactional database. In the current prototype, the goal was to validate the metadata-driven query engine, but I agree this approach needs stronger safeguards for production use.

Specifically regarding the correlated subquery and potential N+1 issues, I’m planning to refactor this into a join-based or batch-fetch strategy to avoid per-row execution overhead.

In addition, I see a layered mitigation approach:

  • Query-level safeguards: row limits, timeouts, and restricted query shapes
  • Execution strategy: avoiding correlated subqueries in favor of joins
  • Architectural direction: eventually routing reporting workloads to a read replica or asynchronous reporting store

Together, these layers should reduce both per-query inefficiency and the overall load placed on the OLTP system.
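To sketch what the batch-fetch strategy could look like (this is the general idea, not the prototype's code; `fetchByIds` stands in for a single IN-clause query against the related table, e.g. SampleHuman keyed by sample ID):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch: instead of one related-record lookup per result row (N+1),
// collect the IDs, fetch related rows in fixed-size batches, and join
// them in memory. With batchSize=500, 50,000 samples cost ~100 queries
// instead of 50,000 correlated subquery executions.
class BatchFetcher {
    static <T> Map<Long, T> fetchInBatches(
            List<Long> ids,
            int batchSize,
            Function<List<Long>, Map<Long, T>> fetchByIds) { // one IN-clause query
        Map<Long, T> result = new HashMap<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            List<Long> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
            result.putAll(fetchByIds.apply(batch));
        }
        return result;
    }
}
```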

2. Scoping Aggregation & BI Complexity

You’re right that aggregations and cross-tab reports are some of the most complex and maintenance-heavy parts of reporting systems.

I did explore dynamic GROUP BY and pivoting using the Criteria API, but it quickly became clear that building a fully generic and maintainable abstraction for these within the GSoC scope would introduce significant complexity and long-term maintenance risk.

So the decision to defer them was intentional:

  • Focus on delivering a stable, metadata-driven filtering engine first
  • Avoid introducing fragile query-generation logic early

Your suggestion to integrate with external BI/analytics tools makes a lot of sense here. I see this as a cleaner separation of concerns:

  • Core engine → structured, operational reporting
  • External tools → advanced aggregations, pivots, and analytics

I’d definitely like to explore how we can design the system to support that kind of integration cleanly.

3. FHIR & Data Pipeline Alignment

I really liked the FHIR discussion as well — it helped clarify the long-term direction significantly.

I see this query engine acting as a transitional layer (“bridge”):

  • In the current state → operates on the relational schema for immediate reporting needs
  • In the future → the same abstraction can point to a FHIR-backed data source or analytics pipeline

To support this, I’m designing the engine with a decoupled data source layer, so that:

  • Query construction remains consistent
  • Only the underlying data provider changes (JPA → FHIR repository / analytics store)
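One way this decoupled data source layer could be sketched (all names are hypothetical; the point is only that the engine depends on an interface, not on JPA directly):

```java
import java.util.List;
import java.util.Map;

// Sketch of a pluggable data-provider abstraction: the report engine
// talks only to this interface, so a JPA-backed implementation could
// later be swapped for a FHIR-repository or analytics-store client
// without touching query definition or rendering.
interface ReportDataSource {
    /** Executes an already-validated report query and returns rows. */
    List<Map<String, Object>> fetch(String dataset,
                                    List<String> fields,
                                    Map<String, Object> filters);
}

// In-memory stand-in, useful for tests; a JpaDataSource or
// FhirDataSource would implement the same contract.
class InMemoryDataSource implements ReportDataSource {
    private final List<Map<String, Object>> rows;

    InMemoryDataSource(List<Map<String, Object>> rows) {
        this.rows = rows;
    }

    @Override
    public List<Map<String, Object>> fetch(String dataset,
                                           List<String> fields,
                                           Map<String, Object> filters) {
        // Naive equality filtering, standing in for real query execution.
        return rows.stream()
            .filter(r -> filters.entrySet().stream()
                .allMatch(f -> f.getValue().equals(r.get(f.getKey()))))
            .toList();
    }
}
```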

Long-term, I agree that extracting FHIR bundles into a dedicated analytics pipeline is the more scalable direction. My goal is to ensure that what we build now doesn’t become a dead-end, but instead evolves naturally into that architecture.

I’d be happy to iterate further on any of these areas, especially around query optimization or how this could align more closely with the FHIR pipeline design.