GSoC 2024 - Enhanced Search Functionality for OpenELIS

rahul6603 · May 7, 2024, 7:41am

Hi everyone, I am excited to start this thread where I will be sharing regular weekly updates on my progress, challenges, and discoveries as I work on my GSoC project.

Project Name - Enhanced Search Functionality for OpenELIS

Project Summary - This project aims to enhance the search capabilities of OpenELIS, particularly the Search Patient module, so that users can retrieve patient information more efficiently. We aim to integrate Hibernate Search, a Java search library, with an Apache Lucene backend to index key columns such as patient ID, patient name and other relevant data, giving users faster search results. The work also covers fuzzy search (approximate matching based on similarity), wildcard search (matching against a specified pattern) and range search (matching values within a specified range).
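
To make the three search modes concrete, here is a minimal plain-Java sketch of what each one means. It does not use Hibernate Search or Lucene, and the class and method names are hypothetical, chosen only for this illustration. In the project itself this work is delegated to Lucene queries, whose matching rules may differ in detail.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of OpenELIS, Hibernate Search or Lucene.
final class SearchModesSketch {

    // Fuzzy search building block: Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fuzzy search: accept candidates within maxEdits single-character edits.
    static boolean fuzzyMatches(String term, String candidate, int maxEdits) {
        return editDistance(term, candidate) <= maxEdits;
    }

    // Wildcard search: '*' matches any run of characters, '?' matches exactly one.
    static boolean wildcardMatches(String pattern, String candidate) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), candidate);
    }

    // Range search: inclusive bounds; ISO-8601 dates compare correctly as strings.
    static boolean inRange(String value, String from, String to) {
        return value.compareTo(from) >= 0 && value.compareTo(to) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("jonh", "john", 2));
        System.out.println(wildcardMatches("sm*th", "smith"));
        System.out.println(inRange("1990-05-20", "1990-01-01", "1990-12-31"));
    }
}
```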

Mentor - @Moses_Mutesasira

rahul6603 · May 20, 2024, 8:24am

Community Bonding Period (May 1 - 26)

I had an insightful call with my mentor @Moses_Mutesasira to discuss and refine our project plan, particularly the low-level design of the Search Patient module. Here is a brief summary of the existing and proposed solutions that we discussed -

Existing solution

The current implementation of the patient search functionality is structured around a controller method getPatientResults, a service layer using the interface SearchResultsService and its implementation SearchResultsServiceImpl, and a Data Access Object (DAO) using the interface SearchResultsDAO and its implementation SearchResultsDAOImp.

Proposed Solution

We will create a new implementation of the SearchResultsService interface in the service layer, called LuceneSearchResultsService, and a new implementation of the SearchResultsDAO interface, called LuceneSearchResultsDAO.
We will rename the existing service layer implementation SearchResultsServiceImpl to DatabaseSearchResultsService and the existing DAO implementation SearchResultsDAOImp to DatabaseSearchResultsDAO.
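
The shape of this design can be sketched in plain Java as one interface with two interchangeable implementations. This is only an illustration: the class names come from the plan above, but the single-argument method signature, the in-memory lists and the matching rules are assumptions made for the sketch and do not reflect the real OpenELIS code.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PatientSearchSketch {
    // Callers depend only on the interface, so either implementation can be passed in.
    static List<String> search(SearchResultsService service, String lastName) {
        return service.getSearchResults(lastName);
    }

    public static void main(String[] args) {
        List<String> names = List.of("Smith", "Smithson", "Jones");
        System.out.println(search(new DatabaseSearchResultsService(names), "Smith"));
        System.out.println(search(new LuceneSearchResultsService(names), "smi"));
    }
}

// Simplified stand-in for the real interface; the signature is an assumption.
interface SearchResultsService {
    List<String> getSearchResults(String lastName);
}

// Stand-in for the database-backed implementation (exact match in this sketch).
final class DatabaseSearchResultsService implements SearchResultsService {
    private final List<String> rows;

    DatabaseSearchResultsService(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public List<String> getSearchResults(String lastName) {
        return rows.stream().filter(row -> row.equals(lastName)).collect(Collectors.toList());
    }
}

// Stand-in for the index-backed implementation (case-insensitive prefix match in this sketch).
final class LuceneSearchResultsService implements SearchResultsService {
    private final List<String> indexed;

    LuceneSearchResultsService(List<String> indexed) {
        this.indexed = indexed;
    }

    @Override
    public List<String> getSearchResults(String lastName) {
        String prefix = lastName.toLowerCase(Locale.ROOT);
        return indexed.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

Because the controller only sees the interface, switching between the two search back ends does not require changes to the calling code.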

The coding period officially starts on May 27. Before that, I will be focusing on learning more about Apache Lucene search queries and finalising the project plan with my mentor.

CC: @Moses_Mutesasira

rahul6603 · June 4, 2024, 4:36pm

Coding Period - Week 1 (May 27 - June 2)

This week marked the beginning of the GSoC coding period, and I am excited to share the progress made so far.

Focus Areas

Adding the required dependencies
Configuring end-to-end setup for Hibernate Search and Lucene index storage

Accomplishments

Issues
- GSoC 2024 - Enhanced Search Functionality for OpenELIS
  This issue serves as the main tracker for the project, where all related pull requests will be linked.
Pull Requests
- Add Hibernate Search dependencies to pom.xml (Merged)
  To configure Hibernate Search, I first added the necessary dependencies to the project’s pom.xml file.
- Add configuration for Lucene index storage (Merged)
  After adding the dependencies, we needed to specify where Lucene should store the search indexes. This was done through configuration properties in the persistence.xml file.
- Implement persistent Lucene indexing with Docker volumes (Under review)
  - To ensure that the search indexes persist when the backend container (openelisglobal-webapp) restarts, we will use Docker volumes.
  - We will create a directory to store Lucene indexes when the container is created, and then define a volume for this directory to ensure that the indexes are not lost when the container is restarted or recreated.

Challenges

I faced a blocker while experimenting with initialising the Lucene index for the Patient entity.
The error message I encountered was - Error: HSEARCH600154: Unable to start index: HSEARCH600015: Unable to initialize index directory: /var/lib/lucene_index Context: index 'Patient'
After some troubleshooting, I concluded that the error occurred because tomcat_admin, the user running the openelisglobal-webapp application, lacked permission to create the directory.
I solved this issue by changing the Dockerfile to create the necessary directory structure (/var/lib/lucene_index) and set the required ownership and permissions.
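
A quick way to confirm this kind of failure from inside the container is to check whether the current user can create and write to the index directory. The following is a minimal sketch using only the Java standard library. The class and method names are hypothetical, and this check is not part of the fix described above, which lives in the Dockerfile.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of OpenELIS or Hibernate Search.
final class IndexDirectoryCheck {

    // Returns true if the directory exists (or can be created) and the
    // current user is allowed to write to it.
    static boolean canUseIndexDirectory(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException | SecurityException e) {
            // Covers missing permissions and a non-directory already at this path.
            return false;
        }
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        // Defaults to the path from the error message above.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/var/lib/lucene_index");
        System.out.println(dir + " usable: " + canUseIndexDirectory(dir));
    }
}
```

Run as the same user as the web application, a result of false for /var/lib/lucene_index would point to the ownership or permission problem described above.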

Next steps

Getting the remaining PR merged by reviewing feedback and making changes if required
Mapping entities to indexes
Adding new implementations for the service layer and the DAO
Modifying the existing implementations

CC: @Moses_Mutesasira

rahul6603 · June 10, 2024, 10:43am

Coding Period - Week 2 (June 3 - June 9)

Focus Areas

Finishing configuration of setup for Hibernate Search and Lucene index storage
Mapping entities to indexes

Accomplishments

Discussions with mentor

This week, I had a productive discussion with my mentor @Moses_Mutesasira during the weekly developer meeting. We discussed the following topics -
- The fields which should be indexed
- Indexing already existing data if the indexes become out of sync
- Offering the admin user a choice between database search and Lucene-based search
Pull Requests
- Implement persistent Lucene indexing with Docker volumes (Merged)

Next Steps

Reviewing and getting feedback from the mentor for the changes addressing the mapping of entities to indexes
Adding new implementations for the service layer and the DAO
Modifying the existing implementations

CC: @Moses_Mutesasira

rahul6603 · June 17, 2024, 5:05pm

Coding Period - Week 3 (June 10 - June 16)

Focus Areas

Mapping entities to indexes

Accomplishments

Pull Requests
- Add bidirectional association for patients in person mapping (Under review)
  - We will add a <set> element to the Person mapping to establish a bidirectional association with Patient.
  - This change is needed to prevent the Hibernate Search exception raised when @IndexedEmbedded is used with a unidirectional association. We will be using @IndexedEmbedded to embed the fields of the Person object in the main Patient object.
  - I also had a discussion with @calebslane about the potential side effects of this change.
- Add mapping for indexing Patient and Person entities (Under review)
  - We will use @Indexed and @*Field (such as @KeywordField) annotations to map Patient and Person entities and their fields to Lucene indexes.
  - We will create a custom normalizer that includes lowercase and ascii-folding filters for the Keyword fields. These filters will convert every character to lowercase and replace characters with diacritics (“é”, “à”, …) with their ASCII equivalent (“e”, “a”, …).
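
The effect of such a normalizer can be approximated in plain Java. The sketch below is not the Lucene analysis chain used in the PR; it relies on java.text.Normalizer and a regular expression, and the class name is hypothetical. It only covers characters that decompose into a base letter plus combining marks, so letters such as "ø" or "ß", which an ASCII-folding filter would also map, are left unchanged here.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical approximation of a lowercase + ASCII-folding normalizer.
final class KeywordNormalizerSketch {

    static String normalize(String input) {
        // Split accented characters into base letter + combining mark (NFD).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining marks, keeping only the base letters.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Lowercase with a fixed locale so results do not depend on the server locale.
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00e9 is "e" with an acute accent.
        System.out.println(normalize("Jos\u00e9"));
    }
}
```

Applying the same normalization to both the indexed value and the search term is what allows a search typed without accents or in a different case to match the stored name.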

Challenges

I faced a couple of blockers while using the @IndexedEmbedded annotation.
First, I encountered a Hibernate Search exception - org.hibernate.search.util.common.SearchException: HSEARCH700020: Unable to find the inverse side of the association on type 'org.openelisglobal.patient.valueholder.Patient' at path '.person<no value extractors>'.
- I solved the issue by using the @AssociationInverseSide annotation to explicitly specify the inverse side of the association.
After this change, I encountered a warning message - Warn: HSEARCH700122: An unexpected failure occurred while configuring resolution of association inverse side for reindexing. This may lead to incomplete reindexing and thus out-of-sync indexes.
- I resolved this warning in the PR "Add bidirectional association for patients in person mapping", which establishes a bidirectional association between Person and Patient.
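
Hibernate Search needs the inverse side because, when a Person changes, it has to find the Patient entities that embed that Person in order to reindex them. The idea of a bidirectional association can be sketched in plain Java as follows. The classes are simplified stand-ins with hypothetical fields and methods, not the OpenELIS value holders or their Hibernate mappings.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class BidirectionalAssociationSketch {
    public static void main(String[] args) {
        Person person = new Person("Doe");
        Patient first = new Patient("P-1");
        Patient second = new Patient("P-2");
        person.addPatient(first);
        person.addPatient(second);
        System.out.println(person.getPatients().size());
        System.out.println(first.getPerson() == person);
    }
}

// Simplified stand-in: the owning side of the inverse collection.
final class Person {
    private final String lastName;
    private final Set<Patient> patients = new LinkedHashSet<>();

    Person(String lastName) {
        this.lastName = lastName;
    }

    String getLastName() {
        return lastName;
    }

    Set<Patient> getPatients() {
        return Collections.unmodifiableSet(patients);
    }

    // Keeps both sides in sync: Person -> Patient and Patient -> Person.
    void addPatient(Patient patient) {
        Person previous = patient.getPerson();
        if (previous != null && previous != this) {
            previous.patients.remove(patient);
        }
        patients.add(patient);
        patient.setPerson(this);
    }
}

// Simplified stand-in: each Patient points back to its Person.
final class Patient {
    private final String id;
    private Person person;

    Patient(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }

    Person getPerson() {
        return person;
    }

    void setPerson(Person person) {
        this.person = person;
    }
}
```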

Next Steps

Getting the PRs merged by reviewing feedback and making changes if required
Adding new implementations for the service layer and the DAO and modifying the existing implementations
Designing and implementing Apache Lucene search queries

CC: @Moses_Mutesasira

rahul6603 · June 25, 2024, 1:05pm

Coding Period - Week 4 (June 17 - June 23)

Focus Areas

Finishing mapping of entities to indexes
Adding the required tests related to the changes

Accomplishments

Pull Requests
- Add bidirectional association for patients in person mapping (Merged)
  - Apart from the changes related to mapping, I also added the required tests after discussing the specifics with my mentor.
  - I added the test method createPersonWithMultiplePatients_shouldLinkPatientsToPerson in PersonServiceTest. The test ensures that when multiple Patient entities are linked to a Person entity, the relationships are correctly established and persisted.
  - I also added init and tearDown methods for clean-up in PersonServiceTest and PatientServiceTest to ensure that the existing tests work properly.
- Add mapping for indexing Patient and Person entities (Merged)

Next steps

Adding new implementations for the service layer and the DAO and modifying the existing implementations
Designing and implementing Apache Lucene search queries
Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

rahul6603 · July 3, 2024, 4:50pm

Coding Period - Week 5 (June 24 - June 30)

Focus Areas

Adding new implementations for the service layer and the DAO and modifying the existing implementations

Accomplishments

Pull Requests
- Add new implementations for the search results service and the DAO and modify the existing implementations (Under Review)
  - We will create a new implementation of the SearchResultsService interface in the service layer, called LuceneSearchResultsService, and a new implementation of the SearchResultsDAO interface, called LuceneSearchResultsDAO. Currently, these implementations contain placeholders for the Apache Lucene search logic, which will be added later.
  - We will rename the existing service layer implementation SearchResultsServiceImpl to DatabaseSearchResultsService and the existing DAO implementation SearchResultsDAOImp to DatabaseSearchResultsDAO.
  - The @Primary annotations on DatabaseSearchResultsService and DatabaseSearchResultsDAO are temporary and will be moved to the new implementations once they are fully functional.
  - Reference UML diagram
- Change index directory type to local heap in testing configuration (Merged)
  - Apache Lucene index files were being created in the base directory during tests because the test configuration stored indexes on the local filesystem. This PR changes the test configuration to store indexes on the local heap instead.

Next steps

Designing and implementing Apache Lucene search queries
Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

rahul6603 · July 10, 2024, 12:38pm

Coding Period - Week 6 (July 1 - July 7)

Focus Areas

Designing and implementing Apache Lucene search queries

Progress

I am currently working on implementing Lucene queries in the getSearchResults and getSearchResultsExact methods of the LuceneSearchResultsDAOImpl class. I will soon raise a draft PR for the changes.
This change also includes tests for LuceneSearchResultsServiceImpl, which will replace the existing placeholder tests.

Next steps

Reviewing feedback from the mentor and getting the PR finalised
Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

rahul6603 · July 21, 2024, 5:32am

Coding Period - Week 7 and 8 (July 8 - July 21)

Focus Areas

Finishing implementation of Apache Lucene search queries

Accomplishments

Pull Requests
- Implement Lucene queries in getSearchResults and getSearchResultsExact methods of LuceneSearchResultsDAOImpl (Merged)
  - I implemented Lucene queries in the getSearchResults and getSearchResultsExact methods of the LuceneSearchResultsDAOImpl class.
  - This change also included tests for LuceneSearchResultsServiceImpl, which replaced the existing placeholder tests in SearchResultsServiceTest.
I will soon create a PR to make LuceneSearchResultsServiceImpl the primary implementation of the SearchResultsService interface.

Next steps

Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

rahul6603 · August 7, 2024, 6:15pm

Coding Period - Week 9 and 10 (July 22 - August 4)

Focus Areas

Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

Accomplishments

Pull Requests
- Make LuceneSearchResultsServiceImpl the primary implementation for SearchResultsService interface (Under Review)
- Add support for manual re-indexing using MassIndexer (Draft)
  - I am currently working on this draft PR which aims to add support for manually re-indexing the already persisted data using Hibernate Search MassIndexer.
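
The idea behind manual re-indexing can be illustrated with a small plain-Java toy: discard the current index and rebuild it from the persisted rows, loading them in batches. This is a conceptual sketch only and does not use the Hibernate Search MassIndexer API. The class name, the in-memory list standing in for the database and the map standing in for the index are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model of "purge, then rebuild the index from persisted data".
final class ReindexSketch {

    // Rebuilds a lowercase-name -> row-ids index from the source-of-truth rows.
    static Map<String, List<Integer>> reindex(List<String> persistedNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Starting from an empty map plays the role of purging the old index.
        Map<String, List<Integer>> index = new HashMap<>();
        for (int start = 0; start < persistedNames.size(); start += batchSize) {
            int end = Math.min(start + batchSize, persistedNames.size());
            // Process one batch of rows at a time to bound memory use.
            for (int id = start; id < end; id++) {
                String key = persistedNames.get(id).toLowerCase(Locale.ROOT);
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(reindex(List.of("Smith", "Jones", "smith"), 2));
    }
}
```

Because the index is rebuilt entirely from the persisted data, any drift between the database and the index is corrected by a single run.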

CC: @Moses_Mutesasira

rahul6603 · August 23, 2024, 8:12pm

Coding Period - Week 11 and 12 (August 5 - August 18)

Focus Areas

Finishing adding support for manual re-indexing
Adding Documentation

Accomplishments

Pull Requests
- Add support for manual re-indexing using MassIndexer (Merged)
- Add test for MassIndexerService (Draft)
Documentation
- Project Report
- Final Presentation

CC: @Moses_Mutesasira
