GSoC 2024 - Enhanced Search Functionality for OpenELIS

Hi everyone, I am excited to start this thread where I will be sharing regular weekly updates on my progress, challenges, and discoveries as I work on my GSoC project.

Project Name - Enhanced Search Functionality for OpenELIS

Project Summary - This project aims to enhance the search capabilities of the OpenELIS system, particularly the Search Patient module, so that users can retrieve patient information more efficiently. We aim to integrate Hibernate Search, a Java search library, with an Apache Lucene backend to index key columns such as patient ID, patient name and other relevant data, so that search results are returned faster. The project will also add fuzzy search, which matches terms approximately based on similarity; wildcard search, which matches terms against a specified pattern; and range search, which finds values within a specified range.
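
To make the three matching styles concrete, here is a minimal plain-Java sketch of what each one means for a single stored term. It is an illustration only: Lucene evaluates these queries against its index rather than string by string, and the class and method names below are hypothetical. One difference worth noting is that the sketch uses plain Levenshtein distance, whereas Lucene's fuzzy queries by default count a transposition of two adjacent characters as a single edit and support at most two edits.

```java
import java.time.LocalDate;
import java.util.regex.Pattern;

final class MatchingStyles {

    // Levenshtein distance: each insertion, deletion or substitution costs one edit.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(substitution, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Fuzzy: the stored term is accepted if it is within maxEdits of the query term.
    static boolean fuzzyMatches(String query, String term, int maxEdits) {
        return editDistance(query, term) <= maxEdits;
    }

    // Wildcard: '*' matches any run of characters (including none), '?' exactly one character.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // escape regex metacharacters
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    // Range: inclusive lower and upper bounds on any comparable value, such as a date of birth.
    static <T extends Comparable<? super T>> boolean inRange(T value, T lower, T upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("jonh", "john", 2)); // true: 2 Levenshtein edits
        System.out.println(wildcardMatches("jo*", "johnson")); // true
        System.out.println(inRange(LocalDate.of(1990, 5, 1),
                LocalDate.of(1990, 1, 1), LocalDate.of(1990, 12, 31))); // true
    }
}
```

With a two-edit budget, a typo such as "jonh" still finds "john", which is the kind of tolerance fuzzy patient-name search is meant to provide.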

Mentor - @Moses_Mutesasira

Community Bonding Period (May 1 - 26)

I had an insightful call with my mentor @Moses_Mutesasira to discuss and refine our project plan, particularly the low-level design of the Search Patient module. Here is a brief summary of the existing and proposed solutions that we discussed -

Existing solution

The current implementation of the patient search functionality is structured around a controller method getPatientResults, a service layer using the interface SearchResultsService and its implementation SearchResultsServiceImpl, and a Data Access Object (DAO) using the interface SearchResultsDAO and its implementation SearchResultsDAOImp.

Proposed Solution

  • We will create a new implementation of the SearchResultsService interface in the service layer, called LuceneSearchResultsService and a new implementation of the SearchResultsDAO interface, called LuceneSearchResultsDAO.

  • We will rename the existing service layer implementation SearchResultsServiceImpl to DatabaseSearchResultsService and the existing DAO implementation SearchResultsDAOImp to DatabaseSearchResultsDAO.
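
The proposed split is essentially a strategy pattern: callers depend only on the interface, and the concrete implementation decides whether results come from the database or from the Lucene index. The sketch below is a hypothetical, self-contained illustration; the one-argument getSearchResults signature, the in-memory patient list and the factory switch are assumptions made for brevity. The real classes are Spring beans whose methods take many more search parameters.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class SearchResultsServiceFactory {
    // Hypothetical switch (e.g. driven by an admin setting); callers only ever see the interface.
    static SearchResultsService create(boolean useLucene, List<String> patients) {
        return useLucene
                ? new LuceneSearchResultsService(patients)
                : new DatabaseSearchResultsService(patients);
    }

    public static void main(String[] args) {
        List<String> patients = List.of("Smith", "SMITH", "Jones");
        System.out.println(create(false, patients).getSearchResults("smith")); // []
        System.out.println(create(true, patients).getSearchResults("smith"));  // [Smith, SMITH]
    }
}

// Simplified, hypothetical contract: the real interface methods take many more parameters.
interface SearchResultsService {
    List<String> getSearchResults(String lastName);
}

// Stand-in for DatabaseSearchResultsService: exact comparison, like an SQL equality predicate.
final class DatabaseSearchResultsService implements SearchResultsService {
    private final List<String> patients;

    DatabaseSearchResultsService(List<String> patients) {
        this.patients = patients;
    }

    @Override
    public List<String> getSearchResults(String lastName) {
        return patients.stream().filter(p -> p.equals(lastName)).collect(Collectors.toList());
    }
}

// Stand-in for LuceneSearchResultsService: case-insensitive comparison, imitating a normalized index.
final class LuceneSearchResultsService implements SearchResultsService {
    private final List<String> patients;

    LuceneSearchResultsService(List<String> patients) {
        this.patients = patients;
    }

    @Override
    public List<String> getSearchResults(String lastName) {
        String needle = lastName.toLowerCase(Locale.ROOT);
        return patients.stream()
                .filter(p -> p.toLowerCase(Locale.ROOT).equals(needle))
                .collect(Collectors.toList());
    }
}
```

Because both classes implement the same interface, switching the search backend becomes a wiring decision rather than a change to the controller.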

The coding period officially starts on May 27, and until then I will be focusing on learning more about Apache Lucene search queries and finalising the project plan with my mentor.

CC: @Moses_Mutesasira

Coding Period - Week 1 (May 27 - June 2)

This week marked the beginning of the GSoC coding period, and I am excited to share the progress I made during this week.

Focus Areas

  • Adding the required dependencies
  • Configuring an end-to-end setup for Hibernate Search and Lucene index storage

Accomplishments

  • Issues

  • Pull Requests

    • Add Hibernate Search dependencies to pom.xml (Merged)
      To configure Hibernate Search, I first added the necessary dependencies to the project’s pom.xml file.

    • Add configuration for Lucene index storage (Merged)
      After adding the dependencies, we needed to specify where Lucene should store the search indexes. This was done through configuration properties in the persistence.xml file.

    • Implement persistent Lucene indexing with Docker volumes (Under review)

      • To ensure that the search indexes persist when the backend container (openelisglobal-webapp) restarts, we will use Docker volumes.
      • We will create a directory to store Lucene indexes when the container is created, and then define a volume for this directory to ensure that the indexes are not lost when the container is restarted or recreated.

Challenges

  • I faced a blocker while experimenting with initialising the Lucene index for the Patient entity.
  • The error message I encountered was - Error: HSEARCH600154: Unable to start index: HSEARCH600015: Unable to initialize index directory: /var/lib/lucene_index Context: index 'Patient'
  • After some troubleshooting, I concluded that the cause was a permissions problem: tomcat_admin, the user running the openelisglobal-webapp application, lacked permission to create the directory.
  • I solved this issue by changing the Dockerfile to create the necessary directory structure (/var/lib/lucene_index) and set the required ownership and permissions.
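
The actual fix lives in the Dockerfile. As an optional extra, a startup check along the lines of the hypothetical class below could surface the same permission problem with a clearer message before Hibernate Search tries to open the index. It uses only java.nio.file; the class name and where such a check would be called from are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class IndexDirectoryCheck {

    // Returns a problem description, or Optional.empty() if the index directory is usable.
    static Optional<String> check(Path indexRoot) {
        try {
            Files.createDirectories(indexRoot); // does nothing if the directory already exists
        } catch (IOException | SecurityException e) {
            return Optional.of("Cannot create " + indexRoot + ": " + e.getMessage());
        }
        if (!Files.isWritable(indexRoot)) {
            return Optional.of(indexRoot + " is not writable by user " + System.getProperty("user.name"));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : "/var/lib/lucene_index");
        System.out.println(check(root).orElse("OK: " + root + " is usable"));
    }
}
```

Run inside the container as tomcat_admin, a check like this would have reported the missing permission directly instead of the more general HSEARCH600015 message.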

Next steps

  • Getting the remaining PR merged by reviewing feedback and making changes if required
  • Mapping entities to indexes
  • Adding new implementations for the service layer and the DAO
  • Modifying the existing implementations

CC: @Moses_Mutesasira

Coding Period - Week 2 (June 3 - June 9)

Focus Areas

  • Finishing the configuration of Hibernate Search and Lucene index storage
  • Mapping entities to indexes

Accomplishments

  • Discussions with mentor

    This week, I had a productive discussion with my mentor @Moses_Mutesasira during the weekly developer meeting. We discussed the following topics -

    • The fields which should be indexed
    • Indexing already existing data if the indexes become out of sync
    • Offering choice of Database search and Lucene-based search to the admin user
  • Pull Requests

Next Steps

  • Getting and reviewing feedback from the mentor on the changes that map entities to indexes
  • Adding new implementations for the service layer and the DAO
  • Modifying the existing implementations

CC: @Moses_Mutesasira

Coding Period - Week 3 (June 10 - June 16)

Focus Areas

  • Mapping entities to indexes

Accomplishments

  • Pull Requests

    • Add bidirectional association for patients in person mapping (Under review)

      • We will add a <set> element to the Person mapping to establish a bidirectional association with Patient.
      • This change is needed to prevent the Hibernate Search exception that is thrown when @IndexedEmbedded is used with a unidirectional association. We will use @IndexedEmbedded to embed the fields of the Person object in the main Patient object.
      • I also had a discussion with @calebslane about the potential side effects of this change.
    • Add mapping for indexing Patient and Person entities (Under review)

      • We will use @Indexed and @*Field (such as @KeywordField) annotations to map Patient and Person entities and their fields to Lucene indexes.
      • We will create a custom normalizer that applies lowercase and ASCII-folding filters to the keyword fields. These filters convert every character to lowercase and replace characters with diacritics (“é”, “à”, …) with their ASCII equivalents (“e”, “a”, …).
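
In Hibernate Search, a normalizer like this is typically defined through a LuceneAnalysisConfigurer that chains Lucene's LowerCaseFilterFactory and ASCIIFoldingFilterFactory. The JDK-only sketch below approximates its effect so the behaviour is easy to try out. The class name is hypothetical, and the approximation only strips combining accents, whereas Lucene's ASCII folding also maps characters such as “ß” to “ss”.

```java
import java.text.Normalizer;
import java.util.Locale;

final class KeywordNormalizerSketch {

    // Approximates a "lowercase + ASCII folding" keyword normalizer with JDK classes only.
    static String normalize(String value) {
        String lower = value.toLowerCase(Locale.ROOT);
        // NFD decomposes accented letters, e.g. "é" becomes "e" followed by a combining acute accent.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Dropping the combining marks (Unicode category M) leaves the base letters.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("\u00C9lo\u00EFse")); // "Éloïse" -> eloise
        System.out.println(normalize("M\u00DCLLER"));      // "MÜLLER" -> muller
    }
}
```

Because the same normalization is applied to stored values and to search terms, a search for “eloise” can match a patient registered as “Éloïse”.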

Challenges

  • I faced a couple of blockers while using the @IndexedEmbedded annotation.

  • First, I encountered a Hibernate Search exception - org.hibernate.search.util.common.SearchException: HSEARCH700020: Unable to find the inverse side of the association on type 'org.openelisglobal.patient.valueholder.Patient' at path '.person<no value extractors>'.

    • I solved the issue by using the @AssociationInverseSide annotation to explicitly specify the inverse side of the association.
  • After this change, I encountered a warning message - Warn: HSEARCH700122: An unexpected failure occurred while configuring resolution of association inverse side for reindexing. This may lead to incomplete reindexing and thus out-of-sync indexes.

Next Steps

  • Getting the PRs merged by reviewing feedback and making changes if required
  • Adding new implementations for the service layer and the DAO and modifying the existing implementations
  • Designing and implementing Apache Lucene search queries

CC: @Moses_Mutesasira

Coding Period - Week 4 (June 17 - June 23)

Focus Areas

  • Finishing mapping of entities to indexes
  • Adding the required tests related to the changes

Accomplishments

  • Pull Requests

    • Add bidirectional association for patients in person mapping (Merged)

      • Apart from the changes related to mapping, I also added the required tests after discussing the specifics with my mentor.
      • I added the test method createPersonWithMultiplePatients_shouldLinkPatientsToPerson in PersonServiceTest. The test ensures that when multiple Patient entities are linked to a Person entity, the relationships are correctly established and persisted.
      • I also added init and tearDown methods for clean-up in PersonServiceTest and PatientServiceTest to ensure that the existing tests work properly.
    • Add mapping for indexing Patient and Person entities (Merged)
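
The bidirectional association, and what createPersonWithMultiplePatients_shouldLinkPatientsToPerson verifies, can be pictured with plain Java objects. In this hypothetical sketch (not the real value holders, which are mapped by Hibernate), Person keeps a set of its patients, mirroring the <set> element added to the Person mapping, and a helper method keeps both sides of the link in sync.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class BidirectionalAssociationSketch {
    public static void main(String[] args) {
        Person person = new Person("P-1");
        Patient first = new Patient("PAT-1");
        Patient second = new Patient("PAT-2");
        person.addPatient(first);
        person.addPatient(second);
        System.out.println(person.getPatients().size()); // 2
        System.out.println(first.getPerson() == person); // true
    }
}

// Hypothetical stand-in for the Person entity.
final class Person {
    private final String id;
    // Collection side of the association, mirroring the <set> element in the mapping.
    private final Set<Patient> patients = new HashSet<>();

    Person(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }

    Set<Patient> getPatients() {
        return Collections.unmodifiableSet(patients);
    }

    // Updates both sides so the link can be navigated from Person or from Patient.
    void addPatient(Patient patient) {
        patients.add(patient);
        patient.setPerson(this);
    }
}

// Hypothetical stand-in for the Patient entity.
final class Patient {
    private final String id;
    private Person person; // reference back to the owning Person

    Patient(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }

    Person getPerson() {
        return person;
    }

    void setPerson(Person person) {
        this.person = person;
    }
}
```

Having both directions available is what lets Hibernate Search work out which Patient documents to reindex when an embedded Person changes.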

Next steps

  • Adding new implementations for the service layer and the DAO and modifying the existing implementations
  • Designing and implementing Apache Lucene search queries
  • Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

Coding Period - Week 5 (June 24 - June 30)

Focus Areas

  • Adding new implementations for the service layer and the DAO and modifying the existing implementations

Accomplishments

  • Pull Requests

    • Add new implementations for the search results service and the DAO and modify the existing implementations (Under Review)

      • We will create a new implementation of the SearchResultsService interface in the service layer, called LuceneSearchResultsService and a new implementation of the SearchResultsDAO interface, called LuceneSearchResultsDAO. Currently, these implementations contain placeholders for Apache Lucene search logic which will be added later.
      • We will rename the existing service layer implementation SearchResultsServiceImpl to DatabaseSearchResultsService and the existing DAO implementation SearchResultsDAOImp to DatabaseSearchResultsDAO.
      • The @Primary annotations on DatabaseSearchResultsService and DatabaseSearchResultsDAO are temporary and will be moved to the new implementations once they are fully functional.
      • Reference UML diagram
    • Change index directory type to local heap in testing configuration (Merged)

      • Apache Lucene index files were being created in the base directory because the test configuration stored indexes on the local filesystem. This PR switches the test configuration to a local heap directory instead, so the test indexes are kept in memory.
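
For reference, the two storage setups can be expressed with the Lucene backend's directory properties: local-filesystem with a root directory for persistent indexes, and local-heap for tests. This sketch assumes Hibernate Search 6 property names and collects them in a map purely for illustration (such a map could be passed as overrides to JPA's Persistence.createEntityManagerFactory(String, Map)). The storage location itself is configured in persistence.xml (see Week 1), and the exact values used there may differ.

```java
import java.util.Map;

final class IndexStorageProperties {

    // Persistent indexes on disk, e.g. in the directory covered by the Docker volume.
    static Map<String, String> filesystem(String root) {
        return Map.of(
                "hibernate.search.backend.directory.type", "local-filesystem",
                "hibernate.search.backend.directory.root", root);
    }

    // Test setup: indexes live in JVM heap memory, so no index files are written to disk.
    static Map<String, String> heap() {
        return Map.of("hibernate.search.backend.directory.type", "local-heap");
    }

    public static void main(String[] args) {
        System.out.println(filesystem("/var/lib/lucene_index"));
        System.out.println(heap());
    }
}
```

A heap directory is discarded when the JVM exits, which suits tests but would lose every index on restart in production.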

Next steps

  • Designing and implementing Apache Lucene search queries
  • Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

Coding Period - Week 6 (July 1 - July 7)

Focus Areas

  • Designing and implementing Apache Lucene search queries

Progress

  • I am currently working on implementing Lucene queries in the getSearchResults and getSearchResultsExact methods of the LuceneSearchResultsDAOImpl class. I will soon raise a draft PR for the changes.

  • This change also includes tests for LuceneSearchResultsServiceImpl, which will replace the existing placeholder tests.

Next steps

  • Reviewing feedback from the mentor and getting the PR finalised
  • Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

Coding Period - Week 7 and 8 (July 8 - July 21)

Focus Areas

  • Finishing implementation of Apache Lucene search queries

Accomplishments

Next steps

  • Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

CC: @Moses_Mutesasira

Coding Period - Week 9 and 10 (July 22 - August 4)

Focus Areas

  • Adding support for manually re-indexing the already persisted data using Hibernate Search MassIndexer

Accomplishments

CC: @Moses_Mutesasira

Coding Period - Week 11 and 12 (August 5 - August 18)

Focus Areas

  • Finishing adding support for manual re-indexing
  • Adding Documentation

Accomplishments

CC: @Moses_Mutesasira
