Elasticsearch Query

This Elasticsearch query is designed to search for documents based on a given text input and geographic coordinates. The query utilizes the function_score feature to combine text relevance scoring with geographic proximity scoring. The main components of the query include:

Text Search: A multi_match query that searches several fields, each with its own boost factor.
Geographic Proximity: A Gaussian decay function that scores documents by their distance from a specified geographic point.
Script Fields: A custom script field that calculates the distance in kilometers between the document's coordinates and a reference point.

Query Structure

{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "<text>",
          "fields": [
            "service.opCategory^10",
            "service.name^9",
            "service.description^8",
            "service.alternateName^7",
            "description^6",
            "subOpcategory^5",
            "opCategory^4",
            "nteeClassification^3",
            "name^2",
            "alternateName^1"
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "coords": {
              "origin": {
                "lat": <latitude>,
                "lon": <longitude>
              },
              "scale": "10km",
              "offset": "0km",
              "decay": 0.5
            }
          }
        }
      ]
    }
  },
  "script_fields": {
    "distance_km": {
      "script": {
        "lang": "painless",
        "source": "doc['coords'].arcDistance(<latitude>, <longitude>) / 1000"
      }
    }
  }
}
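
One way to fill in the <text>, <latitude>, and <longitude> placeholders is to build the request body in code. The Python sketch below is illustrative only: the helper name build_search_body and its arguments are hypothetical, and it passes the coordinates to the Painless script through params rather than pasting them into the script source, which is a variation on the query above and not necessarily how the project does it.

```python
import json


def build_search_body(text, lat, lon, fields, scale="10km"):
    """Return the function_score request body as a plain dict (hypothetical helper)."""
    return {
        "query": {
            "function_score": {
                "query": {"multi_match": {"query": text, "fields": fields}},
                "functions": [
                    {
                        "gauss": {
                            "coords": {
                                "origin": {"lat": lat, "lon": lon},
                                "scale": scale,
                                "offset": "0km",
                                "decay": 0.5,
                            }
                        }
                    }
                ],
            }
        },
        "script_fields": {
            "distance_km": {
                "script": {
                    "lang": "painless",
                    # Assumption of this sketch: coordinates are supplied as
                    # script params instead of being interpolated into the source.
                    "source": "doc['coords'].arcDistance(params.lat, params.lon) / 1000",
                    "params": {"lat": lat, "lon": lon},
                }
            }
        },
    }


# Example values are made up for illustration.
body = build_search_body("food pantry", 40.7128, -74.0060, ["service.name^9", "name^2"])
print(json.dumps(body, indent=2))
```

Building a dict and serializing it with json.dumps guarantees valid JSON and avoids quoting mistakes in the search text.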

How it works

Text Search

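The multi_match query searches for the specified text across multiple fields, each with its own boost factor (the number after ^). A boost multiplies the score of matches in that field, so higher-boosted fields weigh more in the ranking. Because no type is set, multi_match uses its default best_fields mode, in which a document's text score is taken from its best-scoring field. Two field lists are used, depending on the kind of search.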

FOR NEEDS SEARCH (searchOrganization)

service.opCategory^10: Boosts the field service.opCategory with a factor of 10. This means that matches in this field will contribute more significantly to the overall relevance score compared to other fields.

service.name^9: Boosts the field service.name with a factor of 9. Similar to the previous point, matches in this field have a slightly lower boosting factor but still contribute significantly to the relevance score.

service.description^8: Boosts the field service.description with a factor of 8. Matches in this field will have a lower impact on the relevance score compared to service.opCategory and service.name, but higher than some other fields.

service.alternateName^7: Boosts the field service.alternateName with a factor of 7. This field has a moderate boosting factor, indicating its relevance in the overall scoring.

description^6: Boosts the field description with a factor of 6. This non-specific field contributes to the relevance score, but with a lower impact compared to more specific fields like service.opCategory.

subOpcategory^5: Boosts the field subOpcategory with a factor of 5. This field has a lower boosting factor, suggesting it has less influence on the overall relevance score.

opCategory^4: Boosts the field opCategory with a factor of 4. Similar to subOpcategory, this field has a lower boosting factor.

nteeClassification^3: Boosts the field nteeClassification with a factor of 3. This field has a relatively low boosting factor, indicating its lower impact on the relevance score.

name^2: Boosts the field name with a factor of 2. This field has a very low boosting factor, suggesting it contributes minimally to the overall relevance score.

alternateName^1: Boosts the field alternateName with a factor of 1. This field has the lowest boosting factor, indicating it has the least impact on the relevance score.

FOR ORGANIZATION SEARCH (searchSpecOrganization)

name^9: Boosts the field name with a factor of 9. Matches in this field will contribute significantly to the overall relevance score.

alternateName^8: Boosts the field alternateName with a factor of 8. This field has a slightly lower boosting factor than name, indicating its importance in the relevance score.

description^7: Boosts the field description with a factor of 7. Matches in this field will have a significant impact on the relevance score.

opCategory^6: Boosts the field opCategory with a factor of 6. This field contributes moderately to the relevance score.

subOpcategory^5: Boosts the field subOpcategory with a factor of 5. This field has a moderate boosting factor.

service.opCategory^4: Boosts the nested field service.opCategory with a factor of 4. Nested fields often represent a hierarchical or structured data model. In this case, service.opCategory is given moderate importance.

service.name^3: Boosts the nested field service.name with a factor of 3. Similar to service.opCategory, this nested field has a lower boosting factor.

service.description^2: Boosts the nested field service.description with a factor of 2. Matches in this nested field contribute to the relevance score, but with less impact than other fields.

service.alternateName^1: Boosts the nested field service.alternateName with a factor of 1. This nested field has the lowest boosting factor, indicating it has the least impact on the relevance score.
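
The two boost lists above can be kept as data and turned into the "field^boost" strings that multi_match expects. This is a minimal Python sketch; the constant and function names (NEEDS_SEARCH_BOOSTS, ORGANIZATION_SEARCH_BOOSTS, to_fields, multi_match_clause) are hypothetical and only the field names and boost values come from the lists above.

```python
# Boost tables copied from the two field lists above.
NEEDS_SEARCH_BOOSTS = {
    "service.opCategory": 10,
    "service.name": 9,
    "service.description": 8,
    "service.alternateName": 7,
    "description": 6,
    "subOpcategory": 5,
    "opCategory": 4,
    "nteeClassification": 3,
    "name": 2,
    "alternateName": 1,
}

ORGANIZATION_SEARCH_BOOSTS = {
    "name": 9,
    "alternateName": 8,
    "description": 7,
    "opCategory": 6,
    "subOpcategory": 5,
    "service.opCategory": 4,
    "service.name": 3,
    "service.description": 2,
    "service.alternateName": 1,
}


def to_fields(boosts):
    """Turn {"field": boost} into ["field^boost", ...], highest boost first."""
    ordered = sorted(boosts.items(), key=lambda item: item[1], reverse=True)
    return [f"{field}^{boost}" for field, boost in ordered]


def multi_match_clause(text, boosts):
    """Build the multi_match clause for one of the two search modes."""
    return {"multi_match": {"query": text, "fields": to_fields(boosts)}}


print(multi_match_clause("shelter", ORGANIZATION_SEARCH_BOOSTS))
```

Keeping the boosts in one table per search mode makes it easier to tune a single weight without editing the query template.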

Geographic Proximity

The function_score wraps the text query and introduces a Gaussian decay function (gauss) to score documents based on their geographic proximity to the specified coordinates (latitude and longitude).

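The scale and decay parameters together control how quickly a document's proximity score falls off with distance: decay is the score a document receives when it is exactly scale away from the origin (beyond the offset). With scale 10km, offset 0km, and decay 0.5, a document at the origin scores 1 and a document 10 km away scores 0.5.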
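
To get a feel for the curve, the Gaussian decay formula documented for function_score can be reproduced outside Elasticsearch. The Python sketch below is for illustration; the function name gauss_decay is hypothetical, distances are assumed to be already expressed in kilometers, and the defaults mirror the scale, offset, and decay used in the query.

```python
import math


def gauss_decay(distance_km, scale_km=10.0, offset_km=0.0, decay=0.5):
    """Proximity score in (0, 1] for a document distance_km from the origin."""
    # sigma^2 is chosen so that the score equals `decay` at offset + scale.
    sigma_sq = -(scale_km ** 2) / (2.0 * math.log(decay))
    # Distances inside the offset are not penalized at all.
    effective = max(0.0, distance_km - offset_km)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


for km in (0, 5, 10, 20, 40):
    print(f"{km:>3} km -> {gauss_decay(km):.4f}")
```

With these settings the score halves at 10 km and drops to one sixteenth at 20 km, so documents more than a few multiples of scale away contribute very little.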

Script Fields

The script_fields section includes a custom script to calculate the distance in kilometers (distance_km) between the document's coordinates (coords) and the reference point.
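
arcDistance returns meters, which is why the script divides by 1000. For checking returned values by hand, the distance can be approximated with the haversine formula. The Python sketch below assumes a spherical Earth with a mean radius of 6371.0088 km; the function name haversine_km is hypothetical, and Elasticsearch's own calculation may differ slightly.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 3))
```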

Expected Result

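The query returns documents sorted by a combined score of text relevance and geographic proximity. Because no boost_mode is set, function_score uses its default (multiply), so each document's text score is multiplied by its proximity score. Each hit also carries the custom field distance_km under its fields entry, giving the document's distance from the specified geographic point in kilometers.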
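
Script field values come back under each hit's fields key as single-element lists. The Python sketch below shows one way to flatten a response into rows; the helper name summarize_hits is hypothetical, and sample_response is a hand-written stand-in with made-up IDs, scores, and distances, not output from a real cluster.

```python
def summarize_hits(response):
    """Extract id, combined score, and distance_km from a search response dict."""
    rows = []
    for hit in response["hits"]["hits"]:
        # Script fields are returned as lists; take the first value if present.
        distance = hit.get("fields", {}).get("distance_km", [None])[0]
        rows.append(
            {"id": hit["_id"], "score": hit["_score"], "distance_km": distance}
        )
    return rows


# Hand-written example of the response shape (values are invented).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "org-1", "_score": 12.5, "fields": {"distance_km": [1.2]}},
            {"_id": "org-2", "_score": 7.0, "fields": {"distance_km": [8.4]}},
            {"_id": "org-3", "_score": 0.3},
        ]
    }
}

for row in summarize_hits(sample_response):
    print(row)
```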
