Resolve "expand filter results with semantic similar values"

Laurian Gridinoc requested to merge 48-semantically-similar-filterset-results into main

Closes #48 (closed)

Offline: python ./api/filtersets/create_sim_map.py reads filterset_request_20240913_trimmed.json and writes semantic_map.json, which maps each value of each filter to a list of ('similar value', cosine) tuples.
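To illustrate the shape of that map, here is a minimal sketch, not the actual contents of create_sim_map.py. It assumes each filter value already has an embedding vector; the toy two-dimensional vectors, the helper names `cosine` and `build_similarity_map`, and the choice to sort neighbours by descending cosine are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_map(embeddings):
    """Map filter -> value -> [(similar value, cosine), ...], best match first.

    `embeddings` is assumed to look like {filter_name: {value: vector}}.
    Values are only compared with other values of the same filter.
    """
    sim_map = {}
    for filter_name, vectors in embeddings.items():
        per_value = {}
        for value, vec in vectors.items():
            neighbours = [
                (other, cosine(vec, other_vec))
                for other, other_vec in vectors.items()
                if other != value
            ]
            neighbours.sort(key=lambda pair: pair[1], reverse=True)
            per_value[value] = neighbours
        sim_map[filter_name] = per_value
    return sim_map


# Toy embeddings, made up for illustration only.
toy_embeddings = {
    "current_work_locations": {
        "Pennsylvania": [1.0, 0.0],
        "Delaware": [0.8, 0.6],
        "Nevada": [0.0, 1.0],
    }
}
semantic_map = build_similarity_map(toy_embeddings)
```

In this sketch the result could be serialized with `json.dump`, which would store each tuple as a two-element list.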

At runtime: api/filterset.py loads the map and expands the filter results with similar values. The defaults are min_cosine 0.5 and max_expansions 5; both can be overridden via query parameters.
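The expansion step could look roughly like the sketch below. It is an illustration under assumptions, not the code in api/filterset.py: the function name `expand_values` is hypothetical, and it assumes that max_expansions caps the additions per original value and that candidates are tried in order of descending cosine.

```python
def expand_values(values, similar, min_cosine=0.5, max_expansions=5):
    """Return `values` followed by their similar values.

    `similar` maps a value to [(similar value, cosine), ...] for one filter.
    Assumption: at most `max_expansions` values are added per original value,
    and only candidates with cosine >= `min_cosine` qualify.
    """
    expanded = list(values)
    seen = set(values)
    for value in values:
        candidates = sorted(
            similar.get(value, []), key=lambda pair: pair[1], reverse=True
        )
        added = 0
        for candidate, score in candidates:
            if added >= max_expansions or score < min_cosine:
                break  # sorted input: every remaining candidate scores lower
            if candidate in seen:
                continue  # never duplicate a value already in the result
            expanded.append(candidate)
            seen.add(candidate)
            added += 1
    return expanded


# Made-up similarity entries for one filter, for illustration only.
toy_similar = {
    "Pennsylvania": [("Delaware", 0.8), ("Nevada", 0.3), ("Maryland", 0.6)],
}
default_result = expand_values(["Pennsylvania"], toy_similar)
# default_result == ["Pennsylvania", "Delaware", "Maryland"]
```

Passing a lower min_cosine or a smaller max_expansions, as the query parameters would, widens or narrows the result accordingly.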

Example result without expansion:

{
  "filters": {
    "current_work_locations": [
      "Pennsylvania"
    ],
    "future_work_locations": [
      "This is a national project, based in Philadelphia, PA" // <-- this is a bad value from Torque
    ],
    "key_words_and_phrases": [

    ]
  },
  "id": "b294345a0dd6bd4f7160f9fbdbbf2e7f",
  "keywords": [
    "Philadelphia",
    "literacy projects"
  ],
  "query": "literacy projects in philidelphia"
}

Example result with semantic expansion:

{
  "filters": {
    "current_work_locations": [
      "Pennsylvania",
      "Connecticut",
      "Delaware",
      "Kentucky",
      "Maryland",
      "Nevada"
    ],
    "future_work_locations": [
      "This is a national project, based in Philadelphia, PA", // <-- this is a bad value from Torque
      "United States: Pennsylvania",
      "Pennsylvania"
    ],
    "key_words_and_phrases": [

    ]
  },
  "id": "b294345a0dd6bd4f7160f9fbdbbf2e7f",
  "keywords": [
    "Philadelphia",
    "literacy projects"
  ],
  "query": "literacy projects in philidelphia"
}