Batch 1

Fetch data from the production DB, clean it, and load it into the sandbox so that backend data queries can be tested against it.

Last updated 2 years ago

Data Source: Production DB

Data Destination: Sandbox

Target: Organizations

Export Query:

MATCH (n:Organization) WHERE NOT(n.ein="" OR n.description="")
WITH n
MATCH (n)-[:LOCATED_IN]->(l:Location) 
MATCH (n)-[:CALL_WITH]->(p:Phone)
MATCH (n)-[:CONTACT_AT]->(c:Contact)
RETURN n.deductibility,n.subsection,n.assetAmount,n.description,n.ein,
n.latest990,n.subOpCategory,n.deductibilityCode,n.affiliation,n.foundationStatus,
n.opCategory,n.id,n.accountingPeriod,n.email,n.nteeLetter,n.nteeType,
n.incomeAmount,n.nteeSuffix,n.filingRequirement,n.alternateName,
n.classification,n.url,n.rulingDate,n.nteeCode,n.groupName,n.name,n.tagline,
n.nteeClassification,n.exemptOrgStatus,n.exemptOrgStatusCode

Export Format: JSON

Total: 8547

Data Processing: Some of the exported descriptions could not be parsed as JSON. We manually corrected those descriptions so that they conform to the JSON format.
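The manual correction step can be supported by a small check that reports which exported records fail to parse. The following is a minimal sketch, not the procedure the team actually used: it assumes the export is line-delimited JSON (one record per line), and the helper names `findUnparsableLines` and `toJsonString` as well as the sample records are hypothetical.

```javascript
// Hypothetical helper: report which lines of a line-delimited JSON export
// cannot be parsed, so they can be corrected by hand.
function findUnparsableLines(text) {
  const failures = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    try {
      JSON.parse(line);
    } catch (err) {
      failures.push({ lineNumber: index + 1, message: err.message });
    }
  });
  return failures;
}

// Hypothetical helper: turn a raw description into a valid JSON string
// literal (quotes, backslashes and control characters are escaped).
function toJsonString(rawDescription) {
  return JSON.stringify(rawDescription);
}

// Made-up sample records; the second one contains unescaped quotes.
const sample = [
  '{"n.ein":"000000001","n.description":"Community food bank"}',
  '{"n.ein":"000000002","n.description":"Serves "at-risk" youth"}',
].join("\n");

// Logs the 1-based line numbers of the records that need correction.
console.log(findUnparsableLines(sample).map((f) => f.lineNumber));

// A corrected description that can be pasted back into the record.
console.log(toJsonString('Serves "at-risk" youth'));
```

Running a check like this before and after the manual edits gives a quick confirmation that every record in the file parses.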

Data repopulating script: https://github.com/PhilanthroLab/irs-uploader/blob/main/db_repopulation_batch1.js

Post data population activities:

  • Create a full-text index: CALL db.index.fulltext.createNodeIndex("<indexName>", ["<NodeLabel>"], ["<nodeProperty1>", "<nodeProperty2>"])

  • Test and validate the data to confirm that it does not break the codebase.
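The full-text index step in the post-population activities could also be scripted instead of being run by hand. The sketch below is an illustration under assumptions, not part of the linked script: it assumes a Neo4j version that still provides the `db.index.fulltext.createNodeIndex` procedure and a session object exposing `run(query, params)`, as the official `neo4j-driver` session does. The function names, the index name `organizationSearch`, and the choice of indexed properties are hypothetical. A stub session is used so the example runs without a database.

```javascript
// Hypothetical helper: build the procedure call and its parameters.
function buildFulltextIndexCall(indexName, labels, properties) {
  if (typeof indexName !== "string" || indexName.trim() === "") {
    throw new Error("indexName must be a non-empty string");
  }
  if (!Array.isArray(labels) || labels.length === 0) {
    throw new Error("labels must be a non-empty array");
  }
  if (!Array.isArray(properties) || properties.length === 0) {
    throw new Error("properties must be a non-empty array");
  }
  return {
    query:
      "CALL db.index.fulltext.createNodeIndex($indexName, $labels, $properties)",
    params: { indexName, labels, properties },
  };
}

// Hypothetical helper: run the call on any object that exposes
// run(query, params), for example a neo4j-driver session.
async function createFulltextIndex(session, indexName, labels, properties) {
  const { query, params } = buildFulltextIndexCall(indexName, labels, properties);
  return session.run(query, params);
}

// Stub session that only records what would be sent to the database.
const stubSession = {
  calls: [],
  async run(query, params) {
    this.calls.push({ query, params });
    return { records: [] };
  },
};

// Assumed index name and properties, chosen for illustration only.
createFulltextIndex(stubSession, "organizationSearch", ["Organization"], [
  "name",
  "description",
]).then(() => console.log(stubSession.calls[0]));
```

Passing the index name, labels, and properties as query parameters keeps the statement fixed and avoids string concatenation when the same step is repeated for later batches.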