Data Quality

The discursus.io project is a data platform that ingests raw data, enhances and transforms it, and exposes the results through an API. The platform is built around 5 data products:
  • Public sources: where we scrape data from public sources
  • Pre-process: where we clean, prepare and enhance the data from public sources
  • Core entities: where we transform the data into the data warehouse entities
  • API: where we expose the entities, attributes and relationships for public consumption
  • Monitoring dashboard: a public app where users can explore social movements, events, actors, narratives, etc.


The dashboard below assesses the data quality of our data platform. For now, it only covers source profiling of the GDELT public source.
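
To give a rough idea of what source profiling can involve, here is a minimal sketch in Python. It is not the project's actual implementation: the `profile_rows` function and the field names (`event_id`, `event_date`, `source_url`) are hypothetical and only stand in for whatever columns the GDELT extract really has. It counts, for each column, how many values are missing and how many distinct values are present.

```python
def profile_rows(rows):
    """Return per-column row count, missing count, missing ratio and distinct count."""
    # Collect every column name that appears in at least one row.
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        missing = total - len(present)
        profile[col] = {
            "rows": total,
            "missing": missing,
            "missing_ratio": missing / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return profile


# Hypothetical sample rows; the field names are illustrative, not the project's schema.
sample = [
    {"event_id": "1", "event_date": "2023-01-01", "source_url": "https://example.com/a"},
    {"event_id": "2", "event_date": "", "source_url": "https://example.com/b"},
    {"event_id": "3", "event_date": "2023-01-02", "source_url": None},
    {"event_id": "3", "event_date": "2023-01-02", "source_url": "https://example.com/b"},
]

report = profile_rows(sample)
for column, stats in report.items():
    print(column, stats)
```

A profile like this can surface basic issues early, such as a duplicated identifier (fewer distinct `event_id` values than rows) or a column that is frequently empty, before the data moves on to the pre-process step.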