Cloud Projects

Principles of Data Architecture

Data First

Architecture artifacts must accurately capture data flows and track data interactions at the dataset level.

Data is Secure

  • Personal and confidential attributes should be encrypted or tokenized
  • Data should be classified according to Enterprise Data Privacy Standards
  • Data at rest (using AWS SSE-KMS) and data in motion (TLS 1.2 or higher) must be encrypted (see the sketch after this list)
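A minimal sketch of how these two controls could be applied on AWS, assuming S3 as the data store and boto3 as the SDK; the bucket name and KMS key alias are hypothetical, and the actual keys, buckets, and policies are governed by the enterprise security standards.

    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-data-bucket"          # hypothetical bucket name

    # Encryption at rest: default server-side encryption with a customer-managed KMS key.
    s3.put_bucket_encryption(
        Bucket=BUCKET,
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-data-key",   # hypothetical key alias
                },
                "BucketKeyEnabled": True,
            }]
        },
    )

    # Encryption in motion: deny any request that does not arrive over TLS.
    tls_only_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(tls_only_policy))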

Loosely coupled Data

  • Data exchange interfaces should be decoupled
  • No direct connectivity or interfaces to databases or application-specific data stores
  • No point-to-point connectivity or interfaces to databases or application-specific data stores
  • Data should be sourced either from an authoritative store, such as Electronic Data Interchange (EDI) containers or the Enterprise Data Lake (EDL), or from the System of Record (SoR)
  • Data exchange patterns should adhere to a canonical standard where applicable (see the sketch after this list)
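A minimal sketch of what a canonical exchange payload could look like, assuming JSON over a decoupled interface; the envelope fields and event names are hypothetical and would come from the applicable canonical standard, not from any producing application's internal schema.

    from dataclasses import dataclass, asdict
    from datetime import datetime, timezone
    import json
    import uuid

    @dataclass
    class CanonicalEnvelope:
        """Hypothetical canonical wrapper: consumers depend on this shape,
        never on a producing application's internal tables."""
        event_id: str
        event_type: str       # e.g. "claim.adjudicated" (illustrative name)
        source_system: str    # SoR or authoritative store identifier
        occurred_at: str      # ISO-8601 timestamp, UTC
        payload: dict         # business attributes using canonical field names

    def to_canonical(event_type: str, source_system: str, payload: dict) -> str:
        """Wrap source data in the canonical envelope before it leaves the producer."""
        envelope = CanonicalEnvelope(
            event_id=str(uuid.uuid4()),
            event_type=event_type,
            source_system=source_system,
            occurred_at=datetime.now(timezone.utc).isoformat(),
            payload=payload,
        )
        return json.dumps(asdict(envelope))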

Data Should be governed

  • Data completeness checks and validation processes should be implemented and well documented (a minimal sketch follows this list)
  • Data quality validation rules should be enforced and their results published periodically
  • Design-time and operational metadata should be tracked to maintain data lineage in support of audit requirements
  • Enforce strong data stewardship: stewards are accountable for, and have authority over, their data
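A minimal sketch of a completeness check, assuming record-shaped data and hypothetical required field names; where and how the results are published is determined by the governance process.

    from typing import Iterable

    REQUIRED_FIELDS = ("member_id", "effective_date", "plan_code")   # hypothetical fields

    def completeness_check(records: Iterable[dict]) -> dict:
        """Count records missing any required field and report a completeness ratio."""
        total = 0
        incomplete = 0
        for record in records:
            total += 1
            if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
                incomplete += 1
        completeness = 1.0 if total == 0 else (total - incomplete) / total
        return {"total": total, "incomplete": incomplete, "completeness": completeness}

    # Publishing the result periodically (e.g. to a governance dashboard) is
    # process-specific; printing stands in for that step here.
    sample = [
        {"member_id": "1", "effective_date": "2024-01-01", "plan_code": "A"},
        {"member_id": "2", "effective_date": None, "plan_code": "B"},
    ]
    print(completeness_check(sample))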

Data flow – Unidirectional

  • Data should flow from the transactional grain to reporting, not vice versa
  • Data should not flow from coarse-grained aggregators, such as reporting systems, to transactional systems
  • Data should not flow from operational systems to transactional systems
  • Data should not flow from the Enterprise Data Lake (EDL) to transactional systems

Data – Shared Asset

  • Datasets should be registered and cataloged in a centralized system such as Collibra (see the sketch after this list)
  • All data exchange interfaces should be registered through the Enterprise process
  • Data should be hydrated to the Enterprise Data Lake (EDL) and any applicable authoritative data sources
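A minimal sketch of what dataset registration could look like, assuming an HTTP-based catalog endpoint; the URL, attribute names, and values below are hypothetical, and the real flow follows the Enterprise registration process and the catalog's own operating model (e.g. Collibra).

    import requests

    CATALOG_URL = "https://catalog.example.internal/api/datasets"   # hypothetical endpoint

    def register_dataset(name: str, owner: str, edl_location: str, classification: str) -> None:
        """Register a dataset so it is discoverable as a shared asset."""
        record = {
            "name": name,                    # illustrative attribute names
            "owner": owner,
            "location": edl_location,
            "classification": classification,
        }
        response = requests.post(CATALOG_URL, json=record, timeout=30)
        response.raise_for_status()

    register_dataset(
        name="member_eligibility",
        owner="enterprise-data-office",
        edl_location="s3://edl-curated/member_eligibility/",
        classification="confidential",
    )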

Data – Timely accessible

  • Real Time or Near Real Time
    • Interface with SoR systems via either a Streaming Data Platform or APIs (see the sketch after this list)
  • Near Real Time and Batch
    • Interface with authoritative data sources via standardized interface patterns
  • Access just enough data, just in time
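A minimal sketch of the streaming path, assuming Kafka as the Streaming Data Platform and the confluent_kafka client; the broker, topic, and consumer-group names are hypothetical.

    from confluent_kafka import Consumer

    def handle_event(raw_value: bytes) -> None:
        """Application-specific processing of one SoR event."""
        print(raw_value)

    consumer = Consumer({
        "bootstrap.servers": "streaming.example.internal:9092",   # hypothetical broker
        "group.id": "near-real-time-consumer",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["sor.example.events"])                     # hypothetical topic

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue                  # no new events yet
            if msg.error():
                raise RuntimeError(msg.error())
            handle_event(msg.value())
    finally:
        consumer.close()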

 

Maintain common vocabulary and Data Definitions

  • The data dictionary and definitions should be published to a centralized system such as Collibra
  • Data should be aligned with Enterprise Data Management (EDM) so that it is well defined and named and used consistently throughout the organization
