About me

AWS Certified Data Engineer with around 4 years of experience in designing scalable data pipelines and cloud-based solutions. Skilled in using Python, AWS, and big data technologies like Apache Spark and Kafka to build efficient, high-performance data processing systems.

Proficient in implementing ETL workflows, real-time data streaming, and automated data integration across multi-cloud environments. Experienced in leveraging AWS Glue, Snowflake, and Airflow to optimize data pipelines and ensure seamless data flow.

Passionate about improving data accessibility and operational efficiency through automation and robust architecture. Proven track record of reducing processing times, enhancing data quality, and enabling data-driven insights.
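
The ETL work described above follows a standard extract-transform-load pattern. Below is a minimal PySpark sketch of that pattern; the bucket paths, column names, and aggregation are illustrative placeholders rather than an excerpt from any production pipeline.

    # Minimal PySpark ETL sketch: read raw CSV, clean and aggregate, write partitioned Parquet.
    # All paths and column names are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_transactions_etl").getOrCreate()

    # Extract: raw daily transaction files landed in object storage (placeholder path).
    raw = spark.read.option("header", True).csv("s3://example-bucket/raw/transactions/")

    # Transform: cast types, drop incomplete records, and aggregate per account and day.
    cleaned = (
        raw.withColumn("amount", F.col("amount").cast("double"))
           .withColumn("txn_date", F.to_date("txn_timestamp"))
           .dropna(subset=["account_id", "amount"])
    )
    daily = cleaned.groupBy("account_id", "txn_date").agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("txn_count"),
    )

    # Load: write a partitioned, columnar copy for the warehouse to pick up
    # (in practice this step could target Snowflake via its Spark connector instead).
    daily.write.mode("overwrite").partitionBy("txn_date").parquet(
        "s3://example-bucket/curated/daily_transactions/"
    )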

What I'm doing

  • Programming and Scripting

    Writing efficient code for data manipulation, pipeline automation, and ETL tasks.

  • Data Pipelines and ETL Tools

    Building and orchestrating data workflows to extract, transform, and load data (a minimal Airflow sketch follows this list).

  • Data Warehousing and Storage

    Managing structured and unstructured data in scalable, high-performance storage solutions.

  • Cloud Platforms and Tools

    Leveraging cloud infrastructure to build, deploy, and maintain data processing applications.
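
To make the pipelines and ETL item above concrete, here is a minimal Airflow DAG sketch (Airflow 2.x style); the DAG id, schedule, and task bodies are stubs for illustration only.

    # Minimal Airflow DAG sketch: orchestrate extract -> transform -> load as a daily run.
    # Task logic is stubbed out; dag_id and schedule are placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull source data")        # placeholder for an API or database extract

    def transform():
        print("clean and reshape data")  # placeholder for transformation logic

    def load():
        print("write to the warehouse")  # placeholder for the load step

    with DAG(
        dag_id="example_daily_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",   # Airflow 2.4+ "schedule" argument
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Linear dependency chain: extract, then transform, then load.
        t_extract >> t_transform >> t_load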

Recommendations

  • Naoki Yamaguchi

    Atul Tiwari is an outstanding engineer and a fantastic colleague. We hired him as a fresh graduate, and it quickly became clear we were lucky—his skills in data scraping, database management, cloud services, and Python were impressive. Beyond technical abilities, he brought energy, patience, and leadership, which led to his promotion to Scrum Leader within six months. I’d gladly work with him again if the chance arises.

  • Amandeep Kaur Sandhu

    Atul is a standout student with a deep passion for Computer Science. During his undergraduate studies, he consistently delivered high-quality work and exceeded expectations. His research projects were impressive and even presented at national and international conferences. Atul’s dedication, teamwork, and ability to manage multiple responsibilities make him a strong candidate for any graduate program. I’m confident he will excel in the field of Computer Science and Engineering.

  • Muhammad Thouseef

    Working with Atul was one of the highlights of my time at Propre. He's not only a skilled data engineer with deep knowledge of pipelines, cloud platforms, and Python—but also someone you can always count on. Whether we were debugging a tricky job in Spark or collaborating on sprint planning, Atul brought a calm, focused energy to the team. He's collaborative, proactive, and always ready to help. I’d jump at the chance to work with him again!

Certifications

Resume

Education

  1. Troy University

    2023 — 2024

    Major in Computer Science and Engineering, with coursework in Distributed Computing and Computer Vision. Specialized studies included a project on building ETL pipelines integrated with machine learning models.

  2. International Institute of Information Technology Bangalore

    2022 — 2023

    Advanced Diploma in Data Engineering and Analysis. Coursework included Data Preprocessing and Exploratory Data Analysis, with a specialized project on batch processing with PySpark on Amazon EMR.

  3. Lovely Professional University

    2018 — 2021

    Bachelor's degree in Computer Science, covering programming fundamentals and core computer science concepts. Developed a keen interest in data analysis and completed projects such as feature extraction and music identification.

Experience

  1. Data Engineer

    Principal Financial Group 2024 — Present

    Designed and deployed high-performance data pipelines using Apache Spark, AWS Glue, and Snowflake, reducing ETL execution times by 45%.


    Implemented scalable event-driven streaming solutions with Apache Kafka and AWS Kinesis, increasing data availability by 35% and supporting millions of transactions daily (a simplified consumer sketch follows this section).


    Led the migration of legacy ETL processes to AWS S3 and Redshift, cutting data storage costs by 30% and boosting query performance for risk modeling.


    Automated data workflows using Apache Airflow and Prefect, improving reliability by 40% and ensuring seamless multi-cloud execution.


    Enhanced database performance for SQL/NoSQL systems (PostgreSQL, MongoDB) by optimizing indexing and caching, achieving a 50% reduction in query response times (an illustrative indexing sketch follows this section).


  2. Data Engineer

    Propre Pte. Ltd. 2020 — 2023

    Engineered high-throughput ETL pipelines using Apache Spark (PySpark), Hive, and Snowflake, reducing batch processing times by 40% and ensuring scalable data analytics.


    Re-engineered data infrastructure with Azure Data Factory, Databricks, and Delta Lake, enabling incremental data loading and cutting storage costs by 30%.


    Developed real-time data ingestion frameworks with Apache Kafka, Flink, and AWS Kinesis, enhancing processing efficiency by 40% and supporting microservice integration.


    Optimized SQL Server, PostgreSQL, and Cassandra performance through indexing and query plan refinement, reducing read/write latencies by 50%.


    Implemented data governance and compliance using Apache Ranger, data masking, and encryption standards, ensuring GDPR, HIPAA, and ISO 27001 adherence.


    Automated data validation and anomaly detection using ML models in TensorFlow and AWS SageMaker, reducing data deviations by 40%.


    Orchestrated scalable data workflows with Kubernetes, Helm, and Airflow, enhancing fault tolerance and reducing operational overhead by 25%.


    Managed metadata and data cataloging systems with Apache Atlas, AWS Glue, and Alation, improving data lineage and governance compliance.
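
The Kafka-based streaming ingestion mentioned in both roles above is sketched below in simplified form using the kafka-python client; the topic name, broker address, and consumer group are placeholders, not details of any production system.

    # Minimal Kafka consumer sketch for event-driven ingestion.
    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "transactions",                        # placeholder topic name
        bootstrap_servers=["localhost:9092"],  # placeholder broker address
        group_id="etl-ingest",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        event = message.value
        # A real pipeline would validate the event and land it in a staging layer
        # (e.g. Kinesis Firehose or a warehouse table); here we just print two fields.
        print(event.get("account_id"), event.get("amount"))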
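
The indexing and query-tuning work mentioned above is illustrated by the PostgreSQL sketch below; the connection details, table, and column are placeholders.

    # Minimal query-tuning sketch: add a b-tree index on a hot filter column and
    # compare the query plan before and after. All identifiers are placeholders.
    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="analytics", user="etl", password="change-me")
    conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction

    with conn.cursor() as cur:
        # Plan for a frequent lookup query before indexing (expect a sequential scan).
        cur.execute("EXPLAIN ANALYZE SELECT * FROM transactions WHERE account_id = %s", ("A123",))
        print("\n".join(row[0] for row in cur.fetchall()))

        # Add the index; CONCURRENTLY avoids blocking writes on a busy table.
        cur.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_transactions_account_id "
            "ON transactions (account_id)"
        )

        # Plan after indexing (expect an index scan and a lower execution time).
        cur.execute("EXPLAIN ANALYZE SELECT * FROM transactions WHERE account_id = %s", ("A123",))
        print("\n".join(row[0] for row in cur.fetchall()))

    conn.close()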


Portfolio

Skills

Programming and Scripting

Python, SQL, Scala, Shell Scripting, Java

Data Warehousing and Storage

Snowflake, BigQuery, Delta Lake, MySQL, PostgreSQL, MS SQL Server, MongoDB, Oracle

Big Data and Cloud Platforms

Apache Spark, Databricks, AWS (S3, Glue, Lambda), GCP (BigQuery, Dataflow), Azure (Synapse, Data Factory)

ETL and Data Integration

Apache Airflow, dbt, Informatica PowerCenter, Matillion ETL

Streaming and Real-Time Processing

Apache Kafka, Apache Flink, RabbitMQ, AWS Kinesis

DevOps and Automation

Terraform, Docker, Kubernetes, CI/CD Pipelines

Data Governance and Security

Apache Ranger, Data Masking, Encryption (AES-256), GDPR, HIPAA Compliance

Data Visualization and Reporting

Tableau, Power BI, Looker

Workflow and Project Management

Jira, Confluence, Agile (Scrum, Kanban)

Contact
