
Cézar A.
Data Engineer

Spark
SQL
Apache Hadoop
Java
Python
Amazon AWS
Bio

A dedicated student and passionate data science enthusiast, actively seeking an initial opportunity for professional growth. The primary goal is to acquire the competencies and skills needed to understand business strategies and requirements, including the ability to manage, design, and develop comprehensive analytical solutions in data science.

  • Data Engineer
    8/1/2022 - Present

    Developed extensive proficiency in AWS, leveraging services such as S3, Redshift, and EMR for optimized data storage, processing, and analytics. Gained considerable expertise in using Apache Spark for large-scale data processing and analytics tasks, ensuring efficient data transformation and integration processes. Showcased advanced skills in Python, deploying it for scripting, automation, and the development of data pipelines. Utilized Control-M for scheduling and monitoring batch jobs, ensuring streamlined workflows and timely data processing. Demonstrated comprehensive knowledge of SQL for querying, manipulating, and managing relational databases, which facilitated the extraction of insightful and actionable data reports. Successfully applied these technical skills to design and implement scalable data architecture solutions, contributing to improved data quality and operational efficiency.

  • Data Engineer
    5/1/2021 - 8/1/2022

    Acquired advanced technical expertise in managing and optimizing Hadoop CDP clusters for large-scale data processing. Engineered robust data pipelines utilizing Apache NiFi, Apache Hive, and Apache Spark for efficient data ingestion, transformation, and analysis. Leveraged HDFS and YARN to maximize storage capabilities and resource management. Employed Sqoop for data migration and Kafka for real-time data streaming. Augmented skills in orchestrating workflows using Apache Oozie, and implemented security protocols with Apache Ranger. Used SQL to query large datasets and developed automated reporting solutions. Gained experience in version control and collaborative development practices using Git.

  • Big Data Engineering
    10/1/2020 - 4/1/2021

    Oversaw the design, deployment, and maintenance of a high-performance Hadoop cluster infrastructure. Developed advanced skills in Hadoop ecosystem tools including Hive, Pig, and HBase. Successfully implemented solutions to optimize data storage and processing efficiency, utilizing MapReduce for parallel data processing. Deployed and managed clusters using Ambari and Cloudera Manager, ensuring high availability and fault tolerance.

    Gained extensive experience in scripting and automation using Python and Bash to streamline cluster management tasks. Employed monitoring and logging systems such as Nagios and Splunk to ensure system reliability and performance. Leveraged AWS and Azure to scale infrastructure efficiently, implementing best practices for data security and compliance.

    Collaborated closely with data engineers and analysts to develop data pipelines and workflows using Apache NiFi. Applied machine learning models on large datasets, enhancing data analysis capabilities. Demonstrated proficiency in optimizing query performance and troubleshooting complex data processing issues. Conducted regular performance tuning and cluster upgrades to maintain state-of-the-art infrastructure.

  • Data Engineer
    4/1/2020 - 10/1/2020

    Developed expertise in AWS services, including S3, EMR, Glue, and Lake Formation, to design and implement data lake solutions. Engineered data ingestion pipelines utilizing AWS Lambda, Kinesis, and Data Pipeline to facilitate seamless data processing. Demonstrated strong skills in ETL processes with AWS Glue, optimizing data transformation and cleaning tasks. Built and maintained data cataloging and searching capabilities using AWS Glue Data Catalog, enabling efficient metadata management and data discovery. Implemented robust security measures with IAM, KMS, and CloudTrail to ensure data governance and compliance. Utilized Redshift and Athena to perform complex queries and data analysis, showcasing proficiency in SQL and data warehousing concepts. Automated cloud infrastructure deployments using CloudFormation and Terraform, ensuring consistent and repeatable setups. Developed and maintained monitoring dashboards with CloudWatch and implemented alerting mechanisms to ensure system reliability and performance. Successfully orchestrated batch and real-time data processing workflows, contributing to data-driven decision-making processes.

  • Dentistry at Federal University of Bahia
    1988 - 1992

  • Big Data (data science) at Unyleya College
    2020 - 2021

  • Big Data Engineer at Semantix Academy
    2021 - 2021

  • Data Science at Institute of Management and Information Technology
    2020 - 2020

  • Data Science at Digital House
    2021 - 2021

  • Oracle Next Education - Labora: Entrepreneurship at Grupo Alura
    11/2/2020

  • Oracle Next Education - Labora: Java at Grupo Alura
    11/2/2020

  • Certificate of Authority: Algorithm at DevMedia
    10/2/2020

  • Oracle Next Education - Labora: Front-End at Grupo Alura
    10/2/2020

  • Certificate of Authority: Python at DevMedia
    10/2/2020

  • Certificate of Authority: Database at DevMedia
    10/2/2020

  • Introduction to Data Science: Basic Concepts at LinkedIn
    9/2/2020

  • SQL Course at DevMedia
    9/2/2020

  • Introduction to Data Science: How to Tell Stories with Data at LinkedIn
    9/2/2020

  • Introduction to the Brazilian Personal Data Protection Law at Escola Nacional de Administração Pública ENAP
    9/2/2020

  • Programming Logic at Grupo Alura
    9/2/2020

  • Data Analysis in R Language at Escola Nacional de Administração Pública ENAP
    8/2/2020

  • Relational Database Modeling at DevMedia
    8/2/2020

Cézar is available for hire


All Howdy candidates are vetted for skills and English proficiency.