
Vinícius A.
Data Engineer

Airflow
Spark
SQL
GitHub
Databricks
Python
PostgreSQL
AWS
Docker
Bio

Data Engineer specializing in the design, implementation, and maintenance of Data Lakehouse architectures, data pipelines, and ETL/ELT processes in cloud-based Big Data environments. Proficient in ingesting and storing data from sources such as APIs and databases, and in processing, integrating, and provisioning data with a focus on data quality and access control. Experienced in collaborating with multidisciplinary, global teams at both multinational corporations and startups, using traditional and agile methodologies. Industry experience spans the automotive and financial sectors. Holds a degree in Control and Automation Engineering from the University of Brasília, complemented by courses and certifications in Data Engineering. Advanced knowledge of Python, SQL, AWS, Databricks, PySpark, Delta Lake, Airflow, NiFi, Docker, Deequ, PostgreSQL, MySQL, Git and CI/CD strategies, Scrum, and the modeling of Data Lake, Data Warehouse, and Data Lakehouse architectures.

  • Data Engineer
    6/1/2023 - Present

    Designed, implemented, and maintained a comprehensive Data Lakehouse environment on AWS and Databricks, focusing on efficient data storage and processing. Managed extraction, transformation, and loading (ETL/ELT) processes, ensuring consistent, reliable data flow. Orchestrated data ingestion pipelines from APIs and databases using Apache NiFi, Apache Airflow, and AWS services including DMS, SNS, SQS, API Gateway, and S3.

    Implemented parallel data processing pipelines for both batch and streaming data using Spark (PySpark), Delta Lake, and Databricks, optimizing throughput and reliability. Developed data pipelines with Kedro and applied relational and dimensional data modeling techniques, including Star Schema, Snowflake, and Data Vault. Enhanced data quality through automated tests with Deequ (PyDeequ) and ensured code reliability with unit and integration tests written in Pytest.
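
    The Deequ-style quality checks mentioned above run against a Spark session in practice; purely as an illustrative sketch of the same idea, the two most common constraint types (completeness and uniqueness) can be expressed in plain Python over a batch of records — all names and data here are hypothetical:

```python
# Minimal data-quality checks in the spirit of Deequ's completeness
# and uniqueness constraints; pure Python, illustrative only.
def completeness(records, column):
    """Fraction of records with a non-null value in `column`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(column) is not None)
    return filled / len(records)

def is_unique(records, column):
    """True if every non-null value in `column` appears exactly once."""
    values = [r[column] for r in records if r.get(column) is not None]
    return len(values) == len(set(values))

batch = [
    {"id": 1, "amount": 19.9},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5.0},
]
assert completeness(batch, "id") == 1.0       # every record has an id
assert is_unique(batch, "id")                 # no duplicate ids
```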

    Established and maintained CI/CD pipelines with GitLab CI/CD, supporting a continuous integration and deployment workflow. Deployed open-source tools in cloud environments using Docker and AWS EC2 for scalable solutions. Facilitated training on Data Lakehouse concepts and practices, improving team knowledge and performance. Managed code versioning and collaborative development with Git (GitLab) and produced clear project documentation with MkDocs.

    Led the “Financial Transaction Recurrence Classification System” project, building the infrastructure to classify financial transactions as recurring or not. This involved setting up an event-driven architecture with AWS services (API Gateway, SNS, SQS), Apache NiFi, Databricks, PyDeequ, and PostgreSQL, ensuring accurate data ingestion and processing. Gathered requirements for data engineering projects, ensuring alignment with stakeholder needs and project goals.
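
    The source does not describe the classification logic itself; purely as an illustration of the problem, a naive rule might flag a transaction as recurring when the same merchant-and-amount pair repeats at a roughly constant interval (all names, thresholds, and data below are hypothetical):

```python
from datetime import date

def is_recurring(history, merchant, amount, tolerance_days=3):
    """Naive recurrence rule: the same (merchant, amount) pair occurs
    at least three times with roughly constant gaps between dates."""
    dates = sorted(d for d, m, a in history if m == merchant and a == amount)
    if len(dates) < 3:
        return False
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return max(gaps) - min(gaps) <= tolerance_days

history = [
    (date(2023, 1, 5), "StreamCo", 29.9),   # monthly subscription
    (date(2023, 2, 5), "StreamCo", 29.9),
    (date(2023, 3, 6), "StreamCo", 29.9),
    (date(2023, 3, 9), "Cafe", 12.0),       # one-off purchase
]
assert is_recurring(history, "StreamCo", 29.9)
assert not is_recurring(history, "Cafe", 12.0)
```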

    Developed proficiency in Python, SQL, R, C, and C++, with hands-on experience in AWS, Databricks, Spark (PySpark), Delta Lake, Apache NiFi, Apache Airflow, Kedro, Docker, Deequ (PyDeequ), PostgreSQL, Git (GitLab), CI/CD (GitLab CI/CD), Streamlit, and Linux. Gained expertise in Data Lake, Data Warehouse, and Data Lakehouse architectures and in data modeling methods such as Star Schema, Snowflake, and Data Vault. Applied Agile methodologies such as Scrum and Kanban.

    Strengthened numerous soft skills including continuous learning, proactivity, organization, time management, teamwork, communication, problem-solving, analytical thinking, and self-learning, all contributing to consistently high project performance and successful outcomes.

  • Scholar in Artificial Intelligence Project
    5/1/2022 - 5/1/2023

    Developed a project to assist Federal Police experts in detecting fraud in bidding and public contracts, using extensive Python web scraping to collect publications from the Official Gazette of the Union. Cleaned, processed, and prepared the data in Python for training Deep Learning models, specifically Transformers. Built a database of 200,000 annotated publications tailored for Named Entity Recognition (NER) and trained NER models with spaCy, achieving 94% accuracy.
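
    The 94% figure above is an evaluation metric over annotated tokens; the exact metric used is not stated, but as an illustrative sketch, a simple token-level accuracy over BIO-tagged labels could be computed like this (labels and data are hypothetical):

```python
def token_accuracy(gold, predicted):
    """Token-level accuracy: fraction of tokens whose predicted
    entity label matches the gold (human-annotated) label."""
    assert len(gold) == len(predicted), "sequences must align"
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# BIO tags: B- begins an entity, I- continues it, O is outside any entity.
gold = ["O", "B-ORG", "I-ORG", "O", "B-PER"]
pred = ["O", "B-ORG", "I-ORG", "O", "O"]    # model missed the person
print(token_accuracy(gold, pred))  # 0.8
```

    In practice, NER systems are more often scored with entity-level precision, recall, and F1 rather than raw token accuracy, since the O label dominates most corpora.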

  • Product Engineering Intern
    12/1/2021 - 5/1/2023

    Cleaned, treated, and prepared sensor and IoT device data using Python and Pandas for downstream analysis. Analyzed product performance data with Python, Excel, and Minitab to support data-driven decision-making. Created comprehensive reports with Power BI, effectively communicating findings to stakeholders. Led the department's digitization and process automation (RPA) strategy, spearheading initiatives to improve operational efficiency, and implemented and maintained RPA projects in Python. Authored and presented a paper at the International Symposium on Automotive Engineering (SIMEA) 2023 on applying image processing techniques in Python to quantify corrosion in automotive products. Presented to internal and global teams as well as suppliers, demonstrating strong communication skills and the ability to articulate complex technical concepts.
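
    The paper's actual method is not described here; as a purely hypothetical sketch of the general idea, corrosion on a surface can be quantified from a grayscale image by thresholding: dark pixels are counted as corroded and reported as a fraction of the total area (the threshold and data below are invented for illustration):

```python
def corroded_fraction(image, threshold=100):
    """Fraction of pixels darker than `threshold` in a grayscale image
    given as a list of rows of 0-255 intensity values."""
    total = sum(len(row) for row in image)
    dark = sum(1 for row in image for px in row if px < threshold)
    return dark / total

# Toy 3x3 "image": three dark (corroded) pixels out of nine.
image = [
    [200, 210, 90],
    [220, 80, 85],
    [230, 240, 250],
]
print(round(corroded_fraction(image), 3))  # 0.333
```

    A real pipeline would typically add noise filtering and adaptive thresholding (e.g. with OpenCV) before counting pixels.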

  • Control and Automation Engineering at University of Brasília (UnB)
    2018 - 2023

  • Databricks Certified Data Engineer Associate at Databricks
    12/1/2023

  • AWS Certified Cloud Practitioner at AWS
    12/1/2023

  • Academy Accreditation - Databricks Lakehouse Fundamentals at Databricks
    9/1/2023

  • Big Data Analytics with R and Microsoft Azure Machine Learning at Data Science Academy
    3/1/2023

  • Big Data Fundamentals 3.0 at Data Science Academy
    12/1/2022

  • Introduction to Data Science 3.0 at Data Science Academy
    12/1/2022

  • English (Advanced C1) at Cooplem Idiomas
    6/1/2018
