Leonardo H.
Data Scientist

SQL
Python
Amazon AWS
Bio

A Data Engineer specializing in deriving valuable insights from datasets through scalable solutions built with AWS, Python (including libraries like pandas, polars, numpy), and PySpark. Expertise lies in developing robust data pipelines that ensure secure collection, storage, and processing of information. An advocate for automation to enhance process efficiency, with a strong inclination towards innovation and collaborative work environments.

Proficient in maintaining a data lake using AWS services, optimizing legacy systems, and devising new methods for data ingestion and processing from various sources to improve data analysis quality. Experienced in constructing streaming data pipelines with contemporary technologies.

Conducted research utilizing Natural Language Processing (NLP) and Machine Learning to process and extract topics from extensive text volumes, resulting in visualizations that facilitate the comprehension of real-world events. This research culminated in the publication of an article in an SBC journal.

Possesses technical proficiency in Python, PySpark, AWS Lambda, AWS Glue, EC2, Docker, Kubernetes, Step Functions, ECS, ECR, DynamoDB, API Gateway, Athena, Lake Formation, CloudFormation, shell scripting, SQL, Power BI, SQLite, FastAPI, NLP, Machine Learning, ETL, and ELT.

Portfolio available on GitHub at: [GitHub Portfolio](https://github.com/leonardorh18).

Published article accessible at: [SBC Journal Article](https://sol.sbc.org.br/journals/index.php/isys/article/view/2307).

  • Data Engineer
    10/1/2022 - Present

    Developed technical expertise in ETL, ELT, and Data Lake implementation. Created pipelines to extract and refine data to meet end-user needs in Power BI, and built a Landing Zone layer with configurable parameters stored in DynamoDB to mask sensitive data arriving at the Data Lake. Operated and maintained data pipelines in the ingestion and processing layers using AWS services such as Lambda, EC2, Glue (Catalog and Jobs), SQS, S3, Athena, API Gateway, Step Functions, CloudWatch, Lake Formation, and AWS Secrets Manager. Enhanced features using PySpark with Hudi and Delta tables to implement type 2 slowly changing dimensions (SCD2). Additionally, designed a data streaming pipeline using SQS, Kubernetes, Glue, SQLite, FastAPI, Redis, and Polars in Python.

  • Data Engineering Internship
    6/1/2022 - 9/1/2022

    Executed ETL and ELT activities using Python with PySpark, demonstrating strong expertise in data transformation and manipulation. Conducted extensive data analysis utilizing SQL, ensuring the integrity and efficiency of data workflows. Developed and maintained insightful business intelligence reports with Power BI. Leveraged AWS services to optimize data storage, processing, and analytics, showcasing a robust understanding of cloud-based data solutions.

  • Machine Learning Researcher
    6/1/2020 - 6/1/2022

    Developed deep expertise in web scraping, encompassing data extraction techniques from diverse web sources for analysis and insight generation. Proficient in Natural Language Processing (NLP) with experience in text analysis, sentiment analysis, and entity recognition. Executed machine learning projects involving model selection, training, and evaluation using Python and relevant libraries. Engaged in extensive data analysis to uncover patterns, trends, and actionable insights. Utilized statistical methods to interpret complex data sets and guide decision-making processes. Excelled in applying data science methodologies to solve real-world problems and optimize predictive models.

  • Computer Science at Federal University of Southern Border
    2019 - 2023

  • Scraping with Python: data collection on the web at Alura
    8/1/2022

  • Spark: Introducing the Tool at Alura
    7/1/2022

  • Data Visualization: Exploring with Seaborn at Alura
    7/1/2022

  • Data Science: Time Series Analysis at Alura
    7/1/2022

  • SQL with MySQL: Manipulate and Query Data at Alura
    6/1/2022

  • Python for Data Science: Language and Numpy at Alura
    6/1/2022

  • Python for Data Science: Functions, Packages, and Pandas at Alura
    6/1/2022

  • Python for Data Science at Alura
    6/1/2022

  • Python Pandas: Handling and Analyzing Data at Alura
    6/1/2022

  • Python Collections Part 1: Lists and Tuples at Alura
    6/1/2022

  • Pandas: Different Input and Output Formats (IO) at Alura
    6/1/2022

  • Linux II: Programs, Processes, and Packages at Alura
    6/1/2022

  • Linux I: Knowing and Using the Terminal at Alura
    6/1/2022

  • SQL Queries: Advancing in SQL with MySQL at Alura
    6/1/2022

  • Big Data Fundamentals 3.0 at Data Science Academy
    6/1/2022

Leonardo is available for hire

All Howdy candidates are vetted for skills and English proficiency.