Leonardo H.
Data Scientist

Skills

SQL

Python

Amazon AWS

Leonardo is available for hire

Bio

A Data Engineer specializing in deriving insights from datasets through scalable solutions built on AWS, Python (including libraries such as pandas, Polars, and NumPy), and PySpark. Expertise lies in developing robust data pipelines that ensure secure collection, storage, and processing of information. An advocate for automation to improve process efficiency, with a strong inclination towards innovation and collaborative work environments.

Proficient in maintaining a data lake using AWS services, optimizing legacy systems, and devising new methods for data ingestion and processing from various sources to improve data analysis quality. Experienced in constructing streaming data pipelines with contemporary technologies.

Conducted research utilizing Natural Language Processing (NLP) and Machine Learning to process and extract topics from extensive text volumes, resulting in visualizations that facilitate the comprehension of real-world events. This research culminated in the publication of an article in an SBC journal.

Technically proficient in Python, PySpark, AWS Lambda, AWS Glue, EC2, Docker, Kubernetes, Step Functions, ECS, ECR, DynamoDB, API Gateway, Athena, Lake Formation, CloudFormation, shell scripting, SQL, Power BI, SQLite, FastAPI, NLP, Machine Learning, ETL, and ELT.

Portfolio available on GitHub at: [GitHub Portfolio](https://github.com/leonardorh18).

Published article accessible at: [SBC Journal Article](https://sol.sbc.org.br/journals/index.php/isys/article/view/2307).

Data Engineer
10/1/2022 - Present

Developed technical expertise in ETL, ELT, and Data Lake implementation. Created pipelines to extract and refine data to meet end-user needs in Power BI, and built a Landing Zone layer with configurable parameters stored in DynamoDB to mask sensitive data arriving at the Data Lake. Operated and maintained data pipelines in the ingestion and processing layers using AWS services such as Lambda, EC2, Glue (Catalog and Jobs), SQS, S3, Athena, API Gateway, Step Functions, CloudWatch, Lake Formation, and AWS Secrets Manager. Enhanced features using PySpark with Hudi and Delta tables to implement Type 2 slowly changing dimensions (SCD2). Additionally, designed a data streaming pipeline using SQS, Kubernetes, Glue, SQLite, FastAPI, Redis, and Polars in Python.

Data Engineering Internship
6/1/2022 - 9/1/2022

Executed ETL and ELT activities using Python with PySpark, demonstrating strong expertise in data transformation and manipulation. Conducted extensive data analysis utilizing SQL, ensuring the integrity and efficiency of data workflows. Developed and maintained insightful business intelligence reports with Power BI. Leveraged AWS services to optimize data storage, processing, and analytics, showcasing a robust understanding of cloud-based data solutions.

Machine Learning Researcher
6/1/2020 - 6/1/2022

Developed deep expertise in web scraping, encompassing data extraction techniques from diverse web sources for analysis and insight generation. Proficient in Natural Language Processing (NLP) with experience in text analysis, sentiment analysis, and entity recognition. Executed machine learning projects involving model selection, training, and evaluation using Python and relevant libraries. Engaged in extensive data analysis to uncover patterns, trends, and actionable insights. Utilized statistical methods to interpret complex data sets and guide decision-making processes. Excelled in applying data science methodologies to solve real-world problems and optimize predictive models.