Data Engineer specializing in deriving insights from datasets through scalable solutions built on AWS, Python (including libraries such as pandas, Polars, and NumPy), and PySpark. Expertise lies in developing robust data pipelines that ensure the secure collection, storage, and processing of data. An advocate of automation to improve process efficiency, with a strong inclination toward innovation and collaborative work environments.
Proficient in maintaining a data lake on AWS services, optimizing legacy systems, and designing new methods for ingesting and processing data from varied sources to improve the quality of data analysis. Experienced in building streaming data pipelines with modern technologies.
Conducted research using Natural Language Processing (NLP) and Machine Learning to process large volumes of text and extract topics from them, producing visualizations that make real-world events easier to understand. The research was published as an article in an SBC journal.
Technically proficient in Python, PySpark, AWS Lambda, AWS Glue, EC2, Docker, Kubernetes, Step Functions, ECS, ECR, DynamoDB, API Gateway, Athena, Lake Formation, CloudFormation, shell scripting, SQL, Power BI, SQLite, FastAPI, NLP, Machine Learning, ETL, and ELT.
Portfolio available on GitHub at: [GitHub Portfolio](https://github.com/leonardorh18).
Published article accessible at: [SBC Journal Article](https://sol.sbc.org.br/journals/index.php/isys/article/view/2307).