Marc Matt, Developer in Hamburg, Germany

Marc Matt

Verified Expert in Engineering

Data Engineer and Developer

Location
Hamburg, Germany
Toptal Member Since
January 5, 2021

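Marc is a data engineer with a passion for data and more than 15 years of experience leading teams and building data platforms in the information technology, real estate, and services industries. He created a Python-based AVRO schema generator that makes parts of a schema reusable. Marc excels at automation, integrations, analysis, model building, statistics, big data, CI/CD pipelines, and data modeling.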

Portfolio

Bold Metrics Inc.
SQL, Tableau, Python, Data Analysis, Data Build Tool (dbt), Apache Airflow...
MediaMarktSaturn Retail Group
Python 3, Google Cloud, Google Kubernetes Engine (GKE), Apache NiFi...
Spin (Tier Mobility) - Main
SQL, ETL, Cloud Architecture, Google Cloud Platform (GCP), Big Data...

Experience

Availability

Part-time

Preferred Environment

Apache Airflow, Tableau Server, Tableau, SQL, Pandas, Python, Apache Beam, Git, Linux

The most amazing...

...application I've developed delivers pose estimation data in real time to help optimize customers' fitness goals.

Work Experience

Senior Data Analyst

2023 - 2023
Bold Metrics Inc.
  • Created a template for ad hoc reporting for all clients.
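  • Designed and implemented streaming data ingestion into the data warehouse using Amazon Kinesis, Lambda, and Python.
  • Optimized and standardized transformations in the Redshift data warehouse.
Technologies: SQL, Tableau, Python, Data Analysis, Data Build Tool (dbt), Apache Airflow, Amazon Kinesis, AWS Lambda, Serverless Framework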

Data Engineer

2022 - 2022
MediaMarktSaturn Retail Group
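  • Built a supply chain monitoring system for distribution centers across the country.
  • Implemented APIs for all logistics service providers and transformed their data into company-wide reporting.
  • Built a real-time order tracking system using Apache NiFi on GKE.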
Technologies: Python 3, Google Cloud, Google Kubernetes Engine (GKE), Apache NiFi, Google BigQuery, SQL, Data Build Tool (dbt), Docker, Apache Airflow, Apache Beam, Google Data Studio, Database Schema Design, Data Management, Terraform, Google Cloud Platform (GCP), Google Cloud Functions, Cloud Run, Cloud Tasks, Node.js, APIs, Serverless, Data Lakes, Data Visualization, Kubernetes, Scaling, Dashboards, Data Wrangling, Azure Databricks, Database Architecture, ETL Tools

Cloud Data Engineer and Architect

2021 - 2022
Spin (Tier Mobility) - Main
  • Designed and built an MLOps workflow using Google Vertex AI.
  • Operationalized ML models for real-time use cases.
  • Prepared the migration of the DWH from BigQuery to Snowflake.
  • Built an operations support tool for traffic violation incidents.
Technologies: SQL, ETL, Cloud Architecture, Google Cloud Platform (GCP), Big Data, Architecture, Python, Snowflake, Hadoop, REST APIs, Apache Airflow, Git, DevOps, Microservices, Google BigQuery, Big Data Architecture, Machine Learning Operations (MLOps), CI/CD Pipelines, Cloud Security, Data Warehousing, Data Warehouse Design, Apache Avro, Kubeflow, Fivetran, Database Schema Design, Data Management, Terraform, Google Cloud Functions, Cloud Run, APIs, Serverless, Data Lakes, Kubernetes, Scaling, Data Wrangling, Database Architecture, ETL Tools

ETL Engineer

2021 - 2021
Food Marketing Company
  • Parsed JSON data in Talend and loaded it into Redshift.
  • Integrated data from web APIs with Talend into Redshift.
  • Transformed customer data and loaded it into Salesforce using Talend.
Technologies: Talend, JSON, Redshift Spectrum, Redshift, APIs, Data Wrangling, ETL Tools

Data Engineer

2021 - 2021
Janus
  • Converted legacy ETL pipelines into scalable AWS Glue jobs.
  • Automated resource deployment using AWS CloudFormation.
  • Designed and built a framework in PySpark that makes it easier to add pipelines in the future.
Technologies: AWS Glue, Spark, SQL, Amazon Aurora, Python, Database Schema Design, Data Management, Serverless, Apache Spark, PySpark, Scaling, Data Wrangling, Database Architecture, ETL Tools, AWS IAM

Senior Data Engineer

2021 - 2021
Emma
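  • Designed a new data ingestion API for the data platform that supports streaming analytics.
  • Set up binlog streaming and real-time event parsing using Kinesis, Lambda, and Kinesis Data Firehose.
  • Optimized data loading in Redshift by analyzing queries and tables in order to add optimized sort and distribution keys.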
Technologies: Python, Amazon Kinesis, Amazon Web Services (AWS), Redshift, Redshift Spectrum, Matillion ETL for Redshift, AWS Lambda, Parquet, AWS Fargate, Docker, Databases, Database Schema Design, Data Management, Terraform, APIs, Serverless, Kubernetes, Scaling, Data Wrangling, Database Architecture, ETL Tools, Amazon Elastic MapReduce (EMR), Amazon EKS, AWS IAM

Data Specialist

2020 - 2021
Ear-Reality GmbH
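  • Developed a data lake based on Kinesis and Athena, including reports embedded in Metabase.
  • Moved the production system to a serverless, scalable architecture.
  • Automated load testing of the application using Python and Locust.io.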
Technologies: Amazon Web Services (AWS), SQL, Amazon Kinesis, Amazon Athena, AWS Elastic Beanstalk, Docker, Python, AWS CloudFormation, Databases, Data Reporting, Business Intelligence (BI), Database Schema Design, Data Management, Terraform, APIs, Data Visualization, Dashboards, Data Wrangling

Senior Data Engineer

2018 - 2020
Engel & Völkers
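  • Designed and set up a data platform, including tool selection and data modeling.
  • Built a TensorFlow model to predict property values in a real-time environment.
  • Implemented CI/CD pipelines to automate the deployment of all features of the data platform.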
Technologies: Jenkins, SQL, Tableau, BigQuery, Apache Beam, Apache Airflow, TensorFlow, Google Kubernetes Engine (GKE), Docker, Python, Data Engineering, Data Architecture, Data Analysis, NoSQL, Google BigQuery, Data Pipelines, ETL, Data Warehouse Design, Data Warehousing, Database Modeling, Data Modeling, Google Cloud Platform (GCP), Google Cloud SQL, Data Science, Databases, Data Reporting, Business Intelligence (BI), Database Schema Design, Data Management, Google Cloud Functions, Cloud Run, APIs, Serverless, Data Lakes, Data Visualization, Scaling, Dashboards, Data Wrangling, Database Architecture, ETL Tools

Head of Data Engineering | Machine Learning

2014 - 2018
Surf Media
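  • Led a team of six and was responsible for their personal development.
  • Designed big data systems and a data lake, including tool selection and data modeling.
  • Designed data pipelines and selected models for the development of recommendation engine and fraud recognition systems that work in a real-time environment.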
  • Created the technology roadmap. Oversaw the advancement of all affected data systems.
Technologies: TensorFlow, RabbitMQ, Apache Avro, Tableau, Hortonworks Data Platform (HDP), SQL, Apache NiFi, Apache HAWQ, Talend, Python, Data Engineering, PostgreSQL, Amazon S3 (AWS S3), AWS Lambda, Data Architecture, Amazon Web Services (AWS), NoSQL, Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Data Modeling, Talend ETL, Data Science, Databases, Data Reporting, Business Intelligence (BI), Database Schema Design, Data Management, APIs, Spark, Data Visualization, PySpark, Scaling, Dashboards, Data Wrangling, Database Architecture, ETL Tools

Business Intelligence Analyst

2012 - 2014
Surf Media
  • Designed, developed, and operated the DWH for a group of five companies.
  • Developed a statistical model for predicting orders.
  • Analyzed customers to understand how to optimize revenue in social networks.
Technologies: Tableau, Perl, Python, MySQL, Data Engineering, PostgreSQL, Data Architecture, Amazon Web Services (AWS), Data Pipelines, ETL, Data Warehouse Design, Data Warehousing, Database Modeling, Data Modeling, Talend ETL, Databases, Data Reporting, Business Intelligence (BI), Database Schema Design, Data Management, APIs, Spark, Data Visualization, Apache Spark, PySpark, Scaling, Dashboards, Data Wrangling, ETL Tools

Database Consultant

2010 - 2012
EOS Information Services, GmbH.
  • Designed, developed, and operated the DWH for a decision engine in risk management.
  • Designed processes for risk management.
  • Conceived and developed the address management process using Perl and Uniserv.
Technologies: Oracle, Java, Perl, Data Engineering, Data Architecture, Data Analysis, Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Databases, Business Intelligence (BI), Database Schema Design, Data Management, ETL Tools

Datawarehousing Consultant

2009 - 2010
Key-Work Consulting, GmbH.
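  • Migrated the sales reporting for a mail-order company.
  • Developed a statistical model to optimize sales planning for a mail-order company.
  • Built a statistical model for dynamic shipment planning.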
Technologies: Python, SQL, SQL Server 2010, Data Engineering, Data Analysis, Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Data Modeling, Databases, Data Reporting, Business Intelligence (BI), Data Management, Dashboards, ETL Tools

Database Management

2008 - 2009
Coxulto Marketing Solutions, GmbH.
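  • Defined and selected target groups for marketing campaigns.
  • Completed affinity analyses across the entire customer base.
  • Managed and operated the address database, including duplicate elimination.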
Technologies: Perl, SQL, Data Engineering, Data Analysis, ETL, Data Warehouse Design, Data Warehousing, Databases, Data Reporting, Business Intelligence (BI), Dashboards, ETL Tools

Lead of Business Intelligence Consumer Products

2007 - 2008
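1&1 Internet AG
  • Coordinated and prioritized all tasks of the business intelligence team.
  • Designed and developed KPI reporting for the board of directors.
  • Analyzed the customer structure and built a churn prediction model.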
Technologies: Java, Perl, Data Engineering, Data Analysis, ETL, Data Warehouse Design, Data Warehousing, Database Modeling, Data Modeling, Databases, Data Reporting, Business Intelligence (BI), Data Visualization, Dashboards, ETL Tools

Business Intelligence Analyst

2003 - 2007
1&1 Internet AG
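  • Designed and developed an automated reporting system for customer and contract inventory, as well as internet usage and customer behavior.
  • Integrated customer usage data from the company's websites into the DWH.
  • Coordinated all tasks between the management and development departments.
  • Analyzed the effectiveness of all campaigns for new and existing customers.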
Technologies: Java, MySQL, Perl, Data Engineering, Data Analysis, ETL, Data Warehouse Design, Data Warehousing, Databases, Data Reporting, Business Intelligence (BI), Data Visualization, Dashboards, ETL Tools

AVRO Schema Generator

http://gitlab.com/datascientists.info/avro-generator
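A Python-based AVRO schema generator I developed that adds the ability to make parts of a schema reusable. This is useful because AVRO itself does not provide this capability.

If certain data structures are used in several schemas, the tool makes it possible to define those structures only once and then reuse them across multiple schemas.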
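
The reuse idea described above can be illustrated with a minimal sketch. This is not the generator's actual code: the `$ref` placeholder convention, the `FRAGMENTS` dictionary, the `expand` function, and the example records are all assumptions made purely for illustration.

```python
import copy
import json

# Hypothetical reusable fragments, defined once and shared across schemas.
FRAGMENTS = {
    "Address": {
        "type": "record",
        "name": "Address",
        "fields": [
            {"name": "street", "type": "string"},
            {"name": "city", "type": "string"},
        ],
    },
}


def expand(node, fragments, seen=None):
    """Recursively replace {"$ref": name} placeholders with fragment definitions.

    The first use of a fragment within a schema is inlined in full; later uses
    become a plain name reference, because Avro allows a named type to be
    defined only once per schema.
    """
    if seen is None:
        seen = set()
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            name = node["$ref"]
            if name in seen:
                return fragments[name]["name"]
            seen.add(name)
            return expand(copy.deepcopy(fragments[name]), fragments, seen)
        return {key: expand(value, fragments, seen) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item, fragments, seen) for item in node]
    return node


# Two hypothetical schema templates that share the Address fragment.
CUSTOMER = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "billing", "type": {"$ref": "Address"}},
        {"name": "shipping", "type": {"$ref": "Address"}},
    ],
}

WAREHOUSE = {
    "type": "record",
    "name": "Warehouse",
    "fields": [
        {"name": "location", "type": {"$ref": "Address"}},
    ],
}

customer_schema = expand(CUSTOMER, FRAGMENTS)
warehouse_schema = expand(WAREHOUSE, FRAGMENTS)
print(json.dumps(customer_schema, indent=2))
```

In this sketch, `Address` is written once and ends up in both generated schemas. Within `Customer`, only the first use is inlined and the second becomes the name reference `"Address"`, which keeps the output a valid Avro schema.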

Evaluation of Property Value

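I built a Python/TensorFlow-based deep learning model and API for predicting real estate prices based on geolocation and other attributes. The value is predicted in real time using a Flask REST API integrated into the client's website.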

Design and Set-up of Data Platform

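A platform integrating all relevant data of a social media company, for which I designed and helped set up various tools. The platform provides real-time access to all data for operational decision support and analytical workloads.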

Languages

Python, SQL, Perl, Java, XML, Snowflake, Python 3, TypeScript

Tools

BigQuery, Apache HAWQ, Apache Avro, Git, Apache Beam, Tableau, Apache Airflow, Jenkins, Apache NiFi, RabbitMQ, Microsoft Excel, Terraform, Amazon Elastic MapReduce (EMR), Amazon EKS, AWS IAM, Google Kubernetes Engine (GKE), Talend ETL, Amazon Athena, AWS CloudFormation, Redshift Spectrum, Matillion ETL for Redshift, AWS Fargate, AWS Glue

Paradigms

ETL, Business Intelligence (BI), Data Science, DevOps, Microservices

Platforms

Amazon Web Services (AWS), Linux, Docker, Talend, Hortonworks Data Platform (HDP), Oracle, AWS Lambda, Google Cloud Platform (GCP), Kubernetes, AWS Elastic Beanstalk

Storage

MySQL, Google Cloud, Database Modeling, Redshift, Databases, Database Architecture, SQL Server 2010, Data Pipelines, Amazon S3 (AWS S3), PostgreSQL, Google Cloud SQL, Data Lakes, Apache Hive, HDFS, NoSQL, Amazon Aurora, JSON

Other

Data Visualization, Data Analysis, Data Architecture, Data Engineering, Data Warehousing, Data Modeling, Data Warehouse Design, Data Reporting, Database Schema Design, Data Management, Google Cloud Functions, Cloud Run, APIs, Data Wrangling, ETL Tools, Tableau Server, Google BigQuery, Data Profiling, Google Data Studio, Fivetran, Serverless, Scaling, Dashboards, Amazon Kinesis, Parquet, Cloud Architecture, Big Data, Architecture, Big Data Architecture, Machine Learning Operations (MLOps), CI/CD Pipelines, Cloud Security, Kubeflow, Data Build Tool (dbt), Cloud Tasks, Azure Databricks

Frameworks

Spark, Apache Spark, Flask, Django, Hadoop, Serverless Framework

Libraries/APIs

Pandas, PySpark, TensorFlow, REST APIs, Node.js

AUGUST 2019 - AUGUST 2021

Google Cloud Certified - Professional Data Engineer

Google