Indore, India | 9131291215
Experienced Data Engineer and Machine Learning Engineer with 8.7 years of hands-on expertise. I am proficient in Spark, Java-based architectures, the Hadoop framework, and Python data analysis, with a strong focus on PySpark. Well-versed in the Cloudera Data Platform (CDP), using PySpark to build scalable big data solutions and applying CDP to advanced analytics and machine learning. Seeking a dynamic role that capitalizes on my skills to drive innovation and contribute to transformative data-driven solutions.
Professional Summary:
• Architected and implemented scalable and fault-tolerant data storage solutions using HDFS within the CDP ecosystem.
• Ensured efficient and reliable distribution and replication of data across the Hadoop cluster.
• Designed and implemented object storage solutions using Ozone, optimizing data storage and retrieval for large-scale distributed environments.
• Developed and managed Hive-based data warehouses, facilitating efficient querying and analysis of structured data stored in CDP.
• Managed and optimized resource allocation for various workloads by configuring and tuning YARN within the CDP architecture.
• Developed and deployed Spark-based data processing applications, leveraging CDP's Spark integration for high-performance analytics and machine learning tasks.
• Designed and implemented scalable and fault-tolerant data streaming solutions using Kafka within the CDP ecosystem.
• Developed and managed data flow pipelines using NiFi, ensuring seamless and secure data movement between systems and applications within CDP.
• Architected and optimized NoSQL data storage solutions using HBase, providing high-speed access to large datasets within the CDP architecture.
• Implemented Phoenix for SQL querying on top of HBase tables, enabling efficient and interactive data retrieval within CDP.
• Implemented and maintained connectors or drivers to enable seamless integration of MySQL and Oracle databases with the Cloudera ecosystem.
• Developed and executed extract, transform, and load (ETL) processes to ingest MySQL and Oracle data into Cloudera's distributed file system.
• Ensured efficient data processing through query optimization, indexing, and performance tuning of MySQL and Oracle databases within the Cloudera environment.
• Collaborated with database administrators to design and implement schema structures suitable for Cloudera's distributed architecture.
• Implemented and managed SDX features to provide a unified and secure data experience across different workloads within CDP.