What are the responsibilities and job description for the Bigdata Engineer position at Disys - Oak Brook?
Job Description
Position Details:
Job Title: Bigdata Engineer
Location: Tampa, FL
Duration: 12-month contract to hire
Job Responsibilities:
Principal Responsibilities
· Design interfaces to the data warehouses/data storages and machine learning/Big Data applications using open source tools such as Scala, Java, Python, Perl and shell scripting.
· Design and create data pipelines to maintain stable dataflow to the machine learning models, both in batch mode and near real-time mode (a minimal sketch follows this list).
· Interface with Engineering/Operations/System Admin/Data Scientist teams to ensure data pipelines and processes fit within the production framework.
· Ensure that tools and environments adhere to strict security protocols.
· Deploy the machine learning model and serve its outputs as RESTful API calls.
· Understand the business needs in close collaboration with subject matter experts (SMEs) and Data Scientists to perform efficient feature engineering for machine learning models.
· Maintain the code and libraries in the code repository.
· Work with the system administration team to proactively resolve issues and install tools and libraries on the AWS platform.
· Research and propose the architecture and solutions most appropriate for the problems at hand.
· Maintain and improve tools to assist Analytics in ETL, retrospective testing, efficiency, repeatability, and R&D.
· Lead by example regarding software best practices, including code style and architecture, documentation, source control, and testing.
· Support the Chief Data Scientist/Data Scientists/Big Data Engineers in creating new and novel approaches to solve challenging problems using Machine Learning, Big Data and Cloud technologies.
· Handle ad hoc requirements to create reports for the end users.
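As an illustration of the batch pipeline work described above, here is a minimal sketch in Scala using Spark SQL. The bucket paths, column names, and job name are hypothetical and not part of this posting; a production job would add schema handling, partitioning, and error handling.

// Minimal sketch (hypothetical paths and column names) of a batch feature
// pipeline: read raw events, derive features, and write them out for a
// downstream machine learning model to consume.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object FeatureBatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("feature-batch-job")
      .getOrCreate()

    // Raw events landed in a data lake location (path is illustrative only).
    val events = spark.read.parquet("s3a://example-bucket/raw/events/")

    // Simple feature engineering: aggregate per user over the batch window.
    val features = events
      .groupBy(col("user_id"))
      .agg(
        count(lit(1)).as("event_count"),
        avg(col("amount")).as("avg_amount")
      )

    // Persist features where the model scoring/serving step can pick them up.
    features.write.mode("overwrite").parquet("s3a://example-bucket/features/daily/")

    spark.stop()
  }
}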
Required Skills
· Strong skills with Apache Spark (Spark SQL) and Scala, with at least 2 years of experience.
· Understanding of AWS Big Data components and tools.
· Strong Java skills with experience in web services and web development are required.
· Hands-on experience with model deployment.
· Hands-on experience in application deployment on Docker and/or Kubernetes or other similar technology.
· Linux scripting is a plus.
· Fundamental understanding of AWS cloud components.
· 2 years of experience ingesting, cleansing/processing, storing and querying large datasets.
· 2 years of experience engineering large-scale data solutions with Java/Tomcat/SQL/Linux.
· Experience working in a data-intensive role, including the extraction of data (db/web/api/etc.), transformation and loading (ETL).
· Exposure to structured and/or unstructured data content.
· Experience with data cleansing/preparation on the Hadoop/Apache Spark ecosystem: MapReduce/Hive/HBase/Spark SQL.
· Experience with distributed streaming tools such as Apache Kafka (see the sketch after this list).
· Experience with multiple file formats (Parquet, Avro, ORC).
· Knowledge of the Agile development cycle.
· Efficient coding skills to improve the performance and cost efficiency of jobs running on the AWS platform.
· Experience building stable, scalable, high-speed live data streams and serving web platforms.
· Enthusiastic self-starter with the ability to work in a team environment.
· Graduate (MS) or undergraduate degree in Computer Science, Engineering, or a relevant field.
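For a sense of how the Spark, Kafka, and file-format skills above fit together, here is a minimal, hypothetical sketch of near real-time ingestion with Spark Structured Streaming in Scala. The broker address, topic, and paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

// Minimal sketch (broker, topic, and paths are illustrative) of near
// real-time ingestion: read from an Apache Kafka topic and append the
// records to Parquet files for downstream processing.
import org.apache.spark.sql.SparkSession

object KafkaIngestJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-ingest-job")
      .getOrCreate()

    // Subscribe to a Kafka topic; requires the spark-sql-kafka connector.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

    // Continuously append the raw records as Parquet files.
    val query = stream.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/raw/events/")
      .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
      .start()

    query.awaitTermination()
  }
}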
Nice to have:
· Strong Software development experience
· Machine Learning model deployment experience
· Ability to write custom Map/Reduce programs to clean/prepare complex data
· Familiarity with streaming data processing, including experience with distributed real-time computation systems such as Apache Storm or Apache Spark Streaming.
Additional Information
All your information will be kept confidential according to EEO guidelines.