About the role
You’ll sit at the heart of the team that develops, productionizes, and maintains Syapse's analytics platform, including its data lake, data warehouse, and a codebase of high-quality data processing, analysis, and visualization algorithms. You will work in close collaboration with clinical informatics, other platform teams, and our partners to achieve groundbreaking results that advance oncology treatment.
The team will build and maintain a data lake and a data warehouse that store data in an organized format with metadata, allowing our clinical informatics, internal, and partner analytics teams to easily explore, analyze, and visualize that data. The team will also help develop and maintain a repository of scalable, reusable tools for data processing and visualization.
Responsibilities
- Designs and implements cloud-scale distributed ETL systems, services, and frameworks, including solutions for high-volume, complex data collection, processing, transformation, and reporting for analytical purposes.
- Works with the team's architects to develop architectural blueprints and a long-term technical roadmap for Syapse’s analytics platform, balancing focus between immediate needs and the long-term view.
- Writes code and unit tests, works on API specs and automation, and conducts code reviews and testing.
- Owns technical aspects of software development and identifies opportunities to adopt innovative technologies.
- Identifies continuous improvements for service availability.
- Evaluates and recommends tools, technologies and processes to ensure that the services that the team provides achieve the highest standards of quality and performance.
- Debugs and troubleshoots problems in data flow, lineage, transformation and other stages of the ETL pipelines.
- Collaborates with other peer organizations (e.g., QA, DevOps, technical support, etc.) to prevent and resolve technical issues and provide technical guidance.
- Mentors junior engineers within the team.
Requirements
- B.S. in Computer Science, or 5+ years of experience delivering production-quality software
- Expert in Python & Apache Spark
- Strong, demonstrable experience with more than one of the following:
  - Relational stores (e.g., Postgres, MySQL, Oracle)
  - Columnar or NoSQL stores (e.g., Redshift, Cassandra, DynamoDB)
  - Graph stores (e.g., Neo4j, Titan, triple/RDF stores)
  - Document stores (e.g., Postgres JSONB, MongoDB)
  - In-memory stores (e.g., Redis, MemSQL)
  - Other distributed processing engines (e.g., Apache Storm, Celery)
  - Distributed queues (e.g., Kinesis, Apache Kafka, RabbitMQ)
- Experience with multiple types of databases, both relational and non-relational
- Expert in SQL for extracting, querying, and handling large amounts of data
- Experience in building data analytics platforms
- Experience working with AWS or similar public cloud platform technologies
- Experience working with partner data scientists, analysts, and other experts to understand their needs and develop solutions
Nice To Have
- Experience with MVC or MVCS frameworks such as Django
- Knowledge of data science techniques like supervised ML algorithms, clustering, or natural language processing
- Knowledge of healthcare datasets
- Sensitivity to healthcare data
- Experience working in a regulated industry
- Knowledge of hierarchical, relational, and unnormalized data formats
- Experience with visualization software (e.g., Tableau, Spotfire)