About the Syapse Data Science Team
We work at the intersection of data science, statistics, data engineering and clinical informatics. We are solving a clinical domain problem (cancer and genetics) with the goal of achieving human-grade intelligence in extracting data, validating it, and evaluating data quality at scale.
We primarily focus on NLP, text data processing, modeling and algorithms. Much of our effort goes into building a generalized model and handling the last 10% at scale. We have done a lot of work to achieve accuracy above 90%. We pay extra attention to data security.
Syapse is well positioned as a leader in the precision medicine space, with strong backing from hospital and life science partners. We have also signed a multi-year collaboration with the FDA on cancer research.
Our point of view about this position
As a Senior Data Scientist, you will be part of a world-class team focused on preparing data for machine learning techniques (traditional and deep neural networks), exploring different models, designing validation workflows to increase model accuracy, performing statistical analysis, and building high-quality prediction systems integrated with our clinical data offerings. Sound judgment in weighing what is practical against what is bleeding edge, to fit the business and clinical need, is crucial.
- Understand and research machine learning techniques to apply the best ML model to solve different problems in the clinical domain, with high accuracy.
- Process, maintain and generate knowledge graphs and databases to facilitate data validation and discovery of insights and patterns.
- Convert existing patient, clinical and molecular data into ground-truth training data, and explore methodologies to extend data sets. Help evaluate, define and measure data gaps that need to be filled to generate valuable insights.
- Automate a machine learning ecosystem using anomaly detection and novel techniques to continually improve model accuracy.
- Provide guidance on building a repeatable, generalizable machine learning (NLP) pipeline to process free text at scale.
- Write technical specifications, conduct code reviews and design reviews, and communicate and present well to stakeholders at different development centers.
- Keep abreast of the latest models and techniques to solve the ever-changing demands of handling clinical data.
What you bring to the table:
- Understanding of the full breadth of NLP problem areas, including but not limited to word sense disambiguation, normalization, context boundary detection, anomaly detection, sentiment analysis, prediction, named entity recognition, and fuzzy matching.
- Experience with word embeddings, statistical graphical models, n-grams, skip-grams, Word2vec, RNNs, autoencoders, autoregressive modeling, BERT, Transformers, XLNet, and classic models such as XGBoost, clustering, and Bayesian methods.
- Experience with knowledge graph construction and usage.
- Experience in translation technologies is not required but a plus.
- Experience with common Python data science toolkits and IDEs, such as scikit-learn, Keras, pandas, NumPy, Gensim, NLTK, TensorFlow, PyTorch, PyCharm, and Jupyter.
- Data-oriented personality: attention to detail, with the ability to simplify and draw insight from complex data.
- Good software development practices are a must: clear documentation, validation and regression testing, and modular, maintainable code.
- Sound judgment and great communication skills.
- Experience with AWS is desirable.
- Ph.D. with 2+ years of experience, or MS degree with 5+ years of industry or research experience in NLP. Candidates with a BS and demonstrable depth of relevant experience will also be considered.