NLP Data Scientist/Scientific Data Engineer x2

Open Targets

United Kingdom

Hybrid

GBP 125,000 - 150,000

Full time

Today

Job summary

A leading biomedical research organization in the UK seeks a skilled candidate to develop NLP methods for extracting drug-related information. You will collaborate with safety scientists from major pharmaceutical partners to enhance drug safety evaluation. Candidates should possess a PhD or a Master's degree in computational linguistics or related fields, with strong skills in machine learning and Python. Benefits include generous leave, flexible working, private medical insurance, and a relocation package.

Benefits

Private medical insurance
30 days annual leave
Flexible working arrangements
Generous family benefits

Qualifications

  • Experience with language models such as transformer models and LLMs.
  • Experience in data cleaning and transformation techniques.
  • Knowledge of cheminformatics and bioinformatics databases.

Responsibilities

  • Develop machine learning pipelines to extract drug side effects.
  • Investigate NLP methodologies for data extraction.
  • Collaborate with scientists to assess and refine methods.

Skills

Machine learning pipelines development
NLP methodologies
Data extraction from scientific literature
Python programming
Collaboration skills
Attention to detail
Version control

Education

PhD or Master's in computational linguistics, computer science, bioinformatics, or cheminformatics

Tools

PySpark
pandas

Job description

The Chemical Biology Services team at EMBL-EBI provides world-leading chemogenomics resources to the scientific community, including ChEMBL, a database of quantitative small-molecule bioactivity data curated primarily from the scientific literature and widely used to support drug discovery projects in industry and academia. The Safety 2.0 project is funded by Open Targets, a unique public-private partnership working to deliver experimental data and informatics resources that enable scientists to make more informed decisions about target selection for developing safer and more effective drugs.

You will interact with safety scientists from Open Targets pharmaceutical partners MSD, Genentech, GSK, Pfizer, and Sanofi to understand their requirements and how you can contribute to evaluating drug and target safety. You will be embedded at the world-leading EMBL-EBI and will work collaboratively across the Chemical Biology Services and Open Targets groups, benefitting from a range of multi-disciplinary expertise and technologies.

Responsibilities:

  • Develop machine learning pipelines for extracting drug side effects from drug labels, clinical trials, publications and other documents
  • Investigate modern NLP methodologies and propose ideas for the implementation of data extraction methods and pipelines
  • Apply language models to extract and map drug-related information from unstructured text, e.g. from the scientific literature and ClinicalTrials.gov
  • Implement and/or fine-tune different NLP models, e.g. NER models, transformer models, LLMs
  • Integrate project workflows with existing infrastructures in the EBI Chemical Biology Services and Open Targets teams
  • Prepare and evaluate benchmark datasets from the open domain as training sets for NLP models
  • Work with domain experts to develop new gold standards for NLP tasks where needed
  • Assist with and/or perform data curation to prepare clean and reliable training sets
  • Apply and/or adapt existing methods for mapping extracted entities to biomedical ontologies, e.g. drugs, side effects/phenotypes, and diseases
  • Work closely with Safety 2.0 project group members bridging the ChEMBL and Open Targets teams
  • Work closely with the Open Targets Core team to ensure seamless integration of data and workflows into the Open Targets Platform and long-term sustainability
  • Collaborate with the Open Targets Partners to assess, prioritise, validate and refine the developed methods
  • Disseminate the outcomes of the project to the scientific community and stakeholders through presentations and publications

Qualifications and experience:

  • PhD, Master's or equivalent experience in computational linguistics, computer science, bioinformatics, or cheminformatics
  • Experience with language models, e.g. transformer models, LLMs, AI agents for information extraction
  • Experience with document and text preprocessing, cleaning and transformation techniques, including mapping to ontologies
  • Experience with data structures, data models and databases
  • Knowledge of cheminformatics resources and/or bioinformatics databases
  • Knowledge of data analysis and machine learning
  • Proficiency in Python
  • Knowledge of data frameworks, e.g. PySpark, pandas, Polars
  • Excellent attention to detail
  • Strong communication skills, both in presentations and verbally
  • Experience working collaboratively in a team-oriented environment
  • Able to work independently, to manage your time and work to deadlines
  • Experience with the application of NLP methods to cheminformatics and/or biomedical domains
  • Experience with version control
  • Experience in safety/toxicology in industry or research

Salary and benefits:

  • Salary: Grade 5 to Grade 6, depending on experience and qualifications (monthly salary starting at £3,303 to £3,695 after tax, but excl. pension & insurances), plus other paid benefits based on personal circumstances
  • Financial incentives: Monthly family, child and non-resident allowances, annual salary review, pension scheme, death benefit, long-term care, accident-at-work and unemployment insurances
  • Flexible working arrangements: Including hybrid working patterns
  • Private medical insurance: For you and your immediate family (including all prescriptions and generous dental & optical cover)
  • Generous time off: 30 days annual leave per year, in addition to public holidays
  • Relocation package: Including installation grant (if required)
  • Campus life: Free shuttle bus to and from work, on-site library, subsidised on-site gym and cafeteria, casual dress code, extensive sports and social club activities (on campus and remotely)
  • Family benefits: On-site nursery, 10 days of child sick leave, generous parental leave, holiday clubs on campus and monthly family and child allowances
  • Benefits for non-UK residents: Visa exemption, education grant for private schooling, financial support to travel back to your home country every second year and a monthly non-resident allowance

Generous employee benefits

Please visit this to find out more about the benefits at EMBL.