Language Data Scientist II, AWS AI Data | Transcribe

Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuously innovating.
$125,500 - $212,800
Data
Senior Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
AI · Healthcare
This job posting may no longer be active.

Description For Language Data Scientist II, AWS AI Data | Transcribe

The Language AI Services Team in Amazon Web Services (AWS) is seeking a Language Data Scientist to join our data team focused on Health AI. In this role, you will curate, engineer, and analyze natural language datasets in the medical domain, as well as design experiments and surveys to elicit human-in-the-loop insights for developing AI-powered language applications. You will partner with language engineers, program managers, clinical experts, applied scientists, engineers, and product managers to deliver data solutions that meet customer needs.

Key responsibilities include:

  • Translating business, modeling, and ethical requirements in Health AI into executable data collection projects
  • Designing human-in-the-loop evaluation tasks for model performance and usability in the medical domain
  • Developing materials for data collection efforts (guidelines, annotation interfaces, quality assurance workflows)
  • Supporting the sourcing and creation of high-quality language datasets for feature and language expansion
  • Analyzing data to provide actionable recommendations for improving data quality and model performance
  • Innovating on data collection methodologies to improve turnaround time and reliability
  • Incorporating LLMs, prompt engineering, and ML techniques to automate annotation and data creation workflows
  • Staying updated with AI developments, focusing on model fine-tuning and evaluation data needs

You will lead critical data projects related to AWS HealthScribe and Amazon Transcribe Medical, propose data collection and annotation strategies, and design experiments for human-in-the-loop insights. You'll also work on optimizing data workflows and processes, leveraging ML and Generative AI techniques to improve data sourcing and metric generation.

The team is part of AWS Language AI Services, developing cutting-edge services across various industries. We prioritize customer obsession and delivering high-quality, integrity-driven solutions.

AWS values diverse experiences and encourages candidates from all backgrounds to apply. We offer an inclusive team culture, mentorship, and career growth opportunities, and strive for work-life harmony. The position allows for flexible, hybrid work options near U.S. Amazon offices.

Last updated 9 months ago

Responsibilities For Language Data Scientist II, AWS AI Data | Transcribe

  • Lead multiple data collection and data analysis efforts in Health AI
  • Curate, engineer, and analyze natural language datasets in the medical domain
  • Design experiments and surveys to elicit human-in-the-loop insights
  • Translate business, modeling, and ethical requirements into executable data collection projects
  • Develop materials for data collection efforts (guidelines, interfaces, quality assurance workflows)
  • Analyze structured and unstructured data to improve data quality and model performance
  • Iterate and innovate on data collection methodologies
  • Incorporate LLMs, prompt engineering, and ML techniques to automate annotation workflows
  • Stay up to date with developments in AI, focusing on model fine-tuning and evaluation data needs

Requirements For Language Data Scientist II, AWS AI Data | Transcribe

  • PhD in a field related to language and human behavior with a strong quantitative component (e.g., Cognitive Linguistics, Sociolinguistics, Human-Computer Interaction); or a Master's degree with 3+ years of field experience
  • 2+ years of data scientist experience
  • 3+ years of experience with data querying languages (e.g., SQL), scripting languages (e.g., Python), or statistical/mathematical software (e.g., R, SAS, MATLAB)
  • Experience in data mining and cleaning for NLP machine learning model pipelines
  • Experience in language data collection for quantitative analysis, including guidelines and workflow design
  • Experience in research and experimental design involving human participants
  • Experience in statistical measures for data quality assessment and research hypothesis testing
  • Practical knowledge of data labeling tools and techniques (e.g., Amazon SageMaker Ground Truth, brat, ELAN)
  • Excellent knowledge of semantics, pragmatics, conversation analysis, and/or discourse analysis
  • Ability to explain complex concepts and solutions in easy-to-understand terms

Benefits For Language Data Scientist II, AWS AI Data | Transcribe

  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • 401k