Data Manager For Language Technologies

3 weeks ago


Barcelona, Spain Somma Full-time

Context And Mission

The Language Technologies (LT) Unit at BSC has consolidated experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning. It has been entrusted by the Spanish and Catalan governments with the mission to develop essential open-source resources and technologies for Spanish and Catalan. In connection with this, the LT Unit is currently in charge of two flagship projects at the national and regional levels: the Spanish National Plan for the Advancement of Language Technology, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan, funded by the Catalan Digitalisation Department. In addition, the Unit participates in various EU-funded international projects.
The Language Technologies Unit at BSC is seeking a Data Manager with experience in language technologies to lead the development of the largest curated Spanish language corpus. This corpus will be used to train reference foundational LLMs.
The successful candidate will work in a highly sophisticated HPC environment, have access to state-of-the-art systems and computational infrastructures, and establish collaborations with experts in different areas at the local and international levels.

Key Duties

- Collaboration with MLOps and Deep Learning engineers: Work closely with machine learning engineers and the MLOps team to define and understand data requirements for projects. Assist in optimizing data flow and usage within machine learning pipelines.
- Operationalization of data acquisition and integration into existing pipelines: Design and oversee the operationalization of access to external data and its integration into the internal data processing pipelines. Ensure that the data integration process is efficient, scalable, and aligned with the research group's technical infrastructure and goals.
- Data management and governance: Establish data management protocols to ensure the integrity, confidentiality, and availability of data. This involves setting up data governance practices, including data quality control, metadata management, and access controls.
- Dissemination and engagement activities: Lead the dissemination of findings and datasets within the scientific community and beyond. This includes publishing data reports, contributing to academic papers, and presenting at conferences. Also, engage with the broader research community to foster collaborations and share best practices in data management.
- Technical documentation: Write comprehensive technical reports, project documentation, and scientific papers in English, Spanish, and Catalan. Ensure documentation is clear, accurate, and accessible to stakeholders.
- Research support: Assist in preparing research proposals, including the articulation of data needs and plans for data acquisition and management. Contribute to writing scientific papers and reports on findings.
- Continuous learning and skill development: Keep abreast of the latest developments in data engineering, language processing tools, and machine learning operations. Continuously update skills to improve data processes and workflows within the research group.
- Collaboration facilitation: Facilitate collaborations between the research group and external partners to enhance the group's data capabilities. This may involve coordinating joint research projects, data-sharing agreements, and other forms of partnership.
- Monitoring and reporting: Regularly monitor the data landscape for new trends, sources, and tools that can benefit the research group. Provide reports and insights to the leadership on the status of data acquisitions, challenges faced, and the impact of data on research outcomes.
- Compliance and ethics oversight: Ensure all data management activities comply with relevant laws, ethical standards, and best practices in data handling. This includes overseeing the ethical review of data sources and uses, as well as managing any data protection implications.
- Training and support: Provide training and support to research team members on data-related topics, including best practices in data collection, management, and usage. Act as a resource for team members on data management tools and methodologies.


Requirements

- Education
Bachelor's degree in Computer Science, Information Systems, Linguistics with a computational focus, or a related field. A Master's degree or higher in these areas is highly desirable.
- Essential Knowledge and Professional Experience
Demonstrable experience in managing large datasets, including acquisition, storage, processing, and dissemination of data. Experience in handling linguistic data is highly preferred.
Excellent understanding of data administration and management functions (governance, transfer, storage, analysis, distribution, exploration, etc.).
Understanding of data privacy laws, ethical considerations in data handling, and best practices in data governance.
Hands-on experience with database management systems (e.g., SQL, NoSQL) and data integration tools.
Proven experience in UNIX/Linux environments, scripting languages, and Python.
Skills in managing projects, including planning, execution, monitoring, and reporting.
- Additional Knowledge and Professional Experience
Familiarity with the basics of language models, natural language processing (NLP), or computational linguistics.
Strong understanding of linguistic concepts.
Experience in establishing and maintaining partnerships with data providers, research institutions, and other relevant organizations.
Fluent in written and spoken English, Spanish and Catalan.
- Competences
Ability to work effectively in a team, contributing positively to team operations and working relationships.
Willingness to stay abreast of new data sources, technologies, and methodologies in the rapidly evolving field of language technologies.
Strong organizational skills, with the ability to manage multiple tasks simultaneously and meet deadlines.
Ability to work independently and in a team to complete tasks on schedule.
Ability to work under set deadlines.


Conditions

- The position will be located at BSC within the Life Sciences Department
- We offer a full-time contract (37.5h/week), a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, private health insurance, and support with relocation procedures
- Duration: Open-ended contract due to technical and scientific activities linked to the project and budget duration
- Holidays: 23 paid vacation days plus the 24th and 31st of December, per our collective agreement
- Salary: we offer a competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona
- Starting date: as soon as possible




