Data Engineer for Language Technologies

7 months ago


Barcelona, Spain · Barcelona Supercomputing Center - Centro Nacional de Supercomputación · Full-time

**Context and Mission**
The Language Technologies (LT) Unit at BSC has consolidated experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning. It has been entrusted by the Spanish and Catalan governments with the mission of developing essential open-source resources and technologies for Spanish and Catalan. In this context, the LT Unit is currently in charge of two flagship projects at the national and regional levels: the Spanish National Plan for the Advancement of Language Technology, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, which aims to develop AI resources for Catalan and is funded by the Catalan Digitalisation Department. The Unit also participates in various EU-funded international projects.

The Language Technologies Unit at BSC is seeking a Data Manager with experience in language technologies to lead the development of the largest curated Spanish-language corpus. This corpus will be used to train reference foundational LLMs.

**Key Duties**
- Identification of open/public data sources: Proactively identify and evaluate open and public data sources for building extensive corpora in Spanish and the co-official languages. This includes scouting for datasets relevant to the group's research areas, including translation, audio processing, and large language models (LLMs).
- Engagement with data providers: Act as the primary contact point for negotiations and communications with external data providers, including public entities, companies, and other research institutions. Establish and maintain relationships to secure access to valuable data resources.
- Data acquisition strategy design: Develop and implement strategies for the efficient acquisition of external data. This includes outlining procedures for data requests, licensing negotiations, and ensuring compliance with data privacy regulations.
- Data management and governance: Collaborate on data management protocols to ensure the integrity, confidentiality, and availability of data.
- Dissemination and engagement activities: Lead the dissemination of findings and datasets within the scientific community and beyond. This includes publishing data reports, contributing to academic papers, and presenting at conferences. Also, engage with the broader research community to foster collaborations and share best practices in data management.
- Manage corpora and language data in line with the Unit’s data management requirements.
- Control the quality of collected data and metadata.
- Compliance and ethics oversight: Ensure all data management activities comply with relevant laws, ethical standards, and best practices in data handling. This includes overseeing the ethical review of data sources and uses, as well as managing any data protection implications.

**Requirements**
- Education
  - Bachelor’s degree.
- Essential Knowledge and Professional Experience
  - Proficiency in data management principles and techniques.
  - Strong understanding of data acquisition strategies, including licensing negotiations and compliance with data privacy regulations.
  - Knowledge of open/public data sources relevant to language technologies, including translation, audio processing, and large language models (LLMs).
  - Familiarity with data governance principles, including data integrity, confidentiality, and availability.
  - Excellent communication and negotiation skills for engaging with external data providers and stakeholders.
  - Experience in disseminating findings and datasets within the scientific community through reports, academic papers, and conference presentations.
  - Strong attention to detail and ability to control the quality of collected data and metadata.
  - Knowledge of compliance requirements and ethical standards in data management.
  - Excellent understanding of data administration and management functions (governance, transfer, storage, analysis, distribution, exploration, etc.).
  - Understanding of data privacy laws, ethical considerations in data handling, and best practices in data governance.
  - Experience in establishing and maintaining partnerships with data providers, research institutions, and other relevant organizations.
- Additional Knowledge and Professional Experience
  - Fluent in written and spoken Catalan.
- Competences
  - Ability to work effectively in a team, contributing positively to team operations and working relationships.
  - Willingness to stay abreast of new data sources, technologies, and methodologies in the rapidly evolving field of language technologies.
  - Strong organizational skills, with the ability to manage multiple tasks simultaneously and meet deadlines.
  - Ability to work independently and in a team to complete tasks on schedule.
  - Ability to work under set deadlines.
