Data Engineer For Language Technologies

3 weeks ago


Barcelona, Spain | Barcelona Supercomputing Center | Full-time

Context And Mission

The Language Technologies (LT) Unit at BSC has consolidated experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning. It has been entrusted by the Spanish and Catalan governments with the mission to develop essential open-source resources and technologies for Spanish and Catalan. In connection with this, the LT Unit is currently in charge of two flagship projects at the national and regional levels: the Spanish National Plan for the Advancement of Language Technology, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan, funded by the Catalan Digitalisation Department. In addition, the Unit participates in various EU-funded international projects.
The Language Technologies Unit at BSC is seeking a Data Manager with experience in language technologies to lead the development of the largest curated Spanish language corpus. This corpus will be used to train reference foundational LLMs.
The successful candidate will work in a highly sophisticated HPC environment, have access to state-of-the-art systems and computational infrastructures, and establish collaborations with experts in different areas at the local and international levels.

Key Duties

- Identification of open/public data sources: Proactively identify and evaluate open and public data sources for the creation of extensive corpora in Spanish and co-official languages. This includes scouting for datasets that are relevant to the group's research focus on language models, including translation, audio processing, and large language models (LLMs).
- Engagement with data providers: Act as the primary contact point for negotiations and communications with external data providers, including public entities, companies, and other research institutions. Establish and maintain relationships to secure access to valuable data resources.
- Data acquisition strategy design: Develop and implement strategies for the efficient acquisition of external data. This includes outlining procedures for data requests, licensing negotiations, and ensuring compliance with data privacy regulations.
- Data management and governance: Collaborate in data management protocols to ensure the integrity, confidentiality, and availability of data.
- Dissemination and engagement activities: Lead the dissemination of findings and datasets within the scientific community and beyond. This includes publishing data reports, contributing to academic papers, and presenting at conferences. Also, engage with the broader research community to foster collaborations and share best practices in data management.
- Manage corpora and language data according to the requirements specified in the Unit's data management plan.
- Monitor the application of data protection, licensing and security rules.
- Control the quality of collected data and metadata.
- Compliance and ethics oversight: Ensure all data management activities comply with relevant laws, ethical standards, and best practices in data handling. This includes overseeing the ethical review of data sources and uses, as well as managing any data protection implications.


Requirements

- Education
Bachelor's Degree.
- Essential Knowledge and Professional Experience
Proficiency in data management principles and techniques.
Strong understanding of data acquisition strategies, including licensing negotiations and compliance with data privacy regulations.
Knowledge of open/public data sources relevant to language models, translation, audio processing, and large language models (LLMs).
Familiarity with data governance principles, including data integrity, confidentiality, and availability.
Excellent communication and negotiation skills for engaging with external data providers and stakeholders.
Experience in disseminating findings and datasets within the scientific community through reports, academic papers, and conference presentations.
Strong attention to detail and ability to control the quality of collected data and metadata.
Knowledge of compliance requirements and ethical standards in data management.
Excellent understanding of data administration and management functions (governance, transfer, storage, analysis, distribution, exploration, etc.).
Understanding of data privacy laws, ethical considerations in data handling, and best practices in data governance.
Experience in establishing and maintaining partnerships with data providers, research institutions, and other relevant organizations.
Fluent in written and spoken Catalan.
- Competences
Ability to work effectively in a team, contributing positively to team operations and working relationships.
Willingness to stay abreast of new data sources, technologies, and methodologies in the rapidly evolving field of language technologies.
Strong organizational skills, with the ability to manage multiple tasks simultaneously and meet deadlines.
Ability to work independently and in a team to complete tasks on schedule.
Ability to work under set deadlines.


Conditions

The position will be located at BSC within the Life Sciences Department.
We offer a full-time contract (37.5 h/week), a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, private health insurance, and support with relocation procedures.
Duration: Open-ended contract due to technical and scientific activities linked to the project and budget duration.
Holidays: 23 paid vacation days plus the 24th and 31st of December, per our collective agreement.
Salary: We offer a competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona.
Starting date: As soon as possible.


Applications procedure and process

All applications must be made through the BSC website and contain:

A full CV in English including contact details
A Cover Letter with a statement of interest in English, including two contacts for further references - Applications without this document will not be considered

In accordance with the OTM-R principles, a gender-balanced recruitment panel is formed for every vacancy at the beginning of the process. After reviewing the content of the applications, the panel will start the interviews, with at least one technical and one administrative interview. A profile questionnaire as well as a technical exercise may be required during the process.

The panel will make a final decision, and all candidates who had contact with the panel will receive feedback with details on the acceptance or rejection of their profile.

At BSC we are seeking continuous improvement in our recruitment processes. For any suggestions, feedback or complaints about our recruitment processes, please contact ******.



