roberta-finetuned-CPV_Spanish

This model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne on a dataset derived from Spanish public procurement documents from 2019. The full fine-tuning process is available in a Kaggle notebook.

It achieves the following results on the evaluation set:

  • Loss: 0.0465
  • F1: 0.7918
  • ROC AUC: 0.8860
  • Accuracy: 0.7376
  • Coverage Error: 10.2744
  • Label Ranking Average Precision Score: 0.7973
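
Coverage Error and Label Ranking Average Precision Score are multi-label ranking metrics. The sketch below shows one way to read them, assuming they follow the usual definitions (the ones scikit-learn uses for coverage_error and label_ranking_average_precision_score); the card does not say which implementation was used. It is written in plain Python on toy data, the function names are illustrative, and it assumes every sample has at least one true label.

```python
def coverage_error(y_true, y_score):
    """Average number of top-ranked labels needed to cover all true labels."""
    depths = []
    for truth, scores in zip(y_true, y_score):
        # Lowest score among the true labels: the ranking must be read
        # at least this far down to include every true label.
        min_true = min(s for t, s in zip(truth, scores) if t)
        depths.append(sum(1 for s in scores if s >= min_true))
    return sum(depths) / len(depths)


def label_ranking_average_precision(y_true, y_score):
    """For each true label, the fraction of labels ranked at or above it
    that are also true, averaged over true labels and then over samples."""
    per_sample = []
    for truth, scores in zip(y_true, y_score):
        true_scores = [s for t, s in zip(truth, scores) if t]
        precisions = []
        for s_j in true_scores:
            rank = sum(1 for s in scores if s >= s_j)
            true_at_or_above = sum(1 for s in true_scores if s >= s_j)
            precisions.append(true_at_or_above / rank)
        per_sample.append(sum(precisions) / len(precisions))
    return sum(per_sample) / len(per_sample)


# Toy data: two samples, three labels (not taken from the model).
y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
print(coverage_error(y_true, y_score))                   # 2.5
print(label_ranking_average_precision(y_true, y_score))  # 0.4166...
```

Under this reading, a coverage error of about 10.27 would mean that, on average, roughly the ten highest-scored of the 45 divisions listed in this card must be inspected to include every correct one.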

Intended uses & limitations

This model predicts only the first two digits of a CPV (Common Procurement Vocabulary) code, which identify its division. The CPV divisions are:

Division | English | Spanish
03 | Agricultural, farming, fishing, forestry and related products | Productos de la agricultura, ganadería, pesca, silvicultura y productos afines
09 | Petroleum products, fuel, electricity and other sources of energy | Derivados del petróleo, combustibles, electricidad y otras fuentes de energía
14 | Mining, basic metals and related products | Productos de la minería, de metales de base y productos afines
15 | Food, beverages, tobacco and related products | Alimentos, bebidas, tabaco y productos afines
16 | Agricultural machinery | Maquinaria agrícola
18 | Clothing, footwear, luggage articles and accessories | Prendas de vestir, calzado, artículos de viaje y accesorios
19 | Leather and textile fabrics, plastic and rubber materials | Piel y textiles, materiales de plástico y caucho
22 | Printed matter and related products | Impresos y productos relacionados
24 | Chemical products | Productos químicos
30 | Office and computing machinery, equipment and supplies except furniture and software packages | Máquinas, equipo y artículos de oficina y de informática, excepto mobiliario y paquetes de software
31 | Electrical machinery, apparatus, equipment and consumables; lighting | Máquinas, aparatos, equipo y productos consumibles eléctricos; iluminación
32 | Radio, television, communication, telecommunication and related equipment | Equipos de radio, televisión, comunicaciones y telecomunicaciones y equipos conexos
33 | Medical equipments, pharmaceuticals and personal care products | Equipamiento y artículos médicos, farmacéuticos y de higiene personal
34 | Transport equipment and auxiliary products to transportation | Equipos de transporte y productos auxiliares
35 | Security, fire-fighting, police and defence equipment | Equipo de seguridad, extinción de incendios, policía y defensa
37 | Musical instruments, sport goods, games, toys, handicraft, art materials and accessories | Instrumentos musicales, artículos deportivos, juegos, juguetes, artículos de artesanía, materiales artísticos y accesorios
38 | Laboratory, optical and precision equipments (excl. glasses) | Equipo de laboratorio, óptico y de precisión (excepto gafas)
39 | Furniture (incl. office furniture), furnishings, domestic appliances (excl. lighting) and cleaning products | Mobiliario (incluido el de oficina), complementos de mobiliario, aparatos electrodomésticos (excluida la iluminación) y productos de limpieza
41 | Collected and purified water | Agua recogida y depurada
42 | Industrial machinery | Maquinaria industrial
43 | Machinery for mining, quarrying, construction equipment | Maquinaria para la minería y la explotación de canteras y equipo de construcción
44 | Construction structures and materials; auxiliary products to construction (except electric apparatus) | Estructuras y materiales de construcción; productos auxiliares para la construcción (excepto aparatos eléctricos)
45 | Construction work | Trabajos de construcción
48 | Software package and information systems | Paquetes de software y sistemas de información
50 | Repair and maintenance services | Servicios de reparación y mantenimiento
51 | Installation services (except software) | Servicios de instalación (excepto software)
55 | Hotel, restaurant and retail trade services | Servicios comerciales al por menor de hostelería y restauración
60 | Transport services (excl. Waste transport) | Servicios de transporte (excluido el transporte de residuos)
63 | Supporting and auxiliary transport services; travel agencies services | Servicios de transporte complementarios y auxiliares; servicios de agencias de viajes
64 | Postal and telecommunications services | Servicios de correos y telecomunicaciones
65 | Public utilities | Servicios públicos
66 | Financial and insurance services | Servicios financieros y de seguros
70 | Real estate services | Servicios inmobiliarios
71 | Architectural, construction, engineering and inspection services | Servicios de arquitectura, construcción, ingeniería e inspección
72 | IT services: consulting, software development, Internet and support | Servicios TI: consultoría, desarrollo de software, Internet y apoyo
73 | Research and development services and related consultancy services | Servicios de investigación y desarrollo y servicios de consultoría conexos
75 | Administration, defence and social security services | Servicios de administración pública, defensa y servicios de seguridad social
76 | Services related to the oil and gas industry | Servicios relacionados con la industria del gas y del petróleo
77 | Agricultural, forestry, horticultural, aquacultural and apicultural services | Servicios agrícolas, forestales, hortícolas, acuícolas y apícolas
79 | Business services: law, marketing, consulting, recruitment, printing and security | Servicios a empresas: legislación, mercadotecnia, asesoría, selección de personal, imprenta y seguridad
80 | Education and training services | Servicios de enseñanza y formación
85 | Health and social work services | Servicios de salud y asistencia social
90 | Sewage, refuse, cleaning and environmental services | Servicios de alcantarillado, basura, limpieza y medio ambiente
92 | Recreational, cultural and sporting services | Servicios de esparcimiento, culturales y deportivos
98 | Other community, social and personal services | Otros servicios comunitarios, sociales o personales
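
A minimal sketch of how the model's outputs might be turned into division codes. It assumes the checkpoint is published as oeg/roberta-finetuned-CPV_Spanish, that it has a sequence-classification head trained in multi-label fashion (which the ranking metrics reported in this card suggest), and that a label counts as predicted when its sigmoid probability reaches 0.5. The threshold, both helper names, and the example label mapping are illustrative assumptions, not part of the published card.

```python
import math


def select_divisions(logits, id2label, threshold=0.5):
    """Turn raw per-label logits into (label, probability) pairs above a threshold."""
    picked = []
    for idx, logit in enumerate(logits):
        prob = 1.0 / (1.0 + math.exp(-logit))  # independent sigmoid per label
        if prob >= threshold:
            picked.append((id2label[idx], prob))
    # Highest-probability divisions first.
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


def predict_divisions(text, model_id="oeg/roberta-finetuned-CPV_Spanish", threshold=0.5):
    """Hypothetical end-to-end helper; needs `transformers` and `torch` installed."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return select_divisions(logits, model.config.id2label, threshold)


# Offline illustration with made-up logits and a made-up label mapping.
example = select_divisions([2.0, -3.0, 0.0], {0: "45", 1: "72", 2: "50"})
print([label for label, _ in example])
```

Calling predict_divisions on a Spanish tender description would download the checkpoint and return the divisions whose probability clears the threshold. Lowering the threshold trades precision for recall, so it is worth tuning on held-out data rather than keeping the 0.5 default assumed here.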

Training and evaluation data

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
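
The listed values map naturally onto the fields of Hugging Face's TrainingArguments. The sketch below assumes the Trainer API was used on a single device (so train_batch_size corresponds to per_device_train_batch_size) and that no warmup was applied, since none is listed. The output directory and the function names are hypothetical.

```python
# Keyword arguments mirroring the hyperparameters listed above.
TRAINING_KWARGS = {
    "output_dir": "cpv-finetune",  # hypothetical path, not from the card
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
}


def build_training_arguments():
    """Needs `transformers`; imported lazily so the dict can be inspected alone."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)


def linear_lr(step, total_steps, base_lr=2e-5):
    """Learning rate under linear decay to zero, assuming no warmup."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


# With the 90540 total steps reported in the training results,
# the rate halfway through training would be half the initial value.
print(linear_lr(45270, 90540))
```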

Training results

Training Loss | Epoch | Step | Validation Loss | F1 | ROC AUC | Accuracy | Coverage Error | Label Ranking Average Precision Score
0.0354 | 1.0 | 9054 | 0.0362 | 0.7560 | 0.8375 | 0.6963 | 14.0835 | 0.7357
0.0311 | 2.0 | 18108 | 0.0331 | 0.7756 | 0.8535 | 0.7207 | 12.7880 | 0.7633
0.0235 | 3.0 | 27162 | 0.0333 | 0.7823 | 0.8705 | 0.7283 | 11.5179 | 0.7811
0.0157 | 4.0 | 36216 | 0.0348 | 0.7821 | 0.8699 | 0.7274 | 11.5836 | 0.7798
0.011 | 5.0 | 45270 | 0.0377 | 0.7799 | 0.8787 | 0.7239 | 10.9173 | 0.7841
0.008 | 6.0 | 54324 | 0.0395 | 0.7854 | 0.8787 | 0.7309 | 10.9042 | 0.7879
0.0042 | 7.0 | 63378 | 0.0421 | 0.7872 | 0.8823 | 0.7300 | 10.5687 | 0.7903
0.0025 | 8.0 | 72432 | 0.0439 | 0.7884 | 0.8867 | 0.7305 | 10.2220 | 0.7934
0.0015 | 9.0 | 81486 | 0.0456 | 0.7889 | 0.8872 | 0.7316 | 10.1781 | 0.7945
0.001 | 10.0 | 90540 | 0.0465 | 0.7918 | 0.8860 | 0.7376 | 10.2744 | 0.7973
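
The metrics do not all peak at the same epoch, which matters when choosing a checkpoint. The short sketch below copies the validation columns from the table and reports the best epoch for each metric; the column keys and the function name are illustrative.

```python
from operator import itemgetter

# Rows copied from the table: epoch, validation loss, F1, ROC AUC,
# accuracy, coverage error, label ranking average precision (LRAP).
ROWS = [
    (1, 0.0362, 0.7560, 0.8375, 0.6963, 14.0835, 0.7357),
    (2, 0.0331, 0.7756, 0.8535, 0.7207, 12.7880, 0.7633),
    (3, 0.0333, 0.7823, 0.8705, 0.7283, 11.5179, 0.7811),
    (4, 0.0348, 0.7821, 0.8699, 0.7274, 11.5836, 0.7798),
    (5, 0.0377, 0.7799, 0.8787, 0.7239, 10.9173, 0.7841),
    (6, 0.0395, 0.7854, 0.8787, 0.7309, 10.9042, 0.7879),
    (7, 0.0421, 0.7872, 0.8823, 0.7300, 10.5687, 0.7903),
    (8, 0.0439, 0.7884, 0.8867, 0.7305, 10.2220, 0.7934),
    (9, 0.0456, 0.7889, 0.8872, 0.7316, 10.1781, 0.7945),
    (10, 0.0465, 0.7918, 0.8860, 0.7376, 10.2744, 0.7973),
]

# Column index and whether lower (min) or higher (max) is better.
COLUMNS = {
    "val_loss": (1, min),
    "f1": (2, max),
    "roc_auc": (3, max),
    "accuracy": (4, max),
    "coverage_error": (5, min),
    "lrap": (6, max),
}


def best_epochs(rows=ROWS):
    """Return the epoch at which each metric reaches its best value."""
    return {
        name: pick(rows, key=itemgetter(col))[0]
        for name, (col, pick) in COLUMNS.items()
    }


print(best_epochs())
```

Validation loss is lowest at epoch 2 and rises afterwards while training loss keeps falling, whereas F1, accuracy and LRAP are best at epoch 10 and ROC AUC and coverage error at epoch 9. The headline results at the top of this card match the epoch 10 row.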

Framework versions

  • Transformers 4.16.2
  • PyTorch 1.9.1
  • Datasets 1.18.4
  • Tokenizers 0.11.6

Acknowledgments

This work has been supported by the NextProcurement European Action (grant agreement INEA/CEF/ICT/A2020/2373713-Action 2020-ES-IA-0255) and the Madrid Government (Comunidad de Madrid-Spain) under the Multiannual Agreement with Universidad Politécnica de Madrid in the line Support for R&D projects for Beatriz Galindo researchers, in the context of the V PRICIT (Regional Programme of Research and Technological Innovation). We also acknowledge the participation of Jennifer Tabita in the preparation of the initial set of notebooks, and the AI4Gov master students from the first cohort for their validation of the approach. Source of the data: Ministerio de Hacienda.
