2 Mar, 2026
Featured post

Embeddings, Vector Search & BM25

A computer cannot understand text, nor the semantic relationships or meanings between words. It can only understand numbers. We solve this by using embeddings.

An embedding is a representation of text (in the form of numbers) in a vector space. This allows AI models to compare and operate on the meaning of words.

flowchart TD
    A["dog"] --> B
    B --> C["[-0.003, 0.043, ..., -0.01]"]
    
    N1["(text we want to convert)"]:::note --> A
    N2["(vectors with semantic content)"]:::note --> C
    
    classDef note fill:none,stroke:none,color:#777;

The vectors for each word or document capture its semantic meaning, so texts with related meanings end up close to each other in the vector space:

dog will be close to pet
contract will be far from beach
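How "close" two vectors are is commonly measured with cosine similarity. A minimal Python sketch, using made-up 3-dimensional vectors as an assumption (real embeddings such as those from text-embedding-3-large have thousands of dimensions):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means same direction (similar meaning)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration only; they do not come from a real model
toy = {
    "dog": [0.9, 0.8, 0.1],
    "pet": [0.85, 0.75, 0.2],
    "contract": [0.1, 0.2, 0.9],
    "beach": [0.2, -0.7, 0.1],
}

print(round(cosine_similarity(toy["dog"], toy["pet"]), 3))         # high: close in meaning
print(round(cosine_similarity(toy["contract"], toy["beach"]), 3))  # low: far in meaning
```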

Vector vs SQL databases

The problem with typical databases is that they only look for literal matches. If I search for car, I only get the entries that contain the word car.

Vector databases, on the other hand, can interpret the semantics of words through their vectors, so a search for car can return results such as sedan, SUV, Land Rover, etc.
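To see the limitation of literal matching, here is a small sketch using Python's built-in sqlite3 module and a hypothetical listings table: a LIKE search for car finds the row that contains that word, but none of the sedan, SUV or Land Rover rows.

```python
import sqlite3

# Hypothetical table of car listings, used only to illustrate keyword matching
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO listings (description) VALUES (?)",
    [("Red car for sale",), ("Family sedan, low mileage",), ("Used SUV",), ("Land Rover Defender",)],
)


def keyword_search(term: str) -> list[str]:
    # LIKE only matches the literal characters of the term, not its meaning
    rows = conn.execute(
        "SELECT description FROM listings WHERE description LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [description for (description,) in rows]


print(keyword_search("car"))  # ['Red car for sale']
```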

Vector databases are very good when we need to find items that are similar, that is, close to each other in the vector space. One use case is finding similar movies (Netflix). Another is recommending similar items in online stores (Amazon).
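Finding similar items by proximity is a nearest-neighbour search. A brute-force Python sketch, assuming hypothetical 3-dimensional movie vectors (roughly action, romance, animation) and Euclidean distance as the proximity measure. Production vector databases typically use approximate nearest-neighbour indexes such as HNSW or DiskANN instead of scanning every item.

```python
import heapq
import math

Vector = list[float]


def nearest_neighbours(query: Vector, catalog: dict[str, Vector], k: int = 2) -> list[str]:
    """Return the k catalog items closest to `query` (smallest Euclidean distance first)."""
    return heapq.nsmallest(k, catalog, key=lambda name: math.dist(query, catalog[name]))


# Hypothetical movie vectors: [action, romance, animation]
catalog = {
    "Die Hard": [0.9, 0.1, 0.0],
    "Mad Max": [0.95, 0.05, 0.1],
    "Notting Hill": [0.1, 0.9, 0.0],
    "Toy Story": [0.2, 0.1, 0.9],
}

just_watched = [0.85, 0.15, 0.05]  # made-up vector for an action movie the user just watched
print(nearest_neighbours(just_watched, catalog))  # ['Die Hard', 'Mad Max']
```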

How to run a search (query) using vectors

(You can see the code here)

We need:

A vector database (CosmosDB)
A model to generate the embeddings (text-embedding-3-large)

The complete flow is as follows:

Use an embedding model to get the vectors for the content we want to index
Insert the original text and its vectors into a vector database
When we want to run a query, use the same embedding model as before on the query text. With the resulting embedding, we search for similar vectors in the database and retrieve the original text from original_text
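The three steps can be sketched end to end in Python with an in-memory stand-in for the vector database. The embedding model is passed in as a function; the hand-made lookup table below is a hypothetical placeholder for a real model such as text-embedding-3-large, used only so the example runs offline.

```python
import math
from typing import Callable

Embedder = Callable[[str], list[float]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as CosmosDB."""

    def __init__(self, embed: Embedder) -> None:
        self.embed = embed  # the SAME model must embed both the documents and the queries
        self.items: list[dict] = []

    def index(self, item_id: str, text: str) -> None:
        # Steps 1 and 2: embed the content and store it next to the original text
        self.items.append({"id": item_id, "original_text": text, "vectors": self.embed(text)})

    def search(self, query: str, k: int = 1) -> list[str]:
        # Step 3: embed the query, rank stored items by similarity, return their original_text
        query_vector = self.embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item["vectors"]),
            reverse=True,
        )
        return [item["original_text"] for item in ranked[:k]]


# Hand-made 2-dimensional vectors standing in for a real embedding model (assumption)
FAKE_EMBEDDINGS = {
    "A shiba walks alone in the park": [0.9, 0.1],
    "The contract was signed yesterday": [0.1, 0.9],
    "dog walking": [0.8, 0.2],
}

store = InMemoryVectorStore(embed=FAKE_EMBEDDINGS.__getitem__)
store.index("1", "A shiba walks alone in the park")
store.index("2", "The contract was signed yesterday")
print(store.search("dog walking"))
```

Injecting the embedding function keeps the storage and ranking logic independent of the model, so a real client call can replace the lookup table without touching the store.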

Inserting vectors into CosmosDB

Before we can search, we need to fill the database with content. We keep it simple and insert:
- an ID, set by hand
- the original text
- the vectors obtained by embedding the original text

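This step runs once per item. The code below is a sketch: it assumes the official openai and azure-cosmos Python SDKs, and the environment variables, database and container names are placeholders.

import os

from azure.cosmos import CosmosClient
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
# Placeholders: your own Cosmos DB endpoint, key, database and container (partition key assumed to be /id)
database = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"]).get_database_client("my-database")
container = database.get_container_client("my-container")

text = "A shiba walks alone in the park"
# this sends the text to the model text-embedding-3-large
vectors = openai_client.embeddings.create(model="text-embedding-3-large", input=text).data[0].embedding
item = {"id": "1", "original_text": text, "vectors": vectors}
# upsert instead of create, so re-running with the same id overwrites the item
container.upsert_item(item)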

An example of the data I store:

{
	"id": "1",
	"original_text": "A shiba walks alone in the park",
	"vectors": [-0.003, 0.043, ..., -0.001]
}
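With items stored in the shape above, step 3 of the flow (finding similar vectors) could run inside Cosmos DB itself. A sketch, assuming Azure Cosmos DB for NoSQL with vector search enabled and a container whose vector embedding policy covers the /vectors path; it uses the VectorDistance system function and the azure-cosmos SDK's query_items. The helper names are hypothetical, and query_vector must come from the same embedding model used when indexing.

```python
from typing import Any


def build_vector_query(query_vector: list[float], top_k: int = 3) -> dict[str, Any]:
    """Build a Cosmos DB (NoSQL API) vector-search query over the "vectors" field."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    query = (
        f"SELECT TOP {int(top_k)} c.id, c.original_text, "
        "VectorDistance(c.vectors, @embedding) AS score "
        "FROM c ORDER BY VectorDistance(c.vectors, @embedding)"
    )
    return {"query": query, "parameters": [{"name": "@embedding", "value": query_vector}]}


def search_similar(container: Any, query_vector: list[float], top_k: int = 3) -> list[dict]:
    """Run the query; `container` is assumed to be an azure-cosmos ContainerProxy."""
    q = build_vector_query(query_vector, top_k)
    return list(
        container.query_items(
            query=q["query"],
            parameters=q["parameters"],
            enable_cross_partition_query=True,
        )
    )
```

Each returned item carries original_text, which is the text we show to the user.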

28 Apr, 2024

Advanced SQL

UNION

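UNION combines the results of two SELECT statements into a single result set. Both SELECTs must return the same number of columns, with compatible types, and duplicate rows are removed (UNION ALL keeps them).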

SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2

We have the following tables

company1

per	name	surname
1	ANTONIO	PEREZ
2	PEDRO	RUIZ

company2

per	name	surname
1	LUIS	LOPEZ
2	ANTONIO	PEREZ
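A runnable sketch with Python's built-in sqlite3 module, loading the two tables above into an in-memory database. ANTONIO PEREZ appears in both companies, so UNION returns him once, while UNION ALL keeps both rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
tables = {
    "company1": [(1, "ANTONIO", "PEREZ"), (2, "PEDRO", "RUIZ")],
    "company2": [(1, "LUIS", "LOPEZ"), (2, "ANTONIO", "PEREZ")],
}
for table, rows in tables.items():
    # table names come from the fixed dict above, so the f-string is safe here
    conn.execute(f"CREATE TABLE {table} (per INTEGER, name TEXT, surname TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# UNION removes duplicate rows, so ANTONIO PEREZ is returned only once
union = conn.execute(
    "SELECT name, surname FROM company1 "
    "UNION SELECT name, surname FROM company2 ORDER BY name"
).fetchall()

# UNION ALL keeps every row from both SELECTs, duplicates included
union_all = conn.execute(
    "SELECT name, surname FROM company1 "
    "UNION ALL SELECT name, surname FROM company2"
).fetchall()

# Duplicates are judged on all selected columns: with per, the two ANTONIO PEREZ rows differ
with_per = conn.execute(
    "SELECT per, name, surname FROM company1 "
    "UNION SELECT per, name, surname FROM company2"
).fetchall()

print(union)           # [('ANTONIO', 'PEREZ'), ('LUIS', 'LOPEZ'), ('PEDRO', 'RUIZ')]
print(len(union_all))  # 4
print(len(with_per))   # 4
```

Note that adding per to both SELECTs brings back 4 rows, because ANTONIO PEREZ has per 1 in company1 and per 2 in company2.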

1 Sep, 2022

ElasticSearch Query Examples

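Find an exact match on an object sub-field such as metadata.title (note that .keyword has to be appended to string fields: with the default dynamic mapping, strings are indexed as analyzed text plus an extra keyword sub-field, and since a term query does not analyze its input, the exact string only matches on the keyword sub-field).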

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "metadata.title.keyword": "File test.pdf"
          }
        }
      ]
    }
  },
  "size": 200
}
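The same query can be built programmatically. A minimal Python sketch with a hypothetical helper that appends .keyword for you, assuming the default dynamic mapping; the dictionary it returns is identical to the JSON above.

```python
import json


def exact_match_query(field: str, value: str, size: int = 200) -> dict:
    """Build a bool/must/term query that matches `value` exactly on a string field.

    Assumes the default dynamic mapping, where each string field is indexed as
    analyzed text plus an unanalyzed .keyword sub-field.
    """
    keyword_field = field if field.endswith(".keyword") else f"{field}.keyword"
    return {
        "query": {"bool": {"must": [{"term": {keyword_field: value}}]}},
        "size": size,
    }


query = exact_match_query("metadata.title", "File test.pdf")
print(json.dumps(query, indent=2))
```

With the official elasticsearch Python client (8.x), it could be sent as es.search(index="my-index", query=query["query"], size=query["size"]), where my-index is a placeholder.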

12 Aug, 2022

Count number of entries in filtered table

(In this post some formulas and menu names are in Spanish, because my Excel and computer are set to Spanish and Excel formula names depend on the language.)

The formula is:

=AGREGAR(3;3;J:J)-1

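The first two parameters configure the function: the first 3 selects COUNTA (count non-empty cells) and the second 3 tells it to ignore hidden rows, error values and nested SUBTOTAL/AGGREGATE results. J:J is the column to count. The important part is that rows hidden by a filter are not counted, so the result reflects only the visible entries. (In an English-language Excel the formula is =AGGREGATE(3,3,J:J)-1.)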

Watch out for headers! If your table has a header row, keep the -1 in the formula so the header cell is not counted.

11 Aug, 2022

Find differences for big dynamic lists in Excel

(In this post some formulas and menu names are in Spanish, because my Excel and computer are set to Spanish and Excel formula names depend on the language.)

Here is how to find and mark the differences between unequal, very long lists or tables in Excel. In my example, one list is a partial copy of the other: some items are missing and you have to find which ones.

This is the full list.

8 Apr, 2022

OAuth 2.0

Authentication: the process of verifying an identity, that is, confirming that someone is who they say they are (username and password).

Authorization: the process of verifying what someone is allowed to do (permissions and access control).

Past solutions

From worst to best, and the problems each one causes:

Credential Sharing

The worst one: the user gives the app their username and password. The service cannot tell real user access apart from programmatic access by the app.
Permissions are typically too broad, so the app also gets access to more content than it should.

Cookie

We could redirect the user to the API, where they enter their credentials and get a cookie. This allows an app to access the API.

This is dangerous because of CSRF attacks: we have authorized the whole browser, not the app.

2 Apr, 2022

How to solve VirtualBox disk has run out of space

How to solve the problem “Low disk space on ‘Filesystem root’. The volume has only xMB disk space remaining” when you completely fill a virtual disk in VirtualBox.

(You have to delete all your snapshots first)

Open a cmd terminal and run the following command (it is a single line):

"c:\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium "c:\Users\mario\VirtualBox VMs\Ubuntu OTAN\Ubuntu OTAN.vdi" --resize 30000

The first path is the VBoxManage executable that ships with VirtualBox.
The second one is the path to your VDI file. --resize takes the new size in MB.

Finally, boot the VM, open GParted and grow the partition into the new space.

17 Mar, 2022

JMeter testing

JMeter is a testing tool that simulates thousands of users hitting a web application, for load and performance testing. It uses Java threads to simulate those users.

1. JMeter
2. JMeter performance testing

24 Mar, 2021

UX/UI Design - Index

Notes on the course “Mobile Design Course - UX, UI and Design Thinking” from Udemy.

Color Theory
Typography
UI Design
UX Design
Mobile Design Workflow

Notes from the course General UI Design

Design ground rules
Design base properties
Design using color and contrast appropriately
Design with typography and imagery
Creating and simplifying visual cues
UI Design Mantras

11 Mar, 2021

SCRUM PSM1 Certification - Index


Status: Certified!

These notes are my watered-down, personal version of The Scrum Guide 2020 and of the following Udemy course: “Preparation For Professional Scrum Master Level 1 (PSM1)” by Vladimir Raykov.

If you want to prepare for the certification exam, I fully recommend buying his course on Udemy and watching it several times.

Scrum Guide 2020
1. Scrum Guide 2020 Notes
2. Scrum Glossary

“Preparation For Professional Scrum Master Level 1 (PSM1)” by Vladimir Raykov
1. Scrum Introduction
2. The Scrum Team
3. Scrum Events
4. Scrum Artifacts
5. Scrum Practices and Charts
6. A few words before the Exam
7. Recap of key concepts
8. Possible exam questions

22 Feb, 2021

Linux Index

Prepare a fresh linux start
Bash and ZSH commands
VIM advanced commands
