• Featured post

Embeddings, Vector Search & BM25

A computer cannot understand text, nor the semantic relationships or meanings that link words. It only understands numbers. We bridge this gap with embeddings.

An embedding is a representation of text (as numbers) in a vector space. This lets AI models compare and operate on the meaning of words.

flowchart TD
    A["dog"] --> B
    B --> C["[-0.003, 0.043, ..., -0.01]"]
    
    N1["(text we want to convert)"]:::note --> A
    N2["(vectors with semantic content)"]:::note --> C
    
    classDef note fill:none,stroke:none,color:#777;    

The vectors for each word or document capture the semantic meaning of the text.

  • dog will be close to pet
  • contract will be far from beach
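
Closeness between vectors is commonly measured with cosine similarity. A minimal sketch, using made-up 3-dimensional vectors: real embeddings have hundreds or thousands of dimensions, and the values below are invented for illustration, not model output.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, invented for illustration only.
toy_vectors = {
    "dog": [0.9, 0.8, 0.1],
    "pet": [0.8, 0.9, 0.2],
    "contract": [0.1, 0.2, 0.9],
    "beach": [0.9, 0.1, 0.1],
}

dog_pet = cosine_similarity(toy_vectors["dog"], toy_vectors["pet"])
contract_beach = cosine_similarity(toy_vectors["contract"], toy_vectors["beach"])
print(f"dog vs pet: {dog_pet:.2f}")
print(f"contract vs beach: {contract_beach:.2f}")
```

With these toy values, the first pair scores much higher than the second, which is the behaviour we expect from real embeddings of related and unrelated words.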

Vector vs SQL databases

The problem with typical databases is that they only look for exact matches. If I search for car, I only get the entries that contain car.

Vector databases, in contrast, can interpret the semantics of words through their vectors, so a search for car can also return values such as sedan, SUV, Land Rover, etc.

Vector databases work very well when we need to find similar items by their proximity to one another. One use case is finding similar films (Netflix). Another is recommending similar items in online shops (Amazon).

How to run a search (query) using vectors

(You can see the code here)

We need:

  • A vector database (CosmosDB)
  • A model to generate the embeddings (text-embedding-3-large)

The full flow is as follows:

  1. Use an embedding model to obtain the vectors for the content we want to index
  2. Insert the original text and its vectors into a vector database
  3. To run a query, pass the query text through the same embedding model used before. With the resulting embedding, search for similar vectors in the database and read the original text from original_text

    Inserting vectors into CosmosDB

    Before we can search, we need to fill the database with content. We keep it simple and insert:

    • a manually assigned ID
    • the original text
    • the vectors produced by embedding the original text

The pseudocode looks like this and runs one item at a time:

text = "A shiba walks alone in the park"
# this sends the text to the model text-embedding-3-large 
vectors = createEmbeddingsForText(text)
item = {
	"id": "1",
	"original_text": text,
	"vectors": vectors
}
uploadToCosmosDB(item)

Example of the data I store:

{
	"id": "1",
	"original_text": "A shiba walks alone in the park",
	"vectors": [-0.003, 0.043, ..., -0.001]
}
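
Step 3 of the flow, the query, can be sketched in memory without CosmosDB. This is an illustration built on assumptions: `fake_embed` is a hypothetical stand-in for the call to text-embedding-3-large (it only counts a few keywords, so it captures no real semantics), and a plain list stands in for the database.

```python
import math

# Hypothetical vocabulary used by the stand-in embedding below.
VOCAB = ["shiba", "dog", "park", "contract", "invoice"]

def fake_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: counts vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Steps 1 and 2: embed each text and store it next to the original.
texts = [
    "A shiba walks alone in the park",
    "Please sign the contract and pay the invoice",
]
store = [
    {"id": str(i + 1), "original_text": t, "vectors": fake_embed(t)}
    for i, t in enumerate(texts)
]

def search(query: str, top_k: int = 1) -> list[str]:
    """Step 3: embed the query with the same model and return the closest texts."""
    query_vector = fake_embed(query)
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vectors"]),
        reverse=True,
    )
    return [item["original_text"] for item in ranked[:top_k]]

print(search("shiba in a park"))
```

A real vector database performs the ranking step itself, usually with an index that avoids comparing the query against every stored vector.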

Read More

Docker, DockerFiles and docker-compose

TODO: Add working docker-compose and DockerFile examples to complement this information.
Interesting tool to analyze the size of custom image layers

Basic Definitions

Image: Executable package that includes everything needed to run an application. It consists of read-only layers, each of which represents a DockerFile instruction. The layers are stacked, and each one is a delta of changes from the previous layer.
Container: Instance of an image.

Stack: Defines how all the services interact.
Services: Image for a microservice, which defines how its containers behave in production.

DockerFile: File with instructions that allows us to build upon an already existing image. It defines:

  • the base image to build from
  • our own files to use or append
  • the commands to run

In the end, a DockerFile defines a service, which we can build from docker-compose or standalone with docker build.

DockerFiles vs docker-compose: A DockerFile is used to manage a single container. docker-compose is used to manage an application, which may consist of one or more DockerFiles. docker-compose also serves as a place to record large sets of customization options, which would otherwise be parameters in a very long command.

You can do everything docker-compose does with just docker commands and a lot of shell scripting.

Read More

Git advanced

Config

  • see config: git config -l
  • modify username: git config --global user.name "newName"
  • modify email: git config --global user.email "new@mail.com"

Git bisect

Git bisect is a tool to find the exact commit where a bug was introduced.
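
Conceptually, git bisect runs a binary search over the commit history: given a known good commit and a known bad one, it repeatedly tests a commit in between. A minimal sketch of that idea in Python (not Git's actual implementation; the commit names and the `is_bad` check are hypothetical):

```python
from typing import Callable, Sequence

def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # bug already present: the culprit is here or earlier
        else:
            good = mid  # still fine: the culprit is later
    return commits[bad]

history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
buggy = {"c6", "c7", "c8"}  # hypothetical: the bug landed in c6
print(first_bad_commit(history, lambda c: c in buggy))
```

Each test halves the remaining range, so the number of commits to check grows with the logarithm of the history length rather than with the length itself.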

Usage

I have a file with the following content and an obvious bug

Row row row your car at the river

Read More

HATEOAS

REST levels

A model of RESTful maturity, used to help explain the specific properties of a web-style system.

Level 0

The starting point for the model is using HTTP as a transport system for remote interactions, but without using any of the mechanisms of the web.
We publish a document on how to use our API. We declare only one endpoint and do all the communication through this endpoint.
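
A minimal sketch of what Level 0 looks like in practice: one entry point receives every request and decides what to do from the message body. The `action` names and the appointment data are hypothetical, and a plain Python function stands in for the single HTTP endpoint.

```python
# Hypothetical in-memory data; a real service would sit behind one HTTP URL.
OPEN_SLOTS = {"2026-01-04": ["14:00", "16:50"]}

def single_endpoint(body: dict) -> dict:
    """Level 0: every operation goes through this one entry point.

    The operation is named inside the payload, not by URL or HTTP verb.
    """
    action = body.get("action")
    if action == "list_slots":
        return {"slots": OPEN_SLOTS.get(body.get("date"), [])}
    if action == "book_slot":
        slots = OPEN_SLOTS.get(body.get("date"), [])
        if body.get("time") in slots:
            slots.remove(body["time"])
            return {"booked": True}
        return {"booked": False, "error": "slot not available"}
    return {"error": f"unknown action: {action}"}

print(single_endpoint({"action": "list_slots", "date": "2026-01-04"}))
print(single_endpoint({"action": "book_slot", "date": "2026-01-04", "time": "14:00"}))
```

Clients can only learn which actions exist by reading the published document, which is the limitation the higher levels address.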

Read More

Liquid

Liquid is a template engine for HTML. It’s used by Jekyll.

Variables Usage

  1. Declaration in the _config.yml file with home_sidebar: Home
  2. Usage with Liquid in file.html as {{ site.home_sidebar }}

Functions

Show liquid code snippets

When writing Liquid code snippets, Jekyll processes the code instead of displaying it. To solve this, wrap the code snippet in the tags

{% raw %}
{% endraw %}

Read More

Patterns implementation

Implementations of several patterns in Java, kept as a reference for how to implement them in practice.

Database

DAO & DTO

Data Access Object & Data Transfer Object. DAO - Design pattern used to encapsulate access to a persistence resource (e.g. a database) and remove the dependencies that come with a specific implementation. DTO - Object that represents a database entity, with all its properties available to be manipulated.
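
The post's own implementations are in Java. As a compact illustration of the split between the two roles, here is a minimal Python sketch using the standard-library sqlite3 module as the persistence resource. The `users` table and the class names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserDTO:
    """DTO: plain data carrier for one row of the (hypothetical) users table."""
    id: int
    name: str

class UserDAO:
    """DAO: the only place that knows SQL and the storage details."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, user: UserDTO) -> None:
        self._conn.execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (user.id, user.name)
        )
        self._conn.commit()

    def find_by_id(self, user_id: int) -> Optional[UserDTO]:
        row = self._conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return UserDTO(*row) if row else None

# Callers work with DTOs only and never see SQL.
dao = UserDAO(sqlite3.connect(":memory:"))
dao.save(UserDTO(id=1, name="Ada"))
print(dao.find_by_id(1))
```

Swapping SQLite for another store would only require a new DAO; code that consumes `UserDTO` objects stays unchanged.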

Read More

Personal Blog

Since the beginning of my programming career, I’ve been writing small notes and how-tos in Markdown for my personal use. This approach helps me avoid relearning the same thing twice and saves me from repeatedly searching for solutions to obscure problems.

Over time, this collection grew significantly, and I needed a more efficient way to index, access, and search through all my information. Building a website that also serves as a portfolio seemed like the best long-term solution.

Technology Stack v2.2 (now)

After doing a POC with Statiq, I found it too niche for me (I may be wrong), and the learning curve seems too steep. I find the documentation sparse, and I can’t find many examples, themes or customisations online. I want to build something more visual, and all the Statiq examples I find look exactly like the v1 I had with Jekyll.

I decided instead to move to Blazor WASM, as it seems like a good balance: something more visual that is still a static page and doesn’t cost me a lot of money.

  • Azure Static Web Apps - azure hosting (free tier)
  • .NET / C# - code and language
  • Blazor WebAssembly - With WASM I build a static site so I’m still able to host it as a static page
  • Markdown - still used to write my notes. It now does double duty, as I can paste notes straight into AI tools
  • Github (CI/CD) - code repo. Deploys automatically through Github Actions when I push new notes.
  • HTML & CSS

Technology Stack v2.1 (POC 2026)

  • Azure Static Web Apps - azure hosting (free tier)
  • .NET / C# - code and language
  • Statiq - static website generator (framework) to build websites from markdown using Razor
  • Markdown - still used to write my notes. It now does double duty, as I can paste notes straight into AI tools
  • Github (CI/CD) - code repo. Deploys automatically through Github Actions when I push new notes.
  • HTML & CSS

Improvements v1.1 - v1.4 (2026)

After 7 years, a migration and some improvements are long overdue. I still use this site 2 to 3 times a week to check my notes. A website still seems like the easiest way to access them anywhere (and to show a portfolio from time to time), and I want to keep my knowledge base in Markdown.

After all this time, I now work mainly in Microsoft’s environment as an AI .NET developer with Azure, and I want to bring together all the resources I use so the site better reflects my daily work and skills.

Sadly, I don’t have the time yet to fully move it to Microsoft’s environment and away from Jekyll, so I’ll roll the migration out slowly over this year.

Changes in the stack

  • Moved from Gitlab to Github - I mainly use Github for all my projects and this was the only one still in Gitlab (I wanted to test its CI/CD before Github’s got that good).
    • I have redone all CI/CD to automate checks and automatically deploy from Github to Azure
    • Before deploying it checks for dead links (403, 404 and 5xx) and sends me an email in case of problems
    • It also checks for dead image links
  • Moved from GCP to Azure. I deleted the project in Firebase and created a new Static Web App in Azure.
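
The dead-link check mentioned in the CI/CD bullets can be sketched as follows. This is not the actual pipeline: the function names are hypothetical, and the sketch only uses the Python standard library. A link counts as dead when the response status is 403, 404 or any 5xx.

```python
import urllib.error
import urllib.request

def is_dead_status(status: int) -> bool:
    """403, 404 and any 5xx status count as a dead link."""
    return status in (403, 404) or 500 <= status <= 599

def check_link(url: str, timeout: float = 10.0) -> bool:
    """Return True when the link looks dead. Performs a real HTTP request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_dead_status(response.status)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return is_dead_status(error.code)
    except OSError:
        # Assumption: unreachable hosts and timeouts are treated as dead too.
        return True

# Only the status rule is exercised here, so the example runs offline.
print([code for code in (200, 301, 403, 404, 500, 503) if is_dead_status(code)])
```

In a pipeline, `check_link` would run over every URL extracted from the generated pages, and any `True` result would fail the job or trigger a notification.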

Minor improvements

  • I’ve changed Jekyll’s theme
  • I’ve set up a dark mode

Initial Stack v1 (2019-2026)

(Image: the old v1 of the personal blog)

  • Firebase - hosting, served through Google’s cloud platform
  • Jekyll - transforms Markdown notes into static websites. Deployable through Github Pages or other services (I use Firebase)
  • Ruby - all the Jekyll examples I found used Ruby. That was useful, as I wanted to try it at the time, but I hate it now and I want to focus on .NET
  • Liquid - template language to improve on Ruby
  • Markdown - used to write my notes to avoid the need to write code
  • Gitlab (CI/CD) - code repo host. Deploys automatically when I push new notes
  • HTML, CSS, vanilla JS - self-explanatory