
Embeddings and Vector Search

A computer can't understand text, or the semantic relationships and meanings between words. It can only understand numbers. We solve this problem with embeddings.

An embedding is a representation of text as numbers, that is, as a vector in a vector space. This lets AI models compare and operate on the meaning of words.

flowchart TD
    A["dog"] --> B{{Embedding model}}
    B --> C["[-0.003, 0.043, ..., -0.01]"]

    N1["(text we want to convert)"]:::note --> A
    N2["(vectors with semantic content)"]:::note --> C

    classDef note fill:none,stroke:none,color:#777;

The vector for each word or document captures the semantic meaning of the text, so words with related meanings end up close to each other:

  • dog will be close to pet
  • contract will be far from beach
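"Close" and "far" are usually measured with cosine similarity, which is the cosine of the angle between two vectors. The sketch below uses tiny, hand-made 3-dimensional vectors as a stand-in for real embeddings. The values are invented for illustration and do not come from any embedding model. Real models produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors (NOT real model output).
# Dimensions loosely stand for [animal, leisure, legal].
toy_embeddings = {
    "dog": [0.9, 0.3, 0.0],
    "pet": [0.8, 0.4, 0.1],
    "contract": [0.0, 0.1, 0.9],
    "beach": [0.1, 0.9, 0.0],
}

print(round(cosine_similarity(toy_embeddings["dog"], toy_embeddings["pet"]), 2))         # 0.98
print(round(cosine_similarity(toy_embeddings["contract"], toy_embeddings["beach"]), 2))  # 0.11
```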

Vector vs SQL databases

The problem with typical databases is that they only search for exact matches. If I search for car, I only get the entries that contain car.

Vector databases, on the other hand, can interpret the meaning of words through vectors. If I search for car, they can also return values such as sedan, SUV, Land Rover, etc.

Vector databases are very good when we need to find similar items, that is, items that are close to one another in the vector space.
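Here is a minimal sketch of the difference, using hypothetical toy vectors in place of real embeddings. A keyword filter (similar to SQL's `LIKE '%car%'`) only returns the literal match. A brute-force nearest-neighbour search instead ranks every item by cosine similarity to the query vector. Production vector databases usually avoid scanning every vector and rely on approximate nearest-neighbour indexes; this version only illustrates the idea.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical toy vectors (NOT real embeddings).
# Dimensions loosely stand for [motor vehicle, off-road, fruit].
items = {
    "car": [0.9, 0.3, 0.0],
    "sedan": [0.9, 0.1, 0.0],
    "SUV": [0.8, 0.6, 0.0],
    "Land Rover": [0.7, 0.7, 0.0],
    "bicycle": [0.1, 0.3, 0.0],
    "banana": [0.0, 0.0, 1.0],
}


def keyword_search(query: str) -> list[str]:
    # Literal text match, similar to SQL's WHERE name LIKE '%car%'.
    return [name for name in items if query.lower() in name.lower()]


def nearest_neighbours(query_vector: list[float], k: int) -> list[str]:
    # Brute-force k-NN: score every item against the query, keep the k most similar.
    ranked = sorted(items, key=lambda name: cosine_similarity(query_vector, items[name]), reverse=True)
    return ranked[:k]


print(keyword_search("car"))                  # ['car']
print(nearest_neighbours(items["car"], k=4))  # ['car', 'sedan', 'SUV', 'Land Rover']
```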

Some example use cases:

  • finding similar movies (Netflix)
  • recommending similar items in online stores (Amazon)
  • finding similar songs (Spotify)


Find differences for big dynamic lists in Excel

(In this post some formulas and menu names are in Spanish, because my Excel and my computer are set to Spanish and Excel formula names depend on the language.)

Here is how to find and mark the differences between two unequal, really long lists or tables in Excel. In my example, one list is a partial copy of the other: some items are missing and you have to find out which ones.

This is the full list.
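The full post does this with Excel formulas. As a rough illustration of the same check outside Excel (this is not the post's formula), the sketch below walks the full list and marks each item as present or missing in the partial list. The result resembles a helper column built with COUNTIF (CONTAR.SI in a Spanish-locale Excel). The function name and sample data are hypothetical.

```python
def mark_missing(full_list: list[str], partial_list: list[str]) -> list[tuple[str, str]]:
    """Pair every item of full_list with "OK" or "MISSING" depending on partial_list."""
    # A set gives constant-time lookups, so this stays fast for very long lists.
    # casefold() ignores letter case, as Excel's COUNTIF does when comparing text.
    present = {item.casefold() for item in partial_list}
    return [(item, "OK" if item.casefold() in present else "MISSING") for item in full_list]


full = ["Apple", "Banana", "Cherry", "Date"]  # hypothetical full list
partial = ["banana", "Date"]                  # hypothetical partial list

for item, status in mark_missing(full, partial):
    print(item, status)
```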


OAuth 2.0

Authentication: the process of verifying an identity. We confirm that users are who they say they are (e.g. with a username and password).

Authorization: the process of verifying what someone is allowed to do (permissions and access control).
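To keep the two ideas apart, here is a minimal sketch. Authentication checks a password against a stored salted hash, and authorization separately checks a permission. The in-memory user store, the user `alice` and permission strings such as `"photos:read"` are hypothetical. In OAuth 2.0 the authorization side is expressed through scoped access tokens rather than a lookup like this one.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 with a per-user salt; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


_salt = os.urandom(16)
# Hypothetical user store: username -> salt, password hash and granted permissions.
USERS = {
    "alice": {"salt": _salt, "hash": hash_password("s3cret", _salt), "permissions": {"photos:read"}},
}


def authenticate(username: str, password: str) -> bool:
    """Authentication: is this user who they claim to be?"""
    user = USERS.get(username)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(candidate, user["hash"])  # constant-time comparison


def authorize(username: str, permission: str) -> bool:
    """Authorization: is this (already authenticated) user allowed to do this?"""
    user = USERS.get(username)
    return user is not None and permission in user["permissions"]


if authenticate("alice", "s3cret") and authorize("alice", "photos:read"):
    print("access granted")
```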

Past solutions

From worst to best, along with the problems each one causes:

Credential Sharing

The worst one. The app can't differentiate between real user access and programmatic access.
Permissions are typically too broad, so the app also gets access to more content than it should.

We could redirect the user to the API, where they enter their credentials and get a cookie. This allows an app to access the API.

Dangerous because of CSRF attacks: we've authorised the whole browser, not the app.


How to solve VirtualBox disk has run out of space

How to solve the problem “Low disk space on ‘Filesystem root’. The volume has only xMB disk space remaining” when you completely fill a virtual disk in VirtualBox.

(You have to delete all your snapshots first)

Open a cmd terminal and run the following command:

"c:\Program Files\Oracle\VirtualBox\VBoxManage.exe" modifymedium  
"c:\Users\mario\VirtualBox VMs\Ubuntu OTAN\Ubuntu OTAN.vdi" --resize 30000

The first path is the VBoxManage executable that ships with VirtualBox.
The second one is the path to your VDI file. --resize takes the new size in MB.

Finally, open GParted inside the VM and resize the partition to use the new space.
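If you resize disks often, the same command can be scripted. This sketch assumes the default Windows install path shown above and uses only the `modifymedium ... --resize` arguments from this post. The helper names and the GB-to-MB conversion (treating 1 GB as 1024 MB) are my own illustrative choices.

```python
import subprocess

# Default Windows install location used in the command above; adjust it if yours differs.
VBOXMANAGE = r"c:\Program Files\Oracle\VirtualBox\VBoxManage.exe"


def gb_to_mb(gb: float) -> int:
    # --resize expects megabytes; this treats 1 GB as 1024 MB.
    return int(gb * 1024)


def resize_command(vdi_path: str, size_mb: int) -> list[str]:
    # Same arguments as the cmd command above, as an argv list so paths with spaces need no quoting.
    return [VBOXMANAGE, "modifymedium", vdi_path, "--resize", str(size_mb)]


def resize_disk(vdi_path: str, size_mb: int) -> None:
    # check=True raises subprocess.CalledProcessError if VBoxManage exits with an error.
    subprocess.run(resize_command(vdi_path, size_mb), check=True)
```

For example, `resize_disk(r"c:\Users\mario\VirtualBox VMs\Ubuntu OTAN\Ubuntu OTAN.vdi", 30000)` runs the same resize as the manual command. The partition inside the VM still has to be grown afterwards.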

SCRUM PSM1 Certification - Index


Status: Certified!

These notes are my simplified, personal version of The Scrum Guide 2020 and of the following Udemy course: “Preparation For Professional Scrum Master Level 1 (PSM1)” by Vladimir Raykov.

If you want to get ready for the certification exam, I fully recommend buying his course on Udemy and watching it several times.

Scrum Guide 2020
1. Scrum Guide 2020 Notes
2. Scrum Glossary

“Preparation For Professional Scrum Master Level 1 (PSM1)” by Vladimir Raykov
1. Scrum Introduction
2. The Scrum Team
3. Scrum Events
4. Scrum Artifacts
5. Scrum Practices and Charts
6. A few words before the Exam
7. Recap of key concepts
8. Possible exam questions

Java Index

These are my Java-related notes. They collect everything I refer back to when I'm unsure how to use or implement a framework or feature I've already implemented once.

Version changes

Interesting changes, new functionality and APIs that came with each new Java version. They don't cover every change, only the ones I found most useful or interesting.

From Java 8 to Java 11
Java 12
Java 13

Experience

Small, working snippets showing how to implement a specific feature.

Java experience sheet
How to create a database intermediate table
Java date time API
New script files in Java

Frameworks

How to use and set up specific frameworks in a Java project (using Maven).

Spring in Action (Book)
Spring Cache
Spring Beans
Thymeleaf
Spring Cors

Maven (builder)
Testing (JUnit, TestNG, Mockito)
Vert.x (microservices)
Lombok (builder)
MapStruct (mapper)

Splunk

Splunk takes any type of data, even millions of entries, and lets you turn it into reports, dashboards and alerts.

It’s great at parsing machine data. We can train Splunk to look for certain patterns in data and label those patterns as fields.
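Splunk's field extractions are regular expressions with named capture groups; for example, the `rex` search command writes them as `(?<field>...)`. The Python sketch below illustrates the idea outside Splunk by labelling the parts of a raw log line as fields. The log format and field names are hypothetical, and Python spells named groups `(?P<name>...)`.

```python
import re

# Hypothetical log format: "<date> <time> <LEVEL> <service> <free-text message>"
LOG_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)"
)


def extract_fields(line: str) -> dict[str, str]:
    """Label the parts of a raw line as named fields; return {} if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {}


print(extract_fields("2024-05-01 12:00:03 ERROR payment-service Timeout after 30s"))
```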

Planning Splunk Deployments

A note on config files

Everything Splunk does is governed by configuration files. They're stored under $SPLUNK_HOME/etc and have the .conf extension.

They're layered: you can have files with the same name in several directories. For example, you might have a global-level conf file and an app-specific one. Splunk decides which one to use based on the current app.
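Layering can be pictured as merging the same stanza from several files: a higher-precedence file overrides individual keys and inherits the rest. This sketch is a simplification. It assumes a fixed two-layer order (a system default below an app's local file) and ignores Splunk's full precedence rules, which differ between global and app/user context. The stanza and values are hypothetical. To see what Splunk actually resolves, `splunk btool inputs list --debug` shows each setting together with the file it came from.

```python
Stanzas = dict[str, dict[str, str]]  # stanza name -> {setting: value}


def merge_layers(layers: list[Stanzas]) -> Stanzas:
    """Merge config layers ordered lowest to highest precedence; later layers win per key."""
    merged: Stanzas = {}
    for layer in layers:
        for stanza, settings in layer.items():
            # Copy settings into a fresh dict per stanza so the input layers are never mutated.
            merged.setdefault(stanza, {}).update(settings)
    return merged


# Hypothetical inputs.conf contents at two layers.
system_default = {"monitor:///var/log/app.log": {"index": "main", "sourcetype": "app_log"}}
app_local = {"monitor:///var/log/app.log": {"index": "web"}}

print(merge_layers([system_default, app_local]))
# {'monitor:///var/log/app.log': {'index': 'web', 'sourcetype': 'app_log'}}
```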


Oracle 1Z0-819 (Java11) Certification - Index

The new 1Z0-819 certification combines the two older exams (1Z0-815 and 1Z0-816) into one.

OCP Java SE 11 Programmer I - Study guide for 1Z0-815

Welcome to Java
Java Building Blocks
Java Operators
Making Decisions
Core Java APIs
Lambdas and Functional Interfaces
Methods and Encapsulation
Class Design
Advanced Class Design
Exceptions
Java Modules

OCP Java SE 11 Programmer II - Study guide for 1Z0-816

Java Fundamentals
Java Annotations
Generics and Collections