Posts Tagged - databases

Embeddings y Vector Search

Un ordenador no puede entender texto ni relaciones semánticas o significados entre palabras. Solo puede entender números. Este problema lo resolvemos mediante el uso de embeddings.

Un embedding es la representación de texto (en forma de números) en un espacio vectorial. Esto permite a los modelos de IA comparar y operar sobre el significado de las palabras.

flowchart TD
    A["perro"] --> B{{Modelo de embedding}}
    B --> C["[-0.003, 0.043, ..., -0.01]"]
    
    N1["(texto que queremos convertir)"]:::note --> A
    N2["(vectores con contenido semántico)"]:::note --> C
    
    classDef note fill:none,stroke:none,color:#777;    

Los vectores de cada palabra o documento capturan el significado semántico del texto.

  • perro estará cerca de mascota
  • contrato estará lejos de playa

Vector vs SQL databases

El problema con las bases de datos típicas es que solo buscan matches exactos. Si yo busco por coche solo me sacará las entradas que contengan coche.

En cambio las BBDD vectoriales pueden interpretar la semántica de las palabras mediante vectores. Si busco por coche puede sacarme valores como sedán, SUV, Land Rover, etc.

Las BBDD vectoriales son muy buenas cuando necesitamos buscar items similares por proximidad uno respecto al otro.

Algunos ejemplos de uso son:

  • buscar películas parecidas (Netflix)
  • Recomendadores de items parecidos en tiendas online (Amazon)
  • buscar canciones parecidas (Spotify)

Read More

SQL UNION ALL - add column with source

The following is an example on how to add a new column source depending on the origin query

SELECT code, url, 'TABLE_1' AS source
	FROM my_schema.table1
	WHERE code = $1
UNION ALL
SELECT code, url, 'TABLE_2' AS source
	FROM my_schema.table2
	WHERE code = $1

PgSQL Functions (Stored Procedure)

(This is an implementation example. For an explanation on this, please check my other post: SQL Triggers & Stored Procedures)

Function example

-- example of function that triggers when an entry is inserted into a table
--   and manages this data inserting data as needed in another table
CREATE OR REPLACE FUNCTION your_schema.my_function_name(arg1 character varying, data character varying)
	RETURNS character varying
	LANGUAGE plpgsql
AS $function$
BEGIN

-- check if already exists in the other table 
IF
	(SELECT COUNT(*) FROM your_schema.other_table WHERE name=arg1) > 0) THEN
		RETURN 'Error: 1210. data already exists';
END IF;

-- insert and manage data
INSERT INTO your_schema.other_table (name, data) VALUES (arg1, data);
RETURN 'Success: 1200';

END $function$;

How to Debug in DBeaver

There are two options, logs or break the function with an exception.

If logs are enough, you just write the message to output per console.

RAISE NOTICE 'this is null';

To see logs in DBeaver click here. Then you may execute the function to see the logs.

If you want to break the function runtime with an exception, you write the following instead

RAISE EXCEPTION SQLSTATE '90001' USING MESSAGE = 'error. this already exists';

SQL Views & Materialized Views

SQL View

A view in SQL is essentially a virtual table. It doesn’t store data physically. Instead it presents data from one or more underlying tables through a predefined SQL query. Think of it as a saved query that you can treat like a table.

  • Views don’t hold data themselves. When you query a view, the database executes the underlying query to fetch data in real-time.
  • Views can simplify complex queries by encapsulating them. Instead of writing a complex JOIN or subquery each time, you select from the view.
  • Views can restrict user access to specific rows or columns, enhancing security by exposing only necessary data.
  • Since views are generated on the fly, they always reflect the current state.
CREATE OR REPLACE VIEW active_customers AS
SELECT customer_id, name, email
FROM customers
WHERE status = 'active';

When to use a view

  • Use it to simplify complex queries that you use frequently and you need the most current data every time.
  • Restrict user access to specific data by exposing only certain columns or rows through a view

Materialized View

A materialized view is like a regular view, but it stores the query result’s phisically and it doesn’t involve executing the underlying query each time.

  • Since data is precomputed and stored, querying a materialized view is faster for complex queries over large datasets.
  • Because of this, data in a materialized view can become outdated and needs to be refreshed periodically.
CREATE OR REPLACE MATERIALIZED VIEW sales_summary AS
SELECT product_id, SUM(quantity) AS total_quantity
FROM sales
GROUP BY product_id;

When to use a materialized view

  • Is ideal for speeding up complex queries that are resource-intensive and slow to execute.
  • Suitable for scenarios where data doesn’t change frequently and fast read performance is needed.
  • You need to tolerate data that’s not always up-to-date

SQL Triggers & Stored Procedures

SQL Triggers

A SQL trigger is a code block that executes automatically when a specified event occurs on a table or view, such as an insert, update or delete.

A trigger can be used to perform actions before or after the event such as checking data integrity or spread data changes between tables.

Such an example would be to create a trigger that prevents users from inserting or updating data in a table if the data violates a rule, such as a maximum length or a required field.

CREATE TRIGGER trigger_name
BEFORE/AFTER event
ON table_name
FOR EACH ROW
BEGIN
	-- trigger code or call to procedure
END;

SQL Stored Procedures

A SQL Procedure is a code block that performs one or more tasks and can be called by other programs or queries. A procedure can accept parameters, return values and use variables, loops and conditional statements.

A procedure can be used to encapsulate complex logic or reuse code.

CREATE PROCEDURE procedure_name (parameters)
BEGIN
	-- procedure code
END;

Best practices

  • Use descriptive and consistent names.
  • Document your code with comments and explain the purpose and logic
  • Avoid using too many or complex procedures or functions. This may affect performance or reliability of your database. They really increase difficulty to follow an operation.

Reference(s)

https://www.linkedin.com/advice/3/how-do-you-use-sql-triggers-procedures-functions?lang=en

SQL Indexes

Indexes are a basic structure type that apply to one or multiple columns in order to improve performance and speed up queries that: filter, sort or join data for a table.

this may improve performance for a query that uses last_name in a WHERE clause or an ORDER BY

CREATE INDEX idx_last_name ON employees (last_name);

you can also create composite indexes

CREATE INDEX idx_composite ON employees (last_name, first_name)

and also composite index for only active employees

CREATE INDEX idx_active_employees ON employees (status) WHERE status = 'active';

When to use indexes

Frequent filters WHERE - if you usually filter by a specific column (last_name for example), an index should improve performance.

JOIN for big tables - when you use JOIN with big tables through PKs or FKs.

ORDER BY or GROUP BY - queries that search for / group by a specific column also benefit from indexes.

Best practices

Don’t create indexes in every column. This slows things down on insert, delete or update operations.

Index are best used for big domain fields such as ids, names, surnames. Don’t use them for male/female (or boolean) fields.

Keep indexes optimized: operations where you mass update or mass delete items in your tables may fragment your indexes. You may need to periodically check them and REINDEX them.

ALWAYS MEASURE PERFORMANCE TIME before and after the creation of that index. If your index doesn’t improve performance, remove it as it causes overhead.

Reference(s)

https://stackoverflow.com/questions/7744038/decision-when-to-create-index-on-table-column-in-database
https://stackoverflow.com/questions/52444912/how-to-find-out-fragmented-indexes-and-defragment-them-in-postgresql

Advanced SQL

UNION

The union sentence is used to accumulate results for two SELECT sentences.

SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2

We have the following tables

company1

per name surname
1 ANTONIO PEREZ
2 PEDRO RUIZ

company2

per name surname
1 LUIS LOPEZ
2 ANTONIO PEREZ

Read More

Azure CosmosDB Query Examples

This are examples of queries to use for Azure CosmosDB, as this database uses a restricted version of SQL and it doesn’t allow all queries or joins.

Partial queries

To be used with the main, partial query. This query starts with something such as SELECT * FROM c

WHERE c.ts LIKE "%2022/03/10%" 
ORDER BY c.ts DESC
WHERE c.type = "Writeable" 
AND c.ts LIKE "%2022/03/10, 15:2%" 
ORDER BY c.ts DESC

Full queries

Normal SQL queries to be used as custom queries. Note that CosmosDB doesn’t support a 100% of all SQL statements.

(count number of entries)

SELECT COUNT(1) FROM c 
	WHERE c.ts LIKE "%2022/03/10%" 
	ORDER BY c.ts DESC

Read More

MySQL user privileges

How to grant privileges for a database to a user, when both already exist.

In this case the database name will be project_node1, the user project_user and the password project_pass. All the following commands have to be executed as root or with a user with enough privileges.

-- list system users
SELECT user, host FROM mysql.user;

-- see current privileges
SHOW GRANTS FOR 'project_user'@'%';

-- delete all previous privileges (if needed)
-- REVOKE ALL PRIVILEGES ON `project_node1`.* FROM 'project_user'@'%';

-- grant new privileges and flush
GRANT ALL PRIVILEGES ON `project_node1`.* TO 'project_user'@'%';
FLUSH PRIVILEGES;

Reference(s)

https://serverfault.com/questions/115950/how-do-i-change-the-privileges-for-mysql-user-that-is-already-created

MongoDB

Config

Config. to run MongoDB

  1. Set PATH as system variable (~/.bashrc)
  2. Create folders /data/db and set rw- permissions to correct user.
  3. start mongod

Queries

Find

by element

db.getCollection('documentX').find({ _id : ObjectId("5b7e99a9149559198c5024a4") })

Read More

MySQL administration

Connection

Connect to DB from CLI

mysql -u {$user} -p    

User

See current logged user SELECT CURRENT_USER();

Change password when we know the old one
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('myNewPassword');

Read More

SQL Cheat Sheet

DDL Queries

  • Show all dbs - SHOW DATABASES;
  • Select one - USE {$db};
  • Show tables in a db - SHOW TABLES;
  • Show description of DB - DESCRIBE {$db}
Create new table
CREATE TABLE table_name (
	id numeric,
	description text);

Read More