Will data get duplicated upon re-running code?

LaminDB’s operations are idempotent in the sense that repeating them returns the existing records rather than creating new ones, which allows you to re-run code without duplicating data.

Records with name field

When you instantiate a record with a name that exactly matches an existing name in the registry, the constructor returns the existing record instead of creating a new one. If records with similar names exist, you’ll see them in a table: you can then decide whether to save the new record or pick an existing one.

If you set ln.settings.creation.search_names to False, you bypass these checks.

Artifacts & collections

If you instantiate Artifact from data that already exists as an artifact, the Artifact() constructor returns the existing artifact based on a hash lookup.

# pip install 'lamindb[jupyter]'
!lamin init --storage ./test-idempotency

import lamindb as ln

ln.track("ANW20Fr4eZgM0000")

Records with name field

Let us add a first record to the ULabel registry:

label = ln.ULabel(name="My label 1").save()

If we create a new record with a similar name, we automatically get search results that hint at whether we are about to duplicate an entry:

label = ln.ULabel(name="My label 1a")


! record with similar name exists! did you mean to load it?

	uid	name	is_type	description	reference	reference_type	space_id	type_id	run_id	created_at	created_by_id	_aux	_branch_code
id
1	IqLEVMTt	My label 1	False	None	None	None	1	None	1	2025-03-10 11:52:00.136000+00:00	1	None	1

Let’s save the 1a label, since we actually intend to create it.

label.save()

If the name exactly matches an existing name, we get the existing object:

label = ln.ULabel(name="My label 1")

If we save it again, it will not create a new entry in the registry:

label.save()

ULabel(uid='IqLEVMTt', name='My label 1', is_type=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-03-10 11:52:00 UTC)

Now, if we create a third record, we’ll get two alternatives:

label = ln.ULabel(name="My label 1b")

! records with similar names exist! did you mean to load one of them?

	uid	name	is_type	description	reference	reference_type	space_id	type_id	run_id	created_at	created_by_id	_aux	_branch_code
id
1	IqLEVMTt	My label 1	False	None	None	None	1	None	1	2025-03-10 11:52:00.136000+00:00	1	None	1
2	vubDzkz6	My label 1a	False	None	None	None	1	None	1	2025-03-10 11:52:00.199000+00:00	1	None	1
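The behavior shown above (return the existing record on an exact name match, otherwise list similar names) can be illustrated with a small in-memory sketch. This is a toy model, not LaminDB’s implementation: the `NameRegistry` class, its dictionary-based records, and the use of `difflib` with a cutoff of 0.8 are all assumptions made for illustration, and LaminDB’s actual search may rank or select matches differently.

```python
import difflib


class NameRegistry:
    """Toy in-memory registry (hypothetical; NOT LaminDB's implementation)."""

    def __init__(self):
        self._records = {}  # maps name -> record dict

    def create(self, name):
        """Return (record, similar_names).

        On an exact name match the stored record is returned and no new
        record is created. Otherwise a new, unsaved record is returned
        together with the stored names that look similar.
        """
        if name in self._records:
            return self._records[name], []
        # The cutoff of 0.8 is an arbitrary choice for this sketch.
        similar = difflib.get_close_matches(
            name, list(self._records), n=5, cutoff=0.8
        )
        return {"name": name}, similar

    def save(self, record):
        """Store the record; saving an already stored name changes nothing."""
        return self._records.setdefault(record["name"], record)


registry = NameRegistry()

first, _ = registry.create("My label 1")
registry.save(first)

second, similar_to_second = registry.create("My label 1a")
registry.save(second)  # we intend to create it despite the similar name

again, _ = registry.create("My label 1")  # exact match: existing record
registry.save(again)  # idempotent: the registry is unchanged

third, similar_to_third = registry.create("My label 1b")
print(similar_to_second)
print(similar_to_third)
```

In this sketch, re-creating and re-saving "My label 1" leaves the registry with two entries, and the unsaved "My label 1b" reports both stored names as candidates.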

If we prefer not to perform a search, e.g. for performance reasons, we can switch it off.

ln.settings.creation.search_names = False
label = ln.ULabel(name="My label 1c")

Switch it back on:

ln.settings.creation.search_names = True
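Switching a setting off and back on by hand is easy to get wrong if the code in between raises an error. One possible pattern is a small context manager that restores the previous value on exit. The helper `temporarily_set` below is hypothetical and not part of LaminDB; the example runs against a stand-in object, and passing `ln.settings.creation` instead is an assumption based on the attribute assignment shown above.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, attr, value):
    """Set obj.attr to value inside the with-block, then restore the old value."""
    previous = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield obj
    finally:
        # Runs even if the body raises, so the setting is always restored.
        setattr(obj, attr, previous)


# Stand-in for a settings object. With LaminDB one would presumably pass
# ln.settings.creation here instead (an assumption, not exercised below).
creation_settings = SimpleNamespace(search_names=True)

with temporarily_set(creation_settings, "search_names", False):
    value_inside = creation_settings.search_names  # search is off here

value_after = creation_settings.search_names  # restored to the old value
```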

Artifacts & collections

filepath = ln.core.datasets.file_fcs()

Create an Artifact:

artifact = ln.Artifact(filepath, key="my_fcs_file.fcs").save()

Create an Artifact from the same path:

artifact2 = ln.Artifact(filepath, key="my_fcs_file.fcs")

It gives us the existing object:

assert artifact.id == artifact2.id
assert artifact.run == artifact2.run
assert not artifact._subsequent_runs.exists()

If you save it again, nothing will happen (the operation is idempotent):

artifact2.save()
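To make the idea of a hash lookup concrete, here is a minimal sketch of content-based deduplication. It is a toy model under stated assumptions, not LaminDB’s implementation: the `ArtifactStore` class, the dictionary records, and the choice of SHA-256 are all hypothetical, and LaminDB may compute and compare hashes differently.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path):
    """Hash a file's bytes in chunks (SHA-256 is an arbitrary choice here)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ArtifactStore:
    """Toy registry keyed by content hash (hypothetical; NOT LaminDB's code)."""

    def __init__(self):
        self._by_hash = {}  # maps content hash -> artifact dict

    def create(self, path, key):
        """Return the stored artifact if the content is known, else a new one."""
        file_hash = content_hash(path)
        existing = self._by_hash.get(file_hash)
        if existing is not None:
            return existing
        return {"id": None, "key": key, "hash": file_hash}

    def save(self, artifact):
        """Store the artifact; saving known content again changes nothing."""
        stored = self._by_hash.setdefault(artifact["hash"], artifact)
        if stored["id"] is None:
            stored["id"] = len(self._by_hash)  # assign an id on first save
        return stored


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_fcs_file.fcs"
    path.write_bytes(b"example bytes standing in for real data")

    store = ArtifactStore()
    artifact = store.save(store.create(path, key="my_fcs_file.fcs"))
    artifact2 = store.create(path, key="my_fcs_file.fcs")  # same content
    store.save(artifact2)  # idempotent: no second entry is created
    n_stored = len(store._by_hash)
```

Because the lookup is keyed on content rather than on the path, the second `create` call returns the object that was stored the first time, and the store still holds a single entry after the repeated save.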

In the hidden cell below, you’ll see how this interplays with data lineage.