Services | Spirelight | Speech Data Collection & Transcription

Speech training data built to your model requirements

We collect, record, transcribe, and quality-check custom speech datasets for AI training. Every project is matched to your required languages, speaker profiles, dialects, recording format, metadata, and background noise conditions, so your team receives clean, structured files ready for model development.

01 · Speech collection

Finding speakers and capturing audio to your exact specifications.

Remote · On-site | DA · SV · NO · DE · FI | Monologue · Dialogue | Custom metadata

Contributors recruited by language, dialect, age, gender, and location.

Remote captures via our platform; on-site captures with calibrated rigs.

Prompts, device rules, and noise checks configured per project.

02 · Transcription

Machine-assisted or human-validated transcripts with annotation rules to your spec.

Manual · ASR-assisted | Word timestamps | Speaker IDs | JSON · VTT · TXT

[00:01.420 → 00:01.890] spk_02 · word-level alignment ✓

Multi-pass review for high-stakes domains, single-pass for fast turnaround.

Overlap, accent, and domain terminology handled by humans.
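
As an illustration of how a word-level alignment line like the one above could be consumed downstream, here is a minimal Python sketch. The line layout (`[MM:SS.mmm → MM:SS.mmm] speaker`), the `Alignment` class, and the `parse_alignment` helper are assumptions made for this example; the actual delivery schema is agreed per project.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for a "[MM:SS.mmm → MM:SS.mmm] speaker" alignment line.
ALIGN_RE = re.compile(r"\[(\d+):(\d+\.\d+)\s*→\s*(\d+):(\d+\.\d+)\]\s*(\S+)")


@dataclass
class Alignment:
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # speaker ID, e.g. "spk_02"


def parse_alignment(line: str) -> Alignment:
    """Parse one alignment line into start/end seconds and a speaker ID."""
    match = ALIGN_RE.search(line)
    if match is None:
        raise ValueError(f"not an alignment line: {line!r}")
    s_min, s_sec, e_min, e_sec, speaker = match.groups()
    start = int(s_min) * 60 + float(s_sec)
    end = int(e_min) * 60 + float(e_sec)
    if end < start:
        raise ValueError("end time precedes start time")
    return Alignment(start=start, end=end, speaker=speaker)


if __name__ == "__main__":
    word = parse_alignment("[00:01.420 → 00:01.890] spk_02 · word-level alignment ✓")
    print(word.speaker, round(word.end - word.start, 3))
```

Converting timestamps to plain seconds up front keeps later steps, such as slicing audio or checking that words do not overlap, independent of the display format.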

03 · Dataset delivery

Audio, transcripts, metadata, and manifests packaged for your training pipeline.

Bucket · API handoff | Naming conventions | Manifest + checksums | Consent linkage

/dataset/da-DK/spk_044/take_03.wav · sha256 · meta.json

Schema agreed up front; delivery matches your training format exactly.

Every utterance traceable to consent, contributor, and capture conditions.
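
On the receiving side, a manifest with checksums lets your team confirm that every file arrived intact. The sketch below assumes a manifest represented as a mapping from relative file path to a SHA-256 hex digest; that shape, and the helper names `sha256_of` and `verify_manifest`, are hypothetical and would follow whatever schema is agreed up front.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or fail their checksum.

    `manifest` maps relative path -> expected SHA-256 hex digest
    (an assumed layout, not a documented delivery format).
    """
    failures = []
    for rel_path, expected in manifest.items():
        file_path = root / rel_path
        if not file_path.is_file() or sha256_of(file_path) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list from `verify_manifest` means every listed file is present and matches its checksum; anything else is a short list of files to re-request.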

04 · Quality auditing & oversight

Strict quality gates that guarantee clean, deployment-ready data.

In-production review | Statistical sampling | Batch gates | Issue escalation

Reviewers inspect recordings while the project is live, not after.

Each batch passes a quality gate before it enters the final delivery.

Audio, transcript, metadata, and format checked against project specs.
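
To make the idea of a sampled batch gate concrete, here is a minimal Python sketch. It draws a random sample from a batch, runs a check on each sampled item, and passes the batch only if the failure rate stays at or below a threshold. The function name, the sample size, and the 2% threshold are illustrative assumptions, not Spirelight's actual gate parameters.

```python
import random
from typing import Callable, Sequence


def batch_passes_gate(
    items: Sequence[str],
    check: Callable[[str], bool],
    sample_size: int = 50,           # assumed sample size, for illustration
    max_failure_rate: float = 0.02,  # assumed threshold, for illustration
    seed: int = 0,
) -> bool:
    """Sample a batch at random and pass it only if few enough items fail."""
    if not items:
        return False  # an empty batch has nothing to approve
    rng = random.Random(seed)  # fixed seed keeps the audit reproducible
    sample = rng.sample(list(items), min(sample_size, len(items)))
    failures = sum(1 for item in sample if not check(item))
    return failures / len(sample) <= max_failure_rate
```

In practice `check` would wrap the project-specific audio, transcript, metadata, and format checks, and a failed gate would hold the batch back for review instead of letting it into the final delivery.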

Tell us the speech data you need

Send over your speaker profiles, language needs, and background noise conditions. Our team will design a custom recording plan and deliver a complete project workflow within two business days.

Four phases, one partner, one timeline.

Speech collection

Transcription and annotation

Dataset delivery

Quality auditing and project oversight

Six controls running on every project.

In-production review

Measurable checks

Batch handling

Issue escalation

Consistency controls

Reviewer sampling

Three milestones from kickoff to delivery.

Align on data specifications.

Transparent progress via live batch delivery.

Ready to plug into your models.
