I am using defuddle, specifically the YoutubeExtractor, in a project. I would like an option that prevents transcript segments from being grouped. I think others could find this useful as well.
If this sounds okay, I'm willing to open a PR for it.
The implementation I have in mind would look something like this:
type.ts
interface DefuddleOptions {
  // ...existing options
  // Prevent YoutubeExtractor from grouping transcript segments.
  // Defaults to false.
  preserveTranscriptSegments?: boolean;
}
or, as a YouTube-specific option:
export interface DefuddleOptions {
  // ...existing options
  extractors?: {
    youtube?: {
      // Prevent YoutubeExtractor from grouping transcript segments.
      // Defaults to false.
      preserveTranscriptSegments?: boolean;
    };
  };
}
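To illustrate how the nested variant could be read with its default, here is a minimal sketch. `ProposedOptions`, `YoutubeExtractorOptions`, and `shouldPreserveSegments` are hypothetical names used only for this example; they are not part of defuddle's current API.

```typescript
// Hypothetical shapes mirroring the nested proposal; not defuddle's actual types.
interface YoutubeExtractorOptions {
  preserveTranscriptSegments?: boolean;
}

interface ProposedOptions {
  extractors?: { youtube?: YoutubeExtractorOptions };
}

// Hypothetical helper: resolves the flag and falls back to false
// when the options object or any nested level is missing.
function shouldPreserveSegments(options?: ProposedOptions): boolean {
  return options?.extractors?.youtube?.preserveTranscriptSegments ?? false;
}
```

With this shape, callers who never set the option keep the current grouping behavior, and further extractor-specific options could later be added under `extractors` without widening the top-level interface.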
defuddle.ts
const extractor = ExtractorRegistry.findPreferredAsyncExtractor(this.doc, url, schemaOrgData, this.options.extractors);
youtube.ts
export class YoutubeExtractor extends BaseExtractor {
  private preserveTranscriptSegments: boolean;

  constructor(document: Document, url: string, schemaOrgData?: any, options?: { preserveTranscriptSegments?: boolean }) {
    super(document, url, schemaOrgData);
    this.videoElement = document.querySelector('video');
    this.schemaOrgData = schemaOrgData;
    this.preserveTranscriptSegments = options?.preserveTranscriptSegments ?? false;
  }
  // ...
}

private groupTranscriptSegments(segments: { start: number; text: string }[]): { start: number; text: string; speakerChange: boolean; speaker?: number }[] {
  if (segments.length === 0) return [];
  if (this.preserveTranscriptSegments) {
    return segments.map(seg => ({
      start: seg.start,
      text: seg.text,
      speakerChange: false,
    }));
  }
  // ...existing logic
}
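As a self-contained illustration of the intended behavior, the sketch below runs the same early return outside the class. `Segment`, `GroupedSegment`, and `groupSegments` are hypothetical names, and the merge in the non-preserving branch is only a placeholder for the existing grouping logic, which is not reproduced here.

```typescript
interface Segment {
  start: number;
  text: string;
}

interface GroupedSegment extends Segment {
  speakerChange: boolean;
  speaker?: number;
}

// Hypothetical standalone version of the proposed early return.
function groupSegments(segments: Segment[], preserve: boolean): GroupedSegment[] {
  const first = segments[0];
  if (first === undefined) return [];

  if (preserve) {
    // Pass every segment through unchanged, one output entry per input entry.
    return segments.map(seg => ({
      start: seg.start,
      text: seg.text,
      speakerChange: false,
    }));
  }

  // Placeholder only: the real extractor applies its own grouping rules here.
  return [{
    start: first.start,
    text: segments.map(seg => seg.text).join(' '),
    speakerChange: false,
  }];
}

const sample: Segment[] = [
  { start: 0, text: 'Hello' },
  { start: 2.5, text: 'world' },
];

const preserved = groupSegments(sample, true);  // two entries, original timestamps kept
const grouped = groupSegments(sample, false);   // one merged entry from the placeholder
```

The point of the option is visible in `preserved`: each caption keeps its own `start` value, which matters for use cases such as subtitle alignment or per-line seeking.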
Thank you for your time.