Yup, I think these were from the original/legacy Clip definition. We no longer publish those attributes from Spark, and we shouldn't have any more content like that, AFAIK.
This may have been an incorrect assumption on my part! I think I assumed they were optional in cp-content-pipeline because they came from the Clip embed (in which case it would be covered by the
2419ba5 to 68bf2c6
```ts
type ClipAccessibility = {
  captions?: ClipCaption[]
  transcript?: Body
}
```
Thought: I can't recall why this is a `Body`. I suspect it was because only a body supports multiple paragraphs, and Spark publishes transcripts like that because that's how they come from 3Play Media. Do you know more, @adgad?
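To illustrate the multi-paragraph point, here is a minimal sketch of turning a plain-text transcript into a body of paragraphs. The node shapes are simplified assumptions loosely based on content-tree's body/paragraph/text nodes (not the exact SPEC.md definitions), and `transcriptToBody` is a hypothetical helper.

```typescript
// Simplified, assumed node shapes loosely based on content-tree's body/paragraph/text
// nodes; these are NOT the exact SPEC.md definitions.
type TextNode = { type: "text"; value: string }
type ParagraphNode = { type: "paragraph"; children: TextNode[] }
type BodyNode = { type: "body"; version: number; children: ParagraphNode[] }

// Hypothetical helper: each blank-line-separated block of a plain-text transcript
// becomes its own paragraph node, so paragraph breaks are explicit in the tree.
function transcriptToBody(transcript: string): BodyNode {
  const children = transcript
    .split(/\r?\n\s*\r?\n/) // one or more blank lines separate paragraphs
    .map((block) => block.trim())
    .filter((block) => block.length > 0) // drop empty blocks
    .map((block): ParagraphNode => ({
      type: "paragraph",
      children: [{ type: "text", value: block }],
    }))
  // version: 1 is a placeholder, not a value taken from the spec
  return { type: "body", version: 1, children }
}
```

With a flat `string` field, the same breaks would survive only as raw newlines that every renderer has to interpret on its own.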
From the original PR:
There is a challenge around transcripts, which are currently modelled as a nested body. The current component in cp-content-pipeline-ui expects another RichText GraphQL type (a GraphQL-style data structure with fields like `raw`, `structured` and `references`). I'm not really sure how, or whether, we model that in content-tree. If content-tree needs to differ from cp-content-pipeline (i.e. we maintain a workaround), that would also mean the UI component itself isn't really transferable.
a. DECISION (IRL): we should not replicate the GraphQL structure in content-tree, but should instead make cp-content-pipeline work with it somehow. Some ideas below, but still a bit hazy.
So I'm looking through Spark Clips now, and I can't see any evidence that it's sending XML/HTML for transcripts:
- The CAPI schema has it as a `string` type.
- In the Spark Clips code, it looks like it's grabbing the text blocks and concatenating them.
- I checked a few recent clips, and all the transcripts were plain text.
I wonder whether the automatic AI transcripts come through as plain text, but the 3Play Media ones may be HTML? 🤔 Let me see if I can find one of those...
Okay, yes, I think that's it: the professionally transcribed ones are still coming through as HTML. Example: https://api.ft.com/content/8a3f67bc-3c86-4779-a4e4-fe93a8642e49
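If old content needs auditing, a rough check like this could flag which stored transcripts contain markup. It assumes transcripts are available as plain strings (as the CAPI schema's `string` type suggests); `looksLikeHtml` and its tag pattern are hypothetical, a heuristic rather than a validator.

```typescript
// Hypothetical heuristic: matches anything shaped like an HTML tag, e.g. <p>, </p>,
// <br/> or <span class="x">. A bare "<" followed by a space (as in "3 < 4") does not match.
const TAG_PATTERN = /<\/?[a-z][a-z0-9-]*(\s[^<>]*)?\/?>/i

function looksLikeHtml(transcript: string): boolean {
  return TAG_PATTERN.test(transcript)
}
```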
Should we strip the HTML at the Spark level and avoid dodgy markup? We would need to amend old data too. Keeping transcripts as HTML seems to just add complexity to the content pipeline for no real benefit.
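If the HTML were stripped (at the Spark level or in a backfill of old data), the conversion could look roughly like this. `stripTranscriptHtml` is hypothetical, it assumes the markup is simple `<p>`/`<br>` structure, and it is regex-based and deliberately naive; a real migration should use a proper HTML parser.

```typescript
// Hypothetical normaliser: HTML transcript -> plain text, keeping paragraph breaks
// as blank lines. Naive by design (assumes simple <p>/<br> markup).
function stripTranscriptHtml(html: string): string {
  const text = html
    .replace(/<br\s*\/?>/gi, "\n") // line break -> newline
    .replace(/<\/p\s*>/gi, "\n\n") // end of paragraph -> blank line
    .replace(/<[^>]+>/g, "") // drop all remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&") // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
  return text
    .split("\n")
    .map((line) => line.trim()) // remove indentation left over from the markup
    .join("\n")
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim()
}
```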
SPEC.md
```ts
type ClipSource = {
  binaryUrl: string
  mediaType: string
  audioCodec?: string
  duration?: number
  pixelHeight?: number
  pixelWidth?: number
  videoCodec?: string
}
```
Could (or should) this be a generic `mediaSource` shared by multiple audio/video players? I guess we can keep it like this for now and change it when needed. If the fields don't change, it shouldn't be a breaking change, or would it? @adgad @debugwand
6de42c5#diff-7426c9e3a694ca6015df5f98637912975f2edea23270203ee89a8bdeed246ee0R41: this is what would make sense to me for possible future audio sources, while keeping the `dataSource` property name intact.
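As a sketch of the shared-type idea (not necessarily what the linked commit does), a generic base type could hold the fields common to audio and video, with `ClipSource` layering the video-only fields on top. `BaseMediaSource`, `AudioSource`, `describeSource` and the simplified `Clip` are hypothetical, and which fields belong in the shared part is an assumption.

```typescript
// Hypothetical shared shape for any playable media file (audio or video).
type BaseMediaSource = {
  binaryUrl: string
  mediaType: string
  audioCodec?: string
  duration?: number
}

// Video-only extras on top; the combined field set matches the ClipSource excerpt.
type ClipSource = BaseMediaSource & {
  pixelHeight?: number
  pixelWidth?: number
  videoCodec?: string
}

// Hypothetical audio counterpart: just the shared fields.
type AudioSource = BaseMediaSource

// A player that only needs the shared fields can accept either kind of source.
function describeSource(source: BaseMediaSource): string {
  return `${source.mediaType} at ${source.binaryUrl}`
}

// Simplified, hypothetical Clip that keeps the existing `dataSource` property name.
type Clip = { dataSource: ClipSource[] }
```

Because TypeScript compares object types structurally, the intersection exposes exactly the same members as the original flat `ClipSource`, so TypeScript consumers should not notice the change as long as the `ClipSource` name stays exported; the published JSON is unaffected either way.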
Your solution seems quite neat to me. What do you reckon, @adgad?
Taken from https://github.com/Financial-Times/content-tree/pull/90/changes (a rebase would have been harder).
Some questions:
* What happened with `posterAlt` and `posterCredits`? Do they not actually exist? Are they legacy? (These were for old clips.)
* Some things are marked as optional in cp-content-pipeline (really everything except `id` and `type`) but are not optional here: is there a reason? (The following are marked as optional, matching existing data and the cp-content-pipeline optional settings:)
  * `ClipSource`
  * `Clip`
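Since existing data may omit fields, a runtime guard can make the required/optional split explicit when reading older payloads. This sketch assumes the field list from the (outdated) `ClipSource` excerpt in this thread, where only `binaryUrl` and `mediaType` are required; `isClipSource` is a hypothetical helper.

```typescript
// Mirrors the ClipSource excerpt in this thread: only binaryUrl and mediaType are required.
type ClipSource = {
  binaryUrl: string
  mediaType: string
  audioCodec?: string
  duration?: number
  pixelHeight?: number
  pixelWidth?: number
  videoCodec?: string
}

// Hypothetical runtime guard, e.g. for checking legacy payloads that omit optional fields.
function isClipSource(value: unknown): value is ClipSource {
  if (typeof value !== "object" || value === null) return false
  const v = value as Record<string, unknown>
  // Optional fields may be absent, but if present they must have the right type.
  const optionalString = (key: string) => v[key] === undefined || typeof v[key] === "string"
  const optionalNumber = (key: string) => v[key] === undefined || typeof v[key] === "number"
  return (
    typeof v["binaryUrl"] === "string" &&
    typeof v["mediaType"] === "string" &&
    optionalString("audioCodec") &&
    optionalString("videoCodec") &&
    optionalNumber("duration") &&
    optionalNumber("pixelHeight") &&
    optionalNumber("pixelWidth")
  )
}
```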