Compare commits

2 commits

Author SHA1 Message Date
Jan Böhmer
91bf8371ad Show hint of google/gemini-2.5-flash-lite in placeholder
2026-05-03 16:54:06 +02:00
Jan Böhmer
3c9866e90d Improved AI extractor
It now gives better results and uses fewer tokens
2026-05-03 16:50:46 +02:00
3 changed files with 13 additions and 3 deletions

@@ -117,7 +117,7 @@ This provider can be particularly useful for extracting information from website
 It also potentially extracts more detailed information than the Generic Web URL Provider, as it is not limited to the fields defined in the Schema.org format.
 To use the AI Web Extractor, you need to setup an AI platform, in the AI settings tab, and chose a model, which support structured output.
-For many use cases a small and cheap model like `google/gemini-2.5-flash-lite` will be sufficient, coming down to costs like 0.003$ per request.
+For many use cases a small and cheap model like `google/gemini-2.5-flash-lite` will be sufficient, coming down to costs like 0.001$ per request.
 For more complex websites, or if you wanna use the LLM for translation purposes too, you should consider a more powerful model.
 You can add some additional instructions for the model, which gets added to the system prompt, to tweak the output of the model.

@@ -32,6 +32,7 @@ use App\Services\InfoProviderSystem\DTOJsonSchemaConverter;
 use App\Services\InfoProviderSystem\DTOs\PartDetailDTO;
 use App\Settings\InfoProviderSystem\AIExtractorSettings;
 use Brick\Schema\SchemaReader;
+use Imagine\Image\Format;
 use Jkphl\Micrometa;
 use League\HTMLToMarkdown\HtmlConverter;
 use Psr\Cache\CacheItemPoolInterface;
@@ -174,7 +175,8 @@ final class AIWebProvider implements InfoProviderInterface
      */
     private function extractStructuredData(string $html, string $url): string
     {
-        $micrometa = new Micrometa\Ports\Parser();
+        //Only parse microdata, json-ld and rdfa, as they are the most common formats for structured data on product pages. Links and microformat only create clutter for the LLM
+        $micrometa = new Micrometa\Ports\Parser(Micrometa\Ports\Format::JSON_LD | Micrometa\Ports\Format::MICRODATA | Micrometa\Ports\Format::RDFA_LITE);
         $items = $micrometa($url, $html);
         return json_encode($items->toObject(), JSON_THROW_ON_ERROR);
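
The parser call in this hunk combines format constants with bitwise OR so that only the selected structured-data formats are parsed. Below is a minimal, library-free sketch of that flag pattern. `FormatFlags`, its numeric values, and `isEnabled()` are hypothetical stand-ins chosen for illustration; they are not taken from jkphl/micrometa and may differ from the library's real constants.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a set of format flags. The values are assumed
// for illustration; each one occupies its own bit so they can be combined.
final class FormatFlags
{
    public const MICROFORMATS = 1;
    public const MICRODATA = 2;
    public const JSON_LD = 4;
    public const RDFA_LITE = 8;
    public const LINK_TYPE = 16;
}

// Returns true when every bit of $flag is set in $mask.
function isEnabled(int $mask, int $flag): bool
{
    return ($mask & $flag) === $flag;
}

// Same combination as in the diff: JSON-LD, microdata and RDFa Lite only.
$mask = FormatFlags::JSON_LD | FormatFlags::MICRODATA | FormatFlags::RDFA_LITE;

echo isEnabled($mask, FormatFlags::JSON_LD) ? "json-ld on\n" : "json-ld off\n";
echo isEnabled($mask, FormatFlags::MICROFORMATS) ? "microformats on\n" : "microformats off\n";
```

Because each flag is a distinct bit, leaving a format out of the OR expression is enough to exclude it, which is how the change drops link and microformat parsing.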
@@ -264,6 +266,9 @@ Rules:
 - If information is not found, use null
 - Try to avoid duplicating parameters, if the same parameter is mentioned multiple times, or if it is already used in another field.
 - Include only the 1 to 3 most relevant images, such as the main product image or important diagrams. Ignore decorative images, logos, or icons.
+- Extract GTIN / EAN if available, as it can be useful for matching parts across different sources, even if the part number is different.
+- Include detailed product description into notes field, as it can contain important information that doesn't fit into other fields, such as features, applications, or unique selling points.
 PROMPT;
         if ($this->settings->outputLanguage === null) {
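
The new prompt rule asks the model to extract a GTIN / EAN. Whether the provider validates that value is not shown in this diff. As a sketch under that caveat, a caller could reject hallucinated or mistyped codes with the standard GTIN check-digit rule. `isValidGtin()` is a hypothetical helper name, not part of the project.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: validates GTIN-8, GTIN-12 (UPC-A), GTIN-13 (EAN-13)
// and GTIN-14 codes using the standard mod-10 check digit.
function isValidGtin(string $code): bool
{
    // Only plain digit strings of the supported lengths are accepted.
    if (preg_match('/^(\d{8}|\d{12,14})$/', $code) !== 1) {
        return false;
    }

    $digits = array_map('intval', str_split($code));
    $checkDigit = array_pop($digits);

    // Walking from the rightmost payload digit, weights alternate 3, 1, 3, 1, ...
    $sum = 0;
    foreach (array_reverse($digits) as $index => $digit) {
        $sum += $digit * ($index % 2 === 0 ? 3 : 1);
    }

    return (10 - $sum % 10) % 10 === $checkDigit;
}

echo isValidGtin('4006381333931') ? "valid\n" : "invalid\n";
echo isValidGtin('4006381333932') ? "valid\n" : "invalid\n";
```

A check like this only confirms that the code is well-formed; it cannot tell whether the GTIN actually belongs to the part on the page.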

@@ -34,6 +34,7 @@ use Jbtronics\SettingsBundle\Settings\SettingsTrait;
 use Symfony\AI\Platform\Capability;
 use Symfony\Component\Form\Extension\Core\Type\LanguageType;
 use Symfony\Component\Form\Extension\Core\Type\TextareaType;
+use Symfony\Component\Translation\StaticMessage;
 use Symfony\Component\Translation\TranslatableMessage as TM;
 use Symfony\Component\Validator\Constraints\Language;
@@ -51,7 +52,11 @@ class AIExtractorSettings
     public ?AIPlatforms $platform = null;
     #[SettingsParameter(label: new TM("settings.ips.ai_extractor.model"), description: new TM("settings.ips.ai_extractor.model.help"),
-        formType: AiModelsType::class, formOptions: ['platform_selector' => self::MODEL_SELECTOR_LABEL, 'filter_capability' => Capability::OUTPUT_STRUCTURED],
+        formType: AiModelsType::class, formOptions: [
+            'platform_selector' => self::MODEL_SELECTOR_LABEL, 'filter_capability' => Capability::OUTPUT_STRUCTURED,
+            'attr' => ['placeholder' => new StaticMessage('google/gemini-2.5-flash-lite')]
+        ],
     )]
     public ?string $model = null;