There are two applications: one built on Flask and one on Scrapy. Each of them is deployed to a separate Lambda through Zappa. The Flask application has 3 endpoints, each of which triggers the Scrapy Lambda through SQS. The trigger itself works fine, but I have 3 questions:
1) Is it possible to somehow remove the execution time limit on the Scrapy Lambda? (I only found a way to raise the limit to 15 minutes, and Scrapy does not manage to collect all the items within that time.)
2) Is it possible to trigger this Lambda through SQS alone, without API Gateway, and can the application be deployed through Zappa so that no API Gateway is created? Or do I have to deploy the Scrapy Lambda manually? (A sketch of the handler I am aiming for follows the questions.)
3) If a Lambda cannot be triggered without API Gateway, how do I return a correct response?
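For question 2, what I am aiming for is roughly a plain SQS-triggered handler like the sketch below. The `Records` shape is the standard SQS event format; the queue URL is a placeholder and `run_spider` is a hypothetical helper wrapping the crawl logic from the handler further down:

import json

import boto3

sqs = boto3.client('sqs')
# Placeholder; the real queue URL comes from the SQS console.
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/sunbiz-queue'


def run_spider(data):
    """Hypothetical helper: the same crawl logic as in lambda_event below."""
    ...


def sqs_handler(event, context):
    for record in event['Records']:
        # With a direct SQS trigger the payload arrives in each record's
        # 'body', not in event['body'] as it does with API Gateway.
        data = json.loads(record['body'])
        # Relates to question 1: if the 15-minute limit is near, push the
        # message back onto the queue rather than die mid-crawl.
        if context.get_remaining_time_in_millis() < 60000:
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=record['body'])
            continue
        run_spider(data)
    # No HTTP response is needed for an SQS trigger; returning without an
    # exception tells Lambda the batch succeeded.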
Now I have the following function:
import json

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Spider import path assumed from the pipeline path below.
from sunbiz_spiders.spiders import (
    GetDetailSpider,
    SearchByNameSpider,
    SearchByPersonSpider,
)


def lambda_event(event, context):
    try:
        # API Gateway delivers the request payload in event['body'].
        data = json.loads(event['body'])
        scrapy_settings = get_project_settings()
        scrapy_settings['ITEM_PIPELINES'] = {
            'sunbiz_spiders.pipelines.DynamodbPipeline': 300,
        }
        scrapy_settings['DOWNLOAD_DELAY'] = 0.5
        process = CrawlerProcess(settings=scrapy_settings)
        if data['spider_name'] == 'SearchByPersonSpider':
            spider = SearchByPersonSpider
        elif data['spider_name'] == 'GetDetailSpider':
            spider = GetDetailSpider
        else:
            spider = SearchByNameSpider
        # Pass the search parameters through to the spider.
        process.crawl(spider, search_params=data['search_params'])
        process.start()  # blocks until the crawl finishes
    except Exception:
        pass  # any error is swallowed and a 200 is returned regardless
    return {
        'statusCode': 200,
        'body': json.dumps('All done.'),
    }
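The handler expects an API Gateway-style event whose body carries the spider name and the search parameters; a minimal fake event for local testing would look something like this (the values are made up):

if __name__ == '__main__':
    # Fake API Gateway-style event with made-up values, for local testing.
    fake_event = {
        'body': json.dumps({
            'spider_name': 'GetDetailSpider',
            'search_params': 'P12000012345',
        })
    }
    print(lambda_event(fake_event, None))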
Zappa config:
{
    "production": {
        "app_function": "main.lambda_event",
        "aws_region": "us-east-1",
        "profile_name": "default",
        "project_name": "sunbiz-search-s",
        "runtime": "python3.6",
        "s3_bucket": "zappa-envjkpiz6"
    }
}
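As far as I can tell from Zappa's docs, there is an apigateway_enabled flag and an events block for wiring an SQS queue straight to the function; the production block would then gain something like this (the queue ARN and account ID are placeholders, the other keys stay as above):

{
    "production": {
        "apigateway_enabled": false,
        "events": [
            {
                "function": "main.lambda_event",
                "event_source": {
                    "arn": "arn:aws:sqs:us-east-1:123456789012:sunbiz-queue",
                    "batch_size": 1,
                    "enabled": true
                }
            }
        ]
    }
}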
With the current setup, when the Lambda is triggered I get "list index out of range" in werkzeug/test.py, line 1146.
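For comparison, a direct SQS trigger delivers an event in the standard SQS record format, which is not the HTTP-style event Zappa's WSGI wrapper expects, and that is presumably what trips werkzeug (abridged sketch):

# Standard shape of an SQS-triggered Lambda event (abridged);
# the payload sits inside each record's 'body', not at the top level.
sqs_event = {
    'Records': [
        {
            'messageId': '...',
            'eventSource': 'aws:sqs',
            'body': '{"spider_name": "GetDetailSpider", "search_params": "..."}',
        }
    ]
}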